Python/Django: How to convert UTF-16 str bytes to unicode?
Fellows,
I am unable to parse a Unicode text file submitted using Django forms. Here are the quick steps I performed:
I uploaded a text file (encoding: UTF-16) with the following contents:
hello world 13
On the server side, I received the file using:
filename = request.FILES['file_field']
and then went through it line by line:
for line in filename:
    yield line
type(filename)
gives me <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
type(line)
gives <type 'str'>
print line
gives: '\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'
codecs.BOM_UTF16_LE == line[:2]
returns True
Now, I want to reconstruct the Unicode or ASCII string "hello world 13" so that I can parse the integer from the line.
One of the ugliest ways of doing this is to take line[-5:] (= '\x001\x003\x00') and then build the number from line[-5:][1] and line[-5:][3].
I am sure there must be a better way of doing this. Please help.
Thanks in advance!
Use codecs.iterdecode() to decode the object on the fly:
from codecs import iterdecode

for line in iterdecode(filename, 'utf16'):
    yield line
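To see the whole round trip, here is a minimal sketch that decodes the exact bytes from the question and pulls out the integer. It assumes Python 3 syntax, and `io.BytesIO` is only a hypothetical stand-in for the uploaded file object; the names `raw`, `uploaded`, `text` and `number` are made up for illustration.

```python
import codecs
import io

# Hypothetical stand-in for the uploaded file: the bytes printed in the question.
raw = b'\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'
uploaded = io.BytesIO(raw)

# iterdecode() passes each byte chunk through an incremental decoder.
# The 'utf-16' codec reads the BOM to pick the byte order and drops it.
text = u''.join(codecs.iterdecode(uploaded, 'utf-16'))

# Assumption: the integer is the last whitespace-separated token on the line.
number = int(text.split()[-1])

print(text)
print(number)
```

One thing to keep in mind: iterating over a byte stream splits on the single byte '\n', which in UTF-16 is only half of a two-byte character, so the chunks yielded by iterdecode() may not line up exactly with text lines. Joining the chunks first, as above, and then calling splitlines() avoids that. For a small upload, reading everything and calling .decode('utf-16') on it should work just as well.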