Python/Django: How to convert utf-16 str bytes to unicode?
Fellows,

I am unable to parse a Unicode text file submitted through a Django form. Here are the quick steps I performed:

1. Uploaded a text file (encoding: UTF-16) with the contents:

    hello world 13

2. On the server side, received the file using:

    filename = request.FILES['file_field']

3. Went through it line by line:

    for line in filename:
        yield line

type(filename) gives me:

    <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>

type(line) gives me:

    <type 'str'>

print line gives:

    '\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x001\x003\x00'

codecs.BOM_UTF16_LE == line[:2] returns True.

Now I want to reconstruct the Unicode or ASCII string "hello world 13" so that I can parse the integer from the line.

One of the ugliest ways of doing this is to retrieve the digits using line[-5:] (= '\x001\x003\x00') and then construct the number from line[-5:][1] and line[-5:][3].

I am sure there must be a better way of doing this. Please help.

Thanks in advance!
Use codecs.iterdecode() to decode the object on the fly:

    from codecs import iterdecode

    for line in iterdecode(filename, 'utf16'):
        yield line
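To see the whole round trip outside Django, here is a minimal sketch in Python 3 (the question itself is Python 2). It assumes an `io.BytesIO` object as a stand-in for the uploaded file; `decode_utf16_chunks`, `raw`, and `uploaded` are hypothetical names used only for illustration.

```python
import codecs
import io
import re


def decode_utf16_chunks(byte_chunks):
    """Yield text decoded incrementally from an iterable of UTF-16 byte chunks."""
    # The 'utf-16' codec reads the BOM to pick the byte order and drops it
    # from the decoded output.
    for text in codecs.iterdecode(byte_chunks, "utf-16"):
        yield text


# Stand-in for the upload (an assumption): UTF-16-LE bytes with a BOM,
# matching the bytes shown in the question, plus a trailing newline.
raw = codecs.BOM_UTF16_LE + "hello world 13\n".encode("utf-16-le")
uploaded = io.BytesIO(raw)

# Join the decoded chunks, then pull the integer out of the text.
decoded = "".join(decode_utf16_chunks(uploaded))
number = int(re.search(r"\d+", decoded).group())
print(repr(decoded), number)
```

Note that the decoded chunks do not necessarily line up with text lines: iterating over the raw bytes splits on the single byte `\n`, which cuts a two-byte UTF-16 code unit in half, and the incremental decoder carries the leftover byte into the next chunk. Joining the chunks (or splitting the decoded text on newlines afterwards) avoids relying on chunk boundaries.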