What 8-bit encodings use the C1 range for characters? (x80-x9F or 128-159)
Wikipedia has a listing of the x80-x9F "C1" range under the Latin-1 Supplement in Unicode. The range is reserved in the ISO-8859-1 code page.
I'm looking at a file of strings, all of which are within the 7-bit ASCII range except for a few instances of \x96.
It appears where a dash should be, such as in the middle of a street address.
I don't know if other characters in the C1 range might show up in the data, so I'd like to know if there's a correct way to read the file. Are there 8-bit encodings that use x80 through x9F for character data instead of terminal control characters?
There is a large number (potentially an infinite number) of 8-bit encodings that assign graphic characters to some or all bytes in the range 0x80 to 0x9F. Several encodings defined by Microsoft have U+2013 EN DASH “–” at byte position 0x96, and this character could conceivably appear in a street address, between numbers.
On the other hand, MacRoman, for example, has the letter “ñ” at position 0x96, and this could appear within a street name in Spanish.
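To see the ambiguity concretely, here is a small Python sketch that decodes the single byte 0x96 under a few candidate encodings. The list of candidates is only my assumption about what might be worth checking, not an exhaustive set, and the codec names are the ones Python's standard library uses.

```python
# Decode one byte under several candidate 8-bit encodings.
# The candidate list is an illustrative assumption, not a complete set.
CANDIDATES = ["cp1252", "cp1250", "mac_roman", "latin-1"]

def decode_byte(value, encodings=CANDIDATES):
    """Return {encoding name: decoded character, or None if unassigned}."""
    results = {}
    for name in encodings:
        try:
            results[name] = bytes([value]).decode(name)
        except UnicodeDecodeError:
            # Some code pages leave a few positions undefined.
            results[name] = None
    return results

table = decode_byte(0x96)
for name, char in table.items():
    if char is None:
        print(f"{name:10} (unassigned)")
    else:
        print(f"{name:10} U+{ord(char):04X} {char!r}")
```

The Windows code pages give U+2013, MacRoman gives U+00F1, and Python's latin-1 codec maps the byte straight through to the C1 control U+0096, so the byte alone does not tell you which reading is right.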
For a rational analysis of the situation, you should inspect the data as a whole, possibly using a filter that finds bytes outside the ASCII range 0x00 to 0x7F, look at the contexts in which those characters appear, and try to find technical information on the origin of the data.
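A filter of that kind could be sketched in Python roughly as follows. The function names, the context width, and the sample bytes are made up for illustration; the point is to work on raw bytes so that no encoding is assumed before you have looked at the data.

```python
from collections import Counter

def find_non_ascii(data: bytes, context: int = 10):
    """Return (offset, byte value, surrounding bytes) for each byte above 0x7F."""
    hits = []
    for i, b in enumerate(data):
        if b > 0x7F:
            start = max(0, i - context)
            hits.append((i, b, data[start:i + context + 1]))
    return hits

def summarize(data: bytes) -> Counter:
    """Count how often each non-ASCII byte value occurs."""
    return Counter(b for b in data if b > 0x7F)

# Hypothetical sample standing in for the real file contents.
sample = b"12 Main St\n100\x96102 Elm Ave\n"

for offset, byte, ctx in find_non_ascii(sample):
    print(f"offset {offset}: 0x{byte:02X} in {ctx!r}")
print(summarize(sample))
```

For a real file you would read the contents in binary mode, e.g. `open("addresses.txt", "rb").read()` (the file name is hypothetical), and pass the result to both functions. If 0x96 only ever turns up between digits, the en dash reading is the likelier one; if it turns up inside words, a Mac encoding deserves a closer look.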