Character encoding - Java URLEncode giving different results
I have this code stub:
System.out.println(param + "=" + value);
param = URLEncoder.encode(param, "UTF-8");
value = URLEncoder.encode(value, "UTF-8");
System.out.println(param + "=" + value);
This gives the following result in Eclipse:
p=指甲油
p=%E6%8C%87%E7%94%B2%E6%B2%B9
But when I run the same code from the command line, I get the following output:
p=指甲油
p=%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
What could be the problem?
Your Mac is using the Mac OS Roman encoding in the terminal. The Chinese characters have been incorrectly interpreted using Mac OS Roman instead of UTF-8 before being sent to Java.
As evidence, these Chinese characters exist in the UTF-8 encoding as the following (hex) bytes:
- 指 = 0xE6 0x8C 0x87
- 甲 = 0xE7 0x94 0xB2
- 油 = 0xE6 0xB2 0xB9
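If you want to check those bytes yourself, something along these lines should do. It is only a sketch: the class name Utf8Bytes and the helper utf8Hex are made up for illustration, and the characters are written as Unicode escapes so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hypothetical helper: returns the UTF-8 bytes of the text as
    // space-separated hex values, for example "0xE6 0x8C 0x87".
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three characters from the question, as Unicode escapes.
        String[] chars = {"\u6307", "\u7532", "\u6cb9"};
        for (String c : chars) {
            System.out.println(utf8Hex(c));
        }
    }
}
```

Each printed line should match one of the byte triples listed above.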
Now check the Mac OS Roman codepage layout. Those (hex) bytes represent the following characters:
- 0xE6 0x8C 0x87 = Ê å á
- 0xE7 0x94 0xB2 = Á î ≤
- 0xE6 0xB2 0xB9 = Ê ≤ π
Now, put them together and URL-encode them using UTF-8:
System.out.println(URLEncoder.encode("ÊåáÁî≤Ê≤π", "UTF-8"));
Look what it prints:
%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
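The whole chain can also be reproduced in one small program, without touching the terminal settings. This is a sketch, not the asker's code: the class name MojibakeDemo and its helper methods are made up, and it assumes the JVM ships the x-MacRoman charset, which is part of the extended charsets in common JDK builds but is not guaranteed to be present.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: URL-encodes using UTF-8 without a checked exception.
    static String encodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: takes the UTF-8 bytes of the text, misreads them
    // as Mac OS Roman, then URL-encodes the misread text as UTF-8.
    static String misdecodeAndEncode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, Charset.forName("x-MacRoman"));
        return encodeUtf8(misread);
    }

    public static void main(String[] args) {
        // The three characters from the question, as Unicode escapes.
        String original = "\u6307\u7532\u6cb9";
        System.out.println(encodeUtf8(original));
        if (Charset.isSupported("x-MacRoman")) {
            System.out.println(misdecodeAndEncode(original));
        } else {
            System.out.println("x-MacRoman is not available in this JVM");
        }
    }
}
```

If the charset is available, the first line should correspond to the Eclipse output and the second to the command-line output from the question.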
To fix the problem, tell your Mac to use the UTF-8 encoding in the terminal. Honestly, I can't answer that part off the top of my head as I don't have a Mac. Your Eclipse encoding configuration is totally fine, but should you ever need to change it, you can configure it via Window > Preferences > General > Workspace > Text file encoding.
Update: I missed this comment:
I am reading the value from a text file
If those variables actually originate from a text file instead of from command-line input, as I initially expected, then you need to solve the problem differently. Apparently, you are using a Reader implementation that uses the runtime environment's default character encoding, like so:
Reader reader = new FileReader("/file.txt"); // ...
You should instead specify the desired encoding explicitly when creating the reader. You can do that with the InputStreamReader constructor:
Reader reader = new InputStreamReader(new FileInputStream("/file.txt"), "UTF-8"); // ...
This explicitly tells Java to read /file.txt using UTF-8 instead of the runtime environment's default encoding, which is available via Charset#defaultCharset().
System.out.println("This runtime environment uses as default charset " + Charset.defaultCharset());
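To see the explicit-charset approach working end to end, here is a small round trip. It is a sketch under a few assumptions: the class name ExplicitCharsetRead and the helper roundTrip are made up, a temporary file stands in for your text file, and the charset is passed as the StandardCharsets.UTF_8 constant rather than the "UTF-8" string.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Hypothetical helper: writes the text to a temporary file as UTF-8, then
    // reads the first line back with an explicitly specified UTF-8 decoder.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The three characters from the question, as Unicode escapes.
        String original = "\u6307\u7532\u6cb9";
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Read back intact: " + original.equals(roundTrip(original)));
    }
}
```

Because the decoder is named explicitly, the text should come back intact whatever default charset the first line reports.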