Java - Getting some data from HTML using regex
I am trying to get data from HTML. Here is my code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String str = "<div class=\"b-vacancy-list-salary\">\n"
                + " 50 000\n"
                + " 70 000\n"
                + " usd.\n"
                + " </div>";
        System.out.println(Arrays.toString(getTagValues(str).toArray()));
    }

    static final String TAG = "<div class=\"b-vacancy-list-salary\">\n";
    private static final Pattern TAG_REGEX = Pattern.compile(TAG + "(.+?)</div>");

    private static List<String> getTagValues(final String str) {
        System.out.println(TAG);
        final List<String> tagValues = new ArrayList<String>();
        final Matcher matcher = TAG_REGEX.matcher(str);
        while (matcher.find()) {
            tagValues.add(matcher.group(1));
        }
        return tagValues;
    }
}
It returns [] instead of the value. What's wrong?
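You can remove the line feeds before matching. By default, the dot (.) in a Java regex does not match line terminators, so (.+?) cannot span the multi-line content of the div.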
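As an alternative to stripping the line feeds, the pattern can be compiled with Pattern.DOTALL, which lets the dot match line terminators as well. The sketch below is one possible variant, not the code from this answer; the class name DotAllExample, the use of Pattern.quote, and the whitespace collapsing are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class DotAllExample {
    private static final String TAG = "<div class=\"b-vacancy-list-salary\">";

    // Pattern.quote keeps the tag literal; DOTALL lets '.' match '\n' too.
    private static final Pattern TAG_REGEX =
            Pattern.compile(Pattern.quote(TAG) + "(.+?)</div>", Pattern.DOTALL);

    static List<String> getTagValues(final String str) {
        final List<String> tagValues = new ArrayList<>();
        final Matcher matcher = TAG_REGEX.matcher(str);
        while (matcher.find()) {
            // Trim the ends and collapse inner whitespace runs (assumption:
            // the value is wanted on a single line).
            tagValues.add(matcher.group(1).trim().replaceAll("\\s+", " "));
        }
        return tagValues;
    }

    public static void main(String[] args) {
        final String str = "<div class=\"b-vacancy-list-salary\">\n"
                + " 50 000\n"
                + " 70 000\n"
                + " usd.\n"
                + " </div>";
        System.out.println(getTagValues(str)); // prints [50 000 70 000 usd.]
    }
}
```

With this flag the input string is left untouched, which may matter if line breaks elsewhere in the document need to be preserved.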
A better way to parse HTML is to use a DOM parser or XPath.
For example, here is the regex version with the line feeds removed before matching:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String str = "<div class=\"b-vacancy-list-salary\">\n"
                + " 50 000\n"
                + " 70 000\n"
                + " usd.\n"
                + " </div>";
        System.out.println(Arrays.toString(getTagValues(str).toArray()));
    }

    // The tag no longer ends with "\n", because line feeds are stripped from the input.
    static final String TAG = "<div class=\"b-vacancy-list-salary\">";
    private static final Pattern TAG_REGEX = Pattern.compile(TAG + "(.+?)</div>");

    private static List<String> getTagValues(final String str) {
        System.out.println(TAG);
        final List<String> tagValues = new ArrayList<String>();
        // Remove line feeds so that (.+?) can match the whole div content.
        final Matcher matcher = TAG_REGEX.matcher(str.replace("\n", ""));
        while (matcher.find()) {
            tagValues.add(matcher.group(1).trim());
        }
        return tagValues;
    }
}
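The XPath approach mentioned above can be sketched with the XPath API that ships with the JDK. This is a minimal illustration under a strong assumption: the input must be well-formed XML, which this snippet happens to be but real-world HTML often is not (a dedicated HTML parser would then be needed). The class name XPathExample and the method name getSalary are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class XPathExample {
    static String getSalary(final String html) {
        final XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // normalize-space() trims the text and collapses whitespace runs.
            return xpath.evaluate(
                    "normalize-space(//div[@class='b-vacancy-list-salary'])",
                    new InputSource(new StringReader(html)));
        } catch (XPathExpressionException e) {
            // Assumption: the caller treats unparsable input as a bad argument.
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        final String str = "<div class=\"b-vacancy-list-salary\">\n"
                + " 50 000\n"
                + " 70 000\n"
                + " usd.\n"
                + " </div>";
        System.out.println(getSalary(str)); // prints 50 000 70 000 usd.
    }
}
```

Selecting the element by its class attribute avoids depending on where the line breaks fall inside the div.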