vb.net - Parsing HTML with VB DOTNET
I am trying to parse data from a website for specific items in its tables. I know that any <tr> tag with the bgcolor attribute set to #ffffff or #f4f4ff is where I want to start, and my actual data sits in the 2nd <td> within that <tr>.
Currently I have:
    Private Sub RunForm()
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("tr")
        For Each curElement As HtmlElement In theElementCollection
            Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
            MsgBox(controlValue)
            If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
            End If
        Next
    End Sub
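This code gets me the <tr> element I need, but I have no idea how (or whether it is possible) to investigate the inner elements. If it is not possible, what do you think is the best route to take? The site does not label any of its tables. The <td> I'm looking at looks like: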
<td><b><font size="2"><a href="/movie/?id=movietitle.htm">the movie</a></font></b></td>
I want to pull out the "the movie" text and add it to a text file.
Use the InnerHtml property of the HtmlElement object (curElement) you already have, like this:
    For Each curElement As HtmlElement In theElementCollection
        Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
        MsgBox(controlValue)
        If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
            Dim elementValue As String = curElement.InnerHtml
        End If
    Next
Read the documentation of the HtmlElement.InnerHtml property for more information.
Update:
To get the second child of the <tr> HTML element, use a combination of FirstChild and NextSibling, like this:
    For Each curElement As HtmlElement In theElementCollection
        Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
        MsgBox(controlValue)
        If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
            Dim firstChildElement = curElement.FirstChild
            Dim secondChildElement = firstChildElement.NextSibling
            ' secondChildElement should be the second <td>; now get the value of its inner HTML
            Dim elementValue As String = secondChildElement.InnerHtml
        End If
    Next
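Note that InnerHtml on the second <td> returns the nested markup too (the <b>, <font> and <a> tags in the sample above), whereas the question asks for just the "the movie" text written to a file. One way to get there, as a rough sketch rather than a tested solution, is to read InnerText instead and append it with File.AppendAllText. The module name, the helper name AppendSecondCellText and the outputPath parameter are made up for illustration; the null checks are there because FirstChild and NextSibling return Nothing when there is no matching element.

```vbnet
Imports System.IO
Imports System.Windows.Forms

Module MovieTitleExport
    ' Hypothetical helper: takes a matching <tr> element and appends the
    ' plain text of its second child element to the file at outputPath.
    Public Sub AppendSecondCellText(row As HtmlElement, outputPath As String)
        Dim firstCell As HtmlElement = row.FirstChild
        If firstCell Is Nothing Then Return ' row has no child element

        Dim secondCell As HtmlElement = firstCell.NextSibling
        If secondCell Is Nothing Then Return ' row has only one cell

        ' InnerText drops the nested tags and keeps only the visible text.
        Dim title As String = secondCell.InnerText
        If Not String.IsNullOrWhiteSpace(title) Then
            File.AppendAllText(outputPath, title.Trim() & Environment.NewLine)
        End If
    End Sub
End Module
```

Inside the existing loop, the body of the If block could then be a single call such as AppendSecondCellText(curElement, "movies.txt"), where the file name is again only a placeholder.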