PHP - Simple HTML DOM - traversing HTML
I'm using Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm) and I'm trying to scrape data from a scoreboard page. The example below shows me pulling the HTML of the "Akron Rushing" table.
Inside $tr->find('td', 0), the first column, there is a hyperlink. How can I extract this hyperlink? Using $tr->find('td', 0)->find('a') does not seem to work.
Also: I can write conditions for each table (passing, rushing, receiving, etc.), but is there a more efficient way to do this? I'm open to ideas on this one.
    include('simple_html_dom.php');

    $html = file_get_html('http://espn.go.com/ncf/boxscore?gameid=322432006');
    $teama['rushing'] = $html->find('table.mod-data', 5);

    foreach ($teama as $type => $data) {
        switch ($type) {
            # rushing table
            case "rushing":
                foreach ($data->find('tr') as $tr) {
                    echo $tr->find('td', 0); // first td column (player name)
                    echo $tr->find('td', 1); // second td column (carries)
                    echo $tr->find('td', 2); // third td column (yards)
                    echo $tr->find('td', 3); // fourth td column (avg)
                    echo $tr->find('td', 4); // fifth td column (tds)
                    echo $tr->find('td', 5); // sixth td column (lgs)
                    echo "<hr />";
                }
        }
    }
In your case, find('tr') returns 10 elements instead of only the 7 rows you expected.
Also, not all names have links associated with them, so trying to retrieve a link when it doesn't exist may raise an error.
Therefore, here's a modified working version of your code:
    include('simple_html_dom.php');

    $url = 'http://espn.go.com/ncf/boxscore?gameid=322432006';
    $html = file_get_html($url);
    $teama['rushing'] = $html->find('table.mod-data', 5);

    foreach ($teama as $type => $data) {
        switch ($type) {
            # rushing table
            case "rushing":
                echo count($data->find('tr')) . " \$tr found !<br />";
                foreach ($data->find('tr') as $key => $tr) {
                    $td = $tr->find('td');
                    // skip rows without <td> cells
                    if (isset($td[0])) {
                        echo "<br />";
                        echo $td[0]->plaintext . " | "; // first td column (player name)
                        // print the link only if an anchor exists
                        if ($anchor = $td[0]->find('a', 0)) {
                            echo $anchor->href; // href
                        }
                        echo " | ";
                        echo $td[1]->plaintext . " | "; // second td column (carries)
                        echo $td[2]->plaintext . " | "; // third td column (yards)
                        echo $td[3]->plaintext . " | "; // fourth td column (avg)
                        echo $td[4]->plaintext . " | "; // fifth td column (tds)
                        echo $td[5]->plaintext;         // sixth td column (lgs)
                        echo "<hr />";
                    }
                }
                break;
        }
    }
As you can see, an attribute can be reached using the format $tag->attributename. In this case, attributename is href.
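On the second part of the question (avoiding one hand-written branch per table), one possible approach is to move the per-row logic into a function and drive it from a name-to-index map. The sketch below is only an illustration under stated assumptions: parse_stats_table() and the $tables map are hypothetical names, the indexes in the map are made up (the real positions have to be read off the page), simple_html_dom.php is assumed to be on the include path, and inline sample markup stands in for the scoreboard page.

```php
<?php
require_once 'simple_html_dom.php'; // assumed to be on the include path

/**
 * Hypothetical helper: turn one stats table into an array of rows.
 * Each row is ['cells' => string[], 'link' => string|null].
 */
function parse_stats_table($table): array
{
    $rows = [];
    foreach ($table->find('tr') as $tr) {
        $tds = $tr->find('td');
        if (!$tds) {
            continue; // rows without <td> cells (e.g. <th> headers) are skipped
        }
        $cells = [];
        foreach ($tds as $td) {
            $cells[] = trim($td->plaintext);
        }
        // The link, if any, lives in the first column.
        $anchor = $tds[0]->find('a', 0);
        $rows[] = ['cells' => $cells, 'link' => $anchor ? $anchor->href : null];
    }
    return $rows;
}

// Inline sample markup standing in for the scoreboard page.
$html = str_get_html(
    '<table class="mod-data"><tr><th>Rushing</th></tr>'
    . '<tr><td><a href="/player/1">J. Doe</a></td><td>12</td></tr></table>'
    . '<table class="mod-data"><tr><td>Team</td><td>30</td></tr></table>'
);

// Hypothetical name => index map; these indexes are assumptions, not the real page layout.
$tables = ['rushing' => 0, 'receiving' => 1];

$stats = [];
foreach ($tables as $type => $index) {
    $table = $html->find('table.mod-data', $index);
    $stats[$type] = $table ? parse_stats_table($table) : [];
}

print_r($stats);
$html->clear();
```

Adding another table then only means adding one entry to the map, provided its rows follow the same layout.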
Notes:
It is a good idea to handle find's errors, knowing that it returns a value that evaluates to false (an empty array, or null when an index is given) when nothing is found:
    $td = $tr->find('td');
    if ($td) {
        // find succeeded: code here
    } else {
        echo "find() failed in xxxxx";
    }
PHP Simple HTML DOM Parser has known memory leak issues with PHP 5, so don't forget to free memory when DOM objects are no longer used:
    $html = file_get_html(...);
    // do something...
    $html->clear();
    unset($html);

Source: http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak
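The two notes can be combined so that cleanup happens even when a lookup fails. This is a minimal sketch, not the library's recommended pattern: count_rows() is a hypothetical function, simple_html_dom.php is assumed to be on the include path, and str_get_html() is used on an inline string so the example does not depend on the remote page.

```php
<?php
require_once 'simple_html_dom.php'; // assumed to be on the include path

/**
 * Hypothetical helper: count the <tr> elements of the first table in $markup,
 * returning 0 when parsing fails or no table is present.
 */
function count_rows(string $markup): int
{
    $html = str_get_html($markup);
    if (!$html) {
        return 0; // parsing failed (for example, empty input)
    }
    try {
        $table = $html->find('table', 0);
        // Guard the lookup: $table is null when no table was found.
        return $table ? count($table->find('tr')) : 0;
    } finally {
        // Runs on every exit path, so the DOM is always released.
        $html->clear();
        unset($html);
    }
}

var_dump(count_rows('<table><tr><td>a</td></tr><tr><td>b</td></tr></table>'));
var_dump(count_rows('<p>no table here</p>'));
```

Using finally keeps the clear() call in one place instead of repeating it before every return.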