php - Simple HTML DOM - traversing html -

- February 15, 2012

I'm using the Simple HTML DOM parser (http://simplehtmldom.sourceforge.net/manual.htm) and I'm trying to scrape data from a scoreboard page. The example below shows me pulling the HTML of the "Akron Rushing" table.

Inside $tr->find('td', 0), the first column, there is a hyperlink. How can I extract this hyperlink? Using $tr->find('td', 0)->find('a') does not seem to work.

Also: I can write conditions for each table (passing, rushing, receiving, etc.), but is there a more efficient way to do this? I'm open to ideas on this one.

    include('simple_html_dom.php');
    $html = file_get_html('http://espn.go.com/ncf/boxscore?gameid=322432006');

    $teama['rushing'] = $html->find('table.mod-data', 5);

    foreach ($teama as $type => $data) {
      switch ($type) {
        # rushing table
        case "rushing":
          foreach ($data->find('tr') as $tr) {
            echo $tr->find('td', 0);    // first td column (player name)
            echo $tr->find('td', 1);    // second td column (carries)
            echo $tr->find('td', 2);    // third td column (yards)
            echo $tr->find('td', 3);    // fourth td column (avg)
            echo $tr->find('td', 4);    // fifth td column (tds)
            echo $tr->find('td', 5);    // sixth td column (lgs)
            echo "<hr />";
          }
      }
    }

In this case, find('tr') returns 10 elements instead of only the 7 rows expected.

Also, not all names have links associated with them, so trying to retrieve a link when it doesn't exist may cause an error.

Therefore, here's a modified working version of your code:

    include('simple_html_dom.php');

    $url = 'http://espn.go.com/ncf/boxscore?gameid=322432006';
    $html = file_get_html($url);

    $teama['rushing'] = $html->find('table.mod-data', 5);

    foreach ($teama as $type => $data) {
      switch ($type) {
        # rushing table
        case "rushing":
          echo count($data->find('tr')) . " \$tr found !<br />";

          foreach ($data->find('tr') as $key => $tr) {
            $td = $tr->find('td');

            // skip rows that contain no td cells
            if (isset($td[0])) {
              echo "<br />";
              echo $td[0]->plaintext . " | ";     // first td column (player name)

              // print the href only if an anchor exists
              if ($anchor = $td[0]->find('a', 0)) {
                echo $anchor->href;
              }

              echo " | ";
              echo $td[1]->plaintext . " | ";     // second td column (carries)
              echo $td[2]->plaintext . " | ";     // third td column (yards)
              echo $td[3]->plaintext . " | ";     // fourth td column (avg)
              echo $td[4]->plaintext . " | ";     // fifth td column (tds)
              echo $td[5]->plaintext;             // sixth td column (lgs)
              echo "<hr />";
            }
          }
      }
    }

As you can see, an attribute can be reached using the format $tag->attributename. In this case, attributename is href.
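On the question of handling every table (passing, rushing, receiving, etc.) without a separate branch for each, one option is to describe the tables in a lookup array and loop over it. This is only a sketch: the names $tables, rowToRecord() and extractTable() are made up for illustration, the column labels are taken from the comments in the code above, and only the rushing index (5) is known from this page. Other table indexes would have to be checked against the actual markup.

```php
<?php
// Assumed layout map. Only the rushing index (5) comes from the code above;
// other stat types would be added with whatever index the page really uses.
$tables = [
    'rushing' => [
        'index'   => 5,
        'columns' => ['player', 'carries', 'yards', 'avg', 'tds', 'lg'],
    ],
];

// Pair cell texts with column labels. Returns null for rows with fewer cells
// than labels (e.g. header or spacer rows) so callers can skip them.
// Extra cells beyond the labelled columns are ignored.
function rowToRecord(array $cells, array $columns): ?array
{
    if (count($cells) < count($columns)) {
        return null;
    }
    return array_combine($columns, array_slice($cells, 0, count($columns)));
}

// Collect one record per usable row of a Simple HTML DOM table node.
function extractTable($table, array $columns): array
{
    $records = [];
    foreach ($table->find('tr') as $tr) {
        $cells = [];
        foreach ($tr->find('td') as $td) {
            $cells[] = trim($td->plaintext);
        }
        $record = rowToRecord($cells, $columns);
        if ($record !== null) {
            $records[] = $record;
        }
    }
    return $records;
}
```

With the library loaded, each entry could then be processed inside foreach ($tables as $type => $cfg) with something like extractTable($html->find('table.mod-data', $cfg['index']), $cfg['columns']), after checking that find() actually returned a table.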

Notes:

It's a good idea to handle find()'s failures, knowing that it returns a falsy value when nothing is found (null for an indexed lookup, an empty array otherwise):

    $td = $tr->find('td');

    // find succeeded
    if ($td) {
        // code here
    } else {
        echo "find() failed in xxxxx";
    }
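The same check can be folded into a small helper so that the link lookup from the question never dereferences a missing node. A minimal sketch, assuming a hypothetical function name firstHref() and relying only on find() and the href attribute already used above:

```php
<?php
// Hypothetical helper: return the href of the first <a> inside a cell,
// or null when the cell, the anchor, or its href is missing.
function firstHref($cell)
{
    if (!$cell) {
        return null; // e.g. find('td', 0) found nothing
    }
    $anchor = $cell->find('a', 0); // first anchor, or a falsy value if none
    return ($anchor && $anchor->href) ? $anchor->href : null;
}
```

It would be called as firstHref($tr->find('td', 0)), which yields null for player rows that have no link instead of raising an error.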

PHP Simple HTML DOM Parser has known memory leak issues with PHP 5, so don't forget to free memory when DOM objects are no longer used:

    $html = file_get_html(...);
    // do something...
    $html->clear();
    unset($html);

Source: http://simplehtmldom.sourceforge.net/manual_faq.htm#memory_leak
