Parse HTML and Get All h3's After an h2 Before the Next h2 Using PHP
I'm looking to find the first h2 in an article. Once it's found, I want to collect all the h3's until the next h2 is found. Rinse and repeat until all headings and subheadings have been located.
Before you flag or close this as a duplicate parsing question, please note the question title: this isn't basic node retrieval. I've already got that part down.
I'm using DOMDocument: I parse the HTML with `DOMDocument::loadHTML()`, then use `DOMDocument::getElementsByTagName()` and `DOMDocument::saveHTML()` to retrieve the important headings of the article.
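A side note on the loading step: libxml's HTML parser predates HTML5, so an article body containing tags such as `<article>` or `<section>` makes `loadHTML()` emit warnings even though it still builds the tree. A minimal sketch of quieting those warnings with `libxml_use_internal_errors()`; the `loadArticle()` helper name and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: load an HTML fragment such as an article body.
// HTML5 tags like <article> make libxml report "invalid tag" warnings;
// buffering them internally keeps the output clean.
function loadArticle(string $html): DOMDocument
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();                 // discard the buffered parser warnings
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $dom;
}

$dom = loadArticle('<article><h2>Intro</h2><h3>Background</h3></article>');
foreach ($dom->getElementsByTagName('h2') as $node) {
    echo $dom->saveHTML($node), "\n"; // prints <h2>Intro</h2>
}
```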
My code is as follows:
```php
$matches = array();

$dom = new DOMDocument;
$dom->loadHTML($content);

// Collect every h2 in document order
foreach ($dom->getElementsByTagName('h2') as $node) {
    $matches['heading-two'][] = $dom->saveHTML($node);
}

// Collect every h3 in document order (with no link to its parent h2)
foreach ($dom->getElementsByTagName('h3') as $node) {
    $matches['heading-three'][] = $dom->saveHTML($node);
}

if ($matches) {
    $this->key_points = $matches;
}
```
This gives me output like:
```php
array(
    'heading-two' => array(
        '<h2>here first heading two</h2>',
        '<h2>here second heading two</h2>'
    ),
    'heading-three' => array(
        '<h3>here first h3</h3>',
        '<h3>here second h3</h3>',
        '<h3>here third h3</h3>',
        '<h3>here fourth h3</h3>',
    )
);
```
I'm looking for something more like:
```php
array(
    '<h2>here first heading two</h2>' => array(
        '<h3>here h3 under first h2</h3>',
        '<h3>here h3 found under first h2, after first h3</h3>'
    ),
    '<h2>here second heading two</h2>' => array(
        '<h3>here h3 under second h2</h3>',
        '<h3>here h3 found under second h2, after first h3</h3>'
    )
);
```
I'm not looking for code completion (if you feel it would help others, go ahead); I'm more after guidance or advice in the right direction for building the nested array directly above.
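A minimal sketch of the rinse-and-repeat idea from the question that does not depend on the h3 elements being DOM siblings of their h2: select every h2 and h3 in one XPath query (libxml2 returns the matches in document order), remember the most recent h2, and append each h3 under it. The `groupHeadings()` name is hypothetical, and h3 elements appearing before the first h2 are simply skipped, which is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: group each h3 under the closest preceding h2,
// whether or not the headings share a parent element.
function groupHeadings(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate HTML5 tags
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $groups = array();
    $currentKey = null;

    // One descendant step over all elements, filtered to h2/h3, in document order
    foreach ($xpath->query('//*[self::h2 or self::h3]') as $node) {
        if ($node->nodeName === 'h2') {
            $currentKey = $dom->saveHTML($node);
            $groups[$currentKey] = array();
        } elseif ($currentKey !== null) {
            // h3 elements before the first h2 have no parent heading; ignore them
            $groups[$currentKey][] = $dom->saveHTML($node);
        }
    }
    return $groups;
}

$html = '<article>'
      . '<h2>First</h2><div><h3>A</h3></div><h3>B</h3>'
      . '<section><h2>Second</h2><h3>C</h3></section>'
      . '</article>';
print_r(groupHeadings($html));
```

Keying by the serialized h2 markup matches the structure asked for, but two identical h2 elements would share one entry.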
I assume the headings are all on the same level in the DOM, so every h3 is a sibling of an h2. With that assumption, you can iterate over the siblings of each h2 until the next h2 is encountered:
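```php
foreach ($dom->getElementsByTagName('h2') as $h2) {
    $key = $dom->saveHTML($h2);
    $matches[$key] = array();

    // Walk forward through the h2's siblings until the next h2 (or the end).
    // Property names such as nextSibling and nodeName are case-sensitive.
    $node = $h2;
    while (($node = $node->nextSibling) && $node->nodeName !== 'h2') {
        // Siblings include whitespace text nodes ("#text"); keep only h3 elements
        if ($node->nodeName === 'h3') {
            $matches[$key][] = $dom->saveHTML($node);
        }
    }
}
```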
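To try the sibling walk end to end, here it is wrapped in a hypothetical `groupBySiblings()` function and run on a small made-up document. It keeps the answer's assumption that every h3 is a sibling of the h2 it belongs to; an h3 nested in a different container is not picked up.

```php
<?php
// Hypothetical wrapper around the sibling walk: assumes each h3 is a
// DOM sibling of the h2 it belongs to.
function groupBySiblings(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = array();
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        $key = $dom->saveHTML($h2);
        $matches[$key] = array();

        // Stop at the next h2 or when the siblings run out (null)
        for ($node = $h2->nextSibling; $node !== null && $node->nodeName !== 'h2'; $node = $node->nextSibling) {
            if ($node->nodeName === 'h3') {
                $matches[$key][] = $dom->saveHTML($node);
            }
        }
    }
    return $matches;
}

$html = "<h2>First</h2>\n<h3>A</h3>\n<p>text</p>\n<h3>B</h3>\n<h2>Second</h2>\n<h3>C</h3>";
print_r(groupBySiblings($html));
```

Using a separate `$h2` variable for the outer loop keeps the walk's cursor (`$node`) from shadowing the element being grouped, which makes the two roles easier to follow.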