xml - Large document XPath query performance


With a 5 MB document, the following query takes libxml2 3 seconds to evaluate. Is there any way to speed things up? I need the resulting node-set for further processing, so no count, etc.

Thanks!

descendant::text() | descendant::* [ self::p or self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6 or self::dl or self::dt or self::dd or self::ol or self::ul or self::li or self::dir or self::address or self::blockquote or self::center or self::del or self::div or self::hr or self::ins or self::pre ] 

Edit:

Using descendant::node()[self::text() or self::p or ... as suggested by Jens Erat (see the accepted answer) improved the speed: from the original 2.865330s down to 0.164336s. Perfect!
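To reproduce a comparison like this on your own data, a minimal timing sketch with Python's lxml (which wraps libxml2) could look as follows. The synthetic document, the shortened element list and the helper name `timed` are assumptions made for illustration; the absolute numbers will differ from the ones quoted above.

```python
import time
from lxml import etree  # lxml is a Python binding for libxml2

# Synthetic stand-in for the real 5 MB document (assumption: HTML-like markup).
body = "".join(
    f"<div><h1>Title {i}</h1><p>Some <b>bold</b> text {i}</p><span>skip {i}</span></div>"
    for i in range(2000)
)
root = etree.fromstring(f"<html><body>{body}</body></html>")

# Element list shortened for readability; extend it as needed.
UNION = "descendant::text() | descendant::*[self::p or self::h1 or self::div]"
COMBINED = "descendant::node()[self::text() or self::p or self::h1 or self::div]"

def timed(expr):
    """Evaluate expr against root; return (result, elapsed seconds)."""
    query = etree.XPath(expr)  # compile outside the timed region
    start = time.perf_counter()
    result = query(root)
    return result, time.perf_counter() - start

union_nodes, union_secs = timed(UNION)
combined_nodes, combined_secs = timed(COMBINED)

print(f"union:    {len(union_nodes)} nodes in {union_secs:.6f}s")
print(f"combined: {len(combined_nodes)} nodes in {combined_secs:.6f}s")
```

Both expressions should select the same number of nodes; only the evaluation time is expected to differ.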

Benchmarking without a document to benchmark on is difficult.

Two ideas for optimizing:

  • Use as few descendant:: axis steps as possible. They're expensive, so this can speed things up a little bit. You can combine the text() and element tests like this:

    descendant::node()[self::text() or self::h1 or self::h2] 

    and extend it for all elements (I'm keeping the query short for better readability).

  • Use string tests instead of node tests. They could be faster (they probably aren't, see the comments on this answer). You need to keep the text() test, of course.

    descendant::node()[self::text() or local-name(.) = 'h1' or local-name(.) = 'h2'] 
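As a quick sanity check that the node-test and string-test variants select the same nodes, here is a small sketch using Python's lxml. The sample document and the helper name `describe` are made up for illustration. Note that local-name(.) ignores namespaces, so on a namespaced document the two forms are not guaranteed to match.

```python
from lxml import etree  # lxml is a Python binding for libxml2

# Tiny sample document without namespaces (illustrative only).
doc = etree.fromstring(
    "<body><h1>Title</h1><p>Intro</p><h2>Sub</h2><span>skipped</span></body>"
)

node_tests = "descendant::node()[self::text() or self::h1 or self::h2]"
string_tests = (
    "descendant::node()[self::text() or "
    "local-name(.) = 'h1' or local-name(.) = 'h2']"
)

def describe(nodes):
    """Report text nodes by their string value and elements by tag name."""
    return [str(n) if isinstance(n, str) else n.tag for n in nodes]

a = describe(doc.xpath(node_tests))
b = describe(doc.xpath(string_tests))

print(a)
print(a == b)
```

Whether one form is actually faster is best measured on the real document, as the answer itself is unsure about it.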

If you're querying the same document often, think about using a native XML database like BaseX, eXist-db, Zorba, MarkLogic, ... (the first three are free). They put indices on the data and should be able to serve results much faster (and they support XPath 2.0/XQuery, which makes developing easier). All of them have APIs for a large set of programming languages.

