xml - Large document XPath query performance
With a 5 MB document, the following query takes libxml2 about 3 seconds to evaluate. Is there a way to speed things up? I need the resulting node-set for further processing, so count() etc. won't do.
Thanks!
descendant::text() | descendant::* [ self::p or self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6 or self::dl or self::dt or self::dd or self::ol or self::ul or self::li or self::dir or self::address or self::blockquote or self::center or self::del or self::div or self::hr or self::ins or self::pre ]
Edit:
Using descendant::node()[self::text() or self::p or ... as suggested by Jens Erat (see the accepted answer) improved the speed dramatically: from the original 2.865330s down to 0.164336s, roughly 17 times faster. Perfect.
Benchmarking without the document to benchmark on is difficult.
Two ideas for optimizing:
Use as few descendant:: axis steps as possible. They're expensive, so cutting them down can speed things up a little bit. You can combine the text() and element tests like this:
descendant::node()[self::text() or self::h1 or self::h2]
and extend it to all your elements (I'm keeping the query short for better readability).
Use string tests instead of node tests. They could be faster (though they probably aren't, see the comments on this answer). You need to keep the text() test, of course.
descendant::node()[self::text() or local-name(.) = 'h1' or local-name(.) = 'h2']
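Since both shortened queries have to be extended to the full element list by hand, here is a minimal sketch that generates them from the names used in the question. The function and variable names (`node_test_query`, `string_test_query`, `BLOCK_ELEMENTS`) are hypothetical and only serve the illustration; the script builds the XPath strings and does not evaluate them.

```python
# Block-level element names, copied from the query in the question.
BLOCK_ELEMENTS = [
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "dl", "dt", "dd", "ol", "ul",
    "li", "dir", "address", "blockquote", "center", "del", "div", "hr",
    "ins", "pre",
]


def node_test_query(names):
    """Build one descendant:: step using self:: node tests (first idea)."""
    tests = ["self::text()"] + [f"self::{name}" for name in names]
    return "descendant::node()[" + " or ".join(tests) + "]"


def string_test_query(names):
    """Build the same step using local-name(.) comparisons (second idea)."""
    tests = ["self::text()"] + [f"local-name(.) = '{name}'" for name in names]
    return "descendant::node()[" + " or ".join(tests) + "]"


if __name__ == "__main__":
    # Print both variants so they can be pasted into an XPath evaluator.
    print(node_test_query(BLOCK_ELEMENTS))
    print(string_test_query(BLOCK_ELEMENTS))
```

Either string can then be handed to whatever XPath API you already use, so the two variants can be timed against the original union query on the real document.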
If you're querying the same document more than once, think about using a native XML database like BaseX, eXist DB, Zorba, MarkLogic, ... (the first three are free). They put indices on your data and should be able to serve results much faster (and they support XPath 2.0/XQuery, which makes developing easier). All of them have APIs for a large set of programming languages.