node.js - Scrape a webpage and navigate by clicking buttons


I want to perform the following actions on the server side:

1) Scrape a webpage
2) Simulate a click on the page and navigate to the new page
3) Scrape the new page
4) Simulate button clicks on the new page
5) Send the data back to the client via JSON or similar

I am thinking of using node.js.

But I am confused about which module I should use:
a) zombie
b) node.io
c) phantomjs
d) jsdom
e) anything else

I have installed node.io but am not able to run it via the command prompt.

PS: I am working on Windows 2008 Server.

Zombie.js and node.io run on jsdom, hence your options are either going with jsdom (or an equivalent wrapper), a headless browser (PhantomJS, SlimerJS) or Cheerio.

  • jsdom is slow because it has to recreate the DOM and CSSOM in node.js.
  • PhantomJS/SlimerJS are proper headless browsers; performance is OK and they are reliable.
  • Cheerio is a lightweight alternative to jsdom. It doesn't recreate the entire page in node.js (it just downloads and parses the DOM; no JavaScript is executed). Therefore you can't click on buttons/links, but it's very fast for scraping webpages.

Given your requirements, I'd go with a headless browser. In particular, I'd choose CasperJS because it has a nice and expressive API, it's fast and reliable (it doesn't need to reinvent the wheel on how to parse and render the DOM or CSS as jsdom does), and it's very easy to interact with elements such as buttons and links.

Your workflow in CasperJS should look more or less like this:

casper.start();

casper
  .then(function () {
    console.log("Start:");
  })
  .thenOpen("https://www.domain.com/page1")
  .then(function () {
    // Scrape something
    this.echo(this.getHTML('h1#foobar'));
  })
  .thenClick("#button1")
  .then(function () {
    // Scrape something else
    this.echo(this.getHTML('h2#foobar'));
  })
  .thenClick("#button2")
  .thenOpen("http://myserver.com", {
    method: "post",
    data: {
      my: 'data'
    }
  }, function () {
    this.echo("Data sent to the server");
  });

casper.run();
