scripting - Selecting "" links with Mechanize in Ruby


I made a script in Ruby that uses Mechanize. It goes to google.com, logs in, and does an image search for cats. Next I want to select one of the result links from the page and save the image.

My problem is that the links for the results are shown as empty strings, so I'm not sure how to specify and click them.

Here is the output of pp page so you can see the links I'm talking about. Note that the first link is one of the suggested links, which I can click because it has the title "past 24 hours". The second link is an actual result from the search, which I cannot click.

#<Mechanize::Page::Link
  "past 24 hours"
  "/search?q=cats&hl=en&gbv=1&ie=utf8&tbm=isch&source=lnt&tbs=qdr:d&sa=x&ei=t8kduu7ab4f8iwkzx4hobg&ved=0ccqqpwuoaq">
#<Mechanize::Page::Link
  ""
  "http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/cute-cats-cats-33440930-1280-800.jpg&imgrefurl=http://jasonlefkowitz.net/2013/07/slideshow-20-cats-that-suck-at-reducing-tensions-in-the-israeli-palestinian-conflict/&usg=__1yeuvke4a9r6iirkcz9pu6ahn8q=&h=800&w=1280&sz=433&hl=en&start=1&sig2=ekqjelpnqsk-qq2r-4teeq&zoom=1&tbnid=xz9p1wd4o4tslm:&tbnh=94&tbnw=150&ei=b8sduq36ge3figlczoby&itbs=1&sa=x&ved=0ccwqrqmwaa">

Now here is a snippet of the output of:

page.links.each do |link|
  puts link.text
end

which displays the text of every link on the page.

more large face photo clip art line drawing animated past 24 hours past week reset tools                    funny cats cats , kittens cats musical cute cats lots of cats cats guns 2 3 4 5 6 7 8 9 10 next 

Notice the whitespace on the screen? Those are the empty-name "" links from the pp page output. Does anyone have ideas on how I can click one?

Here is the code for the script.

require 'mechanize'

agent = Mechanize.new
page = agent.get('https://google.com')
page = agent.page.link_with(:text => 'sign in').click
# pp page
sign_in = page.form()       ## leave empty = nil
sign_in.email = '10halec'
sign_in.passwd = 'password'
page = agent.submit(sign_in)

page = agent.page.link_with(:text => 'images').click
search = page.form('f')
search.q = 'cats'
page = agent.submit(search)

# pp page

# agent.page.image_with(:src => /imgres?/).fetch.save
page = agent.page.link_with(:text => '').click
# pp page

# page.links.each do |link|
#   puts link.text
# end
pp page

def save filename = nil
  filename = find_free_name filename
  save! filename
end

Notice the whitespace on the screen? Those are the empty-name "" links from the pp page output. Does anyone have ideas on how I can click one?

page = agent.page.link_with(:text => '').click

That line works for me. I put both of the following HTML pages in my local Apache server's htdocs directory (a publicly accessible directory):

page1.html:

<!doctype html>
<html>
  <head><title>test</title></head>
  <body>
    <div><a href="/somesite.com/cat1.jpg">cat1</a></div>
    <div><a href="/page2.html"></a></div>
    <div><a href="/somesite.com/cat3.jpg"></a></div>
  </body>
</html>

page2.html:

<!doctype html>
<html>
  <head><title>page2</title></head>
  <body>
    <div>hello</div>
  </body>
</html>

Then I started the server, which meant page1.html was accessible in my browser at the URL:

http://localhost:8080/page1.html

Then I ran this Ruby program:

require 'mechanize'

agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')
pp agent.page

page = agent.page.link_with(:text => '').click
puts page.title

...and the output was:

#<Mechanize::Page
 {url #<URI::HTTP:0x00000100c8dc18 URL:http://localhost:8080/page1.html>}
 {meta_refresh}
 {title "test"}
 {iframes}
 {frames}
 {links
  #<Mechanize::Page::Link "cat1" "/somesite.com/cat1.jpg">
  #<Mechanize::Page::Link "" "/page2.html">
  #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">}
 {forms}>

page2

The pp page output looks the same as your output, and I was able to click on a link that has no text, as evidenced by the output page2.

The problem with your code is that link_with() returns only the first match. If you use links_with(), you get all matching links:

require 'mechanize'

agent = Mechanize.new
agent.get('http://localhost:8080/page1.html')

links = agent.page.links_with(:text => '')
p links

--output:--
[#<Mechanize::Page::Link "" "/page2.html">
, #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">
]

I would need to see the actual HTML of the links you are having problems with.
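In the meantime, note that the result link in your pp output carries the address of the image itself in its `imgurl` query parameter, so you may not need to click the link at all. Here is a minimal sketch using only Ruby's standard library. The `image_url_from` helper name and the shortened sample href are made up for illustration, and I am assuming the real hrefs always have the `imgurl=...` shape shown in your output:

```ruby
require 'uri'

# Pull the value of the imgurl query parameter out of an imgres href.
# Returns nil when the href has no query string or no imgurl parameter.
def image_url_from(href)
  query = URI.parse(href).query
  return nil if query.nil?

  URI.decode_www_form(query).to_h['imgurl']
end

# Hypothetical, shortened version of the href from the pp output.
href = 'http://www.google.com/imgres?imgurl=http://somesite.com/cat1.jpg&h=800&w=1280'
puts image_url_from(href)
```

From there, something along the lines of `agent.get(image_url_from(link.href)).save('cat1.jpg')` should fetch the image and write it to disk, although I have not tried that against the live site.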

