scripting - Selecting "" links with Mechanize in Ruby
I wrote a Ruby script that uses Mechanize. It goes to google.com, logs in, and does an image search for cats. Next, I want to select one of the result links on the page and save the image.

My problem is that the links for the results are shown as empty strings, so I'm not sure how to specify and click them.

Here is the output of pp page so you can see the links I'm talking about. Note that the first link is one of the suggested links, which I can click because it has the title "past 24 hours". The second link is an actual search result, which I cannot click.

    #<Mechanize::Page::Link "past 24 hours" "/search?q=cats&hl=en&gbv=1&ie=utf8&tbm=isch&source=lnt&tbs=qdr:d&sa=x&ei=t8kduu7ab4f8iwkzx4hobg&ved=0ccqqpwuoaq">
    #<Mechanize::Page::Link "" "http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/cute-cats-cats-33440930-1280-800.jpg&imgrefurl=http://jasonlefkowitz.net/2013/07/slideshow-20-cats-that-suck-at-reducing-tensions-in-the-israeli-palestinian-conflict/&usg=__1yeuvke4a9r6iirkcz9pu6ahn8q=&h=800&w=1280&sz=433&hl=en&start=1&sig2=ekqjelpnqsk-qq2r-4teeq&zoom=1&tbnid=xz9p1wd4o4tslm:&tbnh=94&tbnw=150&ei=b8sduq36ge3figlczoby&itbs=1&sa=x&ved=0ccwqrqmwaa">

Now here is a snip of the output of:

    page.links.each do |link|
      puts link.text
    end

which displays all the links on the page.
more large face photo clip art line drawing animated past 24 hours past week reset tools funny cats cats , kittens cats musical cute cats lots of cats cats guns 2 3 4 5 6 7 8 9 10 next
Notice the whitespace on the screen? Those are the empty-name "" links from the pp page output. Does anyone have any ideas on how I can click one?

Here is the code of the script.

    require 'mechanize'

    agent = Mechanize.new
    page = agent.get('https://google.com')
    page = agent.page.link_with(:text => 'sign in').click
    # pp page
    sign_in = page.form() ##leave empty = nil
    sign_in.email = '10halec'
    sign_in.passwd = 'password'
    page = agent.submit(sign_in)
    page = agent.page.link_with(:text => 'images').click
    search = page.form('f')
    search.q = 'cats'
    page = agent.submit(search)
    # pp page
    # agent.page.image_with(:src => /imgres?/).fetch.save
    page = agent.page.link_with(:text => '').click
    # pp page
    # page.links.each do |link|
    #   puts link.text
    # end
    pp page

    def save filename = nil
      filename = find_free_name filename
      save! filename
    end
You asked:

Notice the whitespace on the screen? Those are the empty-name "" links from the pp page output. Does anyone have any ideas on how I can click one?

    page = agent.page.link_with(:text => '').click

That line works for me. I put both of the following HTML pages in my local Apache server's htdocs directory (a publicly accessible directory):
page1.html:

    <!doctype html>
    <html>
    <head><title>test</title></head>
    <body>
    <div><a href="/somesite.com/cat1.jpg">cat1</a></div>
    <div><a href="/page2.html"></a></div>
    <div><a href="/somesite.com/cat3.jpg"></a></div>
    </body>
    </html>

page2.html:

    <!doctype html>
    <html>
    <head><title>page2</title></head>
    <body>
    <div>hello</div>
    </body>
    </html>
Then I started my server, which meant page1.html was accessible in my browser using the URL:

    http://localhost:8080/page1.html

Then I ran this Ruby program:

    require 'mechanize'

    agent = Mechanize.new
    agent.get('http://localhost:8080/page1.html')
    pp agent.page

    page = agent.page.link_with(:text => '').click
    puts page.title
...and the output was:

    #<Mechanize::Page
     {url #<URI::HTTP:0x00000100c8dc18 URL:http://localhost:8080/page1.html>}
     {meta_refresh}
     {title "test"}
     {iframes}
     {frames}
     {links
      #<Mechanize::Page::Link "cat1" "/somesite.com/cat1.jpg">
      #<Mechanize::Page::Link "" "/page2.html">
      #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">}
     {forms}>

    page2

The pp page output looks the same as your output, and I was able to click on a link that has no text, as evidenced by the output page2.

The problem with your code is that link_with() returns only the first match. If you use links_with(), you get all the matching links:

    require 'mechanize'

    agent = Mechanize.new
    agent.get('http://localhost:8080/page1.html')

    links = agent.page.links_with(:text => '')
    p links

    --output:--
    [#<Mechanize::Page::Link "" "/page2.html">
    , #<Mechanize::Page::Link "" "/somesite.com/cat3.jpg">
    ]
I would need to see the actual HTML of the links you are having problems with.
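Going only by the href shown in your pp output, the result link seems to carry the image address in its imgurl query parameter. If that is right (I have not verified it against Google, so treat it as an assumption), you may not need to click the link at all. The sketch below pulls that parameter out with Ruby's standard URI library. The helper name image_url_from is made up for this example, and the href is copied from your output with most of the trailing parameters trimmed.

```ruby
require 'uri'

# href taken from the pp output in the question, trimmed to a few parameters.
href = 'http://www.google.com/imgres?imgurl=http://jasonlefkowitz.net/wp-content/uploads/2013/07/cute-cats-cats-33440930-1280-800.jpg&imgrefurl=http://jasonlefkowitz.net/2013/07/slideshow-20-cats-that-suck-at-reducing-tensions-in-the-israeli-palestinian-conflict/&h=800&w=1280'

# Hypothetical helper (not part of Mechanize): returns the value of the
# imgurl query parameter, or nil when the href has no such parameter.
def image_url_from(href)
  query = URI.parse(href).query
  return nil if query.nil?

  # decode_www_form yields [key, value] pairs; to_h makes them a lookup table.
  URI.decode_www_form(query).to_h['imgurl']
end

image_url = image_url_from(href)
puts image_url
```

With a real page you could collect the candidates with page.links_with(:href => /imgres/), pass each link.href to the helper, and then try agent.get(image_url).save to write the file to disk.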