Is there an update to open-uri that changes the way you call a User-Agent?

The book "Instant Nokogiri" and the Packt Hub Nokogiri page include an example that sets a User-Agent to spoof a browser while crawling the New York Times website for the top story.

I am working through the book. The code is a little dated, so I updated it.

My version of the code is:

require 'open-uri'
require 'nokogiri'
require 'sinatra'

browser = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)
AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1'

doc = Nokogiri::HTML(open ('http://nytimes.com', browser))

nyt_headline = doc.at_css('h2 span').content

nyt_url = "http://nytimes.com" + doc.at_css('.css-16ugw5f a')[:href]


html = "<h1>Nokogiri News Service</h1>"
html += "<h2>Top Story: <a href=\"#{nyt_url}\">#{nyt_headline}</a></h2>"

get '/' do
    html
end

I am running this through a terminal session on Mac OS and getting this error:

invalid access mode Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) (ArgumentError)
AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1 (URI::HTTP resource is read only.)

I don't believe I am attempting to write anything, so I'm not sure why a 'read only' error would block this from running. It was working before I added the User-Agent info.

Answer by the Tin Man

See OpenURI's open documentation:

URI.open("http://www.ruby-lang.org/en/",
  "User-Agent" => "Ruby/#{RUBY_VERSION}",
  "From" => "foo@bar.invalid",
  "Referer" => "http://www.ruby-lang.org/") {|f|
  # ...
}

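The options must be a Hash, but you're passing a String. open treats a String in that position as a file access mode (such as "r" or "w"), which is why it reports "invalid access mode" and points out that a URI::HTTP resource is read only.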
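
Applied to the code in the question, the fix might look like the sketch below. The helper names (browser_user_agent, request_headers, fetch_page) are hypothetical and only there for illustration. The sketch uses URI.open because, on Ruby 3.0 and later, open-uri no longer hooks into the bare open call; on older Rubies the bare call still works but is deprecated as of 2.7. The browser string is also joined onto one line, since the version in the question contains a line break.

```ruby
require 'open-uri'

# Browser string from the question, joined onto a single line
# (the original literal spans two lines, so it embeds a newline).
def browser_user_agent
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) ' \
  'AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1'
end

# Hypothetical helper: builds the options Hash that URI.open expects.
# Header names are String keys; the values are the header contents.
def request_headers(user_agent = browser_user_agent)
  { 'User-Agent' => user_agent }
end

# Hypothetical helper: fetches the page body, sending the spoofed User-Agent.
# The Hash is passed as the second argument, not a bare String.
def fetch_page(url)
  URI.open(url, request_headers) { |f| f.read }
end
```

With that in place, the line from the question would become something like doc = Nokogiri::HTML(fetch_page('http://nytimes.com')). Whether the CSS selectors still match the current New York Times markup is a separate matter.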