I'm using HtmlProvider to scrape company news for a stock, e.g. https://www.nasdaq.com/symbol/{STOCK_SYMBOL_HERE}/news-headlines, but I'm getting an error in this code:

let [<Literal>] stockNewsUrl = "https://www.nasdaq.com/symbol/AAPL/news-headlines"
let news = new HtmlProvider<stockNewsUrl>()

There is a squiggle on the second line, and the error is: Error FS3033 The type provider 'ProviderImplementation.HtmlProvider' reported an error: Cannot read sample HTML from 'https://www.nasdaq.com/symbol/AAPL/news-headlines': The 'Value'='AAPL,technology' part of the cookie is invalid.


Answer by Nghia Bui (best answer)

To make an HTTP request to https://www.nasdaq.com/symbol/AAPL/news-headlines, we are required to provide a CookieContainer. Since you are using the FSharp.Data library, I suggest using its HTTP utilities:

open System.Net
open FSharp.Data

// The sample file is only read at compile time to infer the page's structure.
type Nasdaq = HtmlProvider<"/tmp.html">
let cc = CookieContainer ()
let data =
    Http.RequestString ("https://www.nasdaq.com/symbol/AAPL/news-headlines", cookieContainer = cc)
    |> Nasdaq.Parse
data.Tables.``Today's Market Activity``.Html
|> printfn "%A"

Of course, you have to download the page beforehand and save it to /tmp.html.

Small note: if we already have the HTML string (as in our case), we use Nasdaq.Parse; if we have a URL, we use Nasdaq.Load.

Answer by Tomas Petricek

It looks like this fails because F# Data sends cookies in a format that the Nasdaq service does not like. An easy workaround is to download the page once to have a sample available at compile-time and then download the page at runtime using some other means.

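open System.Net
open FSharp.Data

// Sample saved locally so the provider can read it at compile time.
type Nasdaq = HtmlProvider<"c:/temp/nasdaq.html">

let wc = new WebClient()
let downloaded = wc.DownloadString("https://www.nasdaq.com/symbol/AAPL/news-headlines")

// downloaded is an HTML string, so use Parse (Load expects a URL or file path).
let ns = Nasdaq.Parse(downloaded)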

This works, but there are two issues:

  • The page does not contain any tables/lists, so the ns value does not give you nice static access to anything useful
  • I get a timeout exception when I try to download the data using WebClient, so perhaps that does not work either (but it might just be that I'm behind a proxy or something)
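
When a page has no tables or lists for the provider to type, one possible fallback is the untyped HtmlDocument API that ships with FSharp.Data. The following is only a sketch: sampleHtml is made-up markup standing in for the downloaded string, and the real Nasdaq page structure is an assumption, so the element to search for would need to be adjusted after inspecting the actual HTML.

```fsharp
#r "nuget: FSharp.Data"

open FSharp.Data

// Hypothetical markup standing in for the downloaded page;
// the real page layout is not known here.
let sampleHtml = """
<html><body>
  <div class="news-headlines">
    <a href="/article/one">First headline</a>
    <a href="/article/two">Second headline</a>
  </div>
</body></html>"""

// Parse the HTML string without any compile-time sample.
let doc = HtmlDocument.Parse sampleHtml

// Collect (text, href) for every anchor that carries an href attribute.
let links =
    doc.Descendants ["a"]
    |> Seq.choose (fun a ->
        a.TryGetAttribute("href")
        |> Option.map (fun href -> a.InnerText(), href.Value()))
    |> Seq.toList

links |> List.iter (fun (text, href) -> printfn "%s -> %s" text href)
```

In real use, the string returned by the download step would replace sampleHtml, and the anchors would probably need to be filtered (for example by a parent element's class) to keep only the headlines.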
