R qdap Search exclude syntax


I have the following output from data that I have downloaded from the Wall Street Journal.

> Search(MySymList, " Net Income")
    Fiscal year is July-June. All values AUD Millions.   2018    2017    2016   2015 2014 5-year trend
82                             Consolidated Net Income    949     814     376    850  769             
86                                          Net Income    934     792     335    817  737             
88                                   Net Income Growth 18.04% 135.99% -58.93% 10.83%    -             
103                   Net Income After Extraordinaries    934     792     335    817  909             
107                     Net Income Available to Common    934     792     335    817  565      

I want to capture the Net Income row, but its position (row number) is not consistent from one download to the next, so I tried the qdap library, in particular its Search function. It does a wonderful job of finding most information, but I am stumped on how to remove the other lines.

I thought an exclude argument might help, but it doesn't work:

Search(MySymList, " Net Income", exclude = "Common")
Error in agrep(term, x, ignore.case = TRUE, max.distance = max.distance,  : 
  unused argument (exclude = "Common")
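Judging by the error message, Search() hands extra arguments on to base R's agrep(), and agrep() has no exclude argument, so exclusion has to be a separate step. A minimal base-R sketch of that two-step filter, using a small hypothetical data frame `fin` typed in from the output above (first column renamed FinElement, only two year columns kept):

```r
# Hypothetical toy data typed in from the Search() output above.
fin <- data.frame(
  FinElement = c("Consolidated Net Income", "Net Income", "Net Income Growth",
                 "Net Income After Extraordinaries",
                 "Net Income Available to Common"),
  `2018` = c("949", "934", "18.04%", "934", "934"),
  `2017` = c("814", "792", "135.99%", "792", "792"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: substring match, which (like Search) keeps every "Net Income" row.
hits <- fin[grepl("Net Income", fin$FinElement, fixed = TRUE), ]

# Step 2: emulate "exclude" with a second, negated filter.
kept <- hits[!grepl("Common", hits$FinElement, fixed = TRUE), ]
kept$FinElement
```

Excluding one word at a time still leaves four rows here, which is why matching the whole label exactly is the more reliable filter.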

I can get Net Income by other means, but I would prefer to do it with a single function: Search, or anything else the qdap library offers.
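One way to get a single call that behaves as described is a small wrapper. The sketch below is a hypothetical helper, search_rows(), written in base R; it is not part of qdap, and the toy data frame is built from the labels shown above (one label padded with spaces to mimic scraped text).

```r
# Hypothetical helper (not part of qdap): return the rows of `df` whose label
# column matches `term`, optionally dropping rows whose label contains `exclude`.
search_rows <- function(df, term, column = 1, exact = TRUE, exclude = NULL) {
  labels <- trimws(as.character(df[[column]]))  # factor or text -> trimmed text
  keep <- if (exact) {
    labels == trimws(term)                       # whole-label match only
  } else {
    grepl(term, labels, fixed = TRUE)            # plain substring match
  }
  if (!is.null(exclude)) {
    keep <- keep & !grepl(exclude, labels, fixed = TRUE)
  }
  df[which(keep), , drop = FALSE]                # which() also drops NA labels
}

# Hypothetical toy data based on the labels in the question.
fin <- data.frame(
  FinElement = c("Consolidated Net Income", " Net Income ", "Net Income Growth",
                 "Net Income Available to Common"),
  `2018` = c("949", "934", "18.04%", "934"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

search_rows(fin, "Net Income")  # only the "Net Income" row
search_rows(fin, "Net Income", exact = FALSE, exclude = "Common")
```

The exact mode is the one that answers the question; the substring-plus-exclude mode is kept only to show what an exclude option would have done.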

Any guidance would be most welcome.

EDIT:

Below is cut-down code, since it is easier to run than to supply the data. The symbol differs from the original, so the row numbers will have changed.

library(httr)
library(XML)
library(data.table)
library(qdap)
library(Hmisc)

# Download the WSJ annual income statement for one ASX symbol and parse
# every HTML table on the page into a list of data frames.
getwsj.quotes <- function(Symbol) {
  MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/financials/annual/income-statement", Symbol)
  Symbol.Data <- GET(MyUrl)
  x <- content(Symbol.Data, as = "text")
  # Delete the first "cr_dataTable cr_sub_capital" class string before parsing.
  wsj.tables <- sub("cr_dataTable cr_sub_capital", "", x)
  SymData <- readHTMLTable(wsj.tables)
  return(SymData)
}

TickerList <- c("AMC")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
# The income-statement rows were in the second table on the page.
MySymList <- SymbolDataList[[1]][[2]]
Search(MySymList, " Net Income")

Regards Stephen

Answer from Stephen:

I have made a breakthrough, though it might not be the most efficient code. Giving the first column a short name helped a lot, and base R's which() then provides an exact-match search. Alas, I still cannot answer my own question about the qdap Search function.
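To illustrate why the exact match works where Search does not, here is a small base-R sketch; the `labels` vector is simply the row labels from the question's output typed in by hand.

```r
# Row labels as they appear in the question's Search() output.
labels <- c("Consolidated Net Income", "Net Income", "Net Income Growth",
            "Net Income After Extraordinaries", "Net Income Available to Common")

# Substring matching (grepl here, agrep inside Search) hits every label.
which(grepl("Net Income", labels, fixed = TRUE))  # 1 2 3 4 5

# `==` compares whole strings, so which() returns only the exact label.
which(labels == "Net Income")                     # 2

# If scraped labels carry stray spaces, trimws() keeps the match exact.
which(trimws(c(" Net Income", "Net Income Growth")) == "Net Income")  # 1
```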

library(httr)
library(XML)
library(data.table)
library(qdap)
library(Hmisc)

# Download the WSJ annual income statement for one ASX symbol and parse
# every HTML table on the page into a list of data frames.
getwsj.quotes <- function(Symbol) {
  MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/financials/annual/income-statement", Symbol)
  Symbol.Data <- GET(MyUrl)
  x <- content(Symbol.Data, as = "text")
  # Delete the first "cr_dataTable cr_sub_capital" class string before parsing.
  wsj.tables <- sub("cr_dataTable cr_sub_capital", "", x)
  SymData <- readHTMLTable(wsj.tables)
  return(SymData)
}

TickerList <- c("BHP")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
MySymList <- SymbolDataList[[1]][[2]]
Search(MySymList, " Net Income") # purely for testing what is available
# Short name for the first column, so it can be used as MySymList$FinElement.
names(MySymList) <- c("FinElement", "2018", "2017", "2016", "2015", "2014", "5-year trend")
# `==` is an exact whole-string match; which() turns it into a row index.
lineNo <- which(MySymList$FinElement == "Net Income")
MySymList[lineNo, ]

The output is:

         Ratio  2018  2017    2016  2015   2014 5-year trend
91  Net Income 8,585 8,453 (8,774) 4,109 14,775
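The approach in this answer relies on which() finding exactly one match and, for the names() call, on the table having exactly seven columns. A hedged base-R sketch of a sturdier extraction follows. MySymList here is a small hypothetical stand-in (only the Net Income figures come from the output above), and parse_fin_number() is a hypothetical helper that assumes commas are thousands separators and parentheses mark negative values.

```r
# Hypothetical stand-in for MySymList; the real scraped table is larger.
MySymList <- data.frame(
  V1 = c("Net Income Growth", "Net Income", "Net Income Available to Common"),
  V2 = c(NA, "8,585", NA),
  V3 = c(NA, "8,453", NA),
  V4 = c(NA, "(8,774)", NA),
  V5 = c(NA, "4,109", NA),
  V6 = c(NA, "14,775", NA),
  stringsAsFactors = FALSE
)

# Rename only the first column, so the code does not depend on how many
# year columns the scraped table happens to have.
names(MySymList)[1] <- "FinElement"

# Logical indexing: %in% never returns NA, and drop = FALSE keeps a data frame.
net_income <- MySymList[MySymList$FinElement %in% "Net Income", , drop = FALSE]

# Stop with a clear message if the label is missing or duplicated.
if (nrow(net_income) != 1) {
  stop("expected exactly one 'Net Income' row, found ", nrow(net_income))
}

# Hypothetical helper: strip commas, treat "(...)" as negative, and turn
# non-numeric entries such as "-" into NA.
parse_fin_number <- function(x) {
  x <- trimws(as.character(x))
  negative <- grepl("^\\(.*\\)$", x)
  value <- suppressWarnings(as.numeric(gsub("[(),]", "", x)))
  ifelse(negative, -value, value)
}

values <- parse_fin_number(unlist(net_income[1, -1]))
values  # 8585  8453 -8774  4109 14775
```

Failing loudly on zero or several matches is a deliberate choice: a silent wrong row is harder to spot than an error.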

Thanks to everyone who considered this problem. Regards Stephen