Extracting PDFs from the EDGAR database


I am trying to extract the information from the PDF located here: https://www.sec.gov/Archives/edgar/data/784028/000078402823000002

However, even though EDGAR is an open API, the SEC appears to block extraction/scraping of the PDF (I get a 403 error when reading it with pdftools in R), and the associated .txt file appears to be encrypted. I have tried the XBRL and edgar packages in R, but since this company doesn't publish the data as XML, neither appears to work. I am open to Python solutions as well.

Here is what I have tried:

library(httr)
library(XBRL)
library(edgar)
library(pdftools)

# Attempt 1: read the PDF straight from its URL (this is where the 403 occurs)
pdf_Academy <- "https://www.sec.gov/Archives/edgar/data/784028/000078402823000002/2022public.pdf"
academyInfo <- pdftools::pdf_info(pdf_Academy)
academytxt <- pdftools::pdf_text(pdf_Academy)

# Attempt 2: request the company's submissions index as JSON
df0 <- httr::GET("https://data.sec.gov/submissions/CIK784028.json")

# Attempt 3: download all 2022 filings with the edgar package
df1 <- edgar::getFilings(cik.no = "784028", filing.year = 2022, downl.permit = "y", form.type = "ALL")
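
Since Python is an option, here is a minimal sketch of one thing that might be worth ruling out. It assumes, and this is not confirmed anywhere above, that the 403 comes from the request not declaring a User-Agent, which the SEC asks automated clients to send. The helper names `build_request` and `download_pdf` and the `USER_AGENT` value are hypothetical placeholders, and only the standard library is used.

```python
import urllib.request

# Same PDF URL as in the R attempt above.
PDF_URL = (
    "https://www.sec.gov/Archives/edgar/data/784028/"
    "000078402823000002/2022public.pdf"
)

# Placeholder identity (an assumption): replace with your own name and email.
USER_AGENT = "Example Name example@example.com"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that declares who is asking via User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_pdf(url, dest_path, user_agent=USER_AGENT):
    """Download `url` to `dest_path`; return the number of bytes written."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    # PDF files start with the magic bytes "%PDF"; anything else is
    # probably an error page rather than the document itself.
    if not data.startswith(b"%PDF"):
        raise ValueError("response does not look like a PDF")
    with open(dest_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

If this works, calling `download_pdf(PDF_URL, "2022public.pdf")` would save the file locally, and `pdftools::pdf_text("2022public.pdf")` could then read it from disk instead of from the URL. If the 403 persists even with a declared User-Agent, the cause is something else and this sketch does not address it.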