Extracting PDFs from the EDGAR database

31 Views Asked by P5C768 At 14 March 2024 at 23:46

I'm trying to extract the information from the PDF in this filing folder: https://www.sec.gov/Archives/edgar/data/784028/000078402823000002

Although EDGAR is publicly accessible, the SEC appears to block scraping of the PDF: I get a 403 error when I try to read it with pdftools in R. The associated .txt file also appears to be encrypted. I have tried the XBRL and edgar packages in R, but since this company doesn't publish its data as XML, neither seems to work. I'm open to Python solutions as well.

Here is what I have tried:

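library(httr)
library(XBRL)
library(edgar)
library(pdftools)

# Attempt 1: read the PDF straight from its URL (this is where the 403 appears)
pdf_Academy <- "https://www.sec.gov/Archives/edgar/data/784028/000078402823000002/2022public.pdf"
academyInfo <- pdftools::pdf_info(pdf_Academy)
academytxt <- pdftools::pdf_text(pdf_Academy)

# Attempt 2: fetch the company's submissions index as JSON
# (data.sec.gov expects the CIK zero-padded to 10 digits)
df0 <- httr::GET("https://data.sec.gov/submissions/CIK0000784028.json")

# Attempt 3: download all 2022 filings with the edgar package
df1 <- edgar::getFilings(cik.no = "784028", filing.year = 2022, downl.permit = "y", form.type = "ALL")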
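
One thing that may explain the 403: the SEC's fair-access guidance asks automated clients to declare a User-Agent that identifies the requester, and none of the calls above sets one. The sketch below assumes that is the cause. It downloads the PDF with httr while sending a User-Agent, then hands the local copy to pdftools. The helper names edgar_doc_url and download_edgar_doc and the contact string are hypothetical placeholders, not part of any package.

```r
# Sketch only. Assumption: the 403 is caused by the missing User-Agent header.

# Build the URL of a document inside a filing folder (pure helper, no network access).
# Folder names under /Archives/edgar/data/ use the accession number without dashes.
edgar_doc_url <- function(cik, accession, filename) {
  accession <- gsub("-", "", accession, fixed = TRUE)
  sprintf("https://www.sec.gov/Archives/edgar/data/%s/%s/%s",
          cik, accession, filename)
}

# Download a document to `dest`, declaring who is making the request.
# `contact` is a placeholder: replace it with your own name and email address.
download_edgar_doc <- function(url, dest, contact) {
  resp <- httr::GET(
    url,
    httr::user_agent(contact),
    httr::write_disk(dest, overwrite = TRUE)
  )
  httr::stop_for_status(resp)  # raise an R error on 403/404 instead of saving an error page
  invisible(dest)
}

# Example (requires network access and the httr and pdftools packages):
# url <- edgar_doc_url("784028", "000078402823000002", "2022public.pdf")
# pdf_path <- download_edgar_doc(url, tempfile(fileext = ".pdf"),
#                                "Your Name your.email@example.com")
# academytxt <- pdftools::pdf_text(pdf_path)
```

Saving the file first keeps the HTTP request separate from the PDF parsing, so a failed download shows up as an HTTP error rather than a parsing error.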


There are 0 best solutions below
