# importing required modules
import PyPDF2
# creating a pdf file object
pdfFileObj = open('C:\\sem1\\691-project\\Dataset\\Maths\\A Spiral Workbook for Discrete Mathematics.pdf', 'rb')
# creating a pdf reader object
pdfReader = PyPDF2.PdfReader(pdfFileObj)
# printing number of pages in pdf file
print(len(pdfReader.pages))
# creating a page object
pageObj = pdfReader.pages[89]
# extracting text from page
print(pageObj.extract_text())
# closing the pdf file object
pdfFileObj.close()
I can extract text from only one page, but I am unable to extract text from multiple pages.
You just need to iterate over all the pages of the PDF. Look at this line in your code:
pageObj = pdfReader.pages[89]
What you're actually doing is getting only page 90 of the document, because pages are zero-indexed (indexing starts at 0: page 1 is index 0, page 2 is index 1, and so on).
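To make the off-by-one concrete, here is a small sketch that converts the 1-based page number shown in a PDF viewer into the 0-based index that pages expects. The helper page_for_number is hypothetical, not part of PyPDF2; it only assumes an object with a zero-indexed pages sequence that supports len(), as PyPDF2.PdfReader does in the question.

```python
def page_for_number(reader, page_number):
    """Return the page that a PDF viewer would label `page_number` (1-based).

    Hypothetical helper: `reader` only needs a zero-indexed `.pages`
    sequence, such as the one PyPDF2.PdfReader provides.
    """
    if not 1 <= page_number <= len(reader.pages):
        raise IndexError(f"page {page_number} is out of range")
    # Index 0 is page 1, so page 90 lives at reader.pages[89].
    return reader.pages[page_number - 1]
```

With the reader from the question, page_for_number(pdfReader, 90) returns the same page object as pdfReader.pages[89].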
Instead, loop through all the pages like this:
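A minimal sketch of that loop, assuming the PyPDF2 3.x API already used in the question (PdfReader, reader.pages, page.extract_text()). The function name extract_all_text is illustrative, not part of PyPDF2; it accepts any reader-like object with a pages sequence.

```python
def extract_all_text(reader):
    """Return the text of every page of `reader`, in page order.

    `reader` is expected to behave like PyPDF2.PdfReader: a `.pages`
    sequence whose items provide `.extract_text()`.
    """
    page_texts = []
    # Visit every page instead of indexing a single one with reader.pages[89].
    for page in reader.pages:
        # Defensive: treat a missing result as an empty string so join() works.
        page_texts.append(page.extract_text() or "")
    return "\n".join(page_texts)
```

With the objects from the question, print(extract_all_text(pdfReader)) prints the whole document, as long as it runs before pdfFileObj.close().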
To save the text to a file:
Open the text file once, before the loop, and append the text of every page to the end of it.
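A sketch of that approach, assuming append mode ("a") so that earlier content of the text file is kept. The helper append_pages_to_file is hypothetical. UTF-8 is an assumed encoding, chosen so that characters outside the Windows default code page (common in a maths text) can still be written.

```python
def append_pages_to_file(reader, txt_path):
    """Append the text of every page of `reader` to the end of `txt_path`."""
    # Open the output file once, before the loop; mode "a" appends
    # instead of overwriting whatever is already in the file.
    with open(txt_path, "a", encoding="utf-8") as txt_file:
        for page in reader.pages:
            txt_file.write(page.extract_text() or "")
            txt_file.write("\n")  # keep pages on separate lines
```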
Your whole code may look like this:
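Putting it together, a sketch of the whole program, assuming PyPDF2 3.x as in the question. Both files are opened with with statements, so they are closed even if extraction fails; this replaces the explicit pdfFileObj.close(). The function name pdf_to_txt is illustrative.

```python
def pdf_to_txt(pdf_path, txt_path):
    """Append the text of every page of pdf_path to txt_path.

    Returns the number of pages, which the question printed.
    """
    # Imported here so PyPDF2 is only needed when the function is called.
    from PyPDF2 import PdfReader

    with open(pdf_path, "rb") as pdf_file, \
            open(txt_path, "a", encoding="utf-8") as txt_file:
        reader = PdfReader(pdf_file)
        # Loop over all pages instead of reading only reader.pages[89].
        for page in reader.pages:
            txt_file.write(page.extract_text() or "")
            txt_file.write("\n")  # keep pages on separate lines
        return len(reader.pages)
```

With the paths from the question, you would call it as print(pdf_to_txt('C:\\sem1\\691-project\\Dataset\\Maths\\A Spiral Workbook for Discrete Mathematics.pdf', 'C:\\sem1\\691-project\\Dataset\\Maths\\A Spiral Workbook for Discrete Mathematics.txt')).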
Please note that this code always appends to the end of the file
'C:\sem1\691-project\Dataset\Maths\A Spiral Workbook for Discrete Mathematics.txt'. If you run it multiple times, be sure to empty the file first.
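One way to empty the file before each run (a sketch; the helper name reset_text_file is hypothetical) is to open it in write mode ("w"), which truncates an existing file or creates a new one, and close it without writing anything.

```python
def reset_text_file(txt_path):
    """Create txt_path, or truncate it to zero length if it already exists."""
    # Mode "w" truncates the file on open; nothing is written,
    # so the file is left empty when the with block closes it.
    with open(txt_path, "w", encoding="utf-8"):
        pass
```

Alternatively, open the output file with "w" instead of "a" before the loop. Because the file is opened only once, every page is still written, and each run overwrites the previous output instead of adding to it.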