import PyPDF2
pdfFileObj = open('C:\\sem1\\691-project\\Dataset\\Maths\\A Spiral Workbook for Discrete Mathematics.pdf', 'rb')
pdfReader = PyPDF2.PdfReader(pdfFileObj)
out_file = open('C:\\sem1\\691-project\\Dataset\\Maths\\A Spiral Workbook for Discrete Mathematics.txt', 'a')
for pageObj in pdfReader.pages:
    page_text = pageObj.extract_text()
    print(page_text)
    out_file.write(page_text)
out_file.close()
pdfFileObj.close()
I am able to extract text from the whole book. However, I need text only from selected page numbers or a selected range of pages.
You can try it like this: use the range() function to obtain the page numbers in the specified range, then extract the corresponding page objects with pdf_reader.pages[page_number] and write the extracted text to output_file.
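The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the function names page_indices, extract_page_range and pdf_pages_to_text_file are hypothetical, and the sketch assumes page numbers are given 1-based and inclusive (as printed in a PDF viewer), while pdf_reader.pages is indexed from 0.

```python
def page_indices(first_page, last_page, total_pages):
    """Turn a 1-based, inclusive page range into 0-based indices.

    Assumption: callers pass page numbers as a PDF viewer shows them.
    The upper bound is clamped to the number of pages in the document.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return range(first_page - 1, min(last_page, total_pages))


def extract_page_range(pdf_reader, first_page, last_page):
    """Collect the text of the selected pages from a PdfReader-like object."""
    texts = []
    for page_number in page_indices(first_page, last_page, len(pdf_reader.pages)):
        page_obj = pdf_reader.pages[page_number]
        # Fall back to an empty string if no text could be extracted.
        texts.append(page_obj.extract_text() or "")
    return "\n".join(texts)


def pdf_pages_to_text_file(pdf_path, txt_path, first_page, last_page):
    """Write the text of the selected pages of pdf_path to txt_path."""
    # Imported here so the helpers above can be used without PyPDF2 installed.
    import PyPDF2

    with open(pdf_path, "rb") as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        text = extract_page_range(pdf_reader, first_page, last_page)
    # "w" overwrites the file, so repeated runs do not duplicate the text.
    with open(txt_path, "w", encoding="utf-8") as output_file:
        output_file.write(text)
```

For example, pdf_pages_to_text_file('book.pdf', 'book.txt', 10, 20) would write pages 10 through 20 of a hypothetical book.pdf to book.txt. For a handful of non-contiguous pages, the same loop could iterate over an explicit list of 0-based indices instead of the range.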