I am converting a PDF to text using 'iText.PdfTextExtractor', and I receive this error on only some of the PDF pages I try to convert:
'BuiltIn' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name')
I've tried adding the following code before opening the file stream, but I am still receiving the error:
System.Text.EncodingProvider provider = System.Text.CodePagesEncodingProvider.Instance;
Encoding.RegisterProvider(provider);
Here is my code:
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public void ExtractFromPdf(string pdfFile, ClaimInfo claimInfo, string memberId)
{
    // Register the code-page encodings before any text is extracted.
    System.Text.EncodingProvider provider = System.Text.CodePagesEncodingProvider.Instance;
    Encoding.RegisterProvider(provider);
    PdfReader pdfRead = new PdfReader(pdfFile);
    PdfDocument pdfDoc = new PdfDocument(pdfRead);
    // Page numbers are 1-based, so use <= to include the last page.
    for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
    {
        string convertToText = PdfToText(pdfDoc, page);
    }
}

private string PdfToText(PdfDocument pdfDoc, int pageNo)
{
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    return PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(pageNo), strategy);
}
The error occurs at return PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(pageNo), strategy);
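To narrow down which pages are affected, something like the following rough sketch could wrap the extraction call in a try/catch and record the page numbers that throw. `PdfDiagnostics` and `FindFailingPages` are hypothetical names, not part of iText, and I'm assuming the exception surfaces as an `ArgumentException` because the message ends with "(Parameter 'name')":

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class PdfDiagnostics
{
    // Hypothetical helper: returns the 1-based numbers of the pages
    // whose text extraction throws.
    public static List<int> FindFailingPages(string pdfFile)
    {
        var failingPages = new List<int>();
        PdfDocument pdfDoc = new PdfDocument(new PdfReader(pdfFile));
        try
        {
            for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
            {
                try
                {
                    // Use a fresh strategy per page so text does not accumulate.
                    PdfTextExtractor.GetTextFromPage(
                        pdfDoc.GetPage(page), new SimpleTextExtractionStrategy());
                }
                catch (ArgumentException ex)
                {
                    // Assumption: the 'BuiltIn' encoding error is an ArgumentException.
                    Console.WriteLine($"Page {page}: {ex.Message}");
                    failingPages.Add(page);
                }
            }
        }
        finally
        {
            pdfDoc.Close();
        }
        return failingPages;
    }
}
```

Knowing which pages fail would make it possible to check whether they share a particular font or encoding.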
I've searched everywhere, and 'BuiltIn' appears to be a built-in encoding name that I don't know how to find. Any ideas?