I am having trouble processing PDF files generated by Ghostscript. When I try to extract text from these PDFs using pdfminer and fitz, I get a RuntimeError with the message 'pdf device does not support type 3 fonts.' This error is seriously disrupting my workflow.
Has anyone else run into this problem or something similar? If so, I would appreciate hearing how you resolved it, or any workarounds that worked for you. Detailed explanations would be very welcome.
I am using the pdfminer.six package, version 20221105.
Type 3 fonts can be extracted in some cases, such as this one (7 line fonts of Type 3 and 1 of Type 1), but not easily, since they often use a custom encoding. Note how the extraction on the left would need recoding to match the numeric styles in the body (just like Caesar's Roman-times encryption, but clearly NOT the Times Roman font :-). The Type 1 font, however, is CMR10, which is Computer Modern Roman.
Restructured output is "doable" but specific to that document. This is a very simple recoding in a simple use case; most will be more complex, and may be simply bitmaps or outline shapes, i.e. not plain text.
So, for example, HTML extraction will need a lot of parsing to produce output similar to the source, but "Find and Replace" during conversion will work easily in this case, by providing an alternate coding and font substitutions for parts of lines.
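
As a rough illustration of that find-and-replace idea, here is a minimal Python sketch. It assumes the extractor has already given you the text as (font name, text) spans. The recode table and the font name `T3_0` are made up for illustration; `CMR10` is the Type 1 font mentioned above. The real mapping has to be worked out by hand for each document, so treat this as a starting point rather than a general fix.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical recode table: left side is what the extractor emits for the
# Type 3 font, right side is what is actually printed on the page.
# These values are invented; build the real table per document.
RECODE: Dict[str, str] = {"B": "1", "C": "2", "D": "3"}


def recode_spans(
    spans: Iterable[Tuple[str, str]],
    type3_fonts: Set[str],
    table: Dict[str, str],
) -> str:
    """Recode only the spans set in a Type 3 font; leave the rest alone."""
    trans = str.maketrans(table)  # keys must be single characters
    parts = []
    for font, text in spans:
        if font in type3_fonts:
            parts.append(text.translate(trans))
        else:
            parts.append(text)
    return "".join(parts)


# "T3_0" is a placeholder name for a Type 3 font in the document.
spans = [("CMR10", "Page "), ("T3_0", "BCD")]
print(recode_spans(spans, {"T3_0"}, RECODE))  # prints "Page 123"
```

Restricting the substitution to spans from the Type 3 fonts matters, because the same characters in the Type 1 text are already correct and must not be touched.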