Using ArialMT for Arabic text without embedding font with PDFBox


I'm using Apache PDFBox to write Arabic text on a page without embedding the font. ArialMT appears to be widely available, so PDFBox should be able to work with it and a PDF viewer should have no trouble with the final document. However, I have not found a way in code to use the font without it also being embedded.

Note: The PDF standard permits this, and I have seen documents generated this way.

ADDENDUM (further explaining the case)

The specific case for a non-embedded font is this: I'm generating a document with images and placing invisible text (e.g. produced via OCR) on top of the images. When conforming to the PDF/A standard, embedding the font is not necessary in such cases, as the image is the only source for rasterization of the document. The "standard 14" fonts do not cover Arabic code points, so another font has to be referenced for PDFBox to work, but loading a font causes it to be embedded.

Answer by Mike 'Pomax' Kamermans

To elaborate on Tilman's comment:

"Just because you can do something doesn't mean you should. There are computers that don't have much fonts and the result may be weird"

They're entirely correct: don't do this. Use subset embedding instead, because different setups can have different versions of Arial, all of which resolve against the ArialMT identifier but have completely different internal glyph IDs.

As PDFs point to glyph IDs, not "letters", what looks like "cake" with your copy of Arial could, when encoded as a glyph ID array, end up as "B^r(" in a different version of Arial. That includes newer versions of Arial that you yourself might be using a year from now: suddenly your PDF files are unusable even for you.
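The mismatch described above can be sketched with a toy model in plain Java. This is only an illustration under stated assumptions: the two "font versions" are hypothetical lookup tables invented here (real fonts carry far larger `cmap` tables), version B is assumed to differ from version A only by one extra glyph inserted ahead of the letters, and the names `toyCmap`, `encode`, and `render` are made up for this sketch and are not PDFBox API.

```java
import java.util.HashMap;
import java.util.Map;

class GlyphIdMismatch {

    // Builds a toy character-to-glyph-ID table: chars.charAt(i) -> firstGlyphId + i.
    // Hypothetical stand-in for a font's real cmap table.
    static Map<Character, Integer> toyCmap(String chars, int firstGlyphId) {
        Map<Character, Integer> cmap = new HashMap<>();
        for (int i = 0; i < chars.length(); i++) {
            cmap.put(chars.charAt(i), firstGlyphId + i);
        }
        return cmap;
    }

    // Models a writer that stores glyph IDs rather than characters.
    static int[] encode(String text, Map<Character, Integer> cmap) {
        int[] glyphIds = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer gid = cmap.get(text.charAt(i));
            if (gid == null) {
                throw new IllegalArgumentException("No glyph for '" + text.charAt(i) + "'");
            }
            glyphIds[i] = gid;
        }
        return glyphIds;
    }

    // Models a viewer drawing those glyph IDs with whatever font version it finds locally.
    // Glyph IDs that the local table does not assign to a letter are shown as '?'.
    static String render(int[] glyphIds, Map<Character, Integer> cmap) {
        Map<Integer, Character> byGlyphId = new HashMap<>();
        for (Map.Entry<Character, Integer> e : cmap.entrySet()) {
            byGlyphId.put(e.getValue(), e.getKey());
        }
        StringBuilder out = new StringBuilder();
        for (int gid : glyphIds) {
            Character c = byGlyphId.get(gid);
            out.append(c == null ? '?' : c.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        Map<Character, Integer> versionA = toyCmap(letters, 10); // assumed: 'a' is glyph 10
        Map<Character, Integer> versionB = toyCmap(letters, 11); // assumed: one glyph inserted earlier

        int[] stored = encode("cake", versionA);       // glyph IDs 12, 10, 20, 14
        System.out.println(render(stored, versionA));  // prints "cake"
        System.out.println(render(stored, versionB));  // prints "b?jd"
    }
}
```

With an embedded subset, the glyph table travels with the document, so the viewer always plays the role of `versionA` here.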

PDFs should be stand-alone documents. If you want people to read your PDFs, use subset embedding for the fonts you used, even if they're supposedly "generally available". The only way to avoid embedding a font is to make the document use only fonts from the predefined set of 14 standard fonts, which any PDF-spec compliant reader must come with in order to render content without font embeds. And notice that Arial is not in that list.
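As a quick sanity check on that last point, here is a minimal sketch that tests a base font name against the 14 standard font names. The class and method names (`Standard14Check`, `isStandard14`) are hypothetical helpers written for this answer, not part of PDFBox; only the font names themselves come from the PDF specification.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Standard14Check {

    // PostScript names of the 14 standard fonts defined by the PDF specification.
    static final Set<String> STANDARD_14 = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats")));

    // Hypothetical helper: true only for an exact match against one of the 14 names.
    static boolean isStandard14(String baseFontName) {
        return STANDARD_14.contains(baseFontName);
    }

    public static void main(String[] args) {
        System.out.println(isStandard14("Helvetica")); // prints "true"
        System.out.println(isStandard14("ArialMT"));   // prints "false"
    }
}
```

Note that being in this set says nothing about Arabic coverage: as the question already points out, none of these fonts include Arabic code points.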