Get PDF XMP Metadata without loading the complete document

Asked by Michel van Engelen on 08 November 2021 at 15:46

With libraries like iTextSharp or iText you can extract metadata from PDF documents via a PdfReader:

using (var reader = new PdfReader(pdfBytes))
{
    return reader.Metadata == null ? null : Encoding.UTF8.GetString(reader.Metadata);
}

These kinds of libraries parse the entire PDF document before they can extract the metadata. In my case this leads to high use of system resources, since we receive many requests per second for large PDFs.

Is there a way to extract the metadata from the PDF without first loading the whole document into memory?


mkl On 09 November 2021 at 09:18 BEST ANSWER

iText 5.x also allows partial reading of PDFs; it merely looks a bit more complicated.

Instead of

using (var reader = new PdfReader(pdfBytes))

use

using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))

where the final true requests partial reading.
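
Combined with the metadata extraction from the question, the partial-reading constructor could be used roughly as follows. This is only a minimal sketch, assuming iTextSharp 5.5.x; the class name XmpMetadataReader and the method name ReadXmp are hypothetical and chosen for illustration.

```csharp
using System.Text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class XmpMetadataReader
{
    // Returns the XMP packet as a string, or null if the PDF has no XMP metadata.
    public static string ReadXmp(byte[] pdfBytes)
    {
        // The final 'true' requests partial reading; 'null' means no owner password.
        using (var reader = new PdfReader(new RandomAccessFileOrArray(pdfBytes), null, true))
        {
            byte[] xmp = reader.Metadata;
            return xmp == null ? null : Encoding.UTF8.GetString(xmp);
        }
    }
}
```

Note that the byte array itself is still held in memory here; partial reading only avoids parsing every object of the document up front.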

iPDFdev On 09 November 2021 at 08:31

With PDF4NET you can extract the XMP metadata without loading the entire document in memory:

// This does a minimal parsing of the PDF file and loads 
// only a few objects from the file
PDFFile pdfFile = new PDFFile(new MemoryStream(pdfBytes));

string xmpMetadata = pdfFile.ExtractXmpMetadata();

Update 1: code changed to load the file from a byte array

Disclaimer: I work for the company that develops the PDF4NET library.
