Extract geometric objects (lines, circles, ...) from a PDF using pdfmm


I have a PDF containing several geometric objects (mostly lines) of different sizes and colors. I want to extract them in the following form, e.g. for lines:

  • (startx, starty)
  • (endx, endy)
  • width
  • color

Optionally, I would also like a "z" position that indicates which object is drawn first. My language of choice is C++, and I thought about using PoDoFo, or rather pdfmm, as it should be more accessible. However, I am totally lost as to how to access this information.
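To make the desired output concrete, here is a minimal sketch of such a record and of how it could be filled from already-decoded content stream text. It does not use pdfmm or PoDoFo at all: `LineSegment` and `extractLines` are hypothetical names introduced only for illustration. The sketch relies on the standard PDF path operators (`m` moveto, `l` lineto, `w` line width, `RG` stroke color, `S` stroke) and assumes whitespace-separated tokens, numeric operands only, and no transformation matrix (`cm`) or graphics state stack (`q`/`Q`). The "z" position is simply the order in which stroke operations appear in the stream.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the field names are illustrative, not library API.
struct LineSegment {
    double startX, startY;
    double endX, endY;
    double width;
    std::array<double, 3> color; // stroke color as RGB, each component in [0, 1]
    std::size_t z;               // paint order: 0 is drawn first
};

// Interprets a small subset of PDF content stream operators on decoded text.
// Assumption: tokens are separated by whitespace and operands are plain numbers.
std::vector<LineSegment> extractLines(const std::string& content)
{
    std::vector<LineSegment> result;
    std::vector<std::array<double, 4>> pending; // path segments not yet painted
    std::vector<double> operands;               // numbers seen since the last operator
    double width = 1.0;                         // PDF default line width
    std::array<double, 3> color{0.0, 0.0, 0.0}; // PDF default stroke color (black)
    double curX = 0.0, curY = 0.0;

    std::istringstream in(content);
    std::string tok;
    while (in >> tok) {
        try {
            std::size_t used = 0;
            const double value = std::stod(tok, &used);
            if (used == tok.size()) {
                operands.push_back(value);
                continue;
            }
        } catch (const std::exception&) {
            // Not a number: fall through and treat the token as an operator.
        }

        const std::size_t n = operands.size();
        if (tok == "w" && n >= 1) {
            width = operands[n - 1];
        } else if (tok == "RG" && n >= 3) {
            color = std::array<double, 3>{operands[n - 3], operands[n - 2], operands[n - 1]};
        } else if (tok == "m" && n >= 2) {
            curX = operands[n - 2];
            curY = operands[n - 1];
        } else if (tok == "l" && n >= 2) {
            pending.push_back(std::array<double, 4>{curX, curY, operands[n - 2], operands[n - 1]});
            curX = operands[n - 2];
            curY = operands[n - 1];
        } else if (tok == "S") {
            // Width and color are read when the path is stroked, not when it is built.
            for (const auto& p : pending) {
                result.push_back(LineSegment{p[0], p[1], p[2], p[3], width, color, result.size()});
            }
            pending.clear();
        } else if (tok == "n") {
            pending.clear(); // path ended without painting
        }
        operands.clear(); // operands belong to the operator just handled
    }
    return result;
}
```

For example, calling `extractLines("10 w 1 0 0 RG 0 0 m 595 842 l S")` should yield one red segment of width 10 from (0, 0) to (595, 842). In a real file the page content stream is usually compressed, so the library would first have to provide the decoded bytes, and a complete solution would also need to track `cm`, `q` and `Q`.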

I found the following reference: PDF parsing in C++ (PoDoFo)

However, I was not able to get the PdfTokenizer to work. TryReadNextToken needs an InputStreamDevice object, and I do not know how to obtain one.

For example, I create a single page containing just one line with pdfmm, and now I want to extract this information:

#include <pdfmm/pdfmm.h>

// pdfmm declares its classes in namespace mm
using namespace mm;

int main()
{
    try {
        PdfMemDocument document;

        document.Load("test.pdf");
        PdfPage* page = document.GetPages().CreatePage(PdfPage::CreateStandardPageSize(PdfPageSize::A4));

        // Draw a single line
        PdfPainter painter;
        painter.SetCanvas(page);

        painter.GetGraphicsState().SetLineWidth(10);
        painter.DrawLine(0, 0, page->GetRect().GetWidth(), page->GetRect().GetHeight());
        painter.FinishDrawing();

        // Loop over all tokens of the page
        PdfTokenizer token(true);
        char* stoken = nullptr;
        PdfVariant var;
        PdfContentType type;

        // ???? marks the InputStreamDevice argument I do not know how to obtain
        while (token.TryReadNextToken( ????  ,stoken,type)) {
        }
    }
    catch (PdfError& err)
    {
        err.PrintErrorMsg();
        return (int)err.GetError();
    }

    return 0;
}

If anybody could point me in the right direction, that would be awesome! And if somebody knows of good documentation on the structure of a PDF and/or a good tutorial for pdfmm / PoDoFo, that would also be highly appreciated.
