Getting textcontent pdf.js

Question

Asked by Difusio on 16 May 2025 at 12:44

I'm trying to get the text from a PDF document using pdf.js in JavaScript. However, pdf.js has no decent documentation. I've looked at the available examples and came up with this:

var pdfUrl = "http://localhost/test.pdf"
var pdf = PDFJS.getDocument(pdfUrl);
pdf.then(function(pdf) {
    var maxPages = pdf.pdfInfo.numPages;
    for (var j = 1; j < maxPages; j++) {
        var page = pdf.getPage(j);

        page.then(function() {
            var textContent = page.getTextContent();

        })
    }
});

The page bit is working, because I can see it is a promise. However, running this code gives:

Warning: Unhandled rejection: TypeError: Object #<Object> has no method 'getTextContent'
TypeError: Object #<Object> has no method 'getTextContent'

It works this way in the examples I've seen. It is getting the page, and I can print out the number of pages.

Can anyone with experience shed some light on this?

Bonus question: I'm only interested in parsing the PDF, not in rendering it in the browser. However, it has to be done client-side. Is pdf.js the right hammer for the job?

There are 3 best solutions below

Jussi Palo On 23 January 2015 at 13:35

You also need to change it to

for (var j = 1; j <= maxPages; j++) {

otherwise you'll never get the last page, because pages are numbered from 1 to maxPages inclusive.

Qaddura On 25 June 2014 at 07:43

PDF.js renders your PDF file, extracts the words, and outputs them as HTML elements. Each element is then placed over the rendered PDF with the CSS properties {position:absolute; left:X; top:Y}.

These divs are given the CSS property {color:transparent}. This is what makes selection highlighting work: it appears that you are selecting directly from the PDF file, but you are actually selecting the generated HTML elements.

This is exactly how it works. If you want to render the PDF file, that is fine, but keep in mind that if you want to change the output technique (transparent HTML divs), you have to bring your own replacement...
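
To make the overlay idea concrete, here is a rough sketch of the inline style such a transparent element could carry. It is not pdf.js's actual code; textLayerStyle is a hypothetical helper name, and using pixel units for X and Y is an assumption.

```javascript
// Hypothetical helper (not part of pdf.js): builds the inline style for
// one text-layer element positioned at (x, y) over the rendered page.
// Assumption: coordinates are expressed in CSS pixels.
function textLayerStyle(x, y) {
    return [
        "position:absolute",   // take the element out of normal flow
        "left:" + x + "px",    // horizontal offset of the word
        "top:" + y + "px",     // vertical offset of the word
        "color:transparent"    // invisible text that is still selectable
    ].join(";");
}
```

The string could then be assigned to a div's style attribute. The text stays invisible but remains selectable, which is the effect described above.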

**Dean Taylor** · Accepted Answer

Dean Taylor On 15 December 2013 at 19:06 BEST ANSWER

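page.then(function() { should be page.then(function(page) {

Inside the callback, the outer page variable is still the promise, which has no getTextContent method. The resolved page object is passed to the callback as its argument.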
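
Putting this fix together with the loop bound from the other answer, a minimal sketch could look like the following. extractPageTexts is a hypothetical helper name. The sketch assumes the document object exposes numPages directly and that getTextContent() resolves to an object with an items array whose entries have a str field. The exact shape differs between pdf.js versions, so check it against the version you use.

```javascript
// Sketch only: `pdf` is assumed to be an already resolved pdf.js document.
// Assumptions (verify against your pdf.js version): pdf.numPages exists and
// getTextContent() resolves to { items: [{ str: "..." }, ...] }.
function extractPageTexts(pdf) { // hypothetical helper name
    var pending = [];
    // Pages are numbered from 1 to numPages inclusive, hence "<=".
    for (var j = 1; j <= pdf.numPages; j++) {
        pending.push(
            pdf.getPage(j).then(function (page) {
                // `page` is the resolved page object, not the promise.
                return page.getTextContent();
            }).then(function (textContent) {
                // Join the text fragments of one page into a single string.
                return textContent.items.map(function (item) {
                    return item.str;
                }).join(" ");
            })
        );
    }
    // Resolves to an array with one string per page, in page order.
    return Promise.all(pending);
}
```

With the code from the question, this could be used as pdf.then(extractPageTexts).then(function (texts) { console.log(texts); }). Returning the promise from each callback, rather than assigning it to a variable, is what lets the next then() receive the text content itself.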
