I'm trying to get the text from a PDF document using pdf.js in JavaScript. However, pdf.js has no decent documentation. I've looked at the available examples and came up with this:
var pdfUrl = "http://localhost/test.pdf";
var pdf = PDFJS.getDocument(pdfUrl);
pdf.then(function(pdf) {
    var maxPages = pdf.pdfInfo.numPages;
    for (var j = 1; j < maxPages; j++) {
        var page = pdf.getPage(j);
        page.then(function() {
            var textContent = page.getTextContent();
        });
    }
});
The page part seems to work, because I can see that it is a promise. However, running this code gives:
Warning: Unhandled rejection: TypeError: Object #<Object> has no method 'getTextContent'
TypeError: Object #<Object> has no method 'getTextContent'
The examples I've seen do it this way. The page is being fetched, and I can print out the number of pages.
Can anyone with experience shed some light on this?
*Bonus question: I'm only interested in parsing the PDF, not in rendering it in the browser. However, it has to be done client-side. Is pdf.js the right hammer for the job?
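The callback does not declare a parameter, so page inside it still refers to the promise from the outer scope, and a promise has no getTextContent method. So
    page.then(function() {
should be
    page.then(function(page) {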
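For reference, here is a minimal sketch of the whole loop with that fix applied. It only uses the calls that already appear in the question (pdf.pdfInfo.numPages, pdf.getPage, page.getTextContent); the function name getAllTextContent is made up for illustration, and it assumes the promises returned by pdf.js behave like standard thenables so that Promise.all can wait on them. It also uses j <= maxPages, since pages are numbered from 1 and the original loop would skip the last one. Newer pdf.js releases may expose the page count and the loading call differently, so treat this as a starting point rather than the library's documented usage.

```javascript
// Hypothetical helper: collects the text content of every page.
// `pdf` is the document object that PDFJS.getDocument(...) resolves to
// in the question's code.
function getAllTextContent(pdf) {
  var maxPages = pdf.pdfInfo.numPages;
  var pending = [];

  // Pages are numbered from 1, so use <= to include the last page.
  for (var j = 1; j <= maxPages; j++) {
    pending.push(
      pdf.getPage(j).then(function (page) {
        // `page` is now the resolved page object, not the promise.
        return page.getTextContent();
      })
    );
  }

  // Resolves to an array with one text content entry per page, in page order.
  return Promise.all(pending);
}

// Usage, assuming PDFJS is loaded as in the question:
// PDFJS.getDocument("http://localhost/test.pdf")
//   .then(getAllTextContent)
//   .then(function (contents) { console.log(contents); });
```

The shape of each text content entry depends on the pdf.js version, so it is worth logging one entry to see what it contains before trying to join it into a string.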