I have a WinForms application that uses PDFBox to extract text from PDF files. I've been using version 1.8.4, and in that version I call PDDocument.load, as shown below:
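```csharp
using System;
using org.apache.pdfbox.pdmodel;   // PDDocument (IKVM-converted PDFBox 1.8.x)
using org.apache.pdfbox.util;      // PDFTextStripper lives here in 1.8.x

public class PdfTextExtractor
{
    /// <summary>
    /// Apache PDFBox classes that permit conversion
    /// of PDF documents to string.
    /// </summary>
    public static String PDFText(String PDFFilePath)
    {
        PDDocument doc = PDDocument.load(PDFFilePath);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(doc);
        }
        finally
        {
            // Release the document even if text extraction throws
            doc.close();
        }
    }
}
```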
However, I've read in an article that PDDocument.loadNonSeq is the preferred method, and that 'Since 2.0.0 the former correct PDDocument.loadNonSeq has become PDDocument.load'.
I've tried switching my code to PDDocument.loadNonSeq, but it doesn't seem to have an overload that accepts a file path.
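What I'd expect to work, assuming the IKVM build exposes the 1.8.x overload PDDocument.loadNonSeq(java.io.File, RandomAccess), is wrapping the path in a java.io.File and passing null for the scratch file (which, as far as I can tell, makes PDFBox buffer in memory). This is an untested sketch; the class name PdfTextExtractorNonSeq is my own:

```csharp
using org.apache.pdfbox.io;        // RandomAccess (scratch-file interface in 1.8.x)
using org.apache.pdfbox.pdmodel;   // PDDocument
using org.apache.pdfbox.util;      // PDFTextStripper (1.8.x package)

// Hypothetical class; assumes an IKVM-converted PDFBox 1.8.x assembly.
public class PdfTextExtractorNonSeq
{
    public static string PDFText(string pdfFilePath)
    {
        // loadNonSeq has no string-path overload, so wrap the path in a java.io.File.
        // Assumption: a null scratch file makes PDFBox keep its buffers in memory.
        RandomAccess scratchFile = null;
        PDDocument doc = PDDocument.loadNonSeq(new java.io.File(pdfFilePath), scratchFile);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(doc);
        }
        finally
        {
            // Always release the document's resources
            doc.close();
        }
    }
}
```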
My question is this: where can I find version 2.x.x of PDFBox together with a matching IKVM build?
I do not want to have to build this from source. The sites I have found that bundle IKVM with PDFBox only offer version 1.8.9 or older. Sites listing PDFBox 2.x.x and later seem to be missing IKVM, which, as I understand it, is what turns the Java library into a fully functioning PDF library for the .NET Framework.
Since PDDocument.load in version 2.0.0 and later is supposedly more accurate, please let me know where I can download both libraries.
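For comparison, here is roughly how I'd expect my code to change with a 2.x build, assuming an IKVM-converted PDFBox 2.x assembly (exactly what I can't find). As far as I can tell, in 2.0 load takes a java.io.File rather than a string path, and PDFTextStripper moved to the org.apache.pdfbox.text package. Again an untested sketch; the class name PdfTextExtractor2 is hypothetical:

```csharp
using org.apache.pdfbox.pdmodel;   // PDDocument
using org.apache.pdfbox.text;      // PDFTextStripper moved here in 2.0

// Hypothetical class; assumes an IKVM-converted PDFBox 2.x assembly exists.
public class PdfTextExtractor2
{
    public static string PDFText(string pdfFilePath)
    {
        // In 2.x, load() uses what was the non-sequential parser in 1.8.x.
        PDDocument doc = PDDocument.load(new java.io.File(pdfFilePath));
        try
        {
            return new PDFTextStripper().getText(doc);
        }
        finally
        {
            // Always release the document's resources
            doc.close();
        }
    }
}
```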
@Tilman Hausherr: OK, if IKVM has been discontinued, then I would like to know the following:
- Is it possible to pass a file path to PDDocument.loadNonSeq? If so, which overload should I use?
- Or, among the newer .NET implementations of PDFBox (i.e. 2.x.x), is there one that provides PDF-to-text conversion on its own, without IKVM?