I am trying to convert a JDF file to a PDF file using C#.
After looking at the JDF format, I can see that the file is simply XML placed at the top of a PDF document.
I've tried using StreamWriter / StreamReader in C#, but because the PDF document also contains binary data and mixed line endings (\r\n and \n), the file produced cannot be opened: some of the PDF's binary data is destroyed. Here is some of the code I've tried without success.
    using (StreamReader reader = new StreamReader(_jdf.FullName, Encoding.Default))
    {
        using (StreamWriter writer = new StreamWriter(_pdf.FullName, false, Encoding.Default))
        {
            writer.NewLine = "\n"; //Tried without this and with \r\n
            bool IsStartOfPDF = false;
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                if (line.IndexOf("%PDF-") != -1)
                {
                    IsStartOfPDF = true;
                }
                if (!IsStartOfPDF)
                {
                    continue;
                }
                writer.WriteLine(line);
            }
        }
    }
I am self answering this question, as it may be a somewhat common problem, and the solution could be informative to others.
As the document contains both binary data and text, we cannot simply use a StreamWriter to write the binary part back to another file. Even if you read a file with a StreamReader and write its entire contents to another file with a StreamWriter, you will find differences between the two documents.
You can use a BinaryReader and BinaryWriter to search a multi-part document and write each byte into another document exactly as you found it.
This code example uses the BinaryReader to read the file one character at a time. If it finds a match for the string %PDF- (the PDF start signature), it moves the reader position back to the % and then writes the rest of the document using writer.Write(reader.ReadByte()).
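A minimal sketch of the approach described above, assuming the input stream is seekable (as a FileStream is). The names JdfPdfExtractor and ExtractPdf are hypothetical, chosen for illustration. The sketch compares raw bytes with ReadByte rather than decoded characters, so that no text decoding touches the binary part of the file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class JdfPdfExtractor
{
    // Bytes of the PDF start signature "%PDF-".
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // Copies everything from the first "%PDF-" onward from input to output,
    // byte for byte. Returns false if the signature is never found.
    // Assumption: input supports seeking.
    public static bool ExtractPdf(Stream input, Stream output)
    {
        using (var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true))
        using (var writer = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true))
        {
            int matched = 0; // number of signature bytes matched so far

            while (input.Position < input.Length)
            {
                byte b = reader.ReadByte();

                if (b == PdfSignature[matched])
                {
                    matched++;
                    if (matched == PdfSignature.Length)
                    {
                        // Move back to the '%' that starts the signature.
                        input.Seek(-PdfSignature.Length, SeekOrigin.Current);

                        // Write the rest of the document exactly as read.
                        while (input.Position < input.Length)
                        {
                            writer.Write(reader.ReadByte());
                        }
                        return true;
                    }
                }
                else
                {
                    // Restart the match; this byte may itself be a new '%'.
                    matched = (b == PdfSignature[0]) ? 1 : 0;
                }
            }
        }
        return false;
    }

    // Convenience overload for file paths. Note that the output file is
    // created even when no signature is found.
    public static bool ExtractPdf(string jdfPath, string pdfPath)
    {
        using (var input = File.OpenRead(jdfPath))
        using (var output = File.Create(pdfPath))
        {
            return ExtractPdf(input, output);
        }
    }
}
```

With the variables from the question, this could be called as JdfPdfExtractor.ExtractPdf(_jdf.FullName, _pdf.FullName). Copying one byte at a time mirrors the description above; for large files, replacing the inner loop with input.CopyTo(output) should give the same result with fewer calls.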