How to convert a JDF file to a PDF (Removing text from a multi-encoded document)

Question

Asked by Dean Meehan on 26 April 2019 at 16:13

I am trying to convert a JDF file to a PDF file using C#.

After looking at the JDF format, I can see that the file is simply XML placed at the top of a PDF document.

I've tried using StreamReader / StreamWriter in C#, but because the PDF part also contains binary data and mixed line endings (\r\n and \n), the file produced cannot be opened: some of the PDF's binary data is destroyed. Here is some of the code I've tried, without success.

using (StreamReader reader = new StreamReader(_jdf.FullName, Encoding.Default))
{
    using (StreamWriter writer = new StreamWriter(_pdf.FullName, false, Encoding.Default))
    {

        writer.NewLine = "\n"; //Tried without this and with \r\n

        bool IsStartOfPDF = false;
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();

            if (line.IndexOf("%PDF-") != -1)
            {
                IsStartOfPDF = true;
            }

            if (!IsStartOfPDF)
            {
                continue;
            }

            writer.WriteLine(line);
        }
    }
}

**Dean Meehan** · Accepted Answer · 2019-04-26T16:13:05.320000

I am self-answering this question, as it may be a fairly common problem and the solution could be useful to others.

Because the document contains both binary data and text, we cannot simply use StreamReader and StreamWriter to copy the binary part to another file. Even if you read a file with StreamReader and write all of its contents to another file with StreamWriter, you will notice differences between the two files, because the bytes are decoded into characters and the line endings are normalized along the way.
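As a minimal illustration of the point above, the sketch below pushes a few bytes through ReadLine / WriteLine in the same way as the code in the question. The class name NewlineRoundTripDemo and the sample bytes are made up for this example. ISO-8859-1 is used here only because it maps every byte to exactly one character, so any difference in the output comes from newline handling rather than from the encoding.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original answer.
public static class NewlineRoundTripDemo
{
    // Copies the input through StreamReader.ReadLine / StreamWriter.WriteLine.
    public static byte[] RoundTripAsText(byte[] input)
    {
        // ISO-8859-1 round-trips every byte value, which isolates the newline effect.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (var source = new MemoryStream(input))
        using (var target = new MemoryStream())
        {
            using (var reader = new StreamReader(source, latin1))
            using (var writer = new StreamWriter(target, latin1))
            {
                writer.NewLine = "\n";
                string line;
                // ReadLine treats \r, \n and \r\n as line breaks and drops them.
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(line);
                }
            }
            // ToArray still works after the stream has been closed.
            return target.ToArray();
        }
    }

    public static void Main()
    {
        // "%P", a \r\n pair, then binary-looking bytes containing a lone \r.
        byte[] original = { 0x25, 0x50, 0x0D, 0x0A, 0xE2, 0x0D, 0xCF };
        byte[] copy = RoundTripAsText(original);

        Console.WriteLine(BitConverter.ToString(original)); // prints 25-50-0D-0A-E2-0D-CF
        Console.WriteLine(BitConverter.ToString(copy));     // prints 25-50-0A-E2-0A-CF-0A
    }
}
```

The lone 0x0D inside the "binary" part is rewritten as 0x0A and an extra newline is appended at the end, which is enough to break offsets inside a real PDF.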

Instead, you can use BinaryReader and BinaryWriter to search a multi-part document and write each byte to another file exactly as it was read.

//Using a Binary Reader/Writer as the PDF is multitype
using (var reader = new BinaryReader(File.Open(_file.FullName, FileMode.Open)))
{
    using (var writer = new BinaryWriter(File.Open(tempFileName.FullName, FileMode.CreateNew)))
    {

        //We are searching for the start of the PDF 
        bool searchingForstartOfPDF = true;
        var startOfPDF = "%PDF-".ToCharArray();

        //While we haven't reached the end of the stream
        while (reader.BaseStream.Position != reader.BaseStream.Length)
        {
            //If we are still searching for the start of the PDF
            if (searchingForstartOfPDF)
            {
                //Read the current Char
                var str = reader.ReadChar();

                //If it matches the start of the PDF signature
                if (str.Equals(startOfPDF[0]))
                {
                    //Check the next few characters to see if they match
                    //keeping track of our current position in the stream in case the match fails
                    var currBasePos = reader.BaseStream.Position;
                    for (var i = 1; i < startOfPDF.Length; i++)
                    {
                        //If we found a char that isn't in the PDF signature, then resume the while loop
                        //to start searching again from the next position
                        if (!reader.ReadChar().Equals(startOfPDF[i]))
                        {
                            reader.BaseStream.Position = currBasePos;
                            break;
                        }
                        //If we've reached the end of the PDF signature then we've found a match
                        if (i == startOfPDF.Length - 1)
                        {
                            //Success
                            //Set the Position to the start of the PDF signature
                            searchingForstartOfPDF = false;
                            reader.BaseStream.Position -= startOfPDF.Length;
                            //We are no longer searching for the PDF signature, so
                            //the remaining bytes in the file will be written directly
                            //using the binary writer
                        }
                    }
                }
            }
            else
            {
                //We are writing the binary now
                writer.Write(reader.ReadByte());
            }
        }

    }
}

This example uses BinaryReader to read the file one character at a time. When it finds the string %PDF- (the PDF start signature), it moves the reader position back to the % and then writes the rest of the document byte by byte using writer.Write(reader.ReadByte()).
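The same idea can also be sketched purely in terms of bytes, which avoids decoding any of the file as characters. This is an alternative sketch, not the code from the answer above: the names JdfPdfExtractor, FindPdfStart and ExtractPdf are hypothetical, and it assumes the whole file fits comfortably in memory. Like the original, it takes the first occurrence of %PDF- as the start of the PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, named for this example only.
public static class JdfPdfExtractor
{
    // The PDF start signature as raw ASCII bytes.
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns the index of the first occurrence of the signature, or -1 if absent.
    public static int FindPdfStart(byte[] data)
    {
        for (int i = 0; i <= data.Length - PdfSignature.Length; i++)
        {
            int j = 0;
            while (j < PdfSignature.Length && data[i + j] == PdfSignature[j])
            {
                j++;
            }
            if (j == PdfSignature.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Writes everything from the signature onwards to pdfPath, byte for byte.
    // Returns false (and writes nothing) when no signature is found.
    public static bool ExtractPdf(string jdfPath, string pdfPath)
    {
        // Assumption: the file is small enough to load in one go.
        byte[] data = File.ReadAllBytes(jdfPath);
        int start = FindPdfStart(data);
        if (start < 0)
        {
            return false;
        }

        using (var output = new FileStream(pdfPath, FileMode.Create, FileAccess.Write))
        {
            output.Write(data, start, data.Length - start);
        }
        return true;
    }
}
```

A call such as JdfPdfExtractor.ExtractPdf(_file.FullName, tempFileName.FullName) would stand in for the reader/writer loop. Note that FileMode.Create overwrites an existing output file, whereas the FileMode.CreateNew used above throws if the file already exists.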
