My goal is to convert a PDF into a file that conforms to the Factur-X format.
I successfully converted a PDF to PDF/A-3b. Here's the code:
import subprocess

gs_path = r"C:\Program Files\gs\gs10.02.1\bin\gswin64.exe"


def convert_to_pdfa(input_path, output_path, pdfa_def_path):
    command = [
        gs_path,
        "-dPDFA=3",
        "-dBATCH",
        "-dNOPAUSE",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + output_path,
        "-dPDFACompatibilityPolicy=2",
        pdfa_def_path,
        input_path
    ]
    subprocess.run(command)


if __name__ == "__main__":
    input_pdf_path = "facture.pdf"
    output_pdfa_path = "output_pdfa.pdf"
    pdfa_def_path = "PDFA_def.ps"
    convert_to_pdfa(input_pdf_path, output_pdfa_path, pdfa_def_path)
Here's the code in the PDFA_def.ps file:
% Define entries in the document Info dictionary :
/ICCProfile (sRGB_v4_ICC_preference.icc)
def
[ /Title (test)
/DOCINFO pdfmark
% Define an ICC profile :
[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {4} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark
% Define the output intent dictionary :
[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent % Must be so (the standard requires).
/S /GTS_PDFA1 % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA} % Must be so (see above).
/OutputConditionIdentifier (sRGBv4 ICC preference)
/PUT pdfmark
% Embed XML file:
[ /_objdef {InvoiceStream} /type /stream /OBJ pdfmark
[ {InvoiceStream} << /Type /EmbeddedFile /Subtype (text/xml) cvn /Params << /ModDate (D:20130121081433+01'00') >> >> /PUT pdfmark
[ {InvoiceStream} (output.xml) (r) file /PUT pdfmark
[ {InvoiceStream} /CLOSE pdfmark
[ /_objdef {Invoice_FSDict} /type /dict /OBJ pdfmark
[ {Invoice_FSDict} << /Type /FileSpec /F (output.xml) /UF (output.xml) /Desc (ZUGFeRD XML invoice) /AFRelationship /Alternative /EF << /F {InvoiceStream} /UF {InvoiceStream} >> >> /PUT pdfmark
[ /_objdef {AFArray} /type /array /OBJ pdfmark
[ {AFArray} {FSDict} /APPEND pdfmark
[ {Catalog} << /AF {AFArray} >> /PUT pdfmark
[ /Name (output.xml) /FS {FSDict} /EMBED pdfmark
[
/XML
(
...
)
/Ext_Metadata pdfmark
I followed this tutorial on the ZUGFeRD blog.
When I open the PDF, there is no attached XML file.
I compared the PDF I rendered with a PDF that follows the Factur-X format.
The PDF I rendered:
46 0 obj
<</Type/Metadata
/Subtype/XML/Length 1294>>stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
...
</x:xmpmeta>
<?xpacket end='w'?>
endstream
endobj
Valid PDF:
8 0 obj
<<
/Filter /FlateDecode
/Subtype /XML
/Type /Metadata
/Length 978
>>
stream
... binary data ...
endstream
endobj
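The two metadata objects are hard to compare as shown, because the one in the valid PDF is Flate-compressed. A rough sketch for inflating it with the standard library only, so both XMP packets can be read side by side. It assumes the metadata dictionary is flat and is not stored inside a compressed object stream; `read_metadata_stream` is a hypothetical helper name, not part of any library:

```python
import re
import zlib

# Dictionary of the first object typed /Metadata, followed by its stream keyword.
_META_DICT = re.compile(rb"<<(?P<dict>[^>]*/Type\s*/Metadata[^>]*)>>\s*stream\r?\n")
# A direct /Length value (not an indirect reference such as "12 0 R").
_DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def read_metadata_stream(pdf_bytes):
    """Return the decoded bytes of the first /Metadata stream, or None."""
    match = _META_DICT.search(pdf_bytes)
    if match is None:
        return None
    header = match.group("dict")
    start = match.end()
    length = _DIRECT_LENGTH.search(header)
    if length:
        raw = pdf_bytes[start:start + int(length.group(1))]
    else:
        # Fallback: cut at the endstream keyword and drop the trailing EOL.
        end = pdf_bytes.find(b"endstream", start)
        raw = pdf_bytes[start:end].rstrip(b"\r\n")
    if b"/FlateDecode" in header:
        return zlib.decompress(raw)
    return raw


# Usage (file name taken from the question):
# with open("output_pdfa.pdf", "rb") as handle:
#     print(read_metadata_stream(handle.read()).decode("utf-8", "replace"))
```

This only makes the metadata readable; it does not say whether the XML invoice was attached, which is recorded elsewhere in the file.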




I don't see that your subprocess call follows the command-line description in the Ghostscript documentation.
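As a sketch of what a more defensive call could look like: recent Ghostscript releases (9.50 and later) run in SAFER mode by default, so a PostScript file that opens other files, such as the ICC profile and output.xml in PDFA_def.ps, may need them whitelisted with `--permit-file-read`. Whether that is the problem here is an assumption on my part; the helper names below are hypothetical, and the remaining flags are copied from the question. Capturing the output also makes any PostScript error visible instead of silently ignored:

```python
import os
import subprocess


def build_gs_command(gs_path, input_path, output_path, pdfa_def_path, readable_files):
    """Build the Ghostscript argument list without running it."""
    # Ghostscript expects one path list, separated like PATH on the platform.
    permit = "--permit-file-read=" + os.pathsep.join(readable_files)
    return [
        gs_path,
        "-dPDFA=3",
        "-dBATCH",
        "-dNOPAUSE",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=2",
        permit,
        "-sOutputFile=" + output_path,
        pdfa_def_path,
        input_path,
    ]


def convert_to_pdfa(gs_path, input_path, output_path, pdfa_def_path, readable_files):
    """Run Ghostscript and raise with its messages if it reports a failure."""
    command = build_gs_command(
        gs_path, input_path, output_path, pdfa_def_path, readable_files
    )
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr or result.stdout)
    return result.stdout


# Usage (paths taken from the question):
# convert_to_pdfa(
#     r"C:\Program Files\gs\gs10.02.1\bin\gswin64.exe",
#     "facture.pdf", "output_pdfa.pdf", "PDFA_def.ps",
#     ["sRGB_v4_ICC_preference.icc", "output.xml"],
# )
```

Ghostscript can print PostScript errors and still exit with code 0, so it is worth reading the returned output even when no exception is raised.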
There are also Factur-X Python libraries on PyPI.
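Independently of those libraries, one thing stands out in the PDFA_def.ps from the question: the file specification is defined as {Invoice_FSDict} but later referenced as {FSDict}, which may be why nothing ends up attached. A small sketch of a check that lists pdfmark object names that are used but never defined. The function name is hypothetical, the list of predefined names is my assumption of the usual pdfmark built-ins, and ordinary PostScript procedures such as `{pop}` would show up as false positives:

```python
import re

# Names that pdfmark predefines (assumed list, possibly incomplete).
BUILTIN_OBJECTS = {"Catalog", "DocInfo", "ThisPage", "PrevPage", "NextPage"}

_DEFINED = re.compile(r"/_objdef\s*\{\s*([A-Za-z_]\w*)\s*\}")
_USED = re.compile(r"\{\s*([A-Za-z_]\w*)\s*\}")
_PAGE_N = re.compile(r"Page\d+")


def undefined_pdfmark_objects(source):
    """Return sorted names referenced as {Name} but never declared via /_objdef."""
    defined = set(_DEFINED.findall(source)) | BUILTIN_OBJECTS
    missing = set()
    for name in _USED.findall(source):
        if name not in defined and not _PAGE_N.fullmatch(name):
            missing.add(name)
    return sorted(missing)


# Lines copied from the question's PDFA_def.ps.
sample = """
[ /_objdef {Invoice_FSDict} /type /dict /OBJ pdfmark
[ /_objdef {AFArray} /type /array /OBJ pdfmark
[ {AFArray} {FSDict} /APPEND pdfmark
[ {Catalog} << /AF {AFArray} >> /PUT pdfmark
[ /Name (output.xml) /FS {FSDict} /EMBED pdfmark
"""
print(undefined_pdfmark_objects(sample))
```

If the check reports a name, renaming the references so they match the definition is the first thing to try.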