Get text from Font Patterns (FNG) from AFP file

Question

Asked by Alex on 26 August 2019 at 21:02 · 143 views

Can anyone help me obtain the text from the "Font Patterns (FNG)" field of an AFP file? Is there a library (preferably Java) that can be used for this task?

Thank you,

There are 2 answers below

**Yan Hackl-Feldbusch** · Answer 1 · 2019-08-28T10:04:12.827000

You can try afplib. It has some sample code that dumps all structured fields (org.afplib.samples.DumpAFP). It produces output like this:

    ...
    FNG  number:47,offset:49787,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@4e3958e7,
    FNG  number:48,offset:57988,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@77f80c04,
    FNG  number:49,offset:66189,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@1dac5ef,
    FNG  number:50,offset:74390,id:13889161,length:6991,rawData:null,charset:null,PatData:[B@5c90e579,
    EFN  number:51,offset:81381,id:13871497,length:17,rawData:null,charset:null,RSName:C0EX0480,

PatData is a byte array (the `[B@...` values in the dump are just Java's default string form of a `byte[]`). You could read it to extract the font pattern like this:

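    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.afplib.afplib.FNG;
    import org.afplib.base.SF;
    import org.afplib.io.AfpInputStream;

    // args[0] is the path to the AFP file
    try (AfpInputStream in = new AfpInputStream(
        new BufferedInputStream(new FileInputStream(args[0])))) {

        SF sf;
        // Read structured fields one by one until the end of the file
        while ((sf = in.readStructuredField()) != null) {
            if (sf instanceof FNG) {
                // Raw pattern bytes carried by this FNG field
                byte[] pattern = ((FNG) sf).getPatData();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }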
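The dump above shows several consecutive FNG fields, so the pattern bytes probably need to be joined before they can be inspected. The following is only a minimal sketch of that step plus a text preview of one glyph. `FontPatternSketch`, `concat` and `render` are hypothetical names and are not part of afplib. The layout (1 bit per pixel, most significant bit first, each row padded to a whole byte) is an assumption that should be checked against the font's actual format, and the offset, width and height of each glyph would have to come from the font's other structured fields, which this sketch does not read.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical helper for inspecting FNG pattern bytes; not part of afplib. */
final class FontPatternSketch {

    /** Joins the PatData arrays of consecutive FNG fields into one buffer. */
    static byte[] concat(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    /**
     * Renders one glyph as text: '#' for a set bit, '.' for a clear bit.
     * Assumption: 1 bit per pixel, MSB first, rows padded to a whole byte.
     */
    static String render(byte[] data, int offset, int width, int height) {
        int bytesPerRow = (width + 7) / 8;
        StringBuilder sb = new StringBuilder();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int b = data[offset + y * bytesPerRow + x / 8] & 0xFF;
                boolean set = (b & (0x80 >> (x % 8))) != 0;
                sb.append(set ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up 8x4 pattern split across two chunks, standing in for two FNG fields.
        byte[] first = {(byte) 0x18, (byte) 0x24};
        byte[] second = {(byte) 0x7E, (byte) 0x42};
        byte[] all = concat(List.of(first, second));
        System.out.print(render(all, 0, 8, 4));
    }
}
```

In the loop above, each `pattern` array could be collected into a list and passed to `concat` once the last FNG has been read. Note that this only shows glyph bitmaps; it does not by itself turn them into text.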

**WeightedWaffle** · Answer 2 · 2023-09-21T02:32:37.540000

I use Python with pytesseract (OCR) for this. The function below takes a PDF, saves each page as a JPG, then runs OCR on each JPG and writes the recognized text to a .txt file.

    import os

    import pytesseract
    from pdf2image import convert_from_path


    def convert_pdf_to_txt(pdf_file_nm):
        # If you need to point pytesseract at the tesseract executable
        # pytesseract.pytesseract.tesseract_cmd = r'C:\Users\xxx\AppData\Local\Tesseract-OCR\tesseract.exe'

        pdf_path = './pdf/' + pdf_file_nm
        base_name = pdf_file_nm.replace('.pdf', '')
        output_filename = base_name + ".txt"
        output_path = './text/' + output_filename
        sub_dir = "images/" + base_name + "/"

        # Make sure the image and text output directories exist
        os.makedirs(sub_dir, exist_ok=True)
        os.makedirs('./text/', exist_ok=True)

        # Render every PDF page to an image
        pages = convert_from_path(pdf_path)

        with open(output_path, 'a+', encoding='utf8') as f:
            for pg_cntr, page in enumerate(pages, start=1):
                filename = "pg_" + str(pg_cntr) + '_' + base_name + '.jpg'
                page.save(sub_dir + filename)

                # OCR the saved page image and append its text under a page header
                f.write("======================================================== PAGE " + str(pg_cntr) + " ========================================================\n")
                f.write(pytesseract.image_to_string(sub_dir + filename) + "\n")

        print('1. Process to convert PDF to image completed successfully.\n')

        return output_filename
