Order PDF content in the content panel based on tagging order (I want to alter the content stream text order)?

As we know, the order of content in a PDF content stream is arbitrary: text can be placed anywhere on the page, in any sequence. As a result, the content stream order often does not match the human reading order.

Because of this, when a PDF is read by some screen readers (Orbit Note, Read & Write), they follow the content panel order, not the tagging order (logical order).

But when you tag content in Adobe, it sorts the content panel order to match your tagging order, i.e. the logical order (like MCID order). I want to do the same in my case.

For example, I know exactly the order in which I want to arrange the text. I have the BBox (rectangle) of every Tj, and the graphics state is available. Is it possible to alter the content order and save the PDF using iText or PDFBox?

If this can't be done directly, could I override any classes in these libraries (iText or PDFBox) to achieve this functionality?

Here is the code I tried, but it didn't work:

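import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSInteger;
import org.apache.pdfbox.cos.COSName;
import org.springframework.stereotype.Component;

@Component
public class Contentorder {
public static List<Object> sortContentByMCID(List<Object> tokens)
{
    int min = Integer.MAX_VALUE;
    int max = 0;
    // Maps each MCID to [mcid, start index, end index] of its block within tokens.
    Map<Integer, ArrayList<Integer>> indexes = new HashMap<>();
    ArrayList<Integer> indices = new ArrayList<>();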
    for (int ind=0; ind < tokens.size(); ind++) {
        if (tokens.get(ind) instanceof Operator) {
            Operator op = (Operator) tokens.get(ind);
            if(op.getName().equals("EMC") && indices.size() == 2){
                indices.add(ind+1);
                int key = (int) indices.get(0);
                indexes.put(key, indices);
            }
        }else if (tokens.get(ind) instanceof COSDictionary) {
            if( ((COSDictionary)tokens.get(ind)).containsKey("MCID")
                    &&  tokens.get(ind+1) instanceof Operator
                    && ((Operator) tokens.get(ind+1)).getName().equals("BDC")
                    && !((COSName) tokens.get(ind-1)).getName().equals("Figure") ) {
                if(min > ((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue())
                    min = ((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue();
                if(max < ((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue())
                    max = ((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue();
                indices = new ArrayList();
                indices.add(((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue());
                indices.add(getLastTf(tokens, ind));
            }
        }
    }
    System.out.println("MCID min=" + min + ", max=" + max);
    System.out.println(indexes);
    for (Integer key : indexes.keySet()) {
        ArrayList lst = indexes.get(key);
        for (int ind=0; ind < tokens.size(); ind++) {
            if (tokens.get(ind) instanceof COSDictionary) {
                if( ((COSDictionary)tokens.get(ind)).containsKey("MCID")
                        &&  tokens.get(ind+1) instanceof Operator
                        && ((Operator) tokens.get(ind+1)).getName().equals("BDC")
                        && !((COSName) tokens.get(ind-1)).getName().equals("Figure") ) {
                    //some code
                    if(((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue() < key){
                        continue;
                    }else if(((COSInteger)((COSDictionary)tokens.get(ind)).getItem("MCID")).intValue() == key){
                        break;
                    }else{
                        List<Object> toks = tokens.subList((Integer) lst.get(1), (Integer) lst.get(2));
                        Integer size = toks.size();
                        Integer index = getLastTf(tokens, ind);
                        if(index < (Integer) lst.get(1)) {
                            tokens.addAll(index, toks);
                            tokens.subList((Integer) lst.get(1)+size, (Integer) lst.get(2)+size).clear();
                        }else{
                            tokens.addAll(index, toks);
                            tokens.subList((Integer) lst.get(1), (Integer) lst.get(2)).clear();
                        }
                        break;
                    }
                }
            }
        }
    }

    return tokens;
}

private static int getLastTf(List<Object> tokens, int ind) {
    for(int i = ind; i > 0; i--){
        if(tokens.get(i) instanceof Operator){
            if( ((Operator) tokens.get(i)).getName().equals("Tf")){
                return i-2;
            }
        }
    }
    return ind;
}

}
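
For comparison, here is a minimal sketch of the reordering idea on plain string tokens instead of PDFBox objects. It cuts the token list into top-level MCID blocks (tag, properties, BDC ... EMC) and everything else, then builds a new list in which the MCID blocks are emitted in ascending MCID order while all other tokens keep their positions. Building a new list avoids the problem that start/end indices recorded up front go stale once a sublist has been moved inside the same list. The class name `McidReorder`, the helper names, and the `"MCID=n"` string standing in for the properties dictionary are all hypothetical. The sketch also assumes every block is self-contained, meaning it sets its own font and text position inside the block; if a block relies on graphics state set earlier in the stream, moving it will change how it renders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class McidReorder {
    /** A run of tokens; mcid is -1 for tokens outside any top-level MCID block. */
    static final class Segment {
        final int mcid;
        final List<String> tokens;
        Segment(int mcid, List<String> tokens) { this.mcid = mcid; this.tokens = tokens; }
    }

    /** True if tokens[i] is the tag of a "tag, MCID=n, BDC" sequence (toy model, an assumption). */
    static boolean isBlockStart(List<String> tokens, int i) {
        return i + 2 < tokens.size()
                && tokens.get(i + 1).startsWith("MCID=")
                && tokens.get(i + 2).equals("BDC");
    }

    /** Splits the stream into top-level MCID blocks and the runs of tokens between them. */
    static List<Segment> split(List<String> tokens) {
        List<Segment> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int depth = 0;   // nesting depth of BMC/BDC ... EMC
        int mcid = -1;   // MCID of the block being collected, or -1
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (depth == 0 && isBlockStart(tokens, i)) {
                if (!current.isEmpty()) {
                    segments.add(new Segment(-1, current));
                    current = new ArrayList<>();
                }
                mcid = Integer.parseInt(tokens.get(i + 1).substring("MCID=".length()));
            }
            current.add(t);
            if (t.equals("BDC") || t.equals("BMC")) {
                depth++;
            } else if (t.equals("EMC")) {
                depth--;
                if (depth == 0 && mcid >= 0) {
                    segments.add(new Segment(mcid, current));
                    current = new ArrayList<>();
                    mcid = -1;
                }
            }
        }
        if (!current.isEmpty()) {
            segments.add(new Segment(-1, current));
        }
        return segments;
    }

    /** Returns a new token list with MCID blocks in ascending MCID order; other tokens stay put. */
    static List<String> reorder(List<String> tokens) {
        List<Segment> segments = split(tokens);
        List<Segment> sorted = new ArrayList<>();
        for (Segment s : segments) {
            if (s.mcid >= 0) {
                sorted.add(s);
            }
        }
        sorted.sort(Comparator.comparingInt((Segment s) -> s.mcid));
        List<String> out = new ArrayList<>();
        int next = 0;
        for (Segment s : segments) {
            out.addAll(s.mcid >= 0 ? sorted.get(next++).tokens : s.tokens);
        }
        return out;
    }
}
```

With PDFBox, the same split-and-rebuild approach would presumably run on the objects returned by `PDFStreamParser`, testing for `Operator` and `COSDictionary` instances the way the code above does, and the result would be written back with `ContentStreamWriter`. Note that reordering also changes painting order, so overlapping content may render differently.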
