The ASM documentation says a label represents a basic block, i.e. a node in the control flow graph. So I tested the `visitLabel` method on this simple example:
```java
public static void main(String[] args) {
    int x = 3, y = 4;
    if (x < y) {
        x++;
    }
}
```
I instrument the `visitLabel` method to insert a call to a native API, `setId(int id)`, where the id is incremented for each label. In this example, the CFG should have 3 nodes: one at the beginning and one for each branch of the `if` statement. So I expect `setId` to be called at 3 locations. However, it is called 5 times, and there are a lot of `nop` instructions. Could anybody explain why?
Here is the instrumented bytecode for the above program.
```
public static void main(java.lang.String[]);
  Code:
     0: iconst_2
     1: invokestatic  #13    // Method setId:(I)V
     4: iconst_3
     5: istore_1
     6: iconst_3
     7: invokestatic  #13    // Method setId:(I)V
    10: iconst_4
    11: istore_2
    12: iconst_4
    13: invokestatic  #13    // Method setId:(I)V
    16: iload_1
    17: iload_2
    18: if_icmpge     28
    21: iconst_5
    22: invokestatic  #13    // Method setId:(I)V
    25: iinc          1, 1
    28: bipush        6
    30: invokestatic  #13    // Method setId:(I)V
    33: return
    34: nop
    35: nop
    36: nop
    37: nop
    38: athrow
```
What I don't understand is why there is a label before each istore instruction. There is no branching to make it a new node in the CFG.
The primary purpose of a `Label` is to denote a position in the bytecode sequence. Since this is needed for branch targets, you can use labels to identify basic blocks. But you have to be aware that they are also used for reporting line numbers when a `LineNumberTable` attribute is present, for reporting local variable scopes when a `LocalVariableTable` attribute is present, and, for newer class files, for reporting the type annotations of local variables recorded in a `RuntimeVisibleTypeAnnotations` attribute. Further, labels may mark the protected area of an exception handler. For code generated from Java source code, this protected area matches the `try` block, so it's a basic block, but that doesn't need to hold for other bytecode. See:

- `visitLocalVariable(java.lang.String name, java.lang.String descriptor, java.lang.String signature, Label start, Label end, int index)`
- `visitLocalVariableAnnotation(int typeRef, TypePath typePath, Label[] start, Label[] end, int[] index, java.lang.String descriptor, boolean visible)`
- `visitTryCatchBlock(Label start, Label end, Label handler, java.lang.String type)`

Since the scope of local variables may extend past the last `return` instruction, it is possible to encounter labels after that last instruction, which is what happens in your case. You are injecting a `bipush 7, invokestatic #13` sequence after the `return` instruction, resulting in unreachable code.

Apparently, you are also using the `ClassWriter.COMPUTE_FRAMES` option to let ASM recalculate stack map frames from scratch, but it is impossible to calculate frames for unreachable code, because its initial stack state is unknown. ASM solves this problem by replacing unreachable code with `nop` instructions followed by a single `athrow` instruction. For this sequence, it's possible to specify a valid initial stack frame, and it has no impact on execution (as the code is unreachable).

As you can see, four `nop` instructions plus one `athrow` instruction span five bytes, which is the same size as the replaced `bipush 7, invokestatic #13` sequence (two bytes for `bipush`, three for `invokestatic`).

You can get rid of most of these reported labels by passing `ClassReader.SKIP_DEBUG` to the `ClassReader`'s `accept` method. Then you get only one reported label for your example: the branch target associated with the `if` statement. But you also have to handle `visitJumpInsn` to identify the start of the conditional code, which is the instruction right after the conditional jump.

So, to identify all basic blocks, you have to handle all branch instructions, i.e. via `visitJumpInsn`, `visitLookupSwitchInsn`, and `visitTableSwitchInsn`, as well as all end instructions, i.e. `athrow` and all variants of `return`. Further, you need to process all `visitTryCatchBlock` calls. If you need to identify potential branch targets in a single pass, I'd use `visitFrame` instead of labels, as frames are mandatory at all branch targets for class file version 51 (Java 7) or higher.

By the way, when all you're injecting are these sequences of loading a constant and calling a static method (at reachable locations), I'd use `ClassWriter.COMPUTE_MAXS` instead of `ClassWriter.COMPUTE_FRAMES`, as an expensive recalculation is not necessary when the general code structure does not change.
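The rule for finding basic blocks can be sketched without ASM itself. The following is a minimal, self-contained sketch under stated assumptions: the `Kind`, `Insn`, and `TryCatch` types are hypothetical stand-ins for the events an ASM `MethodVisitor` would deliver (`visitJumpInsn`, `visitTableSwitchInsn`/`visitLookupSwitchInsn`, `visitInsn` for `athrow` and the `return` variants, `visitTryCatchBlock`), and instructions are addressed by index rather than by byte offset. It applies the classic "leader" rule: the first instruction, every branch target, every instruction following a branch or end instruction, and the boundaries of protected ranges and handlers each start a basic block.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class BasicBlocks {

    /** Hypothetical instruction categories; ASM reports these through different visit methods. */
    enum Kind {
        PLAIN,      // falls through to the next instruction
        GOTO,       // like visitJumpInsn(GOTO, label): unconditional jump
        COND_JUMP,  // like visitJumpInsn(IF_*, label): may jump or fall through
        SWITCH,     // like visitTableSwitchInsn / visitLookupSwitchInsn
        END         // like visitInsn with a return opcode or ATHROW
    }

    /** Hypothetical stand-in for one instruction; targets are instruction indices. */
    static final class Insn {
        final String text;
        final Kind kind;
        final List<Integer> targets;

        Insn(String text, Kind kind, Integer... targets) {
            this.text = text;
            this.kind = kind;
            this.targets = Arrays.asList(targets);
        }
    }

    /** Hypothetical stand-in for one visitTryCatchBlock(start, end, handler, type) call. */
    static final class TryCatch {
        final int start, end, handler;

        TryCatch(int start, int end, int handler) {
            this.start = start;
            this.end = end;
            this.handler = handler;
        }
    }

    /** Returns the indices of the instructions that begin a basic block ("leaders"). */
    static SortedSet<Integer> leaders(List<Insn> code, List<TryCatch> handlers) {
        SortedSet<Integer> result = new TreeSet<>();
        if (code.isEmpty()) {
            return result;
        }
        result.add(0); // the method entry always starts a block
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.kind == Kind.PLAIN) {
                continue;
            }
            result.addAll(insn.targets); // every branch target starts a block
            if (i + 1 < code.size()) {
                result.add(i + 1); // so does whatever follows a branch or end instruction
            }
        }
        for (TryCatch tc : handlers) {
            // in arbitrary bytecode, protected ranges and handlers may split blocks
            result.add(tc.start);
            result.add(tc.handler);
            if (tc.end < code.size()) {
                result.add(tc.end);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The question's uninstrumented main(), indexed by instruction, not byte offset
        List<Insn> code = Arrays.asList(
            new Insn("iconst_3", Kind.PLAIN),
            new Insn("istore_1", Kind.PLAIN),
            new Insn("iconst_4", Kind.PLAIN),
            new Insn("istore_2", Kind.PLAIN),
            new Insn("iload_1", Kind.PLAIN),
            new Insn("iload_2", Kind.PLAIN),
            new Insn("if_icmpge", Kind.COND_JUMP, 8),
            new Insn("iinc 1, 1", Kind.PLAIN),
            new Insn("return", Kind.END));
        System.out.println(leaders(code, Collections.emptyList())); // prints [0, 7, 8]
    }
}
```

For the question's method this yields three leaders: the entry block, the `iinc` block executed when `x < y`, and the `return` block that is the `if_icmpge` target, i.e. the three CFG nodes the question expected. Line-number and local-variable labels play no role here, which is why skipping debug information and keying on branch and end instructions gives the cleaner picture.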