Python-PyTorch Static Analysis Tool


I am familiar with angr, a binary static analysis tool. Given a statically compiled C binary, for example, I can extract the program's control flow graph and follow it down to the level of system calls. I have experience building a system call sandbox for statically compiled C programs using angr.

I was thinking of a similar tool for Python programs. To be more specific, PyTorch programs.

I am looking for a tool with which I could extract the control flow graphs across all the modules that the main Python script imports, and ideally go from the Python/PyTorch level down to the level of CUDA API calls.

With angr this was not an issue: given a statically compiled binary with build symbols, getting the control flow graph was easy, because every piece of code the executable needs is packaged into that single binary.

I had a brief look at Scalpel. The tool promises a lot, but the documentation is not very clear. On top of that, overloaded operators such as * or + are not detected. This matters for operations on PyTorch tensors, where a simple * is overloaded internally to launch a CUDA kernel on the GPU, yet the abstract syntax tree treats it as a plain expression.
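To illustrate the problem, here is a minimal sketch using only the standard-library ast module (not Scalpel itself). The variable names a, b, and c are placeholders; nothing in the tree says whether they are tensors or integers.

```python
import ast

# Hypothetical one-line program; a and b could be tensors or plain ints.
source = "c = a * b"
tree = ast.parse(source)

# Collect every binary-operator node. The AST carries no type information,
# so a tensor multiply and an integer multiply look identical here.
binops = [node for node in ast.walk(tree) if isinstance(node, ast.BinOp)]
op_names = [type(node.op).__name__ for node in binops]
print(op_names)
```

The multiplication shows up as a generic BinOp node with a Mult operator, with no link to the tensor's __mul__ implementation, which is why a purely syntactic call graph misses it.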

Scalpel advertises static code rewriting to simplify subsequent static analysis. That makes me think of using it to first run type inference to determine the types of the operands, and then rewrite overloaded operators into explicit calls to methods such as __add__ or __sub__.
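A rough sketch of that rewriting step, written with the standard-library ast module rather than Scalpel's own rewriting API (which I have not verified). The class name BinOpToDunder and the helper rewrite are hypothetical, and the transformer is deliberately type-blind: a real version would rewrite only operands that type inference identifies as tensors, and would also need to account for reflected methods such as __radd__. It requires Python 3.9+ for ast.unparse.

```python
import ast

# Map AST operator node types to the corresponding dunder method names.
DUNDER = {
    ast.Add: "__add__",
    ast.Sub: "__sub__",
    ast.Mult: "__mul__",
    ast.MatMult: "__matmul__",
}

class BinOpToDunder(ast.NodeTransformer):
    """Rewrite `x <op> y` into `x.__op__(y)` for the operators in DUNDER."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operands first
        name = DUNDER.get(type(node.op))
        if name is None:
            return node  # leave unsupported operators untouched
        call = ast.Call(
            func=ast.Attribute(value=node.left, attr=name, ctx=ast.Load()),
            args=[node.right],
            keywords=[],
        )
        return ast.copy_location(call, node)

def rewrite(source: str) -> str:
    tree = BinOpToDunder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

rewritten = rewrite("c = a * b + d")
print(rewritten)  # c = a.__mul__(b).__add__(d)
```

After this pass the overloaded operators appear as ordinary method calls, so a call-graph builder that resolves attribute calls has something to follow.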

But that would require me to build a large part of the static analyzer myself before I could use it.

Step-by-step directions for building what I have in mind, even if they combine more than one tool, would also be welcome.
