eBPF kprobe argument not matching the syscall

I'm learning eBPF and experimenting with it while following the docs, but there's one thing that isn't working and I don't understand why.

I have this very simple program that exits immediately with status 5.

#include <stdlib.h>

int main() {
   exit(5);    /* terminates the process with status 5 */
   return 0;   /* never reached */
}

The exit function in the code above calls the exit_group syscall, as we can see with strace (image below). Yet in my Python code, which uses eBPF through bcc, the output I get from bpf_trace_printk is the value 208682672 and not 5, the value the exit_group syscall is called with.

[Image: strace output]

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void my_exit(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='my_exit')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

I've been investigating this problem for a few days and have searched online, but I couldn't find a solution.

I'd truly appreciate any help. Thank you!

Nick ODell On BEST ANSWER

Fix

You need to rename your function from my_exit to syscall__exit_group.

Why does this matter? BPF programs named in this way get special handling from BCC. Here's what the documentation says:

8. system call tracepoints

Syntax: syscall__SYSCALLNAME

syscall__ is a special prefix that creates a kprobe for the system call name provided as the remainder. You can use it by declaring a normal C function, then using the Python BPF.get_syscall_fnname(SYSCALLNAME) and BPF.attach_kprobe() to associate it.

Arguments are specified on the function declaration: syscall__SYSCALLNAME(struct pt_regs *ctx, [, argument1 ...]).

For example:

int syscall__execve(struct pt_regs *ctx,
    const char __user *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{
    [...]
}

This instruments the execve system call.

Source.

Corrected Code

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void syscall__exit_group(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='syscall__exit_group')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

Output from the sample program exiting:

(b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')
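For readers unfamiliar with that tuple: trace_fields() returns the fields of one trace line as (task, pid, cpu, flags, timestamp, message), with the printed text arriving as bytes. A minimal sketch of pulling the status out of it follows; parse_exit_status is a hypothetical helper name, and the sample tuple is simply the one shown above.

```python
# Sample tuple copied from the output above; in a live script it would come
# from bpf.trace_fields() instead.
sample = (b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')


def parse_exit_status(fields):
    """Unpack one trace_fields() tuple and return (pid, status)."""
    task, pid, cpu, flags, ts, msg = fields
    # msg holds whatever bpf_trace_printk() printed, as bytes.
    return pid, int(msg.decode())


pid, status = parse_exit_status(sample)
print(f"pid {pid} exited with status {status}")
```

In the polling loop, the same helper could be applied to each tuple instead of printing it raw.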

How it Works

BCC rewrites your BPF program before compiling it, and the syscall__ prefix changes how the rewritten code interprets the arguments passed. You can use bpf = BPF(text=bpftext, debug=bcc.DEBUG_PREPROCESSOR) (after import bcc) to see how your code is transformed.

Here's what happens without the syscall__ prefix:

void my_exit(struct pt_regs *ctx){
 int status = ctx->di;
        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

This reads the RDI register and interprets its value directly as the first argument, status.
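As the rest of this answer explains, on kernels with syscall wrappers RDI holds a pointer rather than the status, so the assignment to an int most likely keeps only the low 32 bits of a kernel address. The sketch below mimics that truncation in plain Python. The address is hypothetical, chosen only so that its low 32 bits equal the number from the question; the real pointer value was never recorded.

```python
import ctypes

# Hypothetical 64-bit kernel address (an assumption for illustration only):
# its low 32 bits were picked to match the value reported in the question.
hypothetical_rdi = 0xFFFFB0A80C703EB0


def as_c_int(register_value):
    """Mimic `int status = ctx->di;` by keeping the low 32 bits as a signed int."""
    return ctypes.c_int32(register_value & 0xFFFFFFFF).value


print(as_c_int(hypothetical_rdi))  # 208682672
print(as_c_int(5))                 # 5
```

A small value such as 5 survives the truncation unchanged, which is why the same code works on systems where RDI really does hold the status.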

On the other hand, here's what happens if it's named syscall__exit_group:

void syscall__exit_group(struct pt_regs *ctx){
#if defined(CONFIG_ARCH_HAS_SYSCALL_WRAPPER) && !defined(__s390x__)
 struct pt_regs * __ctx = ctx->di;
 int status; bpf_probe_read(&status, sizeof(status), &__ctx->di);
#else
 int status = ctx->di;
#endif

        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

If CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined (it is on x86_64), the RDI register is instead interpreted as a pointer to a second struct pt_regs. BCC then reads the RDI field of that inner structure, which holds the first argument to exit_group().

On systems without syscall wrappers, this does the same thing as the previous example.
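The two-level lookup can be imitated in user space. This is only a toy model, not kernel code: PtRegs is a hypothetical one-field stand-in for struct pt_regs, and ctypes memory reads play the role of bpf_probe_read.

```python
import ctypes


class PtRegs(ctypes.Structure):
    """Toy stand-in for struct pt_regs that holds only the RDI slot."""
    _fields_ = [("di", ctypes.c_uint64)]


# Inner registers: what user space passed to exit_group(5).
inner = PtRegs(di=5)
# Outer registers: what the kprobe sees; RDI points at the inner struct.
outer = PtRegs(di=ctypes.addressof(inner))


def read_direct(ctx):
    """Without the prefix: treat RDI itself as the int argument."""
    return ctypes.c_int32(ctx.di & 0xFFFFFFFF).value


def read_wrapped(ctx):
    """With the prefix: follow RDI to the inner struct, then read its RDI."""
    return int(PtRegs.from_address(ctx.di).di)


print(read_direct(outer))   # an address-derived number, not 5
print(read_wrapped(outer))  # 5
```

The direct read yields whatever the low bits of the pointer happen to be, while the wrapped read recovers the value the program actually passed.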

Marco Bonelli On

I am not sure your probe function should take 3 arguments; that seems like too many. In any case, the struct pt_regs *ctx you have should already hold all the information you need. You should be able to read any register value through the dedicated macros (PT_REGS_xxx) or by accessing the structure fields manually.

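The first argument passed to the probed function can be extracted with PT_REGS_PARM1 (on kernels built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER, such as x86_64, this is the pointer to the inner struct pt_regs described in the other answer, not the syscall's own argument):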

    bpftext = """
    #include <uapi/linux/ptrace.h>

    void my_exit(struct pt_regs *ctx){
        bpf_trace_printk("%ld\\n", PT_REGS_PARM1(ctx));
    }
    """