The LD_PRELOAD technique allows us to supply our own custom standard library functions to an existing binary, overriding the standard ones or manipulating their behaviour, giving a fun way to experiment with a binary and understand its behaviour.
I've read that LD_PRELOAD can be used to "checkpoint" a program --- that is, to produce a record of the full memory state, call stack and instruction pointer at any given time --- allowing us to "reset" the program back to that previous state at will.
It's clear to me how we can record the state of the heap. Since we can provide our own version of malloc and related functions, our preloaded library can obviously gain perfect knowledge of the memory state.
What I can't work out is how our preloaded functions can determine the call stack and instruction pointer; and then reset them at a later time to the previously recorded value. Clearly this is necessary for checkpointing. Are there standard library functions that can do this? Or is a different technique required?
That is a gross simplification. This "checkpoint" mechanism cannot possibly restore any open file descriptors, or any mutexes, since the state of these is partially inside the kernel.
The instruction pointer is inside the preloaded function, and is trivially available as e.g. `register void *rip __asm__("rip")` on `x86_64`. But you (likely) don't care about that address -- you probably care about the caller of your function. That is also trivially available as `__builtin_return_address(0)` (at least when using GCC).

And the rest of the call stack is saved in memory (in the stack region, to be more precise), so if you know the contents of memory, you know the call stack.
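As a rough illustration of the caller lookup, here is a minimal sketch assuming GCC (or Clang) builtins. The names `my_return_address` and `print_call_site` are hypothetical stand-ins for whatever your preloaded wrapper would be called; they are not part of any library.

```c
#include <stdio.h>

/* Hypothetical stand-in for a preloaded wrapper function.
 * noinline keeps this a real call, so it has a return address of its own. */
__attribute__((noinline))
void *my_return_address(void)
{
    /* Level 0 = return address of the current function, i.e. the
     * instruction in the caller that follows the call. */
    return __builtin_return_address(0);
}

/* Prints where it was asked from and where its own stack frame lives. */
void print_call_site(void)
{
    printf("helper returns to: %p\n", my_return_address());
    printf("current frame at:  %p\n", __builtin_frame_address(0));
}
```

Calling `print_call_site()` from anywhere in the program prints two addresses; the first lies inside the code of `print_call_site` itself (its call to the helper), the second inside the stack region.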
Indeed, when you use e.g. the GDB `where` command with a `core` dump, that's exactly what GDB does -- it reads the contents of memory from the `core` and recovers the call stack from it.

Update:
Inspecting memory works the same regardless of whether that memory "belongs" to heap, stack, or code. You simply dereference a pointer and voilà -- you get the contents of memory at that location.
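To make that concrete, here is a small sketch that reads bytes from the stack, the heap and the code of a function through the same helper. All names (`peek_byte`, `dump_bytes`, `dump_three_regions`) are made up for illustration. It assumes a platform such as Linux with GCC, where code pages are readable and a function pointer can be converted to an integer; ISO C does not guarantee either.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one byte from an arbitrary address. The address must be mapped
 * and readable, otherwise the process gets SIGSEGV. */
unsigned char peek_byte(const void *addr)
{
    return *(const unsigned char *)addr;
}

/* Hex-dump n bytes starting at addr, whatever region addr belongs to. */
void dump_bytes(const char *label, const void *addr, size_t n)
{
    printf("%-5s %p:", label, (void *)(uintptr_t)addr);
    for (size_t i = 0; i < n; i++)
        printf(" %02x", peek_byte((const unsigned char *)addr + i));
    putchar('\n');
}

void dump_three_regions(void)
{
    int on_stack = 0x11223344;
    int *on_heap = malloc(sizeof *on_heap);
    if (on_heap == NULL)
        return;
    *on_heap = 0x55667788;

    dump_bytes("stack", &on_stack, sizeof on_stack);
    dump_bytes("heap", on_heap, sizeof *on_heap);
    /* Assumption: function pointers convert to readable data addresses. */
    dump_bytes("code", (const void *)(uintptr_t)&dump_three_regions, 8);

    free(on_heap);
}
```

The helper has no idea which region it is looking at; the only thing that differs between the three calls is the address passed in.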
What you probably mean is:
The answer to the first question is OS-specific, and you didn't tag your question with any OS.
Assuming you are on Linux, one way to locate the stack is to parse entries in `/proc/self/maps`, looking for an entry (contiguous address range) which "covers" the current stack (i.e. "covers" the address of any local variable).

For the second question, the answer is:
To figure out how to decode the stack, you could look at the sources of debuggers (such as GDB and LLDB).
This is also very OS and processor specific.
You would need to know calling conventions. On `x86_64` you would need to know about unwind descriptors. To find local variables, you would need to know about the `DWARF` debugging format.

Did I mention it's complicated?
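Going back to the first question, locating the stack through `/proc/self/maps` could look roughly like the sketch below. It assumes Linux with `/proc` mounted; `find_mapping` is a hypothetical helper name, and only the leading `start-end` address range of each line is parsed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the mapping that contains addr.
 * Returns 0 and fills *start / *end on success, -1 otherwise. */
int find_mapping(const void *addr, uintptr_t *start, uintptr_t *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[512];
    int at_line_start = 1;
    int result = -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        /* Only parse chunks that begin a line; skip the tail of a line
         * that was too long for the buffer. */
        int is_start = at_line_start;
        at_line_start = (strchr(line, '\n') != NULL);
        if (!is_start)
            continue;

        /* Each line begins with "start-end" in hexadecimal. */
        uintptr_t lo, hi;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) != 2)
            continue;

        if (target >= lo && target < hi) {
            *start = lo;
            *end = hi;
            result = 0;
            break;
        }
    }

    fclose(maps);
    return result;
}
```

Passing the address of any local variable yields the range of the current thread's stack mapping. For the main thread that entry is normally labelled `[stack]`; stacks of other threads usually show up as anonymous mappings, so matching by address is more reliable than matching by name.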