How to recover the virtual address of a string from a core dump?


I'm trying to find a specific string in a process's memory; specifically, I want the virtual address where it is stored. I wrote a Python script that calls gcore on the process and scans the resulting core file for all matches. It then calls pmap and iterates through the entries. My idea is to find the memory region each match offset falls in, subtract the total size of the preceding regions to get the offset within that region, and add that offset to the region's base address to get the virtual address. However, when I use gdb to look for strings at the virtual addresses I compute, I don't find the right strings. Why doesn't my method work? Does gcore not dump the entire contents of virtual memory in order?

#!/usr/bin/python3
import sys
import ctypes
import ctypes.util
import subprocess
import os
import ptrace
import re

if(len(sys.argv) != 2):
    print("Usage: search_and_replace.py target_pid")
    sys.exit(-1)

pid = sys.argv[1]
if not pid.isdigit():
    print("Invalid PID specified.  Make sure PID is an integer")
    sys.exit(-1)

bash_cmd = "sudo gcore -a {}".format(pid)
os.system(bash_cmd)

with open("core." + sys.argv[1], 'rb') as f:
    s = f.read()
# with open("all.dump", 'rb') as f:
#   s = f.read()

str_query = b'a random string in program\'s memory'
str_replc = b'This is an inserted string, replacing the original.'
indices = []
for match in re.finditer(re.escape(str_query), s):  # escape so the query is matched literally
    indices.append(match.start())
print("number of indices is " + str(len(indices)))

#index = s.find(str_query)

# print("offset is " + str(index))
# if(index == 0):
#   print("error: String not found")
#   sys.exit(-1)

bash_cmd = "sudo pmap -x {} > maps".format(pid)
print(bash_cmd)
subprocess.call(bash_cmd, shell=True)

with open("maps") as m:
    lines = m.readlines()

# Calculate the virtual address of the targeted string in the running process by parsing the pmap output
pages = []
v_addrs = []

for index in indices:
    sum = 0
    offset = 0
    v_addr = 0  
    #print(index)
    for i in range(2, len(lines) - 2):
        line = lines[i]
        items = line.split()
        v_addr = int(items[0], 16)
        old_sum = sum
        sum += int(items[1]) * 1024
        if sum > index:
            offset = index - old_sum
            print("max is " + hex(v_addr + int(items[1]) * 1024))
            print("offset is " + str(offset) + " hex " + hex(offset))
            print("final va is " + hex(v_addr + offset))
            pages.append(hex(v_addr) + ", " + hex(v_addr + int(items[1]) * 1024))
            v_addrs.append(hex(v_addr + offset))
            break

print("base va is " + hex(v_addr))
v_addr += offset

for page in set(pages):
    print(page)

for va in v_addrs:
    print(va)

On a related note, I also tried to use gdb to scan the file manually. Its find command doesn't find nearly as many matches when I scan for the string in the region of memory in question (the exact numbers vary greatly). Why is that?

Eirik Fuller On BEST ANSWER

You can use Python code to locate various things in core files. The structer package includes an elf module whose Elf class provides methods for that. The following output from a gdb session shows examples of how to use that code.

The first excerpt of that session shows gdb opening a core file which was generated by gcore, and providing some data for the subsequent searches.

18:33:00 $ gdb -q /home/efuller/gnu/bin/gdb core.17856 
Reading symbols from /home/efuller/gnu/bin/gdb...done.
[New LWP 17856]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/efuller/gnu/bin/gdb /home/efuller/gnu/bin/gdb'.
Program terminated with signal SIGINT, Interrupt.
#0  0x00007ffff62c5660 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
84  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) backtrace
#0  0x00007ffff62c5660 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00005555557f7ea6 in gdb_wait_for_event (block=1) at event-loop.c:772
#2  0x00005555557f7185 in gdb_do_one_event () at event-loop.c:347
#3  0x00005555557f71bd in start_event_loop () at event-loop.c:371
#4  0x00005555557f003a in captured_command_loop (data=0x0) at main.c:324
#5  0x00005555557eb2e9 in catch_errors (func=0x5555557efff8 <captured_command_loop(void*)>, func_args=0x0, errstring=0x555555b4f733 "", mask=RETURN_MASK_ALL) at exceptions.c:236
#6  0x00005555557f16e2 in captured_main (data=0x7fffffffea10) at main.c:1149
#7  0x00005555557f170b in gdb_main (args=0x7fffffffea10) at main.c:1159
#8  0x00005555555f2daa in main (argc=2, argv=0x7fffffffeb18) at gdb.c:32
(gdb) frame 6
#6  0x00005555557f16e2 in captured_main (data=0x7fffffffea10) at main.c:1149
1149              catch_errors (captured_command_loop, 0, "", RETURN_MASK_ALL);
(gdb) info locals
context = 0x7fffffffea10
argc = 2
argv = 0x7fffffffeb18
quiet = 0
set_args = 0
inhibit_home_gdbinit = 0
symarg = 0x7fffffffed8e "/home/efuller/gnu/bin/gdb"
execarg = 0x7fffffffed8e "/home/efuller/gnu/bin/gdb"
pidarg = 0x0
corearg = 0x0
pid_or_core_arg = 0x0
cdarg = 0x0
ttyarg = 0x0
print_help = 0
print_version = 0
print_configuration = 0
cmdarg_vec = 0x0
cmdarg_p = 0x0
dirarg = 0x555555fdeb80
dirsize = 1
ndir = 0
system_gdbinit = 0x0
home_gdbinit = 0x555556174960 "/home/efuller/.gdbinit"
local_gdbinit = 0x0
i = 0
save_auto_load = 1
objfile = 0x0
pre_stat_chain = 0x555555b2c000 <sentinel_cleanup>
(gdb) 

The next excerpt shows gdb importing the Python code and performing two searches based on the value of a local variable. The first search shows multiple addresses at which that value occurs (the value of symarg and execarg is among them). The findbytes method requires a bytes object, not a str object. The second search finds just one location that holds the address of the first match from the first search; that location happens to have a name in the symbol table.

(gdb) pi
>>> from structer import memmap, elf
>>> core = elf.Elf(memmap('core.17856'))
>>> from pprint import pprint
>>> 
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin/gdb")))
('0x555555fdef30',
 '0x55555606fce0',
 '0x55555614ff72',
 '0x5555562496a0',
 '0x55555624b915',
 '0x55555625f250',
 '0x5555562c6c4b',
 '0x55555689f2b5',
 '0x7ffff5f2d490',
 '0x7fffffffed74',
 '0x7fffffffed8e',
 '0x7fffffffedf0',
 '0x7fffffffefde')
(gdb) python pprint(tuple(hex(a) for a in core.findwords(0x555555fdef30)))
('0x555555faea38',)
(gdb) x/a 0x555555faea38
0x555555faea38 <_ZL16gdb_program_name>:     0x555555fdef30
(gdb) 
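
The second search above looks for memory that holds a pointer to the string. Outside of structer, the same idea can be sketched as a byte search for the packed address. The sketch below assumes a little-endian target (as in the x86_64 session above); find_pointer_offsets is a hypothetical helper, not part of structer, and it returns offsets into a buffer rather than virtual addresses.

```python
import struct

def find_pointer_offsets(buf, address, word_size=8, aligned=True):
    """Return offsets in buf where `address` is stored as a little-endian word.

    Hypothetical helper for illustration; not part of structer.
    """
    fmt = {4: "<I", 8: "<Q"}[word_size]
    needle = struct.pack(fmt, address)
    hits = []
    pos = buf.find(needle)
    while pos != -1:
        # Pointers are normally stored at word-aligned positions; the check
        # here is against the buffer offset, not a virtual address.
        if not aligned or pos % word_size == 0:
            hits.append(pos)
        pos = buf.find(needle, pos + 1)
    return hits

# Tiny synthetic buffer: 16 bytes of padding, then the pointer value.
demo = b"\x00" * 16 + struct.pack("<Q", 0x555555fdef30) + b"\x00" * 8
print(find_pointer_offsets(demo, 0x555555fdef30))  # [16]
```

Passing aligned=False also reports unaligned occurrences, which can be useful for packed structures but tends to produce more false positives.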

The next excerpt shows other variations on the search. Searching for the dirname of the first search pattern turns up multiple hits, which include all of the hits from the first search. The subsequent search filters out those common hits by requiring a trailing null terminator, and the one after that also filters out hits which are not preceded by a null byte. Those last two searches report the same two strings, although the addresses differ by one: a search that requires a leading null reports the address of that null, not of the first character after it.

(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin")))
('0x555555b4f701',
 '0x555555bd33f0',
 '0x555555fdef30',
 '0x55555606fce0',
 '0x55555614ff72',
 '0x5555562496a0',
 '0x55555624b915',
 '0x55555625f250',
 '0x5555562c6c4b',
 '0x55555689f2b5',
 '0x7ffff5f2d490',
 '0x7fffffffed74',
 '0x7fffffffed8e',
 '0x7fffffffedf0',
 '0x7fffffffefde')
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"/home/efuller/gnu/bin\x00")))
('0x555555b4f701', '0x555555bd33f0')
(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"\x00/home/efuller/gnu/bin\x00")))
('0x555555b4f700', '0x555555bd33ef')
(gdb) 

The final excerpt separates the hits from the first search into two cases, those with leading nulls and those without leading nulls. The latter uses the most general type of search (the one that both findbytes and findwords rely on) so that it can include the non-null characters preceding the fixed part of the search pattern.

(gdb) python pprint(tuple(hex(a) for a in core.findbytes(b"\x00/home/efuller/gnu/bin/gdb")))
('0x555555fdef2f',
 '0x55555606fcdf',
 '0x55555624969f',
 '0x55555625f24f',
 '0x7fffffffed73',
 '0x7fffffffed8d',
 '0x7fffffffefdd')
(gdb) python import re
(gdb) python pprint(tuple(hex(a) for a in core.find(re.compile(rb"\x00[^\x00]+/home/efuller/gnu/bin/gdb"))))
('0x55555614ff6f',
 '0x55555624b8ff',
 '0x5555562c6c37',
 '0x55555689f297',
 '0x7ffff5f2d487',
 '0x7fffffffeded')
(gdb) x/s 0x55555614ff6f + 1
0x55555614ff70:     "_=/home/efuller/gnu/bin/gdb"
(gdb) 

The + 1 in the last command skips the leading null in that search hit, although that could also be incorporated into the search code, as follows.

(gdb) python pprint(tuple(hex(a+1) for a in core.find(re.compile(rb"\x00[^\x00]+/home/efuller/gnu/bin/gdb"))))
('0x55555614ff70',
 '0x55555624b900',
 '0x5555562c6c38',
 '0x55555689f298',
 '0x7ffff5f2d488',
 '0x7fffffffedee')
(gdb) 
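
The same kind of filtering can be tried with Python's re module on any bytes buffer. The following sketch is independent of structer: find_prefixed_strings is a hypothetical helper and the buffer is synthetic. It uses a lookbehind for the leading null, so the reported offset is already the first character of the string and no + 1 is needed.

```python
import re

def find_prefixed_strings(buf, suffix):
    """Return start offsets of null-delimited strings in which a non-empty
    prefix is followed by `suffix`.

    Hypothetical helper for illustration; not part of structer.
    """
    # (?<=\x00) requires a null byte just before the match without
    # including it, so m.start() points at the string itself.
    pattern = re.compile(rb"(?<=\x00)[^\x00]+" + re.escape(suffix))
    return [m.start() for m in pattern.finditer(buf)]

# Synthetic buffer with three C strings: one bare path, one with a
# prefix ("_="), and one unrelated string.
demo = b"\x00/bin/gdb\x00_=/bin/gdb\x00PATH=/bin\x00"
print(find_prefixed_strings(demo, b"/bin/gdb"))  # [10]
```

Only the second string is reported, because the first has nothing between the leading null and the fixed part of the pattern.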

The structer code does not require gdb; it can run in a python interpreter outside of gdb. It is not compatible with python2, so running it within gdb requires a gdb binary linked against python3.5.

Searching the raw bytes of a core file can report matches that the search methods in the structer code do not report. There are two reasons for that. First, the structer code searches only the load segments, so it will not find the contents of note segments, which contain various things that do not correspond to virtual addresses in the core. Second, the structer code does not report a match that spans two load segments separated by a gap (an unmapped region between the segments). It does combine adjacent segments that are contiguous in the virtual address space, so a search result need not be confined to a single segment.
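
This also bears on why summing pmap sizes does not turn a file offset into a virtual address. A core file is an ELF file: it begins with headers and note data, and each load segment's program header records both where the segment sits in the file (p_offset) and the virtual address it was dumped from (p_vaddr). A segment may also occupy fewer bytes in the file than in memory. A minimal sketch of the header-based translation follows, assuming a little-endian ELF64 core; load_segments and offset_to_vaddr are hypothetical helper names, and this is not the structer implementation.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments

def load_segments(data):
    """Return (p_offset, p_vaddr, p_filesz) for each PT_LOAD segment.

    Assumes a little-endian ELF64 file whose e_phnum field holds the real
    program header count (fewer than 0xffff headers).
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, _memsz, _align = \
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz))
    return segments

def offset_to_vaddr(segments, file_offset):
    """Translate a file offset to a virtual address, or None if the offset
    is not inside any load segment (for example, inside the notes)."""
    for p_offset, p_vaddr, p_filesz in segments:
        if p_offset <= file_offset < p_offset + p_filesz:
            return p_vaddr + (file_offset - p_offset)
    return None

# Possible usage with the question's script, where s holds the core file
# bytes and indices holds the match offsets:
#   segments = load_segments(s)
#   for index in indices:
#       print(offset_to_vaddr(segments, index))
```

Offsets that map to None are matches in parts of the file that have no virtual address, which is one reason a raw scan of the file can report more hits than a search of memory.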