I'm trying to find a specific string in a process's memory. Specifically, I want to find the virtual address where it's stored. I wrote a Python script that calls gcore on the process and scans the resulting core file for all matches. Then I call pmap and iterate through its entries. My idea is to find the section of memory each match index falls in, subtract the sum of the sizes of the preceding sections to get the offset within that section, and add that offset to the section's base to get the virtual address. However, when I use gdb to examine the virtual addresses I compute, I don't find the right strings. Why doesn't my method work? Does gcore not dump the entire contents of virtual memory in order?
#!/usr/bin/python3
import sys
import ctypes
import ctypes.util
import subprocess
import os
import ptrace
import re
if(len(sys.argv) != 2):
    print("Usage: search_and_replace.py target_pid")
    sys.exit(-1)
pid = sys.argv[1]
if not pid.isdigit():
    print("Invalid PID specified. Make sure PID is an integer")
    sys.exit(-1)
bash_cmd = "sudo gcore -a {}".format(pid)
os.system(bash_cmd)
with open("core." + sys.argv[1], 'rb') as f:
    s = f.read()
# with open("all.dump", 'rb') as f:
#     s = f.read()
str_query = b'a random string in program\'s memory'
str_replc = b'This is an inserted string, replacing the original.'
indices = []
for match in re.finditer(str_query, s):
    indices.append(match.start())
print("number of indices is " + str(len(indices)))
#index = s.find(str_query)
# print("offset is " + str(index))
# if(index == 0):
#     print("error: String not found")
#     sys.exit(-1)
bash_cmd = "sudo pmap -x {} > maps".format(pid)
print(bash_cmd)
subprocess.call(bash_cmd, shell=True)
with open("maps") as m:
    lines = m.readlines()
#calculate the virtual address of the targeted string the running process via parsing the pmap output
pages = []
v_addrs = []
for index in indices:
    sum = 0
    offset = 0
    v_addr = 0
    #print(index)
    for i in range(2, len(lines) - 2):
        line = lines[i]
        items = line.split()
        v_addr = int(items[0], 16)
        old_sum = sum
        sum += int(items[1]) * 1024
        if sum > index:
            offset = index - old_sum
            print("max is " + hex(v_addr + int(items[1]) * 1024))
            print("offset is " + str(offset) + " hex " + hex(offset))
            print("final va is " + hex(v_addr + offset))
            pages.append(hex(v_addr) + ", " + hex(v_addr + int(items[1]) * 1024))
            v_addrs.append(hex(v_addr + offset))
            break
    print("base va is " + hex(v_addr))
    v_addr += offset
for page in set(pages):
    print(page)
for va in v_addrs:
    print(va)
On a related note, I also tried to use gdb to scan the file manually--it doesn't seem to find nearly as many matches when I use its find command to scan for the string in the region of memory in question (exact numbers vary greatly). Why is that?
You can use python code to locate various things in core files. The structer package includes an elf module whose Elf class provides methods for that. The following output from a gdb session has examples of how to use that code.
The first excerpt of that session shows gdb opening a core file which was generated by gcore, and providing some data for the subsequent searches.
The next excerpt shows gdb importing the python code, and performing two searches based on the value of a local variable. The first search shows multiple addresses at which that value occurs (the value of symarg and execarg is among them). The findbytes method requires a bytes object, not a str object. The second search shows just one address which contains the address of the first match from the first search, which happens to have a name in the symbol table.
The next excerpt shows other variations on the search. Searching for the dirname of the first search pattern turns up multiple hits, which include all of the hits from the first search. The subsequent search filters out the common hits by requiring a null terminator, and the one after that filters out hits which do not begin with a null terminator. Those last two searches report the same results, although the addresses differ by one, because the searches which require a leading null point at that leading null.
The final excerpt separates the hits from the first search into two cases, those with leading nulls and those without leading nulls. The latter uses the most general type of search (the one that both findbytes and findwords rely on) so that it can include the non-null characters preceding the fixed part of the search pattern.
The + 1 in the last command skips the leading null in that search hit, although that could also be incorporated into the search code, as follows.
The structer code does not require gdb; it can run in a python interpreter outside of gdb. It is not compatible with python2, so running it within gdb requires a gdb binary linked against python3.5.
Searching for patterns in a core file can report results which are not reported by the search methods in the structer code. There are two reasons for that. First, the structer code only searches the load segments, so it will not find the contents of note segments, which contain various things which do not correspond to virtual addresses in the core. Second, the structer code does not find results which span multiple load segments if two adjacent segments have a gap (an unmapped region between the segments). The code does combine adjacent segments which are contiguous in the virtual address space, so a search result need not be confined to a single segment.