I have a program (parent.rs) that will:
- Fork
- Create a userfaultfd on the child, then transfer it to the parent (with pidfd_getfd).
- Exec the child (child.rs)
- Child allocates memory with mmap and "sends" it to the parent
- Parent receives the memory pointer and tries to register it with the transferred userfaultfd object
Unfortunately this fails at the last step with ENOMEM.
I'm aware that, due to virtual memory, the child's pointer will not be valid in the parent. But I expect it to be valid for the userfaultfd handle I have, since it was originally created in the child and, because it wasn't opened with O_CLOEXEC, should remain open in the child after the exec (I've verified this by reading /proc/{child_pid}/fd).
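For reference, the /proc/{child_pid}/fd check amounts to something like the sketch below. The helper name `fds_matching` is made up for illustration, and I'm assuming the uffd shows up with a link target of `anon_inode:[userfaultfd]`, which is what I see on my kernel.

```rust
use std::{fs, io};

/// Returns the fd numbers listed in `/proc/<pid>/fd` whose symlink target
/// contains `needle` (e.g. "anon_inode:[userfaultfd]").
fn fds_matching(pid: u32, needle: &str) -> io::Result<Vec<i32>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(format!("/proc/{pid}/fd"))? {
        let entry = entry?;
        // Each entry is a symlink named after the fd number. Skip entries
        // that disappear between listing and reading.
        let Ok(target) = fs::read_link(entry.path()) else { continue };
        if target.to_string_lossy().contains(needle) {
            if let Some(fd) = entry.file_name().to_str().and_then(|s| s.parse().ok()) {
                found.push(fd);
            }
        }
    }
    Ok(found)
}
```

Calling `fds_matching(child_pid, "userfaultfd")` after the exec should return the fd number the child sent over the pipe if the descriptor really survived.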
Cargo.toml:
[package]
name = "..."
version = "0.1.0"
edition = "2021"
[dependencies]
nix = "0.26.2"
pidfd = "0.2.4"
pidfd_getfd = { version = "0.2.1", features = ["nightly"] }
pipe-channel = "1.3.0"
rustix = { version = "0.37.3", features = ["mm"] }
userfaultfd = { version = "0.5.1", features = ["linux4_14", "linux5_7"] }
examples/parent.rs:
use {
    nix::unistd,
    rustix::fd::{AsRawFd, FromRawFd},
    std::{
        ffi::{self, CString},
        mem,
    },
    userfaultfd::Uffd,
};

fn main() {
    let child_name = std::env::args().nth(1).expect("Expected argument");

    // Fork and execute the child
    let (mut uffd_tx, mut uffd_rx) = pipe_channel::channel();
    let (mut ready_tx, mut ready_rx) = pipe_channel::channel();
    let child_pid = match unsafe { unistd::fork() }.expect("Unable to fork") {
        unistd::ForkResult::Parent { child } => child,
        unistd::ForkResult::Child => {
            // Open the uffd and send it to the parent
            // Note: We forget it so it doesn't get closed.
            let uffd = userfaultfd::UffdBuilder::new()
                .close_on_exec(false)
                .user_mode_only(true)
                .non_blocking(false)
                .create()
                .expect("Unable to create uffd");
            uffd_tx.send(uffd.as_raw_fd()).expect("Unable to send uffd");
            mem::forget(uffd);

            // Wait until the monitor process is ready
            ready_rx.recv().expect("Unable to wait for parent");

            // Then execute the child
            println!("Executing child");
            let path = CString::new(child_name.clone()).unwrap();
            let args = [CString::new(child_name).unwrap()];
            unistd::execv(&path, &args).expect("Unable to `execv`");
            unreachable!();
        },
    };

    // Open a pid_fd for the child process
    let child_pidfd =
        unsafe { pidfd::PidFd::open(child_pid.as_raw(), 0) }.expect("Unable to allocate pidfd for child process");

    // Receive the uffd from the child
    let child_uffd_fd = uffd_rx.recv().expect("Unable to receive uffd");
    let uffd_fd = unsafe { pidfd_getfd::pidfd_getfd(child_pidfd.as_raw_fd(), child_uffd_fd, 0) };
    let uffd = unsafe { Uffd::from_raw_fd(uffd_fd) };

    // Tell the child we're ready to execute
    ready_tx.send(()).expect("Unable to send parent event");

    // Then read the pointer it wrote
    std::thread::sleep(std::time::Duration::from_secs(1));
    let page = std::fs::read("ptr").expect("Unable to read pointer");
    let page = page.try_into().expect("File wasn't the right size");
    let page = usize::from_le_bytes(page);
    let page = page as *mut ffi::c_void;

    // Prove the pointer is on the process's maps
    let memory_map =
        std::fs::read_to_string(format!("/proc/{}/maps", child_pid.as_raw())).expect("Unable to read memory maps");
    assert!(memory_map.contains(&format!("{:x}", page as usize)));

    // Then try to register
    uffd.register(page, 4096).expect("Unable to register dummy pointer");
}
examples/child.rs:
use {rustix::mm, std::ptr};

pub fn main() {
    // Allocate the page
    println!("Child: Allocating");
    let page = unsafe {
        mm::mmap_anonymous(
            ptr::null_mut(),
            4096,
            mm::ProtFlags::READ | mm::ProtFlags::WRITE,
            mm::MapFlags::PRIVATE,
        )
        .expect("Unable to allocate page")
    };

    // Write to file
    println!("Child: Writing to file");
    std::fs::write("ptr", (page as usize).to_le_bytes()).expect("Unable to write");

    println!("Child: Sleeping");
    loop {
        std::thread::park();
    }
}
This is run with cargo build --examples && ./target/debug/examples/parent ./target/debug/examples/child
(Note that the uffd handler isn't included here. It doesn't work even with the handler; I removed it to make a smaller MCVE.)
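One caveat about the maps assertion in parent.rs: it only looks for the hex digits of the address somewhere in the maps text, so it would miss a page that sits in the middle of a larger mapping. A stricter check, as a sketch (the helper name `maps_contain` is made up, and it assumes the usual "start-end perms ..." line layout of /proc/<pid>/maps), would parse the ranges:

```rust
/// Returns true if `addr` lies inside one of the address ranges listed in
/// the text of a `/proc/<pid>/maps` file.
fn maps_contain(maps: &str, addr: usize) -> bool {
    maps.lines().any(|line| {
        // Each line starts with "start-end" in hexadecimal,
        // e.g. "7f12a000-7f12b000 rw-p 00000000 00:00 0".
        let Some(range) = line.split_whitespace().next() else { return false };
        let Some((start, end)) = range.split_once('-') else { return false };
        match (usize::from_str_radix(start, 16), usize::from_str_radix(end, 16)) {
            // The end address is exclusive.
            (Ok(start), Ok(end)) => (start..end).contains(&addr),
            _ => false,
        }
    })
}
```

In parent.rs this would replace the `contains` call with `assert!(maps_contain(&memory_map, page as usize))`. The pointer passes both versions of the check for me, so this doesn't change the ENOMEM result.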
How can I make userfaultfd actually work here and register on the child?
If it helps, I'm using ptrace in my actual application, so I can control the child much more easily, if I need to do it at a specific time.
I have currently found a very hacky workaround using LD_PRELOAD: creating the uffd object after execve in a shared library loaded by LD_PRELOAD, then transferring it to the parent with pipes (whose fds are given to the library by environment variables). This unfortunately is not great because:
- I need to use the non-usermode + fork features of uffd, which requires the cap_sys_ptrace capability. In turn this capability disables the ability to LD_PRELOAD. I've heard that if the library has the setuid bit and is owned by root, it should still work, but I haven't been able to get it working. I've since managed to fix this by setting the cap_sys_ptrace capability on the parent, then raising the Inheritable and Ambient capability sets of cap_sys_ptrace (still on the parent). This preserves the capability after execv, without disabling the ability to load an LD_PRELOAD library.
- It delays the creation of the uffd. In my use case, I'm interested in tracking allocations, and some do happen before LD_PRELOAD initialization. This isn't a major issue, but it does complicate the design of the program, as now we need to go back and check all allocations that happened before the uffd was loaded.
- It's a very hacky and insecure method, which could easily break and/or have security vulnerabilities.
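The capability fix from the first point can be sketched roughly as below. This is not my exact code. It assumes the C library exports `capget`, `capset` and `prctl` (glibc does), that the parent already holds cap_sys_ptrace in its permitted set (for example via file capabilities on the parent binary), and the function name is made up. The numeric constants are the values from the Linux UAPI headers.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

const CAP_SYS_PTRACE: u32 = 19;
const LINUX_CAPABILITY_VERSION_3: u32 = 0x2008_0522;
const PR_CAP_AMBIENT: c_int = 47;
const PR_CAP_AMBIENT_RAISE: c_ulong = 2;

#[repr(C)]
struct CapHeader {
    version: u32,
    pid: c_int,
}

#[repr(C)]
#[derive(Clone, Copy, Default)]
struct CapData {
    effective: u32,
    permitted: u32,
    inheritable: u32,
}

extern "C" {
    fn capget(header: *mut CapHeader, data: *mut CapData) -> c_int;
    fn capset(header: *mut CapHeader, data: *const CapData) -> c_int;
    fn prctl(option: c_int, ...) -> c_int;
}

/// Adds cap_sys_ptrace to the inheritable set, then raises it in the ambient
/// set, so that it is kept across a later `execv` of an unprivileged binary.
fn raise_ptrace_ambient() -> io::Result<()> {
    // pid 0 means "the calling thread"; version 3 uses two data structs.
    let mut header = CapHeader { version: LINUX_CAPABILITY_VERSION_3, pid: 0 };
    let mut data = [CapData::default(); 2];
    if unsafe { capget(&mut header, data.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }

    // Capability 19 lives in the first 32-bit word.
    data[0].inheritable |= 1 << CAP_SYS_PTRACE;
    if unsafe { capset(&mut header, data.as_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }

    // The ambient raise only succeeds if the capability is already in both
    // the permitted and the inheritable sets. The last two args must be 0.
    let zero: c_ulong = 0;
    let cap = CAP_SYS_PTRACE as c_ulong;
    if unsafe { prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, zero, zero) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

This would be called in the parent before the fork. Without cap_sys_ptrace in the permitted set, I'd expect the `capset` call to fail with EPERM.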