The reason why I ask this question is that, when testing the behavior of the Linux soft-dirty bit, I found that if I create a thread without touching any memory, the soft-dirty bit of all pages will be set to 1 (dirty).
For example, malloc(100MB) in the main thread, then clean soft dirty bits, then create a thread that just sleeps. After the thread is created, the soft-dirty bit of all that 100MB memory chunk is set to 1.
Here is the test program I'm using:
#include <thread>
#include <iostream>
#include <vector>
#include <cstdint>
#include <cstdio>
#include <string>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#define PAGE_SIZE_4K 0x1000
int GetDirtyBit(uint64_t vaddr) {
  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0) {
    perror("Failed open pagemap");
    exit(1);
  }
  // Each pagemap entry is 8 bytes; seek to the entry for this virtual page.
  off_t offset = vaddr / PAGE_SIZE_4K * 8;
  if (lseek(fd, offset, SEEK_SET) < 0) {
    perror("Failed lseek pagemap");
    exit(1);
  }
  uint64_t entry = 0;
  if (read(fd, &entry, sizeof(entry)) != sizeof(entry)) {
    perror("Failed read pagemap");
    exit(1);
  }
  close(fd);
  // Bit 55 of a pagemap entry is the soft-dirty bit.
  return entry & (1UL << 55) ? 1 : 0;
}
void CleanSoftDirty() {
  int fd = open("/proc/self/clear_refs", O_RDWR);
  if (fd < 0) {
    perror("Failed open clear_refs");
    exit(1);
  }
  // Writing "4" clears the soft-dirty bits for the whole process.
  char cmd[] = "4";
  if (write(fd, cmd, sizeof(cmd)) != sizeof(cmd)) {
    perror("Failed write clear_refs");
    exit(1);
  }
  close(fd);
}
int demo(int argc, char *argv[]) {
  int x = 1;
  // 100 MB
  uint64_t size = 1024UL * 1024UL * 100;
  char *cptr = reinterpret_cast<char *>(malloc(size));
  for (uint64_t s = 0; s < size; s += PAGE_SIZE_4K) {
    // populate pages
    memset(cptr + s, x, PAGE_SIZE_4K);
  }
  printf("Soft dirty after malloc: %d, (50MB offset)%d\n",
         GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
         GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
  printf("ALLOCATE FINISHED\n");
  std::vector<std::thread> threads;
  while (true) {
    sleep(2);
    // Set soft-dirty of all pages to 0.
    CleanSoftDirty();
    printf("Soft dirty after reset: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
    // Create a thread that just sleeps.
    threads.push_back(std::thread([]() { while (true) sleep(1); }));
    sleep(2);
    printf("Soft dirty after create thread: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
    // memset the first 20MB
    memset(cptr, x++, 1024UL * 1024UL * 20);
    printf("Soft dirty after memset: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
  }
  return 0;
}
int main(int argc, char *argv[]) {
  printf("PID: %d\n", getpid());
  return demo(argc, argv);
}
I print the dirty bit of the first page, and the page at offset 50 * 1024 * 1024. Here is what happens:
- The soft-dirty bits after malloc() are 1, which is expected.
- After cleaning the soft-dirty bits, they become 0.
- Create a thread that just sleeps.
- Check the dirty bits: all pages in the 100MB region (I didn't print the dirty bits of all pages, but I did check them on my own) now have the soft-dirty bit set to 1.
- Restart the loop; now the behavior is correct: soft-dirty bits remain 0 after creating additional threads.
- The soft-dirty bit of the page at offset 0 is 1 since I did memset(), and the soft-dirty bit of the page at offset 50MB remains 0.
Here is the output:
Soft dirty after malloc: 1, (50MB offset)1
ALLOCATE FINISHED
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 1, (50MB offset)1
Soft dirty after memset: 1, (50MB offset)1
(steps 1-4 above)
(step 5 starts below)
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
I thought thread creation would just mark the pages as being in a "shared" state, not modify them, so the soft-dirty bits should remain unchanged. Apparently, the behavior is different. Therefore I'm wondering: does creating a thread trigger page faults on all of the pages, so that the OS sets each page's soft-dirty bit while handling the fault?
If this is not the case, why does creating a thread make all memory pages of the process become "dirty"? Why does only the first thread creation have such behavior?
I hope I explained the question well, please let me know if more details are needed, or if anything doesn't make sense.
So, this is kind of funny and interesting. Your specific situation, as well as the behavior of the soft-dirty bits, is quite peculiar. No page faults are happening, and the soft-dirty bit is not being set on all memory pages, but just on some of them (the ones you allocated through malloc).
If you run your program under strace, you will notice a couple of things that help explain what you are observing:
1. Your malloc() is pretty large, so you will not get a normal heap chunk, but a dedicated memory area reserved through an mmap syscall.
2. When you create a thread, library code sets up a stack for the thread through another mmap, followed by an mprotect.
The normal mmap behavior in Linux is to reserve memory starting from an mmap_base chosen at process creation time, subtracting each time the size of the request (unless a specific address is explicitly requested, in which case mmap_base is not considered). For this reason, the mmap at point 1 will reserve pages right above the last shared library mapped by the dynamic loader, and the mmap at point 2 will reserve pages right before the pages mapped at point 1. The mprotect will then mark this second area (except for the very first page, which acts as a guard page) as RW.
Since these mappings are contiguous, both anonymous and both with the same protections (RW), from the kernel's perspective they look like a single memory region that has grown in size. In fact, the kernel treats them as a single VMA (vm_area_struct).
vm_area_struct).Now, as we can read from the kernel documentation about the soft-dirty bit (notice the part I highlighted in bold):
So the reason why you see the soft-dirty bit re-appear on the initial malloc'd chunk of memory after clearing it is a funny coincidence: a result of the not-so-intuitive "expansion" of the memory region (VMA) containing it caused by the allocation of the thread stack.
To make things clearer, we can inspect the virtual memory layout of the process through /proc/[pid]/maps at different stages: before malloc(), after malloc(), and after creating the first thread. On my machine, after malloc() the 100MB chunk shows up as a single anonymous rw mapping starting at 7f8669b66000. After creating the first thread, the start of that VMA changes from 7f8669b66000 to 7f8669366000, since it has grown in size (by 0x800000 bytes, i.e. the 8MiB default thread stack).
You can clearly see that, after creating the thread, the kernel shows the two memory regions (thread stack + your malloc'd chunk) together as a single VMA, given that they are contiguous, anonymous and have the same protections (rw).
rw).The guard page above the thread stack is treated as a separate VMA (it has different protections), and subsequent threads will
mmaptheir stack above it, so they will not affect the soft-dirty bits of your original memory region:This is why from the second thread onward you don't see anything unexpected happening.