Why is direct I/O much slower than non-direct I/O on an SSD (performance measured after clearing the system cache)?


I'm new to optimizing disk I/O performance. I compared the throughput of reading a file with and without direct I/O, using a chunk size of 512 KiB. Since direct I/O transfers data from the disk straight into a user-space buffer, bypassing the page cache, I expected it to be faster than non-direct (buffered) I/O. The cache was dropped before each measurement, so the data was not cached. However, non-direct I/O turned out to be much faster than direct I/O. If I increase the chunk size to 2 MiB, the two are equally fast. Here are the test results:

ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.32862 s, 404 MB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.365581 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.370193 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.36575 s, 1.5 GB/s
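
As a sanity check on the figures above, here is a minimal sketch that recomputes the throughput from the byte count and elapsed times that dd printed. It assumes dd reports decimal units (1 MB = 10^6 bytes); the helper name `throughput_mb_s` and the labels are my own, not part of dd.

```python
# Recompute throughput from the byte count and elapsed times reported by dd.
# Assumption: dd's "MB/s" uses decimal megabytes (10**6 bytes).

BYTES_COPIED = 536870912  # 512 MiB, identical in all four runs


def throughput_mb_s(num_bytes: int, seconds: float) -> float:
    """Return throughput in decimal megabytes per second."""
    return num_bytes / seconds / 1e6


# (label, elapsed seconds) copied from the dd output above
runs = [
    ("direct,   bs=512K", 1.32862),
    ("buffered, bs=512K", 0.365581),
    ("buffered, bs=2M", 0.370193),
    ("direct,   bs=2M", 0.36575),
]

for label, seconds in runs:
    print(f"{label}: {throughput_mb_s(BYTES_COPIED, seconds):.0f} MB/s")

# How much longer the 512K direct read took than the 512K buffered read
slowdown = runs[0][1] / runs[1][1]
print(f"direct vs. buffered at 512K: {slowdown:.1f}x slower")
```

By this calculation, only the 512 KiB direct read stands out: it takes roughly 3.6 times as long as each of the other three runs, which all finish in about 0.37 s.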

Output of df:

ps@701083:/mnt/md0/cuda-learning$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
udev                                32G     0   32G   0% /dev
tmpfs                              6.3G  1.3M  6.3G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  117G   28G   83G  26% /
tmpfs                               32G     0   32G   0% /dev/shm
tmpfs                              5.0M  4.0K  5.0M   1% /run/lock
tmpfs                               32G     0   32G   0% /sys/fs/cgroup
/dev/md0                           3.5T  756G  2.6T  23% /mnt/md0
/dev/nvme0n1p2                     976M  204M  705M  23% /boot
/dev/nvme0n1p1                     511M  6.7M  505M   2% /boot/efi
tmpfs                              6.3G     0  6.3G   0% /run/user/1000
ps@701083:/mnt/md0/cuda-learning$ 

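Why is direct I/O so much slower than non-direct I/O at a 512 KiB chunk size, but equally fast at 2 MiB?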
