How do I ensure data is written to the physical media?

I have a program that is called by a script. This program writes a lot of data to a file on the disk and then terminates. As soon as it is done running, the script kills power to the entire system.

The problem is that the file is not written in its entirety: for a 4 GiB file, only around 2 GiB is actually on the disk when I review it later. The only way I have found to reliably get all the data written is to have the program sleep for a short period before exiting, but that is an unreliable hack that I don't want to use. Here is sample code from my latest attempt:

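#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *output;
    output = fopen("/logs/data", "w");

    /* [fwrite several GiB of data to output] */

    fflush(output);                    /* stdio buffer -> kernel */

    int fdo = open("/logs", O_RDONLY); /* descriptor for the directory */
    fsync(fdo);                        /* syncs the directory only */

    fclose(output);
    close(fdo);

    return 0;
}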

I initially tried building my FILE from a file descriptor and calling fsync() on that descriptor (the one for /logs/data), but that produced the same issue. According to the man page for fsync(2):

Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.

That led me to the code above, which opens a file descriptor just for the directory containing my data file and calls fsync() on it. However, the results were the same. I don't understand why this is happening, because fsync() is supposed to block:

The call blocks until the device reports that the transfer has completed.

Additionally, as you can see, I added an fflush() on the FILE, thinking that maybe fsync() only syncs data that has previously been flushed, but this made no difference.

I need to somehow verify that the data has in fact been written to the physical media before the program ends, and I'm not sure how to do that. I see that there are files such as /sys/block/[device]/[partition]/stat which can tell me how many dirty blocks are left to write, and I could wait for that value to hit 0. That doesn't seem like a great way to solve what should be a simple issue, though. In addition, the stat file does not discriminate between programs: if anything else is writing to the disk, I would be waiting for its data to sync as well, when I only care about the integrity of this specific file.
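
For reference, the polling approach described above could be sketched roughly as follows. It assumes the block-layer stat layout documented for Linux, in which the ninth whitespace-separated field counts requests currently in flight at the device. As far as I know, that field does not count dirty pages still sitting in the page cache, so seeing 0 there is not proof that a particular file is durable. inflight_requests is a hypothetical helper, and the path is left as a parameter.

```c
#include <stdio.h>

/* Hypothetical helper: returns the ninth numeric field of a block-layer
 * stat file, or -1 if the file cannot be opened or has fewer fields.
 * Assumption: field 9 is the in-flight request count. */
long long inflight_requests(const char *stat_path)
{
    FILE *f = fopen(stat_path, "r");
    if (f == NULL)
        return -1;

    long long value = -1;
    for (int field = 1; field <= 9; field++) {
        if (fscanf(f, "%lld", &value) != 1) {
            value = -1;         /* short or malformed file */
            break;
        }
    }

    fclose(f);
    return value;
}
```

A caller would poll this in a loop with a short sleep, which still has the drawback already mentioned: the counter is device-wide, not per file.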

EDIT: As per a suggestion, I tried calling fsync() twice, first on the file and then on the directory:

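#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *output;
    int fd = open("/logs/data", O_WRONLY | O_CREAT, 0660); /* octal mode */
    output = fdopen(fd, "w");

    /* [fwrite several GiB of data to output] */

    fsync(fd);                         /* sync the file */
    int fdo = open("/logs", O_RDONLY);
    fsync(fdo);                        /* then its directory */

    fclose(output);                    /* also closes fd */
    close(fdo);

    return 0;
}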

This produced some interesting output. With a 4 GiB (4294967296 bytes) file, the size of the data on disk was 4294963200 bytes, which is exactly one page (4096 bytes) short of the total. It seems very close to a working solution, but it still does not guarantee every single byte.
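
A likely explanation for the missing 4096 bytes, offered as an assumption rather than a confirmed diagnosis: fwrite() can keep the tail of the data in stdio's user-space buffer (often 4096 bytes) until fflush() or fclose(), and in the code above fsync(fd) runs before that buffer has reached the kernel. A minimal sketch of the order fflush(), then fsync() on the file, then fsync() on the directory is shown here. save_durably and its parameters are hypothetical names, not part of the original program.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes to file_path through stdio and
 * returns 0 only if the flush and both fsync() calls succeeded. */
int save_durably(const char *file_path, const char *dir_path,
                 const void *data, size_t len)
{
    FILE *out = fopen(file_path, "w");
    if (out == NULL)
        return -1;

    int ok = fwrite(data, 1, len, out) == len;

    /* 1. Push stdio's user-space buffer into the kernel first. */
    ok = ok && fflush(out) == 0;
    /* 2. Only now can fsync() see every byte of the file. */
    ok = ok && fsync(fileno(out)) == 0;
    /* fclose() also closes the descriptor, so no separate close(). */
    ok = (fclose(out) == 0) && ok;

    /* 3. Sync the containing directory so the file's entry is flushed too. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    ok = ok && fsync(dfd) == 0;
    close(dfd);

    return ok ? 0 : -1;
}
```

For the program in the question, the call would look like save_durably("/logs/data", "/logs", buffer, size). Checking every return value matters here, since a failed fsync() is the only notice the program gets that the data may not have reached the device.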

Answer from Rachid K.:

Have you considered passing the O_DIRECT and/or O_SYNC flags to open()? From the open(2) manual:

O_DIRECT
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

O_SYNC
Write operations on the file will complete according to the requirements of synchronized I/O file integrity completion...

This article on LWN (quite old now) also provides some guidelines to ensure data integrity.
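
As a rough illustration of the O_SYNC suggestion, here is a minimal sketch that bypasses stdio and uses write() on a descriptor opened with O_SYNC, so each write() returns only after the kernel reports the data as written. write_all_sync is a hypothetical helper name. O_DIRECT is left out on purpose, because it typically imposes alignment requirements on the buffer and length that this sketch does not handle.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes to path with O_SYNC.
 * Returns 0 on success, -1 on any error. */
int write_all_sync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0660);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            close(fd);
            return -1;
        }
        p += n;                 /* write() may be partial: advance */
        len -= (size_t)n;
    }

    return close(fd);
}
```

Synchronous writes of several GiB can be much slower than buffered writes followed by one fsync(), so this is a trade-off rather than a drop-in improvement. For a newly created file, an fsync() on the containing directory is presumably still needed, as in the man page excerpt quoted in the question.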

Answer from stark:

To ensure that all data is written to non-volatile storage, the shutdown command issues the sd_shutdown call to each disk. See https://elixir.bootlin.com/linux/v4.10.17/source/drivers/scsi/sd.c#L3338

This issues two SCSI commands: SYNC_CACHE and START_STOP_UNIT, which are translated to the appropriate action on the underlying device. For SATA devices this means putting the drive in STANDBY mode, which spins down the disk.

Answer from Glärbo:

In your script:

  • Optional: Run /bin/sync to flush changes in page cache to storage

  • Unmount the target file system (umount /mountpoint), or remount it read-only.

    If the target file system includes root (/) and/or system binaries or libraries (/usr), you cannot unmount the filesystem. In that case, remount the target file system read-only (mount -o remount,ro /mountpoint).

  • Run shutdown -h now to power down the system

This is the standard sequence that ensures the filesystems are in a clean state at shutdown, and that all changes hit the storage media.