NTFS supports sparse files, but I want to make sure the files I have to write to (which might have been created, marked as sparse, and partially filled by another application) are fully allocated, so that I won't get an out-of-space error when writing to the middle of such a file later (i.e. if out-of-space errors are going to happen, they should happen now).
Is there a WinAPI function to ensure a sparse file is fully allocated (preferably atomically), like we have posix_fallocate() in POSIX systems? If not, how do I preallocate it?
I don't think these are duplicates:
- How do you pre-allocate space for a file in C/C++ on Windows? - the question is about files in general, which in Windows are non-sparse by default, so the answers don't address how to do it with sparse files.
- How to create a file of a particular size on windows without io? - this question specifically asks about posix_fallocate(), but then goes on to ask almost the opposite, i.e. how to create a large file quickly on Windows (for which one commenter suggested NTFS sparse files).
Following the link from this documentation page, I could think of 3 ways of pre-allocating the sparse ranges of a file, but none of them is atomic the way posix_fallocate() is. I was hoping someone could point to an existing solution in the WinAPI. Here they are:
Just copy the file
Copy the full file to another, delete the old file, then rename. This approach has the drawback of always being slow, as it has to read and write the whole file, and potentially takes twice the file space on disk.
It could be improved a little by checking the FILE_ATTRIBUTE_SPARSE_FILE attribute, so you can skip the operation if the file is not marked as sparse.
Copy the file in place
Open the file twice, once for reading and once for writing, and alternate between reading from one handle and writing to the other until the whole file has been rewritten. The performance is as bad as the first solution, but at least it doesn't take more space than the full file size.
This (maybe) can be improved by reading and writing only one byte per cluster (if you know the cluster size), because the whole cluster has to be allocated anyway. Allocated clusters will keep their old value, and new clusters will be automatically filled with the default value. I say maybe because writing one byte or one full cluster is the same for the NTFS layer, so it may not be worth the extra system calls to fseek() the file.
Write zeros to the sparse region
As suggested in the comments of the question, you can use FSCTL_QUERY_ALLOCATED_RANGES to figure out the ranges where the file is allocated, and write zeros to the space between them. Actually, I've read somewhere that the default read value for unallocated ranges is not necessarily zero, so, to be safe, in my implementation I read one byte from one of those regions and use this value to write back to the spaces between allocations. Again, only one byte per cluster is sufficient.
Depending on how much of the file is allocated, the performance can be much better than the other methods.
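For reference, here is a rough sketch of this third approach in C. It is not my exact implementation, just an illustration under a few assumptions: the names byte_range, find_holes, touch_hole and fill_sparse_holes are made up for the sketch, the ranges reported by FSCTL_QUERY_ALLOCATED_RANGES are assumed to come back sorted by offset, the handle is assumed to be opened for reading and writing without FILE_FLAG_OVERLAPPED, and the caller is assumed to know the cluster size (e.g. from GetDiskFreeSpace). Instead of assuming holes read as zero, it reads the first byte of each cluster and writes the same byte back. The Windows-specific part is guarded by _WIN32 so the gap computation can be compiled and checked anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the byte range [offset, offset + length). */
typedef struct { uint64_t offset, length; } byte_range;

/* Stores the gaps between the `alloc` ranges inside [start, end) in `holes`.
   `alloc` must be sorted by offset. Returns the number of gaps found;
   only the first `max_holes` of them are stored. */
size_t find_holes(const byte_range *alloc, size_t n_alloc,
                  uint64_t start, uint64_t end,
                  byte_range *holes, size_t max_holes)
{
    size_t n = 0;
    uint64_t pos = start;
    assert(start <= end);
    for (size_t i = 0; i <= n_alloc; i++) {
        /* Where the next allocated range begins (or `end` after the last). */
        uint64_t next = (i < n_alloc && alloc[i].offset < end) ? alloc[i].offset : end;
        if (next > pos) {
            if (n < max_holes) {
                holes[n].offset = pos;
                holes[n].length = next - pos;
            }
            n++;
        }
        if (i < n_alloc && alloc[i].offset + alloc[i].length > pos)
            pos = alloc[i].offset + alloc[i].length;
    }
    return n;
}

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>

/* Rewrites the first byte of every cluster in `hole` with its current value,
   relying on the write (as described above) to get the cluster allocated. */
static BOOL touch_hole(HANDLE h, byte_range hole, uint64_t cluster)
{
    uint64_t off = hole.offset, end = hole.offset + hole.length;
    while (off < end) {
        OVERLAPPED ov = {0};   /* only used to pass the file offset */
        char byte;
        DWORD done;
        ov.Offset = (DWORD)off;
        ov.OffsetHigh = (DWORD)(off >> 32);
        if (!ReadFile(h, &byte, 1, &done, &ov) || done != 1) return FALSE;
        if (!WriteFile(h, &byte, 1, &done, &ov) || done != 1) return FALSE;
        off = (off / cluster + 1) * cluster;   /* start of the next cluster */
    }
    return TRUE;
}

/* Hypothetical entry point: touches every hole of the file open on `h`. */
BOOL fill_sparse_holes(HANDLE h, uint64_t cluster)
{
    BY_HANDLE_FILE_INFORMATION info;
    LARGE_INTEGER size;
    uint64_t pos = 0;
    if (!GetFileInformationByHandle(h, &info) || !GetFileSizeEx(h, &size))
        return FALSE;
    if (!(info.dwFileAttributes & FILE_ATTRIBUTE_SPARSE_FILE))
        return TRUE;                           /* not sparse: nothing to do */
    for (;;) {
        FILE_ALLOCATED_RANGE_BUFFER in, out[64];
        byte_range alloc[64], holes[65];
        DWORD bytes = 0;
        in.FileOffset.QuadPart = (LONGLONG)pos;
        in.Length.QuadPart = size.QuadPart - (LONGLONG)pos;
        BOOL ok = DeviceIoControl(h, FSCTL_QUERY_ALLOCATED_RANGES, &in, sizeof in,
                                  out, sizeof out, &bytes, NULL);
        size_t n = bytes / sizeof out[0];
        /* ERROR_MORE_DATA means `out` filled up and more ranges remain. */
        if (!ok && (GetLastError() != ERROR_MORE_DATA || n == 0)) return FALSE;
        for (size_t i = 0; i < n; i++) {
            alloc[i].offset = (uint64_t)out[i].FileOffset.QuadPart;
            alloc[i].length = (uint64_t)out[i].Length.QuadPart;
        }
        /* On a partial result, only scan up to the last range returned. */
        uint64_t end = ok ? (uint64_t)size.QuadPart
                          : alloc[n - 1].offset + alloc[n - 1].length;
        size_t n_holes = find_holes(alloc, n, pos, end, holes, 65);
        for (size_t i = 0; i < n_holes; i++)
            if (!touch_hole(h, holes[i], cluster)) return FALSE;
        if (ok) return TRUE;
        pos = end;                             /* continue after this chunk */
    }
}
#endif
```

You would call fill_sparse_holes(handle, cluster_size) before doing your real writes and treat a FALSE return (checking GetLastError()) as the early out-of-space signal. Like the other two approaches it is not atomic: another process can still change the allocation while the loop runs.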