How to create a sparse file

Sparse files are files with “holes”. A hole takes up no physical space: the file system does not allocate any disk blocks for it until data is written into it. Reading from a hole returns null bytes.
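To make the difference between a file's logical size and its allocated space concrete, here is a minimal sketch. It assumes a POSIX system where `st_blocks` is counted in 512-byte units (the case on Linux and macOS, but worth checking elsewhere). The helper names `createSparseFile`, `logicalSize` and `allocatedBytes` are invented for this illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create 'path' if needed and set its length to 'size' bytes without
   writing any data. For a new file, the whole file is then a hole on
   file systems that support sparse files.
   Returns 0 on success, -1 on error. */
int createSparseFile(const char *path, off_t size) {
    int fd = open(path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) return -1;
    int rc = ftruncate(fd, size);
    if (close(fd) == -1) rc = -1;
    return rc;
}

/* Logical size in bytes (what ls -l shows), or -1 on error. */
long long logicalSize(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) return -1;
    return (long long)st.st_size;
}

/* Bytes actually allocated (roughly what du shows), or -1 on error.
   Assumption: st_blocks is expressed in 512-byte units. */
long long allocatedBytes(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) return -1;
    return (long long)st.st_blocks * 512;
}
```

Called from a small `main`, `createSparseFile("big.bin", 1 << 20)` should give a file whose `logicalSize` is 1048576, while `allocatedBytes` is typically far smaller. The exact figure depends on the file system.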

A common example of sparse files is virtual machine disk images. For instance, when I create a VirtualBox machine and assign it a maximum storage of 100 GB, only the storage corresponding to the actual data in the machine is consumed.

Here’s an example of how to create your own sparse file. The following program writes the string ‘text’ to ‘file’ starting at ‘offset’. If the file does not exist, it is created.

/*
    Usage:
    write file offset text
*/

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

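/* Print "error <errno> while ... <text>: <description>" and exit. */
void errorExit(const char *format, const char *text) {
    int savedErrno = errno; /* fprintf may overwrite errno */
    fprintf(stderr, format, savedErrno, text);
    fprintf(stderr, ": %s\n", strerror(savedErrno));
    exit(EXIT_FAILURE);
}

int main(int argc, char const *argv[]) {
    if (argc != 4 || strcmp(argv[1], "--help") == 0) {
        fprintf(stderr, "%s file offset text\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Parse the offset, rejecting trailing junk, overflow and negatives. */
    char *end;
    errno = 0;
    long long parsed = strtoll(argv[2], &end, 10);
    if (errno != 0 || end == argv[2] || *end != '\0' || parsed < 0) {
        fprintf(stderr, "'offset' parameter is %s but must be an integer >= 0\n",
                argv[2]);
        exit(EXIT_FAILURE);
    }
    off_t offset = (off_t)parsed;

    int fd = open(argv[1], O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        errorExit("error %d while opening file %s", argv[1]);
    }

    /* Seeking past the end of the file is allowed; the gap becomes a hole. */
    if (lseek(fd, offset, SEEK_SET) == -1) {
        errorExit("error %d while seeking in file %s", argv[1]);
    }

    /* A short write is treated as an error. */
    size_t length = strlen(argv[3]);
    if (write(fd, argv[3], length) != (ssize_t)length) {
        errorExit("error %d while writing file %s", argv[1]);
    }

    if (close(fd) == -1) perror("close");

    exit(EXIT_SUCCESS);
}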

This other program truncates ‘file’ to size ‘length’. If ‘length’ is greater than the current size of the file, the file is extended with a hole, which reads back as null bytes.

/*
    Usage:
    truncate file length
*/

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

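/* Print "error <errno> while ... <text>: <description>" and exit. */
void errorExit(const char *format, const char *text) {
    int savedErrno = errno; /* fprintf may overwrite errno */
    fprintf(stderr, format, savedErrno, text);
    fprintf(stderr, ": %s\n", strerror(savedErrno));
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[]) {
    if (argc != 3 || strcmp(argv[1], "--help") == 0) {
        fprintf(stderr, "%s file length\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Parse the length, rejecting trailing junk, overflow and negatives. */
    char *end;
    errno = 0;
    long long length = strtoll(argv[2], &end, 10);
    if (errno != 0 || end == argv[2] || *end != '\0' || length < 0) {
        fprintf(stderr, "'length' parameter is %s but must be an integer >= 0\n",
                argv[2]);
        exit(EXIT_FAILURE);
    }

    if (truncate(argv[1], (off_t)length) == -1)
        errorExit("error %d while truncating file %s", argv[1]);

    exit(EXIT_SUCCESS);
}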
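The claim that the extended region reads back as null bytes can be checked directly. The following is a rough sketch rather than part of the original programs: the helper name `extendedRegionIsZero` is made up, and it uses `ftruncate()` and `pread()` on an open descriptor instead of `truncate()` on a path.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Extend 'path' to 'newSize' bytes with ftruncate() and check that every
   byte in the newly added region reads back as '\0'.
   Returns 1 if all added bytes are zero, 0 if not, and -1 on error
   (including when newSize is not larger than the current size). */
int extendedRegionIsZero(const char *path, off_t newSize) {
    int fd = open(path, O_RDWR);
    if (fd == -1) return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || newSize <= st.st_size ||
        ftruncate(fd, newSize) == -1) {
        close(fd);
        return -1;
    }

    int result = 1;
    off_t pos = st.st_size; /* first byte of the added region */
    char buf[4096];
    while (pos < newSize && result == 1) {
        size_t want = sizeof buf;
        if ((off_t)want > newSize - pos) want = (size_t)(newSize - pos);
        ssize_t got = pread(fd, buf, want, pos);
        if (got <= 0) { result = -1; break; }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != '\0') { result = 0; break; }
        }
        pos += got;
    }

    close(fd);
    return result;
}
```

Called on a 5-byte file with a new size of 5000, this should return 1: the 4995 added bytes all read as zero, whether or not the file system backs them with real blocks.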

Here are the programs above in action. Let’s get started by creating a file:

% ./write myfile 0 hello
% cat myfile             
hello
% ll myfile   
-rw-------  1 xxx  staff  5 14 Mar 20:43 myfile
% du -h myfile               
4.0K    myfile

‘myfile’ contains 5 bytes. However, since most file systems allocate space in blocks, the size on disk is one block of 4096 bytes.

Next, we are going to increase the size of the file by adding holes:

% ./truncate myfile 5000       
% ll myfile
-rw-------  1 xxx  staff  5000 14 Mar 20:49 myfile
% du -h myfile
4.0K    myfile

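The new size is 5000 bytes, and yet the space on disk remains 4096 bytes! The first 4096 bytes live in the block that was already allocated, and the remaining 904 bytes are a hole. Now for the final trick: let’s write something into the hole without changing the size of the file: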

% ./write myfile 4500 bye
% ll myfile
-rw-------  1 xxx  staff  5000 14 Mar 20:54 myfile
% du -h myfile
8.0K    myfile

The size of the file hasn’t changed, but the file system has allocated a second block: offset 4500 lies beyond the first 4096 bytes, in what was a hole until now.
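The arithmetic behind that second block is simple enough to sketch. The helpers below are hypothetical, and the 4096-byte block size is just the value `du` suggested in the session above. Other file systems may use a different size.

```c
/* Index of the block that contains byte 'offset', for blocks of
   'blockSize' bytes. Assumes offset >= 0 and blockSize > 0. */
long long blockIndex(long long offset, long long blockSize) {
    return offset / blockSize;
}

/* Number of distinct blocks touched by writing 'length' bytes starting
   at 'offset'. Assumes length > 0 and blockSize > 0. */
long long blocksTouched(long long offset, long long length,
                        long long blockSize) {
    long long first = blockIndex(offset, blockSize);
    long long last = blockIndex(offset + length - 1, blockSize);
    return last - first + 1;
}
```

With 4096-byte blocks, `blockIndex(0, 4096)` is 0 for “hello” and `blockIndex(4500, 4096)` is 1 for “bye”, so the second write lands in a block that had never been allocated. A write that straddles a boundary, such as 5 bytes at offset 4094, touches two blocks.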
