Linux Filesystem Drivers

Robbert Haarman

2010-12-11


Introduction

As part of the netfs project, I wanted to write a filesystem driver for Linux, so that files shared through netfs could be accessed like any other file. However, I had a hard time finding good documentation. I found neither a clear overview of which functions a filesystem driver must provide and how, nor a step-by-step guide to writing one. That step-by-step guide is what I intend to provide here.


The Virtual Filesystem

Linux, like most UNIX-like operating systems that I know, supports multiple filesystems through an abstraction layer. In Linux, this layer is called VFS, for Virtual FileSystem. This way, the kernel need not worry about the specifics of any one filesystem; it just makes VFS calls, and VFS calls the right functions in the filesystem driver. Conversely, filesystem drivers can rely on the VFS for certain operations, simplifying filesystem implementation a bit.
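The core of this abstraction is dispatch through tables of function pointers: the generic layer calls whatever function the driver put in the table, and falls back to an error when the slot is empty. The following is a minimal userspace sketch of that idea, not kernel code. The names model_file_ops, model_vfs_read and hello_read are hypothetical. Returning -EINVAL for a missing read operation mirrors what the kernel's vfs_read does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-in for struct file_operations */
struct model_file_ops {
	ssize_t (*read)(char *buf, size_t len);
};

/* A "driver" whose read always yields the string "hello" */
static ssize_t hello_read(char *buf, size_t len)
{
	static const char msg[] = "hello";
	size_t n = sizeof(msg) - 1;

	if (n > len)
		n = len;
	memcpy(buf, msg, n);
	return (ssize_t)n;
}

static const struct model_file_ops hello_ops = { .read = hello_read };
static const struct model_file_ops empty_ops = { .read = NULL };

/* Generic layer: knows nothing about the driver, only its ops table */
static ssize_t model_vfs_read(const struct model_file_ops *ops,
			      char *buf, size_t len)
{
	if (ops == NULL || ops->read == NULL)
		return -EINVAL;	/* driver does not support reading */
	return ops->read(buf, len);
}
```

The generic code never changes when a new driver is added. Only a new table is filled in, which is exactly what the rest of this article does for tutorfs.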


Initialization and Cleanup

In this section, we develop the skeleton of our filesystem, which I have called tutorfs. First, we need to include some header files:

#include <linux/module.h>
#include <linux/version.h>
#include <linux/init.h>
#include <linux/fs.h>

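Then, we write the initialization and exit code, and use the module_init and module_exit macros to tell the kernel which functions to run when the module is loaded and unloaded. Note that tutorfs_type, defined below, must appear in the source file before these functions:

static int __init tutorfs_init(void) {
	/* Pass failures (e.g. -EBUSY if the name is taken) back to insmod */
	return register_filesystem(&tutorfs_type);
}

static void __exit tutorfs_exit(void) {
	/* Without this, the kernel keeps a pointer into the unloaded module */
	unregister_filesystem(&tutorfs_type);
}

module_init(tutorfs_init);
module_exit(tutorfs_exit);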
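register_filesystem adds the type to a kernel-wide list keyed by name, and unregister_filesystem removes it again. Because of this, an exit function that does not unregister leaves a stale entry pointing into freed module memory. The following userspace model sketches that bookkeeping. The model_* names are hypothetical. The error codes (-EBUSY for a duplicate name, -EINVAL when unregistering something that was never registered) follow the kernel's behaviour, but this is a simplification, not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct file_system_type: a name and a link */
struct model_fs_type {
	const char *name;
	struct model_fs_type *next;
};

static struct model_fs_type *model_fs_list;	/* head of the registry */

/* Return the link pointing at the entry called name, or at the NULL tail */
static struct model_fs_type **model_find(const char *name)
{
	struct model_fs_type **p;

	for (p = &model_fs_list; *p != NULL; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

static int model_register(struct model_fs_type *fs)
{
	struct model_fs_type **p = model_find(fs->name);

	if (*p != NULL)
		return -EBUSY;		/* name already registered */
	fs->next = NULL;
	*p = fs;			/* append at the tail */
	return 0;
}

static int model_unregister(struct model_fs_type *fs)
{
	struct model_fs_type **p;

	for (p = &model_fs_list; *p != NULL; p = &(*p)->next) {
		if (*p == fs) {
			*p = fs->next;	/* unlink from the list */
			fs->next = NULL;
			return 0;
		}
	}
	return -EINVAL;			/* was never registered */
}
```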

As you may have guessed, the function register_filesystem registers the filesystem with the kernel. Its argument is a structure describing the filesystem. For the filesystem developed here, we shall use the following definition:

static struct file_system_type tutorfs_type = {
        .name           = "tutorfs",
        .get_sb         = tutorfs_get_sb,
        .kill_sb        = kill_anon_super
};

This describes a filesystem named "tutorfs", which is mounted by calling tutorfs_get_sb and unmounted by calling kill_anon_super. The latter function is provided by the VFS. It tears down an anonymous superblock, that is, one that is not backed by a block device. At this point, a tutorfs_get_sb that simply returns ERR_PTR(-EINVAL) would already give us a valid filesystem module, providing a filesystem that cannot be mounted. Let's go on and implement a working filesystem, though.
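In this interface get_sb returns a pointer, so an error code cannot be returned directly. Instead, the kernel encodes it in the pointer itself with ERR_PTR, and callers test the result with IS_ERR and decode it with PTR_ERR. This works because the topmost MAX_ERRNO (4095) addresses are never valid pointers. The following sketch re-creates those helpers in userspace to show the encoding. It is a model of <linux/err.h>, not the header itself, and stub_get_sb is a hypothetical stand-in for the "cannot be mounted" stub described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace re-creation of the kernel's <linux/err.h> helpers */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct super_block;	/* opaque here: only pointers are passed around */

/* Hypothetical stub in the style of the old get_sb: refuse every mount */
static struct super_block *stub_get_sb(void)
{
	return ERR_PTR(-EINVAL);
}
```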


The Superblock

When a filesystem is mounted, the VFS calls the get_sb function from its file_system_type, which is responsible for setting up a superblock structure with appropriate values. For disk-based filesystems, the on-disk superblock contains information about the filesystem, such as the block size it uses, where the root directory is located, and when it was last checked. The get_sb function would then read that superblock from the device, extract these values, and store them in the appropriate fields of the in-memory superblock struct.
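To make the "read, extract, store" step concrete, here is a userspace sketch that parses a raw superblock buffer. The on-disk layout (a 32-bit little-endian magic followed by the base-2 logarithm of the block size), DEMO_MAGIC and the demo_* names are all hypothetical and belong to no real filesystem. The one fact the sketch does rely on is that the kernel keeps the block size twice, as s_blocksize and s_blocksize_bits, with s_blocksize equal to 1 << s_blocksize_bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout (NOT a real format):
 *   bytes 0-3: magic, little-endian
 *   bytes 4-7: log2 of the block size, little-endian */
#define DEMO_MAGIC 0x54555446u

struct demo_sb_info {
	unsigned long blocksize;	/* mirrors s_blocksize */
	unsigned char blocksize_bits;	/* mirrors s_blocksize_bits */
};

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static int demo_read_super(const unsigned char *raw, size_t len,
			   struct demo_sb_info *out)
{
	uint32_t bits;

	if (len < 8 || get_le32(raw) != DEMO_MAGIC)
		return -EINVAL;		/* not our filesystem */
	bits = get_le32(raw + 4);
	if (bits < 9 || bits > 16)
		return -EINVAL;		/* accept 512 B .. 64 KiB only */
	out->blocksize_bits = (unsigned char)bits;
	out->blocksize = 1UL << bits;	/* size == 1 << bits */
	return 0;
}
```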

Our filesystem will reside in RAM, so there is no superblock to read. For this case, the VFS provides a function named get_sb_nodev, which allocates an anonymous superblock and calls a filler function we specify to put meaningful values in it. Thus, our tutorfs_get_sb reduces to:


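/* get_sb signature as used by early 2.6 kernels */
static struct super_block *tutorfs_get_sb(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data) {
        /* No device to read: let the VFS allocate an anonymous superblock */
        return get_sb_nodev(fs_type, flags, data, tutorfs_fill_sb);
}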

and tutorfs_fill_sb is defined as:

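#define ROOT_INO 1

static int tutorfs_fill_sb(struct super_block *sb, void *data, int silent) {
        struct inode *root;

        sb->s_op = &tutorfs_ops;

        /* s_blocksize must equal 1 << s_blocksize_bits */
        sb->s_blocksize = 4096;
        sb->s_blocksize_bits = 12;

        /* iget() allocates the inode and calls our read_inode to fill it */
        root = iget(sb, ROOT_INO);
        if (!root) return -ENOMEM;

        sb->s_root = d_alloc_root(root);
        if (!sb->s_root) {
                iput(root);     /* drop the reference obtained from iget */
                return -ENOMEM;
        }

        /* Indicate success */
        return 0;
}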

This function sets the block size for our filesystem, creates an inode and a directory entry for the root directory (ROOT_INO can be any nonzero integer; I use 1), and tells the VFS about the operations our filesystem supports. These are passed in a struct super_operations. For our filesystem, we use the following definition:

static struct super_operations tutorfs_ops = {
        .read_inode     = tutorfs_read_inode,
        .put_super      = tutorfs_put_super,
};

Oh joy! Two more functions we need to provide! put_super is the counterpart of tutorfs_fill_sb. It is called when the kernel is done with the superblock, typically at unmount time, and is normally used to free any structures allocated when the superblock was filled in. In our case, there aren't any such structures, so the function is empty:

static void tutorfs_put_super(struct super_block *sb) {
        return;
}
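A filesystem that does keep per-mount state usually hangs it off the superblock's s_fs_info pointer when filling the superblock, and releases it in put_super. The following userspace sketch models that pairing. model_super_block and the demo_* names are hypothetical, and calloc/free stand in for the kernel allocators (kzalloc/kfree) a real driver would use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Cut-down stand-in for struct super_block: only the private pointer */
struct model_super_block {
	void *s_fs_info;
};

/* Hypothetical per-mount state a RAM filesystem might keep */
struct demo_fs_info {
	unsigned long next_ino;	/* next inode number to hand out */
};

static int demo_fill_sb(struct model_super_block *sb)
{
	struct demo_fs_info *fsi = calloc(1, sizeof(*fsi));

	if (fsi == NULL)
		return -ENOMEM;
	fsi->next_ino = 2;	/* 1 is taken by the root directory */
	sb->s_fs_info = fsi;
	return 0;
}

/* Mirror image of demo_fill_sb: release what it allocated */
static void demo_put_super(struct model_super_block *sb)
{
	free(sb->s_fs_info);
	sb->s_fs_info = NULL;
}
```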

tutorfs_read_inode will be explained in the next section.


Inodes

First, a short introduction to the concept of inodes. An inode describes a file: it contains information about the file type, ownership and permissions, and the location of the file's data on disk. It does not contain the name of the file; that role is reserved for directory entries. Some filesystems (notably FAT, see footnote 1) do not have the concept of inodes. A driver for such a filesystem must make up something sensible whenever the VFS asks it to look up an inode.
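The split between names and inodes shows up clearly in a toy model. In the following sketch, two directory entries name the same inode, which is what a hard link is. The toy_* types and toy_lookup are hypothetical illustrations, far simpler than the kernel's struct inode and struct dentry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: inodes carry metadata, directory entries carry names */
struct toy_inode {
	unsigned long ino;
	unsigned int mode;
	unsigned int nlink;	/* how many names refer to this inode */
};

struct toy_dirent {
	const char *name;
	struct toy_inode *inode;
};

/* Resolve a name to its inode by scanning a directory's entries */
static struct toy_inode *toy_lookup(const struct toy_dirent *dir, size_t n,
				    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(dir[i].name, name) == 0)
			return dir[i].inode;
	return NULL;
}
```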

Since our filesystem resides in volatile memory, there are no inodes to read. Instead, we fill the structure passed to us by the VFS with appropriate values.

static void tutorfs_read_inode(struct inode *inode) {
        ino_t ino = inode->i_ino;

        /* Defaults: readable by everyone, writable by the owner */
        inode->i_mode = S_IRUGO | S_IWUSR;
        inode->i_nlink = 1;
        inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
        inode->i_blocks = 0;
        inode->i_blksize = inode->i_sb->s_blocksize;

        if (ino == ROOT_INO) {
                /* Special case for the root directory */
                inode->i_mode |= S_IFDIR | S_IXUGO;
                inode->i_nlink = 2;     /* a directory also links to itself via "." */
                inode->i_fop = &simple_dir_operations;
                inode->i_op = &simple_dir_inode_operations;
        }
}

There is a lot of stuff in that function, but most of it should be pretty self-explanatory for people who have experience with filesystems on UNIX-like systems. The i_op and i_fop fields specify the inode operations and file operations for an object. Inode operations are things like create, mkdir and getattr, whereas file operations are the more common open, read, flush, etc.

The simple_dir_operations and simple_dir_inode_operations structures referred to here are provided by the Linux kernel. They are not sufficient for a fully functional filesystem, but we use them anyway, precisely because they allow us to mount something before we have fully implemented the filesystem.


Mounting

We now have:

  1. Module initialization and finalization code
  2. A filesystem type struct
  3. A function to set up the superblock and mount the filesystem
  4. A function that fills in inodes, including a partially implemented root directory

This is enough for a filesystem that can be mounted. So let's do so! The download section provides you with the source of the filesystem developed so far, plus instructions on how to use it.


Download

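The tarball below contains the code written so far, and a Makefile for Linux 2.6. It is available under the terms of the MIT license. Note that the code uses interfaces from early 2.6 kernels (get_sb with this signature, iget, the read_inode superblock operation and i_blksize). These were removed later in the 2.6 series, so the code will not build on newer kernels without porting.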

Download and extract the tarball, run make, insert the module with insmod, then run mount -t tutorfs tutorfs /mount/point. The second tutorfs can be anything you like. It normally names the device the filesystem resides on, but in our case there is no such device. Replace /mount/point with the directory you want to mount the filesystem on.
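Before mounting, you can check that insmod actually registered the type by looking at /proc/filesystems. Each line there holds an optional "nodev" marker, a tab, and a filesystem name. tutorfs_type sets no FS_REQUIRES_DEV flag, so it should appear as "nodev", followed by a tab and "tutorfs". The following sketch is a small userspace checker. fs_listed and fs_registered are hypothetical helpers, and the 8 KiB read buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/filesystems format.
 * Returns 0 if name is absent, 1 if listed without "nodev", 2 if "nodev". */
static int fs_listed(const char *text, const char *name)
{
	char line[256];
	const char *p = text;

	while (*p != '\0') {
		size_t len = strcspn(p, "\n");	/* length of this line */
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
		const char *tab;

		memcpy(line, p, n);
		line[n] = '\0';
		tab = strchr(line, '\t');
		if (tab != NULL && strcmp(tab + 1, name) == 0)
			return strncmp(line, "nodev", 5) == 0 ? 2 : 1;
		p += len;
		if (*p == '\n')
			p++;
	}
	return 0;
}

/* Check the live system; returns -1 if /proc/filesystems is unreadable */
static int fs_registered(const char *name)
{
	static char buf[8192];	/* assumed large enough for the list */
	FILE *f = fopen("/proc/filesystems", "r");
	size_t n;

	if (f == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return fs_listed(buf, name);
}
```

After insmod, fs_registered("tutorfs") should return 2. After rmmod, it should return 0, provided the exit function unregisters the type.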


1 FAT, which stands for File Allocation Table, refers to the filesystems used by DOS and Microsoft Windows. It is a very simple filesystem, not suitable for multi-user environments, but widely used in devices such as digital cameras.

