Linux Filesystem Drivers
2010-12-11
Introduction
As part of the netfs project, I wanted to write a filesystem driver for Linux, so that files shared through netfs could be accessed like any other file. However, I had a hard time finding good documentation: I found neither a good overview of which functions need to be provided and how, nor a good step-by-step guide. That step-by-step guide is what I intend to provide here.
The Virtual Filesystem
Linux, like most UNIX-like operating systems that I know, supports multiple filesystems through an abstraction layer. In Linux, this layer is called VFS, for Virtual FileSystem. This way, the rest of the kernel need not worry about the specifics of any one filesystem; it just makes VFS calls, and the VFS calls the right functions in the filesystem driver. Conversely, filesystem drivers can rely on the VFS for certain operations, which simplifies filesystem implementation a bit.
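To make the idea of "VFS calls the right functions in the filesystem driver" a bit more concrete, here is a small userspace sketch of dispatch through a table of function pointers. It is not kernel code, and all the demo_ names are made up for illustration; the real VFS tables are much larger.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for a VFS operations table. */
struct demo_fs_ops {
    const char *name;
    /* Copy up to len bytes of file contents into buf; return the count. */
    size_t (*read)(char *buf, size_t len);
};

/* "Driver" A: every file contains the text "ram". */
static size_t demo_ramfs_read(char *buf, size_t len) {
    size_t n = len < 3 ? len : 3;
    memcpy(buf, "ram", n);
    return n;
}

/* "Driver" B: every file is empty. */
static size_t demo_nullfs_read(char *buf, size_t len) {
    (void)buf;
    (void)len;
    return 0;
}

static const struct demo_fs_ops demo_ramfs = { "demo_ramfs", demo_ramfs_read };
static const struct demo_fs_ops demo_nullfs = { "demo_nullfs", demo_nullfs_read };

/* The generic layer: it knows nothing about either driver. */
static size_t demo_vfs_read(const struct demo_fs_ops *fs, char *buf, size_t len) {
    return fs->read(buf, len);
}
```

The caller only ever uses demo_vfs_read; which driver function runs depends entirely on the table it is handed. The structures we fill in later in this tutorial play the same role for the real VFS.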
Initialization and Cleanup
In this section, we develop the skeleton of our filesystem, which I have called tutorfs. First, we need to include some header files:
#include <linux/module.h>
#include <linux/version.h>
#include <linux/init.h>
#include <linux/fs.h>
Then, we write the initialization and exit code, and use the module_init and module_exit macros to tell the kernel which functions to call when the module is loaded and unloaded:
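int __init tutorfs_init(void) {
    /* Register the filesystem type; pass any error back to the module loader */
    return register_filesystem(&tutorfs_type);
}

void __exit tutorfs_exit(void) {
    /* Undo the registration, so the kernel keeps no pointer into unloaded code */
    unregister_filesystem(&tutorfs_type);
}

module_init(tutorfs_init);
module_exit(tutorfs_exit);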
As you may have guessed, the function register_filesystem registers the filesystem with the kernel. Its argument is a structure describing the filesystem, which must be defined in the source file before tutorfs_init refers to it. For the filesystem developed here, we shall use the following definition:
static struct file_system_type tutorfs_type = {
    /* Keeps the module from being unloaded while the filesystem is mounted */
    .owner = THIS_MODULE,
    .name = "tutorfs",
    .get_sb = tutorfs_get_sb,
    .kill_sb = kill_anon_super,
};
This describes a filesystem named "tutorfs", which is mounted by calling tutorfs_get_sb and unmounted by kill_anon_super. The latter function is provided by the VFS and does the finalization for an anonymous superblock, that is, one that is not backed by a block device on a real disk. At this point, we could put in a tutorfs_get_sb function that simply fails with -EINVAL, and that would give us a valid filesystem module providing a filesystem that cannot be mounted. Let's go on and implement a working filesystem, though.
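To show what registration amounts to, here is a userspace model of a list of filesystem types keyed by name. It is only a sketch: the demo_ names are hypothetical, there is no locking, and the only kernel behaviour it is meant to mirror is that register_filesystem refuses a second type with an already registered name (-EBUSY) and that unregistering an unknown type fails (-EINVAL).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type. */
struct demo_fs_type {
    const char *name;
    struct demo_fs_type *next;
};

/* Head of the list of registered types. */
static struct demo_fs_type *demo_fs_list;

/* Return the link pointing at the entry called name, or the final NULL link. */
static struct demo_fs_type **demo_find(const char *name) {
    struct demo_fs_type **p = &demo_fs_list;
    while (*p && strcmp((*p)->name, name) != 0)
        p = &(*p)->next;
    return p;
}

/* Append fs to the list unless its name is already taken. */
static int demo_register(struct demo_fs_type *fs) {
    struct demo_fs_type **p = demo_find(fs->name);
    if (*p)
        return -EBUSY;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Remove fs from the list; fail if this exact object is not registered. */
static int demo_unregister(struct demo_fs_type *fs) {
    struct demo_fs_type **p = demo_find(fs->name);
    if (*p != fs)
        return -EINVAL;
    *p = fs->next;
    fs->next = NULL;
    return 0;
}
```

When mount -t tutorfs is run later on, the kernel performs the same kind of lookup by name to find the structure we registered, and from there the function that performs the mount.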
The Superblock
When a filesystem is mounted, the VFS calls the get_sb function, which is responsible for setting up the superblock structure with appropriate values. For conventional, disk-based filesystems, the on-disk superblock contains information about the filesystem, such as the block size it uses, where the root directory is located, and when it was last checked. The get_sb function would then read the superblock from disk, extract these values, and store them in the appropriate fields of the superblock struct.
Our filesystem will reside in RAM, so there is no superblock to read. The VFS provides a function named get_sb_nodev for this case, which calls a specified filler function to put meaningful values in the superblock struct. Thus, our tutorfs_get_sb reduces to:
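/* This is the signature used by early 2.6 kernels; later 2.6 kernels changed
   this callback to return int and take an extra struct vfsmount * argument. */
struct super_block *tutorfs_get_sb(struct file_system_type *fs_type,
        int flags, const char *dev_name, void *data) {
    /* No device to read: let the VFS create the superblock and call our filler */
    return get_sb_nodev(fs_type, flags, data, tutorfs_fill_sb);
}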
and tutorfs_fill_sb is defined as:
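int tutorfs_fill_sb(struct super_block *sb, void *data, int silent) {
    struct inode *root;

    sb->s_op = &tutorfs_ops;
    sb->s_blocksize = 4096;
    sb->s_blocksize_bits = 12;    /* log2(4096) */

    /* iget() calls our read_inode to fill in the root inode */
    root = iget(sb, ROOT_INO);
    if(!root) return -ENOMEM;

    sb->s_root = d_alloc_root(root);
    if(!sb->s_root) {
        /* Drop our reference to the inode, or it would leak */
        iput(root);
        return -ENOMEM;
    }
    /* Indicate success */
    return 0;
}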
This function sets the block size for our filesystem, creates an inode for the root directory (ROOT_INO can be any integer value; I use 1), and tells the VFS about the operations our filesystem supports. These are passed in a struct super_operations; for our filesystem, we use the following definition:
static struct super_operations tutorfs_ops = {
    .read_inode = tutorfs_read_inode,
    .put_super = tutorfs_put_super,
};
Oh joy! Two more functions we need to provide! put_super is the reverse of get_sb; it is called when the kernel is done with the superblock. Its typical use is to free any structures allocated in get_sb. In our case, there aren't any such structures, so the function is empty:
static void tutorfs_put_super(struct super_block *sb) {
    /* Nothing was allocated at mount time, so there is nothing to free */
}
tutorfs_read_inode will be explained in the next section.
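A short aside on the two block size fields set in tutorfs_fill_sb: s_blocksize_bits is the base-2 logarithm of s_blocksize, which is why 4096 goes together with 12. The userspace sketch below (the helper names are made up) shows the relation, and why storing the logarithm is handy.

```c
/* Return log2(size) if size is a power of two, or -1 otherwise. */
static int demo_blocksize_bits(unsigned long size) {
    int bits = 0;
    if (size == 0 || (size & (size - 1)) != 0)
        return -1;
    while ((1UL << bits) < size)
        bits++;
    return bits;
}

/* Index of the block containing byte offset pos: a shift, not a division. */
static unsigned long demo_block_of(unsigned long pos, int bits) {
    return pos >> bits;
}
```

With a block size of 4096, demo_blocksize_bits gives 12, and byte offset 8192 falls in block 2. If you pick a different block size for your own filesystem, remember to change both fields together.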
Inodes
First, a short introduction to the concept of inodes. An inode describes a file; that is, it contains information about the file type, ownership and permissions, and the location of the file's data on disk. It does not contain the name of the file; that role is reserved for directory entries. Some filesystems (notably FAT; see note 1) do not have the concept of inodes. A driver for such a filesystem must make up something sensible whenever the VFS asks it to look up an inode.
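The split between inodes and names can be observed from userspace. The sketch below (the function name and the temporary path are my own choices, and it assumes /tmp allows hard links) creates a file, gives it a second name with link(), and checks that both names lead to the same inode.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, give it a second name with link(), and report
 * whether both names resolve to the same inode. Returns 1 if they do,
 * 0 if they do not, and -1 on any error.
 */
static int demo_names_share_inode(void) {
    char first[] = "/tmp/demo_inode_XXXXXX";
    char second[sizeof first + 8];
    struct stat a, b;
    int fd, result = -1;

    fd = mkstemp(first);
    if (fd < 0)
        return -1;
    close(fd);

    /* The second name is a new directory entry for the same inode. */
    snprintf(second, sizeof second, "%s.link", first);
    if (link(first, second) == 0) {
        if (stat(first, &a) == 0 && stat(second, &b) == 0)
            result = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);
        unlink(second);
    }
    unlink(first);
    return result;
}
```

Two directory entries, one inode: the names live in the directory, and everything else lives in the inode they both point to.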
Since our filesystem resides in volatile memory, there are no inodes to read. Instead, we fill the structure passed to us by the VFS with appropriate values.
void tutorfs_read_inode(struct inode *inode) {
    ino_t ino = inode->i_ino;

    /* Defaults: readable by everyone, writable by the owner */
    inode->i_mode = S_IRUGO | S_IWUSR;
    inode->i_nlink = 1;
    inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
    inode->i_blocks = 0;
    inode->i_blksize = inode->i_sb->s_blocksize;

    if(ino == ROOT_INO) {
        /* Special case for the root directory */
        inode->i_mode |= S_IFDIR | S_IXUGO;
        /* A directory is linked from its parent and from its own "." entry */
        inode->i_nlink = 2;
        inode->i_fop = &simple_dir_operations;
        inode->i_op = &simple_dir_inode_operations;
    }
}
There is a lot of stuff in that function, but most of it should be pretty self-explanatory for people who have experience with filesystems on UNIX-like systems. The i_op and i_fop fields specify the inode operations and file operations for an object. Inode operations are things like create, mkdir and getattr, whereas file operations are the more common open, read, flush, etc.
The simple_dir_operations and simple_dir_inode_operations structures referred to here are provided by the Linux kernel. They are not sufficient for a fully functional filesystem, but we'll still use them, precisely because they allow us to mount something before we have fully implemented the filesystem.
Mounting
We now have:
- Module initialization and finalization code
- A filesystem type struct
- A function to fill in the superblock and mount the filesystem
- A function returning a partially implemented root directory
This is enough for a filesystem that can be mounted. So let's do so! The download section provides you with the source of the filesystem developed so far, plus instructions on how to use it.
Download
The tarball below contains the code written so far, and a Makefile for Linux 2.6. It is available under the terms of the MIT license.
- tutorfs-0.1.3.tar.bz2 (2 KB)
Download and extract the tarball, run make, insert the module with insmod, then run mount -t tutorfs tutorfs /mount/point. The second tutorfs can be anything you like; it normally specifies the device the filesystem resides on, but in our case, there is no such device. Replace /mount/point with the directory you want to mount the filesystem on.
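For the curious: the mount command ends up in the mount(2) system call, and you can make that call yourself. The wrapper below is a minimal sketch (demo_mount is a made-up name); it passes no mount flags and no filesystem-specific data.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Mount a filesystem of the given type on target. The source argument plays
 * the role of the "device" on the mount command line. Returns 0 on success
 * or a negative errno value on failure.
 */
static int demo_mount(const char *source, const char *target, const char *fstype) {
    if (mount(source, target, fstype, 0, NULL) != 0)
        return -errno;
    return 0;
}
```

Calling demo_mount("tutorfs", "/mount/point", "tutorfs") corresponds to the mount command above. It needs root privileges and a loaded module; otherwise you will get an error such as -EPERM or -ENODEV back.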
1. FAT, which stands for File Allocation Table, refers to the filesystems used by DOS and Microsoft Windows. It is a very simple filesystem, not suitable for multi-user environments, but widely employed in devices such as digital cameras.