Database as a file system

I was thinking about a few things recently (in different contexts), and I think I stumbled over two things that could improve database-ness a good bit with very little code:

The first would be to support POSIX O_EXEC and O_SEARCH, as well as Linux’s O_PATH. They are all basically aliases: POSIX explicitly allows O_EXEC and O_SEARCH to share the same value, and O_PATH serves the same purpose on Linux. When passed to open() (and friends), they open the file for use with filesystem functions but not with data functions. This allows such descriptors to be used, for example, in *at() calls without having to obtain a descriptor that can actually do I/O. The permission requirements also differ: no read or write permission on the file itself is needed.
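As a rough illustration of the *at() use case, the following C sketch opens a directory with a path-only flag and then calls fstatat() relative to it. It assumes a POSIX.1-2008 system and picks whichever flag is available: O_SEARCH first, then Linux's O_PATH, and finally plain O_RDONLY (which, unlike the other two, does need read permission). The helper name stat_in_dir() and the macro SKETCH_PATH_ONLY are hypothetical.

```c
/* Request Linux extensions (O_PATH); must come before any system header. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Choose the best "path only" flag the platform offers: POSIX O_SEARCH,
 * then Linux O_PATH, then plain O_RDONLY as a last resort (the fallback
 * needs read permission on the directory, the other two do not). */
#if defined(O_SEARCH)
#define SKETCH_PATH_ONLY O_SEARCH
#elif defined(O_PATH)
#define SKETCH_PATH_ONLY O_PATH
#else
#define SKETCH_PATH_ONLY O_RDONLY
#endif

/* Hypothetical helper: stat `name` relative to the directory `dir`,
 * using the directory descriptor only as an anchor for fstatat().
 * Returns 0 on success, -1 on failure with errno set. */
int stat_in_dir(const char *dir, const char *name, struct stat *st)
{
    int dfd = open(dir, SKETCH_PATH_ONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int rc = fstatat(dfd, name, st, 0);
    int saved_errno = errno;   /* keep close() from clobbering fstatat's error */
    close(dfd);
    errno = saved_errno;
    return rc;
}
```

For example, `stat_in_dir(".", "foo", &st)` stats `./foo`; where O_SEARCH or O_PATH exists, the directory itself is never opened for reading.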

In Haiku, this is already a thing. Haiku, however, uses the classic model where O_RDONLY is 0. What I would do here is add the constants to the corresponding header and reserve a bit for them. The kernel would then check for that bit and internally alias it to O_RDONLY (which is a superset of O_EXEC/O_SEARCH/O_PATH, so this aliasing is safe in kernel land).
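A minimal sketch of the header-plus-kernel approach, in C. All constant values and the function sketch_normalize_open_flags() are hypothetical placeholders (they are not Haiku's actual <fcntl.h> values or kernel code); the point is only the shape: one reserved bit, three names for it, and a check early in open() that strips the bit and continues as a read-only open. Permission checks are not modelled.

```c
/* Hypothetical access-mode values in the classic layout (O_RDONLY == 0).
 * Illustrative only; NOT copied from Haiku's <fcntl.h>. */
#define SKETCH_O_RDONLY  0x0000
#define SKETCH_O_WRONLY  0x0001
#define SKETCH_O_RDWR    0x0002
#define SKETCH_O_ACCMODE 0x0003

/* Hypothetical reserved bit. All three names share it: POSIX lets O_EXEC
 * and O_SEARCH be equal, and O_PATH is added for Linux compatibility. */
#define SKETCH_O_PATH    0x01000000
#define SKETCH_O_EXEC    SKETCH_O_PATH
#define SKETCH_O_SEARCH  SKETCH_O_PATH

/* Hypothetical kernel-side normalization at the start of open():
 * if the path-only bit is set, drop it and proceed as a read-only open,
 * since read-only access covers everything a path-only descriptor may do.
 * Returns the normalized flags, or -1 for an invalid combination. */
int sketch_normalize_open_flags(int flags)
{
    if (!(flags & SKETCH_O_PATH))
        return flags;                    /* ordinary open, untouched */
    if ((flags & SKETCH_O_ACCMODE) != SKETCH_O_RDONLY)
        return -1;                       /* path-only plus write access: reject */
    return flags & ~SKETCH_O_PATH;       /* access mode is already O_RDONLY (0) */
}
```

Rejecting the bit together with a write mode is a design choice of this sketch; Linux instead ignores all flag bits other than O_CLOEXEC, O_DIRECTORY and O_NOFOLLOW when O_PATH is given.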

This change would also improve general POSIX and Linux compatibility with likely fewer than 10 lines of code changed.

The second change would be to support data sets in the filesystem that are not classic files. These would always be zero bytes in size (just like, for example, sockets, FIFOs, device files, …). Their sole purpose is to store extended attributes and to keep the inode alive. Since the inode number is basically the filesystem's primary index, making them true inodes makes them first-class citizens, just like extended attributes.

This is already “the Haiku way”; see, for example, Person files.

This would mean defining a new inode type (within S_IFMT, which might need to be adjusted). It would break binary compatibility with old code, which does not know the new type (but not with newer code). In the simplest case, the kernel would only need to ensure that no write is ever done to such an inode. A filesystem check would verify that they are 0 bytes (and, if not, offer to convert them to a plain file or delete the data).

Creation could be done via mknod() (and friends). There could also be an ioctl() or similar to convert an existing 0-byte plain file to the new type.

I think these two changes together would bring the most improvement for the least work right now. The first one in particular is clearly worthwhile, as it would improve POSIX compatibility at basically no cost.

What is still missing is proper references. Maybe at some point an extended metadata entry could reference another inode. I don’t think that would be much of a problem; however, it might need full hard link support first.

As always, happy to discuss here or on IRC. Also happy to provide/discuss specific patches.
