BFS Graph Database

ArDrakho · October 5, 2023, 9:08pm

I decided to split this off from the postings at Native shell? - #101 by SamuraiCrow.
In that posting, the discussion turned to BFS and its database-like features. I think the topic deserves its own thread, separate from the native shell discussion.

To begin with, Haiku inherited the BFS features from BeOS, so we already have a very primitive database ‘engine’. The BFS database is roughly equivalent to a flat-file database, such as dBASE, FoxPro, and others. It will never be a relational database, but BFS is closer to a NoSQL database, such as a structured graph database.

Let me explain by using an example from my Multimedia database now.

PART I

Create a Database Data Type (i.e. Custom Filetype Group)
The database data type, or schema name, corresponds to a new Filetype group, which is created in the Preferences -> FileTypes application.

For example, a new schema is created by adding a new group using the FileType: ‘Add…’ button:

a. For the ‘Group:’ select the ‘Add new group’ from the drop-down option
b. For the ‘Group name:’ enter ‘database’
c. Then, click on the ‘Add group’ button

From the FileTypes’ left panel, the new group ‘database’ is now listed. Highlight ‘database’ and update the following in the ‘Description’ section:

a. The ‘Internal name:’ is displayed
b. Enter the ‘Type name:’ as ‘Multimedia database schema’
c. Enter the ‘Description:’ as ‘Generic database schema’

A database object (i.e. a custom FileType) is also created in FileTypes. At this point we are defining the data object that will hold the data. In graph database terminology, this would be a ‘node’.

For example, a new data object is created by adding a new FileType to the new group ‘database’: highlight ‘database’ and click on the FileType: ‘Add…’ button:

a. For the ‘Group:’ select the ‘database’ from the drop-down option
b. For the ‘Internal name:’ enter ‘MusicDB’
c. Then, click on the ‘Add type’ button

From the FileTypes’ left panel, the new filetype ‘MusicDB’ is now listed. Highlight ‘MusicDB’ and update the following sections:

Add the file extensions linked to this new FileType so the system will recognize files by their extension. In the ‘File recognition:’ section, add the following extensions: ‘mp3’ and ‘MP3’
In the FileType description section, update the ‘Type name:’ and the ‘Description:’ options as follows:

Type name: Music (MP3) Database
Description: Track all your music files.

The ‘Extra attributes’ will be equivalent to node properties in graph database design or columns in a database table.

In this section, we will enter all the custom attributes that we need for this FileType. Clicking the ‘Add…’ button opens a panel to enter all the data for the new attribute. You can edit an existing attribute with a double-click.

Complete the data type design by entering the following options. Examples:

| Attribute name | Internal name | Type | Visible | Display as | Editable | Width | Alignment |
|---|---|---|---|---|---|---|---|
| Artist | database_MusicDB:Artist | String | X | Default | X | 100 | Left |
| Album | database_MusicDB:Album | String | X | Default | X | 100 | Left |
| Title | database_MusicDB:Title | String | X | Default | X | 100 | Left |
| … | | | | | | | |
| Rating | database_MusicDB:Rating | Integer 32 bit | X | Default | X | 15 | Left |
| Frontcover | database_MusicDB:Frontcover | String | X | Default | X | 60 | Left |
| Comment | database_MusicDB:Comment | String | X | Default | X | 100 | Left |

BFS has no direct equivalent of a database relationship or a graph edge, but the same effect can be achieved by storing a group value in one or more attributes of the file type. A few examples: in a Person type, a group attribute set to Family links together all Person files that belong to the Family group. Likewise, the album name links all of an album's tracks together, the artist name links all of an artist's albums, and so forth. So there is some flexibility to link data within a FileType.
The Haiku-specific command-line applications provide basic database-like operations. These commands can display, read, add, remove, and copy the attributes of files: listattr, catattr, addattr, rmattr, and copyattr. We should consider enhancing these commands to better serve the database features of BFS.
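
As an illustration of how these tools map onto "row and column" operations, here is a minimal sketch that wraps addattr and catattr. The helper names db_attr_name, db_set, and db_get are hypothetical and only exist in this example; the ADDATTR and CATATTR variables are an assumption added so the commands can be previewed (for instance with echo) before touching any files.

```shell
#!/bin/sh
# Sketch: treat one file as a "row" and its attributes as "columns".
# ADDATTR and CATATTR default to the Haiku tools; override them (e.g. with
# "echo") to preview the commands without writing anything.
ADDATTR="${ADDATTR:-addattr}"
CATATTR="${CATATTR:-catattr}"

# db_attr_name SCHEMA COLUMN
# Build the full attribute name used in Part I, e.g. database_MusicDB:Artist.
db_attr_name() {
    printf 'database_%s:%s\n' "$1" "$2"
}

# db_set FILE COLUMN VALUE [TYPE]
# Write one MusicDB attribute. TYPE is an addattr type and defaults to string.
db_set() {
    "$ADDATTR" -t "${4:-string}" "$(db_attr_name MusicDB "$2")" "$3" "$1"
}

# db_get FILE COLUMN
# Print one MusicDB attribute of FILE in catattr's own output format.
db_get() {
    "$CATATTR" "$(db_attr_name MusicDB "$2")" "$1"
}
```

With this loaded, `db_set track.mp3 Artist "ABBA"` and `db_set track.mp3 Rating 5 int` would fill two columns of one row, and `listattr track.mp3` would show the result.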

ArDrakho · October 5, 2023, 9:15pm

PART II

Another database feature of BFS is its capacity for database-like indexing at this low level.

Before we start entering data into the MusicDB database, we need to create indexes on specific FileType attributes. The reason is that indexed attributes can leverage Haiku’s fast Queries feature, using either the ‘query’ command-line application or the Find application.

For the data type defined in Part I, I am creating indexes on the attributes that will be the primary ones to be queried, grouped, and searched for:

Internal Name Attribute type

database_MusicDB:Artist text
database_MusicDB:Album text
database_MusicDB:Title text
database_MusicDB:Track int-32
database_MusicDB:Year int-32
database_MusicDB:Genre text

To index these attributes, open the Terminal and create the indexes as follows:

mkindex -t string database_MusicDB:Artist
mkindex -t string database_MusicDB:Album
mkindex -t string database_MusicDB:Title
mkindex -t int database_MusicDB:Track
mkindex -t int database_MusicDB:Year
mkindex -t string database_MusicDB:Genre

Note: The -t option defines the type of attribute, which is “string” for text and int for integer numbers.
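
The six commands above can also be driven from a single name/type table, which keeps the list of indexed attributes in one place. This is only a sketch: INDEX_TABLE and create_indexes are hypothetical names, and the MKINDEX variable is an assumption added so the commands can be previewed with echo.

```shell
#!/bin/sh
# Sketch: create every index from one "name type" table.
# MKINDEX defaults to the Haiku tool; set MKINDEX=echo to preview.
MKINDEX="${MKINDEX:-mkindex}"

# Attribute name and mkindex type, one pair per line (same as the list above).
INDEX_TABLE='database_MusicDB:Artist string
database_MusicDB:Album string
database_MusicDB:Title string
database_MusicDB:Track int
database_MusicDB:Year int
database_MusicDB:Genre string'

# create_indexes
# Run mkindex once per table row and report any row that fails.
create_indexes() {
    printf '%s\n' "$INDEX_TABLE" | while read -r attr type; do
        "$MKINDEX" -t "$type" "$attr" || echo "could not index $attr" >&2
    done
}
```

After calling `create_indexes`, running `lsindex` should list the new entries.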

A quick note: the Haiku-specific command-line applications provide basic index operations. These commands can display, create, rebuild, and remove indexes: lsindex, mkindex, reindex, and rmindex. I have not used the indexes long enough to recommend any improvements at this time.

At this point, we are ready to populate the MusicDB database. I copied all my MP3 music files into one directory named ‘Music’. The first level sub-directory is the artist/group’s name, the next lower level is the album that contains all the music tracks.
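
A populate pass over this layout could be sketched roughly as follows. This is not the actual script used here, only an outline under stated assumptions: read_tag is a hypothetical placeholder for whatever ID3 reader is installed, set_attr, tag_file, and populate are names invented for this example, and ADDATTR can be set to echo to preview the commands.

```shell
#!/bin/sh
# Sketch of a populate pass. read_tag is a HYPOTHETICAL placeholder: replace
# its body with a call to whatever ID3 reader is available on your system.
ADDATTR="${ADDATTR:-addattr}"

# read_tag FILE FIELD
# Should print the tag value, or nothing if the tag is missing.
read_tag() {
    :   # placeholder: no particular ID3 tool is assumed here
}

# set_attr FILE COLUMN TYPE VALUE
# Write the attribute only when VALUE is non-empty.
set_attr() {
    [ -n "$4" ] || return 0
    "$ADDATTR" -t "$3" "database_MusicDB:$2" "$4" "$1"
}

# tag_file FILE
# Copy each tag into the attribute of the same name.
tag_file() {
    for col in Artist Album Title Genre; do
        set_attr "$1" "$col" string "$(read_tag "$1" "$col")"
    done
    # Track and Year are stored as integers; values such as "3/12" would
    # need to be cleaned up by read_tag first.
    for col in Track Year; do
        set_attr "$1" "$col" int "$(read_tag "$1" "$col")"
    done
}

# populate ROOT
# Walk ROOT/<artist>/<album>/ and tag every .mp3 or .MP3 file found.
populate() {
    find "$1" -type f \( -name '*.mp3' -o -name '*.MP3' \) |
    while IFS= read -r f; do
        tag_file "$f"
    done
}
```

Skipping empty values in set_attr keeps files free of blank attributes, so a query on a column only returns files that actually carry a value for it.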

I wrote a bash script that reads every MP3 file in each sub-directory, captures its ID3 tags, and assigns any non-NULL values to the respective FileType attributes. After all 25K+ MP3 files had been processed, I could query the database. Here is an example query:

query 'database_MusicDB:Artist==ABBA' | sed 's~\\~~g'
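
Queries can also combine several attributes, which is how the attribute-based "relationships" from Part I can be followed in practice. The sketch below builds such expressions; match, all_of, and tracks_on are hypothetical helper names, the QUERY variable is an assumption added for previewing, and the album name in the usage note is just an example value. Values containing double quotes are not handled.

```shell
#!/bin/sh
# Sketch: build query expressions over the MusicDB attributes.
# QUERY defaults to the Haiku tool; set QUERY=echo to preview the expression.
QUERY="${QUERY:-query}"

# match COLUMN VALUE
# Print one string predicate, e.g. (database_MusicDB:Artist=="ABBA").
match() {
    printf '(database_MusicDB:%s=="%s")' "$1" "$2"
}

# all_of PREDICATE...
# Join the predicates with && so that every one of them must hold.
all_of() {
    joined=$1
    shift
    for pred in "$@"; do
        joined="$joined&&$pred"
    done
    printf '%s\n' "$joined"
}

# tracks_on ARTIST ALBUM
# List every file that shares both values, i.e. follow the "edge" from an
# album to its tracks.
tracks_on() {
    "$QUERY" "$(all_of "$(match Artist "$1")" "$(match Album "$2")")"
}
```

For example, `tracks_on ABBA Arrival | sed 's~\\~~g'` would list the tracks of one album, with the same backslash clean-up as in the query above.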

For those who need to see the MusicDB attributes in Tracker, I created a MusicDB DefaultQueryTemplate. Then I created another bash script to assign the database template to all of the lowest-level (leaf) sub-directories in the Music directory.

The bash script basically calls the command ‘copyattr ~/config/settings/Tracker/DefaultQueryTemplates/database_MusicDB "{sub-directory name}"’.
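
A minimal sketch of such a loop is shown below. It is not the original script: is_leaf and apply_template are hypothetical names, and the COPYATTR and TEMPLATE variables are assumptions added so the run can be previewed (COPYATTR=echo) or pointed at a different template.

```shell
#!/bin/sh
# Sketch: apply the Tracker template to every leaf directory under a root.
# COPYATTR defaults to the Haiku tool; set COPYATTR=echo to preview.
COPYATTR="${COPYATTR:-copyattr}"
TEMPLATE="${TEMPLATE:-$HOME/config/settings/Tracker/DefaultQueryTemplates/database_MusicDB}"

# is_leaf DIR
# Succeed when DIR contains no (non-hidden) sub-directories.
is_leaf() {
    for sub in "$1"/*/; do
        [ -d "$sub" ] && return 1
    done
    return 0
}

# apply_template ROOT
# Copy the template's attributes onto each leaf directory below ROOT.
apply_template() {
    find "$1" -type d | while IFS= read -r dir; do
        if is_leaf "$dir"; then
            "$COPYATTR" "$TEMPLATE" "$dir"
        fi
    done
}
```

Calling `apply_template ~/Music` would then touch only the album folders, leaving the artist-level folders unchanged.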

What I have shown is nothing new. It is a combination of what Scot Hacker wrote about and what many others in this forum have posted when the topic was discussed previously. I use Neo4j graph databases, so thinking in terms of a natural database design comes easily to me, and I can see the potential of BFS as a database. We need better reporting tools to leverage what BFS currently offers, and better query capabilities for more complex searches that group files by their attributes, so that we get the results we expect to see.

Just some thoughts on this topic…

SamuraiCrow · October 6, 2023, 3:17am

There is also a query user interface under the “Find” option in the Tracker menu. See the Query entry in the user guide for more instructions and to make sure we’re on the same page.

After reading the article linked above, is there a way to restrict Tracker’s find by limiting the results of an attribute based query to a folder and its subfolders?

humdinger · October 6, 2023, 7:38am

Not yet. See #18156 (Support BQuery filtering by folder) – Haiku.