ANNOUNCEMENT SEN + Personal Knowledge Graphs Book featuring Haiku

grexe · June 15, 2023, 7:52pm

After one year of work (on/off writing, researching, refining…) it is my pleasure to announce what is maybe the first book to feature Haiku outside of the core audience, which has been released today:

Personal Knowledge Graphs

I am using Haiku as the foundation for my project SEN (Semantic ExteNsions) to integrate semantic extensions into a desktop OS using a files-as-entities metaphor (following a data-centric approach, just like the original BeOS did with e-mail and people “files”) and the filesystem as a direct way to manage all your information, be it notes, books, locations, events, ingredients or whatever:

(to get a rough idea, relations can be named more appropriately of course)

The twist is using filesystem attributes to hold relations (indexed attributes to search for source+target), which I finally managed in a simple and unobtrusive way, but allowing for advanced features like relation properties - all that without modifying any core parts of the system, esp. the filesystem, but just with native features.
I’ve added some 40 lines of code to Tracker just to allow visual navigation of relations.
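
To give a rough idea of how relations held in indexed attributes can be looked up in both directions, here is a minimal in-memory sketch in Python. It does not touch BFS or the actual SEN code: the `AttrIndex` class and the attribute names `SEN:ID` and `SEN:REL:<name>` are made up for illustration, and for brevity each file holds only one target per relation name.

```python
from collections import defaultdict


class AttrIndex:
    """Toy stand-in for an indexed attribute store (not BFS itself)."""

    def __init__(self):
        self.attrs = defaultdict(dict)   # path -> {attribute name: value}
        self.index = defaultdict(set)    # (attribute name, value) -> paths

    def set_attr(self, path, name, value):
        # Keep the index in sync when an attribute is overwritten.
        old = self.attrs[path].get(name)
        if old is not None:
            self.index[(name, old)].discard(path)
        self.attrs[path][name] = value
        self.index[(name, value)].add(path)

    def query(self, name, value):
        """Return all paths whose attribute `name` equals `value`."""
        return sorted(self.index[(name, value)])


def relate(store, source, relation, target):
    """Record `source --relation--> target` as an attribute on the source."""
    target_id = store.attrs[target]["SEN:ID"]
    store.set_attr(source, f"SEN:REL:{relation}", target_id)


store = AttrIndex()
store.set_attr("/notes/haiku-ideas.txt", "SEN:ID", "id-001")
store.set_attr("/books/pkg-book.pdf", "SEN:ID", "id-002")
relate(store, "/notes/haiku-ideas.txt", "references", "/books/pkg-book.pdf")

# Forward: which file does the note reference?
target_id = store.attrs["/notes/haiku-ideas.txt"]["SEN:REL:references"]
forward = store.query("SEN:ID", target_id)

# Backward: which files reference the book?
backward = store.query("SEN:REL:references", "id-002")
```

Both directions are plain equality lookups on an indexed attribute, which is why indexing matters here.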

I’ll follow up with more details soon - if I don’t get instantly burned and banned for heretically misusing Haiku, that is;=).

In the meantime, check out the abstract (or better yet, get the book:^) here:
Extending the Desktop into a Personal Knowledge Graph with SEN

A dedicated website is in the works, too.

cheers,
Gregor

Munchausen · June 16, 2023, 8:41am

This sounds very interesting, it would be nice to see a video demonstration. Are there any earlier papers (or slides perhaps) about it, or only the book chapter so far?

Nexus-6 · June 16, 2023, 8:58am

It is an interesting topic and it got my attention as I am a Zettelkasten practitioner myself.
I used tools like DevonThink, Eaglefiler as well as RemNote or even Apple Notes before adopting Keep It.
I confess I was tempted to create something similar for Haiku before focusing on Genio by leveraging BFS attributes and queries, so this topic resonates with me.
The thing that I never actually figured out how to implement is a way to do that at the application level, which is where we all spend the majority of our time, not in Tracker.
Personally, I would like to be able to select a text or an object in an application and attach notes and other files or “collections” and find those connections again next time I open the very same file. It doesn’t seem a trivial problem to solve, though.
I look forward to reading the book and hearing about your progress on this.
Good luck!

Nexus-6 · June 16, 2023, 9:01am

Something close to this concept is LiquidText but it’s limited to the application itself (although there is a recent update that opened to external resources)

PulkoMandy · June 16, 2023, 9:03am

On the contrary, this sounds like a great way to use Haiku in a better and more interesting way.

It would indeed be interesting to see a demo or some screenshots, and if the patch is only 40 lines, I guess we can consider reviewing and including it.

tqh · June 16, 2023, 10:13pm

This sounds very interesting, what a great idea. Hope you will feel welcome here!

ubu · June 17, 2023, 8:57am

Sounds really interesting, hoping for that website soon, this could become the “killer app” for Haiku.

grexe · June 17, 2023, 12:25pm

I’m very glad to see so many positive and encouraging responses! Thanks guys, that means a lot to me right now.

I’ve updated the announcement to include a screenshot and some links to the datacentric manifesto and also to my fork of Haiku for necessary changes to Tracker (I’ve been developing in Java and more recently Kotlin in the last 20 years so please ignore my rusty C++…).

Let me answer all questions in one sweep:

@Munchausen: no video yet, need to brush up the prototype some more to make it also more telling and understandable.

@Nexus-6: you get the idea. I am also an avid supporter of Zettelkasten, however not as strict as the original concept, but more flexible in the way of Atomic Notes or a “Second Brain”.

SEN is just a tiny core that enables applications to resolve relations using the SENPAI (SEN Programmatic Application Interface:^).
But the gist is still to offer users a simple way to do much information handling in Tracker, without the need for separate applications. Tracker is already more than a file manager, since even BeOS had e-mail “files” that were actually entities with their metadata extracted into filesystem attributes.

With such a concept, there is no need for writing simple CRUD applications just to manage all possible kinds of data (think entities) like movies, books or even recipes. This can all be handled via files and the underlying entity relations in Tracker, with careful extensions and adaptations.
If you follow that path, you get a simple but very powerful system to manage your information in a connected way, still using specialised applications where needed.

For this to work, however, we need to free data from the control of applications. Even with open data formats, users still rely on a handful of applications that can handle them, when they often just need to access basic metadata - this should not be necessary.
E.g., a PDF document has all kinds of metadata hidden away, like keywords, author, page count etc. This should be extracted into attributes, so that users and applications can work with them in a more generic way (think horizontal connections, not vertical ones that really need a deeper understanding of the content format, e.g. word processors).
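
As a small illustration of that extraction step, here is a sketch in Python that maps raw document info onto flat attributes. The input dictionary only stands in for whatever a PDF library would return, and the attribute names (`META:author` and so on) and the `to_attributes` helper are assumptions made for this example, not names used by SEN or Haiku.

```python
def to_attributes(info, page_count):
    """Map raw document info onto flat, typed attributes (names are illustrative)."""
    attrs = {}
    if info.get("Author"):
        attrs["META:author"] = info["Author"].strip()
    if info.get("Title"):
        attrs["META:title"] = info["Title"].strip()
    # Keywords often arrive as one comma- or semicolon-separated string.
    raw = info.get("Keywords", "")
    keywords = [k.strip() for k in raw.replace(";", ",").split(",") if k.strip()]
    if keywords:
        attrs["META:keywords"] = ", ".join(keywords)
    # Store the page count as a number so it can be compared in queries.
    attrs["META:pages"] = int(page_count)
    return attrs


sample = {"Author": " Jane Doe ", "Title": "Datasheet", "Keywords": "haiku; bfs,attributes"}
attrs = to_attributes(sample, 12)
```

Once the values sit in attributes like these, any tool can read them without knowing anything about the PDF format.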

As for linking, this is already baked into SEN, using relation properties like offset, page number etc. I call this “SENCHA” (SEN Context Highlighting and Annotations:^). When you open a relation in Tracker, SEN will intercept and open the preferred application, moving to the relation target context. This would allow you to follow a relation from a book note or PDF annotation stored as a text file to the original document. Of course, the target application needs to support this in some way, e.g. a scripting message in BePDF to jump to a page and highlight a part of the text. This very much resembles WebAnnotations, and could be handled the same way in Haiku, where locally stored WebAnnotations would allow you to jump to the web page and highlight the text - great for research, without the dependency on online services and subscriptions.
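
To make the idea of relation properties a bit more concrete, here is a hedged sketch in Python of how a relation with a page number and offset could be turned into an "open at this context" request. The `Relation` class, the `build_open_request` helper, the property names and the request layout are all invented for illustration; the real SEN messages may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    source: str
    target: str
    kind: str
    # Context inside the target, e.g. page number or text offset (assumed names).
    properties: dict = field(default_factory=dict)


def build_open_request(relation, preferred_apps):
    """Pick the preferred application by file extension and pass the context along."""
    ext = relation.target.rsplit(".", 1)[-1].lower()
    return {
        "app": preferred_apps.get(ext, "default-viewer"),
        "file": relation.target,
        "context": dict(relation.properties),
    }


note_to_book = Relation(
    source="/notes/chapter-3.txt",
    target="/books/pkg-book.pdf",
    kind="annotates",
    properties={"page": 42, "offset": 1280},
)
request = build_open_request(note_to_book, {"pdf": "BePDF"})
```

The receiving application still has to understand the context part, which is the support mentioned above.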

@PulkoMandy: thanks for your encouraging words! This means a lot to me, since I know you can be very critical and challenge ideas floating around here:)

@ubu: that’s my secret hope, really, however in a slightly different way: Haiku does not need a single killer application, but a real enabler. Something that makes people wonder why they cannot do this with their current OS. I have thought a lot about (and heard from many people) how crazy it is to build this system on Haiku, but after some more research I found it was a very sane thing to do, as no other OS offers this powerful combination (e.g. you cannot search for custom attributes in any other filesystem, which is really astonishing).
I don’t know if SEN can ever be such an enabler, but it could make some people curious, and for me it is the easiest and most enjoyable way to build this solution:)

PulkoMandy · June 17, 2023, 1:43pm

I think BePDF already does this for some of the attributes?

I had started organizing my collection of electronic component datasheets this way (so I could search by manufacturer and component name), but it didn’t go very far. I think that’s one use case where an approach like this could be useful. If only file metadata in a standard format was a more widespread thing and I could download datasheets with the component parameters already available as metadata… But other OSes not handling this so well means it won’t really get to the web for now.

The goal in challenging ideas is to explore their limits and sometimes find better approaches. I don’t mean to discourage people from exploring them, even if sometimes it comes out that way. But I try to think of “how would we implement this?” and sometimes I don’t see it working very well (but people sometimes prove me wrong).

But in the domain of filesystem indexing and queries, I think there is a lot of unused potential in the technology we already have, and it is great to see someone exploring that space.
As you said, just one small change in Tracker, and suddenly we have a lot more possible uses to think about.

ubu · June 17, 2023, 3:52pm

@grexe That is what I meant, so I put killer app in quotes.

For me the missing features to use Haiku as a real daily driver are multi-monitor and disk encryption support, but it is coming closer every day.

Thanks to all Haiku developers

KitsunePrefecture · June 18, 2023, 12:43pm

Use case #1 :

Well, when the redesign of drivers and the driver tree was discussed recently, my first thought was: why not use BFS attributes to store the connections and relations among drivers, including what each driver is capable of, in a standardized, human-readable way?
As far as I know, there are text-based databases (I mainly know relational ones), so if you create the driver database tree once, as a template for all systems, it can be generic, as X512 suggested, and can be stored in the “/dev” filesystem structure, with directories as branches and files as leaves carrying attributes. The branches (directories) could reflect whether the machine is mainly PCI-based (x86, x86_64) or not (arm, riscv).
Once the existing HW/SW configuration has been probed in a mini VM (for example in qemu) during installation … you can then set up, by default, a filled-out, non-generic, tailored device tree, where the actual configuration is stored in file attributes on disk, with one instance in memory too for fast access.
You can split the tree into permanent base components (parts of the mainboard: branches only) and variable classes (USB, FireWire, CardBus/PC Card, network-attached devices: branches and leaves). This way you only need to refresh/probe the latter frequently, provided you have a HW config service daemon that maintains this DB and invokes the appropriate driver, reading and comparing the attributes to see what it can be used for, on a HW event (via ACPI?) and/or on internally scheduled checks.

Use case #2 :

I have video collections, and it would be good to have a small program or applet that can query their ID3 tags, codec info and subtitles and store them as attributes.
Nowadays that is what supporting multimedia would mean, as playing back multiple video sources is no longer a unique feature of Haiku.
But what if we could search audio and video files by their content?

I can also reply to your concern that the attributes can be lost when a file is copied to another OS.
As far as I know, such non-basic file attributes, which are not natively supported by the filesystem type in question, are called extended attributes.
As far as I can see, there are implementations on other OSes:

they are just not standardized.

I remember this from OS/2 times, where they were stored in separate files on non-native filesystems like FAT, so they could be reached from DOS/Windows, and were built in for OS/2 thanks to HPFS (B+tree).

We can follow this approach at file copy in Tracker …
OR
add BFS-like extended attributes to the filesystems that are becoming available for Haiku or are in progress (XFS, BTRFS, NTFS) and upstream this for others.
This would be our contribution to a standardized multimedia enhancement for OSes, and to transparent copying from one FS to another without losing such extended attributes.
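
The first option (keeping attributes in a separate file when copying to a filesystem without native support) could look roughly like the Python sketch below. It is only an illustration under assumptions: attributes are passed in as a plain dictionary rather than read from BFS, and the `.attrs.json` sidecar naming and the two helper functions are invented here, not an existing Tracker feature.

```python
import json
import os
import shutil
import tempfile


def copy_with_sidecar(src, attrs, dest_dir):
    """Copy the file and write its attributes next to it as a JSON sidecar."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    with open(dest + ".attrs.json", "w", encoding="utf-8") as fh:
        json.dump(attrs, fh, indent=2, sort_keys=True)
    return dest


def read_sidecar(path):
    """Load the attributes back; an absent sidecar means no attributes."""
    sidecar = path + ".attrs.json"
    if not os.path.exists(sidecar):
        return {}
    with open(sidecar, encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "clip.mkv")
    with open(src, "wb") as fh:
        fh.write(b"not really a video")
    target = os.path.join(tmp, "fat_volume")  # pretend this is a foreign filesystem
    os.mkdir(target)
    copied = copy_with_sidecar(src, {"Media:Codec": "h264", "Media:Length": 5400}, target)
    recovered = read_sidecar(copied)
```

Copying the file back would then mean reading the sidecar and writing the values into real attributes again.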

Nexus-6 · June 18, 2023, 2:19pm

Me neither, Zettelkasten is a heavy process that requires discipline and a lot of effort so we all end up customising it.

I don’t seem to have read about this in the abstract. Are these (SENPAI and SENCHA) discussed in the book chapter? PS: I like the reference to Japanese words

It seems a hard task to me. There are niche data formats that can’t be abstracted that easily. Moreover, if you need to translate them into another format you may lose all these references.
Either the application is fully aware of SEN and SENCHA and leverages a common API or a sort of general abstraction is provided by the OS itself.
What is your take here?

grexe · June 19, 2023, 9:59pm

This is a bit fuzzy but I understand what you mean, I’ve thought a lot about different use cases so the design of SEN would be flexible enough to accommodate different scenarios and needs.

I’d definitely not want to have the system or core components rely on semantic extensions though, at least as long as the filesystem does not support it by itself.
I think it’s best to keep this as an extension in the user/application layer and try out how it’s accepted and used to see if the design goals work out in real world use.

I’m aware of the nature and use of filesystem attributes in different operating systems, and their different implementations and transient nature. Also, as already mentioned elsewhere, only Haiku and BFS support indexed queries of custom attributes, which is an essential requirement for the implementation of SEN.

This is why SEN is deliberately focusing on Haiku only - it’s enough work to design a consistent and efficient solution with this restriction already.
However, I have thought of ways to bridge the gap to network attached storage and remote systems by storing a remote reference as SEN ID (the unique object id which is needed to semantically link files and keep them location independent, also across devices and between systems or after a backup/restore - so inodes don’t work as a useful id here).
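
A toy Python simulation may help show why an id stored in an attribute survives a backup/restore while an inode number does not. Everything here is an assumption for illustration: volumes are plain dictionaries, the attribute name `SEN:ID` and the UUID format are made up, and the restore step simply hands out fresh inode numbers.

```python
import itertools
import uuid

_inodes = itertools.count(1000)  # fake inode allocator


def create(volume, path):
    """Create a file entry with a fresh inode and a unique id attribute."""
    volume[path] = {"inode": next(_inodes), "attrs": {"SEN:ID": str(uuid.uuid4())}}
    return volume[path]


def restore(backup):
    """Restoring keeps the attributes but assigns new inodes."""
    return {
        path: {"inode": next(_inodes), "attrs": dict(entry["attrs"])}
        for path, entry in backup.items()
    }


def find_by_id(volume, sen_id):
    return [p for p, e in volume.items() if e["attrs"].get("SEN:ID") == sen_id]


def find_by_inode(volume, inode):
    return [p for p, e in volume.items() if e["inode"] == inode]


old_volume = {}
book = create(old_volume, "/books/pkg-book.pdf")
link_id, link_inode = book["attrs"]["SEN:ID"], book["inode"]

new_volume = restore(old_volume)
by_id = find_by_id(new_volume, link_id)           # still finds the book
by_inode = find_by_inode(new_volume, link_inode)  # finds nothing
```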

grexe · June 19, 2023, 10:12pm

Zettelkasten: exactly, it’s too rigid for my taste, as I want to develop and collect my knowledge on the go, keep it in flow and dynamically search for bits and pieces.

Yes both parts of SEN are described in the book, but you don’t have to buy it just to get that information. I’ll write down the technical design together with the prototype and publish it in the coming weeks.
The main ingredients are really just:

filesystem attributes for metadata and semantic links
Relations represented and configured through file types and their attributes for storing relation properties
good old messaging to interact with applications and the SEN core if you need to resolve semantic links or supported relation types

This resembles a truly loosely coupled system in best Unix fashion (tools metaphor using pipes and filters), as you can use any Python script and lib to extract metadata and store it in attributes, where other applications can pick it up - independently of SEN - and relations can connect and use them.

Naming: yeah I had my fun with these🤩, glad you like it.

File formats and attributes: you’re right, that’s the crucial part, but also not a new problem or specific to Haiku or SEN, as any ontologist or semantic expert will painfully agree.

However, there are now a lot of standard file formats for the use cases I want to support, like ics calendar entries, mail format, document formats, gpx for locations etc.
And there are widely adopted schemas for defining metadata.
So when configuring relations, SEN adopts the best in breed without the fuss.
There may be several names for attributes which we could alias, but then we have to be careful with semantic incompatibilities like different date or time formats.
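
As a rough sketch of what such aliasing could involve, the Python snippet below maps a few source-specific names onto one attribute name and normalises dates to ISO 8601 on the way. The alias table, the list of accepted date formats and the function names are assumptions chosen for this example, not part of SEN.

```python
from datetime import datetime

# Source-specific names mapped onto one common attribute name (illustrative).
ALIASES = {
    "dc:creator": "author",
    "Author": "author",
    "dc:date": "date",
    "CreationDate": "date",
}

# Formats tried in order; anything else is rejected rather than guessed.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y%m%d")


def normalise_date(value):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")


def normalise(raw):
    """Apply aliases and bring dates into one format."""
    out = {}
    for name, value in raw.items():
        key = ALIASES.get(name, name)
        out[key] = normalise_date(value) if key == "date" else value
    return out


result = normalise({"dc:creator": "Jane", "CreationDate": "15.06.2023"})
```

Rejecting unknown formats instead of guessing is deliberate here, since a silently misread date is exactly the kind of semantic incompatibility to watch out for.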

In the end, we have to start somewhere, and at some point I decided to not go down that rabbit hole and just build a sane consistent foundation to work with.

PS: congrats and kudos for making Genio, really keen on trying this out with SEN!

grexe · June 19, 2023, 10:26pm

The goal here imo would be not to rely on end user applications to extract attributes, but to have this done transparently in the background or on demand, by something like the rumored indexing service😏 (just like the MIME type sniffer does).
This way I don’t have to open a file first just to extract its metadata.
For now, this is simply done using Python scripts and PDF libs to extract document information and even the structure (TOC). Because SEN supports reflexive relations (links pointing to and into the source file), we can elegantly open a specific chapter of a document using the relation menu.

BePDF only extracts page count afaik, and sadly doesn’t support navigation through scripting, but that could be easily added (in fact I’ve already looked into this briefly).

ECs: exactly, I had a similar use case in mind, covering Haiku hardware compatibility. You could have hardware components as file types with properties like manufacturer, model, type, standard/interface etc., and relations to model dependencies and compatibility.

File/content analysis and metadata extraction would be consolidated at configuration level, so relations could rely on a standard set of attributes extracted by accompanying scripts that extract information they know and support, working together to resolve a common set of metadata (until a more streamlined infrastructure service with an extensible SPI takes over).

SEN is deliberately offline and desktop centric for now, but as hinted above, references in semantic links could include a remote prefix and id. Then, we would (sadly) need a small helper service on the remote side to keep track of referenced files so they can be found using the id.
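
Purely as an illustration of a reference with a remote prefix and id, here is a small Python sketch. The `sen://host/id` notation is an assumption made up for this example, as is the `parse_reference` helper; nothing about the actual reference format has been decided here.

```python
from urllib.parse import urlsplit


def parse_reference(ref):
    """Split a reference into an optional remote host and the object id."""
    parts = urlsplit(ref)
    if parts.scheme == "sen" and parts.netloc:
        # Remote reference: the helper service on that host resolves the id.
        return {"host": parts.netloc, "id": parts.path.lstrip("/")}
    # Plain id: resolve locally with an attribute query.
    return {"host": None, "id": ref}


remote = parse_reference("sen://nas.local/id-002")
local = parse_reference("id-002")
```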

PS: no worries, being a developer and doing a lot of tech design myself, I fully understand the importance of challenging ideas. As long as we stay constructive here (in general), all is good. I’ve been challenging my design and ideas a lot and changed the design considerably in the year before and while writing the book chapter.
This was a great pain err help and improved the solution a lot - I had many smaller and bigger epiphanies in the process…

RogFanther · June 19, 2023, 11:07pm

would need some good discussion. That is one of the things most people first disable in Windows (and I suppose in Linux too).

SamuraiCrow · June 19, 2023, 11:17pm

Indexing on Mac has been mandatory for ages. When I switched my Mac Mini to Linux, the 80 GB hard drive that served just as a boot volume, suddenly was half full when containing a full install of Linux. The external hard drive became entirely optional.

Instead of indexing completely, perhaps an icon representation of a softlink would be better. That way there’d be room for metadata in the icon storage.

sgzfer · June 21, 2023, 3:37pm

Very interesting project.

It might be worthwhile to have a look at TrackerBase by Scot Hacker, the author of The BeOS Bible:

TrackerBase
https://web.archive.org/web/20230327012238/http://blog.birdhouse.org/2019/01/27/trackerbase/
https://web.archive.org/web/20181110234122/http://betips.net/TrackerBase/

Perhaps re-implementing a Haiku Tips based on attributes (eventually published/updated as a package on Haiku Depot) could be a good way to test and further develop this idea.

Looking forward to seeing how this project evolves!

zblace · June 22, 2023, 6:59am

I was thinking to write :

…as it was kind of the closest we got IMHO to OS as a killer app, but it missed its chance by not developing video and using more multimedia.

this is a must indeed for 2024 as more new people come in and not all known aspects are familiar (forgotten?) even to pioneers.

Starcrasher · June 22, 2023, 8:29am

How does this behave with localisation?
Attribute name localisation is possible and already done in Tracker but that’s for a short list of known attributes. Here we would face possibly an infinite list of user created attributes.
If we’re taking the example of food recipes and we’re imagining that users of this forum were sharing the best meal of their country. What would happen if someone wanted to collect them? Would they have to import each recipe and convert each attribute into their own language?
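
One possible direction, sketched in Python under assumptions: keep a stable, language-neutral key as the stored attribute name and translate only the label shown to the user. The keys, the label table and the `label_for` helper are invented for this example and say nothing about how Tracker or SEN actually handle it.

```python
# Stored attribute key -> display labels per locale (illustrative data).
LABELS = {
    "recipe:ingredients": {"en": "Ingredients", "de": "Zutaten", "fr": "Ingrédients"},
    "recipe:cooking_time": {"en": "Cooking time", "de": "Kochzeit"},
}


def label_for(key, locale, fallback="en"):
    """Return the localised label, then the fallback label, then the raw key."""
    names = LABELS.get(key, {})
    return names.get(locale) or names.get(fallback) or key


german = label_for("recipe:ingredients", "de")
missing_locale = label_for("recipe:cooking_time", "fr")
unknown_key = label_for("recipe:origin", "fr")
```

With this split, a collected recipe would not need converting; only attributes without a shared key would still show up under their original name.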
