Index Feeder (like) Server

One thing that kinda bugs me when I’m using my BeOS system is that if a file doesn’t have any attributes, you have to add them manually (or use a tool like MP3 army knife). I was having a browse around SkyOS’s home page, and they have a server running in the background that systematically scans for new and modified files and, using plugins, pulls out details like bit-rate, track name and play length. That is something BeOS translators can already do.

Also, instead of changing the source code to change menu names and the like, why not have the text fields in an attribute stream, kinda like Apple resource forks (ok, I’ve probably messed this part up)? That way people could localize the apps without a recompile.

Hi,
I am really interested in programming a server like the SkyOS index feeder. (I already had a similar idea the first time I used BeOS, right before SkyOS used BeFS :wink: )

When I have finished the program I am currently coding, I will start to look at an index server… If anybody wants to help me: YES!

I don’t know the translator concept. Is there a facility to read the ID3 tag from an MP3 file through a translator? And is there an MP3 translator?

An interesting and much needed idea. MP3s are a good example to deal with, as they are perhaps one of the more pervasive examples of embedded metadata. I’ve had similar thoughts about this, so I might as well chip in.

My idea was that whenever files leave or enter Haiku, they would also be scanned for any embedded metadata: ID3 tags, in the case of MP3 files. This metadata would then be converted to BeFS-style attributes so that Haiku can maintain the same level of speed when dealing with such an array of extended information.

Entering or leaving the system could happen in any number of ways: email attachments, a USB hard drive (or any non-BeFS platter), file sharing. The actual cross-platform embedded data would only need to be used when the files leave a BeFS environment; native attributes would be used the rest of the time.

However, that being said, it’s not impossible that an index_server would do just as good a job. Maybe a combination of the two?

Yes, MP3s were also my first idea. But with addons the index server should be extendable to any kind of data.
For a detailed concept, look at the SkyOS page (index feeder):

http://www.skyos.org/?q=node/485

I like these ideas, and there are some nice videos. The page also describes how to handle files on removable devices.

With addons you could implement a full-text search in an external database :slight_smile:

But the first step will be much simpler:
-create a server that watches the files and devices
-create an addon interface to the server
-write the first addon :slight_smile:
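
The three steps above can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the `IndexServer` class, the `Addon` signature and the attribute names are hypothetical, and the part that actually watches files and devices is left out. The only fixed fact used is the ID3v1 layout (a 128-byte block at the end of the file that starts with `TAG`, followed by 30 bytes of title and 30 bytes of artist).

```python
from typing import Callable, Dict

# Hypothetical addon signature: takes a file path, returns attribute name -> value.
Addon = Callable[[str], Dict[str, object]]


class IndexServer:
    """Toy stand-in for the server: maps MIME types to addons."""

    def __init__(self) -> None:
        self._addons: Dict[str, Addon] = {}

    def register(self, mime_type: str, addon: Addon) -> None:
        """The addon interface: one addon per MIME type."""
        self._addons[mime_type] = addon

    def handle_file(self, path: str, mime_type: str) -> Dict[str, object]:
        """Would be called by whatever watches files and devices (not shown)."""
        addon = self._addons.get(mime_type)
        if addon is None:
            return {}  # no addon for this type, nothing to extract
        try:
            return addon(path)
        except Exception:
            # A buggy addon must not take the server down with it.
            return {}


def mp3_addon(path: str) -> Dict[str, object]:
    """First addon: reads title and artist from an ID3v1 tag, if present."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        if size < 128:
            return {}
        f.seek(size - 128)
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return {}

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    # Attribute names are illustrative only.
    return {"Audio:Title": text(tag[3:33]), "Audio:Artist": text(tag[33:63])}
```

Catching exceptions only protects against addons that fail politely; an addon that crashes the whole process would need stronger isolation than this sketch shows.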

MP3s were just an example. By using BeOS’s own translators it should be possible to pull all sorts of data from all supported media files: resolution and colour depth from images and videos, play length from videos and audio tracks. Like SkyOS’s implementation, it should also be able to pull titles and the like from HTML and other text files (source, PDF…).

How do the parts that already are in place work today? There is at least one attribute that gets filled for every file; the mime type. Is that set by the application creating the file or what? Also, whenever an attribute gets changed, the index is immediately updated. Is the indexing initiated by BFS or is there a separate process which monitors the file system for changes?

For efficiency, these features should be integrated as closely as possible. One must of course take care to not let a buggy metadata plugin crash anything as important as BFS, though.

bogomipz wrote:
How do the parts that already are in place work today? There is at least one attribute that gets filled for every file; the mime type. Is that set by the application creating the file or what? Also, whenever an attribute gets changed, the index is immediately updated. Is the indexing initiated by BFS or is there a separate process which monitors the file system for changes?

Currently (within BeOS), attributes are indexed (by the kernel, I think) when the file is written to disk (on file close) or when it’s updated, and only those attributes for which an index already exists. There are a few nuances with this, though. If a new index is created, existing files aren’t automatically added, and if a file is copied from a file system that doesn’t support attributes, only the name, size and dates are added. It takes Tracker to identify and add the MIME type.

To identify the MIME type, first the extension is checked; failing that, the sniffer rules are used (there’s a command line tool to check and edit these, but I’ve forgotten the name).
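
As a rough illustration of the order described above (extension first, content sniffing as the fallback), here is a small Python sketch. The two tables and the function name `identify_mime` are made up for this example; the real sniffer rules are richer than a list of magic bytes at fixed offsets.

```python
from typing import Optional

# Illustrative tables only; the real rules live in the system's MIME database.
EXTENSION_TYPES = {".mp3": "audio/mpeg", ".png": "image/png", ".html": "text/html"}

# (offset, magic bytes, MIME type): a much simplified stand-in for sniffer rules.
SNIFFER_RULES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"ID3", "audio/mpeg"),
    (0, b"%PDF-", "application/pdf"),
]


def identify_mime(filename: str, head: bytes) -> Optional[str]:
    """Guess a MIME type from the file name, then from the first bytes."""
    dot = filename.rfind(".")
    if dot > 0:
        mime = EXTENSION_TYPES.get(filename[dot:].lower())
        if mime is not None:
            return mime  # the extension was enough
    for offset, magic, mime in SNIFFER_RULES:
        if head[offset:offset + len(magic)] == magic:
            return mime  # fell back to looking at the content
    return None  # unknown: leave the type unset
```

For example, `identify_mime("report", b"%PDF-1.4")` falls through the extension check and is resolved by the content rule.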

SkyOS has a separate ‘server’ that scans for new files, extracts the extra metadata, and then adds it to the file’s attributes and hence to the indexes as well.

The indexing is done by BFS. The MIME types are written when you copy files with Tracker (not by the shell :frowning: ). Also, there is no index on BEOS:TYPE; once you have created one, you have to reindex the MIME types.

Take a look here:
http://haiku-os.org/forums/viewtopic.php?t=881&highlight=

I wrote a basic index server, but there is still much to do… Does anybody know of open-source libs to extract data from a file? At the moment there is only one MP3 addon, based on Axel’s mp3 tool…

Ok, so the mime type is written by the program creating the file. When you download something from the web, Net+ adds the mime type, right? Somehow, this feels slightly disappointing, although it makes sense for most network applications (like BMail and Net+) because they usually know the mime type from headers in the protocols.

I should have known that the indexing is done by BFS, it’s the only way that makes sense.

In light of this, I doubt there is much hope of integrating these parts, so I think a server which monitors the file system and simply fills in attributes is the way to go. Of course it has to be plugin based, and for some file types it might be feasible to make use of translators (a single plugin could extract height, width, depth, comment, etc for all image formats the system happens to have translators for).

Does BeOS support monitoring the whole file system, or do you have to watch each and every directory separately? Most unix kernels do not let you monitor the whole FS, Haiku should add this if it’s not already there.

About that mime type again. If the index feeder (or whatever name fits better) figured out the type when missing, then copying files from the shell or from non-BFS volumes wouldn’t be a problem any more. The only question is this; do the native applications create, write and close the file, then add metadata? If so, the index feeder would process the file before mime type and other attributes were added by the original writer, which is probably less than optimal behavior.

I don’t know at what point a program writes an attribute. One solution is a setting in the index server to update empty attributes only.
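
A setting like that could look roughly like the following Python sketch. The function name `merge_attributes` and the flag `only_fill_empty` are hypothetical, and attributes are modelled as a plain dictionary rather than real file system attributes.

```python
from typing import Dict


def merge_attributes(existing: Dict[str, str], extracted: Dict[str, str],
                     only_fill_empty: bool = True) -> Dict[str, str]:
    """Combine attributes already on the file with values an addon extracted."""
    result = dict(existing)
    for name, value in extracted.items():
        if only_fill_empty and result.get(name):
            continue  # already set by the application or the user: leave it
        result[name] = value
    return result
```

With the setting on, an attribute the application has already written survives; only missing or empty attributes are filled in.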

You can only watch nodes, not the whole FS. I hope this feature will be in Haiku R1.1 :wink: As a workaround, I index BEOS:TYPE and run a query on the MIME types. But that way only files that were manipulated with Tracker or otherwise got a MIME type are indexed…

If I find enough time, I hope I can release a mockup version in a month :slight_smile: At the moment I am working on the preference app to configure the index server…
Another problem is that the addons should never crash the index server. But this will come after the basics…

ZzLeCzZ wrote:
I don't know at what point a program writes an attribute. One solution is a setting in the index server to update empty attributes only.

If the indexer comes into the picture before the app writes attributes, then it will find all attributes to be empty. The indexer has to figure out the mime type by itself (in contrast to reading BEOS:TYPE), call the correct plugin, which scans through the file contents, and then write the extracted attributes. At this point the original app may have written BEOS:TYPE and other metadata, so it's kind of random which one writes first.

ZzLeCzZ wrote:
You can only watch nodes, not the whole FS. I hope this feature will be in Haiku R1.1 ;) As a workaround, I index BEOS:TYPE and run a query on the MIME types. But that way only files that were manipulated with Tracker or otherwise got a MIME type are indexed...

Clever, but this means that the indexer doesn't update attributes when the files change and, like you mentioned, isn't notified about all files.

Perhaps the Index server could hook into the function that writes a file’s mimetype, wait a little bit in case the application that wrote the file wants to add more stuff to it, and then fill in the rest of the attributes according to the mimetype.
Thinking about it for a few seconds more, perhaps it could hook into the general function that writes to files in the first place. That way, if the file gets modified, which I doubt would change the mimetype, it could be scanned again for metadata. It would have to ignore changes to attributes, of course, to avoid recursion.
But then, what happens if a user changes the attribute? Should the hooked metadata program see that and change the file’s internal metadata? The next time that Cl-Amp looks at the mp3 file, it will see the internal metadata, not the new attribute metadata. And if the file gets changed a little bit, the new attributes will be overwritten by the old metadata. Perhaps a separate attribute that would notify the metadata server that it filled in attributes already, and that it shouldn’t again if the attributes change. Or something like that.
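
The "separate attribute" idea at the end of this post could be sketched in Python as follows. The marker name `INDEXER:filled` and the function `scan_if_needed` are invented for the example, and attributes are modelled as a dictionary. This is the simplest reading of the idea, not a worked-out design.

```python
from typing import Callable, Dict

MARKER = "INDEXER:filled"  # hypothetical attribute name


def scan_if_needed(attrs: Dict[str, str],
                   extract: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Fill attributes from embedded metadata once, then leave the file alone."""
    if attrs.get(MARKER) == "1":
        return attrs  # already filled once: keep whatever the user changed since
    updated = dict(attrs)
    updated.update(extract())  # run the metadata plugin only on the first pass
    updated[MARKER] = "1"
    return updated
```

The trade-off is visible in the code: once the marker is set, a later change to the embedded metadata is never picked up again unless something clears the marker.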

I think the addon should know what to do with a file, and the user should choose which addon to use.

Example:
When a user changes the colour depth attribute of an image file to 3 and the true value is 8, the addon should read the true value from the file and change the attribute to the right value.

The other case is when the addon doesn’t really know what the right value is, for example when you have an MP3 with bad ID3 tags. Then you would like to set the right attribute yourself, and the index server shouldn’t overwrite this value with some trash from the ID3 tag. So the MP3 addon should not update files.

Idea:
If there is a real FS monitor in Haiku, then the MP3 addon will be notified by the monitor that an MP3 attribute has changed. The MP3 addon should then read the attribute and change the ID3 tag so that the file content stays consistent with it!

But every user may have different ideas about that. So the addons should be configurable…

bogomipz wrote:
Does BeOS support monitoring the whole file system, or do you have to watch each and every directory separately? Most unix kernels do not let you monitor the whole FS, Haiku should add this if it's not already there.

You can query by name + “modified after”, to catch -some- of what happens on a BFS volume. (Curiously though, folders don’t seem to show up in such a query.)

Open (live) queries lock a volume from being unmounted, which is highly annoying.

bogomipz wrote:
About that mime type again. If the index feeder (or whatever name fits better) figured out the type when missing, then copying files from the shell or from non-BFS volumes wouldn't be a problem any more. The only question is this; do the native applications create, write and close the file, *then* add metadata? If so, the index feeder would process the file before mime type and other attributes were added by the original writer, which is probably less than optimal behavior.

A future indexing server (like any other application) has no way of knowing whether it’s a new file that is being created, or one that is copied, or received. It’s all the same to the filesystem. C applications simply can’t tell. I don’t think it’s currently possible in BeOS to know which application holds an open file descriptor (read or write), and there’s no locking facility. I believe Linux and *BSD have APIs you can use to check whether a file is in use. The way BeOS applications read/write file data and attribute data is completely up to the application itself. There’s no way to know whether or not the application is done with the file, whether or not it’s safe to export its metadata. There’s not even a way of knowing for certain which application created a certain file, even though you can make an educated guess based on the apps that state support of its filetype and the current preferred application of that filetype.

The BeOS registrar scans for missing mime-type attributes when the system and the running applications appear to be idle. (Its priority is set lower. It doesn’t get scheduled unless called on explicitly, or when the system is otherwise idle.) Tracker also initiates mime-type identification when doing drag and drop, and when asking for Info on a file.

I think it’s possible to make an indexing server that works okay, even though care has to be taken to avoid working on a file that an application is currently working on. Anyway, as long as the file is only ever read, and never altered, and the attributes are expendable (in a sense), there is little danger.
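
One way to take that care, given that there is no way to ask whether an application is done with a file, is a simple quiet-period heuristic: only scan files whose data has not been modified for a while. The Python sketch below is an assumption, not something the system provides; `is_quiet` and `quiet_seconds` are made-up names, and the modification time is only a proxy for "nobody is writing this file right now".

```python
import os
import time
from typing import Optional


def is_quiet(path: str, quiet_seconds: float, now: Optional[float] = None) -> bool:
    """True if the file's data was last modified at least quiet_seconds ago."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime >= quiet_seconds
```

A file that fails the check would simply be looked at again later. Since the server only reads the file, a wrong guess costs a wasted or repeated scan rather than damaged data.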

In my mind, the file data has to be the authority.
Attributes are not authoritative sources of information,
except for zero-size file types such as People files
(which I personally dislike, due to portability issues).

When a file changes, the attributes have to be updated,
either by the application or by the indexing server.
Not both, if it can be avoided, by clean and simple design,
rather than by bloating the system with app<->indexer
communication.

Data has to flow from within the file to the attributes,
not the other way around, as tempting as it may be.

For attribute->file updates to work, the indexing service has
to have a query open for every possible indexed attribute,
which adds up to some amount of mimetypes * each mimetype’s
relevant attributes * mounted volumes = major kernel workload.
Bad idea.

File->attribute updates can work with a single query per volume,
or even be query-less, like the registrar.
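
To put rough numbers on the comparison above, here is a tiny Python calculation. The figures (50 mimetypes, 6 relevant attributes each, 3 mounted volumes) are invented purely for illustration.

```python
def live_queries(mime_types: int, attrs_per_type: int, volumes: int) -> int:
    """Open queries needed if every indexed attribute must be watched."""
    return mime_types * attrs_per_type * volumes


# Hypothetical system: 50 mimetypes, 6 relevant attributes each, 3 volumes.
attribute_to_file = live_queries(mime_types=50, attrs_per_type=6, volumes=3)

# File->attribute direction: a single query per mounted volume.
file_to_attribute = 1 * 3

print(attribute_to_file, file_to_attribute)  # 900 3
```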

I think it would be very confusing to allow data to flow in both
directions… (file <-> attributes) It’s not user friendly regardless
if it can be set as a user preference, per filetype even.

The primary data stream (the file) must be the final authority,
the primary source of data.