OK, I know that .docx, .xlsx and .pptx files use a zip container, which is why Haiku is technically correct in assigning the application/zip MIME type to them. But this is highly impractical. How can I fix this?
Opening the FileTypes app and adding the file extensions to the correct MIME type, e.g. application/vnd.openxmlformats-officedocument.wordprocessingml.document, has no effect. That is, even after a reboot, Tracker shows all files as Zip archives, regardless of their extension.
Also, right-clicking a file and selecting "file types" from the context menu seems to affect only the current file, and it involves a two-step process: first I have to manually change the MIME type, and second, if I am very lucky, I may get the chance to set the desired handler. But even this manual assignment doesn't seem to survive reboots.
I'm almost certain that there must be another, more practical way, or isn't there? How does MacOS handle this? There, MIME types are detected by content sniffing as well, so it should have the same problem.
File extensions mean very little on Haiku since we use MIME types in extended attributes.
This sounds like a bug.
What filesystem type are you working on? This manual assignment probably only works on BFS, as other filesystems don't support typed extended attributes. (Or no extended attributes at all, e.g. FAT.) But it should work smoothly on BFS at least.
We need MIME sniffing rules for these types. We have them for other zip-based formats like ODTs to distinguish them from .zips, so we just need them for DOCX and the like too.
Then why does the FileTypes app allow adding extensions?
BeFS. I'm not aware that the installer accepts anything else. Does it?
I wasn't aware that OpenDocument also uses a zip container. In that case, adding something similar for OpenXML should not be too difficult, and, maybe I'm naive, but I'm surprised that it hasn't been done.
This works for compressed containers that have their identifying stuff as the first file. This doesn't always have to be the case.
Some formats also use "just" the filename to try and differentiate. Maybe we should allow some file extensions to be the deciding factor for some files.
This issue keeps popping up, and every time it can only be solved with developer intervention. File extensions should really be the deciding factor by default. Files that share a file extension but need to be treated differently are the exception, not the other way around. Is Unix purity so important?
I'm undecided about that. However, it is frustrating that as a user I don't seem to be able to affect the process. The file type application lets you add extensions, but doing so doesn't have any visible effect.
Unfortunately, here it's a type of file which is also a zip file.
What's missing, maybe, is a confidence value for extensions: a 0.5 confidence on the .docx extension could then win over the 0.4 confidence of the sniffing rule for a zip file.
I would rather have something like a mimetype with a dependency.
That is, a given file type would also have to be a zip file: without considering dependent types, the file must resolve to a zip file, and then the zip-dependents can battle over which one is right, i.e. which file extension or magic is present (the first file in the archive at a fixed offset, for example).
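To make the two ideas above a bit more concrete, here is a rough Python sketch of how such a resolution step could look. Nothing like this exists in Haiku as far as this thread shows: the `Candidate` class, the `requires` field and the `resolve` function are all made up for illustration, and the 0.4 and 0.5 values are simply the ones from the post above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    mime: str
    confidence: float               # 0.0 .. 1.0, like a sniffer rule priority
    requires: Optional[str] = None  # hypothetical parent type that must also match


def resolve(candidates):
    """Pick the most confident type; dependents only count if their parent matched."""
    matched = {c.mime for c in candidates}
    eligible = [c for c in candidates
                if c.requires is None or c.requires in matched]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.confidence).mime


DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

candidates = [
    # Content sniffing found the "PK" zip signature.
    Candidate("application/zip", 0.4),
    # Hypothetical extension rule for .docx, only valid on top of a zip match.
    Candidate(DOCX, 0.5, requires="application/zip"),
]
print(resolve(candidates))
```

With both candidates present the .docx type wins; if the zip rule had not matched, the dependent type would be dropped and nothing would be returned.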
Of course you can: you can set the MIME type attribute manually. Ideally this should be just as easy as changing a file extension, but the UI for editing attributes isn't quite there yet.
You can also edit the sniffing rules from the FileTypes preferences, if you know how to identify files from their content (currently not by their extensions).
This has nothing to do with "UNIX purity" and is more a legacy from classic MacOS (where file extensions weren't a thing either).
Can you give me a hint what I need to do? BTW, are you talking about changing the attributes for individual files? Because, having to set the mime type for each file individually would also be kind of tedious.
Can you tell me where the files are that I need to edit?
Not the solution you are asking for, but you can leave that at just changing the MIME type and setting the handler for all files of that type from the FileTypes general preferences.
You can also set the type from the command line with settype -t 'application/vnd.openxmlformats-officedocument.wordprocessingml.document' file.docx, do it to several files in one go or compose it with find to set it for all the files in your system if you are brave enough.
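If you would rather script this than compose it with find, something along these lines might work. It is only a sketch: `settype -t` is the command from this post, but the helper names (`settype_commands`, `retype`) are made up here, and it trusts the file extension blindly. It defaults to a dry run that just prints the commands.

```python
import os
import subprocess

# Standard MIME types for the three common OpenXML formats.
OOXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def settype_commands(root):
    """Yield one settype argument list per Office file found under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            mime = OOXML_TYPES.get(os.path.splitext(name)[1].lower())
            if mime:
                yield ["settype", "-t", mime, os.path.join(dirpath, name)]


def retype(root, dry_run=True):
    """Print the commands, or run them when dry_run is False (Haiku only)."""
    for cmd in settype_commands(root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `retype("/boot/home")` (the path is just an example) lists what would be changed; pass `dry_run=False` once the list looks right.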
Even if better than clicking, that's still manual.
OpenDocument requires the first file in the zip to be "mimetype", uncompressed and with the MIME type as content, so it's easy to detect. Microsoft Office formats aren't that nice. You can check, on top of the zip signature, for the text "[Content_Types].xml", which seems to always be the first file in packages created by Microsoft tools, and for "word/document.xml", "ppt/presentation.xml" and "xl/workbook.xml" depending on the kind of file. Though I think the latter names are not really normative, and they are not necessarily the first files, so you may not find the strings in the first file chunk.
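For comparison, this is roughly what the check looks like when you can read the zip's file list instead of only the first chunk. It is a Python sketch, not how Haiku's sniffer works; `classify` and `make_zip` are names invented for this example, and it assumes the conventional part names mentioned above, which (as said) are not guaranteed.

```python
import io
import zipfile

# Conventional main part names, mapped to the standard OpenXML MIME types.
MAIN_PARTS = {
    "word/document.xml":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/workbook.xml":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/presentation.xml":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def classify(fileobj):
    """Return an OpenXML MIME type, 'application/zip', or None for non-zip data."""
    if not zipfile.is_zipfile(fileobj):
        return None
    fileobj.seek(0)
    with zipfile.ZipFile(fileobj) as zf:
        names = set(zf.namelist())  # read from the zip catalogue, order-independent
    if "[Content_Types].xml" in names:
        for part, mime in MAIN_PARTS.items():
            if part in names:
                return mime
    return "application/zip"


def make_zip(names):
    """Build a small in-memory zip with the given entry names (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "<x/>")
    buf.seek(0)
    return buf


print(classify(make_zip(["[Content_Types].xml", "word/document.xml"])))
print(classify(make_zip(["readme.txt"])))
```

Because it looks at the catalogue, the order of the entries inside the archive does not matter here, which is exactly what a fixed-offset rule cannot offer.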
You would do it from the FileTypes preferences. Select the type (add it if you don't already have one) and edit the Rule field in the File recognition box, just below the extensions. If you don't see it, check Show recognition rule in the Settings menu.
I think LibreOffice ships the sniffing rules and filetype definitions, but they are not in the base system? Should we include them in the base Haiku install?
OpenDocuments are already defined in Haiku's mime_db. For instance, for .odt files:
I think something similar can be done for OpenXML documents, using, as pointed out by @madmax, a sniff rule that checks that it's a zip file but also contains the mandatory [Content_Types].xml name in the zip catalogue.
Something like "0.50 (\"PK\") [0:512](\"[Content_Types].xml\")"
Issues are:
there is no guarantee that the [Content_Types].xml entry will be at the top of the file; I have at least one sample XLSX file, albeit a small one, where it in fact appears at the bottom of the file
you still need a second way to detect the type of OpenXML document (spreadsheet, text or presentation), which will also depend on the presence or absence of some named XML entries within the zip file.
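The first issue is easy to reproduce. The Python sketch below only approximates the rule above (it assumes the rule means "PK at offset 0, and the name starting somewhere in bytes 0 to 512"); `rule_matches` and `build` are helper names made up for the demonstration. It builds two uncompressed zips that differ only in entry order.

```python
import io
import zipfile

PATTERN = b"[Content_Types].xml"


def rule_matches(data, window=512):
    """Rough emulation: "PK" at offset 0 and PATTERN starting within [0:window]."""
    in_window = data.find(PATTERN, 0, window + len(PATTERN)) != -1
    return data.startswith(b"PK") and in_window


def build(names):
    """Create an uncompressed in-memory zip with entries in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"x" * 2000)  # payload larger than the sniff window
    return buf.getvalue()


first = build(["[Content_Types].xml", "xl/workbook.xml"])
last = build(["xl/workbook.xml", "[Content_Types].xml"])

print(rule_matches(first))  # True: the name sits right after the first local header
print(rule_matches(last))   # False: the name only appears after 2000 bytes of data

# The zip catalogue still lists the entry, wherever it was stored.
print(PATTERN.decode() in zipfile.ZipFile(io.BytesIO(last)).namelist())  # True
```

So a window-based rule would catch files written the way Microsoft tools seem to write them, but would miss files like the XLSX sample mentioned above.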