BeOS compatibility and packagefs

it doesn’t, that would be stupid…

The packagefs stores the uncompressed files in RAM, in the file cache. These cached files can be evicted from memory and re-extracted later, but only if you run out of memory. On modern systems the RAM is often much larger than the complete uncompressed system, so it’s not really a problem.

5 Likes

You can argue that all you like, but abusing free RAM isn’t completely free, and it’s not even possible to do effectively on low-RAM systems.

It’s also violating the basic storage hierarchy… you have a fast disk, so why are you trying to optimize storage space on disk when disk is the cheapest of all the memory types? It’s classic premature optimization and overcomplication of something that should be super simple.

You can make uncompressed packages if you want. It doesn’t change much in terms of disk access performance if you run an SSD (it does if you run a spinning hard disk, which was the case for most people when this was developed).

It also makes the install media smaller, and the download of updates faster.

1 Like

Compressing the packages in any other format to begin with would have done that too… like literally everyone does this.

HPKGs are already compressed with ZSTD, which is widely considered to be the best overall compression algorithm and has been gaining traction for years. The next best open-source option would be XZ, unless of course its library’s recent security incident is taken into account.

My point was that that’s a BAD thing at runtime… unless you have hardware-accelerated decompression like consoles have, you are wasting CPU cycles or memory space to do it. Rather than implementing some complicated caching scheme, you should just let SSDs be fast with a simple caching scheme (otherwise you are pushing complexity and higher-latency caches up the stack).

And yes… everyone uses some form of relatively good compression these days.

I think there are at least two misleading things there…

First, “the best overall compression algorithm” just makes no sense at all. It depends on what you want to do with your compression. ZSTD is a good choice when you need fast decompression and still a pretty good reduction in size. But what if you want something even faster? You can try snappy. Something that compresses more? There are a lot of choices, the latest ones being based on neural networks and needing a latest-generation GPU to run the decompression.
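To illustrate that the trade-off is a knob rather than a property of one “best” algorithm, here is a minimal sketch using libzstd’s one-shot API (the function and buffer names are made up for illustration, and the dummy payload obviously isn’t representative of real package contents):

```cpp
// Minimal sketch of the libzstd one-shot API: the same algorithm spans a
// wide range of speed/ratio trade-offs just by changing the level.
// (Illustrative only; real package data would show the differences better
// than this artificial payload.)
#include <zstd.h>
#include <cstdio>
#include <string>
#include <vector>

static std::vector<char> CompressBlob(const std::string& input, int level)
{
    std::vector<char> out(ZSTD_compressBound(input.size()));
    size_t written = ZSTD_compress(out.data(), out.size(),
        input.data(), input.size(), level);
    if (ZSTD_isError(written)) {
        fprintf(stderr, "zstd error: %s\n", ZSTD_getErrorName(written));
        return {};
    }
    out.resize(written);
    return out;
}

int main()
{
    std::string data(1 << 20, 'x');  // 1 MiB dummy payload
    // Level 1: fast, modest ratio. Level 19: slow, better ratio.
    for (int level : {1, 3, 19}) {
        std::vector<char> packed = CompressBlob(data, level);
        printf("level %2d -> %zu bytes\n", level, packed.size());
    }
    return 0;
}
```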

As for XZ, XZ is not a compression algorithm, it’s a file format. It uses LZMA compression. The HPKG files do not use an existing compression file format, because few of them are designed for random access, which is required for HPKG files. So I don’t see how the recent security incident would be taken into account here. We may still use the same compression algorithm, but with a different implementation, for example.
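To make the random access requirement concrete: a stream format like .xz or .tar.zst has to be decompressed from the start to reach byte N, so a package file system instead compresses fixed-size chunks independently and keeps an index of where each compressed chunk starts. This is only the general idea, not the actual HPKG on-disk layout; the names and the chunk size below are invented for illustration:

```cpp
// General idea behind random-access compressed storage (not the actual HPKG
// layout): data is split into fixed-size chunks, each compressed on its own,
// plus an index of where every compressed chunk starts. Reading byte N then
// only requires decompressing the one chunk that contains it.
#include <zstd.h>
#include <algorithm>
#include <vector>

static const size_t kChunkSize = 64 * 1024;  // illustrative chunk size

struct CompressedImage {
    std::vector<char>   blob;         // all compressed chunks, back to back
    std::vector<size_t> chunkOffset;  // start of each chunk inside 'blob'
    std::vector<size_t> chunkLength;  // compressed length of each chunk
    size_t              originalSize;
};

// Decompress only the chunk that contains 'offset' and return one byte.
// (Error handling and chunk caching omitted for brevity.)
char ReadByte(const CompressedImage& image, size_t offset)
{
    size_t chunkIndex = offset / kChunkSize;
    size_t chunkStart = chunkIndex * kChunkSize;
    size_t rawLength = std::min(kChunkSize, image.originalSize - chunkStart);

    std::vector<char> raw(rawLength);
    ZSTD_decompress(raw.data(), raw.size(),
        image.blob.data() + image.chunkOffset[chunkIndex],
        image.chunkLength[chunkIndex]);

    return raw[offset - chunkStart];
}
```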

Maybe we are wasting them, or maybe we are using them wisely. Only a benchmark could tell, and it depends a lot on the hardware specifics (SSD vs spinning disks, slow or fast internet connections) as well as the use cases (how often you need to extract a package, or download one, vs how often you access it).
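For what it’s worth, a first data point doesn’t require anything fancy: timing a cold and a warm read of the same file already tells you something about how much the file cache is doing. A rough sketch, where the path is just an example of a file that lives in a package:

```cpp
// Rough timing harness: read a file twice and compare the first (cold) read
// with the second (warm, hopefully served from the file cache) read.
// The path is only an example; point it at whatever file you want to test.
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <ratio>
#include <vector>

static double ReadOnceMs(const char* path)
{
    auto start = std::chrono::steady_clock::now();
    std::ifstream file(path, std::ios::binary);
    // Pull the whole file into memory; we only care about how long it takes.
    std::vector<char> data((std::istreambuf_iterator<char>(file)),
        std::istreambuf_iterator<char>());
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main()
{
    const char* path = "/boot/system/lib/libroot.so";  // example file
    printf("cold read: %.2f ms\n", ReadOnceMs(path));
    printf("warm read: %.2f ms\n", ReadOnceMs(path));
    return 0;
}
```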

It seems you have made up your mind on the results already without making any measurements. So, we can argue as much as we want, there is no data to back anything, and the discussion will not be very useful.

2 Likes

These are runtime costs that wouldn’t need to exist without the overcomplicated, high-maintenance design. Why shouldn’t I make up my mind against something that breaks several software engineering fundamentals?

Yeah, rereading my previous comment, I’ll have to agree. It’s early morning here, so apologies for the mistakes there.

Those two qualities are some of the most desired metrics for compressed archives: how small they can get and how fast they can unpack. So at least by this interpretation, ZSTD is the best overall (open-source) option. Should’ve prolly been clearer about that.

Sorry, I meant file format there. It does indeed use the LZMA algorithm. Got confused there since most of the time I’ve seen ZSTD used, it’s with .tar.zst archives, so from an end-user perspective it’s most frequently associated with that file format.

Anyways, this is prolly a good time to change the thread notifications setting and go to sleep…

Off-topic note:
Really wish Discourse had the ability to do timed thread muting or just time-limited thread notification settings in general.

1 Like

Is it possible to make and use a package that is not compressed but is still packaged?

Keep in mind that RAM is still faster than SSD. So you want an in-memory cache anyway to make access faster.

Use the sources, Luke:

1 Like

Yes, in fact this is used in a specific case: the package containing the bootloader is not compressed, because the stage 1 bootloader needs to be small and cannot contain code to decompress packages.

It’s not very hard to write a script to unpack all hpkg files and repack them without compression if you are interested in benchmarking this.

The main maintenance cost is the wasted time arguing on the forums with people who don’t actually maintain the code. But I think no amount of changes to the code will fix that.

8 Likes

This is true, but you also don’t want these caches to be so large that they hit diminishing returns (if you do, you are wasting effort and hardware on something pointless). E.g. if you cache all reads, your hit rate will end up quite low… until the cache fills all your RAM and starts evicting cold entries, and even then the cache hit rate may still be pretty low relative to how much memory is used.

If you cache data that is already written back to disk, you can immediately release it if the memory is needed for something else. There is no need to write it back to disk if you keep track that it is already identical to what’s on disk.
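That is really the whole trick. As a sketch (simplified, not the actual packagefs or file cache code; all names are invented for illustration), shrinking such a cache is nothing more than freeing memory:

```cpp
// Sketch of why a read-only cache is cheap to shrink: every entry is, by
// definition, identical to what can be re-extracted from the package on
// disk, so "evicting" is just freeing memory; there is no write-back step.
// (Simplified illustration, not the actual packagefs/file cache code.)
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

class ReadOnlyFileCache {
public:
    // Return the cached data, extracting it from the package on a miss.
    const std::vector<char>& Get(const std::string& path)
    {
        auto it = fEntries.find(path);
        if (it == fEntries.end()) {
            it = fEntries.emplace(path, ExtractFromPackage(path)).first;
            fTotalSize += it->second.size();
            fAge.push_back(path);
        }
        return it->second;
    }

    // Under memory pressure: drop the oldest entries. Nothing to flush.
    void Shrink(size_t targetSize)
    {
        while (fTotalSize > targetSize && !fAge.empty()) {
            const std::string& victim = fAge.front();
            fTotalSize -= fEntries[victim].size();
            fEntries.erase(victim);
            fAge.pop_front();
        }
    }

private:
    // Placeholder: in reality this would locate the file inside its .hpkg
    // and decompress the relevant chunks.
    std::vector<char> ExtractFromPackage(const std::string& path)
    {
        (void)path;  // unused in this stub
        return std::vector<char>(4096, 0);
    }

    std::unordered_map<std::string, std::vector<char>> fEntries;
    std::list<std::string> fAge;  // front = oldest entry (real LRU omitted)
    size_t fTotalSize = 0;
};
```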

So you’re not really “wasting hardware”: it’s not like you can power off a RAM chip when it’s not in use. It’s going to sit there unused and still draw just as much power, so you may as well put something in it.

As for effort, it is also marginal, since the disk cache is something we already have for all other filesystems anyway (and all other OSes work pretty much the same).

In fact, we even make things simpler: since packagefs is read-only, its cache entries never need to be written back to disk.

We also save a lot of filesystem overhead: traditional filesystems are very bad at storing small files. BFS will need one complete inode (usually 4 kilobytes) for the file header, and another (4 kilobytes more) for the data, even for a simple configuration file that’s just a few hundred bytes. Whereas, if all these small files are packed into a single package, they can be stored a lot more efficiently. So, even without compression, there is quite a lot of saved space here, made possible by the read-only nature of packages. Again this is not something completely crazy: it is somewhat similar to squashfs on Linux, even if you don’t typically find that used on desktop systems.
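To put rough numbers on that (purely illustrative, using the 4 kilobyte figures above): a thousand configuration files of ~300 bytes each would occupy about 1000 × 8 KiB ≈ 8 MiB on BFS, but only around 300 KB when packed back to back inside a single package, and that is before any compression is applied.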

1 Like

I guess one could just decompress packages (without unpacking them) and set smaller caches in packagefs.

ReiserFS already solved that 20 years ago; I’m pretty sure at least some modern file systems can do the same.

Yes, this is what you would have to do if you wanted to run on lower-RAM machines…

Edit: actually, what you suggest there would cause lower-end machines to choke, because there isn’t enough CPU to keep up with reading the packages.

Would it? I don’t know how packagefs is implemented, but I don’t think it does a brute-force search over all packages on every request.