The packagefs stores the uncompressed files in RAM, in the file cache. They can be evicted from memory and re-extracted later, but only if you run out of memory. On modern systems the RAM is often much larger than the complete uncompressed system, so it’s not really a problem.
You can argue that all you like, but abusing free RAM isn’t completely free, and it’s not even possible to do effectively on low-RAM systems.
It’s also violating the basic storage hierarchy… you have a fast disk, so why are you trying to optimize storage space on disk when disk is the cheapest of all the memory types? It’s classic premature optimization and overcomplication of something that should be super simple.
You can make uncompressed packages if you want. It doesn’t change much in terms of disk access performance if you run an SSD (it does if you run a spinning hard disk, which was the case for most people when this was developed).
It also makes the install media smaller, and the download of updates faster.
HPKGs are already compressed with ZSTD, which is widely considered to be the best overall compression algorithm and has been gaining traction for years. The next best open-source option would be XZ, unless of course its library’s recent security incident is taken into account.
My point was that this is a BAD thing at runtime… unless you have hardware-accelerated decompression like consoles do, you are wasting CPU cycles or memory space on it. Rather than implementing some complicated caching scheme, you should just let SSDs be fast with a simple one (otherwise you are pushing complexity and higher-latency caches up the stack).
And yes… everyone uses some form of relatively good compression these days.
First, “the best overall compression algorithm” just makes no sense at all. It depends on what you want to do with your compression. ZSTD is a good choice when you need fast decompression and still a pretty good reduction in size. But what if you want something even faster? You can try snappy. Something that compresses more? There are a lot of choices, the latest ones being based on neural networks and needing a latest-generation GPU to run the decompression.
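To make that trade-off concrete, here’s a rough sketch (it assumes the third-party `zstandard` Python package; `lzma` is in the standard library, and the numbers depend entirely on your data and machine):

```python
# Rough illustration of the size/speed trade-off: zstd decompresses very fast
# with a decent ratio, LZMA usually compresses smaller but is slower both ways.
import lzma
import time
import zstandard  # third-party: pip install zstandard

# A moderately repetitive payload standing in for package contents.
data = b"example configuration and code payload " * 100_000

def measure(name, compress, decompress):
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    decompress(packed)
    t2 = time.perf_counter()
    print(f"{name}: ratio {len(data) / len(packed):.1f}x, "
          f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")

measure("zstd-9", zstandard.ZstdCompressor(level=9).compress,
        zstandard.ZstdDecompressor().decompress)
measure("lzma", lzma.compress, lzma.decompress)
```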
As for XZ, XZ is not a compression algorithm, it’s a file format. It uses LZMA compression. The HPKG files do not use an existing compression file format, because few of them are designed for random access, which is required for HPKG files. So I don’t see how the recent security incident would be taken into account here. We may still use the same compression algorithm, but a different implementation, for example.
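For the random-access part, the general idea (a toy sketch only, not the actual HPKG on-disk layout) is to compress the data in fixed-size chunks and keep an index of where each compressed chunk starts, so reading in the middle of a package only decompresses the chunks it touches:

```python
import zstandard  # third-party: pip install zstandard

CHUNK = 64 * 1024  # chunk size chosen for the illustration

def pack(data: bytes):
    """Compress data chunk by chunk, returning the blob and an offset index."""
    comp = zstandard.ZstdCompressor()
    blobs, index, offset = [], [], 0
    for i in range(0, len(data), CHUNK):
        c = comp.compress(data[i:i + CHUNK])
        index.append(offset)
        offset += len(c)
        blobs.append(c)
    return b"".join(blobs), index

def read_at(packed: bytes, index, pos: int, length: int) -> bytes:
    """Read `length` bytes at `pos` by decompressing only the chunks needed."""
    decomp = zstandard.ZstdDecompressor()
    first = pos // CHUNK
    out = b""
    for n in range(first, len(index)):
        end = index[n + 1] if n + 1 < len(index) else len(packed)
        out += decomp.decompress(packed[index[n]:end])
        if len(out) >= (pos - first * CHUNK) + length:
            break
    skip = pos - first * CHUNK
    return out[skip:skip + length]

packed, index = pack(b"x" * 200_000 + b"needle" + b"y" * 200_000)
print(read_at(packed, index, 200_000, 6))  # b'needle'
```

A general-purpose stream format like .xz or .tar.zst would force you to decompress everything up to the byte you want, which is why a purpose-built container makes sense here.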
Maybe we are wasting them, or maybe we are using them wisely. Only a benchmark could tell, and it depends a lot on the hardware specifics (SSD vs spinning disks, slow or fast internet connections) as well as the use cases (how often you need to extract a package, or download one, vs how often you access it).
It seems you have made up your mind on the results already without making any measurements. So, we can argue as much as we want, there is no data to back anything, and the discussion will not be very useful.
These are runtime costs that don’t need to exist without the overcomplicated, high-maintenance design. Why shouldn’t I make up my mind against something that breaks several software engineering fundamentals?
Yeah rereading my previous comment, I’ll have to agree. It’s early morn here, so apologies for the mistakes there.
Those two qualities are some of the most desired metrics for compressed archives: how small they can get and how fast they can unpack. So at least by this interpretation, ZSTD is the best overall (open-source) option. Should’ve prolly been clearer about that.
Sorry, I meant file format there. It does indeed use the LZMA algorithm. Got confused there since most of the time I’ve seen ZSTD used, it’s with .tar.zst archives, so from an end-user perspective it’s most frequently associated with that file format.
Anyways, this is prolly a good time to change the thread notifications setting and go to sleep…
Off-topic note:
Really wish Discourse had the ability to do timed thread muting or just time-limited thread notification settings in general.
Yes, in fact this is used in a specific case: the package containing the bootloader is not compressed, because the stage 1 bootloader needs to be small and cannot contain code to decompress packages.
It’s not very hard to write a script to unpack all hpkg files and repack them without compression if you are interested in benchmarking this.
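Something along these lines would do; the exact flags of the `package` tool are an assumption here, so check `package help` before relying on it:

```python
# Sketch of such a repacking script. The flags of Haiku's `package` command are
# assumptions (check `package help extract` / `package help create`), in
# particular whether a "-0"/no-compression level is exposed on your build.
import pathlib
import subprocess
import tempfile

def repack(hpkg: pathlib.Path, out_dir: pathlib.Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # Unpack the package contents into a scratch directory...
        subprocess.run(["package", "extract", "-C", tmp, str(hpkg)], check=True)
        # ...then repack them with the lowest (or no) compression level.
        subprocess.run(["package", "create", "-0", "-C", tmp,
                        str(out_dir / hpkg.name)], check=True)

for pkg in pathlib.Path("/system/packages").glob("*.hpkg"):
    repack(pkg, pathlib.Path("/tmp/uncompressed-packages"))
```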
The main maintenance cost is the wasted time arguing on the forums with people who don’t actually maintain the code. But I think no amount of changes to the code will fix that.
This is true, but you also don’t want these caches to be too large, to the point that they have diminishing returns (if you do, you are wasting effort and hardware doing something pointless). E.g. if you cache all reads, your hit rate will end up quite low… until the cache fills all your RAM and starts evicting cold entries, and even then the cache hit rate may still be pretty low relative to how much memory is used.
If you cache data that is already written back to disk, you can immediately release it if the memory is needed for something else. There is no need to write it back to disk if you keep track that it is already identical to what’s on disk.
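A minimal sketch of what that means (only an illustration of the clean/dirty distinction, not Haiku’s actual file cache):

```python
from collections import OrderedDict

def write_back_to_disk(key, data):
    """Hypothetical helper standing in for the real write-back path."""
    pass

class FileCache:
    """Tiny LRU-ish cache that prefers to evict clean entries first."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (data, dirty)

    def put(self, key, data, dirty=False):
        self.entries[key] = (data, dirty)
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self._evict_one()

    def _evict_one(self):
        # Clean entries match what's on disk, so they can be dropped instantly.
        for key, (data, dirty) in self.entries.items():
            if not dirty:
                del self.entries[key]
                return
        # Only dirty entries left: the oldest must be written back first.
        key, (data, dirty) = next(iter(self.entries.items()))
        write_back_to_disk(key, data)
        del self.entries[key]
```

For packagefs every entry is clean by construction, so eviction is always the cheap case.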
So you’re not really “wasting hardware”: it’s not like you could power off a RAM chip when it’s not in use. The chips are going to sit there unused, and still use just as much power. So you may as well put something in there.
As for effort, it is also marginal, since the disk cache is something we already have for all other filesystems anyway (and all other OSes work pretty much the same).
In fact, we even make things simpler: since packagefs is read-only, its cache entries never need to be written back to disk. We also save a lot of filesystem overhead: traditional filesystems are very bad at storing small files. BFS will need one complete inode (usually 4 kilobytes) for the file header, and another (4 kilobytes more) for the data, even for a simple configuration file that’s just a few hundred bytes. Whereas, if all these small files are packed into a single package, they can be stored a lot more efficiently. So, even without compression, there is quite a lot of saved space here, made possible by the read-only nature of packages. Again this is not something completely crazy: it is somewhat similar to squashfs on Linux, even if you don’t typically find that used on desktop systems.
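A back-of-envelope version of that, using the 4 KiB figures from above and a made-up file count and size:

```python
BLOCK = 4096           # BFS inode block and data block size, as above
files = 1000           # hypothetical number of small configuration files
avg_size = 300         # hypothetical average file size in bytes

bfs_cost = files * 2 * BLOCK     # one block for the inode + one for the data
packed_cost = files * avg_size   # packed back to back, no compression at all
print(f"BFS: {bfs_cost // 1024} KiB, packed: {packed_cost / 1024:.0f} KiB")
# -> BFS: 8000 KiB, packed: 293 KiB for this made-up case
```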