To cut down on HTTP scrapers. We get a lot of random bots scraping our repos when we leave directory indexes enabled, which eats up bandwidth that could be better spent syncing to mirrors via rsync.
Hm. If there is a good use case I don’t want to block it from functioning either. We have around 250 GiB of haikuports packages that bad bots were scraping in a loop, driving up our bandwidth usage.
Do you have some details on the use cases @Diver ?
I’d occasionally grab older packages by navigating Haiku Depot Server, since HaikuDepot does not have a facility to list or retrieve older versions. Software ships with bugs, so going back a couple of versions isn’t such rare user behaviour. Also, SoftwareUpdater is all or nothing; we still cannot manually select what to update unless we use the Terminal-based pkgman.
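For reference, the pkgman route looks like this (the package name here is just an example):

```shell
# Update only the named package instead of everything;
# with no arguments, pkgman update behaves like SoftwareUpdater.
pkgman update vim
```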
Nah. I can try adding it, but the user agents at the time were identifying as Internet Explorer. Bad bots don’t care about robots.txt.
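For completeness, the suggestion amounts to something like this at the repo root (well-behaved crawlers honor it; as noted, bad bots ignore it entirely):

```
# robots.txt — ask crawlers to stay out of the package indexes
User-agent: *
Disallow: /
```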
So, we really need better tools for managing repos and packages (something like aptly for Haiku comes to mind).
My server-side wish list includes:

- de-duplication (though the benefit might be limited)
- package version tracking
- supporting cold object storage for old versions of software
- haikuporter buildmaster using S3 for storage
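On the de-duplication point, here is a minimal sketch of what content-addressed storage could look like. `DedupStore` and `content_key` are invented names for illustration; real tooling would hash full package payloads with SHA-256 rather than Rust’s std hasher, which is used here only to keep the sketch dependency-free:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Derive a content key from the raw bytes. A real implementation
// would use a cryptographic hash (e.g. SHA-256) instead.
fn content_key(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

struct DedupStore {
    blobs: HashMap<u64, Vec<u8>>, // content key -> stored bytes
    index: HashMap<String, u64>,  // package name/version -> content key
}

impl DedupStore {
    fn new() -> Self {
        DedupStore { blobs: HashMap::new(), index: HashMap::new() }
    }

    // Store a package; returns true if identical content was already
    // present, i.e. the payload was de-duplicated.
    fn put(&mut self, name: &str, data: &[u8]) -> bool {
        let key = content_key(data);
        let duplicate = self.blobs.contains_key(&key);
        self.blobs.entry(key).or_insert_with(|| data.to_vec());
        self.index.insert(name.to_string(), key);
        duplicate
    }
}
```

Two package entries with identical payloads would then share a single stored blob, which is exactly the win (however limited) de-duplication offers for rebuilt-but-unchanged packages.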
This one almost needs a complete rewrite around haikuporter buildmaster. The code is heavily dependent on local files. I think if we could develop something to manage / groom repositories via REST API calls, then haikuporter buildmaster could interface with it directly.
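To illustrate the kind of interface I mean, here is a rough in-memory sketch. `RepoService` and its methods are invented names, and a real service would expose the same operations as REST endpoints (e.g. something like `PUT /repos/{repo}/packages`) rather than direct calls:

```rust
use std::collections::HashMap;

// Hypothetical repository-management interface that buildmaster could
// talk to instead of manipulating local files directly.
#[derive(Default)]
struct RepoService {
    // repo name -> package name -> published versions, oldest first
    repos: HashMap<String, HashMap<String, Vec<String>>>,
}

impl RepoService {
    // Record a newly built package version in a repository.
    fn publish(&mut self, repo: &str, name: &str, version: &str) {
        self.repos
            .entry(repo.to_string())
            .or_default()
            .entry(name.to_string())
            .or_default()
            .push(version.to_string());
    }

    // List every known version of a package (supports version tracking
    // and deciding what to push to cold storage).
    fn versions(&self, repo: &str, name: &str) -> Vec<String> {
        self.repos
            .get(repo)
            .and_then(|r| r.get(name))
            .cloned()
            .unwrap_or_default()
    }
}
```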
Several of the above depend on native support for our hpkgs and hpkrs. I can write various server-side tooling in Rust to do all this stuff… but I have to get Alexander von Gluck IV / hpkg-rs · GitLab functioning before any of it happens.
If anyone knows Rust and wants to help out parsing hpkg and hpkr files… PRs welcome. Our only native code for managing hpkg / hpkr files is HaikuDepot (Java) and the package kit.
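As a starting point for anyone curious: package files begin with the ASCII magic "hpkg" and repository files with "hpkr" (per the Haiku package format docs). A trivial Rust sketch that only identifies the file type; everything after the magic (header version, heap sections, compression) still needs real parsing, which is the actual work in hpkg-rs:

```rust
// Identify a Haiku package-kit file by its leading magic bytes.
// Returns None for anything that is neither an hpkg nor an hpkr.
fn identify(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"hpkg") {
        Some("package (hpkg)")
    } else if bytes.starts_with(b"hpkr") {
        Some("repository (hpkr)")
    } else {
        None
    }
}
```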