I saw on the “getting involved” page that there is a need for mirroring large files and providing torrent hosting.
I have an unused NAS with 1 TB of free space that could be used as an HTTPS or torrent host.
I have a few questions:
What would be the highest priority: mirroring nightly/release images, mirroring HaikuDepot releases, or something else?
I see the xml/atom feed for nightlies does not seem to be working. Is fixing it on somebody’s to-do list? It would be very easy for me to automate torrent creation from there, which would be handy, since my spare NAS is not powerful enough to run Docker and generate the images on my side.
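To make the automation concrete, here is a minimal Python sketch of what I have in mind once the feed works, assuming a standard Atom document; the feed URL below is a placeholder, not the real one:

```python
# Sketch only: poll the nightly Atom feed and list the image links that a
# torrent-creation step could then pick up. FEED_URL is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.org/haiku/nightly-images/x86_64/atom.xml"  # placeholder
ATOM = "{http://www.w3.org/2005/Atom}"

with urllib.request.urlopen(FEED_URL) as response:
    feed = ET.parse(response)

for entry in feed.getroot().findall(ATOM + "entry"):
    title = entry.findtext(ATOM + "title")
    link = entry.find(ATOM + "link")
    href = link.get("href") if link is not None else None
    print(title, href)
    # ...download href and hand it to the torrent-creation step here...
```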
Also, out of curiosity, how many “build farms” does Haiku have for the OS and for ports, and where are they located?
Hello there, and welcome back to the community! (It says your last post was 14 years ago, haha.)
Yes, we definitely do need more mirrors, since we only have a small selection available for stable release images. If anyone else is reading this and would like to run a mirror for Haiku, please post in this forum to let people know!
I can’t speak for the sysadmins here (we will have to wait for their response), but at the moment the nightly images are only distributed through the United States mirror, and IMO it would be nice if there were another mirror we could get those from. I’m not too sure if there are any other things needing a mirror; we will have to see what the sysadmins say.
Currently we don’t have a signature system on the package repositories, so mirrors for them can’t be trusted. Anyone with write access to a mirror (whether legitimately or by compromising it) could replace any package there and no one would notice.
For release and nightly images, the SHA checksum is available on the main website, so mirroring those should be possible.
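For what it’s worth, that check is all a mirror user (or a mirror health script) needs; here is a small Python sketch, assuming SHA-256 and example file names:

```python
# Sketch: compare a downloaded image against the checksum published on the
# main website. Assumes SHA-256; the file name and checksum are examples.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

published = "paste-the-checksum-from-haiku-os.org-here"
if sha256_of("haiku-release-anyboot.iso") == published:
    print("Image matches the published checksum")
else:
    print("Checksum mismatch: do not trust this copy")
```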
Everything is currently running on a single server owned by Haiku, Inc.
For HaikuPorts, it runs Haiku in virtual machines. For Concourse (which provides the nightly builds), I think the builds run on Linux in Docker containers.
Correct. This is the main reason I’ve only really pursued a few “release” mirrors and focused on getting our repos onto Wasabi object storage. We really can’t 100% trust package mirrors until we get signature checking on the “repo” file. (The repo file contains hashes of all packages within the repo… so in theory we should be able to validate the signature of this single file as a first step.)
I’ve built signatures into the Haiku repo using minisign and our CI/CD. The HaikuPorter buildmaster should follow the same strategy, but nobody has shown interest in improving buildmaster lately.
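As a rough illustration of that first step, a client (or a mirror monitor) could verify the minisign signature of the repo file before trusting any of the package hashes inside it. This is only a sketch, with a placeholder key and file names, and it assumes a detached .minisig signature is published next to the repo file:

```python
# Sketch: verify the signed "repo" file, then trust the package hashes it
# lists. The public key and file names are placeholders, and the key should
# ship with the OS itself rather than come from the mirror being checked.
import subprocess

REPO_FILE = "repo"                   # the file listing all package hashes
PUBLIC_KEY = "RWQ...placeholder..."  # not Haiku's real key

result = subprocess.run(
    ["minisign", "-Vm", REPO_FILE, "-P", PUBLIC_KEY],  # expects REPO_FILE + ".minisig"
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    print("repo signature OK; the package hashes inside it can be trusted")
else:
    print("signature check failed; do not install from this mirror")
    print(result.stderr)
```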
For the Haiku builds, when we moved to Concourse, I turned off all of the Mac minis in favor of one personally owned Dell i7 OptiPlex machine with an NVMe drive (Haiku, Inc.-owned) performing builds. All of our Haiku builds are run on a single builder in Austin, TX. We definitely could use more OS build systems for redundancy, but I’ve been favoring “larger and faster” over the smaller, outdated Mac minis, which are less power-efficient. For what it’s worth, my Haiku builder made it through the Texas ice storm somehow without an outage… so my grid connection seems pretty reliable.
All of our Haiku builds run within containers with a “consistent set of buildtools”, so a somewhat larger system became a requirement versus squeezing our builds into systems with 2–4 GiB of RAM.
For HaikuPorts builds, mmlr has two VMs running at his personal residence. I don’t know much beyond that for those.
Oh… another thing I experimented with was pushing Haiku package repositories onto IPFS. In theory it would sign artifacts in a transparent way, allowing anyone to mirror our package repos with a click of a button AND provide deduplication.
Thanks all for your answers. I will set up an HTTPS mirror for the last 20 nightlies for each arch. It will be in place this weekend (whether I keep it long term will depend on whether my connection gets hammered).
I am also putting the finishing touches on a script to automatically generate eDonkey (eMule) downloads for nightlies (as well as BitTorrent in the coming days), as P2P downloads would scale much better.
I will also make sure an xml/atom feed gets generated so that torrent subscriptions can be done.
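For the subscription part, the plan is a plain RSS feed with one torrent enclosure per nightly, which most BitTorrent clients can watch. A small sketch using only the Python standard library, with made-up file names and URLs:

```python
# Sketch: generate an RSS 2.0 feed with .torrent enclosures so BitTorrent
# clients can subscribe to new nightlies. Names and URLs are made up.
import xml.etree.ElementTree as ET
from email.utils import formatdate

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Haiku nightly torrents (unofficial mirror)"
ET.SubElement(channel, "link").text = "https://example.org/haiku/"

for name in ["haiku-nightly-hrevNNNNN-x86_64-anyboot.iso.torrent"]:  # placeholder
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = name
    ET.SubElement(item, "pubDate").text = formatdate(usegmt=True)
    ET.SubElement(item, "enclosure", url="https://example.org/haiku/" + name,
                  type="application/x-bittorrent", length="0")  # length: real size in bytes

ET.ElementTree(rss).write("nightlies.rss", encoding="utf-8", xml_declaration=True)
```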
Has applying for sponsorships been considered to help out with mirroring? Fastly CDN and DigitalOcean (more details) both sponsor OSS, including non-Linux OS projects. There are probably others that I’m unaware of.
For sponsorship we could ask https://osuosl.org/services/hosting/ as well. I expect several other places would also do it, but it’s not always easy to find the contact points.
We used to have a lot of mirrors for our stable releases as well, but somehow these were not used for packages (we didn’t dare to ask people to host this much data, I guess?)
Is the Inc. willing to look into alternate cloud storage providers or CDNs? I’ve seen Backblaze suggested for pure cloud storage and bunny.net for both CDN and cloud storage.
Matching today: Just find another S3 Object Storage host
Reliable
Cost: $$$$ for our amount of data + traffic (+$100-$200 / month). Cost goes up over time
No need for mirrors except for geographic speeds
Funky: Try to make the IPFS thing work.
Reliable with lots of active pinning. Our repos “update” daily, and every mirror has to “repin” when updates are made to pull the latest data. This is a known issue with IPFS that will hopefully be fixed.
Built-in deduplication and signature checking
The repos would essentially be community mirrored with Haiku, Inc. keeping a reliable pin going.
Haiku, Inc. would pay for a few smallish IPFS VMs to pin our releases and keep them reliable.
We depend on the bandwidth distributed across everyone who pins, and everyone who runs mirrors.
As an example, if we have a large userbase in Russia (which we do), and Russia blocks Digital Ocean (which they do), folks in Russia could take things into their own hands and “pin” our repos. Then folks in Russia could stand up or use a Russian IPFS gateway to access Haiku.
Kind of dependent on “large stable” gateways like cloudflare-ipfs.com (which is the only public one that offers SSL). I’m trying to get hold of the Cloudflare folks about some issues I ran into on their gateway (caching related).
Old school: we buy a dedicated server with lots of bandwidth and install S3-compatible object storage on it.
Cost: $$$
Reliability: High
I feel like the IPFS option could “grow” the most with us over time: it gets Haiku repos mirrored all over the world, saves space with deduplication, and has built-in signatures for security… but there are a lot of pointy bits because the tech is still young.
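To give a feel for what the community-mirroring side would look like: a volunteer with a local IPFS daemon would just pin whatever CID we publish after each repo update. A sketch with a made-up CID, assuming the standard IPFS (kubo) CLI:

```python
# Sketch: what a volunteer "community pin" amounts to, assuming a local IPFS
# daemon is running ("ipfs daemon") and Haiku, Inc. publishes the current
# repo CID after each update. The CID below is made up.
import subprocess

REPO_CID = "bafy...placeholder..."

# Pin the repo locally; the node then serves those blocks to the rest of the
# IPFS network, including anyone fetching through a public gateway.
subprocess.run(["ipfs", "pin", "add", REPO_CID], check=True)

# Since our repos update daily, this has to be re-run with each new CID (the
# "repin" issue above); only changed blocks get fetched, which is where the
# built-in deduplication pays off.
```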
Thanks for researching this. I think this option (which is zero-cost and somewhat distributed) should be explored first. If it doesn’t work, we can look at the other ones? I think our requirements are 1 TB of storage (for now; this might grow) and 3 TB of bandwidth? But if we have multiple servers, they will share the bandwidth, so it won’t be so much of a problem anymore?
Also, if we go the P2P route, maybe some other protocols should be considered, for example BitTorrent (which is popular) or Gnutella (a bit forgotten these days, but interesting because the data transfers happen over HTTP, so it only changes how you discover a server where you can download a particular file).
Regarding other protocols, I am already sharing releases and nightlies (the last 20 revisions) through eDonkey/eMule (I still need to automate ed2k/magnet link creation for this protocol).
I am also investigating automated tracker and torrent creation for BitTorrent to share nightlies, but have not found the time to do it just yet. It is one of the top items on my to-do list.
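In case it helps anyone who wants to do the same, single-file .torrent creation needs nothing beyond the standard library. A sketch, assuming a plain HTTP tracker (the announce URL and file names are examples):

```python
# Sketch: build a single-file .torrent with only the standard library.
# The tracker URL and file names are examples, not real infrastructure.
import hashlib
import os

def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%b" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # keys must be byte strings, sorted
        items = sorted((k.encode("utf-8") if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)}")

def make_torrent(path: str, announce: str, piece_length: int = 1 << 20) -> bytes:
    pieces = b""
    with open(path, "rb") as f:
        while chunk := f.read(piece_length):
            pieces += hashlib.sha1(chunk).digest()  # 20-byte hash per piece
    info = {
        "name": os.path.basename(path),
        "length": os.path.getsize(path),
        "piece length": piece_length,
        "pieces": pieces,
    }
    return bencode({"announce": announce, "info": info})

with open("haiku-nightly.iso.torrent", "wb") as out:
    out.write(make_torrent("haiku-nightly.iso", "http://example.org:6969/announce"))
```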