Mirroring and torrent hosting

Hi Haiku contributors/maintainers.

I saw on the “getting involved” page that there is a need for mirroring large files and providing torrent hosting.
I have an unused NAS with 1 TB of free space that can be used as an HTTPS or torrent host.

I have a few questions:

  • What would be the highest priority: mirroring nightly/release images, mirroring Haiku Depot releases, or something else?
  • I see the XML/Atom feed for nightlies does not seem to be working. Is fixing it on somebody’s to-do list? It would be very easy for me to automate torrent creation from the feed. That would be handy, as my spare NAS is not powerful enough to run Docker and generate images on my side.
  • Also, out of curiosity, how many “build farms” for OS and for ports does Haiku have and where are they located?

Thanks

Hello there, and welcome back to the community! (Apparently your last post was 14 years ago, haha.)

Yes, we definitely do need more mirrors, since we only have a small selection available for stable release images. If anyone else is reading this and would like to run a mirror for Haiku, please post in this forum to let people know!

I can’t speak for the sysadmins here (we will have to wait for their response), but at the moment the nightly images are only distributed through the United States mirror, and IMO it would be nice if there was another mirror we could get those from. Not too sure if there are any other things needing a mirror, we will have to see what the sysadmins say.

Checking Trac now, I see that someone has reported the issue but no one has gotten around to it: #15553 (RSS not working in downloads.haiku-os.org) – Haiku

It’s currently assigned to @kallisti5 so I’ve mentioned them and hopefully they will see this thread.

Again, you will have to ask the sysadmins, sorry about that. :slight_smile:

Currently we don’t have a signature system on the package repositories, so mirrors for it can’t be trusted. Anyone having write access to a mirror (intentionally or by attacking it) could replace any package there and no one would notice.

For release and nightly images, the SHA checksum is available on the main website and so mirroring should be possible.
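A mirror operator or downloader could run that check along the following lines. This is only a minimal sketch: it assumes the published checksum is SHA-256 (the post only says "SHA"), and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large images do not need to fit in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a mirrored file against the checksum copied from the main site."""
    return sha256_of(path) == published_hex.strip().lower()
```

The important point is that the expected checksum must come from the main website, not from the mirror being checked.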

Everything is currently running on a single server owned by Haiku inc.
For Haikuports it runs Haiku in virtual machines. For Concourse (providing the nightly builds) I think it runs them on Linux in Docker containers.

Correct. This is the main reason I’ve only really pursued a few “release” mirrors and focused on getting our repos onto Wasabi object storage. We really can’t 100% trust package mirrors until we get signature checking on the “repo” file. (The repo file contains hashes of all packages within the repo… so in theory we should be able to validate the signature of this single file as a first step.)

I’ve built signatures into the haiku repo using minisig and our CI/CD. Haikuporter buildmaster should follow the same strategy, but nobody has shown interest in improving buildmaster lately.

(example)
https://eu.hpkg.haiku-os.org/haiku/master/x86_64/current/repo.minisig

This really isn’t true anymore :slight_smile:

For the Haiku builds, when we moved to Concourse, I turned off all of the Mac Minis in favor of one personally owned Dell i7 Optiplex machine with an NVMe (Haiku, Inc owned) performing builds. All of our Haiku builds are run on a single builder in Austin, TX. We definitely could use more OS building systems for redundancy, but I’ve been favoring “larger and faster” over the smaller, outdated Mac Minis, which are less power efficient. For what it’s worth, my Haiku builder made it through the Texas ice storm somehow without an outage… so my grid connection seems pretty reliable.

All of our Haiku builds run within containers with a “consistent set of buildtools”, so having a somewhat larger system became a requirement vs squeezing our builds into systems with 2-4 GiB of RAM.

For Haikuports builds, mmlr has two VMs running at his personal residence. I don’t know much beyond that for those.

Oh… another thing I experimented with was pushing Haiku package repositories onto IPFS. In theory it would sign artifacts in a transparent way, allowing anyone to mirror our package repos with a click of a button AND provide deduplication.

However, managing a huge number of large files in IPFS becomes a bit of a nightmare. I opened a bug here about it: https://github.com/ipfs/go-ipfs/issues/7586

On a smaller scale, I pushed our release images into IPFS. It seems to work pretty well. You can try the IPFS Gateway here: http://ipfs.haiku-os.org

dig TXT ipfs.haiku-os.org
ipfs.haiku-os.org.	3600	IN	TXT	"dnslink=/ipfs/QmXtYCCM8Kt7Q9DbXYKHy8bUQve9zauNiKUZs57bnphBtw"
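
The TXT value above follows the DNSLink convention: `dnslink=` followed by a content path. As a rough illustration of how a client might turn such a record into a gateway URL, here is a small sketch. The function names and the gateway host used in the usage note are hypothetical; the record format is taken from the `dig` output above.

```python
from typing import Optional


def parse_dnslink(txt_value: str) -> Optional[str]:
    """Extract the content path from a DNSLink TXT value, or None if absent."""
    value = txt_value.strip().strip('"')
    prefix = "dnslink="
    if not value.startswith(prefix):
        return None
    path = value[len(prefix):]
    # DNSLink paths begin with a namespace such as /ipfs/ or /ipns/.
    if not path.startswith(("/ipfs/", "/ipns/")):
        return None
    return path


def gateway_url(gateway: str, content_path: str) -> str:
    """Join a gateway base URL and a content path into one URL."""
    return gateway.rstrip("/") + content_path
```

For example, `gateway_url("https://gateway.example", parse_dnslink(value))` would yield a URL under `/ipfs/...` on that (made-up) gateway.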

Thanks all for your answers. I will set up an HTTPS mirror for the last 20 nightlies for each arch. It will be in place this weekend (whether I keep it long term will depend on whether my connection gets hammered).
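
A retention rule like "last 20 nightlies per arch" could be sketched as below. The file name pattern (`haiku-<branch>-hrev<number>-<arch>-...`) is an assumption about how nightly images are named, and `split_by_retention` is a hypothetical helper, so adjust both to whatever the mirror actually stores.

```python
import re
from collections import defaultdict

# Assumed name layout, e.g. "haiku-master-hrev55000-x86_64-anyboot.zip".
NAME_RE = re.compile(r"^haiku-.*-hrev(?P<rev>\d+)-(?P<arch>[A-Za-z0-9_]+)-")


def split_by_retention(names, keep=20):
    """Return (kept, stale): the newest `keep` images per arch, and the rest."""
    by_arch = defaultdict(list)
    for name in names:
        match = NAME_RE.match(name)
        if match:  # Files that do not look like nightlies are left alone.
            by_arch[match.group("arch")].append((int(match.group("rev")), name))
    kept, stale = [], []
    for arch in sorted(by_arch):
        ordered = sorted(by_arch[arch], reverse=True)  # Highest revision first.
        kept.extend(name for _, name in ordered[:keep])
        stale.extend(name for _, name in ordered[keep:])
    return kept, stale
```

The `stale` list is what a cleanup job would delete after each sync.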

I am also putting the finishing touches on a script to automatically generate eDonkey (eMule) downloads for nightlies (as well as BitTorrent in the coming days), as P2P downloads would scale much better.
I will make sure an XML/Atom feed gets generated so that torrent subscription is possible.
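
For the feed part, a minimal sketch using only the Python standard library might look like this. The entry fields (`title`, `updated`, `torrent_url`) and the function name are assumptions for illustration; each entry carries an `enclosure` link pointing at a `.torrent` file, which is what feed-aware torrent clients typically look for.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _tag(name: str) -> str:
    """Return the namespace-qualified Atom tag name."""
    return f"{{{ATOM_NS}}}{name}"


def build_torrent_feed(feed_id, title, updated, entries) -> str:
    """Serialize a small Atom feed with one enclosure link per torrent."""
    ET.register_namespace("", ATOM_NS)  # Emit Atom as the default namespace.
    feed = ET.Element(_tag("feed"))
    ET.SubElement(feed, _tag("id")).text = feed_id
    ET.SubElement(feed, _tag("title")).text = title
    ET.SubElement(feed, _tag("updated")).text = updated
    for item in entries:
        entry = ET.SubElement(feed, _tag("entry"))
        # The torrent URL doubles as the entry id in this sketch.
        ET.SubElement(entry, _tag("id")).text = item["torrent_url"]
        ET.SubElement(entry, _tag("title")).text = item["title"]
        ET.SubElement(entry, _tag("updated")).text = item["updated"]
        ET.SubElement(entry, _tag("link"), {
            "rel": "enclosure",
            "type": "application/x-bittorrent",
            "href": item["torrent_url"],
        })
    return ET.tostring(feed, encoding="unicode")
```

A complete feed would also want an author element and a self link; they are left out here to keep the sketch short.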

[edit: typo]

Thanks, this is very interesting. I am not familiar with IPFS. I will make sure “old-school” P2P works and will then investigate this.
[edit:typo]

Glad to hear IPFS is being given a go, I’m pinning the releases now!

If you ever do end up putting packages on IPFS, I’d like to pin what I can; my node has empty space without much to do with it, lol.

@jaidedcd wanna help even more? Our s3 bucket provider just told us we’re using too much outbound bandwidth.

I’ve set up https://us.hpkg.haiku-os.org on IPFS as a test. We could really use more mirrors :slight_smile:

For now, repinning /ipns/us.hpkg.haiku-os.org every few days would help us out a ton. I’ve pinned it from a server and my home NAS.
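
For anyone who wants to automate the repinning, here is a hedged sketch that wraps the `ipfs pin add` command of the go-ipfs CLI. It assumes the `ipfs` binary is on the PATH and a daemon is running; the `repin` helper and its injectable `run` parameter are made up for illustration, and scheduling "every few days" is left to cron or a similar tool.

```python
import subprocess
from typing import Callable, List


def repin(ipns_path: str, run: Callable = subprocess.run) -> List[str]:
    """Pin whatever the given /ipns/ path currently resolves to."""
    if not ipns_path.startswith("/ipns/"):
        raise ValueError("expected a path starting with /ipns/")
    # `ipfs pin add` resolves the IPNS name and pins the resulting content.
    command = ["ipfs", "pin", "add", ipns_path]
    run(command, check=True)  # Raises CalledProcessError if pinning fails.
    return command


if __name__ == "__main__":
    repin("/ipns/us.hpkg.haiku-os.org")
```

Note that earlier pins stay in place after a repin, so disk usage grows until old pins are removed and garbage collection is run.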

Has applying for sponsorships been considered to help out with mirroring? Fastly CDN and DigitalOcean both sponsor OSS, including non-Linux OS projects. There are probably others that I’m unaware of.

For sponsoring, we could ask https://osuosl.org/services/hosting/ as well. I expect several other places would also do it, but it’s not always easy to find the contact points.

Debian for example has a quite long list of mirrors in many places: https://www.debian.org/mirror/list

We used to have a lot of mirrors for our stable releases as well, but somehow these were not used for packages (we didn’t dare to ask people to host this much data, I guess?)

Is the Inc. willing to look into alternate cloud storage providers or CDNs? Backblaze was suggested to me for pure cloud storage, and bunny.net for both CDN and cloud storage.

I’ve got a Dreamhost plan with unlimited bandwidth/size. Maybe I could host some images there?

Heck yea! Pinning now, I’ll keep up-to-date.

I’m debating which way to go with all of this.

  • Matching today: Just find another S3 Object Storage host
    • Reliable
    • Cost: $$$$ for our amount of data + traffic (+$100-$200 / month). Cost goes up over time
    • No need for mirrors except for geographic speeds
  • Funky: Try to make the IPFS thing work.
    • Reliable with lots of active pinning. Our repos “update” daily, and every mirror has to “repin” when updates are made to pull the latest data. This is a known issue with IPFS that will hopefully be fixed.
    • Built-in deduplication and signature checking
    • The repos would essentially be community mirrored with Haiku, Inc. keeping a reliable pin going.
    • Haiku, Inc. would pay for a few smallish IPFS VMs to pin our releases and keep them reliable.
    • We depend on the bandwidth distributed across everyone who pins, and everyone who runs mirrors.
      • As an example, if we have a large userbase in Russia (which we do), and Russia blocks Digital Ocean (which they do), folks in Russia could take things into their own hands and “pin” our repos. Then folks in Russia could stand up or use a Russian IPFS gateway to access Haiku.
    • Kind of dependent on “large stable” gateways like cloudflare-ipfs.com (which is the only public one that offers SSL). I’m trying to get a hold of the Cloudflare folks for some issues I ran into on their gateway (caching related)
  • Old School: We buy a dedicated server with lots of bandwidth, and install S3 there.
    • Cost $$$
    • Reliability: High

I feel like the IPFS stuff could “grow” the most over time with us and it solves getting Haiku repos mirrored all over the world, and it saves space with deduplication, and it has built in signatures for security… but there are a lot of pointy bits because the tech is still young.

I understand your pain even if I don’t really understand what you’re talking about (that’s new to me, as I have worked with server stuff a lot).

So, some questions.
What are others using? FreeBSD, for example?
Can we team up with someone else?

Do we need all that storage, or could we keep only the 2-3 latest builds and build an older one on demand if “we” would like to have it?

I did a bit of research a while back and found some organisations we could ask:

+1

Globalize, and distribute the wealth (and effort).

Thanks for researching. I think this option (which is zero cost and a bit distributed) should be researched. If it doesn’t work, we can see about the other ones? I think our requirements are 1TB of storage (for now, this might grow) and 3TB of bandwidth? But if we have multiple servers, they will share the bandwidth and it will not be so much of a problem anymore?

Also, if we go the P2P way, maybe some other protocols should be considered, for example BitTorrent (which is popular) or Gnutella (a bit forgotten these days, but interesting because the data transfers happen over HTTP, so it only changes how to discover a server where you can download a particular file).

Regarding other protocols, I am already sharing releases and nightlies (the last 20 revisions) through eDonkey/eMule (I still need to automate ed2k magnet link creation for this protocol).
I am also investigating automated tracker creation for BitTorrent to share nightlies, but have not found time to do it just yet. It is one of the top items on my to-do list.
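
For the BitTorrent side, the core of automated torrent creation is small: hash the file in fixed-size pieces, bencode an "info" dictionary, and take its SHA-1 as the info-hash used in magnet links. The sketch below shows that for a single file held in memory. It is an illustration rather than the script mentioned above; the helper names are made up, and a real tool would stream the image from disk and add tracker URLs.

```python
import hashlib
from urllib.parse import quote


def bencode(value) -> bytes:
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be emitted in sorted (raw byte) order.
        items = sorted(((k.encode("utf-8"), v) for k, v in value.items()),
                       key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_info(name: str, data: bytes, piece_length: int = 262144) -> dict:
    """Build the torrent 'info' dictionary for one in-memory file."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


def magnet_link(info: dict) -> str:
    """Return a magnet URI whose info-hash is the SHA-1 of the bencoded info."""
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    return f"magnet:?xt=urn:btih:{infohash}&dn={quote(info['name'])}"
```

Writing `bencode({"info": info})` to a file gives a minimal trackerless `.torrent`; adding an `announce` key is how a tracker URL would be attached.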