Mirroring and torrent hosting

I think the most worrying part is not really the nightly builds but the package repositories, which require more work for a non-HTTP system since pkgman would need to be adjusted.

For nightlies it’s a lot easier to publish/mirror them in many places, and more is better :slight_smile:

Good point, will check over the weekend how the depot infra is structured to see how I can make packages available on eDonkey/torrents (should be quite easy). I don't have the C++ knowledge to modify pkgman to consume these kinds of P2P protocols though, so somebody else will have to tackle that :slight_smile:

Can the release images be posted on the GitHub releases page, https://github.com/haiku/haiku/releases? It’s free & reliable.

Ha… That’s actually a great idea :slight_smile:

Well, the full depot for x86_64 will be available on eDonkey within the coming hour.
Will generate the page providing ed2k links over the weekend.
Will need to check whether we have a readily available lib in our ports for the eDonkey/eMule P2P protocols.
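
For reference, an ed2k link is just a URI embedding the file name, size in bytes, and MD4 hash, along these lines (placeholder values, not a real package):

ed2k://|file|haiku_x86_64-example.hpkg|1234567|<md4-hash-in-hex>|/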

I think the releases are already distributed through SourceForge mirrors (also free and reliable, and not dependent on a single company, since the mirrors are not owned by SourceForge).

I mainly meant it as a mirror. Unfortunately you can’t “release” artifacts without tagging a branch, and tagging a branch on GitHub would put it out of sync with our central git server’s tags (we don’t push tags to GitHub).

For the IPFS stuff, I'm having some “issues” managing large amounts of data. The “big chain” of linked chunks broke, resulting in some strange behavior. I think IPFS might be too unstable for our usage :expressionless:. I’ll see where that bug report goes though.

I’m also looking at storj.io, since the price is right (and it’s distributed). They don’t seem to offer public access to artifacts, however :expressionless:

@jaidedcd your pin likely never finished. The VM I was hosting this stuff on had a corrupt repo with a few corrupt CIDs (making IPFS pins “stall indefinitely”).

I’ve made some adjustments and repairs. If you re-pin /ipns/hpkg.haiku-os.org you should be golden. (I cut out the “us” since that only makes sense for gateways and not as much for “mirroring”.)

The CID chunks you received should still exist on your IPFS node unless you cancelled the pin and garbage-collected, so there shouldn't be much “re-downloading”.
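
In case it helps, re-pinning is a one-liner on a stock go-ipfs node (--progress just shows fetch progress):

ipfs pin add --progress /ipns/hpkg.haiku-os.org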

Yeah, I only managed to get it pinned once; when it tried updating afterwards, it stalled.
I’ll point my node at hpkg.haiku-os.org now, thanks.

FYI, also pinned /ipns/hpkg.haiku-os.org.
Not too sure yet how to automate deleting obsolete files when it comes to hosting ports, though (eDonkey or Gnutella make that so much easier for now); will have to do more tests with it.

Thanks! This is a good question. Repinning /ipns/hpkg.haiku-os.org will pull in “updates” (deltas, i.e. small amounts of bandwidth). However, I’m pretty sure IPFS resolves /ipns/hpkg.haiku-os.org to a CID:

ipfs pin ls --type=recursive
QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn recursive

I’m asking the IPFS folks how this is handled. Data that isn’t pinned will be garbage collected (deleted), but this raises some interesting questions about how old data is cleaned up.
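
For reference, the swap looks roughly like this on a mirror node with stock go-ipfs (the CIDs below are placeholders):

# See which CID the name currently points at.
ipfs name resolve /ipns/hpkg.haiku-os.org
# Swap the old pin for the new one; only changed blocks get fetched.
ipfs pin update /ipfs/<old-cid> /ipfs/<new-cid>
# Unpinned blocks are only reclaimed once the garbage collector runs.
ipfs repo gc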

I’ll push some minor updates to /ipns/hpkg.haiku-os.org once my personal server pin is complete and see what happens.

EDIT: I’m going to open a new thread for the IPFS stuff… I feel like I’ve hijacked this one :slight_smile:

No worries, this is still about mirroring, but it is true that IPFS is a subject in itself. Maybe the forum needs a “downloads & mirrors” category?

Thinking of a blog post explaining the situation I got us into :slight_smile:

Good point, the information will be easier to find for future reference (and to understand where we came from).
For the long run, we will probably need a site page (or several) explaining how to mirror nightlies, releases and ports.

Sorry, I’ve been kind of out of the Haiku loop for a few weeks, though I was aware of our infrastructure issues.

A blog post would be nice @kallisti5, maybe you could explain a bit more about why Haiku needs 1 TB of data and 3 TB of bandwidth for the packages, and maybe a bit of the architecture, or you could link to existing docs. That might help people understand the issues better and provide suggestions.

I do suspect there is massive redundancy and duplication among multiple releases of the same package. Something I’ve thought about for a long time is a way to better archive older releases of packages so they could still be downloaded if needed, while using less space day to day.

Sort of the same idea as IPFS, but maybe simplified a bit for our use and hopefully made more reliable. Anyhow, I’ve been thinking about a “package diff” system for a while, especially for haiku.hpkg, since I doubt a lot changes between releases. I just need to sit down one day and actually try to implement it :wink:

The other benefit of such a system would be tiny, almost instantaneous updates. At least, I think so.

Way ahead of ya :slight_smile:

https://github.com/haiku/infrastructure/blob/master/docs/ipfs-pinning.md

We can definitely trim back the 1TiB of data we have today (I’ve already done that somewhat to make the IPFS mirroring less burdensome); it’s mostly nightly images and Haiku repos. However, reducing the data we store actually hurts our position with Wasabi: they’re focused on ratios, and 1TiB stored to 3TiB of outbound bandwidth is too much for them. By their logic, if we stored 3TiB with them, I think they’d be a lot happier.

I haven’t found any convenience scripts for keeping pins updated, so I’ll stash these here: pin-update.sh, pin-update.service, pin-update.timer.

It just checks for text files in ~/.pins/ named after the IPNS addresses you want to keep updated. If you want to regularly update /ipns/hpkg.haiku-os.org, just create a blank file “~/.pins/hpkg.haiku-os.org”. On every update it appends the latest hash to that file, so you have an update history and future runs can use “ipfs pin update”.

The systemd files’ll automate it.
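
Roughly, the script boils down to something like this (a simplified sketch, not the exact file linked above):

#!/bin/sh
# Every file in ~/.pins/ is named after an IPNS address and accumulates
# the resolved hashes, one per line, newest last.
for f in "$HOME"/.pins/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    # Resolve the IPNS name to its current /ipfs/<hash> path.
    new=$(ipfs name resolve "/ipns/$name") || continue
    old=$(tail -n 1 "$f")
    [ "$new" = "$old" ] && continue      # nothing changed since last run
    if [ -n "$old" ]; then
        ipfs pin update "$old" "$new"    # cheap delta update
    else
        ipfs pin add "$new"              # first pin of this name
    fi
    echo "$new" >> "$f"                  # keep the update history
done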

Thanks!

I’ve tossed haikuports into it… so the size will increase to ~95GiB on disk (serving 140GiB of data).
I’m going to leave the nightly images in S3… that’s a lot of data to mirror, and represents something that’s not used “continuously”.

I stood up a dedicated German gateway based on the feedback in my country poll.

https://de.hpkg.haiku-os.org The breadcrumbs in the UI are broken, but everything else is working so far. Users in Germany are reporting 5-25MB/s (so pretty great, especially given that the pins have been US-weighted so far).

Currently I’m pinning at my house (US) and on a personal server (US). The German gateway is pinning just the release images.

For the moment, the biggest issue I’ve seen is that the initial discovery of the IPNS records (when updated) on the gateways can take over 90 seconds.
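
An easy way to watch that delay, assuming a stock go-ipfs node:

time ipfs name resolve /ipns/hpkg.haiku-os.org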

Sitting here on mediocre wifi waiting for an update of gcc_debuginfo makes me wonder why we don’t have delta updates. Has anyone investigated that possibility? It might be really nice on slower connections, especially with huge packages like compiler-related ones (or new nightlies for Haiku, which usually have only minor changes).

I have had the idea of differential hpkgs and have mentioned it repeatedly in the context of our various packages, but especially the nightly images, which don’t change much day to day, as you say. It could maybe save on server storage, but it could certainly make updates much smaller and faster. In the current world of multi-gigabyte OS updates, having updates on the order of kilobytes would be awesome.

It does not seem that hard given the hpkg format, but with that said, I have not made any effort to try it yet. I would start with a straight-up “this file now has these bytes” approach and see how that works, then maybe consider binary diffs of the inner files as well, but I bet the first simple approach would already be a big win. The “hpkgd” or “hpkd” format could be very similar to hpkg, but with less metadata and three SHA checksums: of itself, of the package it updates, and of the resulting package. Maybe one hpkd file could even update multiple starting packages, which might make sense for the nightlies.
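
For anyone who wants to eyeball the potential savings before any hpkd format exists, a generic binary diff tool gives a rough lower bound. A minimal sketch, assuming xdelta3 is installed and using made-up file names:

# Encode a delta from the old package to the new one.
xdelta3 -e -s haiku-old.hpkg haiku-new.hpkg haiku.hpkgd
# Apply the delta to the old package to rebuild the new one.
xdelta3 -d -s haiku-old.hpkg haiku.hpkgd haiku-rebuilt.hpkg
# Verify the rebuild byte-for-byte.
cmp haiku-new.hpkg haiku-rebuilt.hpkg

A format-aware hpkd would of course do better, since it could skip unchanged inner files entirely.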

Anyhow, I will definitely give this a try at some point, but with me that could mean not for a while, so obviously if someone else wants to give it a try please do.
