Hacked off with container OSes

So I work with containers day in and day out. The apps are fun, the container definition, creation, and security not so much.

I’ve been looking at container base images and am struck both by the bloat and by how often it is impossible to prove the license and the exact git-traceable versions of what is actually deployed.

It’s also amusing to see people think that a layered file system is a new thing!

I know it cannot happen, as containerd and the Linux kernel’s implementation of containerisation are the core enabling tech, but I wish, oh I wish, I could grab something like Haiku that was very small, load just the packages I needed, and use that as a base image without all the upstream faff of Alpine vs distroless vs everything else.

So what do we think… Haiku base image for Kubernetes anyone??? Who is with me!?!

Only half joking… It would at least make messing around with base OS images fun if it were Haiku.

I have no experience with containerization, but isn’t it simply some made-up BS to cover the fact that developers can’t do devops and vice versa, that shared linking is too hard, and that basically most people in CS are lazy and/or incompetent, or simply careless (not just nowadays, but basically ever)?

If so, why should Haiku take part in it, and why shouldn’t Haiku simply eliminate all legacy/deprecated ways of thinking and reform the living sht out of the industry to shake up the people in it?

2 Likes

Haha! :rofl:

To be fair, I’m not entirely sure how process isolation is implemented in Haiku’s kernel right now. You may find the same basic concepts exist and all this containerd mumbo jumbo isn’t required… An interesting thought…

Containerization is more than just isolating the processes or for use in fancy “webby setups” though.

It’s also a way to define (in a declarative style) a reproducible setup that can then be used by anyone. Amongst other things it makes it a lot easier to package all the right versions of libs and tools and ensure that your program is going to run as expected.

From the dev side - I just pass a Dockerfile (or docker-compose file) to another dev and we are then using the same environment. The fact that it’s isolated means that we can’t have conflicting libc libraries, random header conflicts, or whatever.
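For anyone who hasn’t seen this workflow, a minimal sketch of the kind of Dockerfile that gets passed around (the project name and the dependency list here are made up):

# Everyone who builds this gets the same base system and toolchain,
# so there are no conflicting libc or header versions between machines.
FROM debian:bookworm

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake libssl-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /src
COPY . .
RUN cmake -B build && cmake --build build

CMD ["./build/myapp"]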

I find it really useful!

2 Likes

All valid criticisms!

The bloat can be unreal!

1 Like

One part of that is a common config format; I think Haiku is almost there already, with flattened BMessage config files. Not every app uses them, though…

FreeBSD is also trying to give all their config files a common format, and to kick out the circus of black-magic config file formats specific to certain Unix applications (like crontab or the mail aliases file).

edit: Haiku also lets you mount packages in a chroot, which is similar in design to a namespace in that you don’t have to care what the rest of the OS is using. It is also generally much easier to have folders at different locations in Haiku because of find_directory.
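To make that concrete - a quick sketch using finddir, the shell front-end to find_directory() (the example outputs are what a default install would print):

# Ask for a directory by role instead of hardcoding a path;
# the system decides where it actually lives.
finddir B_USER_SETTINGS_DIRECTORY    # /boot/home/config/settings
finddir B_SYSTEM_APPS_DIRECTORY      # /boot/system/apps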

3 Likes

This is one place where containers work, and where they are better than running virtual machines. For example, at work we use this so that our buildbots can build any version of our software (new or old), even ones we haven’t touched for 10 years. We keep an archive of all our docker containers and can easily restart one if we need to.
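The mechanics are simple enough - a sketch, with a made-up image name (docker save/load is the actual interface):

# Archive a build image today...
docker save myapp-build:2014 -o myapp-build-2014.tar
# ...and resurrect it ten years later on any docker host.
docker load -i myapp-build-2014.tar
docker run --rm myapp-build:2014 make release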

For use on developer workstations, however, it seems a bit overkill and, as a result, inconvenient. I would be much happier with something like Python’s virtualenv (which doesn’t do privilege isolation or anything like that). Unfortunately I’m a C++ developer, so that isn’t going to happen.

We’re doing embedded software and using Yocto for the final packaging. This also does some kind of isolation, but the approach is “let’s recompile the whole universe”, which makes it a bit inconvenient, and we had to set up quite a lot of infrastructure for it to work well (a big build server shared between devs using icecc, a package cache on an NFS server). We are not completely happy with this network-heavy way of doing things; for colleagues working remotely over not-so-high-speed DSL lines, it’s not much fun.

This is what containerisation should achieve, but if you actually look at many Dockerfiles you see they are based on non-versioned tags, or use apt commands to update the machine during the build, meaning that the next time someone runs the build it may pull different package versions. To be reproducible you need to know the exact machine setup; otherwise you are open to supply chain attacks.
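The fix is to pin everything - a sketch (the digest and version strings below are placeholders, not real values):

# A digest names one exact image; a tag like "latest" is a moving target.
FROM debian:bookworm-slim@sha256:<digest-of-the-exact-image>

# Pin package versions too, instead of whatever apt resolves today.
RUN apt-get update && apt-get install -y curl=<exact-version>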

Another issue is pinning built artifacts (e.g. containers, node.js packages, and so on) against the exact commit they came from. Scanning and comparing npm packages, for example, will show that half of the common packages built from their claimed commit SHA don’t produce the same binary output.
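That scan-and-compare is easy to reproduce for a single package - roughly (the package name, URL, and commit are hypothetical):

# Build from the commit the package claims to come from...
git clone https://github.com/example/somepkg.git && cd somepkg
git checkout <claimed-commit-sha>
npm ci && npm pack                      # produces somepkg-<version>.tgz locally

# ...then fetch what the registry actually serves and compare.
curl -sL "$(npm view somepkg dist.tarball)" -o published.tgz
sha256sum somepkg-*.tgz published.tgz   # matching hashes = faithful build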

Then you have licensing issues. Alpine Linux images are great: they are small and have relatively few snyk scan report issues - but they use BusyBox, who are GPLv2 sue-happy. So corporates have to ensure they don’t use Alpine as the base for anything they produce, or for anything in the open-source upstream of their dependencies. So I end up with base image bloat. On node this is even hairier - how do I know that all 2089 packages (not a joke) just installed by yarn have a license compatible with my simple REST API service?
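For the npm side there are at least tools to survey what you just pulled in - a sketch using the third-party license-checker package (the allow-list is illustrative):

# Summarise the licenses of all production dependencies...
npx license-checker --production --summary
# ...or fail outright if anything falls outside an allow-list.
npx license-checker --production --onlyAllow "MIT;Apache-2.0;BSD-3-Clause;ISC"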

I guess my question is: if you were to provide the same experience (if not the exact same approach/config files) of isolating and packaging container images on Haiku, would it be different, and would there be any benefits/problems with the design of such a subsystem on Haiku? I.e. is there a “better way” to do this? Is it easier on a single coherent ecosystem like Haiku than on the sprawl of Linux?

I’m more interested in the theory of whether there is a better way than in actually saying we should build it right now. The thought exercise might help with the general approach. Of course, if we find there is a great benefit to a container-style system in Haiku, then great, I’ll totally get involved and help. For now, though, knowing what primitives are available would be useful.

For use on developer workstations, however, it seems a bit overkill and, as a result, inconvenient. I would be much happier with something like Python’s virtualenv (which doesn’t do privilege isolation or anything like that). Unfortunately I’m a C++ developer, so that isn’t going to happen.

I use containers for isolation when developing, too. I guess it depends on what language you’re using and how it manages modules/libraries. With node.js, it’s really risky to install modules on your workstation (or on anything that contains data that should not be accessible to someone you don’t trust) - they can run whatever they want on install. AFAIK similar situations have happened with Python and PHP (when used with their “package repositories”).
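One partial mitigation, for what it’s worth: npm can be told not to run packages’ install hooks at all (at the cost of breaking packages that genuinely need them):

# Skip preinstall/postinstall scripts for this install only...
npm install --ignore-scripts
# ...or turn them off permanently for this user.
npm config set ignore-scripts true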

That’s why I mention virtualenv for Python. It solves this nicely, and it’s essentially just a shell script setting a few environment variables.
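A sketch of how little there is to it:

# A virtualenv is mostly PATH manipulation plus one variable.
python3 -m venv env
source env/bin/activate    # prepends env/bin to PATH, sets VIRTUAL_ENV
pip install requests       # lands under env/, not the system site-packages
deactivate                 # restores the previous environment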

1 Like

Yeah, but you wrote that it does not do privilege isolation. For me, setting a few env variables is not enough to stay secure (well… as secure as possible given the container runner and possible privilege escalation bugs/holes), but I may be a bit paranoid :).

Fortunately in my case I trust my colleagues and the code review process we have in place :slight_smile:

But yes, there are places where that isn’t the case, and then the extra safety provided by containers can be useful.

I use systemd-nspawn so I don’t have to have all my dev dependencies on the machine I’m developing on. It’s like a chroot into a container. I really like it.

Here is my base image setup, which is done once:

function init() {
	# "msg" is the author's logging helper, defined elsewhere in the script.
	msg "Downloading Debian Buster RootFS"
	machinectl pull-tar https://github.com/debuerreotype/docker-debian-artifacts/raw/d5a5b49170b3f736cc7952787f074d7e24cf56fd/buster/rootfs.tar.xz debian-buster --verify=no --read-only

	msg "Setting up base packages and user"
	# -P (--pipe) connects stdin directly, so the heredoc below is fed to
	# bash inside the container. Note that $(id -u) and $(id -un) expand on
	# the HOST before the heredoc is sent, so the container user mirrors
	# the host user.
	sudo systemd-nspawn -q -P --register=no -M debian-buster -E DEBIAN_FRONTEND=noninteractive /bin/bash << END_COMMANDS

	# Keep the image small: exclude non-English locales and man pages.
	cat - >> /etc/dpkg/dpkg.cfg.d/excludes << END_DPKG_CONF
# Drop locales
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/en/*
path-include=/usr/share/locale/locale.alias
# Drop translated manual pages
path-exclude=/usr/share/man/*
path-include=/usr/share/man/man[1-9]/*
END_DPKG_CONF

	hostname debian-buster
	# Remove machine IDs so each instance regenerates its own;
	# errors (e.g. a file not existing) are silenced.
	rm /etc/machine-id /var/lib/dbus/machine-id 2> /dev/null
	apt-get update
	apt-get -qq dist-upgrade
	apt-get install -qq sudo nano

	# Create a passwordless, sudo-capable user matching the host UID.
	useradd -u $(id -u) -m $(id -un) -G sudo
	passwd -d "$(id -un)"
END_COMMANDS
}

And then I create a virtual environment for Haiku dev:

NAME=haiku-dev
OVERLAY="/var/lib/machine-overlays/$NAME"

# Read-only base image plus a writable overlay per environment, so the
# base stays pristine; bind-mount the projects directory and enter as
# the matching host user.
sudo systemd-nspawn -q --register=no --read-only \
	-M debian-buster --hostname $NAME --overlay=+/:$OVERLAY:/ \
	--bind /home/myusername/haiku-projects -E TZ=':/etc/localtime' \
	--chdir /home/$(id -un)/haiku-projects/haiku -u $(id -u) /bin/bash
2 Likes

Haiku isn’t Linux. For Haiku to run containers (Haiku containers, not Linux containers), a process isolation subsystem would need to be created. We do have chroot support, but honestly Haiku’s target isn’t the server.

With that said, what we really need is what Docker does on OS X / Windows workstations: docker-machine creates a Linux VM (in VirtualBox or whatever) and points your local docker executable on OS X at it, so your containers run inside the VM.

If we want to enable developers to work within Linux containers under Haiku, we’ll need:

  • Get a hardware virtual machine acceleration layer working for qemu (KVM, bhyve, etc.)
  • Add Haiku support to docker-machine, docker-desktop, or roll our own
  • Get docker (the CLI) or podman running under Haiku in remote mode so it can communicate with the Linux VM + Linux containers (see the sketch below)
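Remote mode itself isn’t exotic; roughly, assuming a Linux VM reachable over SSH (the context name and address are placeholders):

# Point the local docker CLI at the daemon inside the Linux VM.
docker context create haiku-vm --docker "host=ssh://user@linux-vm"
docker context use haiku-vm
docker run --rm hello-world   # runs in the VM, driven from the local CLI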

However… I think we have bigger priorities at the moment… like improving WebPositive, VPN support, etc.

9 Likes

Good day,

I would definitely enjoy having some sort of containers, but to run an isolated Haiku inside Haiku, so that I could mess things up inside the container without messing up the system. I’m thinking of how Fedora Silverblue does it with Toolbox, where I can set a different Fedora version within each toolbox and run isolated software in each one. Having a “toolbox” for Beta3 (or whatever the beta might be) and a “toolbox” for Nightly would be nice for testing things without breaking the installed OS.

Other than that, I wouldn’t care so much for containers on Haiku. :wink:

Regards,
RR

2 Likes

A very interesting use case I had not considered! I suppose the more Haiku is a daily driver for developers the more useful such a feature would be.

Once we have virtualization and Go, I think it will not be that hard to provide Docker-for-Desktop-like functionality with Minikube. It is open source and runs a docker daemon that you can point your docker client at.

The only thing Minikube lacks compared to Docker for Desktop is announcing its node through mDNS so you can point to it in your docker config.

I have already replaced Docker for Desktop with it on my Mac, so it works.
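The handover is a couple of commands - a sketch:

# Start the VM/cluster, then point this shell's docker CLI at its daemon.
minikube start
eval "$(minikube docker-env)"   # exports DOCKER_HOST and TLS settings
docker ps                       # now lists containers inside Minikube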

golang is important for getting any level of functionality out of docker, podman, etc.

I tried compiling podman under our old golang 1.4 port, and it seemed likely to work (in remote mode) with only a few modifications to podman. golang 1.4 is just too old, though.

We need a more up-to-date golang port… for those not in the know, golang > 1.4 switched from C to self-hosting (the Go compiler is written in Go).

@jessicah is working on a golang port at the moment, and a few other folks have worked on it in the past.

6 Likes