Re-designing the kernel device_manager

waddlesplash · May 26, 2023, 2:10am

NOTE: This is a highly technical topic dealing with Haiku’s in-kernel “device manager” system. If you have never written or worked within device driver code on Haiku or BeOS, the discussions in this topic are probably not relevant to you, in which case: look, but don’t touch!

It has been my personal view ever since I started working on Haiku kernel code that the “new-style” device manager (which isn’t really new anymore as it’s over a decade old), while it does have a lot of advantages over the old one, is in many ways clunky and cumbersome to write code for, and has a number of annoying sharp edges. I believe I am not the only one to think so; I think I recall @axeld saying somewhere at one point that it deserved a redesign (and mentioned some specific developments that should have triggered such a redesign.)

Well, today I was working on some things in the USB disk driver, and realized the only way to do some of the things I wanted was to switch it over to use the “new” driver API. But said new driver API, for USB devices in particular, has some major disadvantages of the old one.

The basic disadvanges are:

Confusing boilerplate. In the “old” driver API (which comes from BeOS), every driver has to manage its own set of names of what devices it has “published.” The new driver API thankfully manages this for you, but instead it requires that every driver have a “device” module which handles probing and the like, in addition to the “driver” module which actually handles the read/write/etc. hooks.

In most drivers, this “module” winds up being a lot of duplicate and boilerplate code, which there isn’t much that can be done about: the register_device, init_device, etc. hooks.

In addition, since the device manager is string-attribute-based, every supports hook has to fetch a bunch of string attributes one at a time and then compare them. This leads to even more boilerplate and can be very inefficient.
- This last point is one of the major disadvantages of switching USB drivers from the “old” API to the “new” API. Under the “old” API, one constructs a pretty compact usb_support_descriptor array and passes it to the USB bus manager, which then invokes callbacks whenever devices matching the ones you specified are connected. Under the “new” API, one is left only with the supports hook, and winds up with a very boilerplate-y and not very pretty-looking probing function. You can readily compare the usb_ecm driver under the old and then new APIs to see what I mean here.
Inflexible abstractions. Say I want to publish a device in /dev. Why shouldn’t it be as easy as invoking device_manager->publish(parent, path, hooks);? But it isn’t. You have to
1. have a device module globally registered
2. call register_node to register a parent “device” node, using the name of the globally-registered device module
3. perform whatever initialization you need to in the init_device method of the “device” node when the device_manager calls it
4. have a driver module globally registered
5. call publish_device with your parent “device” node and the name of the globally-registered driver module
6. instantiate the driver using data passed through structures allocated back in step 3
Why is this so complicated? No wonder the /dev/null device (and basically all other drivers which are not attached to any other device) are still using the legacy driver API: because the new one would add tons of bloat to such simple drivers.

The idea that "everything is a device_node" seems to be the root of this inflexibility. That a PCI device has a device_node makes sense, that (e.g.) an ethernet driver attached to it is a device_node which is a child of it makes much less sense, I think.
Confusing APIs. Did you know that if unregister_node returns B_BUSY, this isn’t actually an error? I didn’t, until I saw this commit, and apparently @korli didn’t either as he is the one who originally wrote this code.

But really, the whole system of how one publishes a device, as described in the previous item, is a very confusing API.
Hard-coded search paths. I have been bitten by this when working on drivers, and so has @X512 and pretty much anyone else trying to bring up Haiku on anything that is not basic x86 (even QEMU with virtio posed problems that required multiple rounds of adjusting things here to get them to work.)

The original documentation for the device_manager seems to indicate this was supposed to be a feature, and not a bug. I think it’s clear that, while the idea sounds good on paper, it doesn’t really work in reality; something more flexible is necessary. Exactly what that looks like is up for debate.
Major missing features. Forget support for suspend/resume, did you know that Haiku does not even gracefully power off most devices? We basically just end all userland processes, synchronize mounted partitions, and then kill the power. Yes, really. The device manager has absolutely no idea how to ask drivers to shut down, and drivers expose no mechanism by which they would do so. We could at least try to close all devices, which would probably suffice for most things, but the poor device manager has no idea what order to close things in, because it usually relies on other things telling it to close and destroy device nodes rather than doing anything of its own volition.
Probably a lot more. I just wrote these up based on what I am currently running into while working on the USB disk and related drivers. There are definitely more problems than just these. I really should have written down the ones I thought of while working on the nvme_disk driver years ago, to begin with.

Now, on to the proposed changes…

waddlesplash · May 26, 2023, 2:16am

It would seem to make sense to me to have two kinds of things: device_nodes, devices that other drivers can attach to, and driver_nodes, which expose something in devfs (or even just privately) and have the standard set of read, write, etc. hooks.

Implementing this would resolve the crazy 6-step dance listed under “Inflexible abstractions”, as well as make it much easier for “drivers” without a “device” (e.g. /dev/null) to publish their devices.

(It’s possible driver_node is too confusing a name. After all, items in /dev/, which would be backed by driver_nodes, are usually referred to as “devices”. What terminology should be used instead, I’m not sure.)

Under this scenario, each and every device_node could have only one driver_node which was “attached to” and thus “handling” it – or perhaps only one device_node that could have it open in read/write mode. Presuming that partitions on a drive also become device_nodes (I have no idea if they do, at present, but given their presence in /dev/, I suspect they do), this would then prevent two filesystem drivers from having the same partition open at once. (There is a “reservation” system in the existing device manager, but as far as I know it only applies to actual hardware devices, not partitions. Or at least I’ve not seen it work in practice for partitions.)

waddlesplash · May 26, 2023, 2:25am

There is already #18333 (device_manager: introduce driver supported device ID registry to speed up boot) – Haiku with some discussion related to this. But I think it is worth mentioning here because, no matter what solution would ultimately be chosen there, I think it makes the most sense to redesign the device_manager around the “busses” concept, which it already partially employs.

Under this scenario, the only thing the device manager would necessarily know about any device_node is a string describing what bus it belongs to, and then a pointer to the bus manager handling that bus. The bus manager would be responsible for (with the assistance of generic routines in the device manager) invoking probe on drivers to see if they support drivers, comparing ID structures (which would be entirely bus-specific), determining what order device sleep or shutdown hooks should be called in, and so on.

This would return us to something more like the model USB uses today: drivers fill out a bus-specific structure/array, like usb_support_descriptor, which they hand off to the bus manager. The bus manager then interacts with the device manager to create nodes when new devices are attached (or removed), and then checks its list of support_descriptors to decide what drivers to invoke supports or just attach on. Following on the ideas in the above-linked ticket, this information could be written to extended attributes somehow, meaning that no drivers need to actually be loaded until it is known for certain that they are required.

waddlesplash · May 26, 2023, 2:51am

Talk is cheap. Let’s look at some pseudocode of what a driver using this hypothetical “device manager 2” might look like, based on the usb_ecm driver:

module_dependency module_dependencies[] = {
	{ B_USB_MODULE_NAME, (module_info**)&gUSBModule},
	{ NULL }
};

struct driver_module_info sUsbEcmDevice = {
	{
		USB_ECM_DRIVER_MODULE_NAME,
	},

	usb_ecm_detach, // detach from device. tear down state, but may still have consumers so cannot be destroyed yet
					// (device may have been removed or is now gone, that will be indicated via parameter.)
	usb_ecm_destroy, // destroy driver_node, all datastructures will be free after this returns

	usb_ecm_open,
	usb_ecm_close,
	usb_ecm_free,
	usb_ecm_read,
	usb_ecm_write,
	NULL,	// io
	usb_ecm_control,
};

static usb_support_descriptor[] sSupportDescriptor = { {
	USB_INTERFACE_CLASS_CDC, /* CDC - Communication Device Class */
	USB_INTERFACE_SUBCLASS_ECM, /* ECM - Ethernet Control Model */
	0, 0, 0 /* no protocol, vendor or device */
} };

// NOTE: usb_driver_module_info, this is USB-specific!
struct usb_driver_module_info sUsbEcmDriver = {
	{
		USB_ECM_DRIVER_MODULE_NAME,
	},
	{
		sSupportDescriptor,
		B_COUNT_OF(sSupportDescriptor),
	},

	NULL,	// init/constructor: not needed
	NULL,	// deinit/destructor: not needed
	NULL, 	// supports_device/probe: not needed
	usb_ecm_attach, // takes device_node to attach, does all necessary driver startup operations,
					// & invokes device_manager->publish(driver_module_info, "/dev/net/...");
					// also creates child nodes, if needed (not needed for USB-ECM)
};

module_info* modules[] = {
	(module_info*)&sUsbEcmDriver,
	NULL
};

X512 · May 26, 2023, 3:01am

I think that string attributes is actually a good idea because:

It allows generic introspection of device attributes, for example in Devices GUI utility. It helps a lot with diagnosing driver problems and writing new drivers.
It allows to implement generic device and driver matching instead of writing similar driver matching in each bus manager again and again.
String comparison should be not an efficiency issue on any <= 20 years old PC. It should be done in a few milliseconds.

Yes, I agree that devfs node publishing can be simplified a lot. With existing API there are also a problem that it is impossible to pass some cookie when creating devfs node or to get devfs path then initializing node.

If modifying all this things, it will mean creating totally new device manages v2 API incompatible with existing drivers. So it may be needed to support both V1 and V2 APIs until all drivers will migrate to V2 API.

It would be a good idea to make some drafts of device manager V2 API (writing header files etc.) even is no implementation will be made. It will allows technical discussions about API design details.

X512 · May 26, 2023, 3:08am

There are actually 2 different use cases of device_node:

Node representing actual device. It have device/bus attribute, supports automatic driver node search and attachment and it implements kernel side API for device by extending driver_module_info structure.
Driver node. This node is automatically attached to device node and consumes device kernel side API. It defines a set of attributes used for automatic attachment. It may publish devfs nodes and/or device nodes from first point.

Some my drawings:

waddlesplash · May 26, 2023, 3:10am

This can be accomplished by writing a “struct description field.” For example, specifying that at offset 0 there is a uint32 with the name pci/vendor. Then we get the best of both worlds: strings when you want them, structures when you don’t need them, and introspection is looking at the actual data and not a “mirror” of it. In addition it will be very fast since only the binary data need be compared.

I agree that writing similar device matching again and again would be not great and we should avoid it.

But I think the only thing that must be done per-bus manager is the comparison itself. Something like:

float
compare(usb_support_descriptor s, usb_device_descriptor d)
{
	if (s.dev_class != d.dev_class)
		return -1;
	if (s.dev_subclass != d.dev_subclass)
		return 0.1;
	if (s.dev_protocol != d.dev_protocol)
		return 0.2;
	if (s.vendor != d.vendor)
		return 0.3;
	if (s.product != d.product)
		return 0.4;
	return 0.5;
}

Embedded systems do exist. And the problem is more that fetching and comparing the strings leads to tedious boilerplate code.

If I am going to be the one to write V2, then I wouldn’t merge it until I had migrated all drivers into the V2 API. There are actually not so great quantities of them.

I agree it would be good, but part of the problem with this, for me, is that I design the nuances of these APIs while I am actually implementing them, and leave things ambiguous until that point.

We can, instead, discuss beforehand what the API usage should look like in practice, and that will specify what the API itself should look like. Hence the pseudocode I wrote above.

waddlesplash · May 26, 2023, 3:13am

Yes, indeed. That’s what I was trying to capture above: there should be two different structures, one for each of these use-cases, which interact with one another.

Some devices will have just one node: like PCI devices, for instance, will be devices, while a network driver attached to a PCI device will be just a driver.

Some other devices may have both: a hard disk will be exposed in /dev/ but also could have a filesystem mount attached to it. Though, perhaps in that case, this would be something else entirely (hanging off devfs) and not a device or driver node.

X512 · May 26, 2023, 3:13am

It can be even some pseudocode that do not even compile. There are no meaning to implement thing if if will significantly change after public discussion.

waddlesplash · May 26, 2023, 3:15am

Oh, sure. I thought you meant stuff that actually did compile.

The pseudocode I have listed here is a pretty good example of the ideas so far. The implementation to make this all work would be up in the air.

I think the most critical thing to nail down is precisely what the distinction between a device and driver node would be, and then what those APIs vaguely look like for API consumers. The rest of it is “all semantics”, as they say.

X512 · May 26, 2023, 3:16am

Large patches will be extremely hard to maintain and it will have high chance to be finally abandoned. A lot of new API drivers are need to be written for RISC-V and ARM (clock, reset, GPIO, i2c and more), it will conflict a lot.

waddlesplash · May 26, 2023, 3:18am

Fortunately for the time being, I am paid to work on Haiku and so have plenty of time to do the grunt work so that other people can do the “fun stuff.” So, in this case, dealing with the cumbersomeness of a large patchset, and then helping people fix their not-yet-merged drivers, would be my job, quite literally.

X512 · May 26, 2023, 3:19am

It is hard to collect all code pieces from forum posts together, it would be better to create some GitHub branch, Gist, Gerrit patch etc. that will be updated with version tracking and users can leave comments for line numbers.

waddlesplash · May 26, 2023, 3:22am

Probably.

Let’s see what the other developers have to say about the ideas here. If there is a positive response, then I will start collecting the ideas.

Some things which need to be looked at:

FreeBSD compatibility layer integration
At least one each of device drivers handling: PCI, ACPI, FDT
USB stack

X512 · May 26, 2023, 3:24am

I still see no meaning in this hard coded boilerplate code. If it can be handled in generic way, it should be handled in generic way.

waddlesplash · May 26, 2023, 3:27am

Because each bus will match things differently. In USB’s case, as soon as the “class” is matched, we return a positive nonzero value. (I also should have added explicit handling for 0 as a magic number in the support descriptor, which would have complicated the code further.)

PCI, on the other hand, pretty much mandates vendor and device match, or else the score will be 0. FDT is string comparison, but I guess there are still requirements for what makes a non-zero score there as well. So, comparison between busses is not the same at all, it differs in critically important ways.

waddlesplash · May 26, 2023, 3:28am

If a bus’ ID comparison method is really just memcmp, then of course it can use a generic function. But at least USB simply needs to do special things here.

X512 · May 26, 2023, 3:32am

No, this is a mistake.

It can be done with generic matching proposed by me with using only device/bus and usb class attributes for matching. Attributes not present in driver attribute matching list are ignored. It is also possible to define multiple matching rules, for example one for class only, another for class and vendor pair and so on.

As long I see Haiku driver code base, it should be possible to write generic driver compatible hardware definitions without problem.

waddlesplash · May 26, 2023, 3:35am

But one never writes a device driver for a “generic” device, you always write one for a PCI device, a USB device, an I2C device, …

If you are doing anything other than == comparison of arbitrary-sized buffers, “hard-coding” (?) a comparison function is mandatory. Otherwise, if we had a “ruleset” telling some generic code how to compare IDs, then it will have to “evaluate” this rule-set on every comparison, right? But then we have just defined a “language” with an “interpreter.” Best to just pass in a function for doing the comparison, otherwise we’ll end up having to write a JIT, or use the BPF virtual machine or something!

X512 · May 26, 2023, 3:37am

Can you give exact example where some special compare rule is needed and memcmp() is not enough? And it can’t be represented with multiple memcmp() matching rules.