Cortex

[quote=NoHaikuForMe]In PCM audio, for each frame you simply add together the relevant sample from all the inputs. If you’re using a signed integer representation you should use saturating arithmetic, like a DSP, not the wrapping arithmetic provided by default in most CPUs. For floating point it’s unimportant.
[/quote]You missed the point: not all streams are raw PCM sound. How do you mix two videos? Do you want a cross-fade, a picture-in-picture, a superposition, or a chroma key?

Whatever you want to do, a pipe can’t mix two sources; you need a computation, whether visible or not.

Once again: you missed the point. Over the history of personal computing there have been many people who, like you, thought things were simple when they are not! You claim you understand how I think? You do not even understand that a node does not handle only 16-bit 44100 Hz PCM mono sound, and that a pipe (or more precisely a buffer, in the case of our media kit) is not a DSP…

It was wrong for you, but right for them.
You see: mixing media implicitly is not always a good idea, since the result was not the most convenient for you!

[quote]
it’s a simple addition and it’s performed automatically by JACK itself. JACK made a bunch of design choices which make this (and other important things) easier to get right, but the fundamental concept isn’t hard.[/quote]
So Jack behaves as some kind of node: it DOES the job.

Yes, call it “node”, “Jack” or whatever you want.

+1

[quote=PulkoMandy]our mixer is not just blindly adding streams together. What if they have different sample rates? The mixer node takes care of this; it will also resample (and there are two algorithms implemented to do that, with different levels of quality).

You are ignoring what I said: what if you want to implement a ring modulator? Getting the sum of two signals when what you wanted was to multiply them is utterly useless. So your suggestion restricts what the media kit could do. And I’m still talking about sound streams here.

The same applies on the output side: what if the nodes you plugged in as outputs don’t expect the same format? There’s negotiation between nodes to agree on a common format. While this happens between pairs of nodes, it’s easy to find a solution. When it happens with multiple nodes on each end, you really need some kind of mixer or splitter node in the middle to handle the format conversions and make sure every node gets input in a format it can process.

Note this doesn’t increase latency more than what you suggest: in your scheme “something” would still have to do the mixing and copying. Why not use nodes for that?[/quote]

There’s already a mechanism which provides format negotiation between nodes, so from this point of view the selection of which format will be accepted may be left to the node. Alternatively, imagine node X is connected to Y using a floating-point format; when you attempt to connect node J to the same input as X, the format should be the same. This is a problem solved from the beginning in JACK, since it uses the same format for all clients.

About the ring modulator: from the user’s perspective, if you connect two nodes to the SAME input, they will be mixed and arrive at the destination node as a single mixed stream; if you connect two nodes to two different inputs of the same destination node, it will take care of what to do. The system mixer may mix them, your node may sum/multiply them, etc. Some other node may simply deny multiple connections to the same input, or multiple inputs. The power is untouched. This is explicit behavior which, for audio, I think is natural and expected by the user. This also makes my earlier idea of always having a multiple input open obsolete.

From the latency perspective, and assuming the developers make well-defined rules on how this should be handled by the media_kit, it would add just the latency needed to mix the frames, if the format is restricted. It would also remove the latency of the mixer/adapter node, which should not be underestimated. If you have 3 nodes it is OK, but what if I have a considerable number of nodes and considerable complexity in the graph? I would have to run a lot of unneeded nodes.

But consider the advantage from the user’s perspective: it would remove the annoyance of opening nodes, connecting them, and so on. It also leaves some room to think about a small but very flexible set of BMediaNodes provided by the API to manage different situations. The fact that mixing in video isn’t the same as in audio is basically limiting the potential power of the media_kit at the audio level. The same applies to interleaved stereo streams.

I’m actually working on a project, which I hope to release at the end of the month, that is strongly related to these things. So I may be wrong on some things; in any case I want to try for myself.

Read again: there is a format negotiation which works one-to-one. A node can provide a given set of formats, and another node can consume another set. The negotiation takes the intersection of these and picks the best available format.
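
A rough sketch of that intersect-and-pick idea (hypothetical types, not the actual Media Kit negotiation API):

```cpp
#include <algorithm>
#include <vector>

// Each side advertises the formats it supports; the connection uses the
// "best" format found in the intersection, or fails if there is none.
struct Format {
	int sampleRate;
	int bitsPerSample;
};

bool operator==(const Format& a, const Format& b)
{
	return a.sampleRate == b.sampleRate && a.bitsPerSample == b.bitsPerSample;
}

bool Negotiate(const std::vector<Format>& producer,
	const std::vector<Format>& consumer, Format& chosen)
{
	std::vector<Format> common;
	for (const Format& f : producer) {
		if (std::find(consumer.begin(), consumer.end(), f) != consumer.end())
			common.push_back(f);
	}
	if (common.empty())
		return false;                     // no compatible format: no connection

	chosen = *std::max_element(common.begin(), common.end(),
		[](const Format& a, const Format& b) {
			if (a.sampleRate != b.sampleRate)
				return a.sampleRate < b.sampleRate;
			return a.bitsPerSample < b.bitsPerSample;
		});
	return true;                          // pick the highest-quality common format
}
```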

Now, you add a third node into the connection: why should this one accept the existing parameters? what if it can’t? Jack has chosen a single format for everything, which to me sounds like a huge limitation, one that can’t be worked around. On the other hand, our system of one-to-one negotiations can easily be made to work with N:1 or 1:N mappings, using mixer and splitter nodes.

I don’t think so. In any audio system, be it hardware or software, to mix things together you need a mixer. Making this happen by some hidden magic means you can’t tweak that mixer’s parameters. What if your two input streams have different formats? You still need some code to make the mixing happen. Again, why not do this using a mixer node?

Also, I think you are looking too much at the low-level Cortex that shows everything. Cortex is a debugging tool for the media stuff; it is not a generic audio synthesis application. Instantiating a mixer node automatically from your app (or your Jack port) is a perfectly acceptable solution. Each Jack node would map to a [Mixer > Actual Media Node > Splitter] in Cortex.

How do you want the mixing to happen with zero latency? I’m curious to read your code that does that. Whatever you do, nodes or not, you have to get the samples coming from all your inputs, sum them, and generate a single output stream. How do you do this without allocating an output buffer, filling it, and sending it to the next item in the processing chain (whether this is another node or another part of the same node)?

Whatever you do, mixing DOES add latency. So, we use a node, which doesn’t add latency by itself, but allows you to SEE that the mixing process is adding latency on its own.

I don’t see how this is limiting anything, quite the opposite. You can write your own mixer node, or even do the mixing the way you want inside a node, if you wish. You just have to write the code to do that yourself, and it could easily be shared as a helper library. But I don’t think you’ll get lower latency with that, and you’ll end up rewriting parts of the media kit (such as… the mixer node?) inside your own code. This also means the mixing and actual processing of your node are now tied together with no way to use another mixer, for example. What if I wrote a better mixer algorithm (faster, less latency, or gives better results by avoiding saturation, or does some kind of normalization, or whatever)? No way to use it with your node that wants to do the mixing itself now.

[quote]
I don’t think so. In any audio system, be it hardware or software, to mix things together, you need a mixer.[/quote]

No. Sounds will actually “mix” together as described without a special mixer. You can try it for yourself, stamp your foot and clap your hands. The sounds are added together, you hear a combination of both and you don’t need to buy any fancy equipment, isn’t that amazing? It turns out that what’s going on here is a physical property of sound waves, and in fact it’s the foundation of the PCM encoding that makes your PC audio work.

Closer. One Mixer node per input, then the Actual Media Node, then one Splitter node per output.

[quote]
Whatever you do, mixing DOES add latency. So, we use a node, which doesn’t add latency by itself, but allows you to SEE that the mixing process is adding latency on its own.[/quote]

Goodness no, as we saw at the start of this reply this simple “mixing” doesn’t inherently incur any latency at all. Now, on a personal computer the addition step does take (minute amounts of) CPU time, and it can occasionally happen that this is enough to push us to use a longer period (buffer size), which will increase the latency, but most of the time it’s a drop in the ocean.

Now, if we need to wake up a third party process to do the addition, then that certainly incurs more overhead, making it more likely that we’d use a longer period, so that is a reason to avoid this approach.

Sounds like a problem best solved by keeping audio plugins simple (I haven’t noticed any kind of wakeup with the bundled nodes).

What actually happens when connecting two nodes with incompatible formats? The connection can’t be made. The same goes for multiple connections, no difference. This is resolved by imposing more restrictions, since the case of multiple connections is a special one. Where is the limitation? You still keep the current behavior, i.e. 1:1, plus the possibility to automatically mix frames, or split them, when making multiple connections, N:N.

I don’t think it should necessarily be a hidden magic thing; there are a lot of ways to expose controls and settings to the user.

This is hilarious, where did I write that mixing is zero latency? : )
Are you really reading what I wrote? There isn’t anything in the world that is truly zero latency, except quantum correlation, maybe.

Anyway, can you demonstrate that media_nodes are not increasing the latency? And the resulting overhead? And the throughput? If you can do that, I’ll retire my idea and follow your thesis. At some point I’ll come out with benchmarks and we’ll see the truth; until then you can’t be sure of what you are saying, and neither can I.

Anyway, nodes or not, the situation should be clarified.

Not if it’s managed by the media_kit in a smart way. Why the opposite? Everything you can currently do with the media_kit is preserved.

[quote=PulkoMandy]

This also means the mixing and actual processing of your node are now tied together with no way to use another mixer, for example. What if I wrote a better mixer algorithm (faster, less latency, or gives better results by avoiding saturation, or does some kind of normalization, or whatever)? No way to use it with your node that wants to do the mixing itself now.[/quote]

No: if you need another mixer, you instantiate it, connect the two nodes to the mixer, and the mixer output to the final node. I don’t really see the point.

Anyway, I like the “everything is a node” concept; my concerns are mostly about the latency it may add.

The conversion can be lossy in that case (say you convert from float to uint8 because the consumer node only accepts that). I’d rather know this before I notice the output of the node chain is completely garbled and aliased. The same goes when you need to resample because two nodes operate at different sampling rates. Making this happen “automagically” has consequences deep inside the Media Kit concepts. The most obvious one is that it makes the Media Kit aware of the data formats: it has to know whether the streams are audio, video, or something else, which format they use, and so on. It means the Media Kit somehow makes the decision for you, picks a format on each side, and instantiates some kind of converter. How does that work? Pretty much exactly like our existing mixer node: it takes buffers in one given format on one side, and outputs them in another format on the other side. Making this a separate node or some invisible part of each node is a minor design decision at this point.

With the current Media Kit solution you get a more complex node chain, but a very simple API for each node. When you integrate the mixer into each node, whenever you instantiate a node you have to ask yourself several questions. Is it a “sound” node? If so, I have to configure the mixer settings. Do I have a single input so I can bypass the mixer? Can I agree on a common format with said input? Will I use interpolation to do the resampling? Is some kind of filter needed before or after resampling to avoid or limit aliasing?
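
As an aside, a tiny sketch of the lossy float-to-uint8 conversion mentioned at the start of this post (my own illustration, mapping -1…1 onto 0…255):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Squeeze a float sample in [-1.0, 1.0] into an unsigned 8-bit sample.
// Only 256 output levels exist, so the quantization error is audible as
// noise in the resulting stream.
uint8_t FloatToUint8(float sample)
{
	float clamped = std::clamp(sample, -1.0f, 1.0f);
	return uint8_t(std::lround((clamped + 1.0f) * 127.5f));   // -1 -> 0, +1 -> 255
}
```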

You see, this would make the API and the process for writing and instantiating/configuring a node a lot more complex. Jack avoids all this complexity by picking a single common format, which makes things a lot simpler, but restricted to a single sample rate and sample format for all the nodes. Forget about using their system for anything other than audio in the system-defined format. Now that’s a big restriction on what you can do.

NoHaikuForMe keeps saying mixing is for free, in the message just above yours. I’m sure you have better understanding than he does, having actually written some code and played with Jack and/or the Media Kit.

So, here’s why mixing does add latency: while adding sounds in the real, physical world isn’t much of a problem, we only have one (or two) speakers and only one DAC connected to each of them. This is a hardware limitation. So we can’t add waves in the physical world, which would indeed be the simplest and fastest way of mixing them. Instead, we have to do this on the digital side of things, before the DAC. We want to do this at arbitrary places in the signal processing chain - or, in your suggested model, at the input of every node. The only way to do this is to rely on plain old software to do the mixing. The way it works (assuming everyone works in the same sample rate and sample format to keep things simple) is to read one buffer from each of the inputs, and for each position in the buffer, sum up the samples from each input and place the result at the same position in the output buffer.

There is a problem with that: you can’t keep blindly summing things up or you’ll quickly run into saturation. Say your inputs are in the range 0…4095 (it’s common to have 12-bit DACs; 24-bit or 48-bit is also possible, but let’s take 12-bit as an example). If you sum 4 of them, the result can be in the range 0…16383, which needs a 14-bit value to store. In the end, if the DAC on your hardware is 12-bit, you have two choices: either you ignore the problem and get very bad saturation, or you divide by the number of inputs to keep everything in the same range. This latter solution is probably what the software you used on the Amiga did: (some wave + silence) / 2 = some wave / 2.
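
A minimal sketch of that averaging approach (my own illustration: unsigned 12-bit samples stored in 16-bit words, widened for the sum):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum N 12-bit inputs into a wider accumulator, then divide by N so the
// result stays within the 0…4095 range of a 12-bit DAC.
void MixAverage(const std::vector<const uint16_t*>& inputs,
	uint16_t* out, size_t frames)
{
	const size_t n = inputs.size();
	if (n == 0)
		return;
	for (size_t i = 0; i < frames; i++) {
		uint32_t sum = 0;
		for (const uint16_t* in : inputs)
			sum += in[i];                 // worst case n * 4095, more than 12 bits
		out[i] = uint16_t(sum / n);       // back into 0…4095, no saturation
	}
}
```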

In any case, mixing involves manipulating buffers like any other signal processing, and things that come for free in the physical world are often not so easy to replicate in the software one.

I don’t like the idea of making the Media Kit aware that some of the streams may be audio and that these may have implicit mixing. I think this can be done either with explicit instantiation of mixer nodes (our current solution), or inside the “audio” nodes themselves, without having to modify the Media Kit at all. There could be a MultiInputAudioNode class or something like that, which would handle input connection/disconnection, output broadcasting, format conversion and so on, with an API to configure it all. Nodes would then subclass this and implement the core signal processing relevant to them. This is just one possible specialization of what the Media Kit can already do, and it can be used to build something more dedicated to audio processing on top of the Media Kit, such as a Jack-compatible API, or something completely different. Meanwhile, the “core” Media Kit stuff can still be used for video, or some other arbitrary signal processing.
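
To make the idea a bit more concrete, here is a rough sketch of what such a class might look like (this class does not exist in the Media Kit; names, signatures and the float-buffer assumption are mine, purely for illustration):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper base class: it owns the implicit mixing of all
// connected inputs, so subclasses only implement their core processing.
class MultiInputAudioNode {
public:
	virtual ~MultiInputAudioNode() {}

	// Imagined framework hook: one buffer per connected input.
	void HandleBuffers(const std::vector<const float*>& inputs,
		float* output, size_t frames)
	{
		std::vector<float> mixed(frames, 0.0f);
		for (const float* in : inputs) {
			for (size_t i = 0; i < frames; i++)
				mixed[i] += in[i];              // implicit mixing of the inputs
		}
		Process(mixed.data(), output, frames);  // node-specific signal processing
	}

protected:
	virtual void Process(const float* in, float* out, size_t frames) = 0;
};
```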

[quote]
No: if you need another mixer, you instantiate it, connect the two nodes to the mixer, and the mixer output to the final node. I don’t really see the point.[/quote]

This is possible, but you’re back to the way it works currently. This means you have to keep two kinds of mixers around: an explicit standalone mixer node, and an implicit one bundled into every node. There’s the case of a different mixer algorithm, but also the case where you want to post-process the mixer output. Maybe you want to normalize it, or filter it, or something else, before going on with the sound processing. The Media Kit is nice for this because it is very low level and lets you control the exact signal path. This is powerful but maybe a bit complicated; however, I don’t expect people to use Cortex to define their processing chains. The same graph-like paradigm could be used in a higher-level application, working only on MultiInputAudioNodes, or instantiating 3 nodes for each graph element (mixer > actual processing > splitter), or maybe a mix of the two (to handle ‘legacy’ nodes).

Anyway, I agree we’re talking about void so far, and I’m waiting for your code so there is some actual testing and numbers. Good luck with the work :slight_smile:

… followed by three paragraphs of waffle without mentioning latency once. I’ve explained what’s really going on here above already.

Mmm. It’s important to distinguish between what happens with integers versus floating point, but…

It hasn’t been “common” to have a 12-bit DAC in PC audio for at least 15 years, maybe closer to 20 years and a 48-bit DAC isn’t “possible” unless you’re determined to stretch the definition of the word beyond reasonable demands. But sure, let’s go along with your example since I know where you’re headed.

Please don’t use unsigned representations. They were ghastly but arguably necessary for 8-bit PCM decades ago, but they’re needlessly unclear here; prefer an unbalanced signed representation: -8192 to +8191.

Don’t tease us PulkoMandy, tell us which you think is the right choice here. In fact, I’ll even let you cheat and go check what Haiku’s System Mixer does before answering.

Yeah, I have to admit that everything would become more confusing. It’s the engineer’s trap to solve things with too much complexity, when in a lot of cases the simplest solution is the better one.

Thanks for pointing that out, I have a lot to learn already.

: )

Yeah, it’s like what I was thinking, and I’m building an internal API for myself, something you may have already seen around from Be. I got inspiration for this from the Titan source code, which includes some nodes developed by Be Inc.

We should have things like this, for example the BSoundConsumer node planned to become something like BSoundPlayer. I think it’s partially a mistake; in my idea there should be something like a BAudioProducer, and BSoundPlayer should be based on this.

Similarly we may have a BAudioConsumer node and a BSoundConsumer easy-to-use class.

I think this just because I see there’s a lot of code duplication in nodes, and various things done by nodes are identical. That’s just an idea right now; I’m sure it’s good for an application but not so sure about an OS API, which has a lot of problems I may not see right now… so I don’t exclude that I may convince myself the generic approach is the best.

This discussion has cleared up a lot of things for me. So basically, in the first approach, a hypothetical audio routing app should instantiate a mixer automatically in those situations.

As you said, we may just instantiate those nodes by default, but the main problem is the same as before: we need something close to zero latency (I mean when doing 1:1 connections with a mixer and a splitter in the middle). Anyway, I would add that, in any case, we should have minimum-latency nodes, with multiple inputs or not. So maybe, as said, I’m using too complex an approach…

I’m also curious if the situation will be improved by the current scheduler contract : )

[quote=PulkoMandy]

Anyway, I agree we’re talking about void so far, and I’m waiting for your code so there is some actual testing and numbers. Good luck with the work :)[/quote]

Thanks… I think we need it for a lot of reasons: if we want to seriously develop the media_kit, we should have some reference to detect regressions and improvements.

We already did similar work in other places. For example, our translators for the Translation Kit use a “translationutils” static library with a lot of common code. I think this is a good way to experiment with things, and it avoids locking down possibilities too early. I’m thinking for example of the “Sane” translator that allows using a scanner through the translator API. That one probably won’t use libtranslationutils, because it doesn’t work with files. So there is a good balance between a generic API and a specialized application of it.

Of course, the additional flexibility of Cortex leads to more work when working only with Audio. Maybe this has some performance hit in some cases, but I’m ready to live with it for now. I’m waiting for your experiments, and if they show there are huge latency problems we may have to tweak some things or maybe explore other ways.

@NoHaikuForMe: sorry, this seemed so obvious to me: adding/averaging buffers in software takes time (in case you didn’t guess… it’s a loop over the buffer length and memory access to buffers from different places in memory, potentially causing a lot of cache misses and taking some time to get done). This time adds to the latency of the processing chain. Again, if you have an algorithm for mixing that doesn’t involve spending time adding these buffers together, please share it!

I didn’t talk about using floating point numbers for the mixing because it’s not really relevant: you can do all the mixing by just summing stuff together, but in the end you still have to somehow normalize the output so it fits the range of the DAC - probably 24-bit on modern hardware. You can either do this statically (blindly divide by the number of streams you summed, effectively getting the average of the sound waves, which is not what ‘physical’ mixing would do but doesn’t saturate), or dynamically (detect the maximum value and divide by it to get something in the -1…1 range, then multiply by the DAC resolution). Or you can make sure you know how many streams you’re going to mix and have them provide low enough values so the sum doesn’t exceed what your DAC can handle. But this last approach needs a bound on the number of streams, reduces the dynamic range available to each of them, and if you set the bound too high, also reduces the resolution, even when using floats (ok, maybe with doubles as they are represented on x86 you may get pretty far without ever noticing this…).
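
For concreteness, a minimal sketch of those two scaling options on a float mix bus (illustrative only, not the mixer node’s actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Static scaling: divide by the number of streams that were summed.
void ScaleByStreamCount(float* mix, size_t frames, int streamCount)
{
	for (size_t i = 0; i < frames; i++)
		mix[i] /= float(streamCount);
}

// Dynamic scaling: find the peak of the summed signal and rescale so it
// fits back into the -1…1 range before converting for the DAC.
void ScaleToPeak(float* mix, size_t frames)
{
	float peak = 0.0f;
	for (size_t i = 0; i < frames; i++)
		peak = std::max(peak, std::fabs(mix[i]));
	if (peak > 1.0f) {
		for (size_t i = 0; i < frames; i++)
			mix[i] /= peak;
	}
}
```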

What do you suggest?

[quote=PulkoMandy]
@NoHaikuForMe: sorry, this seemed so obvious to me: adding/averaging buffers in software takes time (in case you didn’t guess… it’s a loop over the buffer length and memory access to buffers from different places in memory, potentially causing a lot of cache misses and taking some time to get done). This time adds to the latency of the processing chain. Again, if you have an algorithm for mixing that doesn’t involve spending time adding these buffers together, please share it![/quote]

It is important to keep in mind that PCM data is granular. This is true inside a DSP-based system, but it’s even more true for something like the Media Kit that deals not with individual samples or frames but with whole buffers amounting to several milliseconds of audio. As a result the system latency acquires the same granularity, if buffers containing 4ms of audio are used then it’s quite meaningless to claim one additional millisecond of latency from some change, do you see? Either the 4ms buffers are fine, or you need larger buffers, or more of them, but no way can the latency now be 5ms. Be’s design makes this a lot harder to understand than it should be, but that’s really not anyone else’s fault.

In a system like Haiku the DAC really isn’t your concern, you are responsible only for handing over the PCM samples to some sort of hardware, it might be the digital-only HDA implementation on a modern graphics card or it might be a cheap 1990s AC97 implementation, but either way it’s the same for Haiku. Likewise the word “normalize” is wrong in this context, and we will see why floating point is relevant later.

You could do either of these things but they’ll sound terrible, so don’t. Actually the second option might end up sounding like a third rate implementation of a really vicious compression-type distortion which is sort-of cool for a certain type of project but not what you were looking for. If you did want that you shouldn’t implement it this way FWIW.

Ah, no. Although managing levels sensibly is a useful skill for a human, it’s not an appropriate thing for the system to try to guess on its own. However, such shenanigans are exactly why you should choose float. You get a sign bit, an implied leading one and 23 bits of “fraction” across the entire normal range of the 8-bit exponent. Even if you divide by, say, four billion, you still keep all the precision anyway, and the same applies for multiplying too, which leads to this behaviour of floating point sometimes being called “infinite headroom” (in fact you get about +/- 750dB to play with). It was very easy to goof during processing in analog and in 16-bit, or even to a lesser extent 24-bit, integer PCM systems, making something too hot and distorted beyond repair, or else so quiet there’s nothing left to amplify. This sort of goof doesn’t happen in floating point.
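
A tiny sketch of that claim (my own example, using the power-of-two factor 2^32 ≈ 4.3 billion so the scaling is bit-exact):

```cpp
#include <cstdio>

// Floating point "headroom": scale a sample down by 2^32 and back up.
// Because the factor is a power of two, only the exponent changes and the
// 24-bit significand is untouched, so the original value is recovered exactly.
int main()
{
	const float factor = 4294967296.0f;      // 2^32, roughly four billion
	float sample = 0.123456789f;
	float attenuated = sample / factor;      // far below any 24-bit integer floor
	float restored = attenuated * factor;
	std::printf("%.9f -> %.9f\n", sample, restored);   // identical values
	return 0;
}
```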

Saturating addition. If you’re determined not to learn enough theory to see why it’s the Right Thing™ then why not just try it?
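
For anyone who does want to try it, here is a minimal sketch of saturating addition on 16-bit samples (my own illustration: widen the sum, then clamp instead of letting it wrap):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Mix two 16-bit PCM buffers with saturating (clamping) addition,
// as opposed to the wrapping arithmetic a plain int16_t sum would give.
void MixSaturating(const int16_t* a, const int16_t* b, int16_t* out, size_t frames)
{
	for (size_t i = 0; i < frames; i++) {
		int32_t sum = int32_t(a[i]) + int32_t(b[i]);   // widened, cannot overflow
		sum = std::clamp(sum, int32_t(INT16_MIN), int32_t(INT16_MAX));
		out[i] = int16_t(sum);
	}
}
```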

No.

It is entirely possible and useful that a chain of nodes has a larger latency than the duration of a buffer. A single node must not have a larger latency than the buffer duration, but normally there are many nodes chained together, so the total latency (which I guess is what you mean by “system latency”) adds up. Every node can be working on a buffer at the same time (the concept known as pipelining in e.g. CPU architecture), so you can have 4ms of audio coming out of the chain at the end even though the initial latency for the first buffer to travel through it is e.g. 20ms.

Do you want to resize buffers on every latency change? Reallocate all buffer pools with larger buffers just because another node was inserted and the total latency increased…?

[quote=jua]
No.

It is entirely possible and useful that a chain of nodes has a larger latency than the duration of a buffer. A single node must not have a larger latency than the buffer duration, but normally there are many nodes chained together, so the total latency (which I guess is what you mean by “system latency”) adds up. Every node can be working on a buffer at the same time (the concept known as pipelining in e.g. CPU architecture), so you can have 4ms of audio coming out of the chain at the end even though the initial latency for the first buffer to travel through it is e.g. 20ms.[/quote]

And how does any of this contradict what I wrote above? How does your pipeline manage to chain together buffers each four milliseconds long such that the total is five milliseconds or indeed anything but a multiple of four milliseconds? Your example is 20ms, which is a multiple.

By “system” latency I wanted from the outset to distinguish what we’re discussing from the audio latency, which is arbitrary in this context. A tape delay simulator, for example, can introduce colossal audio latency but that doesn’t mean anything for the system latency.

A quite useful thought experiment to run in your head when thinking about this stuff is: how do larger or smaller buffers impact my CPU? Remember, the actual calculations to be done are the same whatever the buffer sizes. If your CPU isn’t powerful enough to run a cabinet simulator in real time you can’t usually fix that by just making some buffers a bit bigger; you’ll either need a more powerful CPU or else go without the cabinet simulator. Changing buffer sizes may help you a little with CPU caches to squeeze out a last few percentage points of available performance, but it cannot make the sort of dramatic differences that Be’s design seems to anticipate.

And this is where we come back to adding numbers together. A floating point add is a very cheap operation on a modern CPU. Compared to almost any interesting effect that you might incorporate into a Media Kit node it is inconsequentially small overhead. Unless you insist on doing it as an entirely separate node and thus add the node overhead on top. That’s why it’s silly to talk about this increasing latency, the granularity of latency is a whole buffer, several milliseconds long, there’s no way any sensible topology will cause you to spend a noticeable fraction of that time doing the addition, so you’ll keep using the same buffers as before and the system latency remains the same.

Buffers do not get “chained together”. They simply flow through the chain from node to node. Every node needs a bit of calculation time, the sum of those times is the total latency of the chain. The only thing that matters is that a steady flow of buffers arrives at the output.

Suppose your chain includes 3 nodes: producer → filter → consumer. Suppose producer has 1ms latency, filter has 3ms, consumer 1ms. So, total latency is 5ms. That the buffers are 4ms is irrelevant. The buffer duration is independent of latencies. (The only boundary is that the latency of any single node in the chain may not be larger than the buffer duration.)
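
A small sketch of that arithmetic, using the numbers from the example above (my own illustration):

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

// Total chain latency is the sum of the per-node latencies; the buffer
// duration only limits how slow any single node may be.
int main()
{
	std::vector<double> nodeLatencyMs = { 1.0, 3.0, 1.0 };   // producer, filter, consumer
	const double bufferDurationMs = 4.0;

	double totalMs = std::accumulate(nodeLatencyMs.begin(), nodeLatencyMs.end(), 0.0);

	bool feasible = true;
	for (double latency : nodeLatencyMs) {
		if (latency > bufferDurationMs)
			feasible = false;    // a node slower than one buffer cannot keep up
	}

	std::printf("total latency: %.1f ms, feasible: %s\n",
		totalMs, feasible ? "yes" : "no");   // prints 5.0 ms, yes
	return 0;
}
```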

That was coincidence, I just picked a random value… it could also be 17 or 29 or 2549, doesn’t matter, it always works.

Please precisely define what system latency is for you to make sure we aren’t talking about completely different things.

Yes, but don’t confuse latency and throughput, they are two separate measures.

Also, don’t forget that the Media Kit is the one subsystem for audio (and more) in BeOS/Haiku. Keeping the buffer sizes independent of latencies means you can run a high-latency application alongside a low-latency one and neither disturbs the other. Say you run your media player which has a high 200ms latency and at the same time a real-time synth with 5ms latency. Works fine with the Media Kit approach. You wouldn’t want to raise your buffer size system-wide to 200ms just because a single application requests it!
Certain other operating systems have multiple completely separate APIs/subsystems for “regular” audio use and “real-time” audio use. And when you use one, it blocks the others… that’s bad. Haiku’s MediaKit however covers it all at once – but that means it has to anticipate coexistence of widely different latency needs.

[quote=jua]
Buffers do not get “chained together”. They simply flow through the chain from node to node. Every node needs a bit of calculation time, the sum of those times is the total latency of the chain. The only thing that matters is that a steady flow of buffers arrives at the output.[/quote]

Indeed?

First let’s try some other numbers that obey your constraint: how about the filter has 0.5ms latency? Total is 2.5ms of latency according to your thinking. But our buffers are still 4ms long. No matter what strategy the Media Kit uses, when a particular buffer starts playing, for the next four milliseconds the audio played won’t be influenced in any way by what your filter or producer are doing. Your approach is working out what the Media Kit calls “processing latency”, which generally doesn’t get a name because it’s not a very useful thing to know, as we’ll see shortly.

Now, let’s get back to your numbers and try a worked example

T=0 producer begins working on buffer A, the first 4ms audio buffer
T=1 producer hands buffer A over to filter
T=4 filter hands buffer A’ (modified by filter) over to consumer
T=5 consumer has sent A’ to PCM hardware, back now to producer to work on buffer B, the second 4ms audio buffer
T=6 producer hands buffer B to over to filter
T=9 a PCM buffer underrun occurs, probably audible as a loud click or stutter

So we see this doesn’t work. Why not? Because as we discussed previously the workload is too high, we need to do 5ms of work every 4ms of time, and that’s not possible.

[quote]
Also, don’t forget that the Media Kit is the one subsystem for audio (and more) in BeOS/Haiku. Keeping the buffer sizes independent of latencies means you can run a high-latency application alongside with a low-latency one and none disturbs the other. Say you run your media player which has a high 200ms latency and at the same time a real-time synth with 5ms latency. Works fine with the Media Kit approach. You wouldn’t want to raise your buffer size system-wide to 200ms just because a single applications requests it![/quote]

It seems you’ve contradicted yourself. Does the buffer duration have to be at least as large as the 200ms latency of the media player, which you “wouldn’t want”, or not? Earlier, “just because a single application requests it” was enough reason; now it isn’t.

I can’t restate this often enough - when you’re not sure, go try it and see what happens for real. If what you find experimentally doesn’t match up with how you thought the system works, it means you were wrong.

Sure, I’m using “system latency” here to mean the time taken to play all the frames of audio in every buffer used. Typically a synchronous system will use 2 buffers, but more may be necessary, the system you’ve described could need considerably more. You might wonder why this amount of time is important, it’s because whatever input the user is supplying cannot reliably affect the output until all these frames have been played. If you try to cheat (and some people have, including by accident) then there is noticeable jitter which is far worse than higher latency as far as musicians are concerned.

The reason to worry about this measure is that it’s perceptible to the user but you have the opportunity to do something about it unlike the physical latency of the DAC, or the unavoidable audio latency of a “Cathedral effect” convolution filter the user insists on running.

So far so good.

Why T=4? You said the filter in your example has 0.5ms latency. Thus, the filter can send out buffer A’ at T=1.5ms.

Corrected, the timing looks like this, I will name the 3 nodes A → B → C for simplicity.
t=0 : A begins its work on buffer 1
t=1 : A finished and sends buffer 1 to B
t=1.5 : B finished and sends buffer 1 to C
t=2.5 : if C is the soundcard node, the audio starts coming out of the user’s speakers right now.

Now at this point A needs to know when it should start preparing another buffer. Every node of a chain knows its downstream latency (the latency from itself to the final consumer), node A’s downstream latency is 1.5ms. The first buffer was played to the user at t=2.5ms, the buffer is 4ms, so the next buffer needs to become analog in the user’s soundcard at exactly t=6.5ms. Node A has to start preparing the next buffer at (6.5 - 1.5 - 1)ms which is the buffer’s performance time minus the node’s downstream latency minus its own latency.
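
A tiny sketch of that rule (names invented, times in milliseconds):

```cpp
// A node starts working on a buffer at the buffer's performance time
// minus its downstream latency minus its own processing latency.
double StartTime(double performanceTime, double downstreamLatency, double ownLatency)
{
	return performanceTime - downstreamLatency - ownLatency;
}

// For buffer 2 in the example: StartTime(6.5, 1.5, 1.0) == 4.0,
// so node A begins filling it at t=4ms.
```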

So, the timeline goes on as such:

t=4 : A begins filling buffer 2
t=5 : B receives buffer 2 and filters it
t=5.5 : B finishes and sends to C
t=6.5 : content of buffer 2 comes out of speakers. No dropout happened!

Note: in reality, the MediaKit also has to take into account the “scheduling latency”, the time it takes for the node’s work thread to be scheduled. I left that out for simplicity; it doesn’t matter for understanding the basic idea.

The workload gets too high only when the latency of any single node is larger than the buffer duration. You can’t have a node with latency=5ms in a chain which processes buffers of 4ms duration. But the total latency of the entire chain added up may be arbitrarily larger (or even smaller) than the buffer duration.

I thought that was what you wanted to say. I guess that was a misunderstanding.

[quote=NoHaikuForMe]
Sure, I’m using “system latency” here to mean the time taken to play all the frames of audio in every buffer used. Typically a synchronous system will use 2 buffers, but more may be necessary, the system you’ve described could need considerably more.[/quote]
(Edited:) Yes, it may need more buffers, but with the benefit of the latency being independent of buffer duration (except the boundaries I talked about). What kind of system do you have in mind?

[quote=jua]
Why T=4? You said the filter in your example has 0.5ms latency. Thus, the filter can send out buffer A’ at T=1.5ms.[/quote]

I’d finished my example, I was back to examining yours as I wrote in the text. They illustrate different problems with your theory.

But OK, let’s back up to my example, with the filter taking just 0.5ms to run, you’d call the latency 2.5ms, but in fact the sound coming out of the system now is from more than 4ms ago, not 2.5ms. Every buffer takes 4ms to play, there’s no getting around that. Let me spell that timeline out since I didn’t previously.

T=0 producer begins working on buffer A, the first 4ms audio buffer
T=1 producer hands buffer A over to filter
T=1.5 filter hands buffer A’ (modified by filter) over to consumer
T=2.5 consumer has sent A’ to PCM hardware, and by Be’s design we can rest until T=4
T=4 producer begins working on buffer B, the second 4ms audio buffer
T=5 producer hands buffer B over to filter
T=5.5 filter hands buffer B’ (modified by filter) over to consumer
T=6 audio frame playing is still from buffer A, now over 5ms old, our latency is clearly not 2.5ms

[quote=jua]
The workload gets too high only when the latency of any single node is larger than the buffer duration. You can’t have a node with latency=5ms in a chain which processes buffers of 4ms duration. But the total latency of the entire chain added up may be arbitrarily larger (or even smaller) than the buffer duration.[/quote]

No, the workload is too high when you can’t do one period of audio in one period of time. You’ve tricked yourself by dividing the workload up into pieces, but you have to do all the pieces or else you’ll stall out. Since this is the Be Media Kit the usual next resort is to say that you’ll run some of the workload on another CPU, but this means additional overhead and buffers and it only buys you the ability to split the workload N times for N cores, not “arbitrarily larger”.

That’s correct, but I don’t understand what you expect exactly: an audio hardware output plays the end of a buffer at minimum the length of the buffer plus the time needed to produce that buffer after production started. This means a minimum of 6.5ms in your case, whatever the audio software system is.

[quote=NoHaikuForMe]
No, the workload is too high when you can’t do one period of audio in one period of time. [/quote]

That’s correct. A node can theoretically add any latency, but it has to produce a buffer at the time expected by the consumer, once per period of time. For instance, a delay node will produce a buffer based on the buffer consumed N periods of time before.
The workload is effectively too high when a node can’t produce a buffer at the time expected by its next downstream consumer (you wrote the same thing differently).

I hoped to get jua to understand how this works, and perhaps PulkoMandy too.