Getting the basics right

So here’s Adrien Destugues (“pulkomandy”) piping up yet again about Haiku’s audio, this time on the mailing list.

The most vital thing for anybody working on Haiku audio to know about this is that it’s wrong.

Not “I have a bit of a quibble with how Pulkomandy phrased this” or “Some people don’t entirely agree, this is an open area of research”, it’s just plain wrong. Doing this will make your OS markedly worse.

As always I encourage people to try this for themselves rather than just taking my word on trust. Get, say, four pop songs and mix them together by summing the floats, then dividing by four. The result is very quiet, far too quiet compared to just playing four pop songs at once with, say, four actual boom box CD players. In fact it’s exactly four times (in linear signal terms) too quiet, because dividing by four is an error. Now try without the unnecessary division; it sounds how you’d expect from four songs played at once.

If you don’t want to go to all that bother, try one simple thought experiment: suppose we mix silence with a pop song. The correct and simplest algorithm gives us back the pop song untouched. That seems right. Adrien’s algorithm halves the linear volume of the pop song, which is obviously wrong.
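If you’d rather see it in code than with CD players, here’s a minimal sketch of the two approaches (illustrative C++, nothing to do with Haiku’s actual mixer source):

[code]
// Minimal sketch of the two mixing strategies, illustrative only.
// Summing leaves a lone input untouched; dividing by the input count
// attenuates everything, including a song mixed with silence.
#include <cstddef>

void MixSum(const float* const* inputs, size_t count,
            float* out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < count; j++)
            acc += inputs[j][i];
        out[i] = acc;            // correct: silence + song == song
    }
}

void MixAveraged(const float* const* inputs, size_t count,
                 float* out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < count; j++)
            acc += inputs[j][i];
        out[i] = acc / count;    // wrong: silence + song == half-volume song
    }
}
[/code]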

Wow! Massive trolling!

I don’t think that every comment every developer makes on the mailing list needs to be examined in such detail. People make errors when commenting on a topic off the top of their head.

Why don’t you actually join the discussion instead of picking out a tiny part of it to criticise out of context here on the forum? Because you’d rather face users in a more publicly visible forum than developers?

Hi!

What should we do then? Just add the signals together without any normalization? This would in theory match what happens in "real life" (in your 4 boom boxes example, at least), but we have to fit everything into a single soundcard, which has a limited input range (usually 16 to 24 bits).

So, if we don't do any normalization when adding signals together, as soon as you use two or more inputs you can get saturation. Is there a way to handle it without lowering the final volume? (Just ignoring the problem doesn't count, as the garbage that would result certainly isn't what would happen in real life either.)

Of course my suggestion only works if there is a fixed number of inputs, which isn't the case with our mixer. Thanks for pointing it out, but I'd appreciate inputs or hints on what else can be done.

BTW I think maybe what you do is just add the signals, then the user (or some other software) needs to adjust the gain for each input to avoid clipping? Or adjust the gain of the mixer? It doesn’t seem like a very easy problem to solve otherwise…

Yes you do, so you’ll sometimes clip. Interestingly, that’s ultimately what happens for a room full of boom boxes too, though they experience a “soft” clipping at the limit whereas you’ll hard clip at 0dB full scale. But that’s not important here.

Just “ignoring the problem” doesn’t result in “garbage”. Try it. Really. We’ll all wait. I always invite people to try it, not just because practical experience is most likely to get you to understand what’s happening, but because the practical experiment sticks in your head, you don’t forget it a week later and repeat the same error.

If you put two lots of full scale white noise into the mixer, about 25% of the samples will clip. That’s a noticeable distortion; you may be able to hear it’s a bit “off” somehow. But you started with white noise at 0dB, and for something people are actually going to listen to, the rate of clipping will always be lower, usually much, much lower.
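You can check that figure for yourself in a few lines. This sketch assumes “full scale white noise” means samples uniformly distributed across -1:1; Gaussian noise would clip at a different rate:

[code]
#include <cstdio>
#include <random>

// Empirical check of the "about 25% of samples clip" figure, assuming
// "full scale white noise" means samples uniformly distributed in -1:1.
int main()
{
    std::mt19937 rng(12345);
    std::uniform_real_distribution<float> noise(-1.0f, 1.0f);

    const int kSamples = 1000000;
    int clipped = 0;
    for (int i = 0; i < kSamples; i++) {
        float mixed = noise(rng) + noise(rng);  // sum two inputs
        if (mixed > 1.0f || mixed < -1.0f)
            clipped++;
    }
    // Prints roughly 25%: the sum has a triangular distribution on
    // -2:2, and exactly a quarter of it lies outside -1:1.
    printf("%.1f%% of samples clipped\n", 100.0 * clipped / kSamples);
    return 0;
}
[/code]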

Clipping isn’t desirable; you should do it just once, at the edge of the pipeline where the audio is being converted to integers for a DAC. But it’s better than any other alternative when mixing inputs which may be too hot, and (done once) it isn’t usually even noticeable.
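In code, “once, at the edge” looks something like this (a sketch of the final conversion step, assuming 16-bit output):

[code]
#include <cstdint>

// One place to clip: the final float-to-integer conversion for the DAC.
// Everything upstream mixes in float without worrying about range.
int16_t FloatToInt16(float s)
{
    if (s > 1.0f)  s = 1.0f;    // hard clip, once, at the pipeline edge
    if (s < -1.0f) s = -1.0f;
    return (int16_t)(s * 32767.0f);
}
[/code]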

In principle it might look as though a Compressor is what you need here. If you’re an aspiring (electronic) musician and sometimes things are much too loud or too quiet, a Compressor is definitely the right tool to make your output bearable for the audience. But such dynamic compression (not to be confused with the data compression in something like FLAC or MP3) introduces a delay and potentially also distortion that are unpleasant and unnecessary for the simple problem of playing a “You’ve got mail” ding over the Youtube video you’re watching. So just apply clipping, once, in the right place and don’t worry.

Ok, assuming floating point format is used for the samples, this could be made to work at least for the mixing part. But, the existing “float” format in BeOS and Haiku requires that the samples are normalized in the -1:1 range.

This won't be a problem in the simple case (where we just have some producers and a single mixer right before the soundcard at the end of the chain), but we can get into a lot more problems with more complex audio processing. In my mailing list posts I took the example of a ring modulator, that is, a node that would take two inputs and multiply them together.

This is where things break down, unfortunately. If the inputs are floating point in the -1:1 range, multiplying two streams gives an output also in the -1:1 range. No clipping or saturation problems. But if the inputs are in a wider range, let's say -2:2 because each input is the sum of two other signals, then the output will be in the -4:4 range. I'm not sure this is the expected result, and it can get out of control quite quickly if you manipulate a dozen different signals or so in such a way.
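The node itself is trivial, the problem is purely the range of its inputs (an illustrative sketch, not a real media node):

[code]
#include <cstddef>

// A ring modulator node is just per-sample multiplication. If both
// inputs stay in -1:1 the product does too; but inputs in -2:2 (say,
// each already the unattenuated sum of two signals) give output in -4:4.
void RingModulate(const float* carrier, const float* modulator,
                  float* out, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        out[i] = carrier[i] * modulator[i];
}
[/code]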

Maybe one solution would be to adjust the gains automatically when setting up the media node chain, with each node reporting its min/max levels to the ones it feeds input to.

It’s not just BeOS and Haiku: JACK too requires i/o normalized to the +1/-1 range, so I still fail to see where the problem is, sorry. I’m under the impression the discussion is turning into “there’s someone wrong on the internet”, but anyway, ALL audio out there is done in floating point. We might market our OS by having very funky support for integer audio and other things, but I don’t think it will fit the needs of developers.

[quote=PulkoMandy]Ok, assuming floating point format is used for the samples, this could be made to work at least for the mixing part. But, the existing “float” format in BeOS and Haiku requires that the samples are normalized in the -1:1 range.
[/quote]

If the format just says +/- 1.0 is 0dB full scale, there’s no problem. That’s a common convention used in lots of places, and doesn’t prohibit values outside -1:1.
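Read that way, ±1.0 is nothing more than a reference point; a sample of 2.0 is simply at about +6dBFS (trivial sketch):

[code]
#include <cmath>

// Under the "+/-1.0 is 0dB full scale" convention, a sample's level in
// dBFS is just a log of its magnitude. Values outside -1:1 simply come
// out above 0dBFS; nothing prohibits them. (Digital silence comes out
// as minus infinity.)
float LevelInDBFS(float s)
{
    return 20.0f * log10f(fabsf(s));
}
[/code]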

If the documentation says other values are actually prohibited, that’s more problematic, but I would recommend Haiku simply changes the documentation. Nobody is going to enforce a rule like that, and implying to developers that they can rely on it just means they’re going to produce poorer software.

A practical problem you run into with float representation is that floats can contain several types of extraordinary value which are not processed quickly by an FPU. Infinities, NaNs and de-normals are allowed in the IEEE 32-bit format, but the hardware treats them as a special case which may take tens or hundreds of times longer to handle. It makes good sense to declare that audio software “should not” produce these values, since Infinity and NaN are meaningless as PCM audio data and de-normals are effectively zero for audio purposes, but again it’s probably futile to say they’re actually prohibited and thereby set the expectation that it’s OK for programs to crash if the inputs are de-normalised.
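A node that wants to be defensive about this can flush such values cheaply at its input. A minimal sketch; whether it’s worth the per-sample cost is a judgement call:

[code]
#include <cmath>

// Defensive sanitising at a node's input: infinities and NaNs are
// meaningless as PCM audio, and de-normals are effectively zero but
// can be dramatically slower to process on common FPUs.
float Sanitize(float s)
{
    if (!std::isfinite(s))                   // NaN or +/- infinity
        return 0.0f;
    if (std::fpclassify(s) == FP_SUBNORMAL)  // de-normal
        return 0.0f;
    return s;
}
[/code]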

Sure, it’s usually going to make sense for the control input to only vary -1:1 for a ring modulator. But while stringing arbitrary modules together shouldn’t cause a catastrophic failure, I don’t think it’s unfair to expect that doing so might not sound very good if you’ve no idea what you’re doing.

One of the things that annoys noise musicians is that a lot of people think “anybody could do that” but noise performance actually requires a pretty good understanding of the equipment in order to make anything interesting at all happen without just blowing all the gear up. In software we don’t have to worry about letting the magic smoke out with values which are too big, so we can let users experiment for themselves - shoving crazy values into the control input of a software effect might result in annoying squealing sounds, or total silence, but it shouldn’t crash your hard disk.

A lot of commercial (e.g. guitar pedal format) analogue ring modulators actually don’t have a control input, that’s replaced by a built-in LFO (low frequency oscillator) which the manufacturer can ensure only runs in the right range. Having the separate input so that a user can insert an LFO or something else appropriate offers more flexibility but really a lot of people are just going to use an LFO anyway.
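Here’s a sketch of that pedal-style design, with the LFO baked in so it can never leave the safe range (the class name and the choice of a sine wave are just illustrative):

[code]
#include <cmath>
#include <cstddef>

// Pedal-style ring modulator: no external control input, just a
// built-in sine LFO that by construction never leaves -1:1.
class RingModWithLFO {
public:
    RingModWithLFO(float lfoHz, float sampleRate)
        : fPhase(0.0f), fStep(6.2831853f * lfoHz / sampleRate) {}

    void Process(const float* in, float* out, size_t frames)
    {
        for (size_t i = 0; i < frames; i++) {
            out[i] = in[i] * sinf(fPhase);   // control signal stays in -1:1
            fPhase += fStep;
            if (fPhase > 6.2831853f)         // wrap at 2*pi
                fPhase -= 6.2831853f;
        }
    }

private:
    float fPhase;
    float fStep;
};
[/code]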

[quote]
Maybe one solution would be to adjust the gains automatically when setting up the media node chain, with each node reporting its min/max levels to the ones it feeds input to.[/quote]

It can’t hurt for nodes to do this, where they have the information to calculate a correct answer, but I shouldn’t think it’s a priority. However, a much smaller tweak would be for relevant nodes to just declare “this is actually a control input / output” as a hint that can be read by other participants. For example the hint could be displayed as a different colour in Cortex to show that it makes sense to plug the LFO into the control side of the ring modulator, without actually preventing you from instead listening to the LFO directly or feeding audio into the modulator’s control line.

Anyway, this is advanced stuff. It’s OK if using Haiku as a modular synth is trickier than playing a Youtube video.

[quote=NoHaikuForMe]
It can’t hurt for nodes to do this, where they have the information to calculate a correct answer, but I shouldn’t think it’s a priority. However, a much smaller tweak would be for relevant nodes to just declare “this is actually a control input / output” as a hint that can be read by other participants. For example the hint could be displayed as a different colour in Cortex to show that it makes sense to plug the LFO into the control side of the ring modulator, without actually preventing you from instead listening to the LFO directly or feeding audio into the modulator’s control line.

Anyway, this is advanced stuff. It’s OK if using Haiku as a modular synth is trickier than playing a Youtube video.[/quote]

I think the primary aim of the media_kit was to be something flexible enough to support any kind of media production, and to some extent this goal really is fulfilled. I think we have the possibility to make the Mixer work in different ways; one of them would be to have it do its processing in floating point and favour that over all other formats. As said on the mailing list, the main problem is ensuring that audio isn’t going to get resampled or converted to a different format until it reaches the sound card. Right now it’s a problem if the user isn’t aware of the issue, and I’d expect musicians who just want to do some real time processing without knowing anything at all about how the signal is processed. If we were building an enterprise system there’d be no problem, as there are ways to make sure this works. So, in the end, the problem is more related to the end user, and the kit might have provided something (such as ways to organize nodes into groups) to make this easy to control.