Professional Sound API

A plug-in on Haiku is simply a media kit “node”. There is no requirement for a separate sound API for plug-ins on Haiku; it’s all provided for by the media kit.

While this is true, it is also good to let people use their favourite plug-ins from other platforms as well. We do not require a new API; however, if something must be portable, or we are reusing a filter written elsewhere, then an API like LV2 is not a bad idea.

I suppose if someone has a plug-in they want to use, then having a wrapper to the media API is OK. But will these plug-ins ever become native to Haiku? Will they be able to truly latch into the API and run as efficiently as possible? What will the latency be on a wrapped LV2 plug-in compared with the latency of a correctly implemented media kit node?

Short answer: yes. Long answer: the LV2 core specification does not contain anything platform-specific; some LV2 extensions do (for example GtkUI), but those can easily be replaced by Haiku-specific extensions. There’s no technical reason, at least, preventing LV2 plugins from being as “native” as you want them to be.

Yes, the only overhead is an extra function call each time the audio buffer(s) are processed, and that is negligible.

I’m not familiar with the media kit API, but I wrapped several sound processing APIs and I never had any extra latency - on the contrary, the media server introduces latency for each node connected in a chain-like fashion, so the bottleneck in this case is actually in the way Haiku handles inter-application audio data routing.
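
To illustrate the point about the extra function call, here is a minimal sketch of what such a wrapper boils down to. The interface and names below (SimplePlugin, GainPlugin, handle_buffer) are made up for illustration only; they are not the Media Kit or LV2 API. The wrapper's only cost per buffer is one indirect call into the hosted plugin.

[code]
// Hypothetical sketch: what "wrapping" a plugin costs per buffer.
// None of these names come from the Media Kit or LV2; they only
// illustrate that the wrapper adds a single indirect call per buffer.
#include <cstddef>
#include <vector>

struct SimplePlugin {
    virtual ~SimplePlugin() {}
    // Process n samples in place.
    virtual void run(float* samples, std::size_t n) = 0;
};

struct GainPlugin : SimplePlugin {
    float gain;
    explicit GainPlugin(float g) : gain(g) {}
    void run(float* samples, std::size_t n) override {
        for (std::size_t i = 0; i < n; ++i)
            samples[i] *= gain;
    }
};

// Stand-in for a node's per-buffer handler: the only wrapper overhead
// is the virtual call into the hosted plugin, no extra buffering.
void handle_buffer(SimplePlugin& plugin, float* buf, std::size_t frames) {
    plugin.run(buf, frames);   // one extra function call per buffer
}

int main() {
    std::vector<float> buffer(2048, 0.5f);
    GainPlugin gain(0.8f);
    handle_buffer(gain, buffer.data(), buffer.size());
}
[/code]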

[quote=zanga]
I’m not familiar with the media kit API, but I wrapped several sound processing APIs and I never had any extra latency - on the contrary, the media server introduces latency for each node connected in a chain-like fashion, so the bottleneck in this case is actually in the way Haiku handles inter-application audio data routing.[/quote]

How does the Media Server compare with JACK in terms of latency? I tried JACK on Windows recently and it was atrocious, not surprisingly on Windows really. The model on Windows is essentially ASIO, which is a direct connection to the audio driver, not really a media server as such, but this is very low latency on a capable machine. The media server is more like a way of patching nodes together, nodes being anything you like: a physical audio input, a software “plug-in”, a mixer etc. It seems to me that the media API is in fact just one big audio patch-bay/mixer; a virtual mixing desk. So if you know anything about mixing desks you will know what the media API can provide. This is kind of what I saw in Cortex: the media API being one big mixer. So as far as latency goes in this regard it’s gonna be a trade-off. If you want a mixer/media server then you are going to have latency, no question; this is true in the real world of pro-level mixing desks too, albeit very low. But if you want hyper-fast IO then you don’t get the mixer, just a driver to your hardware: super low latency without the patching of nodes.

I like the idea of having the virtual mixing desk media server thing as long as it’s fast enough. How fast is the media server? Anyone tried chaining together some heavy DSP plug-ins?

[quote=DavieB][quote=zanga]
I’m not familiar with the media kit API, but I wrapped several sound processing APIs and I never had any extra latency - on the contrary, the media server introduces latency for each node connected in a chain-like fashion, so the bottleneck in this case is actually in the way Haiku handles inter-application audio data routing.[/quote]

How does the Media Server compare with JACK in terms of latency? I tried JACK on Windows recently and it was atrocious, not surprisingly on Windows really. The model on Windows is essentially ASIO, which is a direct connection to the audio driver, not really a media server as such, but this is very low latency on a capable machine. The media server is more like a way of patching nodes together, nodes being anything you like: a physical audio input, a software “plug-in”, a mixer etc. It seems to me that the media API is in fact just one big audio patch-bay/mixer; a virtual mixing desk. So if you know anything about mixing desks you will know what the media API can provide. This is kind of what I saw in Cortex: the media API being one big mixer. So as far as latency goes in this regard it’s gonna be a trade-off. If you want a mixer/media server then you are going to have latency, no question; this is true in the real world of pro-level mixing desks too, albeit very low. But if you want hyper-fast IO then you don’t get the mixer, just a driver to your hardware: super low latency without the patching of nodes.

I like the idea of having the virtual mixing desk media server thing as long as it’s fast enough. How fast is the media server? Anyone tried chaining together some heavy DSP plug-ins?[/quote]

I think you are maybe confusing latency and throughput here.

Very roughly speaking, latency is the amount of time it takes for a system to produce the output corresponding to a given input, while throughput is the “processing power” of a system. In other words, in audio, latency is the (unwanted) delay due to various kinds of buffering (hardware and software) and to unavoidable algorithmic delays (access to future input data is simulated by adding delay and processing current data as if it were past data), while throughput is how much stuff you can process at the same time.

In practice, latency and throughput are contrasting concepts, so you need to make tradeoffs.

Now, AFAIK, the Haiku media server exchanges data among directly connected nodes at each “cycle”, while JACK executes all the nodes taking dependencies into account; this means that if I have something like: input -> A -> B -> C -> output, in the Haiku media server it goes like:

t=1: input -> A
t=2: A -> B
t=3: B -> C
t=4: C -> output

while in JACK it would be:

t=1: input -> A -> B -> C -> output

So the latency introduced by the Haiku media server is in this case 4 - 1 = 3 cycles, while in JACK it is always 0.

This, of course, at the expense of throughput. If you use ALSA directly on Linux you have better throughput than when using JACK, no doubt about that, but the total latency (hardware + software) is exactly the same.
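
To make the difference concrete, here is a toy simulation of the two strategies. It is my own model of the diagrams above, not actual Media Kit or JACK code: “one hop per cycle” hands each node the buffer its predecessor produced in the previous cycle, “whole chain per cycle” runs every hop within the same cycle.

[code]
#include <array>
#include <cstdio>

int main() {
    // Stages: input -> A -> B -> C -> output (5 slots, 4 hops).
    std::array<int, 5> stage{};
    stage[0] = 1;  // an impulse waiting at the input

    // "One hop per cycle" model (as in the Haiku diagram above):
    int cycle = 0;
    while (stage.back() == 0) {
        ++cycle;
        for (int i = static_cast<int>(stage.size()) - 1; i > 0; --i)
            stage[i] = stage[i - 1];   // every buffer advances one hop
        stage[0] = 0;
    }
    std::printf("one hop per cycle    : output sees the impulse at t=%d\n", cycle);

    // "Whole chain per cycle" model (as in the JACK diagram): all hops
    // happen inside the same cycle, so the output sees the impulse at t=1.
    std::printf("whole chain per cycle: output sees the impulse at t=1\n");
    std::printf("difference: %d extra cycles of latency\n", cycle - 1);
}
[/code]

With three processing nodes it prints t=4 for the one-hop-per-cycle case and t=1 for the whole-chain case, i.e. the 4 - 1 = 3 cycles of added latency mentioned above.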

Your analysis seems a bit off… JACK is like a single-cycle CPU with low throughput, where to the user/programmer an instruction takes one cycle, but the cycle must be much longer. Yes, it has zero-cycle latency, but the cycles are much longer, thus the low throughput. Really, no CPUs are designed this way anymore because it’s stupid.

If Haiku operates as you describe, it has an audio pipeline which spreads out the latency between each step, instead of delaying the whole pipe while one chunk goes through, as you described for JACK.

A pipelined audio system would IMO have better throughput with minimal degradation in latency, especially on SMP processors, which are widespread at this point and were obtainable back in the BeOS days, though a bit more expensive. In fact the improved throughput would probably have the effect of reducing system load, thereby reducing latencies… assuming high-load audio processing. If you assume lower load, JACK should be able to keep up, but it will reach its limits faster. From this perspective I see a pipelined audio kit as more robust.

I have no idea how these are actually implemented but your description doesn’t make much sense to me. Maybe you can set my thinking straight if I have missed something or other.

t=1: input -> A
t=2: A -> B
t=3: B -> C
t=4: C -> output

while in JACK it would be:

t=1: input -> A -> B -> C -> output

While your diagram is OK, you have to remember that t isn’t the same in both of those; most likely JACK’s t will be very large and Haiku’s t very small.

PS: I think the system referred to earlier is RADAR V … which I believe can still run BeOS as an option (going from what I have read, I’ve never seen one myself)

Radar is a hardware DAW built on top of BeOS, yes.

Can we make the example a bit more accessible please?

What does “t” represent? Is it a variable? Does it mean “thread”?

input, I understand this: it could be anything from a file playing to a signal coming in from a hardware sound card.

A, B and C: are these DSP processes, such as a “gain control” or an “EQ”?

output, again this is the end of the path, could be a file or a physical output on the sound card.

So correct me if I am wrong.

Jack

soundcard -> EQ -> Chorus -> Delay -> monitors

or

Wav file -> EQ -> Chorus -> Delay -> Wav file

Haiku

soundcard -> EQ
EQ -> Chorus
Chorus -> Delay
Delay -> monitors

???

soundcard ->

start threads
thread1 = EQ -> Chorus -> Delay
thread2 = EQ -> Chorus -> Delay
thread3 = EQ -> Chorus -> Delay
end threads

-> monitors

Feel free to step in at any time, haha…

t is a unit of time as the DSP system sees it.
What I believe it means is that in JACK the whole chain is processed in one step, then it happens again and again over time, but the time step is larger.
In Haiku, it passes the buffers through each node, which then gives them back to be passed to the next node and so on; this means that each “time step” is smaller and the delay of moving the data from one place to another is spread out over the whole chain.
At least that’s what I understand :confused:

If your buffer length is 5 ms, your latency is at least 5 ms.
Then you have to wait for the scheduler to execute your thread. For this kind of job, a (soft) realtime-capable scheduler is much better: your thread waits less, and then your buffer can be smaller. It was the main reason for BeOS’ excellence.

So I wouldn’t say JACK’s latency is 0.
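
To put numbers on that: the floor on latency set by one buffer is simply buffer_frames / sample_rate, and scheduler jitter comes on top of it. A quick back-of-the-envelope sketch (plain arithmetic, not tied to any particular API; the 48 kHz rate and buffer sizes are just example values):

[code]
#include <cstdio>

int main() {
    const double sample_rate = 48000.0;           // Hz
    const int buffer_frames[] = { 64, 128, 256 }; // typical small buffers

    for (int frames : buffer_frames) {
        double latency_ms = frames / sample_rate * 1000.0;
        std::printf("%4d frames @ 48 kHz -> %.2f ms minimum latency\n",
                    frames, latency_ms);
    }
    // A 5 ms buffer at 48 kHz is 240 frames; scheduler jitter then adds
    // on top of this floor, which is why a realtime scheduler matters.
    std::printf("5 ms @ 48 kHz = %.0f frames\n", 0.005 * sample_rate);
}
[/code]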

Oh I see.

This is about how a “unit of time” is being managed by the Haiku media server or Jack.

So in Haiku is the following roughly correct?

A media roster (mr) object manages 4 nodes; 1-input, 2-EQ, 3-chorus and 4-file output

A single cycle is as follows …

input returns to mr -> mr feeds EQ -> EQ returns to mr -> mr feeds chorus -> chorus returns to mr -> mr feeds file output -> file output returns to mr

… repeat

Some of what’s been written above is… confused

LADSPA / LV2 and JACK are complementary. Consider a serious DAW system: you may have twenty plugins running in realtime, and you don’t want the overhead (minimal as it is) of a context switch for each plugin, so they run in-process. But you may also see that you’re running short of CPU throughput, and so you decide to “cook” some of the plugins, those you haven’t really been changing the parameters on. The same API can be run faster (or indeed slower) than realtime, applying any automation and resulting in an intermediary image of a track to which any remaining plugins are still applied in realtime.

Now, as to the hypothetical benefit of sort of “sharing” the latency between applications, there are several nasty problems with this approach, although none of them apply to the real world uses of the BeOS media kit, which were mostly to play some music or game sounds, and not to do serious audio processing.

• Synchronisation. In the JACK system all the components are processing the same time period at the same time. A “sharing” scheme screws this up. Of course you can still have an audio clock but there is no well-defined synchronisation between that clock and non-audio events. We can easily imagine such a scheme causing the audible effect of two simultaneous MIDI events to be separated in time, which would be intolerable from a live music point of view.

• Worst case wins. In the JACK system all the work must be scheduled for each period. Most decent audio software is designed with realtime considerations in mind, which means a predictable workload (the workload may vary when parameters are tweaked, but it shouldn’t vary for other reasons). A lot of work was put into identifying JACK software that emitted or was strangled by denormals - valid but unusual IEEE floating point values that are often costly to process in hardware. But with the “sharing” scheme each period must be long enough for the most CPU-intensive of your processes to complete.

e.g. suppose process A takes 5ms for a 2048 frame buffer, process B 8ms, process C takes 22ms. These are very intensive processes, perhaps process C is a Convolver with a complex impulse from a cathedral. Or maybe we just aren’t using a very powerful computer. Let’s further suppose we’re working with 48kHz PCM audio, so a 2048 frame buffer equates to just under 43ms of latency.

in JACK 5ms + 8ms + 22ms = 35ms, and we incur just 2048 frames of total latency, 43ms

in the hypothetical sharing system (I hope the BeOS media kit is not in fact this terrible) the worst process must fit in one cycle; the worst is 22ms, and we have three cycles of 2048 frames, for a total latency of 128ms
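
Spelling out the arithmetic above (the 5/8/22 ms figures and the 2048-frame, 48 kHz period are the hypothetical numbers from this example, nothing more):

[code]
#include <cstdio>

int main() {
    const double sample_rate = 48000.0;
    const int    frames      = 2048;
    const double period_ms   = frames / sample_rate * 1000.0;  // ~42.7 ms

    const double work_ms[] = { 5.0, 8.0, 22.0 };                // A, B, C
    double total_work = 0.0, worst = 0.0;
    for (double w : work_ms) {
        total_work += w;
        if (w > worst) worst = w;
    }

    // JACK-style: all three run back to back within one period.
    std::printf("JACK: %.0f ms of work fits in one %.1f ms period -> latency %.0f ms\n",
                total_work, period_ms, period_ms);

    // "Sharing" scheme: the period must hold the worst single process,
    // and the chain of three costs three periods of latency.
    std::printf("sharing: worst process %.0f ms per period, 3 periods -> latency %.0f ms\n",
                worst, 3 * period_ms);
}
[/code]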

In reality small latencies (< 30ms) are often inevitable anyway, not least because sound travels relatively slowly through air (about one foot per millisecond) - if you perform live you have to get used to the fact that it takes a noticeable amount of time for sound to travel across the stage, and compensate for that. But of course it’s appropriate to try to keep this to a minimum - which is why JACK was designed the way it is.

Ok I run a Cubase mix and have plug-ins inserted into the mixer etc. Ok, the CPU is being beaten to death so I “freeze” one or two tracks to free up some CPU throughput. This process of freezing can happen in realtime or offline (faster than realtime).

Sharing latency between applications will be done by a process that must synchronize all the processes to some sort of clock. The media roster, I think, is the class that manages nodes, and it is slaved to some sort of system-level clock; other media roster objects can run in parallel and will also be in sync. I think that the biggest media roster object, or rather the one with the largest latency, will mean that the other media rosters have to wait for this largest one to finish. At least that’s my understanding of it. I’m very new to the media kit, so I’m currently reading the documentation. Serious audio processing seems achievable using the media kit, so why do you say it isn’t? Can you give some reasons?

Synchronization. I think in the Haiku system the same is also true: there is a master clock that synchronizes the nodes under a media roster object. MIDI can be synchronized to this clock also. Live, MIDI events are recorded in sync with the audio that is being monitored. The monitored audio will have a latency, but when the performer hits a key on their MIDI keyboard there is no latency: the sound is heard by them in time with the monitored audio, and it is recorded exactly in sync with the audio. Not entirely true for virtual instruments, though.

I don’t think the sharing system you describe is in fact how the media kit works. I think the media roster object owns the nodes, acting kind of like a cable between them, and it is slave to a system-level clock. The biggest media roster object is the total system latency. I don’t actually think the latency you describe would be any different. The media roster is just jacking together the nodes.

I think 43ms in Jack is 43ms in Haiku more or less.

All Media Nodes are slaved to a time source; I believe this fixes the sync issue.
There is a system default time source that the Media Kit uses as the default for nodes, though I believe you can use others.

Oh my, what a mess! I’ll try to be as clear as possible this time, but this is the last time I try to explain things, since I was only interested in knowing what the developers think about this issue, and I hope they know what I am talking about at least.

So, let’s say that we are reading and writing audio buffers from/to the soundcard and each buffer represents x milliseconds of audio.

Now, let’s consider we have a program which reads stuff from the microphone input, processes it and writes it back to the soundcard: every x milliseconds the sound card driver gives you the data which corresponds to the last x milliseconds of microphone input (so your input data goes from (now - x milliseconds) to (now)), you process it (let’s suppose instantly, for simplicity) and you write it back to the soundcard - ignoring other latencies, the soundcard can only start to reproduce the processed sound from the moment you sent it, so from (now) to (now + x milliseconds). This means that the output sample corresponding to the input sample at (now - x milliseconds) is played at (now), so we have x milliseconds of latency - which in this case is only related to the buffer size and the sample rate.

Now we move to the case of inter-application audio data routing: we have three independent programs, A, B and C, and a signal flow like: input -> A -> B -> C -> output, which means A takes its input from the sound card and outputs to B, B outputs to C, C outputs to the soundcard.

Once a system like JACK or the Media Server receives data, it lets all nodes process some data; the difference is in which data is actually processed.

AFAIK, the Media Server does not care about processing chains; it just “calls” the nodes in random order, giving each one as input the output of the previous node from the previous cycle, and keeping its output in memory for the next cycle.

If you do your own math, you will discover that each node in a chain introduces a latency which is equal to the amount of time corresponding to the buffer size (the input of B is the past output of A, the input of C is the past output of B, which corresponds to the second-past output of A).

In our case, since we have 3 nodes in a chain, the Media Server introduces an additional latency which is three times the time corresponding to the buffer size (so our software latency, excluding algorithmic latencies, is now 3 or 4 times the buffer size in time - I don’t know whether the output to the sound card is actually the current or the past output of C).

JACK, instead, analyses the connection graph and basically does a topological sort, so that at each cycle it can give a certain node the data produced by the previous node in the chain, corresponding to the “global” output which is going to be produced in this whole cycle.

So, it will execute, in order, A, B and C in our case, using the current output of A as input of B and the current output of B as the input of C, which means it introduces no latency at all (total software latency, excluding algorithmic latencies, is just the buffer size in time).
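
For the curious, the “correct order” idea is just a topological sort of the connection graph. A generic sketch (my own illustration, not JACK’s actual implementation) using the input -> A -> B -> C -> output chain from above:

[code]
// Plain topological sort (Kahn's algorithm): given the connections,
// compute an execution order so each node runs after everything it
// depends on within the same cycle.
#include <cstdio>
#include <map>
#include <queue>
#include <string>
#include <vector>

int main() {
    // connections: from -> list of nodes fed by it
    std::map<std::string, std::vector<std::string>> edges = {
        { "input", { "A" } }, { "A", { "B" } },
        { "B", { "C" } },     { "C", { "output" } }
    };

    std::map<std::string, int> indegree = {
        { "input", 0 }, { "A", 0 }, { "B", 0 }, { "C", 0 }, { "output", 0 }
    };
    for (const auto& e : edges)
        for (const auto& to : e.second)
            ++indegree[to];

    std::queue<std::string> ready;
    for (const auto& n : indegree)
        if (n.second == 0) ready.push(n.first);

    while (!ready.empty()) {
        std::string node = ready.front();
        ready.pop();
        std::printf("run %s\n", node.c_str());  // process this node's buffer now
        for (const auto& to : edges[node])
            if (--indegree[to] == 0) ready.push(to);
    }
}
[/code]

It prints "run input", "run A", "run B", "run C", "run output", i.e. the in-order execution described above; the ordering only needs to be recomputed when connections change, not in the audio loop.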

Now let’s make some considerations:

  1. JACK is probably more resource intensive than the Media Server, so it steals some more throughput, but I have no measurements so I can’t really tell.
  2. The number of nodes executed at each cycle by both systems is probably the same (all nodes), unless they have some mechanisms which exclude from execution nodes which are not useful to generate the total output (theoretically there are ways to do that in both systems, but in practice it might even be harmful).
  3. I/O buffer sizes are the same or at least comparable in both systems, since in either system smaller buffer sizes correspond to less latency but also to less throughput (more interrupts or reads/writes, and more code executed in practice).
  4. The threshold to hear separate sounds is somewhere around 10 ms, which corresponds to buffer sizes of 441 samples at 44.1 kHz, 480 samples at 48 kHz, 960 samples at 96 kHz, etc. In practice the hardware (A/D, D/A, and other stuff) introduces even more latency, so you should typically stay at least below 5 ms of software latency. In JACK you can just stick to that; with the Media Server you should further divide your buffer size by the maximum number of chained nodes, which gives you worse throughput (see the sketch after this list).
  5. A nasty side-effect of introducing latency for each node is the synchronization of different streams, and that IS important if you want graph-like connections, because human hearing can perceive differences between sounds no matter how small in time they are (well, almost, but for practical purposes that’s what it is).
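
To make point 4 concrete, here is a small sketch of how a latency budget shrinks the usable buffer size under the one-buffer-per-chained-node model described above. The 5 ms budget, 48 kHz rate and chain depth of 4 are arbitrary example values:

[code]
#include <cstdio>

int main() {
    const double budget_ms   = 5.0;     // software latency you are willing to accept
    const double sample_rate = 48000.0; // Hz
    const int    chain_depth = 4;       // nodes connected in series

    // JACK-style: the whole budget can go into one buffer.
    std::printf("single buffer: %.0f frames\n",
                budget_ms / 1000.0 * sample_rate);

    // One-buffer-per-node model: the budget is split across the chain,
    // so each buffer (and therefore the throughput) shrinks accordingly.
    std::printf("split over %d nodes: %.0f frames per buffer\n",
                chain_depth, budget_ms / chain_depth / 1000.0 * sample_rate);
}
[/code]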

I don’t think I really know what you are talking about because you are being too technical. If you can make what you are saying more accessible by using analogies to real-world applications, such as mixing desks, monitoring, inputs, etc., it would be easier to follow what you are saying. It is obvious you have specialist skill in the area of LV2, JACK, etc. and have developed for it and others, but I myself have absolutely no experience whatsoever in this area of programming. I am a music producer and songwriter, but I work mainly as a professional software developer and I write database systems; I can follow a coding guideline if it’s given to me, or some well-commented code examples or well-documented help files. It would be good if you could discuss this with the developers of the media kit. Anyway, trying to make sense of all this …

Ok, a soundcard’s input-to-output path has a latency which is the same as the soundcard’s buffer size for a given sample rate.

Now, we have three programs linked in series.

In your example is a “node” a program; A, B or C?

The media server doesn’t care about processing chains? I think I know what you are going on about now. On Linux you can string your programs together in series, such as: Renoise -> Ardour -> soundcard. JACK treats this as one thing? How does the media server achieve this? Well, each program A, B and C uses the media roster object. This roster object owns individual “nodes” for fx, inserts, tracks, anything you like, etc. The roster object is aware of a master clock so that they can run in sync. This is OK when things are to be synced in parallel, but you want to know how the media roster objects run in series. This is an important point. I don’t actually know.

Simple high level view of the media kit

Media Kit is the studio.
A “node” is an effect, input, output, track, channel, etc.
A media roster is an application such as a sequencer, DAW, beatbox program that can use nodes and can be sync’d to a studio clock etc.

So can media rosters be chained together in series? One feeding the next etc?

Ok, let’s see if I can make things more clear.

Ok.

[quote=DavieB]Now, we have three programs linked in series.

In your example is a “node” a program; A, B or C?[/quote]

Yes.

[quote=DavieB]The media server doesn’t care about processing chains? I think I know what you are going on about now. On Linux you can string your programs together in series, such as: Renoise → Ardour → soundcard. JACK treats this as one thing? How does the media server achieve this? Well, each program A, B and C uses the media roster object. This roster object owns individual “nodes” for fx, inserts, tracks, anything you like, etc. The roster object is aware of a master clock so that they can run in sync. This is OK when things are to be synced in parallel, but you want to know how the media roster objects run in series. This is an important point. I don’t actually know.

Simple high level view of the media kit

Media Kit is the studio.
A “node” is an effect, input, output, track, channel, etc.
A media roster is an application such as a sequencer, DAW, beatbox program that can use nodes and can be sync’d to a studio clock etc.

So can media rosters be chained together in series? One feeding the next etc?[/quote]

Well, I said I have no experience with the media kit itself, apart from resampling; my source of information is: The Be Book - System Overview - The Media Kit

I don’t know if media rosters can be chained themselves (doesn’t Cortex do that?), but even inside one roster, connecting nodes in series creates latency.

Real-world example: let’s say I have 3 ms of buffering latency and I’m using a guitar rack program which uses a node for each effect. I want to apply 5 effects in this order: wah, distortion, phaser, flanger, FIR convolver. If each node introduces latency, I have a total of at least 12 ms of latency. If I used JACK to connect individual applications, each doing one of these effects, I would have had only the 3 ms of latency.

Why? Because JACK is capable of executing the effects in the “correct” order, while the media kit does not take the order of effects into account. So the media kit has to use past buffers for each effect to avoid processing bad data, and each buffer is 3 ms long.

Result: 12 ms is audible, while 3 ms is not.
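
Spelling out the arithmetic of that example under the one-buffer-per-hop model described above (the post’s “at least 12 ms” reads as the four inter-node hops; counting the base 3 ms buffer as well gives 15 ms in total):

[code]
#include <cstdio>

int main() {
    const double buffer_ms = 3.0;   // buffering latency from the example above
    const int    effects   = 5;     // wah, distortion, phaser, flanger, convolver

    // One-buffer-per-hop model: each connection between chained nodes
    // adds one buffer of delay on top of the base buffering latency.
    const int    hops     = effects - 1;
    const double added_ms = hops * buffer_ms;

    std::printf("JACK-style ordering : ~%.0f ms\n", buffer_ms);
    std::printf("chained nodes       : at least %.0f ms added (%.0f ms total)\n",
                added_ms, buffer_ms + added_ms);
}
[/code]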

I hope this is clear enough.

I’ve been going through the documentation too.

I did notice in the documentation that when a media roster links its nodes together, the link itself introduces a latency of 2 microseconds? Is this what you mean?

This is the quote from the media kit http://www.haiku-os.org/legacy-docs/bebook/TheMediaKit_Overview_Introduction.html

"Let’s consider a case in which three nodes are connected. The first node has a processing latency of 3 microseconds, the second has a processing latency of 2 microseconds, and the last has a processing latency of 1 microsecond.

This example uses the term “microseconds” because the Media Kit measures time in microseconds; however, the latencies used in this example may not be indicative of a real system.

In addition, 2 microseconds is required for buffers to pass from one node to the next. The total latency of this chain of nodes, then, is 3 + 2 + 2 + 2 + 1 = 10 microseconds."
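
The Be Book’s sum is just per-node processing latency plus one transfer per connection; a few lines reproduce it (the numbers are taken straight from the quote above and, as the quote says, are not indicative of a real system):

[code]
#include <cstdio>
#include <vector>

int main() {
    // Numbers straight from the Be Book example quoted above.
    std::vector<int> processing_us = { 3, 2, 1 };  // per-node processing latency
    const int transfer_us = 2;                     // per-connection buffer transfer

    int total = 0;
    for (int p : processing_us) total += p;
    total += transfer_us * (static_cast<int>(processing_us.size()) - 1);

    std::printf("total chain latency: %d microseconds\n", total);  // 3+2+2+2+1 = 10
}
[/code]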

This is kind of making a bit of sense to me now but I don’t care about the low level stuff at the moment. I’m interested in the high level plan for professional audio on Haiku.

Assuming that the media kit is capable, something you are currently trying to find out, I envisage it this way …

Haiku is the recording studio.

At the centre of the Haiku sound studio is, yes you guessed it, a mixing desk. If you are working with professional sound applications you need a mixing desk, it is an essential thing to have.

The mixing desk provides the ability to do I/O, set levels, EQ, compress, and use “add-on” fx, etc. This is also where you will find the master clock and transport controls for the studio.

It is the hub of the sound environment in Haiku, and DAWs, editors, beatboxes, sequencers and instrument racks all output to the mixing desk. Even the sound from the media player goes to the mixing desk.

There is no situation that cannot be catered for in this design because it comes from a professional studio.

Cortex is the nearest example of what I am thinking about but it’s badly done.

What Haiku needs first and foremost is a mixing desk …

Thoughts please…

[quote=zanga]

  1. JACK is probably more resource intensive than the Media Server, so it steals some more throughput, but I have no measurements so I can’t really tell.[/quote]

The overhead for JACK is really tiny; I’d be very surprised if it’s as much as the Media Server in a like-for-like comparison, let alone more. Where possible JACK uses FIFO scheduling, so the OS scheduler is automatically running the next needed JACK thread(s) whenever there’s work to be done, and it uses carefully aligned and wired shared memory so that everything is in RAM and there are zero copies. Probably the biggest source of “overhead” is the choice of single-precision floating point for PCM representation, but if you’re doing anything really serious you have to pay that price anyway, so better to pay it only once inside JACK itself where they have highly tuned conversion code.

You might think the graph ordering problem would add overhead, but JACK’s design ensures this can be done outside the tight audio loop where it isn’t time critical - and without locking too.

The FIFO scheduler is potentially dangerous (nothing else will be scheduled while a FIFO process is ready to run), but modern Linux lets processes be given a “limited” privilege to run with this scheduler for a finite fraction of time. Because JACK is all about low latency a process that abuses FIFO scheduling to stay running too long is undesirable anyway, so the interests of JACK performance and system stability are aligned.
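
For reference, this is roughly what a JACK client looks like from the programmer’s side: a minimal pass-through client whose per-period work all happens in the process callback, which JACK runs in its realtime thread on buffers it hands out directly. The client name and the absence of error handling are just for brevity; the calls themselves are the standard JACK C API used from C++.

[code]
// Minimal JACK pass-through client: all per-period work happens in the
// process callback, on buffers obtained via jack_port_get_buffer, with
// no extra copies in the client itself.
#include <jack/jack.h>
#include <cstring>
#include <unistd.h>

static jack_port_t* in_port  = nullptr;
static jack_port_t* out_port = nullptr;

static int process(jack_nframes_t nframes, void*) {
    // Runs once per period in JACK's realtime context: no locks, no I/O.
    auto* in  = static_cast<jack_default_audio_sample_t*>(
        jack_port_get_buffer(in_port, nframes));
    auto* out = static_cast<jack_default_audio_sample_t*>(
        jack_port_get_buffer(out_port, nframes));
    std::memcpy(out, in, sizeof(jack_default_audio_sample_t) * nframes);
    return 0;
}

int main() {
    jack_client_t* client = jack_client_open("passthrough_demo", JackNullOption, nullptr);
    if (client == nullptr)
        return 1;

    in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput,  0);
    out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);

    jack_set_process_callback(client, process, nullptr);
    jack_activate(client);   // from here on, process() is called each period

    sleep(30);               // keep the client alive for a while
    jack_client_close(client);
}
[/code]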

All this is great - so long as you’re doing pro audio. The moment you’ve got people who don’t know about audio writing the software, you lose. Such people aren’t going to spend hours analysing their work in cachegrind and they certainly aren’t going to worry if their code sometimes incurs a disk seek (typically 10ms each) when it’s supposed to be emitting audio. Their approach is fine for listening to music while reading your email, but it’s not pro audio, and in JACK what will happen is the xrun counter will start climbing and the program gets kicked out of the graph.

The consensus has long been that we aren’t going to get general application developers to “pay their taxes” (to use Raymond Chen’s preferred phrase) on pro audio APIs, so you will always want a separate “no compromise” pro audio system for doing real work because the system used by general application developers will have to accept relatively long latencies and unpredictable latency jitter from less than excellent software. From what I have read and understood about it, the Media Kit is the latter type of system.

[quote=NoHaikuForMe]
The consensus has long been that we aren’t going to get general application developers to “pay their taxes” (to use Raymond Chen’s preferred phrase) on pro audio APIs, so you will always want a separate “no compromise” pro audio system for doing real work because the system used by general application developers will have to accept relatively long latencies and unpredictable latency jitter from less than excellent software. From what I have read and understood about it, the Media Kit is the latter type of system.[/quote]

Are you saying the media kit is not up to scratch for pro audio applications? Have I misunderstood?