No. Neither I nor my examples ever assumed that.
I’m not interested in getting into this discussion with you again, but please, stop spreading false information.
BBufferConsumers make up rather less than “most” nodes. In many cases the BBufferConsumer is paired with a BBufferProducer which does declare a latency.
It seems as if this recycles the same argument we demolished earlier in the thread; go back and read it.
You run into a tautology. If you can’t get all the work done, you can’t get all the work done. In general if you can’t get 10ms of work done in less than 10ms, you won’t have any more luck trying to do the remaining work and the next 10ms of work in the subsequent ten milliseconds. This is a conceptual error by Be engineers that we’ve been banging our heads against on and off for the last half of this thread.
[quote]
You’ve understood perfectly. There’s no need to comment on a typo from a non-native speaker. This just shows us your mocking way of conducting discussions.[/quote]
I can’t be sure whether I understand you when you write things that make no sense. Your claim here was wrong, as I explained, so it was also possible you didn’t mean to claim that at all; I gave you the benefit of the doubt.
I don’t think you’ve understood why this latency for most clients is “trivial”. It’s ZERO. Remember, this is strictly algorithmic latency. For all clients without algorithmic latency, regardless of how many inputs or outputs they have, no callback is needed.
I am concerned that you still haven’t grasped what’s going on here, because the Media Kit does such a lousy job of explaining the realities of real time audio. Algorithmic latency here is the latency that’s unavoidably incurred by the mathematics of what we want to do. Not by whether you use an overclocked Xeon, or whether you use the latest high-speed DIMMs or whatever. So let’s do a worked example.
We’ll choose a lookahead limiter because although in practice they can be quite complicated both their purpose and the fundamental method of operation are fairly accessible. The purpose of the lookahead limiter is to ensure that, no matter how loud the input sounds are, the output is kept under a particular threshold, usually configured as a parameter, and further, that the sound is distorted as little as possible. Imagine you’re on stage and there is a sound guy controlling the mixing desk, but he is psychic, so that a fraction of a second before you play something very loud he adjusts the gain so that it doesn’t deafen the audience. That’s what this limiter is for.
Obviously computer software, unlike the imaginary sound guy, is not psychic. It applies gain based on the trend of samples, to give smooth changes and thus reduce distortion. So our limiter cannot set the gain correctly for a particular sample N until it knows what the next few samples N+1, N+2 … N+M are. The value of M for a lookahead limiter might be something like 48.
If we’re sending and receiving sample frames in periodic blocks (as in both the Media Kit and JACK) then we can’t just use the received samples to calculate the samples we’re sending, because we need the next M samples too. So instead we insert M samples of delay, always outputting samples that are M samples later than the ones we received. This is the algorithmic latency of our limiter, and it’s what JACK’s latency callback is for.
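To make that concrete, here’s a minimal sketch of the limiter’s skeleton as a JACK client. The gain computation is elided (a real limiter is far more involved); the point is the M-sample delay line and the latency callback reporting it. The client name, port names and kLookahead are made up for the example, but the JACK calls are the real API:
[code]
// Sketch only: shows the M-sample delay and how JACK learns about it.
#include <jack/jack.h>
#include <unistd.h>

static const jack_nframes_t kLookahead = 48;  // M, our algorithmic latency
static jack_port_t* g_in;
static jack_port_t* g_out;
static float g_delay[kLookahead];             // holds the last M input samples
static jack_nframes_t g_pos = 0;

static int process(jack_nframes_t nframes, void*) {
    float* in  = (float*)jack_port_get_buffer(g_in, nframes);
    float* out = (float*)jack_port_get_buffer(g_out, nframes);
    for (jack_nframes_t i = 0; i < nframes; i++) {
        // A real limiter would compute gain here from samples N+1..N+M.
        out[i] = g_delay[g_pos];              // emit the sample from M frames ago
        g_delay[g_pos] = in[i];
        g_pos = (g_pos + 1) % kLookahead;
    }
    return 0;
}

// JACK asks every client to declare its algorithmic latency through this
// callback; we add our M samples on top of whatever is already reported.
static void latency(jack_latency_callback_mode_t mode, void*) {
    jack_latency_range_t range;
    if (mode == JackCaptureLatency) {
        jack_port_get_latency_range(g_in, JackCaptureLatency, &range);
        range.min += kLookahead;
        range.max += kLookahead;
        jack_port_set_latency_range(g_out, JackCaptureLatency, &range);
    } else {
        jack_port_get_latency_range(g_out, JackPlaybackLatency, &range);
        range.min += kLookahead;
        range.max += kLookahead;
        jack_port_set_latency_range(g_in, JackPlaybackLatency, &range);
    }
}

int main() {
    jack_client_t* client = jack_client_open("limiter_sketch", JackNullOption, 0);
    if (client == 0)
        return 1;
    g_in  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE,
        JackPortIsInput, 0);
    g_out = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
        JackPortIsOutput, 0);
    jack_set_process_callback(client, process, 0);
    jack_set_latency_callback(client, latency, 0);
    jack_activate(client);
    for (;;)
        sleep(1);                             // run until killed
}
[/code]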
Does that make sense now?
[quote]
Absolutely not, scheduling is another story. BeOS uses the latency to determine when the buffer should be played out.[/quote]
Exactly, this is a wrong-headed design, and the Media Kit pays dearly for it. It’s hard to tell exactly when this muddle started, since we do not have the source code history of BeOS. Algorithmic latency is largely irrelevant to real-time scheduling, so the BeOS “latency” measurements end up serving as a proxy for deciding when to schedule the processing work.
Yes, so your suggested “improvement” doesn’t work unless, as explained below, you’re willing to incur an additional period of latency per node. This is the decision Be’s developers didn’t like the look of back in 1999, but we don’t know if they could have come up with a solution (for example, a more rational way to schedule things in the Media Kit) because soon after the “focus shift” was announced, BeOS R5 PE was pushed out the door and developers who weren’t focused on making cheap “Network Appliances” work with BeOS were fired.
[quote]
That’s something like an xrun.[/quote]
No, it’s additional latency, an entire period (i.e. some agreed number of frames) of additional latency, per node. This is a very high cost to pay, and one Haiku developers have previously claimed is not paid in Haiku. Feel free to either dispute this with them, or accept it as true.
[quote]
Anyway, in sync mode JACK always waits for the clients to finish. In async mode it doesn’t wait and goes on to the next cycle. If JACK were not taking latencies into account, how would it recognize xruns?[/quote]
You seem to have answered the question yourself. If a new cycle begins and we are still processing the previous period then we have an xrun.
Indeed? Well, I suppose in a way you are almost right. The xrun is detected because the hardware driver (for example ALSA) reports it to userspace. JACK needn’t try to guess. You might think “but it’s not a guess”; alas, you would be wrong: the clock that matters for xruns is the sample clock on the hardware, which may or may not match wall clock time. The hardware driver has access to this clock (we hope) but the userspace application does not directly.
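For completeness, this is all an xrun looks like from the client’s side, a sketch using the real callback API (client setup omitted, registration shown in a comment):
[code]
// Sketch: the xrun arrives as a callback from the JACK server, which in
// turn heard about it from the hardware driver; the client never guesses.
#include <jack/jack.h>
#include <jack/statistics.h>
#include <stdio.h>

static int on_xrun(void* arg) {
    jack_client_t* client = (jack_client_t*)arg;
    // How late the cycle ran, by the driver's view of the hardware clock.
    fprintf(stderr, "xrun: %.1f us late\n", jack_get_xrun_delayed_usecs(client));
    return 0;
}

// Registered once after jack_client_open():
//   jack_set_xrun_callback(client, on_xrun, client);
[/code]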
[quote]
But this way of working is replicable in Haiku too. JACK is like a media_node itself and can be compared to one. Its clients aren’t comparable to media_nodes, only to a small subset of one.[/quote]
In what sense is JACK “like a media_node itself”?
How else can we interpret this? I think I offered adequate time for you to explain it some other way but you declined. The node has no way, when it gives the estimate, of knowing the actual runtime conditions under which, according to you “if the node was written correctly” it must meet the deadline. In practice then it will unavoidably fall short of your requirement, and all today’s Haiku Media Kit producers behave this way.
I’m happy to limit myself to just quoting your exact words in future if you prefer.
[quote=NoHaikuForMe]
How else can we interpret this?[/quote]
Two nodes doing the same work at the same time doubles the CPU load, not the processing time. I explained this in detail before; I won’t do it again.
Estimating latency is tricky, but by no means impossible. Same as before, I won’t go over it all again.
If you want to know more about it, read the source code of actually existing nodes. You see, Media Kit nodes exist, and they work.
[quote=Barrett]
If the node can’t get the work done in 10ms, the final consumer notifies it, and then the node can choose its way of working.[/quote]
In practice all of the supplied nodes and example code have the same “way of working”: they increase their internal latency estimate (up to an arbitrary limit) and press on regardless. Since most “lateness” reports are speculative they often get away with this.
As these internal latencies grow beyond one buffer period (for example, default Media Players alone can easily end up reporting a latency over 140ms, despite a buffer period of 40ms) the system starts to experience buffer bloat. Despite a requirement in the Be Book, Haiku’s BufferGroup code does not block but instead always returns an error if no buffer is yet available, so sooner or later the system will trip over its own feet in this fashion, and indeed various bugs have been reported that amount to this problem.
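For what it’s worth, the defensive pattern a producer needs under Haiku looks something like this sketch. RequestBuffer() is the real BBufferGroup call; the member names are made up:
[code]
// Sketch: the Be Book implies RequestBuffer() can wait for a buffer to be
// reclaimed, but a Haiku producer must be ready for it to fail instead.
BBuffer* buffer = fBufferGroup->RequestBuffer(fBufferSize, fLatency);
if (buffer == NULL) {
    // No buffer came back within the timeout: skip this cycle rather
    // than blocking the control loop or dereferencing NULL.
    return;
}
[/code]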
[quote]
If there’s no algorithmic latency, the same is true on Haiku. You have just the scheduling latency plus some small constant. This is completely unrelated to the discussion.[/quote]
It’s important because this algorithmic latency must be tracked in some pro audio applications. As we’ll see, Haiku ends up not providing any way to do that. If you want to handle algorithmic latency in Haiku you’ll need to abolish (or entirely re-think) the Media Kit.
This is an attractive option, and it seems (though we can’t be sure at this distance of time) to be how Be’s engineers conceived “event latency” in the Media Kit very early on. By the time it was actually shipping, however, they had switched to the present approach, in which reporting algorithmic latency here doesn’t work.
The BMediaEventLooper’s ControlLoop() dispatches B_HANDLE_BUFFER events (thus, the actual work of the node) based on the node’s reported “latency” and its own estimate of scheduling latency. So yes, this is used to schedule work.
This is where (what I believe was) the original scheme that would have handled algorithmic latency fell apart. To get the Control Loop to behave in a desirable way Be’s engineers began recommending that people use SetEventLatency such that the reported “latency” was almost an entire buffer period. By the time BeOS R4.5 shipped, all example code assumed that “latency” is really just a curious way to say “how long before performance time this node should be run” and all concept of actual algorithmic latency was gone.
Critically, in Haiku every consumer makes this decision independently. The idea seems to have been that this propagates lateness detection through the network, but in practice the effect is that reported latencies grow until they bump into the arbitrary limits hacked in by various developers over the years.
In JACK xrun detection is a problem solely for the driver, at the edge of the graph. So long as the graph executes in its entirety within a buffer period, we’re OK.
[quote]
BeOS uses time sources, which are exactly that.[/quote]
BTimeSources are software: they estimate a relationship between real time tracked by the OS and, for the cases we’re interested in, a frame counter incremented by the Multi Audio Node. Whilst more useful than just relying on the real time reported by the OS directly, they are not the hardware, just a proxy.
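If you want to see what “just a proxy” means, the estimate boils down to a linear mapping, roughly as in this sketch (the method names are the real BTimeSource ones, the members are illustrative):
[code]
// Sketch: the published (performance_time, real_time, drift) state comes
// from software - e.g. the Multi Audio Node bumping a frame counter as it
// writes buffers - so these are estimates of the hardware clock, not the
// hardware clock itself.
bigtime_t PerformanceTimeFor(bigtime_t real_time) {
    return fPerfBase + (bigtime_t)((real_time - fRealBase) * fDrift);
}

bigtime_t RealTimeFor(bigtime_t performance_time, bigtime_t with_latency) {
    return fRealBase + (bigtime_t)((performance_time - fPerfBase) / fDrift)
        - with_latency;
}
[/code]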
Haiku actually ignores the behaviour dictated by the Be Book for these time sources in a bunch of places, but it hardly matters because in practice so did BeOS.
The Multi Audio Node’s consumer does make its lateness decisions based on whether it was able to write buffers in time, as far as I can tell. But other consumers don’t have that option, so most “lateness” reports in Haiku are speculative.
If the node can’t get the work done in 10ms, the final consumer notifies it, and then the node can choose its way of working.
If there’s no algorithmic latency, the same is true on Haiku. You have just the scheduling latency plus some small constant. This is completely unrelated to the discussion.
The client should calculate such latency; we’re in the same situation as a media_node. No difference from the media_kit.
While I understand your reasons, the BeOS latency is not used to schedule any work; stop saying it.
The cost paid is the additional latency that is reported.
You seem to have answered the question yourself. If a new cycle begins and we are still processing the previous period then we have an xrun.
That is what happens in Haiku: if we are at the performance time but have no buffers to play, we have a late producer.
JACK needn’t try to guess. You might think “but it’s not a guess”; alas, you would be wrong: the clock that matters for xruns is the sample clock on the hardware, which may or may not match wall clock time. The hardware driver has access to this clock (we hope) but the userspace application does not directly.
BeOS uses time sources, which are exactly that.
In what sense is JACK “like a media_node itself”?
The media_node has control over things which in the JACK environment only JACK itself has.
In practice all of the supplied nodes and example code have the same “way of working”: they increase their internal latency estimate (up to an arbitrary limit) and press on regardless. Since most “lateness” reports are speculative they often get away with this.
As these internal latencies grow beyond one buffer period (for example, default Media Players alone can easily end up reporting a latency over 140ms, despite a buffer period of 40ms) the system starts to experience buffer bloat. Despite a requirement in the Be Book, Haiku’s BufferGroup code does not block but instead always returns an error if no buffer is yet available, so sooner or later the system will trip over its own feet in this fashion, and indeed various bugs have been reported that amount to this problem.
The additional buffer is requested only in offline mode. If we are not in offline and recording mode, the buffer is ignored, and the producer is notified that it is too late. The latency is not arbitrary but calculated this way:
lateness = (performance_time - latency - real_time) * -1
So it’s not arbitrary at all; it’s just the same as what is done in JACK. I’ve not looked into BBufferGroup, but from when I was developing I remember it blocking when there aren’t enough frames. About the MediaPlayer latency, it’s a bit more complex. The calculation is done this way: media_player_latency + downstream latency. In your case more than 100 ms are caused by the long buffers which Haiku uses by default. If you use the hda_audio patch in your Haiku build, you will reduce it a lot.
If you want to handle algorithmic latency in Haiku you’ll need to abolish (or entirely re-think) the Media Kit.
You may say that it’s doomed, but the latency mechanism of Haiku separates various latencies, and one of them is the processing latency.
The BMediaEventLooper’s ControlLoop() dispatches B_HANDLE_BUFFER events (thus, the actual work of the node) based on the node’s reported “latency” and its own estimate of scheduling latency. So yes, this is used to schedule work.
Well, it’s not the latency used to schedule work but the event_time.
Critically, in Haiku every consumer makes this decision independently.
In JACK xrun detection is a problem solely for the driver, at the edge of the graph. So long as the graph executes in its entirety within a buffer period, we’re OK.
It’s true, but also not. Take as an example a situation where the producer is connected to the sound card: this decision is made by the audio card’s consumer. This isn’t bad but reasonable, because the media_kit was conceived to do different work at different times and with different format needs. Also, this lowers the CPU load, since the OS has more opportunity to balance it correctly.
BTimeSources are software: they estimate a relationship between real time tracked by the OS and, for the cases we’re interested in, a frame counter incremented by the Multi Audio Node. Whilst more useful than just relying on the real time reported by the OS directly, they are not the hardware, just a proxy.
Do you remember the “DAC time source”? It’s different from the system one. If the sound card driver provides it, it will use the card clock.
Haiku actually ignores the behaviour dictated by the Be Book for these time sources in a bunch of places, but it hardly matters because in practice so did BeOS.
Could you file bugs in the bug tracker? At least it would help us understand your reasons against the media_kit, and beyond that someone will probably find a solution. Otherwise, your words are just words. Verba volant, scripta manent.
But other consumers don’t have that option, so most “lateness” reports in Haiku are speculative.
I don’t know where you are looking, but those are not speculative. They are calculated in the way I showed at the beginning of this post. If you found that in a Haiku node, file a bug report.
And in the end, we are talking about BMediaEventLooper, but a node isn’t required to derive from it. So this part of the system could be improved by providing a better-designed class. Since you seem to be such an expert, why not suggest a new design? At least to solve what we can solve; if the idea is worthwhile, someone may end up implementing it.
Two nodes doing the same work at the same time doubles the CPU load, not the processing time.
Do you understand what CPU load is? When you do more work, it takes more time. Perhaps this time does not feel perceptible to you, on human timescales, but the reason the CPU load goes up is that the CPU is spending more time working. Processing time has increased.
[quote]
Estimating latency is tricky, but by no means impossible. Same as before, I won’t go over it all again.
If you want to know more about it, read the source code of actually existing nodes.[/quote]
There are basically two “methods” used by existing nodes. One method is to pick a number out of thin air, say, one millisecond. The node declares its latency will always be one millisecond no matter what. This “estimate” is worthless.
The other one - which we’d already addressed - is to measure once at startup how long it takes to run the function that generates the output; by its nature this cannot take any account of future changes.
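In code, that second “method” is roughly the following sketch. The pattern is the one in the sample nodes; the function names here are made up:
[code]
// Sketch: time one run of the work loop at startup and report that number
// forever. Whatever the machine happened to be doing during this single
// run is baked into the estimate; later load changes are never reflected.
#include <OS.h>

bigtime_t EstimateProcessingLatency() {
    bigtime_t start = system_time();       // real time, microseconds
    FillDummyBuffer();                     // one run of the node's work loop
    return system_time() - start;
}
// Typically combined with downstream latency at connect time:
//   SetEventLatency(fDownstreamLatency + EstimateProcessingLatency());
[/code]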
[quote]
You see, Media Kit nodes exist, and they work.[/quote]
Yes, you’ve said that before. And I even agreed with you: they do exist, they’re just awful.
Haiku gets away with a lot because hardware has got a lot more capable since the days of BeOS. The poorly implemented resampler I mentioned earlier, for example, doesn’t sound as terrible on a PC with 192kHz audio; it’s still bad, but the higher sample rate hides the mistakes better.
Do you understand what CPU load is? When you do more work, it takes more time. Perhaps this time does not feel perceptible to you, […] Processing time has increased.
Before the misunderstandings continue, let’s clear up definitions: when CPU load goes up, the increasing measure is CPU time. CPU time is the time the CPU uses to do actual work as opposed to being in its idle thread. By ‘processing time’ I mean something different: the wall-clock time it takes for a buffer to travel through a node. Whether a node takes 1 second at 20% CPU load or 1 second at 40%: that doesn’t matter, it’s still 1 second. The node could also use 100% CPU for 0.1s and then sleep the remaining 0.9s before sending out the buffer – still 1 second.
There are basically two “methods” used by existing nodes.
There’s a third one used by nodes: look at the buffer duration and make a calculation based on that.
There’s a third one used by nodes: look at the buffer duration and make a calculation based on that.
Good point. The System Mixer uses two different numbers for this calculation to determine its internal “latency”. I had considered these to basically be picking a number out of the air, but you could treat them more kindly if you were feeling generous.
For a FormatChangeRequested it chooses either the buffer duration plus 4.5ms or 1.5 times the buffer duration, whichever is larger.
At Connect it chooses buffer duration plus 3.5ms or 1.5 times the buffer duration plus 1.5ms, whichever is larger.
As with many nodes, if it receives a “late” notice it increases the internal “latency” by the reported lateness, until it reaches an arbitrary limit (here 150ms), after which further late notices are ignored.
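Spelled out, the arithmetic above is something like this sketch, in bigtime_t microseconds (the constants are as I described them, the function names are mine, max_c() is the real Haiku macro):
[code]
// Sketch of the System Mixer's internal "latency" choices.
#include <SupportDefs.h>

bigtime_t LatencyAtFormatChange(bigtime_t bufferDuration) {
    return max_c(bufferDuration + 4500, (bufferDuration * 3) / 2);
}

bigtime_t LatencyAtConnect(bigtime_t bufferDuration) {
    return max_c(bufferDuration + 3500, (bufferDuration * 3) / 2 + 1500);
}
[/code]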
These choices mean that under normal circumstances after a particular buffer B has been created by the System Mixer, but before that buffer is handled by the Multi Audio Node, the System Mixer will calculate a further buffer B+1, and queue that too, so there’s always an extra buffer “in the air”. I have not found any commentary to explain this decision, if indeed there ever was a conscious decision to have it work this way.
Whether a node takes 1 second at 20% CPU load or 1 second at 40%: that doesn’t matter, it’s still 1 second. The node could also use the 100% CPU for 0.1s and then sleep the remaining 0.9s before sending out the buffer – still 1 second.
On the whole actual nodes don’t block (sleep while working) like your examples. They’re running at far higher priority than almost everything else (most at B_URGENT_PRIORITY, with some at B_REAL_TIME_PRIORITY), so they can’t lose the lottery against less prioritised threads the way an ordinary B_NORMAL_PRIORITY thread occasionally can, and their memory is (supposed to be) wired so they don’t wait for paging.
Any loads you may see reported in the user interface are averages over a considerable time, usually a second or more. So they’re taking into account long (relatively speaking) periods when everything is asleep, not because a node blocked while working on a buffer, but because there was no work to be done.
So, when the node is actually doing any work, it always does that at 100% “CPU load”. If two nodes need to do some work, they each do their work at 100% “CPU load”, but they have to take turns and so it takes longer in wall clock time. Is this clearer for you now? The appearance of everything running simultaneously in a modern desktop OS is an illusion, like the appearance of continuous motion on a movie screen achieved by projecting a series of still images.
The additional buffer is requested only in offline mode.
Yes, but buffers are created spontaneously as the event time triggers. Offline mode is… vestigial at best at this point. Most stuff does not work properly in offline mode; ignore it.
[quote]If we are not in offline and recording mode, the buffer is ignored, and the producer is notified that it is too late. The latency is not arbitrary but calculated this way:
lateness = (performance_time - latency - real_time) * -1[/quote]
Did you mean to say lateness rather than latency, here? I can’t tell what you think you’re telling me.
So it’s not arbitrary at all; it’s just the same as what is done in JACK.
I said the limit is arbitrary. It was imposed by Axel back in 2010 and is set to 150 milliseconds; no further rationale was ever provided or asked for. It’s an arbitrary choice.
About the MediaPlayer latency, it’s a bit more complex. The calculation is done this way: media_player_latency + downstream latency. In your case more than 100 ms are caused by the long buffers which Haiku uses by default. If you use the hda_audio patch in your Haiku build, you will reduce it a lot.
The reported latency starts significantly lower and grows. The buffer period is, as I already said, about 40ms, and yes, this leaves more than one buffer “in flight”. The code actually acknowledges this: it ensures there are extra buffers allocated to allow for it, hence buffer bloat.
[quote]
You may say that it’s doomed, but the latency mechanism of Haiku separates various latencies, and one of them is the processing latency.[/quote]
I thought I’d explained what algorithmic latency is. It’s just a little further back up the page; maybe go back and read it again.
Well, it’s not the latency used to schedule work but the event_time.
The event time just cranks forward monotonically here. The RealTimeFor() function basically converts the event time to a real time and subtracts the declared “latency”, and that determines how long the BMediaEventLooper will sleep. Thus, Media Kit latency controls when the work is actually scheduled to be done: increase the “latency” and the work will be scheduled sooner. It’s a self-fulfilling prophecy.
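As a sketch (RealTimeFor() is the real BTimeSource call, the member names stand in for BMediaEventLooper’s internals):
[code]
// Sketch: when the event loop will wake to handle an event. RealTimeFor()
// converts a performance time to real time, subtracting the given latency,
// so a larger reported fLatency directly makes the work run earlier.
bigtime_t WakeTimeFor(bigtime_t event_time) {
    return TimeSource()->RealTimeFor(event_time,
        fLatency + fSchedulingLatency);
}
[/code]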
[quote]
Do you remember the “DAC time source”? It’s different from the system one. If the sound card driver provides it, it will use the card clock.[/quote]
That’s actually the example I was looking at while writing. The card clock is not directly exposed as “DAC time source”, instead, exactly as I wrote, the Multi Audio Node updates a frame counter which acts as the “clock” for this time source. It’s a subtle difference, but it’s worth knowing about.
[quote]
Could you file bugs in the bug tracker? At least it would help us understand your reasons against the media_kit, and beyond that someone will probably find a solution. Otherwise, your words are just words. Verba volant, scripta manent.[/quote]
There are plenty of bugs already filed about the Media Kit. Maybe somebody will fix some more of them this year. I am not speaking at all; any voices you’re hearing are in your head. This is just text, as a bug report would be text.
I don’t know where you are looking, but those are not speculative. They are calculated in the way I showed at the beginning of this post.
Nothing was actually late in a sense that would matter to the user, but the producer is sent a late notification anyway. That’s why I call it speculative. Remember that all the extant nodes respond to this by increasing their reported latency (so that, as we saw above, they will run “earlier”).
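The response looks like this in essentially every extant node; the hook signature is the real BBufferProducer one, the body is the common pattern with made-up member names:
[code]
// Sketch: grow the reported latency by the reported lateness, so future
// events get scheduled earlier, and clamp at an arbitrary ceiling.
void
MyProducer::LateNoticeReceived(const media_source& what, bigtime_t howMuch,
    bigtime_t performanceTime)
{
    if (what != fOutput.source)
        return;
    fInternalLatency += howMuch;
    if (fInternalLatency > 150000)         // 150ms, as in the System Mixer
        fInternalLatency = 150000;
    SetEventLatency(fDownstreamLatency + fInternalLatency);
}
[/code]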
Realtime typically isn’t more than 44.1kHz though – extra time for calculations on a processor several tens of thousands of times faster doesn’t come to milliseconds. It’d help to speak in terms of real sample rates (in place of “realtime”) vs real CPU cycles. Otherwise, no matter how adamantly one might appeal to reality, we’re not discussing real use and every number brought up is completely baseless.
Is this clearer for you now?
I understand scheduling and load tracking well, no worries. Your description there isn’t really correct though, at least not for Haiku (hint: a B_URGENT_PRIORITY thread still plays “the lottery” – lower-priority threads still get time-slices as well, no matter how much CPU the urgent one demands).
Anyway, I didn’t even want to get lured into this discussion again, I will stop now.
I understand scheduling and load tracking well, no worries.
Oh?
Your description there isn’t really correct though, at least not for Haiku (hint: a B_URGENT_PRIORITY thread still plays “the lottery” – lower-priority threads still get time-slices as well, no matter how much CPU the urgent one demands).
[quote=“The Be Book”]
Real-time (100 and greater). A real-time thread is executed as soon as it’s ready. If more than one real-time thread is ready at the same time, the thread with the highest priority is executed first. The thread is allowed to run without being preempted (except by a real-time thread with a higher priority) until it blocks, snoozes, is suspended, or otherwise gives up its plea for attention.[/quote]
Both the old “simple” scheduler and Pawel’s scheduler largely‡ honour this expectation, both according to their authors and my review of the code. In Pawel’s scheduler threads at B_URGENT_PRIORITY or other real time priorities are exempt from busy/greediness penalties used to punish normal threads. To be quite clear: lower-priority threads get no time-slices unless/until the urgent thread blocks. The exact opposite of what you wrote.
‡ So far as I can see Pawel doesn’t strictly honour the case where multiple threads are all runnable at say, B_URGENT_PRIORITY. Be’s intention is that this will be treated like POSIX SCHED_FIFO rather than SCHED_RR but Pawel seems to have implemented SCHED_RR. It’s a small nitpick under the circumstances.
[quote=NoHaikuForMe]
Both the old “simple” scheduler and Pawel’s scheduler largely‡ honour this expectation[/quote]
While I don’t have time to look at the code in detail now, it seems you are right on this one. I somehow remembered it only applied to B_REAL_TIME_PRIORITY itself, not anything above 100 already.
Anyway, you might want to reread what I said about processing time.
As I said, I won’t discuss things further in here.
Did you mean to say lateness rather than latency, here? I can’t tell what you think you’re telling me.
Suppose we receive an event with a performance time of 50 ms, but real time is at 60 ms, and suppose our latency is 1 ms: 50 - 1 - 60 = -11, then multiply this value by -1 to get a lateness of 11 ms.
That’s actually the example I was looking at while writing. The card clock is not directly exposed as “DAC time source”, instead, exactly as I wrote, the Multi Audio Node updates a frame counter which acts as the “clock” for this time source. It’s a subtle difference, but it’s worth knowing about.
I don’t see how this is a design flaw; it seems natural to have a hardware abstraction.
[quote]
Could you file bugs in the bug tracker? At least it would help us understand your reasons against the media_kit, and beyond that someone will probably find a solution. Otherwise, your words are just words. Verba volant, scripta manent.[/quote]
There are plenty of bugs already filed about the Media Kit. Maybe somebody will fix some more of them this year. I am not speaking at all; any voices you’re hearing are in your head. This is just text, as a bug report would be text.
The topic is becoming annoying, and you are more or less talking with a hobbyist. If you are so sure about what you say, I think you should talk with the core developers. And yes, there are various tickets for the media_kit, but none related to the problems we discussed. That said, I’ll stop talking with you.