Finally, new DAW running n v. 4

learjeff · March 15, 2006, 7:51pm

Quote (Nils K @ Mar. 14 2006,02:11)

Quote (learjeff @ Mar. 13 2006,10:07)

Plugin latency would have nothing to do with 16-bit vs. 24-bit soundcard.

Overall audio latency depends on soundcard drivers, but that's a different thing. We're only talking about plugin latency here: the delay the plugin adds to the signal, and n-Track's ability to compensate for that.

How and why does n-Track version 3.x differ from version 4.x in that respect?

All I state is that I notice an audible difference in the latter not present in the former.

regards, Nils

I have no idea. I made a narrow point and I stand behind it. Perhaps I should duck behind it! ;)

The point being simply that this has nothing to do with 16-bit vs. 24-bit.

learjeff · March 15, 2006, 8:12pm

Quote (woxnerw @ Mar. 14 2006,18:42)

Is that why the "Rise-Time" of the "Spike" is so important?

The Spike .. being the Impulse Response Waveform?

Bill..

Not exactly.

Here's the theory, but be warned that I have a limited understanding of the actual math. (I have Hamming's book, the classical text, but I get lost in the first chapter, which covers the necessary underlying math. Makes me wish I'd skipped fewer of those Differential Equations lectures ... I'm still astonished I got a C in that class! I walked out of the final exam certain that I'd flunked it since I didn't have a clue what I was doing and just imitated some of the stuff I'd seen on the chalkboard. I was embarrassed to turn it in, but I PASSED!)

BTW, I'll get a little of this wrong but will update it later with text in hand.

Principal fact: Any infinite impulse response (IIR) filter can be completely characterized by its response to an impulse input.

What's this mean?

I won't go into what IIR really means, but most things that act like filters in nature and analog circuitry are IIR filters. This includes natural reverbs, audio filters, and a host of other things.

"Completely characterized" means we know all there is to know about how it will respond to a given input, so we can (in theory) duplicate it exactly.

OK, then, what's an "impulse input?" It's when the input signal starts at the maximum input value and immediately drops to zero. Well, this is where reality creeps in, and we find that despite the truth of the fact above, we can't REALLY completely characterize anything in the physical world, mostly because it's impossible to go from any value to zero instantaneously.

But we can get close enough.

BTW, what we do ideally is feed this (impossible) impulse into the thing we want to imitate and record its response. Then we wave a mathemagical wand over the recorded response to produce something called a "filter kernel" or "convolution kernel". Later, in our emulation, we plug the kernel into a generic "convolution engine", and bingo, we've built a model for what we measured. Feed it an input signal and (within the limits of our model framework, like bit rate) out comes the same thing that the real thing would do. Which is why SIR sounds so dang good.
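To make the "convolution engine" idea concrete, here is a minimal Python sketch. It assumes the simplest case, where the recorded response is used directly as the kernel; the kernel values are made up for illustration, and a real engine would use a much faster FFT-based method rather than this direct loop.

```python
def convolve(signal, kernel):
    """Direct-form convolution: every input sample launches a scaled
    copy of the kernel, and the overlapping copies are summed."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, x in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += x * k
    return out

# Hypothetical "measured" response: direct sound plus two decaying echoes.
kernel = [1.0, 0.0, 0.5, 0.0, 0.25]

# A single-sample impulse simply returns the kernel itself.
impulse_out = convolve([1.0], kernel)

# Any other input has the same "room" applied to each of its samples.
signal_out = convolve([1.0, -1.0], kernel)
```

Here `impulse_out` equals the kernel, which is the sense in which the impulse response characterizes the whole system.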

Now, how do we feed an impulse into a real system? Well, we can't. Two impossible parts: (1) getting it to start at the maximum value, and (2) dropping it to zero instantaneously.

Obviously, we can't do either of these. But what we can do is fire a gun or something that makes a nice sharp BANG spike. Ideally, it goes from zero up to a maximum and then comes right back down, without squirrelling around on the way down. I'm not sure how we compensate for the upward part, but I bet there's mathemagical wizardry to minimize that. Still, it's important for the sound to go up and back down as cleanly and quickly as possible (if recorded in an anechoic chamber). We need it to be as close to an impulse as possible. Otherwise, the underlying math that this is all based on comes crumbling down like a house of cards on a windy day.

Someone should post this to a wiki somewhere!

BTW, I don't know whether the files called "SIR impulse files" are recordings of the responses or are actually convolution kernels. It doesn't really matter because one is derived from the other, but with the waveform you can choose how long a kernel to calculate. The longer the kernel, the better the results but the higher the latency.

Cheers
Jeff

jimbob · March 16, 2006, 2:48am

I’m feeling particularly geeky today and will try to add to what Jeff said about how a processor like SIR works and will make a comment about how this relates to latency at the end.

It is my impression that the convolution processor is an FIR filter (Finite Impulse Response) and that it is programmed directly with the impulse response, windowed to fit within the filter length. (Note that the impulse response is not just the time response, for the reasons Jeff described, but it can be derived from it.)

An FIR filter can be viewed as a tapped delay line where each tap goes to a multiplier with a stored coefficient, the results of all the tap*coefficient products go to an adder, and the sum is the output.

If you imagine stimulating a room with an ideal impulse that consists of a single sample with the value of 1 (full-scale in this case), an (ideal) microphone in an ideal anechoic room would pick up that one sample at some fixed delay due to the distance from the impulse source. No other samples would be picked up because all the reflections are absorbed. To reproduce this we could use a tapped delay line and set the coefficient corresponding to the delay to 1 and all others to zero. If the source dropped a little due to the distance, the coefficient would be lower. If we then feed an electrical impulse of one sample to the delay line, we will get a response identical to the measured response (with the same delay).
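The tapped delay line and the anechoic-room case can be sketched in a few lines of Python. The class name, the three-tap length, and the 0.8 coefficient are all made up for illustration; this is not how any particular plugin is written.

```python
from collections import deque

class TappedDelayLine:
    """FIR filter as a delay line: one coefficient per tap, and the
    output is the sum of every tap multiplied by its coefficient."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        size = len(self.coefficients)
        # The line starts silent; index 0 always holds the newest sample.
        self.line = deque([0.0] * size, maxlen=size)

    def process(self, sample):
        # Pushing a new sample in drops the oldest one off the far end.
        self.line.appendleft(sample)
        return sum(c * x for c, x in zip(self.coefficients, self.line))

# Anechoic room: only the tap at a 2-sample delay is non-zero, and it is
# a little below 1 to stand in for the loss over distance.
room = TappedDelayLine([0.0, 0.0, 0.8])
outputs = [room.process(x) for x in [1.0, 0.0, 0.0, 0.0, 0.0]]
```

A single input sample comes out once, two samples late and slightly attenuated, which matches the measured response described above.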

If we are in an anechoic room with one reflecting surface introduced, instead of the microphone picking up only one sample, we will get the original sample plus one echo that is slightly more delayed than the direct “sound” was (I know you can’t get a single sample to propagate through a real medium, but hang with me) and with some additional loss due to absorption in the boundary as well as the increased “inverse square” loss caused by the greater distance that the echo signal has to travel compared to the direct signal (also responsible for the greater delay).

By setting the delay line tap coefficient (for that amount of delay) to the magnitude (and sign) of the echo you will then get two samples to show up when you feed a single sample into the input. One at the original delay at the original magnitude of the direct signal, the other with the delay and attenuation that was measured for the echo. Since all the other coefficients in our example are zero and the delay for each tap is different each response will be distinct.

If instead of one impulse, we feed two impulses into the input of the delay line that were spaced exactly the same as the difference between the direct sound and the echo we would get three pulses out. The first would be the output (due to the first impulse) corresponding to the initial delay and would be the measured magnitude (since it is the only output at this time). At the time when that first impulse hits the second active tap however the second impulse is just hitting the first tap so the output will correspond to the sum of the first impulse times the echo coefficient and the second impulse times the initial delay coefficient. When the second impulse reaches the echo tap it will be reproduced at the measured magnitude (for the echo) since there is no longer any sample at the location of the initial delay.
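The two-impulse case is easy to check numerically. In this sketch the delays (2 and 5 samples) and the coefficients (0.75 for the direct sound, 0.5 for the echo) are arbitrary illustration values.

```python
def fir(signal, taps):
    """taps maps delay-in-samples -> coefficient; every other tap is zero."""
    out = [0.0] * (len(signal) + max(taps))
    for n, x in enumerate(signal):
        for delay, coeff in taps.items():
            out[n + delay] += x * coeff
    return out

# Direct sound at a delay of 2 samples, echo 3 samples after it.
taps = {2: 0.75, 5: 0.5}

# Two impulses spaced 3 samples apart, the same as the echo spacing.
two_impulses = [1.0, 0.0, 0.0, 1.0]

out = fir(two_impulses, taps)
pulses = {n: y for n, y in enumerate(out) if y != 0.0}
# pulses == {2: 0.75, 5: 1.25, 8: 0.5}
```

Three pulses come out: the first impulse's direct sound, then the sum of the first impulse's echo and the second impulse's direct sound (0.5 + 0.75), then the second impulse's echo alone.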

Getting closer to reality, we know that real signals have a continuous train of samples of various magnitudes happening all the time, so the output of our delay line will contain the sum of many samples multiplied by their coefficients. We also know that real environments are much more complex and there will be lots more echoes.

What is not immediately obvious is that this approach does in fact describe the frequency response as well as the time response. This could be understood by the fact that anything which modifies the time response will modify the spectrum of the signal when it is analyzed (by converting the time domain signal to the frequency domain). It should also be understood that since the time and frequency domains are simply different ways of describing the same signal, any change in the time domain will affect the frequency response and vice versa. A useful rule of thumb is that anything that is narrow in the time domain (think spike) is wide in the frequency domain (think white noise) and anything that is narrow in the frequency domain (think sine wave) is wide in the time domain (continuous over the duration of the analysis).
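The narrow-versus-wide rule of thumb can be seen with a small discrete Fourier transform. This is a rough sketch using a deliberately naive DFT and a 16-sample analysis length chosen only to keep the numbers small.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(n^2) discrete Fourier transform, magnitudes only."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

N = 16
spike = [1.0] + [0.0] * (N - 1)                           # narrow in time
sine = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]  # wide in time

spike_spectrum = dft_magnitudes(spike)  # flat: the same level in every bin
sine_spectrum = dft_magnitudes(sine)    # energy only in bin 2 and its mirror

occupied_bins = [k for k, m in enumerate(sine_spectrum) if m > 1e-6]
```

The spike fills every frequency bin equally, while the sine wave, which lasts the whole analysis window, occupies just one bin and its mirror image.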

There are practical limitations of these things that are significant. Part of the problem is dealing with the finite length of a filter in practical systems. For an FIR filter to have a truly “brick-wall” cut-off, the filter must have an infinite number of taps. Worse than that, if a device were to have a “perfect” response it would have to begin having an output before it actually receives the input. This is a major difficulty but since perfection is not necessary we can get around that.

To reduce the aberrations induced by filters that are not infinite in length the coefficients (the scaled versions of the impulse response) are windowed (think of it as fading in and out at the beginning and end) to avoid discontinuities. The longer the delay line is, the more accurate it can be but the intrinsic delay is only the delay to the first active tap (with a sufficiently large coefficient) plus the time needed for the calculations. The more taps you have, the more multiplications and additions you have. Since you need all of the calculation of all the taps to be done within one sample interval for real-time operation the calculations themselves cannot add latency (but they can make your system choke if it is not fast enough).
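As a rough illustration of windowing, the sketch below truncates a long response to a fixed number of taps and fades both ends with a Hann window. The decaying response, the 9-tap length, and the choice of a Hann window are all assumptions for the example; a real reverb would more likely fade only the tail so the direct sound is not attenuated.

```python
import math

def hann(n):
    """Hann window: rises smoothly from 0 to 1 and falls back to 0."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def window_kernel(response, length):
    """Truncate a response to `length` taps and fade both ends."""
    truncated = response[:length]
    return [w * h for w, h in zip(hann(len(truncated)), truncated)]

# Hypothetical measured response: a simple exponential decay.
response = [0.9 ** i for i in range(64)]

kernel = window_kernel(response, 9)
# Each output sample needs one multiply and one add per tap.
multiplies_per_sample = len(kernel)
```

The first and last coefficients come out at zero, so the truncated kernel no longer starts or stops with a jump, and the per-sample workload grows in direct proportion to the number of taps.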

While I have not used the plug-in I think it is possible that the source of the latency in SIR is the direct or “dry” signal portion of the impulse response. If used as a totally “wet” reverb (in an aux channel for instance) latency (to the first echo) would have to be huge to be an issue. The operating principle of reverb IS delay.

The direct signal would initially seem to be simple, just feed it through to the output. This would work fine if you did not want the processor to modify the direct frequency response of the signal the way it occurred when the impulse response was measured. You cannot simply truncate everything before the first peak of the impulse response. You need to have a more gradual rise. To simulate having an output that begins before you have an input you can deliberately introduce delay and program the coefficients to rise more gradually at the beginning of the impulse. This means that the signal will spend some time in the delay line before the impulse response peaks to maximum output for the direct signal. This is likely to be the source of delay and may depend on the impulse response being modelled.

I suspect that if SIR was only used for “wet” reverb it would be possible to remove that portion of the impulse related to the direct sound and eliminate the excess delay.

If you managed to read through this all, go get yourself a cookie. If you understood it, take yourself to dinner.

(More than) Enough for now,
Jim

Sloom · March 16, 2006, 3:06am

I read half of the first paragraph!

woxnerw · March 16, 2006, 12:44pm

What a nice detailed report on this topic… You’re right sloom, if I re-read that post a few more times I still wouldn’t grasp the ideas that jimbob has presented there… This report should be in the “wiki” so that a “Search” would bring IT up if someone were to want to read up on how SIR as a VST Effect responds when applied to a .wav file on the n-Track timeline…

jimbob’s contribution to this Board along with several other “Posters” wouldn’t be “What-It-Is”… without them…

Bill…

vanclan · March 16, 2006, 2:41pm

I agree with Sloom.

Sloom · March 16, 2006, 10:20pm

You see, this is why I’m a bass player. Most of the time, I only have to count up to, or sub-divide within, “4”. That’s as techie as it gets here, folks!

But we’re glad you’re around, Jimbob (Somebody has to do it)!

learjeff · March 16, 2006, 11:13pm

Jimbob, I’m pretty sure it’s an IIR, not FIR, because it’s recursive. (The output of the filter is fed into the input next timeslice.) If we had infinite bit width computers, the response to an impulse would be infinite. Of course, you’d have to record an infinitely long response file to calculate the kernel, but the kernel can be finite and the impulse infinite. Not that any of this matters.

An FIR is like a tapped delay, just as you say. The output does not affect the input; it’s nonrecursive.
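The recursive versus nonrecursive distinction shows up clearly in the impulse responses. A minimal sketch, assuming a three-tap FIR and a one-pole feedback filter with made-up coefficients:

```python
def fir_response(coeffs, n_samples):
    """Nonrecursive: the output depends only on current and past inputs."""
    impulse = [1.0] + [0.0] * (n_samples - 1)
    return [
        sum(c * impulse[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(n_samples)
    ]

def iir_response(feedback, n_samples):
    """Recursive: the previous output is fed back into the next one."""
    out = []
    previous = 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y = x + feedback * previous
        out.append(y)
        previous = y
    return out

fir_out = fir_response([1.0, 0.5, 0.25], 8)  # stops after three samples
iir_out = iir_response(0.5, 8)               # keeps halving, never exactly zero
```

The FIR output is exactly zero once the impulse has passed the last tap, while the recursive filter's output keeps decaying for as long as the arithmetic can resolve it.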

As it turns out, SIR has a fixed latency of 8960 samples. Theoretically speaking, the latency of a convolution engine is equal to the length of the filter kernel, as I said above. Each element in the filter kernel essentially governs how much of the nth-earlier sample to add to the current output. So it can’t possibly emit any output until it’s received as many samples as there are elements in the kernel.

Evidently SIR uses a fixed 8960-term filter kernel. Well, that’s my theory!

Also, from SIR’s web page, I find that the “response files” are actually just that: impulse response recordings, rather than computed kernels. SIR must compute the kernel when you load the response file. That’s way cool, because it means you can fiddle with the response wave file to generate whatever kind of response you want from SIR. Way cool.

Regards,
Jeff

jimbob · March 17, 2006, 12:07am

That is certainly possible but not the easiest way to implement an arbitrary impulse response. An FIR can implement the (windowed) impulse response directly and probably has lower latency as well as having no stability issues (whenever you feed the output back to the input you have to be careful). Most (if not all) echo cancellers use FIR filters. An echo-canceller (used in telephony primarily) models the “reverb” (undesired in this case) then subtracts it from the signal (containing echo) so that the echoes are cancelled and the signal becomes “dry”. (The opposite of what we typically do in mixing but a potentially useful concept even in our application, imagine de-reverb-ing a track recorded in a bad room or subtracting bleed between channels in a recording.)

An IIR is normally used to reduce the complexity of the filter components by “re-using” the registers and multipliers to allow an impulse response longer than the number of registers. In exchange for that it places certain constraints on the performance of the filter.

For a reverb, recirculation would potentially work well for secondary reflections, but I would assume that the last first-order reflection would have to fit within the kernel length. This would suggest that each “single-bounce” reflection would have to fit within the first 200 ms (8960 samples at roughly 22.7 µs/sample) in order to contribute to the impulse response (at 44.1 kHz). The contributions of multiple-bounce reflections would seem to me to also only be counted if they fell within the same interval (which would get shorter at higher sampling rates for a fixed kernel size). For small rooms it could work well, but how would it handle that slap echo off the scoreboard in a stadium that shows up for the first time 300 ms after the sound is generated?
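The arithmetic behind the 200 ms figure is quick to check. The 8960-sample figure comes from this thread; the 96 kHz rate is only an extra example to show the interval shrinking at higher sample rates.

```python
KERNEL_SAMPLES = 8960  # fixed latency figure quoted earlier in the thread

def kernel_duration_ms(samples, sample_rate_hz):
    """Time span covered by a given number of samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz

at_44k = kernel_duration_ms(KERNEL_SAMPLES, 44100)  # about 203 ms
at_96k = kernel_duration_ms(KERNEL_SAMPLES, 96000)  # about 93 ms

# Would a slap echo arriving 300 ms late fit inside that span at 44.1 kHz?
fits_300ms_echo = 300.0 <= at_44k
```

At 44.1 kHz the span is just over 200 ms, so under the assumptions in this post a first reflection arriving at 300 ms would fall outside it.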

I’m not a math guy but it seems to be the hard way to do it (If I was a math guy it might seem trivial).

Jim

learjeff · March 17, 2006, 12:21am

I was mistaken: SIR is indeed an FIR filter.

Convolutions can do very nonintuitive things, including returning the nth-order derivative or definite integrals. I used one once (long ago) to determine the time at which a solenoid triggered, while watching the voltage on the circuit and detecting where the 2nd derivative went negative (or something like that). Fortunately, I had the help of a better educated soul, who calculated the filter kernel for me.
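As a rough illustration of a derivative by convolution (not the kernel used in the solenoid project, which isn't given here), the three-tap kernel `[1, -2, 1]` computes a discrete second difference. The voltage samples below are invented.

```python
def convolve_valid(signal, kernel):
    """Convolution over the positions where the kernel fully overlaps."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(flipped[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

SECOND_DIFFERENCE = [1, -2, 1]  # discrete stand-in for a 2nd derivative

# Made-up voltage trace: accelerating upward, then levelling off.
voltage = [0, 1, 4, 9, 16, 21, 24, 25]

curvature = convolve_valid(voltage, SECOND_DIFFERENCE)
# Output index i is centred on input sample i + 1.
first_negative = next(i + 1 for i, c in enumerate(curvature) if c < 0)
```

The curvature is positive while the trace accelerates and turns negative at sample 4, which is the kind of event the detection described above would be looking for.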

woxnerw · March 17, 2006, 12:54am

Hi sloom:
IT was a few years ago now… But on another Audio Board… a poll was taken and a breakdown was posted on what instrument the majority of DAW/Workstation owners replied to the poll…

IT turned out that Bass Players topped the list with Drummers a close second. Followed by other instrument players… With guitar players being somewhat down the list.

I can’t remember the breakdown, but there were a lot of replies to the topic…

It seemed as though there was something about bass players’ and drummers’ mental make-up that gets them into spending money on recording equipment…

Bill…

jimbob · March 17, 2006, 2:09am

I’m a harmonica player. I should be the most portable member of the band but I tend toward perfectionism and know more than is good for me. As a result I assembled a really high-quality small-venue PA and while not all that big, it is a pain to haul. I got into multi-track recording because my board was digital and it was a relatively small incremental cost to add the interface card (Frontier Designs Dakota) to my PC. My PA speakers are studio-monitor quality (Tannoy T12s) so they serve both functions.

Things have ballooned a bit since then and now I record and produce for a couple of different bands as well as for a few friends. So far no one with drums, but eventually I will have to learn how to deal with that in the “studio” (as opposed to the live environment, which I have some experience with).

Jim