Binaural audio

mono to binaural


Anyone have any experience “converting” a mono signal into a binaural one?
I’m specifically thinking about a possible use for binaural audio in improving the intelligibility of speech in a noisy environment. I’m not interested in the fancy “surround sound” thing, although similar techniques may be applicable.
So far, my internet searches have turned up many pages of what looks like alphabet soup mixed with pasta shapes, written by professor somebody at a place called .edu

I thought I’d get more sense here, where there’s less pasta and more Praktika. :)


I’m not real sure what you mean by binaural Steve.

Many, many years ago, I fixed a radio for a friend of mine that claimed to have binaural sound.

When I opened it up I found a 2 inch speaker with two toilet roll tubes attached to it. One was stuck to the front of the speaker, and one to the back. The other ends of the tubes were near two grills on the radio’s case. So all it did was produce two mono outputs 180 degrees out of phase with each other.

I’m not sure how it was supposed to sound, but to me it sounded like a 2 inch speaker with 2 toilet rolls glued to it.

The other meaning of binaural I know about is when a dummy head or similar configuration is used in recording. It was quite popular in the 70s to early 80s, but I haven’t heard anything about it for a long while.

It consisted of a stereo pair of mics, set up so that it mimicked the human hearing system. It was quite effective when listened to through headphones.

But you must be talking about another meaning of binaural, so perhaps you could explain more?


I tried to edit my previous post, but as usual, it wouldn’t let me.

This is one link that explains one of the meanings of binaural (but not the toilet roll one, you’ll have to search for that yourself :D)

Edit: This time I can edit!

but hey, I’ve just remembered…

Are you talking about the Qsound processing?

I have those QSound plugins, but I’ve never used them. But I seem to recall that they claim to produce a pseudo stereo output from a mono input.

I’ll see if they’re still installed and have a play with them.

Can’t edit again. This is getting old.

This is from the Qtools help file:



The QSound processes provided in this tool set allow the placement of sound sources well beyond the bounds of conventional stereo. There are three fundamentally different ways to take advantage of this capability.

First, it is possible to process a true stereo signal in such a way as to expand the perceived width of the sound stage. This is the purpose of QXpander/AX (QX/AX).

Second, it is also possible to take a single mono stream (most typically a single sound but also perhaps a mix of sounds) and ‘place’ this sound at a precise location. In this application, the QSound process can be thought of as a ‘super panner’. Like the pan control or ‘pan pot’ on a stereo mixing board, the QSound process allows placement of the sound location at the discretion of the operator. This is the purpose of QSys/AX.

Third, it is possible to process a mono signal in such a way as to create a pseudo stereo or pseudo 3D stereo signal. This is the purpose of Q123/AX. (One-to-three, meaning mono-to-3D.)

It looks like the third one, Q123/AX is the one you want Steve.

If you can’t find a trial version online, PM me. I have a fairly old (1998) trial version.


Thanks for that Gizmo. That Qsound stuff looks (and sounds) interesting and I actually have a similar thing on my built-in pc soundcard. However, that’s not quite what I’m after.

What I’m trying to do may well be impossible, but that doesn’t mean I won’t have a go!

I’m trying to separate human speech from noise (white noise) spatially by “spreading out” the whole thing like panning in a mix.

The noise, being random, should be evenly spread across the left-to-right soundstage, but the voice should occupy a discrete location somewhere in the middle, thereby giving it a 3 dB advantage.

I think this should be possible by cloning a mono track, inverting one copy, and then applying a phase shift to it. The problem is that the phase shift would need to be corrected for frequency in order to get the time delay right (so that the brain thinks the speech occupies just one point in space).
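For what it’s worth, a phase shift that is “corrected for frequency” so the time delay comes out right at every frequency is exactly a constant time delay, which is easy to sketch in the frequency domain. This is only an illustration; the sample rate, delay, and test tone are all invented for the example:

```python
# Sketch: apply a frequency-proportional phase shift (i.e. a pure time
# delay) to one copy of a mono signal, creating an interaural time
# difference so the brain hears the source at one point in space.
import numpy as np

def delay_channel(mono, delay_s, fs):
    """Delay a mono signal by delay_s seconds via an FFT phase ramp."""
    n = len(mono)
    spectrum = np.fft.rfft(mono)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # a phase shift of -2*pi*f*delay at every frequency f is exactly
    # "phase corrected for frequency so the time delay comes out right"
    spectrum *= np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(spectrum, n)

fs = 8000
t = np.arange(fs) / fs
mono = np.sin(2 * np.pi * 440 * t)        # stand-in for the comms audio
left = mono
right = delay_channel(mono, 0.0005, fs)   # ~0.5 ms ITD shifts the image
stereo = np.stack([left, right], axis=1)
```

Note the caveat raised later in the thread still applies: this places the *whole* mono signal (voice and noise together) at one point; it doesn’t separate them.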

I suppose I want to pan the noise hard left/right and leave the voice in the middle!

Oh, and I want to do it in real time, in hardware.


Beefy, I’m trying to understand what you are thinking about…

If a monaural source is split into two identical audio streams, one is phase shifted 180 degrees, and the pair is played as two stereo channels, then because of phase cancellation a listener who is properly positioned in the sound field would hear reduced noise. If a frequency-dependent phase shift could be applied, then it might be possible to apply a different shift to each channel’s frequency components so that the speech frequencies would not cancel out and would appear to be localized at a single point in the soundfield.

Is that what you are thinking about? If so, it would be a tough go technically.

I don’t know a lot about digital signal processing theory, but I do know that as far as white noise removal goes, it is theoretically tough to beat a simple moving-average filter. If you are talking about ambient noise, the moving-average filter will suck, but it will be tough to beat a good multiband filter with lots of bands, especially if you want to deal with the noise in real time. If you really meant real time, that also implies that you might be able to attack the problem at its source… is there no way to isolate the voice better as it is being recorded?
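The simple moving-average filter mentioned above can be sketched in a few lines; the window length and sample values here are arbitrary:

```python
# Minimal moving-average filter: smooths white noise at the cost of
# blurring the wanted signal (which is why it suits stationary noise
# better than ambient noise).
import numpy as np

def moving_average(x, window=5):
    kernel = np.ones(window) / window          # equal weights, sum = 1
    return np.convolve(x, kernel, mode="same") # same length as input

signal = np.array([1.0, 1.0, 1.0, 1.0, 1.0])  # constant "wanted" signal
noise = np.array([0.5, -0.5, 0.2, -0.2, 0.0]) # zero-mean disturbance
smoothed = moving_average(signal + noise, window=5)
```

Averaging N samples cuts uncorrelated noise power by a factor of N, which is the sense in which it is “theoretically tough to beat” for white noise.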



The noise, being random should be evenly spaced across the left-to-right soundstage,

I don’t think so Steve. If the source is mono, then any particular noise spike is going to be centrally positioned if it’s played from two speakers, just like every other signal on the mono source.

And I think that’s what T is saying too, (but he’s saying it better than me :D).

I think there are algorithms for helping to isolate an ‘intelligent’ signal from random noise. How they work, I really don’t know, but I suspect it’s a lot more complicated than just summing two out-of-phase signals.

Anyway, good luck, and let us know the results. :)

Still can’t edit my own posts. :(

Anyway, to clarify…

If the noise is produced after the sound is split, then sure, any particular noise spike will be coming out of one speaker or the other…

But if the noise is on the original mono source, then it will appear on both speakers at the same amplitude, just like the wanted voice, sorry.

But keep on working at it, there’s a fortune to be made for the guy who can successfully eliminate noise. :D


Yep, T is saying it better than me too!

Imagine listening to a stereo recording of an orchestra. If the engineers have done their job correctly, you should be able to identify the location of each instrument in space, just as if you were in the auditorium.
Once you can do that, the brain can focus on a particular “spot” and hear things which are actually quieter than the surrounding instruments.
Once you know where the triangle player is, you can hear the triangle.

That’s basically what I’m aiming for, but for orchestra substitute white noise, and for triangle substitute human voice.

However, the material I am working with is a noisy mono radio comms channel, so I have to construct the soundstage myself.

I did do some experiments with just high and low pass filters on the left and right channels respectively, and it did seem to do something but I can’t say it actually made the voice more intelligible :D
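For anyone wanting to reproduce that high/low split experiment, here is a minimal numpy sketch using complementary FFT masks; the 1 kHz crossover and the white-noise stand-in are invented for the example, not figures from the post:

```python
# Rough recreation of the experiment: lows routed to the left channel,
# highs to the right, using complementary brick-wall FFT masks.
import numpy as np

def split_bands(mono, fs, crossover=1000.0):
    n = len(mono)
    spectrum = np.fft.rfft(mono)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    lo = spectrum * (freqs < crossover)    # low band  -> left channel
    hi = spectrum * (freqs >= crossover)   # high band -> right channel
    return np.fft.irfft(lo, n), np.fft.irfft(hi, n)

fs = 8000
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs)             # stand-in for noisy comms audio
left, right = split_bands(mono, fs)
stereo = np.stack([left, right], axis=1)
```

Because the two masks are complementary, summing the channels restores the original mono signal, which matches the observation that this spreads the sound out without actually removing any noise.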

Looks like it’s a project for those long winter evenings.


Get these two plug-ins and play with them:

MDA Image
MDA Stereo Simulator

I know what you’re saying Steve, with a stereo source then there is positional information, and there might be a way of isolating an instrument from noise using that fact. (but darned if I could figure out how to do it).

But what I’m saying is, with a mono source, that positional information is not there, and if it ain’t there to start with, you can’t recover it.

And the noise on the recording is not random. Sure, it was generated randomly, but now it’s on the recording, it’s fixed; it’s down there in ones and zeros. And no matter how many channels you convert the original mono source into, every darn channel will be an exact copy of the exact same noise too.

Just think of your mono source as a waveform. That waveform was created by summing the voice and the noise, but now it’s summed, there’s no way of unsumming it. No matter how long you stir green paint, it’ll never separate into yellow and blue again.

The only way I can see of decomposing the waveform is by amplitude…find the average level of the voice component, and then remove anything that is far above or far below that level.

And by frequency…identify the main frequency components of the voice, then notch out everything else.
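That “notch out everything else” idea can be sketched as a brick-wall band-pass over the classic 300–3400 Hz telephone speech band; the band edges and the test tones below are illustrative choices, not figures from the thread:

```python
# Keep only the speech band by zeroing everything outside it in the
# frequency domain, then transforming back.
import numpy as np

def speech_band(x, fs, lo=300.0, hi=3400.0):
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0  # notch out-of-band energy
    return np.fft.irfft(spectrum, n)

fs = 8000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 1000 * t)   # in-band tone standing in for speech
hum = np.sin(2 * np.pi * 50 * t)       # out-of-band interference
filtered = speech_band(voice + hum, fs)
```

Of course this only removes noise *outside* the speech band; noise that shares the voice’s frequencies passes straight through, which is the “can’t unsum the waveform” point above.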

But positional information? Nope, it’s not there to start with.

But I’d sure love to see you prove me wrong. :)


Phoo, thanks for reminding me about those plugs, I’d forgotten all about them. I already have them.
I’ll do some fiddling with those and see what happens.

Gizmo, I’m not trying to recover positional info, I’m trying to artificially create it. Neither am I trying to reduce noise. As you rightly say, every copy of the mono source will contain exactly the same noise. I’m trying to give the brain a little helping hand to ignore the noise, or at least concentrate more on the wanted signal.

Have you ever been in a noisy bar and had a conversation across the room? There was no way you could have because the noise level was way above the level of your voice, yet you did. How?



Have you ever been in a noisy bar and had a conversation across the room? There was no way you could have because the noise level was way above the level of your voice, yet you did. How?

Just a quick answer Steve, because I’m working this eve.

This is what I think happens, but I could be wrong, I often am.

It’s positional information, and the ability of the brain to isolate sound coming from a certain position. That is determined by amplitude differences and different arrival times at each ear, and it is especially true for the HF components and the transients. Ever noticed how difficult it is to work out where a constant tone is coming from?

Anyway, that’s called the cocktail party effect I think, and old guys like me can’t do it anymore.

I can no longer pick out a conversation on the other side of the room. I can no longer hear a quiet sound in a noisy room. My HF is gone. I’m not deaf, I can still hear a pin drop in a quiet room, but I can’t ‘focus’ anymore. The phase difference at LF is not large enough to give my brain the right clues, and I can’t hear the HF well enough.

Anyway, that’s what I think it is, but as I said, I’m probably wrong.

I’d like to discuss this more Steve, you have some interesting ideas and I’d like to see where they’re going, but gotta be FOH at a gig tonight, so I’ll write more tomorrow, that’s assuming I ain’t gone totally deaf in the meantime. :D

The human ear, Steve, is capable of directional mic-ing, so to speak. But with age the ear, along with everything else, actually dries out and is less capable of movement. It’s a muscle thing. Anyway, just as the eye begins to have trouble focusing on close-up reading, the ear has more trouble directional mic-ing. I know this sounds like BS, but it isn’t.

I never knew that Yazmiester. I thought it was just due to amplitude and time/phase delay at each ear. But my wife always reckons I’ve got rabbit genes somewhere. :D
So thanks for the info; I figure that a day I learn something new is a day that don’t count against my lifespan. :)