@pi3r
Created November 15, 2010 20:43
<las> pi3r: PCM audio data is really very, very simple
<las> pi3r: the simplest case is a mono (single channel) data buffer. to "mix" two such buffers you really do just add the values in each buffer
<pi3r> but it's going to "overflow"
<las> pi3r: interleaved data is slightly more complex, but not for mixing purposes if they are both interleaved: you still just add them
<pi3r> or a thing la that?
<pi3r> s/la/like
<las> pi3r: the overflow issue is why JACK and most other sensible APIs use floating point
<las> pi3r: for integer format samples, yes, overflow is potentially an issue, and there is no simple, trivial "always do this" rule that can be used
<pi3r> so if i have 2 totally different buffers coming from 2 different inputs (but with the same format)
<las> pi3r: digital audio engineering evolved a variety of different solutions, none of which are compelling. float is compelling so that is what we use
<pi3r> i can do finalBuf[i] = buf1[i] + buf2[i]
<pi3r> ?
<las> pi3r: if its floating point data, then certainly, yes, you can do that
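The sample-by-sample addition discussed here can be sketched in a few lines. This is a minimal illustration, not JACK code: the `mix` helper and the buffer contents are made up for the example, and plain Python lists stand in for real audio buffers.

```python
def mix(buf1, buf2):
    """Mix two equal-length float buffers by adding them sample by sample."""
    if len(buf1) != len(buf2):
        raise ValueError("buffers must have the same length")
    return [a + b for a, b in zip(buf1, buf2)]


# Hypothetical float samples in the usual -1.0..+1.0 range.
buf1 = [0.0, 0.25, 0.5, -0.25]
buf2 = [0.5, 0.25, -0.5, -0.5]

final_buf = mix(buf1, buf2)
print(final_buf)  # [0.5, 0.5, 0.0, -0.75]
```

The same loop works unchanged on interleaved data, provided both buffers use the same channel layout, because sample i of one buffer then lines up with sample i of the other.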
<pi3r> las, it seems so easy, i'm reading about that and i can't understand how it works
<las> pi3r: you don't understand enough about time domain and frequency domain :)
<pi3r> hum, clearly not
<pi3r> :D
<las> pi3r: then you should just accept for now that it Just Works That Way
<pi3r> but it's hard :D
<pi3r> i want to understand ! maybe it's why i don't
<pi3r> :D
<las> pi3r: i can try to explain it quickly
<las> pi3r: lets start with two actual acoustic pressure waves approaching your ear from sources
<pi3r> i don't want to bother you but if you have time
<las> pi3r: the air pressure is varying, becoming greater and lesser
<pi3r> yeah
<las> pi3r: (actually, lets just start with 1 acoustic pressure wave)
<pi3r> ok
<las> pi3r: now, there is only one wave, only one set of variations in air pressure approaching your ear
<las> pi3r: but, as represented by the elegance of Fourier's theorem, this single wave can represent the summation of all the individual waves used to create the recording
<las> pi3r: that is, within that one acoustic pressure wave's variations are *ALL* the variations of the (say) bass guitar, piano, singer and theremin
<pi3r> hum
<las> pi3r: your ear is able to carry out an analog equivalent of fourier analysis, extract all the frequency components (well, not all, but thats a physiological issue) and thus you can hear individual distinct sounds
<las> pi3r: even though only a single pressure wave arrived at your ear
<las> pi3r: so, lets step back to a simple setup where there are two sources making sounds. if you want, make them nice and simple. the first generates an acoustic pressure wave that varies at 100Hz
<pi3r> hum it's like i can visualize me ear doing that :)
<las> pi3r: the second does the same at 200Hz
<pi3r> s/me/my
<las> pi3r: do some magic so that we can ignore directional complications
<las> pi3r: now the waves are both flowing toward your ear
<las> pi3r: what happens? you end up with a single acoustic pressure wave containing variations in air pressure that correspond to *both* sound sources
<pi3r> hum seems logical
<las> pi3r: ok
<pi3r> now
<las> pi3r: that is "mixing" in the analog domain. it consists of simply adding the two pressures together
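A small numerical check of the idea that one summed wave still contains both sources. The sketch adds a 100 Hz and a 200 Hz sine wave and then uses a naive discrete Fourier transform to look for each frequency in the sum. The sample rate, amplitude and helper names are assumptions chosen for the example.

```python
import cmath
import math

SAMPLE_RATE = 1000  # samples per second (assumed for the example)
N = 1000            # one second of signal


def sine(freq_hz, amplitude=0.4):
    """Generate N samples of a sine wave at the given frequency."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(N)]


def magnitude_at(signal, freq_hz):
    """Magnitude of one DFT bin, scaled so a pure sine reports its amplitude."""
    k = freq_hz * N / SAMPLE_RATE  # bin index for this frequency
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / N)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / N


# "Mixing": add the two waves point by point.
mixed = [a + b for a, b in zip(sine(100), sine(200))]

for f in (100, 150, 200):
    print(f, round(magnitude_at(mixed, f), 3))
```

The sum shows an amplitude of about 0.4 at both 100 Hz and 200 Hz and essentially nothing at 150 Hz, so both sources are recoverable from the single mixed signal.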
<pi3r> my ears are not threaded
<las> pi3r: actually, your ears are extremely threaded
<pi3r> arf
<las> pi3r: they run a large number of frequency-dependent sensors in parallel
<las> pi3r: so, when the air pressure in one wave was "low" and the other one was "high", if they were the same amplitude, we add them together, and they cancel out: we get the same pressure as the static air
<las> pi3r: if they were both "high pressure" at a particular point, we'd add them and get an even higher pressure point
<las> pi3r: if they were both low, we'd add them and get an even lower pressure point
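The three cases can be written out with numbers. This is only an illustration: `HIGH` and `LOW` are arbitrary pressure deviations from static air, picked so the two waves have the same amplitude, and `combine` is a hypothetical helper.

```python
# Pressure deviations from static air, in arbitrary units (assumed values).
HIGH = 0.5
LOW = -0.5


def combine(p1, p2):
    """Two pressure deviations at the same instant simply add."""
    return p1 + p2


print(combine(HIGH, HIGH))  # 1.0  -> an even higher pressure point
print(combine(LOW, LOW))    # -1.0 -> an even lower pressure point
print(combine(HIGH, LOW))   # 0.0  -> equal and opposite, so they cancel
```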
<nedko> if jack_lsp says that it cannot connect to jack server and you are sure that jack server is running, then your jack setup is broken :P
<pi3r> hum
<nedko> _nusse: ^^^
<las> pi3r: so, now convert the pressure wave to digital with a converter that measures the pressure value at any point in time
<las> pi3r: you get a series of numbers
<las> pi3r: if you do this with each individual sound source, you will get two series of numbers that represent the air pressure of each sound over time
* deronnax ([email protected]) has joined #jack
<las> pi3r: fundamentally, this is what the signal chain of "microphone -> mic-preamp -> analog-to-digital converter" is doing
<las> pi3r: when the pressure is high, it will be represented by a number closer to +1.0
<las> pi3r: when the pressure is low, it will be represented by a number closer to -1.0
<las> pi3r: when the pressure is the same as that of the static air, it will be represented by a number close to 0.0
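A rough sketch of what "measuring the pressure at points in time" produces. A mathematical sine function stands in for the real microphone and converter chain; the 800 Hz sample rate and the function names are assumptions made so that one cycle of a 100 Hz tone fits in eight samples.

```python
import math


def sample(pressure, sample_rate, n_samples):
    """Measure a continuous pressure function at evenly spaced instants."""
    return [pressure(n / sample_rate) for n in range(n_samples)]


def tone_100hz(t):
    """Idealised pressure deviation of a 100 Hz tone at time t (seconds)."""
    return math.sin(2 * math.pi * 100 * t)


# Eight samples at 800 Hz cover exactly one cycle of the 100 Hz tone.
samples = sample(tone_100hz, sample_rate=800, n_samples=8)
print([round(s, 3) for s in samples])
```

The printed series starts at about 0.0 (static air), rises to about +1.0 (high pressure), returns through zero, and dips to about -1.0 (low pressure).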
<pi3r> hum i was missing the link between time and air pressure
<las> pi3r: take the two digital representations of the two different sounds and mix them
<las> pi3r: all you have to do is add up the numbers
<las> pi3r: the result is a new set of numbers representing the time domain variation in air pressure
<pi3r> i understand why working with float is so important
<pi3r> now
<pi3r> ~logs
<pi3r> so if i hit an overflow, it's really bad luck
<pi3r> ss/it/if
<las> pi3r: correct.
<las> pi3r: the most common approach with integers involves dividing each signal by the number of signals
<las> pi3r: this more or less ensures no overflow, but it doesn't actually produce the desired effect most of the time (both signals become notably quieter)
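To make the integer case concrete, here is a sketch with signed 16-bit samples. Python integers do not overflow, so `wrap_int16` is a hypothetical helper that emulates two's-complement wraparound, which is what typically happens when such a sum is stored back into a 16-bit variable. The sample values are made up.

```python
def wrap_int16(value):
    """Emulate two's-complement wraparound of a signed 16-bit integer."""
    return (value + 32768) % 65536 - 32768


def mix_naive(a, b):
    """Add the samples directly; loud passages wrap around."""
    return [wrap_int16(x + y) for x, y in zip(a, b)]


def mix_scaled(a, b):
    """Divide each signal by the number of signals (2) before adding."""
    return [x // 2 + y // 2 for x, y in zip(a, b)]


loud1 = [30000, -30000, 1000]
loud2 = [10000, -10000, 1000]

print(mix_naive(loud1, loud2))   # [-25536, 25536, 2000]
print(mix_scaled(loud1, loud2))  # [20000, -20000, 1000]
```

The naive sum flips sign on the loud samples, which is heard as harsh distortion. The scaled version stays in range, but the quiet third sample comes out at 1000 instead of 2000: every signal is halved even where no overflow was possible.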
<pi3r> then i'll need to resample
<pi3r> right?
<las> pi3r: eh? you mean convert to float?
<pi3r> not resample sorry
<pi3r> normalize
<JackWinter> pi3r: since you want to send the signal to jack and jack expects floating point, you might as well do that before mixing :)
<las> pi3r: yes
<las> pi3r: JACK expects normalized floating point
<pi3r> right
<las> pi3r: there are some contentious minor details about normalization, but the basic approach for signed samples is to divide by 2^(nbits-1) (or by (2^(nbits-1))-1, which is one of the contentious details) where nbits is your integer sample bit width (8, 16, 24 typically)
<pi3r> JackWinter, but the buffers are coming from jack, so they are already normalized? right?
<pi3r> las, duly noted
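A minimal sketch of integer-to-float normalization, under the assumption that the samples are signed integers, so the full-scale magnitude is 2^(nbits-1) (32768 for 16-bit). The function name is hypothetical, and dividing by 32767 instead is the other common convention.

```python
def int_to_float(samples, nbits=16):
    """Scale signed integer samples into the -1.0..+1.0 float range."""
    scale = float(2 ** (nbits - 1))  # 32768.0 for 16-bit samples
    return [s / scale for s in samples]


print(int_to_float([-32768, 0, 16384, 32767]))
# [-1.0, 0.0, 0.5, 0.999969482421875]
```

With this scale the most negative sample maps exactly to -1.0 and the most positive one lands just under +1.0. Once both inputs are normalized floats, mixing them is plain addition.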