pi3r · November 15, 2010 20:43
diff --git a/Sound explanation #jack b/Sound explanation #jack
 <las> pi3r: PCM audio data is really very, very simple
 <las> pi3r: the simplest case is a mono (single channel) data buffer. to "mix" two such buffers you really do just add the values in each buffer
 <pi3r> but it's going to "overflow"
 <las> pi3r: interleaved data is slightly more complex, but not for mixing purposes if they are both interleaved: you still just add them
 <pi3r> or a thing la that?
 <pi3r> s/la/like
 <las> pi3r: the overflow issue is why JACK and most other sensible APIs use floating point
 <las> pi3r: for integer format samples, yes, overflow is potentially an issue, and there is no simple, trivial "always do this" rule that can be used 
 <pi3r> so if i have 2 totally different buffer coming from 2 different input (but with the same format)
 <las> pi3r: digital audio engineering evolved a variety of different solutions, none of which are compelling. float is compelling so that is what we use
 <pi3r> i can do finalBuf[i] = buf1[i] + buf2[i]
 <pi3r> ?
 <las> pi3r: if its floating point data, then certainly, yes, you can do that
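
A minimal sketch of the float mixing described above, assuming both buffers have the same length, sample rate and channel layout. The function name `mix` and the sample values are made up for illustration. Interleaved data is handled by the same loop, because matching positions in both buffers belong to the same channel.

```python
def mix(buf1, buf2):
    """Mix two equal-length float sample buffers by adding them element-wise."""
    if len(buf1) != len(buf2):
        raise ValueError("buffers must have the same length")
    return [a + b for a, b in zip(buf1, buf2)]


# Mono: one sample per frame.
mono_a = [0.25, -0.5, 0.75]
mono_b = [0.5, 0.25, -0.25]
mono_mix = mix(mono_a, mono_b)

# Interleaved stereo: L0, R0, L1, R1, ...
# Index i holds the same channel in both buffers, so plain addition still works.
stereo_a = [0.5, -0.5, 0.25, 0.0]
stereo_b = [0.25, 0.25, -0.25, 0.5]
stereo_mix = mix(stereo_a, stereo_b)
```

The result can exceed the range -1.0 to +1.0 without wrapping around, which is the practical advantage of float over integer samples here.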
 <pi3r> las, it seems so easy, i'm reading about that and i can't understand how it work
 <las> pi3r: you don't understand enough about time domain and frequency domain :)
 <pi3r> hum, clearly not
 <pi3r> :D
 <las> pi3r: then you should just accept for now that it Just Works That Way
 <pi3r> but it's hard :D
 <pi3r> i want to understand ! maybe it's why i don't
 <pi3r> :D
 <las> pi3r: i can try to explain it quickly
 <las> pi3r: lets start with two actual acoustic pressure waves approaching your ear from sources
 <pi3r> i don't want to bother you but if you have time
 <las> pi3r: the air pressure is varying, becoming greater and lesser 
 <pi3r> yeah
 <las> pi3r: (actually, lets just start with 1 acoustic pressure wave)
 <pi3r> ok
 <las> pi3r: now, there is only one wave, only one set of variations in air pressure approaching your ear
 <las> pi3r: but, as Fourier's theorem elegantly shows, this single wave can be the summation of all the individual waves used to create the recording
 <las> pi3r: that is, within that one acoustic pressure wave's variations are *ALL* the variations of the (say) bass guitar, piano, singer and theremin
 <pi3r> hum
 <las> pi3r: your ear is able to carry out an analog equivalent of fourier analysis, extract all the frequency components (well, not all, but thats a physiological issue) and thus you can hear individual distinct sounds
 <las> pi3r: even though only a single pressure wave arrived at your ear
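
The idea that one wave still contains its separate components can be checked numerically. The sketch below is only an illustration: two pure tones (100 Hz and 200 Hz, with amplitudes chosen arbitrarily) stand in for the instruments, the sample rate is an assumption, and a naive discrete Fourier transform stands in for what the ear does.

```python
import cmath
import math

SAMPLE_RATE = 1000  # samples per second (assumed for this sketch)
N = 1000            # one second of signal, so DFT bin k corresponds to k Hz

# A single "pressure wave": the sum of a 100 Hz tone and a quieter 200 Hz tone.
signal = [
    1.0 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(N)
]


def dft_amplitude(samples, k):
    """Amplitude of the sinusoidal component at bin k (naive DFT of one bin)."""
    total = sum(
        x * cmath.exp(-2j * math.pi * k * n / len(samples))
        for n, x in enumerate(samples)
    )
    return 2 * abs(total) / len(samples)


amp_100 = dft_amplitude(signal, 100)  # close to 1.0
amp_200 = dft_amplitude(signal, 200)  # close to 0.5
amp_150 = dft_amplitude(signal, 150)  # close to 0.0, nothing at this frequency
```

Both tones are recovered with their original amplitudes even though only the summed signal was analysed.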
 <las> pi3r: so, lets step back to a simple setup where there are two sources making sounds. if you want, make them nice and simple. the first generates an acoustic pressure wave that varies at 100Hz
 <pi3r> hum it's like i can visualize me ear doing that :)
 <las> pi3r: the second does the same at 200Hz
 <pi3r> s/me/my
 <las> pi3r: do some magic so that we can ignore directional complications
 <las> pi3r: now the waves are both flowing toward your ear
 <las> pi3r: what happens? you end up with a single acoustic pressure wave containing variations in air pressure that correspond to *both* sound sources
 <pi3r> hum seems logical
 <las> pi3r: ok
 <pi3r> now
 <las> pi3r: that is "mixing" in the analog domain. it consists of simply adding the two pressures together
 <pi3r> my ears are not threaded
 <las> pi3r: actually, your ears are extremely threaded
 <pi3r> arf
 <las> pi3r: they run a large number of frequency-dependent sensors in parallel
 <las> pi3r: so, when the air pressure in one wave was "low" and the other one was "high", if they were the same amplitude, we add them together, and they cancel out: we get the pressure of static air
 <las> pi3r: if they were both "high pressure" at a particular point, we'd add them and get an even higher pressure point
 <las> pi3r: if they were both low, we'd add them and get an even lower pressure point
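
The three cases can be tried out with numbers. This is a sketch under simple assumptions: pressure is measured relative to static air (0.0), each wave is a pure 100 Hz tone of amplitude 1.0, and the helper `pressure` is a hypothetical name.

```python
import math


def pressure(amplitude, freq_hz, t, phase=0.0):
    """Pressure deviation from static air for a pure tone at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t + phase)


t = 0.0025  # a quarter period of a 100 Hz tone, where the wave peaks

high = pressure(1.0, 100, t)           # about +1.0, a pressure peak
low = pressure(1.0, 100, t, math.pi)   # about -1.0, same tone in opposite phase

both_high = high + high  # about +2.0: an even higher pressure point
both_low = low + low     # about -2.0: an even lower pressure point
cancel = high + low      # about  0.0: equal and opposite pressures cancel
```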
 <nedko> if jack_lsp says that it cannot connect to jack server and you are sure that jack server is running, then your jack setup is broken :P
 <pi3r> hum
 <nedko> _nusse: ^^^
 <las> pi3r: so, now convert the pressure wave to digital with a converter that measures the pressure value at any point in time
 <las> pi3r: you get a series of numbers
 <las> pi3r: if you do this with each individual sound source, you will get two series of numbers that represent the air pressure of each sound over time
 * deronnax ([email protected]) has joined #jack
 <las> pi3r: fundamentally, this is what the signal chain of "microphone -> mic-preamp -> analog-to-digital converter" is doing
 <las> pi3r: when the pressure is high, it will be represented by a number closer to +1.0
 <las> pi3r: when the pressure is low, it will be represented by a number closer to -1.0
 <las> pi3r: when the pressure is the same as that of the static air, it will be represented by a number close to 0.0
 <pi3r> hum i was missing the link between time and air pressure
 <las> pi3r: take the two digital representations of the two different sounds and mix them
 <las> pi3r: all you have to do is add up the numbers
 <las> pi3r: the result is a new set of numbers representing the time domain variation in air pressure
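
Put together, the digital version of the 100 Hz and 200 Hz example might look like the sketch below. The sample rate, amplitudes and the helper name `sample_tone` are assumptions for illustration. A real converter measures a microphone signal, while this code just evaluates a sine function at regular time steps.

```python
import math

SAMPLE_RATE = 48000  # samples per second (assumed)


def sample_tone(freq_hz, amplitude, num_samples, sample_rate=SAMPLE_RATE):
    """Measure a pure tone at regular intervals, giving a series of numbers."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]


# 480 samples at 48 kHz is 10 ms: one period of 100 Hz, two periods of 200 Hz.
source_1 = sample_tone(100, 0.4, 480)
source_2 = sample_tone(200, 0.4, 480)

# Mixing in the time domain: add the two series sample by sample.
mixed = [a + b for a, b in zip(source_1, source_2)]

# The mix can never be louder than the sum of the two amplitudes (0.8 here).
peak = max(abs(x) for x in mixed)
```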
 <pi3r> i understand why working with float is so important
 <pi3r> now
 <pi3r> ~logs
 <pi3r> so if i hit an overflow, it's really bad luck
 <pi3r> ss/it/if
 <las> pi3r: correct. 
 <las> pi3r: the most common approach with integers involves dividing each signal by the number of signals
 <las> pi3r: this more or less ensures no overflow, but it doesn't actually produce the desired effect most of the time (both signals become notably quieter)
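
The integer trade-offs can be seen with signed 16-bit values. This is a sketch, not a recommendation: Python integers do not overflow, so `wrap_int16` emulates what typically happens when a sum is stored back into 16 bits. The clipping variant is an extra approach added here for comparison and is not one named in the discussion. All function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_int16(value):
    """Emulate 16-bit two's-complement wraparound."""
    return (value + 32768) % 65536 - 32768


def mix_wrapping(a, b):
    """Naive add: loud passages wrap around and change sign."""
    return [wrap_int16(x + y) for x, y in zip(a, b)]


def mix_averaging(a, b):
    """Divide by the number of signals: no overflow, but everything is quieter."""
    return [(x + y) // 2 for x, y in zip(a, b)]


def mix_clipping(a, b):
    """Saturate at the limits instead of wrapping (distorts loud passages)."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


loud_a = [30000, -30000, 1000]
loud_b = [10000, -10000, 0]

wrapped = mix_wrapping(loud_a, loud_b)    # [-25536, 25536, 1000]
averaged = mix_averaging(loud_a, loud_b)  # [20000, -20000, 500]
clipped = mix_clipping(loud_a, loud_b)    # [32767, -32768, 1000]
```

Note how averaging halves the third sample (1000 becomes 500) even though the second signal is silent at that point, which is the "notably quieter" effect.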
 <pi3r> then i'll need to resample
 <pi3r> right?
 <las> pi3r: eh? you mean convert to float?
 <pi3r> not resample sorry
 <pi3r> normalize
 <JackWinter> pi3r: since you want to send the signal to jack and jack expects floating point, you might as well do that before mixing :)
 <las> pi3r: yes
 <las> pi3r: JACK expects normalized floating point
 <pi3r> right
 <las> pi3r: there are some contentious minor details about normalization, but the basic approach for signed samples is to divide by 2^(nbits-1) (some prefer 2^(nbits-1) - 1), where nbits is your integer sample bit width (8, 16, 24 typically)
 <pi3r> JackWinter, but the buffers are coming from jack, so they are already normalized? right?
 <pi3r> las, duly noted
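
A sketch of the integer-to-float normalization step, assuming signed integer samples, where the full-scale magnitude is 2^(nbits-1) (32768 for 16-bit). The exact divisor is one of the contentious details mentioned above, and the function name `int_to_float` is hypothetical. Buffers that already come from JACK are float and do not need this step.

```python
def int_to_float(samples, nbits=16):
    """Convert signed integer PCM samples to floats in the range -1.0 to +1.0."""
    full_scale = float(1 << (nbits - 1))  # 32768.0 for 16-bit samples
    return [s / full_scale for s in samples]


pcm16 = [0, 16384, -32768, 32767]
normalized = int_to_float(pcm16)
# 0 maps to 0.0, 16384 to 0.5, -32768 to -1.0,
# and 32767 to just under +1.0 with this divisor.
```

Converting each source this way before mixing means the addition itself happens in float, where exceeding full scale does not wrap around.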