endolith/WAV interpretation.md

Last active June 27, 2024 17:02


Interpretation of WAV file sample data and asymmetry


How to handle asymmetry of WAV data?

WAV files can store PCM audio (WAVE_FORMAT_PCM). The WAV file format specification says:

The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:

| Sample Size | Data Format | Maximum Value | Minimum Value |
| --- | --- | --- | --- |
| One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
| Nine or more bits | Signed integer i | Largest positive value of i | Most negative value of i |

For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:

| Format | Maximum Value | Minimum Value | Midpoint Value |
| --- | --- | --- | --- |
| 8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
| 16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |

Both the signed and unsigned formats are asymmetrical: each has one more code below the midpoint than above it. How should the asymmetry be handled? The signed version is two's complement representation, and AES17 defines the meaning of full-scale amplitude in this case:

amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.

NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

As does IEC 61606-3:

amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 exceeds full-scale.

The midpoint example for 8-bit clarifies that the symmetry of unsigned data is the same as for signed data. So, for 8-bit data, a signal that reaches from 1 to 255 would be full-scale, and the value 0 exceeds full-scale.

WAVE Audio File Format Specifications says:

For float data, full scale is 1.

So, to correctly convert signed ints to float, divide by 2**(b-1) - 1, where b is the number of bits.

To correctly convert unsigned ints to float, subtract 2**(b-1), then, similarly, divide by 2**(b-1) - 1.

The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.
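As a minimal sketch of the two rules above (the function names are hypothetical, not taken from any library), the conversion under this full-scale convention could look like this in Python:

```python
def signed_to_float(sample: int, bits: int) -> float:
    """Scale a signed two's-complement sample so the largest positive code is +1.0."""
    full_scale = 2 ** (bits - 1) - 1  # e.g. 32767 for 16-bit
    return sample / full_scale


def unsigned_to_float(sample: int, bits: int) -> float:
    """Remove the midpoint offset, then scale the same way as the signed case."""
    midpoint = 2 ** (bits - 1)        # e.g. 128 for 8-bit
    full_scale = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit
    return (sample - midpoint) / full_scale


# The most negative code lands slightly below -1.0, as described above.
print(signed_to_float(32767, 16), signed_to_float(-32768, 16))
print(unsigned_to_float(255, 8), unsigned_to_float(0, 8))
```

Values are deliberately not clipped here, so the one code that exceeds negative full-scale stays visible in the float output.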

Examples

Unsigned

The WAV format actually allows for fewer than 8 bits:

The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.

So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0xC0 | 0b11 | 3 | +1.0 | full-scale |
| 0x80 | 0b10 | 2 | 0.0 | midpoint |
| 0x40 | 0b01 | 1 | −1.0 | full-scale |
| 0x00 | 0b00 | 0 | −2.0 | |
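The 2-bit table can be reproduced with a short sketch. It assumes each sample occupies one byte with the significant bits left-justified, as the quoted passage describes; the helper names are hypothetical.

```python
def unpack_msb_aligned(container: int, bits: int, container_bits: int = 8) -> int:
    """Recover the `bits`-wide unsigned sample stored in the most significant bits."""
    return container >> (container_bits - bits)


def unsigned_to_float(sample: int, bits: int) -> float:
    """Subtract the midpoint, then divide by the largest positive deviation."""
    return (sample - 2 ** (bits - 1)) / (2 ** (bits - 1) - 1)


for stored in (0xC0, 0x80, 0x40, 0x00):
    sample = unpack_msb_aligned(stored, bits=2)
    print(f"0x{stored:02X} -> {sample} -> {unsigned_to_float(sample, 2):+.1f}")
```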

For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:

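| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
| 0xFE | 0b1111_1110 | 254 | +0.992 | |
| 0xFD | 0b1111_1101 | 253 | +0.984 | |
| ... | ... | ... | ... | |
| 0x82 | 0b1000_0010 | 130 | +0.016 | |
| 0x81 | 0b1000_0001 | 129 | +0.008 | |
| 0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
| 0x7F | 0b0111_1111 | 127 | −0.008 | |
| 0x7E | 0b0111_1110 | 126 | −0.016 | |
| ... | ... | ... | ... | |
| 0x03 | 0b0000_0011 | 3 | −0.984 | |
| 0x02 | 0b0000_0010 | 2 | −0.992 | |
| 0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
| 0x00 | 0b0000_0000 | 0 | −1.008 | |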

Signed

For 16-bit audio, the interpretation is signed:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
| 0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
| 0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
| ... | ... | ... | ... | |
| 0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
| 0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
| 0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
| 0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
| 0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
| ... | ... | ... | ... | |
| 0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
| 0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
| 0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
| 0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |

As is 9-bit audio, which is stored left-justified in a 16-bit container:

| WAV | Sample | int | float | Comment |
| --- | --- | --- | --- | --- |
| 0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
| 0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
| 0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
| ... | ... | ... | ... | |
| 0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
| 0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
| 0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
| 0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
| 0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
| ... | ... | ... | ... | |
| 0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
| 0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
| 0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
| 0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
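A rough sketch of how the 9-bit rows might be decoded from raw bytes follows. It assumes the sample occupies two little-endian bytes with the 9 significant bits left-justified and the low 7 bits zero; `unpack_signed` is a hypothetical helper, not a library function.

```python
def unpack_signed(raw: bytes, bits: int) -> int:
    """Decode a left-justified two's-complement sample from little-endian bytes."""
    container_bits = 8 * len(raw)
    value = int.from_bytes(raw, "little", signed=True)
    # Python's >> is an arithmetic shift, so the sign is preserved.
    return value >> (container_bits - bits)


def signed_to_float(sample: int, bits: int) -> float:
    """Scale so that the largest positive code maps to +1.0."""
    return sample / (2 ** (bits - 1) - 1)


for word in (0x7F80, 0x0080, 0xFF80, 0x8080, 0x8000):
    sample = unpack_signed(word.to_bytes(2, "little"), bits=9)
    print(f"0x{word:04X} -> {sample:+d} -> {signed_to_float(sample, 9):+.3f}")
```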

Author

endolith commented May 4, 2020

Also the wBitsPerSample field is 2 bytes, which means you could conceivably store data with 65535 bits per sample? Which is ridiculous. Even 255 bits per sample would be a dynamic range of 1,537 dB!
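The 1,537 dB figure appears consistent with the common rule of thumb for a full-scale sine against quantization noise, roughly 6.02·b + 1.76 dB. That this formula is the one used is an assumption, and the function name below is hypothetical:

```python
import math


def sine_dynamic_range_db(bits: int) -> float:
    """Rule-of-thumb SNR of a full-scale sine relative to quantization noise."""
    # 20*log10(2) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


print(round(sine_dynamic_range_db(16)))   # 16-bit audio
print(round(sine_dynamic_range_db(255)))  # the hypothetical 255-bit case
```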

gavingc commented Aug 31, 2020

Hi,
What a great write-up.
Yesterday I came to exactly the same understanding, the WAV file format allows saving a file that is essentially invalid in other audio standards, or even just a sane application of symmetry.
Worse still a lot of texts encourage it.
I'm darn sure that a lot of software implements it.

The 8 kHz 16-bit PCM example on Wikipedia is evidence of this: the file contains the -32768 value!
https://en.wikipedia.org/wiki/WAV
8k16bitpcm.wav
http://www.nch.com.au/acm/8k16bitpcm.wav

I'm completing a signal processing course in my engineering degree and the first assignment requires a 16 bit audio sample to be iteratively quantised down bit by bit, all the way down to 1 bit, by removing the LSB.

Did you ever answer the question: How to handle asymmetry of WAV data?

Is the only fix to simply clip or compress the value that exceeds full-scale?

Author

endolith commented Aug 31, 2020 (edited)

@gavingc Basically there are two different interpretations of PCM data, used when converting to float. I listed two standards above that interpret it such that the largest positive value is considered full-scale, and therefore maps to +1.0, so the most negative number exceeds full-scale and is <−1.0. However, other sources (Android spec, USB Audio spec) interpret PCM as a fixed-point number, with the binary point after the first bit, so that the most negative number is full-scale, and maps to −1.0, so the most positive number is less than full-scale, and is <+1.0.

> essentially invalid in other audio standards,

Do you have examples of other audio standards that this can be compared with?

> Is the only fix to simply clip or compress the value that exceeds full-scale?

In my code, I'm supporting both interpretations. For the one that allows negative values to exceed full-scale, I'm just keeping the values and assigning a float value more negative than −1.0. See scipy/scipy#12507.

gavingc commented Sep 13, 2020 (edited)

Hi,
I just meant from a high level the two audio standards AES17 and IEC 61606-3 both define a symmetrical approach while the WAV file format does not, exactly as you have laid out in detail.

I ended up applying the view demonstrated by MATLAB R2018b audioread(), that scaling is done by multiplying/dividing by 2^(nbits - 1).
This gives the range [-1, +1), i.e. a maximum just below +1.0: 1 − 2^(−(16 − 1)) ≈ 0.999969482 for 16-bit.
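To make the difference between the two scalings in this thread concrete, here is a small comparison sketch. The function names are hypothetical and do not correspond to MATLAB or scipy APIs.

```python
def full_scale_convention(sample: int, bits: int) -> float:
    """Largest positive code maps to +1.0; most negative code is below -1.0."""
    return sample / (2 ** (bits - 1) - 1)


def fixed_point_convention(sample: int, bits: int) -> float:
    """Most negative code maps to -1.0; largest positive code is below +1.0."""
    return sample / 2 ** (bits - 1)


for sample in (32767, -32767, -32768):
    print(sample,
          full_scale_convention(sample, 16),
          fixed_point_convention(sample, 16))
```

The two results differ only by a constant gain factor of (2^(b−1) − 1) / 2^(b−1), which is why the choice mostly matters near the extremes and when round-tripping between int and float.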

Author

endolith commented Sep 13, 2020

@gavingc I'm not sure what you mean. Two's complement is inherently asymmetrical, so you have to accept one asymmetry or the other. The WAV file format doesn't specify anything about how to interpret this.

Thanks for the info about MATLAB, I'll document that in the scipy function. The MATLAB write function works the same way presumably?

gavingc commented Sep 13, 2020

Yes indeed, the WAV file format can even be used to store data that is not audio. Ideally though an audio file format would have specified symmetry?

I didn't investigate this closely but it looks like storing floats in WAV does support symmetry.

The most clear example of the asymmetry is perhaps the limit case of 1-bit signed, where formally the int values are -1 and 0.
I ended up representing that in a double data type as [-1.0, 0.0) .
https://stackoverflow.com/questions/12240925/sign-extend-1-bit-2s-complement-number

Yes I didn't confirm by testing the limits of every function in MATLAB but everything I used worked out with this approach, range [-1, +1), without loss of data/bits only loss of scaling resolution if you like.

Author

endolith commented Sep 21, 2020

@gavingc
Yes, I wish they had specified the interpretation as well.

Yes, float format WAVs can exceed [-1.0, 1.0] by a huge amount, so there is no symmetry limitation there.

Yeah, 1-bit WAV could be interpreted as {-1.0, 0.0} with the fixed-point convention, while the full-scale convention breaks down at 1-bit and results in dividing by zero.

Yes, this says MATLAB does it that way too: https://www.mathworks.com/matlabcentral/answers/294112-what-does-the-audioread-function-actually-do#comment_377989
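The 1-bit limit case mentioned above can be checked directly. This is only an illustrative sketch with hypothetical function names, using the signed codes −1 and 0:

```python
def fixed_point(sample: int, bits: int) -> float:
    """Fixed-point convention: divide by 2**(bits - 1)."""
    return sample / 2 ** (bits - 1)


def full_scale(sample: int, bits: int) -> float:
    """Full-scale convention: divide by 2**(bits - 1) - 1, which is 0 for 1 bit."""
    return sample / (2 ** (bits - 1) - 1)


one_bit_codes = [-1, 0]
fixed = [fixed_point(s, 1) for s in one_bit_codes]
print(fixed)

try:
    full_scale(-1, 1)
    broke_down = False
except ZeroDivisionError:
    # The divisor 2**0 - 1 is zero, so this convention is undefined at 1 bit.
    broke_down = True
print(broke_down)
```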

Author

endolith commented Jun 27, 2024 (edited)

See scipy/scipy#12507 for more context on the different ways to interpret this.

MATLAB's audioread, USB Audio and Android all interpret integer PCM data as fixed-point, which produces slightly different values.
