WAV files can store PCM audio (WAVE_FORMAT_PCM). The WAV file format specification says:
The data format and maximum and minimums values for PCM waveform samples of various sizes are as follows:
Sample Size | Data Format | Maximum Value | Minimum Value |
---|---|---|---|
One to eight bits | Unsigned integer | 255 (0xFF) | 0 |
Nine or more bits | Signed integer i | Largest positive value of i | Most negative value of i |

For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:
Format | Maximum Value | Minimum Value | Midpoint Value |
---|---|---|---|
8-bit PCM | 255 (0xFF) | 0 | 128 (0x80) |
16-bit PCM | 32767 (0x7FFF) | -32768 (-0x8000) | 0 |
Both the signed and unsigned formats are asymmetrical: each has one more code below the midpoint than above it. So how should the asymmetry be handled? The signed version is two's complement representation, and AES17 defines the meaning of full-scale amplitude in this case:
amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused.
NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.
As does IEC 61606-3:
amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused
So, for example, for 16-bit audio, a signal that just reaches +32,767 and −32,767 would be full-scale, while one that reaches −32,768 exceeds full-scale.
The midpoint example for 8-bit (128, not 127.5) clarifies that unsigned data has the same asymmetry as signed data: 127 codes above the midpoint and 128 below it. So, for 8-bit data, a signal that reaches from 1 to 255 would be full-scale, and the value 0 exceeds full-scale.
WAVE Audio File Format Specifications says:
For float data, full scale is 1.
So, to correctly convert signed ints to float, divide by `2**(b-1) - 1`, where b is the number of bits.
To correctly convert unsigned ints to float, subtract `2**(b-1)`, then, similarly, divide by `2**(b-1) - 1`.
The float representation will then be limited to +1.0 full-scale in the positive direction, but can exceed −1.0 full-scale in the negative direction.
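As a minimal sketch of the two rules above, the conversions could look like this in Python. The function names `signed_to_float` and `unsigned_to_float` are made up for illustration, and the samples are assumed to be already unpacked into plain Python ints:

```python
def signed_to_float(sample: int, bits: int) -> float:
    """Scale a two's-complement sample so that 2**(bits-1) - 1 maps to +1.0."""
    full_scale = 2 ** (bits - 1) - 1
    return sample / full_scale


def unsigned_to_float(sample: int, bits: int) -> float:
    """Subtract the midpoint 2**(bits-1), then scale like the signed case."""
    midpoint = 2 ** (bits - 1)
    return signed_to_float(sample - midpoint, bits)


# Positive and negative full-scale land exactly on +1.0 and -1.0 ...
print(signed_to_float(32767, 16), signed_to_float(-32767, 16))
print(unsigned_to_float(255, 8), unsigned_to_float(1, 8))
# ... while the most negative code falls slightly below -1.0.
print(signed_to_float(-32768, 16), unsigned_to_float(0, 8))
```

Dividing by `2**(b-1)` instead would keep everything inside [−1.0, +1.0), but then positive full-scale would no longer map to exactly 1.0.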
The WAV format actually allows for fewer than 8 bits:
The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.
So I'll show 2-bit audio first (wBitsPerSample = 2), because it's simpler to follow:
WAV | Sample | int | float | Comment |
---|---|---|---|---|
0xC0 | 0b11 | 3 | +1.0 | full-scale |
0x80 | 0b10 | 2 | 0.0 | midpoint |
0x40 | 0b01 | 1 | −1.0 | full-scale |
0x00 | 0b00 | 0 | −2.0 | |
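The rows of this table can be reproduced with a short sketch. It assumes each sample sits in a one-byte container with the sample in the most significant bits, as the WAV column shows; `decode_small_sample` is a hypothetical helper name, and 1-bit audio is excluded because its divisor `2**(b-1) - 1` would be zero:

```python
def decode_small_sample(container: int, bits: int) -> float:
    """Decode an unsigned sample of 2 to 8 bits stored in the top bits of a byte."""
    if not 2 <= bits <= 8:
        raise ValueError("this sketch handles 2 to 8 bits per sample")
    sample = container >> (8 - bits)   # drop the zero padding in the low bits
    midpoint = 2 ** (bits - 1)         # 2 for 2-bit audio, 128 for 8-bit
    full_scale = 2 ** (bits - 1) - 1   # 1 for 2-bit audio, 127 for 8-bit
    return (sample - midpoint) / full_scale


for container in (0xC0, 0x80, 0x40, 0x00):
    print(f"0x{container:02X}", decode_small_sample(container, 2))
```

With `bits=8` the shift is zero, so the same function also covers ordinary 8-bit data.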
For 8-bit audio, as mentioned above, 255 is full-scale, 128 is midpoint, 1 is negative full-scale, and 0 exceeds full-scale:
WAV | Sample | int | float | Comment |
---|---|---|---|---|
0xFF | 0b1111_1111 | 255 | +1.000 | full-scale |
0xFE | 0b1111_1110 | 254 | +0.992 | |
0xFD | 0b1111_1101 | 253 | +0.984 | |
... | ... | ... | ... | |
0x82 | 0b1000_0010 | 130 | +0.016 | |
0x81 | 0b1000_0001 | 129 | +0.008 | |
0x80 | 0b1000_0000 | 128 | 0.000 | midpoint |
0x7F | 0b0111_1111 | 127 | −0.008 | |
0x7E | 0b0111_1110 | 126 | −0.016 | |
... | ... | ... | ... | |
0x03 | 0b0000_0011 | 3 | −0.984 | |
0x02 | 0b0000_0010 | 2 | −0.992 | |
0x01 | 0b0000_0001 | 1 | −1.000 | full-scale |
0x00 | 0b0000_0000 | 0 | −1.008 | |
For 16-bit audio, the interpretation is signed:
WAV | Sample | int | float | Comment |
---|---|---|---|---|
0x7FFF | 0b0111_1111_1111_1111 | +32,767 | +1.00000 | full-scale |
0x7FFE | 0b0111_1111_1111_1110 | +32,766 | +0.99997 | |
0x7FFD | 0b0111_1111_1111_1101 | +32,765 | +0.99994 | |
... | ... | ... | ... | |
0x0002 | 0b0000_0000_0000_0010 | +2 | +0.00006 | |
0x0001 | 0b0000_0000_0000_0001 | +1 | +0.00003 | |
0x0000 | 0b0000_0000_0000_0000 | 0 | 0.00000 | midpoint |
0xFFFF | 0b1111_1111_1111_1111 | −1 | −0.00003 | |
0xFFFE | 0b1111_1111_1111_1110 | −2 | −0.00006 | |
... | ... | ... | ... | |
0x8003 | 0b1000_0000_0000_0011 | −32,765 | −0.99994 | |
0x8002 | 0b1000_0000_0000_0010 | −32,766 | −0.99997 | |
0x8001 | 0b1000_0000_0000_0001 | −32,767 | −1.00000 | full-scale |
0x8000 | 0b1000_0000_0000_0000 | −32,768 | −1.00003 | |
9-bit audio is signed as well, with the 9 sample bits stored in the most significant bits of a 16-bit value:
WAV | Sample | int | float | Comment |
---|---|---|---|---|
0x7F80 | 0b0111_1111_1 | +255 | +1.000 | full-scale |
0x7F00 | 0b0111_1111_0 | +254 | +0.996 | |
0x7E80 | 0b0111_1110_1 | +253 | +0.992 | |
... | ... | ... | ... | |
0x0100 | 0b0000_0001_0 | +2 | +0.008 | |
0x0080 | 0b0000_0000_1 | +1 | +0.004 | |
0x0000 | 0b0000_0000_0 | 0 | 0.000 | midpoint |
0xFF80 | 0b1111_1111_1 | −1 | −0.004 | |
0xFF00 | 0b1111_1111_0 | −2 | −0.008 | |
... | ... | ... | ... | |
0x8180 | 0b1000_0001_1 | −253 | −0.992 | |
0x8100 | 0b1000_0001_0 | −254 | −0.996 | |
0x8080 | 0b1000_0000_1 | −255 | −1.000 | full-scale |
0x8000 | 0b1000_0000_0 | −256 | −1.004 | |
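Here is a rough sketch of how such samples might be decoded from the raw bytes. It assumes a 16-bit little-endian container (RIFF data is little-endian) and only handles 9 to 16 bits per sample; `decode_signed_container` is an illustrative name, not part of any library:

```python
import struct


def decode_signed_container(raw: bytes, bits: int) -> float:
    """Decode one signed sample of 9 to 16 bits from a 2-byte little-endian container."""
    if not 9 <= bits <= 16:
        raise ValueError("this sketch handles 9 to 16 bits per sample")
    (container,) = struct.unpack("<h", raw)  # signed 16-bit, little-endian
    sample = container >> (16 - bits)        # arithmetic shift preserves the sign
    return sample / (2 ** (bits - 1) - 1)


# Container values taken from the WAV column of the 9-bit table.
for value in (0x7F80, 0x0080, 0x0000, 0xFF80, 0x8080, 0x8000):
    raw = value.to_bytes(2, "little")
    print(f"0x{value:04X}", round(decode_signed_container(raw, 9), 3))
```

Python's `>>` on a negative int is an arithmetic shift, so 0x8080 (−32,640 as a signed 16-bit value) becomes −255 rather than a large positive number.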
Also, the wBitsPerSample field is 2 bytes, which means you could conceivably store data with 65,535 bits per sample, which is ridiculous. Even 255 bits per sample would be a dynamic range of about 1,537 dB!
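For reference, a figure of that size is what the standard ideal-quantizer formula for a full-scale sine wave gives, roughly 6.02·b + 1.76 dB. That the number above was derived this way is an assumption, and `dynamic_range_db` is a hypothetical helper name:

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ideal quantizer SNR for a full-scale sine: 20*log10(2)*bits + 10*log10(1.5) dB."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


for bits in (8, 16, 24, 255):
    print(bits, round(dynamic_range_db(bits), 1))
```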