The C64 Digi ~ C=Hacking #20
<=============
The C64 Digi
=============>

Robin Harbron <[email protected]>
Levente Harsfalvi <[email protected]>
Stephen Judd <[email protected]>

Introduction
------------
Digis -- digitally sampled audio -- are fairly common on the 64. This is
meant to be a comprehensive article on digis: how they work, examples,
different playback methods on the 64 (volume register and Pulse Width
Modulation), and some tricks. We'll even show you how to play 6-bit and
even 8-bit digis in high quality on a 64, which is really pretty neat to
hear.
The first part discusses digis from a fundamental point of view -- just
what a digi is, acoustic signals, and things like that. The most common
method of playing digis is via the volume register at $d418, and the next
two sections are devoted to this technique. Section two discusses some
SID fundamentals, and the reason why $d418 may be used for digis (and why
later-model SIDs don't play digis correctly); Section three discusses
$d418-digis from a software perspective: how to play them, tricks for
improving them, how to boost digis on 8580 SIDs, and how to detect what
kind of SID (6581 or 8580) is in the machine. The fourth and final part
of this article discusses pulse width modulation, and includes example source
code and a binary that plays a true 7-bit digi at around 16KHz -- something
which, we think, has never been done before.
Without further ado...
===============
Digis: Overview
===============
The whole point of playing a digi on a 64 is to provide something
for your ear to hear. So let's begin by discussing just what an acoustic
signal is and how that relates to digis.
Probably everyone knows that "sound" is how your ear responds to
changes in air pressure -- that is, when you clap your hands together,
it compresses the air between your hands in a special way, and that
higher pressure moves outwards into the surrounding air (since it's at
lower pressure). That pressure change propagates along and when it
encounters your ear it causes the ear drums to move, causing three little
bones to move, causing some fluid to move, causing tiny, exquisitely
sensitive hairs to move, transmitting a signal that your brain converts
to "sound".
An audio speaker also changes the air pressure in response to a
signal. If you take a coil of wire and change the voltage on it, it
generates a magnetic field; if a magnet is placed inside the coil, the
changing magnetic field will place a force on the magnet, causing it to
move, causing some air to be pushed along, causing a change in pressure,
causing a signal to propagate to your ear which your brain interprets as
Van Halen. All a stereo (CD player, etc.) does is send a varying voltage
signal to the speaker. As that voltage level goes up and down the magnet
moves back and forth, and so the speaker converts that electrical energy
into an acoustic wave.
For us, the trick is to coax SID into sending a specific voltage
signal to the speaker, the way a stereo or CD player might. And a CD player
is of course a very apt comparison, since it is itself a digi player.
Just for reference, a really good pair of ears can hear signals from
around 20Hz to 22KHz, with the sensitivity dropping considerably outside
of around 100Hz to 10KHz. A CD player has a playback rate of 44.1KHz, and
the highest frequency SID can generate from the frequency registers is
around 4KHz. If you've ever set SID to maximum frequency and heard just
how high 4KHz is, you can appreciate that even 10KHz is _really_ high, and
actually quite difficult to hear. In human speech, most of the information
content of vowel sounds is contained in the range 300Hz - 3KHz, and above
around 1KHz for consonant sounds; most information in musical sounds is in
the range 100Hz - 3KHz.

Discrete Sampling
-----------------
To understand digis a little better, consider the more general
case of a discretely sampled signal -- a continuous signal sampled at
discrete time intervals. Let's say we had some device producing a
_continuous_ sinusoidal signal in time:
              *   *
           *         *
         *             *
        *               *
       *                 *
      *                   *                   *
     *                     *                 *
                            *               *
                             *             *
                              *           *
                               *         *
                                 *     *
                                   * *
     -----------------------------------------------> time
(yes, I did miss my calling as an ASCII artist)
To turn the signal into a _discrete_ signal, we simply sample the
signal at discrete intervals of time. For example, let's say the above
signal lasts one second, and is input into a device which measures the
value every 1/4-second. The device will spit out four numbers: 0, 1, 0,
and -1:
                 *
     *                       *
                                         *
The sampling frequency here is four samples per second -- 4 Hz. If we were
to then play back this signal at the sampling frequency, we'd get a signal
like
               **********
     **********          **********
                                   **********
So one thing sampling does is to "staircase" a signal -- the sample becomes
some sort of "average" value over the sample period. Increasing the sample
rate -- taking more samples per second -- will smooth things out, and the
sampled signal will look (and sound!) more like the original signal.
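
The four-sample example can be reproduced numerically. The following Python sketch is only an illustration (the names `sample_sine` and `staircase` are made up here, not taken from any player): it samples a 1 Hz sine at 4 Hz, then holds each value the way a D/A converter or the volume register would on playback.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample sin(2*pi*f*t) once every 1/sample_rate seconds."""
    count = int(round(duration_s * sample_rate_hz))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def staircase(samples, hold):
    """Zero-order hold: repeat every sample `hold` times on playback."""
    return [s for s in samples for _ in range(hold)]

# One second of a 1 Hz sine, measured every 1/4 second.
coarse = [round(s, 6) for s in sample_sine(1.0, 4.0, 1.0)]
print(coarse)                # [0.0, 1.0, 0.0, -1.0]

# Played back, each value is held until the next one arrives.
print(staircase(coarse, 3))
```

Raising `sample_rate_hz` gives more, shorter steps, which is the "smoothing out" described above.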
Now let's say we just took two samples in that one second -- 2 Hz sampling
rate -- and just happened to catch the signal at its maximum and minimum
values (the peak and trough). Upon playback, the signal would look like
     *********************
                         *
                         *
                         *
                         *
                         *
                         *
                         *********************
That is, a square (pulse) wave. If you're on the ball, you've noticed
that the frequency of the new signal is 1 Hz -- exactly half the sampling
frequency. In general, the _maximum_ frequency that can be captured in a
discrete sample is half the sampling frequency; this is called the Nyquist
critical frequency, or simply the Nyquist frequency. As you can see above,
it takes two data points to get a single (nonzero) frequency.
So, for example, the highest frequency a CD player -- which has a
sampling/playback rate of 44.1KHz -- can capture is about 22KHz, well above
the range of normal human hearing.
Thus, increasing the sample rate increases the frequency range captured
in the discrete signal. This is why a digi at a high sample rate in general
sounds better than a digi sampled at a low sample rate.
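
What happens above the Nyquist frequency can be checked with a few lines of Python. This is a hedged illustration only (the helper names are invented for the example): at a 4 Hz sample rate, a 3 Hz sine produces exactly the same samples as an inverted 1 Hz sine, so the higher frequency cannot be told apart from the lower one after sampling.

```python
import math

def sample(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a sine of the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def nyquist(sample_rate_hz):
    """Highest frequency that can be captured at this sample rate."""
    return sample_rate_hz / 2.0

rate = 4.0
above = sample(3.0, rate, 8)   # 3 Hz lies above the 2 Hz Nyquist frequency
alias = sample(1.0, rate, 8)   # 1 Hz lies below it

# Sample for sample, the 3 Hz sine equals the 1 Hz sine with its sign flipped.
same = all(abs(a + b) < 1e-9 for a, b in zip(above, alias))
print(nyquist(rate), same)     # 2.0 True
```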
BUT -- there is more to life than sample rate: there is also sample
resolution. The sample resolution -- 4-bit samples, 8-bit samples, etc. --
determines how accurately the sample measures the actual signal. For
example, let's say we sample sin(x) when x=0.5:
sin(0.5) = 0.4794255...
No matter what sample resolution we use, there will always be some error
in the measurement, and the _true_ value of the sample will be the
_measured_ value plus some error.
In general the sampling errors are random and uniformly distributed, so
the sampled signal corresponds to the original signal plus some noise (the
random errors). That is why you almost always hear some sort of hiss on
a normal C64 digi, which uses a resolution of 4 bits per sample.
So, increasing the sample _resolution_ decreases the amount of noise introduced
into the sampled signal (and increases the dynamic range), and increasing the
sample _rate_ increases the frequency range.
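
To put a number on the hiss, the sketch below quantizes a full-scale sine to a given number of bits and measures the signal-to-noise ratio. It is an illustration under simple assumptions (evenly spaced levels, a pure sine as input, invented function names), not a model of any particular sampler.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    steps = (1 << bits) - 1
    code = round((x + 1.0) / 2.0 * steps)
    return code / steps * 2.0 - 1.0

def snr_db(bits, count=20000):
    """Signal-to-noise ratio, in dB, of a quantized full-scale sine."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        x = math.sin(2 * math.pi * 37.0 * n / count)
        err = quantize(x, bits) - x       # the sampling error for this sample
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

for bits in (4, 6, 8):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each extra bit of resolution buys roughly 6 dB, which is why the 6-bit and 8-bit playback methods are worth the effort.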
If you're _really_ on the ball, you've noticed that the 1-Hz square pulse
above actually contains frequencies higher than 1Hz, simply because a
square pulse contains higher harmonics in addition to the 1Hz fundamental
frequency. And you've also no doubt realized that the sampled pulse wave
would sound different than the original sine wave (due, of course, to the
added harmonics) -- it's at the right frequency, but it will sound like a
pulse wave instead of a sinusoid.
Have we somehow broken the Nyquist limit?
The answer is no, because of a nifty thing called the Discrete Sampling
Theorem, which says that, given the samples h_n of a bandwidth-limited
function h(t), the original function h(t) is given by
h(t) = dt * Sum{ h_n * sin(2*pi*f_c*(t-n*dt)) / (pi*(t-n*dt)) }
where dt is the sampling period and f_c is the cutoff/critical frequency.
What this means is that the original signal can be _reconstructed_ from the
discrete samples, not that it is _equivalent_ to the discrete samples.
The Nyquist limit is the highest frequency that can be _reconstructed_ from
the discrete samples, not the highest frequency that will be produced if you
"staircase" the discrete samples through a speaker. If the original
signal is bandwidth-limited, and there are at least two samples per cycle of
the highest frequency it contains, then the signal can be completely
reconstructed.
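
The reconstruction sum can be tried directly. The sketch below is a minimal illustration that assumes f_c = 1/(2*dt) and a finite run of samples, so values between sample points are only approximate near the ends; the function name is invented for the example.

```python
import math

def reconstruct(samples, dt, t):
    """Evaluate dt * Sum{ h_n * sin(2*pi*f_c*(t-n*dt)) / (pi*(t-n*dt)) }."""
    f_c = 1.0 / (2.0 * dt)                 # critical frequency for this dt
    total = 0.0
    for n, h_n in enumerate(samples):
        x = t - n * dt
        if abs(x) < 1e-12:
            kernel = 2.0 * f_c             # limit of the kernel as x -> 0
        else:
            kernel = math.sin(2.0 * math.pi * f_c * x) / (math.pi * x)
        total += h_n * kernel
    return dt * total

dt = 0.1                                    # 10 Hz sampling of a 1 Hz sine
samples = [math.sin(2 * math.pi * n * dt) for n in range(200)]

t = 10.05                                   # halfway between two samples
error = abs(reconstruct(samples, dt, t) - math.sin(2 * math.pi * t))
print(error)                                # small, limited by the finite sum
```

At the sample instants the sum returns the samples themselves; in between it fills in the smooth curve rather than the staircase.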
Since a "normal" digi contains all these extra frequencies, shouldn't a digi
sound "different" than a "true" analog signal? Sure. On the other hand, many
of the extra frequencies are beyond the range of human hearing, and the rest
can often be removed using a filter -- all CD players filter the output, for
example. So sometimes it is worthwhile to turn on a low/band pass filter
when playing a C64 digi, especially at lower sample rates.
And that more or less summarizes basic discrete sampling theory.
=============
D418 Playback -- Hardware
=============
The SID contains both analog and digital subparts on one silicon die -- in
other words, it is a mixed signal device.
At the time, the SID was certainly the best of the microcomputer sound chips.
This may be mostly due to its mixed signal design, which the designers used
to solve certain problems.
The hard thing in a sound generator design is to implement waveforms, volume
control, and mixing. Things like that don't really fit into the digital
'either 0 or 1' philosophy, unless a lot of data bits and arithmetic functions
are involved. In a fully digital sound chip, the waveforms could be generated
by ROM lookup tables. The mixing function could be derived from binary
addition, while the volume control from division or multiplication. Unless the
sound functionality is greatly simplified, the arithmetic functions must be
present and they must be implemented in hardware. Finally, the D/A conversion
could be done by (fast) pulse width modulation just at the output stage.
(Most of today's wavetable sound cards operate like this).
This method implies heavy arithmetic hardware, which was not an option for
designers back then. Still, most sound chips were fully digital, and all
suffer from the required compromises (i.e. generating square waves only,
no dedicated channel volume control, etc. - both TED and the VIC-I are obvious
examples).
The solution that one finds in the SID design is very straightforward: mixing
and variable volume level is problematic in a digital circuit when dealing
with waveforms, so simply avoid doing it. In the SID, only the microcomputer
interface, the registers, the oscillators (phase accumulating oscillators),
and other controller logic are digital; the mixing and volume control parts
are fully analog. There are digital to analog converters providing analog
voltage levels from the digital state variables. The SID D/As are in fact
'multiplying' D/As, having an analog input (AIN), an input base voltage
(IBASE), and a digital input. They operate by amplifying the input voltage
offset (AIN-IBASE) by a factor proportional to the number on the digital input
and adding this offset back to the base level.
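
The behaviour of such a multiplying D/A can be written down in a couple of lines. This is a sketch under stated assumptions: the gain is taken to be code/full-scale (so the largest code passes the offset unchanged), and the voltages are made-up numbers, not measurements.

```python
def multiplying_dac(ain, ibase, code, bits):
    """Base level plus the input offset scaled by the digital code."""
    full_scale = (1 << bits) - 1
    return ibase + (ain - ibase) * code / full_scale

# With no offset on the input, the digital code changes nothing...
flat = [multiplying_dac(5.0, 5.0, code, 4) for code in range(16)]

# ...but with a DC offset on the input, every code gives its own level.
steps = [multiplying_dac(5.5, 5.0, code, 4) for code in range(16)]

print(flat[0], flat[15])
print(steps[0], steps[15])
```

This is the whole trick behind volume-register digis: the register only produces different output levels if there is a DC offset for it to scale.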
This mixed signal design also allowed some other features to be implemented.
The most important one is the analog filter (that is, a two integrator loop,
bi-quadratic filter, according to Yannes). With that, the SID points beyond a
home computer sound chip - it is a true analog subtractive synth (marketing as
such was cancelled because of manufacturing capacity reasons).
Here is a detailed map of the SID internals (analog path; probably my most
beautiful ASCII ever :-D). Info can be found in the SID patents (US 4,677,890;
1986), the MOS 6581 technical document (can be found somewhere on the Net), or
the back of the Programmer's Reference Guide (PRG).
 -----------------  11bit  ------------
 |Cutoff freq reg|-------->|Cutoff D/A|----------------------------o
 -----------------         ------------                            |
     $d415-16                                                      |
                                                                   |
 -----------------  4bit   ------------                            |
 |Resonance reg. |-------->|Reson. D/A|-----------------------o    |
 -----------------         ------------                       |    |
    $d417.[4-7]                                               |    |
                                                              |    |
                                =0                            v    v
 -----------    -----------     >o------------>|        ------------------
 |wave D/A |--->|env. D/A |-->o/               |        |                |
 -----------    -----------      o--->|        |        |                |
      ^              ^        ^ =1    o--------|------->|                |
      |12bit         |8bit    |       |        |        |                |
      |              |        |       |        |        |                |
 -----------    -----------   |       |        |        |                |
 |OSC1 +   |    |ADSR cnt+|   |       |        |        |                |
 |wave sel.|    |env. log.|   |       |        |        |                |
 -----------    -----------   |       |        |        |                |
  $d400-03,      $d405-06, $d417.0    |        |        |                |
  $d404.[1-7]    $d404.0              |        |        |     FILTER     |
                                      |        |        |                |
                                =0    |        |        |                |
 -----------    -----------     >o----|------->|        |                |
 |wave D/A |--->|env. D/A |-->o/      |        |        |                |
 -----------    -----------      o--->|        |        |                |
      ^              ^        ^ =1    |        |        |                |
      |12bit         |8bit    |       |        |        |                |
      |              |        |       |        |        |                |
 -----------    -----------   |       |        |        |                |
 |OSC2 +   |    |ADSR cnt+|   |       |        |        |                |
 |wave sel.|    |env. log.|   |       |        |        |                |
 -----------    -----------   |       |        |        |                |
  $d407-0a,      $d40c-0d, $d417.1    |        |        |                |
  $d40b.[1-7]    $d40b.0              |        |        |  LP   BP   HP  |
                                      |    =0  |        ------------------
                                =0    |    >o->|            |    |    |
 -----------    -----------     >o----|--o/    |  ***       o    o    o
 |wave D/A |--->|env. D/A |-->o/      |     o- |           /    /    /
 -----------    -----------      o--->|  ^ =1  |       =0  o o  o o  o o =1
      ^              ^        ^ =1    |  |     |             |    |    |
      |12bit         |8bit    |       |  |     |             |    |    |
      |              |        |       |$d418.7 |<------------o    |    |
 -----------    -----------   |       |        |<-----------------o    |
 |OSC3 +   |    |ADSR cnt+|   |       |        |<----------------------o
 |wave sel.|    |env. log.|   |       |        |
 -----------    -----------   |       |        |
  $d40e-11,      $d413-14, $d417.2    |        |
  $d412.[1-7]    $d412.0              |        |      -----------------
                                      |        |      | Master volume |AUDIO
                                =0    |        o----->|      D/A      |----->
                                >o----|------->|      ----------------- OUT
 EXT IN --------------------->o/      |        ^              ^
                                 o--->|        |              |4bit
                              ^ =1    ^        |              |
                              |       |        |         $d418.[0-3]
                           $d417.3    |        |
                     Analog mixing ---|--------|

***: Filter type select switches, $d418.[4-6] respectively

$d418 digis
-----------
The most common method of playing a digi is to use the register at $d418.
When someone plays a digi using the master volume register, the situation is
similar to the waveform D/A converters. Both D/As are multiplying D/As --
signal amplifiers whose amplification is proportional to the input digital
number. If there is a nonzero signal offset on the D/A input it will be
multiplied proportionally by this number.
Playing digis with $d418 is possible because there is indeed a relatively
large DC voltage offset on the master volume D/A. This offset is present
right from the moment when the SID is powered up.
Where can this DC offset come from?
There is a mixer before the master volume D/A (see figure). If there's a DC
offset on the D/A input, it must come from there. ...And going further,
the DC offset on the mixer must also come from somewhere. But where?
Signals come from the three ADSR volume D/As, the EXTIN line, and the three
outputs of the filter. Fortunately, all paths that go to the mixer have analog
switches (all paths can be disconnected from the mixer individually, if that's
needed).
The above analog switches are driven by the filter selection bits ($d417 bits
0-3), the voice 3 off bit ($d418 bit 7) and the filter type selection bits
($d418 bits 4-6).
After a reset, the filter selector bits are all 0 (all signals are routed
directly to the master mixer), the 'voice 3 off' bit is 0 (voice 3 stays
connected to the mixer), and the filter type selector bits are 0 (filter
outputs are unconnected). In this state, only EXTIN and the three SID voice
signals are present on the mixer. EXTIN can be eliminated as the source since
it has no DC offset (as long as the computer was not hacked, see notes on the
8580).
The ADSR volume D/A is similar to the previously mentioned multiplying D/As.
If the digital number on the input is 0, the input analog signal offset can't
pass through (as measurements verify). This is the case when SID is reset,
setting the envelope counters to zero. Therefore, nothing behind the ADSR
multiplying D/As can have any effect on the DC offset of the mixer.
So, the DC offset must come from the ADSR multiplying D/As. Another
measurement shows that even the mixer itself has a small DC offset.
Tests and results
-----------------
I did some tests that support this theory. They were done 'by hand', by simply
using a digital voltmeter + the FC3 monitor.
The chip was a 6581(R1), 0883, Hong Kong (an early 6581).
When turned on the voltage on the AUDIO OUT was about 5.5 volts (slowly
decreasing as it warmed up, stopping at about 5.43 after some 10 mins - all
subsequent tests were done after this time period).
Writing $0f to $d418 raised the output voltage to 6.15 volts. Therefore, the
maximum output amplitude that can be achieved when playing digis is 0.72 volts
in this 'mode' (without wiggling any other SID settings to achieve higher
voltage levels) -- remember that what counts is the maximum voltage
_difference_, not the maximum absolute voltage.
The next test is to determine whether the mixer has its own DC offset (with
all possible paths disconnected). This can be arranged: with the volume at
maximum (to maximize any effect), all voices are routed towards the filter
($d417 = $0f), while making sure that the filter outputs are not routed to the
mixer ($d418 = $0f). In this state no paths can drive the mixer. The result
is 5.39 volts. When the volume changes, the output also changes towards the
previous 5.43 volts --> there is a (very small) DC offset from just the mixer.
What could be the DC offset value of each individual SID voice (i.e. the base
level difference of the multiplying D/As)? Doing the above, but leaving one
voice routed to the mixer ($d417 = $0e, $0d or $0b) gives 5.69 volts.
5.69-5.43 = 0.26 volts, and 5.43 + 3*0.26 = 6.21, almost 6.15 volts.
To determine if the ADSR multiplying D/As act as expected, I used pulse
waves with zero frequency and 0 or $fff pulse width (two cases), to make the
input signal of the ADSR multiplying D/A the minimum and maximum possible
level. After careful checking, the output changed by a few hundredths of a
volt (about 0.01 volt per voice). So the D/A doesn't close up completely, but
it's still O.K.
To prove that these offsets are equal for all voices, I did another test. Some
people know that the filter inverts phase (multiplies the input signal by -1).
Machine is reset, $d417 = $01, $d418 = $9f. (Voice 1 is routed through the
filter, voice 3 is cut off from the mixer completely ($d418.7), low pass
filter is selected, volume = $0f). The output voltage was 5.41 volts, just
very slightly below the "default" output level. This means that the DC of
voice2 + (-1*) DC of voice 1 resulted in about 0 relative offset. Doing
similar tests proved that the DC offsets for the voices match each other
almost exactly (within a few hundredths of a volt).
These measurements all support the idea that the DC offset comes from the
ADSR multiplying D/As, that the offset is mostly independent from the waveform
D/A converters (as long as sustain levels are 0), and that the offsets are
equal for all voices. In addition, a small DC offset is supplied by the master
signal mixer itself.
What if we try different sustain settings? For this test, set the volume to
maximum, as usual. Set the sustain level to $0f for all voices ($d406, $d40d,
$d414 = $f0). Start the attack, but with no waveform selected ($d404, $d40b,
$d412 = 01). The output level is now 5.21 volts, a little bit below the '0'
offset of the audio output! (Doing the test with just one voice (all
others disconnected), the output is 5.29 volts).
Finally, we can do some experiments with the pulse waveform. The pulse
waveform is useful for these tests, since at zero frequency we can set both
the minimum and the maximum constant DC levels at the voice D/A just by using
the pulse width registers. Reset the computer. Set voice 1 to zero frequency,
pulse width $0fff, sustain level 15, and $d404=$41 (pulse waveform + gate on).
Route only voice 1 through the mixer ($d417 = $0e). The output voltage is
similar to the test when no waveform was selected -- 5.29 volts! This seems to
show that "waveform accu = $0fff" is the same as when no waveform is selected
(i.e. the waveform D/A digital input pins are pulled high when they're not
driven, as seen in most other NMOS chips).
When the pulse width is 0 in the above test the output changes to 6.34 volts.
This seems to be strange (a multiplying D/A giving higher signal level for
multiplying something by 0).
Now, when the ADSR multiplying D/A is closed, the output is 5.70 volts. When
it's fully open, the output changes from 5.29 volts (wave acc= $fff) to 6.34
volts (wave acc = 0). One reasonable answer is that the base voltage of the
waveform D/A is higher, and the analog input is tied lower than the base
voltage of the ADSR D/A -- the effect is that the SID waveforms will lie
'around' the ADSR multiplying D/A base voltage, more or less symmetrically.
This was surely done intentionally, to reduce absolute voltage levels (for
linearity). In the 6581, the big DC offset is probably a result of having the
ADSR D/As and the master volume D/A at different base levels (the difference
appears as true DC offset on the master volume D/A). If both were the same
(presumably at VDD/2), and the waveform D/A parameters were selected similarly
(operation is symmetric to VDD/2), there would be no final DC offset at all.
Rather like the 8580...
Other issues
------------
So now we know why $d418 digis are possible - but still, there are some things
to note.
The DC offset on the master volume D/A changes with different SID settings,
and whatever affects the DC offset on the mixer will affect the digi
volume. For example, even the filter output signals have a small DC offset.
Just do a test - set the volume to $0f, then simply turn one filter route on
(for example, $d418 = $1f). You'll hear a small click (i.e. a small DC offset
change on the mixer), even if the filter has no input.
Moreover, as seen above, the DC offset can be eliminated completely (just by
SID register settings), leading to no audible digi sound at the output. In
other words, whatever affects the DC offset on the mixer _will_ affect the
digi volume.
One place where this is important is playing a digi with a tune: there's a
constantly changing signal going to the mixer instead of a constant DC offset,
so playing a digi on the master volume also causes distortion for both the SID
voices and the digi sound (since they're cross modulated). To reduce this
effect most 3+1 like SID + digi players play samples by writing 8-offset sample
values to $d418 (ie. adding 8 to 3-bit sample values and writing this to $d418
- see players used by Jeroen Tel and other famous composers using digi). This
trick reduces the modulating effect while still maintaining good digi volume.
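
The 8-offset trick is easy to express in code. The sketch below is only an illustration: the function name is hypothetical, and dropping the lowest bit of a 4-bit sample is just one assumed way of arriving at a 3-bit value.

```python
def to_offset_3bit(sample4):
    """Map a 4-bit sample (0-15) to 8 + a 3-bit sample, i.e. 8-15."""
    if not 0 <= sample4 <= 15:
        raise ValueError("expected a 4-bit sample")
    return 8 + (sample4 >> 1)      # drop the lowest bit, then add the offset

values = [to_offset_3bit(s) for s in range(16)]
print(values)
```

The values written to $d418 never fall below 8, so the master volume stays in its upper half and the SID voices are modulated less by the digi.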
The DC offsets used to create awful clicking sometimes. For example, the
filter inverts phase. If the filter is currently routed to the mixer, there'll
be a large 'click' (2 times the DC offset) when a voice is on and its routing
is changed to or from the filter.
The 8580
--------
This is a completely redesigned chip. I don't know details, but it was
probably redesigned by the time all other chips in the C64 were done for CSG's
new manufacturing technology and the C64c. It is a 'better' chip from the
technical side (but in my opinion it sounds crude in comparison to the 6581,
at least the R4 series). The 6581 was designed in months. Bob Yannes had to do
everything from scratch and use the manufacturing technology MOS currently had
(NMOS). And it shows. First, it has high background noise. The DC offsets are
really also a misfeature. The D/A converters are sometimes non-monotonic (at
least, the waveform D/As and the filter cutoff D/A have some drops at the
change of the most significant bits). The op-amps in the active (resonant)
filter are simple, linearized NMOS inverters ;-) (loopbacked, they act like
more or less linear op-amplifiers around VDD/2). And I still haven't mentioned
bugs in the digital side (ADSR envelope bugs). Because of the above, one
probably won't find two identical 6581 chips -- each sounds a little bit
different (mostly due to the filter). Since the active components of the
filter are far from ideal, the filter is strongly nonlinear (the cutoff curve
changes with signal amplitude). On the other hand, these things are what make
the SID sound so unique.
Most of the problems were fixed in the 8580. It has much less background
noise. The chips sound the same (there are hardly any differences between
different 8580s). Most of the DC offset issues (the clicks) were eliminated. It
needs less power, and lower VDD level. Something was changed also in the
digital logic, but the ADSR part was not touched. The 'combined' waveforms are
a bit different (and more useable from the musician's point of view).
The clicks were reduced, which means that there is no (or no significant) DC
offset on the master volume D/A in the 8580.
(I have not done any measurements, but after listening to a lot of 3+1 channel
type music, I have a strong suspicion that even if sounds are turned on, the
average DC offset on the master volume D/A is still minimal).
To fix this in software, you'll have to wait until the next section of this
article.
To fix this in hardware, people use a simple hack: take a resistor of about
330k and tie the SID EXTIN line to GND through that (directly, beside the
chip, on the mainboard).
The EXTIN line goes directly to the mixer, and thus the master volume D/A, or
can also be routed through the filter. In either case, unless the filter is
disconnected, the above hack will give a pretty large DC offset, similar to
the original 6581s. So, digi sounds can be played :-) (even with SID music
playing simultaneously, similar to the 6581).
This solution is good as a work around, but there's one thing to note: this is
not completely the same as the 6581 ADSR D/A offset voltage. At least, this
offset is negative (should that pin rather be tied to VDD?). Programs that
depend on the 6581's way of DC offsets will not work correctly (but I know of
very few such programs, so at worst you'll experience slightly different digi
sound only occasionally -- but hey, the 8580 sounds different anyway). Another
problem is that when EXTIN is routed through the filter the DC offset may
cause strong distortion since the DC operating point of the filter is changed
-- bad news if the 'semi-linear' amplifiers in the filter are picky about
absolute DC level. Some music (not necessarily involved with digis) does
indeed route EXTIN through the filter, for noise reduction on older C64s (with the
earlier C64 mainboards that pick up lots of 'digital' background noise from
EXTIN). DC distortion can also occur occasionally for the same reason but on
the master volume D/A (the higher the difference from VDD/2, the greater the
risk of experiencing nonlinearity and clipping distortion).
Some final words
----------------
A lot of this information comes from Dag Lem, who is certainly the No. 1 SID
hacker for me ;-). Take a look at reSID, his SID emulator library (the sources
can be downloaded from somewhere). reSID contains so much reverse engineered
information of the real SID that you won't believe it -- check it out if
you're interested.
=============
D418 Playback -- Software
=============
$D418 digis are by far the most common playback method. The volume register
gives 16 different amplitudes (0-15), and so can provide 4-bit digi playback.
In its most basic form, this is an extremely easy routine to code. Simply
load each 4-bit sample, and store it in the volume register ($d418).
Assuming $fd/$fe are pointing to the beginning of a series of samples, the
following code will play it back:
         ldy #0
:loop    lda ($fd),y    ;fetch the next sample
         sta $d418      ;write it to the volume register
         ldx #5         ;some delay value
:delay   dex
         bne :delay
         iny
         bne :loop
         inc $fe        ;move on to the next page of samples
         jmp :loop
The ldx #5 would have to be adjusted depending on the speed of the
sample - the lower this number (not including zero) the faster the
sample will play back.
There are a number of improvements we could make to this code - first
of all, this method takes twice as much RAM to store the sample as is
necessary. Because we're dealing with 4-bit samples, we can store 2
samples in each byte. This can be handled simply by alternately
masking out the high bits (with AND #15) to play the sample stored in
the low nybble, and by shifting the high nybble down to the low nybble
to play the high nybble (LSR : LSR : LSR : LSR). A lookup table may also
be used to save processor cycles (but use more RAM).
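
The packing scheme can be prototyped off the C64 before committing to it. The following Python sketch is an illustration only: which sample goes in the low nybble is an assumption made here (first sample low), as is padding an odd-length digi with a zero sample.

```python
def pack_nybbles(samples):
    """Pack 4-bit samples two per byte, first sample in the low nybble."""
    padded = list(samples)
    if len(padded) % 2:
        padded.append(0)                 # assumed: pad with a zero sample
    return bytes((padded[i] & 15) | ((padded[i + 1] & 15) << 4)
                 for i in range(0, len(padded), 2))

def unpack_nybbles(data):
    """Recover the 4-bit samples in playback order."""
    out = []
    for byte in data:
        out.append(byte & 15)            # same as AND #15
        out.append(byte >> 4)            # same as LSR : LSR : LSR : LSR
    return out

packed = pack_nybbles([1, 2, 3, 4])
print(packed.hex(), unpack_nybbles(packed))   # 2143 [1, 2, 3, 4]
```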
Another improvement is to move the routine to zero-page, and use self-
modifying code. In general, this results in the fastest digi players.
We should of course have the routine check for the end of the sample --
typically just checking the high byte of the zero-page pointer is enough
(in this case, checking $fe). Typically digis are page aligned anyway, so
just zeroing out the unused part (if any) of the last page is fine.
Finally, it is often important that each sample of a digi is played back
at regular intervals. If the samples aren't played at a steady speed,
extra distortion is audible. In the example above, playback is steady
for a full page (256) of samples - but several extra cycles are added
by incrementing the zero-page pointer to the digi. The situation
worsens when we start adding extra code to check for the end of the
digi, and even the main loop starts getting irregular when we add the
code for the simple form of packing discussed earlier (2 4-bit samples
per byte).
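
The size of the irregularity in the simple loop shown earlier can be worked out by counting cycles. The sketch below uses the standard 6502 instruction timings and assumes that no branch or indexed access crosses a page boundary and that the VIC steals no cycles (screen blanked); the clock value is the approximate PAL figure.

```python
PAL_CLOCK_HZ = 985248                     # approximate PAL C64 CPU clock

def cycles_per_sample(delay, page_wrap=False):
    """Cycles for one pass of the simple loop, for a given ldx #delay."""
    inner = 5 * delay - 1                 # dex (2) + bne taken (3); last bne is 2
    cycles = 5 + 4 + 2 + inner + 2        # lda (zp),y, sta abs, ldx #, delay, iny
    if page_wrap:
        cycles += 2 + 5 + 3               # bne not taken, inc zp, jmp abs
    else:
        cycles += 3                       # bne taken
    return cycles

def sample_rate_hz(delay, clock_hz=PAL_CLOCK_HZ):
    """Playback rate within a page, ignoring the page-wrap hiccup."""
    return clock_hz / cycles_per_sample(delay)

print(cycles_per_sample(5), cycles_per_sample(5, page_wrap=True))   # 40 47
print(round(sample_rate_hz(5)))
```

So with a delay of 5, one sample in every 256 is held for 47 cycles instead of 40; padding the common path to match the slow path removes the difference.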
These problems can be solved by careful cycle counting and adding NOP
and harmless BIT instructions in strategic places to make each
iteration the same number of cycles, regardless of which branch is
taken - people who have written a stable raster routine, or done some
Atari 2600 coding have likely done this sort of painstaking work
before.
NMI-driven digis
----------------
More commonly, however, we enlist the help of CIA #2 and have it
generate regular Non-Maskable Interrupts which we use to call our digi
player. This has two important advantages - first, it makes timing
much more simple. Second, it frees your main program to do other
things while the digi is playing "in the background".
To experiment, I pulled a 4-bit packed digi from the extras disk
included with Super Snapshot 5.22. It's the beginning seconds of the
introduction to Classic Star Trek (Space, the Final Frontier).
Here's the source for a fairly "frills-free" NMI based digi player,
with my comments after blocks of code:
start    = $1400
end      = $7cff
freq     = 141
ptr      = $fd
Labels start and end simply point to the beginning and end of the
digi. Freq isn't actually the frequency - it's the number of
processor cycles between interrupts necessary to play the digi at the
desired speed/pitch. If you know the frequency (in Hz) of the digi,
simply divide your CIA clock speed (approximately 1000000 Hz) by the
digi frequency. In this case, the digi runs at approximately 7100 Hz.
We use two zero page locations to form a 16-bit pointer to the current
sample in the digi to play.
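
The division can be done ahead of time on any machine. The sketch below follows the rule given above (clock divided by digi frequency); the function names are invented for the example, and the PAL and NTSC clock values are approximate figures supplied here, not taken from the player source.

```python
PAL_CLOCK_HZ = 985248      # approximate PAL clock
NTSC_CLOCK_HZ = 1022727    # approximate NTSC clock

def timer_value(sample_rate_hz, clock_hz=1000000):
    """Cycles between NMIs for the wanted sample rate."""
    value = round(clock_hz / sample_rate_hz)
    if not 1 <= value <= 0xFFFF:
        raise ValueError("does not fit in a 16-bit CIA timer")
    return value

def latch_bytes(value):
    """Split the timer value into the low and high bytes of the timer latch."""
    return value & 0xFF, value >> 8

print(timer_value(7100))                      # 141
print(timer_value(7100, PAL_CLOCK_HZ))        # 139
print(timer_value(7100, NTSC_CLOCK_HZ))       # 144
print(latch_bytes(timer_value(7100)))         # (141, 0)
```

Using the actual clock of the machine instead of the round 1000000 figure keeps the pitch the same on PAL and NTSC systems.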
         *= $1000

         ;disable interrupts
         lda #$7f
         sta $dc0d
         sta $dd0d
         lda $dc0d
         lda $dd0d
         sei
This code disables all interrupt sources on both CIAs, reads both interrupt
control registers to acknowledge anything already pending, and then masks IRQs
with SEI.
         ;blank screen
         lda $d011
         and #255-16
         sta $d011
Just like erratically timed code can introduce distortion when a digi is
played back, the VIC steals cycles from the processor that can cause
interrupts to not occur precisely when you'd like them to. This routine will
work without the screen blanked, but the extra noise introduced when the
screen is on is noticeable when the time between samples is less than around
2.5-3 times the time the processor is stopped. Another option is to use some
multiple of the raster timing as the sampling rate, and start the routine on a
non-badline, to ensure that the interrupts never occur on a badline. (A final
option is to use a raster-driven interrupt for the digi; with the SCPU, it is
actually possible to drive an IFLI display and play a digi at the same time,
badlines and all -- email Robin for more info, or maybe wait for a future
article!). But the simplest thing to do is to blank the screen :).
;switch out roms
lda #$35
sta 1
;point to our player routine
lda #<nmi
sta $fffa
lda #>nmi
sta $fffb
Unless using the KERNAL routines is necessary in my program, I always
switch out the ROMs. One of the biggest benefits is that our NMI
routine will be immediately called, rather than using $0318/$0319 and
waiting for the KERNAL to indirectly call your routine.
;initialize player
lda #<start
sta ptr
lda #>start
sta ptr+1
ldy #0
sty flag
lda (ptr),y
sta sample
This section simply initializes the various memory locations that the
player uses - sets ptr/ptr+1 to point to the beginning of the digi,
loads the first sample, and clears the flag that handles the
alternating between the lower and upper nybble of the packed samples.
;setup CIA #2
lda #<freq
sta $dd04
lda #>freq
sta $dd05
Sets Timer A on CIA #2 to freq.
lda #%10000001
sta $dd0d
Enables Timer A interrupts on CIA #2.
lda #%00010001
sta $dd0e
Loads the latch into Timer A and starts it in continuous mode. As soon as
Timer A counts down to zero, it is automatically reloaded with the value last
written to $dd04/$dd05 and begins counting down again.
endless jmp endless
For this example, we just put the computer in an endless loop.
nmi
pha
txa
pha
tya
pha
;play 4-bit sample
lda sample
and #15
sta $d418
We play the sample while all the code is still linear - before any
branches have occurred. This is to minimize the distorting effects I
mentioned earlier. The AND #15 is used so we don't inadvertently
enable the filter bits in $d418 with the high nybble packed into
sample.
;clear NMI source
lda $dd0d
By reading $dd0d, we are acknowledging the source of the interrupt,
and the CIA will now generate another interrupt next time Timer A
counts down to zero.
;just something to look at
inc $d020
;every other NMI do 1) or 2):
lda flag
bne lower
Now we deal with "unpacking" the samples.
;1) shift upper nybble down
upper lda sample
lsr a
lsr a
lsr a
lsr a
sta sample
jmp exit
When flag is set to zero, we shift the high nybble of sample down to
the low nybble so it's ready to be played next NMI.
;2) get a new packed sample
; then point to next
lower ldy #0
lda (ptr),y
sta sample
inc ptr
bne checkend
inc ptr+1
When flag is set to one, we load a new packed sample into sample, and
point ptr at the next packed sample.
;if end of sample, point to
;beginning again
checkend lda ptr
cmp #<end
bne exit
lda ptr+1
cmp #>end
bne exit
lda #<start
sta ptr
lda #>start
sta ptr+1
Simply check for the end of the digi, and if we've reached it, loop
back to the beginning of the digi.
;toggle flag and exit NMI
exit lda flag
eor #1
sta flag
pla
tay
pla
tax
pla
rti
;sample's lower nybble holds
;the 4-bit sample to be played
;next NMI - the upper nybble
;holds the next nybble to be
;played on "odd" NMIs, and is
;undefined on "even" NMIs.
sample .byte 0
;flag simply toggles between 0
;and 1 - used to decide whether
;to play upper or lower nybble
flag .byte 0
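The nybble order used by the player (low nybble first, then high nybble) can be modelled off-line with a few lines of Python. This is only a sketch of the playback order; the name `unpack_nybbles` is made up for illustration, and interrupt timing and the loop-back at the end of the digi are not modelled.

```python
def unpack_nybbles(packed):
    """Yield 4-bit samples: low nybble first, then high nybble, per byte."""
    for byte in packed:
        yield byte & 0x0F          # what AND #15 leaves for $d418
        yield (byte >> 4) & 0x0F   # what the four LSRs move into place

samples = list(unpack_nybbles(bytes([0x21, 0xF3])))
print(samples)
```

A converter that packs a digi for this player would have to store the samples in the same low-then-high order.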
Improving D418 Digis
--------------------
D418 digis tend to generate a lot of noise because of the 4-bit sample
resolution, of course. Over the years people have come up with numerous tricks
to improve the sound of a d418 digi; here are some that we know of and have tried.
The first, and most obvious, thing to try is the low-pass filter, since
a lot of the noise is at higher frequencies. Unfortunately this won't work,
since the filters occur in SID before the volume amplifier -- all the filters
can do is change the DC offset that makes the digi possible. This trick
will work for methods that use SID voices, however (such as Pulse Width
Modulation, discussed in the next section).
Another trick is to "dither" the sound, as discussed in C=Hacking #11. The
idea here is to generate an intermediate "average" value by toggling between
two values. For example, if d418 is set to '8' half of the time, and '9' the
other half, its 'average' value will be 8.5. So this is somewhat like adding
an extra bit of resolution. In principle, you can extend this further: if it
is '8' one-third of the time and '9' for the remaining two-thirds, the average
value will be about 8.67. And so on.
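The toggling idea can be sketched with a small integer accumulator, which is roughly what a 6502 routine would have to do as well. This is an illustrative Python sketch, not code from C=Hacking #11; the function name `dither` and its parameters are assumptions.

```python
def dither(low, num, den, count):
    """Return `count` output levels averaging low + num/den (0 <= num <= den)."""
    out, acc = [], 0
    for _ in range(count):
        acc += num
        if acc >= den:       # accumulated fraction overflowed: play the upper level
            acc -= den
            out.append(low + 1)
        else:
            out.append(low)
    return out

half = dither(8, 1, 2, 4)     # '8' half the time, '9' the other half
thirds = dither(8, 2, 3, 3)   # '8' one-third, '9' two-thirds
print(half, sum(half) / len(half))
print(thirds, sum(thirds) / len(thirds))
```

Each intermediate level costs extra $d418 writes per original sample, so the playback rate has to rise accordingly.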
Now, we aren't _really_ increasing the sample resolution here, but are instead
increasing the sample playback rate -- we're playing two samples ('8' and '9'
for example) where before we played just one. Don't get too carried away
thinking about "average" voltage levels (after all, there is an average
voltage for the entire digi but that's not what you hear!) -- what's important
is how well the sampled signal represents the original signal. If the
original signal is rising from 8 to 9 during the sample interval, this type
of trick will work well.
Which leads us to another trick: interpolation. This is really a compression
trick, more than a 'resolution' trick. Let's say that one sample value is 5,
and the next value is 9. It might be reasonable to expect an 'intermediate'
value of 7, to play right after the 5. Once again, the idea is to increase
the playback rate to better-represent the original signal. This type of trick
increases the playback rate without increasing the amount of data -- and as
always, your mileage may vary. Many modern soundcards and CD-players use
interpolation.
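A simple form of this interpolation (a plain midpoint between neighbouring samples) might look like the following Python sketch. The helper name `interpolate` is hypothetical, and real players may use other interpolation schemes.

```python
def interpolate(samples):
    """Insert the rounded-down midpoint between each pair of neighbours."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) // 2)   # the 'intermediate' value
    out.append(samples[-1])
    return out

print(interpolate([5, 9]))
print(interpolate([5, 9, 8]))
```

The output is played at twice the original rate, while the stored data stays the same size.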
Another curious trick is to add noise to the signal -- that is, the 4-bit
sample corresponds to the original signal plus noise. Sometimes, by adding
noise to the signal playback the noise can actually cancel! The 'dithering'
trick above can be viewed in this way.
Boosting 8580 Digis
-------------------
As most people know, there are 'old' SIDs (6581) and 'new' SIDs (8580), and
$d418 digis do not work right on 8580 SIDs (such as in the 128D, most 128s,
and the 64C) for the reasons discussed earlier -- the 8580 does not have a
residual voltage leading into the amplitude modulator.
The software fix for this is pretty simple: have SID generate a signal, and
hence a voltage, for the volume register to modify. You can actually use
pretty much any waveform to do this, but a pulse is the simplest, since a
pulse wave just toggles between two voltage levels. Moreover, page 463 of
the PRG says, "The TEST bit, when set to a one, resets and locks Oscillator 1
at zero until the TEST bit is cleared. The Noise waveform output of
Oscillator 1 is also reset and the Pulse waveform output is held at a DC
level." So it's not really necessary to worry about the frequency or pulse
width, by using the test bit.
BUT -- it is very important to set the sustain level to $f. The ADSR envelope
generators generate the voltage. A sustain level of 0 gives no improvement.
So, to 'boost' a digi on a later-model SID, you can just turn on a pulse with
the test bit set:
LDA #$FF
STA $D406
LDA #$49
STA $D404
Setting more voices gives the digi a substantial extra boost:
LDA #$FF
STA $D406
STA $D406+7
STA $D406+14
LDA #$49
STA $D404
STA $D404+7
STA $D404+14
The moral is: if you're writing a digi routine, and want it to work on all
computers, be sure to boost the digi.
And for completeness, using more channels is a commonly used trick to enhance
digi resolution on the Plus/4. The TED digi resolution (the volume register)
is 3 bits. Fortunately, all channel on/off bits + the volume level are in the
same register ($ff11). If one source is on, the output DC is about half of the
level when both are turned on. This trick can be extended further to result
in a 'semi 4-bit' or 5-bit digi table (the dynamic range is enhanced, but
there are larger steps at the table end than at the start). This trick could
also be used in SID if the sound sources were accurately preset, but runs into
problems due to the non-matching SID-versions and having the control bits in
multiple registers.
SID Type Auto-Detect
--------------------
The following routine will detect what type of SID is in use. I've
tested it on a fair cross-section of my collection of computers - my
NTSC 128D, two 64Cs, two "breadbox" C-64s, and my PAL breadbox 64. In
all cases the code performed 100% accurately - but still, there may be
cases where it fails. I'd be interested to know if anyone finds any
faults in the routine, so I can improve it!
How does the routine work? I was told that the old SID (6581) and the
new SID (8580) behave differently when set to play combined
waveforms. I coded a fairly simple routine to use the REU to sample
$d41b (the upper 8 bits of Oscillator 3's waveform output) for a full
64k bank. Then I experimented with various frequencies and
combinations of waveforms on Oscillator 3 until I found consistently
different results with the two different SIDs.
When I combined the triangle and sawtooth waveforms and then sampled
$d41b I found that most of the time the oscillator was just putting
out zeros, with occasional bursts of numbers. These "bursts" were
consistently near $ff on the 8580, while the 6581 was always well
below $80 - often $3f was the highest it would get.
So, the detection code ended up being quite simple - I'll explain each
block of code:
*= $4000
start sei
lda #11
sta $d011
Disable bad-lines (by blanking the screen). This prevents badlines
from interfering with the detection process.
;sid setup here!
lda #$20
sta $d40e
sta $d40f
Set Oscillator 3's Frequency Control to $2020. I just randomly chose
this value when experimenting, and it worked, so I kept it. The trick
here is to set a value fast enough that the oscillator will make a
number of cycles (so we can get a good sample of the values coming
out) but not so fast that it might miss any of the "bursts" I was
mentioning earlier.
lda #%00110001
sta $d412
Combine the triangle and sawtooth waveforms and start the ADSR cycle.
ldx #0
stx high
loop lda $d41b
cmp high
bcc ahead
sta high
ahead dex
bne loop
This loop takes 256 samples of Oscillator 3's output, saving the
highest value in location high.
lda #%00110000
sta $d412
Clear the gate bit, which ends the note on voice 3 (the oscillator itself
keeps running).
cli
lda #27
sta $d011
Turn the screen back on.
lda high
rts
high .byte 0
Return from the routine with the highest value sampled from Oscillator
3 in the accumulator. This allows you to branch based on the high
bit:
bmi SID8580
bpl SID6581
Voila!
======================
Pulse Width Modulation
======================
The primary limitation of using the volume register is, of course,
that it is only 4 bits. Pulse width modulation (PWM) allows us to get
around that limitation.
In general, there are lots of ways of transmitting information.
If you've ever used a radio you've encountered both amplitude modulation,
where the signal is encoded as the amplitude of some carrier wave, and
frequency modulation, where the signal is encoded by changing the frequency
of the carrier wave. In both cases, the idea is to strip out the encoded
information and throw away the carrier.
Yet another possibility is pulse width modulation: use a pulse
wave at some carrier frequency, and modulate the pulse width. Pulse width
modulation has several nice properties for transmitting signals; we can
take advantage of it to play digis.
Pulse waves, of course, take on only two possible values: zero and
one (low and high, etc.). Over a single period, a pulse wave will in general
be low for some amount of time and then high for some amount of time.
The _duty cycle_ of a pulse wave is the amount of time it spends in the high
state compared to the total period. For example, a square wave, which is
low exactly half the time and high the other half, has a duty cycle of 50%:
______ ______
| | | |
| | | |
_____| |____| |____ ...
Remember that, regarding SID, a signal like the above is simply a voltage
level. What is the _average_ voltage over a single period? Since a square
wave is zero half the time and one the other half the average value is
just 1/2. If instead the pulse had a duty cycle of 75%, it would be low
for 1/4 the cycle and high for 3/4, giving an average value of 3/4.
So the _average_ value of a single pulse is simply the duty cycle. So if
we change the duty cycle for each pulse we can essentially generate a
series of average voltage values -- and since a digi is nothing more than
a series of average signal values, we can use PWM to play a digi.
To make this more precise, let's say we had a digi sampled at 1KHz -- one
thousand samples per second. Since each sample value will be approximated
by a pulse, we need one thousand pulses per second. The duty cycle of
the first pulse will be the first sample value, the duty cycle of the
second pulse will be the second sample value, and so on. Note that
the sample rate is the carrier frequency -- the frequency of the modulated
pulse train, 1KHz in this case.
(Actually, to be more accurate, we need _at least_ 1000 pulses per second --
for example, we could use 2000 pulses per second, and represent each sample
value using two pulses. So the more correct statement is that the pulse
carrier frequency is the maximum sample playback frequency.)
The advantage for playing C64 digis is that we have much more resolution
for the pulse width, though probably not in the way you think. You
are probably thinking that SID has this nice 12-bit pulse width that
we can use here. The problem is that the absolute highest frequency SID
can produce, using the frequency registers, is about 4KHz, which would
be the maximum playback rate.
There's still another catch -- the carrier wave is still there! Imagine
trying to encode a signal that was constant, say 1/2 everywhere. To
generate a "digi" value of 1/2, you'd use a square wave, half down and
half up. So while the _average_ value of each pulse would be 1/2, the
actual signal would be a square wave at the carrier frequency (look at
the little picture above if you don't see it -- its average value is 1/2).
Trying to modulate a 4KHz carrier wave results in a piercing 4KHz tone,
and a _maximum_ sample rate of 4KHz (and this assumes that you can sync your
code up exactly with SID). So that's pretty worthless for digis.
- BUT -
What if we could change the voltage level manually? Let's say some
hypothetical machine language program toggled the voltage level
on each machine cycle -- the result would be a square wave of
frequency 0.5 _mega_ hertz. Okay, let's say it changed the voltage
level every 10 machine cycles -- the result would be a carrier
frequency of around 50 KHz. The point here is that a machine language
program can generate its own pulse waveform, and do so at much higher
frequencies than SID can produce.
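The two figures above follow from the fact that one full period needs two toggles (one high phase, one low phase). A tiny Python sketch, assuming a round 1 MHz clock and using the made-up name `toggle_carrier_hz`:

```python
def toggle_carrier_hz(clock_hz, cycles_per_toggle):
    """Carrier frequency when the level flips every `cycles_per_toggle` cycles."""
    return clock_hz / (2 * cycles_per_toggle)   # two toggles per period

every_cycle = toggle_carrier_hz(1_000_000, 1)
every_ten = toggle_carrier_hz(1_000_000, 10)
print(every_cycle, every_ten)
```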
Toggling the voltage levels turns out to be very simple. As was
described earlier, the way to "boost" digis on later SIDs is to use
a pulse waveform at frequency zero. Depending on the value of the
pulse width register, SID will set the output voltage to either high
or low. So all a program has to do is set up a pulse waveform at zero
frequency and use the pulse width registers to toggle the voltage --
set $d403 to either $00 or $ff to toggle low/high. (You could also use
$d418 to toggle low/hi, but this method should produce more uniform
results, and unlike $d418 can be filtered).
So now we're cooking -- we've got a program that can generate a pulse
train. The next step is to change the width of each pulse to represent
the sample values in our digi. Remember that the duty cycle -- the
percentage of time the pulse spends high -- is the average value for that
pulse. But also remember that each digi sample represents an average
value over the sample period. If the pulse period is equal to the sample
period, then _the duty cycle is exactly the sample value_!
Example: let's say that we have an 8-bit sampled digi, so that values go
from 0-255, and our program generates pulses with a period of 256 "ticks".
Now pick a sample value, say 56. All the program has to do is hold the
pulse high for 56 "ticks", and low for the remaining 256-56 = 200 "ticks",
and it will have the correct average value: 56/256. So a program to play
8-bit samples might look like
1 - Load .X with next sample value
2 - Load .Y with 256-.X
3 - Set pulse high
4 - Loop for .X iterations (each loop iteration is one "tick")
5 - Set pulse low
6 - Loop for .Y iterations
7 - Loop back to step 1
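The steps above can be modelled in Python to confirm that the average of one pulse really is the sample value over the period. This is a sketch of the arithmetic only (no timing); `pwm_period` is a hypothetical helper name.

```python
def pwm_period(sample, ticks=256):
    """Return one pulse period as a list of levels: high first, then low."""
    if not 0 <= sample <= ticks:
        raise ValueError("sample must be between 0 and ticks")
    return [1] * sample + [0] * (ticks - sample)

period = pwm_period(56)
average = sum(period) / len(period)
print(len(period), period.count(1), period.count(0), average)
```

With a sample value of 56 the pulse is high for 56 ticks and low for 200, so the average is 56/256.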
Let's say that each "tick" takes m cycles, and the sample size is 2^n, so
that there are 2^n ticks per sample. A stock machine runs at around
10^6 cycles/second, so...
(10^6 cycles/second) / (2^n ticks/sample * m cycles/tick)
= 10^6 cycles/second / (m * 2^n cycles/sample)
= 10^6 / (m * 2^n) samples/second
So, for example, let's say we had n=6-bit samples -- 2^6 = 64 -- and could
generate pulses with a resolution of one machine cycle -- m=1. Then
we could play that 6-bit sample at 10^6/64 = 15.6KHz. That is _really
very good_! In principle -- possibly using the CIA timers, possibly using
fixed delay loops, possibly using a massively unrolled loop -- this can
be done on a stock machine. (I did try using the CIA timers, but the
number of cycles to set up the timers was too big, and made it sound poor;
I've included the code below though.)
At this point it becomes a numbers game. As we increase the sample size
(n above) or the number of cycles per tick (m), we _decrease_ the sampling
rate -- if, in the above example, we instead use 8-bit samples, the sampling
frequency drops by a factor of four to around 4 KHz. So there's a tradeoff
between resolution and sampling frequency.
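The tradeoff is easy to tabulate. The following Python sketch evaluates the formula 10^6 / (m * 2^n) for a stock machine, assuming a round 1 MHz clock; the function name `pwm_sample_rate` is an assumption for illustration.

```python
def pwm_sample_rate(clock_hz, cycles_per_tick, bits):
    """Samples per second for 2**bits ticks per sample."""
    return clock_hz / (cycles_per_tick * 2 ** bits)

six_bit = pwm_sample_rate(1_000_000, 1, 6)
eight_bit = pwm_sample_rate(1_000_000, 1, 8)
for bits in range(4, 9):
    print(bits, "bits:", pwm_sample_rate(1_000_000, 1, bits), "Hz")
```

Every extra bit of resolution halves the playback rate, and with it the carrier frequency.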
AND... we still have this issue of the carrier frequency. You should be
able to convince yourself that the sampling frequency above is exactly
the carrier frequency. So with the 8-bit resolution example there
would be an awful 4KHz tone running through the playback. There are
only two ways to beat the carrier frequency: push it high enough that
you no longer hear it, or else push it high enough that you can use the
filters to dampen it down.
How high is high enough? You can judge for yourself, but 15 KHz is
pretty tough to hear, unless you have good ears and the volume is really
loud -- so 6-bit samples are within reach on a stock machine.
But add a SuperCPU into the picture, and the numbers get _really_ nice.
Everyone knows that a SCPU can interact with the C64 at 1MHz, and
hence generate pulses with 1MHz resolution, using code like
lda #$ff
sta $d403 ;Set level high
:loop lda $d011 ;wait for C64 cycle
dex
bne :loop
where .X contains the sample value. But what happens if we try to move
beyond that 1MHz? What if we put some NOPs into the above delay loop,
in place of the lda $d011? Well, in principle it means that the duty
cycles won't always be right, which corresponds to some sampling error.
In practice, however, it works _really well_! Consider what happens when
the above code is changed to:
:loop
nop
nop
dex
bne :loop
The earlier formula still applies, but now using 20MHz cycles:
20 * 10^6 / (m * 2^n) samples/second
In this example each loop iteration -- each "tick" -- is nine 20MHz cycles,
giving a playback rate of approximately 17Khz for 7-bit samples. Which
is TOTALLY COOL!
And it can even be pushed to 8-bit samples (although I personally don't think
they sound any better, at least with the code I've tried; maybe the code can
be improved). Using loops like
:loop
dex
beq :done
dex
beq :done
...
dex
bne :loop
:done
it is possible to "fine-tune" the loop tick to somewhere between 4-5 cycles,
giving a playback rate between 15KHz and 19KHz, for an 8-bit sample. Pretty
cool. The code is also a little more involved (with 7-bit samples we can
use BMI for the loop branches; not so with 8-bits). But it really is
possible to play 8-bit samples at 19KHz on a C64 (plus SuperCPU).
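The SuperCPU figures can be checked the same way. The sketch below adds up the cycle counts for the NOP/NOP/DEX/BNE loop and evaluates the 20MHz formula; the cycle counts (2 for NOP and DEX, 3 for a taken branch) and the helper names are assumptions for illustration, and the 4 and 5 cycle ticks are simply the range quoted above.

```python
SCPU_CLOCK = 20_000_000

# Assumed cycle counts for one pass of the delay loop.
LOOP_CYCLES = {"nop_1": 2, "nop_2": 2, "dex": 2, "bne_taken": 3}

def pwm_sample_rate(clock_hz, cycles_per_tick, bits):
    """Samples per second for 2**bits ticks per sample."""
    return clock_hz / (cycles_per_tick * 2 ** bits)

tick = sum(LOOP_CYCLES.values())
seven_bit = pwm_sample_rate(SCPU_CLOCK, tick, 7)
eight_bit_fast = pwm_sample_rate(SCPU_CLOCK, 4, 8)
eight_bit_slow = pwm_sample_rate(SCPU_CLOCK, 5, 8)
print(tick, round(seven_bit), eight_bit_fast, eight_bit_slow)
```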
Using two voices
----------------
You may be thinking, Hey, we've got three pulse waves to work with, can
we improve the performance by using multiple pulses?
Let's say we have two pulses, P1 and P2, with the same period. When both
are activated, the pulses simply add together -- that is, the total voltage
is just the sum of the individual voltages, and therefore the _average_
voltage is the sum of the individual pulse averages:
avg voltage = D1 + D2
where D1 and D2 are the duty cycles of pulses P1 and P2. In the simplest
case, this gives us an extra bit of resolution -- if D1 and D2 are both
7-bit values, say, then D1+D2 is an 8-bit value.
-BUT-
Consider, for a moment, what would happen if we were to change the amplitude
of the second pulse -- that is, let's say the maximum voltage it took on
was 1/16 of the maximum voltage of the first pulse. The average voltage
would then be
avg = D1 + D2/16
This then gives us _four_ extra bits of resolution, with each bit to the
_right_ of the decimal place. For example, if D1 and D2 are 4-bit numbers,
with D1=xxxx and D2=yyyy, then the avg will be a number like xxxx.yyyy
(four bits to the left of the decimal place and four to the right).
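In the ideal case, splitting an 8-bit sample into two 4-bit duty cycles and weighting the second pulse by 1/16 reproduces the sample exactly. A Python sketch of that arithmetic, with hypothetical helper names and an ideal (not measured) 1/16 ratio:

```python
def split_sample(sample):
    """Split an 8-bit sample into (D1, D2): high nybble and low nybble."""
    if not 0 <= sample <= 255:
        raise ValueError("sample must fit in 8 bits")
    return sample >> 4, sample & 0x0F

def combined_average(d1, d2, ratio=16):
    """Average level when pulse 2 has 1/ratio the amplitude of pulse 1."""
    return d1 + d2 / ratio

d1, d2 = split_sample(0xA7)
print(d1, d2, combined_average(d1, d2))
```

Here 0xA7 becomes D1 = 10 and D2 = 7, and 10 + 7/16 equals 0xA7/16.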
Of course, we can change the pulse amplitude by changing the sustain
setting, so in principle this gives a very easy and efficient way of
playing high-resolution digis. In practice, I have not been able to
make it work very well. I used a sustain setting of 1 and split an
8-bit sample into two 4-bit pulses; I believe the result sounds better
than 4-bits, but certainly doesn't sound anywhere near 8-bits. My
suspicion is that it is because the second pulse voltage is not really
1/16 of the first pulse, which corresponds once again to adding noise
to the sample value.
To find out, we can just measure the output at different sustain levels.
The following table gives the voltage output for voice 1 using a pulse
waveform at zero frequency and volume 15:
        Pulse Width        Diff
SU     000     fff     000    fff
0f     6.34    5.29    .08    .07
0e     6.26    5.36    .02    .01
0d     6.24    5.37    .06    .05
0c     6.18    5.42    .03    .02
0b     6.15    5.44    .05    .03
0a     6.10    5.47    .03    .02
09     6.07    5.49    .04    .02
08     6.03    5.51    .03    .02
07     6.00    5.53    .05    .03
06     5.95    5.56    .03    .02
05     5.92    5.58    .05    .02
04     5.87    5.60    .04    .03
03     5.83    5.63    .04    .02
02     5.79    5.65    .06    .02
01     5.75    5.67    .05    .02
00     5.70    5.69
Voice 2 is identical within a few hundredths of a volt. If this test is
repeated using voices 1 and 2 simultaneously, the result is:
        Pulse Width
SU     000     fff
0f     7.30    5.25
0e     7.12    5.36
0d     7.09    5.37  (!)
0c     6.95    5.46
0b     6.88    5.49
0a     6.78    5.54
09     6.72    5.58
08     6.62    5.62
07     6.58    5.65
06     6.47    5.70
05     6.40    5.73
04     6.31    5.78
03     6.22    5.82
02     6.13    5.87
01     6.07    5.90
00     5.97    5.95
Note the weird step at $0d -- the response is definitely not linear!
Now, to summarize, when using one voice, the "positive" amplitude (about the
mean 5.70V) is .64V and the "negative" amplitude is .41V, giving a spread of
1.05V. With two voices together, the amplitudes are 1.33V, 0.72V, and 2.05V
respectively. If the two signals were simply added together, the numbers
should be 1.28V, 0.82V, and 2.1V.
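These amplitudes come straight from the sustain $0f and $00 rows of the two tables. The Python sketch below redoes the subtraction; taking the sustain-0, pulse-width-000 reading as the rest level is an assumption that matches the figures quoted above, and the dictionary and function names are made up for illustration.

```python
# Readings in volts, copied from the tables (sustain $0f rows, and sustain $00 as rest).
one_voice = {"high": 6.34, "low": 5.29, "rest": 5.70}
two_voices = {"high": 7.30, "low": 5.25, "rest": 5.97}

def amplitudes(v):
    """Return (positive, negative, spread) amplitudes about the rest level."""
    pos = round(v["high"] - v["rest"], 2)
    neg = round(v["rest"] - v["low"], 2)
    spread = round(v["high"] - v["low"], 2)
    return pos, neg, spread

single = amplitudes(one_voice)
double = amplitudes(two_voices)
ideal = tuple(round(2 * x, 2) for x in single)   # if the voices simply added
print(single, double, ideal)
```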
What we originally wanted was a signal like
D1 + D2/16
that is, another pulse that is 1/16 the value of the 'full' pulse. 1/16 of
the positive amplitude is .64V/16 = .04V, and 1/16 of the negative amplitude
is .41V/16 = .026V. A setting of sustain level 1, on the other hand, gives
voltage offsets of 0.05 and 0.02, so the second pulse is scaled by approximately
.64V / .05V = 12.8, i.e. D2 / 12.8
.41V / .02V = 20.5, i.e. D2 / 20.5
So, in summary, whereas I wanted D1 + D2/16, I was actually getting something
that varied from D1 + D2/12.8 to D1 + D2/20.5, even if the two voices summed
together correctly.
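The mismatch can be put into numbers with a short Python sketch, using only the measured values quoted above (full amplitudes of .64V and .41V, sustain-1 offsets of .05V and .02V). The function name `scale_ratio` is hypothetical.

```python
def scale_ratio(full_amplitude, sustain1_offset):
    """How many times smaller the sustain-1 pulse is than the full pulse."""
    return full_amplitude / sustain1_offset

positive = round(scale_ratio(0.64, 0.05), 1)
negative = round(scale_ratio(0.41, 0.02), 1)
wanted_pos = round(0.64 / 16, 3)   # what a true 1/16 pulse would need
wanted_neg = round(0.41 / 16, 3)
print(positive, negative, wanted_pos, wanted_neg)
```

A ratio of 16 on both sides would be needed for the low nybble to land exactly in the four fractional bits.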
There may still be a way to make all this work right, which would be great,
but I'm tired :). The code from my attempts is below.
I also could not get two 7-bit pulses to sound like an 8-bit pulse. I took
an 8-bit sample and divided it in half, assigning each half to a pulse
(and giving the extra bit to pulse 2, if an extra bit was present).
I suspect that another issue is that it is impossible to update both
pulses simultaneously, meaning some delay between pulses, which translates
to adding -- surprise! -- noise to the signal. Perhaps it would be
more effective at lower resolutions, however.
If someone has some success using these techniques I'd be interested in
hearing it.
SID lockups
-----------
Blindly applying these PWM algorithms has a way of locking up SID -- like,
locking him up hard. To be honest, I don't have a good explanation for why
this happens, and I haven't yet found a good method of prevention -- toggling
the test bit, playing a real sound for a short time, toggling the gate bit,
and so on, just don't seem to "initialize" SID reliably enough. Sometimes the
code works, and sometimes it doesn't -- it's the same code both times. Often
resetting the machine will make things work; I'm not sure what hardware resets
take place within SID, but the kernal certainly zeros him out so that's a
possibility. The other observation is that playing a tune seems to 'clear
out' whatever is blocking SID. So there _must_ be some kind of software
solution to the problem.
In the example code pressing RESTORE restarts the code, which will usually
clear the 'blockage' after a tap or two, if it happens.
If anyone has some thoughts on this issue (or even better, an explanation
of what is going on!) I'd love to hear them.
Pulse Width Modulation, continued
--------------------------------- from various
The digi article in issue #20 of C=Hacking left a few loose ends, and
generated some followups.
First, Otto Jarvinen (sounddemon) emailed to say that the SID detection
routine occasionally reported incorrect results for him, and suggested that
a workaround was to do the detect several times. YMMV!
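One way to "do the detect several times" is a simple majority vote over the peak values returned by repeated runs. The Python sketch below shows only the decision logic; the peak values are invented for illustration and the function names are assumptions, not part of the detection routine.

```python
from collections import Counter

def classify(peak):
    """High bit set means 8580, as with the BMI/BPL test on the result."""
    return "8580" if peak & 0x80 else "6581"

def majority(peaks):
    """Return the SID type reported by most runs (use an odd number of runs)."""
    votes = Counter(classify(p) for p in peaks)
    return votes.most_common(1)[0][0]

print(majority([0xFF, 0x3F, 0xFE]))   # hypothetical readings with one bad run
print(majority([0x3F, 0x3E, 0x80]))
```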
Second, a day or two after issue #20 was released, Levente discovered a
brilliant way to play 6-bit PWM digis on a stock machine:
--
I couldn't resist, and tried something out (see attachment). It works!!! :-)
In fact, when I wrote the last letter I didn't know that I found something
useable, just had some ideas - I felt that I'm at the right place. When I read
C=H 20 this morning and read your comment about the Test bit (from the PRG), I
knew that it must work. All I had to do is then to put this idea into code.
The whole idea is about starting the pulse by software, and then having the
SID turn it back to 0 after a time.
Is it possible? ...The keys are the Test bit (the SID wave counter can be
reset at any time), the pulse width register, the wave counter and the SID's way
of generating a pulse wave. (I.e. the pulse wave is high as long as the wave
counter is less than the value in the pulse width register).
Check this algorithm:
- Init: volume at max, voice 1 sustain level max, start attack. Freq is
selected well (=$4000), so the wave counter is incremented by 4 every
processor clock cycle.
Loop:
- load next sample value, and put it to the pulse width low register ($d402;
ensure that $d403 is 0).
- Set test bit, and clear test bit (counter reset).
- Increase sample pointer, some delay, then loop. The delay must be 64 clock
cycles + the time while the Test bit is kept set (4 cycles if using STA $d404
: STX $d404 immediately with pre-loaded values).
What will happen? The 8-bit sample value is put directly into the pulse width
register (the MSBs of the pulse width register are cleared!...). The wave counter
is started (release test bit), and it increases by 4 every CPU cycle (=
counts 256 in 64 cycles). After some time, the counter will reach the value in
the pulse width register. This happens after exactly (8-bit sample value /
4) cycles, because of the above. In this cycle (or the next?...) the SID turns
its pulse output to 0. Voilà!
One must just make sure that the loop length in cycles matches the above
conditions, and then it runs like hell... Since it does exactly the same on
the SID as the other (bit-banging) way, it just does it with some hardware
help, there's also no problem with the 4khz maximum barrier (since the
oscillator is reset every loop).
With little enhancement, it's possible to write an about 7.5 bits player for a
stock C64 by this method. This is what you find in the attachment... The idea
is using all the 3 channels simultaneously. A slightly increased sample value
is written to the three pulse width registers, so the oscillators will finish
the duty cycle one processor cycle later, when there's a carry between
bits(0,1) to the MSBs.
The replay freq is the CPU clk / 68 (~15khz). 64 cycles (variable duty cycle)
+ 4 cycles (constant duty cycle because of the reset time - no problems with
that, it doesn't change (just gives a small constant DC...)).
By similar methods, it should be possible to write a sample player with higher
PWM freq (with less resolution of course, but eliminating this still audible
whistling).
(I tried using the filter to reduce it, but it sounded so bad that I left it
out. It clicked like hell. The FETs got saturated.)
[Richard Atkinson suggested turning down the sustain volumes to avoid this]
See the attachment, and the binary. I think the sample sounds pretty good :-).
(The cut is from 'Greece 2000' by Three drives on a vinyl).
(Another idea that popped up in my mind: since the TED sound generator can
also be reset, I could probably translate this idea to the Plus/4 :-O ).
Best regards,
Levente
--
The binary is available at http://www.ffd2.com/fridge/chacking/ towards the
bottom of the page.
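Levente's description can be checked with a small model of the wave counter: it starts at zero when the test bit is released, advances by 4 per CPU cycle, and the pulse output is high while the counter is below the pulse width value. This Python sketch is an idealised model with made-up names; real SID compare timing may differ by a cycle, as the letter itself hints, and the clock constants are assumed values.

```python
def high_cycles(sample, step=4, window=64):
    """Count the cycles the pulse output stays high after the counter reset."""
    counter = 0
    high = 0
    for _ in range(window):
        if counter < sample:   # pulse is high while counter < pulse width
            high += 1
        counter += step        # freq $4000 adds 4 to the upper 12 bits per cycle
    return high

def replay_rate(clock_hz, loop_cycles=68):
    """Samples per second for a 64 + 4 cycle loop."""
    return clock_hz / loop_cycles

print(high_cycles(0), high_cycles(56), high_cycles(255))
print(round(replay_rate(985_248)), round(replay_rate(1_022_727)))
```

In this model the 8-bit value is effectively rounded up to a multiple of four, which is where the 6 bits of single-voice resolution come from.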
Third, I received a very interesting email from an Apple-II guy, which I'd
like to pass on:
--
Hi!
I found your page as I was searching for something else 6502-related,
and was very interested. Although I have always been aware of the
C64, I have never really been a user--I have used Apple II's since 1980.
I was particularly interested in the article on playing "digis" on the
C64. I became interested in playing digitized sounds on the Apple II
in 1993, after hearing a 3-bit, 11.025 KHz PWM player. At 3 bits, you
can imagine how noisy speech samples were, but the overall effect
for a 1 MHz machine with a 1-bit speaker "toggle" was amazing. It
made me wonder how far this PWM technique could be pushed on a
stock, 1 MHz Apple II (not the somewhat faster, 65816-based IIgs).
The short answer is, much farther than I expected! Robin and Stephen
accurately describe the theoretical PWM limit as 6 bit samples at
about 16 KHz for a stock 1 MHz machine, but, as they point out,
that is not practically realizable for a number of reasons, unless the
play loop is completely unrolled!
Furthermore, in the Apple II world, sampled sounds have acquired a
few standardized sampling rates--mostly as a result of Mac influence,
which was in turn influenced by CD's. The most common rate in the
Apple II world is 11.025 KHz, or one-fourth of the audio CD sampling
rate. This is commonly considered to be "AM radio quality", with a
Nyquist bandwidth of about 5.5 KHz and a practical bandwidth of
4+ KHz, given practical anti-aliasing filters (at the sampling end, not
the playback end).
A frequency of 11.025 KHz is, though high, still painfully audible to
people whose ears are not zonked--a piercing "squeal" running
through every sound. So even though it is possible to write a
practical 6-bit 11.025 KHz PWM player (usually called a SoftDAC
in the Apple II world), the resulting listening experience is disappointing.
So I went to work on a way to do 2x oversampling, and built a 5-bit
22.050 KHz PWM player. It was sad to lose a bit, but the absence
of any audible "carrier" more than compensated for it!
If you have access to an 8-bit Apple II (preferably with lower case,
like a //e), and also preferably with a way of attaching an external
speaker or headphones in place of the miserable 2.75" internal
speaker, then you can easily give it a try and judge for yourself.
I'm pretty proud of the novel design of the code, which I would
characterize as "vectored" unrolled loops, one for every two
pulse duty cycles, which I wrote a BASIC program to write
for me--much less painful for counting cycles!
The package is available on the web at:
http://members.aol.com/MJMahon/index.html
and is called Sound Editor v2.2 (http://members.aol.com/MJMahon/sound22.shk),
since I had to "dress up" the player
into something fun to play with. ;-) An earlier version of Sound Editor
was published on SoftDisk in 1994, IIRC, but this one is a little more
evolved. It also introduced 2:1 ADPCM compression of 8-bit sampled
sounds, to save disk space. It is a lossy compression, but not very
noticeably. The editor package also includes those routines, in 6502
assembly code.
All of this should be trivially adaptable to the stock, 1 MHz C64, with
very good results. By using the filters, you could probably filter out
the 11.025 KHz carrier and return to 6-bit accuracy!
I should note that in the Apple world, sampled sounds are usually
represented as "excess-128" codes, which means that the sign bit
is inverted. This actually simplifies things, since the sample value
is within a few shifts of being the pulse width in cycles.
Let me know what you think!
-michael
--
(Always great to hear from Atari and Apple ][ folks!)
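The bit-depth trade-off in the letter above can be sketched as a rough cycle
budget. This is only an illustration: it assumes a nominal 1 MHz CPU clock
(real Apple II and C64 clocks differ slightly), one pulse-width step per CPU
cycle, and no allowance for loop overhead. The function names are made up
for this sketch and are not taken from Sound Editor.

```python
CPU_HZ = 1_000_000  # assumed nominal clock; actual machines differ slightly

def cycles_per_sample(sample_rate_hz, cpu_hz=CPU_HZ):
    """Whole CPU cycles available in one sample period."""
    return cpu_hz // sample_rate_hz

def max_pwm_bits(sample_rate_hz, cpu_hz=CPU_HZ):
    """Largest n such that 2**n one-cycle pulse widths fit in one period."""
    return cycles_per_sample(sample_rate_hz, cpu_hz).bit_length() - 1

def signed_to_excess128(sample):
    """Convert a signed 8-bit sample (-128..127) by inverting the sign bit."""
    return (sample & 0xFF) ^ 0x80

def excess128_to_width(sample, bits):
    """Shift an excess-128 sample (0..255) down to a 'bits'-bit pulse width."""
    return sample >> (8 - bits)

for rate in (11025, 22050):
    print(rate, cycles_per_sample(rate), max_pwm_bits(rate))
```

Under these assumptions the budget allows 6 bits at 11.025 KHz and 5 bits at
22.050 KHz, which matches the figures in the letter. A real player also has
to fit sample fetching into the same period, so this is an upper bound.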
And finally, I have a little mathematical analysis of PWM and how it compares
to a "straight" digi. Basically, I found some of the PWM explanations a
little unconvincing in issue #20 (even though I wrote them!). For example,
the idea of "average voltage" seems a little funny, since every two samples
has an "average voltage", as does every four, etc. but that set of average
voltages would give a different sounding signal than the original (or
more dramatically, there is an average voltage over a full second of digi
playback, but that's not what you hear!). So I wanted to know how a
PWM signal _really_ compares to a straight digi playback.
Another issue is changing the amplitude of a PWM digi, i.e. using two
pulse waveforms, with one 1/16 the value of the other, to get higher
resolution. If you recall the discussion of digis, the resolution of a PWM
digi depends on the number of pulse widths available, not the amplitude.
Adding two PWM waveforms together does not change the number of pulse widths
available, so I wanted to figure out what changing the amplitude _really_
does to a PWM digi, and if it can really be exploited.
And finally, I wanted to know about the carrier wave (that is so piercing
at lower playback frequencies) -- and once again, how it compares with a
standard digi (which, after all, is stair-stepping the voltages at the
playback rate).
Since the rest of this article is some Fourier analysis that 99% of people
will have zero interest in, I'll put the conclusions here. The first is:
PWM digis and standard digis are essentially identical except at higher
frequencies (apart from a phase shift, which doesn't make any difference to
your ear). The second is: changing the amplitude of a PWM changes the
resolution. More specifically, the amplitude of the pulse multiplies the
digi sample value. If two pulses can be synced close enough, it should
indeed be possible to use two pulses to get a higher resolution. Moreover,
by modulating the amplitude of a single PWM digi, using the $d418 volume
register -- that is, using PWM _and_ $d418 -- it should be possible to get a
higher dynamic range, something that should be a little more achievable using
SID (but maybe not that useful, so I didn't try it out). And finally, a
standard digi has zero amplitude at the carrier frequency.
In other words, after a lot of effort I was able to demonstrate what everyone
already knows.
The analysis doesn't change anything from the previous articles (except
possibly the idea for changing the PWM amplitude to get more dynamic range).
And now, some Fourier analysis. A standard digi just sets the voltage to
the sample value s_j, for a length of time dt (dt = 1/sample rate). The
Fourier transform of a single sample s_j (occurring at time t_j) is
s_j [e^(-iw dt) - 1] * [e^(-iw t_j) / -iw]
where w = angular frequency. Since the above is a little hard to read, I'll
say it in words. The first term is the sample value s_j, which scales
amplitudes at all frequencies. The second term is due to the finite length
of the pulse (evaluating the Fourier integral at the boundaries), and
basically changes the phase of the transform. The third term is like
sin(w)/w -- a sinusoid with decreasing amplitude as frequency increases.
So: the transform goes like sin(w)/w times the sample value, with some phase
effects thrown in (we'll get back to these in a moment).
A PWM digi sets the duty cycle of a pulse to the sample value s_j, giving
a Fourier transform of
[e^(-iw s_j dt) - 1] * [e^(-iw t_j) / -iw]
Compare this with the earlier expression, and you'll see that the sample
value s_j has moved up into the exponent of the "phase term" but that
they're otherwise the same.
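Both closed forms can be checked numerically. The sketch below is only a
sanity check under stated assumptions: s_j is treated as a duty cycle between
0 and 1, the PWM pulse has unit height and starts at t_j, and the values of
dt, t_j and w are arbitrary illustrative choices. Each closed form is
compared against a brute-force midpoint-rule integral of e^(-iwt).

```python
import cmath

def standard_ft(s, w, dt, t0):
    """Closed form: a sample of height s held for dt, starting at t0."""
    return s * (cmath.exp(-1j * w * dt) - 1) * cmath.exp(-1j * w * t0) / (-1j * w)

def pwm_ft(s, w, dt, t0):
    """Closed form: a unit-height pulse of width s*dt, starting at t0."""
    return (cmath.exp(-1j * w * s * dt) - 1) * cmath.exp(-1j * w * t0) / (-1j * w)

def numeric_ft(height, width, w, t0, steps=20000):
    """Midpoint-rule integral of height * e^(-iwt) over [t0, t0 + width]."""
    h = width / steps
    total = sum(cmath.exp(-1j * w * (t0 + (k + 0.5) * h)) for k in range(steps))
    return height * total * h

dt, t0, s = 1 / 8000, 3 / 8000, 0.25   # illustrative values only
w = 2 * cmath.pi * 1000                # 1 KHz, as an angular frequency

print(abs(standard_ft(s, w, dt, t0) - numeric_ft(s, dt, w, t0)))
print(abs(pwm_ft(s, w, dt, t0) - numeric_ft(1.0, s * dt, w, t0)))
```

Both printed differences should be tiny compared with the transform values
themselves, which are on the order of s*dt.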
The first thing to do is to show that both expressions, PWM and standard,
reduce to the same thing -- that is, that a PWM and a standard digi sound
the same! The expressions both decrease as 1/frequency, due to the
sin(w)/w term. This means that at large frequencies the values become
negligible. (How large? For example, if the sample frequency is just 1KHz,
then sin(w)/w is on the order of 1000 times smaller near w=1KHz (i.e. the
sample frequency, which is twice the Nyquist limit) than it is near w=0.)
So now consider the phase terms for small w. The Taylor expansion for e^x is
1 + x + x^2/2 + ...
We can therefore expand the "phase terms" as
regular: e^(-iw dt) - 1 = (1 - iw*dt - w^2 dt^2/2 + ...) - 1
= -iw*dt + O(w^2 dt^2)
pwm: e^(-iw s_j dt) - 1 = -iw*s_j*dt + O(w^2 dt^2)
where O(w^2 dt^2) is considered very small since w and dt are both small.
Substituting the above into the original expressions gives
s_j*iw*dt [e^(-iw t_j) / iw]
in both cases. That is, we have shown that for "small" frequencies -- more
specifically, for frequencies where (w^2*dt^2) is much smaller than (w*dt),
which is where w*dt is well below 1, which is frequencies well below the
sample frequency -- PWM and standard digis are the same.
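A quick numerical comparison of the two "phase terms" shows how the agreement
depends on frequency. This is a sketch under assumptions: s_j is taken as a
duty cycle between 0 and 1, and the 16 KHz sample rate and s = 0.3 are
arbitrary example values. The helper names are hypothetical.

```python
import cmath

def phase_standard(s, w, dt):
    """s_j * [e^(-iw dt) - 1]: the standard-digi factor."""
    return s * (cmath.exp(-1j * w * dt) - 1)

def phase_pwm(s, w, dt):
    """[e^(-iw s_j dt) - 1]: the PWM factor, with s_j as a duty cycle."""
    return cmath.exp(-1j * w * s * dt) - 1

def compare(f_hz, s=0.3, sample_rate_hz=16000):
    """Return (relative complex difference, magnitude ratio) at f_hz."""
    w, dt = 2 * cmath.pi * f_hz, 1 / sample_rate_hz
    a, b = phase_standard(s, w, dt), phase_pwm(s, w, dt)
    return abs(a - b) / abs(a), abs(b) / abs(a)

for f in (100, 1000, 4000, 8000):
    print(f, compare(f))
```

The relative difference is small at low frequencies and grows as the
frequency approaches the sample rate. The magnitudes agree more closely than
the complex values do, which fits the remark that much of the difference is
a phase shift.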
The explanation lies in the phase terms. Those "phase terms"
[e^(-iw dt) - 1] (regular)
and
[e^(-iw s_j dt) - 1] (PWM)
do more than just change the phase. When they multiply the sin(w)/w signal,
they take the sin(w)/w signal, change the phase, and then subtract the
sin(w)/w signal again. It's this difference of signals that makes things
work out at the frequencies we care about. PWM and standard digis are _not_
the same, but the main differences are at higher frequencies, where the
amplitudes are in general much smaller.
But... but... what about the PWM carrier frequency? If we take a constant
digi, say with sample values = 1/2, the standard digi gives a constant
voltage, whereas a PWM digi gives a square wave at the sample frequency.
The answer comes from the "phase terms" above. The sample frequency is
w = 2*pi/dt.
Substituting this into the phase terms gives
[e^(-i*2*pi) - 1] (regular)
and
[e^(-i s_j 2*pi) - 1] (PWM)
The regular expression is exactly zero -- there is _nothing_ at the
sample frequency of a regular digi. But that's not the case for the PWM
term, because of the s_j up in the exponent. PWM digis have a _finite_
amplitude at the carrier frequency. Note that because of the sin(w)/w
term it gets smaller as the sample frequency increases -- but it isn't zero.
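The carrier result is easy to confirm numerically. The sketch below evaluates
both phase terms at w*dt = 2*pi, again assuming s_j is a duty cycle between
0 and 1. The function name is made up for illustration.

```python
import cmath
import math

def carrier_phase_terms(s):
    """Magnitudes of the regular and PWM phase terms at the sample frequency."""
    w_dt = 2 * math.pi                       # w = 2*pi/dt, so w*dt = 2*pi
    regular = cmath.exp(-1j * w_dt) - 1      # zero, up to rounding error
    pwm = cmath.exp(-1j * s * w_dt) - 1      # magnitude is 2*|sin(pi*s)|
    return abs(regular), abs(pwm)

for s in (0.25, 0.5, 0.75):
    print(s, carrier_phase_terms(s))
```

The PWM magnitude works out to 2*|sin(pi*s_j)|, so under these assumptions
the carrier is strongest for a 50% duty cycle and vanishes only when the
pulse is fully off or fully on.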
Finally, the phase term expansions give some insight into what happens
when both the pulse width _and_ height are varied. If the pulse width
is s_j, and the height is set to h_j, then the Fourier transform becomes
h_j*s_j *iw*dt [e^(-iw t_j) / iw]
That is, the amplitude multiplies the width. For the case of adding two
PWM waves together, then, the amplitude really does effectively scale the
sample value, and it should be possible to add one PWM value at 1/16 the
amplitude of another to get an effective 8-bit value.
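As a sketch of the arithmetic, suppose an 8-bit sample is split into two
4-bit pulse widths, with the second pulse at 1/16 the height of the first.
The 4+4 split and the function names are assumptions made for illustration,
and the sketch ignores the problem of keeping the two pulses in sync.

```python
def split_8bit(sample):
    """Split an 8-bit sample (0..255) into coarse and fine 4-bit widths."""
    return sample >> 4, sample & 0x0F

def effective_level(sample, coarse_height=1.0, fine_height=1.0 / 16):
    """Height times width, summed over the two pulses (low-frequency view)."""
    coarse, fine = split_8bit(sample)
    return coarse_height * coarse + fine_height * fine

for value in (0, 1, 16, 200, 255):
    print(value, split_8bit(value), effective_level(value) * 16)
```

Scaling the result by 16 recovers the original 8-bit value in every case,
which is the sense in which the two pulses together carry 8 bits.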
What about _varying_ the amplitude of a single PWM sequence? For a 6-bit PWM
digi, say, the sample values s_j can go from 0 to 63. If this is then
multiplied by h_j=2 say, then the values become 0 2 4 ... 126 -- a 7-bit
number where the lowest bit is always 0. What use is that? Well, we still
have the h_j=1 values of 0..63, which do include the lowest bit. So we
can effectively change the dynamic range from 0..63 to 0..126 using just two
amplitude values.
As a practical matter, then, it might be possible to use all 15 $d418 values
available to get a big dynamic range, and hence a better sounding digi,
using fewer CPU cycles. Well, ok, we're only _sort of_ changing the dynamic
range, so I pretty much doubt the usefulness of it. But maybe someone out
there would like to give it a shot.
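To see what "sort of" means here, the sketch below enumerates the output
levels reachable as height times width. It assumes a 6-bit width (0..63) and
that the volume setting scales the pulse height linearly with the values
1..15, which real SIDs may only approximate.

```python
def reachable_levels(heights, width_bits=6):
    """Sorted distinct products h*s for each height h and width s."""
    widths = range(1 << width_bits)
    return sorted({h * s for h in heights for s in widths})

two = reachable_levels((1, 2))
print(len(two), two[-1])

all15 = reachable_levels(range(1, 16))
print(len(all15), all15[-1])
```

With heights 1 and 2 the range extends to 126, but there are only 96 distinct
levels, because the top half moves in steps of 2. The range grows while the
step size grows with it, which is why this is a dynamic-range gain and not a
true gain in resolution.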
All right, let's hope this closes the book on pulse width modulation for
digi playback!
.......
....
..
. C=H 20