munshkr · November 13, 2024 10:17
diff --git a/The C64 Digi b/The C64 Digi
 =============
 The C64 Digi
 =============
 Robin Harbron <[email protected]>
 Levente Harsfalvi <[email protected]>
 Stephen Judd <[email protected]>

 Introduction
 ------------

 Digis -- digitally sampled audio -- are fairly common on the 64.  This is
 meant to be a comprehensive article on digis: how they work, examples,
 different playback methods on the 64 (volume register and Pulse Width
 Modulation), and some tricks.  We'll even show you how to play 6-bit and
 even 8-bit digis in high quality on a 64, which is really pretty neat to
 hear.

 The first part discusses digis from a fundamental point of view -- just
 what a digi is, acoustic signals, and things like that.  The most common
 method of playing digis is via the volume register at $d418, and the next
 two sections are devoted to this technique.  Section two discusses some
 SID fundamentals, and the reason why $d418 may be used for digis (and why
 later-model SIDs don't play digis correctly); Section three discusses
 $d418-digis from a software perspective: how to play them, tricks for
 improving them, how to boost digis on 8580 SIDs, and how to detect what
 kind of SID (6581 or 8580) is in the machine.  The fourth and final part
 of this article discusses pulse width modulation, and includes example source
 code and a binary that plays a true 7-bit digi at around 16KHz -- something
 which, we think, has never been done before.

 Without further ado...

 ===============
 Digis: Overview
 ===============

 	The whole point of playing a digi on a 64 is to provide something
 for your ear to hear.  So let's begin by discussing just what an acoustic
 signal is and how that relates to digis.

 	Probably everyone knows that "sound" is how your ear responds to
 changes in air pressure -- that is, when you clap your hands together,
 it compresses the air between your hands in a special way, and that
 higher pressure moves outwards into the surrounding air (since it's at
 lower pressure).  That pressure change propagates along and when it
 encounters your ear it causes the ear drums to move, causing three little
 bones to move, causing some fluid to move, causing tiny, exquisitely
 sensitive hairs to move, transmitting a signal that your brain converts
 to "sound".

 	An audio speaker also changes the air pressure in response to a
 signal.  If you take a coil of wire and change the voltage on it, it
 generates a magnetic field; if a magnet is placed inside the coil, the
 changing magnetic field will place a force on the magnet, causing it to
 move, causing some air to be pushed along, causing a change in pressure,
 causing a signal to propagate to your ear which your brain interprets as
 Van Halen.  All a stereo (CD player, etc.) does is send a varying voltage
 signal to the speaker.  As that voltage level goes up and down the magnet
 moves back and forth, and so the speaker converts that electrical energy
 into an acoustic wave.

 	For us, the trick is to coax SID into sending a specific voltage
 signal to the speaker, the way a stereo or CD player might.  And a CD player
 is of course a very apt comparison, since it is itself a digi player.

 	Just for reference, a really good pair of ears can hear signals from
 around 20Hz to 22KHz, with the sensitivity dropping considerably outside
 of around 100Hz to 10KHz.  A CD player has a playback rate of 44.1KHz, and
 the highest frequency SID can generate from the frequency registers is
 around 4KHz.  If you've ever set SID to maximum frequency and heard just
 how high 4KHz is, you can appreciate that even 10KHz is _really_ high, and
 actually quite difficult to hear.  In human speech, most of the information
 content of vowel sounds is contained in the range 300Hz - 3KHz, and above
 around 1KHz for consonant sounds; most information in musical sounds is in
 the range 100Hz - 3KHz.

 Discrete Sampling

 	To understand digis a little better, consider the more general
 case of a discretely sampled signal -- a continuous signal sampled at
 discrete time intervals.  Let's say we had some device producing a
 _continuous_ sinusoidal signal in time:

                     
         *  *
      *        *
    *            *
   *              *
  *                *
 *                  *                      *
 *                    *                    *                
                      *                  *
                       *                *
                        *              *
                         *            *
                           *        *
                              *  *
 -----------------------------------------------> time

 (yes, I did miss my calling as an ASCII artist)

 To turn the signal into a _discrete_ signal, we simply sample the
 signal at discrete intervals of time.  For example, let's say the above
 signal lasts one second, and is input into a device which measures the
 value every 1/4-second.  The device will spit out four numbers: 0, 1, 0,
 and -1:

          *


 *                   *


                              *

 The sampling frequency here is four samples per second -- 4 Hz.  If we were
 to then play back this signal at the sampling frequency, we'd get a signal
 like

          **********


 **********          **********


                              **********

 So one thing sampling does is to "staircase" a signal -- the sample becomes
 some sort of "average" value over the sample period.  Increasing the sample
 rate -- taking more samples per second -- will smooth things out, and the
 sampled signal will look (and sound!) more like the original signal.
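
 As a rough illustration of the sampling and "staircase" playback described
 above, here is a minimal Python sketch.  The function names (sample_sine and
 hold) are made up for this example; it simply samples a 1 Hz sine wave at
 4 Hz and then holds each sample for a full sample period on playback.

```python
import math

def sample_sine(rate_hz, duration_s=1.0, freq_hz=1.0):
    """Sample sin(2*pi*f*t) at rate_hz samples per second."""
    count = int(round(rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def hold(samples, rate_hz, t):
    """Zero-order hold: the value heard at time t during playback."""
    index = min(int(t * rate_hz), len(samples) - 1)
    return samples[index]

# Four samples of a one-second, 1 Hz sine wave (rounded to hide float noise).
four_hz = [round(v, 6) for v in sample_sine(4)]
print(four_hz)                      # [0.0, 1.0, 0.0, -1.0]

# On playback each value is held for a quarter second: the staircase.
print(hold(four_hz, 4, 0.30))       # still inside the second sample period
```

 Raising rate_hz gives more, narrower steps, which is the "smoothing" effect
 mentioned above.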

 Now let's say we just took two samples in that one second -- 2 Hz sampling
 rate -- and just happened to catch the signal at its maximum and minimum
 values (the peak and trough).  Upon playback, the signal would look like

 *********************
                    *
                    *
                    *
                    *
                    *
                    *
  		    *********************

 That is, a square (pulse) wave.  If you're on the ball, you've noticed
 that the frequency of the new signal is 1 Hz -- exactly half the sampling
 frequency.  This is no accident: in general, the _maximum_ frequency that
 can be captured in a discretely sampled signal (called the Nyquist critical
 frequency, or simply the Nyquist frequency) is half the sampling frequency --
 as you can see above, it takes two data points per cycle to capture a single
 (nonzero) frequency.  So, for example, the highest frequency a CD player --
 which has a sampling/playback rate of 44.1KHz -- can capture is about 22KHz,
 well above the range of normal human hearing.

 Thus, increasing the sample rate increases the frequency range captured
 in the discrete signal.  This is why a digi at a high sample rate in general
 sounds better than a digi sampled at a low sample rate.

 BUT -- there is more to life than sample rate: there is also sample
 resolution.  The sample resolution -- 4-bit samples, 8-bit samples, etc. --
 determines how accurately the sample measures the actual signal.  For
 example, let's say we sample sin(x) when x=0.5:

 	sin(0.5) = 0.4794255...

 No matter what sample resolution we use, there will always be some error
 in the measurement, and the _true_ value of the sample will be the
 _measured_ value plus some error.

 In general these measurement (quantization) errors behave like random,
 uniformly distributed values, so the sampled signal corresponds to the
 original signal plus some noise (the random errors).  That is why you almost
 always hear some sort of hiss on a normal C64 digi, which uses a resolution
 of 4 bits per sample.

 So, increasing the sample _resolution_ decreases the amount of noise introduced
 into the sampled signal (and increases the dynamic range), and increasing the
 sample _rate_ increases the frequency range.
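
 To put rough numbers on the resolution argument, the following Python sketch
 quantizes one cycle of a sine wave to a given number of bits and measures the
 RMS error.  The helper names and the choice of evenly spaced levels over
 [-1, 1] are assumptions made for this illustration, not a description of any
 particular sampler.

```python
import math

def quantize(value, bits):
    """Round a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    code = round((value + 1.0) / 2.0 * steps)
    return code / steps * 2.0 - 1.0

def rms_error(bits, count=1000):
    """RMS difference between one sine cycle and its quantized version."""
    total = 0.0
    for n in range(count):
        x = math.sin(2 * math.pi * n / count)
        total += (x - quantize(x, bits)) ** 2
    return math.sqrt(total / count)

err4 = rms_error(4)     # 4-bit samples, as in a $d418 digi
err8 = rms_error(8)     # 8-bit samples
print(err4, err8)
```

 The 8-bit error comes out far smaller than the 4-bit error, which is the
 "less hiss" effect described above.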


 If you're _really_ on the ball, you've noticed that the 1-Hz square pulse
 above actually contains frequencies higher than 1Hz, simply because a
 square pulse contains higher harmonics in addition to the 1Hz fundamental
 frequency.  And you've also no doubt realized that the sampled pulse wave
 would sound different than the original sine wave (due, of course, to the
 added harmonics) -- it's at the right frequency, but it will sound like a
 pulse wave instead of a sinusoid.

 Have we somehow broken the Nyquist limit?

 The answer is no, because of a nifty thing called the Discrete Sampling
 Theorem, which says that, given the samples h_n of a bandwidth-limited
 function h(t), the original function h(t) is given by

 	h(t) = dt * Sum{ h_n * sin(2*pi*f_c*(t-n*dt)) / (pi*(t-n*dt)) }

 where dt is the sampling period and f_c = 1/(2*dt) is the cutoff/critical
 (Nyquist) frequency.

 What this means is that the original signal can be _reconstructed_ from the
 discrete samples, not that it is _equivalent_ to the discrete samples.
 The Nyquist limit is the highest frequency that can be _reconstructed_ from
 the discrete samples, not the highest frequency that will be produced if you
 "staircase" the discrete samples through a speaker.  If the original
 signal is bandwidth-limited, and there are at least two samples for the
 highest frequency, then the signal can be completely reconstructed.
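
 The reconstruction formula can be tried out numerically.  The sketch below is
 a direct Python transcription of the sum above, assuming f_c = 1/(2*dt) and
 samples h_n = h(n*dt); the function name reconstruct is made up for this
 example.  Because only a finite number of samples is used, the result is an
 approximation that is best near the middle of the sampled interval.

```python
import math

def reconstruct(samples, dt, t):
    """Evaluate dt * Sum{ h_n * sin(2*pi*f_c*(t-n*dt)) / (pi*(t-n*dt)) }."""
    f_c = 1.0 / (2.0 * dt)              # critical frequency: half the sample rate
    total = 0.0
    for n, h_n in enumerate(samples):
        x = t - n * dt
        if abs(x) < 1e-12:
            total += h_n * 2.0 * f_c    # limit of sin(2*pi*f_c*x)/(pi*x) as x -> 0
        else:
            total += h_n * math.sin(2 * math.pi * f_c * x) / (math.pi * x)
    return dt * total

# A 1 Hz sine sampled at 8 Hz for 50 seconds (well below the 4 Hz limit).
dt = 1.0 / 8.0
samples = [math.sin(2 * math.pi * n * dt) for n in range(400)]

# Rebuild the signal at a time that falls *between* two samples.
t = 25.03
approx = reconstruct(samples, dt, t)
exact = math.sin(2 * math.pi * t)
print(approx, exact)
```

 The two printed values agree closely even though t is not a sample point,
 which is the sense in which the samples "contain" the original signal.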

 Since a "normal" digi contains all these extra frequencies, shouldn't a digi
 sound "different" than a "true" analog signal?  Sure.  On the other hand, many
 of the extra frequencies are beyond the range of human hearing, and the rest
 can often be removed using a filter -- all CD players filter the output, for
 example.  So sometimes it is worthwhile to turn on a low/band pass filter
 when playing a C64 digi, especially at lower sample rates.

 And that more or less summarizes basic discrete sampling theory.


 =========================
 D418 Playback -- Hardware
 =========================

 The SID contains both analog and digital subparts on one silicon plate -- in
 other words, it is a mixed signal device.

 At the time, the SID was certainly the best of the microcomputer sound chips.
 This may be mostly due to its mixed signal design, which the designers used
 to solve certain problems.

 The hard thing in a sound generator design is to implement waveforms, volume
 control, and mixing. Things like that don't really fit into the digital
 'either 0 or 1' philosophy, unless a lot of data bits and arithmetic functions
 are involved. In a fully digital sound chip, the waveforms could be generated
 by ROM lookup tables. The mixing function could be derived from binary
 addition, while the volume control from division or multiplication. Unless the
 sound functionality is greatly simplified, the arithmetic functions must be
 present and they must be implemented in hardware. Finally, the D/A conversion
 could be done by (fast) pulse width modulation just at the output stage.
 (Most of today's wavetable sound cards operate like this).

 This method implies heavy arithmetic hardware, which was not an option for
 designers back then. Still, most sound chips were fully digital, and all of
 them suffer from the resulting compromises (e.g. generating square waves only,
 no dedicated channel volume control, etc. -- both TED and the VIC-I are obvious
 examples).

 The solution that one finds in the SID design is very straightforward: mixing
 and variable volume level is problematic in a digital circuit when dealing
 with waveforms, so simply avoid doing it. In the SID, only the microcomputer
 interface, the registers, the oscillators (phase accumulating oscillators),
 and other controller logic are digital; the mixing and volume control parts
 are fully analog. There are digital to analog converters providing analog
 voltage levels from the digital state variables. The SID D/As are in fact
 'multiplying' D/As, having an analog input (AIN), an input base voltage
 (IBASE), and a digital input. They operate by amplifying the input voltage
 offset (AIN-IBASE) by a factor proportional to the number on the digital input
 and adding this offset back to the base level.
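
 A multiplying D/A of this kind can be modelled in a few lines.  The Python
 sketch below is an idealised model only: the function name, the
 normalisation of the digital input to full scale, and the example voltages
 are all assumptions for illustration, not measured SID behaviour.

```python
def multiplying_dac(ain, ibase, code, bits):
    """Idealised multiplying D/A.

    The offset (ain - ibase) is scaled by a gain proportional to the
    digital input and added back to the base level.
    """
    gain = code / (2 ** bits - 1)       # 0.0 at code 0, 1.0 at full scale
    return ibase + gain * (ain - ibase)

# Hypothetical voltages: a constant input sitting 1 volt above the base level.
AIN, IBASE = 3.0, 2.0
for code in (0, 8, 15):
    print(code, multiplying_dac(AIN, IBASE, code, bits=4))
```

 With a constant, nonzero offset on the analog input, the output simply
 follows the digital number, so a 4-bit D/A with a DC offset on its input
 behaves like a 4-bit sample player.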

 This mixed signal design also allowed some other features to be implemented. 
 The most important one is the analog filter (that is, a two integrator loop,
 bi-quadratic filter, according to Yannes). With that, the SID points beyond a
 home computer sound chip - it is a true analog subtractive synth (marketing as
 such was cancelled because of manufacturing capacity reasons).

 Here is a detailed map of the SID internals (analog path; probably my most
 beautiful ASCII ever :-D). More information can be found in the SID patents
 (US 4,677,890; 1986), the MOS 6581 technical document (can be found somewhere
 on the Net), or the back of the Programmer's Reference Guide (PRG).


                -----------------  11bit  ------------
                |Cutoff freq reg|-------->|Cutoff D/A|---------o
                -----------------         ------------         |
                    $d415-16                                   |
                                                               |
                -----------------  4bit   ------------         |
                |Resonance reg. |-------->|Reson. D/A |-o      |
                -----------------         ------------- |      |
                    $d417.[4-7]                         |      |
                                                        |      |
                                 =0                     v      v
 -----------    -----------     >o------------>|    ------------------
 |wave D/A |--->|env. D/A |-->o/               |    |                |
 -----------    -----------      o--->|        |    |                |
     ^              ^          ^ =1  o--------|--->|                |
     |12bit         |8bit      |     |        |    |                |
     |              |          |     |        |    |                |
 -----------    -----------     |     |        |    |                |
 |OSC1 +   |    |ADSR cnt+|     |     |        |    |                |
 |wave sel.|    |env. log.|     |     |        |    |                |
 -----------    -----------     |     |        |    |                |
 $d400-03,      $d405-06,    $d417.0  |        |    |                |
 $d404.[1-7]    $d404.0               |        |    |     FILTER     |
                                     |        |    |                |
                                 =0  |        |    |                |
 -----------    -----------     >o----|------->|    |                |
 |wave D/A |--->|env. D/A |-->o/      |        |    |                |
 -----------    -----------      o--->|        |    |                |
     ^              ^          ^ =1  |        |    |                |
     |12bit         |8bit      |     |        |    |                |
     |              |          |     |        |    |                |
 -----------    -----------     |     |        |    |                |
 |OSC2 +   |    |ADSR cnt+|     |     |        |    |                |
 |wave sel.|    |env. log.|     |     |        |    |                |
 -----------    -----------     |     |        |    |                |
 $d407-0a,      $d40c-0d,    $d417.1  |        |    |                |
 $d40b.[1-7]    $d40b.0               |        |    | LP   BP   HP   |
                                     |     =0 |    ------------------
                                 =0  |    >o->|       |    |    |
 -----------    -----------     >o----|--o/    |   *** o    o    o  
 |wave D/A |--->|env. D/A |-->o/      |     o- |      /    /    /
 -----------    -----------      o--->|  ^  =1 |   =0 V    V    V   =1
     ^              ^          ^ =1  |  |     |      o o  o o  o o
     |12bit         |8bit      |     |  |     |      | |  | |  | |
     |              |          |     |$d418.7 |<-------o    |    |
 -----------    -----------     |     |        |<------------o    |
 |OSC3 +   |    |ADSR cnt+|     |     |        |<-----------------o
 |wave sel.|    |env. log.|     |     |        |
 -----------    -----------     |     |        |
 $d40e-11,      $d413-14,    $d417.2  |        |
 $d412.[1-7]    $d412.0               |        |    -----------------
                                     |        |    | Master volume |AUDIO
                                 =0  |        o--->|     D/A       |----->
                               >o----|------->|    ----------------- OUT
 EXT IN --------------------->o/      |                     ^
                                o--->|                     |4bit
                               ^ =1           ^            |
                               |              |       $d418.[0-3]
                            $d417.3  ^        |
                                     |        |
                    Analog mixing ---|--------|

 ***: Filter type select switches, $d418.[4-6] respectively


 $d418 digis
 -----------

 The most common method of playing a digi is to use the register at $d418.
 When someone plays a digi using the master volume register, the situation is
 similar to the waveform D/A converters. Both D/As are multiplying D/As --
 signal amplifiers whose amplification is proportional to the input digital
 number. If there is a nonzero signal offset on the D/A input it will be
 multiplied proportionally by this number.

 Playing digis with $d418 is possible because there is indeed a relatively
 large DC voltage offset on the master volume D/A. This offset is present
 right from the moment when the SID is powered up.

 Where can this DC offset come from?

 There is a mixer before the master volume D/A (see figure). If there's a DC
 offset on the D/A input, it must come from there. ...And going further,
 the DC offset on the mixer must also come from somewhere.  But where?

 Signals come from the three ADSR volume D/As, the EXTIN line, and the three
 outputs of the filter. Fortunately, all paths that go to the mixer have analog
 switches (all paths can be disconnected from the mixer individually, if that's
 needed).

 The above analog switches are driven by the filter selection bits ($d417 bits
 0-3), the voice 3 off bit ($d418 bit 7) and the filter type selection bits
 ($d418 bits 4-6).

 After a reset, the filter selector bits are all 0 (all signals are routed
 directly to the master mixer, bypassing the filter), the 'voice 3 off' bit is
 0 (so voice 3 is connected), and the filter type selector bits are 0 (filter
 outputs are unconnected). In this state, only EXTIN and the three SID voice
 signals are present on the mixer. EXTIN can be eliminated as the source since
 it has no DC offset (as long as the computer was not hacked, see notes on the
 8580).

 The ADSR volume D/A is similar to the previously mentioned multiplying D/As.
 If the digital number on the input is 0, the input analog signal offset can't
 pass through (as measurements verify). This is the case when SID is reset,
 setting the envelope counters to zero.  Therefore, nothing behind the ADSR
 multiplying D/As can have any effect on the DC offset of the mixer.

 So, the DC offset must come from the ADSR multiplying D/As. Another
 measurement shows that even the mixer itself has a small DC offset.


 Tests and results
 -----------------

 I did some tests that support this theory. They were done 'by hand', by simply
 using a digital voltmeter + the FC3 monitor.

 The chip was a 6581(R1), 0883, Hong Kong (an early 6581).

 When turned on the voltage on the AUDIO OUT was about 5.5 volts (slowly
 decreasing as it warmed up, stopping at about 5.43 after some 10 mins - all
 subsequent tests were done after this time period).

 Writing $0f to $d418 raised the output voltage to 6.15 volts. Therefore, the
 maximum output amplitude that can be achieved when playing digis is 0.72 volts
 in this 'mode' (without wiggling any other SID settings to achieve higher
 voltage levels) -- remember that what counts is the maximum voltage
 _difference_, not the maximum absolute voltage.

 The next test is to determine if the mixer has its own DC offset (with all
 possible paths disconnected). This can be arranged as follows. With the volume
 at maximum (to maximize any effect), all voices are routed towards the filter
 ($d417 = $0f), while making sure that the filter outputs are not routed to the
 mixer ($d418 = $0f). In this state no paths can drive the mixer. The result
 is 5.39 volts. When the volume is changed, the output moves towards the
 previous 5.43 volts, so there is a (very small) DC offset from just the mixer.

 What could be the DC offset value of each individual SID voice (i.e. the base
 level difference of the multiplying D/As)?  Doing the above, but leaving one
 voice routed to the mixer ($d417 = $0e, $0d or $0b) gives 5.69 volts.
 5.69-5.43 = 0.26 volts, and 5.43 + 3*0.26 = 6.21, almost 6.15 volts.

 To determine if the ADSR multiplying D/As act as expected, I used pulse
 waves with zero frequency and 0 or $fff pulse width (two cases), to make the
 input signal of the ADSR multiplying D/A the minimum and maximum possible
 level. After careful checking, the output changed by a few hundredths of a volt
 (about 0.01 volt per voice).  So the D/A doesn't close up completely, but
 it's still O.K.

 To prove that these offsets are equal for all voices, I did another test. Some
 people know that the filter inverts phase (multiplies the input signal by -1).
 Machine is reset, $d417 = $01, $d418 = $9f. (Voice 1 is routed through the
 filter, voice 3 is cut off from the mixer completely ($d418.7), low pass
 filter is selected, volume = $0f). The output voltage was 5.41 volts, just
 very slightly below the "default" output level. This means that the DC of
 voice2 + (-1*) DC of voice 1 resulted in about 0 relative offset. Doing
 similar tests proved that the DC offsets for the voices match each other
 almost exactly (within a few hundredths of a volt).

 These measurements all support the idea that the DC offset comes from the
 ADSR multiplying D/As, that the offset is mostly independent from the waveform
 D/A converters (as long as sustain levels are 0), and that the offsets are
 equal for all voices. In addition, a small DC offset is supplied by the master
 signal mixer itself.

 What if we try different sustain settings? For this test, set the volume to
 maximum, as usual. Set the sustain level to $0f for all voices ($d406, $d40d,
 $d414 = $f0). Start the attack, but with no waveform selected ($d404, $d40b,
 $d412 = $01). The output level is now 5.21 volts, a little bit below the '0'
 offset of the audio output! (Doing the test with just one voice (all 
 others disconnected), the output is 5.29 volts).

 Finally, we can do some experiments with the pulse waveform.  The pulse
 waveform is useful for these tests, since at zero frequency we can set both
 the minimum and the maximum constant DC levels at the voice D/A just by using
 the pulse width registers. Reset the computer. Set voice 1 to zero frequency,
 pulse width $0fff, sustain level 15, and $d404=$41 (pulse waveform + gate on).
 Route only voice 1 through the mixer ($d417 = $0e). The output voltage is
 similar to the test when no waveform was selected -- 5.29 volts! This seems to
 show that "waveform accu = $0fff" is the same as when no waveform is selected
 (i.e. the waveform D/A digital input pins are pulled high when they're not
 driven, as seen in most other NMOS chips).

 When the pulse width is 0 in the above test the output changes to 6.34 volts.
 This seems to be strange (a multiplying D/A giving higher signal level for
 multiplying something by 0).

 Now, when the ADSR multiplying D/A is closed, the output is 5.70 volts. When
 it's fully open, the output changes from 5.29 volts (wave acc= $fff) to 6.34
 volts (wave acc = 0). One reasonable answer is that the base voltage of the
 waveform D/A is higher, and the analog input is tied lower than the base
 voltage of the ADSR D/A -- the effect is that the SID waveforms will lie
 'around' the ADSR multiplying D/A base voltage, more or less symmetrically.

 This was surely done intentionally, to reduce absolute voltage levels (for
 linearity).  In the 6581, the big DC offset is probably a result of having the
 ADSR D/As and the master volume D/A at different base levels (the difference
 appears as true DC offset on the master volume D/A). If both were the same
 (presumably at VDD/2), and the waveform D/A parameters were selected similarly
 (operation is symmetric to VDD/2), there would be no final DC offset at all.
 Rather like the 8580...


 Other issues
 ------------

 So now we know why $d418 digis are possible - but still, there are some things
 to note.

 The DC offset on the master volume D/A changes with different SID settings,
 and whatever affects the DC offset on the mixer will affect the digi
 volume. For example, even the filter output signals have a small DC offset.
 Just do a test - set the volume to $0f, then simply turn one filter route on
 (for example, $d418 = $1f). You'll hear a small click (i.e. a small DC offset
 change on the mixer), even if the filter has no input.

 Moreover, as seen above, the DC offset can be eliminated completely (just by
 SID register settings), leading to no audible digi sound at the output.  In
 other words, whatever affects the DC offset on the mixer _will_ affect the
 digi volume.

 One place where this is important is playing a digi with a tune: there's a
 constantly changing signal going to the mixer instead of a constant DC offset,
 so playing a digi on the master volume also causes distortion for both the SID
 voices and the digi sound (since they're cross modulated). To reduce this
 effect most '3+1'-style SID + digi players play samples by writing 8-offset
 sample values to $d418 (i.e. adding 8 to 3-bit sample values and writing this
 to $d418 - see players used by Jeroen Tel and other famous composers using
 digis). This trick reduces the modulating effect while still maintaining good
 digi volume.
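
 The 8-offset trick amounts to a one-line transformation.  Here is a minimal
 Python sketch of it; the function name is made up for this example, and how
 the filter-mode bits in the upper nybble of $d418 are handled is left out.

```python
def offset_sample(sample3):
    """Map a 3-bit sample (0-7) to a $d418 volume value in the range 8-15."""
    return (sample3 & 0x07) + 8

# The volume never drops below 8, so the SID voices stay audible while
# the digi still swings over eight volume steps.
print([offset_sample(s) for s in range(8)])   # [8, 9, 10, 11, 12, 13, 14, 15]
```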

 The DC offsets can also cause awful clicking at times. For example, the
 filter inverts phase. If the filter is currently routed to the mixer, there'll
 be a large 'click' (2 times the DC offset) when a voice is on and its routing
 is changed to or from the filter.


 The 8580
 --------

 This is a completely redesigned chip. I don't know the details, but it was
 probably redesigned when all the other chips in the C64 were redone for CSG's
 new manufacturing technology and the C64c. It is a 'better' chip from the
 technical side (but in my opinion it sounds crude in comparison to the 6581,
 at least the R4 series). The 6581 was designed in months. Bob Yannes had to do
 everything from scratch and use the manufacturing technology MOS currently had
 (NMOS). And it shows.  First, it has high background noise. The DC offsets are
 really also a misfeature. The D/A converters are sometimes non-monotonic (at
 least, the waveform D/As and the filter cutoff D/A have some drops at the
 change of the most significant bits). The op-amps in the active (resonant)
 filter are simple, linearized NMOS inverters ;-) (loopbacked, they act like
 more or less linear op-amplifiers around VDD/2). And I still haven't mentioned
 bugs in the digital side (ADSR envelope bugs). Because of the above, one
 probably won't find two identical 6581 chips -- each sounds a little bit
 different (mostly due to the filter). Since the active components of the
 filter are far from ideal, the filter is strongly nonlinear (the cutoff curve
 changes with signal amplitude). On the other hand, these things are what make
 the SID sound so unique.

 Most of the problems were fixed in the 8580. It has much less background
 noise. The chips sound the same (there are hardly any differences between
 different 8580s). Most of the DC offset issues (the clicks) were eliminated. It
 needs less power, and lower VDD level. Something was changed also in the
 digital logic, but the ADSR part was not touched. The 'combined' waveforms are
 a bit different (and more useable from the musician's point of view).

 The clicks were reduced, which means that there is no (or no significant) DC
 offset on the master volume D/A in the 8580.

 (I have not done any measurements, but after listening to a lot of 3+1 channel
 type music, I have a strong suspicion that even if sounds are turned on, the
 average DC offset on the master volume D/A is still minimal).

 To fix this in software, you'll have to wait until the next section of this
 article.

 To fix this in hardware, people use a simple hack: take a resistor of about
 330k and tie the SID EXTIN line to GND through that (directly, beside the
 chip, on the mainboard).

 The EXTIN line goes directly to the mixer, and thus the master volume D/A, or
 can also be routed through the filter. In either case, unless the filter is
 disconnected, the above hack will give a pretty large DC offset, similar to
 the original 6581s. So, digi sounds can be played :-) (even with SID music
 playing simultaneously, similar to the 6581).

 This solution is good as a work around, but there's one thing to note: this is
 not completely the same as the 6581 ADSR D/A offset voltage. At least, this
 offset is negative (should that pin rather be tied to VDD?). Programs that
 depend on the 6581's way of DC offsets will not work correctly (but I know of
 very few such programs, so at worst you'll experience slightly different digi
 sound only occasionally -- but hey, the 8580 sounds different anyway). Another
 problem is that when EXTIN is routed through the filter the DC offset may
 cause strong distortion since the DC operating point of the filter is changed
 -- bad news if the 'semi-linear' amplifiers in the filter are picky about
 absolute DC level.  Some music (not necessarily involved with digis) indeed
 does route EXTIN through the filter, for noise reduction on older C64s (with the
 earlier C64 mainboards that pick up lots of 'digital' background noise from
 EXTIN). DC distortion can also occur occasionally for the same reason but on
 the master volume D/A (the higher the difference from VDD/2, the greater the
 risk of experiencing nonlinearity and clipping distortion).


 Some final words
 ----------------

 A lot of this information comes from Dag Lem, who is certainly the No. 1 SID
 hacker for me ;-). Take a look at reSID, his SID emulator library (the sources
 can be downloaded from somewhere). reSID contains so much reverse engineered
 information of the real SID that you won't believe it -- check it out if
 you're interested.


 =========================
 D418 Playback -- Software
 =========================

 $D418 digis are by far the most common playback method.  The volume register
 gives 16 different amplitudes (0-15), and so can provide 4-bit digi playback.

 In its most basic form, this is an extremely easy routine to code.  Simply
 load each 4-bit sample, and store it in the volume register ($d418).
 Assuming $fd/$fe are pointing to the beginning of a series of samples, the
 following code will play it back:

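 	 ldy #0		;Y indexes within the current page

 :loop	 lda ($fd),y	;fetch the next 4-bit sample
 	 sta $d418	;write it to the volume register

 	 ldx #5 ;some delay value
 :delay	 dex
 	 bne :delay

 	 iny
 	 bne :loop	;same page: keep going

 	 inc $fe	;next page (high byte of pointer)
 	 jmp :loop	;note: no end-of-sample check yet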

 The ldx #5 would have to be adjusted depending on the speed of the
 sample - the lower this number (not including zero) the faster the
 sample will play back.
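
 The playback rate for a given delay value can be estimated by counting
 cycles.  The Python sketch below does this for the loop above, under several
 assumptions that are not stated in the text: the sample data is page aligned
 (so LDA ($fd),y takes 5 cycles), no branch crosses a page boundary, the
 VIC-II steals no cycles (e.g. the screen is blanked), and the CPU clock is
 roughly 985248 Hz (PAL) or 1022727 Hz (NTSC).  The function names are made up
 for this example.

```python
PAL_CLOCK_HZ = 985248       # approximate PAL C64 CPU clock
NTSC_CLOCK_HZ = 1022727     # approximate NTSC C64 CPU clock

def cycles_per_sample(delay):
    """Cycles for one pass through the loop with 'ldx #delay' (1-255)."""
    if not 1 <= delay <= 255:
        raise ValueError("delay must be in the range 1-255")
    fetch_and_store = 5 + 4             # lda ($fd),y + sta $d418
    delay_loop = 2 + (5 * delay - 1)    # ldx #, then dex/bne (last bne falls through)
    next_sample = 2 + 3                 # iny + bne :loop (taken)
    return fetch_and_store + delay_loop + next_sample

def sample_rate_hz(delay, clock_hz=PAL_CLOCK_HZ):
    """Approximate playback rate in samples per second."""
    return clock_hz / cycles_per_sample(delay)

print(cycles_per_sample(5))             # 40
print(round(sample_rate_hz(5)))         # about 24.6KHz on PAL
```

 Under the same assumptions, the pass that crosses a page (bne not taken,
 inc $fe, jmp) costs 7 cycles more than a normal pass.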

 There are a number of improvements we could make to this code - first
 of all, this method takes twice as much RAM to store the sample as is
 necessary.  Because we're dealing with 4-bit samples, we can store 2
 samples in each byte.  This can be handled simply by alternately
 masking out the high bits (with AND #15) to play the sample stored in
 the low nybble, and by shifting the high nybble down to the low nybble
 to play the high nybble (LSR : LSR : LSR : LSR).  A lookup table may also
 be used to save processor cycles (but use more RAM).
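
 The packing scheme is easy to prototype off the 64, for example when
 converting sample data on another machine.  The Python sketch below packs and
 unpacks 4-bit samples; which nybble is played first is not specified above,
 so playing the low nybble first is an assumption of this example, as are the
 function names.

```python
def pack(samples):
    """Pack 4-bit samples two per byte (first sample in the low nybble)."""
    samples = list(samples)
    if len(samples) % 2:
        samples.append(0)               # pad to an even number of samples
    packed = bytearray()
    for low, high in zip(samples[0::2], samples[1::2]):
        packed.append((low & 0x0F) | ((high & 0x0F) << 4))
    return bytes(packed)

def unpack(data):
    """Recover the 4-bit samples in playback order."""
    samples = []
    for byte in data:
        samples.append(byte & 0x0F)     # AND #15
        samples.append(byte >> 4)       # LSR : LSR : LSR : LSR
    return samples

digi = [1, 2, 15, 0, 7, 8]
print(pack(digi).hex())                 # 210f87
print(unpack(pack(digi)) == digi)       # True
```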

 Another improvement is to move the routine to zero-page, and use self-
 modifying code.  In general, this results in the fastest digi players.

 We should of course have the routine check for the end of the sample --
 typically just checking the high byte of the zero-page pointer is enough
 (in this case, checking $fe).  Typically digis are page aligned anyway, so
 just zeroing out the unused part (if any) of the last page is fine.

 Finally, it is often important that each sample of a digi is played back
 at regular intervals.  If the samples aren't played at a steady speed,
 extra distortion is audible.  In the example above, playback is steady
 for a full page (256) of samples - but several extra cycles are added
 by incrementing the zero-page pointer to the digi.  The situation
 worsens when we start adding extra code to check for the end of the
 digi, and even the main loop starts getting irregular when we add the
 code for the simple form of packing discussed earlier (2 4-bit samples
 per byte). 

 These problems can be solved by careful cycle counting and adding NOP
 and harmless BIT instructions in strategic places to make each
 iteration the same number of cycles, regardless of which branch is
 taken - people who have written a stable raster routine, or done some
 Atari 2600 coding have likely done this sort of painstaking work
 before.
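 
 For instance, the pointer update at the bottom of the first loop could be
 balanced along these lines.  The counts assume the standard 6502 timings (INY
 and NOP 2 cycles, INC zero-page 5, JMP 3, a branch 2 when not taken and 3
 when taken) and that no branch crosses a page boundary, which would cost an
 extra cycle:
 
 	 iny		;2
 	 bne :same	;3 taken, 2 not taken
 	 inc $fe	;5
 	 jmp :loop	;3	new page:  2+2+5+3 = 12
 :same	 nop		;2
 	 nop		;2
 	 jmp :loop	;3	same page: 2+3+2+2+3 = 12
 
 Either way this part of the loop now takes 12 cycles, so crossing a page no
 longer stretches one sample out of every 256.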


 NMI-driven digis
 ----------------

 More commonly, however, we enlist the help of CIA #2 and have it
 generate regular Non-Maskable Interrupts which we use to call our digi
 player.  This has two important advantages - first, it makes timing
 much simpler.  Second, it frees your main program to do other
 things while the digi is playing "in the background".

 To experiment, I pulled a 4-bit packed digi from the extras disk
 included with Super Snapshot 5.22.  It's the beginning seconds of the
 introduction to Classic Star Trek (Space, the Final Frontier).

 Here's the source for a fairly "frills-free" NMI based digi player,
 with my comments after blocks of code:

 start    = $1400
 end      = $7cff
 freq     = 141
 ptr      = $fd

 Labels start and end simply point to the beginning and end of the
 digi.  Freq isn't actually the frequency - it's the number of
 processor cycles between interrupts necessary to play the digi at the
 desired speed/pitch.  If you know the frequency (in Hz) of the digi,
 simply divide your CIA clock speed (approximately 1000000 Hz) by the
 digi frequency.  In this case, the digi runs at approximately 7100 Hz.
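 
 As a quick worked version of that division, assuming the usual system clocks
 (about 1022727 Hz on an NTSC 64 and 985248 Hz on a PAL 64), a 7100 Hz digi
 works out to:
 
 	1022727 / 7100 = 144.0  ->  freq = 144  (NTSC)
 	 985248 / 7100 = 138.8  ->  freq = 139  (PAL)
 
 Going the other way, the value used here gives 1000000 / 141 = 7092 Hz with
 the rounded-off clock.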

 We use two zero page locations to form a 16-bit pointer to the current
 sample in the digi to play.

         *= $1000

         ;disable interrupts
         lda #$7f
         sta $dc0d
         sta $dd0d
         lda $dc0d
         lda $dd0d
         sei

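 This code disables every interrupt source on both CIAs, acknowledges any
 interrupt that is already pending (by reading $dc0d and $dd0d), and then masks
 IRQs with SEI.  The timers themselves are set up further down.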

         ;blank screen
         lda $d011
         and #255-16
         sta $d011

 Just like erratically timed code can introduce distortion when a digi is
 played back, the VIC steals cycles from the processor that can cause
 interrupts to not occur precisely when you'd like them to.  This routine will
 work without the screen blanked, but the extra noise introduced when the
 screen is on is noticeable when the time between samples is less than around
 2.5-3 times the time the processor is stopped.  Another option is to use some
 multiple of the raster timing as the sampling rate, and start the routine on a
 non-badline, to ensure that the interrupts never occur on a badline.  (A final
 option is to use a raster-driven interrupt for the digi; with the SCPU, it is
 actually possible to drive an IFLI display and play a digi at the same time,
 badlines and all -- email Robin for more info, or maybe wait for a future
 article!).  But the simplest thing to do is to blank the screen :).

         ;switch out roms
         lda #$35
         sta 1

         ;point to our player routine
         lda #<nmi
         sta $fffa
         lda #>nmi
         sta $fffb

 Unless using the KERNAL routines is necessary in my program, I always
 switch out the ROMs.  One of the biggest benefits is that the NMI
 routine is called directly through the hardware vector at $fffa/$fffb,
 rather than waiting for the KERNAL to call it indirectly through
 $0318/$0319.

         ;initialize player
         lda #<start
         sta ptr
         lda #>start
         sta ptr+1

         ldy #0
         sty flag
         lda (ptr),y
         sta sample

 This section simply initializes the various memory locations that the
 player uses - sets ptr/ptr+1 to point to the beginning of the digi,
 loads the first sample, and clears the flag that handles the
 alternating between the lower and upper nybble of the packed samples.

         ;setup CIA #2
         lda #<freq
         sta $dd04
         lda #>freq
         sta $dd05

 Sets Timer A on CIA #2 to freq.  

         lda #%10000001
         sta $dd0d

 Enables Timer A interrupts on CIA #2.

         lda #%00010001
         sta $dd0e

 Starts Timer A in continuous mode, loading it first with the value just
 written to $dd04/$dd05.  Each time Timer A counts down to zero, it is
 automatically reloaded with the value last written to $dd04/$dd05 and
 begins counting down again.

 endless  jmp endless

 For this example, we just put the computer in an endless loop.

 nmi
         pha
         txa
         pha
         tya
         pha

         ;play 4-bit sample
         lda sample
         and #15
         sta $d418

 We play the sample while all the code is still linear - before any
 branches have occurred.  This is to minimize the distorting effects I
 mentioned earlier.  The AND #15 is used so we don't inadvertently
 enable the filter bits in $d418 with the high nybble packed into
 sample.

         ;clear NMI source
         lda $dd0d

 By reading $dd0d, we are acknowledging the source of the interrupt,
 and the CIA will now generate another interrupt next time Timer A
 counts down to zero.

         ;just something to look at
         inc $d020

         ;every other NMI do 1) or 2):
         lda flag
         bne lower

 Now we deal with "unpacking" the samples.

         ;1) shift upper nybble down
 upper    lda sample
         lsr a
         lsr a
         lsr a
         lsr a
         sta sample
         jmp exit

 When flag is set to zero, we shift the high nybble of sample down to
 the low nybble so it's ready to be played next NMI.

         ;2) get a new packed sample
         ;   then point to next
 lower    ldy #0
         lda (ptr),y
         sta sample
         inc ptr
         bne checkend
         inc ptr+1

 When flag is set to one, we load a new packed sample into sample, and
 point ptr at the next packed sample.

         ;if end of sample, point to
         ;beginning again
 checkend lda ptr
         cmp #<end
         bne exit
         lda ptr+1
         cmp #>end
         bne exit

         lda #<start
         sta ptr
         lda #>start
         sta ptr+1

 Simply check for the end of the digi, and if we've reached it, loop
 back to the beginning of the digi.

         ;toggle flag and exit NMI
 exit     lda flag
         eor #1
         sta flag

         pla
         tay
         pla
         tax
         pla
         rti

         ;sample's lower nybble holds
         ;the 4-bit sample to be played
         ;next NMI - the upper nybble
         ;holds the next nybble to be
         ;played on "odd" NMIs, and is
         ;undefined on "even" NMIs.
 sample   .byte 0

         ;flag simply toggles between 0
         ;and 1 - used to decide whether
         ;to play upper or lower nybble
 flag     .byte 0


 Improving D418 Digis
 --------------------

 D418 digis tend to generate a lot of noise because of the 4-bit sample
 resolution.  Over the years people have come up with numerous tricks to
 improve the sound of a d418 digi; here are some that we know of and have tried.

 The first, and most obvious, thing to do is to use the low-pass filter, since
 a lot of the noise is at higher frequencies.  Unfortunately this won't work,
 since the filters occur in SID before the volume amplifier -- all the filters
 can do is change the DC offset that makes the digi possible.  This trick
 will work for methods that use SID voices, however (such as Pulse Width
 Modulation, discussed in the next section).

 Another trick is to "dither" the sound, as discussed in C=Hacking #11.  The
 idea here is to generate an intermediate "average" value by toggling between
 two values.  For example, if d418 is set to '8' half of the time, and '9' the
 other half, its 'average' value will be 8.5.  So this is somewhat like adding
 an extra bit of resolution.  In principle, you can extend this further: if it
 is '8' one-third of the time and '9' for the remaining two-thirds, the average
 value will be about 8.67.  And so on.
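 
 A minimal sketch of the half-step case, assuming a 5-bit sample (0-30) in a
 made-up location called sample5, decimal mode off, and some outside timing
 that keeps the two stores half a sample period apart:
 
 	lda sample5	;5-bit value, 0-30
 	lsr		;A = value/2, carry = the half step
 	tax		;keep value/2 for the second half
 	adc #0		;round up when the low bit was set
 	sta $d418	;first half of the period
 	;... wait half a sample period ...
 	stx $d418	;second half of the period
 
 For a value of 17 this writes 9 and then 8, averaging 8.5.  A value of 31
 would round up to 16 and spill into the filter bits of $d418, hence the 0-30
 limit.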

 Now, we aren't _really_ increasing the sample resolution here, but are instead
 increasing the sample playback rate -- we're playing two samples ('8' and '9'
 for example) where before we played just one.  Don't get too carried away
 thinking about "average" voltage levels (after all, there is an average
 voltage for the entire digi but that's not what you hear!) -- what's important
 is how well the sampled signal represents the original signal.  If the
 original signal is rising from 8 to 9 during the sample interval, this type
 of trick will work well.

 Which leads us to another trick: interpolation.  This is really a compression
 trick, more than a 'resolution' trick.  Let's say that one sample value is 5,
 and the next value is 9.  It might be reasonable to expect an 'intermediate'
 value of 7, to play right after the 5.  Once again, the idea is to increase
 the playback rate to better represent the original signal.  This type of trick
 increases the playback rate without increasing the amount of data -- and as
 always, your mileage may vary.  Many modern soundcards and CD-players use
 interpolation.
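 
 In its simplest form the in-between value is just the average of the two
 neighbours.  A sketch, with cur and next as made-up locations holding two
 consecutive 4-bit samples:
 
 	lda cur		;0-15
 	clc
 	adc next	;sum is at most 30
 	lsr		;halve it: 5 and 9 give 7
 	sta $d418	;play midway between cur and next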

 Another curious trick is to add noise to the signal -- that is, the 4-bit
 sample corresponds to the original signal plus noise.  Sometimes, by adding
 noise to the signal playback the noise can actually cancel!  The 'dithering'
 trick above can be viewed in this way.


 Boosting 8580 Digis
 -------------------

 As most people know, there are 'old' SIDs (6581) and 'new' SIDs (8580), and
 $d418 digis do not work right on 8580 SIDs (such as in the 128D, most 128s,
 and the 64C) for the reasons discussed earlier -- the 8580 does not have a
 residual voltage leading into the amplitude modulator.

 The software fix for this is pretty simple: have SID generate a signal, and
 hence a voltage, for the volume register to modify.  You can actually use
 pretty much any waveform to do this, but a pulse is the simplest, since a
 pulse wave just toggles between two voltage levels.  Moreover, page 463 of 
 the PRG says, "The TEST bit, when set to a one, resets and locks Oscillator 1
 at zero until the TEST bit is cleared.  The Noise waveform output of
 Oscillator 1 is also reset and the Pulse waveform output is held at a DC
 level."  So it's not really necessary to worry about the frequency or pulse
 width, by using the test bit.

 BUT -- it is very important to set the sustain level to $f.  The ADSR envelope
 generators generate the voltage.  A sustain level of 0 gives no improvement.

 So, to 'boost' a digi on a later-model SID, you can just turn on a pulse with
 the test bit set:

 	LDA #$FF
 	STA $D406
 	LDA #$49
 	STA $D404

 Setting more voices gives the digi a substantial extra boost:

 	LDA #$FF
 	STA $D406
 	STA $D406+7
 	STA $D406+14
 	LDA #$49
 	STA $D404
 	STA $D404+7
 	STA $D404+14

 The moral is: if you're writing a digi routine, and want it to work on all
 computers, be sure to boost the digi.

 And for completeness, using more channels is a commonly used trick to enhance
 digi resolution on the Plus/4. The TED digi resolution (the volume register)
 is 3 bits.  Fortunately, all channel on/off bits + the volume level are in the
 same register ($ff11). If one source is on, the output DC is about half of the
 level when both are turned on. This trick can be extended further to result
 in a 'semi 4-bit' or 5-bit digi table (the dynamic range is enhanced, but
 there are larger steps at the table end than at the start).  This trick could
 also be used in SID if the sound sources were accurately preset, but runs into
 problems due to the non-matching SID-versions and having the control bits in
 multiple registers.


 SID Type Auto-Detect
 --------------------

 The following routine will detect what type of SID is in use.  I've
 tested it on a fair cross-section of my collection of computers - my
 NTSC 128D, two 64Cs, two "breadbox" C-64s, and my PAL breadbox 64.  In
 all cases the code performed 100% accurately - but still, there may be
 cases where it fails.  I'd be interested to know if anyone finds any
 faults in the routine, so I can improve it!

 How does the routine work?  I was told that the old SID (6581) and the
 new SID (8580) behave differently when set to play combined
 waveforms.  I coded a fairly simple routine to use the REU to sample
 $d41b (the upper 8 bits of Oscillator 3's waveform output) for a full
 64k bank.  Then I experimented with various frequencies and
 combinations of waveforms on Oscillator 3 until I found consistently
 different results with the two different SIDs.

 When I combined the triangle and sawtooth waveforms and then sampled
 $d41b I found that most of the time the oscillator was just putting
 out zeros, with occasional bursts of numbers.  These "bursts" were
 consistently near $ff on the 8580, while the 6581 was always well
 below $80 - often $3f was the highest it would get.

 So, the detection code ended up being quite simple - I'll explain each
 block of code:


         *= $4000

 start    sei
         lda #11
         sta $d011

 Disable bad-lines (by blanking the screen).  This prevents badlines
 from interfering with the detection process.

         ;sid setup here!
         lda #$20
         sta $d40e
         sta $d40f

 Set Oscillator 3's Frequency Control to $2020.  I just randomly chose
 this value when experimenting, and it worked, so I kept it.  The trick
 here is to set a value fast enough that the oscillator will make a
 number of cycles (so we can get a good sample of the values coming
 out) but not so fast that it might miss any of the "bursts" I was
 mentioning earlier.

         lda #%00110001
         sta $d412

 Combine the triangle and sawtooth waveforms and start the ADSR cycle.

         ldx #0
         stx high

 loop     lda $d41b
         cmp high
         bcc ahead
         sta high
 ahead    dex
         bne loop

 This loop takes 256 samples of Oscillator 3's output, saving the
 highest value in location high.

         lda #%00110000
         sta $d412

 Clear the gate bit, which puts voice 3 into its release phase (the oscillator
 itself keeps running).

         cli
         lda #27
         sta $d011

 Turn the screen back on.

         lda high
         rts

 high     .byte 0

 Return from the routine with the highest value sampled from Oscillator
 3 in the accumulator.  This allows you to branch based on the high
 bit:

 	   bmi SID8580
 	   bpl SID6581

 Voila!


 ======================
 Pulse Width Modulation
 ======================

 	The primary limitation of using the volume register is, of course,
 that it is only 4 bits.  Pulse width modulation (PWM) allows us to get
 around that limitation.

 	In general, there are lots of ways of transmitting information.
 If you've ever used a radio you've encountered both amplitude modulation,
 where the signal is encoded as the amplitude of some carrier wave, and
 frequency modulation, where the signal is encoded by changing the frequency
 of the carrier wave.  In both cases, the idea is to strip out the encoded
 information and throw away the carrier.
 	Yet another possibility is pulse width modulation: use a pulse
 wave at some carrier frequency, and modulate the pulse width.  Pulse width
 modulation has several nice properties for transmitting signals; we can
 take advantage of it to play digis.

 	Pulse waves, of course, take on only two possible values: zero and
 one (low and high, etc.).  Over a single period, a pulse wave will in general
 be low for some amount of time and then high for some amount of time.
 The _duty cycle_ of a pulse wave is the amount of time it spends in the high
 state compared to the total period.  For example, a square wave, which is
 low exactly half the time and high the other half, has a duty cycle of 50%:

 	      ______	______
 	      |	   |    |    |
 	      |    |    |    |
 	 _____|    |____|    |____ ...

 Remember that, regarding SID, a signal like the above is simply a voltage
 level.  What is the _average_ voltage over a single period?  Since a square
 wave is zero half the time and one the other half the average value is
 just 1/2.  If instead the pulse had a duty cycle of 75%, it would be low
 for 1/4 the cycle and high for 3/4, giving an average value of 3/4.

 So the _average_ value of a single pulse is simply the duty cycle.  So if
 we change the duty cycle for each pulse we can essentially generate a
 series of average voltage values -- and since a digi is nothing more than
 a series of average signal values, we can use PWM to play a digi.

 To make this more precise, let's say we had a digi sampled at 1KHz -- one
 thousand samples per second.  Since each sample value will be approximated
 by a pulse, we need one thousand pulses per second.  The duty cycle of
 the first pulse will be the first sample value, the duty cycle of the
 second pulse will be the second sample value, and so on.  Note that
 the sample rate is the carrier frequency -- the frequency of the modulated
 pulse train, 1KHz in this case.

 (Actually, to be more accurate, we need _at least_ 1000 pulses per second --
 for example, we could use 2000 pulses per second, and represent each sample
 value using two pulses.  So the more correct statement is that the pulse
 carrier frequency is the maximum sample playback frequency.)

 The advantage for playing C64 digis is that we have much more resolution
 for the pulse width, and probably not in the way you think!  Because you
 are probably thinking that SID has this nice 12-bit pulse width that
 we can use here.  The problem is that the absolute highest frequency SID
 can produce, using the frequency registers, is about 4KHz, which would
 be the maximum playback rate.

 There's still another catch -- the carrier wave is still there!  Imagine
 trying to encode a signal that was constant, say 1/2 everywhere.  To
 generate a "digi" value of 1/2, you'd use a square wave, half down and
 half up.  So while the _average_ value of each pulse would be 1/2, the
 actual signal would be a square wave at the carrier frequency (look at
 the little picture above if you don't see it -- its average value is 1/2).

 Trying to modulate a 4KHz carrier wave results in a piercing 4KHz tone,
 and a _maximum_ sample rate of 4KHz (and this assumes that you can sync your
 code up exactly with SID).  So that's pretty worthless for digis.

 			- BUT -

 What if we could change the voltage level manually?  Let's say some
 hypothetical machine language program toggled the voltage level
 on each machine cycle -- the result would be a square wave of
 frequency 0.5 _mega_ hertz.  Okay, let's say it changed the voltage
 level every 10 machine cycles -- the result would be a carrier
 frequency of around 50 KHz.  The point here is that a machine language
 program can generate its own pulse waveform, and do so at much higher
 frequencies than SID can produce.

 Toggling the voltage levels turns out to be very simple.  As was
 described earlier, the way to "boost" digis on later SIDs is to use
 a pulse waveform at frequency zero.  Depending on the value of the
 pulse width register, SID will set the output voltage to either high
 or low.  So all a program has to do is set up a pulse waveform at zero
 frequency and use the pulse width registers to toggle the voltage --
 set $d403 to either $00 or $ff to toggle low/high.  (You could also use
 $d418 to toggle low/hi, but this method should produce more uniform
 results, and unlike $d418 can be filtered).

 So now we're cooking -- we've got a program that can generate a pulse
 train.  The next step is to change the width of each pulse to represent
 the sample values in our digi.  Remember that the duty cycle -- the
 percentage of time the pulse spends high -- is the average value for that
 pulse.  But also remember that each digi sample represents an average
 value over the sample period.  If the pulse period is equal to the sample
 period, then _the duty cycle is exactly the sample value_!

 Example: let's say that we have an 8-bit sampled digi, so that values go
 from 0-255, and our program generates pulses with a period of 256 "ticks".
 Now pick a sample value, say 56.  All the program has to do is hold the
 pulse high for 56 "ticks", and low for the remaining 256-56 = 200 "ticks",
 and it will have the correct average value: 56/256.  So a program to play
 8-bit samples might look like

 1 - Load .X with next sample value
 2 - Load .Y with 256-.X
 3 - Set pulse high
 4 - Loop for .X iterations (each loop iteration is one "tick")
 5 - Set pulse low
 6 - Loop for .Y iterations
 7 - Loop back to step 1
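 
 Written out literally, those steps might look like the sketch below.  It
 assumes voice 1 is already set up as a zero-frequency pulse, that the sample
 values are 1-255 (so neither count is zero), and that digi and lowcnt are
 made-up names for a one-page sample buffer and a scratch location; lowcnt
 stands in for .Y, which is busy as the index here:
 
 	ldy #0
 :next	ldx digi,y	;1 - .X = sample value
 	txa
 	eor #$ff	;2 - 256-.X, computed as the
 	clc		;    two's complement of .X
 	adc #1
 	sta lowcnt
 	lda #$ff
 	sta $d403	;3 - set pulse high
 :high	dex		;4 - one "tick" per iteration
 	bne :high
 	lda #$00
 	sta $d403	;5 - set pulse low
 	ldx lowcnt
 :low	dex		;6 - the remaining ticks
 	bne :low
 	iny		;7 - on to the next sample
 	jmp :next
 
 Each tick here costs 5 cycles (2 for DEX, 3 for a taken BNE), so on a stock
 machine this crawls along at about 10^6 / (5 * 256), or roughly 780 samples
 per second, and the setup code between the loops adds a little error on top.
 It shows the idea, not a usable player.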

 Let's say that each "tick" takes m cycles, and the sample size is 2^n, so
 that there are 2^n ticks per sample.  A stock machine runs at around
 10^6 cycles/second, so...

 	(10^6 cycles/second) / (2^n ticks/sample * m cycles/tick)
 	= 10^6 cycles/second / (m * 2^n cycles/sample)
 	= 10^6 / (m * 2^n) samples/second

 So, for example, let's say we had n=6-bit samples -- 2^6 = 64 -- and could
 generate pulses with a resolution of one machine cycle -- m=1.  Then
 we could play that 6-bit sample at 10^6/64 = 15.6KHz.  That is _really
 very good_!  In principle -- possibly using the CIA timers, possibly using
 fixed delay loops, possibly using a massively unrolled loop -- this can
 be done on a stock machine.  (I did try using the CIA timers, but the
 number of cycles to set up the timers was too big, and made it sound poor;
 I've included the code below though.)

 At this point it becomes a numbers game.  As we increase the sample size n
 (or the tick length m), we _decrease_ the sampling rate -- if, in the
 above example, we instead use 8-bit samples, the sampling frequency drops
 by a factor of four to around 4 KHz.  So there's a tradeoff between
 resolution and sampling frequency.
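 
 To make the tradeoff concrete, here is 10^6 / (m * 2^n) worked out for a few
 sample sizes, taking m = 1 and the rounded-off clock of 10^6 cycles/second:
 
 	n (bits)   2^n ticks   samples/second
 	   4          16           62500
 	   5          32           31250
 	   6          64           15625
 	   7         128      about 7813
 	   8         256      about 3906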

 AND... we still have this issue of the carrier frequency.  You should be
 able to convince yourself that the sampling frequency above is exactly
 the carrier frequency.  So with the 8-bit resolution example there
 would be an awful 4KHz tone running through the playback.  There are
 only two ways to beat the carrier frequency: push it high enough that
 you no longer hear it, or else push it high enough that you can use the
 filters to dampen it down.

 How high is high enough?  You can judge for yourself, but 15 KHz is
 pretty tough to hear, unless you have good ears and the volume is really
 loud -- so 6-bit samples are within reach on a stock machine.

 But add a SuperCPU into the picture, and the numbers get _really_ nice.
 Everyone knows that a SCPU can interact with the C64 at 1MHz, and
 hence generate pulses with 1MHz resolution, using code like

 	lda #$ff
 	sta $d403	;Set level high
 :loop	lda $d011	;wait for C64 cycle
 	dex
 	bne :loop

 where .X contains the sample value.  But what happens if we try to move
 beyond that 1MHz?  What if we put some NOPs into the above delay loop,
 in place of the lda $d011?  Well, in principle it means that the duty
 cycles won't always be right, which corresponds to some sampling error.
 In practice, however, it works _really well_!  Consider what happens when
 the above code is changed to:

 :loop
 	nop
 	nop
 	dex
 	bne :loop

 The earlier formula still applies, but now using 20MHz cycles:

 	20 * 10^6 / (m * 2^n) samples/second

 In this example each loop iteration -- each "tick" -- is nine 20MHz cycles,
 giving a playback rate of approximately 17KHz for 7-bit samples.  Which
 is TOTALLY COOL!

 And it can even be pushed to 8-bit samples (although I personally don't think
 they sound any better, at least with the code I've tried; maybe the code can
 be improved).  Using loops like

 :loop
     dex
     beq :done
     dex
     beq :done
     ...
     dex
     bne :loop
 :done

 it is possible to "fine-tune" the loop tick to somewhere between 4-5 cycles,
 giving a playback rate between 15KHz and 19KHz, for an 8-bit sample.  Pretty
 cool.  The code is also a little more involved (with 7-bit samples we can
 use BMI for the loop branches; not so with 8-bits).  But it really is
 possible to play 8-bit samples at 19KHz on a C64 (plus SuperCPU).

 Using two voices
 ----------------

 You may be thinking, Hey, we've got three pulse waves to work with, can
 we improve the performance by using multiple pulses?

 Let's say we have two pulses, P1 and P2, with the same period.  When both
 are activated, the pulses simply add together -- that is, the total voltage
 is just the sum of the individual voltages, and therefore the _average_
 voltage is the sum of the individual pulse averages:

 	avg voltage = D1 + D2

 where D1 and D2 are the duty cycles of pulses P1 and P2.  In the simplest
 case, this gives us an extra bit of resolution -- if D1 and D2 are both
 7-bit values, say, then D1+D2 is an 8-bit value.

 -BUT-

 Consider, for a moment, what would happen if we were to change the amplitude
 of the second pulse -- that is, let's say the maximum voltage it took on
 was 1/16 of the maximum voltage of the first pulse.  The average voltage
 would then be

 	avg = D1 + D2/16

 This then gives us _four_ extra bits of resolution, with each bit to the
 _right_ of the decimal place.  For example, if D1 and D2 are 4-bit numbers,
 with D1=xxxx and D2=yyyy, then the avg will be a number like xxxx.yyyy
 (four bits to the left of the decimal place and four to the right).

 Of course, we can change the pulse amplitude by changing the sustain
 setting, so in principle this gives a very easy and efficient way of
 playing high-resolution digis.  In practice, I have not been able to
 make it work very well.  I used a sustain setting of 1 and split an
 8-bit sample into two 4-bit pulses; I believe the result sounds better
 than 4-bits, but certainly doesn't sound anywhere near 8-bits.  My
 suspicion is that it is because the second pulse voltage is not really
 1/16 of the first pulse, which corresponds once again to adding noise
 to the sample value.
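 
 For reference, splitting the sample for that experiment is just a matter of
 separating the nybbles.  A sketch, where sample, d1 and d2 are made-up
 locations: d1 is the duty cycle for the full-volume pulse and d2 the one for
 the pulse at sustain level 1:
 
 	lda sample	;8-bit sample value
 	and #$0f
 	sta d2		;low nybble  -> quiet pulse
 	lda sample
 	lsr
 	lsr
 	lsr
 	lsr
 	sta d1		;high nybble -> full pulse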

 To find out, we can just measure the output at different sustain levels.
 The following table gives the voltage output for voice 1 using a pulse
 waveform at zero frequency and volume 15:

       Output (V)        Step to next
      at pulse width       row (V)
 SU    000     fff       000     fff

 0f    6.34    5.29      .08     .07
 0e    6.26    5.36      .02     .01
 0d    6.24    5.37      .06     .05
 0c    6.18    5.42      .03     .02
 0b    6.15    5.44      .05     .03
 0a    6.10    5.47      .03     .02
 09    6.07    5.49      .04     .02
 08    6.03    5.51      .03     .02
 07    6.00    5.53      .05     .03
 06    5.95    5.56      .03     .02
 05    5.92    5.58      .05     .02
 04    5.87    5.60      .04     .03
 03    5.83    5.63      .04     .02
 02    5.79    5.65      .06     .02
 01    5.75    5.67      .05     .02
 00    5.70    5.69

 Voice 2 is identical within a few hundredths of a volt.  If this test is
 repeated using voices 1 and 2 simultaneously, the result is:

     Output (V) at pulse width
 SU  000    fff
 0f  7.30   5.25
 0e  7.12   5.36
 0d  7.09   5.37 (!)
 0c  6.95   5.46
 0b  6.88   5.49
 0a  6.78   5.54
 09  6.72   5.58
 08  6.62   5.62
 07  6.58   5.65
 06  6.47   5.70
 05  6.40   5.73
 04  6.31   5.78
 03  6.22   5.82
 02  6.13   5.87
 01  6.07   5.90
 00  5.97   5.95

 Note the weird step at $0d -- the response is definitely not linear!

 Now, to summarize, when using one voice, the "positive" amplitude (about the
 mean 5.70V) is .64V and the "negative" amplitude is .41V, giving a spread of
 1.05V.  With two voices together, the amplitudes are 1.33V, 0.72V, and 2.05V
 respectively.  If the two signals were simply added together, the numbers
 should be 1.28V, 0.82V, and 2.1V.

 What we originally wanted was a signal like

 	D1 + D2/16

 that is, another pulse that is 1/16 the value of the 'full' pulse.  1/16 of
 the positive amplitude is .64V/16 = .04V, and 1/16 of the negative amplitude
 is .41V/16 = .026V.  A setting of sustain level 1, on the other hand, gives
 voltage offsets of 0.05V and 0.02V, so the second pulse is scaled by roughly

 	.64V / .05V = 12.8	(positive side: D2 / 12.8)
 	.41V / .02V = 20.5	(negative side: D2 / 20.5)

 So, in summary, whereas I wanted D1 + D2/16, I was actually getting D1 plus
 something that varied from D2/12.8 to D2/20.5, even if the two voices summed
 together correctly.

 There may still be a way to make all this work right, which would be great,
 but I'm tired :).  The code from my attempts is below.

 I also could not get two 7-bit pulses to sound like an 8-bit pulse.  I took
 an 8-bit sample value and divided it in half, assigning each half to a pulse
 (and giving the extra bit to pulse 2, if an extra bit was present).
 I suspect that another issue is that it is impossible to update both
 pulses simultaneously, meaning some delay between pulses, which translates
 to adding -- surprise! -- noise to the signal.  Perhaps it would be
 more effective at lower resolutions, however.
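 
 The split itself is short.  A sketch, with sample, d1 and d2 as made-up
 locations and decimal mode assumed off:
 
 	lda sample	;8-bit sample value
 	lsr		;A = sample/2, carry = extra bit
 	sta d1		;pulse 1 gets sample/2
 	adc #0		;pulse 2 gets sample/2 + extra bit
 	sta d2		;so d1 + d2 = sample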

 If someone has some success using these techniques I'd be interested in
 hearing it.

 SID lockups
 -----------

 Blindly applying these PWM algorithms has a way of locking up SID -- like,
 locking him up hard.  To be honest, I don't have a good explanation for why
 this happens, and I haven't yet found a good method of prevention -- toggling
 the test bit, playing a real sound for a short time, toggling the gate bit,
 and so on, just don't seem to "initialize" SID reliably enough.  Sometimes the
 code works, and sometimes it doesn't -- it's the same code both times.  Often
 resetting the machine will make things work; I'm not sure what hardware resets
 take place within SID, but the kernal certainly zeros him out so that's a
 possibility.  The other observation is that playing a tune seems to 'clear
 out' whatever is blocking SID.  So there _must_ be some kind of software
 solution to the problem.

 In the example code pressing RESTORE restarts the code, which will usually
 clear the 'blockage' after a tap or two, if it happens.

 If anyone has some thoughts on this issue (or even better, an explanation
 of what is going on!) I'd love to hear them.

 Pulse Width Modulation, continued
 --------------------------------- from various

 The digi article in issue #20 of C=Hacking left a few loose ends, and
 generated some followups.

 First, Otto Jarvinen (sounddemon) emailed to say that the SID detection
 routine occasionally reported incorrect results for him, and suggested that
 a workaround was to do the detect several times.  YMMV!
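 
 One way to act on that suggestion is to run the detection an odd number of
 times and take a majority vote.  A sketch, assuming the detection routine is
 assembled at its label start and returns the highest sampled value in the
 accumulator (high bit set for an 8580); tries and votes are made-up variables:
 
 vote     lda #3
          sta tries
          lda #0
          sta votes
 again    jsr start      ;A = highest value seen
          bpl not8580    ;high bit clear: 6581
          inc votes      ;high bit set: 8580
 not8580  dec tries
          bne again
          lda votes
          cmp #2         ;carry set if 2 or 3 runs
          rts            ;reported an 8580
 
 tries    .byte 0
 votes    .byte 0
 
 On return the carry flag is set for an 8580 and clear for a 6581.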

 Second, a day or two after issue #20 was released, Levente discovered a
 brilliant way to play 6-bit PWM digis on a stock machine:

 --
 I couldn't resist, and tried something out (see attachment). It works!!! :-)

 In fact, when I wrote the last letter I didn't know that I found something
 useable, just had some ideas - I felt that I'm at the right place. When I read
 C=H 20 this morning and read your comment about the Test bit (from the PRG), I
 knew that it must work. All I had to do is then to put this idea into code.

 The whole idea is about starting the pulse by software, and then having the
 SID turn it back to 0 after a time.

 Is it possible? ...The keys are the Test bit (the SID wave counter can be
 reset at any time), the pulse width register, the wave counter and the SID's
 way of generating a pulse wave (i.e. the pulse wave is high as long as the
 wave counter is less than the value in the pulse width register).


 Check this algorithm:

 - Init: volume at max, voice 1 sustain level max, start attack. Freq is
 selected well (=$4000), so the wave counter is incremented by 4 every
 processor clock cycle.

 Loop:
 - load next sample value, and put it to the pulse width low register ($d402;
 ensure that $d403 is 0).
 - Set test bit, and clear test bit (counter reset).
 - Increase sample pointer, some delay, then loop. The delay must be 64 clock
 cycles + the time while the Test bit is kept set (4 cycles if using STA $d404
 : STX $d404 immediately with pre-loaded values).

 What will happen? The 8-bit sample value is put directly to the pulse width
 register (MSBs of the pulse width register are cleared!...). The wave counter
 is started (release test bit), and it increases by 4 every CPU cycle (=
 counts 256 in 64 cycles). After some time, the counter will reach the value in
 the pulse width register. This happens exactly (8-bit sample value / 4)
 cycles later, because of the above. In this cycle (or the next?...) the SID
 turns its pulse output to 0. Voilà!

 One must just make sure that the loop length in cycles matches the above
 conditions, and then it runs like hell... Since it does exactly the same on
 the SID as the other (bit-banging) way, it just does it with some hardware
 help, there's also no problem with the 4khz maximum barrier (since the
 oscillator is reset every loop).

 With a little enhancement, it's possible to write a roughly 7.5-bit player
 for a stock C64 by this method. This is what you find in the attachment...
 The idea
 is using all the 3 channels simultaneously. A slightly increased sample value
 is written to the three pulse width registers, so the oscillators will finish
 the duty cycle one processor cycle later, when there's a carry between
 bits(0,1) to the MSBs.

 The replay freq is the CPU clk / 68 (~15khz). 64 cycles (variable duty cycle)
 + 4 cycles (constant duty cycle because of the reset time - no problems with
 that, it doesn't change (just gives a small constant DC...)).

 By similar methods, it should be possible to write a sample player with higher
 PWM freq (with less resolution of course, but eliminating this still audible
 whistling).

 (I tried using the filter to reduce it, but it sounded so bad that I left it
 out. It clicked like hell. The FETs got saturated.)

 [Richard Atkinson suggested turning down the sustain volumes to avoid this]

 See the attachment, and the binary. I think the sample sounds pretty good :-).
 (The cut is from 'Greece 2000' by Three drives on a vinyl).

 (Another idea that popped up in my mind: since the TED sound generator can
 also be reset, I could probably translate this idea to the Plus/4 :-O ).

 Best regards,

 Levente

 --

 The binary is available at http://www.ffd2.com/fridge/chacking/ towards the
 bottom of the page.
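
 As a rough sanity check on the timing in Levente's description, here is a
 small Python model of one voice in the loop.  It is only a sketch, not his
 player: the function names are made up for illustration, and the exact cycle
 on which SID drops the pulse output is an assumption (the email itself is
 unsure whether it is "this cycle or the next").  The clock values 985248 Hz
 (PAL) and 1022727 Hz (NTSC) are the usual nominal C64 figures.

```python
CYCLES_PER_SAMPLE = 68   # 64 variable cycles + 4 cycles with the Test bit set
COUNTER_STEP = 4         # freq = $4000: compared counter bits advance 4 per cycle


def high_cycles(sample):
    """Cycles the pulse stays high after the Test bit is released.

    Assumes the output is high while counter < pulse width and that the
    comparison takes effect in the same cycle (an assumption, not verified
    on hardware).
    """
    counter = 0          # the Test bit has just reset the wave counter
    cycles = 0
    while counter < sample and cycles < 64:
        cycles += 1
        counter += COUNTER_STEP
    return cycles


def replay_rate(cpu_hz):
    """Sample rate when one sample takes CYCLES_PER_SAMPLE CPU cycles."""
    return cpu_hz / CYCLES_PER_SAMPLE


for s in (0, 1, 4, 5, 128, 255):
    print(s, high_cycles(s))
print(round(replay_rate(985248)), round(replay_rate(1022727)))
```

 Under these assumptions a sample value s gives ceil(s/4) high cycles, so one
 voice resolves the 8-bit value to roughly 6 bits, and both clock values give
 a replay rate close to the ~15khz quoted in the email.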

 Third, I received a very interesting email from an Apple-II guy, which I'd
 like to pass on:

 --

 Hi!

 I found your page as I was searching for something else 6502-related,
 and was very interested.  Although I have always been aware of the
 C64, I have never really been a user--I have used Apple II's since 1980.

 I was particularly interested in the article on playing "digis" on the
 C64.  I became interested in playing digitized sounds on the Apple II
 in 1993, after hearing a 3-bit, 11.025 KHz PWM player.  At 3 bits, you
 can imagine how noisy speech samples were, but the overall effect
 for a 1 MHz machine with a 1-bit speaker "toggle" was amazing.  It
 made me wonder how far this PWM technique could be pushed on a
 stock, 1 MHz Apple II (not the somewhat faster, 65816-based IIgs).

 The short answer is, much farther than I expected!  Robin and Stephen
 accurately describe the theoretical PWM limit as 6 bit samples at
 about 16 KHz for a stock 1 MHz machine, but, as they point out,
 that is not practically realizable for a number of reasons, unless the
 play loop is completely unrolled!

 Furthermore, in the Apple II world, sampled sounds have acquired a
 few standardized sampling rates--mostly as a result of Mac influence,
 which was in turn influenced by CD's.  The most common rate in the
 Apple II world is 11.025 KHz, or one-fourth of the audio CD sampling
 rate.  This is commonly considered to be "AM radio quality", with a
 Nyquist bandwidth of about 5.5 KHz and a practical bandwidth of
 4+ KHz, given practical anti-aliasing filters (at the sampling end, not
 the playback end).

 A frequency of 11.025 KHz is, though high, still painfully audible to
 people whose ears are not zonked--a piercing "squeal" running
 through every sound.  So even though it is possible to write a
 practical 6-bit 11.025 KHz PWM player (usually called a SoftDAC
 in the Apple II world), the resulting listening experience is disappointing.

 So I went to work on a way to do 2x oversampling, and built a 5-bit
 22.050 KHz PWM player.  It was sad to lose a bit, but the absence
 of any audible "carrier" more than compensated for it!

 If you have access to an 8-bit Apple II (preferably with lower case,
 like a //e), and also preferably with a way of attaching an external
 speaker or headphones in place of the miserable 2.75" internal
 speaker, then you can easily give it a try and judge for yourself.

 I'm pretty proud of the novel design of the code, which I would
 characterize as "vectored" unrolled loops, one for every two
 pulse duty cycles, which I wrote a BASIC program to write
 for me--much less painful for counting cycles!

 The package is available on the web at:

 http://members.aol.com/MJMahon/index.html

 and is called Sound Editor v2.2 (http://members.aol.com/MJMahon/sound22.shk),
 since I had to "dress up" the player
 into something fun to play with.  ;-)  An earlier version of Sound Editor
 was published on SoftDisk in 1994, IIRC, but this one is a little more
 evolved.  It also introduced 2:1 ADPCM compression of 8-bit sampled
 sounds, to save disk space.  It is a lossy compression, but not very
 noticeably.  The editor package also includes those routines, in 6502
 assembly code.

 All of this should be trivially adaptable to the stock, 1 MHz C64, with
 very good results.  By using the filters, you could probably filter out
 the 11.025 KHz carrier and return to 6-bit accuracy!

 I should note that in the Apple world, sampled sounds are usually
 represented as "excess-128" codes, which means that the sign bit
 is inverted.  This actually simplifies things, since the sample value
 is within a few shifts of being the pulse width in cycles.

 Let me know what you think!

 -michael

 --

 (Always great to hear from Atari and Apple ][ folks!)
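
 The trade-off Michael describes between sample rate and resolution is easy
 to tabulate.  The sketch below assumes a nominal 1 MHz CPU clock and that
 the pulse width can be set with one-cycle resolution and no loop overhead,
 so the figures are upper bounds; pwm_budget is a made-up helper name.

```python
import math


def pwm_budget(cpu_hz, sample_rate_hz):
    """Return (cycles per sample, log2 of that) for a software PWM player.

    Upper bound only: assumes every cycle of the sample period is usable
    as a distinct pulse width.
    """
    cycles = cpu_hz / sample_rate_hz
    return cycles, math.log2(cycles)


for rate in (11025, 15625, 22050):
    cycles, bits = pwm_budget(1_000_000, rate)
    print(f"{rate:6d} Hz: {cycles:5.1f} cycles per sample, about {bits:.1f} bits")
```

 With these assumptions 11.025 KHz leaves about 90 cycles per sample (a
 little over 6 bits) and 22.050 KHz about 45 cycles (a little over 5 bits),
 in line with the 6-bit and 5-bit players described in the email.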

 And finally, I have a little mathematical analysis of PWM and how it compares
 to a "straight" digi.  Basically, I found some of the PWM explanations a
 little unconvincing in issue #20 (even though I wrote them!).  For example,
 the idea of "average voltage" seems a little funny, since every two samples
 has an "average voltage", as does every four, etc. but that set of average
 voltages would give a different sounding signal than the original (or
 more dramatically, there is an average voltage over a full second of digi
 playback, but that's not what you hear!).  So I wanted to know how a
 PWM signal _really_ compares to a straight digi playback.

 Another issue is changing the amplitude of a PWM digi, i.e. using two
 pulse waveforms, with one 1/16 the value of the other, to get higher
 resolution.  If you recall the discussion of digis, the resolution of a PWM
 digi depends on the number of pulse widths available, not the amplitude.
 Adding two PWM waveforms together does not change the number of pulse widths
 available, so I wanted to figure out what changing the amplitude _really_
 does to a PWM digi, and if it can really be exploited.

 And finally, I wanted to know about the carrier wave (that is so piercing
 at lower playback frequencies) -- and once again, how it compares with a
 standard digi (which, after all, is stair-stepping the voltages at the
 playback rate).

 Since the rest of this article is some Fourier analysis that 99% of people
 will have zero interest in, I'll put the conclusions here.  The first is:
 PWM digis and standard digis are essentially identical except at higher
 frequencies (except for a phase shift, which doesn't make any difference to
 your ear).  The second is: changing the amplitude of a PWM changes the 
 resolution.  More specifically, the amplitude of the pulse multiplies the
 digi sample value.  If two pulses can be synced close enough, it should
 indeed be possible to use two pulses to get a higher resolution.  Moreover,
 by modulating the amplitude of a single PWM digi, using the $d418 volume
 register -- that is, using PWM _and_ $d418 -- it should be possible to get a
 higher dynamic range, something that should be a little more achievable using
 SID (but maybe not that useful, so I didn't try it out).  And finally, a
 standard digi has zero amplitude at the carrier frequency.

 In other words, after a lot of effort I was able to demonstrate what everyone
 already knows.

 The analysis doesn't change anything from the previous articles (except
 possibly the idea for changing the PWM amplitude to get more dynamic range).

 And now, some Fourier analysis.  A standard digi just sets the voltage to
 the sample value s_j, for a length of time dt (dt = 1/sample rate).  The
 Fourier transform of a single sample s_j (occurring at time t_j) is


 		s_j [e^(-iw dt) - 1] * [e^(-iw t_j) / -iw]
 		 

 where w = angular frequency.  Since the above is a little hard to read, I'll
 say it in words.  The first term is the sample value s_j, which scales
 amplitudes at all frequencies.  The second term is due to the finite length
 of the pulse (evaluating the Fourier integral at the boundaries), and
 basically changes the phase of the transform.  The third term is like
 sin(w)/w -- a sinusoid with decreasing amplitude as frequency increases.
 So: the transform goes like sin(w)/w times the sample value, with some phase
 effects thrown in (we'll get back to these in a moment).

 A PWM digi sets the duty cycle of a pulse to the sample value s_j, giving
 a Fourier transform of

 		[e^(-iw s_j dt) - 1] * [e^(-iw t_j) / -iw]

 Compare this with the earlier expression, and you'll see that the sample
 value s_j has moved up in to the exponent of the "phase term" but that
 they're otherwise the same.

 The first thing to do is to show that both expressions, PWM and standard,
 reduce to the same thing -- that is, that a PWM and a standard digi sound
 the same!  The expressions both decrease as 1/frequency, due to the
 sin(w)/w term.  This means that at large frequencies the values become
 negligible.  (How large?  For example, if the sample frequency is just 1KHz,
 then sin(w)/w is on the order of 1000 times smaller near w=1KHz (i.e. the
 sample frequency, which is twice the Nyquist limit) than it is near w=0).

 So now consider the phase terms for small w.  The Taylor expansion for e^x is

 	1 + x + x^2/2 + ...

 We can therefore expand the "phase terms" as

 	regular: e^(-iw dt) - 1 = (1 - iw*dt - w^2 dt^2/2 + ...) - 1
 				= -iw*dt + O(w^2 dt^2)

 	pwm: e^(-iw s_j dt) - 1 = -iw*s_j*dt + O(w^2 dt^2)

 where O(w^2 dt^2) is considered very small since w and dt are both small.
 Substituting the above into the original expressions gives

 	s_j*iw*dt [e^(-iw t_j) / iw]

 in both cases.  That is, we have shown that for "small" frequencies -- more
 specifically, for frequencies where (w^2*dt^2) is much smaller than (w*dt),
 which is where w*dt is well below 1, i.e. frequencies well below the sample
 frequency (w*dt = 2*pi*f/f_s, so roughly f < f_s/6) -- PWM and standard
 digis are the same.
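
 The low-frequency agreement can be checked numerically.  The sketch below
 evaluates the two single-sample transforms given above and compares their
 magnitudes (magnitudes only, since the phase shift is inaudible).  It takes
 s_j as a duty fraction between 0 and 1; the function names and the 16KHz
 sample rate in the usage lines are just illustrative choices.

```python
import cmath
import math


def standard_ft(w, s, dt, t=0.0):
    """Transform of a sample held at level s for a time dt, starting at t."""
    return s * (cmath.exp(-1j * w * dt) - 1) * cmath.exp(-1j * w * t) / (-1j * w)


def pwm_ft(w, s, dt, t=0.0):
    """Transform of a unit-height pulse of width s*dt, starting at t."""
    return (cmath.exp(-1j * w * s * dt) - 1) * cmath.exp(-1j * w * t) / (-1j * w)


def magnitude_ratio(freq_hz, s, sample_rate_hz):
    """|PWM transform| / |standard transform| at one frequency."""
    w = 2 * math.pi * freq_hz
    dt = 1.0 / sample_rate_hz
    return abs(pwm_ft(w, s, dt)) / abs(standard_ft(w, s, dt))


for f in (100, 1000, 4000):
    print(f, magnitude_ratio(f, 0.25, 16000))
```

 The ratio sits very close to 1 at low frequencies and drifts away from 1 as
 the frequency approaches the sample rate.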

 The explanation lies in the phase terms.  Those "phase terms"

 	 [e^(-iw dt) - 1]  (regular)

 and

 	 [e^(-iw s_j dt) - 1]  (PWM)

 do more than just change the phase.  When they multiply the sin(w)/w signal,
 they take the sin(w)/w signal, change the phase, and then subtract the
 sin(w)/w signal again.  It's this difference of signals that makes things
 work out at the frequencies we care about.  PWM and standard digis are _not_
 the same, but the main differences are at higher frequencies, where the
 amplitudes are in general much smaller.

 But... but... what about the PWM carrier frequency?  If we take a constant
 digi, say with sample values = 1/2, the standard digi gives a constant
 voltage, whereas a PWM digi gives a square wave at the sample frequency.
 The answer comes from the "phase terms" above.  The sample frequency is

 	w = 2*pi/dt.

 Substituting this into the phase terms gives

 	[e^(-i*2*pi) - 1]	(regular)

 and

 	[e^(-i s_j 2*pi) - 1]	(PWM)

 The regular expression is exactly zero -- there is _nothing_ at the
 sample frequency of a regular digi.  But that's not the case for the PWM
 term, because of the s_j up in the exponent.  PWM digis have a _finite_
 amplitude at the carrier frequency.  Note that because of the sin(w)/w
 term it gets smaller as the sample frequency increases -- but it isn't zero.
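
 Here is a small numerical version of the carrier argument.  It evaluates
 the magnitude of each phase term divided by w at the sample frequency
 w = 2*pi/dt, again treating s_j as a duty fraction between 0 and 1.  The
 helper name and the sample rates used are illustrative assumptions.

```python
import cmath
import math


def carrier_magnitudes(s, sample_rate_hz):
    """Return (standard, pwm) transform magnitudes at the sample frequency."""
    dt = 1.0 / sample_rate_hz
    w = 2 * math.pi / dt                       # the carrier (sample) frequency
    standard = abs(cmath.exp(-1j * w * dt) - 1) / w
    pwm = abs(cmath.exp(-1j * w * s * dt) - 1) / w
    return standard, pwm


for rate in (8000, 16000):
    print(rate, carrier_magnitudes(0.5, rate))
```

 The standard term comes out as zero to within rounding error, while the PWM
 term for a 50% duty cycle is 2/w, which halves when the sample rate doubles.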

 Finally, the phase term expansions give some insight into what happens
 when both the pulse width _and_ height are varied.  If the pulse width
 is s_j, and the height is set to h_j, then the Fourier transform becomes

 	h_j*s_j *iw*dt [e^(-iw t_j) / iw]

 That is, the amplitude multiplies the width.  For the case of adding two
 PWM waves together, then, the amplitude really does effectively scale the
 sample value, and it should be possible to add one PWM value at 1/16 the
 amplitude of another to get an effective 8-bit value.
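
 As a sketch of that idea, assume each pulse has only 16 possible widths
 (4 bits) and that the second pulse can be mixed in at exactly 1/16 the
 amplitude of the first.  Both are assumptions made for illustration, and
 the names below are hypothetical.

```python
def split_sample(sample):
    """Split an 8-bit sample into (coarse, fine) 4-bit pulse widths."""
    return sample >> 4, sample & 0x0F


def effective_level(coarse, fine, fine_amplitude=1 / 16):
    """Low-frequency level: amplitude times width, summed over both pulses."""
    return coarse * 1.0 + fine * fine_amplitude


levels = {effective_level(*split_sample(s)) for s in range(256)}
print(len(levels))
```

 Every 8-bit sample maps to its own level (sample/16 in units of the coarse
 pulse width), provided the two pulses really are synced and scaled as
 assumed.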

 What about _varying_ the amplitude of a single PWM sequence?  For a 6-bit PWM
 digi, say, the sample values s_j can go from 0 to 63.  If this is then
 multiplied by h_j=2 say, then the values become 0 2 4 ... 126 -- a 7-bit
 number where the lowest bit is always 0.  What use is that?  Well, we still
 have the h_j=1 values of 0..63, which do include the lowest bit.  So we
 can effectively change the dynamic range from 0..63 to 0..126 using just two
 amplitude values.
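
 The set of levels reachable this way can be enumerated directly.  The sketch
 below just forms every product h_j*s_j for a 6-bit width and a given list of
 amplitudes; reachable_levels is a made-up helper, and real $d418 volume
 steps are assumed to be exact integer multiples.

```python
def reachable_levels(width_values, amplitudes):
    """Sorted list of distinct amplitude*width products."""
    return sorted({h * s for h in amplitudes for s in width_values})


widths = range(64)                      # 6-bit PWM: s_j = 0..63
one = reachable_levels(widths, [1])
two = reachable_levels(widths, [1, 2])
print(len(one), max(one))
print(len(two), max(two))
```

 With h_j limited to 1 and 2 the range extends to 126, but above 63 only the
 even values are reachable, so the extra range comes with coarser steps.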

 As a practical matter, then, it might be possible to use all 15 $d418 values
 available to get a big dynamic range, and hence a better sounding digi,
 using fewer CPU cycles.  Well, ok, we're only _sort of_ changing the dynamic
 range, so I pretty much doubt the usefulness of it.  But maybe someone out
 there would like to give it a shot.

 All right, let's hope this closes the book on pulse width modulation for
 digi playback!
 .......
 ....
 ..
 .                                    C=H 20