@sroccaserra
Last active May 10, 2024 04:09
Documenting the conversion of old Amiga 8 bit samples

Archive:

These are 8 bit samples often used with the first Amiga trackers of the late 80s and early 90s, like Ultimate Soundtracker. The archive contains them in their original format and in updated formats.

Following the info found in this other archive, I tried to accurately convert the original files to modern and self-documenting formats (.wav or .aiff), so they can easily be used in modern DAWs or modern trackers, like Renoise. Try disabling interpolation and adding a 2-pole low-pass filter at around 7 kHz for an old-school experience.

Disclaimer

The original files were not collected by me. They are on Aminet:

I am not an Amiga or sound file expert, just an enthusiast and a curious developer. As such, I might have made a few errors. I would appreciate it if more knowledgeable people took the time to check my work, and if you happen to find an error and have a way to fix it, please do so.

To gather some data about the original files, I mostly explored the ST-01 and ST-02 directories for reference, then I ran a few stats on all the directories. But I didn't check the 10500+ original files individually.

Notes on the original files

First, using hexadecimal file viewers like GNU od and xxd, and D3.js to display the data, I found that interpreting files as 8 bit signed ints shows nice curves for some files.
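As a rough illustration of that interpretation, here is a minimal Python sketch that decodes bytes as signed 8 bit samples. The function name `decode_signed_8bit` is hypothetical and is not part of the archive.

```python
import array


def decode_signed_8bit(data):
    """Interpret each byte as one signed 8 bit sample (-128..127)."""
    samples = array.array('b')  # 'b' is the signed char type code
    samples.frombytes(data)
    return samples.tolist()


# Usage sketch (the path is only an example):
# with open('ST-01/some-sample', 'rb') as f:
#     print(decode_signed_8bit(f.read())[:16])
```

Plotting the returned list (with D3.js or any other tool) should show the same kind of curve, as long as the file really is headerless 8 bit signed PCM.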

IFF 8SVX files

I also found that a few of them (only 1 in the ST-01 directory) started with the word 'FORM', followed by '8SVX' a few bytes later. This indicates an IFF header for sound data.

After some more research (see reference links below), I found out that 8SVX IFF files contain info about the bit precision, and more importantly the sample rate of the original file.
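To make this concrete, here is a hedged Python sketch that reads the VHDR chunk of a well-formed 8SVX file. It assumes the standard layout (FORM at byte 0, 8SVX at byte 8, then a sequence of chunks, each padded to an even length). The function name `read_vhdr` and the dictionary keys are my own labels, not names from any library.

```python
import struct


def read_vhdr(data):
    """Return the VHDR fields of an 8SVX file as a dict, or None.

    Returns None when the data does not start with a FORM/8SVX header
    or when no complete VHDR chunk is found.
    """
    if len(data) < 12 or data[0:4] != b'FORM' or data[8:12] != b'8SVX':
        return None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack('>I', data[pos + 4:pos + 8])
        if chunk_id == b'VHDR':
            if size < 20 or pos + 28 > len(data):
                return None
            fields = struct.unpack('>IIIHBBI', data[pos + 8:pos + 28])
            return {
                'one_shot_samples': fields[0],   # part played once
                'repeat_samples': fields[1],     # looping part
                'samples_per_cycle': fields[2],
                'sample_rate': fields[3],        # in Hz
                'octaves': fields[4],
                'compression': fields[5],        # 0 means uncompressed
                'volume': fields[6],             # 16.16 fixed point
            }
        # Skip this chunk; odd-sized chunks are followed by a pad byte.
        pos += 8 + size + (size & 1)
    return None
```

In a file where VHDR is the first chunk, the sample rate ends up at bytes 32 and 33, which is the position the `xxd -s32 -l2` command further down reads.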

A few stats: there are 4662 files with an IFF header in the original collection, out of 10500+ files total. So around 44 % of the originals are IFF files, and 56 % are raw PCM data with no header.

Command to count files with an IFF header in the original files:

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -l '^FORM' | wc -l

Note: 22 files have the FORM keyword, but not at the start of the file (3 of them have several FORM keywords!).

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa 'FORM' | cut -d':' -f1-2 | grep -v ':0$' | sort
./ST-05/cc1+2:12574
./ST-05/cc3+4:23399
./ST-05/cc3+4:35757
./ST-05/cc4:13200
./ST-05/cc4:842
./ST-06/jamigobass2:145
./ST-06/snare drum easy:255
./ST-06/voice 2:88
./ST-07/cc2:23
./ST-07/iso bdrum2:90
./ST-07/iso bdrum:447
./ST-07/iso explod:298
./ST-07/iso sdrum:372
./ST-07/iso shut2:443
./ST-07/iso shut:154
./ST-08/zent lion:6344
./ST-16/argh2:3185
./ST-18/gng-bass:104
./ST-18/gng-bdrum:284
./ST-18/gng-gui.sample:15214
./ST-18/gng-gui.sample:46
./ST-18/gng-gui.sample:6858
./ST-18/gng-piano-moll:180
./ST-18/tv-go:1065
./ST-18/tv-select:101
./ST-43/blast1:52

In addition, 17 more files have an 8SVXVHDR keyword, but not at the 8th byte which is the usual position:

ST-07/endblaster8:253
ST-12/desertsnare3:55
ST-20/supertomdrum:112
ST-31/adolf4:30576
ST-31/devestating:1032
ST-32/guitar3:62
ST-32/guitar4:62
ST-32/guitar5:62
ST-32/guitar6:62
ST-33/claps6:12
ST-34/ass:1034
ST-34/claps7:19
ST-34/claps8:59
ST-34/house2:10
ST-41/supercrash:74
ST-A9/WILDGITARB:112
ST-A9/WILDGUITARS:112

Then 30 more files have a VHDR chunk in the wrong position (it is usually at the 12th byte):

ST-07/iso sdrum3:1
ST-15/a-snarek:116
ST-18/gng-eguitar:152
ST-18/tv-noise2:2914
ST-18/tv-noise:122
ST-18/tv-spectator:31356
ST-24/atom-piano:177
ST-24/bat-guitar:9227
ST-24/bat-sdrum:157
ST-26/jump-tomtom:122
ST-26/th-basscool:165
ST-27/animate-bass:157
ST-27/klax-klopfen2:118
ST-27/klax-klopfen3:117
ST-27/klax-typemachine:120
ST-38/jb-hit:133
ST-45/cadaver-pauke+drum:104
ST-45/puznic-bass:109
ST-45/puznic-bdrum1:117
ST-45/puznic-snare1:143
ST-46/battle-bdrum:126
ST-46/battle-sdrum:121
ST-51/robocop-slapbass:107
ST-51/robocop-tiptip:422
ST-52/lemmings-3egui:1747
ST-52/lemmings-3egui:859
ST-52/lemmings-awebdrum:21
ST-56/spysample:123
ST-56/spysample:13203
ST-56/spysnaredrum:20

Some of them could easily be fixed. Note: I didn't, but you can do it if you want. Have a look at the bytes around the positions I listed with a hex viewer.
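If you prefer Python to a hex viewer for this inspection, the following sketch lists every offset of a marker and prints the surrounding bytes. Both helper names (`find_offsets`, `hex_context`) are hypothetical. Offsets are 0-based, like the ones printed by `grep -ob`.

```python
def find_offsets(data, marker):
    """Return the 0-based offsets of every occurrence of marker in data."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + 1)
    return offsets


def hex_context(data, offset, before=16, after=32):
    """Return the bytes around offset as space-separated hex pairs."""
    start = max(0, offset - before)
    return data[start:offset + after].hex(' ')


# Usage sketch (the path is only an example):
# with open('ST-08/zent lion', 'rb') as f:
#     data = f.read()
# for offset in find_offsets(data, b'FORM'):
#     print(offset, hex_context(data, offset))
```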

Oktalyzer files

I then found out that some 8SVX files had the Oktalyzer string as annotation (in the ANNO 8SVX chunk).

Most of them have it in the first bytes, but some of them have it later in the file. These are probably samples exported or simply chopped from Oktalyzer files; Oktalyzer is a late 80s Amiga tracker, as I discovered.

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa -l 'lyzer' | wc -l

Almost all of them (2317) are well formed 8SVX files and have the Oktalyzer string at byte 76, in the ANNO 8SVX chunk.

Two of them have the string Oktalyzer in a different place:

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 rg -obUa 'Oktalyzer' | cut -d':' -f1-2 | rg -v ':76$'
./ST-08/zent lion:6420
./ST-79/HES.bigodnare2:164

The file ST-08/zent lion is 7696 bytes long, has no header at the start, and has a well formed 8SVX header starting at byte 6344 (!). It is probably two samples pasted together, the first without header and the second with a header.

The file ST-79/HES.bigodnare2 has an almost good header: it has two body chunks. Probably an error.

Then three files have only the string lyzer in the middle of their data, with no 8SVX header, probably indicating they were badly chopped from an 8SVX file and pasted together:

$ find ST-29 -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa 'lyzer' | cut -d':' -f1-2 | grep -v ':80$'
ST-29/xen-megablast:10444
ST-29/xen-tapsample:16166
ST-29/xen-we...:3100

Notes on the conversion

Converting raw files

For the raw files, I wrote a Python script that can:

  • read a raw file,
  • generate an AIFF header with values corresponding to the raw data,
  • create an AIFF file by prepending the header to the raw data.

This means that if you remove the generated header of the AIFF files, you should find exactly the original raw file.
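A hedged way to check that property is sketched below. It walks the AIFF chunks up to SSND and returns everything after the SSND header. It assumes SSND is the last chunk in the file, as in the files generated by the script, and the function name `aiff_sound_data` is hypothetical.

```python
def aiff_sound_data(aiff):
    """Return the sample bytes stored after the SSND header of an AIFF file.

    Assumes SSND is the last chunk, so everything that follows its
    8 byte offset/block-size fields is sample data.
    """
    if aiff[0:4] != b'FORM' or aiff[8:12] != b'AIFF':
        raise ValueError('not an AIFF file')
    pos = 12
    while pos + 8 <= len(aiff):
        chunk_id = aiff[pos:pos + 4]
        size = int.from_bytes(aiff[pos + 4:pos + 8], 'big')
        if chunk_id == b'SSND':
            offset = int.from_bytes(aiff[pos + 8:pos + 12], 'big')
            return aiff[pos + 16 + offset:]
        # Skip this chunk; odd-sized chunks are followed by a pad byte.
        pos += 8 + size + (size & 1)
    raise ValueError('no SSND chunk found')


# Usage sketch (paths are only examples):
# raw = open('ST-01/some-sample', 'rb').read()
# aiff = open('ST-01/some-sample.aiff', 'rb').read()
# assert aiff_sound_data(aiff) == raw
```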

Here is the command I used, from the root of the archive:

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -ZL '^FORM' | xargs -0 -n1 -I {} python3 convert_to_aiff.py {}

The script is at the root of the archive. I would be grateful if someone would check it and review it. It works only for raw 8 bit signed mono data.

Note: since I use the -Z grep option, which is not supported by macOS's old version of grep, I had to use ggrep, which points to GNU grep on my Mac. You can install most GNU replacements for macOS with Homebrew. If you're on a real Linux, you're fine: your grep is probably a recent version of GNU grep.

Note: another option is to let sox do the job, below is a convert_raw_to_44100_Hz_16_bit_wave.sh script that does just that.

Converting IFF files

Since IFF files have many optional chunks, I wanted to avoid writing a parser for 8SVX headers to extract the sample rate and other data about the IFF files. So I used SoX (Sound eXchange) to read and convert all files starting with a 'FORM' magic number header. SoX had no problem parsing them and converting them to .wav.

IFF files can contain basic looping points, by splitting the data into a first part (one shot) and a repeating part. The whole data should be present in the .wav files, but if the original files contained looping points, I was not able to preserve them.

Here is the command I used, from the root of the archive:

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -Zl '^FORM' | xargs -0 -n1 -I {} sox {} {}.wav

Notes on the sample rates

.wav files were converted from 8SVX IFF files using SoX. These files had information about their sample rate, and should be tuned as well as their original sample was.

Some stats: of the 4662 IFF files, 1783 are in 16726 Hz, and 2401 are in 8363 Hz. Together, that is about 90 % of the IFF files.

.aiff files were generated from raw 8 bit signed PCM data, without header and without info about their original sample rate. Since about 90 % of the IFF files had a sample rate of either 16726 Hz or 8363 Hz, I chose to write 16726 Hz in the AIFF header for the .aiff files. If the sample plays too fast in your DAW, try to play it an octave lower; it should sound OK in most cases.
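The AIFF header stores the sample rate as an 80 bit extended precision float, which is why the conversion script keeps a table of precomputed byte strings. As a sketch, assuming a positive integer rate, those ten bytes can also be computed directly. The function name `sample_rate_to_extended` is hypothetical.

```python
def sample_rate_to_extended(rate):
    """Encode a positive integer as a big-endian 80 bit extended float.

    Layout: 1 sign bit and a 15 bit exponent (bias 16383), then a
    64 bit mantissa whose top bit is the explicit integer bit.
    """
    if rate <= 0 or rate.bit_length() > 64:
        raise ValueError('rate must be a positive integer below 2**64')
    exponent = rate.bit_length() - 1      # position of the highest set bit
    mantissa = rate << (63 - exponent)    # normalise so bit 63 is set
    return (16383 + exponent).to_bytes(2, 'big') + mantissa.to_bytes(8, 'big')


# sample_rate_to_extended(16726) gives the same bytes as the table entry
# for 16726 in convert_to_aiff.py.
```

With such a helper, other rates such as 8363 Hz would not need a new table entry. This is only a suggestion, not what the archive's script does.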

Command to gather stats on the IFF files sample rates:

$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -Zl '^FORM' | xargs -0n1 xxd -p -s32 -l2 | sort | uniq -c

Note: counts are decimal numbers, sample rates are hexadecimal numbers.
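To read that output more comfortably, a small Python sketch can convert the hexadecimal rates to decimal. It assumes each line has the `uniq -c` shape, a count followed by four hex digits. The function name `decode_rate_counts` is hypothetical.

```python
def decode_rate_counts(uniq_output):
    """Map decimal sample rates to file counts from `uniq -c` output.

    Each line is expected to look like '   1783 4156': a decimal count,
    then the sample rate as hexadecimal digits.
    """
    counts = {}
    for line in uniq_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or unexpected lines
        count, hex_rate = parts
        counts[int(hex_rate, 16)] = int(count)
    return counts


# 0x4156 is 16726 and 0x20ab is 8363, the two most common rates.
```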

Using the original files in Renoise

Renoise can load raw PCM data, so you can load the original files and tune them (sort of). This method should also work with other DAWs.

To do that, click on the *.* button to show the files without extension. Then right click on the original sample and choose Load file with Options....

For the raw PCM files

In the dialog box, choose 8 bit signed and 11025 (Renoise doesn't offer 8363 or 16726 as a sample rate, the sample rates of about 90 % of the original IFF files). Then add 7 semitones to the pitch, and the sample should be tuned: play C3 or C4 for the base note of the original.

Note: this should give you the exact same result as double clicking the .aiff converted files, without the additional semitones.
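The 7 semitones come from the ratio between the assumed original rate and the rate chosen in the dialog. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
import math


def semitone_offset(original_rate, loaded_rate):
    """Semitones to add so data loaded at loaded_rate plays at original_rate."""
    return 12 * math.log2(original_rate / loaded_rate)


# semitone_offset(16726, 11025) is about 7.2, hence "add 7 semitones".
# semitone_offset(8363, 11025) is about -4.8, one octave below that.
```

Since the result is not a whole number, rounding to 7 semitones leaves the sample slightly flat. Fine tuning can compensate if that matters.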

For the IFF original files

This is the remaining 44 % of the collection. The same should work with an additional step: skip 104 header bytes in the load options. The number of header bytes can vary, though: the sample data starts right after the four size bytes that follow the BODY marker. You should check the header length with xxd, od, or hexedit on Linux / macOS, or with HexEdit or HxD on Windows.
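As an alternative to reading the hex dump by eye, this sketch computes the number of header bytes to skip. It is a naive search that takes the first BODY marker it finds, so it is only meaningful on files that really are 8SVX. The function name `body_data_offset` is hypothetical.

```python
def body_data_offset(data):
    """Return the offset of the first sample byte, or None without BODY.

    The sample data starts after the 4 byte 'BODY' marker and the
    4 byte chunk size that follows it.
    """
    index = data.find(b'BODY')
    if index == -1:
        return None
    return index + 8


# Usage sketch (the path is only an example):
# with open('ST-01/some-iff-sample', 'rb') as f:
#     print(body_data_offset(f.read()))
```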

For those, the problem is to identify that it's an IFF file in the first place, as there are no extensions in the original files. Again, your favourite hex editor is your friend (look for the first four bytes, they should spell FORM).

And if there is loop start info in the IFF file, you lose it.

Note: this should give you the exact same result as double clicking on the corresponding .wav converted file, without the additional semitones. You also lose the potential loop start info if present; I couldn't find a way to preserve that.

Renoise tool

I also wrote a Renoise tool to directly load the original files. For most of them it can guess the file format, and it preserves loop info in the 8SVX files. Downside: it does not let you preview the file, you have to load it to hear it.

You can find it here:

References I used

Original files:

About ST-01 samples:

About IFF and 8SVX:

About AIFF:

I did not explore enough, but Python can do a few things with IFF or AIFF data and audio files in general:

Interesting series about trackers / Soundtracker:

convert_raw_to_44100_Hz_16_bit_wave.sh

#!/bin/bash
# Note: inspired by https://www.youtube.com/watch?v=eDCA1Tn52_E
# Note: since the source is 8 bit data, the "--endian big" option is not necessary.
mkdir -p ../wav
for i in *
do
    sox -r 8363 -c 1 -b 8 -e signed-integer -t raw -v 0.8 "$i" -b 16 -r 44100 -V3 "../wav/$i.wav"
done

convert_to_aiff.py

"""
Works only for raw 8 bit signed mono data.

This script was written as an easy way to convert original Amiga ST-XX raw
samples to a more documented format that keeps the original data untouched.

See also:
- http://paulbourke.net/dataformats/audio/
- https://github.com/audacity/audacity/blob/fa00dd0/lib-src/libsndfile/src/aiff.c
- https://archive.org/details/AmigaSTXX
"""
import argparse

# Sample rates encoded as big-endian 80 bit extended precision floats,
# the format used by the sample rate field of the AIFF COMM chunk.
sample_rate_in_extended_precision = {
    11025: b'\x40\x0c\xac\x44\x00\x00\x00\x00\x00\x00',
    16000: b'\x40\x0c\xfa\x00\x00\x00\x00\x00\x00\x00',
    16726: b'\x40\x0d\x82\xac\x00\x00\x00\x00\x00\x00',
    22050: b'\x40\x0d\xac\x44\x00\x00\x00\x00\x00\x00',
    44100: b'\x40\x0e\xac\x44\x00\x00\x00\x00\x00\x00',
}

DEFAULT_SAMPLE_RATE = 16726


def convert(input_file_name, output_file_name, sample_rate):
    with open(input_file_name, 'rb') as input_file:
        sample_data = input_file.read()

    nb_samples = len(sample_data)

    form_chunk_size = 4 + 4 + 4
    comm_chunk_size = 4 + 4 + 2 + 4 + 2 + 10
    ssnd_chunk_size = 4 + 4 + 4 + 4
    total_size = form_chunk_size + comm_chunk_size + ssnd_chunk_size + nb_samples

    nb_channels = 1
    sample_size = 8

    # The FORM size excludes the 'FORM' id and the size field itself.
    form_chunk = b'FORM' + \
        (total_size - 8).to_bytes(4, byteorder='big') + \
        b'AIFF'

    # COMM fields: channels, number of sample frames, bits per sample, rate.
    # With 8 bit mono data there is one sample frame per byte. (This field
    # previously held sample_rate * nb_channels, which is not a frame count.)
    comm_chunk = b'COMM\x00\x00\x00\x12' + \
        nb_channels.to_bytes(2, byteorder='big') + \
        (nb_samples // nb_channels).to_bytes(4, byteorder='big') + \
        sample_size.to_bytes(2, byteorder='big') + \
        sample_rate_in_extended_precision[sample_rate]

    # The SSND size covers the offset and block size fields (4 bytes each)
    # plus the sample data. (It was previously total_size - 34.)
    ssnd_chunk = b'SSND' + \
        (8 + nb_samples).to_bytes(4, byteorder='big') + \
        b'\x00\x00\x00\x00' + \
        b'\x00\x00\x00\x00'

    header = form_chunk + comm_chunk + ssnd_chunk

    with open(output_file_name, 'wb') as output_file:
        output_file.write(header + sample_data)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file')
    parser.add_argument('-o', '--output-file', help='The name of the output file (default is the name of the input file with ".aiff" extension)')
    # type=int is needed so that a rate given on the command line matches
    # the integer keys of the table above.
    parser.add_argument('-r', '--sample-rate', type=int, choices=sorted(sample_rate_in_extended_precision), help='The sample rate of the source in Hz. The default is {}.'.format(DEFAULT_SAMPLE_RATE), default=DEFAULT_SAMPLE_RATE)
    args = parser.parse_args()

    input_file_name = args.input_file
    output_file_name = args.output_file or input_file_name + '.aiff'

    convert(input_file_name, output_file_name, args.sample_rate)

@mikkovihonen

mikkovihonen commented Sep 27, 2023

Noticed your work on the samples. Not knowing you had already looked into these, I did something similar a while ago. The Python snippet below looks for files with the 8SVX header end tag, strips the first occurrence of it and the bytes before it from the files, and saves the results to new files with a ".fixed" suffix. I then simply converted all the files with sox in a similar manner as you did with the raw files.

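import os

# "BODY" followed by the two high bytes of the chunk size
# (both zero for samples shorter than 64 KiB).
header_end = bytes.fromhex("42 4F 44 59 00 00")

for root, dirs, files in os.walk("."):
    for file in files:
        path = os.path.join(root, file)
        with open(path, 'rb') as f:
            data = f.read()
        index = data.find(header_end)
        if index != -1:
            # "xb" raises an error if the .fixed file already exists.
            # Note: index + 6 keeps the two low bytes of the BODY size at
            # the start of the output; index + 8 would skip the whole size.
            with open(path + ".fixed", "xb") as new_file:
                new_file.write(data[index + 6:])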

@sroccaserra

Nice, thank you for sharing, this brings back some memories.
