@milesrout
Last active November 1, 2017 12:14
DFPU-17
A Floating Point Unit for the DCPU-16
This is a short document describing the DFPU-17 (D17), a
floating-point coprocessor for the DCPU-16 (D16). Despite its
simplicity, the DCPU-16 provides considerable functionality.
However, it lacks high-throughput floating-point support. The D17
exists to solve this problem.
DCPU-16 Hardware Info:
Name: DFPU-17 - Floating Point Coprocessor
ID: 0x1DE171F3, Version: 0x0001
Manufacturer: 0x52307537 (MILES_ROUT)
Description:
The DFPU-17 (D17) is an optional floating-point coprocessor for
the DCPU-16. Its main purpose is to be used for navigation and
orientation in space. It is capable of operating in various modes
depending on the user's needs.
The D17 has three banks of memory: two 256-word DATA banks and a
256-word TEXT bank. Each time floating-point data needs to be
processed, the user loads text and data into the device and
instructs it to execute the text on the data. The two DATA banks
are double-buffered, so the CPU can read and write one bank while
the device is working on the other. The device works on the
"primary buffer" while the "secondary buffer" is readable and
writable by the CPU.
Swapping the buffers simply swaps these labels: the primary
buffer becomes the secondary buffer and vice versa.
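As a rough illustration, the label swap can be modelled in C. This is a hypothetical host-side model, not a description of the device's internals: the type `D17Banks` and the helper functions are illustrative names, and the bank size assumes the 256-word figure given above. Swapping only changes which index is called primary; no words are copied.

```c
#include <assert.h>
#include <stdint.h>

#define D17_BANK_WORDS 256  /* per the description above */

/* Hypothetical model of the two DATA banks. Only the label moves on a swap. */
typedef struct {
    uint16_t bank[2][D17_BANK_WORDS];
    int primary;  /* index (0 or 1) of the bank the device works on */
} D17Banks;

/* The bank the device executes against. */
static uint16_t *primary_bank(D17Banks *b) { return b->bank[b->primary]; }

/* The bank the CPU may read and write. */
static uint16_t *secondary_bank(D17Banks *b) { return b->bank[1 - b->primary]; }

/* Swapping exchanges the labels; the contents stay where they are. */
static void swap_banks(D17Banks *b) {
    assert(b->primary == 0 || b->primary == 1);
    b->primary = 1 - b->primary;
}
```

Data written by the CPU into the secondary bank becomes visible to the device as the primary bank after one swap, and a second swap hands it back.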
Note: The TEXT bank is NOT double-buffered. If the CPU interrupts
the device with the LOAD_TEXT command while the device is
executing code, undefined behaviour will occur: most likely the
device will be rendered inoperable until a hard reset is
performed.
Note: 'Byte' is not used in this document, to avoid confusion.
A word is a 16-bit quantity.
Interrupt Behaviour:
When a hardware interrupt is received by the D17, it reads the Z
register and does one of the following actions:
(0x00) SET_MODE: The device will read the A register and set the
operating mode to one of the following options:
0x00: MODE_OFF: The device will be deactivated, clearing
all of its data and code memory, and putting it into
standby mode. The device will have negligible power drain
while in standby, and will respond only to the SET_MODE
command.
0x01: MODE_INT: The device will be activated if in
MODE_OFF. Whenever the device receives an EXECUTE command,
it will swap buffers and then execute. When it has completed
execution it will swap buffers again and interrupt the
CPU.
Note: In MODE_INT the device can only effectively use one
buffer.
Note: Switching to MODE_INT will fail and set an error
status and error message if SET_INTERRUPT_MESSAGE has not
been called.
0x02: MODE_POLL: The device operates in the same way as
MODE_INT, except the D17 will not automatically swap buffers,
nor interrupt the CPU when finished. Instead, it is the CPU's
responsibility to poll the device to determine its status.
Note: In MODE_POLL the CPU has more responsibility and incurs
more load, but as a result it can make use of both DATA banks.
Note: See end of document for a demonstration of a procedure for
using MODE_POLL.
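The SET_MODE rules above, including the MODE_INT precondition, can be sketched as a small state update. The `D17State` fields and `set_mode` are hypothetical names, and rejecting unknown mode values is an assumption, since this document does not say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mode values from the SET_MODE table above. */
enum { MODE_OFF = 0x00, MODE_INT = 0x01, MODE_POLL = 0x02 };

/* Hypothetical device state; field names are illustrative only. */
typedef struct {
    uint16_t mode;
    bool interrupt_message_set;  /* true once SET_INTERRUPT_MESSAGE ran */
    bool error;                  /* stands in for the error status/message */
} D17State;

/* Sketch of SET_MODE (Z = 0x00) acting on the value in A.
   Returns true if the mode changed. */
static bool set_mode(D17State *s, uint16_t a) {
    assert(s);
    switch (a) {
    case MODE_OFF:
        /* The real device also clears its code and data memory here. */
        s->mode = MODE_OFF;
        return true;
    case MODE_INT:
        if (!s->interrupt_message_set) {
            s->error = true;  /* documented failure case */
            return false;
        }
        s->mode = MODE_INT;
        return true;
    case MODE_POLL:
        s->mode = MODE_POLL;
        return true;
    default:
        s->error = true;  /* assumption: unknown modes are rejected */
        return false;
    }
}
```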
(0x01) SET_PREC: The device will read the A register and set the
working precision level. The default is single-precision.
0x00: PREC_SINGLE: The D17 will work with single-precision
floating point numbers. Invalidates loaded text and data.
0x01: PREC_DOUBLE: The D17 will work with double-precision
floating point numbers. Invalidates loaded text and data.
0x02: PREC_HALF: The D17 will work with half-precision
floating point numbers. Invalidates loaded text and data.
(0x02) GET_STATUS: The next interrupt to the device with the
message 0xFFFF will set the following registers as
described:
A: STATUS: the status, selected from the statuses listed
in Table I: Statuses.
B: ERROR: the most recent error message, selected from the
error messages listed in Table II: Error Messages.
C: set to 0.
X: set to 0.
Note: C and X are cleared so that a future version of
GET_STATUS can return useful information in them without
breaking existing programs. Programs must therefore not
assume that GET_STATUS leaves C and X unchanged.
(0x03) LOAD_DATA: The device will read the A register and load
512 words from the CPU's RAM at address A into the secondary
DATA buffer.
(0x04) LOAD_TEXT: The device will read the A register and load
256 words from the CPU's RAM at address A into the
TEXT buffer.
(0x05) GET_DATA: The device will read the A register and load 512
words of data from the secondary buffer into the CPU's RAM
at address A.
(0x06) EXECUTE: The device will not read any registers. If in
MODE_POLL, the device will process the data in the primary
buffer.
If in MODE_INT, the device will swap buffers and then process
the data in the new primary buffer. When finished, the device
will swap buffers again and then interrupt the CPU with the
interrupt message set with SET_INTERRUPT_MESSAGE.
(0x07) SWAP_BUFFERS: Swap the primary and secondary DATA
buffers. Only works in MODE_POLL.
(0x08) SET_INTERRUPT_MESSAGE: The device will set the interrupt
message used while in MODE_INT to the contents of the A
register.
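For reference, the command codes read from Z can be collected into a C enum. The `mode_int_job` helper is hypothetical: it lists, in order, the Z and A values a program might place before each hardware interrupt for a single MODE_INT job. It assumes the program waits for the interrupt carrying its message before issuing GET_DATA. Because EXECUTE in MODE_INT swaps the buffers back when it finishes, the results are in the secondary buffer at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt command codes, read from Z, as listed above. */
enum D17Command {
    D17_SET_MODE = 0x00,
    D17_SET_PREC = 0x01,
    D17_GET_STATUS = 0x02,
    D17_LOAD_DATA = 0x03,
    D17_LOAD_TEXT = 0x04,
    D17_GET_DATA = 0x05,
    D17_EXECUTE = 0x06,
    D17_SWAP_BUFFERS = 0x07,
    D17_SET_INTERRUPT_MESSAGE = 0x08
};

/* One hardware interrupt: the values placed in Z and A beforehand. */
typedef struct { uint16_t z, a; } D17Hwi;

/* Hypothetical helper: the interrupts for one MODE_INT job.
   msg, text_addr and data_addr are caller-chosen; out must hold 6 entries. */
static size_t mode_int_job(uint16_t msg, uint16_t text_addr,
                           uint16_t data_addr, D17Hwi out[6]) {
    size_t n = 0;
    out[n++] = (D17Hwi){D17_SET_INTERRUPT_MESSAGE, msg}; /* required before MODE_INT */
    out[n++] = (D17Hwi){D17_SET_MODE, 0x01};             /* MODE_INT */
    out[n++] = (D17Hwi){D17_LOAD_TEXT, text_addr};
    out[n++] = (D17Hwi){D17_LOAD_DATA, data_addr};       /* into the secondary buffer */
    out[n++] = (D17Hwi){D17_EXECUTE, 0};                 /* A is not read */
    /* ...the program waits here for the interrupt carrying msg... */
    out[n++] = (D17Hwi){D17_GET_DATA, data_addr};        /* results are in secondary */
    return n;
}
```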
Instruction Set Architecture:
The D17 has sixteen, eight or four floating-point registers in half,
single and double precision respectively, and four 8-bit integer
flow-control and indexing registers.
The device's memory is indexed according to the precision mode i.e.
in PREC_DOUBLE mode, the smallest addressable unit of memory is a
64-bit double, while in PREC_HALF mode, it is a 16-bit half.
Note: Accessing the upper twelve registers while in double-precision
mode or the upper eight while in single-precision mode is
undefined behaviour.
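The precision mode fixes both the number of usable floating-point registers and the width of one addressable unit of DATA. The sketch below tabulates them; the function names are hypothetical. In every mode, the number of usable registers multiplied by the unit width comes to 256 bits.

```c
#include <assert.h>

/* Precision codes from SET_PREC above. */
enum { PREC_SINGLE = 0x00, PREC_DOUBLE = 0x01, PREC_HALF = 0x02 };

/* Bits in one addressable unit of DATA (and in one floating-point register). */
static int unit_bits(int prec) {
    switch (prec) {
    case PREC_HALF:   return 16;
    case PREC_SINGLE: return 32;
    case PREC_DOUBLE: return 64;
    default:          return -1;
    }
}

/* Usable floating-point registers in each mode. */
static int fp_registers(int prec) {
    switch (prec) {
    case PREC_HALF:   return 16;
    case PREC_SINGLE: return 8;
    case PREC_DOUBLE: return 4;
    default:          return -1;
    }
}

/* 16-bit DCPU-16 words occupied by one addressable unit. */
static int words_per_unit(int prec) {
    int bits = unit_bits(prec);
    return bits < 0 ? -1 : bits / 16;
}
```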
The instruction mnemonics below use the following scheme:
% means 'floating-point register'
@ means 'integer register'
$ means 'integer immediate'
All instructions fall into one of four categories:
(a) Arithmetic instructions, which each take a single cycle.
rnd % - round to integer (back into %register)
abs % - absolute value
add %,% - addition
mul %,% - multiply
sub %,% - subtract (a = a - b)
rsub %,% - subtract (reverse) (a = b - a)
div %,% - divide (a = a / b)
rdiv %,% - divide (reverse) (a = b / a)
fma %,%,% - fused multiply-add (a = a + b * c)
atan %,%,% - 2-arg arctangent (a = arctan(y/x))
rnd @,% - round to integer (into @register)
lt @,%,% - if %left < %right, then @ = 0, else @ = 1
gt @,%,% - if %left > %right, then @ = 0, else @ = 1
eq @,%,% - if %left == %right, then @ = 0, else @ = 1
ne @,%,% - if %left != %right, then @ = 0, else @ = 1
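The comparison instructions write 0 when the condition holds and 1 when it does not, the reverse of C's convention. This pairs naturally with jc, which jumps when its register is zero. A minimal model, with hypothetical function names and 8-bit integer registers as described above:

```c
#include <assert.h>
#include <stdint.h>

/* D17 comparisons: 0 when the condition HOLDS, 1 otherwise. */
static uint8_t d17_lt(double l, double r) { return l < r ? 0 : 1; }
static uint8_t d17_gt(double l, double r) { return l > r ? 0 : 1; }
static uint8_t d17_eq(double l, double r) { return l == r ? 0 : 1; }
static uint8_t d17_ne(double l, double r) { return l != r ? 0 : 1; }

/* jc @,$ jumps when the register is zero, so "lt @0,%a,%b" followed by
   "jc @0,$T" branches to T exactly when %a < %b.
   Returns the address of the next instruction to execute. */
static uint8_t d17_jc(uint8_t reg, uint8_t target, uint8_t pc_next) {
    return reg == 0 ? target : pc_next;
}
```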
(b) Floating-point functions such as trigonometric functions,
which take a variable number of cycles.
sin % - sine
cos % - cosine
tan % - tangent
asin % - inverse sine
acos % - inverse cosine
atan % - inverse tangent
sqrt % - square root
log % - logarithm base 10
log2 % - logarithm base 2
ln % - natural logarithm
(c) Integral and flow-control instructions that operate on integer
registers.
noop - no operation
wait - pause until EXECUTE is called again
halt - stop execution with good status and reset state (not DATA)
fail - stop execution with bad status and reset state (not DATA)
dec @ - decrement register
inc @ - increment register
zero @ - set register to zero
jmp @ - jump to register
jmp $ - jump to immediate address
jc @,$ - jump to immediate address if register is zero
jc @,@ - jump to @right if @left is zero
loop @,$ - decrement register then jump to immediate address
if register is not zero
set @,$ - load an immediate integer into register
set @,@ - move the value from @right to @left
swap @,@ - swap the values in two registers
Note: The device operates most efficiently when flow-control
is minimised.
(d) Memory-related instructions such as loads and stores.
ld %,$ - load a floating point number from DATA (immediate address)
ld %,@ - load a floating point number from DATA (address in register)
ldz % - load zero (0.0)
ld1 % - load one (1.0)
ldl2e % - load the logarithm base 2 of e
ldl2t % - load the logarithm base 2 of 10
ldlg2 % - load the logarithm base 10 of 2
ldln2 % - load the natural logarithm of 2
ldpi % - load pi (~3.1416)
lde % - load Euler's number (~2.7183)
ldsr2 % - load square root of 2 (~1.4142)
ldphi % - load golden ratio (~1.6180)
st $,% - store a floating point number in DATA (immediate address)
st @,% - store a floating point number in DATA (address in register)
mov %,% - copy from %right to %left
xchg %,% - exchange %left and %right
Note: The device does not support the loading of arbitrary
literal floating-point immediates, so anything that would
otherwise be an immediate must be loaded as data. The device
does have some instructions to load arbitrary integral
immediates.
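For reference, the constant-loading instructions correspond to the standard mathematical values below, given to double precision. These are the textbook constants, not values read from any device. The `approx_eq` helper is included only to sanity-check a few identities that link them: log2(10) × log10(2) = 1, log2(e) × ln(2) = 1, and φ² = φ + 1.

```c
#include <assert.h>

/* Values loaded by the constant instructions, to double precision. */
#define D17_PI  3.14159265358979323846  /* ldpi              */
#define D17_E   2.71828182845904523536  /* lde               */
#define D17_SR2 1.41421356237309504880  /* ldsr2             */
#define D17_PHI 1.61803398874989484820  /* ldphi             */
#define D17_L2E 1.44269504088896340736  /* ldl2e: log2(e)    */
#define D17_L2T 3.32192809488736234787  /* ldl2t: log2(10)   */
#define D17_LG2 0.30102999566398119521  /* ldlg2: log10(2)   */
#define D17_LN2 0.69314718055994530942  /* ldln2: ln(2)      */

/* Approximate equality without <math.h>, for checking identities. */
static int approx_eq(double a, double b) {
    double d = a - b;
    return (d < 0 ? -d : d) < 1e-12;
}
```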
Instruction Format:
The precise encoding of each instruction is given in Appendix 1.
Short form instructions
These are highly dense short forms of other, more general
instructions. Any word in which bit 15 and bit 7 (the most
significant bits of its high and low halves) are both set encodes
two 7-bit short-form instructions, viz.
0*** **** 0*** **** - does not encode two short instructions
0*** **** 1*** **** - does not encode two short instructions
1*** **** 0*** **** - does not encode two short instructions
1aaa aaaa 1bbb bbbb - two short-form instructions, a and b.
Instruction a, in the high half, is executed first.
zero @a - 00000aa
inc @a - 00001aa
dec @a - 00010aa
jmp @a - 00011aa
jc @a,@b - 001aabb
set @a,@b - 010aabb
swap @a,@b - 011aabb
add %r,%s - 100rrss (*)
sub %r,%s - 101rrss (*)
mul %r,%s - 110rrss (*)
div %r,%s - 111rrss (*)
Note: the short-form instructions marked with (*) can only
encode a subset of the operands the instruction can take,
because each register field is two bits wide (%0 to %3).
e.g. add %2,%1 can be short-form-encoded but add %15,%3 cannot.
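The pairing rule can be sketched in C. The helper names are hypothetical; the bit layouts follow the table above, with `short_add` building the 100rrss form of add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A word holds two short-form instructions when bits 15 and 7 are both set. */
static bool is_short_pair(uint16_t w) {
    return (w & 0x8080u) == 0x8080u;
}

/* Split a short-form pair into its two 7-bit instructions.
   The high half is executed first. */
static void split_short_pair(uint16_t w, uint8_t *first, uint8_t *second) {
    assert(is_short_pair(w));
    *first  = (uint8_t)((w >> 8) & 0x7Fu);
    *second = (uint8_t)(w & 0x7Fu);
}

/* Encode "add %r,%s" in short form (100rrss); only %0 to %3 fit. */
static uint8_t short_add(unsigned r, unsigned s) {
    assert(r < 4 && s < 4);
    return (uint8_t)(0x40u | (r << 2) | s);
}

/* Pack two 7-bit short instructions into one word, first in the high half. */
static uint16_t pack_short_pair(uint8_t first, uint8_t second) {
    return (uint16_t)(0x8080u | ((unsigned)(first & 0x7Fu) << 8)
                      | (second & 0x7Fu));
}
```

For example, `add %2,%1` followed by `inc @0` (00001aa with aa = 00) packs into the single word 0xC984.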
Immediates
Immediate values are values included in the instruction stream
itself.
Floating-point numbers cannot be encoded as immediates, so any floats
a program needs should be included in DATA. Many commonly used
constants have special loading instructions.
e.g. 'ldpi %0' will set %0 to π.
Integers can be encoded as immediates. Integer registers can be
set to arbitrary integral immediates, but also many integral and
flow control instructions can take immediate arguments without
having to go through the very limited integral register set.
Loops and Flow Control:
The loop instruction implements an efficient downwards-counting
loop. It takes an index register and a relative offset, which it
uses to jump back.
For example, the Babylonian method for computing square roots can be
implemented like this:
ld %0,$0 ; load first memory cell (number to sqrt) into %0
ld1 %1 ; load 1 into %1
ld %3,$1 ; load second memory cell (literal 2.0) into %3
set @0,$a ; load 10 into @0
LABEL:
mov %2,%0 ; %2 <- S
div %2,%1 ; %2 <- S / x_n
add %2,%1 ; %2 <- x_n + S / x_n
div %2,%3 ; %2 <- (x_n + S / x_n) / 2
mov %1,%2 ; x_{n+1} <- %2
loop @0,$LABEL ; dec @0 and loop until @0 equals 0.
st $2,%1 ; store result
Note: there is a sqrt instruction.
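To check the program's logic on a host machine, here is a line-by-line C transcription. It assumes, following the comment on the loop line, that loop decrements @0 and jumps back while @0 is nonzero, so the body runs ten times. The `data` array is a hypothetical stand-in for the first three DATA cells.

```c
#include <assert.h>

/* Mirror of the D17 program above: data[0] holds S, data[1] holds 2.0,
   and the result is written to data[2]. */
static void babylonian(double data[3]) {
    double f0 = data[0];    /* ld %0,$0 : S          */
    double f1 = 1.0;        /* ld1 %1   : x_0 = 1    */
    double f3 = data[1];    /* ld %3,$1 : 2.0        */
    unsigned char i0 = 10;  /* set @0,$a             */
    do {
        double f2 = f0;     /* mov %2,%0             */
        f2 /= f1;           /* div %2,%1 : S / x_n   */
        f2 += f1;           /* add %2,%1             */
        f2 /= f3;           /* div %2,%3             */
        f1 = f2;            /* mov %1,%2 : x_{n+1}   */
    } while (--i0 != 0);    /* loop @0,$LABEL        */
    data[2] = f1;           /* st $2,%1              */
}
```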
MODE_POLL Algorithm:
This procedure is useful when the same code is run over more than a
couple of sets of data.
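load_text(text);
load_data(data1);          // data1 -> secondary buffer
swap_buffers();            // data1 is now primary
execute();                 // device starts on data1
load_data(data2);          // CPU fills the secondary buffer meanwhile
while (!ready())
    cpu_sleep();
swap_buffers();            // data2 primary; data1 results secondary
while (true) {
    execute();             // device starts on data2
    get_data(data1);       // fetch results for data1
    if (finished)          // no more input sets
        goto break1;
    // refill data1 with the next input set, then:
    load_data(data1);
    while (!ready())
        cpu_sleep();
    swap_buffers();        // data1 primary; data2 results secondary
    execute();             // device starts on data1
    get_data(data2);       // fetch results for data2
    if (finished)
        goto break2;
    // refill data2 with the next input set, then:
    load_data(data2);
    while (!ready())
        cpu_sleep();
    swap_buffers();        // data2 primary; data1 results secondary
}
break1:                    // device is still working on data2
while (!ready())
    cpu_sleep();
swap_buffers();
get_data(data2);
goto end;
break2:                    // device is still working on data1
while (!ready())
    cpu_sleep();
swap_buffers();
get_data(data1);
end:
// done!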
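The same schedule can be written as a loop over n input sets and exercised against a mock device. This is a sketch under stated assumptions: `MockD17`, `BATCH_WORDS` and `run_pipeline` are hypothetical, the mock "program" simply doubles every word, and the mock finishes its pending work at the moment it is polled, standing in for the real device running concurrently. While set k runs, the CPU fetches the results of set k-1 and loads set k+1, as in the procedure above.

```c
#include <assert.h>
#include <string.h>

#define BATCH_WORDS 4  /* hypothetical set size, kept tiny for the sketch */

/* Mock D17 in MODE_POLL: two DATA banks and a label for the primary one. */
typedef struct {
    int bank[2][BATCH_WORDS];
    int primary;  /* index of the bank the device works on */
    int busy;     /* nonzero while an execution is pending */
} MockD17;

static void load_data(MockD17 *d, const int *src) {
    memcpy(d->bank[1 - d->primary], src, sizeof d->bank[0]);  /* into secondary */
}
static void get_data(MockD17 *d, int *dst) {
    memcpy(dst, d->bank[1 - d->primary], sizeof d->bank[0]);  /* from secondary */
}
static void swap_buffers(MockD17 *d) { d->primary = 1 - d->primary; }
static void execute(MockD17 *d) { d->busy = 1; }

/* Polling completes the pending work (doubling each primary word). */
static int ready(MockD17 *d) {
    if (d->busy) {
        for (int i = 0; i < BATCH_WORDS; i++)
            d->bank[d->primary][i] *= 2;
        d->busy = 0;
    }
    return 1;
}

/* MODE_POLL schedule as a loop; results of in[k] go to out[k]. Needs n >= 1. */
static void run_pipeline(MockD17 *d, int in[][BATCH_WORDS],
                         int out[][BATCH_WORDS], int n) {
    assert(n >= 1);
    load_data(d, in[0]);
    swap_buffers(d);
    execute(d);                     /* set 0 running */
    for (int k = 1; k < n; k++) {
        load_data(d, in[k]);        /* next input into secondary */
        while (!ready(d)) {}        /* wait for set k-1 */
        swap_buffers(d);            /* set k primary, results k-1 secondary */
        execute(d);                 /* set k running */
        get_data(d, out[k - 1]);    /* fetch results of set k-1 */
    }
    while (!ready(d)) {}            /* drain the last set */
    swap_buffers(d);
    get_data(d, out[n - 1]);
}
```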
Appendix 1: Instruction Encoding
0000 0000 0000 0000 - noop
0000 0000 0000 0001 - wait
0000 0000 0000 0010 - halt
0000 0000 0000 0011 - fail
0000 0xxx xxxx xxxx - [other zero-arg operations]
0000 1000 0000 aabb - jc @a,@b
0000 1000 0001 aabb - set @a,@b
0000 1000 0010 aabb - swap @a,@b
0000 1000 oooo aabb - [unassigned]
0000 1001 0000 00aa - zero @a
0000 1001 0000 01aa - jmp @a
0000 1001 0000 10aa - inc @a
0000 1001 0000 11aa - dec @a
0000 1001 oooo ooaa - [unassigned]
0000 1010 00bb aaaa - ld %a,@b
0000 1010 01bb aaaa - st @b,%a
0000 1010 10bb aaaa - rnd @b,%a
0000 1010 11bb aaaa - [unassigned]
0000 1xxx xxxx xxxx - [unassigned]
0000 1111 bbbb bbbb - jmp $b
0001 00aa bbbb bbbb - loop @a,$b
0001 01aa bbbb bbbb - jc @a,$b
0001 10aa bbbb bbbb - set @a,$b
0001 11aa bbbb bbbb - [unassigned]
0010 aaaa bbbb bbbb - ld %a,$b
0011 aaaa bbbb bbbb - st $b,%a
0100 0000 0000 aaaa - sin %a
0100 0000 0001 aaaa - cos %a
0100 0000 0010 aaaa - tan %a
0100 0000 0011 aaaa - asin %a
0100 0000 0100 aaaa - acos %a
0100 0000 0101 aaaa - atan %a
0100 0000 0110 aaaa - sqrt %a
0100 0000 0111 aaaa - rnd %a
0100 0000 1000 aaaa - log %a
0100 0000 1001 aaaa - log2 %a
0100 0000 1010 aaaa - ln %a
0100 0000 1011 aaaa - [unassigned]
0100 0000 1100 aaaa - abs %a
0100 0000 1101 aaaa - [unassigned]
0100 0000 111o aaaa - [unassigned]
0100 0001 0000 aaaa - ldz %a
0100 0001 0001 aaaa - ld1 %a
0100 0001 0010 aaaa - ldpi %a
0100 0001 0011 aaaa - lde %a
0100 0001 0100 aaaa - ldsr2 %a
0100 0001 0101 aaaa - ldphi %a
0100 0001 011o aaaa - [unassigned]
0100 0001 1000 aaaa - ldl2e %a
0100 0001 1001 aaaa - ldl2t %a
0100 0001 1010 aaaa - ldlg2 %a
0100 0001 1011 aaaa - ldln2 %a
0100 0001 11oo aaaa - [unassigned]
0100 0000 bbbb aaaa - mov %a,%b
0100 0001 bbbb aaaa - xchg %a,%b
0100 0010 bbbb aaaa - add %a,%b
0100 0011 bbbb aaaa - mul %a,%b
0100 0100 bbbb aaaa - sub %a,%b
0100 0101 bbbb aaaa - rsub %a,%b
0100 0110 bbbb aaaa - div %a,%b
0100 0111 bbbb aaaa - rdiv %a,%b
0100 1ooo bbbb aaaa - [unassigned]
0101 00cc bbbb aaaa - lt @c,%a,%b
0101 01cc bbbb aaaa - gt @c,%a,%b
0101 10cc bbbb aaaa - eq @c,%a,%b
0101 11cc bbbb aaaa - ne @c,%a,%b
0110 cccc bbbb aaaa - atan %a,%y,%x
0111 cccc bbbb aaaa - fma %a,%b,%c
1xxx xxxx 0xxx xxxx - [unassigned]
1aaa aaaa 1bbb bbbb - [short-form]
----
0000 nullary, @,@ and @ ops
0001 various @/$ ops
0010 ld %,$
0011 st $,%
0100 %,% and % ops
0101 comparisons
0110 atan (two arg)
0111 fma
1xxx/0 [unassigned]
1xxx/1 [short-form]
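A few rows of the encoding table, written as C helpers, show how operand fields are packed. The function names are hypothetical, and only the ld/st/set/loop immediate forms and fma are covered. The expected values include the encodings of several lines of the Babylonian example (ld %0,$0, ld %3,$1, set @0,$a and st $2,%1).

```c
#include <assert.h>
#include <stdint.h>

/* ld %a,$b : 0010 aaaa bbbb bbbb */
static uint16_t enc_ld_imm(unsigned a, unsigned b) {
    assert(a < 16 && b < 256);
    return (uint16_t)(0x2000u | (a << 8) | b);
}

/* st $b,%a : 0011 aaaa bbbb bbbb */
static uint16_t enc_st_imm(unsigned b, unsigned a) {
    assert(a < 16 && b < 256);
    return (uint16_t)(0x3000u | (a << 8) | b);
}

/* loop @a,$b : 0001 00aa bbbb bbbb */
static uint16_t enc_loop(unsigned a, unsigned b) {
    assert(a < 4 && b < 256);
    return (uint16_t)(0x1000u | (a << 8) | b);
}

/* set @a,$b : 0001 10aa bbbb bbbb */
static uint16_t enc_set_imm(unsigned a, unsigned b) {
    assert(a < 4 && b < 256);
    return (uint16_t)(0x1800u | (a << 8) | b);
}

/* fma %a,%b,%c : 0111 cccc bbbb aaaa  (a = a + b * c) */
static uint16_t enc_fma(unsigned a, unsigned b, unsigned c) {
    assert(a < 16 && b < 16 && c < 16);
    return (uint16_t)(0x7000u | (c << 8) | (b << 4) | a);
}
```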