Last active
November 1, 2017 12:14
-
-
Save milesrout/4aa35266e2a3944d7f35 to your computer and use it in GitHub Desktop.
DFPU-17
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A Floating Point Unit for the DCPU-16 | |
This is a short document describing the DFPU-17 (D17), a | |
floating-point coprocessor for the DCPU-16 (D16). Despite its | |
simplicity, the DCPU-16 provides significant | |
functionality. However, a significant problem is its lack of | |
high-throughput floating-point support. The D17 exists to solve | |
this problem. | |
DCPU-16 Hardware Info: | |
Name: DFPU-17 - Floating Point Coprocessor | |
ID: 0x1DE171F3, Version: 0x0001 | |
Manufacturer: 0x52307537 (MILES_ROUT) | |
Description: | |
The DFPU-17 (D17) is an optional floating-point coprocessor for | |
the DCPU-16. Its main purpose is to be used for navigation and | |
orientation in space. It is capable of operating in various modes | |
depending on whether the user's needs. | |
The D17 has three banks of memory: two 256-word DATA banks and a | |
256-word TEXT bank. Each time floating-point data needs to be | |
processed, the user loads text and data into the device and | |
instructs it to execute the text on the data. The two DATA banks | |
are double-buffered: the CPU can retrieve data while the device is | |
working on data. The device works on the "primary buffer" while | |
the "secondary buffer" is readable and writable by the CPU. | |
Swapping the buffers simply swaps these labels: the primary | |
buffer becomes the secondary buffer and vice versa. | |
Note: The TEXT bank is NOT double-buffered. If the CPU interrupts | |
the device with the LOAD_TEXT command while the device is | |
executing code, undefined behaviour will occur: most likely the | |
device will be rendered inoperable until a hard reset is | |
performed. | |
Note: 'Byte' is not used in this document, to avoid confusion. | |
A word is a 16-bit quantity. | |
Interrupt Behaviour: | |
When a hardware interrupt is received by the D17, it reads the Z | |
register and does one of the following actions: | |
(0x00) SET_MODE: The device will read the A register and sets the | |
operating mode to one of the following options: | |
0x00: MODE_OFF. The device will be deactivated, clearing | |
all of its data and code memory, and putting it into | |
standby mode. The device will have negligible power drain | |
while in standby, and will respond only to the SET_MODE | |
command. | |
0x01: MODE_INT: The device will be activated if in | |
MODE_OFF. Whenever the device receives an EXECUTE command, | |
it will swap buffers and then execute. When it has completed | |
execution it will swap buffers again and interrupt the | |
CPU. | |
Note: In MODE_INT the device can only effectively use one | |
buffer. | |
Note: Switching to MODE_INT will fail and set an error | |
status and error message if SET_INTERRUPT_MESSAGE has not | |
been called. | |
0x02: MODE_POLL: The device operates in the same way as | |
MODE_INT, except the D17 will not automatically swap buffers, | |
nor interrupt the CPU when finished. Instead, it is the CPU's | |
responsibility to poll the device to determine its status. | |
Note: In MODE_POLL the CPU has more responsibility and incurs | |
more load, but as a result it can make use of both DATA banks. | |
Note: See end of document for a demonstration of a procedure for | |
using MODE_POLL. | |
(0x01) SET_PREC: The device will read the A register and set the | |
working precision level. The default is single-precision. | |
0x00: PREC_SINGLE: The D17 will work with single-precision | |
floating point numbers. Invalidates loaded text and data. | |
0x01: PREC_DOUBLE: The D17 will work with double-precision | |
floating point numbers. Invalidates loaded text and data. | |
0x02: PREC_HALF: The D17 will work with half-precision | |
floating point numbers. Invalidates loaded text and data. | |
(0x02) GET_STATUS: The next interrupt to the device with the | |
message 0xFFFF will set the following registers as | |
described: | |
A: STATUS: the status, selected from the statuses listed | |
in Table I: Statuses. | |
B: ERROR: the most recent error message, selected from the | |
error messages listed in Table II: Error Messages. | |
C: the register is set to 0. X: the register is set to 0. | |
Note: this is so that future expansion of GET_STATUS | |
incorporating useful information in C and X will not break | |
programs that assume that those registers will be ignored | |
by GET_STATUS. | |
(0x03) LOAD_DATA: The device will read the A register and load | |
512 words from the CPU's RAM at address A into the secondary | |
DATA buffer. | |
(0x04) LOAD_TEXT: The device will read the A register and load | |
256 words from the CPU's RAM at address A into the | |
TEXT buffer. | |
(0x05) GET_DATA: The device will read the A register and load 512 | |
words of data from the secondary buffer into the CPU's RAM | |
at address A. | |
(0x06) EXECUTE: The device will not read any registers. If in | |
MODE_POLL, the device will process the data in the primary | |
buffer. | |
If in MODE_INT, the device will swap buffers and then process | |
the data in the new primary buffer. When finished, the device | |
will swap buffers again and then interrupt the CPU with the | |
interrupt message set with SET_INTERRUPT_MESSAGE. | |
(0x07) SWAP_BUFFERS: Swap the primary and secondary DATA | |
buffers. Only works in MODE_POLL. | |
(0x08) SET_INTERRUPT_MESSAGE: The device will set the interrupt | |
message used while in MODE_INT to the contents of the A | |
register. | |
Instruction Set Architecture: | |
The D17 has sixteen/eight/four (depending on precision) floating point | |
registers and four 8-bit integer flow control and indexing registers. | |
The device's memory is indexed according to the precision mode i.e. | |
in PREC_DOUBLE mode, the smallest addressable unit of memory is a | |
64-bit double, while in PREC_HALF mode, it is a 16-bit half. | |
Note: Accessing the upper twelve registers while in double-precision | |
mode or the upper eight while in single-precision mode is | |
undefined behaviour. | |
The instruction mnemonics below use the following scheme: | |
% means 'floating-point register' | |
@ means 'integer register' | |
$ means 'integer immediate' | |
All instructions fall into one of three categories: | |
(a) Arithmetic instructions, which each take a single cycle. | |
rnd % - round to integer (back into %register) | |
abs % - absolute value | |
add %,% - addition | |
mul %,% - multiply | |
sub %,% - subtract (a = a - b) | |
rsub %,% - subtract (reverse) (a = b - a) | |
div %,% - divide (a = a / b) | |
rdiv %,% - divide (reverse) (a = b / a) | |
fma %,%,% - fused multiply-add (a = a + b * c) | |
atan %,%,% - 2-arg arctangent (a = arctan(y/x)) | |
rnd @,% - round to integer (into @register) | |
lt @,%,% - if %left < %right, then @ = 0, else @ = 1 | |
gt @,%,% - if %left > %right, then @ = 0, else @ = 1 | |
eq @,%,% - if %left == %right, then @ = 0, else @ = 1 | |
ne @,%,% - if %left != %right, then @ = 0, else @ = 1 | |
(b) Floating-point functions such as trigonometric functions, | |
which take a variable number of cycles. | |
sin % - sine | |
cos % - cosine | |
tan % - tangent | |
asin % - inverse sine | |
acos % - inverse cosine | |
atan % - inverse tangent | |
sqrt % - square root | |
log % - logarithm base 10 | |
log2 % - logarithm base 2 | |
ln % - natural logarithm | |
(c) Integral and flow-control instructions that operate on integer | |
registers. | |
noop - no operation | |
wait - pause until EXECUTE is called again | |
halt - stop execution with good status and reset state (not DATA) | |
fail - stop execution with bad status and reset state (not DATA) | |
dec @ - decrement register | |
inc @ - increment register | |
zero @ - set register to zero | |
jmp @ - jump to register | |
jmp $ - jump to immediate address | |
jc @,$ - jump to immediate address if register is zero | |
jc @,@ - jump to @right if @left is zero | |
loop @,$ - decrement register then jump to immediate address | |
if register is zero | |
set @,$ - load an immediate integer into register | |
set @,@ - move the value from @right to @left | |
swap @,@ - swap the values in two registers | |
Note: The device operates most efficiently when flow-control | |
is minimised. | |
(d) Memory-related instructions such as loads and stores. | |
ld %,$ - load a floating point number from DATA | |
ld %,@ - load a floating point number from DATA | |
ldz % - load zero (0.0) | |
ld1 % - load one (1.0) | |
ldl2e % - load the logarithm base 2 of e | |
ldl2t % - load the logarithm base 2 of 10 | |
ldlg2 % - load the logarithm base 10 of 2 | |
ldln2 % - load the natural logarithm of 2 | |
ldpi % - load pi (~3.1415) | |
lde % - load euler's number (~2.7183) | |
ldsr2 % - load square root of 2 (~1.4142) | |
ldphi % - load golden ratio (~1.6180) | |
st $,% - store a floating point number in DATA | |
st @,% - store a floating point number in DATA | |
mov %,% - copy from %right to %left | |
xchg %,% - exchange %left and %right | |
Note: The device does not support the loading of arbitrary | |
literal floating-point immediates, so anything that would | |
otherwise be an immediate must be loaded as data. The device | |
does have some instructions to load arbitrary integral | |
immediates. | |
Instruction Format: | |
The precise encoding of each function is given in Appendix 1. | |
Short form instructions | |
These are highly dense short forms of other more general | |
instructions. Any word where the most significant bits of | |
the MSB and the LSB are both set encodes two short form | |
instructions viz. | |
0*** **** 0*** **** - does not encode two short instructions | |
0*** **** 1*** **** - does not encode two short instructions | |
1*** **** 0*** **** - does not encode two short instructions | |
1aaa aaaa 1bbb bbbb - two short form instructions, a and b. | |
The MSB is executed first. TODO: explain why | |
zero @a - 00000aa | |
inc @a - 00001aa | |
dec @a - 00010aa | |
jmp @a - 00011aa | |
jc @a,@b - 001aabb | |
mov @a,@b - 010aabb | |
swap @a,@b - 011aabb | |
add %r,%s - 100rrss (*) | |
sub %r,%s - 101rrss (*) | |
mul %r,%s - 110rrss (*) | |
div %r,%s - 111rrss (*) | |
Note: the short-form instructions marked with a (*) are only | |
abled to encode a subset of the operands the instruction can | |
take. | |
e.g. add %2,%1 can be short-form-encoded but add %15,%3 cannot. | |
Immediates | |
Immediate values are values included in the instruction stream | |
itself. | |
Floating-point numbers cannot be encoded as immediates, so any floats | |
a program needs should be included in DATA. Many commonly used | |
constants have special loading instructions. | |
e.g. 'ldpi %0' will set %0 to π. | |
Integers can be encoded as immediates. Integer registers can be | |
set to arbitrary integral immediates, but also many integral and | |
flow control instructions can take immediate arguments without | |
having to go through the very limited integral register set. | |
Loops and Flow Control: | |
The loop instruction implements an efficient downwards-counting | |
loop. It takes an index register and a relative offset, which it | |
uses to jump back. | |
For example, the Babylonian method for computing square roots can be | |
computed like this: | |
ld %0,$0 ; load first memory cell (number to sqrt) into %0 | |
ld1 %1 ; load 1 into %1 | |
ld %3,$1 ; load second memory cell (literal 2.0) into %3 | |
set @0,$a ; load 10 into @0 | |
LABEL: | |
mov %2,%0 ; %2 <- S | |
div %2,%1 ; %2 <- S / x_n | |
add %2,%1 ; %2 <- x_n + S / x_n | |
div %2,%3 ; %2 <- (x_n + S / x_n) / 2 | |
mov %1,%2 ; x_{n+1} <- %2 | |
loop @0,$LABEL ; dec @0 and loop until @0 equals 0. | |
st $2,%1 ; store result | |
Note: there is a sqrt instruction. | |
MODE_POLL Algorithm: | |
This algorithm is useful if you want to do more than a couple of | |
sets of data with the same code. | |
load_text(text); | |
load_data(data1); | |
swap_buffers(); | |
execute(); | |
load_data(data2); | |
while (!ready()) | |
cpu_sleep(); | |
swap_buffers(); | |
while (true) { | |
execute(); | |
get_data(data1); | |
if (finished) | |
goto break1; | |
load_data(data2); | |
while (!ready()) | |
cpu_sleep(); | |
swap_buffers(); | |
execute(); | |
get_data(data2); | |
if (finished) | |
goto break2; | |
load_data(data1); | |
while (!ready()) | |
cpu_sleep(); | |
swap_buffers(); | |
} | |
break1: | |
while (!ready()) | |
cpu_sleep(); | |
swap_buffers(); | |
get_data(data2); | |
goto end; | |
break2: | |
while (!ready()) | |
cpu_sleep(); | |
swap_buffers(); | |
get_data(data1); | |
goto end; | |
end: | |
//done! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
0000 0000 0000 0000 - noop | |
0000 0000 0000 0001 - wait | |
0000 0000 0000 0010 - halt | |
0000 0000 0000 0011 - fail | |
0000 0xxx xxxx xxxx - [other zero-arg operations] | |
0000 1000 0000 aabb - jc @a,@b | |
0000 1000 0001 aabb - set @a,@b | |
0000 1000 0010 aabb - swap @a,@b | |
0000 1000 oooo aabb - [unassigned] | |
0000 1001 0000 00aa - zero @a | |
0000 1001 0000 01aa - jmp @a | |
0000 1001 0000 10aa - inc @a | |
0000 1001 0000 11aa - dec @a | |
0000 1001 oooo ooaa - [unassigned] | |
0000 1010 00bb aaaa - ld %a,@b | |
0000 1010 01bb aaaa - st @b,%a | |
0000 1010 10bb aaaa - rnd @b,%a | |
0000 1010 11bb aaaa - [unassigned] | |
0000 1xxx xxxx xxxx - [unassigned] | |
0000 1111 bbbb bbbb - jmp $b | |
0001 00aa bbbb bbbb - loop @a,$b | |
0001 01aa bbbb bbbb - jc @a,$b | |
0001 10aa bbbb bbbb - set @a,$b | |
0001 11aa bbbb bbbb - [unassigned] | |
0010 aaaa bbbb bbbb - ld %a,$b | |
0011 aaaa bbbb bbbb - st $b,%a | |
0100 0000 0000 aaaa - sin %a | |
0100 0000 0001 aaaa - cos %a | |
0100 0000 0010 aaaa - tan %a | |
0100 0000 0011 aaaa - asin %a | |
0100 0000 0100 aaaa - acos %a | |
0100 0000 0101 aaaa - atan %a | |
0100 0000 0110 aaaa - sqrt %a | |
0100 0000 0111 aaaa - rnd %a | |
0100 0000 1000 aaaa - log %a | |
0100 0000 1001 aaaa - log2 %a | |
0100 0000 1010 aaaa - ln %a | |
0100 0000 1011 aaaa - [unassigned] | |
0100 0000 1100 aaaa - abs %a | |
0100 0000 1101 aaaa - [unassigned] | |
0100 0000 111o aaaa - [unassigned] | |
0100 0001 0000 aaaa - ldz %a | |
0100 0001 0001 aaaa - ld1 %a | |
0100 0001 0010 aaaa - ldpi %a | |
0100 0001 0011 aaaa - lde %a | |
0100 0001 0100 aaaa - ldsr2 %a | |
0100 0001 0101 aaaa - ldphi %a | |
0100 0001 011o aaaa - [unassigned] | |
0100 0001 1000 aaaa - ldl2e %a | |
0100 0001 1001 aaaa - ldl2t %a | |
0100 0001 1010 aaaa - ldlg2 %a | |
0100 0001 1011 aaaa - ldln2 %a | |
0100 0001 11oo aaaa - [unassigned] | |
0100 0000 bbbb aaaa - mov %a,%b | |
0100 0001 bbbb aaaa - xchg %a,%b | |
0100 0010 bbbb aaaa - add %a,%b | |
0100 0011 bbbb aaaa - mul %a,%b | |
0100 0100 bbbb aaaa - sub %a,%b | |
0100 0101 bbbb aaaa - rsub %a,%b | |
0100 0110 bbbb aaaa - div %a,%b | |
0100 0111 bbbb aaaa - rdiv %a,%b | |
0100 1ooo bbbb aaaa - [unassigned] | |
0101 00cc bbbb aaaa - lt @c,%a,%b | |
0101 01cc bbbb aaaa - gt @c,%a,%b | |
0101 10cc bbbb aaaa - eq @c,%a,%b | |
0101 11cc bbbb aaaa - ne @c,%a,%b | |
0110 cccc bbbb aaaa - atan %a,%y,%x | |
0111 cccc bbbb aaaa - fma %a,%b,%c | |
1xxx xxxx 0xxx xxxx - [unassigned] | |
1aaa aaaa 1bbb bbbb - [short-form] | |
---- | |
0000 nullary, @,@ and @ ops | |
0001 various @/$ ops | |
0010 ld %,$ | |
0011 st %,$ | |
0100 %,% and % ops | |
0101 comparisons | |
0110 atan (two arg) | |
0111 fma | |
1xxx/0 [unassigned] | |
1xxx/1 [short-form] |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment