Created
June 11, 2019 09:17
THE LOW-DOWN ON LOADALL
THE LOW-DOWN ON LOADALL: | |
EXCERPTS FROM THE BOOK | |
THE HYPER-SPACE NAVIGATOR'S GUIDE | |
by | |
Terrance E. Hodgins | |
copyright (C) 1990 by Terrance E. Hodgins, | |
All rights reserved. | |
Semi-Intelligent Systems | |
PO BOX 4492 | |
ALBUQUERQUE, NM 87196 | |
Compuserve: 76416,553 | |
Internet: [email protected] | |
Internet: terry%[email protected] | |
And now the boring legal stuff: | |
This document uses the following trademarks: | |
AST is a registered trademark of AST Research, Inc. | |
IBM, PC-DOS, PC/XT, and PC/AT are registered trademarks of International Busi- | |
ness Machines Corporation. | |
Intel is a registered trademark of Intel Corporation. | |
Lotus is a registered trademark of Lotus Development Corporation. | |
Microsoft, MS-DOS, Windows '286, and OS/2 are registered trademarks of Micro- | |
soft Corporation. | |
Semi-Intelligent Systems, The Hyper-Space Library, Get-High, HI-DOS, High | |
Code, Xcode, and Mode Code are registered trademarks of Semi-Intelligent | |
Systems. | |
Unix is a registered trademark of AT&T, Inc. | |
Disclaimer of Warranty | |
TERRANCE E. HODGINS, AND SEMI-INTELLIGENT SYSTEMS, EXCLUDE ANY AND ALL | |
IMPLIED WARRANTIES, INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A | |
PARTICULAR PURPOSE. | |
NEITHER TERRANCE E. HODGINS, NOR SEMI-INTELLIGENT SYSTEMS, MAKE ANY
WARRANTY OR REPRESENTATION, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THESE
PROGRAMS, THEIR QUALITY, PERFORMANCE, MERCHANTABILITY, OR FITNESS FOR A
PARTICULAR PURPOSE.
NEITHER TERRANCE E. HODGINS, NOR SEMI-INTELLIGENT SYSTEMS, SHALL HAVE | |
ANY LIABILITY FOR SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF | |
OR RESULTING FROM THE USE OR MODIFICATION OF THESE PROGRAMS. | |
THE USE OF THE 80286 LOADALL INSTRUCTION IS INHERENTLY DANGEROUS, AND | |
CAN RESULT IN PROGRAM CRASHES, OR RUN-AWAY PROGRAMS, WHICH CAN ALTER, DAMAGE, | |
OR DESTROY COMPUTER DATA, AND WHICH CAN DAMAGE OR DESTROY COMPUTER HARDWARE. | |
USE ONLY AT YOUR OWN RISK. | |
Introduction | |
Yes, there really is an unpublicized, almost secret, instruction in | |
the 80286, which has the ability to do several supposedly impossible things. | |
It is called Loadall. | |
What Loadall does is completely load all the registers of the 80286 | |
from a table starting at 80:0 in low memory. I do mean ALL registers: every | |
register you ever heard of, and a few you haven't, and also the "invisible" | |
internal registers which are NOT OTHERWISE programmable. Executing a Loadall | |
nearly completely re-defines the CPU's state. | |
This means that it is a great warp, or hyper-space, instruction: | |
executing a Loadall will jump you to someplace new, and leave you with your | |
choice of register contents, status and mode settings, and memory segment | |
mappings, allowing you to have your segments anywhere in the 16-megabyte | |
address space of the 80286. Those of you who are familiar with Unix and C
programming will be immediately reminded of the "longjmp" routine. Loadall
is the ultimate long-jump.
This is possible in REAL mode. You do NOT have to go into protected | |
mode to get at memory above 1 Megabyte on the AT. Which also means that you | |
don't have to then go through all kinds of odd-ball gyrations to get back out | |
of protected mode. And better yet, this instruction will work in both REAL | |
and PROTECTED mode. | |
Intel included the Loadall instruction in the 80286 for chip testing | |
(they can throw the CPU into any state, and see if it then does what it is | |
supposed to do), but there are much better uses for it than that (in my | |
not-so-humble opinion). | |
The power of being able to re-program ANY and ALL of the registers of | |
the CPU with one single instruction opens up a whole new world of possibili- | |
ties. | |
Including, but not limited to:
- getting at all the memory in your machine at will, even if it is
  addressed above 1 megabyte, from real mode.
- executing real-mode programs in ram above one megabyte.
- installing a second operating-system-like program, or command processor,
  or shell, in memory above 1 megabyte, and alternating between that and
  DOS.
- installing most of the guts of custom TSR's, shells, and device-drivers
  in ram above 1 megabyte (freeing up precious base memory), leaving in low
  memory only the stubs to call the code upstairs.
- writing very large programs, which are "split", and have half the
  program residing in the low-down 640K, and the other half up in extended
  memory, and running in either real or protected mode.
- installing large protected-mode programs in extended memory, where
  they will not conflict with, or crowd out DOS, and ping-ponging between them
  and DOS.
- switching to protected mode.
- emulating real mode from protected mode (tough, and full of gotchas,
  but still worth mentioning).
- this is really off-the-wall, but possible: building automata that use
  Loadall to warp from state to state, sort of like a computer game of Life,
  played in the twilight zone.
- use your imagination. The sky's the limit.
While the Loadall instruction only exists on the 80286 (to the best of
anyone's knowledge at present -- anyone who will talk, that is...), the 386
has other instructions which can accomplish many of the same functions. Thus,
it is possible to write code that detects the processor being used, and
switches strategy accordingly, using subroutines with 386 op-codes to
accomplish the same functions on a 386. Microsoft is already doing that in their
RamDrive.Sys and HiMem.Sys programs. Thus you can have code which will run on
both the 286 and the 386, and makes the best use of each.
This instruction opens up so many possibilities (AND creates so many | |
problems) for things like alternate operating systems, and alternate shells, | |
that live above 1 megabyte, in real mode or protected mode, that I foresee the | |
need for a community library of "Hyper-Space" subroutines, which can still | |
work properly even though some segments are in outer space, or the 80286 is in | |
protected mode. I would be happy to collect these, and pass on the best of | |
them with future distributions of this book and software. | |
Please forgive all the legalistic warning messages. If used properly, | |
and carefully, the Loadall instruction can be quite safe. You haven't seen | |
Microsoft Ramdrive.Sys destroying any systems lately, have you? It's just | |
that a few bothersome people love to sue for anything, so you just have to | |
plaster those stupid warning messages all over everything. | |
LOADALL | |
Okay, so what IS the Loadall instruction? | |
Simple: | |
*** 0F 05 hex *** | |
So how does it work? Well, I've already told you the gist of it: all | |
CPU registers are loaded from a 51-word table of data that starts at 80:0h | |
(absolute 24-bit address 800h). This address is one thing that cannot be | |
changed or re-programmed. It's hard-wired into the chip, and that's that. And | |
that's unfortunate, because all versions of anybody's DOS earlier than version | |
3.3 use that area for critical system code. | |
Loadall takes no operands, and is just a two-byte instruction. All the | |
"operands" for the instruction are obtained from the table at 80:0h. | |
Just put "db 0Fh, 05h" in your code stream, and watch the fun. But you
had better get that table right before you do, or else... (crash).
** THE LOAD TABLE **
-----------------------------------------------------------
Address   Size      CPU register
(hex)     (words)
-----------------------------------------------------------
800       3         unused (?? I don't believe it.)
806       1         MSW (Machine Status Word)
808       7         unused (?? I don't believe it.)
816       1         TR (Task Register)
818       1         Flag Word
81A       1         IP (Instruction Pointer)
81C       1         LDT (Local Descriptor Table)
81E       1         DS (Data Segment, or DS Selector)
820       1         SS (Stack Segment, or SS Selector)
822       1         CS (Code Segment, or CS Selector)
824       1         ES (Extra Segment, or ES Selector)
826       1         DI (Destination Index)
828       1         SI (Source Index)
82A       1         BP (Base Pointer)
82C       1         SP (Stack Pointer)
82E       1         BX (Data Register BX)
830       1         DX (Data Register DX)
832       1         CX (Data Register CX)
834       1         AX (Accumulator AX)
836       3         ES Descriptor Cache
83C       3         CS Descriptor Cache
842       3         SS Descriptor Cache
848       3         DS Descriptor Cache
84E       3         GDTR (Global-Descriptor-Table Register)
854       3         LDTDC (Local-Descriptor-Table Descriptor Cache)
85A       3         IDTR (Interrupt-Descriptor-Table Register)
860       3         TSSDC (Task-State-Segment Descriptor Cache)
-----------------------------------------------------------
total = 33h words == 102 (decimal) bytes
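The addresses in the load table follow mechanically from the slot sizes, so they are easy to double-check. The sketch below is only a Python model of the layout (it is not 80286 code, and the names "LOADALL_LAYOUT" and "table_addresses" are made up for this illustration). It walks the slots in table order and adds up the sizes.

```python
# Model of the Loadall table layout: (slot name, size in 16-bit words),
# in the same order as the load table. Names are illustrative only.
LOADALL_LAYOUT = [
    ("unused1", 3), ("MSW", 1), ("unused2", 7), ("TR", 1), ("FLAGS", 1),
    ("IP", 1), ("LDT", 1), ("DS", 1), ("SS", 1), ("CS", 1), ("ES", 1),
    ("DI", 1), ("SI", 1), ("BP", 1), ("SP", 1), ("BX", 1), ("DX", 1),
    ("CX", 1), ("AX", 1), ("ESDC", 3), ("CSDC", 3), ("SSDC", 3),
    ("DSDC", 3), ("GDTR", 3), ("LDTDC", 3), ("IDTR", 3), ("TSSDC", 3),
]
TABLE_BASE = 0x800  # 80:0h, the fixed address Loadall reads from


def table_addresses(layout=LOADALL_LAYOUT, base=TABLE_BASE):
    """Return ({slot name: absolute address}, total table size in bytes)."""
    addresses = {}
    address = base
    for name, words in layout:
        addresses[name] = address
        address += 2 * words  # each word is two bytes
    return addresses, address - base


addresses, total_bytes = table_addresses()
for name, address in addresses.items():
    print(f"{address:03X}  {name}")
print("total bytes:", total_bytes)
```

If the slot sizes in the table are right, the addresses printed should match the Address column line for line.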
THE DESCRIPTOR CACHE ENTRIES | |
(DSDC, SSDC, CSDC, and ESDC)
Wait a minute, forward-referencing again! What's a descriptor? You've | |
already used that word up above, and never defined it. | |
Well okay. A segment descriptor is a four-word structure of information
that describes a segment. A descriptor gives a segment's size and 24-bit
starting address, and has a byte of encoded information, called the "access
byte", that describes the characteristics of the segment (like whether it is a
code segment or a data segment, writable or write-protected, and so on). And
the descriptor also has a dummy zero word for upward compatibility with the
80386. Segment Descriptors are used in protected mode, but not in real mode.
In protected mode, when you want to use a segment of memory, you
reference the segment descriptor. The 80286 looks into a table or two of
descriptors (which can be quite large, up to 8192 entries per table, or 16384
across the two), to find the right
entry, and find out what the segment is. If you had to do this every time you
referenced a memory variable, it would be terribly slow. In order to prevent
this overhead, saving the descriptor information for the current segments in
quickly-accessible CPU registers is a must. That's what the descriptor caches
are for.
But you said they aren't used in real mode, right? Right. The soft- | |
ware descriptor tables aren't. But the hardware descriptor caches are. The | |
Intel book on the 80286 seems mighty thin when it comes to telling you pre- | |
cisely what the protected-mode hardware does while in real mode, but some of | |
it still works, and is very important (the descriptor caches in particular). | |
The descriptor caches determine where your segments really are, whether in | |
real or protected mode. | |
In real mode, your segments are all normally 64 Kbytes in size, by | |
default, and are always located in the lowest megabyte of the 80286's 16- | |
megabyte address space. When you want to access a segment, you just load a | |
number for the start of the segment into the appropriate segment register, and | |
then read or write that segment of memory. | |
The segment number that you load is the address scaled down by four | |
bits, so that it really addresses a memory address that is sixteen times the | |
number you gave it. You can address anywhere inside that 64 Kbyte-sized | |
window by using an offset. | |
Since the segment registers are 16 bits in size, and have been scaled | |
by four bits, you have the equivalent of 20-bit addressing, and can address a | |
1-megabyte sized area. That's real mode. | |
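The real-mode arithmetic just described can be written out in a few lines. This is a Python model for checking numbers by hand, not code for the 80286. The function names are invented for the illustration, and no address wrapping is applied to the result.

```python
def real_mode_address(segment, offset):
    """Segment:offset to a flat address: segment times sixteen, plus offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def segment_window(segment):
    """First and last flat address reachable through one 64K segment."""
    start = real_mode_address(segment, 0x0000)
    end = real_mode_address(segment, 0xFFFF)
    return start, end


table_start = real_mode_address(0x0080, 0x0000)   # the Loadall table at 80:0
window = segment_window(0x3456)
print(hex(table_start), [hex(a) for a in window])
```

Segment 80h with offset 0 comes out as 800h, which is why the Loadall table at 80:0 is also described as absolute address 800h.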
Did it ever occur to you that that 1-megabyte sized area might itself | |
just be appearing somewhere inside of an even larger area? | |
And that the 1-megabyte-sized real-mode area is made to start at zero | |
when the 80286 chip is reset, but doesn't have to stay there forever? | |
I mean, if in protected mode, the hardware is there to address a 16- | |
megabyte-sized address space, well, that hardware doesn't just go away when | |
you are in real mode, does it? Or all just get turned off? | |
No, it doesn't. As a matter of fact, it still works just fine, but | |
you weren't given any instructions for doing anything with that part of the | |
hardware from real mode. Or were you? | |
Oh yes you were. It's called LOADALL. | |
So how do the hardware descriptor caches work? Well, they hold the
information that was read from a (software) descriptor in memory. The 80286
discards the unused zero word, and keeps the rest. When you address memory,
you are actually using the segment addresses in the descriptor cache
registers, not what is in the segment registers.
Perhaps you thought you were using the segment registers for address- | |
ing: it sure looks like you do, because if you load something into a segment | |
register, you will then address the memory that the segment register is point- | |
ing to. What is happening invisibly in the background is that the correspond- | |
ing descriptor cache is being updated whenever you load a segment register, | |
and then the descriptor cache is being used for the actual addressing. | |
So the segment descriptor caches, and not the segment registers, are | |
what actually control what goes out on the address lines, and hence, what | |
memory you will really address. And the addresses in the segment descriptor | |
caches are 24-bit addresses. Now isn't that special? | |
So if we can use Loadall to load anything we want to into the segment | |
descriptor caches, then we can address anywhere in the 16-megabyte address | |
space of the 80286, right? Right. You got it. | |
The contents of the descriptor cache entries in a Loadall table are: | |
The absolute 24-bit address for the start of the segment, | |
in the usual Intel lowest-byte-first byte-order. | |
That is, the bytes are: lowest, middle, highest. | |
An access byte, customarily set to 92h or 93h. This byte | |
is encoded in the usual way that access bytes are | |
encoded in Global Descriptor Table entries (see the | |
accompanying charts). This byte describes the | |
characteristics of the segment, like whether it is | |
code or data, and write-protected or not. | |
A 16-bit segment limit. This is the segment size, minus | |
one. FFFFh is equal to a full 64K. | |
This ordering is exactly backwards, word-order-wise, from the usual | |
layout of the descriptors used in the protected-mode tables like the Global | |
Descriptor Table. This is because the Loadall instruction is essentially just | |
a giant POP-ALL instruction. The word order is backwards (really, "Stack- | |
wards"), but the byte order within words is not reversed. | |
The addresses loaded into the descriptor caches must be 24-bit "absolute"
or un-segmented "flat-space" versions of the segment start addresses.
For example, a segment address of 3456h becomes an absolute address of 034560h.
Remember that segment addresses are ordinarily scaled down by 4 bits. So we have
to scale them back up to get the 24-bit flatland equivalent.
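To make the byte layout concrete, here is a small Python model that scales a segment number up to its 24-bit base and packs one descriptor cache entry in the order described above: three address bytes (lowest first), the access byte, then the 16-bit limit. The helper names are hypothetical, and the defaults (92h, FFFFh) are the customary values mentioned in the text.

```python
import struct


def segment_to_base24(segment):
    """Scale a 16-bit real-mode segment number up to a 24-bit absolute base."""
    return ((segment & 0xFFFF) << 4) & 0xFFFFFF


def pack_cache_entry(base24, access=0x92, limit=0xFFFF):
    """Pack one 3-word descriptor cache entry for a Loadall table.

    Word 0: low 16 bits of the base address.
    Word 1: low byte = highest address byte, high byte = access byte.
    Word 2: segment limit (size minus one).
    """
    low_word = base24 & 0xFFFF
    high_word = ((access & 0xFF) << 8) | ((base24 >> 16) & 0xFF)
    return struct.pack("<3H", low_word, high_word, limit & 0xFFFF)


base = segment_to_base24(0x3456)
entry = pack_cache_entry(base)
upstairs = pack_cache_entry(0x110000, access=0x93)   # a base above 1 megabyte
print(hex(base), entry.hex(), upstairs.hex())
```

The second entry shows the point of the whole exercise: the highest address byte can be non-zero, which no 16-bit segment register could ever express.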
You will notice that there seems to be some duplication of information | |
here: you have a CS register slot in the Loadall table, which is loaded with | |
the desired code segment start address, and you also have a Code Segment | |
Descriptor Cache entry, with an address slot which is loaded with much the | |
same information. The same is also true of DS, SS, and ES. | |
They can't always be the same, because one is 16 bits, and one 24, and | |
the 24-bit descriptor cache entry can specify the address down to the byte, in | |
the full 16-megabyte address space, while the 16-bit segment register can only | |
address on 16-byte boundaries, and can't address beyond 1 megabyte. | |
So if they are different, which ones win out? The answer is, the | |
Descriptor Caches. They have to, because only the Descriptor Caches have the | |
whole 24-bit address necessary for addressing the entire 16-megabyte address | |
space of the 80286. Also because the Descriptor Caches are what are actually | |
wired to the address lines. In protected mode, the segment registers don't | |
even get close to the address lines. | |
But watch out: this gets tricky. For a simple rule-of-thumb, the
proper programming practice to follow is: in real mode, always keep them the
same. That is, where the bits of the two overlap, keep them the same. The CS
register really holds the equivalent of bits A4 to A19 of the 24-bit address
in the CS Descriptor Cache, so there is no way that you can "keep them all the
same". But you can keep those bits the same, and you will want to.
Why? Because, in real mode, certain operations will update a Descrip- | |
tor Cache using the contents of the paired Segment Register. Oh yeh? Yeh. | |
Example: | |
Even if the code segment entry "CS" in a Loadall table is blatantly | |
wrong, but the value in the Code Segment Descriptor Cache "CSDC" is correct, | |
and the Instruction Pointer "IP" value is correct, and you do a Loadall, the | |
Loadall will still work, and you WILL run the code that you intended to be run | |
after the Loadall, but the program will crash at the first jump instruction | |
after the Loadall. Calls to subroutines will likewise crash if the CS is | |
wrong. | |
The jump or call instruction causes updating of the CS Descriptor | |
Cache contents, using the contents of the CS register, and the offset in the | |
jump or call instruction. So your CS descriptor cache goes from right to | |
wrong, without any further help from you. That's why you have to "keep them | |
the same". | |
(This is just a simple rule of thumb. Like all simple rules, there | |
are exceptions, and the rules can be broken. Breaking these rules doesn't buy | |
you anything, but you might note that this is simply a rule of thumb, not a | |
Commandment From On High.) | |
The same is also true of the other Segment Registers, and their match- | |
ing Descriptor Caches, although the instructions that will cause updating will | |
differ. The commonest operation that causes updating of these descriptor | |
caches is loading a new segment value into a segment register. | |
Now obviously, not all the bits of the address in the Descriptor Cache | |
will be updated by such operations. The highest 4 bits cannot be updated from | |
the segment register, because there are no corresponding bits. So what does | |
it do with them? In real mode, the worst. It clears them. Try doing a jump | |
while executing real-mode code upstairs, above 1 Megabyte, and you will come | |
crashing down out of the sky. A simple jump in code located way upstairs will | |
turn into a very long jump to the lowest megabyte of memory. Probably not | |
what you had in mind, at all. Far jumps and far calls are out of the question | |
for the same reason. | |
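The fall from the sky can be modeled in a couple of lines. The Python sketch below assumes, as described above, that a real-mode reload rebuilds the cache base from the 16-bit segment register alone. It also assumes the lowest four bits come out as zero, which is a guess for the sake of the model. The function name is invented for the illustration.

```python
def reload_cache_base(segment_register):
    """Cache base after real-mode code reloads a segment register.

    Assumption: the base is rebuilt as segment * 16, so address bits
    A20 to A23 (and, in this model, A0 to A3) come out as zero.
    """
    return (segment_register & 0xFFFF) << 4


upstairs_base = 0x123450                       # CS cache base set by Loadall
cs_register = (upstairs_base >> 4) & 0xFFFF    # the "keep them the same" value
after_jump = reload_cache_base(cs_register)

print(hex(upstairs_base), "->", hex(after_jump))
```

Under those assumptions, code that was running at 123450h finds itself at 023450h after the reload: same low 20 bits, top four bits gone.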
Curiously, a call will not cause you to fall out of the sky in the | |
same way as a jump will, so we can do reversed jumps, or reversed calls, by | |
shoving a return address, and then a destination address onto the stack, and | |
then executing a return instruction, where a jump to precisely the same place | |
will crash us. | |
When you are executing code in real mode, above 1 megabyte (plus 64 | |
K), your position is as precarious as that of Icarus flying towards the sun on | |
wings held together with wax. (More on that "plus 64 K" note later.) You | |
must keep interrupts turned off because ANY interrupt will yank you
downstairs, and you won't return upstairs again. The interrupt service routine
will change some segments or other, particularly the Code Segment, and those
segments' descriptor caches will have the highest four bits irretrievably
cleared.
And the updating of the lowest four bits is an open question. I | |
always set my segments on 16-byte boundaries so I don't get burned there. | |
That is, the four lowest bits of the 24-bit address are always zero. Thus, | |
the 16-bit segment settings in the segment registers will always match the | |
values of the lowest 20 bits of the descriptor cache settings. | |
Here's what these descriptor cache entries look like, in source code, | |
with a set of default values plugged in: | |
newESDC dw 0, 9200h, 0FFFFh | |
newCSDC dw 0, 9200h, 0FFFFh | |
newSSDC dw 0, 9200h, 0FFFFh | |
newDSDC dw 0, 9200h, 0FFFFh | |
The running program will replace those zeroes in the first and second | |
words of each entry with real addresses before doing the Loadall. | |
The "92"'s are the access bytes, and mean: "this item is a descriptor | |
of a data segment, it is valid, it has the highest possible privilege level | |
(0), writing to it is okay, and it has not been accessed" (really, written to. | |
A 'dirty' page, in virtual-memory-system parlance). | |
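For anyone who wants to pick an access byte apart bit by bit, here is a Python decoder based on the standard 80286 access-byte layout for code and data segments (present bit, two-bit privilege level, segment bit, executable bit, expand-down/conforming bit, writable/readable bit, accessed bit). The function and key names are made up for this sketch.

```python
def decode_access_byte(access):
    """Split an 80286 code/data segment access byte into its fields."""
    return {
        "present": bool(access & 0x80),             # bit 7: descriptor is valid
        "privilege_level": (access >> 5) & 0x03,    # bits 6-5: DPL, 0 is highest
        "code_or_data": bool(access & 0x10),        # bit 4: 1 = code/data segment
        "executable": bool(access & 0x08),          # bit 3: 1 = code, 0 = data
        "expand_down_or_conforming": bool(access & 0x04),  # bit 2
        "writable_or_readable": bool(access & 0x02),  # bit 1: writable (data)
        "accessed": bool(access & 0x01),            # bit 0: set once used
    }


fields_92 = decode_access_byte(0x92)
fields_93 = decode_access_byte(0x93)
print(fields_92)
print(fields_93)
```

The only difference between 92h and 93h is the lowest bit, the "accessed" flag.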
Those "FFFF"'s set up segments 64K in size. There's no point in set- | |
ting them any smaller, and a lot of grief to be gotten if you do. So just | |
always set them to "FFFF" in real mode. | |
THOSE OTHER BIG REGISTERS | |
GDTR Global Descriptor Table Register | |
LDTDC Local-Descriptor-Table Descriptor Cache | |
IDTR Interrupt Descriptor Table Register | |
TSSDC Task-State-Segment Descriptor Cache | |
These registers do next to nothing while in real mode. The strategy | |
for dealing with these is: just set them up in an acceptable manner, and then | |
forget them. The Interrupt Descriptor Table Register is the most important of | |
these, as it really does determine the starting address of the interrupt | |
vector table. | |
The format of the data for these registers is just about identical to | |
the format of the data in the Descriptor Caches, except for an unused byte | |
(there is no access byte): | |
an absolute 24-bit address for table start, in the | |
usual Intel byte-order. That is, the bytes | |
are: lowest, middle, highest. | |
an "extra", or "trash", or "dummy" byte (pick your | |
favorite name.) Set to either FFh or 0. | |
a 16-bit limit. This is the table size, minus one. | |
(FFFFh == a full 64K) | |
Set up the GDTR (Global Descriptor Table Register) and the IDTR (Inter- | |
rupt Descriptor Table Register) using the instructions "sgdt" and "sidt" -- | |
"store global descriptor table register", and "store interrupt descriptor | |
table register". These two instructions work in both real and protected mode. | |
The values that we get from them are somewhat goofy (especially since | |
we are getting data about non-existent tables), but we use those values any- | |
way, just to keep the 80286 chip happy. We will just stuff back into the chip | |
whatever is already in there. | |
The LDTDC (Local-Descriptor-Table Descriptor Cache) is a real nothing | |
in real mode. In real mode, there is no Local-Descriptor-Table Descriptor to | |
cache. We just set the LDTDC with an acceptable size, same as the GDTR (88h), | |
and let it go at that. | |
The TSSDC (Task-State-Segment Descriptor Cache) is likewise a null | |
register in real mode. There is no Task State Segment to point to. Again, we | |
just set it up with a size that will keep the 80286 chip from freaking out | |
(thinking that the segment is impossibly small), and let it go at that. | |
Set up, just before doing a Loadall, these items will look like: | |
newGDTR   dw 0D8A0h, 0FF00h, 88h
newLDTDC  dw 0, 0FF0Eh, 88h
newIDTR   dw 0, 0FF00h, 0FFFFh
newTSSDC  dw 4000h, 0FF0Eh, 800h
The addresses in the newLDTDC and the newTSSDC are E0000h and E4000h, | |
respectively. There is nothing at those addresses but stupid phantom copies | |
of the BIOS Roms, wasting precious low-memory address space. So what I do is | |
put the non-existent tables on top of the non-existent ROMS, and let them
fight it out. In truth, those addresses in those descriptor caches' entries | |
will never really be used for anything, anyway, so they could be anywhere. | |
They just don't matter. The starting address of the IDTR is the only one that | |
does matter. | |
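The three words of each of these entries can be unpacked back into an address, the extra byte, and a limit, which is a handy way to check a table by eye. The Python sketch below does that for the four sample entries. The function name is invented for the illustration, and the input values are simply the ones shown in the listing.

```python
def decode_table_entry(word0, word1, word2):
    """Unpack a 3-word GDTR/LDTDC/IDTR/TSSDC entry.

    Returns (24-bit base address, extra byte, 16-bit limit).
    """
    base = ((word1 & 0x00FF) << 16) | (word0 & 0xFFFF)
    extra = (word1 >> 8) & 0xFF
    limit = word2 & 0xFFFF
    return base, extra, limit


gdtr = decode_table_entry(0xD8A0, 0xFF00, 0x0088)
ldtdc = decode_table_entry(0x0000, 0xFF0E, 0x0088)
idtr = decode_table_entry(0x0000, 0xFF00, 0xFFFF)
tssdc = decode_table_entry(0x4000, 0xFF0E, 0x0800)

for name, (base, extra, limit) in [("GDTR", gdtr), ("LDTDC", ldtdc),
                                   ("IDTR", idtr), ("TSSDC", tssdc)]:
    print(f"{name:6s} base={base:06X} extra={extra:02X} limit={limit:04X}")
```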
AND ALL THOSE OTHER LITTLE REGISTERS | |
The MSW (Machine Status Word) is normally set to zero. On the 286, | |
only the 4 lowest bits are even used. The one super-important bit in this | |
register is the mode bit. Set it, and you warp into protected mode. The | |
other three bits are invalid and irrelevant if you are not in protected mode. | |
Zero this word, unless you really intend to go into protected mode. Heaven help
your program if you set it, and have not set up all the descriptor tables, and
all the protected-mode registers, and cross-linked all the pointers to
everything, correctly, first. We will get into that can of worms later.
Just for reference, here's what the bits are: | |
D0 == PE   Protected-Mode Enable (yeh, this is IT.)
D1 == MP   Monitor Processor Extension
D2 == EM   Emulate Processor Extension
D3 == TS   Task Switched
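As a quick sanity check before a Loadall, the MSW slot can be tested against these four bits. The following is a Python model only. The names "MSW_BITS", "decode_msw", and "stays_in_real_mode" are hypothetical helpers, not anything from the Hyper-Space Library.

```python
# Bit masks for the four MSW bits used on the 80286.
MSW_BITS = {"PE": 0x0001, "MP": 0x0002, "EM": 0x0004, "TS": 0x0008}


def decode_msw(msw):
    """Return which of the four MSW bits are set."""
    return {name: bool(msw & mask) for name, mask in MSW_BITS.items()}


def stays_in_real_mode(msw):
    """True if loading this MSW value leaves the PE (mode) bit clear."""
    return not (msw & MSW_BITS["PE"])


default_msw = decode_msw(0x0000)
warp_msw = decode_msw(0x0001)
print(default_msw, stays_in_real_mode(0x0000))
print(warp_msw, stays_in_real_mode(0x0001))
```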
The TR (Task Register) is another register used only in protected | |
mode. It is used for keeping track of which task is running. Not our problem | |
in real mode. Zero it. | |
The Flag Word is the same old flag word that we are already familiar | |
with from ordinary real-mode programming. We just push the flags word here, | |
and we've done it. Or we can zero it. None of our programs are going to do | |
anything as off-the-wall as a conditional jump right after a Loadall, anyway, | |
right? Uh, right? Why do I see you grinning? 12-dimensional Life, huh? | |
The IP (Instruction Pointer) is critical. This one really works. The
address we put here will, in combination with the address in the Code Segment
Descriptor Cache (CSDC), determine where we will start executing code
immediately after the Loadall. So this acts like a jump vector. We set this up in
our programs, just before doing a Loadall, to determine where we will go
next. Better get this one right.
The LDT (Local Descriptor Table) is another null register in real | |
mode. Zero it. | |
DS (Data Segment, or DS Selector)
SS (Stack Segment, or SS Selector)
CS (Code Segment, or CS Selector)
ES (Extra Segment, or ES Selector)
Set these up so that they contain the same number as bits A4 to A19 of | |
the corresponding Segment Descriptor Cache. These work in conjunction with | |
those. | |
All of the following registers are very straight-forward: just load | |
them with whatever you want the registers to have after the Loadall. If you | |
are not trying to carry values in these registers, you can just default most | |
all of them to zeroes. | |
The stack pointer requires some care, as the stack is one of the best
ways to carry data into the beyond. I generally stuff the stack just before a
Loadall, and then write the current stack pointer to the SP slot in the
Loadall table, so that I know that I have it right.
DI (Destination Index) | |
SI (Source Index) | |
BP (Base Pointer) | |
SP (Stack Pointer) | |
BX (Data Register BX) | |
DX (Data Register DX) | |
CX (Data Register CX) | |
AX (Accumulator AX) | |
And then these little curiosities: the two "dead" spots in the table. | |
800 3 words unused (?? I don't believe it.) | |
808 7 words unused (?? I don't believe it.) | |
Obviously, they are there for something. They must load some invisi- | |
ble register or other. The registers might be some very transient registers, | |
just for intermediate products, which may not be useful... | |
Then again, considering how much we haven't been told so far, they | |
might be good for something. This is another area for future experimentation. | |
In the mean time, zero them. | |
AND A PRETTY-TOGETHER DEFAULT TABLE | |
So here's what a default Loadall table looks like. Note that | |
"new_Reg_Buf" doesn't label any data item that we really use; it's the name of | |
the whole table. | |
; LOADALL Register Load Table for new values to be loaded
; into registers by a Loadall.
new_Reg_Buf  dw 3 dup (0)               ; unused space
newMSW       dw 0
newDead      dw 7 dup (0)               ; unused space
newTR        dw 0
newFlagWord  dw 0
newIP        dw offset after_ldall      ; * may chng
newLDT       dw 0
newDS        dw 0                       ; *chng
newSS        dw 0                       ; *chng
newCS        dw 0                       ; *chng
newES        dw 0                       ; *chng
newDI        dw 0
newSI        dw 0
newBP        dw 0
newSP        dw 0                       ; *chng
newBX        dw 0
newDX        dw 0
newCX        dw 0
newAX        dw 0
newESDC      dw 0, 9300h, 0FFFFh        ; *chng
newCSDC      dw 0, 9300h, 0FFFFh        ; *chng
newSSDC      dw 0, 9300h, 0FFFFh        ; *chng
newDSDC      dw 0, 9300h, 0FFFFh        ; *chng
newGDTR      dw 0D8A0h, 0FF00h, 88h     ; @ 0D8A:0  *n
newLDTDC     dw 0, 0FF0Eh, 88h          ; @ E000:0
newIDTR      dw 0, 0FF00h, 0FFFFh       ; @ 0000:0  *n
newTSSDC     dw 4000h, 0FF0Eh, 800h     ; @ E400:0
Those "*chng" comments mean that those items MUST be changed by the | |
running program before actually doing the Loadall. We cannot correctly default | |
them in the sources because the correct values can only be determined at run- | |
time. | |
The "*n" means that those values are not really in the default tables | |
in the sources: the running program uses the sgdt and sidt instructions to get | |
those values and then plugs them into those two entries. Just letting you see | |
what they will look like. You could have anything in the original table there, | |
because the running program will over-write those items with correct values | |
anyway. | |
The "@ 0D8A:0" comments are just noting the addresses in those items, | |
in a more readable form. | |
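To see how the "*chng" items get filled in, here is a Python model that assembles the whole 102-byte table image for a given set of real-mode segments. It is a sketch for checking layouts on paper, not the library's code: the function names and parameters are invented, the GDTR and IDTR defaults are just the sample values shown above (a real program would take them from sgdt and sidt), and each descriptor cache base is simply the segment times sixteen, following the "keep them the same" rule.

```python
import struct


def three_words(base24, high_byte, limit):
    """Pack base (3 bytes, lowest first), one more byte, then a 16-bit limit."""
    word1 = ((high_byte & 0xFF) << 8) | ((base24 >> 16) & 0xFF)
    return struct.pack("<3H", base24 & 0xFFFF, word1, limit & 0xFFFF)


def build_loadall_table(cs, ip, ds, ss, es, sp, flags=0,
                        gdtr=(0x00D8A0, 0x0088), idtr=(0x000000, 0xFFFF)):
    """Return the 102-byte table image for a real-mode Loadall (model only)."""
    words = [0, 0, 0]              # unused
    words += [0]                   # MSW: zero keeps us in real mode
    words += [0] * 7               # unused
    words += [0, flags, ip, 0]     # TR, Flag Word, IP, LDT
    words += [ds, ss, cs, es]      # segment registers, in table order
    words += [0, 0, 0, sp]         # DI, SI, BP, SP
    words += [0, 0, 0, 0]          # BX, DX, CX, AX
    image = struct.pack("<%dH" % len(words), *words)
    for segment in (es, cs, ss, ds):               # ESDC, CSDC, SSDC, DSDC
        image += three_words((segment & 0xFFFF) << 4, 0x93, 0xFFFF)
    image += three_words(gdtr[0], 0xFF, gdtr[1])   # GDTR
    image += three_words(0x0E0000, 0xFF, 0x0088)   # LDTDC
    image += three_words(idtr[0], 0xFF, idtr[1])   # IDTR
    image += three_words(0x0E4000, 0xFF, 0x0800)   # TSSDC
    return image


image = build_loadall_table(cs=0x1000, ip=0x0100, ds=0x2000,
                            ss=0x3000, es=0x4000, sp=0xFFFE)
print(len(image), image.hex())
```

Offsets into the image are the table addresses minus 800h, so the CS slot sits at offset 22h and the CS Descriptor Cache at offset 3Ch.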
GATE A20 : Door to the Beyond | |
Before we get heavy into the guts of actually using the Loadall in- | |
struction, we need to touch on this item: Gate A20. Loadall is almost useless | |
without control of Gate A20. | |
Gate A20 is the gate on the motherboard of the AT that enables or
disables address line A20, the first address line beyond the 20 lines (A0 to
A19) that the PC had. In order to be PC-compatible, it is ordinarily disabled
on an AT. The pathetic PC could only
address 1 megabyte of space, total, remember? That's 20 bits. If that line
is disabled, then addressing wraps to zero at FFFF:0010. But if it is
enabled, then addressing doesn't wrap, and you can address above 1 Megabyte.
This has nothing to do with protected mode. Even if the 80286 were in
protected mode, it still couldn't address the megabyte that starts at 1
Megabyte (or any other odd-numbered megabyte) without enabling Gate A20.
In the part of the Hyper-Space Library freely distributed with this | |
document and the View-XM program are routines called "A20_on" and "A20_off". | |
They need no arguments. You just call them, and they will enable or disable | |
Gate A20. Do not make a habit of turning Gate A20 on and just leaving it on, | |
as rumor has it that some barbaric programmers from the bad-old days made a | |
habit of depending on address-wrapping, addressing something like FFFF:0345h | |
to get at 0:0335h. Ugh! These subroutines also check whether Gate A20 was | |
already on before the call, and if so, leave it alone. | |
This leads us to a very interesting twist in the game: what if you | |
turn on Gate A20, and load FFFFh into a segment register, like the DS regis- | |
ter, and then address something like DS:0300h? The answer is, you will ad- | |
dress beyond 1 megabyte, without either going into protected mode, or using | |
Loadall tricks. The PC can only address 1 Megabyte total, but the AT can | |
address 1 Megabyte, plus 64K, minus 16 bytes, in REAL mode, without Loadall. | |
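The wrap-around and the extra 64K can both be checked with a few lines of arithmetic. This Python model covers only real-mode segment:offset addresses, where bit 20 is the only address bit beyond the first megabyte that can ever be set. The function name is made up for the illustration.

```python
A20_BIT = 1 << 20   # 100000h, the first address beyond 1 megabyte


def bus_address(segment, offset, a20_enabled):
    """Address that reaches memory for a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= ~A20_BIT          # gate closed: A20 forced to zero, so we wrap
    return linear


wrapped = bus_address(0xFFFF, 0x0345, a20_enabled=False)
beyond = bus_address(0xFFFF, 0x0300, a20_enabled=True)
highest = bus_address(0xFFFF, 0xFFFF, a20_enabled=True)

print(hex(wrapped), hex(beyond), hex(highest))
print("bytes reachable in real mode with A20 on:", highest + 1)
```

With the gate closed, FFFF:0345h lands on 0:0335h, which is the address-wrapping habit mentioned above. With the gate open, the highest reachable byte plus one comes to 1 Megabyte, plus 64K, minus 16 bytes.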
This is a big part of the XMS driver specification. That's the eXtended
Memory Specification (not to be confused with the "EMS" Expanded Memory
Specification). The XMS driver accesses memory addressed above 1 Megabyte on AT's.
You can write programs which use standardized calls to the XMS driver, | |
and expect that the program will work with anyone's XMS driver. Microsoft, | |
Intel, Lotus, and AST Research (the authors) have put the XMS specification in | |
the public domain (although they retain the copyright), and it is currently | |
supported by them, and probably by many more companies that I don't know of, | |
so we should be seeing plenty of good XMS device-drivers around, and, in turn, | |
programs using it. | |
Furthermore, Microsoft will give you a copy of the XMS driver, and | |
standard, free, if you write to them and ask for one. Write to Microsoft | |
Corporation, 16011 NE 36th Way, Box 97017, Redmond WA 98073, and politely | |
request a floppy copy of the XMS standard and driver. The same files are | |
available from many bulletin board systems, and anonymous FTP sites. | |
Since you want a nice, clean, non-colliding standard way for your | |
programs to be able to get at more ram, using the EMS and XMS standards is the | |
only good way to go. Throughout this book, we are going to support those | |
standards, and others, too. | |
The recommended programming practice is to always support the XMS | |
standard, and use requests to the XMS driver to get at extended memory, rather | |
than just brute-force doing it yourself, even though you can with Loadall, so | |
that your programs will not conflict with others. | |
The PC world is already far too filled with gotchas and incompatibili- | |
ties, and things that collide with other things, for us to be adding to the | |
misery. | |
The one thing that the XMS driver adds, that you will not have if you | |
just take over and use an area of extended memory yourself, is any kind of | |
collision prevention or co-ordination between programs. You won't know if | |
another program is already using that area, but the XMS driver will, as long | |
as the other program is also using the driver. So everybody better be adher- | |
ing to the standard! | |
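
Here is a toy model in C of why a single arbiter prevents collisions. To be loud about it: this is NOT the XMS interface, and pool_t, pool_init, and pool_claim are all invented for illustration. It only shows that when every program asks one bookkeeper for memory, no two of them can be handed overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bookkeeper for a range of extended memory, counted in KB.
 * Hypothetical types and names; a real XMS driver is far more involved. */
typedef struct {
    uint32_t next_free_kb;   /* first KB nobody has claimed yet */
    uint32_t end_kb;         /* one past the last KB in the pool */
} pool_t;

void pool_init(pool_t *p, uint32_t start_kb, uint32_t size_kb)
{
    assert(p != NULL);
    p->next_free_kb = start_kb;
    p->end_kb = start_kb + size_kb;
}

/* Returns the starting KB of the block granted to the caller,
 * or 0 if the request cannot be satisfied. */
uint32_t pool_claim(pool_t *p, uint32_t size_kb)
{
    uint32_t start;

    assert(p != NULL);
    if (size_kb == 0 || size_kb > p->end_kb - p->next_free_kb)
        return 0;                      /* nothing left to hand out */

    start = p->next_free_kb;
    p->next_free_kb += size_kb;        /* the next caller starts after us */
    return start;
}
```

Start the pool at 1088K (1 Megabyte plus 64K, just past the High-Memory Area) with 256K in it, and two programs each claiming 100K get blocks starting at 1088K and 1188K. A third request for 100K is refused rather than being laid on top of somebody else's data. A program that bypasses the bookkeeper gets no such protection.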
On the other hand, you would not be reading this book about the | |
"secret" Loadall instruction if you were all that committed to ONLY using | |
"normal" standards, would you? The trick is to support the standards, without | |
being constrained by them. This requires great care and thought about the | |
consequences of any use of Loadall for "non-standard" activities. You can, | |
for instance, allocate some memory, using the XMS driver, and then go ahead | |
and use Loadall to do anything you want to with it, since you now own it. You | |
have the best of both worlds. | |
And so what do the XMS drivers use to get at the extended memory above | |
the High-Memory Area? Either going into protected mode, or Loadall. | |
THE PROCEDURE FOR USING LOADALL | |
(the ultra-safe, long procedure) | |
1. Save the original machine state, so you have a state to return to. | |
This information can be saved in a Loadall table, which is the most convenient | |
form for later use. | |
2. Disable interrupts. Just in case. We want a clean copy of area 80. | |
3. Save the 102-byte (33h words) block of data located at 80:0h. Ver- | |
sions of DOS (both PC- and MS-) earlier than 3.3 use this area for critical | |
system code, and as of DOS 3.3, RamDrive.Sys and Himem.Sys use this area for
their own Loadall tables. | |
4. Re-enable interrupts. Let the clock ticks, or whatever, through, | |
while we do the following step. | |
5. Set up the new Loadall table (new_reg_buf), which defines the new | |
state we want to warp to. | |
6. Disable interrupts.
7. Copy the new Loadall table to 80:0h. | |
8. Execute a Loadall. | |
9. Do something or other with your new machine state. Read or write | |
extended memory, run code upstairs, or whatever. | |
10. Copy the "old" Loadall table, containing the saved machine state, down | |
to 80:0h.
11. Do another Loadall (Un-Loadall.) This restores the original machine | |
state. | |
12. Copy the block of saved data back to 80:0h. | |
13. Re-enable interrupts. | |
And you have done it. | |
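
The thirteen steps above can be sketched as a simulation in C. This does not execute a real Loadall (opcode 0Fh 05h on the 80286). The machine_t structure, sim_loadall, and run_with_loadall are stand-ins invented here so that the order of the saves, copies, and restores can be followed and checked on any machine. The name new_reg_buf is taken from step 5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOADALL_TABLE_ADDR 0x800u   /* 80:0h as a linear address */
#define LOADALL_TABLE_SIZE 102u     /* 33h words */

/* Simulated machine; every field is a hypothetical stand-in. */
typedef struct {
    uint8_t  low_mem[0x1000];                 /* first 4K of RAM */
    uint8_t  cpu_state[LOADALL_TABLE_SIZE];   /* registers + descriptor caches */
    int      interrupts_enabled;              /* the interrupt flag */
    unsigned loadall_count;                   /* how many Loadalls have run */
} machine_t;

/* Models Loadall: the CPU reloads its whole state from the table at 80:0h. */
static void sim_loadall(machine_t *m)
{
    memcpy(m->cpu_state, &m->low_mem[LOADALL_TABLE_ADDR], LOADALL_TABLE_SIZE);
    m->loadall_count++;
}

void run_with_loadall(machine_t *m,
                      const uint8_t new_reg_buf[LOADALL_TABLE_SIZE], /* step 5 */
                      void (*work)(machine_t *))
{
    uint8_t old_reg_buf[LOADALL_TABLE_SIZE];
    uint8_t saved_area[LOADALL_TABLE_SIZE];
    uint8_t *area80;

    assert(m != NULL && new_reg_buf != NULL);
    area80 = &m->low_mem[LOADALL_TABLE_ADDR];

    memcpy(old_reg_buf, m->cpu_state, LOADALL_TABLE_SIZE);  /*  1: save state   */
    m->interrupts_enabled = 0;                              /*  2: ints off     */
    memcpy(saved_area, area80, LOADALL_TABLE_SIZE);         /*  3: save area 80 */
    m->interrupts_enabled = 1;                              /*  4: ints on      */
    /* 5: the caller has already built new_reg_buf */
    m->interrupts_enabled = 0;                              /*  6: ints off     */
    memcpy(area80, new_reg_buf, LOADALL_TABLE_SIZE);        /*  7: new table    */
    sim_loadall(m);                                         /*  8: Loadall      */
    if (work != NULL)
        work(m);                                            /*  9: do the work  */
    memcpy(area80, old_reg_buf, LOADALL_TABLE_SIZE);        /* 10: old table    */
    sim_loadall(m);                                         /* 11: Un-Loadall   */
    memcpy(area80, saved_area, LOADALL_TABLE_SIZE);         /* 12: restore area */
    m->interrupts_enabled = 1;                              /* 13: ints on      */
}
```

After run_with_loadall returns, the simulated area at 80:0h and the simulated CPU state are both back to what they were before the call, and interrupts are on again. The real thing differs in that steps 1 and 5 mean filling in an actual Loadall table, field by field.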
This is the long, drawn-out method. There are various short-cuts and | |
speedups possible. | |
If all you have been doing is reading or writing extended memory, for | |
instance, then you don't have to do the second Loadall. Just changing a
segment register (loading a new value) will cause the corresponding Descriptor | |
Cache to drop its four highest address bits, restoring addressing to the low | |
megabyte. | |
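
A small model in C shows why the reload is enough. The seg_cache_t structure and the three function names are invented for this sketch; the only behavior assumed is that a real-mode segment load sets the cached base to the segment value times 16, which can never set bits 20 through 23.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one segment register and its descriptor cache. */
typedef struct {
    uint16_t selector;   /* the visible segment register */
    uint32_t base;       /* cached base address, 24 bits on the 80286 */
} seg_cache_t;

/* What a Loadall table entry can do: plant any 24-bit base. */
void loadall_set_base(seg_cache_t *s, uint32_t base24)
{
    assert(s != 0 && base24 <= UINT32_C(0xFFFFFF));
    s->base = base24;
}

/* What an ordinary real-mode segment load does: base = value * 16. */
void real_mode_load_segment(seg_cache_t *s, uint16_t value)
{
    assert(s != 0);
    s->selector = value;
    s->base = (uint32_t)value << 4;   /* at most FFFF0h: bits 20..23 are zero */
}

/* Address formed from the cached base plus a 16-bit offset. */
uint32_t seg_linear(const seg_cache_t *s, uint16_t off)
{
    return (s->base + off) & UINT32_C(0xFFFFFF);
}
```

Plant a base of 200000h with loadall_set_base and offset 10h lands at 200010h, up in extended memory. Load 1234h with real_mode_load_segment and the base becomes 12340h, so the same offset lands at 12350h, back in the low megabyte.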
Read the sources for the program "View-XM" for more details on this. | |
See the full text of The Hyper-Space Navigator's Guide for more. | |
The Hyper-Space Navigator's Guide, the book and software library, is | |
available from Semi-Intelligent Systems for $49.00 (students get a 20%
discount), and comes with the floppy of source code. With other books, you | |
have to pay $10 or $20 more to get the floppy that should have come with the | |
book in the first place. Here you don't. It is available on any common | |
floppy format: 5.25" 360K or 1.2MB, or 3.5" 720K. If you order it, please | |
state your floppy format preference. | |
FULL source code in assembly and C is provided. | |
The Hyper-Space Navigator's Guide gives the full low-down on Loadall, | |
and other 286- and 386-compatible extended-memory tricks, too: the good, the
bad, and the ugly. | |
The book comes with a library of subroutines designed to facilitate | |
the use of extended memory, and includes numerous demo programs which do just
about everything you can do with Loadall (or without), including: | |
reading and writing extended memory. | |
running code up there, in both real and protected mode. Yes, you can | |
use Loadall to warp directly into protected mode. Or you can do it the | |
"normal" way, so that the code will be 386-compatible. Both ways are
implemented in the code.
going into, and running in, and then getting back out of protected | |
mode, from within your own programs, on both the 286 and 386. Getting into | |
protected mode is relatively easy. Try getting back out on a 286. I'll show | |
you how. | |
writing "split" programs, with a low-memory half, and a high- or | |
extended-memory half, with the second half in real or protected mode. The | |
cat's meow for image-processing programs which eat memory space like popcorn. | |
installing either real- or protected-mode "high code" inside an ex- | |
tended-memory ram-disk file, where it won't collide with anything or anybody | |
else, and then using a TSR to launch directly into running the code from in | |
there (thus turning a piece of the ram-disk back into ram). | |
Again, full and complete source code, so that the demo programs also | |
supply you with hackable skeletons for quickly building your own programs, | |
(without the many months of day-and-night hacking and hair-tearing I went | |
through to figure out this stuff). Just throw away the middle of the demo | |
main routine and plug your code in. | |