In x86 Protected Mode, a Segment is described by two parameters, the Base address and the Limit:
- The Base address is where in the CPU-addressable space the Segment starts;
- The Limit is the last Segment-relative offset from the Base that can be accessed.
For example, a Memory Descriptor entry in the Global Descriptor Table may describe an Expand Up Data Descriptor starting at 0000_C000h - in other words, 48 kiB into Conventional RAM - with a Limit of 0000_C000h (48 kiB, +1 byte). Given the above Descriptor assigned to the DS segment register, accessing a Byte at [DS:0000_C000h] will work, but [DS:0000_C001h] or above won't. The biggest advantage of this mechanism is that various offsets embedded in a program don't need to be "fixed up" at load time to cater for the load address of the program: since every offset is relative to a Segment's Base, the fix-ups are performed by the CPU at run time.
While the Base address is affected by things like Paging (if enabled), the Limit is simply the number of contiguous bytes after that: any access higher than the Limit will cause a General Protection Fault. Unless the Descriptor is for an Expand Down segment, in which case everything changes (see below)...
When Intel introduced Descriptor Tables in the 80286 (with 16-bit data but a 24-bit address bus), they defined one Entry to be 8 bytes. That made it easy to use a Segment Register as a Selector, with a couple of leftover bits as a GDT/LDT selector and Privilege Level selector. Of the 8 bytes, they only needed 6: 3 for the (24-bit maximum) Base, 2 for the (16-bit maximum) Limit, and 1 for the Entry's Type (Data vs Code vs System). They specified that the last two bytes of the eight were reserved, and must be zero.
Since the Limit was 16 bits, Intel needed to decide how to interpret the edge conditions of 0 kiB and 64 kiB: would a Limit of 0000h mean no access (what's the point of a zero-length Segment?) or full access (all 64 kiB available)? So (of course) they chose the third option: the Limit would be the last accessible byte. A Limit of 0000h allowed only 1 accessible address, while FFFFh allowed all of them.
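The inclusive-Limit rule can be sketched in a few lines of Python (my illustration, not anything Intel defined in code):

```python
# Sketch of the 80286 "Limit is the last accessible byte" rule: a
# Segment with Limit L allows offsets 0..L inclusive, so even a Limit
# of 0000h still covers one byte.

def accessible_bytes(limit: int) -> int:
    """Number of addressable bytes for an inclusive 16-bit Limit."""
    return limit + 1

def in_segment(offset: int, limit: int) -> bool:
    """True if an offset is legal in an Expand Up Segment."""
    return 0 <= offset <= limit

print(accessible_bytes(0x0000))        # 1      - Limit 0000h
print(accessible_bytes(0xFFFF))        # 65536  - full 64 kiB
print(in_segment(0xC000, 0xC000))      # True   - [DS:C000h] works
print(in_segment(0xC001, 0xC000))      # False  - General Protection Fault
```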
When Intel introduced the 32-bit 80386 (with both 32-bit data and address buses), they had a problem. They needed one more byte to hold the new 32-bit Base, but two more bytes to hold the new 32-bit Limit - and they only had two bytes spare. They still wanted to be backward-compatible and run existing 80286 software unchanged, but they also wanted to take full advantage of the new 32-bit addresses.
But Intel realised that when dealing with really large Segments, a programmer wouldn't be agonizing over whether it should be 12,086,384 or 12,086,385 bytes in size. At those sorts of sizes - especially with the '386's new paging functionality - a programmer would be working at a higher granularity: in 4 kiB Pages rather than bytes. A Page needs 12 bits to address into, so if the Limit field was marked as Page-granular rather than byte-granular, it would only need to be (32-12=) 20 bits in size - and 16 were already defined!
So they made a compromise in the Descriptor Table Entry. They did add an extra byte for the Base, making it fully 32-bit addressable, but they turned the last available byte into a compound Limit-with-Flags record. The last four bits required for the maximum-sized Limit were the low nybble of that byte, and the high nybble was used for two flags, called Gran and Big:
- Gran is obvious - it indicates whether to use byte-granular (=0) or Page-granular (=1) calculations on the Limit. With a 20-bit Limit in byte-granular mode, it is possible to fine-tune a Segment to anything from 1 byte to 1 MiB. In Page-granular mode, you can specify anything from 4 kiB to 4 GiB in 4 kiB jumps. The 20-bit Limit is shifted 12 bits to the left, and 1s are shifted in: a Page-granular Limit of 0_0000h thus yields an effective Limit of 0000_0FFFh - a 4 kiB Segment.
- Big is less obvious - it indicates whether addresses used by implicit registers will be 16-bit (=0) or 32-bit (=1). What are implicit registers? There are only two: (E)IP and (E)SP. When used for a Code segment, instructions are fetched by either IP or EIP; when used for a Stack, values are pushed using either SP or ESP. The Big flag indicates which one.
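The Gran calculation can be checked with a short sketch (my illustration, not Intel's hardware logic):

```python
# Sketch of how the 20-bit Limit field expands into a 32-bit effective
# Limit depending on the Gran flag.

def effective_limit(limit20: int, gran: bool) -> int:
    assert 0 <= limit20 <= 0xFFFFF, "Limit field is only 20 bits"
    if gran:
        # Page-granular: shift left 12 bits and shift 1s in, so the
        # Limit always names the last byte of a 4 kiB Page.
        return (limit20 << 12) | 0xFFF
    return limit20  # byte-granular: used as-is

print(hex(effective_limit(0x00000, gran=False)))  # 0x0        - 1 byte
print(hex(effective_limit(0xFFFFF, gran=False)))  # 0xfffff    - 1 MiB
print(hex(effective_limit(0x00000, gran=True)))   # 0xfff      - 4 kiB
print(hex(effective_limit(0xFFFFF, gran=True)))   # 0xffffffff - 4 GiB
```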
Incidentally, when used in a Code Descriptor, the Big flag is also known as the Default flag. It sets the default 16- or 32-bit mode that instructions will be interpreted in when executed in this Segment. For example, the B8h op-code under a Default=0 Code Segment means MOV AX,... and needs to be followed by 2 bytes for the 16-bit value to load. Under a Default=1 Code Segment it means MOV EAX,... and needs to be followed by 4 bytes for the 32-bit value to load. The OpSiz (66h) and AdSiz (67h) instruction prefixes toggle the current Default interpretation.
The effect of the Big flag is subtle: in Code Segments it influences things like the size of the value pushed during a CALL, and in Data Segments it sets the Segment wrap for Stacks - if the Stack Pointer is at zero, a PUSH EAX will make it either FFFCh or FFFF_FFFCh depending on that flag - which doesn't mean much until you get to Expand Down Segments.
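That wrap can be sketched numerically (my illustration; the masks simply follow from the 16- and 32-bit pointer widths the Big flag selects):

```python
# Sketch of the Stack-pointer wrap controlled by the Big flag: with
# Big=0 pushes use SP and wrap within 64 kiB; with Big=1 they use ESP
# and wrap within 4 GiB.

def push_pointer(sp: int, size: int, big: bool) -> int:
    """Return the Stack Pointer after pushing `size` bytes."""
    mask = 0xFFFFFFFF if big else 0xFFFF
    return (sp - size) & mask

# PUSH EAX (4 bytes) with the Stack Pointer at zero:
print(hex(push_pointer(0, 4, big=False)))  # 0xfffc     - SP, 64 kiB wrap
print(hex(push_pointer(0, 4, big=True)))   # 0xfffffffc - ESP, 4 GiB wrap
```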
Segmentation has its detractors. Indeed, most Operating Systems today don't use it - or at least use only the vestigial minimum required by the architecture. However, the OS then doesn't have the hardware security features inherent in Segmentation; it must perform address fix-ups at load time; and shared code becomes a nightmare. Segmentation solves all of this - abandoning it means that the OS has to duplicate the work.
Some examples might make things clearer.
Many of today's security bugs (and probably application bugs too) are the result of invalid memory accesses. Whether through buffer under/overflows or random pointers, many security-related flaws involve accessing memory that wasn't meant to be accessed. To (attempt to) counter these, all sorts of systems have been invented: Address Space Layout Randomisation (ASLR); stack canaries/guard pages; deny execute access; et al. These are designed to limit the damage that accidental or malicious memory references can wreak - and often don't completely work.
The point is that Segmentation already protected against all of that! No memory access (even by the kernel) can go beyond the associated data: you can't overflow the Stack to modify Heap data; or write data then execute it as code. While it is still possible to do these tricky things, by default Segmentation disables this, whereas the flat memory model perforce by default enables it.
When you write code, you don't usually care (or need to know) where the code ends up in memory. In fact, the latest OSes actually randomise where code goes to prevent malicious use of that knowledge.
So at compile time the linker arbitrarily allocates memory locations as needed, and writes a table for the OS Loader to reference when it loads the code into memory. The main program usually gets loaded at the same place in the Virtual Address Space - but (dynamically linked) libraries may not be. If two libraries are linked with overlapping addresses, then the OS Loader has to perform a "fix-up" at load time, adding a delta to each and every embedded code reference to accommodate the change in load location. This slows down the application launch process with housekeeping. Worse, that also means that you can't "page out" the code when RAM gets tight without either writing that code to the Page File, or re-fixing it up when loading it back again.
How does Segmentation fix this? The code can refer to relative Segment locations rather than absolute memory locations. You want to jump to a routine? Reference the Segment number and the offset within that Segment. The (local) Segment can be fixed at link time - all the relative references don't change, regardless of where the code is loaded. The OS Loader can place the code at whatever absolute address it likes - it just fills in the appropriate Local Descriptor Table entry with the address - the actual code doesn't have to change. Need to page out the code and reload it? Use whatever new absolute address you like, since the Segment address doesn't change - no fix-ups required.
Imagine that you define a Heap to be 4 kiB in size. You put it in its own Segment, and allow the program to alloc() and free() chunks of memory from it. If an alloc() comes through for more memory than is available, the code has two choices:
- It can deny the request;
- It can increase the Segment size, which may require moving the whole Segment to a new area of RAM.
This second option is where the beauty of Segmentation comes in: after the move, the Segment's Base address can be modified to point to the new memory and the calling program doesn't know anything happened. It still accesses the Heap with the same Selector value, and unless it uses the LSL (Load Segment Limit) instruction, it won't even know that the Segment grew.
The point is that growing a Heap doesn't affect any of the offsets in the original data: even the stored Selector value doesn't change. Unfortunately, the same is not true for Stacks.
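A toy model makes the point concrete (my illustration; the class, addresses and sizes are all invented for the sketch):

```python
# Toy model of growing a Heap Segment: the data moves, the Descriptor's
# Base and Limit change, but every Segment-relative offset the program
# holds stays valid.

class Segment:
    def __init__(self, base: int, limit: int):
        self.base, self.limit = base, limit  # Limit is inclusive

    def linear(self, offset: int) -> int:
        """Translate a Segment offset to a linear address, as the CPU does."""
        if offset > self.limit:
            raise MemoryError("General Protection Fault")
        return self.base + offset

heap = Segment(base=0x50000, limit=0x0FFF)  # 4 kiB Heap
chunk = 0x0800                              # offset the program saved

before = heap.linear(chunk)
# Grow: the OS copies the block elsewhere and rewrites the Descriptor.
heap.base, heap.limit = 0x90000, 0x1FFF     # now 8 kiB
after = heap.linear(chunk)

print(hex(before), hex(after))  # 0x50800 0x90800 - same offset, new RAM
```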
Now imagine that you define a 4 kiB Stack. Again it is in its own Segment, and SP starts at 1000h and works its way downwards. When it hits 0000h it's a valid address - but one more PUSH will make SP overflow the Limit. At this point the code has two choices:
- It can abort the program due to lack of Stack space;
- It can increase the Stack's size, which again may require moving the whole Segment to a new area of RAM.
The move is again not a problem: change the Segment's Base and it's done. But the Limit is the problem: there's nowhere for the Stack to grow to! Just increasing the Limit won't do anything: the Stack is at the bottom of the Segment, not the top. And moving all the data inside the Stack to the top of the new Segment really won't work: it will muck up all the saved offsets, since (for example) what was at 1000h needs to become 2000h - but which values need to be changed? And what about the values stored in the program's registers? Which of those should be changed?
So in short, growing a Stack like you can a Heap (described above) can't be done. At least, not using traditional Data Segments anyway, which is why Intel designed Expand Down Segments. As the name suggests, they're designed to be used with Stacks, to permit them to be expanded during run-time. Setting just a single bit in the Descriptor Table Entry's Type field changes the interpretation of the whole Data Segment completely.
There are two ways to look at Expand Down Segments:
- Valid vs Invalid:
- An Expand Up Segment uses the Limit to define valid and invalid addresses above Base;
- The Expand Down flag swaps them, making the once-valid addresses invalid, and vice versa.
- An Expand Up Segment defines the Base and the largest possible offset you're allowed to add to that Base:
- The Expand Down flag for a 32-bit Segment makes the Base actually the top of the accessed memory, and the Limit (+1) becomes the lowest possible offset you can access below that.
This is slightly incorrect since the CPU is still adding the offset to the Base - but since the offset is so large it's effectively negative, resulting in a smaller final address.
- The Expand Down flag for a 16-bit Segment has the Base 64 kiB below the highest accessible address.
So a 16-bit Stack of 1 kiB with a Base address of 05_0000h and a Limit of FC00h (marked Expand Down, of course) would start with an SP of 0000h - which may look worrying to some programmers. But it's perfectly safe: the first PUSH AX or CALL would make SP equal FFFEh and store the value there, which would actually store the value at RAM location 05_FFFEh. Subsequent Stack operations would continue to decrease SP until it hit the value FC00h - where an access would cause a Stack Fault.
The sharp-eyed among you will have seen the difference: an Expand Up Segment's Limit is inclusive - the specified Limit is still an accessible byte. That means that in an Expand Down Segment, the Limit is exclusive - it is effectively the highest address that cannot be accessed. So a better Limit for the above example would be FBFFh.
And now "growing" the Stack is easy: decrease the Limit as desired, and move the memory if necessary (not forgetting that Base is 64 kiB below the top of the area of memory that contains the data). Again, none of the offsets inside the Stack need to be modified.
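The 16-bit Expand Down arithmetic can be checked with a sketch (my illustration, using the article's Base of 05_0000h and exclusive Limit of FC00h):

```python
# Sketch of 16-bit Expand Down address checking: valid offsets are those
# ABOVE the (exclusive) Limit, and the CPU still just adds the offset to
# the Base.

BASE, LIMIT = 0x5_0000, 0xFC00

def expand_down_linear(offset: int) -> int:
    """Translate a Stack offset, faulting at or below the Limit."""
    if not (LIMIT < offset <= 0xFFFF):
        raise MemoryError("Stack Fault")
    return BASE + offset

print(hex(expand_down_linear(0xFFFE)))  # 0x5fffe - first PUSH AX lands here
try:
    expand_down_linear(0xFC00)          # at the Limit: inaccessible
except MemoryError as e:
    print(e)                            # Stack Fault
```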
A 32-bit Expand Down Segment follows on from the 16-bit version exactly, but now the 64 kiB mathematics becomes 4 GiB mathematics - which is the size of the address space! Thus, the Base can be seen as the top of the accessed memory, and the Limit is the highest inaccessible location below that. The problem comes from the fact that the Limit only has 20 bits of resolution, and the Gran flag can't help us now.
The 16-bit example above specified a 1 kiB Stack with a starting Limit of FC00h (later adjusted to FBFFh as explained). The 32-bit equivalent would be FFFF_FBFFh - but that won't fit inside the 20-bit Limit field! And specifying it as Page-granular with the Gran flag won't help either - 20 bits means either F_FFFFh or F_FFFEh, which when shifted would become FFFF_FFFFh or FFFF_EFFFh: a Stack of either zero bytes or 4 kiB, but never the desired 1 kiB.
I find everything to do with 32-bit (Big) Expand Down Segments humorous:
- A Page-granular Limit of F_FFFFh defines a Segment whose highest inaccessible address is FFFF_FFFFh - in other words, nothing is accessible!
- A byte-granular Limit of 0_0000h defines a Segment whose highest inaccessible address is 0000_0000h - in other words, only the lowest byte is not accessible!
- A byte-granular Limit of 0_FFFFh defines a Segment whose lowest accessible address is 64 kiB - and everything above!
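These corner cases can be computed directly (my sketch of the rules, assuming Big=1 so offsets run up to FFFF_FFFFh and the Limit is exclusive):

```python
# Sketch of the valid offset range of a 32-bit (Big) Expand Down
# Segment: offsets from effective Limit + 1 up to FFFF_FFFFh.

def expand_down32_range(limit20: int, gran: bool):
    """Return (lowest, highest) accessible offsets, or None if empty."""
    eff = (limit20 << 12) | 0xFFF if gran else limit20
    low = (eff + 1) & 0xFFFFFFFF
    if low == 0:  # effective Limit FFFF_FFFFh: nothing is accessible
        return None
    return (low, 0xFFFFFFFF)

print(expand_down32_range(0xFFFFF, gran=True))   # None - nothing accessible
print(expand_down32_range(0x00000, gran=False))  # all but byte 0
lo, hi = expand_down32_range(0x0FFFF, gran=False)
print(hex(lo))                                   # 0x10000 - 64 kiB and above
```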
In other words, you can specify the size of a Stack to the byte - as long as you want it somewhere between 3.999 and 4.0 GiB in size. If you only want a 6 kiB Stack though, you either have to:
- Use the Gran bit, rounding off the size to 4 kiB chunks (4 or 8 kiB instead of 6); or
- Use a 16-bit entry (Big=0, using SP rather than ESP) with a maximum Limit of only 64 kiB. Sizes between 64 kiB and 1 MiB that can be specified to the byte in Expand Up Segments aren't available with Expand Down ones.