Jolitz Heritage Site - Chronicling the Legacies of the Jolitz Family of Silicon Valley, including the accomplishments of William Jolitz, Lynne Jolitz, Rebecca Jolitz, Ben Jolitz, and William Leonard Jolitz.
386 Segmentation and Paging
The 386 has six segment registers (CS, DS, SS, ES, FS, and GS), each of which can select one of 16,383 (8,191 shared and 8,192 private) segment descriptors. These segment descriptors reside in either the Global Descriptor Table (GDT) or the Local Descriptor Table (LDT) and determine the underlying characteristics of a segment (type attributes, location in linear address space, and segment size). In addition to memory segments, system segments are available to the operating system for special purposes, and call gates facilitate controlled indirection into other, possibly hidden, segments.
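The descriptor counts above follow from the layout of a segment selector. As a rough sketch (the function and variable names here are illustrative, not from the original article), a 16-bit selector can be split into a 13-bit descriptor index, a table-indicator bit, and a 2-bit requested privilege level:

```python
def decode_selector(selector):
    """Split a 16-bit 386 segment selector into (index, table, rpl)."""
    rpl = selector & 0x3                        # bits 0-1: requested privilege level
    table = "LDT" if selector & 0x4 else "GDT"  # bit 2: table indicator
    index = selector >> 3                       # bits 3-15: descriptor index
    return index, table, rpl

# A 13-bit index gives 8,192 slots per table. GDT slot 0 is the reserved
# null descriptor, which leaves 8,191 usable shared descriptors.
gdt_usable = 2**13 - 1
ldt_usable = 2**13
total_descriptors = gdt_usable + ldt_usable

print(decode_selector(0x08))   # second GDT slot, privilege level 0
print(total_descriptors)
```

Here the GDT holds the shared descriptors and the LDT the private ones, matching the 8,191 and 8,192 split in the text.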
Memory segments can be selected via a dedicated segment register, with different results. The CS register selects the segment containing program instructions. The DS register selects program data. The SS register selects the program stack. The ES register selects the destination of string instructions. Both the FS and GS registers are undedicated at this time. It is even possible to override the default segment register within a machine instruction, so one can view the ES, FS, and GS segment registers as alternative DS segment registers.
Each memory segment has a size, and can be as large as 4 gigabytes. In order for that segment to be active, however, it must consume space (global linear address space) in direct proportion to its size. This means that, although a process may possess a total address space greater than 4 gigabytes, only an aggregate of active segments totaling less than or equal to 4 gigabytes is permitted. While the 386 theoretically can address 2^14 x 2^32 = 2^46 bytes, in practice only 2^32 bytes (4 gigabytes) can be active at any time. If the maximum 4 gigabytes of instruction, data, and stack (for both operating system and each user process) is invoked, managing the global linear address space to allow segments to be active (present) when linear address space is available becomes a significant problem.
Segments can also be overlapped in linear address space. Because the same memory can then be accessed interchangeably through both segments, possibly with different attributes, this overlap is called an alias.
80x86 segments can be either "bottom up" or "top down." A segment that is bottom up begins with segment relative address 0 and "grows up" to the desired address x (that is, [0 ... x]). A segment that is top down begins with segment relative address 0xffffffff and "grows down" to the desired address y ([y ... 0xffffffff]). (Yes, we know this is awkward, but that's how it works.) Segments are grown only in accordance with these rules.
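The two growth directions can be made concrete with a small limit check. This is a minimal sketch, not the article's code: the helper names are made up, the limit is taken as a byte offset, and the top-down case assumes a 32-bit segment whose upper bound is 0xffffffff.

```python
MAX_OFFSET = 0xFFFFFFFF  # largest 32-bit segment-relative address

def offset_in_segment(offset, limit, expand_down=False):
    """Return True if offset is legal for a segment with the given limit."""
    if not 0 <= offset <= MAX_OFFSET:
        return False
    if expand_down:
        # Top down: valid range is [limit + 1 ... 0xffffffff]
        return offset > limit
    # Bottom up: valid range is [0 ... limit]
    return offset <= limit

def linear_address(base, offset):
    """Segment base plus offset, wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF

# Two segments with the same base are aliases: the same offset
# reaches the same linear address through either one.
code_base = data_base = 0x00100000
print(linear_address(code_base, 0x20) == linear_address(data_base, 0x20))
```

Growing a bottom-up segment means raising its limit; growing a top-down segment means lowering it, which is why a stack fits the second form.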
The stack segment is the only common example of a downward growing segment. Many other attributes are provided that control the type of access allowed within the segment. The designers of the 386 prefer that segments be used to regulate memory protection, and have provided a plethora of features not found in the paging unit. Segment attributes, such as 32-bit vs. 16-bit operations, byte vs. page granularity, and user vs. supervisor mode, control the mode of the microprocessor, depending on the segments that are actually in use. It is quite costly to implement segments in the microprocessor. That is why underlying shadow registers, invisible to the programmer, are used. They provide a hardware "assist" to the segmentation functionality.
We manage to avoid many paging bookkeeping problems by running in "flat" mode. This is accomplished by aliasing the CS, DS, SS, and ES segment registers to the exact same linear address space (see Figure 4), thus making the segment translation an identity function. We can then regard any of the intrasegment addresses as if they were linear addresses. Of course, this ends up defeating the advantages of segments as well.
Some new microprocessors, such as the 386, feature architectures which exploit large segments. This is because 4 gigabytes is starting to fill up, and going to 64-bit addresses will not be happening soon. Many would argue that 4 gigabytes will never be filled, but history suggests otherwise. 64-Mbit RAM chips are already on the drawing boards -- in fact, some actually exist. In a few years, they will be commercially available. Because a typical computer uses on average 64 to 128 RAM chips, with many companies currently offering 64-Mbyte systems (512 1-Mbit RAM chips), it will not be long before a computer with 512 64-Mbit RAM chips (4 gigabytes) is introduced. As such, segmented architectures may provide a way of spanning the address space gap that could result.
It's amazing that at the beginning of the microcomputer revolution, an Altair 8800 with 4 Kbyte of RAM was considered incredible because it could run Basic! How times change.
We have seen how segmentation works in the 386. Now let's examine paging. For our purposes, segmentation on the 386 is defeated by running in "flat" mode. We can then consider intrasegment addresses as if they are linear addresses.
Paging works with a two-level scheme that permits the sparse allocation of address space, so that the whole address space, or even all of the address space mapping information, need not be present. Otherwise, a 4-gigabyte process would require more than 4 Mbyte of page tables, even though it may be the case that only a few thousand entries would be active at any time. Typically, for our purposes, only three pages of page tables are allocated per process (the page directory and the top and bottom address space page tables). This is sufficient to run a 4-Mbyte process (instruction plus data size) and 4 Mbyte of stack. (Note that all processes run with a full-sized address space and can dynamically grow to use it.) This mechanism is quite successful in reducing memory-management overhead.
The two-level scheme splits the incoming virtual address into three parts: 10 bits of page table directory index, 10 bits of page table index, and 12 bits of offset within a page. The page table directory is a single page of physical memory that facilitates allocation of page table space by breaking it up into 4-Mbyte chunks of linear address space, one for each of its 1024 PDEs (Page Directory Entries), which determine the location of the underlying page tables in physical memory. Each PDE-addressed page of a page table contains 1024 PTEs (Page Table Entries). A PTE is similar in form and function to a PDE. The major difference between a PDE and a PTE is that a PTE selects the physical page frame for the desired reference.
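The 10/10/12 split and the two-level lookup can be sketched in a few lines. This is a simplified model under stated assumptions: the page directory is represented as a dictionary of dictionaries rather than pages of physical memory, a missing key stands in for an invalid entry, and all names are illustrative.

```python
PAGE_SIZE = 4096          # bytes per page (12-bit offset)
ENTRIES_PER_PAGE = 1024   # PDEs per directory, PTEs per page table (10-bit index)
ENTRY_BYTES = 4           # 1024 entries fill one 4096-byte page

def split_address(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    dir_index = (addr >> 22) & 0x3FF
    table_index = (addr >> 12) & 0x3FF
    offset = addr & 0xFFF
    return dir_index, table_index, offset

def translate(page_directory, addr):
    """Walk {dir_index: {table_index: frame_number}}; None means not mapped."""
    dir_index, table_index, offset = split_address(addr)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        return None                      # no page table for this 4-Mbyte chunk
    frame = page_table.get(table_index)
    if frame is None:
        return None                      # page not present
    return frame * PAGE_SIZE + offset

# Mapping all 4 gigabytes needs 1024 page tables plus the directory.
full_map_bytes = ENTRIES_PER_PAGE * ENTRIES_PER_PAGE * ENTRY_BYTES + PAGE_SIZE
# One page table covers 1024 pages, i.e. a 4-Mbyte chunk.
span_per_table = ENTRIES_PER_PAGE * PAGE_SIZE

print(split_address(0x00403025))
print(full_map_bytes, span_per_table)
```

With one directory and two page tables (three pages in all), the bottom table maps 4 Mbyte of instructions and data and the top table maps 4 Mbyte of stack, which is the arrangement described above.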
Once the least-significant address bits are applied as the offset within that frame, the final address is determined. This method is identical to that used in many other common microprocessors (the MC68030, Clipper, and NS32532, among others). Each PDE and PTE may be marked either "invalid" (not currently used) or "valid" (the underlying page of physical memory is present). In addition, other attribute bits mark entries as "read only" or "read-write" and "supervisor" or "user." Because segmentation is not used to control memory protection, we keep processes honest by relying entirely on the paging mechanism's attributes for protection as well as for the allocation of memory.
The mechanism to convert virtual to physical addresses is quite elaborate. To speed things up, the 386 keeps a Translation Look-aside Buffer (TLB) of 32 cached entries, managed entirely transparently. One side effect of this hardware is that if the operating system changes any of the page tables that may be in use, it must flush this cache. The 386 does not allow selective flushing -- only a complete flush of all cache entries by reloading the page directory address register, cr3. This is an expensive operation which may be performed repeatedly as we successively transform an address mapping of a process within the kernel (as many as six times in the worst case).
Copyright © 1994 William & Lynne Jolitz