Virtual Mamory System - Australia Assignments

A Microprocessor-Based Virtual Mamory System
~. D. Ruggiero and S. G. Zaky
Department of Electrical Engineering
University of Toronto
Toronto, Ontario, Canada
ABSTRACT
This paper presents a virtual memory system
which may be connected to the memory bus of a
~st computer. The system is based on the use
of a separate processor to interface to the
backing store and perform all the processing related to managing the virtual memory. The proposed approach provides a virtual memory that is
transparent to the host processor, with one exception. When a memory reference generated by
the host results in a page fault, a long delay
is encountered. This delay corresponds to the
time taken by the virtual memory controller to
carry out a page transfer.
A prototype system based on the above approach has been constructed. Bubble memory devices are used to implement the backing store.
Tne suitability of bubble memories in this environment is discussed in the paper.
Keywords: Virtual Memory, Memory Management, Architecture of Sm~ll Computer Systems, Microprocessor Applications, Bubble Mamories.
I. INTRODUCTION
Progress in the areas of computer architecture and LSI technology is quickly removing some
of the limitations on the performance of miniand microcomputers. One of the se@erest restrictions found in small computers is the limited address space resulting from a short word
length. Memory management schemes that allow a
much larger address space – 4 million words or
more – are now available for most minicomputers
and many of the new 16 bit microprocessors.
Despite the continuous drop in the cost of semiconductor memory, it is not economically feasible to provide such a large random access memory. Block accessed serial memories are typically an order of magnitude cheaper per bit, and
thus can readily provide the required memory
si ze.
This work was sponsored by the National Sciences
and Engineering Research Council of Canada under
research grant A8994.
The utilization of a serial memory as storage for a mini- or microcomputer faces only one
obstacle: for a variety of reaso.qs, serial devices are not directly suitable for program storage. The solution to this dilemma has been
known in large computer systems for some
time[l,2]. It consists of using serial, blockaccessed devices together with a limited anount
of RAM in a virtual memory scheme.
Virtual memory approaches have not been
used in small computer systems because the complexity of the control needed to make such a
system function requires extensive redesign of
existing CPUs. Attempts to apply virtual memory
to existing microprocessors have resulted in limitations being imposed on the host
processor[3]. For example, several instructions
could not be used because of the effect of page
faults on their execution. Furthermore, multiword instructions could not be broken over a
page boundary.
This paper presents a virtual memory system
in which an independent processor is employed to
perform all the management functions. These
functions include address calculations, keeping
and updating of page tables, execution of transfers between RAM and mass storage, as well as
implementation of the details of the replacement
algorithms. The system is intended to be connected to the memory bus of a host processor.
At its interface to the bus, it appears as a
large random access memory. Internal details of
the virtual memory organization and management
are transparent to the host. The paper discusses the architectural features that the host
processor must have in order for this approach
to be feasible.
A prototype of the proposed virtual memory
system has been constructed, using bubble memories to implement the backing store. Electromechanical media, e.g. tapes, disks, and drums,
have traditionally dominated the field of serial
memories. However, the technology of solid
state serial memories offers a viable alternative, particularly in small systems. Bubble memories and charge-coupled devices are likely to
play a larger role in computer systems of the
near future. For example, reliability and the
absence of moving parts make such memories attractive in portable and airborne applications.
If the anticipated reductions in cost per bit
materialize, bubble memories could become costeffective in general purpose systems. Some of
the merits and limitations of bubble memories in
a virtual memory environment are discussed later.
CH1494-4/80/0000-0228 $00.75 © 1980IEEE 228
2. DESIGN PHILOSOPHY
The structure of the proposed virtual memory system is shown in Fig. I. The Virtual Memory Controller (VMC) performs two major functions:
I. Translation of addresses generated by
the host to physical RAM addresses.
Address translation is carried out
whenever the host issues a memory request with an address that corresponds
to a page currently residing in the
RAM.
2. Transfer of a page of data from the
mass storage device to the RAM when
the page referenced by the host does
not reside in the RAM.
HOST
PROCESSOR
MASS
STORAGE
I MEMORY BUS
I
VIRTUAL MEMORY
CONTROLLER
(VMC)
RAM
Figure I: Organization of the Virtual Memory
System
The above two functions have sufficiently different time scales that they must be handled
differently. The host must be allowed fast access to pages which reside in the RAM. This means that it is not feasible for the VMC microprocessor to handle individual memory cycles.
Hence, memory references to pages that are resident in the RAM are handled directly by the address translation hardware in the VMC. The microprocessor is called upon only when a page
fault occurs. It then finds a page frame in the
RAM which can be used, stores its contents in
the backing store, if necessary, and transfers
the new page from the backing store to the RAM.
]he host memory request which resulted in the
page fault is serviced once transfer of the new
page is complete.
The use of a separate and independent processor to perform the management functions in
the virtual memory system leads to several advantages. The host processor is relieved of all
knowledge of the internals of the memory system:
it need not concern itself with the management
of the available RAM or with the details of page
tables and address calculations. This scheme
also relieves the host processor of the need to
control mass storage devices directly, since the
VMC processor can take on this function.
Another important advantage of the dedicated
processor approach is to allow the implementation of a large variety of alternative replacement algorithms easily and efficiently. The
ease of implementation results from having the
details of the algorithm localized in the VMC
processor; efficiency is gained because it is
possible to have the VMC processor perform parts
of the algorithm concurrently with the host processing. This allows shorter sequences when a
page fault occurs and consequently greater overall memory speed.
3. HOST ARCHITECTURE
The virtual memory sy&tem described in this
paper was intended to be completely transparent
to the host processor. Transparency can be achieved provided that certain requirements are
satisfied by the host architecture. This is
discussed below in some detail.
3.1 WAIT STATE
As described, the virtual memory system
emulates the characteristics of a RAM, with the
exception that some memory references are extremely slow. A slow reference results when a
page fault occurs and a subsequent transfer of
data between mass storage and RAM takes place.
In order to handle this situation, the host may
simply be forced to wait umtil the page transfer
is completed. This is feasible provided that
operation of the host can be suspended in the
middle of an instruction. Examples of such processors are the Intel 8080, 8085, and 8086 families of microprocessors, which can stay indefinitely in a Wait state during any mamory cycle.
The Zilog Z8000 and Motorola MC68000 also allow
an indefinite Wait state.
229
Not all processors are capable of an indefinite Wait. In the case of the Kotorola M6800,
for example, a memory cycle can be extended by
stretching phase I of the processor clock to a
maximum of about 4~s. Extending the wait period beyond this length leads to loss of information due to the dynamic nature of storage on the
microprocessor chip. The period required to
process a page fault depends on several factors,
but primarily on the page size and the type of
mass storage used. With present technologies
this time is at least an order of magnitude longer than the above 4~s, even for very small
page sizes. Therefore the above approach is not
suitable for use with dynamic processors that do
not have an indefinite Wait capability. The
fact that the processor can be halted between
instructions is of no use in a virtual memory
system, since page faults will occur during
either the fetch or execution phase of an instr uc tion.
point 2 is difficult to implement without changing the structure of the host. Consider for example the case of a PDP-11 processor. The program counter is incremented as successive words
of a multiple word instruction are fetched.
Thus, at the time of a page fault, the starting
address of the instruction that caused the fault
is not known. Furthermore, because of the autoincrement and autodecrement features, the contents of one or more registers may have been altered.
It is very difficult for a device other
than the host rocessor to perform the roll-back
function. Such a capability can be implemented
as an integral part of the design of the host.
However, it requires that the host CPU be designed with virtual memory in mind, which precludes the use of virtual memory with many existing processors.
3.2 TIM£ OUT
Processors that use a time-out mechanism to
handle invalid memory references present similar
problems. The time-out period – for example 20
~us in the case of a PDP-11 – is usually too
short to allow completion of a page transfer.
This situation can be handled simply by increasing the time-out period to allow a worst-case
page fault to be serviced.
3.3 EFFECT OF ABORTED INSTRUCTIONS
If the host processor cannot wait during a
page transfer, the instruction that caused the
page fault must be aborted. After the page
transfer has been completed, execution of the
aborted instruction should be restarted. A number of problems arise with this approach. They
can be divided into two broad categories. The
first concerns the ability of the VMC to force
the host to abort and later restart execution of
an instruction. The second relates to the side
effects of a partially executed instruction.
In order to abort an instruction in progress, the VMC must be able to send a command to
the host to that effect. Upon receipt of this
command the host should perform the following
functions :
I. end the outstanding memory request;
2. roll back its internal state to the
conditions prevailing before execution
of the current instruction began;
3. execute a trap procedure which saves
the state of the processor at this
point and allows trap servicing software to determine subsequent action.
~hile points I and 3 are easily accomplished,
4. BACKING STORE TECHNOLOGY
The concept of virtual memory is centered
around the use of a block accessible serial memory as the backing store. For many years electromechanical devices, mainly disks and drums,
have been used as the backing store for large
mainframe virtual memory systems. For the purpose of this work a device such as a floppy disk
could have been used. However, it was decided
to investigate the feasibility of the use of
newer solid state serial memories in virtual memory applications. In particular, magnetic bubble memories were chosen for use as the backing
store.
The storage section of a magnetic bubble
memory is organized in the form of major and minor loops, as shown in Fig. 2. The minor loops
are used for bit storage; a block of information
consists of the bits occupying a given position
in all the minor loops. The major loop provides
an interface between the storage loops and the
bubble generation and read-out mechanisms. Bubbles in both the major and minor loops move in
synchronism under the influence of a rotating
magnetic field[4]. External control allows bubbles to be generated and transferred between the
major loop and the minor loops. Bubbles in the
major loop may either be directed to the detector or allowed to continue to circulate in the
major loop.
In order to access a given block in the
bubble memory, the bubbles in all loops are rotated until the required block is closest to the
major loop. A transfer signal at this point
moves all the bubbles corresponding to that
block into the major loop. As rotation continues the bits of the block are accessed serially
as they pass under the detector.
Control of the bubble memory subsystem is
complicated by the characteristics of the bubble
memory. These may be sGnmarized as follows.
230
MAJOR
LOOP
BUBBLE
DETECTION
<
BUBBLE f’]
GENERATIONTRANSFER
STATION
MINOR LOOPS
i%”
t~- ONE BLOCK
Figure 2: Organization of a Bubble Memory
I.
2.
.
4.
Due to manufacturing defects, not all
minor loops are guaranteed to be functional. The manufacturer provides
with each memory device a map to indicate which loops did not pass the production tests. These loops must be
ignored.
The control of the bubble memory device intrinsically involves currents
rather than voltages. TTL control
signals must be converted to controlling currents. To make matters worse,
the control currents have a temperature dependency that must be taken
into account if the device is to be
operated over a large range of temperatures.
The rotating magnetic field for the
device is provided by currents which
must be controlled so they assume a
triangular waveform, with rather
strict timing constraints.
The serial nature of the device requires serialization and deserialization of the bit stream. The overall
speed of the field drive is of the
order of 100 kHz; bubble generation
and readout occurs at half this frequency. Despite the slowness of these
rates there is insufficient time for a
device such as a microprocessor to
handle individual bits. Thus the interface for the bubble devices requires hardware for parallel to serial
and serial to parallel conversions for
use with byte (or word) oriented device s.
5. At all times the block of the minor
loops that is currently at the position closest to the major loop must be
known. The bubble memory device itself does not provide any synchronization signal when block 0 – an arbitrarily chosen block – is at the major
loop. This means that position information must be maintained at absolutely all times. In particular, it
must be maintained through power down
and power up sequences.
Bubble memory characteristics make the controller a complicated device, but the problems
encountered are not insurmountable. Bad bit locations are masked by combining the bit stream
from the bubble device with that from a control
ROM containing the device mask. Voltage to current transformations are performed using special
drivers. The triangular drive currents can be
provided by switching an appropriate DC voltage
across the essentially inductive drive coils.
Serialization and deserialization can be easily
performed by hardware. Position information can
be provided by a counter, and this information
can be maintained over power down/up sequences
by ensuring that rotation is stopped with a
pre-specified block closest to the major loop.
As mentioned earlier, direct control of a
bubble memory at the bit transfer level by means
of a microprocessor is not feasible. C~ the
other hand, control of the device at the level
of reading and writing blocks of information occurs on a much longer time scale and can easily
be performed by a microprocessor. This speed
discrepancy can be reconciled by providing high
speed hardware to perform individual functions
that are required repetitively in each cycle.
The microprocessor initiates a given operation
by sending a command that specifies the function
to be performed and the block number at which
implanentation of this function is to begin.
The problems of interfacing to bubble memories have been reduced by the recent introduction of complete bubble memory systems. Such
systems were not available at the start of this
work.
5. SYSTEM HARDWARE
A prototype system embodying the concepts
discussed in the preceding sections has been
built[8]. The VMC processor is an Intel 8085.
The host processor is also a microprocessor system, built around an Intel 8080. A block dia-
23t
gram for the virtual msnory system as implemented is sho~n in Fig. 3.
VIRTUAL MEMORY
CONTROLLER
___I
HOST I
PROCESSOR I
J– — –I
I I
TRANSLATION I
~RDWAREI [,
SYSTEM
BUS
vMc I
PROCESSOR I
(8085) I
I
CONTROLLE
t
RAM
Figure 3: Prototype Vifftual Memory System
The Address Translation Hardware performs
the high speed functions required to allow the
host to have fast access to pages of data residing in the RAM. The address mapping function is
shown in Fig. 4. Part of the host address is
used to index into a page table and thereby obtain the starting address of the required page
frame in the RAM. The remainder of the host address directly provides the offset of the addressed location in the page.
The prototype implementation of the address
translation hardware requires two references to
the RAM for every host memory request. The extra reference is used for the page table
look-up.
The system RAM provides storage for resident pages. The page table is also kept in the
RAM, as are the programs used by the 8085 to
manage the system.
The prototype system provides simple arbitration between the 8085 and the host: normally
only the host is allowed access to the Virtual
Memory System Bus and therefore to the RAM.
Whenever a page fault occurs, bus access is
switched to the 8085. Implementing the interface in this way does not allow for concurrent
operation of the host and the VMC processors; it
avoids the complexity of a memory arbiter at the
expense of system speed.
Mass storage is provided by a bubble memory
subsystem. This consists of a controller and
Texas Instruments TBM0102 100,637-bit bubble memory chips. After allowing for bad loops and
wasted space due to the awkward size of the
loops, about IOK bytes of storage r~nain usable
per chip.
PAGE TABLE
}
P’ ]~ :
HOST ADDRESS
I P I b I
I !
P
ENTRIES
V
P
PAC._S<
PAGE FRAMES
byTES
Figure 4: Address Translation Mechanism
The performance of the virtual memory system is limited by the transfer rate of the bubble memory. As a result, a small page size of
32 bytes was selected. This corresponds to two
blocks of data from the bubble memory. To the
host, the Virtual P~mory System represents an
address space of 32K bytes, or 1024 virtual
pages of memory. Each page table entry, consisting of a pointer to a page frame in RAM and
some control bits, requires two bytes. Thus the
page table occupies a total of 2K bytes. Of the
remaning system RAM, ~K bytes are used for VMC
programs, and 4K bytes are used as page frames.
232

5.1	SYSTEM OVERHEAD
The	prototype system	described above	has
provided a suitable vehicle for testing the con
cepts	involved.	It has	also	allowed	design
trade-offs to be identified,	and the complexity
of various subsystems to be assessed.	However,
because of the limited amount	of memory availa
ble, it has rates using	not been possible to measure fault
realistic	program	sizes.	Fortu
nately,	the is	data required	to estimate	fault
rates	readily	available	in	the
literature[5,6].
Consider a	system with	64K bytes	of RAM,
and 32-byte rate of one	pages.	For a	12SK byte	program a
fault for every 5000	memory refer
ences may be expected[5,6].	Since the 80~0 per
forms	a memory	reference every	2//s, a	page
fault can be expected every 10 ms.
The service time for a page fault is essen
tially the time required to store a page of data
on	the backing	store and	obtain	a new page.
Each of	these operations	requires about	10 ms
for the 32-byte	page size.	Of this	time, 4 ms
is the average access time, and 6 ms is taken up
by the memory.	transfer of data	to or from	the bubble
Therefore, the	host	system would	be
waiting approximately 70% of	the time.	Cperat
ing two bubble memory	modules in parallel would
reduce the transfer time to	3 ms for the entire
page.	This results in a wait time of less than
60%.	These numbers	are comparable to	the per
formance of existing virtual memory	systems on
larger machines[6 ].
Further	paralleling of modules cannot	be
used	to reduce	the transfer	time, because	at
least one block must be read from	each device.
More devices in	parallel could	be used	to in
crease	the	page size	without	increasing	the page
transfer	time.	However, increasing	the
size may lead	to an increase in	the page fault is increased.
rate, unless	the system RAM size
The minimum size of the RAM required to ensure a
low rate	of page	faults is a function	of the
size	and	type	of	program	that	is being
executed [5 ].
The average	access time	quoted above	ap
plies to a unidirectional bubble memory chip.	A
bidirectional chip would exhibit a reduced aver
age delay, due to locality	of reference.	Simu
lation results indicate that a tenfold reduction
in access time can be expected[7].
In the current implementation it is not ne
cessary to know whether a	page has been modi
fied.	All pages, whether or	not they have been
modified, are written back	to the bubble memory
when they	are removed from	the RAM.	In fact,
when a page is loaded into the RAM, the original
copy is not kept on	the backing	store.	There
are two reasons for this. tive read from the bubble	First, a non-destruc memory is slower than
a destructive read	which leaves a block	of all
zeroes.	Secondly, a write operation to the bub
ble memory	can only be performed	to an	empty
block.	Thus, if a copy of	a given page is kept

in the bubble memory, further delay is encountered in order to clear the area in the bubble
memory occupied by that page before the modified
contents can be stored. It should be pointed
out that some of these restrictions have been
relaxed in recently announced bubble memory
chips.
5.2 PARALLEL OPERATION OF EUBBLE MEMORIES
The maxL~um data rate attainable with bubble memories is currently of the order of 50K
bits per second. ~nen many bubble devices are
used, the data rate may be increased by connecting several of them in parallel. Unfortunately
the fact that bad loops exist in bubble memory
devices makes parallel operation more difficult
than it would first appear. Suppose that eight
bubble memories are to be operated in parallel
to give a byte of data every bubble memory period. In general, each of the bubble memory
chips will have bad loops at different loop positions, corresponding to different bit positions in the serial stream. If the bits of a
byte are assigned one to each device, then any
byte that has one or more bed bits must be ignored. This would be very inefficient in its
use of the bubble memory storage since rainy
bytes would be unused even though they contain
only a single bad bit.
/~ more efficient approach is to take the
data streams from ~ll the bubble memories that
are to be operated in parallel and combine them
into a single serial data stream, as shown in
Fig. 5. The mask bit streams from the corresponding PROMs can be combined in the same way.
The resulting data and mask streams can now be
handled in the same manner as for individual
bubble memories.
The scheme described above requires a controller that is somewhat more sophisticated than
the one used with a single bubble memory. Ibwever, it has the advantage that any number of
chips may be operated in parallel. It is not
necessary to make the n~nber of bubble memories
correspond to the number of bits in a byte or a
word, as would be the case in the more direct
approach. As a result, intermediate sizes and
speeds of memory can be implemented without diffic ul ty.
5.3 CONCURRENT PROCESSING
The use of an independent processor offers
a potential gain in performance if processing in
the VMC microprocessor can proceed concurrently
with the host execution. Such a gain may be achieved because of a reduction in the time required to service a page fault. When a page
fault occurs, it is normal to transfer a page
from RAM to the backing store and then obtain
the required page. The service time for this
operation is reduced to half if the new page is
obtained first. If a free page frame is available in the system, the required page ca.n be obtained from the backing store, and the host can
be allowed to continue immediately. Concur-
233
N BUBBLE MEMORY DEVICES
BM0 BM1 . . .
t’
I–I N-BIT SHIFT REGISTER
LL N-BIT SHIFT REGISTER
BM
N-1
DATA STREAM
= PARALLEL
DATA
MASK STREAM
N MASK PROMS
Figure 5: Parallel Operation of FE~bble Memory Devices
rently with the host processing, a page can be
transferred to the backing store to once again
free a page frame in RAM. For a constant system
RAM size this results in one less page frame being available to the virtual memory system.
However, the increase in the page fault rate
from this effect will be small compared to the
saving of half the fault service time.
Another way in which performance may be improved is through the use of the virtual memory
processor to implement more efficient replacement algorithms, such as LRU. This approach requires additional hardware to keep £rack of recent page references. It is not clear that the
cost of additional hardware would be justified,
except perhaps in very high performance systems.
6. DISCUSSION AND CONCLUSIONS
A virtual memory system based on the use of
a separate microprocessor to perform the memory
m~nagement functions has been presented. The
main advantage of this approach is that it provides a virtual memory system that is almost totally transparent to the host processor. It relieves the host of the task of managing page tables and secondary storage.
The system also offers the possibility of
concurrent processing in the host and the virtual memory controller. This can be used to improve the performance of the virtual memory system.
There are a number of salient points to be
drawn from this work. They concern the structure of CPUs and memory systems, as well as the
usefulness of bubble memory modules in their
present form.
]he implementation of a virtual memory system as described in this paper requires that the
host processor be able to accomodate an occasional extra long memory cycle. This means that
the processor should be capable of entering a
Wait state in the middle of an instruction. In
the case of microprocessors that are based on
dynamic logic, internal refreshing should continue during the Wait state. The ability to
abort an instruction, while very useful, is much
more difficult to implement.
The bubble memory device per se is difficult to interface to and work with. The “feat~e” of redundant loops in particular complicates the design of a controller for the device.
Bubble memories will be accepted much more ra-
234
pidly if means are found to make the operations
of bad loops and redundancy transparent to the
user. Some steps in this direction appear to
have been taken in recently announced bubble memory products.
A method has been discussed for parallel
operation of a number of bubble devices. This
leads to a substantial increase in the data
transfer rate. Since parallel operation of several bubble devices interacts with the way in
which bad loops are handled, the requirement of
ease of parallel operation should be incorporated into future bubble memory designs.
References
I.
2.
Richard E. Matick, “k%mory and Storage”,
Introduction to Computer Architecture,
H. Stone Ed., SRA, pI~-247, 1975.
Dionysios C. Tsichritzis and Philip A.
Eernstein, Operating Systems, p96-119,
Academic Press, 1974.
3. Judith A. Anderson and G. J. Lipovski, “A
4.
5.
6.
7.
8.
Virtual Hemory for Microprocessors”,
Proc. Second Annual Symposia1 on
Computer Architecture, p80-84, Jan
20-22, 1975.
Andrew H. Bobeck, Peter I. Bonyhard, and
Joseph E. Oeusic, “F~gnetic ~ubbles – An
Emerging New Memory Technology”, Proc.
IEEE, vol 63, no 8, p1176-1195, Aug
1975.
W. W. Chu and H. Cpderbeck, “Performance of
Replacement Algorithms with Different
Page Sizes”, Computer, vol 7, no 11,
p14-21, Nov 1974.
Gary Tjaden and Fartin Cohn, “Some
Considerations in the Design of
F%inframe Processors with Microprocessor
Tecknology”, Computer, vol 12, no 8,
p68-74, Aug 1979.
Chula naRanong and Dan ~!ammerstrom, “The
Latency Reduction of Bidirectional
Magnetic Dubble ~iemeries”, Proc. IEEE,
vol 67, no 2, p328-330, Feb 1979.
M. D. Ruggiero, “A Microprocessor-Based
Virtual ~emory System”, M. A. Sc.
thesis, University of Toronto, Toronto,
Cntario, 1980.
235