Consider the gap in speed between semiconductor memory and magnetic storage. Semiconductor technologies have been getting denser and faster, while magnetic memories like hard disk drives are getting denser but not significantly faster. On most common operating systems, all transactions are committed to disk. In past server designs, various hardware schemes were used to increase disk I/O throughput, because disk I/O throughput is usually the bottleneck for transaction processing on computers acting in the server role.
As the gap between semiconductor memory speed and disk I/O speed widens, those older server designs will only become more important.
It is surprising that one of the solutions used in those designs has not become a common sight in the commodity PC world: non-volatile semiconductor memory as a new layer between a computer's main memory and its magnetic storage. I suggest calling this memory TRAM, and I will go over some of the applications that could use it. Some of these applications would not require a disk at all, so I define TRAM as any high-speed, non-volatile memory that can be used to accelerate transaction processing.
In the first section I describe how this technology works. The 'Hardware' section then addresses the hardware aspects of the device, and the 'Software' section discusses the software side of implementing it. The 'Benefits' section covers the benefits, and the paper closes with the 'Similar Work and Future Directions' section.
Transactions
In transaction processing systems, transactions are characterized by the acronym ACID. Each letter represents an essential property of a transaction, and the letter 'D' is the one that concerns us: it stands for 'durable'. This property requires that a transaction's effects persist after it completes [Gray, ??]. In today's systems, this commonly translates to "write the data to the disk, and then reply". As far as transaction processing is concerned, this write to the disk takes the most time.
In most modern operating systems, writes are cached in memory before they go to disk. However, those cached writes are in jeopardy if the system fails. As a result, these operating systems provide a mechanism to force certain writes to happen immediately, which is required for databases running on top of such an operating system. Some operating system operations also have to be transactional. For example, the Unix operating system guarantees that file renaming and other directory operations are done as a transaction. Usually, if a lot of these operations occur, their performance is poor.
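To make the "write the data, then reply" rule concrete, here is a minimal sketch of the durable-write path on a Unix system; the file name is just an example:

    #include <fcntl.h>
    #include <unistd.h>

    /* Durable commit on a Unix system: the transaction must not be
     * acknowledged until fsync() has forced the write to the platter. */
    int commit_record(const void *buf, size_t len)
    {
        int fd = open("journal.dat", O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
        if (fsync(fd) < 0) {      /* this wait is where the time goes */
            close(fd);
            return -1;
        }
        return close(fd);
    }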
In the past, a solution to this problem has been to use fast, semiconductor-based memory that was non-volatile (NVRAM). Transactions would store their operations in the NVRAM and commit very quickly, and a background process would later write the transaction to disk or some other slow non-volatile memory.
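Here is a minimal sketch of that pattern. The TRAM is simulated with an ordinary buffer and the disk with a file; in a real device the buffer would be battery-backed, non-volatile memory:

    #include <stdio.h>
    #include <string.h>

    /* Simulated TRAM region; in a real device this would be
     * battery-backed, non-volatile memory. */
    #define TRAM_SIZE 4096
    static char   tram[TRAM_SIZE];
    static size_t head;   /* next free byte */
    static size_t tail;   /* next byte not yet flushed to disk */

    /* Commit path: durable as soon as the copy into TRAM finishes,
     * so the transaction can reply immediately; no disk I/O here. */
    int commit(const void *record, size_t len)
    {
        if (head + len > TRAM_SIZE)
            return -1;    /* TRAM full: caller must wait for the flusher */
        memcpy(tram + head, record, len);
        head += len;
        return 0;
    }

    /* Background flusher: drains committed bytes to the slow disk
     * (a FILE * stands in for the real drive). */
    void flush_to_disk(FILE *disk)
    {
        if (tail < head) {
            fwrite(tram + tail, 1, head - tail, disk);
            fflush(disk);
            tail = head;   /* TRAM space can now be reused */
        }
    }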
TRAM Benefits To Disk Writes
Two patterns of writes exist in a disk-based system, and TRAM benefits both. In some systems the writes are scattered across the disk ("random writes"). In others, such as databases or log-based filesystems, the writes are predominantly serial and always to new blocks ("log writes"). How much TRAM benefits each pattern depends on a few factors.
If the set of "random writes" fits within the TRAM, then all of the writes can live in TRAM and the disk drive hardly needs to be used. As long as the active "random write" set fits in a given amount of TRAM, there is a performance benefit. The governing factors are the size of the TRAM and the size of the set of actively written blocks. I call this the 'subset' benefit.
If the system is generating "log writes", it is generating new data blocks at some rate. If the effective rate at which new blocks are generated is below the rate at which the TRAM device can flush blocks to the disk, then even bursty writes can be committed quickly. For example, if writes are serial, the average rate is 1 megabyte per minute, and the system can write blocks from TRAM to disk at 1 megabyte per second, then it can be guaranteed that the "log writes" always land in TRAM. The benefit is the quick commit time on the transactions. The factors to consider are: the one-minute rate of writes, the one-minute rate from TRAM to disk, and the size of the TRAM. This could be called the 'leaky bucket' benefit.
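A back-of-the-envelope check of the 'leaky bucket' condition might look like this; the rates and sizes are hypothetical figures in the spirit of the example above:

    #include <stdio.h>

    /* 'Leaky bucket' check: can the TRAM absorb the log-write stream
     * without filling up, given the drain rate to disk? */
    int main(void)
    {
        double fill_rate  = 1.0 / 60.0;  /* MB/s: 1 MB per minute of log writes */
        double drain_rate = 1.0;         /* MB/s: TRAM-to-disk flush rate */
        double tram_size  = 128.0;       /* MB of TRAM */
        double burst      = 100.0;       /* MB: largest expected write burst */

        if (fill_rate < drain_rate && burst <= tram_size)
            printf("log writes always land in TRAM: fast commits\n");
        else
            printf("TRAM can fill up: commits may block at disk speed\n");
        return 0;
    }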
If there isn't enough TRAM to satisfy a request, the request has to block until it can be satisfied. At that point the speed of the transactions is effectively the same as on a system without TRAM. This is equivalent to an operating system 'thrashing' when it doesn't have enough memory to hold its working set. Systems should be designed so that the TRAM delivers its performance without ever causing a block. For most systems, small or large, I think this could be achieved easily.
Benefits
The largest benefit of TRAM is pure, phenomenal transaction and write performance. Disk writes would be few and far between for most users, and for many database users, transactions would be tremendously faster than before. All in all, the performance increase could be on the order of 100 to 1000 times.
Lower power consumption is another benefit. Most systems today require disks to spin at high speed, which draws more electrical current. If the TRAM is sufficiently large, disks may not need to provide that kind of performance: disks drawing less current could be used, and heat generation could be cut down. There will always be applications that need fast disks, but the vast majority would have the pressure eased by TRAM. In fact, with 128 megabytes of TRAM, most users might not need to touch the disk for a day or a week; with a gigabyte of TRAM, I think it would be safe to say that most users wouldn't hit their disk for years. This would help the world's power consumption and 'green PC' initiatives. It would also mean a laptop's disk would not have to spin up and down anywhere near as often, increasing battery life.
Since file writes would be less frequent, the number of writes to a disk would be reduced, which may extend disk lifetimes beyond what they already are.
Noise is reduced as well: with the disk idle, the overall computer noise level could be brought down. For some drives, the difference is very noticeable.
The need for fans would be reduced.
Drawbacks
Removable media would require synchronization with the TRAM before being removed. Since this is already the case with most operating systems anyway, it is not a big issue. In fact, it may allow removable disks to be used for more applications, since they are generally not as fast as non-removable disks.
Not all applications fit well into current memory sizes, though more than 90% of them do. The applications that wouldn't benefit are those that stream large amounts of data very quickly, such as video editing and very large scale transaction processing.
Hardware
From a hardware point of view, a TRAM consists of three sections: the memory, the memory controller, and the power section. All of these could be built from off-the-shelf parts, although there are no off-the-shelf memory controller designs for this application. Since the parts are commodities and the application essentially just provides more physical memory address space, building these devices shouldn't be difficult.
Memory
As of today, semiconductor memories have very high densities: 64-megabit DRAM parts are shipping, and 4-megabit static RAM (SRAM) parts are common. To build a TRAM device, either DRAM or SRAM is recommended. The other technologies, like ferroelectric RAM and Flash, don't appear suitable: ferroelectric RAM has not yet become common, and Flash has a limit on write cycles, and writes are done quite often with a TRAM.
DRAM is the most commoditized memory component in computers today, and it has the highest density of the semiconductor memories, making it the most cost-effective choice from a pure density-per-dollar point of view. DRAM's main drawback is that it requires a periodic refresh cycle to maintain its contents.
SRAM is the next most suitable commodity semiconductor memory. It is also the memory type most used in past TRAM designs. Several manufacturers, such as Dallas Semiconductor, make SRAMs with a battery and the circuitry necessary to make them non-volatile.
Memory Controller
The memory controller is the part that interfaces with the host machine and controls the memory system. On the host side, it would interface with whatever bus the system provides (these days, probably PCI). When it receives read and write requests through that interface, it applies those signals to the non-volatile memory side.
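From the host's point of view, the device is then just more addressable memory. A minimal host-side sketch, assuming the controller exposes its memory through a hypothetical device node /dev/tram0 that supports mmap() (all names here are assumptions):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define TRAM_LEN (4 * 1024 * 1024)   /* assume a 4 MB device */

    int main(void)
    {
        int fd = open("/dev/tram0", O_RDWR);
        if (fd < 0)
            return 1;
        char *tram = mmap(NULL, TRAM_LEN, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (tram == MAP_FAILED)
            return 1;

        /* Reads and writes are ordinary loads and stores; the
         * controller turns bus cycles into accesses of the
         * battery-backed RAM. */
        memcpy(tram, "committed record", 17);

        munmap(tram, TRAM_LEN);
        return close(fd) != 0;
    }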
Since SRAM doesn't require a refresh, the memory controller can be a simple gating buffer. This is probably why so many past TRAMs have used SRAM for their memory. In the future, as this technology becomes popular, we should see designs using DRAM, which has a higher density and is less expensive per bit.
The memory controller could also handle all error detection and correction. ECC is a common scheme for memories these days that provides this functionality.
The hardest part of the design is the circuitry that assures the memory doesn't get corrupted when the power begins to fade. There are designs to accommodate this; the easiest is to make the TRAM's acceptable power level higher than what the host computer finds acceptable. The software managing a TRAM must also make its operations transactional, so that only complete transactions take effect.
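One way to make an individual TRAM write transactional, sketched below under the assumption that a single word store to the TRAM completes atomically: write the record body first and flip a 'valid' flag last, so a power failure mid-write leaves the record simply invisible rather than corrupt.

    #include <string.h>

    struct tram_record {               /* lives in TRAM */
        volatile unsigned valid;       /* 0 until the record is complete */
        char data[508];
    };

    /* Assumed-atomic single-word stores make this all-or-nothing. */
    void tram_commit(struct tram_record *r, const void *buf, size_t len)
    {
        r->valid = 0;                  /* invalidate first */
        memcpy(r->data, buf, len);     /* body may be torn by power loss */
        /* On real hardware a write barrier belongs here so the flag
         * cannot reach the memory chips before the data does. */
        r->valid = 1;                  /* flag last: record now exists */
    }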
Power
The power section is the circuitry that manages the power sources and the switchover of power to the memory section. It also includes a power source that stays on if main power is lost. With the advent of laptops, there are many off-the-shelf parts that could be used; many of them manage different types of batteries and can determine the amount of time left until a power failure occurs.
Software
The simplest way to use a TRAM is to present it to the operating system as a small, fast disk. The upside is that this is quick to implement. The downside is that it doesn't give the system good performance globally: you have to direct each application to use that disk, and you are limited to a small partition.
The next level, which I'll call Level 1, requires changes to the operating system itself so that its write cache lives in TRAM. The upside is that this is only a minor change to an OS, and it delivers good global performance. The downside for some systems is that it does not guarantee the filesystems will be in a consistent state at boot after an improper shutdown, so the system or an application has to bring them back to a consistent state. For a Unix system this means an 'fsck'; for MSDOS or Windows 95/98 it means a 'scandisk'.
Some Unixes and Windows NT already use a journaled filesystem. In that scenario, bringing the system back to a consistent state is quick. However, these systems tend to be 'log write' heavy, so they would require good 'leaky bucket' performance. Still, for these systems, Level 1 would be an adequate solution.
The next goal is to eliminate having 'fsck' examine the whole disk in order to restore consistency. To accomplish this, part of the TRAM would be used as a journal: the transactions necessary for filesystem metadata consistency would be stored there before the data is considered 'committed' to the normal disk portion of the TRAM. The journal would only need enough room for the largest filesystem metadata transaction.
For example, suppose the filesystem locks the appropriate blocks for an atomic directory operation. It performs the changes and writes the new blocks to this small journal portion of TRAM, preserving the old blocks before they are overwritten. When the changes are complete, the TRAM updates its internal tables that describe which blocks of memory correspond to which disk blocks. Once that completes, the filesystem code marks the journal as usable for the next transaction.
If the transaction didn't complete, then upon reboot the filesystem checker can restore the filesystem to a consistent state very quickly, because it only has to run through one tiny journal. After that, the TRAM and disk are considered consistent.
This would allow existing filesystems to do a filesystem check in the sub-millisecond range, per filesystem. This would be nice.
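To make that protocol concrete, here is a minimal sketch with hypothetical structure and function names, again assuming single-word TRAM stores are atomic:

    #include <string.h>

    #define JBLOCKS 8   /* sized for the largest metadata transaction */

    struct journal {                  /* lives in TRAM */
        volatile unsigned committed;  /* 0 = scratch, 1 = replayable */
        unsigned      nblocks;
        unsigned long target[JBLOCKS];     /* destination disk blocks */
        char          data[JBLOCKS][512];  /* new block contents */
    };

    /* Hypothetical: write one block into the disk portion of the TRAM. */
    void tram_block_write(unsigned long blkno, const char *data);

    /* Commit: stage every new block, then set the flag last. */
    void journal_commit(struct journal *j, const unsigned long *blkno,
                        const char blk[][512], unsigned n)
    {
        j->committed = 0;
        j->nblocks = n;
        for (unsigned i = 0; i < n; i++) {
            j->target[i] = blkno[i];
            memcpy(j->data[i], blk[i], 512);
        }
        j->committed = 1;   /* the transaction now exists, atomically */
    }

    /* Recovery: the whole 'fsck' is one pass over this tiny journal. */
    void journal_replay(struct journal *j)
    {
        if (j->committed)
            for (unsigned i = 0; i < j->nblocks; i++)
                tram_block_write(j->target[i], j->data[i]);
        j->committed = 0;
    }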
Of course, TRAM could be used for all sorts of fault tolerant applications.
One could easily make a high-performance site pair [GRAY]. Imagine two FreeBSD computers with a high-speed network connection, and assume some method for clients to connect to the proper server (a previously solved problem). Now imagine that any disk write on the primary requires the data to be successfully written to the secondary of the site pair as well. If both machines use TRAM, the limit on response time is purely the network infrastructure. Two boxes costing $4000 could provide better performance than boxes ten times that cost that offer this kind of fault tolerance (my speculation, but I bet I'm not far off).
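A sketch of the commit path on the primary, assuming a hypothetical tram_write_local() and eliding all but basic error handling:

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Hypothetical: copy the record into this machine's TRAM. */
    void tram_write_local(const void *rec, size_t len);

    /* A write commits only after the secondary confirms it holds a
     * copy in its TRAM; the network round trip is the whole cost. */
    int replicate_and_commit(int peer_fd, const void *rec, size_t len)
    {
        char ack;
        if (send(peer_fd, rec, len, 0) != (ssize_t)len)
            return -1;
        if (recv(peer_fd, &ack, 1, 0) != 1)   /* secondary has it */
            return -1;
        tram_write_local(rec, len);
        return 0;   /* now reply to the client */
    }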
The TRAM could also store the state table of a network-address-translating router. In essence, this is just like the site pair described above, except that there are no disks: these would be FreeBSD or other boxes acting as routers. Network address translation requires the router to maintain a state table of all connections passing through it, and that table needs very quick access; TRAM could provide that. If the TRAM were large enough, there would be no need for a disk at all.
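As a sketch, a state-table entry kept in TRAM might look like this; the field names are assumptions, and the point is only that established connections survive a reboot or power failure:

    #include <stdint.h>

    /* One NAT state-table entry, kept in TRAM. */
    struct nat_entry {
        uint32_t inside_ip;     /* host behind the router */
        uint16_t inside_port;
        uint32_t remote_ip;     /* host out on the Internet */
        uint16_t remote_port;
        uint16_t mapped_port;   /* port the router substitutes */
        uint8_t  proto;         /* TCP or UDP */
        uint8_t  valid;         /* written last, as in earlier sketches */
    };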
Future devices could use the most cost-effective memory, DRAM, as their semiconductor memory. This would allow the devices to be as large as normal computer memories; it is not uncommon to see a 64 or 128 megabyte computer. I suspect that in the next few years, when memories reach the gigabyte range, hard disks will be supplanted in small and mid-sized computers.
The devices could also eventually include a content addressable memory (CAM) unit to speed cache lookups, or hardware to accelerate other common functions so the CPU isn't tied up.
Related Work
Previous work on this has been done by many people.
Most computers in the past were built with core memory, which had the characteristic of being non-volatile, so I'm sure there was software that took advantage of this feature. In some computers, like the first Internet 'routers', it was the only form of non-volatile storage [2]. It would be kind of 'cool' to have this feature back.
Silicon disks are another related technology[4].
After writing this, I collected any links or papers I found on the topic.
McVoy has an interesting paper online from around 1991 that is the most closely related to this paper [3]. He even discusses, in more detail, intelligent disk controller technology and the pros and cons of putting the NVRAM on the disk.
Judging from the abstract of their paper at the Usenix site, Peacock et al. did work that is also closely related to this paper. Although I thought my idea of using existing filesystems with TRAM was original, their publication was years ahead (good work, guys). I am not a member of Usenix, so it would be nice if someone could email me the paper [8].
Just about anybody running a 'server' will easily see the cost justification for an NVRAM board.
Free operating systems like FreeBSD, the other BSDs, and Linux, which run on commodity PC hardware, could provide serious transaction processing power and significantly better fault tolerance for very little cost. This would eliminate the notion that a 'serious box' from some other vendor is required for a solution; until now, that is a capability these systems have lacked.
With performance like this, Network Appliance and Auspex might lose market share in the file server market, as commodity PCs achieve the performance of a portion of their market.
Microsoft and other OS vendors could have a new justification for new OS releases.
Since the PC would now fill the role of the mid-range server, Sun, SGI, IBM, and HP would be competing with it.
The future for NVRAM is bright. I foresee almost every desktop, laptop, and server computer in the future having some form of NVRAM.
With sufficient NVRAM storage, laptop users who aren't driven by heavier data storage needs may not need a disk at all.
When you use your computer, your hard disk may be off for long periods of time.
Put simply: if we have a gigabyte of NVRAM, do we need a disk at all?
Please contact me if you know of any papers or related work that I haven't mentioned...