TACHYON: A High Performance Object Oriented CPU
Introduction

These pages describe a high performance CPU intended to run multimedia-intensive applications written in the Self, Smalltalk or Java programming languages. The key observation is that a huge performance gap has developed between the processor and main memory. When the Merlin 2 was designed, for example, the 68000 took 4 cycles (500 ns at 8 MHz) to make one memory access, while its DRAM could respond in only two cycles; the spare bandwidth was used for video. Today, a 600 MHz Alpha CPU can execute up to four instructions (requiring up to two memory accesses) in one clock cycle, a worst case of one word every 0.2 ns. Yet a "normal" DRAM can take 60 ns to service a request, and even fast SDRAMs cycle no faster than a peak of 10 ns per word (50 times too slow). We have reached the point at which the CPU that executes a program with the fewest accesses to main memory will be the fastest; all other considerations are secondary. Elaborate cache organizations (up to three levels deep) are the current answer, but Tachyon attacks this problem through architectural innovation.

It is important to know the memory traffic and tune the hardware to deal with it. For example, the memory access patterns for the stack are very different from those for random memory locations. So dedicated hardware for the stack (like the register windows in the Berkeley RISC and Sun SPARC, or the stack cache in the AT&T Hobbit and the Sun picoJava) can outperform a generic memory cache while using only a small fraction of the transistors. In the same way, separating the memory traffic related to object access, instruction decoding and multimedia processing makes it possible to build specialized hardware for low-cost, high-performance implementations.

Project Schedule and Development Tools

The key to developing a competitive product is the use of the right tools.
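As an illustration of measuring memory traffic before tuning the hardware, here is a minimal trace-driven counter in Python. It is a hypothetical model, not Tachyon's actual design. It assumes a stack cache that holds the top `capacity` words, spills the bottom-most word to main memory on overflow, and refills one word on underflow. It then compares that against a machine where every push and pop goes to main memory. The function names and the synthetic trace are illustrative assumptions.

```python
"""Toy trace-driven count of main-memory accesses caused by stack traffic.

Hypothetical model (assumption, not Tachyon's real hardware): a stack cache
keeps the top `capacity` words on chip; only overflow spills and underflow
fills reach main memory.
"""


def memory_accesses_no_cache(trace):
    """Without stack hardware, every push (write) and pop (read) hits memory."""
    return len(trace)


def memory_accesses_stack_cache(trace, capacity):
    """Count spills and fills for a stack cache holding `capacity` words."""
    cached = 0      # words currently held in the stack cache
    in_memory = 0   # words spilled to main memory below the cached ones
    accesses = 0
    for op in trace:
        if op == "push":
            if cached == capacity:
                # Cache full: spill the bottom-most cached word to memory.
                cached -= 1
                in_memory += 1
                accesses += 1
            cached += 1
        elif op == "pop":
            if cached == 0:
                # Cache empty: fill one word back from memory first.
                if in_memory == 0:
                    raise IndexError("pop from empty stack")
                in_memory -= 1
                cached += 1
                accesses += 1
            cached -= 1
        else:
            raise ValueError(f"unknown op: {op!r}")
    return accesses


if __name__ == "__main__":
    # Synthetic trace: repeatedly push 12 words, then pop them all again.
    trace = (["push"] * 12 + ["pop"] * 12) * 100
    print(memory_accesses_no_cache(trace))         # 2400
    print(memory_accesses_stack_cache(trace, 8))   # 800
    print(memory_accesses_stack_cache(trace, 16))  # 0
```

With this synthetic trace, an 8-word stack cache cuts main-memory traffic from 2400 to 800 accesses. A 16-word cache removes it entirely because the whole working depth fits on chip. Real stack traffic is less regular, so the numbers only illustrate the mechanism.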
A great variety of EDA (Electronic Design Automation) software currently exists, but for leading-edge projects the development of custom tools is inevitable.

Microcode Cache

Adaptive compilation was developed for Self, and this technology is now being adapted for high performance implementations of Java. The microcode cache is the silicon equivalent of adaptive compilation, and it offers significantly better performance than alternative implementations of direct bytecode execution.

Multimedia Stream Processing

Most CPU cycles are now taken up by processing video and sound, and this share will only grow in the future. Traditional memory systems are designed on the assumption that memory accesses, while highly unpredictable, are very localized (the same memory locations tend to be used over and over). This assumption is false for media processing, where memory accesses are predictable and non-repetitive: in most decompression algorithms, each byte in a video or sound buffer is touched once, sequentially. A new memory architecture optimized for regular streams of data is therefore needed to handle these tasks. And since multiple parallel streams are the norm (a video sequence plus left and right stereo audio channels, for example), the ideal architecture includes multiple data paths.

Object Oriented Memory System

Just as the traditional data cache can't handle multimedia memory traffic very well, it has also proved inefficient for object-oriented applications. Even small modifications to the addressing modes and virtual address translation can greatly improve this situation, however, as was shown in the Mushroom project.

Palmtop Multicomputing

Computer performance should be highly scalable. Today, it is possible to connect a large number of off-the-shelf personal computers to challenge traditional supercomputers for many applications. Object-oriented systems are even better suited to this kind of architecture.
By integrating the processor, memory and communications on a single chip, the benefits of these high performance implementations can be extended to small, low-cost systems, even down to the palmtop level.

Links

Here are some links to related information available on the web. Many older designs were published in traditional conferences before the widespread use of the internet and can only be found in print, but they are still very much worth studying.

Java CPUs:
Other Object Oriented Architectures:
Multimedia CPUs:
Integrated Memory/CPU designs:
Other:
(c) 2000 Merlintec Computers. Please send comments to the Webmaster