

Multicore Processors: A Necessity
(Released September 2008)

  by Bryan Schauer  




Multicore Implementations


As with any technology, multicore architectures from different manufacturers vary greatly. Beyond differences in communication and memory configuration, they also differ in how many cores the microprocessor has. In some multicore architectures the cores have different functions, which makes those architectures heterogeneous. The differences are discussed below for Intel's Core 2 Duo, Advanced Micro Devices' Athlon 64 X2, Sony-Toshiba-IBM's CELL Processor, and finally Tilera's TILE64.

Intel and AMD Dual-Core Processors

Intel and AMD are the mainstream manufacturers of microprocessors. Intel produces many different flavors of multicore processors: the Pentium D is used in desktops, the Core 2 Duo is used in both laptop and desktop environments, and the Xeon processor is used in servers. AMD has the Athlon lineup for desktops, Turion for laptops, and Opteron for servers and workstations. Although the Core 2 Duo and Athlon 64 X2 run on the same platforms, their architectures differ greatly.

Figure 4: (a) Intel Core 2 Duo, (b) AMD Athlon 64 X2 [5]

Figure 4 shows block diagrams for the Core 2 Duo and Athlon 64 X2, respectively. Both architectures are homogeneous dual-core processors. The Core 2 Duo adheres to a shared memory model with private L1 caches and a shared L2 cache, which "provides a peak transfer rate of 96 GB/sec." [25] If an L1 cache miss occurs, both the L2 cache and the second core's L1 cache are searched in parallel before a request is sent to main memory. In contrast, the Athlon follows a distributed memory model with discrete L2 caches. These L2 caches share a system request interface, eliminating the need for a bus.
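The Core 2 Duo lookup order described above can be illustrated with a toy model. This is a minimal sketch, not Intel's implementation: the `Cache` class, the `load` function, and the choice to prefer an L2 hit over a hit in the other core's L1 are all assumptions made for illustration, and sequential Python only stands in for probes that the hardware issues in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy cache: a dict of address -> value (no eviction, no cache lines)."""
    name: str
    lines: dict = field(default_factory=dict)

    def lookup(self, addr):
        return self.lines.get(addr)


def load(addr, own_l1, other_l1, shared_l2, main_memory):
    """Return (value, source) for a load issued by one core."""
    value = own_l1.lookup(addr)
    if value is not None:
        return value, own_l1.name

    # L1 miss: probe the shared L2 and the other core's L1 before
    # falling back to main memory. Both probes happen before the decision.
    l2_hit = shared_l2.lookup(addr)
    peer_hit = other_l1.lookup(addr)
    if l2_hit is not None:          # assumption: prefer the L2 copy
        value, source = l2_hit, shared_l2.name
    elif peer_hit is not None:
        value, source = peer_hit, other_l1.name
    else:
        value, source = main_memory[addr], "memory"

    own_l1.lines[addr] = value      # fill the requesting core's L1
    return value, source
```

Calling `load` twice for the same address shows the effect of the fill: the first call reports the L2, the other L1, or memory as the source, and the second reports the requesting core's own L1.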

The system request interface also connects the cores with an on-chip memory controller and an interconnect called HyperTransport. HyperTransport effectively reduces the number of buses required in a system, reducing bottlenecks and increasing bandwidth. The Core 2 Duo instead uses a bus interface. The Core 2 Duo also has explicit thermal and power control units on-chip. There is no definitive performance advantage of a bus vs. an interconnect, and the Core 2 Duo and Athlon 64 X2 achieve similar performance measures, each using a different communication protocol.

CELL Processor

A Sony-Toshiba-IBM partnership (STI) built the CELL processor for use in Sony's PlayStation 3. CELL is therefore highly customized for gaming and graphics rendering, which gives it superior processing power for gaming applications. The CELL is a heterogeneous multicore processor consisting of nine cores: one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs), as can be seen in Figure 5. With CELL's real-time broadband architecture, 128 concurrent transactions to memory per processor are possible.

Figure 5: CELL Processor [6]

The PPE is an extension of the 64-bit PowerPC architecture and manages the operating system and control functions. Each SPE has a simplified instruction set that uses 128-bit SIMD instructions, and each has 256 KB of local storage. Direct Memory Access is used to transfer data between local storage and main memory, which allows for the high number of concurrent memory transactions. The PPE and SPEs are connected via the Element Interconnect Bus, which provides internal communication.
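Because an SPE works out of a fixed 256 KB local store, data that lives in main memory has to be moved in pieces that fit. The following is a minimal sketch of that pattern in plain Python, not the CELL SDK: the names `dma_chunks` and `process_on_spe` are hypothetical, slice copies stand in for DMA transfers, and the real local store also has to hold the SPE's code, so a practical chunk size would be smaller than the full 256 KB.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local storage, from the text


def dma_chunks(total_bytes, chunk_bytes):
    """Yield (offset, length) pairs that cover total_bytes in order."""
    if not 0 < chunk_bytes <= LOCAL_STORE_BYTES:
        raise ValueError("chunk must be positive and fit in the local store")
    for offset in range(0, total_bytes, chunk_bytes):
        yield offset, min(chunk_bytes, total_bytes - offset)


def process_on_spe(main_memory, chunk_bytes, kernel):
    """Copy each chunk into a local buffer, run kernel on it, copy it back.

    main_memory is a bytearray; kernel mutates its argument in place.
    """
    for offset, length in dma_chunks(len(main_memory), chunk_bytes):
        local = bytearray(main_memory[offset:offset + length])  # "DMA get"
        kernel(local)
        main_memory[offset:offset + length] = local              # "DMA put"
```

For example, a 300,000-byte buffer processed in 128 KB chunks is covered by two full chunks and one partial chunk of 37,856 bytes.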

Other interesting features of the CELL are the Power Management Unit (PMU) and Thermal Management Unit (TMU). Power and temperature are fundamental concerns in microprocessor design. The PMU allows for power reduction in the form of slowing, pausing, or completely stopping a unit. The TMU consists of one linear sensor and ten digital thermal sensors, which monitor temperature throughout the chip and provide an early warning if temperatures are rising in a certain area. The ability to measure and account for power and temperature changes is a great advantage: the processor should never overheat or draw too much power.

Tilera TILE64

Tilera has developed a multicore chip with 64 homogeneous cores set up in a grid, shown in Figure 6. An application that is written to take advantage of these additional cores can run far faster than it would on a single core. Imagine having a project to finish, but instead of working on it alone you have 64 people working on it with you. Each core has its own L1 and L2 cache, for a total of 5 MB on-chip, and a switch that connects the core to a mesh network rather than to a bus or interconnect. The TILE64 also includes on-chip memory and I/O controllers. As in the CELL processor, unused tiles (cores) can be put into a sleep mode to further decrease power consumption. The TILE64 uses a 3-way VLIW (very long instruction word) pipeline to deliver 12 times as many instructions as a single-issue, single-core processor. When VLIW is combined with the MIMD (multiple instruction, multiple data) processors, multiple operating systems can be run simultaneously, and advanced multimedia applications such as video conferencing and video-on-demand can be run efficiently. [26]

Figure 6: Tilera TILE64 [28]
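
To give a feel for what a mesh of switches means for communication between tiles, the sketch below counts the hops a message takes across the grid. It is illustrative only: it assumes the 64 tiles form an 8 x 8 grid numbered row by row, and it assumes a simple route that travels along the row first and then along the column, which is one common mesh routing scheme and is not taken from the text. The names `tile_coords`, `xy_route`, and `hop_count` are hypothetical.

```python
GRID_SIDE = 8  # assumption: 64 tiles arranged as an 8 x 8 grid


def tile_coords(tile_id):
    """Map a tile id (0..63, numbered row by row) to (row, col)."""
    if not 0 <= tile_id < GRID_SIDE * GRID_SIDE:
        raise ValueError("tile id out of range")
    return divmod(tile_id, GRID_SIDE)


def xy_route(src, dst):
    """Return the tile ids visited from src to dst, row first, then column."""
    row, col = tile_coords(src)
    dst_row, dst_col = tile_coords(dst)
    path = [src]
    while col != dst_col:                 # move along the row
        col += 1 if dst_col > col else -1
        path.append(row * GRID_SIDE + col)
    while row != dst_row:                 # then move along the column
        row += 1 if dst_row > row else -1
        path.append(row * GRID_SIDE + col)
    return path


def hop_count(src, dst):
    """Number of switch-to-switch hops between two tiles."""
    return len(xy_route(src, dst)) - 1
```

Under these assumptions, neighbouring tiles are one hop apart and opposite corners of the grid are 14 hops apart, so the cost of communication depends on where cooperating tasks are placed.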


© 2008, ProQuest LLC. All rights reserved.