Intel manufactured the first microprocessor, the 4-bit 4004, in the early 1970s; it was essentially a number-crunching machine. Shortly afterwards Intel developed the 8-bit 8008 and 8080, and Motorola followed suit with the 6800, which was comparable to Intel's 8080. Both companies then fabricated 16-bit microprocessors: Motorola the 68000, and Intel the 8086 and 8088. The 8086 would become the basis for Intel's 32-bit 80386 and, later, the popular Pentium line used in consumer PCs. [18, 19] Each generation of processors grew smaller and faster, but also dissipated more heat and consumed more power.
One of the guiding principles of computer architecture is known as Moore's Law. In 1965 Gordon Moore stated that the number of transistors on a chip would roughly double each year; in 1975 he revised this to every two years. What is often quoted as Moore's Law is Dave House's revision, which holds that computer performance will double every 18 months. The graph in Figure 1 plots many of the early microprocessors briefly discussed in Section 1.1 against the number of transistors per chip.
Figure 1: Depiction of Moore's Law 
As shown in Figure 1, the number of transistors has roughly doubled every two years. Moore's Law continues to hold; for example, Intel is set to produce the 'world's first 2 billion transistor microprocessor', "Tukwila", later in 2008. House's prediction, however, needs another correction. Throughout the 1990s and the earlier part of this decade, microprocessor frequency was synonymous with performance: a higher frequency meant a faster, more capable computer. Now that processor frequency has reached a plateau, we must consider other aspects of overall system performance: power consumption, heat dissipation, frequency, and number of cores. Multicore processors often run at lower frequencies than single-core processors, yet can deliver much better performance because 'two heads are better than one'.
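The difference between the doubling periods mentioned above is easier to appreciate with a little arithmetic. The following is a minimal sketch, not taken from the original text: it assumes idealized exponential growth, N(t) = N0 * 2^(t/T), and uses the commonly cited figure of 2,300 transistors for the 4004 in 1971 as an illustrative starting point. The function name and parameters are hypothetical.

```python
def projected_transistors(initial_count, years_elapsed, doubling_period_years):
    """Project a transistor count assuming idealized exponential doubling.

    N(t) = N0 * 2 ** (t / T), where T is the doubling period in years.
    This is a simplification; real chips do not follow the curve exactly.
    """
    return initial_count * 2 ** (years_elapsed / doubling_period_years)


if __name__ == "__main__":
    # Assumed starting point: Intel 4004, roughly 2,300 transistors in 1971.
    start_count = 2300
    years = 2008 - 1971

    # Compare the three doubling periods mentioned in the text.
    for label, period in [("1 year (Moore, 1965)", 1.0),
                          ("18 months (House)", 1.5),
                          ("2 years (Moore, 1975)", 2.0)]:
        count = projected_transistors(start_count, years, period)
        print(f"{label:>24}: {count:.3e} transistors")
```

Running the script prints one projection per doubling period. Because the period sits in the exponent, small changes in it produce very large differences over several decades, which is why the one-year and two-year versions of the law diverge so sharply.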
Past Efforts to Increase Efficiency
As touched upon above, from the introduction of Intel's 8086 through the Pentium 4, an increase in performance from one generation to the next was seen as an increase in processor frequency. For example, the Pentium 4 ranged in clock frequency from 1.3 to 3.8 GHz over its 8-year lifetime. The physical size of chips decreased while the number of transistors per chip increased, and rising clock speeds pushed heat dissipation across the chip to dangerous levels.
Many techniques are used to gain performance within a single core. Superscalar processors, which can issue multiple instructions concurrently, are the standard. In these pipelines, instructions are pre-fetched, split into sub-components, and executed out of order. A major focus of computer architects is the branch instruction. Branch instructions are the equivalent of a fork in the road: the processor has to gather all necessary information before making a decision. To speed up this process, the processor predicts which path will be taken; if the prediction is wrong, the processor must throw out any work computed along the wrong path and backtrack to take the correct one. Even when a branch is mispredicted, the cost is often no worse than having waited until the correct path was known. Branches are also removed using loop unrolling, and sophisticated neural network-based predictors are used to minimize the misprediction rate. Other techniques used for performance enhancement include register renaming, trace caches, reorder buffers, dynamic/software scheduling, and data value prediction.
There have also been advances in power- and temperature-aware architectures. Power-sensitive architectures come in two flavors: low-power and power-aware designs. Low-power architectures minimize power consumption while satisfying performance constraints, e.g. embedded systems, where low power and real-time performance are vital. Power-aware architectures maximize performance parameters while satisfying power constraints. Temperature-aware design uses simulation to determine where hot spots lie on the chip and revises the architecture to decrease the number and effect of hot spots.
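To make the idea of branch prediction concrete, here is a minimal sketch of a classic 2-bit saturating counter predictor. It is deliberately much simpler than the neural network-based predictors mentioned above and is not a description of any particular processor. The class and function names are hypothetical, and the simulation tracks only a single branch.

```python
class TwoBitPredictor:
    """2-bit saturating counter for a single branch.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, initial_state=2):
        self.state = initial_state

    def predict(self):
        # True means the branch is predicted taken.
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def misprediction_rate(outcomes, predictor=None):
    """Fraction of branch outcomes that the predictor guesses wrongly."""
    if predictor is None:
        predictor = TwoBitPredictor()
    if not outcomes:
        return 0.0
    mispredictions = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            mispredictions += 1
        predictor.update(taken)
    return mispredictions / len(outcomes)


if __name__ == "__main__":
    # A loop branch: taken 9 times, then not taken on loop exit, repeated 10 times.
    loop_branch = ([True] * 9 + [False]) * 10
    print(misprediction_rate(loop_branch))
```

For the loop-style pattern in the example, the predictor is wrong only on each loop exit, so one branch in ten is mispredicted. A single unexpected outcome does not flip the prediction, which is the reason for using two bits rather than one.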
The Need for Multicore
Due to advances in circuit technology and performance limitation in wide-issue, super-speculative processors, Chip-Multiprocessors (CMP) or multi-core technology has become the mainstream in CPU
Speeding up processor frequency ran its course in the earlier part of this decade, and computer architects needed a new approach to improve performance. Adding a second processing core to the same chip would, in theory, deliver twice the performance while dissipating less heat, though in practice each core runs more slowly than the fastest single-core processor. In September 2005 the IEE Review noted that "power consumption increases by 60% with every 400MHz rise in clock speed…But the dual-core approach means you can get a significant boost in performance without the need to run at ruinous clock rates."
Multicore is not a new concept; the idea has been used in embedded systems and for specialized applications for some time. Recently, however, the technology has become mainstream, with Intel and Advanced Micro Devices (AMD) introducing many commercially available multicore chips. In contrast to the commercially available two- and four-core machines of 2008, some experts believe that "by 2017 embedded processors could sport 4,096 cores, server CPUs might have 512 cores and desktop chips could use 128 cores." This rate of growth is astounding considering that desktop chips are only now on the cusp of using four cores, after 30 years of single-core designs.
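The quoted figure can be turned into a rough back-of-the-envelope comparison. The sketch below is illustrative only and rests on two assumptions that are not in the quoted source: that the 60% increase compounds with each further 400 MHz step, and that a second core at the baseline frequency simply doubles power. The function name and its parameters are hypothetical.

```python
def relative_power(clock_increase_mhz, growth_per_step=0.60, step_mhz=400):
    """Power relative to a baseline clock.

    Assumption: the quoted 60% rise per 400 MHz compounds with each step.
    """
    return (1 + growth_per_step) ** (clock_increase_mhz / step_mhz)


if __name__ == "__main__":
    # Option A: one core, clocked 800 MHz above the baseline.
    single_core_faster = relative_power(800)

    # Option B: two cores at the baseline clock.
    # Assumption: the second core doubles power and nothing else changes.
    dual_core_baseline = 2 * relative_power(0)

    print(f"single core, +800 MHz: {single_core_faster:.2f}x baseline power")
    print(f"dual core, baseline:   {dual_core_baseline:.2f}x baseline power")
```

Under these assumptions, raising a single core's clock by 800 MHz costs about 2.56 times the baseline power, compared with 2 times for two cores at the baseline clock. Real chips will differ, since shared caches, voltage scaling, and how well the workload parallelizes all change the picture.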