AMD Palomino

Originally, AMD’s Palomino core was to have been a relatively minor update to its predecessor – the Thunderbird – that focussed on reducing power consumption and associated heat dissipation. In the event, however, its release slipped by several months and the new core ended up representing a significantly greater advance than had at first been envisaged, both in marketing and technological terms.

Whilst the Palomino can justifiably be considered the fourth Athlon core since the release of the K7 core in 1999 – the 0.18-micron K75 and Thunderbird cores being the others – the rationale behind the Athlon 4 nomenclature heavily featured at the time of the new processor’s original launch clearly had more to do with marketing – and Intel’s multi-million dollar Pentium 4 marketing campaign in particular – than technology. That said, the Palomino also clearly represents an important technological step for AMD, fulfilling, as it does, a role in all of the mobile, desktop, workstation and multiprocessor server market sectors. The exact same Palomino core is deployed in each of these arenas, the variation across market sectors being simply a case of differing clock frequency.

Manufactured using AMD’s 0.18-micron copper interconnect technology, the Palomino comprises 37.5 million transistors on a die of 128mm² – an increase of only 0.5 million transistors and 8mm² compared with its predecessor – and by using a greater number of transistors that have been optimised for specific portions of the core, AMD claims to have achieved a 20% decrease in power usage compared to an equivalently clocked Thunderbird core. Additionally, the new core has been improved in three major areas, AMD having coined the term QuantiSpeed Architecture to describe the enhanced core in general and the Athlon XP’s ability to achieve a higher IPC (instructions per clock cycle) than Intel’s Pentium 4 in particular.

The first concerns the processor’s Translation Lookaside Buffer (TLB). The TLB is best thought of as just another cache which – like the better known L1 and L2 caches – provides a mechanism that further enables the CPU to avoid inefficient access to main memory. Specifically, the TLB holds data used in the translation of virtual addresses into physical addresses and vice versa. The probability of a CPU finding the address it needs in its TLB – known as the processor’s TLB hit-rate – is generally very high. This is just as well because, when a CPU fails to do so, the penalty can be as much as three clock cycles to resolve a single address.
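The caching behaviour described above can be illustrated with a toy model. The following Python sketch is not a description of AMD's hardware: the class name `ToyTLB`, the 4KB page size, the fully associative layout and the least-recently-used replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4KB pages, for illustration only


class ToyTLB:
    """A tiny, fully associative TLB model with least-recently-used eviction."""

    def __init__(self, entries):
        self.entries = entries
        self.cache = OrderedDict()  # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address, page_table):
        """Return the physical address, consulting the page table only on a miss."""
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = page_table[vpn]  # slow path: look up the page table
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return self.cache[vpn] * PAGE_SIZE + offset

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# A hypothetical page table and a program that keeps touching the same 8 pages.
page_table = {vpn: vpn + 100 for vpn in range(64)}
tlb = ToyTLB(entries=32)
for _ in range(100):
    for vpn in range(8):
        tlb.translate(vpn * PAGE_SIZE + 12, page_table)

print(f"hits={tlb.hits} misses={tlb.misses} hit rate={tlb.hit_rate():.2%}")
```

Because the working set of 8 pages fits comfortably within the 32 entries, only the first touch of each page misses. Real programs with larger or more scattered working sets miss more often, which is why the number of TLB entries matters.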

The Thunderbird core had only a 24-entry L1 TLB instruction cache and a 32-entry L1 TLB data cache. This compares unfavourably with the Pentium III, which has a 32/72-entry L1 TLB. The Palomino goes some way towards redressing the balance, providing a 24/40-entry L1 TLB in addition to a 256/256-entry L2 TLB – unchanged from its predecessor. A further improvement is that – like its L1 and L2 caches – the Palomino’s L1 and L2 TLB caches are guaranteed not to contain duplicate entries.

Whilst the new core’s L1 and L2 cache sizes and mappings remain unchanged, what is different is the Palomino’s automatic data prefetch mechanism that works alongside its cache. This predicts what data the CPU is likely to need and fetches it from main memory into its cache in anticipation of its request. Hardware prefetch of this kind has been around for some time; the Palomino’s implementation is an evolution of previous designs and includes a feature which allows software-initiated data prefetch functions to take precedence over the core’s own mechanism.

Hitherto, the Athlon processor has supported only a partial implementation of Intel’s SSE technology. The third major improvement over its predecessor sees the Palomino add a further 52 new SIMD instructions to those supported previously. AMD had dubbed the original 21 SIMD instructions implemented 3DNow! and the 19 added subsequently Enhanced 3DNow!. With Palomino’s implementation of the full SSE instruction set, AMD’s associated terminology has subsequently been revised to 3DNow! Professional.

A further innovation is the Palomino’s OPGA (organic PGA) packaging, which replaces the somewhat dated CPGA (ceramic PGA) arrangement used by earlier cores. As well as being lighter and cheaper to produce, the new organic material – which is similar to that used on recent Intel CPUs, albeit brown in colour rather than green – confers advantages in thermal behaviour and greater elasticity than the ceramic material used previously. By allowing capacitors to be mounted closer to the core of the CPU on the underside of the packaging, both delivery of power to the core and the ability to filter out noise are improved.

Despite being very different from the previous CPGA packaging, OPGA continues to be based on the well-established 462-pin Socket A form factor, meaning that new Palomino-based CPUs should fit existing Socket A motherboards. For them to work, however, two things are required: a BIOS upgrade to ensure the new processor is properly recognised and – since the new processors are designed to support operation at 133MHz only – a motherboard that allows the FSB to be clocked at this frequency.

In a move that harked back to the ill-fated P-rating system first introduced by rival chipmaker Cyrix in the mid-1990s, AMD’s XP family of processors is not referenced according to clock speed; instead, each processor is assigned a Model Number. AMD’s rationale for doing this is well understood.

Ever since the PC’s introduction in the early 1980s, users have become accustomed to viewing higher performance as being synonymous with higher clock frequency. Until recently this made sense, since PCs from different manufacturers were based on the same internal architecture and therefore performed a nearly identical amount of work per clock cycle. Things changed with the advent of the AMD Athlon and Intel Pentium 4 processors around the turn of the millennium, when the design architectures of the respective companies fundamentally diverged. The consequence was that rival processors operating at identical frequencies may offer dramatically different levels of performance, because the different architectures are capable of performing different amounts of work per clock cycle.

So, a combination of clock frequency and IPC gives a far truer measure of processor performance, and it is this fact that lies behind AMD’s rating and model numbering system. The company hopes that this will need to serve only as an interim solution and is playing a leading role in efforts towards the establishment of an independent institution whose role it will be to create a performance measure that is more equitable than the current clock frequency based scheme and that will be universally adopted in future years.
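The relationship between clock frequency and IPC can be made concrete with a little arithmetic. The following Python sketch treats performance as simply the product of the two, which is a simplification; the function name `relative_performance` and the IPC figures are hypothetical, chosen only to show how a slower-clocked chip can match a faster-clocked one.

```python
import math


def relative_performance(clock_ghz, ipc):
    """Billions of instructions completed per second: clock frequency times IPC."""
    return clock_ghz * ipc


# Hypothetical IPC values, for illustration only (not measured figures).
chip_a = relative_performance(clock_ghz=1.5, ipc=1.2)  # lower clock, higher IPC
chip_b = relative_performance(clock_ghz=1.8, ipc=1.0)  # higher clock, lower IPC

# Despite a 300MHz clock deficit, chip A completes the same work per second.
print(math.isclose(chip_a, chip_b))
```

In practice IPC varies from one application to another, which is one reason a rating scheme has to rest on a basket of benchmarks rather than a single figure.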

In the meantime, Athlon XP model rating is based on 14 benchmarks representing 34 applications covering the diverse fields of visual computing, gaming and office productivity. The company’s intention appears to be to designate XP model numbers which imply an equivalence with similar sounding Pentium 4 clock frequencies – an Athlon XP 1800+ with a Pentium 4 1.8GHz, for example. Moreover, independent testing would appear to indicate that – initially at least – consumers would not be far wrong in drawing such an inference. How long this continues to be the case – given that the Pentium 4’s architecture will allow it to reach significantly higher clock speeds than its competitor over the long run – remains to be seen.

In a departure from what had been AMD’s usual strategy in launching a new processor, the Palomino was first seen in the guise of a mobile processor in mid-2001. It was later used in Athlon MP dual-processor server systems and the low-end desktop Duron range – where it was referred to as Morgan – before finally appearing in AMD’s new line of mainstream Athlon desktop processors in the autumn of 2001. Interestingly, the Athlon 4 nomenclature, so prominent at the time of the processor’s launch, has only been used in the context of the company’s mobile processors, with XP – the letters standing for extra performance – being the preferred marketing terminology for the company’s mainstream desktop processors.

The XP family originally comprised four models – 1500+, 1600+, 1700+ and 1800+ – operating at clock speeds of 1.33GHz, 1.40GHz, 1.47GHz and 1.53GHz respectively. By the beginning of 2002 the range had been extended to the XP 2000+. In deference to AMD’s model numbering strategy, suffice it to say that this is likely to have equivalent performance to a 2GHz Pentium 4 processor!
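The gap between model number and actual clock speed for the four launch models can be tabulated with a short calculation. The following Python sketch assumes that a model number can be read as a notional Pentium 4 clock speed in MHz; that reading is an interpretation for illustration, not AMD's published rating method, and the names `XP_MODELS` and `implied_per_clock_ratio` are hypothetical.

```python
# Launch model numbers and their actual clock speeds in GHz, as listed above.
XP_MODELS = {1500: 1.33, 1600: 1.40, 1700: 1.47, 1800: 1.53}


def implied_per_clock_ratio(model_number, clock_ghz):
    """Ratio of the notional clock implied by the model number to the real clock.

    Assumption: the model number is read as a notional clock speed in MHz.
    """
    nominal_ghz = model_number / 1000
    return nominal_ghz / clock_ghz


for model, clock in XP_MODELS.items():
    ratio = implied_per_clock_ratio(model, clock)
    print(f"XP {model}+: {clock:.2f}GHz actual, implied per-clock ratio {ratio:.2f}")
```

On this reading, each model number claims roughly 13% to 18% more work per clock cycle than the notional processor it is being compared with, and the implied ratio rises slightly with each step up the range.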