First AIM PowerPC processors

The first part was an overview of many different processors up to the mid-90s. The second part was an overview of IBM mainframe processors. Recently I had the opportunity to do a little programming for the PowerPC, which made it possible to add another part to these reviews.

The first processor of the RISC architecture was the 801, developed at IBM in the second half of the 70s under John Cocke. It showed a 2-fold increase in performance compared to the CISC processors used in mainframes. The 801 served as the basis for the ROMP processor, which was used in the IBM RT PC (RISC Technology Personal Computer) workstation that appeared in 1986. The ROMP uses 32-bit registers instead of the 24-bit ones of the 801 and supports virtual memory, which the 801 lacked. The first version of the ROMP was made in 1981, which makes it one of the first RISC processors with a 32-bit architecture (the others being the AT&T Bellmac 32 and Berkeley RISC). However, IBM delayed the introduction of this technology until 1985, which suggests that the progress of ARM was critical in pushing IBM towards the widespread use of RISC processors: otherwise IBM risked falling behind the innovations of a small company, Acorn. Interestingly, IBM eventually lost to ARM anyway. This result was likely a natural consequence of delays in the introduction of new technologies, which may have been part of the global policy of those times.

Further development of the RISC architecture led IBM in 1990 to the POWER (Performance Optimization With Enhanced RISC) architecture. It was first used in the POWER1 processors, which found their way, in particular, into the RS/6000 workstations. The POWER architecture is still evolving, with the latest being the POWER10 (2021) processor. The POWER architecture became the basis for the PowerPC (Performance Optimization With Enhanced RISC – Performance Computing) processors developed by the AIM (Apple-IBM-Motorola) alliance, which produced its first processor in 1992.

It's still not entirely clear why Apple chose the PowerPC over other options. They could have simply invested in 68k development and helped Motorola achieve a qualitative evolution of its processors, as Intel was able to do. Although in theory the 68k is less suited to a superscalar design than the x86, the practical results shown by the 68060 suggest that this direction was not hopeless. The 68060 even turned out to be faster than the Pentium on a number of standard calculations. In addition, Motorola had its very fast 88000 RISC processor, which for some reason was ignored by almost all IT companies. Apple, however, did try using the 88000 instead of the 68k: a 68k emulator and two prototypes were created that successfully booted Mac OS. But further development was stopped for reasons that are not entirely clear. It is hard to believe that the unofficial campaign to undermine Motorola, launched in 1980 by Intel, had nothing to do with all the subsequent failures and the final collapse of Motorola. Apple could have invested in the ARM architecture, but this most promising option may not have passed because at that time ARM was still controlled by Acorn. There was also an almost successful migration to the Sun SPARC architecture. Sun itself had switched to SPARC from the 68k, which could have greatly simplified such a transition for Apple, but the deal fell through at the last moment. Also, for unclear reasons related to big business and politics rather than to technical characteristics, there was no deal to move to MIPS processors. Besides Apple computers, PowerPC processors were also used in the low-volume AmigaOne computers, reincarnations of the iconic Commodore Amiga. There were other PowerPC-based systems as well.

A characteristic feature of PowerPC-based Macintosh systems is very good emulation of the 68k processor, which allows almost all programs written for 68k systems to be used in exactly the same way as PowerPC programs. Interestingly, for the first Power Macintosh Apple offered an option with an additional Intel 80486 processor, which made it possible to run Microsoft and IBM operating systems. This would have been hard to imagine with the first Macintosh: at that time such additional boards were typical only for Commodore Amiga computers. By the way, the Power Macintosh started using Intel's PCI bus in 1995.

The iMac G3, the most successful PowerPC-based computer: more than 5 million were sold

The PowerPC is a very unusual processor compared to other processors of both RISC and CISC architectures. Its way of working with flags is especially unusual. Comparison result flags (four bits) can be stored in eight different slots! It is possible to collect the results of up to 8 operations and use them later in the code. The results of comparisons can be moved from one slot to another, and bitwise operations can even be performed on the flag bits in the slots, which is also very unusual. There are separate instructions for signed and unsigned comparisons, which is simply inherited from the IBM/370 architecture. The carry flag is not used in comparisons, and therefore there is only one carry flag. The handling of the two overflow flags is also unusual. There is a current overflow flag; when it is set, the summary overflow flag is set too, and the latter can only be reset in a special way: ordinary arithmetic instructions cannot reset it. It is this summary (not current!) overflow flag that is copied into the four-bit group of result flags when arithmetic instructions are executed. All this looks rather over-the-top.
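To make the idea of eight flag slots more concrete, here is a minimal Python sketch of a 32-bit condition register holding eight four-bit results. The helper names are my own illustrative assumptions and the model is deliberately simplified; the layout used (bits LT, GT, EQ, SO from the most significant bit, slot 0 in the top four bits) is the usual PowerPC one.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def compare(a, b, signed=True, so=0):
    """Return the 4-bit result LT|GT|EQ|SO of a signed or unsigned compare.

    'so' stands for the summary overflow bit that is copied into the result.
    """
    if signed:
        a, b = to_signed(a), to_signed(b)
    else:
        a, b = a & MASK32, b & MASK32
    if a < b:
        bits = 0b1000   # LT
    elif a > b:
        bits = 0b0100   # GT
    else:
        bits = 0b0010   # EQ
    return bits | (so & 1)

def set_field(cr, slot, bits):
    """Store a 4-bit result into slot 0..7 of the condition register."""
    shift = 28 - 4 * slot
    return (cr & ~(0xF << shift) & MASK32) | ((bits & 0xF) << shift)

def get_field(cr, slot):
    """Read the 4-bit result kept in slot 0..7."""
    return (cr >> (28 - 4 * slot)) & 0xF
```

With this model, comparing 0xFFFFFFFF with 1 gives "less than" as a signed comparison and "greater than" as an unsigned one, and both results can sit in different slots of the same register at the same time.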

Another characteristic feature inherited from IBM/370 and POWER is the archaic system of naming assembly instructions. The name of each instruction reflects the types of its arguments. For example, LWZ (Load Word and Zero) loads 32 bits (a word) from memory into a register, and LBZ loads a byte. Such a system allows registers to be written as bare numbers, but almost no one has done so for a long time. All assemblers accept names like R1 for registers, and the vast majority of code uses them. The special role of register 0 has also been inherited from IBM/370 and POWER: it is replaced by zero when used in addressing and in several other cases.

The PowerPC architecture is also distinguished by the unusual way it works with registers. First of all, there are really a lot of them. There are 32 general-purpose registers (for some reason, register number 1 was chosen as the stack pointer), 32 floating-point registers, and a register that stores the result flags of integer operations, which, as already noted, can hold up to 8 four-bit slots of such flags. For floating-point numbers there is also a special register for flags and other bit values. There are also special registers, the number of which can be more than 1000! Only a few of these special registers are available for non-system programming, and some of them may even have different purposes on different PowerPC variants! Three special registers are necessary for application programming: the exception register (it stores, in particular, the carry and both overflow flags), the link register LR (needed to call subroutines, like the similar register in the IBM/370, POWER, ARM and other architectures), and the counter register. The last two are effectively an addition to the general-purpose registers (GPRs). Interestingly, the counter register can also be used as an address. You may notice the absence of a PC (program counter) register in the PowerPC architecture, which is really quite unusual. The status register is a system register and is therefore not available for application programming. There are many other system registers in the PowerPC. Keep in mind that the system registers and the special registers are completely different sets of registers.

The PowerPC instruction set is quite large and rather atypical for RISC architectures. Strictly speaking, though, it follows from the name POWER that this architecture uses RISC only as one element that increases performance, while incorporating other elements too. At the same time, despite the abundance of instructions and registers, the PowerPC has very simple addressing modes, which is consistent with the RISC ideology. Instructions usually have three operands. You can use registers and 16-bit constants, which compares favorably with the ARM architecture, where constants are only 8-bit, or IBM/370, where they are 12-bit. For addressing, you can use either an address in a register with an explicit 16-bit offset (ARM uses 12-bit offsets), or indexed addressing, which is the sum of two registers. Using register 0 makes it possible to use just an explicit 16-bit address or an address in a single register. You may notice that there is no PC-relative addressing for data.
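The two addressing modes, together with the special role of register 0, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the register-file list are hypothetical, and only the address calculation is modeled.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Sign-extend a 16-bit field to a Python integer."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def ea_displacement(regs, ra, d):
    """Effective address for the 'register + 16-bit offset' mode.

    When the base register number is 0, the value zero is used instead
    of the contents of register 0.
    """
    base = 0 if ra == 0 else regs[ra]
    return (base + sign_extend16(d)) & MASK32

def ea_indexed(regs, ra, rb):
    """Effective address for the 'sum of two registers' mode."""
    base = 0 if ra == 0 else regs[ra]
    return (base + regs[rb]) & MASK32
```

Choosing register 0 as the base thus degenerates the first mode into a plain 16-bit address and the second mode into an address held in a single register.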

Arithmetic instructions are distinguished by several unusual features. For example, along with the usual additions, there is an addition of a 16-bit constant that is treated as the high halfword, and there are additions of 0 or -1 together with the carry. The last two instructions use only two operands. Similar variations exist for subtraction, but only in the reverse form, which subtracts the first source operand from the last one.
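The addition of a high halfword is mostly useful for building full 32-bit constants from two 16-bit pieces. Below is a hedged Python sketch of the arithmetic involved; the function names are illustrative, and I assume that both 16-bit immediates are sign-extended, so the high part has to be adjusted when the low part is negative.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Sign-extend a 16-bit field to a Python integer."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def addis(base, imm16):
    """Add a 16-bit constant treated as the high halfword."""
    return (base + (sign_extend16(imm16) << 16)) & MASK32

def addi(base, imm16):
    """Add a sign-extended 16-bit constant."""
    return (base + sign_extend16(imm16)) & MASK32

def split_constant(value):
    """Split a 32-bit value into (high, low) parts for addis + addi.

    The low part is returned as a signed number; the high part is
    increased by one when the low part is negative.
    """
    value &= MASK32
    lo = sign_extend16(value)
    hi = ((value - lo) >> 16) & 0xFFFF
    return hi, lo
```

For instance, 0x12348765 splits into the high part 0x1235 and a negative low part, because the low halfword 0x8765 has its top bit set.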

Arithmetic instructions don't change flags by default. To generate flags, you need to use special versions of the instructions, and for comparison instructions you specify in which of the 8 slots the flags should be placed (slot 0 is used by default for integers and slot 1 for floating-point). Special variants of the instructions are also needed to generate or to use the carry, and yet another special version is needed to generate the overflow. In general, an instruction can have 12 variants that differ only in the way they work with flags! Let's consider all variants of the addition instructions:
1) ADD ADD. ADDO ADDO. ADDC ADDC. ADDCO ADDCO. ADDE ADDE. ADDEO ADDEO. (the main 12);
2) ADDI ADDIS ADDIC ADDIC. (for working with a 16-bit constant);
3) ADDME ADDME. ADDMEO ADDMEO. ADDZE ADDZE. ADDZEO ADDZEO. (two operand, for working with the carry and constants 0 and -1).

As you can see, not all possible variants of the instructions actually exist. For example, there are no instructions for addition with the carry that do not generate the carry. In total, you can count 24 mnemonics for addition instructions. For subtraction there are slightly fewer, only 21. For comparison, the x86 architecture has only two mnemonics for addition: ADD and ADC. ARM has 4, which correspond to PowerPC mnemonics: ADD, ADDS, ADC and ADCS.
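The carry-generating and carry-consuming variants exist mainly for multiword arithmetic. A minimal Python sketch of a 64-bit addition built from 32-bit halves follows; the helper functions only imitate the arithmetic of ADDC and ADDE and are not a full model of these instructions.

```python
MASK32 = 0xFFFFFFFF

def addc(a, b):
    """Add two 32-bit values and return (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def adde(a, b, carry_in):
    """Add two 32-bit values plus the carry and return (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit addition: low words first, then high words with the carry."""
    lo, carry = addc(a_lo, b_lo)
    hi, carry = adde(a_hi, b_hi, carry)
    return hi, lo, carry
```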

There are four instructions for integer comparison, and 32 if you take the choice of slot into account! Most other processors make do with a single CMP instruction instead of this variety.

There are a record number of instructions for working with bitwise logical operations. In addition to the typical AND, OR, XOR and NOT, there are also all other possible bitwise logical operations: NAND, NOR, equivalence, etc. In the AND and OR operations, one of the arguments can be inverted – these are the ANDC and ORC instructions. The name of the second instruction turned out to be very sonorous. The letter C here stands for Complement, but in the addition and subtraction instructions, the same letter stands for Carry, which can be a little confusing. For each bitwise logic instruction with registers, there are variants with and without setting flags. In addition, for the AND, OR and XOR operations, you can use a 16-bit constant in two versions as an operand, as well as for addition or subtraction. Some operations with constants, as in cases of addition and subtraction, can only be used in the variant with flags set, and some only without. For AND and OR operations, you can count six different mnemonics each. In other assemblers, one or two are usually enough.
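For reference, the less common bitwise operations can be written out explicitly. The Python functions below are a plain sketch of the 32-bit results only (no flags), named after the corresponding mnemonics.

```python
MASK32 = 0xFFFFFFFF

def andc(a, b):
    """AND with the second operand complemented."""
    return a & ~b & MASK32

def orc(a, b):
    """OR with the second operand complemented."""
    return (a | ~b) & MASK32

def nand(a, b):
    return ~(a & b) & MASK32

def nor(a, b):
    return ~(a | b) & MASK32

def eqv(a, b):
    """Equivalence: the complement of XOR."""
    return ~(a ^ b) & MASK32
```

Note that NOR with both source operands equal gives a plain NOT, so a separate NOT instruction is not strictly required.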

Multiplication on the PowerPC is similar to multiplication on the ARM: one multiplication gives only half of the result. But unlike the ARM, the PowerPC has multiplication instructions for both halves of the result. As usual, there are variants with and without setting flags. There are also variants with and without setting the overflow flag. For the high half of the result there are again two variants, for signed and unsigned multiplication. In total, we have 8 different names for the multiplication operations.
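The split of a product into two halves can be illustrated with a short Python sketch. The function names follow the mnemonics MULLW, MULHW and MULHWU, but the code is only an arithmetic model under my own simplifications, without flags.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mullw(a, b):
    """Low 32 bits of the product (the same for signed and unsigned)."""
    return (a * b) & MASK32

def mulhwu(a, b):
    """High 32 bits of the unsigned 64-bit product."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def mulhw(a, b):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32
```

A full 64-bit product therefore takes two instructions: one for the low half and one, signed or unsigned, for the high half.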

Division on the PowerPC differs from this operation on other processors in that it produces only the quotient. You have to multiply and subtract to get the remainder. As with multiplication, there are 8 variants of division. Interestingly, division by zero can be detected through the overflow flag. With signed division, one more undefined case is possible: the largest negative number divided by -1.
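The divide, multiply and subtract sequence for the remainder can be sketched in Python as follows. This is an illustrative model with hypothetical helper names: the quotient is truncated toward zero, and the two undefined cases are reported through a separate overflow indicator instead of a real flag register.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def divw(a, b):
    """Signed division: return (quotient, overflow).

    The quotient is None in the two undefined cases: division by zero
    and the most negative number divided by -1.
    """
    a, b = to_signed(a), to_signed(b)
    if b == 0 or (a == -2**31 and b == -1):
        return None, 1
    q = abs(a) // abs(b)          # truncate toward zero
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32, 0

def remainder(a, b):
    """Remainder obtained as a - (a / b) * b, or None if undefined."""
    q, overflow = divw(a, b)
    if overflow:
        return None
    return (a - q * b) & MASK32
```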

There are some rather special instructions in the PowerPC instruction set, such as counting the number of leading zeros in a word. Ordinary shifts and rotations on the PowerPC have become special cases of more unusual and complex operations that use a bit mask (bits are numbered from the most significant one, which is bit 0). Thanks to this mask it is possible, in particular, to move any bit sequence within a word. For example, the RLWINM R2,R3,4,0,31 instruction simply rotates the value of register 3 to the left by 4 bits and puts the result in register 2; RLWINM R2,R3,2,0,29 makes a left shift by 2; RLWINM R2,R3,30,2,31 makes a logical right shift by 2; RLWINM R2,R3,4,0,6 takes 7 bits starting at position 4 of register 3 and moves them to the beginning of register 2, with all other bits of register 2 set to zero; RLWINM R2,R3,11,25,31 does the same but moves these 7 bits to the end of register 2. The same instruction can simply set bit sequences to zero: for example, RLWINM R2,R2,0,5,5 sets all bits of R2 to zero except the 5th, and RLWINM R2,R2,0,6,4 sets only bit 5 to zero. The mask can also be used not to zero the unselected bits, but to keep them unchanged in the result register, which implements bit insertion: for example, RLWIMI R2,R3,30,2,8 takes the first 7 bits of register 3 and inserts them at position 2 of register 2. You can only rotate to the left, but thanks to the barrel shifter, replacing right rotations with left rotations does not slow down the calculations.
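The examples above are easy to check with a small model. Here is a minimal Python sketch of the rotate-and-mask and rotate-and-insert operations; the function names mirror the mnemonics, the arguments are the shift amount and the first and last mask bits, and bit 0 is taken to be the most significant bit.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0..31)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """Ones from bit mb through bit me, wrapping around when mb > me."""
    from_mb = MASK32 >> mb
    up_to_me = (MASK32 << (31 - me)) & MASK32
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)

def rlwinm(rs, sh, mb, me):
    """Rotate left, then AND with the mask."""
    return rotl32(rs, sh) & mask(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left, then insert under the mask into the old value of ra."""
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)
```

With x = 0x12345678, rlwinm(x, 4, 0, 31) is a plain rotation, while rlwinm(x, 0, 6, 4) uses a wrapped mask that clears only bit 5.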

For logical shifts there are, for some reason, 4 instructions that duplicate what RLWNM can do, which is very strange for a RISC architecture. For arithmetic shifts to the right, special instructions are provided, and among all the shift and rotation instructions only they set the carry flag, although they do it quite differently from what intuition might suggest. The PowerPC has no instructions that rotate through the carry, or that even just set the carry in the typical way. However, the bit insertion operations make it possible to implement multiword shifts and rotations much more efficiently than through the carry.
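As far as I can tell, the carry rule of the arithmetic right shifts is this: the carry is set only when the source is negative and at least one 1 bit is shifted out. The Python sketch below models this rule with illustrative function names and shows what it seems to be designed for: adding the carry afterwards turns the shift into a signed division by a power of two that rounds toward zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def srawi(value, n):
    """Arithmetic right shift by n bits: return (result, carry)."""
    x = to_signed(value)
    lost_bits = x & ((1 << n) - 1)
    carry = 1 if x < 0 and lost_bits != 0 else 0
    return (x >> n) & MASK32, carry

def addze(value, carry):
    """Add the carry to a register (the carry out is ignored here)."""
    return (value + carry) & MASK32

def div_pow2(value, n):
    """Signed division by 2**n with rounding toward zero."""
    shifted, carry = srawi(value, n)
    return addze(shifted, carry)
```

For example, shifting -7 right by one bit gives -4 with the carry set, and adding the carry yields -3, the same quotient a signed division by 2 would produce.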

Memory operations on the PowerPC are divided into two groups: those for individual registers and those for groups of registers. The instructions of the first group are very simple and convenient: you can load a byte, halfword, or word into a register, or store a byte, halfword, or word from a register into memory. A halfword is extended with its sign or with zeros when loaded into a register, and a byte only with zeros. An interesting feature of the PowerPC is addressing with update: the computed effective address (the base register plus the offset or index) is used for the memory access and is then written back to the base register. However, unlike ARM, there is no way to do the memory operation first and modify the address afterwards. The PowerPC instructions for loading and storing words and halfwords with reversed byte order are very interesting.
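The effect of the byte-reversed loads and stores is easy to show in Python. The sketch below assumes the usual big-endian mode of the processor and models memory as a bytearray; the function names follow the mnemonics LWZ, LWBRX, LHBRX and STWBRX, but take a ready-made address instead of registers.

```python
MASK32 = 0xFFFFFFFF

def lwz(memory, addr):
    """Ordinary word load (big-endian byte order)."""
    return int.from_bytes(memory[addr:addr + 4], "big")

def lwbrx(memory, addr):
    """Word load with reversed byte order."""
    return int.from_bytes(memory[addr:addr + 4], "little")

def lhbrx(memory, addr):
    """Halfword load with reversed byte order, zero-extended."""
    return int.from_bytes(memory[addr:addr + 2], "little")

def stwbrx(memory, addr, value):
    """Word store with reversed byte order."""
    memory[addr:addr + 4] = (value & MASK32).to_bytes(4, "little")
```

Such instructions make it cheap to read and write little-endian data, for example file formats or device registers that originated on x86 systems.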

Group loads and stores are done clumsily on the PowerPC, with much less flexibility than on ARM, the 68k, or even the ancient IBM/360. The PowerPC only allows you to load or store all registers in a row from a given one to the last one, i.e. R31. Perhaps because of the inconvenience of these instructions, they may be implemented in such a way that they execute slower than an equivalent sequence of individual loads and stores! The PowerPC also has even more exotic string (byte-by-byte) load and store instructions, which can load unaligned byte sequences into a group of registers and store byte sequences to any address in memory. Because of its superscalar architecture, the PowerPC also has special instructions for synchronizing memory accesses.
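The "from this register to R31" restriction can be modeled in a few lines of Python. This is an illustrative sketch: the function names follow the mnemonics STMW and LMW, the register file is a plain list, memory is a bytearray, and the address is passed in directly rather than computed from a base register.

```python
def stmw(regs, memory, first, addr):
    """Store registers first..31 as consecutive big-endian words."""
    for number in range(first, 32):
        memory[addr:addr + 4] = (regs[number] & 0xFFFFFFFF).to_bytes(4, "big")
        addr += 4

def lmw(regs, memory, first, addr):
    """Load registers first..31 from consecutive big-endian words."""
    for number in range(first, 32):
        regs[number] = int.from_bytes(memory[addr:addr + 4], "big")
        addr += 4
```

There is no way to pick an arbitrary subset of registers, as a register list on ARM or the 68k allows; the only parameter is the first register.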

Perhaps the biggest surprise when getting acquainted with the PowerPC instruction set is the great complexity of the branch instructions. They are usually used in simplified rather than complete forms. On the PowerPC such instructions are processed by a separate unit, in parallel with the execution of other instructions. In branch instructions, unlike other instructions, you can use not only absolute but also relative (PC-relative) addressing. For unconditional branches the explicit address is 26 bits wide, which is quite good for relative jumps, but for absolute jumps it means that only 64 MB of memory can be reached (the lowest and the highest 32 MB, because the address is sign-extended). If an absolute jump beyond this range is needed, the branch address must first be placed in a special register, the link register or the counter register. There are no branches to the contents of a GPR. Moreover, a special register can only be loaded from a GPR, which makes two additional operations necessary to perform an absolute jump to an arbitrary memory location.

The biggest problem is related to conditional branches. The explicit address here is only 16 bits wide (a 14-bit field with two implied zero bits), which is quite good for relative jumps, but almost meaninglessly small for absolute ones. Therefore, instead of the latter, one should usually use conditional branches through the contents of the special registers, with all the overhead described above. Almost all conditional branches can use the counter register, which is decremented and compared to zero; this creates one of the two conditions for the jump. The second condition is determined by one of the bits in the flag register, i.e. you can select any bit in any of the eight slots. You can combine both conditions arbitrarily and, in particular, ignore one or even both of them. In the latter case we get a short unconditional jump, a useless instruction that duplicates the long jump. In addition, you can give a hint to the branch predictor, but this feature does not work on all processors. Interestingly, a jump through the counter register can also be encoded when this register is used to generate the branch condition; it is hard to believe that this can be of any use in practice. It is also interesting that subroutine calls can be made through the link register, i.e. we call the subroutine at the address in this register and store the return address in the same register. Conditional jumps to the contents of the special registers allow conditional returns from subroutines, which is often very useful.
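The way the two conditions are combined can be sketched in Python. The function below is a simplified model of the branch decision only; its name is hypothetical, and the meaning of the five control bits (ignore the flag bit, the wanted value of the flag bit, do not touch the counter, branch on a zero counter, prediction hint) follows my reading of the architecture manuals.

```python
MASK32 = 0xFFFFFFFF

def branch_decision(bo, bi, cr, ctr):
    """Return (taken, new_ctr) for a conditional branch.

    bo  -- 5-bit field that says how to combine the two conditions
    bi  -- number 0..31 of the flag bit to test (bit 0 is the leftmost)
    cr  -- 32-bit value of the flag (condition) register
    ctr -- 32-bit value of the counter register
    """
    ignore_cr = (bo >> 4) & 1
    wanted = (bo >> 3) & 1
    keep_ctr = (bo >> 2) & 1
    on_zero = (bo >> 1) & 1      # the lowest bit of bo is the hint, unused here
    if not keep_ctr:
        ctr = (ctr - 1) & MASK32
    ctr_ok = bool(keep_ctr) or ((ctr != 0) != bool(on_zero))
    cr_bit = (cr >> (31 - bi)) & 1
    cond_ok = bool(ignore_cr) or cr_bit == wanted
    return ctr_ok and cond_ok, ctr
```

With bo = 0b10000 the flag bit is ignored and the branch is taken while the decremented counter is not zero, which is the classic loop instruction; with bo = 0b10100 both conditions are ignored and the branch is always taken.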

The PowerPC, unlike ARM, had strong system-level support for memory and cache management from the beginning. It supports paged virtual memory and multitasking. In addition, the idea of moving to a 64-bit architecture was built into the PowerPC from the start, so for this processor such a transition is theoretically a completely natural process, unlike for x86 or ARM.

The first PowerPC processor (the 601) inherited several dozen instructions from the POWER architecture that were abandoned in subsequent versions of the PowerPC. However, Mac OS continued to support these instructions through emulation. In general, the PowerPC instruction codes have their counterparts in the POWER architecture, but a number of instructions are defined somewhat differently in the two architectures. Interestingly, despite this correspondence, the instruction syntax for POWER and PowerPC is for some reason completely different in most cases.

The PowerPC architecture natively supports operations with floating-point numbers. Moreover, integer and floating-point calculations run in parallel, which makes it possible, for example, to speed up integer code by using floating-point registers and data movement instructions for integer data. Since floating-point arithmetic has not been reviewed for other systems, no exception will be made for the PowerPC. We can only mention that the precision of calculations on the math coprocessors for x86 and 68k is slightly higher than on the PowerPC, and that this created some problems when switching from 68k to PowerPC.

It is interesting to compare the capabilities of the first PowerPC and ARM processors. The PowerPC has several obvious advantages. The main one is a significantly larger number of GPRs. Other benefits include built-in memory management, a built-in math coprocessor, parallel execution of instructions, hardware division, masked rotation instructions, indexed addressing, larger constants and jump offsets, and support for halfwords. But ARM also has noticeable advantages of its own: very flexible use of the barrel shifter makes many calculations much faster, the instructions for group loading and storing of registers are much more convenient, general PC-relative addressing is available, power consumption is significantly lower, the conditional execution of all operations lets you avoid some of the jumps and write faster and more compact code, and there is a multiply-accumulate instruction. Modern 32-bit ARM processors also support halfwords, full multiplication, and memory management.

In conclusion, here is a brief list of the most famous PowerPC processors. The 601 appeared in 1993 and was used in the first Power Macintosh. The 603 appeared a little later. It is slower than the 601 at the same frequency, but simpler, which made higher-frequency versions possible. The 604 appeared in 1994; it has advanced superscalar capabilities, with 6 independent execution units, whereas the 601 has only 3. The 620 was delayed until 1997 and was the first 64-bit PowerPC. However, it turned out to be slower than the 603 and therefore found only very limited use. The most successful PowerPC processors were variants of the 750, which first appeared in 1997. Apple called this processor the G3. It is a 32-bit processor: the transition to 64 bits at Apple did not take place immediately. The 7400 was produced from 1999 and supported vector mathematics; Apple called it the G4. It was only in 2002 that the 970 finally appeared, which became the first 64-bit processor to be used in mass-produced personal computers. At Apple this processor received the designation G5. And this is where the history of the PowerPC in personal computers actually came to an end. The breakup of AIM happened in 2004. Interestingly, Apple began to prepare the transition to the x86 architecture back in the late 90s.

The main purpose of this review was to acquaint readers with the typical features of the first PowerPC processors. I will be glad to receive additions and critical remarks. Thanks for reading.

