Emotional stories about processors for first computers: part 3 (Motorola 68k)

For some time, Motorola was the only company that could successfully compete with Intel in the production of processors for personal computers. In 1980, Motorola actually put Intel into a crisis, which Intel could escape only by mobilizing all its forces and organizing the Crush group to fight its competitors, a group whose actions somewhat violated US antitrust laws.

The 68000 was released in 1979 and at first glance looked much better than the 8086. It had 16 32-bit registers (more accurately, even 17), a separate program counter and a status register. It could address 16 MB of memory directly, which removed any restrictions on, for example, large arrays. However, careful analysis of the 68000's features shows that not everything was as good as it seemed. In those years, having more than 1 MB of memory was an unattainable luxury even for medium-sized organizations. The code density of the 68000 was worse than that of the 8086, which means that 68000 code with the same functionality occupied more space. This is partly because the length of any 68k instruction must be a multiple of 2 bytes, whereas for the x86 it is a multiple of 1 byte. But the information about code density is controversial, as there is evidence that in some cases the 68000 could have better code density. Of the 16 registers of the 68k, 8 are address registers, which on the one hand are slightly more advanced analogues of the x86 segment registers and, on the other hand, are in some ways a direct legacy of register handling on 8-bit architectures (the 8080, 6502, Z80, 6809, etc.), where there is also a division into data and address registers. The main thing is that data registers cannot be used as addresses and the set of operations on address registers is limited, which is somewhat inconvenient. The ALU and data bus are 16-bit, so operations on 32-bit data are slower than one might expect. Moreover, because of the contrived big-endian byte order, addition and subtraction of 32-bit numbers are performed with additional overhead. A register-to-register operation takes 4 cycles on the 68000 but only 2 on the 8086. The interrupt latency of the 68000 may reach 378 cycles, which is quite a lot. 
Computers based on the 68000 until the mid-80's were much more expensive than those based on the Intel 8088, but the 68000 could not work with virtual memory and did not have hardware support for working with real numbers, which made it unsuitable for use in the most advanced systems. To support the use of virtual memory, another processor was required, usually another 68000 was used for this. In 1982, as Bill Joy noted, Sun began developing their processor because the 68000 did not meet customer needs for performance, and especially for working with real numbers. Interestingly, because of the large size of the 68000, Motorola people called it the "Texas cockroach".

However, the larger size allowed more functionality to be squeezed in. For example, the 68000 can handle up to 8 external interrupt sources without an external controller. The x86 architecture, by contrast, requires an external interrupt controller (the only, very special, exception is the 80186). The data and address buses of the 68000 are not multiplexed, unlike those of the 8086. Intel abandoned such multiplexing, which slows computer systems down, only with the 80286. The 68000 also has privileged instructions, which are necessary for multitasking support. The x86 likewise got such instructions only with the 80286. On the other hand, privileged instructions alone are not enough to support multitasking fully, so their presence in the 68000, in contrast to the 80286, looks almost useless. Of course, it is surprising that Motorola did not produce any memory management hardware for the 68000, so workstation manufacturers had to invent such hardware themselves. This is even more surprising because support for the simplest segmented memory organization is very easy to implement in hardware. But this topic is not simple: the dividing line between workstations and personal computers was precisely the presence of hardware memory management. The presence of such a simple and cheap device radically changed the class of the computer and its price!

As always with products from Motorola, the architecture of the 68000 shows some clumsiness and far-fetched oddities. For example, there are two stacks (the Coldfire processor, which is the most popular 68k processor today, has only one stack) and two carry flags (one for condition checks and another for operations). Despite the presence of two carry flags, the add-with-carry and subtract-with-carry instructions support only two addressing modes, which makes them less convenient than such operations on the x86 – the 68000 thus retains some of the clumsiness of such operations inherent in the IBM/370 and, in part, the PDP-11. The oddities with the flags do not end there. For some reason many instructions, including even MOVE and SWAP, zero the carry and overflow flags. Another oddity is that the instruction for saving the state of the arithmetic flags, which worked normally on the 68000, was made privileged in all processors starting with the 68010. In particular, this made it impossible to use the same operation to save flags on the 68000 and on the later 68k processors. Thus, on the 68k you cannot save flags in one uniform way, as you can on the x86 (with the PUSHF or LAHF instructions)! Motorola should not have made the MOVE from SR instruction privileged; instead it should simply have changed its description so that it would not return information about system flags in user mode, and added a privileged instruction specifically to read those flags. Some operations are irritatingly unoptimized: for example, the CLR instruction, which writes zero to memory, is slower than writing a constant 0 to memory with the MOVE instruction, and a shift to the left is slower than adding an operand to itself. In addition to writing to memory, CLR pre-reads it for no reason, which can create problems when working with ports. Even the address registers, while seemingly superior to the 8086 segment registers, have a number of annoying disadvantages. 
For example they need to load as much as 4 bytes instead of 2 for the 8086 and of these four, one was extra. The 68000 command system reveals many similarities with the PDP-11 command system developed back in the 60's although some addressing methods and byte order are almost certainly taken from the IBM/370. A complicated exception handling system is probably borrowed from the PDP-11, when a command tries to continue its execution if it fails – this made accelerating 68k processors more difficult than x86, where the command is simply restarted in such cases. Addressing via base and index registers is done on the 68000 at the level of the 8-bit 6800 or Z80, with a single-byte offset – this is somehow completely impractical. On the 8086 offsets are 16 bits, on the ARM or IBM/370 – 12. Even the 6502 can use a 16-bit offset along with indexing. The normal offset size was only supported since the 68020. Surprises with the 68k can occur with the unusual loop instruction for which we should pass not the number of repetitions, but the number one less. The 68k still lacks an instruction like TEST for the 8086. Despite the great capabilities of Motorola, the 68000 was originally made using a relatively old technology, which was inferior even to the one used in the production of the 6502. Incidentally, one important reason for the general collapse of the 68k was Motorola's inability to quickly mass produce the 68000 and 68020.
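
The surprise with the loop instruction can be illustrated with a small model. This is only a sketch in Python, assuming the usual DBRA behaviour (the low word of the counter register is decremented after each pass and the loop ends when it reaches -1); the function name dbra_iterations is made up for illustration and is not part of any real tool.

```python
def dbra_iterations(initial_count):
    """Count how many times a loop body runs when it is closed by DBRA Dn,label.

    Hypothetical helper: models only the counter, not the real instruction.
    """
    counter = initial_count & 0xFFFF      # DBRA uses only the low 16 bits of Dn
    iterations = 0
    while True:
        iterations += 1                   # the loop body runs once per pass
        counter = (counter - 1) & 0xFFFF  # DBRA decrements the counter word
        if counter == 0xFFFF:             # the word became -1: fall through
            break
    return iterations
```

So to repeat a body 10 times the counter must be loaded with 9, and loading 0 still runs the body once.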

To the list of unexpected inconveniences, one can add oddities in working with word-sized data. When a word is loaded into an address register, it is sign-extended to a double word; when it is loaded into a data register, this does not happen, yet the MOVEM instruction sign-extends for data registers too! Word-sized operations with address registers (CMP, ADDA, SUBA) sign-extend only the first operand; the second is always taken as is, in its full two words. The latter may make it more difficult to use address registers to store non-address word values, such as a counter.
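
The word-size rules described above can be put side by side in a rough Python model. The function names are invented for this illustration, registers are represented as plain 32-bit integers, and flags are ignored entirely.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_16(word):
    """Extend a 16-bit value to 32 bits, copying bit 15 into the upper word."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if (word & 0x8000) else word

def movea_w(word):
    """MOVEA.W src,An: the whole address register gets the sign-extended word."""
    return sign_extend_16(word)

def move_w(old_data_register, word):
    """MOVE.W src,Dn: only the low word changes, the upper word is kept."""
    return (old_data_register & 0xFFFF0000) | (word & 0xFFFF)

def movem_w_load(word):
    """MOVEM.W from memory: sign-extends for data and address registers alike."""
    return sign_extend_16(word)

def adda_w(address_register, word):
    """ADDA.W src,An: the source word is sign-extended, An is used in full."""
    return (address_register + sign_extend_16(word)) & MASK32
```

For example, loading the word 0x8000 gives 0xFFFF8000 in an address register, but a data register that held 0x12345678 becomes 0x12348000.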

One cannot help but be surprised that the 68000 instruction set has two different ways to call subroutines – this is a unique oddity of the 68k architecture. The BSR.W addr instruction is absolutely identical in functionality, size and timing to the JSR addr(PC) instruction. Likewise, the instruction set has two ways of making an unconditional jump, which are also absolutely identical: BRA.W addr and JMP addr(PC). The BSR and BRA instructions make some sense only because of their short versions. However, BSR.S can be used relatively rarely, for example, for small recursive subroutines. And, in any case, why support completely useless long versions of these instructions?! There are other almost unnecessary instructions: for example, there are both arithmetic and logical left shifts, which actually do the same thing. By the way, shifts and rotations of memory operands can be done only by a single position and only with 16-bit data – bytes and 32-bit values are not supported even on the 68020 and later processors! It may also be surprising that there is the scarcely useful absolute short addressing mode – it would have been better to provide relocatable addressing for the destination operand instead.
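
How closely the duplicated instructions mirror each other can be seen from their encodings. The following Python sketch assembles the four instructions by hand; the opcode words are the standard 68000 ones as far as I know, while the helper name encode_pc_relative is made up for this illustration. In all four cases the 16-bit displacement is counted from the address of the instruction plus 2.

```python
import struct

# First instruction word of each form (the second word is the displacement)
BSR_W = 0x6100   # BSR.W addr
JSR_PC = 0x4EBA  # JSR addr(PC)
BRA_W = 0x6000   # BRA.W addr
JMP_PC = 0x4EFA  # JMP addr(PC)

def encode_pc_relative(opcode, instruction_address, target):
    """Return the 4 bytes of a PC-relative call or jump (hypothetical helper)."""
    displacement = target - (instruction_address + 2)
    if not -0x8000 <= displacement <= 0x7FFF:
        raise ValueError("target is out of the 16-bit displacement range")
    # Big-endian: opcode word, then the signed 16-bit displacement
    return struct.pack(">Hh", opcode, displacement)
```

Both members of each pair occupy 4 bytes and carry exactly the same displacement word; only the first word differs.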

Code for the 68k typically looks somewhat more cumbersome and clumsy than code for the x86 or ARM. This is largely due to the abundance of S, B, W and L suffixes, unique to the 68k, in its assembler instructions. For example, you can write such a strange and useless instruction as MOVE.L D0,(A0,D0.W), which means writing 32 bits of data from register D0 to the address obtained by adding all 32 bits of register A0 and the low 16 bits of register D0.
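
The address calculation for an operand like (A0,D0.W) can be sketched as follows. This is an illustrative Python model, not an emulator: the function names are invented, and it assumes the 68000 rules as I understand them, namely that the low word of the index register is sign-extended and that the optional displacement is a signed byte.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def indexed_address(a0, d0, displacement=0):
    """Effective address of d8(A0,D0.W): base + signed index word + signed byte."""
    index = sign_extend(d0, 16)          # the upper word of D0 is ignored
    offset = sign_extend(displacement, 8)
    return (a0 + index + offset) & 0xFFFFFFFF
```

With A0 = 0x1000 and D0 = 0x00010004 the target address is 0x1004: the upper word of D0 plays no part in the address, even though all 32 bits of D0 are stored there.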

On the other hand the 68000 is generally faster than the 8086 at the same frequencies, according to my estimates by about 10-40%. And with the intensive use of 32-bit data or large arrays, the 68000 can even outperform the 8086 several times. In addition, variants of the first 68k could run at more than twice the frequencies available for x86! The 680x0's code also has its inherent special beauty, elegance and less mechanicality than the x86's. Additionally as shown by eab.abime.net experts, the code density of the 68k is often better than that of the x86. The 68000, like the ARM or VAX can use PC as a base, which is very convenient. The x86 and even the IBM/370 can't do this – support for such addressing appeared only in the x86-64. Although it is worth noting that PC addressing on the 68k is available only for the source operand, it does not work for the destination operand or even one-operand instructions (like TST or NOT), which makes it almost useless. Problems with code relocatability made, for example, the need to divide the code into segments no larger than 32 KB under Macintosh OS. Having more registers is also a significant advantage for the 68000 compared to the 8086, although this advantage is only shown when processing 16- and 32-bit data due to the inability of the 68000 to quickly use separate bytes of a 16-bit word. The increment and decrement operations are very good for the 68k, they allow you to use a step from 1 to 8 – the step is always 1 in the x86 and most other known architectures. The 68k, unlike the x86, can load words only from even addresses, so byte operations with a standard stack work atypically, the stack pointer changes by 2, not 1. For stacks organized through registers A0-A6, this does not happen. 
The 68k has a very flexible and convenient MOVEM instruction that allows you to save or restore any set of registers – there is a similar instruction for the ARM, but on the x86 you have to use many instructions to save or restore individual registers for such operations. However MOVEM occupies 4 bytes, so when you need to save or restore no more than three registers, the x86 code will be more compact. In addition, the x86 (since the 80286) also has a command for saving and restoring all registers at once, so in the general case the 68k advantage due to the presence of MOVEM is not very significant. The almost complete orthogonality of the 68k's MOVE instruction is also a pleasant feature – data can be transferred from different memory locations without using registers. But this command is an exception, other commands, for instance CMP, are not orthogonal. Another attractive feature of the 68k architecture is addressing modes with auto-increment and -decrement, which are not available on the x86. The user stack's independence from interrupts allows data to be used above the top of the stack, which is unthinkable on the 8086. This very odd and not recommended way of working with the stack is also available on the x86 in multitasking environments, where each task and the system have their own stacks.
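
The atypical behaviour of byte operations on the standard stack, mentioned above, comes down to one special case in the pre-decrement addressing mode. Here is a minimal Python sketch of it; the function name and the register numbering convention (7 stands for A7, the stack pointer) are assumptions made for this illustration.

```python
def predecrement(register_number, address, size):
    """New value of An after a -(An) access with an operand of `size` bytes.

    register_number: 0..7 for A0..A7 (A7 is the stack pointer)
    size: 1, 2 or 4 bytes
    """
    step = size
    if register_number == 7 and size == 1:
        step = 2  # the stack pointer is kept on a word boundary
    return (address - step) & 0xFFFFFFFF
```

A byte pushed through A7 therefore moves the pointer from 0x2000 to 0x1FFE, whereas the same operation through A0 moves it to 0x1FFF.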

Overall the 68000 is a good processor with a large instruction set. It was originally planned for use in minicomputers, not personal computers. It is somewhat ironic, therefore, that the last mass application of this processor was found in the second half of the 90's in calculators and pocket computers. However it is for the 68000 that the development of workstations by Sun, Apollo, HP, Silicon Graphics and later NeXT began. Apple, which made the workstation-class Lisa computer, could also be added to this list. The 68000 was used in many of the now legendary personal computers: the first Apple Macintosh computers that were produced before the mid-90s, the first Commodore Amiga multimedia computers, and in relatively inexpensive and high-quality Atari ST computers. The 68000 was also used in relatively inexpensive computers working with Unix variants, in particular in the rather popular Tandy 16B. It is also worth mentioning the fast and inexpensive Sage computers which for some time were the fastest personal computers in the world – their development was very dramatic. Interestingly IBM simultaneously led the development of the PC and the System 9000 computer based on the 68000, which was released less than a year after the PC.

The Apple Lisa – it's strange that the first 68000-based Apple computers (the Lisa and Macintosh) had black-and-white graphics, whereas the eight-bit Apple II computers had colors

This is a famous demo for the Amiga 1000, such graphics in 1985 seemed incredible fantasy. This is an image in GIF format, which allows you to show only 256 colors out of 4096 displayed by the real Amiga – other formats for full-color animated graphics have still not been well supported

The 68010 appeared clearly belatedly, only in 1982, at the same time as Intel released the 80286, which put personal computers on the same level as minicomputers. The 68010 is pin-compatible with the 68000, but its instruction set is slightly different, so replacing the 68000 with the 68010 did not become popular. This incompatibility was caused by the far-fetched aim of bringing the 68000 closer to the ideal theory of virtualization. Another almost useless innovation was the ability to relocate the interrupt vector table. The 68010 is only slightly faster than the 68000, by no more than 10%. In the 68010, a bug that prevented the use of virtual memory was finally fixed. Obviously, the 68010 lost badly to the 80286 and was even weaker than the 80186, which appeared in the same year. Like the 80186, the 68010 almost never found a use in personal computers.

The 68008 was also released in 1982, probably in the hope of repeating the success of the 8088. It is a 68000 with an 8-bit data bus, which allowed it to be used in cheaper systems. But the 68008, like the 68000, does not have an instruction queue, which makes it about 50% slower than the 68000. Thus the 68008 may even be a little slower than the 8088, which, thanks to its instruction queue, is only about 20% slower than the 8086. IBM asked Motorola to make the 68008 by 1980 but was refused, although, according to Motorola employees, it would have cost the work of one employee for less than a month. Had the refusal not occurred, IBM might have chosen the 68008 for the IBM PC.

Based on the 68008, Sir Clive Sinclair made the Sinclair QL, a very interesting computer that, because of its lower price, could compete with the Atari ST and similar computers. But in parallel, and clearly prematurely, Clive began to invest heavily in the development of electric vehicles, leaving the QL (Quantum Leap) as a rather secondary task. Together with some unsuccessful design decisions, this led the computer and the whole company to a premature end. The company became part of Amstrad, which refused to produce the QL.

It would be interesting to calculate the bit index for the 68000, which seems to me clearly higher than 16 although maybe it is not higher than 24.

Appearing in 1984 the 68020 again returned Motorola to the first position. In this processor many very interesting and promising innovations were realized. The strongest effect is certainly the instruction pipeline, which sometimes allows you to execute up to three instructions at once! The 32-bit address bus looked a little premature in those years, and therefore a cheaper version of the processor (the 68020EC) with a 24-bit bus was available, but the 32-bit data bus looked quite appropriate and allowed to significantly speed up the processor. The built-in cache appeared to be an innovation even though it had a small 256 bytes of capacity, which allowed it sometimes to significantly improve the performance because the main dynamic memory could not keep up with the processor. Although in the general case, such a small cache only slightly affected the performance. Quick enough operations for division (64/32 = 32,32) and multiplication (32*32 = 64) for approximately 80 and up to 45 cycles respectively were added. The timings of the instructions were generally improved for example the division (32/16 = 16,16) began to be performed for approximately 45 cycles (more than 140 cycles in the 68000). Some instructions in the most favorable cases can be performed without occupying clocks at all! New address modes were added in particular with scaling, in the x86 this mode appeared only in the next year with the 80386. Other new address modes allow the use of double indirect addressing using several offsets, the PDP-11 has been remarkably outdone here.
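
The notation used above for the new long operations (64/32 = 32,32 and 32*32 = 64) can be made concrete with a small Python model of the unsigned variants. The function names follow the mnemonics MULU.L and DIVU.L, but the code is only a sketch of the arithmetic: condition codes and the divide-by-zero trap are not modelled, and returning None on overflow is a convention chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def mulu_l(a, b):
    """32 x 32 -> 64: return the (high, low) 32-bit halves of the product."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32

def divu_l(high, low, divisor):
    """64 / 32 -> 32,32: return (quotient, remainder), or None on overflow."""
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("the real processor takes a trap here")
    dividend = ((high & MASK32) << 32) | (low & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        return None  # the quotient does not fit into 32 bits
    return quotient, remainder
```

For instance, dividing the 64-bit value 0x00000001_00000000 by 2 gives the quotient 0x80000000 with remainder 0, while dividing it by 1 overflows.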

Some new instructions for example bulky operations with bit fields or new operations with decimal numbers that have become little needed in the presence of rapid division and multiplication looked more like a fifth wheel of a bus than something essentially useful. Address modes with double indirect addressing theoretically look interesting but practically are needed quite rarely and are executed slowly. These modes, as well as the redundancy of flag generation, did not fit well into the coming era of multiscalar architectures. The ability to use 32-bit offsets in addressing was rather a premature innovation, since such large offsets were almost never required for memory volumes on systems before the mid-90s. Here again as in the case of the 68000, Motorola asked users to pay for the ability to work with such large memory sizes that could not actually be provided with hardware yet. Unlike the 80286 the 68020 takes time to compute the address of the operand, the so-called effective address. The division at the 68020 is still almost twice as slow as the fantastic division of the 80286. Multiplication and some other operations are also slower. Overall, the 68020 is noticeably slower than the 80286 for byte operations. On operations with 16-bit data the 68020 is only slightly slower and only on operations with 32-bit data the 68020 is clearly superior to the 80286. The 68020 doesn't have a built-in memory management unit and the rather exotic ability to connect up to eight coprocessors couldn't fix this. The chief architect of the 68000 himself admitted that too many addressing modes were made in the 68020 and that the result was therefore some kind of monster. They focused on the VAX and the ease of assembly programming, but the future came with RISC, higher speeds and powerful compilers. 
In addition, here's another quote from Bill Joy: "It became clear that Motorola was doing with their microprocessor line roughly the same mistakes that DEC had done with their microprocessor line, in other words, 68010 68020 68040, were getting more and more complicated. And they were slipping and they weren't getting faster anywhere near the rate that the underlying transistors were getting faster". It is also worth adding that a third stack (specifically for interrupts) was added to the 68020!

It is not surprising therefore that in the modern development of the 68k architecture almost all new instructions of the 68020 have been abandoned. This applies in particular to the Coldfire and 68070 processors used in embedded systems.

The 68020 was widely used in mass-market computers such as the Apple Macintosh II, Macintosh LC and Commodore Amiga 1200; it was also used in several Unix systems.

The appearance of the 80386 with a built-in and very well-made MMU and 32-bit buses and registers again put Motorola in position number 2. The 68030 appearing in 1987 for the last time briefly returned the leadership to Motorola. The 68030 has a built-in memory management unit and a doubled cache, divided into a cache for instructions and data, it was a very prospective novelty. The MMU of the 68030 does not slow down, as it did with the external MMU of the 68020. In addition the 68030 could use a faster memory access interface which can speed up memory operations by almost a third. However, in general, working with memory remained slow – 4 clock cycles per access, i.e. the number of clock cycles remained the same as for the 68000. It was even joked about as "Motorola's standard memory cycle". For comparison, the 80286 took 2 clock cycles, while the ARM or 6502 took 1. To be fair, it should be added that officially the memory access period for the 68020 and 68030 takes 3 cycles, but in many instructions it actually turns out to be rather closer to 4. Despite all the innovations the 68030 turned out to be somewhat slower than the 80386 at the same frequency. However the 68030 was available at frequencies up to 50 MHz, and the 80386 only up to 40 MHz, which made top systems based on the 68030 slightly faster. It can be surprising that the 68030 does not support several instructions of the 68020 (CALLM and RTM)! Shortcomings in the architecture of the 68k processors forced major manufacturers of computers based on these processors to look for a replacement. Sun started producing its own SPARC processors, Silicon Graphics switched to the MIPS processors, Apollo developed its own PRISM processor, HP started using its own PA-RISC processors, ATARI started working with custom RISC-chips, and Apple was coerced to switch to the PowerPC processors. Interestingly, Apple was going to switch to the SPARC in the second half of the 80's, but negotiations with Sun failed. 
One can only wonder how poorly the management of Motorola was working, as if they themselves did not believe in the future of their processors. Here we can also add that Motorola made a variant of the 68030 processor without an MMU! This option was used in the cheapest models of the Commodore Amiga 4000. Intel did not release such products, although the MMU was not needed for the then most popular DOS operating system.

The 68030 was used in Apple Macintosh computers, the Commodore Amiga 3000, Atari TT, Atari Falcon and some others.

With the 68040, Motorola once again tried to outperform Intel; this processor appeared a year after the 80486. However, the 68040's set of useful qualities was never able to surpass the 80486's. In fact, the Motorola 68k, having a more overloaded instruction set, was not able to keep supporting it and in a sense dropped out of the race. In addition, Motorola also participated in the development of the PowerPC, which was planned to replace the 68k, and this could not but affect the quality of the 68040's development. Only a very truncated floating-point coprocessor could be placed in the 68040, and the chip itself ran significantly hotter than the 80486. According to the results on lowendmac.com/benchmarks, the 68040 is only about 2.1 times faster than the 68030, which means that the 68040 is slightly slower than the 80486 at the same frequency, although on some tests the 68040 is significantly faster than the 80486. The 68040 found almost no application in popular computers. Only its cheaper version, the 68LC040, which does not have a built-in coprocessor, saw some noticeable use. However, the first versions of this chip had a serious hardware defect that made even software emulation of the coprocessor impossible! Perhaps this was done intentionally, since the Power Macintosh was not supposed to emulate instructions of the math coprocessor. But the main problem with the 68040 is that Motorola was never able to make a clock-doubled version of it, as Intel did with the 80486DX2 in 1992.

Motorola always had problems with mathematical coprocessors. As was mentioned above Motorola never released such a coprocessor for the 68000/68010, while Intel had released its very successful 8087 since 1980. Atari, however, found a way to turn on the coprocessor for the 68020/30 to the 68000 in ST-series computers, the coprocessor could be connected there as a memory-mapped device, which of course slowed it down and required the use of atypical codes. For the 68020/68030, it turned out to be too much, for them two coprocessors were produced at once: the 68881 and faster pin-compatible 68882. However, these coprocessors were not 100% compatible and in order to get a noticeable increase in performance from using the 68882, code had to be generated in a special way. Thus, codes generated for the 68881 executed only marginally faster on the 68882. The 68882 appeared later and cost much more than 68881. Interestingly, the built-in coprocessor of the PowerPC architecture has less accuracy than the 68881/82, so results of calculations with fractional numbers on the Power Macintosh are less qualitative than on the previous generation of Macintosh computers!

It is appropriate to say that the Intel x86 still has problems with the mathematical coprocessor. The accuracy of calculations of some functions, for example the sine of some arguments, is very small, sometimes no more than 4 digits. Therefore modern compilers often calculate such functions without using the services of the coprocessor.

Surprisingly, Motorola was still able to release a Pentium-class 68k processor, the 68060, in 1994. This processor also had problems with floating-point arithmetic. And most importantly, no popular system remained where the 68060 could find application except the Commodore Amiga, but Commodore went bankrupt in that same year, 1994. According to some conspiracy theories, Commodore went bankrupt in part because the 68060 could have competed with the PowerPC architecture that the Apple Macintosh computers had begun to use.

Motorola processors up to and including 1994 were generally quite comparable to the Intel x86, and in some important aspects they were always better. However, Intel, unlike Motorola, spent a lot of effort on retaining its customers and attracting new ones. Moreover, in the fight against its main competitor, Intel sometimes acted rather unkindly. For instance, it's hard to believe that a big review article in Byte magazine from 9/1985, which states about the 68000, without proof, that "compared to the 8086/8088, it required a massive software effort to get it to do anything", could have appeared outside the context of this struggle. On the other hand, Motorola did everything later and at a higher price than Intel. In addition, Motorola processors clearly lacked originality: too much was copied from DEC and IBM technologies. Among the reasons for the collapse of the 68k in particular and Motorola in general, a political reason is sometimes mentioned, namely the company's low investment in the Israeli economy, which made it very different from Intel. Of course, the failure of the 68k had complex causes, combining both weak strategic marketing and some architectural shortcomings.

But the story didn't end there. In 2015, when Motorola had long been a thing of the past, the Apollo Core 68080 was released!

For some reason, no one in the USSR even tried to clone processors of the 68k architecture (only the 68881 was cloned), although the Besta computer was developed on the basis of the 68020.

Motorola: from the 68000 to 68040