The first ARM processors

The ARM1 processor was an astonishing development. It continued the 6502 ideology (namely, to make a processor that is simpler, cheaper and better) and was released by Acorn in 1985, at the same time as Intel's technological miracle, the 80386, appeared. The ARM had about ten times fewer transistors, so it consumed significantly less energy, and yet it was much faster on average. Admittedly, the ARM had no MMU and not even multiply or divide instructions, so in some calculations based on division the 80386 could be faster. However, the advantages of the ARM were so great that today it is the most widely produced processor architecture: more than 100 billion such processors have been made.

Development of the ARM began in 1983, after Acorn had conducted research with the 32016 processor. It showed that in many calculations the 6502, running at half the clock frequency of the 32016, could be faster than what seemed to be a much more powerful processor. At that time the 80286 was already available and showed very good performance, but Intel, perhaps sensing the potential of Acorn, refused to provide its processor for testing. The 80286 technology was not restricted the way the 80386 technology was, and it was transferred to many companies, so history is still waiting for the details of this somewhat unusual refusal to be disclosed. Perhaps if Intel had allowed the use of its processor, Acorn would have used it and would not have developed the ARM.

The ARM was developed by only a few people, and they tested the instruction set using BBC Micro's Basic. The development itself took place in a building that used to be a barn. Interestingly, Bill Mensch, one of the main developers of the 6502, was the first to be offered the chance to make the ARM hardware. However, he immediately realized that the ARM was a competitor to the best developments of large companies and decided not to get involved, perhaps fearing that otherwise his company WDC would share the fate of MOS Technology. The processor was eventually made by VLSI.

The debut of the ARM was rather unsuccessful. In 1986 an ARM second processor for the BBC Micro was released under the name ARM Evaluation System. In addition to the processor it contained 4 MB of memory (a great deal for those years), which made this add-on a very expensive product: over 4000 pounds, or about $6000. Admittedly, compared with the computers of that time that had similar performance, this second processor was an order of magnitude, or even almost two orders of magnitude, cheaper. There were very few programs for the new system. This is a bit strange, because it was quite possible to port Unix to it: many Unix variants available at that time did not require an MMU, including variants for the 68000, PDP-11, 80186 and even 8088. Linux was ported to the Acorn Archimedes only in the 90's. Perhaps the delay in the appearance of a real Unix for the ARM was caused by Acorn's reluctance to transfer the ARM technology to other companies.

The first ARM-based system

Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. In addition to the ARM, Acorn tried to carry out an expensive development of business computers, which failed, in particular because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator computer was not very successful either. The development of the Master 512, a relatively successful but not fully IBM PC compatible computer, was very costly. In addition, a lot of money was spent on an unsuccessful attempt to enter the US market. The Italian company Olivetti, with its rather successful Intel 8086 and 80286-based computers, was allowed into that market as part of a hypothetical big game of absorbing Acorn itself. By the way, after the absorption of Acorn, Olivetti's role in the US market quickly faded away.

As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiplication instructions, on which the Archimedes personal computers were based. Their speed was stunning for the time. The first models of those computers became available in 1987. However, Olivetti's management was focused on IBM PC compatible computers and did not want to spend its resources on selling Acorn products. It is also surprising that the Archimedes did not replace the BBC Micro in English schools. Perhaps this was due to a failed deal with the USSR for Memotech MTX computers: Memotech received a million pounds from the British government and declared itself bankrupt after the deal fell through. After that, the government stopped supporting its computer manufacturers, including Acorn.

The ARM provides 16 32-bit registers. There are actually more of them if we count the registers reserved for system needs. One of the registers, R15 or PC, is the program counter, as in the PDP-11 architecture. Almost all operations take 1 clock cycle; more cycles are needed, in particular, for jumps, multiplications and memory accesses. Unlike popular processors of those years, the ARM has no dedicated hardware stack. If a stack is needed, it is implemented through one of the registers (R13 or SP is the conventional choice). Subroutine calls do not use the stack; instead the return address is stored in a register allocated for it (R14 or LR, the Link Register). This scheme obviously does not work by itself for nested calls, for which a stack has to be organized. A unique feature of the ARM is the combination of the program counter with the status register. The program counter is 26-bit, which allows up to 64 MB of memory to be addressed. Eight bits of this register are allocated for flags; two of them are obtained because the lower two bits of the address are not used, since instructions must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. The ARM's data processing instructions are 3-address.
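The combined program counter and status register can be illustrated with a small Python sketch. The exact bit positions used here (N, Z, C, V, I and F in bits 31 to 26, the word address in bits 25 to 2, and two processor mode bits in bits 1 to 0) are taken from general documentation of the 26-bit ARM processors, not from the text above, and the function and table names are purely illustrative.

```python
# A minimal sketch of the 26-bit R15 layout (assumed bit positions, see lead-in).
MODES = ("USR", "FIQ", "IRQ", "SVC")  # assumed encoding of the two mode bits

FLAG_BITS = (("N", 31), ("Z", 30), ("C", 29), ("V", 28), ("I", 27), ("F", 26))

def unpack_r15(r15):
    """Split a 32-bit R15 value into (pc, flags, mode)."""
    r15 &= 0xFFFFFFFF
    flags = {name: bool((r15 >> bit) & 1) for name, bit in FLAG_BITS}
    pc = r15 & 0x03FFFFFC       # bits 25..2: word-aligned 26-bit address
    mode = MODES[r15 & 3]       # bits 1..0 are free because code is word-aligned
    return pc, flags, mode

pc, flags, mode = unpack_r15(0x60008003)
print(hex(pc), flags, mode)
print((1 << 26) // (1024 * 1024), "MB addressable")
```

Six flag bits at the top plus the two mode bits at the bottom give the eight status bits that share the register with the address.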

A characteristic feature of the RISC architecture is that memory is accessed only by instructions that load and store registers. The ARM has a built-in fast bit shifter (barrel shifter) that can shift the value of one of the registers in an instruction by any number of bits without spending any extra clock cycles. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as the single-cycle addition ADD R1, R0, R0 shl 6, and multiplying by 63 as the single instruction RSB R1, R0, R0 shl 6. In addition to signed and unsigned shifts to the left or right, the barrel shifter can also do rotates and even a rotate through carry. The instruction set includes a reverse subtraction, which in particular gives a unary minus as a special case and speeds up the division procedure.

Several instructions are rather unusual and their usefulness is questionable. For example, the CMN instruction adds its arguments but discards the sum; it is used only to set flags. There is a similar TEQ instruction, which uses addition modulo 2 (XOR) instead of the usual addition. The only thing that gives these instructions some sense is that they extend the very limited range of constants available for comparison operations. In addition, TEQ can be used for comparisons that do not change the C and V flags, and for directly setting all 8 flags.

In addition to RSB, the ARM has another unique feature: all its instructions are conditional. One of 16 conditions (flag combinations) is attached to each instruction, and the instruction is executed only if the current flags match that condition. In processors of other architectures such conditional execution is, as a rule, available only for jumps. This feature of the ARM makes it possible to avoid slow jump operations in many cases. This is also helped by the fact that arithmetic instructions can be told not to set the status flags.

With the ARM, as with the 6809 processor, you can use both fast and regular hardware interrupts. During a normal interrupt, two registers (R13, R14) are replaced with system registers. With a fast interrupt, as many as 7 registers (R8-R14) are replaced. This makes interrupt handlers more compact and faster. In supervisor mode, registers R13 and R14 are also replaced. Thus the ARM actually has 27 registers. The maximum interrupt latency is small; the main delay comes from the block transfer instructions (up to 18 clock cycles), which cannot be interrupted. A special SWI instruction calls software interrupts; it switches the processor to supervisor mode.
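The shifter-based multiplications, the reverse subtraction and the idea of conditional execution described above can be checked with a short Python model. This is only a sketch under simplifying assumptions: registers are plain integers masked to 32 bits, the helper names are invented for illustration, and the conditional example models just the LT condition (N flag differs from V flag) after a comparison with zero.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def lsl(value, amount):
    """Logical shift left, as the barrel shifter applies it to the last operand."""
    return (value << amount) & MASK

def add(rn, op2):
    """ADD Rd, Rn, Op2  ->  Rd = Rn + Op2"""
    return (rn + op2) & MASK

def rsb(rn, op2):
    """RSB Rd, Rn, Op2  ->  Rd = Op2 - Rn (operands reversed)"""
    return (op2 - rn) & MASK

def times65(r0):
    return add(r0, lsl(r0, 6))   # ADD R1, R0, R0 shl 6

def times63(r0):
    return rsb(r0, lsl(r0, 6))   # RSB R1, R0, R0 shl 6

def negate(r0):
    return rsb(r0, 0)            # RSB R1, R0, #0 is a unary minus

def abs_conditional(r0):
    """Models CMP R0, #0 followed by RSBLT R0, R0, #0, with no jump."""
    n = bool(r0 & 0x80000000)    # N flag after comparing with zero
    v = False                    # subtracting zero cannot overflow
    if n != v:                   # LT condition
        r0 = rsb(r0, 0)          # executed only when the condition holds
    return r0

print(times65(7), times63(7), hex(negate(1)), abs_conditional(0xFFFFFFFB))
```

On a real ARM the absolute value takes two instructions and no branch, which is the kind of saving conditional execution is meant to give.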

The ARM instruction set contains significantly fewer basic instructions than the x86 instruction set, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues on the 80386, for example RSB (the reverse subtraction mentioned above), BIC (AND with inversion; the PDP-11 has such an instruction), the 4-address MLA (multiply with accumulate), and LDM and STM (loading or storing multiple registers, both similar to the MOVEM instruction of the 68k processors). Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is more orthogonal, meaning that all registers are interchangeable, with registers R14 and R15 as partial exceptions. Most ARM instructions may require 3-4 80386 instructions to emulate, while most 80386 instructions can be emulated by only 2-3 ARM instructions.

Interestingly, the IBM PC XT emulator on an Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT. On the Commodore Amiga with the 68000 at 7 MHz, the emulator reaches no more than 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers with the 25 MHz 68030 showed the same integer performance as the 8 MHz ARM. Apple was going to make a successor to the Apple ][ in the Möbius project, but when it turned out that the prototype of this computer in emulation mode outran not only the Apple ][ but also the 68k-based Macintosh, the project was closed!
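For readers who have not met BIC and MLA, their effect can be written down in a couple of lines of Python. This is a sketch of the arithmetic only, with illustrative function names and 32-bit masking assumed; it says nothing about timing or flags.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def bic(rn, op2):
    """BIC Rd, Rn, Op2  ->  Rd = Rn AND NOT Op2 (clear the bits set in Op2)"""
    return rn & ~op2 & MASK

def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn  ->  Rd = Rm * Rs + Rn, keeping the lower 32 bits"""
    return (rm * rs + rn) & MASK

print(hex(bic(0xFF, 0x0F)))       # clears the low four bits
print(mla(3, 4, 5))               # 3 * 4 + 5
print(mla(0x10000, 0x10000, 1))   # the product overflows 32 bits
```

On a 2-operand architecture each of these typically needs a separate move or inversion step, which is where the estimate of several 80386 instructions per ARM instruction comes from.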

Among the shortcomings of the ARM we can highlight the problem of loading an immediate constant into a register. Only 8 bits can be loaded at a time, although the constant can be inverted and shifted. Loading a full 32-bit constant can therefore take up to 4 instructions. You can of course load a constant from memory with one instruction, but then the problem arises of specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of the ARM is its relatively low code density, which makes programs somewhat larger and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of the compilers for this platform. Multiplication instructions give only the lower 32 bits of the product. For a long time a significant drawback of the ARM was the lack of built-in memory management support (MMU); Apple, for example, demanded this support in the early 90's. Floating-point coprocessors for the ARM architecture also came into use with a significant delay. The ARM did not have debugging features as advanced as those of the x86. There is also an oddity in the standard ARM assembly language: shift operations for the barrel shifter are written separated by a comma. Thus instead of the simple form R1 shl 7 (shift the contents of register R1 left by 7 bits) you need to write R1, shl 7.
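The immediate constant restriction can be made concrete with a Python sketch. It assumes the usual ARM encoding, in which an 8-bit value is rotated right by an even number of bits; the text above only says "shifted", so this detail is an assumption, as are the function names. The plan builder is deliberately naive: it tries one MOV, then one MVN (the inverted form), and otherwise falls back to one instruction per non-zero byte, which shows the upper bound of 4 rather than the shortest possible sequence.

```python
MASK = 0xFFFFFFFF

def ror(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK

def encode_immediate(value):
    """Return (imm8, rotate) such that value == ror(imm8, rotate), or None."""
    for rotate in range(0, 32, 2):          # only even rotations are assumed
        imm8 = ror(value, 32 - rotate)      # rotating left undoes the ROR
        if imm8 <= 0xFF:
            return imm8, rotate
    return None

def load_plan(value):
    """Naive instruction sequence that builds a 32-bit constant in a register."""
    value &= MASK
    if encode_immediate(value) is not None:
        return [("MOV", value)]
    inverted = ~value & MASK
    if encode_immediate(inverted) is not None:
        return [("MVN", inverted)]          # MVN loads the inverted constant
    chunks = [value & (0xFF << shift) for shift in (0, 8, 16, 24)]
    chunks = [chunk for chunk in chunks if chunk]
    return [("MOV", chunks[0])] + [("ORR", chunk) for chunk in chunks[1:]]

print(encode_immediate(0xFF000000))
print(load_plan(0xFFFFFF00))
print(len(load_plan(0x12345678)), "instructions for 0x12345678")
```

A constant such as 0x101 already fails the single-instruction test, because its two set bits do not fit in any 8-bit window.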

The ARM3, with a built-in cache, became available in 1989. It also has an interesting SWP instruction for atomic data exchange between a register and memory. In 1990 the ARM development team separated from Acorn and, with the help of Apple and VLSI, created ARM Holdings. One of the reasons for the separation was that Acorn-Olivetti management considered ARM development too costly. Ironically, Acorn later ceased to exist as an independent company, while ARM Holdings became a large one. However, the separation was also prompted by Apple's desire to have ARM processors in its Newton computers without depending on another computer manufacturer. By the way, in 1999 VLSI lost its independence, becoming part of Philips.

In 1991 ARM Holdings released the ARM6, which launched a new line of processors that became the forerunners of those still widely used today. It added support for virtual memory, made the transition to 32-bit addressing and introduced a number of other new features. The history of the classic ARM processors ended with the ARM250, released in 1992. It is actually an ARM3 without a cache, but integrated on the same die with some of the basic Acorn Archimedes chips. Such integration later became one of the typical features of the ARM architecture. Another feature of this architecture is the confusing naming of processors and architectures: for example, the ARM6 is a processor of the ARMv3 architecture, while the ARM3 and ARM250 are processors of the ARMv2 architecture.

The ARM showed integer performance exceeding that of the 80486 at the same frequency by approximately 10-20%! Intel was able to gain the advantage by using clock multiplication, and later firmly secured it with the Pentium. The StrongARM (developed by DEC) briefly regained the lead for the ARM in 1996, after which the technology was purchased by Intel, which thereby became a large manufacturer of ARM-architecture processors. Thus there have been several centers of development of this architecture.

The further development of the ARM architecture is also very interesting, but that is another story. It can be mentioned, though, that thanks to its share in ARM Holdings Apple was able to avoid bankruptcy in the 90's and, moreover, Apple started the general transition of its computers to the ARM architecture in 2020!

Many thanks to jms2 and BigEd, who helped to improve the style and content. Edited by Richard BN

