Emotional stories about processors for first computers: part 13 (IBM/370)

This part focuses primarily on comparing the mainframe machine language with other systems that were popular between the 70s and 90s, chiefly the x86, 68k, VAX, and ARM. The System/390 and, in particular, the System/Z are covered only in fragments; most of the attention goes to the System/370.

The IBM/360 was a revolutionary system. For the first time, the concept of compatibility was implemented between machines of completely different purposes, whose performance could differ by a factor of hundreds. This system also became the first general purpose system. Before the introduction of the S/360, computers were divided into commercial and scientific. By the way, the number 360 means degrees: these systems were supposed to cover the entire range of possible applications. However, this global approach began to crumble as early as the early 1970s, when minicomputers began to successfully displace mainframes in many areas. The idea of having a single architecture for all situations ended up being as futile as the idea of having a unified programming language, which IBM also tried to implement in those years with PL/1. Interestingly, IBM itself continued to develop new systems that were not compatible with the IBM/360, notably the System/3 or AS/400! In this context, it is worth mentioning the very unusual history of the IBM PC, which looks more like Intel's use of the IBM brand to widely introduce its processors. IBM used the Intel 8088 processor without even trying to create a cheap variant of its 801 processor or other in-house developments!

The first System/360 began shipping to customers in 1965, and the more advanced System/370 in 1970. IBM maintains software compatibility with these systems to this day! Surprisingly, before the System/390, which as you can guess was delivered from 1990, mainframes worked with 24-bit addresses, that is, they could address no more than 16 megabytes of memory, the same amount as, for example, the 68000, released in 1979, or the 65816 or 32016, released in 1982. The VAX supported 32-bit addressing from the start. Popular processors such as the 68020 or 80386, which appeared in the mid-80's, also supported 32-bit addresses. In fact, 16 MB of memory was no longer enough for the best systems of the second half of the 80's. However, since 1983, IBM had been producing 370-compatible computers that could use 31 bits for an address as an extension, which eliminated the problem of memory capacity for the best computers. Unusually and uniquely, these extensions and the System/390 used 31-bit addressing rather than full 32-bit addressing. In 2000, IBM announced the first System/Z, which uses 64-bit addressing and data. The System/Z has been using processor chips since 2008. Since 2007, they have been trying to combine the Z architecture with the POWER architecture in a single chip, but so far without success. So far, only Intel has managed to combine CISC and RISC in one chip – the Pentium Pro in 1995 became the first chip of this kind.

The IBM System/370-145 with the 2401 tape unit and a printer instead of display, 1971. It may be surprising that there was no display in this very expensive system, given that TV sets were mass-produced for over 20 years

By the way, some computer authorities believe that the first serial personal computer was the IBM 5100, first produced in 1975, which could execute instructions of the System/360 via a hardware emulator. Its improved versions were produced until the mid-80's. Although most likely the first was rather the Datapoint 2200 or Wang 2200. For the price (around $10000) the first personal computers were clearly not for home use. Surprisingly, the IBM 5100 with Basic was several times slower than the first cheap personal computers, such as the Apple II.

The IBM 5100, a version with APL support

With the advent of the IBM PC architecture which as it turned out for decades determined the mainstream of development for computing, IBM tried in 1983 to combine almost all the best computer technologies in a single product. The PC XT/370 combines elements of the System/370, the IBM PC XT, the Motorola 68000 and Intel 8087. This XT/370 could be used as a smart terminal for working with a mainframe, like a regular IBM XT, or to directly run mainframe software. Interestingly, the XT/370 had support for using virtual memory, which required two 68000s. In 1984, with the introduction of the PC AT, an improved version of the personal mainframe AT/370 was released, which in mainframe mode was about twice as fast as the XT/370. This is not the end of the history of such systems, since the 90s, similar products were produced that corresponded to the System/390. As far as I know, such hardware has not been made for the System/Z.

IBM for its mainframes uses a rather unusual business model for today, in which computers are not sold, but are leased. One of the advantages of this model is that it guarantees a constant upgrade of equipment, outdated equipment is automatically replaced with updated one of the corresponding class. This model also has drawbacks. For example, a particularly noticeable disadvantage for those who are engaged in the history of computer technology is that computers that have served their time are almost always disposed of and therefore they are almost impossible to find in any museum.

It was amazing to find a live IBM 4361 in LCM! However there is reason to believe that this may not be real hardware. For some reason, museum visitors have no access to this computer. It is also unclear what model is allegedly represented there, and this is despite the fact that other computers in the museum are identified very accurately. Among the IBM 4361, three models 3, 4, and 5 are known, with model 3 appearing later than models 4 and 5. But the system in the museum is self-identified as model 1. It is possible that this is a prototype. However, the museum staff did not answer a direct question about providing help with identification, and this is despite the fact that they answer other and often rather more complex questions quite quickly. Some features of code execution timings give grounds although not absolutely firm to assume that the emulator is most likely connected to the network. This summer (2020), the museum, due to the Covid-19 pandemic has overtly switched to an emulator. There is still a chance to get to the real mainframes via the HNET network, but I have not yet succeeded.

But whatever it is, everyone can connect and try to work in much the same kind of environment that high-paid professionals were experiencing from the mid-70s. The prices were such that today it is difficult to believe. For example, an hour of computer time cost more than $20 in the mid-80s and you still had to pay extra for disk space! True, we are talking about the mainframe operating time, not the terminal through which the work was going. This is why for example, when editing a text, someone could pay for only 5 minutes of mainframe time during an hour of actual work. The prices for the mainframes themselves were also fantastic. For example, Intel employees recall that in the early 80's they were given only one mainframe to work with. Its performance was 10 MIPS, and its price was about 20 million dollars then, and the dollar value was three times greater than today! Although this price seems to be some kind of exaggeration. Typical mainframe prices were in the hundreds of thousands of dollars. The cheapest mainframes could cost tens of thousands, but the most expensive up to several million. For example, the Cray-1 supercomputer cost 8 million in 1978, the IBM 4361 model 4 mainframe – about 130 thousand in 1985, the IBM 3081 model QX mainframe – more than 6 million in the same 1985, and the IBM 4321 mini-mainframe – more than 80 thousand in 1982. More details on prices can be found here or here. Now, even a tablet-sized Raspberry Pi at a few dollars can easily produce over 1000 MIPS. By the way on a Raspberry Pi or almost any modern computer, you can run the IBM/370 emulator, which will work much faster than any IBM system from the 80's or even 90's. However, the emulator needs to be configured and not all useful programs for the IBM/370 are freely available, so free access to a well-tuned system is often the best way to work with the mainframe. Surprisingly, such access programs as the 3270 terminal emulators are available even on mobile phones! 
By the way, I managed to set up my VM/CMS system on the Hercules emulator and deal with the file transfer, but it took at least a week of my time.

The Hercules emulator can emulate the later IBM/390 and IBM/Z, but this is much more difficult to do due to problems with the software license. As an illustration of such problems, I will cite a well-known case when IBM insisted on removing the Emulation section from an already published book! In modern electronic versions of this book this section does not exist; it can only be found in the printed edition or as a separate file on sites dedicated to free software. The fact is that emulation on regular personal computers since the early 2000s could be noticeably faster than execution on much more expensive mainframes. IBM therefore had to change the licenses for its software so that it can only be legally used on hardware purchased from IBM. Of course, it is not that emulators are faster than the best mainframes, they are only demonstrating a markedly better ratio of performance to cost.

One way to work with the Z or 390 systems is to install Linux into an emulator of these systems. At least Ubuntu and Debian distributions are available for the 390 and Z. Here it is worth noting that the rapid development of Linux is largely due to significant support from IBM. In particular, IBM invested a billion dollars in Linux development in 2001.

Let's now look at the features of the machine language of systems compatible with the 360. The basic assembler of such systems is called BAL – Basic Assembly Language. Surprisingly, if you believe the rumors about IBM, an assembler is still one of the main working programming languages there.

The assembler of the mainframes in question has a number of archaic features that were no longer present in most well-known architectures that appeared later. For example, BAL mnemonics determine the type of the arguments. This, by the way, was inherited by assemblers for the Power and PowerPC. Consider the x86 assembly instructions MOV EAX,EBX and MOV EAX,address as an example – both use the mnemonic MOV. In BAL, different mnemonics, LR and L, are used for such cases, in the commands LR 0,1 and L 0,address respectively. On the other hand, such distinct mnemonics allow plain numbers to be used as register names, although usually macros R0, R1, ... instead of numbers 0, 1, ... are the first thing that is defined in macro packages for programming convenience. Another archaism is the use of label jumps in conditional compilation constructs, although in my humble opinion this is sometimes more convenient than block structures. But the most famous archaism is the use of EBCDIC encoding to work with symbolic information. In this encoding, strange even by yesterday's standards, the letters of the English alphabet are not encoded in succession, for example, the letter I has code 201, and the next one J – 209! This encoding comes from technologies for working with punched cards that originated in the pre-computer era. The System/360 also supports ASCII encoding in hardware, but in its ancient and long-forgotten version, where the character for the digit 0 has code 80, not 48 as it is now. As far as I know, it was better not even to try using ASCII on IBM mainframes. ASCII support was already removed in the System/370, but reintroduced at a new level in the System/390. Some BAL mnemonics are striking in their extreme brevity and even non-mnemonicity, for example, N means AND, O – OR, X – XOR, A – ADD, S – SUBTRACT, M – MULTIPLY, ...
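The gap between I and J is easy to check with Python's built-in cp037 codec, which is one common EBCDIC code page. The choice of cp037 is an assumption made for illustration; the text above does not name a particular code page.

```python
# Look at EBCDIC letter codes using the cp037 code page (an assumed choice).
def ebcdic_code(ch: str) -> int:
    """Return the EBCDIC (cp037) code of a single character."""
    return ch.encode("cp037")[0]

for ch in "HIJK":
    print(ch, ebcdic_code(ch))

# Letters after which the next letter of the alphabet is NOT the next code.
gaps = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXY"
        if ebcdic_code(chr(ord(c) + 1)) - ebcdic_code(c) != 1]
print(gaps)
```

The letters turn out to be contiguous only inside three groups, which is why a test such as "is this byte between A and Z" is not enough to recognize a letter in EBCDIC.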

BAL allows you to work with three basic data types: binary, decimal, and real numbers. The System/390 uses another special type to work with real numbers. Some Z systems may also use completely unique type of data such as decimal real numbers. Instructions for working with each type form a special and rather isolated class of instructions. Generally, with very few exceptions all the IBM 360-compatible systems support decimal and real arithmetic instructions. As you know, for the x86 or 68k architectures, support for working with real numbers did not appear immediately and was an optional choice for a long time, and working with decimal numbers was not something completely separate from binary arithmetic – it was rather an extension.

For working with real and binary numbers, different sets of registers are used, and for working with decimal numbers, registers are not used at all. The System/370 provides 16 32-bit general purpose registers for binary integers, with the command counter being part of the processor status word. There is no separate stack, it can be organized using any register – this is how the stack was later implemented in the ARM. The subroutine call is also made as in the ARM, via a link register. All registers are almost always interchangeable, exceptions are very rare. If you compare a system of BAL binary registers with the competitive VAX architecture, you will notice that VAX has one register less. This is true for the ARM as well.

The structure of operands in the instructions will seem quite familiar to those who know the x86 assembler. For binary numbers, operands have a "register-register" or "register-memory" structure, and for the latter case both 32-bit and sign-extensible 16-bit values can be loaded from memory. For example, the analog of the x86 ADD EAX,EBX instruction is AR 0,1, ADD EAX,address – A 0,address, ADD EAX,address[EBX] – A 0,address(1), ADD EAX,address[EBX][EDX] – A 0,address(1,3). However, the System/360 and even its later development do not know how to work with scaling, for example, ADD EAX,address[8*EBX] can not be written in BAL with a single instruction. On the other hand, the x86 cannot usually do a signed extension for a 16-bit number, for example, BAL instruction AH 0,address, which means to take a 16-bit signed number from memory and add it to the content of register 0, will require two commands for its implementation on the x86.
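The address forms above can be modelled in a few lines of Python. This is only a sketch: the register file as a list, the 24-bit wrap, and the function names are assumptions made for illustration, and register number 0 is treated as "no register", as is usual in this architecture.

```python
MASK32 = 0xFFFFFFFF

def effective_address(regs, disp, index=0, base=0):
    """Address of an operand written as disp(index,base).

    Register number 0 means "no register", not the contents of R0.
    """
    x = regs[index] if index else 0
    b = regs[base] if base else 0
    return (disp + x + b) & 0xFFFFFF      # 24-bit addresses on the System/370

def add_halfword(regs, r1, memory, addr):
    """Rough model of AH: fetch 16 bits, sign-extend them, add to a register."""
    half = int.from_bytes(memory[addr:addr + 2], "big", signed=True)
    regs[r1] = (regs[r1] + half) & MASK32

regs = [0] * 16
regs[1], regs[3] = 0x100, 0x20
print(hex(effective_address(regs, 0x10, 1, 3)))   # models A 0,address(1,3)
```

Note that there is no scale factor anywhere in the address calculation, which is the point made above about ADD EAX,address[8*EBX].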

A rare peculiarity of BAL is the presence of separate instructions for addition and subtraction for signed and unsigned numbers, and unsigned operations in BAL are called logical. This oddity is caused by the lack of flags in the 360 architecture that are usual to most other architectures. Instead, only two bits are used which are set differently by different instructions! The only difference between signed and unsigned operations is that they set the two status bits mentioned differently. For signed operations, you can find out whether the result was zero, whether it was positive or negative, whether an overflow occurred, and for unsigned operations, whether the result was zero and whether there was a carry or borrow. Conditional jump instructions allow you to consider all 16 subsets of cases that are possible when using 2 bits. Due to this unusual for today way of working with operation flags, conditional jump instructions are difficult to quickly understand. Although BAL extensions usually add fairly easy-to-understand macros for conditional jumps, where you do not need to parse each of the 4 bits. Here, to be fair, we can note that there are separate commands for signed and unsigned addition and subtraction, for example, in the MIPS architecture, where there are no flags at all!
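To make the difference concrete, here is a small Python model of how the two-bit condition code could be set by a signed and by a logical 32-bit addition. The function names are hypothetical, and the numbering of the four cases follows my reading of the architecture manuals, so treat it as a sketch.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def add_signed(a, b):
    """Signed addition: CC 0 zero, 1 negative, 2 positive, 3 overflow."""
    total = to_signed(a) + to_signed(b)
    result = total & MASK32
    if not -2**31 <= total < 2**31:
        return result, 3
    if total == 0:
        return result, 0
    return result, 1 if total < 0 else 2

def add_logical(a, b):
    """Logical addition: CC 0/1 zero/nonzero without carry, 2/3 zero/nonzero with carry."""
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    carry = total >> 32
    return result, 2 * carry + (1 if result else 0)

print(add_signed(1, 0xFFFFFFFF))    # 1 + (-1)
print(add_logical(1, 0xFFFFFFFF))   # 1 + 4294967295
```

Both calls produce the same 32-bit result, and only the condition code differs.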

Another rare peculiarity is in separate instructions for signed and unsigned comparisons. I've met similar ones not only on the MIPS, but also on the PowerPC and MicroBlaze. In the latter, by the way, the carry is the only supported flag.

On systems compatible with the IBM 360, there are no arithmetic operations that take the carry flag as an input, so if we need to work with long binary numbers, for example, in a 128-bit addition, we must check the carry after each 32-bit operation and use a jump, if necessary, to organize this addition. This is, of course, very cumbersome compared to the x86, ARM, 68k, or even 6502, but on the much later MIPS it is even more cumbersome. Normal handling of the carry was implemented only in the System/Z.
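The shape of such a multi-word addition can be sketched in Python. The word layout (four 32-bit words, most significant first) and the name add128 are assumptions for illustration; the branch on the pending carry stands in for the conditional jump described above.

```python
MASK32 = 0xFFFFFFFF

def add128(a_words, b_words):
    """Add two 128-bit numbers held as four 32-bit words, most significant first.

    There is no add-with-carry here: after each word addition the carry is
    examined, and a pending carry is added in a separate, conditional step.
    """
    result = [0] * 4
    carry = 0
    for i in range(3, -1, -1):
        total = a_words[i] + b_words[i]
        carry_out = total >> 32          # what a logical add would report
        total &= MASK32
        if carry:                        # the "jump if there was a carry" path
            total += 1
            carry_out |= total >> 32
            total &= MASK32
        result[i] = total
        carry = carry_out
    return result, carry

print(add128([0, 0, 0, 0xFFFFFFFF], [0, 0, 0, 1]))
```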

There are no cyclic shifts in BAL, but non-cyclic shifts like in x86, can be either single or double. However, BAL has separate shift instructions for unsigned and signed numbers, only the latter sets status flags. In addition, signed shifts (even to the left) do not change the highest bit! For shifts and some other cases, the standard addressing is used in an unusual way. The number of the shifts is set by index addressing – two registers and an offset! However, one of the registers is simply ignored and the sum of the second register and the offset is used, there is no memory access at all. Rotations were only added to the System/390.
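A Python sketch of these two peculiarities follows. The names are hypothetical, and the overflow rule (a bit different from the sign leaves the numeric part) reflects my understanding of the signed left shift, so it is an assumption rather than a quotation from the manual.

```python
def shift_amount(regs, disp, base=0):
    """Shift count: low 6 bits of displacement + base register; memory is never read."""
    return (disp + (regs[base] if base else 0)) & 0x3F

def shift_left_signed(value, count):
    """Shift the 31 numeric bits left and keep the sign bit. Returns (result, overflow)."""
    sign = value & 0x80000000
    numeric = value & 0x7FFFFFFF
    overflow = False
    for _ in range(count):
        numeric <<= 1
        # a bit leaving the numeric part that differs from the sign means overflow
        if bool(numeric & 0x80000000) != bool(sign):
            overflow = True
        numeric &= 0x7FFFFFFF
    return sign | numeric, overflow

print(shift_left_signed(0xFFFFFFFF, 1))   # -1 shifted left by one
print(shift_left_signed(0x40000000, 1))   # the sign stays positive, overflow is reported
```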

Among the register loading commands in BAL, there are some that are most likely unique. You may load the absolute value of an integer, the negation of this absolute value, or the number with an inverted sign – something remotely similar I've only encountered in the ARM architecture. Here it is worth noting that the entire architecture of the System/360 leans toward signed arithmetic, and unsigned arithmetic in this architecture is rather secondary. BAL originally did not have unsigned division and multiplication; they were only added to the System/390. When loading a register, the flags, as in the x86, do not change, but there is a special loading instruction that sets the flags – this again resembles the ARM, where the setting of flags can be controlled.

All signed arithmetic operations, including shifts, can throw an overflow exception. Whether to generate an exception or not is determined by a special mask flag in the status register. Interestingly, binary division and multiplication in BAL do not affect flags at all – here you can remember the x86, where division only spoils the flags.

Bitwise logical operations in BAL are represented by the usual set of operations AND, OR, and exclusive OR; there is no separate NOT operation. Logical operations can have not only a "register-register" or "register-memory" structure, but also "memory-constant" or "memory-memory" – the latter addressing method is similar to that used for decimal numbers. The memory-constant addressing is only possible for working with bytes. For logical operations, as opposed to arithmetic ones, the use of 16-bit numbers is not possible. For the memory-memory addressing you can work with data up to 256 bytes long! It turns out that we have three types of data for working with logical operations: bytes, 32-bit words, and byte sequences, with special instructions for each of these types, which is rather non-universal.

Logical operations in BAL are closely related to the operations for transferring bytes. In addition to the usual transfer of up to 256 bytes with a single instruction, there are also unique instructions for transferring half-bytes (tetrads). You may copy only the higher or only the lower halves of bytes, and the other halves retain their value after copying! Such strange operations are needed to support BAL features when working with character and decimal information. There are also transfer and comparison instructions, introduced with the System/370, that handle up to over 16 million bytes at a time and may be interrupted. Surprisingly, the far from fast commands for working with blocks up to 256 bytes long cannot be interrupted, which can create an unpleasant delay in response to an interrupt request. You can also use transfer commands to fill memory with a specified byte. In addition to transferring data from memory to memory, you can also set an individual byte to a specified value. If we don't consider the new instructions for the 390 and Z, the commands for transferring bytes were implemented in a more advanced way in the x86.
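The following Python sketch models a plain byte transfer and the two half-byte transfers. The function names are made up for illustration. It also shows one well-known way of filling memory with a transfer instruction, by overlapping the source and the destination by one byte; that the text above has exactly this trick in mind is my assumption.

```python
def move_bytes(mem, dst, src, length):
    """Byte-by-byte, left-to-right copy of up to 256 bytes."""
    for i in range(length):
        mem[dst + i] = mem[src + i]

def move_low_halves(mem, dst, src, length):
    """Copy only the low 4 bits of each byte; the high halves of the target stay."""
    for i in range(length):
        mem[dst + i] = (mem[dst + i] & 0xF0) | (mem[src + i] & 0x0F)

def move_high_halves(mem, dst, src, length):
    """Copy only the high 4 bits of each byte; the low halves of the target stay."""
    for i in range(length):
        mem[dst + i] = (mem[dst + i] & 0x0F) | (mem[src + i] & 0xF0)

# Fill trick: the destination starts one byte after the source, so the first
# byte is propagated through the whole field by the left-to-right copy.
field = bytearray(b" " + bytes(7))
move_bytes(field, 1, 0, 7)
print(field)
```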

BAL also allows us to load from memory or store into memory a consecutive series of registers, and the series may wrap around through register 0. For such commands, the standard instruction layout is again used in a completely unusual way: the field that normally holds the index register holds the number of the last register of the series instead. For example, the instruction STM 1,3,10(4) means to save registers 1 through 3 at the address in register 4 with an offset of 10! For the ARM or 68k architectures, such commands are implemented more powerfully, whereas in the POWER and PowerPC architectures they have, on the contrary, become less powerful.
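A small Python model of a store-multiple operation (STM in IBM's manuals) may help here. The helper names and the use of a bytearray as memory are assumptions for illustration only.

```python
def register_range(first, last):
    """Registers touched when going from `first` to `last`, wrapping 15 -> 0."""
    regs = [first]
    while regs[-1] != last:
        regs.append((regs[-1] + 1) % 16)
    return regs

def store_multiple(regs, mem, first, last, disp, base):
    """Store the registers of the range as consecutive 32-bit big-endian words."""
    addr = disp + (regs[base] if base else 0)
    for r in register_range(first, last):
        mem[addr:addr + 4] = regs[r].to_bytes(4, "big")
        addr += 4

print(register_range(1, 3))
print(register_range(14, 1))   # the series passes through register 0
```

A save area for a subroutine is typically written this way, with the range starting at a high register number and wrapping around to a low one.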

You can load not only a value at a specified address into a register, but also the address itself, as in the LEA instructions for the x86 or 68k. This feature also allows you to directly load a required constant into a register, although its maximum value cannot be greater than 4095. It also allows you to increment the register by no more than 4095. However the decrement of the register can be done only by 1. Both an increment and a decrement work with addressing, so they do not change the flags. It is possible to load individual bytes and even groups of bytes from a word in memory into a register, for example only the first and third bytes – such a tricky operation for all other 32-bit architectures known to me is possible only through a series of 4 instructions. Likewise, BAL allows parts of a register to be stored into memory.
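The selective byte load can be sketched in Python as follows. As far as I understand the instruction (ICM in IBM's manuals), a 4-bit mask picks the register bytes to replace, and the replacement bytes are taken from consecutive memory locations; this reading and the function name are assumptions.

```python
def insert_under_mask(reg, mask, mem, addr):
    """Replace the register bytes selected by a 4-bit mask with consecutive memory bytes.

    The leftmost mask bit (value 8) selects the most significant register byte.
    """
    out = bytearray(reg.to_bytes(4, "big"))
    for pos in range(4):
        if mask & (8 >> pos):
            out[pos] = mem[addr]
            addr += 1
    return int.from_bytes(out, "big")

# Replace only the first and the third byte of the register.
print(hex(insert_under_mask(0x11223344, 0b1010, bytes([0xAA, 0xBB]), 0)))
```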

A number of BAL instructions are very specialized – in other architectures, they are implemented as a series of simpler instructions. For example, the TR instruction allows you to recode a character string – one argument specifies the string to recode, and the other specifies the address of the conversion table. A special variant of this instruction, TRT, can be used to scan a given string and skip empty characters – this is roughly the functionality of the standard C strspn() or strcspn() calls. The ED and EDMK instructions are absolutely unique – they have the functionality of a primitive version of sprintf()! However, almost all string operations are limited to a maximum string length of no more than 256 bytes, which significantly reduces their power.
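Table-driven recoding and scanning of this kind map directly onto Python, where bytes.translate does the same job as a recoding table. The two helper functions below are hypothetical models, not an exact description of the instructions.

```python
def recode(data: bytes, table: bytes) -> bytes:
    """TR-like recoding: every byte b is replaced by table[b]."""
    return data.translate(table)

def scan(data: bytes, table: bytes) -> int:
    """TRT-like scan: index of the first byte whose table entry is nonzero, or -1."""
    for i, b in enumerate(data):
        if table[b]:
            return i
    return -1

# A table that converts ASCII lower case to upper case.
upper = bytes(c - 32 if 97 <= c <= 122 else c for c in range(256))
# A table that stops the scan at the first non-blank byte.
non_blank = bytes(0 if c == 0x20 else 1 for c in range(256))

print(recode(b"lr 0,1", upper))
print(scan(b"   LR 0,1", non_blank))
```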

In BAL, it was rather difficult to work with 16-bit unsigned values due to the lack of rotation or SWAP-type commands. Since the System/390, the situation with this problem became better. Some BAL instructions are almost deprecated, for example, the MVO nibble shift instruction has been supplanted by the more convenient SRP. For block transfers and comparisons, it is better to use the new instructions, although because they use a different addressing method this may not be optimal in some rare cases.

Examples of the four basic BAL addressing modes have already been given. There is also a fifth one for three-address instructions. BAL has no modes with auto-increment or auto-decrement, such as those typical for the VAX, 68k, PDP-11 or even 6809. There are also no double indirect memory access modes like those available for the VAX, 68020, or PDP-11, and of course BAL, unlike the VAX or PDP-11 assemblers, is completely non-orthogonal. BAL is the closest to the x86 and ARM assemblers – the most successful modern architectures. The order of operands in BAL is right-to-left, just like in Intel's x86 assembler or ARM assembler, and thus not the same as in the VAX, PDP-11, or 68k. The byte order for data in BAL, however, is from higher to lower (big-endian), which is different from the x86, ARM, or VAX, but matches that of the 68k or MIPS.

Operations with decimal numbers are implemented in BAL only via memory-to-memory addressing. Decimal numbers can be placed in memory chunks up to 16 bytes long, which allows you to use numbers with up to 31 decimal digits. This corresponds to the precision of a binary number of about 103 bits. Thus, only the most modern programming systems that use integer binary numbers can work with larger values than the System/360 almost 60 years ago! Of course, you can use binary arithmetic to implement arbitrarily large numbers, but for some reason there were no popular programming languages that supported numbers larger than those of the ancient System/360 until recently. Even now, support for 128-bit integers on the x86 usually comes only through unofficial extensions, such as the one for GCC.

Decimal numbers in BAL are represented uniquely, they must keep a sign – this is not the case for the VAX, x86, 68k, and maybe others. Moreover, the sign is stored in the last byte of the number representation! For decimal numbers, BAL has direct support for all basic operations: addition, subtraction, multiplication, and even division – this is also not available in any other architecture I know. In addition, BAL also provides instructions for copying, comparing, and shifting decimal numbers. The MVO and SRP instructions mentioned above are intended for such shifts. Operations can only be performed on packed decimal numbers, but they must be unpacked to print them, and to represent unpacked digits in BAL, you also need a sign, which in this case does not take up space since it is placed in the high tetrad which requires special work with this tetrad before printing. It is strange that the operations for packing and unpacking can only work with no more than 16 bytes of the unpacked decimal number, which allows you to use only no more than 15-digit numbers with them. This unpleasant problem can be solved by using ED or EDMK instructions for unpacking, but packing a large unpacked number has to be done through a not very simple sequence of instructions. New instructions have been added to the System/390 to solve this problem. Unexpectedly, the instructions for packing and unpacking work with any binary data, not just decimal data.
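The packed format itself is simple enough to model in Python. The sketch below assumes the commonly used sign codes, hexadecimal C for plus and D for minus, placed in the low half of the last byte; the function names are hypothetical.

```python
def pack_decimal(value: int, length: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, sign in the last half-byte."""
    digits = str(abs(value)).rjust(2 * length - 1, "0")
    if len(digits) > 2 * length - 1:
        raise ValueError("number does not fit into the field")
    halves = [int(d) for d in digits] + [0xD if value < 0 else 0xC]
    return bytes((halves[i] << 4) | halves[i + 1] for i in range(0, len(halves), 2))

def unpack_decimal(data: bytes) -> int:
    """Decode a packed decimal field back into an integer."""
    halves = []
    for b in data:
        halves += [b >> 4, b & 0x0F]
    sign = halves.pop()
    value = int("".join(str(h) for h in halves))
    return -value if sign in (0xB, 0xD) else value

print(pack_decimal(-123, 2).hex())
print(unpack_decimal(pack_decimal(10**31 - 1, 16)))
```

A 16-byte field therefore holds 31 digits plus the sign, which is the limit mentioned above.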

BAL has special unique instructions that allow you to convert one binary number to a packed decimal number at a time and vice versa. For a decimal number, these instructions always allocate 8 bytes, i.e. 15 digits and a sign. However, a 32-bit register is only sufficient to represent a signed number corresponding to a 9-digit decimal number, so not every decimal number in the correct BAL format can be converted to binary with a single command. For the System/Z, there are extended instructions for such transformations.

Jump instructions in BAL are distinctive in that they come, as a rule, in pairs: the jump address can be set both explicitly and by the contents of a register, whereas in many other architectures jumps on the contents of a register are available only for unconditional jumps. By the way, there are no pure unconditional jumps in BAL; such jumps are implemented by setting an always true condition, which is similar to the ARM architecture. Conditional branching in BAL, as noted, has a unique syntax. Consider for example the BC 9,address instruction, which means to jump if condition 0 or condition 3 is encountered, but the conditions mean different things after different commands. For example, after a signed addition these conditions mean "the result is 0 or an overflow occurred", and after an unsigned addition, "the result is 0 and there was no carry, or the result is not 0 and there was a carry". Despite the clumsiness and some redundancy, one cannot but admit that such a system for working with conditions for jumps is probably the most flexible of all known. The nine in the command from the example is used in its binary representation, 1001, that is, it determines the bit numbers – such a system of encoding all combinations of conditions with 4 bits is also used in the ARM. In addition to conditional jumps, BAL also has jumps by a counter with a decrement, approximately the same as in assemblers for the Z80, x86, 68k, PDP-11, ... But BAL also has two completely unique instructions for jumps, which, depending on the number of a register operand, can be three-address or four-address! In these unique commands, two registers are added together and the resulting sum is compared with the contents of another register, and the result of this comparison determines whether to jump or not. These unusual instructions are believed to be useful for working with jump tables.
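The mask test of the branch-on-condition instruction (BC in IBM's manuals) and the add-compare-branch idea can be written down in a few lines of Python. The function names and the loop model are assumptions for illustration; the loop corresponds to the case where the increment and the limit are held in two different registers.

```python
def branch_taken(mask, cc):
    """Mask bit 8 selects condition 0, bit 4 condition 1, bit 2 condition 2, bit 1 condition 3."""
    return bool(mask & (8 >> cc))

def indexed_loop(start, step, limit):
    """Indices visited by a loop closed with "add step, branch back while index <= limit"."""
    visited = []
    index = start
    while True:
        visited.append(index)
        index += step
        if not index <= limit:
            break
    return visited

print([cc for cc in range(4) if branch_taken(9, cc)])
print(indexed_loop(0, 4, 12))
```

A mask of 15 selects every condition and so gives an unconditional jump, while a mask of 0 never jumps.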

As already noted, the call of subroutines in BAL is implemented without using a stack, by simply storing the return address in a register. However, the BAL instructions for such calls, one of which is also called BAL, store not only the return address, but also part of the status register, in particular the condition flags, the length of the current instruction, and even a mask for optional exceptions, such as integer or decimal overflow – this mask was mentioned above. This unusual extended information storage is due to the fact that the instruction address occupies only the low 24 bits of a word of the processor status word, and the instructions for calling subroutines mechanically save the whole word, including its highest byte with these fields. There are no special commands for returning from subroutines; you need to use a normal jump to the address in the register. In the System/390, new commands for calling and even returning from subroutines were added in connection with the transition to a 31-bit architecture. These new instructions allow you to flexibly use codes that are executed in different modes in the same program.
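The contents of the link register after such a call can be sketched in Python. The bit positions used here (2 bits of instruction length, 2 bits of condition code, 4 bits of exception mask, 24 bits of address) are my understanding of the 24-bit mode layout and should be read as an assumption; the function names are hypothetical.

```python
def link_word(length_code, cc, program_mask, address):
    """32-bit value left in the link register by a call in 24-bit mode."""
    return ((length_code << 30) | (cc << 28) | (program_mask << 24)
            | (address & 0xFFFFFF))

def return_address(word):
    """Only the low 24 bits matter when jumping back."""
    return word & 0xFFFFFF

# A 4-byte call instruction (length code 2), condition code 1, one mask bit set.
word = link_word(2, 1, 0b1000, 0x001234)
print(hex(word), hex(return_address(word)))
```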

To quickly call single-command routines, BAL has a unique EX instruction that executes another instruction at the specified address and proceeds to the next command. The EX statement can modify the called instruction, which allows you to use any desired register in this instruction, or set parameters for mass byte transfer. A similar instruction, but simpler, is also in the TMS9900 instruction set.
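How EX modifies its target can be shown with a short Python model. As far as I know, the low byte of the register is ORed into the second byte of the target instruction, and the copy in memory stays unchanged; both points, together with the function name, are assumptions of this sketch.

```python
def execute_target(instruction: bytes, reg_value: int, r1: int) -> bytes:
    """Return the instruction image that is actually run; memory is left untouched."""
    if r1 == 0:                      # register number 0 means "no modification"
        return bytes(instruction)
    patched = bytearray(instruction)
    patched[1] |= reg_value & 0xFF   # second byte: register numbers or a length field
    return bytes(patched)

# A 6-byte memory-to-memory move whose length byte is zero in memory.
move = bytes([0xD2, 0x00, 0x10, 0x00, 0x20, 0x00])
print(execute_target(move, 79, 5).hex())   # the length byte now holds 79
```

Since the length byte of such a move holds the number of bytes minus one, the register in this example selects an 80-byte transfer whose size was not known when the program was assembled.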

Initially, BAL did not have relative, relocatable jumps like the Z80 or x86. They were only added to the System/390.

The SPM, TM, TS, STCK, and STPT instructions are also somewhat unusual. The first one allows you to set all operation flags and the optional exception mask with a single command. The TM instruction allows you to check a group of bits and determine three cases: all zeros, all ones, and a mix of zeros and ones. A similar check cannot be performed by a single command in other architectures. However, TM only works with individual bytes in memory. TS is used when working with multiple processors – there is a similar command for the 68k. The STCK instruction reads the value of an external (!) timer, and the STPT instruction reads the value of an internal timer embedded in the processor circuit. Strangely, the STPT command is privileged, but the STCK is not.

It is also worth mentioning the CS and CDS instructions, which are designed to support multiprocessing. They were implemented for the System/370, i.e. they became available in the early 70s. In the x86, the CS analog, the CMPXCHG instruction, was implemented no earlier than 1987, and the CDS analog, the CMPXCHG8B instruction, was implemented only in 1994!
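The logic of a compare-and-swap operation and the retry loop usually built around it can be sketched in Python. The model is single-threaded and only shows the control flow; on real hardware the comparison and the store happen as one indivisible step. The names and the dictionary used as memory are assumptions.

```python
def compare_and_swap(mem, addr, expected, new):
    """Store `new` only if the word still equals `expected`.

    Returns (condition code, value for the comparison register): code 0 means
    the swap happened, code 1 means it did not and the current value is returned.
    """
    current = mem[addr]
    if current == expected:
        mem[addr] = new
        return 0, expected
    return 1, current

def atomic_increment(mem, addr):
    """Typical usage: retry until nobody changed the word in between."""
    old = mem[addr]
    while True:
        cc, old = compare_and_swap(mem, addr, old, (old + 1) & 0xFFFFFFFF)
        if cc == 0:
            return

memory = {0x100: 41}
atomic_increment(memory, 0x100)
print(memory[0x100])
```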

The STIDP processor identification instruction was introduced with the System/370. It is privileged and not very informative. For the x86, the analogous command is significantly more powerful. Here you can also notice that the IBM 4361 in LCM allows any user to execute STIDP. This is obviously an exception-triggered emulation.

Four BAL addressing modes specify two operands for the instruction, and the fifth mode specifies three-operand commands. However, ignoring some of the information allows you to have one-operand commands, and the use of implicit information allows you to have four-operand commands. When used in addressing, register 0 has a special role: it is simply ignored there – this allows you to skip the base and index when calculating the address. All BAL instructions take up strictly 2, 4, or 6 bytes. It is similar to the 68000 or PDP-11 but not to the x86, VAX or ARM.

Several more addressing modes were added to the System/390, bringing their number to 18. The number of instructions has also increased significantly. Among new instructions there are even ones that support working with Unicode – this is still not available for x86! Among the new instructions of the System/390, there are other unique ones. Several more addressing modes were added to the System/Z, and the total number of instructions for the modern Z is very large and probably even more than the number of commands for the modern x86-64!

In the systems 360, 370 and 390, the offset when accessing data in memory, as in the ARM, is 12-bit, i.e. no more than 4095. This is not very convenient: large programs may run short of registers for basing. In the x86, this offset in real mode is 16-bit, which, of course, is much more convenient. The System/Z, however, has added support for addressing with a 20-bit offset, which is of course even better. It is worth noting, though, that in protected mode of the x86 or on the 68020, the offset can be 32-bit. As already noted, in systems before the 390, as in the ARM, it was not possible to use large constants when working with registers. The x86 architecture was much more flexible here. Therefore, when using an assembler with the systems 360 or 370, it was often mandatory to use literals or pseudo-constants, which is somewhat slower.
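A bit of arithmetic shows why the 12-bit offset puts pressure on the registers. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that each base register covers one 4096-byte window and that the 20-bit offset of the System/Z is signed.

```python
WINDOW_12 = 1 << 12  # bytes reachable from one base register: 4096


def base_registers_needed(region_size):
    """How many base registers are needed to cover a contiguous region."""
    return (region_size + WINDOW_12 - 1) // WINDOW_12  # integer ceiling


def displacement_fits(offset, bits=12, signed=False):
    """Check whether an offset can be encoded in the displacement field."""
    if signed:
        return -(1 << (bits - 1)) <= offset < (1 << (bits - 1))
    return 0 <= offset < (1 << bits)


if __name__ == "__main__":
    print(base_registers_needed(64 * 1024))
    print(displacement_fits(4096))
    print(displacement_fits(4096, bits=20, signed=True))
```

Under these assumptions, covering a region of just 64 KB with 12-bit offsets would take 16 base registers, i.e. every general purpose register the machine has.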

Systems that are compatible with the IBM/360 have always had good performance. My experiments with the LCM's 4361-1, in particular in the project of calculating the number π using the spigot algorithm, showed quite good timings. The 4361-1 instructions work almost without delay, much like those of the ARM or other modern processors. However, due to the somewhat awkward command system inherited from the 60s, in particular due to the lack of division by a 16-bit divisor, the result for the efficiency of the processor electronics was at the level of the 80186. This is about 80% slower than the result shown by the then best computer from the VAX family, the model 785. However, the mainframe in LCM is clearly not the best among the IBM mainframes available then. It is also worth noting that the mainframes used channels, specialized processors that made I/O very fast, much faster than it was for most other computers of those years. In some situations, though, the idea of the channel was driven to the point of absurdity. For example, text terminals also worked through channels, making it impossible to use popular text editors on mainframes! It is known, for example, that since the 80s, when PCs became widely used, mainframe users sometimes preferred to transfer texts to PCs and edit them there!

As a student, I happened to work with a Soviet IBM/370 clone, the ES-1045, in 1987 in batch mode, and in 1989 in dialog mode. For batch mode, we had to prepare punch cards. At that time, I was already using a home computer, and therefore the archaic punch cards did not leave the best impression. The dialog mode was not bad, but it often broke down when there were a large number of users. Therefore, some students came to work at 4 am! Since then, I have not had a chance to deal with mainframes. Only recently did I decide to use emulation to explore this landmark technology in the history of computers.

The cloning of the IBM/360 was very popular. Such clones were made in England, Germany, Japan, and by other companies in the United States. In the USSR, this cloning took a very dramatic turn. For the sake of this cloning, almost all domestic developments in the field of IT, some of which were quite promising, were curtailed. In particular, the line of Ural computers was cut off, about which a well-known computer specialist, Charles Simonyi, later spoke with warmth. The BESM-10 project was also closed, although the machines of the previous BESM-6 class were comparable to the IBM/360 in performance. The development of the promising low-cost MIR computers, one of which was even bought by IBM in 1967, received low priority. Also, for the sake of this cloning, an almost concluded contract with ICL was cancelled. Perhaps with this contract the British IT industry would have gained new momentum and would not have fallen into decline. Only the Elbrus supercomputers, perhaps because of their connection to the defense industry, survived the "clone invasion", which Dijkstra called the greatest US victory in the Cold War. It is particularly odd that the USSR refused any direct production contacts with the numerous mainframe manufacturers who could have significantly improved the quality of the computers produced in the USSR.

As people who worked with mainframes in the USSR recall, domestic clones were distinguished by extremely low reliability and required constant attention from the maintenance staff, while the original American IBM mainframes were among the most reliable computers of their time. Sometimes more than a dozen kilograms (typically more than 5) of precious metals (gold, platinum, palladium and silver) were put into a Soviet clone, but this did not help to fix the reliability problem. Because of such a large amount of highly liquid assets, it is difficult to imagine that a working domestic clone could be preserved anywhere. It is easy to assume that if all such assets used in the production of a Soviet mainframe had simply been sold, the proceeds would have been enough to buy a reliable American or English mainframe.

One of the main arguments in favor of switching to cloning was that the Soviet economy was not able to produce the necessary software. Cloning made it possible to use the programs for free, or, to put it bluntly, it was recommended to just steal the programs! However, practice showed that some domestic programs for mainframes turned out to be very successful, and they almost completely replaced the branded ones. As one example of such programs, I can name the Primus dialog monitor. Of course, it should also be noted that the branded programs were not really obtained for free at all: they required quite a lot of effort to adapt, in particular, to localize. The localization sometimes went too far, for example, when all keywords of a programming language were replaced with their translations, as happened with Cobol.

Interestingly, the chief architect of the IBM/360 left IBM and founded Amdahl Corporation, which for more than two decades specialized in the production of systems compatible with IBM mainframes and at the same time slightly superior in performance and reliability at lower prices. Eventually, due to major changes in the mainframe market, Amdahl, like ICL, became part of the Japanese company Fujitsu.

There were other mainframes, besides computers of the IBM/360 architecture. In the 60s, American mainframe manufacturers unofficially received the sonorous nickname of Snow White and the Seven Dwarfs. It's probably not hard to guess that Snow White was IBM. Mainframes of original architectures were also produced in other countries. The British ICL 1900 architecture is especially worth mentioning. The mainframe era, whose decline began in the mid-1970s, had a somewhat unexpected effect: computer technology became almost entirely concentrated in the US, with other countries, including the USSR, effectively abandoning the development of their own processors, architectures and basic software. This is very paradoxical, because later the production of computers and programs for them became quite inexpensive. Perhaps this paradox happened because IBM's mainframe efforts created such a tremendous driving force that it swept up almost all the resources of other countries' national economies in an effort to keep pace with IBM. In the U.S. itself, mainframes merely showed businesses that computer technology was becoming a source of competitive advantage in the market.

As already mentioned above, I managed to set up a working configuration for VM/CMS 6. However, it turned out that the XEDIT editor is not freely available, and plain EDIT is too peculiar and inconvenient, so I have to edit my texts on the host. It also turned out that the standard program for transferring files from a terminal emulator to a mainframe and back is unavailable, which required the use of virtual punch cards for such transfers. Another unpleasant surprise was found in connection with debugging. The DEBUG command does not support step-by-step execution, while this feature was available even in the DDT debugger for the 8080 processor! It is also surprising, though less critical, that DEBUG is not able to do disassembly, which was often built into even the simplest monitors for the processors of the 70s. Under CMS, long line wrapping and line break control characters are not supported at the low level! Therefore, when printing from an assembly language program, you need to format lines manually so that they do not disappear beyond the right edge of the screen, and also take care of padding the last line with trailing spaces. The lack of automatic vertical scrolling is also unusual.

Those who want to work with mainframes for the first time should keep in mind that mainframes are a huge "ecosystem", where many familiar concepts may have a different interpretation. For example, there is no simple concept of a file. One of the key attributes of a file is its record size, and there is nothing like this for Linux or Microsoft Windows files. The files themselves differ in the methods of accessing them. Thick books have been written about this, and perhaps are still being written. It is also unusual that in CMS the disk name is written at the end of the full file name, the name, extension, and disk are separated by spaces, and the disk name itself is called the file mode for some reason. I would also like to study the multitasking MVS in more detail. As far as I know, it was never used in the USSR.
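Both ideas, the three-part file name and the fixed record size, can be sketched in a few lines. This is only an illustration on the host side, not CMS code: the function names are hypothetical, the sample name `PROFILE EXEC A1` is just an example, and the record length of 80 is an assumption taken from the punch card format.

```python
def parse_cms_fileid(fileid):
    """Split a CMS-style file identifier 'name type mode' into its parts."""
    parts = fileid.split()
    if len(parts) != 3:
        raise ValueError("expected three fields: name, type, mode")
    name, ftype, mode = parts
    return {"name": name, "type": ftype, "mode": mode}


def to_fixed_records(text, lrecl=80):
    """Turn newline-separated text into fixed-length, space-padded records."""
    records = []
    for line in text.splitlines():
        if len(line) > lrecl:
            raise ValueError("line is longer than the record length")
        records.append(line.ljust(lrecl))  # pad with trailing spaces
    return records


if __name__ == "__main__":
    print(parse_cms_fileid("PROFILE EXEC A1"))
    for record in to_fixed_records("HELLO\nWORLD", lrecl=8):
        print(repr(record))
```

Note that the records carry no line break characters at all: the end of a line is defined purely by the record length.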

In general, it is somewhat unexpected that some well-known operating systems that were used on very expensive computers did not support working with file directories, which put them on a par with the very first and most primitive operating systems for microcomputers, such as CP/M or Commodore DOS. This is why CMS was sometimes called CP/M for mainframes. Surprisingly, as far as I know, support for directories in CMS has not been introduced, although the last release of the system dates back to 2018. For some reason, working with directories on expensive computers before the 80s was often poorly supported. For example, there was no such support in DEC RT-11, and even one of the best operating systems for the PDP-11, RSX-11, only supported two-level directories. The most popular IBM operating system until the 2000s was MVS (1974), and even there directories were only partially implemented, as in Apple MFS (1984). In Unix (1973), MS-DOS (since 1983), or even the 8-bit Apple ProDOS (1983), however, directories were properly supported from the start. The most advanced work with files was offered in VAX/VMS (1977), where in addition to directories, there is even built-in support for file versioning. Instead of directories in CMS, as in RT-11, you can use disk images mounted as logical drives.

Interestingly, Rexx, the scripting language for CMS, MVS and some other IBM operating systems, became in a reduced form the language of batch files for the Commodore Amiga. It may have been some kind of compensation, as Commodore had been a firm supporter of the IBM PC architecture since 1984.

Mainframe software usually uses only two colors. Color-enabled terminals were used relatively rarely and therefore there were few programs using colors. There are also few programs with dynamic graphics: frequent screen updates lead to noticeable unpleasant flickering.

A dynamic demo running on the IBM 4381 emulator in LCM; an emulator of the 3270-3 terminal is used

In conclusion, I can't help but express my admiration for IBM's technologies. They have always been distinguished by their originality and high quality. I would especially like to note the very good quality of documentation, which is publicly available even for modern systems. IBM demonstrates tremendous dynamism in technology development, even though it is one of the largest companies in the world. In terms of the number of employees, it is almost equal to Microsoft, Google and Intel put together!

The theme of mainframes is really vast. Of course I was able to write only a small part of what this theme might contain. I would be very grateful for any clarification and additional information.