Superfast Mandelbrot benchmark results

Visualizations of the 8th generations of this Mandelbrot are available here.

All systems use the same algorithm, which calculates 128x256 dots of the Mandelbrot set. They also visualize it in almost the same way. Every dot is encoded with 4 bits, so every system has to output exactly 16 KB of graphical data for each picture (128 x 256 x 4 bits = 16,384 bytes). Detailed information about the graphic modes used is in the next table (the number in parentheses is the effective CPU frequency).
#   System                                Video mode           Window      Comments
1   БK0010, K1801BM1@3(2) MHz             512x256, monochrome  512x256x1   4x1 texture/pixel
2   "                                     256x256, 4 colors    256x256x4   2x1 texture/pixel
3   БK0011, K1801BM1@4(2.3) MHz           "                    256x256x4   "
4   Amstrad CPC 4, Z80@4(3.2) MHz         "                    "           "
5   "                                     128x256, 16 colors   128x256x16
6   BBC Micro, 6502@2 MHz                 256x256, 4 colors    256x256x4   2x1 texture/pixel
7   "                                     128x256, 16 colors   128x256x16
8   Commodore +4 (PAL), 6502@1.7 (1) MHz  160x256, 4 colors    128x256x4   2 flashing dots/pixel
9   IBM PC (EGA), 8088@4.8 MHz            640x350, 16 colors   128x256x16
10  Amiga 500 (PAL), 68000@7.1 MHz        320x256, 16 colors   128x256x16
11  Acorn Archimedes 305, ARM2@8 MHz      "                    "
12  Sinclair QL, 68008@7.5 MHz            256x256, 16 colors   128x256x16
13  Corvette, 8080@2.5 MHz                512x256, 8 colors    256x256x4   2x1 texture/pixel
14  "                                     "                    256x256x8   "
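
The 16 KB figure quoted above follows directly from the dot count and the 4-bit encoding. Below is a minimal sketch of that arithmetic, with two dots packed into each byte. The function names and the high-nibble-first packing order are assumptions made for illustration; the real video memory layout differs from system to system.

```python
def frame_bytes(width=128, height=256, bits_per_dot=4):
    """Size of one picture in bytes."""
    return width * height * bits_per_dot // 8


def pack_dots(dots):
    """Pack 4-bit values two per byte (high nibble first: an assumed order)."""
    if len(dots) % 2:
        raise ValueError("need an even number of dots")
    return bytes(
        (dots[i] & 0x0F) << 4 | (dots[i + 1] & 0x0F)
        for i in range(0, len(dots), 2)
    )


print(frame_bytes())                       # 16384 bytes, i.e. 16 KB
print(len(pack_dots([0] * (128 * 256))))   # 16384
```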

The Mandelbrot algorithm uses the following parameters for the first 16 visualizations.
#   Iterations  x-interval               y-interval
1   7           [-4.64062, 4.28906]      [-4.5, 4.5]
2   8           [-4.09375, 3.5957]       [-3.75, 3.75]
3   9           [-3.69336, 3.00391]      [-3.25, 3.25]
4   10          [-3.20508, 2.5]          [-2.75, 2.75]
5   11          [-2.89258, 2.06836]      [-2.5, 2.5]
6   12          [-2.76562, 1.69922]      [-2, 2]
7   13          [-2.83203, 1.38477]      [-1.5, 1.5]
8   14          [-2.60352, 1.11719]      [-1, 1]
9   15          [-2.33594, 0.888672]     [-0.75, 0.75]
10  16          [-2.0332, 0.695312]      [-0.75, 0.75]
11  17          [-1.95117, 0.529297]     [-0.75, 0.75]
12  18          [-2.09766, 0.382812]     [-0.75, 0.75]
13  19          [-2.22266, 0.257812]     [-0.75, 0.75]
14  20          [-2.33203, 0.148438]     [-0.75, 0.75]
15  21          [-2.42578, 0.0546875]    [-0.75, 0.75]
16  22          [-2.50586, -0.0253906]   [-0.75, 0.75]
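
To show how one row of the table above turns into a picture, here is a rough sketch using the textbook escape-time iteration in floating point. It is not the optimized routine used by the benchmarked programs, whose arithmetic and color mapping are not described here. The function name, the inclusive sampling of the interval endpoints, and the mapping of the iteration count to a 4-bit value are all assumptions.

```python
def mandelbrot_frame(iterations, x_interval, y_interval, width=128, height=256):
    """Return `height` rows of `width` 4-bit values (plain escape-time method)."""
    x0, x1 = x_interval
    y0, y1 = y_interval
    rows = []
    for j in range(height):
        ci = y0 + (y1 - y0) * j / (height - 1)      # assumed: endpoints included
        row = []
        for i in range(width):
            cr = x0 + (x1 - x0) * i / (width - 1)
            zr = zi = 0.0
            n = 0
            # Iterate z = z*z + c until |z| > 2 or the iteration limit is hit.
            while n < iterations and zr * zr + zi * zi <= 4.0:
                zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
                n += 1
            row.append(n & 0x0F)                    # assumed 4-bit color mapping
        rows.append(row)
    return rows


# Picture #1 from the table: 7 iterations.
frame = mandelbrot_frame(7, (-4.64062, 4.28906), (-4.5, 4.5))
print(len(frame), len(frame[0]))   # 256 128
```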

All systems also provide timing information. The next table shows the timings for drawing pictures #1-16.
#            C+4     БK0010        CPC        CPC  BBC Micro  BBC Micro     Corvette     Corvette
                             16 colors   4 colors  16 colors   4 colors     8 colors     4 colors
                                                                          color mode  planar mode
v2 v1
1           4.72       3.63       2.55       2.76       2.24       2.35         5.00         5.38
2           5.42       3.89       2.86       3.06       2.60       2.70         5.54         5.92
3           6.27       4.20       3.22       3.43       3.03       3.13         6.22         6.60
4           7.66       4.70       3.82       4.03       3.73       3.84         7.32         7.66
5           9.08       5.22       4.44       4.65       4.46       4.56         8.44         8.82
6          11.64       6.15       5.54       5.75       5.75       5.86        10.46        10.84
7          14.92       7.34       6.95       7.16       7.41       7.51        13.04        13.42
8          21.68       9.81       9.89      10.10      10.85      10.96        18.42        18.78
9          29.13      12.51      13.10      13.31      14.64      14.74        24.30        24.66
10         34.82      14.59      15.56      15.77      17.52      17.63        28.72        29.16
11         38.94      16.09      17.34      17.55      19.62      19.72        32.06        32.42
12         39.44      16.27      17.56      17.77      19.87      19.98        32.46        32.82
13         39.40      15.89      17.11      17.32      19.34      19.44        31.62        32.02
14         37.22      15.46      16.60      16.81      18.74      18.84        30.70        31.06
15         36.20      15.04      16.09      16.30      18.13      18.24        29.76        30.12
16         34.80      14.59      15.56      15.77      17.51      17.62        28.78        29.16
total     371.34     165.38     168.19     171.54     185.44     187.12       312.84       318.84

The color writing mode for the Corvette writes data for all 3 graphic planes simultaneously, so it actually updates 24 KB of video RAM on each screen in this mode.

The next table shows the timings for faster computers.
#         БK0011       Amiga      IBM PC      IBM PC       Acorn          QL          QL
                          500    5160 EGA    5160 EGA  Archimedes                external
                                   mode 0      mode 2         305                     RAM
v2 v1
1           3.40        0.94        1.87        1.59        0.12        1.94        1.14
2           3.63        1.00        1.98        1.76        0.14        2.16        1.26
3           3.91        1.06        2.08        1.93        0.17        2.39        1.38
4           4.37        1.20        2.36        2.14        0.21        2.61        1.50
5           4.84        1.30        2.64        2.42        0.25        3.18        1.83
6           5.68        1.52        3.07        2.85        0.32        3.91        2.23
7           6.76        1.82        3.68        3.46        0.41        4.83        2.77
8           8.99        2.36        4.89        4.72        0.59        6.75        3.86
9          11.45        3.00        6.20        6.04        0.80        8.82        5.03
10         13.32        3.48        7.25        7.08        0.96       10.42        5.96
11         14.69        3.82        8.08        7.80        1.07       11.60        6.63
12         14.85        3.86        8.13        7.91        1.09       11.75        6.71
13         14.51        3.78        7.96        7.69        1.06       11.45        6.54
14         14.12        3.70        7.69        7.53        1.03       11.13        6.37
15         13.73        3.58        7.52        7.30        0.99       10.79        6.16
16         13.33        3.48        7.30        7.09        0.96       10.45        5.97
total     151.58       39.90       82.70       79.31       10.17      114.17       65.35

Writing modes 0 and 2 were used for the EGA. Both produce the same picture. I think the results for the VGA will be the same. The results for the Amiga 500 with fast RAM are only about 1% faster, so I haven't included them.

Emulators were used to get these results.

Machine               Emulator
БK0010                BK2010 v0.5
БK0011M               GID v3.10
Commodore+4           plus4emu v1.2.10
Amstrad CPC           ep128emu v2.0.11
BBC Master            b-em v-ec63538
Amiga 500             FS-UAE 3.0.5
IBM PC XT EGA         pcem 17
Acorn Archimedes 305  Arculator v2.1
Sinclair QL           QemuLator 3.4
Corvette              emu80 v4.0.396

QemuLator appears to be about 7% faster than real hardware, so the unexpanded QL results are adjusted by this 7%. The real QL results were provided by mk79; many thanks to him. This data is also used for the QL with external memory. It seems that pcem is also about 7% faster than the real IBM PC, but I have only indirect information about this, so I didn't apply any correction to the pcem data.

The Commodore +4 results can be about 5% faster if we turn on the NTSC mode during vertical retrace time.

The next table contains approximate values of efficiency reciprocals (ER) for the tested CPUs. These values are calculated by multiplying the total time for calculating the first 16 Mandelbrot pictures by the effective CPU frequency. The ER value reflects the efficiency of the CPU electronics: it gives the reciprocal of the CPU performance at 1 MHz, so a lower ER means a more efficient CPU.
Rank  Processor  Effective frequency, MHz  ER
1     ARM2       8                         82
2     68000      7.1                       282
3     K1801BM1   2                         330
4     6502       2                         371
5     8088       4.8                       378
6     68008      7.5                       490
7     Z80        3.2                       538
8     8080       2.5                       782
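
As a small worked example of the ER formula, the sketch below multiplies a total time from the timing tables by the effective frequency. Which timing column feeds each CPU's ER is not stated on this page; the pairings used here are an inference, chosen because they reproduce the 538, 782 and 490 entries after rounding. The function name is hypothetical.

```python
def efficiency_reciprocal(total_time, effective_mhz):
    """ER = total time for pictures #1-16 multiplied by the effective frequency in MHz."""
    return total_time * effective_mhz


# (total from the timing tables, effective MHz); the pairing is an inference.
samples = {
    "Z80": (168.19, 3.2),     # CPC, 16 colors
    "8080": (312.84, 2.5),    # Corvette, color mode
    "68008": (65.35, 7.5),    # QL with external RAM
}

for cpu, (total, mhz) in samples.items():
    print(cpu, round(efficiency_reciprocal(total, mhz)))
# Z80 538
# 8080 782
# 68008 490
```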

It is also interesting to compare the code density for this task. Two values are provided for this: the total program size and the size of the main loop.
Rank  Platform          CPU    Program size  Main loop size
                                             bytes  LOC
1     БK                T-11   806           32     13
2     IBM PC            8088   879           46     20
3     QL                68008  2029          "      "
4     Amiga             68000  2384          "      "
5     Amstrad, 16c      Z80    979           54     37
6     Amstrad, 4c       Z80    1016          "      "
7     Corvette, color   8080   1057          84     66
8     Corvette, planar  8080   1102          "      "
9     Archimedes        ARM2   1342          100    25
10    BBC Micro, 16c    6502   1266          131    81
11    BBC Micro, 4c     6502   1298          "      "
12    Plus4             6502   1684          "      "

The QL code consists of two BASIC programs which generate and use machine code.

Sources for all these programs are available on GitHub. You can also download their executables there.

If anybody finds a way to speed up these implementations of the Mandelbrot calculations, or creates new implementations, please let me know and I will update this page. Send your reports to zliztwr@yzandex.ru but remove all z from the address.