Superfast Mandelbrot benchmark results

Visualizations of the 8th generations of this Mandelbrot are available here.

All systems use the same algorithm, which calculates 128x256 dots of the Mandelbrot set, and they visualize it in almost the same way. Every dot is encoded with 4 bits, so every system has to output exactly 16 KB of graphical data per picture. The implementations are highly optimized on all systems. Graphics output goes through direct hardware access, but it is not as well optimized as the main Mandelbrot computation code. The following systems have been tested.
# System Year OS
The Sanyo MPC-25FD and Sony HB-F700P are typical MSX2 computers. The Panasonic FS-A1GT is an MSX turboR computer.
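The 16 KB figure quoted above follows directly from the picture size and the 4-bit encoding. The sketch below checks that arithmetic and shows one way to pack two dots per byte. The function names and the nibble order (first dot in the high nibble) are assumptions made for illustration; the real programs write to each machine's video memory in its own layout.

```python
WIDTH, HEIGHT = 128, 256   # dots per picture, as stated in the text
BITS_PER_DOT = 4           # each dot is encoded with 4 bits


def packed_size_bytes(width=WIDTH, height=HEIGHT, bits=BITS_PER_DOT):
    """Return the number of bytes needed for one picture."""
    return width * height * bits // 8


def pack_dots(dots):
    """Pack 4-bit dot values two per byte.

    Assumption: the first dot of each pair goes into the high nibble.
    """
    if len(dots) % 2:
        raise ValueError("need an even number of dots")
    out = bytearray()
    for hi, lo in zip(dots[0::2], dots[1::2]):
        out.append(((hi & 0x0F) << 4) | (lo & 0x0F))
    return bytes(out)


if __name__ == "__main__":
    print(packed_size_bytes())                    # 16384 bytes = 16 KB
    print(len(pack_dots([0] * (WIDTH * HEIGHT))))  # also 16384
```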

The Mandelbrot algorithm uses the following parameters for the first 16 visualizations.
#   iterations  x-interval      y-interval
1   7           [-4.64, 4.29]   [-4.5, 4.5]
2   8           [-4.09, 3.60]   [-3.75, 3.75]
3   9           [-3.69, 3]      [-3.25, 3.25]
4   10          [-3.21, 2.5]    [-2.75, 2.75]
5   11          [-2.89, 2.07]   [-2.5, 2.5]
6   12          [-2.77, 1.70]   [-2, 2]
7   13          [-2.83, 1.38]   [-1.5, 1.5]
8   14          [-2.60, 1.12]   [-1, 1]
9   15          [-2.34, 0.89]   [-0.75, 0.75]
10  16          [-2.03, 0.70]   [-0.75, 0.75]
11  17          [-1.95, 0.53]   [-0.75, 0.75]
12  18          [-2.10, 0.38]   [-0.75, 0.75]
13  19          [-2.22, 0.26]   [-0.75, 0.75]
14  20          [-2.33, 0.15]   [-0.75, 0.75]
15  21          [-2.43, 0.05]   [-0.75, 0.75]
16  22          [-2.51, -0.03]  [-0.75, 0.75]

All systems also provide timing information. The next table shows the timings for drawing pictures #1-16. The algorithm uses 16-bit signed arithmetic, so 16/32-bit systems have an advantage. The "Gr%" value is the share of the total time that is spent on graphics output. The number in parentheses after @ is the approximate effective CPU frequency.
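As a small illustration of how a "Gr%" value can be derived, the sketch below divides the time spent on graphics output by the total time. The function name and the timings are hypothetical and are not taken from the benchmark results.

```python
def graphics_share_percent(total_seconds, graphics_seconds):
    """Return the share of the total time spent on graphics output, in percent."""
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    if not 0 <= graphics_seconds <= total_seconds:
        raise ValueError("graphics time must lie within the total time")
    return 100.0 * graphics_seconds / total_seconds


if __name__ == "__main__":
    # Hypothetical run: 40 s in total, 10 s of which is graphics output.
    print(graphics_share_percent(40.0, 10.0))  # 25.0
```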

The color writing mode for the Corvette writes data for all 3 graphic planes simultaneously, so it actually updates 24 KB of video RAM on each screen in this mode.

Writing modes 0 and 2 were used for the EGA. Both produce the same picture. I expect the VGA results to be the same.

The results for the Amiga 500 with fast RAM are only about 1% faster, so I haven't included them.

Some systems (the Apple IIgs, Atari ST, MSX, Geneve 9640, CoCo 3) have to draw images in a slower way (rotated images) because their graphics hardware cannot show 256 raster lines like most other computers. I estimate that this lowers the graphics performance of these systems by up to 50%.

The Amstrad CPC/PCW uses faster main Mandelbrot computation code than the MSX or Commodore 128/Z80 because the Amstrad can set up a memory layout that allows a faster way of working with the look-up table.

The BK and Geneve take advantage of their CPU's ability to ignore the low bit of the address when working with words. Other architectures have to use special instructions to clear this bit. This gives the BK and Geneve a speed boost of about 20%. It is possible to slightly reduce the accuracy of the calculations and to make the value of the bit insignificant, but the first Mandelbrot program of this series was for the BK and therefore the use of the bit remained the same.
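The effect of ignoring the low address bit can be modeled in a few lines. In this sketch a word read at an odd address returns the same word as the even address just below it, which is the behavior the BK and Geneve get for free and which other architectures have to produce with an explicit masking step. The `read_word` helper, the little-endian byte order, and the table contents are assumptions for illustration only, not the layout of the actual look-up table.

```python
import struct


def read_word(memory, address):
    """Read a 16-bit word after clearing the low bit of the address.

    Assumption: little-endian words; the masking step is the extra work
    that a CPU ignoring the low address bit does not need.
    """
    even_address = address & ~1
    return struct.unpack_from("<H", memory, even_address)[0]


if __name__ == "__main__":
    # Hypothetical table of four 16-bit words stored back to back.
    table = struct.pack("<4H", 10, 20, 30, 40)
    print(read_word(table, 2))  # 20
    print(read_word(table, 3))  # 20, the odd address maps to the same word
```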

The BBC Master Turbo uses OSWORD 6 to draw pixels, which is not the fastest possible way.

The Commodore +4 results can be about 5% faster if we turn on the NTSC mode during vertical retrace time.

Qemulator appears to be about 7% faster than real hardware, so the QL results are adjusted according to the results provided by mk79. It seems that pcem is also about 7% faster than the real IBM PC XT/AT, but I have only indirect information about this, so I didn't apply any correction to the data from pcem.

The next table contains approximate values of efficiency reciprocals (ER) for the tested CPUs at their effective frequencies. These values are calculated by multiplying the total time of the Mandelbrot calculations for the first 16 Mandelbrot pictures by the effective CPU frequency. The ER value reflects the efficiency of the CPU electronics: it gives the reciprocal of the CPU performance at 1 MHz, so a lower ER means a more efficient CPU.
Rank Processor Year ER
The actual internal frequency of the R800 is 28.4 MHz, so its ER might be set to . The TMS9995 uses 3 MHz to work with memory, so its ER is probably 4 times better, .
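The ER definition above amounts to a single multiplication, which the following sketch spells out. The function name and all numbers are hypothetical and are not values from the table. The example also shows why recomputing an ER with a frequency that is 4 times lower gives an ER that is 4 times smaller.

```python
def efficiency_reciprocal(total_seconds, effective_mhz):
    """Return ER: total calculation time multiplied by the effective CPU frequency."""
    return total_seconds * effective_mhz


if __name__ == "__main__":
    # Two hypothetical CPUs: the second one is clocked twice as fast and
    # finishes in half the time, so both have the same ER.
    print(efficiency_reciprocal(100.0, 2.0))  # 200.0
    print(efficiency_reciprocal(50.0, 4.0))   # 200.0

    # Same hypothetical timing, evaluated against a 4 times lower frequency.
    print(efficiency_reciprocal(10.0, 12.0))  # 120.0
    print(efficiency_reciprocal(10.0, 3.0))   # 30.0
```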

It is also interesting to compare the code density for this task. Two values are provided for this: the total program size and the size of the main loop. The results are sorted by the size of the main loop.
A ditto mark (") repeats the value from the row above.

Rank  Platform                 CPU           Program size  Main loop size (bytes)  Main loop size (LOC)
1     БK                       T-11          902           32                      13
2     Geneve 9640, rotated     TMS9995       1720          36                      14
3     Geneve 9640, interlaced  "             1702          "                       "
4     Atari ST, mono           68000         1090          42                      18
5     Atari ST, rotated        "             1209          "                       "
6     Macintosh                "             1337          "                       "
7     QL                       68008         2241          "                       "
8     Amiga                    68000, 68020  2385          "                       "
9     Pro-380, rotated         J-11          1219          46                      17
10    Pro-380                  "             1221          "                       "
11    IBM PC, mode 2           8088, 80286   919           "                       20
12    IBM PC, mode 0           "             1019          "                       "
13    Tandy Coco 3             6809          1105          52                      25
14    "                        6309          1109          54                      24
15    Amstrad CPC, 16c         Z80           1040          58                      41
16    Amstrad CPC, 4c          "             1064          "                       "
17    Amstrad PCW              "             1702          "                       "
18    MSX2, rotated            "             1432          63                      44
19    MSX2, interlaced         "             1481          "                       "
20    Commodore 128            "             1601          "                       "
21    Archimedes               ARM2          1349          64                      16
22    Apple IIgs               65816         1362          73                      39
23    Corvette, color          8080          1121          81                      63
24    Corvette, planar         "             1162          "                       "
25    Vector-06C               "             1178          "                       "
26    BBC Micro, 16c           6502          1376          131                     81
27    BBC Micro, 4c            "             1408          "                       "
28    BBC Master Turbo, 16c    "             1422          "                       "
29    Commodore 128            "             1648          "                       "
30    Plus4, interlaced        "             1768          "                       "
31    Plus4, flashing          "             1807          "                       "

The QL code is a BASIC program that generates and uses machine-language (ML) code.

Sources for all these programs are available on GitHub. You can also download their executables there.

If anybody finds a way to speed up these implementations of the Mandelbrot calculations, or just creates new implementations, please inform me and I will update this page. Send your reports to zliztwr@yzandex.ru but remove all z in the address. Reports may also be sent directly to the project's GitHub page.

Many thanks to the people who helped: stasmas, reddie, mk79, BigEd, RichTW, MMS, stanp, leegleason, Hunta, ... and the staff of Yandex Museum.