Superfast Mandelbrot benchmark results

Visualizations of the 8th generations of this Mandelbrot are available here.

All systems use the same algorithm, which calculates 128x256 dots of the Mandelbrot set. They also visualize it in almost the same way. Every dot is encoded with 4 bits, so every system has to output exactly 16 KB of graphical data for each picture. The algorithm implementations are highly optimized on all systems. Graphics are implemented via direct access to hardware, but the graphics code is not as well optimized as the main Mandelbrot computational code. The following systems have been tested.
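The 16 KB figure follows directly from the picture format. Below is a minimal Python sketch of that arithmetic, together with one possible way of packing two 4-bit dots into a byte. The function names and the nibble order are assumptions made for illustration; the real programs write to each machine's video memory in its own layout.

```python
WIDTH, HEIGHT = 128, 256   # dots per picture (from the text)
BITS_PER_DOT = 4           # 16 possible values per dot


def picture_size_bytes(width=WIDTH, height=HEIGHT, bits=BITS_PER_DOT):
    """Return the amount of graphical data for one picture, in bytes."""
    return width * height * bits // 8


def pack_dots(dots):
    """Pack a flat sequence of 4-bit values into bytes, two dots per byte.

    Assumption: the first dot of each pair goes into the high nibble.
    """
    if len(dots) % 2:
        raise ValueError("need an even number of dots")
    packed = bytearray()
    for i in range(0, len(dots), 2):
        high, low = dots[i] & 0x0F, dots[i + 1] & 0x0F
        packed.append((high << 4) | low)
    return bytes(packed)
```

Calling `picture_size_bytes()` gives 16384, that is, 16 KB, and packing a full picture of 128x256 dots produces a buffer of the same length.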

# System Year OS
The Sanyo MPC-25FD and Sony HB-F700P are typical MSX2 computers. The Panasonic FS-A1GT is an MSX turboR computer.

The Mandelbrot algorithm uses the following parameters for the first 16 visualizations.

#	iterations	x-interval	y-interval
1	7	[-4.64, 4.29]	[-4.5, 4.5]
2	8	[-4.09, 3.60]	[-3.75, 3.75]
3	9	[-3.69, 3]	[-3.25, 3.25]
4	10	[-3.21, 2.5]	[-2.75, 2.75]
5	11	[-2.89, 2.07]	[-2.5, 2.5]
6	12	[-2.77, 1.70]	[-2, 2]
7	13	[-2.83, 1.38]	[-1.5, 1.5]
8	14	[-2.60, 1.12]	[-1, 1]
9	15	[-2.34, 0.89]	[-0.75, 0.75]
10	16	[-2.03, 0.70]	[-0.75, 0.75]
11	17	[-1.95, 0.53]	[-0.75, 0.75]
12	18	[-2.10, 0.38]	[-0.75, 0.75]
13	19	[-2.22, 0.26]	[-0.75, 0.75]
14	20	[-2.33, 0.15]	[-0.75, 0.75]
15	21	[-2.43, 0.05]	[-0.75, 0.75]
16	22	[-2.51, -0.03]	[-0.75, 0.75]
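To show how one row of this table drives a picture, here is a rough escape-time sketch in Python. It is not the benchmarked code: it uses floating point instead of 16-bit signed arithmetic, and the function name, the orientation (128 columns by 256 lines), and the folding of the iteration count into 4 bits with `& 0x0F` are all assumptions made for illustration.

```python
def mandelbrot_picture(iterations, x_interval, y_interval, width=128, height=256):
    """Return rows of 4-bit values for one picture (floating-point sketch)."""
    x_min, x_max = x_interval
    y_min, y_max = y_interval
    dx = (x_max - x_min) / width    # horizontal step between dots
    dy = (y_max - y_min) / height   # vertical step between dots
    rows = []
    for row in range(height):
        ci = y_min + row * dy
        line = []
        for col in range(width):
            cr = x_min + col * dx
            zr = zi = 0.0
            n = 0
            # Iterate z -> z*z + c until |z| > 2 or the iteration limit is hit.
            while n < iterations and zr * zr + zi * zi <= 4.0:
                zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
                n += 1
            line.append(n & 0x0F)   # assumption: keep the low 4 bits as the dot value
        rows.append(line)
    return rows


# Picture #1 from the table: 7 iterations, x in [-4.64, 4.29], y in [-4.5, 4.5].
picture_1 = mandelbrot_picture(7, (-4.64, 4.29), (-4.5, 4.5))
```

In the table the iteration limit grows by one per picture, so picture number k uses k + 6 iterations.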

All systems also provide timing information. The following table shows the timings for drawing pictures #1-16. The algorithm uses 16-bit signed arithmetic, so 16/32-bit systems have an advantage. The "Gr%" value is the share of the total time that is spent on graphics output. The number in parentheses after @ is the approximate effective CPU frequency.

The color writing mode for the Corvette writes data for all 3 graphic planes simultaneously, so it actually updates 24 KB of video RAM on each screen in this mode.

Writing modes 0 and 2 were used for the EGA. Both produce the same picture. I expect the results for the VGA to be the same.

The results for the Amiga 500 with fast RAM are only about 1% faster, so I haven't included them.

Some systems (the Apple IIgs, Atari ST, MSX, Geneve 9640, CoCo 3) have to use a slower method that draws rotated images, because their graphics hardware cannot show 256 raster lines as most other computers can. I estimate that this lowers the graphics performance of these systems by up to 50%.

The Amstrad CPC/PCW uses faster main Mandelbrot computational code than the MSX or Commodore 128/Z80, because the Amstrad can set up a memory layout that allows a faster way of working with the look-up table.

The BK and Geneve take advantage of their CPU's ability to ignore the low bit of the address when working with words. Other architectures have to use special instructions to clear this bit. This gives the BK and Geneve a speed boost of about 20%. It is possible to slightly reduce the accuracy of the calculations and to make the value of the bit insignificant, but the first Mandelbrot program of this series was for the BK and therefore the use of the bit remained the same.
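The effect of the low address bit can be modeled in a few lines of Python. This is only an illustration of the idea, not the benchmarked code: the table of squares, its size, and all function names are hypothetical. One reader models a CPU that silently ignores bit 0 of a word address, the other a CPU where the program has to clear that bit itself before the access.

```python
import struct


def make_square_table(size):
    """Hypothetical look-up table: 16-bit word k holds k*k (low 16 bits)."""
    return b"".join(struct.pack("<H", (k * k) & 0xFFFF) for k in range(size))


def read_word_ignoring_low_bit(memory, address):
    """Model of a CPU that ignores bit 0 of the address for word access."""
    return struct.unpack_from("<H", memory, address & ~1)[0]


def read_word_strict(memory, address):
    """Model of a CPU where software must supply an even address."""
    if address & 1:
        raise ValueError("unaligned word access")
    return struct.unpack_from("<H", memory, address)[0]


table = make_square_table(8)
offset = 7                                   # odd byte offset produced by the arithmetic
a = read_word_ignoring_low_bit(table, offset)
b = read_word_strict(table, offset & ~1)     # the extra masking step costs time
```

Both calls read the same word. The second variant needs the explicit `offset & ~1`, which corresponds to the special instructions mentioned above.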

The BBC Master Turbo uses OSWORD 6 to draw pixels, and this is not the fastest way possible.

The Commodore +4 results can be about 5% faster if we turn on the NTSC mode during vertical retrace time.

Qemulator appears to be about 7% faster than real hardware, so the QL results are adjusted according to the results provided by mk79. It seems that pcem is also about 7% faster than the real IBM PC XT/AT, but I have only indirect information about this, so I didn't apply any correction to the data from pcem.

The following table contains approximate values of the efficiency reciprocal (ER) for the tested CPUs at their effective frequencies. Each value is calculated by multiplying the total time of the Mandelbrot calculations for the first 16 pictures by the effective CPU frequency. The ER value reflects the efficiency of the CPU electronics: it is the reciprocal of the CPU performance at 1 MHz.

Rank Processor Year ER
The actual internal frequency of the R800 is 28.4 MHz, so its ER might be set to . The TMS9995 uses 3 MHz to work with memory, so its ER is probably 4 times better, .
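The ER arithmetic described above can be sketched as follows. The function names are mine, the time unit (seconds) is an assumption, and the numbers in the example are made up for illustration, not measured results. The second helper shows how an ER value changes when a different clock is taken as the reference, which is the kind of adjustment discussed for the R800 and the TMS9995.

```python
def efficiency_reciprocal(total_seconds, effective_mhz):
    """ER = total calculation time multiplied by the effective CPU frequency."""
    return total_seconds * effective_mhz


def rescale_er(er, from_mhz, to_mhz):
    """Re-express an ER value for a different reference frequency."""
    return er * to_mhz / from_mhz


# Hypothetical CPU: 100 s for the 16 pictures at an effective 4 MHz.
er_a = efficiency_reciprocal(100.0, 4.0)
# The same hypothetical CPU clocked twice as fast finishes in half the time,
# so its ER does not change.
er_b = efficiency_reciprocal(50.0, 8.0)
```

A lower ER means a more efficient CPU, because less time is needed at the same clock.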

It is also interesting to compare the code density for this task. Two values are provided for this: the total program size and the size of the main loop. The results are sorted by the size of the main loop.

Rank	Platform	CPU	Program size, bytes	Main loop size, bytes	Main loop size, LOC
1	БK	T-11	902	32	13
2	Geneve 9640, rotated	TMS9995	1720	36	14
3	Geneve 9640, interlaced	1702
4	Atari ST, mono	68000	1090	42	18
5	Atari ST, rotated	1209
6	Macintosh	1337
7	QL	68008	2241
8	Amiga	68000 68020	2385
9	Pro-380, rotated	J-11	1219	46	17
10	Pro-380	1221
11	IBM PC, mode 2	8088 80286	919	20
12	IBM PC, mode 0	1019
13	Tandy Coco 3	6809	1105	52	25
14	6309	1109	54	24
15	Amstrad CPC, 16c	Z80	1040	58	41
16	Amstrad CPC, 4c	1064
17	Amstrad PCW	1702
18	MSX2, rotated	1432	63	44
19	MSX2, interlaced	1481
20	Commodore 128	1601
21	Archimedes	ARM2	1349	64	16
22	Apple IIgs	65816	1362	73	39
23	Corvette, color	8080	1121	81	63
24	Corvette, planar	1162
25	Vector-06C	1178
26	BBC Micro, 16c	6502	1376	131	81
27	BBC Micro, 4c	1408
28	BBC Master Turbo, 16c	1422
29	Commodore 128	1648
30	Plus4, interlaced	1768
31	Plus4, flashing	1807

The QL code is a BASIC program that generates and runs machine-language (ML) code.

Sources for all these programs are available on GitHub. You can also download their executables there.

If anybody finds a way to speed up these implementations of the Mandelbrot calculations, or just creates new implementations, please inform me and I will update this page. Send your reports to zliztwr@yzandex.ru but remove every z from the address. Reports may also be sent directly to the project's GitHub page.

Many thanks to the people who helped: stasmas, reddie, mk79, BigEd, RichTW, MMS, stanp, leegleason, Hunta, ... and the staff of Yandex Museum.