Realtime Fractal Zooming with CUDA

Apr 27, 2015

Image of Mandelbrot set from the project

1. Visualizing the Mandelbrot set in real time

The Mandelbrot set can be visualized by using OpenGL and CUDA together, as shown above. If it is rendered on the CPU¹ instead, its FPS is as below:

The base CPU model has no parallelization. A sequential double loop over the pixels, combined with the heavy per-pixel iteration, is so slow that the rendering is no longer interactive. The problem is, however, a very good fit for GPU parallel computation, because it involves little branching and rare memory access, and in most cases each pixel can be computed independently of the others.
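For reference, here is a minimal sketch of such a CPU baseline. It assumes a grayscale palette and a linear buffer of packed 32-bit pixels; the names `mandelbrotIterations`, `renderCpu` and `HOST_DEVICE`, and the zoom parameters `centerX`, `centerY` and `scale`, are hypothetical and not taken from the project. The point it illustrates is why the work parallelizes well: the per-pixel function reads nothing but its own coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper macro: lets the same escape-time function compile for
// host and device under nvcc; with a plain C++ compiler it expands to nothing.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Escape-time iteration count for c = cx + i*cy. It reads only its own
// coordinates: no neighbouring pixels, no shared state, no memory lookups.
HOST_DEVICE int mandelbrotIterations(double cx, double cy, int maxIter) {
    double zx = 0.0, zy = 0.0;
    int n = 0;
    // Iterate z <- z^2 + c until |z| > 2 or the iteration budget is spent.
    while (n < maxIter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = t;
        ++n;
    }
    return n;
}

// Sequential CPU baseline (hypothetical): a double loop over rows and columns
// that writes one packed 32-bit gray pixel with opaque alpha per point into a
// linear buffer. centerX/centerY/scale (units per pixel) describe the zoom.
std::vector<std::uint32_t> renderCpu(int width, int height, double centerX,
                                     double centerY, double scale, int maxIter) {
    assert(width > 0 && height > 0 && maxIter > 0);
    std::vector<std::uint32_t> pixels(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            double cx = centerX + (x - width / 2.0) * scale;
            double cy = centerY + (y - height / 2.0) * scale;
            int n = mandelbrotIterations(cx, cy, maxIter);
            // Points that never escape (inside the set) are drawn black.
            std::uint32_t v =
                (n == maxIter) ? 0u : static_cast<std::uint32_t>(255 * n / maxIter);
            pixels[static_cast<std::size_t>(y) * width + x] =
                0xFF000000u | (v << 16) | (v << 8) | v;
        }
    }
    return pixels;
}
```

On the GPU, the body of the inner loop is essentially what each thread would run; what changes is how the pixel indices are handed out.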

2. Using the GPU for real-time visualization

Converting the double loop into a single loop is simple when rendering the Mandelbrot set, since the pixel buffer object (PBO) occupies a linear memory space. As with parallel histogram computation, the work can then be distributed easily across GPU threads.
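A minimal sketch of that flattening, assuming a row-major buffer where pixel (x, y) lives at index y * width + x. The names `PixelCoord`, `pixelFromIndex`, `indexFromPixel` and `forEachPixel` are hypothetical. Once every pixel is addressed by a single index, that index is what a GPU thread can be assigned.

```cpp
#include <cassert>
#include <cstddef>

// Position of a pixel in a row-major image.
struct PixelCoord {
    int x;
    int y;
};

// Linear buffer index -> (x, y); the inverse of index = y * width + x.
PixelCoord pixelFromIndex(std::size_t index, int width) {
    assert(width > 0);
    const std::size_t w = static_cast<std::size_t>(width);
    return PixelCoord{static_cast<int>(index % w), static_cast<int>(index / w)};
}

// (x, y) -> linear buffer index, the same addressing the double loop uses.
std::size_t indexFromPixel(int x, int y, int width) {
    return static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
           static_cast<std::size_t>(x);
}

// The double loop flattened into a single loop over the linear buffer.
// On the GPU, a global thread index takes the place of the counter i.
template <typename PixelFn>
void forEachPixel(int width, int height, PixelFn fn) {
    const std::size_t total =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    for (std::size_t i = 0; i < total; ++i) {
        const PixelCoord p = pixelFromIndex(i, width);
        fn(i, p.x, p.y);
    }
}
```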

Using an appropriate number of threads boosts the speed as follows:

The left shows a kernel with 49x1024 threads, which runs about 333 times faster than the base model. With a well-chosen thread-block dimension, a further 15% improvement can be achieved.
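One way to read that launch, sketched under assumptions: 49x1024 is taken as 49 blocks of 1024 threads (50,176 threads in total), and each thread renders one contiguous chunk of the buffer. This chunked layout is a reconstruction consistent with the per-thread figures quoted in section 3, not code from the project, and `PixelRange`, `pixelsPerThread`, `chunkForThread` and `firstWriteStrideBytes` are hypothetical names. For a frame of roughly two million pixels, the chunk size comes out at about 42 pixels, so with 4-byte pixels neighbouring threads start writing about 168 bytes apart.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range of linear pixel indices [begin, end) owned by one thread.
struct PixelRange {
    std::size_t begin;
    std::size_t end;
};

// Pixels per thread when totalThreads threads split numPixels pixels into
// contiguous chunks (ceiling division so every pixel is covered).
std::size_t pixelsPerThread(std::size_t numPixels, std::size_t totalThreads) {
    assert(totalThreads > 0);
    return (numPixels + totalThreads - 1) / totalThreads;
}

// Contiguous chunk for thread tid; trailing threads may get a short or empty
// range. In a CUDA kernel, tid would be blockIdx.x * blockDim.x + threadIdx.x.
PixelRange chunkForThread(std::size_t tid, std::size_t numPixels,
                          std::size_t totalThreads) {
    const std::size_t per = pixelsPerThread(numPixels, totalThreads);
    const std::size_t begin = std::min(tid * per, numPixels);
    const std::size_t end = std::min(begin + per, numPixels);
    return PixelRange{begin, end};
}

// Byte distance between the first writes of threads tid and tid + 1,
// assuming 4-byte (RGBA8) pixels.
std::size_t firstWriteStrideBytes(std::size_t numPixels, std::size_t totalThreads) {
    return pixelsPerThread(numPixels, totalThreads) * 4;
}
```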

3. n-stream rendering

The kernel from section 2 renders about 42 pixels per thread², so the writes of neighboring threads land about 168 bytes (42 pixels × 4 bytes) apart. The kernel reads nothing from global memory, but even writes are better coalesced, for better cache use. Rather than only reindexing the writes, moving the computation to a per-pixel level exposes more parallelism and speeds up the calculation. Using 42 streams with 376x128 threads speeds it up even more, as below:
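A minimal sketch of a per-pixel, n-stream layout. The post does not say how the frame is divided among the 42 streams, so the assumption here is that the linear buffer is cut into consecutive slices. Each slice would be rendered by its own kernel launch on its own CUDA stream (created with `cudaStreamCreate` and launched as `kernel<<<grid, block, 0, stream>>>`), with 376x128 read as 376 blocks of 128 threads. `Slice`, `sliceForStream` and `pixelForThread` are hypothetical names. Within a slice, thread tid renders pixel `slice.begin + tid`, so neighbouring threads write addresses 4 bytes apart instead of 168.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Consecutive slice of the linear pixel buffer handled by one stream.
struct Slice {
    std::size_t begin;  // first linear pixel index of the slice
    std::size_t end;    // one past the last index
};

// Slice for stream number `stream` when numPixels pixels are split into
// numStreams consecutive, (almost) equal slices.
Slice sliceForStream(std::size_t stream, std::size_t numPixels, std::size_t numStreams) {
    assert(numStreams > 0);
    const std::size_t per = (numPixels + numStreams - 1) / numStreams;
    const std::size_t begin = std::min(stream * per, numPixels);
    return Slice{begin, std::min(begin + per, numPixels)};
}

// One pixel per thread: thread tid of the launch for `slice` renders pixel
// slice.begin + tid. Returns false when tid falls past the end of the slice
// (a kernel would simply return early for that thread).
bool pixelForThread(const Slice& slice, std::size_t tid, std::size_t& pixel) {
    pixel = slice.begin + tid;
    return pixel < slice.end;
}
```

The per-slice launch needs at least as many threads as the slice has pixels; any extra threads fail the bounds check and do no work.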

This is about 650 times faster than the base CPU model. Using shared memory did not speed up the process, because the GPU model never reuses previously calculated values. Real-time visualization of the Mandelbrot set turned out to be more a question of how to distribute the computation than of how to optimize the kernel.