Realtime Fractal Zooming with CUDA

Image of Mandelbrot set from the project

1. Visualizing Mandelbrot set in realtime

The Mandelbrot set can be visualized using OpenGL and CUDA together, as above. If it is rendered on the CPU [1] instead, the FPS is as below:

CPU usage

There is no parallelization in the base CPU model. The sequential double loop and the heavy per-pixel calculation make rendering too slow to be interactive. The problem is, however, a very good fit for GPU parallel computation, because it has a small amount of branching and rare memory access. In most cases each pixel can be computed independently of the others.
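As a rough illustration of the base model, the sketch below renders the set with a plain double loop on the host. It uses the standard escape-time iteration (start at z = 0, repeat z ← z² + c, and count steps until |z| > 2). The `View` struct, the grayscale `shade` mapping, and all function names are assumptions made for this example, not the project's actual code.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view description: complex-plane center, size of one pixel,
// image dimensions, and iteration limit.
struct View { double centerX, centerY, scale; int width, height, maxIter; };

// Escape-time iteration count for c = (cx, cy).
// Marked __host__ __device__ so the same code could later run in a kernel.
__host__ __device__ inline int mandelbrotIterations(double cx, double cy, int maxIter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;  // real part of z^2 + c
        zy = 2.0 * zx * zy + cy;            // imaginary part of z^2 + c
        zx = t;
        ++i;
    }
    return i;
}

// Assumed color mapping: black inside the set, gray by escape time outside.
__host__ __device__ inline uint32_t shade(int iter, int maxIter)
{
    uint32_t g = (iter >= maxIter) ? 0u : static_cast<uint32_t>(255 * iter / maxIter);
    return 0xFF000000u | (g << 16) | (g << 8) | g;
}

__host__ __device__ inline uint32_t shadePixel(const View& v, int px, int py)
{
    double cx = v.centerX + (px - v.width / 2) * v.scale;
    double cy = v.centerY + (py - v.height / 2) * v.scale;
    return shade(mandelbrotIterations(cx, cy, v.maxIter), v.maxIter);
}

// Base CPU model: a sequential double loop over rows and columns.
void renderCpu(const View& v, std::vector<uint32_t>& pixels)
{
    pixels.resize(static_cast<size_t>(v.width) * v.height);
    for (int py = 0; py < v.height; ++py)
        for (int px = 0; px < v.width; ++px)
            pixels[static_cast<size_t>(py) * v.width + px] = shadePixel(v, px, py);
}
```

No pixel reads another pixel's result, which is the independence that makes the loop body a natural candidate for one GPU thread per unit of work.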

2. Using GPU for realtime visualization

Converting the double loop to a single loop is simple when rendering the Mandelbrot set, since the pixel buffer object (PBO) has a linear memory space. As with parallel histogram computation, the work can easily be distributed across GPU threads.
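One way to picture this conversion is a kernel in which every thread walks a contiguous run of the linear buffer. The sketch below is an assumption about how such a kernel might look, not the project's code: `View`, `renderChunked`, `pixelsPerThread`, and `launchChunked` are hypothetical names, and `devPbo` is assumed to be a device pointer to the mapped PBO (for example, one obtained through `cudaGraphicsResourceGetMappedPointer`). Error checking is omitted for brevity.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical view description (center, size of one pixel, image size, limit).
struct View { double centerX, centerY, scale; int width, height, maxIter; };

__host__ __device__ inline int mandelbrotIterations(double cx, double cy, int maxIter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = t;
        ++i;
    }
    return i;
}

// Shade the pixel at linear index idx; the 2D position is recovered from idx.
__host__ __device__ inline uint32_t shadePixel(const View& v, int idx)
{
    int px = idx % v.width, py = idx / v.width;
    double cx = v.centerX + (px - v.width / 2) * v.scale;
    double cy = v.centerY + (py - v.height / 2) * v.scale;
    int it = mandelbrotIterations(cx, cy, v.maxIter);
    uint32_t g = (it >= v.maxIter) ? 0u : static_cast<uint32_t>(255 * it / v.maxIter);
    return 0xFF000000u | (g << 16) | (g << 8) | g;  // assumed grayscale mapping
}

// Single loop over the linear buffer: each thread renders one contiguous chunk.
__global__ void renderChunked(uint32_t* pbo, View v, int chunk)
{
    int tid   = blockIdx.x * blockDim.x + threadIdx.x;
    int total = v.width * v.height;
    int begin = tid * chunk;
    int end   = min(begin + chunk, total);  // threads past the end do nothing
    for (int idx = begin; idx < end; ++idx)
        pbo[idx] = shadePixel(v, idx);
}

// Ceiling division: pixels each thread must cover so that all pixels are drawn.
inline int pixelsPerThread(int totalPixels, int totalThreads)
{
    return (totalPixels + totalThreads - 1) / totalThreads;
}

void launchChunked(uint32_t* devPbo, const View& v, int blocks, int threadsPerBlock)
{
    int chunk = pixelsPerThread(v.width * v.height, blocks * threadsPerBlock);
    renderChunked<<<blocks, threadsPerBlock>>>(devPbo, v, chunk);
}
```

For instance, `pixelsPerThread(1920 * 1080, 49 * 1024)` evaluates to 42. The 1920x1080 resolution is only a guess used for illustration, since the rendering resolution is not stated here.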

Using a proper number of threads boosts the speed as follows:

GPU usage

The left shows a kernel with 49x1024 threads, which is about 333 times faster than the base CPU model. With a proper thread-block dimension, a further 15% improvement can be achieved.

3. n-stream rendering

The kernel in section 2 renders about 42 pixels per thread [2], so neighboring threads write to addresses about 168 bytes apart (42 pixels × 4 bytes per pixel). The kernel reads nothing from global memory, but even for writes it is better to coalesce memory accesses for better cache use. Instead of reindexing the writes, moving the computation to a per-pixel basis, so that each thread computes a single pixel, speeds up the calculation through finer-grained parallelism. Using 42 streams with 376x128 threads speeds up rendering even more, as below:

GPU speedup with streams

This is about 650 times faster than the base CPU model. Shared memory could not speed up the process, since the GPU model does not depend on previously calculated values. Realtime visualization of the Mandelbrot set turned out to be more a problem of how to distribute the computation than of how to optimize the kernel.
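The per-pixel, multi-stream layout described in this section might be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: `View`, `renderSlice`, and `renderWithStreams` are hypothetical names, `devPbo` is assumed to be a device pointer to the mapped PBO, and error checking is omitted. Each stream renders one contiguous slice of the buffer with one thread per pixel, so adjacent threads write adjacent 4-byte pixels.

```cuda
#include <algorithm>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical view description (center, size of one pixel, image size, limit).
struct View { double centerX, centerY, scale; int width, height, maxIter; };

__host__ __device__ inline int mandelbrotIterations(double cx, double cy, int maxIter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = t;
        ++i;
    }
    return i;
}

// Shade the pixel at linear index idx (assumed grayscale mapping).
__host__ __device__ inline uint32_t shadePixel(const View& v, int idx)
{
    int px = idx % v.width, py = idx / v.width;
    double cx = v.centerX + (px - v.width / 2) * v.scale;
    double cy = v.centerY + (py - v.height / 2) * v.scale;
    int it = mandelbrotIterations(cx, cy, v.maxIter);
    uint32_t g = (it >= v.maxIter) ? 0u : static_cast<uint32_t>(255 * it / v.maxIter);
    return 0xFF000000u | (g << 16) | (g << 8) | g;
}

// One thread per pixel: thread k of a slice writes pixel first + k,
// so writes from neighboring threads land on neighboring addresses.
__global__ void renderSlice(uint32_t* pbo, View v, int first, int count)
{
    int local = blockIdx.x * blockDim.x + threadIdx.x;
    if (local < count)
        pbo[first + local] = shadePixel(v, first + local);
}

// Split the buffer into numStreams slices and launch one kernel per stream.
void renderWithStreams(uint32_t* devPbo, const View& v, int numStreams, int threadsPerBlock)
{
    int total = v.width * v.height;
    int slice = (total + numStreams - 1) / numStreams;  // ceiling division

    // Created per call to keep the sketch short; a render loop would
    // more likely create the streams once and reuse them every frame.
    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    for (int i = 0; i < numStreams; ++i) {
        int first = i * slice;
        int count = std::min(slice, total - first);
        if (count <= 0) break;                           // no pixels left
        int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
        renderSlice<<<blocks, threadsPerBlock, 0, streams[i]>>>(devPbo, v, first, count);
    }

    for (auto& s : streams) {
        cudaStreamSynchronize(s);  // wait until this slice is finished
        cudaStreamDestroy(s);
    }
}
```

A call such as `renderWithStreams(devPbo, view, 42, 128)` would mirror the stream count and block size quoted above, although the exact grid the project used per stream may differ from what this sketch computes.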

4. Realtime Visualization

The GPU model can be visualized in realtime as below:

Acknowledgement

This project was advised by Prof. Klaus Mueller.

Footnotes


  1. Using an AMD 8300 in 2015

  2. Using an Nvidia GTX 780

Byung Il Choi

My research interests include Deep Learning, Information Retrieval, and Computer Graphics.