Sparse Voxel Octree on GPU

This is the work for my Master’s thesis for which I received distinction resulting in a class I honours.

The concept is using a sparse voxel octree instead of the traditional triangle mesh as the scene geometry. This enables us to use ray-casting for rendering in runtime. This is achieved by using the GPU’s (in this case NVIDIA’s) multiprocessing capabilities. However the implementation uses the common OpenGL (and GLSL) rather than a multi-purpose platform like CUDA.
Rendering Voxels
Since the 3D programs do not create a volume-based geometry (at least not one usable by this algorithm) the program has 2 parts.
First a GLSL shader renders the scene to a special structure in graphics memory. This structure is made of voxels instead of pixels, meaning the scene is rendered into special kinds of volumes, instead of being rasterized to pixels. It is in form of an octree, meaning the largest node (the scene itself is divided into 8 smaller cubes and they are divided into 8 cubes in turn and so on and the whole structure is in form of a tree structure. And finally, it is sparse, meaning that empty cubes have null pointers which save the space in the memory.

Part of the loop in fragment shader which voxelizes the triangle mesh:

nodeindex=0;//starting at root node
while (tester>2)//leaf node here is in fact 2x2x2 cube equivalent to a single 3x3x3 brick
{
    tester		= tester >>1;//half value for halving the volume
    offspring	= octree[nodeindex].children;
    if ( offspring == 0)//node is not divided yet
    {
        //positive means someone else has taken it already
        //if it is zero by getting it, it will be positive and a fork of actions happen
        int lockstate = atomicAdd(mutices+nodeindex,1);
        if (lockstate != 0)//mutex already taken by others
        {
            if (lockstate < 0)
            {
                lockstate = atomicAdd(mutices+nodeindex, -1);
            }
            //add to queue and exit
            int queueind = atomicAdd(pilot->queueindex,1);

            uvec3 xyz = (x | (i<<31), y | (j<<31), z | (k<<31) );
            vec3 rgb;
            pilot->queue[queueind].rgb = rgb;
            pilot->queue[queueind].xyz = xyz;


            return;
        }
        else//this is the first thread which attempts to divide the node
        {
            //...
        }
    }

    //children now available
    nextnode = 0;
    if ((x+i) & tester) nextnode = nextnode + 1;
    if ((y+j) & tester) nextnode = nextnode + 2;
    if ((z+k) & tester) nextnode = nextnode + 4;
    //so, on next loop this particular child is used as parent
    nodeindex = offspring+nextnode;

    //at the end we are either ready to start further division
    //or at n-1st level which means we are ready to write to brick
}//while (traversing all the way down to leaf)

//write to the brick
tempind = ((x&1) + 1 -  2*i)*1+((y&1) + 1 - 2*j)*3+((z&1)+ 1 - 2*k)*9;
brick[nodeindex].sub27[tempind].xyz=rgb_f.xyz;
brick[nodeindex].sub27[tempind].w=1.0;
           

Second part of the program happens during the runtime render. After the first GLSL shader generates the octree, a pointer to the memory structure is used by the second GLSL program. This program casts rays from the viewpoint to the scene and based on the cubes it intersects it collects the corresponding pixel in the final rendered image.

Part of the loop in fragment shader which traverses the octree using a parametrized ray to gather the pixel:

    while (tester>2 )
{
    offspring = octree[nodeindex].children;
    if (offspring ==0)
    {
        skp++;
        break;
    }
    level = level -1;
    tester = tester>>1;
    nextnode = 0;
    if ((intvp.x & tester) > 0) nextnode = nextnode+1;
    if ((intvp.y & tester) > 0) nextnode = nextnode+2;
    if ((intvp.z & tester) > 0) nextnode = nextnode+4;
    //so on next loop this particular child is used as parent
    nodeindex = offspring+nextnode;

}

if (tester == 2)//small cube
{
    bc = brick[nodeindex];
    for (p=0; p<1; p++)
    {
        //find the 8 indices, interpolate, gather and move
        //if exiting the brick break
        //...
        bi = ((intvp.x) & 1) + ((intvp.y) & 1)*3 +((intvp.z) & 1)*9;
        for (i=0; i<2; i++)
        {
            //inner loops of 3 increments for j and 9 increments for k
            contrib.x = contrib.x + bc.sub27&#91;bi+i+j+k&#93;.x * bc.sub27&#91;bi+i+j+k&#93;.w;
            //same for y and z
            contrib.w = contrib.w + bc.sub27&#91;bi+i+j+k&#93;.w;
            if (bc.sub27&#91;bi+i+j+k&#93;.w > float(0)) pixcnt++;
        }
        // use pixcnt to average the collected contrib values
        gathered = contrib;
    }
    if (gathered.w > 0)  break;

    //small cube? small move
    vp = cubemove(vp, 0);

}//small cube
else
{
    //big cube? big move!
    vp = cubemove(vp, level);
}

As said before, concept of the research was using an OpenGL implementation to achieve the goal. Similar research has been done before on a high-end NVIDIA graphic card on a fast Windows-based PC with collaboration of NVIDIA. The implementation here uses a common NVIDIA card on a linux-based average laptop. Nvidia’s latest driver for Linux at the time was still missing the trilinear interpolation functionality for textures which lead to the aliasing effect you see in the image above.

Render Complexity[This is a representation of complexity of render for various pixels on the screen. As you can see, the rays far away from the occupied parts of the scene are gathered faster.]

One interesting feature of voxel-based geometry is that semi-transparent objects can be achieved without special hacks.
Real cloud[A “real” cloud geometry generated by superimposing Perlin noise of various levels of granularity. This is also fully implemented in GLSL.]

You wanted to see my code so you are here!