PA2: Ray tracing faster
Learn how to ray trace faster using an acceleration data structure.
In this assignment we will primarily be referring to chapters of Ray Tracing - The Next Week. Make sure to read through it.
In the previous assignment, each time you traced a ray the code would perform an intersection test against every single primitive in your scene. This is a reasonable solution for scenes with a few primitives, but it quickly becomes impractical in scenes that contain meshes with thousands to billions of triangles. This is because with naive ray tracing the cost of tracing a ray grows linearly with the number of primitives. Acceleration structures allow us to keep this cost sub-linear, providing tremendous speed improvements for typical scenes.
In this assignment you will implement an acceleration structure called a bounding volume hierarchy (BVH), specifically a variant that uses axis-aligned bounding boxes and hence called a bounding box hierarchy (BBH). The base code for this assignment is an addition to the base code from the previous assignment. You can get the new code by doing a
git pull in your
darts-basecode working directory.
Triangles are the default modeling primitives in graphics and hence essential for rendering a variety of scenes. The darts basecode already provides the ability to load OBJ triangle meshes with potentially thousands or millions of triangles, so you can find nice models online and load them in your renderer. Currently, however, darts doesn't know how to intersect a ray with a single triangle. In this task, you will implement a ray-triangle intersection routine.
Open up the
src/triangle.cpp file and take a look at
single_. This method is passed three vertex positions and (optionally) three vertex normals and should test the intersection of the ray against the triangle.
There are many established ray-triangle intersection algorithms out there, and you are free to pick one you like. You can check the PBRT book for inspiration (the Shirley mini-books don't cover this). Personally I recommend the Möller-Trumbore test, which is simple and widely used. A Google search should bring up a lot of resources for this algorithm.
To help you test your implementation, we wrote a small testing feature into darts. When run with a test JSON file, instead of rendering a scene, darts will run a test to help you decide whether your implementation is correct or not. The implementation is available in
include/ and the file(s) in
src/tests/. We've added these already to your
CMakeLists.txt file. We've also made two small changes in
src/, to actually run these tests.
Once you get the new code compiled, run
darts scenes/assignment2/test_intersection.json to test your intersection code.
Once your tests pass, you can test it on a slightly more complex example: Run the
darts application from the terminal on the
assignment2/simple-geometry-no-bvh.json scene. This renders a scene with a few simple triangle meshes, which should look like the image shown (the amount of noise in the image will depend on the number of samples used, ours uses more than the default value in the scene).
This render takes quite a while! It's about time that we fix this using a BBH.
If this scene doesn't render correctly, then the long render times can become a problem when you're trying to debug. It might help you to reduce the
"samples" and the camera
"resolution" fields in the JSON scene file to render smaller and noisier images more quickly while you're debugging.
Chapter 3 of the book discusses how to build a bounding volume hierarchy so you can intersect rays quickly against a large collection of triangles or other geometric primitives.
We already provide you an implementation of an axis-aligned bounding
Box class in
<darts/. Our implementation is similar to what is described in the book, but make sure to read through and understand also how our code works.
Now open up
src/bbh.cpp and take a look at
BBHNode::. The former traverses the hierarchy and tests for intersections with the ray, and the latter is the constructor that builds the hierarchy given a list of primitives.
We largely follow the design in the book, so it should be relatively easy to implement these methods if you follow the book. Note that C++11 allows us to do certain things easier than the code style used in the book (e.g. sorting a lists of things based on a custom criterion). Take a look at the hints in the base code for more information.
Once you think your code is correct, you can test your implementation on the test scenes we provide. Run the
darts application from the terminal on the
assignment2/simple-geometry.json scene. This is the same scene as before, but this time using a BBH. You should produce the exact same image, but much faster. Also take a look at the Ray-Triangle intersection tests statistic that the code outputs after rendering; this should be around 20X lower compared to the image you rendered in Task 1.
If your image is correct, you can go ahead and stress test your code on some more complex scenes. Run the darts application on all the scenes in the
scenes/assignment2 folder. You should get images similar to the reference images below (just noisier):
The BBH presented in the book is simple to implement, but has a lot of potential for improvement.
For your final task, it is your job to optimize your BBH and measure the improvements you've made. You should implement at least two ideas for improving your BBH, and measure the reduction in render time (while your computer is not doing anything else) as well as the reduction in the average number of intersection tests, which darts will report when it finishes rendering an image.
To help get you started, here is a list of potential ideas you could implement to improve your code:
- If you followed the book, then you will pick the splitting axis randomly while you construct the BBH. There are a couple problems with this: 1) using a global random number generator like this becomes problematic once you consider improving performance via parallelism, and 2) you can construct much better BBHs with just a few small changes. Think about how you could cleverly pick your splitting axis to improve performance. For example, you could split the axis along which the bounding box is the longest, or you could pick a fixed axis at each level of recursion (e.g. x-axis at the root, y-axis in the direct children of the root, z-axis in the grand-children, x-axis in the great-grand children etc.). Which method performs the best?
- The book recommend using
std::sort()to sort the primitives along the chosen axis. However, we don't actually need to fully sort the primitives along the axis. All we need is to identify the primitive which should be the median element (the primitive in the middle of the list if it were sorted), and put all primitives left of this median (along the chosen axis) in the first half of the list, and all those right of the median in the second half of the list. This theoretically should require less work because the left and right sublists themselves don't need to be sorted – we just need to identify which primitives should go into each sublist. C++ has a ready-made algorithm for this called
std::nth_element(). The good news is that its runtime complexity is linear in the number of elements in the list, while sorting is log-linear. Figure out how to use this instead of
std::sort()by reading the documentation. Once you have it working, compare the time it takes to build the BBH. The resulting tree should still be exactly the same as before, but it should build considerably faster.
- The book splits the list of primitives so that the left and right child contain the same number or primitives. This is called the median or equal split, and it produces a balanced binary tree. From your data structures class you learned that balanced trees are often what you want when you're searching in a list – it leads to good worst-case runtime. In ray tracing, however, balanced trees can perform poorly if the primitives are not distributed uniformly, but cluster on one side. In that case, you could to better by splitting a node at the spatial midpoint or middle. Pick the center of the bounding box of the primitives of the node, and partition the primitives so that all those whose centroids lie to the left of the midpoint (along the splitting axis) go in one child, and all those to the right go in the other child (you will likely end up with a different number of primitives for each child). Again, C++ has a ready-made function for this, suitably named
std::partition()Allow the user to choose between the
"middle"strategies by specifying a
"split_method"field in the JSON file – our basecode already reads this field and stores it in the BBH class. Does this improve things? Under which circumstances?
- Following the book, each
BBHNodealways has two children. This is fine for internal nodes, but you can lose some efficiency in the leaf nodes, which might benefit from storing more than just two primitives. Modify your implementation such that leaf nodes store a list of primitives instead of only two children (there are different ways to do this). Whenever the number primitives is less than a certain threshold during BBH node construction, turn it into a leaf node; otherwise, an internal node. Play around with this threshold. Does this make things faster? For which thresholds and which scenes?
- Building the BBH can still be pretty time consuming for very big scenes (sometimes it can even take longer then rendering the scene!). There is a lot of opportunity to further accelerate this through thread-level parallelism. Note that once we split a node, we recursively build both the left and right children. These two computations are independent and could be performed in parallel. Make your BBH construction multi-threaded. Darts already includes the nanothread library which provides a minimal cross-platform interface for task parallelism.
These are just examples. If you can come up with your own ideas that are on a similar scope as the suggestions, feel free to implement those ideas instead.
On top of the improvements in the previous task, implement the surface area heuristic (SAH) in your BBH construction.
You can check the BVH section of the PBRT textbook for more information on how it works. The basic idea is to split the list of primitives not simply in the middle or at the median, but split it at a point that minimizes an approximate expression for the cost of intersecting the tree.
We’ve created a scene
scenes/leaderboard.json for you to use to evaluate the performance of your BBH improvements. For this scene we will also have a leaderboard so you can see how your implementation compares to our reference solution and to that of your peers. Publically choosing to include your results in the leaderboard is optional, but we still require that you generate your results for the leaderboard.
To generate your results for the leaderboard:
- Make sure that your instance of darts is built
- If you want your results to be displayed publically on the website, run
python leaderboard.py public, otherwise run
- A file named
leaderboard.datshould have been generated. Commit and push the
.datfile. It will contain your performance metrics as well as your generated image.
- We will update the leaderboard approximately once ever 5ish minutes with your updated results!
- If for any reason this process does not work for you or the scripts break, contact Zack Misso.
Try to beat our implementation on the “figure of merit”. The best submissions are in the running for some extra credit!
In your report, make sure to include:
- Images of all provided scenes as produced by your raytracer: run
dartson all scenes in
scenes/assignment2, and add the images to your report.
- Paste the output of running
- The ray traversal stats (average ray-primitive intersection tests) of your BBH on the different scenes provided (don't change the view location or other scene parameters).