In my ongoing quest to teach myself graphics, I sat down on Friday to work through Shirley, Black, and Hollasch's Ray Tracing in One Weekend.
I sit here on Monday without having finished it yet: perhaps Shirley et al were unaware of Hofstadter's Law.
Or perhaps it might have something to do with the book being written in C++, a language I have never before used and of which I know very little.
Luckily, I do know a bit of C, and it's written in a very C-like fashion, so understanding the code snippets hasn't been too difficult. I still don't know how to compile C++, though, which means I'm doing this in Rust instead.
I. Rust Types vs C++ Classes
Translating the code snippets from C++ to Rust isn't too tough either, and aside from remembering what the different keywords mean, the real challenge has been in going from C-like C++ to idiomatic Rust. That mostly means a bunch of structs with a whooooole lot of manual trait implementations.
Right at the beginning of the book, for example, Shirley and the Gang define a 3-dimensional vector class with a bunch of methods and overloading:
class vec3 {
  public:
    double e[3];

    vec3() : e{0,0,0} {}
    vec3(double e0, double e1, double e2) : e{e0, e1, e2} {}

    double x() const { return e[0]; }
    double y() const { return e[1]; }
    double z() const { return e[2]; }

    ...

    double length() const {
        return std::sqrt(length_squared());
    }

    double length_squared() const {
        return e[0]*e[0] + e[1]*e[1] + e[2]*e[2];
    }
};
It then also defines aliases for points and colors. In the simple ray-tracing implementation used in the book, basically everything boils down to the manipulation of 3-element floating point arrays. This makes it very easy to do something stupid, like add two points together instead of a point and a vector, or add a color to a vector.
using point3 = vec3;
using color = vec3;
Since point3 and color are just aliases for vec3, any functionality we want vectors to have, points and colors automatically have too, for free!
This is great, if you like being wrong.
See, this doesn't actually prevent any of those illegal operations. The aliases are the same type under the hood; the different names just let you see that you're mixing two different kinds of quantity... if you bother to check.
To the authors' credit, the book is upfront about the tradeoff:

    We’ll use the same class vec3 for colors, locations, directions, offsets, whatever. Some people don’t like this because it doesn’t prevent you from doing something silly, like subtracting a position from a color. They have a good point, but we’re going to always take the “less code” route when not obviously wrong. In spite of this, we do declare two aliases for vec3: point3 and color. Since these two types are just aliases for vec3, you won't get warnings if you pass a color to a function expecting a point3, and nothing is stopping you from adding a point3 to a color, but it makes the code a little bit easier to read and to understand.
That would be easy enough to do in Rust as well: just alias Point3 and Color from Vec3. But come on! This is the kind of thing Rust was designed to prevent, so let's take advantage of that!
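For completeness, here's roughly what the alias version would look like in Rust (a sketch, assuming the Vec3 struct defined below; the function names are mine). Rust type aliases, like C++'s using, are just alternate names for the same type, so the compiler can't tell points and colors apart:

// Type aliases: same type, different names.
type Point3 = Vec3;
type Color = Vec3;

// A function that clearly expects a point in space...
fn offset_origin(p: Point3) -> Point3 { p }

fn demo() {
    let background: Color = Vec3::new(0.5, 0.7, 1.0);
    // ...compiles without a peep when handed a color,
    // because Color *is* Vec3 *is* Point3.
    let _whoops = offset_origin(background);
}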
The straightforward way to do this is to define three separate structs entirely, which just happen to have the same contents:
#[derive(Copy, Clone, Debug)]
pub struct Vec3([f64; 3]);

#[derive(Copy, Clone, Debug)]
pub struct Point3([f64; 3]);

#[derive(Copy, Clone, Debug)]
pub struct Color([f64; 3]);
But now we don't get the benefit of inheritance! Do we just need to define all those repetitive, identical trait and method implementations separately each time? Not only would that be tedious, it would make it a lot more likely that we make a typo in one of them, or forget to update one struct as we develop things.
Fear not! Macros to the rescue!
use std::ops::{Index, IndexMut, Neg};

macro_rules! impl_vec3_core {
    ($vec_type:ident) => {
        impl std::ops::Mul<f64> for $vec_type {
            type Output = Self;
            fn mul(self, rhs: f64) -> Self {
                Self([self.0[0] * rhs, self.0[1] * rhs, self.0[2] * rhs])
            }
        }

        impl std::ops::Div<f64> for $vec_type {
            type Output = Self;
            fn div(self, rhs: f64) -> Self {
                Self([self.0[0] / rhs, self.0[1] / rhs, self.0[2] / rhs])
            }
        }

        impl Neg for $vec_type {
            type Output = Self;
            fn neg(self) -> Self::Output {
                Self([-self.0[0], -self.0[1], -self.0[2]])
            }
        }

        impl Index<usize> for $vec_type {
            type Output = f64;
            fn index(&self, index: usize) -> &Self::Output {
                &self.0[index]
            }
        }

        impl IndexMut<usize> for $vec_type {
            fn index_mut(&mut self, index: usize) -> &mut Self::Output {
                &mut self.0[index]
            }
        }

        impl $vec_type {
            pub fn new(x: f64, y: f64, z: f64) -> Self {
                Self([x, y, z])
            }
            pub fn x(&self) -> f64 { self.0[0] }
            pub fn y(&self) -> f64 { self.0[1] }
            pub fn z(&self) -> f64 { self.0[2] }
            pub fn length_squared(&self) -> f64 {
                self.0[0] * self.0[0] + self.0[1] * self.0[1] + self.0[2] * self.0[2]
            }
            pub fn length(&self) -> f64 { self.length_squared().sqrt() }
            pub fn unit_vector(self) -> Self { self / self.length() }
        }
    };
}

impl_vec3_core!(Point3);
impl_vec3_core!(Color);
impl_vec3_core!(Vec3);
By defining a core set of common implementations, we can just... implement them directly with macros. They're all identical, so there's no problem!
And unlike the C++ implementation, we have the flexibility to define things like operator overloads (which Rust expresses as trait impls) differently for each type.
Why in the world would we want to implement these things differently for three data types that all amount to [f64;3] in trench coats?
Because as we noted before, some of the interactions between these data types (like adding a point to a color) make no sense, and should be illegal... but there are other interactions between data types which have defined meanings and should definitely be legal: adding a vector to a point, for example, is going to be the core of our tracer.
And there are even some interactions within data types which ought to be illegal, or have special behavior: there is no geometric meaning in adding two points together, for example, and we ought to prevent this at compile time.
But there is geometric meaning in subtracting one point from another. It's just that this operation doesn't produce another point... it produces a vector between the points.
In the C++ implementation, Shirley and the Tracers are just putting different coats of paint on otherwise-indistinguishable variables to help keep track of them. In the Rust implementation, we can define this behavior to the compiler directly and know that it will do that for us.
use std::ops::{Add, Sub};

impl Add<Vec3> for Point3 {
    type Output = Self;
    fn add(self, rhs: Vec3) -> Self {
        Self([
            self.0[0] + rhs.0[0],
            self.0[1] + rhs.0[1],
            self.0[2] + rhs.0[2],
        ])
    }
}

impl Sub<Vec3> for Point3 {
    type Output = Self;
    fn sub(self, rhs: Vec3) -> Self {
        Self([
            self.0[0] - rhs.0[0],
            self.0[1] - rhs.0[1],
            self.0[2] - rhs.0[2],
        ])
    }
}

impl Sub<Point3> for Point3 {
    type Output = Vec3;
    fn sub(self, rhs: Point3) -> Vec3 {
        Vec3([
            self.0[0] - rhs.0[0],
            self.0[1] - rhs.0[1],
            self.0[2] - rhs.0[2],
        ])
    }
}
Above, we ensure that subtracting one Point3 from another yields a Vec3, and that a Vec3 can be added to or subtracted from a Point3... but we deliberately do not implement Add for two Point3s, cutting off that potential error at (and more often, before) compile time.
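To make that concrete, here's a quick illustration (variable names mine) of what the compiler now accepts and rejects:

fn demo() {
    let origin = Point3::new(0.0, 0.0, 0.0);
    let direction = Vec3::new(1.0, 2.0, 3.0);

    let hit = origin + direction;  // fine: Point3 + Vec3 -> Point3
    let back = hit - direction;    // fine: Point3 - Vec3 -> Point3
    let ray = hit - origin;        // fine: Point3 - Point3 -> Vec3

    // let nonsense = hit + origin;
    // error[E0369]: cannot add `Point3` to `Point3`
    let _ = (back, ray);
}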
II. Rejection vs Analytical Sphere Sampling
I'd been implementing the book's algorithms pretty much 1:1 from that point until section 9: Diffuse Materials.
There, the book announces:

    Then we need to figure out how to manipulate a random vector so that we only get results that are on the surface of a hemisphere. There are analytical methods of doing this, but they are actually surprisingly complicated to understand, and quite a bit complicated to implement. Instead, we'll use what is typically the easiest algorithm: A rejection method. A rejection method works by repeatedly generating random samples until we produce a sample that meets the desired criteria. In other words, keep rejecting bad samples until you find a good one.
What?
I get wanting to keep things simple for an introductory book, but I refuse to believe that the analytical method of getting a random vector on the unit hemisphere is that complicated. We went through a multi-step derivation of the unit sphere a few chapters ago! This must be simpler.
It was not quite as simple as I had first expected, but I still consider myself vindicated.
The book's recommended algorithm is to generate a random vector within the unit cube (random x, y, and z in the range [-1, 1]), and then check if this generated vector lies within the unit sphere (norm <= 1).
If it is, convert the generated vector into a unit vector (ie, it lies on the surface of the unit sphere). Otherwise, generate a new vector in the unit cube and repeat.
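Translated to our Rust types, the rejection loop looks something like this (my sketch of the book's approach; the tiny lower bound on the squared length guards against near-zero vectors that would blow up when normalized):

use rand::{rng, Rng};

pub fn random_unit_vector_rejection() -> Vec3 {
    loop {
        // A random point in the [-1, 1] cube.
        let v = Vec3::new(
            rng().random_range(-1.0..=1.0),
            rng().random_range(-1.0..=1.0),
            rng().random_range(-1.0..=1.0),
        );
        let len_sq = v.length_squared();
        // Keep it only if it lies inside the unit sphere (and isn't
        // degenerately close to zero); otherwise, try again.
        if len_sq > 1e-160 && len_sq <= 1.0 {
            return v / len_sq.sqrt();
        }
    }
}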
My first objection: why not just project all the vectors to the unit sphere? After all, any vector outside the sphere but within the cube must intersect with the unit sphere in just one location.
Turns out, it's because this breaks uniform sampling: there's more cube volume (and therefore more samples) along the diagonals toward the corners than along the axes, so normalizing everything would over-sample those corner directions, and our bounces would look off.
The same happens with my second idea: generate x and y (with x^2 + y^2 <= 1), then solve x^2 + y^2 + z^2 = 1 for z. This yields two possible z values, z = ±sqrt(1 - x^2 - y^2), one of which points into the right hemisphere.
This would also end up sampling unevenly, this time from the poles rather than the corners, and would also generate uneven-looking shadows.
Okay, approach number 3: sample uniformly from two orthogonal arcs on the surface of the unit hemisphere, then find the vector that synthesizes them.
This idea has legs, and I spent a little while refining it. The refinement that matters: for uniform coverage of the sphere's surface, phi should be uniform, but it's cos(theta), not theta itself, that needs to be uniform, because the sphere's area element is dA = d(cos theta) dphi:
use rand::{rng, Rng};

pub fn random_unit_vector() -> Vec3 {
    let r1: f64 = rng().random_range(0.0..=1.0);
    let r2: f64 = rng().random_range(0.0..=1.0);

    // phi uniform in [0, 2*pi); cos(theta) uniform in [0, 1],
    // which covers the z >= 0 hemisphere uniformly by area.
    let phi = 2.0 * std::f64::consts::PI * r1;
    let cos_theta = r2;
    let sin_theta = (1.0 - cos_theta * cos_theta).sqrt();

    let x = phi.cos() * sin_theta;
    let y = phi.sin() * sin_theta;
    let z = cos_theta;

    Vec3::new(x, y, z)
}
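And a quick sanity check (my own, not from the book) that every sample really does land on the unit hemisphere:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn analytic_samples_lie_on_the_unit_hemisphere() {
        for _ in 0..10_000 {
            let v = random_unit_vector();
            assert!((v.length() - 1.0).abs() < 1e-12); // unit length
            assert!(v.z() >= 0.0); // in the z >= 0 hemisphere
        }
    }
}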
A bit involved? Sure. But not excessively so, in my opinion. In exchange for a little more complication up front, we buy a lot of performance improvements. Not only are we down to two random numbers per vector instead of three per attempt, but we know that any two random numbers we generate will, absolutely guaranteed, produce a valid unit vector.
The rejection algorithm, on the other hand, generates three random numbers, checks the vector's length, potentially rejects it, and loops: three more numbers, another length check, another potential rejection, and so on. Since the unit sphere fills only pi/6 (about 52%) of the [-1, 1] cube, each attempt fails nearly half the time, averaging about 1.9 tries per vector. We have the (small, but at this scale not negligible) overhead of a loop, the uncertainty about how many iterations any given call is going to take, and we're torturing the poor branch predictor while we're at it.
At this point, I also remembered that the rand crate's default generator probably isn't the right choice here. The ThreadRng you get from rng() is cryptographically secure (at least, secure enough for most people's purposes), which is nice, but that security carries extra memory and runtime costs.
For applications that just need quick, non-secure random numbers, SmallRng is the better option. We can also pass a &mut SmallRng around as an argument, to avoid the overhead of repeatedly initializing it.
use rand::rngs::SmallRng;
use rand::Rng;

pub fn random_unit_vector_smallrng(smallrng: &mut SmallRng) -> Vec3 {
    // Same ranges as before: r1 and r2 uniform in [0, 1].
    let r1: f64 = smallrng.random_range(0.0..=1.0);
    let r2: f64 = smallrng.random_range(0.0..=1.0);

    let phi = 2.0 * std::f64::consts::PI * r1;
    let cos_theta = r2;
    let sin_theta = (1.0 - cos_theta * cos_theta).sqrt();

    let x = phi.cos() * sin_theta;
    let y = phi.sin() * sin_theta;
    let z = cos_theta;

    Vec3::new(x, y, z)
}
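Creating the generator is a one-liner (seed value arbitrary here, and depending on your rand version you may need to enable the crate's small_rng feature; the render function is just for illustration):

use rand::rngs::SmallRng;
use rand::SeedableRng;

fn render() {
    // A fixed seed gives reproducible renders; seed however you like.
    let mut rng = SmallRng::seed_from_u64(0xDECAFBAD);
    let v = random_unit_vector_smallrng(&mut rng);
    println!("{:?}", v);
}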
Nice. I ran this through criterion to check exactly how much faster this was:
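For reference, the harness looked roughly like this (a trimmed sketch showing two of the four benches; the seed is arbitrary, and black_box keeps the compiler from optimizing the work away):

use criterion::{criterion_group, criterion_main, Criterion};
use rand::rngs::SmallRng;
use rand::SeedableRng;
use std::hint::black_box;

fn bench_random_vectors(c: &mut Criterion) {
    let mut group = c.benchmark_group("Random Vector Generation");
    let mut small = SmallRng::seed_from_u64(42);

    group.bench_function("Analytic Sampling with SmallRng", |b| {
        b.iter(|| black_box(random_unit_vector_smallrng(&mut small)))
    });
    group.bench_function("Analytic Sampling with thread_rng", |b| {
        b.iter(|| black_box(random_unit_vector()))
    });
    // ...plus the two rejection-method variants, registered the same way.
    group.finish();
}

criterion_group!(benches, bench_random_vectors);
criterion_main!(benches);

And the numbers: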
Random Vector Generation/Analytic Sampling with SmallRng
time: [10.056 ns 10.075 ns 10.093 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Random Vector Generation/Rejection Method with SmallRng
time: [17.055 ns 17.078 ns 17.105 ns]
Found 19 outliers among 100 measurements (19.00%)
5 (5.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe
Random Vector Generation/Analytic Sampling with thread_rng
time: [29.573 ns 29.597 ns 29.624 ns]
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
Random Vector Generation/Rejection Method with thread_rng
time: [66.943 ns 67.037 ns 67.141 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
Or to summarize:
thread_rng + rejection:    67.037 ns    1.00x
thread_rng + analytical:   29.597 ns    2.27x
SmallRng + rejection:      17.078 ns    3.92x
SmallRng + analytical:     10.075 ns    6.65x
Over 6x performance difference between the best and worst implementations! Most of that is driven by the choice of SmallRng over the default ThreadRng, but the choice of analytical rather than rejection sampling is no joke either!
I'll probably upload all this to GitHub once I've gone through the whole book. Until then, see ya.