Sharing everything I could understand about gradient noise

You've most likely heard about gradient noise through the name Perlin noise, which refers to one particular implementation with various CPU optimizations. Because it's an incredible tool for creative work, it's used virtually everywhere: visual effects, video games, procedural mathematical art, etc. While getting it right can sometimes be subtle, a "broken" implementation can still look good or interesting. After all, "it looks fine, and I'm an artist".

In order to gain a deeper and more meaningful understanding we will start studying the 1D version (a case often omitted in the literature), then slowly climb our way up in dimensions and complexity. We'll also work from a GPU perspective rather than a CPU-based one, hence all code snippets and visuals here are implemented in WebGL2/GLSL (hopefully without being too heavy on performance). They should run on most modern devices; let me know if you run into issues.

Before we begin, credit where it's due: most of the material here are nothing new. This article is the result of weeks of studying and experimenting with the maths from Inigo Quilez's incredible pages and other scattered resources over the Internet. But as rich and valuable these resources are, they sometimes move quickly over the details, assuming they're obvious. This post is an attempt to fill those gaps.

A welcoming wavy 1D gradient noise signal

Hashing function and pseudo-random values

At the most elementary level, we need a deterministic coordinate based pseudo-random system. More specifically, for any given integer coordinate we need a random value, and as uniformly distributed as possible. Something like:

\begin{aligned} h(-3) &= -0.006124 \\ h(-2) &= -0.996686 \\ h(-1) &= 0.200864 \\ h(0) &= -1.000000 \\ h(1) &= 0.053313 \\ h(2) &= -0.893312 \\ h(3) &= 0.854923 \\ \text{...} \end{aligned}

Perlin's implementation relies on a permutation table, which is convenient when working on the CPU, but more awkward for a shader. On the GPU, most people rely on various floating point hacks or sub-optimal bit tricks, which often fall short when exploring the full 32-bit range of inputs.

Note

We can not use a PRNG because it relies on a state where we need determinism (for one coordinate, we always want the same random value). LCG (Linear Congruential Generator) are a sub-classes of PRNG where the state is actually the returned value. But even then, we can not re-use the previously returned value since we need seeking; that is the ability to get the random value associated with our integer coordinate. Trying to feed a PRNG/LCG with our coordinate instead of the expected state would be equivalent to changing the seed at every call, and this may create an important bias in the random distribution. This is the reason why we need integer hashing instead.

So we need a hashing function, and we will have to limit ourselves to 32-bit because we're on the GPU. Fortunately, we won't have to look for very long because in 2018 Chris Wellons found a pretty good one he named lowbias32, which was later refined by TheIronBorn.

This is the first building block we will be using, the hashing function:

uint hash(uint x) {
    x = (x ^ (x >> 16)) * 0x21f0aaadU;
    x = (x ^ (x >> 15)) * 0x735a2d97U;
    return x ^ (x >> 15);
}

This is sweet and all, but a 32-bit unsigned integer is not directly useful by itself, we need a normalized float. We could naively divide by 0xffffffff, but we'll end up with a nonuniform distribution (this is not as important as it may sound to be honest). Instead, we will adapt the technique presented by Vigna Sebastiano for doubles to floats, and in GLSL:

float u2f(uint x) { return float(x >> 8U) * uintBitsToFloat(0x33800000U); }

Combining these two functions into u2f(hash(x)) maps any 32-bit coordinate x to a random float in [0,1).

To match our h(x) function from earlier and make it more "signal-like", we can optionally center it around 0 by remapping it from [0,1) to [-1,1): h(x)=u2f(hash(x))*2.0-1.0.

Expanding the hash function to more dimensions

When we will work in 2 or more dimensions, our input grid coordinates will be more than one value, but we need a way to feed them to our single parameter hash function. One trick is to use nested xor hash:

uint hash(uvec2 x) { return hash(x.x ^ hash(x.y)); }  // for 2D input
uint hash(uvec3 x) { return hash(x.x ^ hash(x.yz)); } // for 3D input

Using the pixel coordinates as input, we can test the hashing function in 2D:

float stepnoise2(vec2 p) {
    ivec2 i = ivec2(floor(p));    // integer coordinate, or lattice
    return u2f(hash(uvec2(i)));   // non-centered h(x)
}

Display h(x,y) by stepping the contiguous coordinates

Warning

We're going through an intermediate signed integer conversion before going to unsigned to avoid issues with negative coordinates. Quoting GLSL 4.60 specifications: "It is undefined to convert a negative floating-point value to an uint". With this conversion we will only have a problem when the coordinates go outside the signed 32-bit range.

Basic signal and white noise

Aside from stepping, we can also assign values to the integer coordinates only and interpolate linearly between them. Let's try this in 1D:

float value(int x) { return u2f(hash(uint(x)))*2.0 - 1.0; } // h(x)

float vnoise1_linear(float p) {
    int i = int(floor(p));                // integer coordinate, or lattice
    float f = fract(p);                   // x-position between the 2 surrounding values
    return mix(value(i), value(i+1), f);  // linear interpolation between the two
}

Feeding this function with the x-axis coordinate, we get the amplitude (height) of the signal:

Basic 1D value noise with linear interpolation

Note

This might not be obvious yet, but the beauty of this is that this "infinite" signal is deterministic and thus seekable. Indeed, we can move forward and backward to any real position p and instantly know how the signal looks like. This is essential for procedural programming.

But the linear interpolation is usually not that great for a signal, so instead of a straight line we will use a smooth fade. The two commonly used functions are:

The cubic Hermite curve, f(t)=3t^2-2t^3 (also used in GLSL smoothstep()) initially used by Ken Perlin in his first Perlin Noise implementation.
The more modern (and complex) quintic curve f(t)=6t^5-15t^4+10t^3 introduced in 2002 by Ken Perlin in his proposed improved version of the Perlin Noise, in order to address discontinuities in the 2nd order derivatives f"(t).

For the record, in GLSL:

float fade_quintic(float t) { return ((6.0*t-15.0)*t+10.0)*t*t*t; }
float fade_hermite(float t) { return (3.0-2.0*t)*t*t; }

We will stick with the quintic for the rest of the article:

#define fade fade_quintic

float vnoise1(float p) {
    int i = int(floor(p));
    float f = fract(p);
    float a = fade(f);
    return mix(value(i), value(i+1), a);
}

1D value noise with quintic interpolation as fading function

1D gradient noise

Still, a signal generated in such a way may not be desirable due to the abrupt changes in slope/frequency; it is too "unstable". So instead of using the random as noise values directly, we interpret them as gradient: this is the gradient noise.

How the gradient values affect the signal shape

Note

If we're pedantic, in 1D they can't exactly be called gradients, we should use the term slopes, or angles. But we will still do it for consistency with 2 and more dimensions.

For each lattice, we decide to give them y=0 at regular interval, and have their random gradient impacts the surrounding curve. This may sound complicated to implement but in practice it's 2 multiplications and 1 subtraction more than the value noise (for 1D at least):

float grad(int x) { // int lattice to random [-1,1)
    return u2f(hash(uint(x))) * 2.0 - 1.0;
}

float noise1(float p) {
    int i = int(floor(p));
    float g0 = grad(i);
    float g1 = grad(i + 1);

    float f = fract(p);
    float v0 = g0 * f;
    float v1 = g1 * (f - 1.0);

    float a = fade(f);
    return mix(v0, v1, a);
}

1D gradient noise with slope indicators on the lattice coordinates

Geometrically speaking, v_0 and v_1 are y-coordinates obtained by extending the 2 slopes (the little sticks at each lattice) around our current target point p and finding where the vertical line x=p intersects them. Then we smoothly interpolate between them using our fade function.

Note

One important property of the gradient noise is that it passes through 0 at every lattice, meaning at a constant rate. In other words, \textbf{noise}_1(p)=0 whenever p is an integer.

Expanding to 2 dimensions

In 2D, each lattice point stores a 2-component gradient vector. To evaluate noise at a point p=(x,y), instead of a simple multiplication we compute the dot product of each gradient vector with the vector from the lattice corner to (x,y), then bilinearly interpolate those four dot products using a 2D fade function.

#define bmix(a,b,c,d,x,y) mix(mix(a,b,x),mix(c,d,x),y) // bilinear interpolation

float noise2(vec2 p) {
    ivec2 i = ivec2(floor(p));
    vec2 g0 = grad(i);
    vec2 g1 = grad(i + ivec2(1, 0));
    vec2 g2 = grad(i + ivec2(0, 1));
    vec2 g3 = grad(i + ivec2(1, 1));

    vec2 f = fract(p);
    float v0 = dot(g0, f);
    float v1 = dot(g1, f - vec2(1.0, 0.0));
    float v2 = dot(g2, f - vec2(0.0, 1.0));
    float v3 = dot(g3, f - vec2(1.0, 1.0));

    vec2 a = fade(f);
    return bmix(v0, v1, v2, v3, a.x, a.y);
}

Since gradients are now in 2D, the random gradient function needs to be extended: we need a normalized vector generator, that is something that gives us a unit vector pointing in any direction.

A first solution would be to call the hash function twice (on itself typically) to obtain random (x,y) coordinates in a square, that we then normalize to get them on a circle:

vec2 grad(ivec2 x) { // ivec2 lattice to random 2D unit vector (normalized square point)
    uint h1 = hash(uvec2(x));
    uint h2 = hash(h1);
    return normalize(vec2(u2f(h1), u2f(h2)) * 2.0 - 1.0);
}

But we can do better with only one hash and some trigonometry:

const float TAU = 6.283185307179586;

vec2 grad(ivec2 x) { // ivec2 lattice to random 2D unit vector (circle point)
    float angle = u2f(hash(uvec2(x))) * TAU;
    return vec2(cos(angle), sin(angle));
}

And this is enough to get our 2D gradient noise:

2D gradient noise (with y-axis coordinates that cover -10 to 10)

Expanding to 3 dimensions

Similarly, for 3D gradient noise we will need these 2 changes:

The interpolation happens between the 8 points of a cube: it's a trilinear interpolation, a combination of 2 bilinear interpolations (itself being a combination of 3 linear interpolations)
The random unit vectors used as gradients need to be distributed evenly on a sphere instead of circle since we are working in 3D

vec3 grad(ivec3 x) { // ivec3 lattice to random 3D unit vector (sphere point)
    uint h0 = hash(uvec3(x));
    uint h1 = hash(h0);
    float theta = acos(2.0*u2f(h0) - 1.0); // use the first random for the polar angle (latitude)
    float phi = TAU * u2f(h1); // use the 2nd random for the azimuth (longitude)
    return vec3(cos(phi) * sin(theta), sin(phi) * sin(theta), cos(theta));
}

#define tmix(a,b,c,d,e,f,g,h,x,y,z) mix(bmix(a,b,c,d,x,y),bmix(e,f,g,h,x,y),z) // trilinear interpolation

float noise3(vec3 p) {
    ivec3 i = ivec3(floor(p));
    vec3 g0 = grad(i);
    vec3 g1 = grad(i + ivec3(1, 0, 0));
    vec3 g2 = grad(i + ivec3(0, 1, 0));
    vec3 g3 = grad(i + ivec3(1, 1, 0));
    vec3 g4 = grad(i + ivec3(0, 0, 1));
    vec3 g5 = grad(i + ivec3(1, 0, 1));
    vec3 g6 = grad(i + ivec3(0, 1, 1));
    vec3 g7 = grad(i + ivec3(1, 1, 1));

    vec3 f = fract(p);
    float v0 = dot(g0, f);
    float v1 = dot(g1, f - vec3(1, 0, 0));
    float v2 = dot(g2, f - vec3(0, 1, 0));
    float v3 = dot(g3, f - vec3(1, 1, 0));
    float v4 = dot(g4, f - vec3(0, 0, 1));
    float v5 = dot(g5, f - vec3(1, 0, 1));
    float v6 = dot(g6, f - vec3(0, 1, 1));
    float v7 = dot(g7, f - vec3(1, 1, 1));

    vec3 a = fade(f);
    return tmix(v0, v1, v2, v3, v4, v5, v6, v7, a.x, a.y, a.z);
}

3D gradient noise on a sphere (with 10x unzoom)

Note

The spherical representation has nothing to do with the selection of a random 3D unit vector on a sphere. The noise in this implementation applies to any 3D geometry. The choice of a sphere for the display was just a simple 3D shape to lay it on. If we were in 2D we could also feed the 3d noise function with p=(x,y,t) where t is the current time, giving us a "cloudy" atmospheric rendering.

The random gradient in 3D is starting to be a bit expensive with all the trigonometry, so we may consider simpler approaches, like normalizing a random position in a cube, similar to what we've initially suggested for 2D noise. From my observation, it doesn't seem to have any noticeable impact visually, but maybe it would have under certain circumstances.

Fractal Brownian Motion (fBm)

The idea behind fBm is to sum multiple "octaves" of noise together to construct a more refined pattern. In the most common cases, we would raise the frequency by doubling it (called a "lacunarity" factor), and halving the amplitude (called a "gain" or "persistence" factor):

Multiple signals

The algorithm is pretty much the same in all dimensions, it's simply a sum of signals. In 2D for example we can write:

const float LACUNARITY = 1.98;
const float GAIN = 0.51;

float fbm(vec2 p, int octaves) {
    float sum = 0.0;
    float amp = 1.0, freq = 1.0;
    for (int i = 0; i < octaves; i++) {
        sum += amp * noise2(p * freq);
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}

2D gradient noise with 5 octaves (with y-axis coordinates that cover -2.5 to 2.5)

Note

We're not using exactly 2.0 and 0.5 for lacunarity and gain to break correlations.

And in 3D for completeness (aside from the raytracer to get a sphere, the noise code is the same, it just calls noise3(p*freq) and uses a vec3 p input):

3D gradient noise with 5 octaves

Derivatives

Derivatives (that is the rate of change of the signal) are useful in many situations. For example, you may have noticed that the 1D signals in this article tend to have a lighter color when the slope is steep compared to when it's flatter: the derivative is used to interpolate between the 2 colors.

Also in 1D case, to display the curve with correct thickness, I'm also using the derivatives for Inigo's distance to curve trick):

float dist = abs(v - p.y) / sqrt(1.0 + d*d); // v: curve value, p: position, d: derivative of curve

Using this signed distance like any other, we can display the curve smoothly:

1D gradient noise

In higher dimension, we may want to use them to get some lighting: indeed, with the derivatives we can compute the normal, which is then used for reflections:

// Lambertian lighting hack
vec3 normal = normalize(vec3(-d, 1.0)); // normal from the 2D partial derivatives d
float a = 5.0*TAU/8.0; // light from south-west
vec3 light_direction = normalize(vec3(cos(a), sin(a), 2.0));
float lighting = max(dot(normal, light_direction), 0.0);
col *= lighting;

2D gradient noise with (right) and without (left) lighting

If you're into terrain generation, another use case is to fake erosion by scaling the value of each noise layer of the fBm by \frac{1}{1+\|\sum_{i=0}^{octaves}d_i\|^2}.

float fbm2e(vec2 p) {
    float sum = 0.0;
    float amp = 1.0, freq = 1.0;
    vec2 d = vec2(0.0);
    for (int i = 0; i < octaves; i++) {
        vec3 n = noise2d(p * freq);    // adjusted noise2() returning the partial derivatives in .xy
        d += n.xy;                     // cumulated derivatives without frequency scaling
        float w = 1.0/(1.0+dot(d,d));  // gradient/slope based weight
        sum += amp * n.z * w;          // value damped down by the weight
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}

2D gradient noise with (right) and without (left) erosion

The intuitive idea behind the formula here is that each new layer get "muted" when the gradients indicate a steep mountain, preventing them from being "rugged" more by higher frequency noises, and thus giving a sharper feel.

Anyway, these are random things we can do with the derivatives, but there are likely others I'm forgetting. The point is, they are particularly useful so we're going to study them.

Numerical vs analytical derivatives

We have two methods to get the derivatives:

The numerical method, where we compute the rate of change between 2 close points. It "works" with all curves (given enough precision to work with), but it has an accuracy (and sometimes speed) issue.
The analytical method, where we derive the exact mathematical formula.

The numerical method can be implemented on the GPU thanks to the dFdx and dFdy functions. These functions use the local and the neighbor fragments data to calculate a derivative of the specified value. It's one of the rare function that communicate information cross-fragment, and as you can guess it requires synchronization and thus has performance implications.

But for example, let's say we take our last 2D noise scene and compare the numerical vs the analytical derivatives:

2D gradient noise with its partial derivatives (both analytical and numerical)

On the left we have our standard 2D gradient fBm noise, and on the right the length of the derivatives of the noise: first the analytical version, then the numerical one. The latter should appear pixelized.

Numerical derivatives

The numerical derivatives were obtained with the following:

float w = 2.0/resolution.y;           // size of a pixel
float v = noise2(p);                  // noise value at position p
vec2 d = vec2(dFdx(v), dFdy(v)) / w;  // numerical partial derivatives

Note

It is also possible to derive numerically ourselves with finite differences, sampling the noise function for the neighbor positions within the same fragment, but this is going to be extremely expensive.

Analytical derivatives

To get the analytical derivatives we need to combine multiple derivatives found in the noise functions, starting from the fading function derivative. This one is easy:

\begin{aligned} \mathrm{fade}(t) &= 6t^5-15t^4+10t^3 \\ \mathrm{fade}'(t) &= 30t^2(t^2-2t+1) \end{aligned}

But we will also need the derivatives of the interpolation functions for each dimension:

lerp: \mathrm{mix}(a,b,x) = (1-x)a + bx
bilerp: \mathrm{bmix}(a,b,c,d,x,y) = \mathrm{mix}(\mathrm{mix}(a,b,x),\mathrm{mix}(c,d,x),y)
trilerp: \mathrm{tmix}(a,b,c,d,e,f,g,h,x,y,z) = \mathrm{mix}(\mathrm{bmix}(a,b,c,d,x,y),\mathrm{bmix}(e,f,g,h,x,y),z)

So here they are:

\partial_x\mathrm{mix}=b-a
\nabla\mathrm{bmix}=\mathrm{mix}(\begin{bmatrix}b\\c\end{bmatrix}-a,d-\begin{bmatrix}c\\b\end{bmatrix},\begin{bmatrix}y\\x\end{bmatrix})
\nabla\mathrm{tmix}=\mathrm{bmix}(\begin{bmatrix}b\\c\\e\end{bmatrix}-a,\begin{bmatrix}d-c\\d-b\\f-b\end{bmatrix},\begin{bmatrix}f-e\\g-e\\g-c\end{bmatrix},h-\begin{bmatrix}g\\f\\d\end{bmatrix},\begin{bmatrix}y\\x\\x\end{bmatrix},\begin{bmatrix}z\\z\\y\end{bmatrix})

Following is a proof for bilinear and trilinear interpolation function partial derivatives, you can skip them if you're not into nasty orgy of math symbols.

Note

I'm introducing a new notation for the partial derivatives with values because the usual ones are awful to make and read. An example is better than a long explanation, so instead of something like \frac{\partial_f(x,y,z)}{\partial_x}(17,u+v,v^2) or its horrendous vertical bar version, I will instead use: \partial_xf(x=17,y=u+v,z=v^2)

\begin{aligned} \partial_x\mathrm{bmix} &= \partial_a\mathrm{mix}\cdot\partial_x\mathrm{mix} + \partial_b\mathrm{mix}\cdot\partial_x\mathrm{mix} \\ &= \partial_a\mathrm{mix}(a=\mathrm{mix}(a,b,x),b=\mathrm{mix}(c,d,x),x=y) \cdot \partial_x\mathrm{mix}(a=a,b=b,x=x) \\ &+ \partial_b\mathrm{mix}(a=\mathrm{mix}(a,b,x),b=\mathrm{mix}(c,d,x),x=y) \cdot \partial_x\mathrm{mix}(a=c,b=d,x=x) \\ &= (1-y)(b-a) + y(d-c) \\ &= \boxed{\mathrm{mix}(b-a,d-c,y)} \\ \\ \partial_y\mathrm{bmix} &= \partial_x\mathrm{mix}_0(a=\mathrm{mix}(a,b,x), b=\mathrm{mix}(c,d,x), x=y) \\ &= \mathrm{mix}(c,d,x) - \mathrm{mix}(a,b,x) \\ &= \boxed{\mathrm{mix}(c-a,d-b,x)} \\ \\ \partial_x \mathrm{tmix} &= \partial_a\mathrm{mix}\cdot\partial_x\mathrm{bmix} + \partial_b\mathrm{mix}\cdot\partial_x\mathrm{bmix} \\ &= \partial_a\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\ &\cdot \partial_x\mathrm{bmix}(a=a,b=b,c=c,d=d,x=x,y=y) \\ &+ \partial_b\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\ &\cdot \partial_x\mathrm{bmix}(a=e,b=f,c=g,d=h,x=x,y=y) \\ &= (1-z)\mathrm{mix}(b-a,d-c,y) + z\mathrm{mix}(f-e,h-g,y) \\ &= \mathrm{mix}(\mathrm{mix}(b-a,d-c,y), z\mathrm{mix}(f-e,h-g,y), z) \\ &= \boxed{\mathrm{bmix}(b-a,d-c,f-e,h-g,y,z)} \\ \\ \partial_y \mathrm{tmix} &= \partial_a\mathrm{mix}\cdot\partial_y\mathrm{bmix} + \partial_b\mathrm{mix}\cdot\partial_y\mathrm{bmix} \\ &= \partial_a\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\ &\cdot \partial_y\mathrm{bmix}(a=a,b=b,c=c,d=d,x=x,y=y) \\ &+ \partial_b\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\ &\cdot \partial_y\mathrm{bmix}(a=e,b=f,c=g,d=h,x=x,y=y) \\ &= (1-z)\mathrm{mix}(c-a,d-b,x) + z\mathrm{mix}(g-e,h-f,x) \\ &= \mathrm{mix}(\mathrm{mix}(c-a,d-b,x), z\mathrm{mix}(g-e,h-f,x), z) \\ &= \boxed{\mathrm{bmix}(c-a,d-b,g-e,h-f,x,z)} \\ \\ \partial_z\mathrm{tmix} &= \partial_x\mathrm{mix} \\ &= \partial_x\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),x=z) \\ &= \mathrm{bmix}(e,f,g,h,x,y)-\mathrm{bmix}(a,b,c,d,x,y) \\ &= \boxed{\mathrm{bmix}(e-a,f-b,g-c,h-d,x,y)} \end{aligned}

Finally, given these partial derivatives, we can figure out how to compute the derivatives of the noise itself. It helps to know that fract(p) derivative is 1 (mostly) and floor(p) derivative is 0 (mostly), so things simplify themselves out fairly cleanly. I spare you the details this time:

\begin{aligned} \text{1D: } & \boxed{\textbf{mix}(g_0,g_1,\textbf{fade}(f))+\partial_x\textbf{mix}(v_0,v_1)\cdot\textbf{fade}'(f)} \\ \text{2D: } & \boxed{\textbf{bmix}(g_0,g_1,g_2,g_3,\textbf{fade}(f))+\nabla\mathrm{bmix}(v_0,v_1,v_2,v_3,\textbf{fade}(f))\cdot\textbf{fade}'(f)} \\ \text{3D: } & \boxed{\textbf{tmix}(g_0,g_1,g_2,g_3,g_4,g_5,g_6,g_7,\textbf{fade}(f))+\nabla\mathrm{tmix}(v_0,v_1,v_2,v_3,v_4,v_5,v_6,v_7,\textbf{fade}(f))\cdot\textbf{fade}'(f)} \end{aligned}

Adjusting our gradient noise functions to return the derivatives along with the noise value itself:

vec2 noise1d(float p) {
    int i = int(floor(p));
    float g0 = grad(i);
    float g1 = grad(i + 1);

    float f = fract(p);
    float v0 = g0 * f;
    float v1 = g1 * (f - 1.0);

    float a = fade(f);
    float v = mix(v0, v1, a);

    float g = mix(g0, g1, a);
    float d = v1 - v0;                    // derivative of mix with respect to the interpolant
    float da = ((f-2.0)*f+1.0)*30.0*f*f;  // fade'(t): derivative of quintic interpolant

    return vec2(g + d*da, v);
}

vec3 noise2d(vec2 p) {
    // [...]
    float v = bmix(v0, v1, v2, v3, a.x, a.y);

    vec2 g = bmix(g0, g1, g2, g3, a.x, a.y);
    vec2 d = mix(vec2(v1,v2)-v0, v3-vec2(v2,v1), a.yx); // derivatives of bmix with respect to the interpolant
    vec2 da = ((f-2.0)*f+1.0)*30.0*f*f;

    return vec3(g + d*da, v);
}

vec4 noise3d(vec3 p) {
    // [...]
    float v = tmix(v0, v1, v2, v3, v4, v5, v6, v7, a.x, a.y, a.z);

    vec3 g = tmix(g0, g1, g2, g3, g4, g5, g6, g7, a.x, a.y, a.z);
    vec3 d = bmix(vec3(v1,v2,v4)-v0, vec3(v3-v2,v3-v1,v5-v1),  // derivatives of tmix with
                  vec3(v5-v4,v6-v4,v6-v2), v7-vec3(v6,v5,v3),  // respect to the interpolant
                  a.yxx, a.zzy);
    vec3 da = ((f-2.0)*f+1.0)*30.0*f*f;

    return vec4(g + d*da, v);
}

Note

The derivatives are in the first component so that given n=noise3d(p) we can write vec3 d=n.xyz for x/y/z partial derivatives instead of the more awkward vec3 d=n.yzw.

To get the equivalent derivatives for value noise, we just set g=0 in the final expression of the derivatives.

It is possible to unroll the nested mix expression to (maybe) make it faster, but I find the mix expressions simple, elegant, and likely more numerically stable.

Using the derivatives

There are important things to take into consideration when working with the derivatives. Most notably, it's important to follow the chain rule and the product rule. For example, let's say we're working with a noise function that returns the derivatives:

float freq = 0.1;
vec4 n = noise3d(x*freq);
float v = n.a;            // value
vec2 d = n.xyz * freq;    // partial derivatives

If we multiply the input of our function by the frequency, then we need to multiply the derivatives by the same frequency factor.

Similarly, let's say we want to move v from [-1,1] to [0,1]:

v = (v + 1.0) / 2.0; // [-1,1] -> [0,1]
d = d / 2.0;

The intuitive explanation is that if we are squeezing the value in half its original size, then the derivatives (slopes) are getting flatter as well.

This may not sound very important, but failing to propagate these scale factors correctly often leads to subtle bugs. For example in the fBm, if we want it to return the correct derivatives it is necessary to do something like that (notice the freq):

vec4 fbm3d(vec3 p) {
    vec4 sum = vec4(0.0);
    float amp = 1.0, freq = 1.0;
    for (int i = 0; i < octaves; i++) {
        vec4 n = noise3d(p * freq);
        sum.xyz += amp * n.xyz * freq; // derivatives
        sum.a += amp * n.a; // value
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}

Things get tricky when the noise is being adjusted with the derivatives themselves as we saw before with the erosion, especially since this would imply second order derivatives.

Similarly, there is a technique involving the rotation of p at every iteration of the fBm to reduce the correlations between noise layers. This rotation is a linear transformation, but it requires careful adjustments to the derivatives as well. Implementing these transformations correctly is left as an exercise; careful bookkeeping of derivatives is essential.

Going further

With this introduction we only explored the tip of the iceberg. For example, you'll be interested to know that there are alternative noises such as OpenSimplex, where we interpolate with lattice positions on a simplex grid (triangles in 2D, tetrahedra in 3D) instead of a rectangular grid. It has useful properties, for example in fBm it yields fewer directional artifacts, eliminating the need for ad-hoc rotations.

Speaking of fBm, you may want to check out domain warping where nested fBm together make fancy effects:

Domain warping with fbm2(p+fbm2(p+fbm2(p+t)))

Another thing you'll be tempted to research is how to consistently scale the noise so that it fits into a controlled range of value. Spoiler alert: it's a particularly complex subject.

Similarly, we stopped ourselves at 3D, but 4D is also useful, for example let's say we want to morph the 3D noise with time: p=(x,y,z,t).

I hope this article was able to give a good overview of the concepts, and I'll see you next time for new adventures.

For updates and more frequent content you can follow me on Mastodon. Feel also free to subscribe to the RSS in order to be notified of new write-ups. It is also usually possible to reach me through other means (check the footer below). Finally, discussions on some of the articles can sometimes be found on HackerNews, Lobste.rs and Reddit.