<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
 <channel xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link href="http://blog.pkh.me/rss.xml" rel="self" type="application/rss+xml" />
  <title>A small freedom area RSS</title>
  <description>Default feed for blog.pkh.me</description>
  <link>http://blog.pkh.me/</link>
<item>
 <guid>http://blog.pkh.me/p/48-a-series-of-tricks-and-techniques-i-learned-doing-tiny-glsl-demos.html</guid>
 <link>http://blog.pkh.me/p/48-a-series-of-tricks-and-techniques-i-learned-doing-tiny-glsl-demos.html</link>
 <title>A series of tricks and techniques I learned doing tiny GLSL demos</title>
 <pubDate>Sun, 07 Dec 2025 17:48:26 -0000</pubDate>
 <description>&lt;p&gt;In the past two months or so, I spent some time making tiny GLSL demos. I wrote
an article about the first one, &lt;a href=&quot;http://blog.pkh.me/p/45-code-golfing-a-tiny-demo-using-maths-and-a-pinch-of-insanity.html&quot;&gt;Red Alp&lt;/a&gt;. There, I went into detail about the
whole process, so I recommend checking it out first if you&#x27;re not familiar with
the field.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/demo-tricks/thumb.jpg&quot; alt=&quot;preview of the 4 demos&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We will look at 4 demos: &lt;a href=&quot;#Moonlight&quot;&gt;Moonlight&lt;/a&gt;, &lt;a href=&quot;#Entrance3&quot;&gt;Entrance 3&lt;/a&gt;,
&lt;a href=&quot;#Archipelago&quot;&gt;Archipelago&lt;/a&gt;, and &lt;a href=&quot;#Cutie&quot;&gt;Cutie&lt;/a&gt;. But this time, for each
demo, we&#x27;re going to cover one or two things I learned from it. It won&#x27;t be a
deep dive into every aspect because it would be extremely redundant. Instead,
I&#x27;ll take you along a journey of learning experiences.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;Moonlight&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Moonlight&lt;/h2&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/moonlight.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Moonlight demo in 460 characters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Moonlight [460] by bµg
// License: CC BY-NC-SA 4.0
void main(){vec3 o,p,u=vec3((P+P-R)/R.y,1),Q;Q++;for(float d,a,m,i,t;i++&amp;lt;1e2;p=t&amp;lt;7.2?Q:vec3(2,1,0),d=abs(d)*.15+.1,o+=p/m+(t&amp;gt;9.?d=9.,Q:p/d),t+=min(m,d))for(p=normalize(u)*t,p.z-=5e1,m=max(length(p)-1e1,.01),p.z+=T,d=5.-length(p.xy*=mat2(cos(t*.2+vec4(0,33,11,0)))),a=.01;a&amp;lt;1.;a+=a)p.xz*=mat2(8,6,-6,8)*.1,d-=abs(dot(sin(p/a*.6-T*.3),p-p+a)),m+=abs(dot(sin(p/a/5.),p-p+a/5.));o/=4e2;O=vec4(tanh(mix(vec3(-35,-15,8),vec3(118,95,60),o-o*length(u.xy*.5))*.01),1);}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;See it on &lt;a href=&quot;https://b.pkh.me/2025-11-09-moonlight.htm&quot;&gt;its official page&lt;/a&gt;, or play with the code on &lt;a href=&quot;https://www.shadertoy.com/view/wX2Bzy&quot;&gt;its
Shadertoy port&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In Red Alp, I used volumetric raymarching to go through the clouds and fog, and
it took quite a significant part of the code to make the absorption and emission
convincing. But there is an alternative technique that is surprisingly simpler.&lt;/p&gt;
&lt;p&gt;In the raymarching loop, the color contribution at each iteration becomes &lt;span class=&quot;math inline&quot;&gt;1/d&lt;/span&gt;
or &lt;span class=&quot;math inline&quot;&gt;c/d&lt;/span&gt;, where &lt;span class=&quot;math inline&quot;&gt;d&lt;/span&gt; is the distance to the closest surface at the current ray position,
and &lt;span class=&quot;math inline&quot;&gt;c&lt;/span&gt; an optional color tint if you don&#x27;t want to work in grayscale.
Some variants exist, for example &lt;span class=&quot;math inline&quot;&gt;1/d^2&lt;/span&gt;, but we&#x27;ll focus on &lt;span class=&quot;math inline&quot;&gt;1/d&lt;/span&gt;.&lt;/p&gt;
&lt;h3&gt;1/d explanation&lt;/h3&gt;
&lt;p&gt;Let&#x27;s see how it looks in practice with a simple cube raymarch where we use this
peculiar contribution:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/onecube.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;One glowing and rotating cube&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;void main() {
    float d, t;
    vec3 o, p,
         u = normalize(vec3(P+P-R,R.y)); // screen to world coordinate

    for (int i = 0; i &amp;lt; 30; i++) {
        p = u * t; // ray position

        p.z -= 3.; // take a step back

        // Rodrigues rotation with a fixed angle of π/2
        // and a time-varying axis
        vec3 a = normalize(cos(T+vec3(0,2,4)));
        p = a*dot(a,p)-cross(a,p);

        // Signed distance function of a cube of size 1
        p = abs(p)-1.;
        d = length(max(p,0.)) + min(max(p.x,max(p.y,p.z)),0.);

        // Maxed out to not enter the solid
        d = max(d,.001);

        t += d; // stepping forward by that distance

        // Our mysterious contribution to the output
        o += 1./d;
    }

    // Arbitrary scale within visible range
    O = vec4(o/200., 1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The signed distance function of the cube is from the &lt;a href=&quot;https://iquilezles.org/articles/distfunctions/&quot;&gt;classic Inigo Quilez
page&lt;/a&gt;. For the rotation you can refer to the articles by &lt;a href=&quot;https://mini.gmshaders.com/p/3d-rotation&quot;&gt;Xor&lt;/a&gt; or
&lt;a href=&quot;https://suricrasia.online/blog/shader-functions/&quot;&gt;Blackle&lt;/a&gt;. For a general understanding of
the code, see my previous article on &lt;a href=&quot;http://blog.pkh.me/p/45-code-golfing-a-tiny-demo-using-maths-and-a-pinch-of-insanity.html&quot;&gt;Red Alp&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The first time I saw it, I wondered whether it was a creative take, or if it was
backed by physical properties.&lt;/p&gt;
&lt;p&gt;Let&#x27;s simplify the problem with the following figure:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/demo-tricks/ray.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;A ray passing by a radiating object&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The glowing object sends photons that spread all around it. The further we go
from the object, the more spread out these photons are: their density basically follows the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Inverse-square_law&quot;&gt;inverse square law&lt;/a&gt; &lt;span class=&quot;math inline&quot;&gt;1/r^2&lt;/span&gt;,
where &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt; is the distance to the object.&lt;/p&gt;
&lt;p&gt;Let&#x27;s say we send a ray and want to know how many photons are present along the
whole path. We have to &amp;quot;sum&amp;quot;, or rather integrate, all these photon density
measurements along the ray. Since we are doing a discrete sampling (the dots on the
figure), we need to interpolate the photon density &lt;em&gt;between&lt;/em&gt; each sampling
point as well.&lt;/p&gt;
&lt;p&gt;Given two arbitrary sampling points and their corresponding distances &lt;span class=&quot;math inline&quot;&gt;d_n&lt;/span&gt;
and &lt;span class=&quot;math inline&quot;&gt;d_{n+1}&lt;/span&gt;, any intermediate distance can be linearly interpolated with
&lt;span class=&quot;math inline&quot;&gt;r=\mathrm{mix}(d_n,d_{n+1},t)&lt;/span&gt; where &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; is within &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt;. Applying the
inverse square law from before (&lt;span class=&quot;math inline&quot;&gt;1/r^2&lt;/span&gt;), the integrated photon density between
these 2 points can be expressed with this formula (in &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; range):&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
v = \Delta t \int \frac{1}{\mathrm{mix}(d_n,d_{n+1},t)^2} dt
&lt;/div&gt;
&lt;p&gt;&lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; being normalized, the &lt;span class=&quot;math inline&quot;&gt;\Delta t&lt;/span&gt; is here to cover the actual segment
length. With the help of Sympy we can do the integration:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&amp;gt;&amp;gt;&amp;gt; from sympy import *
&amp;gt;&amp;gt;&amp;gt; a, b, D, t = symbols(&#x27;a b D t&#x27;, real=True)
&amp;gt;&amp;gt;&amp;gt; mix = a*(1-t) + b*t
&amp;gt;&amp;gt;&amp;gt; D * integrate(1/mix**2, (t,0,1)).simplify()
 D
───
a⋅b
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So the result of this integration is:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
v = \frac{\Delta t}{d_{n}d_{n+1}}.
&lt;/div&gt;
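&lt;p&gt;As a sanity check, the closed form can also be verified numerically. A quick plain-Python sketch (hypothetical helper names, arbitrary positive distances):&lt;/p&gt;

```python
# Numerically verify that Δt · ∫₀¹ dt / mix(dn, dn1, t)² equals Δt / (dn·dn1).
def mix(a, b, t):
    return a * (1 - t) + b * t

def integrated_density(dn, dn1, delta_t, steps=100_000):
    # Midpoint-rule integration of the inverse-square density
    h = 1.0 / steps
    total = sum(1.0 / mix(dn, dn1, (i + 0.5) * h) ** 2 for i in range(steps))
    return delta_t * total * h

numeric = integrated_density(2.0, 5.0, 3.0)
closed_form = 3.0 / (2.0 * 5.0)
print(abs(numeric - closed_form))  # essentially zero: the two agree
```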
&lt;p&gt;Now the key is that in the loop, the stepping distance &lt;span class=&quot;math inline&quot;&gt;\Delta t&lt;/span&gt; is actually &lt;span class=&quot;math inline&quot;&gt;d_{n+1}&lt;/span&gt;, so
we end up with:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
v = \frac{\Delta t}{d_{n}\Delta t} = \frac{1}{d_n}
&lt;/div&gt;
&lt;p&gt;And we find back our mysterious &lt;span class=&quot;math inline&quot;&gt;1/d&lt;/span&gt;. It&#x27;s &amp;quot;physically correct&amp;quot;, assuming
vacuum space. Of course, reality is more complex, and we don&#x27;t even need to
stick to that formula, but it was nice figuring out that this simple fraction is
a fairly good model of reality.&lt;/p&gt;
&lt;h3&gt;Going through the object&lt;/h3&gt;
&lt;p&gt;In the cube example we didn&#x27;t go through the object, using &lt;code&gt;max(d, .001)&lt;/code&gt;. But
if we were to add some transparency, we could have used &lt;code&gt;d = A*abs(d)+B&lt;/code&gt;
instead, where &lt;code&gt;A&lt;/code&gt; could be interpreted as absorption and &lt;code&gt;B&lt;/code&gt; the pass-through,
or transparency.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/onecube-alpha.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;One glowing, transparent, and rotating cube; A=0.4, B=0.1&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I first saw this formula mentioned in &lt;a href=&quot;https://mini.gmshaders.com/p/volumetric&quot;&gt;Xor&#x27;s article on volumetrics&lt;/a&gt;.
To understand it a bit better, here is my intuitive take: the &lt;code&gt;+B&lt;/code&gt; causes a
potential penetration into the solid at the next iteration, which wouldn&#x27;t
happen otherwise (or only very marginally). When inside the solid, the &lt;code&gt;abs(d)&lt;/code&gt;
causes the ray to continue further (by the amount of the distance to the closest
edge). Then the multiplication by &lt;code&gt;A&lt;/code&gt; makes sure we don&#x27;t penetrate too fast
into it; it&#x27;s the absorption, or &amp;quot;damping&amp;quot;.&lt;/p&gt;
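&lt;p&gt;To make the behavior concrete, here is a minimal single-ray sketch in Python: a 1-D stand-in for the GLSL loop, with an assumed glowing sphere of radius 1 sitting 3 units ahead of the camera:&lt;/p&gt;

```python
# March a single 1-D ray through a transparent glowing sphere, using
# d = A*abs(d) + B instead of the opaque max(d, .001).
A, B = 0.4, 0.1  # absorption and pass-through, as in the figure

def march(steps=60):
    t, brightness = 0.0, 0.0
    for _ in range(steps):
        d = abs(t - 3.0) - 1.0   # signed distance to the sphere along the ray
        d = A * abs(d) + B       # abs: keep moving inside; A damps, B penetrates
        brightness += 1.0 / d    # the 1/d glow contribution
        t += d                   # step forward
    return t, brightness

t, brightness = march()
print(t)  # well past the sphere's far side at t = 4: the ray went through
```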
&lt;p&gt;This is basically the technique I used in Moonlight to avoid the complex
absorption/emission code.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;Entrance3&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Entrance 3&lt;/h2&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/entrance3.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Entrance 3 demo in 465 characters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Entrance 3 [465] by bµg
// License: CC BY-NC-SA 4.0
#define V for(s++;d&amp;lt;l&amp;amp;&amp;amp;s&amp;gt;.001;q=abs(p+=v*s)-45.,b=abs(p+vec3(mod(T*5.,80.)-7.,45.+sin(T*10.)*.2,12))-vec3(1,7,1),d+=s=min(max(p.y,-min(max(abs(p.y+28.)-17.,abs(p.z+12.)-4.),max(q.x,max(q.y,q.z)))),max(b.x,max(b.y,b.z))))
void main(){float d,s,r=1.7,l=2e2;vec3 b,v=b-.58,q,p=mat3(r,0,-r,-1,2,-1,b+1.4)*vec3((P+P-R)/R.y*20.4,30);V;r=exp(-d*d/1e4)*.2;l=length(v=-vec3(90,30,10)-p);v/=l;d=1.;V;r+=50.*d/l/l;O=vec4(pow(mix(vec3(0,4,9),vec3(80,7,2),r*r)*.01,p-p+.45),1);}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;See it on &lt;a href=&quot;https://b.pkh.me/2025-11-18-entrance-3.htm&quot;&gt;its official page&lt;/a&gt;, or play with the code on &lt;a href=&quot;https://www.shadertoy.com/view/3ctcDn&quot;&gt;its
Shadertoy port&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This demo was probably one of the most challenging, but I&#x27;m pretty happy with its
atmospheric vibe. It&#x27;s kind of different from the usual demos at this size.&lt;/p&gt;
&lt;p&gt;I initially tried with some voxels, but I couldn&#x27;t make it work with the light
under 512 characters (the initialization code was too large, not the branchless
&lt;a href=&quot;https://en.wikipedia.org/wiki/Digital_differential_analyzer_(graphics_algorithm)&quot;&gt;DDA&lt;/a&gt; stepping). It also had annoying limitations (typically, the animation was
unit-bound), so I fell back to classic raymarching.&lt;/p&gt;
&lt;p&gt;The first thing I did differently was to use an &lt;a href=&quot;https://iquilezles.org/articles/distfunctions2dlinf/&quot;&gt;L-∞ norm&lt;/a&gt; instead of a
Euclidean norm for the distance function: every solid is a cube, so it&#x27;s
appropriate to use simpler formulas.&lt;/p&gt;
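&lt;p&gt;For reference, under the L-∞ norm the distance to an axis-aligned box needs no square root at all. A Python sketch mirroring the &lt;code&gt;max(q.x,max(q.y,q.z))&lt;/code&gt; pattern from the demo code (hypothetical helper name):&lt;/p&gt;

```python
# L-infinity (Chebyshev) distance to an axis-aligned box of half-size h:
# componentwise |p| - h, then take the maximum — no dot product, no sqrt.
def box_dist_linf(p, h=1.0):
    q = [abs(c) - h for c in p]
    return max(q)

print(box_dist_linf((2.0, 0.0, 0.0)))  # 1.0: one unit away from the face
print(box_dist_linf((0.0, 0.0, 0.0)))  # -1.0: inside the box
```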
&lt;p&gt;For the light, it&#x27;s not an illusion, it&#x27;s an actual light: after the first
raymarch to a solid, the ray direction is reoriented toward the light and the
march runs again (it&#x27;s the &lt;code&gt;V&lt;/code&gt; macro). Whether the ray hits a solid or not
determines whether the fragment is lit up.&lt;/p&gt;
&lt;h3&gt;Mobile bugs&lt;/h3&gt;
&lt;p&gt;A bad surprise of this demo was uncovering two driver bugs on mobile:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One with tricky &lt;a href=&quot;https://crbug.com/462233638&quot;&gt;for-loop compounds on Snapdragon/Adreno&lt;/a&gt;, triggered because I was trying
hard to avoid macros and functions.&lt;/li&gt;
&lt;li&gt;One with &lt;a href=&quot;https://crbug.com/462288594&quot;&gt;chained assignments on Imagination/PowerVR&lt;/a&gt; (typically affects
the Google Pixel 10 Pro).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first was worked around with the &lt;code&gt;V&lt;/code&gt; macro (which actually saved 3 characters in
the process), but the second one had to be unpacked and made me lose 2 characters.&lt;/p&gt;
&lt;h3&gt;Isometry&lt;/h3&gt;
&lt;p&gt;Another thing I studied was how to set up the camera in a non-perspective
&lt;a href=&quot;https://en.wikipedia.org/wiki/Isometric_projection&quot;&gt;isometric or dimetric view&lt;/a&gt;. I couldn&#x27;t make sense of the maths from
the Wikipedia page (it just didn&#x27;t work), but Sympy rescued me again:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from sympy import *

# Counter-clockwise rotation
a, ax0, ax1, ax2 = symbols(&#x27;a ax0:3&#x27;)
c, s = cos(a), sin(a)
k = 1-c
rot = Matrix(3,3, [
    # col 1             col 2              col 3
    ax0*ax0*k + c,     ax0*ax1*k + ax2*s, ax0*ax2*k - ax1*s, # row 1
    ax1*ax0*k - ax2*s, ax1*ax1*k + c,     ax1*ax2*k + ax0*s, # row 2
    ax2*ax0*k + ax1*s, ax2*ax1*k - ax0*s, ax2*ax2*k + c      # row 3
])

# Rotation by 45° on the y-axis
m45 = rot.subs({a:rad(-45), ax0:0, ax1:1, ax2:0})

# Apply the 2nd rotation on the x-axis to get the transform matrices for two
# classic projections
# Note: asin(tan(rad(30))) is the same as atan(sin(rad(45)))
isometric = m45 * rot.subs({a:asin(tan(rad(30))), ax0:1, ax1:0, ax2:0})
dimetric  = m45 * rot.subs({a:         rad(30),   ax0:1, ax1:0, ax2:0})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Inspecting the matrices and factoring out the common terms, we obtain the
following transform matrices:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
M_{iso} = \frac{1}{\sqrt{2}\sqrt{3}}\begin{bmatrix}
   \sqrt{3} &amp;amp; -1 &amp;amp; \sqrt{2} \\
          0 &amp;amp;  2 &amp;amp; \sqrt{2} \\
  -\sqrt{3} &amp;amp; -1 &amp;amp; \sqrt{2}
\end{bmatrix} \text{ and } M_{dim} = \frac{1}{2\sqrt{2}}\begin{bmatrix}
     2 &amp;amp;       -1 &amp;amp; \sqrt{3} \\
     0 &amp;amp; \sqrt{6} &amp;amp; \sqrt{2} \\
    -2 &amp;amp;       -1 &amp;amp; \sqrt{3}
\end{bmatrix}
&lt;/div&gt;
&lt;p&gt;The ray direction is common to all fragments, so we use the central UV
coordinate (0,0) as reference point. We push it forward for convenience: (0,0,1),
and transform it with our matrix. This gives the central screen coordinate in
world space. Since the obtained point coordinate is relative to the world
origin, to go from that point to the origin, we just have to flip its sign. The
ray direction formula is then:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
d_{iso} = -M_{iso} \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} = -\frac{\sqrt{3}}{3}\begin{bmatrix}1 \\ 1 \\ 1\end{bmatrix}
\text{ and } d_{dim} = -M_{dim} \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} = -\frac{1}{4} \begin{bmatrix}\sqrt{6} \\ 2 \\ \sqrt{6}\end{bmatrix}
&lt;/div&gt;
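&lt;p&gt;These values are easy to double-check numerically. Here is a plain-Python sanity check (hypothetical helper names) that the isometric matrix, normalized by 1/(√2·√3), is a pure rotation and maps (0,0,1) to (1,1,1)/√3:&lt;/p&gt;

```python
import math

# The isometric transform matrix, rows scaled by 1/(√2·√3)
s2, s3 = math.sqrt(2), math.sqrt(3)
M = [[ s3, -1.0, s2],
     [0.0,  2.0, s2],
     [-s3, -1.0, s2]]
M = [[v / (s2 * s3) for v in row] for row in M]

# Third column = image of (0, 0, 1): expect (1, 1, 1)/√3
col_z = [row[2] for row in M]
print(all(abs(v - 1 / s3) < 1e-12 for v in col_z))  # True

# Every row has unit length: M is a pure rotation, no scaling
norms = [math.fsum(v * v for v in row) for row in M]
print(all(abs(n - 1.0) < 1e-12 for n in norms))  # True
```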
&lt;p&gt;To get the ray origin of every other pixel, the remaining question is: what is
the smallest distance we need to step the screen coordinates back such that,
after applying the transformation, the view doesn&#x27;t clip into the ground at
&lt;span class=&quot;math inline&quot;&gt;y=0&lt;/span&gt;?&lt;/p&gt;
&lt;p&gt;This requirement can be modeled with the following expression:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\left( M \begin{bmatrix}x \\ -1 \\ z\end{bmatrix} \right)_y &amp;gt; 0
&lt;/div&gt;
&lt;p&gt;The -1 is the lowest y screen coordinate (which we don&#x27;t want to end up in the
ground). The lazy bum in me just asks Sympy to solve it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;x, z = symbols(&amp;quot;x z&amp;quot;, real=True)
m = isometric  # or dimetric
u = m * Matrix([x, -1, z])
uz = solve(u[1] &amp;gt; 0, z)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We get &lt;span class=&quot;math inline&quot;&gt;z&amp;gt;\sqrt{2}&lt;/span&gt; for isometric, and &lt;span class=&quot;math inline&quot;&gt;z&amp;gt;\sqrt{3}&lt;/span&gt; for dimetric.&lt;/p&gt;
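&lt;p&gt;The isometric bound is also quick to sanity-check by hand: the y row of the matrix is (0, 2, √2)/√6, so the y-component of the transformed point is (√2·z - 2)/√6, positive exactly when z is greater than √2. A tiny Python check (hypothetical helper name):&lt;/p&gt;

```python
import math

# y-component of M_iso · (x, -1, z); the x entry of the y row is 0,
# so x does not matter for the ground-clipping condition.
def y_world(z):
    s2, s6 = math.sqrt(2), math.sqrt(6)
    return (s2 * z - 2.0) / s6

print(y_world(math.sqrt(2) + 0.01) > 0.0)  # True: stays above the ground
print(y_world(math.sqrt(2) - 0.01) > 0.0)  # False: clips below y = 0
```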
&lt;p&gt;With an arbitrary scale &lt;code&gt;S&lt;/code&gt; of the coordinate we end up with the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float S = 50.;
vec2 u = (P+P-R)/R.y * S; // scaled screen coordinates

float A=sqrt(2.), B=sqrt(3.);

// Isometric
vec3 rd = -vec3(1)/B,
     ro = mat3(B,0,-B,-1,2,-1,A,A,A)/A/B * vec3(u, A*S + eps);

// Dimetric
vec3 rd = -vec3(B,A,B)/A/2.,
     ro = mat3(2,0,-2,-1,A*B,-1,B,A,B)/A/2. * vec3(u, B*S + eps);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;eps&lt;/code&gt; is an arbitrary small value to make sure the y-coordinate ends up
above 0.&lt;/p&gt;
&lt;p&gt;In Entrance 3, I used a rough approximation of the isometric setup.&lt;/p&gt;
&lt;p&gt;&lt;a id=&quot;Archipelago&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Archipelago&lt;/h2&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/archipelago.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Archipelago demo in 472 characters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Archipelago [472] by bµg
// License: CC BY-NC-SA 4.0
#define r(a)*=mat2(cos(a+vec4(0,11,33,0))),
void main(){vec3 p,q,k;for(float w,x,a,b,i,t,h,e=.1,d=e,z=.001;i++&amp;lt;50.&amp;amp;&amp;amp;d&amp;gt;z;h+=k.y,w=h-d,t+=d=min(d,h)*.8,O=vec4((w&amp;gt;z?k.zxx*e:k.zyz/20.)+i/1e2+max(1.-abs(w/e),z),1))for(p=normalize(vec3(P+P-R,R.y))*t,p.zy r(1.)p.z+=T+T,p.x+=sin(w=T*.4)*2.,p.xy r(cos(w)*e)d=p.y+=4.,h=d-2.3+abs(p.x*.2),q=p,k-=k,a=e,b=.8;a&amp;gt;z;a*=.8,b*=.5)q.xz r(.6)p.xz r(.6)k.y+=abs(dot(sin(q.xz*.4/b),R-R+b)),k.x+=w=a*exp(sin(x=p.x/a*e+T+T)),p.x-=w*cos(x),d-=w;}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;See it on &lt;a href=&quot;https://b.pkh.me/2025-12-02-archipelago.htm&quot;&gt;its official page&lt;/a&gt;, or play with the code on &lt;a href=&quot;https://www.shadertoy.com/view/wfKcDR&quot;&gt;its
Shadertoy port&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;For this infinite procedurally generated Japan, I wanted to break away from
my red/orange obsession. Technically speaking, it&#x27;s actually fairly basic if
you&#x27;re familiar with Red Alp. I used the same noise for the mountains/islands,
but the water uses a different noise.&lt;/p&gt;
&lt;p&gt;The per octave noise curve is &lt;code&gt;w=exp(sin(x))&lt;/code&gt;, with the particularity of
shifting the &lt;code&gt;x&lt;/code&gt; coordinate with its derivative: &lt;code&gt;x-=w*cos(x)&lt;/code&gt;. This is some
form of &lt;a href=&quot;https://iquilezles.org/articles/warp/&quot;&gt;domain warping&lt;/a&gt; that gives the nice effect here. When I say &lt;code&gt;x&lt;/code&gt;, I&#x27;m
really referring to the x-axis position. There is no need to do the same on the
z-component (xz forms the flat plane) because each octave of the fbm has a
rotation that &amp;quot;mixes&amp;quot; both axes, so &lt;code&gt;z&lt;/code&gt; is actually baked into &lt;code&gt;x&lt;/code&gt;.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/demo-tricks/waves.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;w=exp(sin(x))&lt;/figcaption&gt;
&lt;/figure&gt;
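&lt;p&gt;A minimal Python sketch of that per-octave curve, to make the warp explicit: the shift really is the curve&#x27;s own derivative, since d/dx exp(sin(x)) = cos(x)·exp(sin(x)) = w·cos(x):&lt;/p&gt;

```python
import math

# One octave of the wave noise: height w = exp(sin(x)), then the sample
# position is shifted by the derivative w*cos(x) (the domain warp).
def octave(x):
    w = math.exp(math.sin(x))
    x_warped = x - w * math.cos(x)
    return w, x_warped

w, xw = octave(0.0)
print(w, xw)  # 1.0 -1.0: exp(sin 0) = 1, shifted back by cos(0)·1
```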
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;I didn&#x27;t come up with the formula, but first found it in &lt;a href=&quot;https://youtu.be/PH9q0HNBjT4&amp;amp;t=1025s&quot;&gt;this video by
Acerola&lt;/a&gt;. I don&#x27;t know if he&#x27;s the original author, but I&#x27;ve
seen the formula replicated in various places.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a id=&quot;Cutie&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Cutie&lt;/h2&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/cutie.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Cutie demo in 602 characters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Cutie [602] by bµg
// License: CC BY-NC-SA 4.0
#define V vec3
#define L length(p
#define C(A,B,X,Y)d=min(d,-.2*log2(exp2(X-L-A)/.2)+exp2(Y-L-B)/.2)))
#define H(Z)S,k=fract(T*1.5+s),a=V(1.3,.2,Z),b=V(1,.3*max(1.-abs(3.*k-1.),z),Z*.75+3.*max(-k*S,k-1.)),q=b*S,q+=a+sqrt(1.-dot(q,q))*normalize(V(-b.y,b.x,0)),C(a,q,3.5,2.5),C(q,a-b,2.5,2.)
void main(){float i,t,k,z,s,S=.5,d=S;for(V p,q,a,b;i++&amp;lt;5e1&amp;amp;&amp;amp;d&amp;gt;.001;t+=d=min(d,s=L+V(S-2.*p.x,-1,S))-S))p=normalize(V(P+P-R,R.y))*t,p.z-=5.,p.zy*=mat2(cos(vec4(1,12,34,1))),p.xz*=mat2(cos(sin(T)+vec4(0,11,33,0))),d=1.+p.y,C(z,V(z,z,1.2),7.5,6.),s=p.x&amp;lt;z?p.x=-p.x,z:H(z),s+=H(1.);O=vec4(V(exp(-i/(s&amp;gt;d?1e2:9.))),1);}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;See it on &lt;a href=&quot;https://b.pkh.me/2025-12-05-cutie.htm&quot;&gt;its official page&lt;/a&gt;, or play with the code on &lt;a href=&quot;https://www.shadertoy.com/view/tfVcRV&quot;&gt;its
Shadertoy port&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Here I got cocky and thought I could manage to fit it in 512 chars. I failed,
by 90 characters. I did use the &lt;a href=&quot;https://iquilezles.org/articles/smin/&quot;&gt;smoothmin&lt;/a&gt; operator for the first time: every
limb of Cutie&#x27;s body is composed of two spheres creating a rounded cone
(two spheres of different sizes smoothly merged like metaballs).&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;340&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demo-tricks/metaballs.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2 spheres merging using the smin operator&lt;/figcaption&gt;
&lt;/figure&gt;
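&lt;p&gt;The merge in the figure uses a smooth minimum. One common polynomial variant from the linked page, transliterated to Python (the demo&#x27;s golfed form differs):&lt;/p&gt;

```python
# Quadratic polynomial smooth-minimum: behaves like min(a, b) when the two
# distances differ by more than k, and blends them smoothly otherwise.
def smin(a, b, k):
    h = max(k - abs(a - b), 0.0) / k
    return min(a, b) - h * h * k * 0.25

print(smin(1.0, 5.0, 0.5))  # 1.0: too far apart to blend, plain min
print(smin(1.0, 1.0, 1.0))  # 0.75: equal distances blend below the plain min
```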
&lt;p&gt;Then I used &lt;a href=&quot;https://iquilezles.org/articles/simpleik/&quot;&gt;simple inverse kinematics&lt;/a&gt; for the animation. Using leg parts
with a size of 1 helped simplify the formula and make it shorter.&lt;/p&gt;
&lt;p&gt;You may be wondering about the smooth visuals themselves: I didn&#x27;t use the depth
map but simply the number of iterations. Due to the nature of the raymarching
algorithm, when a ray passes close to a shape, it slows down significantly,
increasing the number of iterations. This is super useful because it exaggerates
the contours of the shapes naturally. It&#x27;s wrapped into an exponential, but &lt;code&gt;i&lt;/code&gt;
defines the output color directly.&lt;/p&gt;
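&lt;p&gt;As a sketch of that mapping (the demo uses two falloff constants, 1e2 and 9; here is one case in Python, with an assumed constant name):&lt;/p&gt;

```python
import math

# Iteration-count shading: few iterations (ray flies past everything) stay
# bright, many iterations (ray grazes or hits a shape) go dark.
def shade(i, k=100.0):
    return math.exp(-i / k)

print(shade(5))   # far from everything: close to white
print(shade(50))  # grazing a silhouette: visibly darker
```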
&lt;h2&gt;What&#x27;s next&lt;/h2&gt;
&lt;p&gt;I will continue making more of those, keeping my artistic ambition low because
of the 512-character constraint I&#x27;m imposing on myself.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/demo-tricks/512.jpg&quot; alt=&quot;meme about the 512 chars limit&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You may be wondering why I keep this obsession with 512 characters; many
people have called me out on it. There are actually many arguments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A tiny demo has to focus on one or two very scoped aspects of computer
graphics, which makes it perfect as a &lt;strong&gt;learning support&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;It&#x27;s part of the &lt;strong&gt;artistic performance&lt;/strong&gt;: it&#x27;s not just techniques and
visuals, the wizardry of the code is part of why it&#x27;s so impressive. We&#x27;re in
an era of visuals, people have been fed the craziest VFX ever. But have
they seen them done with a few hundred bytes of code?&lt;/li&gt;
&lt;li&gt;The constraint helps me &lt;strong&gt;finish the work&lt;/strong&gt;: when making art, there is always
this question of when to stop. Here there is a hard limit where I just
cannot do more and I have to move on.&lt;/li&gt;
&lt;li&gt;Similarly, it &lt;strong&gt;prevents my ambition&lt;/strong&gt; from tricking me into some colossal
project I will never finish or even start. That format has a ton of
limitations, and that&#x27;s its strength.&lt;/li&gt;
&lt;li&gt;Working on such a tiny piece of code for days/weeks just &lt;strong&gt;brings me joy&lt;/strong&gt;. I
do feel like a craftsperson, spending an unreasonable amount of time
perfecting it, for the beauty of it.&lt;/li&gt;
&lt;li&gt;I&#x27;m trying to build a portfolio, and it&#x27;s important for me to keep it
&lt;strong&gt;consistent&lt;/strong&gt;. If the size limit were different, I would have done things
differently, so I can&#x27;t change it now. If I had hundreds more characters,
Red Alp might have had birds, the sky opening to cast a beam of light onto the
mountains, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why 512 in particular? It happens to be the size of a toot on &lt;a href=&quot;https://fosstodon.org/@bug&quot;&gt;my Mastodon
instance&lt;/a&gt; so I can fit the code there, and I found it to be a good
balance.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/47-text-rendering-and-effects-using-gpu-computed-distances.html</guid>
 <link>http://blog.pkh.me/p/47-text-rendering-and-effects-using-gpu-computed-distances.html</link>
 <title>Text rendering and effects using GPU-computed distances</title>
 <pubDate>Sat, 01 Nov 2025 17:20:06 -0000</pubDate>
 <description>&lt;p&gt;Text rendering is &lt;em&gt;cursed&lt;/em&gt;. Anyone who has worked on text will tell you the
same; whether it&#x27;s about layout, bidirectional text, shaping, Unicode, or the
rendering itself, it&#x27;s never a completely solved problem. In my personal case,
I&#x27;ve been working on trying to render text in the context of a compositing
engine for creative content. I needed crazy text effects, and I needed them to
be reasonably fast, which implied working with the GPU as much as possible. A
distance field was an obvious requirement because it unlocks anti-aliasing and
the ability to get many great effects basically for free.&lt;/p&gt;
&lt;p&gt;In this article, we will see how to compute a signed distance field on the GPU,
because it&#x27;s much faster than doing it on the CPU, especially when targeting
mobile devices. We will make the algorithm decently fast, then after lamenting
about the limitations, we will see what kind of effects this opens up.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/intro.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Progressive build of the &#x27;あ&#x27; glyph from the Mochiy Pop One font&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Extraction of glyph outlines&lt;/h2&gt;
&lt;p&gt;Non-bitmap fonts contain glyphs defined by outlines made of closed sequences
of lines and (quadratic or cubic) Bézier curves. Extracting them isn&#x27;t exactly
complicated: &lt;a href=&quot;https://freetype.org&quot;&gt;FreeType&lt;/a&gt; or &lt;a href=&quot;https://github.com/harfbuzz/ttf-parser&quot;&gt;ttf-parser&lt;/a&gt; typically expose a way to do that.&lt;/p&gt;
&lt;p&gt;For the purpose of this article, we&#x27;re going to hard code the list of the Bézier
curves inside the shader, but of course in a more serious setup those would be
uploaded through storage buffers or similar.&lt;/p&gt;
&lt;p&gt;Using &lt;a href=&quot;http://blog.pkh.me/misc/dump-outline.rs&quot;&gt;this tiny program&lt;/a&gt;, a glyph can be dumped as a series of
outlines into a fixed size array:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;struct Bezier {
    vec2 p0; // start point
    vec2 p1; // control point 1
    vec2 p2; // control point 2
    vec2 p3; // end point
};

#define N 42
#define NC 2
const int glyph_A_count[2] = int[](33, 9);
const Bezier glyph_A[42] = Bezier[](
    Bezier(vec2(  0.365370,  -0.570817), vec2(  0.374708,  -0.631518), vec2(  0.332685,  -0.687549), vec2(  0.339689,  -0.748249)),
    Bezier(vec2(  0.339689,  -0.748249), vec2(  0.339689,  -0.748249), vec2(  0.344358,  -0.764591), vec2(  0.351362,  -0.771595)),
    Bezier(vec2(  0.351362,  -0.771595), vec2(  0.384047,  -0.822957), vec2(  0.402724,  -0.855642), vec2(  0.442412,  -0.885992)),
    // ...
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;glyph_A_count&lt;/code&gt; contains how many Bézier curves there are for each sub-shape
composing the glyph, and &lt;code&gt;glyph_A&lt;/code&gt; contains that list of cubic Bézier curves.&lt;/p&gt;
&lt;p&gt;Even though glyphs are also composed of lines and quadratic curves, we expand
them all into cubics: &amp;quot;who can do more can do less&amp;quot;.&lt;/p&gt;
&lt;p&gt;We use these formulas to respectively expand lines and quadratics into cubics:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
B_1 &amp;amp;= \begin{bmatrix}
P_0 \\
\mathrm{mix}(P_0, P_1, 1/3) \\
\mathrm{mix}(P_0, P_1, 2/3) \\
P_1
\end{bmatrix} \\
B_2 &amp;amp;= \begin{bmatrix}
P_0 \\
\mathrm{mix}(P_0, P_1, 2/3) \\
\mathrm{mix}(P_1, P_2, 1/3) \\
P_2
\end{bmatrix}
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Where &lt;span class=&quot;math inline&quot;&gt;P_n&lt;/span&gt; are the Bézier control points.&lt;/p&gt;
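&lt;p&gt;These elevations preserve the curve exactly. A quick Python check (hypothetical helper names) that a degree-elevated quadratic reproduces the original at every parameter:&lt;/p&gt;

```python
def mix(p, q, t):
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def quad_to_cubic(p0, p1, p2):
    # B2 from above: degree elevation of a quadratic Bézier to a cubic
    return (p0, mix(p0, p1, 2/3), mix(p1, p2, 1/3), p2)

def eval_quad(p0, p1, p2, t):
    u = 1 - t
    return tuple(u*u*a + 2*u*t*b + t*t*c for a, b, c in zip(p0, p1, p2))

def eval_cubic(ctrl, t):
    u = 1 - t
    return tuple(u**3*a + 3*u*u*t*b + 3*u*t*t*c + t**3*d
                 for a, b, c, d in zip(*ctrl))

p0, p1, p2 = (0.0, 0.0), (0.5, 1.0), (1.0, 0.0)
elevated = quad_to_cubic(p0, p1, p2)
same = all(abs(x - y) < 1e-12
           for t in (0.0, 0.25, 0.5, 0.75, 1.0)
           for x, y in zip(eval_cubic(elevated, t), eval_quad(p0, p1, p2, t)))
print(same)  # True
```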
&lt;p&gt;For simplicity, and because we want to make sure the most complex case is well
tested, we will stick to this approach in this article. But it also means there
is a lot of room for further optimization. Since solving lines and quadratics
is much simpler, this is left as an exercise for the reader.&lt;/p&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;You may be tempted to upload the polynomial form of these curves directly to
save some computation in the shader. Don&#x27;t. You will lose the exact
stitching property because one evaluated polynomial end &lt;span class=&quot;math inline&quot;&gt;B_n(1)&lt;/span&gt; will
not necessarily match the next polynomial start &lt;span class=&quot;math inline&quot;&gt;B_{n+1}(0)&lt;/span&gt;. This creates
artificial &amp;quot;precision holes&amp;quot; that will break rendering in obscure ways.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Signed distance to the shape&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&quot;http://blog.pkh.me/p/46-fast-calculation-of-the-distance-to-cubic-bezier-curves-on-the-gpu.html&quot;&gt;previous article&lt;/a&gt;, we saw how to get the distance to
a cubic Bézier curve. Since each glyph is composed of multiple outlines, we can
simply iterate over all of them and pick the shortest distance.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float get_distance(vec2 p, Bezier buf[N], int counts[NC]) {
    int base = 0;
    float dist = 1e38;

    for (int j = 0; j &amp;lt; NC; j++) {
        int count = counts[j];

        for (int i = 0; i &amp;lt; count; i++) {
            Bezier b = buf[base + i];
            float d = bezier_sq(p, b.p0, b.p1, b.p2, b.p3);
            dist = min(dist, d);
        }

        base += count;
    }

    return sqrt(dist);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where &lt;code&gt;bezier_sq()&lt;/code&gt; is the distance to the Bézier curve, squared, as defined in
the &lt;a href=&quot;http://blog.pkh.me/p/46-fast-calculation-of-the-distance-to-cubic-bezier-curves-on-the-gpu.html&quot;&gt;previous article&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_unsigned.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Distance to the &#x27;A&#x27; glyph from the Virgil font&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This works just fine, but as you can imagine, it&#x27;s not cheap to solve that many
distances per pixel. A first straightforward optimization is to ignore any
curve whose bounding box is further away than our current best distance, because
none of its points can give a shorter one:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/box-optim.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Box distance optimization&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Where each box encloses a Bézier curve like this:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/box.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Most naive/conservative bounding box&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We could use a tighter bound, but it would require more computation, so this
felt like a good trade-off.&lt;/p&gt;
&lt;p&gt;Implementing this in the inner loop is pretty simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;for (int i = 0; i &amp;lt; count; i++) {
    Bezier b = buf[base + i];
    vec2 p0=b.p0, p1=b.p1, p2=b.p2, p3=b.p3;

    // Distance to box (0 if inside), squared
    vec2 q0 = min(p0, min(p1, min(p2, p3)));
    vec2 q1 = max(p0, max(p1, max(p2, p3)));
    vec2 v = max(abs(q0+q1-p-p)-q1+q0, 0.)*.5;
    float h = dot(v,v);

    // We can&#x27;t get a shorter distance than h if we were to compute the
    // distance to that curve
    if (h &amp;gt; dist)
        continue;

    float d = bezier_sq(p, p0, p1, p2, p3);
    dist = min(dist, d);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The distance-to-bounding-box formula comes from this &lt;a href=&quot;https://www.youtube.com/watch?v=62-pRVZuS5c&quot;&gt;explanatory video by
Inigo Quilez&lt;/a&gt; (the basic version, without the inside distance), adapted to
the Bézier control point coordinates.&lt;/p&gt;
&lt;p&gt;This saves a lot of computation in certain cases, but the worst case is still
pretty terrible, as shown by the heat map of this &#x27;C&#x27; glyph:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_heatmap_nobest.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Heat map of how many distances are evaluated&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Indeed, it sometimes takes a long time before reaching a Bézier curve close
enough to disregard most of the others. We can observe this effect the further
we move away from the beginning of the shape.&lt;/p&gt;
&lt;p&gt;So the next step is to find a good initial candidate. One cheap way to do that
is to first compute the distance to the center of each curve&#x27;s bounding box, and
pick the smallest:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Find a good initial guess
int best = 0;
float boxd = 1e38;
for (int i = 0; i &amp;lt; count; i++) {
    Bezier b = buf[base + i];
    vec2 p0=b.p0, p1=b.p1, p2=b.p2, p3=b.p3;
    vec2 q0 = min(p0, min(p1, min(p2, p3)));
    vec2 q1 = max(p0, max(p1, max(p2, p3)));
    vec2 v = (q0+q1)*.5 - p;
    float h = dot(v,v);
    if (h &amp;lt; boxd)
        best=i, boxd=h;
}

// Initial guess
Bezier bb = buf[base + best];
dist = min(dist, bezier_sq(p, bb.p0, bb.p1, bb.p2, bb.p3));

for (int i = 0; i &amp;lt; count; i++) {
    if (i == best) // We already computed this one
        continue;

    Bezier b = buf[base + i];
    vec2 p0=b.p0, p1=b.p1, p2=b.p2, p3=b.p3;
    // ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This optimization is immediately reflected in the heat map, where only the
central point remains a hot spot (this glyph is a pathological case, as it
roughly forms a circle):&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_heatmap.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Heat map with a rough initial guess&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Winding number&lt;/h2&gt;
&lt;p&gt;The last step is to figure out whether we are inside or outside the shape. There
are two schools here, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Even%E2%80%93odd_rule&quot;&gt;even-odd&lt;/a&gt; and the &lt;a href=&quot;https://en.wikipedia.org/wiki/Nonzero-rule&quot;&gt;non-zero&lt;/a&gt; rules. We&#x27;ll pick the
latter because that&#x27;s the expectation in the case of font rendering.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html&quot;&gt;deconstructing Bézier curves&lt;/a&gt;, we explained the theory behind this
specific algorithm, so we&#x27;re not going to dive into the details again. The basic
idea is to cast a ray in one direction from our current position and count
how many times it crosses a given curve. Here we will arbitrarily choose the
horizontal ray line &lt;span class=&quot;math inline&quot;&gt;y = P_y&lt;/span&gt;, where P is our current coordinate.&lt;/p&gt;
&lt;p&gt;The topology of each curve can hint at whether it&#x27;s worth considering or not.
For example, if every control point is above our current position, or every one
below, the curve can be ignored. We can store all the signs in a mask and bail
out as soon as the bounding box of the curve is entirely below or entirely above
the ray:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int signs = int(p0.y &amp;lt; p.y)
          | int(p1.y &amp;lt; p.y) &amp;lt;&amp;lt; 1
          | int(p2.y &amp;lt; p.y) &amp;lt;&amp;lt; 2
          | int(p3.y &amp;lt; p.y) &amp;lt;&amp;lt; 3;
if (signs == 0 || signs == 15) // all signs are identical
    return 0;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each sign indicates the position of a control point with respect to the ray.
We can use the relative position of the starting point as a reference for the
overall orientation (if there is a crossing, we know whether it will come from
below or from above):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int inc = (signs &amp;amp; 1) == 0 ? 1 : -1;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also need to convert the Bézier curves to the usual polynomial
&lt;span class=&quot;math inline&quot;&gt;at^3+bt^2+ct+d&lt;/span&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 a = -p0 + 3.*(p1 - p2) + p3,
     b = 3. * (p0 - 2.*p1 + p2),
     c = 3. * (p1 - p0),
     d = p0 - p;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we can find the roots on the y-axis and evaluate the curve on the x-axis
at each of them. Every crossing point (there are at most 3) alternates the
winding direction:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;    float r[5];
    int count = root_find3(r, a.y, b.y, c.y, d.y);
    vec3 t = vec3(r[0], r[1], r[2]);
    vec3 v = ((a.x*t + b.x)*t + c.x)*t + d.x;
    if (count &amp;gt; 0 &amp;amp;&amp;amp; v.x &amp;gt;= 0.) w += inc;
    if (count &amp;gt; 1 &amp;amp;&amp;amp; v.y &amp;gt;= 0.) w -= inc;
    if (count &amp;gt; 2 &amp;amp;&amp;amp; v.z &amp;gt;= 0.) w += inc;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we already have a 5th degree root finder from the previous article, we
just have to build a tiny version for the 3rd degree:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int root_find3(out float r[5], float a, float b, float c, float d) {
    float r2[5];
    int n = root_find2(r2, 3.*a, b+b, c);
    return cy_find5(r, r2, n, 0., 0., a, b, c, d);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;Our root finder doesn&#x27;t return roots outside &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt; so no filtering is
required.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To summarize:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int bezier_winding(vec2 p, vec2 p0, vec2 p1, vec2 p2, vec2 p3) {
    int w = 0;
    int signs = int(p0.y &amp;lt; p.y)
              | int(p1.y &amp;lt; p.y) &amp;lt;&amp;lt; 1
              | int(p2.y &amp;lt; p.y) &amp;lt;&amp;lt; 2
              | int(p3.y &amp;lt; p.y) &amp;lt;&amp;lt; 3;
    if (signs == 0 || signs == 15)
        return 0;
    int inc = (signs &amp;amp; 1) == 0 ? 1 : -1;
    vec2 a = -p0 + 3.*(p1 - p2) + p3,
         b = 3. * (p0 - 2.*p1 + p2),
         c = 3. * (p1 - p0),
         d = p0 - p;
    float r[5];
    int count = root_find3(r, a.y, b.y, c.y, d.y);
    vec3 t = vec3(r[0], r[1], r[2]);
    vec3 v = ((a.x*t + b.x)*t + c.x)*t + d.x;
    if (count &amp;gt; 0 &amp;amp;&amp;amp; v.x &amp;gt;= 0.) w += inc;
    if (count &amp;gt; 1 &amp;amp;&amp;amp; v.y &amp;gt;= 0.) w -= inc;
    if (count &amp;gt; 2 &amp;amp;&amp;amp; v.z &amp;gt;= 0.) w += inc;
    return w;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For every sub-shape, we can accumulate the winding number, and use it at the end
to decide whether we&#x27;re inside or outside:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float get_distance(vec2 p, Bezier buf[N], int counts[NC]) {
    int w = 0;
    int base = 0;
    float dist = 1e38;

    for (int j = 0; j &amp;lt; NC; j++) {
        int count = counts[j];

        // Get the sign of the distance
        for (int i = 0; i &amp;lt; count; i++) {
            Bezier b = buf[base + i];
            w += bezier_winding(p, b.p0, b.p1, b.p2, b.p3);
        }

        // ...
    }

    // Positive outside, negative inside
    return (w != 0 ? -1. : 1.) * sqrt(dist);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And voilà:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_signed.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Signed distance to the &#x27;A&#x27; glyph from the Virgil font&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;This winding number logic might be too fragile: it doesn&#x27;t cover potential
degenerate cases such as horizontal tangents or duplicated roots. But for
some reason, while I fought these issues for years, none of the weird corner
cases seemed to glitch in my extensive tests, probably because this root
finder is more resilient than what I was using before.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Limitations&lt;/h2&gt;
&lt;h3&gt;Wicked curves&lt;/h3&gt;
&lt;p&gt;This may look satisfying, but it&#x27;s only the beginning of the problems. For
example, variable fonts typically follow chaotic patterns:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/quicksand-e.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;The glyph &#x27;e&#x27; in the Quicksand font&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In addition to the self-overlapping part, notice the triangle folding back on
itself on the right.&lt;/p&gt;
&lt;p&gt;This completely wrecks the distance field:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_glitch.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Glyph with a broken SDF due to overlaps&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Even with a simple character display (meaning something that doesn&#x27;t exploit the
wide range of effects available with an SDF), it starts to glitch:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glyph_glitch_nodebug.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Glitching glyph due to broken SDF&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Little &amp;quot;cracks&amp;quot; should appear around the overlaps. This can be mitigated by
lowering the distance by a tiny constant to avoid the zero-crossing, but it
impacts the overall glyph (it gets bolder).&lt;/p&gt;
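&lt;p&gt;As a minimal sketch of that mitigation (the offset is an arbitrary value of
mine that would need tuning per font and rendering size):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float d = get_distance(p, glyph_A, glyph_A_count);
d -= 0.002; // shift the zero-crossing slightly outward to hide the cracks
&lt;/code&gt;&lt;/pre&gt;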
&lt;p&gt;And it&#x27;s not just a variable font problem; sometimes designers rely on
overlaps for simplicity:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/quicksand-t.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;The glyph &#x27;t&#x27; in the Quicksand font&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;And sometimes... well let&#x27;s say they have a legitimate reason to do it:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/bengali.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;A Bengali glyph&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is not something that can be addressed easily.&lt;/p&gt;
&lt;p&gt;For example, take these two overlapping shapes:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/overlap.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Distance inside two overlapping shapes&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We see that the actual distance (white circle) is not the smallest distance
to either shape, and it&#x27;s not even the smallest distance to any edge: it is at
an intersection point between two curves, which we do not have. Here we&#x27;re
dealing with line segments, but with cubic curves, the problem explodes in
complexity.&lt;/p&gt;
&lt;p&gt;At this point, we need another strategy, like feeding the GPU renderer with
preprocessed outline-only curves. Many people rely on curve flattening to
address this issue. This is unfortunately yet another field of research that
we&#x27;re not going to explore this time.&lt;/p&gt;
&lt;p&gt;Inigo talked about &lt;a href=&quot;https://iquilezles.org/articles/interiordistance/&quot;&gt;the combination of signed distances&lt;/a&gt; if you want
some ideas, but aside from the first one (giving up), none seems particularly
applicable here.&lt;/p&gt;
&lt;h3&gt;Atlas and overlapping distances&lt;/h3&gt;
&lt;p&gt;Some effects such as blur or glow expand beyond the boundaries of the
characters, so the distance field needs to be larger than the glyph itself. This
means that when an effect spreads too far, there will be an overlap. If we&#x27;re
applying an effect to a word, the distance field must be the union of all the
word&#x27;s glyphs (or sometimes even the whole sentence). The classic approach of an
atlas of glyph distances will not work reliably.&lt;/p&gt;
&lt;p&gt;In the following illustration, a geometry per glyph is used, each geometry is
enlarged to account for the larger distance field, and we end up with potential
overlaps when applying effects.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/atlas.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Overlapping character geometries due to larger distance&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3&gt;Rounded corners&lt;/h3&gt;
&lt;p&gt;Like all distance maps, ours suffers from the usual limitations. The most
common one is the rounded corners problem. This is typically addressed using a
&lt;a href=&quot;https://github.com/Chlumsky/msdfgen&quot;&gt;multi-channel signed distance field generator&lt;/a&gt;, but it&#x27;s hard for me
to tell how amenable it is to a GPU port.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/msdfgen-A.png&quot; alt=&quot;&quot;&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/text-rendering/msdfgen-A-ok.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;msdfgen demonstration of corners improvement&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;This problem only appears with intermediate textures. When computing exact
distances directly in the shader, as we do here, this is not an issue.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Effects&lt;/h2&gt;
&lt;p&gt;Despite all these limitations, we can already do so much, so let&#x27;s close this
article on a positive note. We don&#x27;t do it here, but all of these effects become
free as soon as the distance field is stored in an intermediate texture.&lt;/p&gt;
&lt;p&gt;First, we have anti-aliasing / blur:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/blur.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;AA / blur effect&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I wrote &lt;a href=&quot;http://blog.pkh.me/p/44-perfecting-anti-aliasing-on-signed-distance-functions.html&quot;&gt;a dedicated article&lt;/a&gt; on the subject of AA (and blur) on SDF if you
want more information on how to achieve that.&lt;/p&gt;
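&lt;p&gt;As a minimal sketch (the dedicated article covers the subtleties), the
anti-aliased fill boils down to mapping the distance through a pixel-wide
&lt;code&gt;smoothstep&lt;/code&gt;; &lt;code&gt;fill_color&lt;/code&gt; is an assumed uniform of mine:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float d = get_distance(p, glyph_A, glyph_A_count);
float w = fwidth(d) * .5;           // half a pixel worth of distance
float mask = smoothstep(-w, w, -d); // 1 inside the glyph, 0 outside
vec3 o = fill_color * mask;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Widening &lt;code&gt;w&lt;/code&gt; beyond a pixel turns the same code into a cheap blur.&lt;/p&gt;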
&lt;p&gt;The shape can also be drastically altered with a simple operator such as
&amp;quot;rounding&amp;quot;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;d -= rounding;
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/rounding.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Rounding effect&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is the same technique we suggested to cover up the overlap glitch
earlier, just rebranded as an effect.&lt;/p&gt;
&lt;p&gt;In the same spirit we can also create an outline stroke (on the outer edge to
preserve the original glyph design):&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/outline.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Outline effect&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is &lt;em&gt;sooo&lt;/em&gt; useful because it makes it possible for our text to be visible no
matter what the background is. So many editors don&#x27;t have this feature because
it&#x27;s hard and expensive to do correctly. Given a distance field though, all we
have to do is this (which also includes anti-aliasing on every border):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float aa = fwidth(d); // pixel width estimates
float w = aa * .5; // half diffuse width
vec2 b = vec2(0,1)*outline - d; // inner and outer boundaries; vec2(-1,0) for inner, vec2(-.5,.5) for centered
float inner_mask = smoothstep(-w, w, b.x); // cut-off between the outline and the outside (whole shape w/ outline)
float outer_mask = smoothstep(-w, w, b.y); // cut-off between the fill color and the outline (whole shape w/o outline)
float outline_mask = outer_mask - inner_mask;
vec3 o = (inner_color*inner_mask + outline_color*outline_mask) * outer_mask;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also dig into our character with &lt;code&gt;d = abs(d)-ring&lt;/code&gt;:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/ring.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Ring effect&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;And maybe apply some glow to create a neon effect:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/text-rendering/glow.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Ring combined with a neon/glow effect&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float glow_power = glow * exp(-max(d, 0.) * 10.);
o += glow_color * glow_power;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could also do drop shadows, all sorts of distortions, or so many other
creative ways of exploiting this distance. You get the idea: it is fundamental
as soon as you want fast visual effects.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is the last of the series on 2D rendering for me. I&#x27;ve wanted
to share this experience and knowledge after many years of struggling (mostly
alone) on these issues. I wish I could have succeeded in providing a good free
and open-source text effects rendering engine to compete with the industry
standards. (Un)fortunately for me, the adventure stops here, but I hope this
will benefit creators and future tinkerers interested in the subject.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/46-fast-calculation-of-the-distance-to-cubic-bezier-curves-on-the-gpu.html</guid>
 <link>http://blog.pkh.me/p/46-fast-calculation-of-the-distance-to-cubic-bezier-curves-on-the-gpu.html</link>
 <title>Fast calculation of the distance to cubic Bezier curves on the GPU</title>
 <pubDate>Sat, 18 Oct 2025 09:21:56 -0000</pubDate>
 <description>&lt;p&gt;Bézier curves are a core building block of text and 2D shapes rendering.
There are several approaches to rendering them, but one especially challenging
problem, both mathematically and technically, is computing the distance to
a Bézier curve. For quadratic curves (one control point), this is fairly
accessible, but for cubics (two control points), we&#x27;re going to see why it is
so hard.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/bezier-distance/glyph.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;A glyph from the Virgil font, composed of multiple Bézier curves&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Having this distance field opens up many rendering possibilities. It&#x27;s hard, but
it&#x27;s possible; here is a live proof:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/bezier-distance/bezier-dist.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Distance to a cubic Bézier curve&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In this visualization, I&#x27;m borrowing your device resources to compute the
distance to the curve for every single pixel. The yellow points are the control
points of the curve (in white) and the blue zone is a representation of the
distance field.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;All the demos and code in this article are self-contained GLSL fragment
shaders. Most of the code can be found in the article, but feel free to
inspect the source code of any of these WebGL demos for the complete code.
They can be run verbatim using &lt;a href=&quot;https://github.com/ubitux/ShaderWorkshop&quot;&gt;ShaderWorkshop&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;The basic maths&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html&quot;&gt;In a previous article&lt;/a&gt;, we explained that a Bézier curve
can be expressed as a polynomial. In our case, a cubic polynomial:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
B_3(t) = \textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d}
&lt;/div&gt;
&lt;p&gt;Where &lt;strong&gt;a&lt;/strong&gt;, &lt;strong&gt;b&lt;/strong&gt;, &lt;strong&gt;c&lt;/strong&gt; and &lt;strong&gt;d&lt;/strong&gt; are the vector coefficients derived from
the start (&lt;span class=&quot;math inline&quot;&gt;P_0&lt;/span&gt;), end (&lt;span class=&quot;math inline&quot;&gt;P_3&lt;/span&gt;), and control points (&lt;span class=&quot;math inline&quot;&gt;P_1&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;P_2&lt;/span&gt;) using the
following formulas (you can refer to the previous article for details):&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
\textbf{a} &amp;amp;= -P_0 + 3(P_1-P_2) + P_3 \\
\textbf{b} &amp;amp;= 3P_0 - 6P_1 + 3P_2 \\
\textbf{c} &amp;amp;= -3P_0 + 3P_1 \\
\textbf{d} &amp;amp;= P_0
\end{aligned}
&lt;/div&gt;
&lt;p&gt;For a given point &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; in 2D space, the distance to that Bézier curve can be
expressed as a length between our curve and &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt;:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
d(t) &amp;amp;= ||B_3(t) - \textbf{p}|| \\
     &amp;amp;= ||\textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d} - \textbf{p}||
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Our goal is to find the &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; value where &lt;span class=&quot;math inline&quot;&gt;d(t)&lt;/span&gt; is the smallest.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://registry.khronos.org/OpenGL-Refpages/gl4/html/length.xhtml&quot;&gt;length&lt;/a&gt; formula has an annoying square root, so we start with the distance
squared for simplicity, which we are going to unroll:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
D(t) &amp;amp;= d(t)^2 \\
     &amp;amp;= ||\textbf{a}t^3 + \textbf{b}t^2 + \textbf{c}t + \textbf{d} - \textbf{p}||^2 \\
     &amp;amp;= (a_xt^3 + b_xt^2 + c_xt + d_x - p_x)^2 + (a_yt^3 + b_yt^2 + c_yt + d_y - p_y)^2
\end{aligned}
&lt;/div&gt;
&lt;p&gt;The derivative of that function will allow us to identify critical points:
that is, points where the distance starts growing or reducing. Said differently,
solving &lt;span class=&quot;math inline&quot;&gt;D&#x27;(t)=0&lt;/span&gt; will identify all the maximums and minimums (we&#x27;re interested
in the latter) of &lt;span class=&quot;math inline&quot;&gt;D(t)&lt;/span&gt; (and thus &lt;span class=&quot;math inline&quot;&gt;d(t)&lt;/span&gt; as well).&lt;/p&gt;
&lt;p&gt;It is a bit convoluted in our case but straightforward to compute:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
D&#x27;(t) &amp;amp;= 2(3a_xt^2 + 2b_xt + c_x)(a_xt^3 + b_xt^2 + c_xt + d_x - p_x) \\
      &amp;amp;+ 2(3a_yt^2 + 2b_yt + c_y)(a_yt^3 + b_yt^2 + c_yt + d_y - p_y) \\
      &amp;amp;= 6a_x^2t^5 + 10a_xb_xt^4 + (8a_xc_x + 4b_x^2)t^3 + 6(a_xd_x + b_xc_x - a_xp_x)t^2 + (4b_x(d_x-p_x) + 2c_x^2)t + 2c_x(d_x-p_x) \\
      &amp;amp;+ 6a_y^2t^5 + 10a_yb_yt^4 + (8a_yc_y + 4b_y^2)t^3 + 6(a_yd_y + b_yc_y - a_yp_y)t^2 + (4b_y(d_y-p_y) + 2c_y^2)t + 2c_y(d_y-p_y) \\
      &amp;amp;= t^5  6(a_x^2+a_y^2) \\
      &amp;amp;+ t^4  10(a_xb_x+a_yb_y) \\
      &amp;amp;+ t^3  (8(a_xc_x+a_yc_y)+4(b_x^2+b_y^2)) \\
      &amp;amp;+ t^2  6(a_x(d_x-p_x)+a_y(d_y-p_y) + b_xc_x+b_yc_y) \\
      &amp;amp;+ t    (4(b_x(d_x-p_x)+b_y(d_y-p_y)) + 2(c_x^2+c_y^2)) \\
      &amp;amp;+      2(c_x(d_x-p_x)+c_y(d_y-p_y)) \\
\end{aligned}
&lt;/div&gt;
&lt;p&gt;A polynomial, this time of degree 5, emerges here. For conciseness, we can
express the polynomial coefficients of &lt;span class=&quot;math inline&quot;&gt;D&#x27;(t)&lt;/span&gt; as a bunch of dot products:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
D&#x27;(t) &amp;amp;= t^5 6(\textbf{a}\cdot\textbf{a}) \\
      &amp;amp;+ t^4 10(\textbf{a}\cdot\textbf{b}) \\
      &amp;amp;+ t^3 (8(\textbf{a}\cdot\textbf{c})+4(\textbf{b}\cdot\textbf{b})) \\
      &amp;amp;+ t^2 6(\textbf{a}\cdot(\textbf{d}-\textbf{p}) + \textbf{b}\cdot\textbf{c}) \\
      &amp;amp;+ t   (4(\textbf{b}\cdot(\textbf{d}-\textbf{p})) + 2(\textbf{c}\cdot\textbf{c})) \\
      &amp;amp;+     2(\textbf{c}\cdot(\textbf{d}-\textbf{p})) \\
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Finally, we notice that solving &lt;span class=&quot;math inline&quot;&gt;D&#x27;(t)=0&lt;/span&gt; is equivalent to solving &lt;span class=&quot;math inline&quot;&gt;D&#x27;(t)/2 =
0&lt;/span&gt;, so we simplify the expression:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
D&#x27;(t)/2 &amp;amp;= t^5 3(\textbf{a}\cdot\textbf{a}) \\
        &amp;amp;+ t^4 5(\textbf{a}\cdot\textbf{b}) \\
        &amp;amp;+ t^3 (4(\textbf{a}\cdot\textbf{c})+2(\textbf{b}\cdot\textbf{b})) \\
        &amp;amp;+ t^2 3(\textbf{a}\cdot(\textbf{d}-\textbf{p}) + \textbf{b}\cdot\textbf{c}) \\
        &amp;amp;+ t   (2(\textbf{b}\cdot(\textbf{d}-\textbf{p})) + \textbf{c}\cdot\textbf{c}) \\
        &amp;amp;+     \textbf{c}\cdot(\textbf{d}-\textbf{p}) \\
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Assuming we are able to solve this equation, we will get at most 5 values of
&lt;em&gt;t&lt;/em&gt;, among which we should find the shortest distance from &lt;em&gt;p&lt;/em&gt; to the curve.
Since &lt;em&gt;t&lt;/em&gt; is bound within 0 and 1 (start and end of the curve), we will also
have to test the distance at these locations.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;We could also compute the 2nd derivative in order to differentiate minimums
from maximums, but simply evaluating the 5(+2) potential &lt;em&gt;t&lt;/em&gt; values and
keeping the smallest works just fine.&lt;/p&gt;
&lt;/div&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/bezier-distance/critical-points.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Distance from a random point to critical testing points of the curve&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The red dot in the blue field is a random point in space. The red lines show
which distances are evaluated (at most 5+2) to find the smallest one.&lt;/p&gt;
&lt;h3&gt;Translated to GLSL code&lt;/h3&gt;
&lt;p&gt;Transposing these formulas into code gives us this base template code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float bezier_distance(vec2 p, vec2 p0, vec2 p1, vec2 p2, vec2 p3) {
    // Start by testing the distance to the boundary points at t=0 (p0) and t=1 (p3)
    vec2 dp0 = p0 - p,
         dp3 = p3 - p;
    float dist = min(dot(dp0, dp0), dot(dp3, dp3));

    // Bezier cubic points to polynomial coefficients
    vec2 a = -p0 + 3.0*(p1 - p2) + p3,
         b = 3.0 * (p0 - 2.0*p1 + p2),
         c = 3.0 * (p1 - p0),
         d = p0;

    // Solve D&#x27;(t)=0 where D(t) is the distance squared
    vec2 dmp = d - p;
    float da = 3.0 * dot(a, a),
          db = 5.0 * dot(a, b),
          dc = 4.0 * dot(a, c) + 2.0 * dot(b, b),
          dd = 3.0 * (dot(a, dmp) + dot(b, c)),
          de = 2.0 * dot(b, dmp) + dot(c, c),
          df = dot(c, dmp);

    float roots[5];
    int count = root_find5(roots, da, db, dc, dd, de, df);
    for (int i = 0; i &amp;lt; count; i++) {
        float t = roots[i];
        // Evaluate the distance to our point p and keep the smallest
        vec2 dp = ((a * t + b) * t + c) * t + dmp;
        dist = min(dist, dot(dp, dp));
    }

    // We&#x27;ve been working with the squared distance so far, it&#x27;s time to get its
    // square root
    return sqrt(dist);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;code&gt;dot(dp,dp)&lt;/code&gt; is a shorthand for the squared length, which is of course
cheaper than computing &lt;code&gt;length()&lt;/code&gt;, which involves a square root.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;We assume here the root finder only returns the roots that are within &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
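&lt;p&gt;Before hunting for roots, we can sanity-check these quintic coefficients on the
CPU: evaluated as a polynomial, they must match a finite difference of the squared
distance. Here is a quick Python sketch (the control points and query point are
arbitrary values picked for the test):&lt;/p&gt;

```python
# 2D points represented as Python complex numbers; dot(u, v) = Re(u * conj(v))
def dot(u, v):
    return (u * v.conjugate()).real

# Hypothetical control points p0..p3 and query point p, picked for the test
p0, p1, p2, p3, p = 0+0j, 1+2j, 3+2j, 4+0j, 2+3j

# Bezier cubic points to polynomial coefficients (same mapping as the GLSL)
a = -p0 + 3*(p1 - p2) + p3
b = 3*(p0 - 2*p1 + p2)
c = 3*(p1 - p0)
d = p0
dmp = d - p

# Quintic coefficients of D'(t)/2, where D(t) is the squared distance
da = 3*dot(a, a)
db = 5*dot(a, b)
dc = 4*dot(a, c) + 2*dot(b, b)
dd = 3*(dot(a, dmp) + dot(b, c))
de = 2*dot(b, dmp) + dot(c, c)
df = dot(c, dmp)

def D(t):  # squared distance from p to the curve point B(t)
    bt = ((a*t + b)*t + c)*t + d
    return abs(bt - p)**2

def quintic(t):  # analytic D'(t)/2
    return ((((da*t + db)*t + dc)*t + dd)*t + de)*t + df

# A centered finite difference of D(t), halved, must match the quintic
h = 1e-6
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    fd = (D(t + h) - D(t - h)) / (4*h)  # numerical D'(t)/2
    assert abs(fd - quintic(t)) < 1e-4 * (1 + abs(quintic(t)))
print("quintic matches the derivative of the squared distance")
```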
&lt;p&gt;&lt;code&gt;root_find5()&lt;/code&gt; is our 5th degree root finder, that is, the function that gives
us all the &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; values (at most 5) which satisfy:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
at^5+bt^4+ct^3+dt^2+et+f = 0
&lt;/div&gt;
&lt;p&gt;But before we can solve that, we need to study how to solve the simpler 2nd
degree polynomial:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
at^2+bt+c = 0
&lt;/div&gt;
&lt;h2&gt;Solving quadratic polynomial equations&lt;/h2&gt;
&lt;p&gt;Diving into the rabbit hole of solving polynomials numerically will lead you
to insanity. But we still have to scratch the surface because higher degree
solvers usually rely on the lower degree ones.&lt;/p&gt;
&lt;p&gt;My favorite quadratic root finding formula is the super simple one introduced
by &lt;a href=&quot;https://www.youtube.com/watch?v=MHXO86wKeDY&quot;&gt;3Blue1Brown&lt;/a&gt;, which involves locating a midpoint &lt;span class=&quot;math inline&quot;&gt;m&lt;/span&gt; from
which you get the 2 surrounding roots &lt;span class=&quot;math inline&quot;&gt;r&lt;/span&gt;:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
m &amp;amp;= -\frac{b}{2a} \\
r &amp;amp;= m \pm \sqrt{m^2-\frac{c}{a}}
\end{aligned}
&lt;/div&gt;
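&lt;p&gt;As a quick numeric check of the midpoint formulation, here is a Python sketch
on a factored quadratic with known roots:&lt;/p&gt;

```python
import math

# x^2 - 3x + 2 = 0 has roots 1 and 2
a, b, c = 1.0, -3.0, 2.0

m = -b / (2*a)             # midpoint between the two roots
z = math.sqrt(m*m - c/a)   # half-distance between them
r0, r1 = m - z, m + z
print(r0, r1)  # → 1.0 2.0
```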
&lt;p&gt;In GLSL, code covering the most common corner cases would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Return true if x is not a NaN nor an infinite
// highp is probably mandatory to force IEEE 754 compliance
bool isfinite(highp float x) { return (floatBitsToUint(x) &amp;amp; 0x7f800000u) != 0x7f800000u; }

// Quadratic: solve ax²+bx+c=0
int root_find2(out float r[5], float a, float b, float c) {
    int count = 0;
    float m = -b / (2.*a);
    float d = m*m - c/a;
    if (!isfinite(m) || !isfinite(d)) { // a is (probably) too small
        // Linear: solve bx+c=0
        float s = -c / b;
        if (isfinite(s))
            r[count++] = s;
        return count;
    }
    if (d &amp;lt; 0.) // no root
        return count;
    if (d == 0.) {
        r[count++] = m; // single root
        return count;
    }
    float z = sqrt(d);
    r[count++] = m - z;
    r[count++] = m + z;
    return count;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not quite as straightforward as the math formula, is it?&lt;/p&gt;
&lt;p&gt;We cannot know in advance whether a division is going to succeed, so we run
the divisions and only then check whether they failed (and assume a reason for
the failure). This is much more reliable than an arbitrary epsilon value. We
also try to avoid duplicated roots.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The roots are automatically sorted because &lt;em&gt;z&lt;/em&gt; is always positive.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;&lt;code&gt;isfinite()&lt;/code&gt; may not be as reliable because in GLSL &amp;quot;NaNs are not required
to be generated&amp;quot;, meaning some edge case may not be supported depending on
the hardware, drivers, and the current weather in Yokohama.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As much as I like it, this implementation might not be the most
stable numerically (even though I don&#x27;t have strong data to back
this claim). Instead, we may prefer the formula from &lt;a href=&quot;https://numerical.recipes/&quot;&gt;Numerical
Recipes&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
\delta &amp;amp;= b^2-4ac \\
q &amp;amp;= -\frac{1}{2} (b + \mathrm{sign}(b)\sqrt{\delta}) \\
r_0 &amp;amp;= \frac{q}{a} \\
r_1 &amp;amp;= \frac{c}{q}
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Leading to the following alternative implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int root_find2(out float r[5], float a, float b, float c) {
    int count = 0;
    float d = b*b - 4.*a*c;
    if (d &amp;lt; 0.)
        return count;
    if (d == 0.) {
        float s = -.5 * b / a;
        if (isfinite(s))
            r[count++] = s;
        return count;
    }
    float h = sqrt(d);
    float q = -.5 * (b + (b &amp;gt; 0. ? h : -h));
    float r0 = q/a, r1 = c/q;
    if (isfinite(r0)) r[count++] = r0;
    if (isfinite(r1)) r[count++] = r1;
    return count;
}
&lt;/code&gt;&lt;/pre&gt;
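&lt;p&gt;The point of the &lt;em&gt;q&lt;/em&gt; formulation is to avoid the catastrophic cancellation
of the textbook formula when &lt;em&gt;b&lt;/em&gt; dwarfs the other coefficients. A Python
sketch contrasting the two on an ill-conditioned quadratic:&lt;/p&gt;

```python
import math

# Ill-conditioned quadratic: x^2 - 1e8*x + 1 = 0, with a tiny root near 1e-8.
# The textbook formula computes it as -b - sqrt(d), which cancels badly.
a, b, c = 1.0, -1e8, 1.0
d = b*b - 4*a*c
h = math.sqrt(d)

naive_small = (-b - h) / (2*a)         # textbook quadratic formula
q = -0.5 * (b + math.copysign(h, b))   # Numerical Recipes variant
nr_small = c / q

# Since root_big * root_small = c/a = 1, the small root is 1e-8 to ~1e-16 accuracy
print(naive_small, nr_small)
assert abs(naive_small / 1e-8 - 1) > 0.1   # several digits lost
assert abs(nr_small / 1e-8 - 1) < 1e-9     # accurate
```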
&lt;p&gt;This is not perfect at all (especially the &lt;span class=&quot;math inline&quot;&gt;b²-4ac&lt;/span&gt; part). There
are actually many other possible implementations, and &lt;a href=&quot;https://cnrs.hal.science/hal-04116310v1/&quot;&gt;this HAL CNRS
paper&lt;/a&gt; shows how nearly impossible it is to make a correct one. It is
an interesting but &lt;a href=&quot;https://fosstodon.org/@bug/115351364099509082&quot;&gt;depressing&lt;/a&gt; read, especially since it &amp;quot;only&amp;quot; covers IEEE
754 floats, and we have no such guarantee on GPUs. We also don&#x27;t have &lt;code&gt;fma()&lt;/code&gt; in
WebGL, which greatly limits the possible improvements. For now, it will have to do.&lt;/p&gt;
&lt;h2&gt;Solving quintic polynomial equations: attempt 1&lt;/h2&gt;
&lt;p&gt;Polynomials of degree 5 cannot be solved analytically like quadratics.
And even if they could, we probably wouldn&#x27;t do it because of numerical
instability. Typically, in my experience, analytical 3rd degree polynomial
solvers do not provide reliable results.&lt;/p&gt;
&lt;p&gt;The first iterative algorithm I picked was the &lt;a href=&quot;https://en.wikipedia.org/wiki/Aberth_method&quot;&gt;Aberth–Ehrlich method&lt;/a&gt;.
Nowadays, more appropriate algorithms exist, but at the time I started messing
around with these problems (several years ago), it was a fairly good contender.
&lt;a href=&quot;https://www.youtube.com/watch?v=XIzCzfMDSzk&quot;&gt;This video&lt;/a&gt; explores how it works.&lt;/p&gt;
&lt;p&gt;The convergence to the roots is quick, and it&#x27;s overall simple to implement. But
it&#x27;s not without flaws. The main problem is that it works in complex space. We
can&#x27;t ignore the complex roots because they all &amp;quot;respond&amp;quot; to each other. And
filtering these roots out at the end implies some unreliable arbitrary threshold
mechanism (we keep a root only when its imaginary part is close to 0).&lt;/p&gt;
&lt;p&gt;The initialization process also annoyingly requires you to come up with a guess
at what the roots are, and the method doesn&#x27;t provide anything relevant to start
from. Aberth-Ehrlich works by refining these initial roots, similar to a more
elaborate Newton iteration. Choosing better initial estimates leads to faster
convergence (meaning fewer iterations).&lt;/p&gt;
&lt;p&gt;The Cauchy bound defines the radius of a disk (complex numbers live in 2D
space) within which all the roots of a polynomial lie. We are going to use it
for the initial guess, and more specifically its &amp;quot;tight&amp;quot;
version (which unfortunately relies on &lt;code&gt;pow()&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Since Aberth-Ehrlich is a refinement and not just a shrinking process, we define
and use an inner disk that has half the area of the Cauchy bound disk. That way,
we&#x27;re more likely to start with initial guesses spread in the &amp;quot;middle&amp;quot; of the
roots; this is where the √2 comes from in the formula below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/bezier-distance/cauchy.png&quot; alt=&quot;Tight cauchy bound&quot; /&gt;&lt;/p&gt;
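&lt;p&gt;As a sanity check of the general idea, here is the classic (non-tight) Cauchy
bound, 1 + max|a_i/a_n|, verified with a small Python sketch on a quintic with
known roots:&lt;/p&gt;

```python
# Classic Cauchy bound: all the roots of the monic polynomial
# x^n + a_{n-1}x^{n-1} + ... + a_0 lie in the disk |z| <= 1 + max|a_i|.
# Sample quintic with known roots -2, -1, 1, 2 and 3:
#   (x+2)(x+1)(x-1)(x-2)(x-3) = x^5 - 3x^4 - 5x^3 + 15x^2 + 4x - 12
coeffs = [-3.0, -5.0, 15.0, 4.0, -12.0]  # a4..a0
r = 1.0 + max(abs(k) for k in coeffs)
print(r)  # → 16.0
assert all(abs(root) < r for root in (-2, -1, 1, 2, 3))
```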
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;#define K5_0 vec2( 0.951056516295154,  0.309016994374947)
#define K5_1 vec2( 0.000000000000000,  1.000000000000000)
#define K5_2 vec2(-0.951056516295154,  0.309016994374948)
#define K5_3 vec2(-0.587785252292473, -0.809016994374947)
#define K5_4 vec2( 0.587785252292473, -0.809016994374948)

int root_find5_aberth(out float roots[5], float a, float b, float c, float d, float e, float f) {
    // Initial candidates set mid-way of the tight Cauchy bound estimate
    float r = (1.0 + max_5(
        pow(abs(b/a), 1.0/5.0),
        pow(abs(c/a), 1.0/4.0),
        pow(abs(d/a), 1.0/3.0),
        pow(abs(e/a), 1.0/2.0),
            abs(f/a))) / sqrt(2.0);

    // Spread in a circle
    vec2 r0 = r * K5_0,
         r1 = r * K5_1,
         r2 = r * K5_2,
         r3 = r * K5_3,
         r4 = r * K5_4;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The circle constants are generated with the following script:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math
import sys

n = int(sys.argv[1])
for k in range(n):
    angle = 2 * math.pi / n
    off = math.pi / (2 * n)
    z = angle * k + off
    c, s = math.cos(z), math.sin(z)
    print(f&amp;quot;#define K{n}_{k} vec2({c:18.15f}, {s:18.15f})&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, it&#x27;s basically a simple iterative process. Unrolling everything for degree
5 looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;#define close_to_zero(x) (abs(x) &amp;lt; eps)

// This also filters out roots out of the [0,1] range
#define ADD_ROOT_IF_REAL(r) if (close_to_zero(r.y) &amp;amp;&amp;amp; r.x &amp;gt;= 0. &amp;amp;&amp;amp; r.x &amp;lt;= 1.) roots[count++] = r.x

#define SMALL_OFF(off) (dot(off, off) &amp;lt;= eps*eps)

/* Complex multiply, divide, inverse */
vec2 c_mul(vec2 a, vec2 b) { return mat2(a, -a.y, a.x) * b; }
vec2 c_div(vec2 a, vec2 b) { return mat2(a, a.y, -a.x) * b / dot(b, b); }
vec2 c_inv(vec2 z)         { return vec2(z.x, -z.y) / dot(z, z); }

// Compute f(x)/f&#x27;(x): complex polynomial evaluation (y) divided by their
// derivatives (q) using Horner&#x27;s method in one pass
vec2 c_poly5d4(float a, float b, float c, float d, float e, float f, vec2 x) {
    vec2 y =       a*x  + vec2(b, 0), q =       a*x  + y;
         y = c_mul(y,x) + vec2(c, 0); q = c_mul(q,x) + y;
         y = c_mul(y,x) + vec2(d, 0); q = c_mul(q,x) + y;
         y = c_mul(y,x) + vec2(e, 0); q = c_mul(q,x) + y;
         y = c_mul(y,x) + vec2(f, 0);
    return c_div(y, q);
}

vec2 sum_of_inv(vec2 z0, vec2 z1, vec2 z2, vec2 z3, vec2 z4) { return c_inv(z0 - z1) + c_inv(z0 - z2) + c_inv(z0 - z3) + c_inv(z0 - z4); }

int root_find5_aberth(out float roots[5], float a, float b, float c, float d, float e, float f) {
    if (close_to_zero(a))
        return root_find4_aberth(roots, b, c, d, e, f);

    // Code snip: see previous snippet
    // float r = ...
    // vec2 r0, r1, r2, ... 

    for (int m = 0; m &amp;lt; 16; m++) {
        vec2 d0 = c_poly5d4(a, b, c, d, e, f, r0),
             d1 = c_poly5d4(a, b, c, d, e, f, r1),
             d2 = c_poly5d4(a, b, c, d, e, f, r2),
             d3 = c_poly5d4(a, b, c, d, e, f, r3),
             d4 = c_poly5d4(a, b, c, d, e, f, r4);

        vec2 off0 = c_div(d0, vec2(1,0) - c_mul(d0, sum_of_inv(r0, r1, r2, r3, r4))),
             off1 = c_div(d1, vec2(1,0) - c_mul(d1, sum_of_inv(r1, r0, r2, r3, r4))),
             off2 = c_div(d2, vec2(1,0) - c_mul(d2, sum_of_inv(r2, r0, r1, r3, r4))),
             off3 = c_div(d3, vec2(1,0) - c_mul(d3, sum_of_inv(r3, r0, r1, r2, r4))),
             off4 = c_div(d4, vec2(1,0) - c_mul(d4, sum_of_inv(r4, r0, r1, r2, r3)));

        r0 -= off0;
        r1 -= off1;
        r2 -= off2;
        r3 -= off3;
        r4 -= off4;

        if (SMALL_OFF(off0) &amp;amp;&amp;amp; SMALL_OFF(off1) &amp;amp;&amp;amp; SMALL_OFF(off2) &amp;amp;&amp;amp; SMALL_OFF(off3) &amp;amp;&amp;amp; SMALL_OFF(off4))
            break;
    }

    int count = 0;
    ADD_ROOT_IF_REAL(r0);
    ADD_ROOT_IF_REAL(r1);
    ADD_ROOT_IF_REAL(r2);
    ADD_ROOT_IF_REAL(r3);
    ADD_ROOT_IF_REAL(r4);
    return count;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the main coefficient is too small, we fall back on the 4th degree solver
(and so on until we reach the analytic quadratic). The 4th and 3rd degree
versions of this function are easy to guess (they&#x27;re pretty much identical,
just dropping one coefficient at each degree).&lt;/p&gt;
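&lt;p&gt;The interleaved Horner evaluation done by &lt;code&gt;c_poly5d4&lt;/code&gt; is easy to check
off-GPU with Python&#x27;s built-in complex numbers; a small sketch on &lt;em&gt;f(x) = x^5 - 1&lt;/em&gt;:&lt;/p&gt;

```python
# One-pass Horner evaluation of f(x)/f'(x), the same interleaving as
# c_poly5d4, using Python's native complex arithmetic
def poly5_over_deriv(a, b, c, d, e, f, x):
    y = a*x + b; q = a*x + y
    y = y*x + c; q = q*x + y
    y = y*x + d; q = q*x + y
    y = y*x + e; q = q*x + y
    y = y*x + f
    return y / q  # f(x) / f'(x)

# Real test point: f(2) = 31, f'(2) = 5*2^4 = 80
assert abs(poly5_over_deriv(1, 0, 0, 0, 0, -1, 2+0j) - 31/80) < 1e-12

# Complex test point: f(1+i) = -5-4i, f'(1+i) = 5(1+i)^4 = -20
assert abs(poly5_over_deriv(1, 0, 0, 0, 0, -1, 1+1j) - (0.25+0.2j)) < 1e-12
print("one-pass Horner f/f' checks out")
```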
&lt;p&gt;We also hardcode a maximum of 16 iterations here because it&#x27;s usually
enough. To get an idea of how many iterations are required in practice,
here is a visualization of the heat map of the iteration count for every
pixel:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/bezier-distance/aberth-heatmap.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Heat map of the iterations of the Aberth-Ehrlich algorithm&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The big picture and the weaknesses of the algorithm should be pretty obvious
by now. Among all the drawbacks of this approach, there are also surprising
pathological cases where the algorithm performs poorly. Fortunately,
there has been some progress on the state of the art in recent years.&lt;/p&gt;
&lt;h2&gt;Solving quintic polynomial equations: the state of the art&lt;/h2&gt;
&lt;p&gt;In 2022, &lt;a href=&quot;https://www.cemyuksel.com/research/polynomials/&quot;&gt;Cem Yuksel published a new algorithm for polynomial root
solving&lt;/a&gt;. Initially I had my reservations because the &lt;a href=&quot;https://github.com/cemyuksel/cyCodeBase/&quot;&gt;official
implementation&lt;/a&gt; had a &lt;a href=&quot;https://github.com/cemyuksel/cyCodeBase/issues/20&quot;&gt;few shortcomings on some edge
cases&lt;/a&gt;, which made me question its reliability. It&#x27;s also
optimized for CPU computation and is, to my very personal taste, overly complex.&lt;/p&gt;
&lt;p&gt;Fortunately, Christoph Peters &lt;a href=&quot;https://momentsingraphics.de/GPUPolynomialRoots.html&quot;&gt;showed that it was possible on the
GPU&lt;/a&gt; by implementing it for very large degrees, and without any
recursion. Inspired by that, I decided to unroll it myself for degree 5.&lt;/p&gt;
&lt;p&gt;One core difference with the Aberth approach is that it is designed for arbitrary
ranges. In our case this is actually convenient because, due to how Bézier
curves are defined, we are only interested in roots between 0 and 1. We will
need to adjust the quadratic function to work in this range, as well as keep
the roots ordered:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;     }
     float h = sqrt(d);
     float q = -.5 * (b + (b &amp;gt; 0. ? h : -h));
-    float r0 = q/a, r1 = c/q;
-    if (isfinite(r0)) r[count++] = r0;
-    if (isfinite(r1)) r[count++] = r1;
+    vec2 v = vec2(q/a, c/q);
+    if (v.x &amp;gt; v.y) v.xy = v.yx; // keep them ordered
+    if (isfinite(v.x) &amp;amp;&amp;amp; v.x &amp;gt;= 0. &amp;amp;&amp;amp; v.x &amp;lt;= 1.) r[count++] = v.x;
+    if (isfinite(v.y) &amp;amp;&amp;amp; v.y &amp;gt;= 0. &amp;amp;&amp;amp; v.y &amp;lt;= 1.) r[count++] = v.y;
     return count;
 }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The core logic of the algorithm relies on a cascade of derivatives, one per
degree. Christoph Peters provides an analytic formula to obtain the derivative
for any degree. This is a huge help when working with an arbitrary
degree, but in our case we can just differentiate manually:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
f_5(x) &amp;amp;= ax^5+bx^4+cx^3+dx^2+ex+f \\
f_4(x) &amp;amp;= 5ax^4+4bx^3+3cx^2+2dx+e \\
f_3(x) &amp;amp;= 20ax^3+12bx^2+6cx+2d \\
f_2(x) &amp;amp;= 60ax^2+24bx+6c
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Since we&#x27;re only interested in the roots, similar to what we did with &lt;span class=&quot;math inline&quot;&gt;D(t)&lt;/span&gt;, we
can scale some of these expressions by a positive constant without changing
their zeroes (here dividing &lt;span class=&quot;math inline&quot;&gt;f_3&lt;/span&gt; by 2 and &lt;span class=&quot;math inline&quot;&gt;f_2&lt;/span&gt; by 6):&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
f_5(x) &amp;amp;= ax^5+bx^4+cx^3+dx^2+ex+f \\
f_4(x) &amp;amp;= 5ax^4+4bx^3+3cx^2+2dx+e \\
f_3(x) &amp;amp;= 10ax^3+6bx^2+3cx+d \\
f_2(x) &amp;amp;= 10ax^2+4bx+c
\end{aligned}
&lt;/div&gt;
&lt;p&gt;The purpose of that cascade of derivatives is to cut the curve into monotonic
segments. In practice, the core function looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;int root_find5_cy(out float r[5], float a, float b, float c, float d, float e, float f) {
    float r2[5], r3[5], r4[5];
    int n = root_find2(r2,          10.*a, 4.*b,    c);            // degree 2
    n = cy_find5(r3, r2, n, 0., 0., 10.*a, 6.*b, 3.*c,   d);       // degree 3
    n = cy_find5(r4, r3, n,     0.,  5.*a, 4.*b, 3.*c, d+d, e);    // degree 4
    n = cy_find5(r,  r4, n,             a,    b,    c,   d, e, f); // degree 5
    return n;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could unroll &lt;code&gt;cy_find3&lt;/code&gt;, &lt;code&gt;cy_find4&lt;/code&gt;, and &lt;code&gt;cy_find5&lt;/code&gt;, but to keep the
code simple, degrees 3 to 5 share the same function, with the leading
coefficients set to 0 (hopefully the compiler does its job properly).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cy_find5&lt;/code&gt; relies on the roots found at the previous stage (at most 4) to
define the search intervals:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/bezier-distance/root_find5.png&quot; alt=&quot;Finding roots at degree 5&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Such an approach has the nice side effect of keeping the roots ordered.&lt;/p&gt;
&lt;p&gt;The solver itself is not that complex either:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float poly5(float a, float b, float c, float d, float e, float f, float t) {
     return ((((a * t + b) * t + c) * t + d) * t + e) * t + f;
}

// Quintic: solve ax⁵+bx⁴+cx³+dx²+ex+f=0
int cy_find5(out float r[5], float r4[5], int n, float a, float b, float c, float d, float e, float f) {
    int count = 0;
    vec2 p = vec2(0, poly5(a,b,c,d,e,f, 0.));
    for (int i = 0; i &amp;lt;= n; i++) {
        float x = i == n ? 1. : r4[i],
              y = poly5(a,b,c,d,e,f, x);
        if (p.y * y &amp;gt; 0.)
            continue;
        float v = bisect5(a,b,c,d,e,f, vec2(p.x,x), vec2(p.y,y));
        r[count++] = v;
        p = vec2(x, y);
    }
    return count;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last brick of the algorithm is the Newton bisection, the slowest part:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Newton bisection
//
// a,b,c,d,e,f: 5th degree polynomial parameters
// t: x-axis boundaries
// v: respectively f(t.x) and f(t.y)
float bisect5(float a, float b, float c, float d, float e, float f, vec2 t, vec2 v) {
    float x = (t.x+t.y) * .5; // mid point
    float s = v.x &amp;lt; v.y ? 1. : -1.; // sign flip
    for (int i = 0; i &amp;lt; 32; i++) {
        // Evaluate polynomial (y) and its derivative (q) using Horner&#x27;s method in one pass
        float y = a*x + b, q = a*x + y;
              y = y*x + c; q = q*x + y;
              y = y*x + d; q = q*x + y;
              y = y*x + e; q = q*x + y;
              y = y*x + f;

        t = s*y &amp;lt; 0. ? vec2(x, t.y) : vec2(t.x, x);
        float next = x - y/q; // Newton iteration
        next = next &amp;gt;= t.x &amp;amp;&amp;amp; next &amp;lt;= t.y ? next : (t.x+t.y) * .5;
        if (abs(next - x) &amp;lt; eps)
            return next;
        x = next;
    }
    return x;
}
&lt;/code&gt;&lt;/pre&gt;
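&lt;p&gt;To convince ourselves the unrolled logic is correct, here is a direct Python
port of &lt;code&gt;bisect5&lt;/code&gt; (with &lt;code&gt;eps&lt;/code&gt;, which the snippet leaves undefined,
assumed to be 1e-7), hunting the single root of &lt;em&gt;x^5 = 0.5&lt;/em&gt; on [0,1]:&lt;/p&gt;

```python
# CPU port of bisect5, assuming eps = 1e-7 (not specified in the shader snippet)
def bisect5(a, b, c, d, e, f, t0, t1, v0, v1, eps=1e-7):
    x = (t0 + t1) * 0.5  # midpoint
    s = 1.0 if v0 < v1 else -1.0  # sign flip
    for _ in range(32):
        # Polynomial (y) and its derivative (q) with Horner's method in one pass
        y = a*x + b; q = a*x + y
        y = y*x + c; q = q*x + y
        y = y*x + d; q = q*x + y
        y = y*x + e; q = q*x + y
        y = y*x + f
        if s*y < 0.0:
            t0 = x
        else:
            t1 = x
        nxt = x - y/q  # Newton iteration
        if not (t0 <= nxt <= t1):
            nxt = (t0 + t1) * 0.5  # fall back to plain bisection
        if abs(nxt - x) < eps:
            return nxt
        x = nxt
    return x

# x^5 - 0.5 is increasing on [0,1], with a single root at 0.5**0.2
root = bisect5(1, 0, 0, 0, 0, -0.5, 0.0, 1.0, -0.5, 0.5)
print(root)
assert abs(root - 0.5**0.2) < 1e-6
```

Near the root the Newton steps converge quadratically, which is why the iteration count stays low despite the 32-iteration safety cap.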
&lt;p&gt;And that&#x27;s pretty much it. Its heat map has a completely
different look than Aberth&#x27;s:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/bezier-distance/cy-bisect-heatmap.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Heat map of the iterations of Cem Yuksel algorithm&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The number of iterations might be larger, but the solver is much faster (I
observed a factor of 3 on my machine), the code is shorter, and it is actually
more reliable.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The scale used to represent the heat map is &lt;em&gt;not&lt;/em&gt; the same as the one used
for Aberth, but it is the same as the one used for the method presented in the
next section.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Exploring ITP convergence&lt;/h2&gt;
&lt;p&gt;The bisection being the hot loop, it is interesting to ponder how to make
it faster. A while back, &lt;a href=&quot;https://levien.com/&quot;&gt;Raph Levien&lt;/a&gt; hypothesized about how the &lt;a href=&quot;https://en.wikipedia.org/wiki/ITP_method&quot;&gt;ITP
method&lt;/a&gt; could perform here. Out of curiosity, I gave it a chance. The method
is designed to work like a bisection, claiming to be at least as performant in
the worst case.&lt;/p&gt;
&lt;p&gt;There isn&#x27;t a lot of code, and the paper provides pseudo-code. But
implementing it was actually challenging in many ways.&lt;/p&gt;
&lt;p&gt;First of all, the authors didn&#x27;t seem to find it relevant to mention that it only
works if &lt;span class=&quot;math inline&quot;&gt;f(a)&amp;lt;0&amp;lt;f(b)&lt;/span&gt;. If &lt;span class=&quot;math inline&quot;&gt;f(a)&amp;gt;0&amp;gt;f(b)&lt;/span&gt;, you&#x27;re pretty much on your own. It
only requires 2 lines of adjustment, but figuring out this shortcoming of the
algorithm was particularly unexpected.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/bezier-distance/itp-ok-fail.png&quot; alt=&quot;ITP method failing case&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Another bothersome aspect concerns the parameters: &lt;span class=&quot;math inline&quot;&gt;K_1&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;K_2&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;n_0&lt;/span&gt;. The
paper proposes these:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
K_1 &amp;amp;= 0.1 \\
K_2 &amp;amp;= 0.98(1+\frac{1+\sqrt{5}}{2})\approx 2.56567 \\
n_0 &amp;amp;= ?
\end{aligned}
&lt;/div&gt;
&lt;p&gt;I played with them for a while and couldn&#x27;t find any set that made
a real difference, so I ended up with the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For performance reasons, reducing &lt;span class=&quot;math inline&quot;&gt;K_2&lt;/span&gt; to a value of 2 saves a call to
&lt;code&gt;pow()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;span class=&quot;math inline&quot;&gt;K_1&lt;/span&gt;, &lt;a href=&quot;https://cran.r-project.org/web/packages/itp/refman/itp.html&quot;&gt;CRAN&lt;/a&gt; seems to suggest &lt;span class=&quot;math inline&quot;&gt;\frac{0.2}{b-a}&lt;/span&gt;, so I went along with it.&lt;/li&gt;
&lt;li&gt;And for &lt;span class=&quot;math inline&quot;&gt;n_0&lt;/span&gt;, 1 or 2 seems to be the usual choice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the end, the function looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// ITP algorithm (2020) by Oliveira &amp;amp; Takahashi
// &amp;quot;An Enhancement of the Bisection Method Average Performance Preserving Minmax Optimality&amp;quot;
//
// a,b,c,d,e,f: 5th degree polynomial parameters
// t: x-axis boundaries (a and b in the paper)
// v: respectively f(a) and f(b) in the paper (evaluation of the function with t.x and t.y)
float itp5(float a, float b, float c, float d, float e, float f, vec2 t, vec2 v) {
    float diff = t.y-t.x;

    // K1 and n0 suggested by CRAN
    float K1 = .2 / diff;
    int n0 = 1;

    // The paper has the assumption that f(a)&amp;lt;0&amp;lt;f(b) but we want to
    // support f(a)&amp;gt;0&amp;gt;f(b) too, so we keep a sign flip
    float s = v.x &amp;lt; v.y ? 1. : -1.;

    // Using log(ab)=log(a)+log(b): log2(x/(2ε)) &amp;lt;=&amp;gt; log2(x/ε)-1
    int nh = int(ceil(log2(diff/eps)-1.)); // n_{1/2} (half point)
    int n_max = nh + n0;

    // ε 2^(n_max-k) = ε 2^n_max 2^-k = ε 2^n_max ½^k
    // ½^k is done iteratively in the loop, simplifying the arithmetic
    float q = eps * float(1&amp;lt;&amp;lt;n_max);

    while (diff &amp;gt; eps+eps) {
        // Interpolation
        float xf = (v.y*t.x - v.x*t.y) / (v.y-v.x); // Regula-Falsi

        // Truncation
        float xh = (t.x+t.y) * .5; // x half point
        float sigma = sign(xh - xf);
        float delta = K1*diff*diff; // save a pow() by forcing K2=2
        float xt = delta &amp;lt;= abs(xh - xf) ? xf + sigma*delta : xh; // xt: truncation of xf

        // Projection
        float r = q - diff*.5;
        float x = abs(xt-xh) &amp;lt;= r ? xt : xh-sigma*r;

        // Updating
        float y = poly5(a,b,c,d,e,f, x);
        float side = s*y;
        if      (side &amp;gt; 0.) t.y=x, v.y=y;
        else if (side &amp;lt; 0.) t.x=x, v.x=y;
        else                return x;

        diff = t.y-t.x;
        q *= .5;
    }
    return (t.x+t.y) * .5;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function can be used as a drop-in replacement for &lt;code&gt;bisect5&lt;/code&gt;.&lt;/p&gt;
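&lt;p&gt;Like before, a line-for-line Python port (with the undefined &lt;code&gt;eps&lt;/code&gt;
assumed to be 1e-7) can be tested against the known root of &lt;em&gt;x^5 = 0.5&lt;/em&gt; on [0,1]:&lt;/p&gt;

```python
import math

def poly5(a, b, c, d, e, f, t):
    return ((((a*t + b)*t + c)*t + d)*t + e)*t + f

# CPU port of itp5, assuming eps = 1e-7 (not specified in the shader snippet)
def itp5(a, b, c, d, e, f, tx, ty, vx, vy, eps=1e-7):
    diff = ty - tx
    K1 = 0.2 / diff  # K1 and n0 suggested by CRAN
    n0 = 1
    s = 1.0 if vx < vy else -1.0  # support f(a) > 0 > f(b) too
    nh = math.ceil(math.log2(diff / eps) - 1.0)
    q = eps * 2.0**(nh + n0)
    while diff > eps + eps:
        xf = (vy*tx - vx*ty) / (vy - vx)  # interpolation (Regula-Falsi)
        xh = (tx + ty) * 0.5
        sigma = (xh > xf) - (xh < xf)  # sign(xh - xf) as -1/0/+1
        delta = K1 * diff * diff  # K2 forced to 2 to save a pow()
        xt = xf + sigma*delta if delta <= abs(xh - xf) else xh  # truncation
        r = q - diff*0.5
        x = xt if abs(xt - xh) <= r else xh - sigma*r  # projection
        y = poly5(a, b, c, d, e, f, x)
        side = s*y
        if side > 0.0:
            ty, vy = x, y
        elif side < 0.0:
            tx, vx = x, y
        else:
            return x
        diff = ty - tx
        q *= 0.5
    return (tx + ty) * 0.5

# x^5 - 0.5 on [0,1], root at 0.5**0.2
root = itp5(1, 0, 0, 0, 0, -0.5, 0.0, 1.0, -0.5, 0.5)
print(root)
assert abs(root - 0.5**0.2) < 1e-6
```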
&lt;p&gt;I had a lot of expectations about it, but in the end it requires more iterations
than the bisection we implemented. The paper claims to perform at least as well
as a bisection, but our &lt;code&gt;bisect5&lt;/code&gt; is driven by the derivatives, so it converges
much faster. Here is the heat map with &lt;code&gt;itp5&lt;/code&gt; instead of &lt;code&gt;bisect5&lt;/code&gt;:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/bezier-distance/cy-itp-heatmap.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Heat map of the iterations of Cem Yuksel algorithm with ITP method&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The naive unrolled version of Cem Yuksel&#x27;s algorithm is, so far, definitely the
best choice for our problem. I still have concerns about how to implement a good
quadratic formula, and I have my reservations about various edge cases. There
is also still room for improvement in the cubic solver (degree 3) because
it&#x27;s still a special case where analytical formulas exist, but in general this
implementation is satisfying.&lt;/p&gt;
&lt;p&gt;The next step is to work with chains of Bézier curves to make up complex shapes
(such as font glyphs). It will lead us to build a &lt;em&gt;signed&lt;/em&gt; distance field. This
is not trivial &lt;em&gt;at all&lt;/em&gt; and mandates one or several dedicated articles. We will
hopefully study these subjects in the not-so-distant future.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/45-code-golfing-a-tiny-demo-using-maths-and-a-pinch-of-insanity.html</guid>
 <link>http://blog.pkh.me/p/45-code-golfing-a-tiny-demo-using-maths-and-a-pinch-of-insanity.html</link>
 <title>Code golfing a tiny demo using maths and a pinch of insanity</title>
 <pubDate>Mon, 29 Sep 2025 13:30:50 -0000</pubDate>
 <description>&lt;p&gt;A few weeks ago, I made a tiny demo that fits into 448 characters:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/red-alp.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Red Alp GLSL demo in 448 characters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;void main(){vec3 c,p,K=vec3(3,1,0);for(float z,i,a,g=1.,t,h,d,w,k=.15;i++&amp;lt;1e2;d=max(max(d-3.
,-d),a=z)*k,w=g-g/exp(h&amp;gt;.001?a++,d/.4:h*3e2),g-=a*=w,c+=a*d*4.5+(d&amp;gt;z?z:h/2e2)*K,a=min(p.y+2.
,1.),c.r+=w*a*a*.1,t+=min(h*.2,k/=.985))for(p=normalize(vec3(P+P-R,R.y))*t,p.xz*=mat2(cos(
sin(T*.2)+K.zyxz*11.)),p.z+=T*.3,d=p.y,h=d+.5,a=.01;a&amp;lt;1.;a+=a)p.xz*=mat2(8,6,-6,8)*.1,d+=abs
(dot(sin((p/a+T)*.3),p-p+a)),h+=abs(dot(sin(p.xz*.6/a),P-P+a));O=vec4(tanh(c),1);}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The demo was 464 characters at first, but thanks to the
community it got reduced further, and the article was updated accordingly.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There is no texture, no mesh, no 3D helper: it&#x27;s simply a procedural
mathematical formula evaluated at each pixel to assign it a color. Code
golfing is about making the code as short as possible, and is thus part of the
artistic performance.&lt;/p&gt;
&lt;p&gt;To put things into perspective, the 853x480 JPEG thumbnail of this article is
167x larger than this code.&lt;/p&gt;
&lt;p&gt;You can watch a larger version on &lt;a href=&quot;https://b.pkh.me/2025-09-08-red-alp.htm&quot;&gt;its main dedicated page&lt;/a&gt;, or a
port on &lt;a href=&quot;https://www.shadertoy.com/view/WflfR8&quot;&gt;Shadertoy&lt;/a&gt; (484 chars). If your device is not powerful
enough (I&#x27;m sorry for the lag on this page) or doesn&#x27;t support WebGL2, a short
preview video can be seen on &lt;a href=&quot;https://fosstodon.org/@bug/115168470956294772&quot;&gt;Mastodon&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&#x27;m guessing the wizardry of the code has confused many people, so we&#x27;re going
to dive through the making-of together. Overall, this demo is a particularly
dense and entangled compilation of different techniques, where each aspect could
mandate a dedicated article. For that reason, some parts will prefer to link to
external resources when the literature is already verbose on the subject.&lt;/p&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;Some demos in this article will start &amp;quot;decaying&amp;quot; over time due to floating
point variables getting too large. Reloading the page should fix that.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;The base template&lt;/h2&gt;
&lt;p&gt;The code is written in GLSL and is executed for each pixel (technically each
fragment) on a simple quad geometry (to be accurate it&#x27;s even &lt;a href=&quot;https://wallisc.github.io/rendering/2021/04/18/Fullscreen-Pass.html&quot;&gt;a single big
triangle&lt;/a&gt;). There is no geometry aside from that, it&#x27;s basically just
a fragment shader.&lt;/p&gt;
&lt;p&gt;The fragment receives 3 different inputs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the canvas resolution &lt;code&gt;vec2 R&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the time &lt;code&gt;float T&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the pixel position &lt;code&gt;vec2 P&lt;/code&gt; (basically &lt;code&gt;gl_FragCoord.xy&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And it has to output a sRGB color in &lt;code&gt;out vec4 O&lt;/code&gt;. The code has to be written in
a &lt;code&gt;void main()&lt;/code&gt; function, and that&#x27;s pretty much all we need to start.&lt;/p&gt;
&lt;p&gt;If you&#x27;re curious about the glue to setup WebGL2, just look at the source code
on &lt;a href=&quot;https://b.pkh.me/2025-09-08-red-alp.htm&quot;&gt;the dedicated page&lt;/a&gt;. There is no external dependency and the canvas
setup code is pretty simple.&lt;/p&gt;
&lt;h2&gt;Development setup&lt;/h2&gt;
&lt;p&gt;For development, people usually use Shadertoy directly. I prefer to use
my own local live coding environment: &lt;a href=&quot;https://github.com/ubitux/ShaderWorkshop&quot;&gt;ShaderWorkshop&lt;/a&gt;. It can be run
without setting up anything, just &lt;code&gt;uv run --with shader-workshop sw-server&lt;/code&gt;
(assuming the &lt;code&gt;uv&lt;/code&gt; Python package manager is installed on the machine).
Aside from the comfort of being able to use your favorite code editor, it
makes instancing live controls for uniforms very easy, making it smooth to
interact with any value and get immediate feedback.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;http://blog.pkh.me/img/demomaking/shader-workshop.png&quot; alt=&quot;&quot; /&gt;
    &lt;figcaption&gt;Red Alp demo with user controls as seen from ShaderWorkshop&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Noise&lt;/h2&gt;
&lt;p&gt;One of the core primitives we need is a noise function: it is required for the
mountains, the fog, and the clouds.&lt;/p&gt;
&lt;p&gt;In a recent article, I &lt;a href=&quot;http://blog.pkh.me/p/42-sharing-everything-i-could-understand-about-gradient-noise.html&quot;&gt;talked about gradient noise&lt;/a&gt;. We could
technically use that, but it has a lot of drawbacks. First of all, it&#x27;s
super expensive: I know because I made a demo using it the other day, and it was
awfully slow. Once per pixel would be fine, but in our case it will have to be
evaluated a hundred times per pixel, so we need something faster.&lt;/p&gt;
&lt;p&gt;Secondly, we&#x27;re trying to make it as short as possible, and the 2D gradient
noise, even minified, is already twice as big as the size of the full demo. We
will also need a 3D noise for the clouds and fog, which is even larger and more
expensive. And that&#x27;s not even accounting for the fbm signal combination code.&lt;/p&gt;
&lt;p&gt;Inigo Quilez, in his famous &lt;a href=&quot;https://www.shadertoy.com/view/4ttSWf&quot;&gt;Rainforest&lt;/a&gt;, used value noise. It is faster, but
it still won&#x27;t do for us, for the same reasons, just somewhat mitigated. And since
we&#x27;re professionals, we&#x27;re not going to cheat by sampling a noise texture.&lt;/p&gt;
&lt;p&gt;Fortunately, while reverse engineering some Shadertoy demos, in particular the
ones from &lt;a href=&quot;https://www.shadertoy.com/user/diatribes&quot;&gt;diatribes&lt;/a&gt;, I came across some code that made use of this incredible
technique of accumulating sine waves.&lt;/p&gt;
&lt;h3&gt;Combining sin waves&lt;/h3&gt;
&lt;p&gt;Let&#x27;s say we want to combine two sine waves in order to get a height map as a
3rd dimension. There are multiple ways of achieving that. For example, we can
multiply them:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = \sin x \times \sin y
&lt;/div&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/sinxsin.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;But we could also add them together:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = \sin x + \sin y
&lt;/div&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/sinpsin.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;The surprising takeaway here is that... it&#x27;s pretty much equivalent. It doesn&#x27;t give
the same result for sure, but visually it could be considered the same, just
with a slightly different frequency and amplitude, and rotated by 45° around the z-axis.&lt;/p&gt;
&lt;p&gt;Similarly, you may think using cosines instead of sines would make a
difference, but no: however they are combined, they always give the same base
pattern we just saw.&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = \sin x + \cos y
&lt;/div&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/sinpcos.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
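&lt;p&gt;If you want to convince yourself numerically, the sum-to-product identity
&lt;span class=&quot;math inline&quot;&gt;\sin x + \sin y = 2 \sin\frac{x+y}{2} \cos\frac{x-y}{2}&lt;/span&gt; shows the sum form is just a
product of sinusoids along axes rotated by 45°. Here is a small Python check
(Python rather than GLSL so it can be run anywhere):&lt;/p&gt;

```python
import math
import random

# Check that sin(x) + sin(y) equals a product of sinusoids evaluated in a
# 45-degree rotated, rescaled space: 2*sin((x+y)/2)*cos((x-y)/2)
random.seed(0)
for _ in range(1000):
    x = random.uniform(-10.0, 10.0)
    y = random.uniform(-10.0, 10.0)
    s = math.sin(x) + math.sin(y)
    p = 2.0 * math.sin((x + y) / 2.0) * math.cos((x - y) / 2.0)
    assert 1e-9 > abs(s - p)  # identical up to floating point noise
```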
&lt;p&gt;So let&#x27;s pick one, let&#x27;s say &lt;span class=&quot;math inline&quot;&gt;z=\sin x + \sin y&lt;/span&gt;. But this time, we&#x27;re going to
take the absolute value to transform the up and down pattern into bumps:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = |\sin x + \sin y|
&lt;/div&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/abssinpsin.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;These bumps are the perfect base for clouds, but not so much for spiky mountains
going through aggressive erosion. But with the help of this weird little trick,
we can just flip the shape upside down to get sharp edges:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = -|\sin x + \sin y|
&lt;/div&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/mabssinpsin.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;We now have the basis for both our clouds and mountains, but it&#x27;s not yet
convincing. The next step is to use the fbm loop as if we were dealing with
Gaussian or value noise: we accumulate several frequencies of our signal
together:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
z = S \sum_{i=0}^{N-1} F(\begin{bmatrix}x \\ y\end{bmatrix} \cdot l^{i}) g^{i}
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;S&lt;/span&gt; is the sign (-1 for spiky, 1 for blobby)&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;i&lt;/span&gt; is the octave identifier going from 0 to &lt;span class=&quot;math inline&quot;&gt;N-1&lt;/span&gt; (included).&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;F(x,y)&lt;/span&gt; is usually the noise signal function, in our case it&#x27;s the sinusoid
combination function, we choose &lt;span class=&quot;math inline&quot;&gt;|\sin x + \sin y|&lt;/span&gt; here.&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;l&lt;/span&gt; is the lacunarity factor, that is how the frequency changes at each
octave; this is usually a multiply by 2 or a close value.&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;g&lt;/span&gt; is the gain, that is by how much the amplitude changes at each octave; this is
usually a multiply by 0.5 or a close value.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/fbmnorot.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;Without surprise this is still very periodic, but we can see a glimpse of chaos
emerging. The final touch does all the magic: all we have to do now is simply
rotate each layer by like, 30° or something (I&#x27;ll pick 0.5 radians here, or
about 29°):&lt;/p&gt;
&lt;p&gt;&lt;canvas width=&quot;360&quot; height=&quot;360&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/fbmrot.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;p&gt;The symmetry around the origin is still noticeable, but the illusion will work
as we will move away from it. It&#x27;s also possible to add some phase or offsetting
(arbitrary addition within the &lt;code&gt;sin&lt;/code&gt; or between each layer).&lt;/p&gt;
&lt;p&gt;I implemented this in a &lt;a href=&quot;https://www.desmos.com/3d/odvwh2ttdb&quot;&gt;Desmos 3D scene&lt;/a&gt; with all the
parameters if one wants to play with it. The formula there has a few more
controls, for example the vertical location, an optional transition offset in
addition to the rotation, and controls for the base frequency and amplitude.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/demomaking/desmos.png&quot; alt=&quot;Screenshot of fake noise in Desmos 3D&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If this mathematical gibberish is above your head, GLSL code for the 2D noise
could look like this, with a lacunarity of 2, a gain of 0.5, and 5 octaves:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float noise(vec2 p) {
    float v = 0.0;
    float amplitude = 1.0;
    for (int i = 0; i &amp;lt; 5; i++) {
        p = rotate(0.5) * p; // rotate our space (more on this in the next section)
        v += abs(sin(p.x) + sin(p.y)) * amplitude; // accumulate noise
        p *= 2.0; // double the frequency at each octave
        amplitude *= 0.5; // half the amplitude at each octave
    }
    return v;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One cool trick here: &lt;code&gt;abs(sin(p.x)+sin(p.y))&lt;/code&gt; could also be written
&lt;code&gt;abs(dot(sin(p),vec2(1)))&lt;/code&gt;. This is interesting because now we can operate on
the two components of &lt;code&gt;p&lt;/code&gt;, easing the possibility to modify them at once (for
example doing &lt;code&gt;p*A+B&lt;/code&gt;). The &lt;code&gt;dot&lt;/code&gt; trick doesn&#x27;t work with &lt;code&gt;sin(p.x)*sin(p.y)&lt;/code&gt;,
but fortunately, as we saw before, multiply and addition are similar and could
be swapped in various situations.&lt;/p&gt;
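&lt;p&gt;To double-check that the &lt;code&gt;dot&lt;/code&gt; rewrite is strictly equivalent, here is a quick
Python sketch where the helper functions are stand-ins for the GLSL built-ins:&lt;/p&gt;

```python
import math

def sin2(p):
    # component-wise sine, like GLSL sin() applied to a vec2
    return [math.sin(c) for c in p]

def dot2(a, b):
    # like GLSL dot() on two vec2
    return a[0] * b[0] + a[1] * b[1]

p = [1.7, -0.4]
original = abs(math.sin(p[0]) + math.sin(p[1]))  # abs(sin(p.x)+sin(p.y))
vectorized = abs(dot2(sin2(p), [1.0, 1.0]))      # abs(dot(sin(p),vec2(1)))
assert 1e-12 > abs(original - vectorized)
```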
&lt;h2&gt;Rotations&lt;/h2&gt;
&lt;p&gt;We needed some rotations for the noise, and they will be required again soon, so
we need to take a closer look at them. Let&#x27;s start with the formula most people
are familiar with:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
M =
\begin{bmatrix}
    \cos \theta &amp;amp; -\sin \theta \\
    \sin \theta &amp;amp; \cos \theta
\end{bmatrix}
&lt;/div&gt;
&lt;p&gt;A matrix can be seen as a function, so mathematically writing &lt;span class=&quot;math inline&quot;&gt;p&#x27;=M \cdot p&lt;/span&gt;
would be equivalent to the code &lt;code&gt;p=rotate(angle)*p&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Matrix for a counter-clockwise rotation
mat2 rotate(float a) {
    return mat2(
        cos(a), sin(a), // column 1
       -sin(a), cos(a)  // column 2
    );
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Doing &lt;span class=&quot;math inline&quot;&gt;p&#x27;=M \cdot p&lt;/span&gt; is rotating the &lt;em&gt;space&lt;/em&gt; &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; lies in, which means it gives
the illusion the &lt;em&gt;object&lt;/em&gt; is rotating &lt;strong&gt;clockwise&lt;/strong&gt;. Though, in the expression
&lt;code&gt;p=rotate(angle)*p&lt;/code&gt;, I can&#x27;t help but be bothered by the redundancy of &lt;code&gt;p&lt;/code&gt;,
so I would prefer to write &lt;code&gt;p*=rotate(angle)&lt;/code&gt; instead. Since matrix multiplication
is not commutative, this will instead do a &lt;strong&gt;counter-clockwise&lt;/strong&gt; rotation of the
object. The inlined rotation ends up being:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p *= mat2(cos(a),sin(a),-sin(a),cos(a)); // counter-clockwise rotation of object at point p
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;To make the rotation clockwise, we can of course use &lt;code&gt;-a&lt;/code&gt;, or we can
transpose the matrix: &lt;code&gt;mat2(cos(a),-sin(a),sin(a),cos(a))&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This is problematic though: we need to repeat the angle 4 times, which can be
particularly troublesome if we want to create a macro and/or don&#x27;t want an
intermediate variable for the angle. But I got you covered: trigonometry has a
shitton of identities, and we can express every &lt;code&gt;sin&lt;/code&gt; according to a &lt;code&gt;cos&lt;/code&gt; (and
the other way around).&lt;/p&gt;
&lt;p&gt;For example, here is another formulation of the same expression:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p *= mat2(cos(a + vec4(0,3,1,0)*PI/2.0));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now the angle appears only once, in a vectorized cosine call.&lt;/p&gt;
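&lt;p&gt;We can verify in a few lines of Python that the offset vector &lt;code&gt;vec4(0,3,1,0)&lt;/code&gt;
scaled by &lt;span class=&quot;math inline&quot;&gt;\pi/2&lt;/span&gt; reproduces the four matrix entries exactly:&lt;/p&gt;

```python
import math

a = 0.7                   # arbitrary test angle
half_pi = math.acos(0.0)  # pi/2, the same trick as in the shader

entries = [math.cos(a + k * half_pi) for k in (0, 3, 1, 0)]
# column-major mat2: cos(a), sin(a), -sin(a), cos(a)
expected = [math.cos(a), math.sin(a), -math.sin(a), math.cos(a)]
for got, want in zip(entries, expected):
    assert 1e-12 > abs(got - want)
```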
&lt;p&gt;GLSL has &lt;code&gt;degrees()&lt;/code&gt; and &lt;code&gt;radians()&lt;/code&gt; functions, but it doesn&#x27;t expose anything
for the &lt;span class=&quot;math inline&quot;&gt;\pi&lt;/span&gt; nor &lt;span class=&quot;math inline&quot;&gt;\tau&lt;/span&gt; constants. And of course, it doesn&#x27;t have &lt;code&gt;sinpi&lt;/code&gt; and
&lt;code&gt;cospi&lt;/code&gt; implementations either. So it&#x27;s obvious they want us to use &lt;span class=&quot;math inline&quot;&gt;\arccos(-1)&lt;/span&gt;
for &lt;span class=&quot;math inline&quot;&gt;\pi&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;\arccos(0)&lt;/span&gt; for &lt;span class=&quot;math inline&quot;&gt;\pi/2&lt;/span&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p *= mat2(cos(a + vec4(0,3,1,0)*acos(0.)));
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;To specify &lt;code&gt;a&lt;/code&gt; as a normalized value, we can use
&lt;code&gt;mat2(cos((a*4.+vec4(0,3,1,0))*acos(0.)))&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;On his Unofficial Shadertoy blog, Fabrice Neyret goes further and provides us
with &lt;a href=&quot;https://shadertoyunofficial.wordpress.com/#vector-maths&quot;&gt;a very cute approximation&lt;/a&gt;, which is the one we will use:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p *= mat2(cos(a + vec4(0,11,33,0)));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I &lt;a href=&quot;https://github.com/ubitux/research/blob/main/misc/rotation-approx.py&quot;&gt;checked for the best numbers in 2 digits&lt;/a&gt;, and I can confirm
they are indeed the ones providing the best accuracy.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/rotations-precision.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Comparison of the 2 rotation matrices&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;On this last figure, the slight red/green on the outline of the circle
represents the loss of precision.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;With 3 digits, &lt;code&gt;344&lt;/code&gt; and &lt;code&gt;699&lt;/code&gt; can respectively be used instead of &lt;code&gt;11&lt;/code&gt;
and &lt;code&gt;33&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
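&lt;p&gt;As a sanity check, here is a small Python sketch measuring the worst-case error of
these constants: 11 stands in for &lt;span class=&quot;math inline&quot;&gt;3\pi/2 + 2\pi \approx 10.9956&lt;/span&gt; and 33 for
&lt;span class=&quot;math inline&quot;&gt;\pi/2 + 10\pi \approx 32.9867&lt;/span&gt;:&lt;/p&gt;

```python
import math

def max_err(s, c):
    # worst deviation of cos(a+s) from sin(a) and of cos(a+c) from -sin(a)
    worst = 0.0
    for i in range(1000):
        a = i * 0.01
        worst = max(worst,
                    abs(math.cos(a + s) - math.sin(a)),
                    abs(math.cos(a + c) + math.sin(a)))
    return worst

assert 0.02 > max_err(11.0, 33.0)                   # about 0.013 at worst
assert max_err(11.0, 33.0) > max_err(344.0, 699.0)  # the 3-digit pair is tighter
```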
&lt;p&gt;This is good when we want a dynamic rotation angle (we will need that for the
camera panning typically), but sometimes we just need a hardcoded value: for
example in the &lt;code&gt;rotate(0.5)&lt;/code&gt; of our combined noise function.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mat2(cos(.5+vec4(0,11,33,0)))&lt;/code&gt; is fine but we can do better. Through Inigo&#x27;s
demos I found the following: &lt;code&gt;mat2(.8,.6,-.6,.8)&lt;/code&gt;. It makes a rotation angle of
about 37° (around 0.64 radians) in a very tiny form. Since 0.5 was pretty much
arbitrary, we can just use this matrix as well. And we can make it even smaller
(thank you &lt;a href=&quot;https://www.shadertoy.com/user/jolle&quot;&gt;jolle&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p *= mat2(8,6,-6,8)*.1; // rotate p counter-clockwise by about 37° without any trigo
&lt;/code&gt;&lt;/pre&gt;
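&lt;p&gt;What makes this work is the 3-4-5 Pythagorean triple: the columns (0.8, 0.6) and
(-0.6, 0.8) are already unit length, so no trigonometry is needed at all. A quick
Python check:&lt;/p&gt;

```python
import math

c, s = 0.8, 0.6  # from the 3-4-5 triangle: (4/5, 3/5) lies on the unit circle
assert 1e-12 > abs(c * c + s * s - 1.0)   # unit columns: a pure rotation

angle = math.atan2(s, c)
assert 37.0 > math.degrees(angle) > 36.8  # about 36.87 degrees (0.6435 rad)
```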
&lt;p&gt;One last rotation tip from Fabrice&#x27;s bag of tricks: rotating in 3D around an
axis can be done with the help of GLSL swizzling:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p.xz *= rotate(0.5); // 3D rotation around y-axis (the absent component)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We will use this too.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;code&gt;p.zy *= rotate(.5)&lt;/code&gt; is the same as &lt;code&gt;p.yz *= rotate(-.5)&lt;/code&gt;, if we need to save
one character and can&#x27;t transpose the matrix.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Camera (and axis) setup&lt;/h2&gt;
&lt;p&gt;One last essential before going creative is the camera setup.&lt;/p&gt;
&lt;p&gt;We start with the 2D &lt;code&gt;P&lt;/code&gt; pixel coordinates, which we are going to make resolution
independent by transforming them into a traditional mathematical coordinate
system:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// 1:1 ratio with [-1,1] along the shortest axis (horizontal or vertical)
vec2 u = (2.0*P - R) / min(R.x, R.y);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;http://blog.pkh.me/img/demomaking/coords-system.png&quot; alt=&quot;coordinate system&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Since we know our demo will be rendered in landscape mode, dividing by &lt;code&gt;R.y&lt;/code&gt;
is enough. We can also save one character using &lt;code&gt;P+P&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// 1:1 ratio with [-1,1] along the vertical axis
vec2 u = (P+P - R) / R.y;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To enter 3D space, we append a third component, giving us either a right- or a
left-handed Y-up coordinate system. This choice is not completely random.&lt;/p&gt;
&lt;p&gt;Indeed, it&#x27;s easier/shorter to add a 3rd dimension at the end compared
to interleaving a middle component. Compare the length of &lt;code&gt;vec3(P, z)&lt;/code&gt; to
&lt;code&gt;vec3(P.x, z, P.y)&lt;/code&gt; (Z-up convention). In the former case, picking just a plane
remains short and easy thanks to swizzling: &lt;code&gt;p.xz&lt;/code&gt; instead of &lt;code&gt;p.xy&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To work in 3D, we need an origin point (&lt;code&gt;ro&lt;/code&gt; for ray origin) and a looking
direction (&lt;code&gt;rd&lt;/code&gt; for ray direction). &lt;code&gt;ro&lt;/code&gt; is picked arbitrarily for the eye
position, while &lt;code&gt;rd&lt;/code&gt; is usually calculated thanks to a &lt;code&gt;lookAt&lt;/code&gt; helper:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Right-hand with Y-up (like Godot)
mat3 lookAt(vec3 origin /* where we are */, vec3 target /* where we look */) {
    vec3 w = normalize(target - origin);
    vec3 u = normalize(cross(w, vec3(0,1,0)));
    vec3 v = normalize(cross(u, w)); // Note: normalize() can be ditched here
    return mat3(u, v, w);
}
&lt;/code&gt;&lt;/pre&gt;
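&lt;p&gt;If you want to poke at this outside a shader, here is a direct Python port of the
same &lt;code&gt;lookAt&lt;/code&gt; (the list-based helpers are stand-ins for the GLSL built-ins),
checking that the three columns form an orthonormal basis:&lt;/p&gt;

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_at(origin, target):
    # right-handed, Y-up, same construction as the GLSL version
    w = normalize([t - o for t, o in zip(target, origin)])
    u = normalize(cross(w, [0.0, 1.0, 0.0]))
    v = cross(u, w)  # already unit length since u and w are orthonormal
    return u, v, w   # the three columns of the mat3

u, v, w = look_at([1.0, 2.0, -3.0], [0.0, 0.5, 4.0])
for axis in (u, v, w):
    assert 1e-9 > abs(dot(axis, axis) - 1.0)  # unit vectors
assert 1e-9 > abs(dot(u, v))                  # mutually orthogonal
assert 1e-9 > abs(dot(u, w))
assert 1e-9 > abs(dot(v, w))
```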
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/coord-system.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Right-hand Y-up 3D coordinates system&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Which is then used like that, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 u = (P+P - R) / R.y;
vec3 target = /* ... */;
vec3 ro = /* ... */;
vec3 rd = normalize(lookAt(ro, target) * vec3(u, 1));
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;I made a &lt;a href=&quot;https://www.shadertoy.com/view/wcfBRS&quot;&gt;Shadertoy demo&lt;/a&gt; to experiment with different
3D coordinate spaces if you are interested in digging this further.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;All of this is perfectly fine because it is flexible, but it&#x27;s also way too much
unnecessary code for our needs, so we need to shrink it.&lt;/p&gt;
&lt;p&gt;One approach is to pick a simple origin and straight target point so that the
matrix is as simple as possible. And then later on apply some transformations on
the point. If we give &lt;code&gt;ro=vec3(0)&lt;/code&gt; and &lt;code&gt;target=vec3(0,0,1)&lt;/code&gt;, we end up with an
identity matrix, so we can ditch everything and just write:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 rd = normalize(vec3((P+P - R) / R.y, 1));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can be shortened further: since the vector is normalized anyway, we can scale
it at will, for example by a factor &lt;code&gt;R.y&lt;/code&gt;, saving us precious characters:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 rd = normalize(vec3(P+P - R, R.y));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And just like that, we are located at the origin &lt;code&gt;vec3(0)&lt;/code&gt;, looking toward Z+,
ready to render our scene.&lt;/p&gt;
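&lt;p&gt;A quick Python sketch confirms the rescaling is harmless; the &lt;code&gt;P&lt;/code&gt; and &lt;code&gt;R&lt;/code&gt;
values are arbitrary, picked just for the comparison:&lt;/p&gt;

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

# hypothetical pixel P on a hypothetical canvas R
P = (300.0, 200.0)
R = (480.0, 340.0)

long_form = normalize([(2.0 * P[0] - R[0]) / R[1],
                       (2.0 * P[1] - R[1]) / R[1],
                       1.0])
short_form = normalize([P[0] + P[0] - R[0],  # same vector...
                        P[1] + P[1] - R[1],
                        R[1]])               # ...scaled by R.y before normalize

for a, b in zip(long_form, short_form):
    assert 1e-12 > abs(a - b)  # same direction, so same normalized vector
```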
&lt;h2&gt;Mountain height map&lt;/h2&gt;
&lt;p&gt;It&#x27;s finally time to build our scene. We&#x27;re going to start with our &lt;code&gt;noise&lt;/code&gt;
function previously defined, but we&#x27;re going to tweak it in various ways to craft a
mountain height map function.&lt;/p&gt;
&lt;p&gt;Here is our first draft:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float mountain_y = -0.5; // mountain y-axis position
const float mountain_f = 0.6; // mountain base frequency

float mountain_height_map(vec2 p) {
    float h = mountain_y;
    for (float a = 1.0; a &amp;gt; 0.01; a /= 2.0) {
        p *= rotate(0.5);
        h += abs(dot(sin(p*mountain_f / a), vec2(1))) * a; // dot(sin(v),1) -&amp;gt; sin(v.x)+sin(v.y)
    }
    return -h; // minus for the spiky version of the noise
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We&#x27;re exploiting one important correlation in the noise function: at every
octave, the amplitude is halved while the frequency is doubled. So instead
of having 2 running variables, we just have an amplitude &lt;code&gt;a&lt;/code&gt; getting halved
every octave, and we &lt;em&gt;divide&lt;/em&gt; our position &lt;code&gt;p&lt;/code&gt; by &lt;code&gt;a&lt;/code&gt; (which is the same as
multiplying by a frequency that doubles at every octave).&lt;/p&gt;
&lt;p&gt;I actually like this way of writing the loop because we can stop it
when the amplitude becomes meaningless (&lt;code&gt;a&amp;gt;0.01&lt;/code&gt; acts as a precision stopper).
Unfortunately, &lt;code&gt;a/=2.&lt;/code&gt; is one character too long for the iteration expression, so
we&#x27;re going to double instead, using &lt;code&gt;a+=a&lt;/code&gt;, and write the loop the other way
around: &lt;code&gt;for (float a=.01; a&amp;lt;1.; a+=a)&lt;/code&gt;. It&#x27;s not exactly equivalent, but it&#x27;s good
enough (and we can still tweak the values if necessary).&lt;/p&gt;
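&lt;p&gt;A tiny Python sketch makes the difference concrete: both loops run for 7 octaves,
but the amplitude sequences are not quite the same:&lt;/p&gt;

```python
# original halving loop, stopping when the amplitude becomes meaningless
down = []
a = 1.0
while a > 0.01:
    down.append(a)
    a /= 2.0

# golfed doubling loop from the article, written the other way around
up = []
a = 0.01
while 1.0 > a:
    up.append(a)
    a += a

assert len(down) == len(up) == 7   # same octave count...
assert down[0] == 1.0              # ...but amplitudes go from 1 down to ~0.016
assert 1e-12 > abs(up[-1] - 0.64)  # versus from 0.01 up to 0.64
```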
&lt;p&gt;We&#x27;re going to inline the constants and rotate, and use one more cool trick:
&lt;code&gt;vec2(1)&lt;/code&gt; can be shortened: we just need another &lt;code&gt;vec2&lt;/code&gt;. Luckily we have &lt;code&gt;p&lt;/code&gt;,
so we can simply replace it with &lt;code&gt;p/p&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Finally, we can get rid of the braces of the &lt;code&gt;for&lt;/code&gt; loop by using the &lt;code&gt;,&lt;/code&gt; in its
local scope:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float mountain_height_map(vec2 p) {
    float h = -.5;
    for (float a=.01; a&amp;lt;1.; a+=a)
        p *= mat2(8,6,-6,8)*.1,
        h += abs(dot(sin(p*.6/a), p/p))*a;
    return -h;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;p/p&lt;/code&gt; works fine as long as no component of &lt;code&gt;p&lt;/code&gt; is zero. In this particular case, we can
instead use &lt;code&gt;vec2(0)&lt;/code&gt; (obtained with &lt;code&gt;p-p&lt;/code&gt;) and then include the &lt;code&gt;a&lt;/code&gt; amplitude
multiplier within the expression: &lt;code&gt;abs(dot(sin(p*.6/a), p-p+a))&lt;/code&gt;. (&lt;code&gt;p-p+a&lt;/code&gt; is
the same as &lt;code&gt;vec2(a)&lt;/code&gt; when &lt;code&gt;p&lt;/code&gt; is a &lt;code&gt;vec2&lt;/code&gt;.) We end up with the following safer
version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float mountain_height_map(vec2 p) {
    float h = -.5;
    for (float a=.01; a&amp;lt;1.; a+=a)
        p *= mat2(8,6,-6,8)*.1,
        h += abs(dot(sin(p*.6/a), p-p+a));
    return -h;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/hmap.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Mountain height map in 2D (rescaled for display)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;To render this in 3D, we are going to do some ray-marching.&lt;/p&gt;
&lt;h2&gt;Solid ray-marching&lt;/h2&gt;
&lt;p&gt;The main technique used in most Shadertoy demos is ray-marching. I will assume
familiarity with the technique, but if that&#x27;s not the case, &lt;a href=&quot;https://www.youtube.com/watch?v=khblXafu7iA&quot;&gt;An introduction
to Raymarching (YouTube)&lt;/a&gt; by kishimisu and &lt;a href=&quot;https://blog.maximeheckel.com/posts/painting-with-math-a-gentle-study-of-raymarching/&quot;&gt;Painting with Math:
A Gentle Study of Raymarching&lt;/a&gt; by Maxime Heckel were good
resources for me.&lt;/p&gt;
&lt;p&gt;In short: we start from a position in space called the ray origin &lt;code&gt;ro&lt;/code&gt; and
march from it along a ray direction &lt;code&gt;rd&lt;/code&gt;. At every iteration we check the distance
to the closest solid in our scene, and step forward by that distance, hoping to
converge closer and closer to the object boundary.&lt;/p&gt;
&lt;p&gt;We end up with this main loop template:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float t = 0.0;

vec3 ro = vec3(0); // ray origin
vec3 rd = normalize(vec3(P+P - R, R.y)); // ray direction

// 100 iterations should be enough to hit something if there is any
for (int i = 0; i &amp;lt; 100; i++) {
    vec3 p = ro + rd*t; // t amount in rd direction from ro origin
    float h = distance_to_solid(p); // 3D distance function
    if (h &amp;lt; 0.001) { // we converged close enough to a solid
        // Here we assign a color according to where p is
        // [...]
        break;
    }
    t += h; // there is no solid closer than h so we step by that much
}
&lt;/code&gt;&lt;/pre&gt;
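&lt;p&gt;To make the stepping concrete, here is the same loop transcribed to Python against
a sphere, which is a true distance field (the scene values are arbitrary):&lt;/p&gt;

```python
import math

def sphere_distance(p, center, radius):
    # signed distance to a sphere: the canonical 3D distance field
    return math.dist(p, center) - radius

ro = (0.0, 0.0, 0.0)        # ray origin
rd = (0.0, 0.0, 1.0)        # ray direction (unit length)
center, radius = (0.0, 0.0, 5.0), 1.0

t = 0.0
hit = False
for _ in range(100):
    p = tuple(o + d * t for o, d in zip(ro, rd))
    h = sphere_distance(p, center, radius)
    if 0.001 > h:           # converged close enough to the surface
        hit = True
        break
    t += h                  # no solid is closer than h, so the step is safe

assert hit
assert 0.01 > abs(t - 4.0)  # the surface is 4 units away from the origin
```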
&lt;p&gt;This works fine for solids expressed with &lt;a href=&quot;https://iquilezles.org/articles/distfunctions/&quot;&gt;3D distance fields&lt;/a&gt;, that is
functions that for a given point give the distance to the object. We will use
it for our mountain, with one subtlety: the noise height map of the mountain
is not exactly a distance (it is only the distance to what&#x27;s below our current
point &lt;code&gt;p&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float distance_to_solid(vec3 p) { // positive outside, negative inside
    return p.y - mountain_height_map(p.xz);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because of this, we can&#x27;t step by the distance directly, or we&#x27;re likely to go
through mountains during the stepping (&lt;code&gt;t += h&lt;/code&gt;). A common workaround here is to
step a certain percentage of that distance to play it safe.&lt;/p&gt;
&lt;p&gt;Technically we should &lt;a href=&quot;https://www.peterstefek.me/ray-marching-heightfields.html&quot;&gt;figure out the theoretical proper shrink
factor&lt;/a&gt;, but we&#x27;re going to take a shortcut today and just pick one
arbitrarily. Using trial and error, I ended up with 20% of the distance.&lt;/p&gt;
&lt;p&gt;After a few simplifications, we end up with the following (complete) code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float mountain_height_map(vec2 p) {
    float h = .5;
    for (float a=.01; a&amp;lt;1.; a+=a)
        p *= mat2(8,6,-6,8)*.1,
        h += abs(dot(sin(p*.6/a), p-p+a));
    return -h;
}

float distance_to_solid(vec3 p) {
    return p.y - mountain_height_map(p.xz);
}

void main() {
    vec3 rd = normalize(vec3(P+P - R, R.y));

    float t = 0.0, color = 0.0;
    for (int i = 0; i &amp;lt; 100; i++) {
        vec3 p = rd*t;

        p.z += T*.2; // move forward

        float h = distance_to_solid(p);
        if (h &amp;lt; 0.001) {
            color = exp(-t*t*.01); // depth map like &amp;quot;coloring&amp;quot;
            break;
        }
        t += h * 0.2;
    }

    O = vec4(vec3(pow(color, 3.0/2.2)), 1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/hmap3d.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Basic ray-marching of the mountain height map&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We start at &lt;code&gt;ro=vec3(0)&lt;/code&gt; so I dropped the variable entirely.&lt;/p&gt;
&lt;p&gt;You may be curious about the power at the end; this is just a combination
of luminance perception with gamma 2.2 (sRGB) transfer function. It only
works well for grayscale; for more information, see &lt;a href=&quot;http://blog.pkh.me/p/43-the-current-technology-is-not-ready-for-proper-blending.html&quot;&gt;my previous article on
blending&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Clouds and fog&lt;/h2&gt;
&lt;p&gt;Compared to the mountain, the clouds and fog will need a 3 dimensional noise.
Well, we don&#x27;t need to be very original here; we simply extend the 2D noise
to 3D:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float noise3(vec3 p) {
    float v;
    for (float a=.01; a&amp;lt;1.; a+=a)
        p.xz *= mat2(8,6,-6,8)*.1,
        v += abs(dot(sin(p*.3/a + T*.3), p-p+a));
    return v;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The base frequency is lowered to &lt;code&gt;0.3&lt;/code&gt; to make it smoother, and &lt;code&gt;p&lt;/code&gt; goes
from 2 to 3 dimensions. Notice how the rotation is only done around the y-axis (the
one pointing up): don&#x27;t worry, it&#x27;s good enough for our purpose.&lt;/p&gt;
&lt;p&gt;We also add a phase (meaning we are offsetting the sinusoid) of &lt;code&gt;T*0.3&lt;/code&gt; (&lt;code&gt;T&lt;/code&gt; is
the time in seconds, slowed down by the multiply) to slowly morph it over time.
The base frequency and time scale being identical is a happy &amp;quot;coincidence&amp;quot; to be
factored out later (I actually forgot about it until &lt;a href=&quot;https://www.shadertoy.com/user/jolle&quot;&gt;jolle&lt;/a&gt; reminded me of it).&lt;/p&gt;
&lt;p&gt;You also most definitely noticed &lt;code&gt;v&lt;/code&gt; isn&#x27;t explicitly initialized: while this
only holds in WebGL, the spec &lt;a href=&quot;https://registry.khronos.org/webgl/specs/latest/1.0/#6.39&quot;&gt;guarantees zero initialization&lt;/a&gt;, so we&#x27;re saving a
few characters here.&lt;/p&gt;
&lt;h2&gt;Volumetric ray-marching&lt;/h2&gt;
&lt;p&gt;For volumetric materials (clouds and fog), the loop is a bit different: instead
of calculating the distance to the solid for our current point &lt;code&gt;p&lt;/code&gt;, we
compute the density of our target &amp;quot;object&amp;quot;. Funny enough, it can be thought of
as a 3D SDF with the sign flipped: positive inside (because the density
increases as we go deeper) and negative outside (there is no density, we&#x27;re not
in it).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float clouds_y = 3.0; // vertical position

float clouds_density(vec3 p) {
    float n = noise3(p);     // random value associated with a 3D position in space
    float h = -clouds_y + n; // similar to mountain_height_map() but 3d and blobby
    float d = p.y - h;       // similar to distance_to_solid()
    d = -d;                  // flip sign: distance to density
    // We are only interested in the density within the material,
    // the density will be considered 0 when outside of it.
    return max(d, 0.0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For simplicity, we&#x27;re going to rewrite the function like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float clouds_y = 3.0;

float clouds_density(vec3 p) {
    float n = noise3(p);
    float d = -p.y - clouds_y + n;
    return max(d, 0.0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compared to the solid ray-marching loop, the volumetric one doesn&#x27;t bail out
when it reaches the target. Instead, it slowly steps into it, damping the light
as the density increases:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float absorption = 0.15;
const float radiance   = 1.0;

void main() {
    float step_len = 0.15;
    float t;

    vec3 rd = normalize(vec3(P+P-R,R.y));
    vec3 color;

    float transmittance = 1.0; // remaining visibility
    for (int i = 0; i &amp;lt; 100; i++) {
        vec3 p = rd*t;

        // Move camera forward
        p.z += T * 1.5;

        // How many particles of the material we can find at that position
        // If negative, we&#x27;re not in the element yet, otherwise it&#x27;s the density
        // (getting higher as we go deeper into it typically).
        float d = clouds_density(p);

        // Integrate the density discretely: we assume the segment we&#x27;re
        // walking has a constant density along its length
        d *= step_len;

        // The fraction of light that survives through this segment (Beer-Lambert law)
        // The denser, the closer to 0 this gets
        float attenuation = exp(-d*absorption);

        float emission = d*radiance; // how much light is emitted along the segment (glow)
        float alpha = 1.0 - attenuation; // fraction of light removed for that given density segment

        float weight = alpha * transmittance;

        // Accumulate color emission
        color += weight * emission;

        transmittance -= weight; // could also be written transmittance *= attenuation

        // Advance by the volumetric step
        t += step_len;

        // Larger volumetric steps as we go far
        step_len *= 1.015;
    }

    O = vec4(pow(color, vec3(3.0/2.2)), 1);
}
&lt;/code&gt;&lt;/pre&gt;
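One detail worth double-checking is the comment on the transmittance update: subtracting the weight really is the same as multiplying by the attenuation. A quick Python check of that identity (not shader code, just the math, with arbitrary density samples):

```python
import math

def updated_transmittance(transmittance, attenuation):
    # the loop's form: transmittance -= weight
    alpha = 1.0 - attenuation
    weight = alpha * transmittance
    return transmittance - weight

def updated_transmittance_alt(transmittance, attenuation):
    # the equivalent form from the comment: transmittance *= attenuation
    return transmittance * attenuation
```

Both forms compute the fraction of light still reaching the camera after the segment, which is why front-to-back compositing can be written either way.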
&lt;p&gt;The core idea is that the volumetric material emits some radiance but also
absorbs the atmospheric light. The deeper we get, the smaller the transmittance
gets, until it converges to 0 and stops all light. All the thresholds you see were
chosen by tweaking them through trial and error, not by any particular logic. They
are also highly dependent on the total number of iterations.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;Steps get larger and larger as the distance increases; this is because we
don&#x27;t need as much precision per &amp;quot;slice&amp;quot;, but we still want to reach a long
distance.&lt;/p&gt;
&lt;/div&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/clouds.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Basic volumetric ray-marching of the clouds density map&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We want to be positioned below the clouds, so we&#x27;re going to need a simple sign
flip in the function.&lt;/p&gt;
&lt;p&gt;The fog will take its place at the bottom, except upside down (the
sharpness will give a mountain-hugging feeling) and at a different position.
&lt;code&gt;clouds_density()&lt;/code&gt; becomes:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float clouds_y = 3.0;
const float fog_y    = 0.0;

float clouds_fog_density(vec3 p) {
    float n = noise3(p);

    float clouds_d = p.y - clouds_y + n;
    float fog_d    = p.y - fog_y    + n;

    // Pick the element with the highest density (they don&#x27;t overlap anyway)
    float d = max(clouds_d, -fog_d);

    return max(d, 0.0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/cloudsfog.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Both clouds and fog with volumetric ray-marching&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For more resources on volumetric rendering, here are the ones I studied
the most:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://wallisc.github.io/rendering/2020/05/02/Volumetric-Rendering-Part-1.html&quot;&gt;Volumetric Rendering in 2 parts&lt;/a&gt;, by Chris&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.maximeheckel.com/posts/real-time-cloudscapes-with-volumetric-raymarching/&quot;&gt;Real-time dreamy Cloudscapes with Volumetric Raymarching&lt;/a&gt;, by
Maxime Heckel again&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://mini.gmshaders.com/p/volumetric&quot;&gt;Volumetric Raymarching&lt;/a&gt;, by Xor&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Combining ray-marching&lt;/h2&gt;
&lt;p&gt;Having a single ray-marching loop combining the two methods (solid and
volumetric) can be challenging. In theory, we should stop marching when we
hit a solid, bail out of the loop, and do some fancy normal and lighting
calculations. We can&#x27;t afford any of that, so we&#x27;re going to start doing art
from now on.&lt;/p&gt;
&lt;p&gt;We start from the volumetric ray-marching loop, and add the distance to the
mountain:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;for (int i = 0; i &amp;lt; 100; i++) {
    vec3 p = rd*t;

    // ...

    float d = clouds_fog_density(p);
    float h = distance_to_solid(p);

    // ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;h&lt;/code&gt; gets small enough, we can assume we hit a solid:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;bool solid = h &amp;lt; 0.001;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In volumetric, the attenuation is calculated with the Beer-Lambert law. For solid,
we&#x27;re simply going to make it fairly high:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    float attenuation = exp(-d*absorption);
+    float attenuation = solid ? 0.95 : exp(-d*absorption);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This has the effect of making the mountain behave like a very dense gas.&lt;/p&gt;
&lt;p&gt;We&#x27;re also going to disable the light emission from the solid (it will be
handled differently down the line):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    float emission = d*radiance;
+    float emission = solid ? 0.0 : d*radiance;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The transmittance is not changed when we hit a solid, as we just want
to accumulate light onto it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    transmittance -= weight;
+    if (!solid) transmittance -= weight;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, we have to combine the volumetric stepping (&lt;code&gt;t += step_len&lt;/code&gt;) with the
solid stepping (&lt;code&gt;t += h*0.2&lt;/code&gt;) by choosing the safest step length, that is the
minimum:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    t += step_len;
+    t += min(h*0.2, step_len);
&lt;/code&gt;&lt;/pre&gt;
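To convince ourselves the min() is safe, here is a tiny 1D Python model (illustrative only: a made-up solid placed at t=5 stands in for the mountain). The damped solid step h*0.2 takes over near the surface, so the ray creeps up to the solid without ever stepping through it:

```python
def march_1d(solid_t=5.0, iterations=200):
    # hypothetical 1D scene: the solid sits at distance solid_t along the ray
    t = 0.0
    step_len = 0.15
    for _ in range(iterations):
        h = solid_t - t                  # distance_to_solid() in 1D
        t = t + min(h * 0.2, step_len)   # safest of the two step lengths
        step_len = step_len * 1.015      # larger volumetric steps as we go far
    return t
```

Far from the solid, the volumetric step drives the march; near it, the damped solid step shrinks geometrically, so t converges to the surface from below.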
&lt;p&gt;We end up with the following:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/comb0.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Combination of volumetric and solid ray-marching&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We can make out the mountain from the negative space and the discreet presence of
the fog, but it&#x27;s definitely way too dark. So the first thing we&#x27;re going to do is boost
the radiance, as well as the absorption for contrast:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-const float absorption = 0.15;
-const float radiance   = 1.0;
+const float absorption = 2.5;
+const float radiance   = 4.5;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make the light actually overshoot, so we also have to replace
the current gamma 2.2 correction with a &lt;a href=&quot;https://mini.gmshaders.com/p/func-tanh&quot;&gt;cheap and simple tone mapping
hack&lt;/a&gt;: &lt;code&gt;tanh()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    O = vec4(pow(color, vec3(3.0/2.2)), 1);
+    O = vec4(tanh(color), 1);
&lt;/code&gt;&lt;/pre&gt;
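tanh makes a good cheap tone mapper because it is almost the identity near 0 (dark values are left untouched) while smoothly saturating toward 1 for overshooting radiance, instead of clipping. A Python sanity check of those properties:

```python
import math

# near zero, tanh(x) is approximately x (the curve starts with slope 1)
dark = math.tanh(0.1)

# large overshoots get squashed asymptotically toward 1.0 instead of clipping
bright = math.tanh(10.0)

# the mapping is monotonic, so the relative ordering of intensities is preserved
ordered = math.tanh(2.0) - math.tanh(1.0)
```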
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/comb1.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Tonemapping the scene&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The clouds and fog are much better but the mountain is still trying to act cool.
So we&#x27;re going to tweak it in the loop:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;emission += 0.1;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This boosts the overall emission.&lt;/p&gt;
&lt;p&gt;While we&#x27;re at it, since the horizon is also sadly dark, we want to blast some light
into it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;color += d == 0.0 ? 0.005*h : 0.0;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;mkbosmans&lt;/code&gt; from HackerNews noticed that the opposite of &lt;code&gt;d==0.0&lt;/code&gt; is actually
&lt;code&gt;d&amp;gt;0.0&lt;/code&gt; due to the &lt;code&gt;max(...,0)&lt;/code&gt;. So we could write it more simply:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;color += d &amp;gt; 0.0 ? 0.0 : 0.005*h;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the density is null (meaning we&#x27;re outside clouds and fog), additional
light is added, proportional to how far we are from any solid (basically, the sky
gets the biggest boost).&lt;/p&gt;
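A quick standalone check of that remark: since clouds_fog_density() ends with max(d, 0.0), the returned density is never negative, so for this d the test "d strictly positive" is exactly "d not equal to zero", and its opposite is d == 0.0. In Python form (with arbitrary sample values):

```python
def boost_original(d, h):
    # original form: d == 0.0 ? 0.005*h : 0.0
    return 0.005 * h if d == 0.0 else 0.0

def boost_flipped(d, h):
    # mkbosmans' form uses "d strictly positive", which for a clamped,
    # non-negative d is the same test as "d not equal to zero"
    return 0.0 if d != 0.0 else 0.005 * h

# densities as they come out of the max(d, 0.0) clamp
densities = [max(raw, 0.0) for raw in (-2.0, -0.1, 0.0, 0.3, 5.0)]
```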
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/comb2.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;More atmospheric light&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The mountain looks fine but I wanted a more eerie atmosphere, so I changed the
attenuation:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    float attenuation = solid ? 0.95 : exp(-d*absorption);
+    float attenuation = exp(solid ? -h*300.0 : -d*absorption);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, instead of being a hard-coded value, the attenuation is correlated with the
proximity to the solid (when getting close to it). This has nothing to do with any
physics formula; it&#x27;s more of an implementation trick which relies
on the ray-marching algorithm. The effect it creates is those crack-like polygon
edges on the mountain.&lt;/p&gt;
&lt;p&gt;To add more to the effect, the emission boost is tweaked into:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    emission += 0.1;
+    float e = min(p.y - mountain_y + 1.5, 1.0);
+    emission += e*e * 0.1;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This makes the bottom of the mountain quadratically darker: only the tip of the
mountain will have the glowing cracks.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/comb3.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Making mountains eerie&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Color&lt;/h2&gt;
&lt;p&gt;We&#x27;ve been working in grayscale so far, which is usually a sound approach to
visual art in general. But we can afford a few more characters to turn the scene
into a decent piece of art from the 21st century.&lt;/p&gt;
&lt;p&gt;Adding the color only requires tiny changes. First, the emission boost is going
to target only the red component of the color:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    emission += e*e * 0.1;
     float alpha = 1. - attenuation;
     float weight = alpha * transmittance;
     color += weight * emission;
+    color.r += weight * e*e * 0.1;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And similarly, the overall addition of light into the horizon/atmosphere is
going to get a reddish/orange tint:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-diff&quot;&gt;-    color += d &amp;gt; 0.0 ? 0.0 : 0.005*h;
+    color += (d &amp;gt; 0.0 ? 0.0 : 0.005*h) * vec3(3,1,0);
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/demomaking/comb4.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Add a red/orange tint&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Last tweaks&lt;/h2&gt;
&lt;p&gt;We&#x27;re almost done. For the last tweak, we&#x27;re going to add a cyclic panning
rotation of the camera, and adjust the moving speed:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;p.xz *= mat2(cos(sin(T*.2)+vec4(0,11,33,0)));
p.z += T*.3;
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;I&#x27;m currently satisfied with the &amp;quot;seed&amp;quot; of the scene, but otherwise it would
have been possible to nudge the noise in different ways. For example,
remember the &lt;code&gt;sin&lt;/code&gt; can be replaced with &lt;code&gt;cos&lt;/code&gt; in either or both volumetric
and mountain related noises. Similarly, the offsetting &lt;code&gt;+T&lt;/code&gt; could be changed
into &lt;code&gt;-T&lt;/code&gt; for a different morphing effect. And of course the rotations can
be swapped (either by changing &lt;code&gt;.xz&lt;/code&gt; into &lt;code&gt;.zx&lt;/code&gt; or transposing the values).&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Code golfing&lt;/h2&gt;
&lt;p&gt;At this point, our code has gone through the early stages of code golfing, but it
still needs some work to reach perfection. Stripped of its comments, it looks
like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Reference code: 1278 chars (unnecessary spaces and line breaks are not counted)
const float fog_y      = 0.0;
const float clouds_y   = 3.0;
const float mountain_y = -0.5;
const float absorption = 2.5;
const float radiance   = 4.5;

float noise3(vec3 p) {
    float v;
    for(float a=.01; a&amp;lt;1.; a+=a)
        p.xz *= mat2(8,6,-6,8)*.1,
        v += abs(dot(sin(p*.3/a + T*.3), vec3(1)))*a;
    return v;
}

float clouds_fog_density(vec3 p) {
    float n = noise3(p);
    float clouds_d = p.y-clouds_y+n;
    float fog_d    = p.y-fog_y+n;
    float d = max(clouds_d, -fog_d);
    return max(d, 0.0);
}

float mountain_height_map(vec2 p) {
    float h = -mountain_y;
    for (float a=.01; a&amp;lt;1.; a+=a)
        p *= mat2(8,6,-6,8)*.1,
        h += abs(dot(sin(p*.6/a), vec2(1)))*a;
    return -h;
}

float distance_to_solid(vec3 p) {
    return p.y - mountain_height_map(p.xz);
}

void main() {
    float step_len = 0.15;
    float t;

    vec3 color;

    float transmittance = 1.0;
    vec3 rd = normalize(vec3(P+P-R,R.y));
    for (int i = 0; i &amp;lt; 100; i++) {
        vec3 p = rd*t;

        p.xz *= mat2(cos(sin(T*.2)+vec4(0,11,33,0)));
        p.z += T*.3;

        float d = clouds_fog_density(p);
        float h = distance_to_solid(p);

        bool solid = h &amp;lt; 0.001;
        d *= step_len;
        float attenuation = exp(solid ? -h*300.0 : -d*absorption);
        float emission = solid ? 0.0 : d*radiance;
        float e = min(p.y - mountain_y + 1.5, 1.0);
        float alpha = 1. - attenuation;
        float weight = alpha * transmittance;
        color   += weight * emission;
        color.r += weight * e*e * 0.1;
        color += (d &amp;gt; 0.0 ? 0.0 : 0.005*h) * vec3(3,1,0);
        if (!solid) transmittance -= weight;
        t += min(h*0.2, step_len);
        step_len *= 1.015;
    }

    O = vec4(tanh(color), 1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first thing we notice is that the mountain, clouds, and fog all use the
exact same loop. Factoring them out and inlining the whole thing into the main
function is the obvious move:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// 922 chars
const float fog_y      = 0.0;
const float clouds_y   = 3.0;
const float mountain_y = -0.5;
const float absorption = 2.5;
const float radiance   = 4.5;

void main() {
    float step_len = 0.15;
    float t;

    vec3 color;

    float transmittance = 1.0;
    vec3 rd = normalize(vec3(P+P-R,R.y));
    for (int i = 0; i &amp;lt; 100; i++) {
        vec3 p = rd*t;

        p.xz *= mat2(cos(sin(T*.2)+vec4(0,11,33,0)));
        p.z += T*.3;

        float d = p.y;
        float h = p.y-mountain_y;
        for (float a=.01; a&amp;lt;1.; a+=a)
            p.xz *= mat2(8,6,-6,8)*.1,
            d += abs(dot(sin(p*.3/a + T*.3), vec3(1)))*a,
            h += abs(dot(sin(p.xz*.6/a), vec2(1)))*a;
        d = max(max(d-clouds_y, -(d-fog_y)), 0.0);

        bool solid = h &amp;lt; 0.001;
        d *= step_len;
        float attenuation = exp(solid ? -h*300.0 : -d*absorption);
        float emission = solid ? 0.0 : d*radiance;
        float e = min(p.y - mountain_y + 1.5, 1.0);
        float alpha = 1. - attenuation;
        float weight = alpha * transmittance;
        color   += weight * emission;
        color.r += weight * e*e * 0.1;
        color += (d &amp;gt; 0.0 ? 0.0 : 0.005*h) * vec3(3,1,0);
        if (!solid) transmittance -= weight;
        t += min(h*0.2, step_len);
        step_len *= 1.015;
    }

    O = vec4(tanh(color), 1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we are going to make the following changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rename every variable to single letter or inline them whenever possible&lt;/li&gt;
&lt;li&gt;Inline all constants&lt;/li&gt;
&lt;li&gt;Remove &lt;a href=&quot;https://registry.khronos.org/webgl/specs/latest/1.0/#6.39&quot;&gt;any explicit zero initialization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;float&lt;/code&gt; instead of &lt;code&gt;int&lt;/code&gt; for the iterator and instead of &lt;code&gt;bool&lt;/code&gt; for the solid flag&lt;/li&gt;
&lt;li&gt;Pack all &lt;code&gt;float&lt;/code&gt; and &lt;code&gt;vec3&lt;/code&gt; declarations together&lt;/li&gt;
&lt;li&gt;Simplify numbers: &lt;code&gt;1e2&lt;/code&gt; instead of &lt;code&gt;100.0&lt;/code&gt;, &lt;code&gt;3.&lt;/code&gt; instead of &lt;code&gt;3.0&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vec*()&lt;/code&gt; constructors act like casts, so you can pass down integers&lt;/li&gt;
&lt;li&gt;Instead of &lt;code&gt;*x&lt;/code&gt;, &lt;code&gt;/(1/x)&lt;/code&gt; is sometimes shorter (for example &lt;code&gt;/.4&lt;/code&gt; instead
of &lt;code&gt;*2.5&lt;/code&gt;) (thanks &lt;a href=&quot;https://www.shadertoy.com/user/coyote&quot;&gt;coyote&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// 491 chars
void main() {
    vec3 c, p;
    for (float i, a, g=1., t, h, d, w, k=.15, x, e; i &amp;lt; 1e2; i++) {
        p = normalize(vec3(P+P-R,R.y))*t;
        p.xz *= mat2(cos(sin(T*.2)+vec4(0,11,33,0)));
        p.z += T*.3;
        d = p.y;
        h = p.y+.5;
        for (a=.01; a&amp;lt;1.; a+=a)
            p.xz *= mat2(8,6,-6,8)*.1,
            d += abs(dot(sin(p*.3/a + T*.3), vec3(1)))*a,
            h += abs(dot(sin(p.xz*.6/a), vec2(1)))*a;
        d = max(max(d-3., -d), 0.);
        x = h &amp;lt; .001 ? 0. : 1.;
        d *= k;
        e = min(p.y+2., 1.);
        w = g * (1. - exp(x==0. ? -h*3e2 : -d/.4));
        c += w * x*d*4.5;
        c.r += w * e*e * .1;
        c += (d &amp;gt; 0. ? .0 : h/2e2) * vec3(3,1,0);
        g -= w * x;
        t += min(h*.2, k);
        k /= .985;
    }
    O = vec4(tanh(c), 1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Last pass of tricks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Merge and unroll more expressions together&lt;/li&gt;
&lt;li&gt;Use alternative forms for &lt;code&gt;vec*(1)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Rely on mathematical equivalences such as &lt;span class=&quot;math inline&quot;&gt;e^{-x}=1/e^x&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;Some symbol names can be reused (see &lt;code&gt;a&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Notice how the rotation matrix coefficients (&lt;code&gt;0,11,33,0&lt;/code&gt;) are close to the
red factors (&lt;code&gt;3,1,0&lt;/code&gt;)? That&#x27;s right, we can factor that out into a shared
constant &lt;code&gt;K&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Iterate &lt;code&gt;i&lt;/code&gt; within the condition&lt;/li&gt;
&lt;li&gt;We&#x27;re going to inline &lt;code&gt;k*=1.015&lt;/code&gt; inside the &lt;code&gt;min()&lt;/code&gt;: this is &lt;em&gt;not&lt;/em&gt; equivalent,
but in practice it makes no difference&lt;/li&gt;
&lt;li&gt;The first 5 instructions of the main loop go into the initialization
placeholder of the inner &lt;code&gt;for&lt;/code&gt;, and all the others go into the iteration
placeholder of the outer &lt;code&gt;for&lt;/code&gt;, so that we can remove all &lt;code&gt;{}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Declare a &lt;code&gt;z&lt;/code&gt; to be used instead of &lt;code&gt;0.&lt;/code&gt; since we have a bunch of them (thanks
&lt;a href=&quot;https://www.shadertoy.com/user/coyote&quot;&gt;coyote&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;x = h &amp;lt; .001 ? 0. : 1.&lt;/code&gt; can also be obtained progressively through some
increment trick (thanks &lt;a href=&quot;https://www.shadertoy.com/user/coyote&quot;&gt;coyote&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
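Two of these equivalences are easy to double-check outside the shader: the identity exp(-x) = 1/exp(x) is what lets g*(1-exp(-x)) become g - g/exp(x), and dividing by .4 is the same as multiplying by 2.5. In Python (just the math, with arbitrary test values):

```python
import math

def weight_long(g, x):
    # readable form of the attenuation weight
    return g * (1.0 - math.exp(-x))

def weight_golfed(g, x):
    # golfed form, relying on exp(-x) = 1/exp(x)
    return g - g / math.exp(x)

cases = [(1.0, 0.3), (0.8, 2.0), (0.25, 0.01)]
```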
&lt;p&gt;I&#x27;m also reordering some instructions a bit for clarity 🙃&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// 448 chars
void main() {
    vec3 c,p,K=vec3(3,1,0);
    for(float z,i,a,g=1.,t,h,d,w,k=.15; i++&amp;lt;1e2;
        d = max(max(d-3.,-d),a=z)*k,
        w = g-g/exp(h&amp;gt;.001?a++,d/.4:h*3e2),
        g -= a*=w,
        c += a*d*4.5+(d&amp;gt;z?z:h/2e2)*K,
        a = min(p.y+2.,1.),
        c.r += w*a*a*.1,
        t += min(h*.2,k/=.985))
        for(p=normalize(vec3(P+P-R,R.y))*t,
            p.xz*=mat2(cos(sin(T*.2)+K.zyxz*11.)),
            p.z+=T*.3,
            d=p.y,h=d+.5,a=.01;a&amp;lt;1.;a+=a)
            p.xz *= mat2(8,6,-6,8)*.1,
            d += abs(dot(sin((p/a+T)*.3),p-p+a)),
            h += abs(dot(sin(p.xz*.6/a),P-P+a));
    O = vec4(tanh(c),1);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here we are. All we have to do now is remove all unnecessary spaces and
line breaks to obtain the final version. I&#x27;ll leave you here with this readable
version.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;http://blog.pkh.me/img/demomaking/courtney-cook-FALrwN_MpeE-unsplash.jpg&quot; alt=&quot;&quot;&gt;
    &lt;figcaption&gt;Golfer by Courtney Cook (Unsplash)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Afterword&lt;/h2&gt;
&lt;p&gt;I&#x27;m definitely breaking the magic of that artwork by explaining everything
in detail here. But hopefully it is replaced with an appreciation for how many
concepts, and how much math and art, can be packed into so little space. Maybe
this is possible because they fundamentally overlap?&lt;/p&gt;
&lt;p&gt;Nevertheless, writing such a piece was extremely refreshing and liberating. As
developers, we&#x27;re so used to navigating through mountains of abstractions, dealing
with interoperability issues, and pissing glue code like robots. Here, even
though GLSL is a very crude language, I can&#x27;t help but be in awe of how much
beauty we can produce with a standalone shader. It&#x27;s just... pure code and math,
and I just love it.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/44-perfecting-anti-aliasing-on-signed-distance-functions.html</guid>
 <link>http://blog.pkh.me/p/44-perfecting-anti-aliasing-on-signed-distance-functions.html</link>
 <title>Perfecting anti-aliasing on signed distance functions</title>
 <pubDate>Sat, 26 Jul 2025 14:29:32 -0000</pubDate>
 <description>&lt;p&gt;Doing anti-aliasing on &lt;a href=&quot;https://en.wikipedia.org/wiki/Signed_distance_function&quot;&gt;SDFs&lt;/a&gt; is not as straightforward as it seems. Most of the
time, we see people use a &lt;code&gt;smoothstep&lt;/code&gt; with hardcoded constants, sometimes with
screen-space information, sometimes with cryptic or convoluted formulas. Even though SDFs
have the perfect mathematical properties needed for clean anti-aliasing, the
whole issue has a larger scope than it appears at first glance. And even when
trivial solutions exist, it&#x27;s not always clear why they are a good fit. Let&#x27;s
study that together.&lt;/p&gt;
&lt;h2&gt;SDF&lt;/h2&gt;
&lt;p&gt;The article assumes that you are at least a bit familiar with what an SDF is,
but if I had to provide a quick and informal definition, I would say something
like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;It&#x27;s a function (or lookup-table of said function, usually stored in a
texture) which returns the signed distance from the specified coordinates to
a given shape, where the sign indicates whether you&#x27;re inside or outside the
shape.&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A common visualization of it looks like this:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-debug.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;SDF of a moving pie/pacman, using Inigo Quilez formula and colorscheme for visualization&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The distance is fancily colored here for illustrative purposes, and the shape is
animated to show how it affects the field.&lt;/p&gt;
&lt;p&gt;Another way of seeing it is to switch to a 3D view:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;480&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-3d.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;SDF of a moving pie/pacman, as seen in 3D&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For the sign interpretation, here we&#x27;re using the convention &lt;strong&gt;positive
inside and negative outside&lt;/strong&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/Signed_distance_function#/media/File:Signed_distance1.png&quot;&gt;as seen for example on the Wikipedia
illustration&lt;/a&gt;. But this is not always the case, for example, Inigo
prefers the opposite: &lt;strong&gt;negative inside and positive outside&lt;/strong&gt;. I personally
find the Wikipedia convention more intuitive and easier to work with, but
that&#x27;s a matter of preference, so we&#x27;ll figure out the formulas for both models.
Switching from one to the other is just a sign swap, but it&#x27;s important to know
which one we are working with.&lt;/p&gt;
&lt;h2&gt;Linear ramp&lt;/h2&gt;
&lt;p&gt;A properly crafted SDF has a gradient of length 1, meaning the slope is either
going up or down, but always at the same constant rate of 1:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/anti-aliasing/sdf-grad1.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;1D side cut of an SDF depicting the gradient/slope&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is an important property since anti-aliasing is all about transitioning
smoothly toward (or away from) the shape. For our first attempt at anti-aliasing
we will simply follow that ramp and make a straight transition.&lt;/p&gt;
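The unit-gradient property is easy to verify numerically. A small Python sketch using a circle SDF (positive inside, as in this article) and central finite differences, purely to illustrate that the slope has length 1 everywhere away from the center:

```python
import math

def circle_sdf(x, y, radius=1.0):
    # signed distance to a circle: positive inside, negative outside
    return radius - math.hypot(x, y)

def gradient_length(x, y, eps=1e-5):
    # central finite differences approximate the gradient
    gx = (circle_sdf(x + eps, y) - circle_sdf(x - eps, y)) / (2.0 * eps)
    gy = (circle_sdf(x, y + eps) - circle_sdf(x, y - eps)) / (2.0 * eps)
    return math.hypot(gx, gy)
```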
&lt;p&gt;Once again, we are going to rely on &lt;code&gt;linear&lt;/code&gt;, one of &lt;a href=&quot;http://blog.pkh.me/p/29-the-most-useful-math-formulas.html&quot;&gt;the most useful math
formulas&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float linear(float a, float b, float x) { return (x-a)/(b-a); }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And more specifically we will need its saturated version &lt;code&gt;linearstep&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float linearstep(float a, float b, float x) { return clamp(linear(a,b,x), 0.0, 1.0); }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the same as the well-known &lt;code&gt;smoothstep&lt;/code&gt;, except it&#x27;s a straight line
when transitioning from &lt;code&gt;a&lt;/code&gt; to &lt;code&gt;b&lt;/code&gt;.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/anti-aliasing/linearstep.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;linearstep function&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The length of our ramp (the transition zone between &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;) is going to be
arbitrary at first; we will call it &lt;code&gt;w&lt;/code&gt; (for &amp;quot;width&amp;quot;). It&#x27;s our diffuse, or blur,
parameter if you prefer. The height &lt;code&gt;h&lt;/code&gt; we are looking for corresponds to the
opacity of our shape.&lt;/p&gt;
&lt;p&gt;Given a positive inside and negative outside SDF, we will start with the
transition centered around the boundary between the shape and its outside.&lt;/p&gt;
&lt;p&gt;You might be confused about the relationship between the distance and the
transition zone (diffuse width &lt;code&gt;w&lt;/code&gt;). The following diagram may help clarify
why:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/anti-aliasing/w-vs-d.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;The relationship between the diffuse width (w) and the signed distance (d)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Remember, the gradient of an SDF is supposed to have a length of 1. This means
there is a direct match between the height of the signed distance (y-axis on the
figure) and the spatial distance traveled (x-axis on the figure).&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;This is why Inigo and other folks spend a lot of energy looking for
the perfect formula for &lt;a href=&quot;https://iquilezles.org/articles/ellipsedist/&quot;&gt;the distance to an ellipse&lt;/a&gt;. We cannot
just stretch a circle, as it would distort the SDF and thus break this
important property. A broken AA would be one of the consequences.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The previous figure shows that for a centered transition, when the distance
&lt;code&gt;d&lt;/code&gt; is within &lt;code&gt;[-w/2,w/2]&lt;/code&gt;, it represents a transition width of size &lt;code&gt;w&lt;/code&gt; around
the edge, so we want it to be mapped to an opacity within &lt;code&gt;[0,1]&lt;/code&gt;. This can be
expressed with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float h = linearstep(-w/2.0, w/2.0, d);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which can be unrolled and simplified into the following tiny form:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float h = clamp(0.5 + d/w, 0.0, 1.0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can also decide to make the transition on the outside or the inside boundary
of the shape:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float h_in  = linearstep(0.0, w, d);  // or simply clamp(    d/w, 0.0, 1.0);
float h_out = linearstep(-w, 0.0, d); // or simply clamp(1.0+d/w, 0.0, 1.0);
&lt;/code&gt;&lt;/pre&gt;
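These unrolled forms can be checked against the full linearstep in a few lines of Python (a direct port of the GLSL above, not new shader code):

```python
def clamp01(x):
    return min(max(x, 0.0), 1.0)

def linearstep(a, b, x):
    return clamp01((x - a) / (b - a))

def h_centered(d, w):
    # linearstep(-w/2, w/2, d) unrolled
    return clamp01(0.5 + d / w)

def h_inside(d, w):
    # linearstep(0, w, d) unrolled
    return clamp01(d / w)

def h_outside(d, w):
    # linearstep(-w, 0, d) unrolled
    return clamp01(1.0 + d / w)
```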
&lt;p&gt;And we can get creative and have a cursor indicating where we are on the border.
If we set &lt;code&gt;k=0&lt;/code&gt; for inside, &lt;code&gt;k=0.5&lt;/code&gt; for centered, and &lt;code&gt;k=1&lt;/code&gt; for outside:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float h = clamp(k + d/w, 0.0, 1.0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the negative inside and positive outside SDF, we simply swap the sign:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float h = clamp(k - d/w, 0.0, 1.0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any of these one-liners is all we need to have AA for our shape, but the
question of what value to use for the ramp width &lt;code&gt;w&lt;/code&gt; arises.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-blurry.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;&quot;anti-aliasing&quot; with a width w oscillating within [0.1,0.3]&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Pixel size&lt;/h2&gt;
&lt;p&gt;The difference between a blur and anti-aliasing is simply the width value. With
AA, it&#x27;s the size of a &amp;quot;pixel&amp;quot;, and with a blur it&#x27;s typically a user input or
an arbitrarily large value.&lt;/p&gt;
&lt;p&gt;If we are in 2D and have access to the pixel resolution, we can use it to get
the pixel size. Note that this is closely tied to the coordinate space we use to
calculate the SDF.&lt;/p&gt;
&lt;p&gt;For example, let&#x27;s say we have a canvas whose aspect ratio we don&#x27;t know;
we can calculate the screen coordinates like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 p = (2.0*gl_FragCoord.xy - resolution) / min(resolution.x, resolution.y);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will give us a &lt;code&gt;p&lt;/code&gt; value within &lt;code&gt;[-1,1]&lt;/code&gt; on the shortest axis (the y-axis
in landscape mode) while preserving the aspect ratio (units stay square). That
means the shortest axis has an amplitude of &lt;code&gt;2&lt;/code&gt;, so the number of pixels on that
axis corresponds to 2 units. As a result, the unit width used for the
signed distance can be obtained with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float w = 2.0 / min(resolution.x, resolution.y);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Remember that this is true &lt;em&gt;only&lt;/em&gt; if the position &lt;code&gt;p&lt;/code&gt; we use for the SDF is
in that range. Basically, we have to adjust this formula to the coordinate space
we are using.&lt;/p&gt;
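&lt;p&gt;For instance, if we were to scale the coordinates by some zoom factor (here a
hypothetical &lt;code&gt;u_zoom&lt;/code&gt; uniform, purely for illustration), the pixel width would
have to be scaled by the same amount:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;uniform float u_zoom; // hypothetical zoom factor

vec2 p = u_zoom * (2.0*gl_FragCoord.xy - resolution) / min(resolution.x, resolution.y);
float w = u_zoom * 2.0 / min(resolution.x, resolution.y); // 1 pixel in the scaled space
&lt;/code&gt;&lt;/pre&gt;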
&lt;figure&gt;
  &lt;canvas width=&quot;480&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-exact.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;anti-aliasing with a width w of 1 pixel&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The same with a x10 resolution to better see the AA:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;48&quot; height=&quot;32&quot; style=&quot;width:480px; height:320px; image-rendering:pixelated&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-exact.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;anti-aliasing with a width w of 1 pixel (resolution x10)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;3D and numerical derivatives&lt;/h2&gt;
&lt;p&gt;But sometimes we might not have access to the resolution, or we may want to map
that 2D SDF onto a plane in 3D or through some other transformation: for example, a
decal or some text on a wall in a video game. In that latter case, if we were to
use the screen resolution, it would lead to inconsistent anti-aliasing:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;120&quot; height=&quot;80&quot; style=&quot;width:480px; height:320px; image-rendering:pixelated&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-decal.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;An SDF shape viewed from above and in perspective, resolution x4&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We can see that, when put in perspective, the edge in the back gets way too
sharp while the edge in the front becomes a bit too blurry. Fortunately, there
is a magic trick we can use, the numerical derivatives:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float w = fwidth(d);
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;120&quot; height=&quot;80&quot; style=&quot;width:480px; height:320px; image-rendering:pixelated&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-decal-fwidth.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;An SDF shape viewed from above and in perspective, using w=fwidth(d), resolution x4&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Now we magically have a smooth pixel-wise anti-aliasing, no matter the
perspective. What is this sorcery? 🧙&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fwidth&lt;/code&gt; calculates the rate of change of a given variable using fragment-based
numerical derivatives. Mathematically it is an L1-norm (also known as the Taxicab
or Manhattan norm), defined as &lt;code&gt;abs(dFdx(x)) + abs(dFdy(x))&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;But how the hell is this providing a good pixel width estimate?&lt;/p&gt;
&lt;p&gt;Let&#x27;s look at the simple case where we want to observe the rate of change of one
variable across one axis. For example, &lt;code&gt;dFdx(px)&lt;/code&gt; where &lt;code&gt;px&lt;/code&gt; is the pixel
coordinate: &lt;code&gt;float px = gl_FragCoord.x&lt;/code&gt;. We will have &lt;code&gt;dFdx(px)=1&lt;/code&gt;. Why? Because
&lt;code&gt;px&lt;/code&gt; changes at a constant rate of 1 (exactly like our SDF) from one pixel to
the next. If we remap &lt;code&gt;px&lt;/code&gt; to a value within &lt;code&gt;[-1,1]&lt;/code&gt; using &lt;code&gt;p=px/W*2.0-1.0&lt;/code&gt;
(where &lt;code&gt;W&lt;/code&gt; is the number of pixels on the x-axis), we can follow the derivation
rules and end up with &lt;code&gt;dFdx(p)=2.0/W&lt;/code&gt;. This matches the pixel size
&lt;code&gt;w&lt;/code&gt; we computed in the previous section.&lt;/p&gt;
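&lt;p&gt;Spelled out as code, the chain rule step looks like this (with &lt;code&gt;W&lt;/code&gt; the number
of pixels on the x-axis):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float px = gl_FragCoord.x;  // dFdx(px) = 1
float p  = px/W*2.0 - 1.0;  // dFdx(p)  = dFdx(px)*2.0/W = 2.0/W
&lt;/code&gt;&lt;/pre&gt;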
&lt;p&gt;Now when 3D and perspective distortions are involved, this still holds and you
may be wondering why. The intuitive answer is that &lt;code&gt;fwidth(d)&lt;/code&gt; is the rate of
change of the signed distance &lt;strong&gt;as seen from the flat pixel screen
perspective&lt;/strong&gt;. In 3D view, in the back of the shape, the distance &lt;code&gt;d&lt;/code&gt; changes
sharply from one pixel to another (meaning &lt;code&gt;fwidth(d)&lt;/code&gt; will be high), while in
the front it&#x27;s way smoother (meaning &lt;code&gt;fwidth(d)&lt;/code&gt; will be low). So this numerical
derivative is used to scale the distance back to a transition that works
smoothly from the 2D pixel point of view.&lt;/p&gt;
&lt;h3&gt;Numerical derivatives refinement&lt;/h3&gt;
&lt;p&gt;Instead of &lt;code&gt;fwidth&lt;/code&gt;, we could also use the L2-norm (also known as euclidean
distance):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float w = length(vec2(dFdx(d), dFdy(d)));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is more expensive than &lt;code&gt;fwidth&lt;/code&gt;, but it can be considered as an
alternative. The AA will be slightly different, but it&#x27;s hard to say that one
really is better than the other:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;80&quot; height=&quot;40&quot; style=&quot;width:640px; height:320px; image-rendering:pixelated&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/norm1-vs-norm2.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;L1-norm (left) vs L2-norm (right) for pixel estimate, resolution x8&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Straight vs smooth(er) ramp&lt;/h2&gt;
&lt;p&gt;Instead of a &lt;code&gt;linearstep()&lt;/code&gt;, some people like to use &lt;code&gt;smoothstep()&lt;/code&gt;. The main
reason is probably that &lt;code&gt;smoothstep()&lt;/code&gt; is a builtin while
&lt;code&gt;linearstep()&lt;/code&gt; isn&#x27;t. But is it a better choice?&lt;/p&gt;
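&lt;p&gt;For reference, since it&#x27;s not a builtin, &lt;code&gt;linearstep()&lt;/code&gt; is commonly defined
with the same signature as &lt;code&gt;smoothstep()&lt;/code&gt;, minus the Hermite curve:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float linearstep(float a, float b, float x) {
    return clamp((x - a) / (b - a), 0.0, 1.0);
}
&lt;/code&gt;&lt;/pre&gt;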
&lt;p&gt;Intuitively, to me at least, it makes perfect sense for the alpha value to
follow a linear ramp. A few weeks ago I would have adamantly argued that it&#x27;s
actually a faster and more logical choice than its curved version &lt;code&gt;smoothstep&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Well... I did some tests. With a large diffuse, here is what it looks like with
a linear ramp:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-linear.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;A blurry shape using a linearstep transition with w=0.3&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It looks like there is a brighter highlight around the border (before the
fall-off), doesn&#x27;t it? Well, it&#x27;s an illusion, it&#x27;s just our brain noticing the
discontinuity and telling us about it.&lt;/p&gt;
&lt;p&gt;With a &lt;code&gt;smoothstep&lt;/code&gt; things get better:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-smooth.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;A blurry shape using a smoothstep transition with w=0.3&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I wasn&#x27;t expecting that, so I stand corrected: &lt;code&gt;smoothstep&lt;/code&gt; is actually a better
choice. It&#x27;s also a builtin, so we don&#x27;t need to define our own function.&lt;/p&gt;
&lt;p&gt;Of course, one may prefer an even smoother curve, for example &lt;code&gt;smootherstep()&lt;/code&gt;,
which uses a quintic curve instead of the Hermite one:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float smootherstep(float a, float b, float x) {
    float t = linearstep(a, b, x);
    return ((6.0*t-15.0)*t+10.0)*t*t*t; // quintic
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-smoother.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;A blurry shape using a smootherstep transition&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;For pixel-wise anti-aliasing, the discontinuity won&#x27;t be noticed, so using a
linear interpolation is still a perfectly valid choice.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Color space&lt;/h2&gt;
&lt;p&gt;Once we have our anti-aliasing value, we&#x27;re pretty much done. But we still have
the question of how to use it. My previous examples were in black and white, but
in many cases we need blending between colors. The question of how to blend is
probably the trickiest of all, and &lt;a href=&quot;http://blog.pkh.me/p/43-the-current-technology-is-not-ready-for-proper-blending.html&quot;&gt;my previous article was entirely dedicated
to this particular issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In all the examples on this page, I&#x27;ve been using OkLab blending because
&amp;quot;it&#x27;s perfect&amp;quot;. But reality is likely to force you to use a simple linear
blending. For anti-aliasing, that&#x27;s honestly just fine, the illusion still works
out; but if you&#x27;re doing a blur, I would advise against it: switch to a
better colorspace like OkLab whenever possible.&lt;/p&gt;
&lt;p&gt;See here how, because of the way human perception works, the linear blending
feels &amp;quot;bobby&amp;quot; and too large compared to OkLab:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;640&quot; height=&quot;320&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/anti-aliasing/sdf-linear-vs-oklab.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;A blurry shape blend using linear (left) or OkLab (right) blending&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;Given all these tools we can combine them according to our needs and
preferences. As closing words, let me propose a few reference examples:&lt;/p&gt;
&lt;h3&gt;A &amp;quot;good enough&amp;quot; centered linear ramp working in 2D or 3D&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 p = (2.0*gl_FragCoord.xy - resolution) / min(resolution.x, resolution.y);
float d = sdWP(p, ...); // signed distance, positive inside, negative outside SDF (Wikipedia style)
float h = clamp(0.5 + d/fwidth(d), 0.0, 1.0);
vec3 c = mix(c0, c1, h); // this assumes c0 and c1 colors are in linear space
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;A smooth user blur working in 2D only&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float r = min(resolution.x, resolution.y);
vec2 p = (2.0*gl_FragCoord.xy - resolution) / r;
float w = max(u_blur, 2.0/r); // blur should not be smaller than unit size
float d = sdIQ(p, ...); // signed distance, negative inside, positive outside SDF (iQuilez style)
float h = smoothstep(-w/2.0, w/2.0, -d); // smooth centered blur
vec3 c = mix(c0, c1, h); // this assumes c0 and c1 colors are in linear space
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;An outer anti-aliasing with a more refined unit width estimation&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 p = (2.0*gl_FragCoord.xy - resolution) / min(resolution.x, resolution.y);
float d = sdWP(p, ...); // signed distance, positive inside, negative outside SDF (Wikipedia style)
float w = length(vec2(dFdx(d),dFdy(d))); // L2-norm width estimation
float h = smoothstep(-w, 0.0, d); // smooth outer AA
vec3 c = mix(c0, c1, h); // this assumes c0 and c1 colors are in linear space
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anti-aliasing SDFs can be beautifully simple, and now we also have a better
understanding of the magic behind it. ✨&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/43-the-current-technology-is-not-ready-for-proper-blending.html</guid>
 <link>http://blog.pkh.me/p/43-the-current-technology-is-not-ready-for-proper-blending.html</link>
 <title>The current technology is not ready for proper blending</title>
 <pubDate>Fri, 18 Jul 2025 20:10:43 -0000</pubDate>
 <description>&lt;p&gt;The idea that we must always linearize sRGB gradients or work in a perceptually
uniform colorspace is starting to be accepted universally. But is it that
simple?&lt;/p&gt;
&lt;p&gt;When I learned about the subject, it felt like being handed a hammer and using
it everywhere. The reality is a bit more nuanced. In this article we will see
when to use which, how to use them, and we will then see why the situation is
more dire than it looks.&lt;/p&gt;
&lt;p&gt;&lt;canvas width=&quot;800&quot; height=&quot;200&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/intro.frag&quot;&gt;&lt;/canvas&gt;&lt;/p&gt;
&lt;h2&gt;Code snippets&lt;/h2&gt;
&lt;p&gt;Before we start, since we are going to use GLSL as our language, the following
are the reference functions we will rely on for the rest of the article.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 s2l(vec3 c) { // sRGB to linear
    return mix(c/12.92, pow((max(c,0.0)+0.055)/1.055,vec3(2.4)), step(vec3(0.04045),c));
}

vec3 l2s(vec3 c) { // linear to sRGB
    return mix(c*12.92, 1.055*pow(max(c,0.0),vec3(1./2.4))-0.055, step(vec3(0.0031308),c));
}

vec3 l2oklab(vec3 rgb) { // linear to OkLab
    const mat3 rgb2lms = mat3(
        +0.4122214708, +0.2119034982, +0.0883024619,
        +0.5363325363, +0.6806995451, +0.2817188376,
        +0.0514459929, +0.1073969566, +0.6299787005);
    const mat3 lms2lab = mat3(
        +0.2104542553, +1.9779984951, +0.0259040371,
        +0.7936177850, -2.4285922050, +0.7827717662,
        -0.0040720468, +0.4505937099, -0.8086757660);
    vec3 lms = rgb2lms * rgb;
    return lms2lab * pow(lms, vec3(1.0/3.0));
}

vec3 oklab2l(vec3 lab) { // OkLab to linear
    const mat3 lab2lms = mat3(
        +1.0000000000, +1.0000000000, +1.0000000000,
        +0.3963377774, -0.1055613458, -0.0894841775,
        +0.2158037573, -0.0638541728, -1.2914855480);
    const mat3 lms2rgb = mat3(
        +4.0767416621, -1.2684380046, -0.0041960863,
        -3.3077115913, +2.6097574011, -0.7034186147,
        +0.2309699292, -0.3413193965, +1.7076147010);
    vec3 lms = lab2lms * lab;
    return lms2rgb * (lms*lms*lms);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Also, the output of the pipeline will be expected to be sRGB all the time.&lt;/p&gt;
&lt;h2&gt;Color gradients&lt;/h2&gt;
&lt;p&gt;To illustrate what sRGB, linear RGB, and OkLab respectively look like, let&#x27;s
interpolate between two colors in each of them:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/gradient-color.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Color gradients from top to bottom: sRGB, linear, OkLab&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The 3 stripes were generated like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_srgb   = mix(c0, c1, v);
vec3 o_linear = l2s(mix(s2l(c0), s2l(c1), v));
vec3 o_oklab  = l2s(oklab2l(mix(l2oklab(s2l(c0)), l2oklab(s2l(c1)), v)));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where &lt;code&gt;v&lt;/code&gt; is simply the x coordinate between 0 and 1, &lt;code&gt;c0&lt;/code&gt; the left color, and
&lt;code&gt;c1&lt;/code&gt; the right one.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The input colors are assumed to be sRGB. Similarly, we always
make sure to output sRGB at the end (with &lt;code&gt;l2s()&lt;/code&gt;) because that&#x27;s what the
pipeline expects.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sRGB is not acceptable because of this grayish/brownish zone, which is
also perceived darker. In various situations this creates undesirable muddy
midtones. In general, it&#x27;s &lt;a href=&quot;https://www.youtube.com/watch?v=LKnqECcg6Gw&quot;&gt;wrong and broken&lt;/a&gt; to do that.&lt;/li&gt;
&lt;li&gt;Linear is better from a purely physical point of view as it models the
mixing of light energy properly. But from a color perception point of view
it&#x27;s not ideal, for example here it has this transition into pinkish which
might not be desirable.&lt;/li&gt;
&lt;li&gt;The last one is using &lt;a href=&quot;https://bottosson.github.io/posts/oklab/&quot;&gt;OkLab&lt;/a&gt; for a perceptually uniform gradient; it is
the one providing the best result for our human perception, at a certain
performance cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The general consensus is as follows: if you need a color transition within a
shape or texture, or some sort of color map, OkLab is the best tool, while
linear is cheap, physically correct, and usually visually acceptable.&lt;/p&gt;
&lt;h2&gt;But what about monochrome gradients?&lt;/h2&gt;
&lt;p&gt;Things are not as obvious as they seem when we work in monochrome. If
instead of red and blue we pick black and white, this is what happens:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/gradient-gray.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Grayscale gradients from top to bottom: sRGB, linear, OkLab&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Suddenly this tells a whole different story. sRGB becomes perfectly acceptable,
linear skews way too much toward lightness, and OkLab remains the best. The
linear gradient felt acceptable before, but now it is highly questionable.&lt;/p&gt;
&lt;p&gt;Just to be clear, the linear strip &lt;em&gt;is&lt;/em&gt; linear: you can see it as linear energy,
or casually speaking &amp;quot;wattage&amp;quot;, to which our perception responds non-linearly.&lt;/p&gt;
&lt;p&gt;At this point one may even argue that sRGB looks best.&lt;/p&gt;
&lt;img src=&quot;http://blog.pkh.me/img/gradient-blending/srgb-linear-iq-meme.jpg&quot; alt=&quot;sRGB vs linear IQ meme&quot;&gt;
&lt;p&gt;So what can we do about this?&lt;/p&gt;
&lt;p&gt;First of all, we always need to question what we are trying to achieve, and
fortunately sometimes we can take a few shortcuts. For example, let&#x27;s say we
want to depict a heat map in black and white. In &lt;a href=&quot;http://blog.pkh.me/p/42-sharing-everything-i-could-understand-about-gradient-noise.html&quot;&gt;my previous article&lt;/a&gt;
I had to display 2D noise, so I wanted the observer to experience a linear
perception of the &amp;quot;height&amp;quot; of the noise. In this case, working in sRGB (that is,
doing zero effort with regards to perception) is actually a better call than
mixing between black and white in linear space:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/noise2-linear-srgb.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Noise 2D with height as sRGB (left) or linear (right)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here we are comparing these two:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_srgb   = vec3(v);      // equivalent to mix(black, white, v)
vec3 o_linear = l2s(vec3(v)); // equivalent to l2s(mix(black, white, v))
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;We removed the &lt;code&gt;mix&lt;/code&gt; from the formulas because &lt;code&gt;black=vec3(0)&lt;/code&gt; and
&lt;code&gt;white=vec3(1)&lt;/code&gt; keep the same values when uncompressed to linear
space.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To do things right we may want to use OkLab, but that feels overkill since this
is just a straightforward monochromatic signal. Fortunately, the perceptual
lightness is fairly simple to model: with monochromatic input, OkLab uses
&lt;code&gt;L=x³&lt;/code&gt;, which is basically equivalent to a gamma correction with &lt;code&gt;γ=3&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This means that we can simplify the OkLab interpolation we used before to the
very simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_oklab = l2s(vec3(v*v*v));  // equivalent to l2s(oklab2l(vec3(v,0,0)))
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/noise2-oklab.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Noise 2D with height remapped to human lightness perception&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Doing this simple operation is exactly equivalent to interpolating between black
and white in OkLab space, except it&#x27;s just 2 extra multiplications.&lt;/p&gt;
&lt;p&gt;We still need to be extra careful if we want to swap the black and white. &lt;code&gt;v&lt;/code&gt;
needs to be swapped &lt;em&gt;before&lt;/em&gt; the gamma encoding, and that means before the sRGB
gamma encoding as well:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/gradient-w2b.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Top to bottom: srgb(1-v³) (incorrect), 1-srgb(v³) (incorrect), srgb((1-v)³)&lt;/figcaption&gt;
&lt;/figure&gt;
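&lt;p&gt;The three variants from the figure can be written as:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_bad1 = l2s(vec3(1.0 - v*v*v));        // srgb(1-v³): incorrect
vec3 o_bad2 = 1.0 - l2s(vec3(v*v*v));        // 1-srgb(v³): incorrect
vec3 o_good = l2s(vec3(pow(1.0 - v, 3.0)));  // srgb((1-v)³): correct
&lt;/code&gt;&lt;/pre&gt;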
&lt;h3&gt;One extra trick: combining gammas&lt;/h3&gt;
&lt;p&gt;sRGB has a curve that closely approximates a gamma correction &lt;code&gt;γ=2.2&lt;/code&gt;. So
sometimes, instead of using &lt;code&gt;l2s(rgb)&lt;/code&gt;, we may prefer to use the simpler
&lt;code&gt;pow(rgb,vec3(1.0/2.2))&lt;/code&gt;. It means we could replace &lt;code&gt;l2s(vec3(v*v*v))&lt;/code&gt; with the
following to merge the two operations:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_oklab = vec3(pow(v, 3.0/2.2)); // combination of v³ and gamma 2.2 (sRGB-like) encoding
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the white-to-black version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 o_oklab = vec3(pow(1.0-v, 3.0/2.2));
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;Whenever you use &lt;code&gt;pow&lt;/code&gt;, make sure your input is positive. Adding a
&lt;code&gt;max(v,0.0)&lt;/code&gt; for safety might be reasonable in certain cases.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The difference between a proper sRGB conversion and the combined gamma is pretty
small:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;160&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/gradient-srgb-vs-pow.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Top: srgb(v³), bottom: v^(3/2.2)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Alpha blending and pre-multiplication&lt;/h2&gt;
&lt;p&gt;Sometimes, instead of fading colors into each other, we need to compose shapes,
textures, masks, etc. This need for compositing, or blending, arises when the
pipelines are separated, meaning we are not doing everything in the same
fragment shader. For example, we could have a shape generated in one fragment
shader, which we need to overlay onto a surface. That shape might have some
non-binary transparency, whether for anti-aliasing purposes, a blur, or similar.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;380&quot; height=&quot;240&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/shape.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;An example of a partially transparent shape&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If that shape were to be blended onto another colored surface, we would like to
get the same effect as the gradients earlier. For &lt;a href=&quot;https://www.realtimerendering.com/blog/gpus-prefer-premultiplication/&quot;&gt;well-known reasons&lt;/a&gt;,
it is likely that this shape would end up as a pre-multiplied color, which
would be blended onto one or more layers. If what I just said is confusing, I
recommend checking out this &lt;a href=&quot;https://ciechanow.ski/alpha-compositing/&quot;&gt;good article on alpha compositing from Bartosz
Ciechanowski&lt;/a&gt;. The literature on the subject is quite extensive, so I will
assume familiarity with it.&lt;/p&gt;
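&lt;p&gt;As a quick refresher, the standard &amp;quot;over&amp;quot; operator on pre-multiplied colors
is a one-liner:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// src and dst carry pre-multiplied colors: (rgb*a, a)
vec4 over(vec4 src, vec4 dst) {
    return src + (1.0 - src.a) * dst;
}
&lt;/code&gt;&lt;/pre&gt;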
&lt;p&gt;Of course, if we are to do things right, the blending has to happen in
linear space. Do not consider sRGB for alpha blending: it&#x27;s an even
more terrible idea than before because of the bilinear filtering, transforms, or
mipmapping that can happen between the pre-multiplication and the blending itself.&lt;/p&gt;
&lt;p&gt;But that means we would end up with the linear gradient shortcomings from
earlier, wouldn&#x27;t we? And this is where things get ugly.&lt;/p&gt;
&lt;p&gt;Look at the difference between a linear and an OkLab blending, in black and
white:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;500&quot; height=&quot;250&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/blend-lin-vs-ok-wob.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Blending of a blurry white circle onto black, left is linear, right is OkLab&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If we invert the colors:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;500&quot; height=&quot;250&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/gradient-blending/blend-lin-vs-ok-bow.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Blending of a blurry black circle onto white, left is linear, right is OkLab&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;We have the exact same problem as earlier, but seeing it with an actual
blending of shapes makes the problem particularly striking. The white and black
OkLab circles look the same size (because they are), and they don&#x27;t have the
unfortunate &amp;quot;bobbing&amp;quot; effect of the linear version (on the white onto black).&lt;/p&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;The OkLab blending is done with pre-multiplied Lab colors. It is important
not to pre-multiply linear values which are then converted to OkLab, this
will give very unexpected results.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The problem is, it is very unlikely that your whole graphics pipeline would
switch to OkLab for every texture and buffer. And since most of the time
pipelines are built for more than just black and white, the cube hack suggested
earlier has a very limited scope. In the case of shape blending, it is almost
certain that the whole pipeline would not be contained in a single shader where
you can just mix in OkLab. You&#x27;re probably thinking of using sRGB, but in a
blending pipeline that really is a terrible idea.&lt;/p&gt;
&lt;h2&gt;Final words&lt;/h2&gt;
&lt;p&gt;In practice, neither sRGB nor pure linear blending give good results, and using
OkLab is not always an option. And unfortunately, I don&#x27;t have a good answer to
this whole situation. My next article is about anti-aliasing where this problem
also exists, and I must admit this whole ordeal puts me in quite some distress;
I had to talk about this issue first.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/42-sharing-everything-i-could-understand-about-gradient-noise.html</guid>
 <link>http://blog.pkh.me/p/42-sharing-everything-i-could-understand-about-gradient-noise.html</link>
 <title>Sharing everything I could understand about gradient noise</title>
 <pubDate>Fri, 06 Jun 2025 14:45:38 -0000</pubDate>
 <description>&lt;p&gt;You&#x27;ve most likely heard about &lt;strong&gt;gradient noise&lt;/strong&gt; through the name &lt;em&gt;Perlin
noise&lt;/em&gt;, which refers to one particular implementation with various CPU
optimizations. Because it&#x27;s an incredible tool for creative work, it&#x27;s used
virtually everywhere: visual effects, video games, procedural mathematical art,
etc. While getting it right can sometimes be subtle, a &amp;quot;broken&amp;quot; implementation
can still look good or interesting. After all, &amp;quot;it looks fine, and I&#x27;m an
artist&amp;quot;.&lt;/p&gt;
&lt;p&gt;In order to gain a deeper and more meaningful understanding we will start
studying the 1D version (a case often omitted in the literature), then slowly
climb our way up in dimensions and complexity. We&#x27;ll also work from a GPU
perspective rather than a CPU-based one, hence all code snippets and visuals
here are implemented in WebGL2/GLSL (hopefully without being too heavy on
performance). They should run on most modern devices; let me know if you run
into issues.&lt;/p&gt;
&lt;p&gt;Before we begin, credit where it&#x27;s due: most of the material here is nothing
new. This article is the result of weeks of studying and experimenting with
the maths from &lt;a href=&quot;https://iquilezles.org/articles/&quot;&gt;Inigo Quilez&#x27;s incredible pages&lt;/a&gt; and other resources scattered
over the Internet. But as rich and valuable as these resources are, they
sometimes move quickly over the details, assuming they&#x27;re obvious. This post is
an attempt to fill those gaps.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;140&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/intro.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;A welcoming wavy 1D gradient noise signal&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Hashing function and pseudo-random values&lt;/h2&gt;
&lt;p&gt;At the most elementary level, we need a deterministic coordinate based
pseudo-random system. More specifically, for any given integer coordinate we
need a random value, and as uniformly distributed as possible. Something like:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
h(-3) &amp;amp;= -0.006124 \\
h(-2) &amp;amp;= -0.996686 \\
h(-1) &amp;amp;= 0.200864 \\
h(0) &amp;amp;= -1.000000 \\
h(1) &amp;amp;= 0.053313 \\
h(2) &amp;amp;= -0.893312 \\
h(3) &amp;amp;= 0.854923 \\
\text{...}
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Perlin&#x27;s implementation relies on a permutation table, which is convenient when
working on the CPU, but more awkward for a shader. On the GPU, most people rely
on various floating point hacks or sub-optimal bit tricks, which often fall
short when exploring the full 32-bit range of inputs.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;We can not use a PRNG because it relies on a state, whereas we need determinism
(for one coordinate, we always want the same random value). LCGs (Linear
Congruential Generators) are a sub-class of PRNGs where the state is
actually the returned value. But even then, we can not re-use the previously
returned value since we need seeking, that is, the ability to get the random
value associated with our integer coordinate. Feeding a PRNG/LCG with
our coordinate instead of the expected state would be equivalent to changing
the seed at every call, and this may create an important bias in the random
distribution. This is why we need integer hashing instead.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;So we need a hashing function, and we will have to limit ourselves to
32-bit because we&#x27;re on the GPU. Fortunately, we won&#x27;t have to look for
very long because in 2018 &lt;a href=&quot;https://nullprogram.com/blog/2018/07/31/&quot;&gt;Chris Wellons found a pretty good one he named
lowbias32&lt;/a&gt;, which was later &lt;a href=&quot;https://github.com/skeeto/hash-prospector/issues/19#issuecomment-1120105785&quot;&gt;refined by TheIronBorn&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is the first building block we will be using, the hashing function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;uint hash(uint x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadU;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97U;
    return x ^ (x &amp;gt;&amp;gt; 15);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is sweet and all, but a 32-bit unsigned integer is not directly useful by
itself; we need a normalized float. We could naively divide by &lt;code&gt;0xffffffff&lt;/code&gt;,
but we&#x27;d end up with a nonuniform distribution (not as important as it may
sound, to be honest). Instead, we will adapt the &lt;a href=&quot;https://prng.di.unimi.it/&quot;&gt;technique presented by
Sebastiano Vigna for doubles&lt;/a&gt; to floats, which in GLSL reads:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float u2f(uint x) { return float(x &amp;gt;&amp;gt; 8U) * uintBitsToFloat(0x33800000U); }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Combining these two functions into &lt;code&gt;u2f(hash(x))&lt;/code&gt; maps any 32-bit
coordinate &lt;span class=&quot;math inline&quot;&gt;x&lt;/span&gt; to a random float in &lt;span class=&quot;math inline&quot;&gt;[0,1)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;To match our &lt;span class=&quot;math inline&quot;&gt;h(x)&lt;/span&gt; function from earlier and make it more &amp;quot;signal-like&amp;quot;, we can
optionally center it around &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt; by remapping it from &lt;span class=&quot;math inline&quot;&gt;[0,1)&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;[-1,1)&lt;/span&gt;:
&lt;code&gt;h(x)=u2f(hash(x))*2.0-1.0&lt;/code&gt;.&lt;/p&gt;
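&lt;p&gt;As a sanity check, here is a hypothetical CPU-side C port of these two
building blocks (not shader code; it assumes IEEE-754 floats, with the hex
literal &lt;code&gt;0x1p-24f&lt;/code&gt; standing in for &lt;code&gt;uintBitsToFloat(0x33800000U)&lt;/code&gt;, both
being &lt;span class=&quot;math inline&quot;&gt;2^{-24}&lt;/span&gt;), verifying the claimed output ranges:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

static uint32_t hash(uint32_t x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadu;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97u;
    return x ^ (x &amp;gt;&amp;gt; 15);
}

static float u2f(uint32_t x) { return (float)(x &amp;gt;&amp;gt; 8) * 0x1p-24f; }

static float h(int32_t x) { return u2f(hash((uint32_t)x)) * 2.0f - 1.0f; }

int main(void) {
    for (int32_t x = -1000; x &amp;lt; 1000; x++) {
        float v = u2f(hash((uint32_t)x));
        assert(v &amp;gt;= 0.0f &amp;amp;&amp;amp; v &amp;lt; 1.0f);        /* u2f(hash(x)) lands in [0,1) */
        assert(h(x) &amp;gt;= -1.0f &amp;amp;&amp;amp; h(x) &amp;lt; 1.0f); /* centered version in [-1,1) */
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;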
&lt;h2&gt;Expanding the hash function to more dimensions&lt;/h2&gt;
&lt;p&gt;When we work in 2 or more dimensions, our input grid coordinates will be
more than one value, but we need a way to feed them to our single-parameter hash
function. One trick is to use a nested xor hash:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;uint hash(uvec2 x) { return hash(x.x ^ hash(x.y)); }  // for 2D input
uint hash(uvec3 x) { return hash(x.x ^ hash(x.yz)); } // for 3D input
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the pixel coordinates as input, we can test the hashing function in 2D:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float stepnoise2(vec2 p) {
    ivec2 i = ivec2(floor(p));    // integer coordinate, or lattice
    return u2f(hash(uvec2(i)));   // non-centered h(x)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/stepnoise2.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Display h(x,y) by stepping the contiguous coordinates&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition warning&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Warning&lt;/p&gt;
&lt;p&gt;We&#x27;re going through an intermediate signed integer conversion before going
unsigned to avoid issues with negative coordinates. Quoting the GLSL 4.60
specification: &amp;quot;It is undefined to convert a negative floating-point value
to an uint&amp;quot;. With this conversion, we will only have a problem when the
coordinates go outside the signed 32-bit range.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Basic signal and white noise&lt;/h2&gt;
&lt;p&gt;Aside from stepping, we can also assign values to the integer coordinates only
and interpolate linearly between them. Let&#x27;s try this in 1D:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float value(int x) { return u2f(hash(uint(x)))*2.0 - 1.0; } // h(x)

float vnoise1_linear(float p) {
    int i = int(floor(p));                // integer coordinate, or lattice
    float f = fract(p);                   // x-position between the 2 surrounding values
    return mix(value(i), value(i+1), f);  // linear interpolation between the two
}
&lt;/code&gt;&lt;/pre&gt;
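&lt;p&gt;One reassuring property we can check on the CPU: at integer positions the
interpolation collapses to the lattice value itself, and halfway between two
lattice points we get their average. A hypothetical C port (with &lt;code&gt;mixf&lt;/code&gt;
standing in for GLSL &lt;code&gt;mix&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

static uint32_t hash(uint32_t x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadu;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97u;
    return x ^ (x &amp;gt;&amp;gt; 15);
}
static float u2f(uint32_t x) { return (float)(x &amp;gt;&amp;gt; 8) * 0x1p-24f; }
static float value(int32_t x) { return u2f(hash((uint32_t)x)) * 2.0f - 1.0f; }
static float mixf(float a, float b, float t) { return a + (b - a) * t; }

static float vnoise1_linear(float p) {
    int32_t i = (int32_t)floorf(p);
    float f = p - floorf(p);
    return mixf(value(i), value(i + 1), f);
}

int main(void) {
    for (int32_t i = -100; i &amp;lt; 100; i++)
        assert(vnoise1_linear((float)i) == value(i)); /* lattice points hit their value exactly */
    /* halfway between two lattice points we get the average of the two values */
    assert(fabsf(vnoise1_linear(2.5f) - 0.5f*(value(2) + value(3))) &amp;lt; 1e-6f);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;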
&lt;p&gt;Feeding this function with the x-axis coordinate, we get the amplitude (height)
of the signal:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;150&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/vnoise1_linear.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Basic 1D value noise with linear interpolation&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;This might not be obvious yet, but the beauty of this is that this
&amp;quot;&lt;strong&gt;infinite&lt;/strong&gt;&amp;quot; signal is &lt;strong&gt;deterministic&lt;/strong&gt; and thus &lt;strong&gt;seekable&lt;/strong&gt;. Indeed,
we can move forward and backward to any real position &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; and instantly know
what the signal looks like there. This is essential for &lt;strong&gt;procedural&lt;/strong&gt; programming.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;But linear interpolation is usually not that great for a signal, so instead
of a straight line we will use a smooth fade. The two commonly used fading
functions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;cubic Hermite curve&lt;/strong&gt;, &lt;span class=&quot;math inline&quot;&gt;f(t)=3t^2-2t^3&lt;/span&gt; (also used in GLSL
&lt;code&gt;smoothstep()&lt;/code&gt;) initially used by Ken Perlin in his first Perlin Noise
implementation.&lt;/li&gt;
&lt;li&gt;The more modern (and more complex) &lt;strong&gt;quintic curve&lt;/strong&gt; &lt;span class=&quot;math inline&quot;&gt;f(t)=6t^5-15t^4+10t^3&lt;/span&gt;
introduced in 2002 by Ken Perlin in his proposed improved version of
Perlin Noise, in order to address discontinuities in the 2nd order derivative
&lt;span class=&quot;math inline&quot;&gt;f&#x27;&#x27;(t)&lt;/span&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the record, in GLSL:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float fade_quintic(float t) { return ((6.0*t-15.0)*t+10.0)*t*t*t; }
float fade_hermite(float t) { return (3.0-2.0*t)*t*t; }
&lt;/code&gt;&lt;/pre&gt;
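&lt;p&gt;Both fades map &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt; while flattening the endpoints, which is what removes the
kinks at the lattice points. A quick hypothetical C check of these properties,
estimating the endpoint slopes with finite differences:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;

static float fade_quintic(float t) { return ((6.0f*t - 15.0f)*t + 10.0f)*t*t*t; }
static float fade_hermite(float t) { return (3.0f - 2.0f*t)*t*t; }

int main(void) {
    /* both curves pin the endpoints and the midpoint */
    assert(fade_quintic(0.0f) == 0.0f &amp;amp;&amp;amp; fade_quintic(1.0f) == 1.0f);
    assert(fade_hermite(0.0f) == 0.0f &amp;amp;&amp;amp; fade_hermite(1.0f) == 1.0f);
    assert(fabsf(fade_quintic(0.5f) - 0.5f) &amp;lt; 1e-6f);
    assert(fabsf(fade_hermite(0.5f) - 0.5f) &amp;lt; 1e-6f);
    /* slopes at the endpoints are ~0: the fades ease in and out */
    float e = 1e-3f;
    assert(fabsf(fade_quintic(e)) / e &amp;lt; 1e-2f);
    assert(fabsf(1.0f - fade_hermite(1.0f - e)) / e &amp;lt; 1e-2f);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;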
&lt;p&gt;We will stick with the quintic for the rest of the article:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;#define fade fade_quintic

float vnoise1(float p) {
    int i = int(floor(p));
    float f = fract(p);
    float a = fade(f);
    return mix(value(i), value(i+1), a);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;150&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/vnoise1.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;1D value noise with quintic interpolation as fading function&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;1D gradient noise&lt;/h2&gt;
&lt;p&gt;Still, a signal generated in such a way may not be desirable due to the abrupt
changes in slope/frequency; it is too &amp;quot;unstable&amp;quot;. So instead of using the
random values directly as noise, we &lt;strong&gt;interpret them as gradients&lt;/strong&gt;: this is
&lt;em&gt;gradient noise&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise1dancing.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;How the gradient values affect the signal shape&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;If we&#x27;re pedantic, in 1D they can&#x27;t exactly be called gradients; we should
use the term &lt;em&gt;slopes&lt;/em&gt;, or &lt;em&gt;angles&lt;/em&gt;. But we will keep the word for consistency
with 2 and more dimensions.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The curve passes through &lt;span class=&quot;math inline&quot;&gt;y=0&lt;/span&gt; at every lattice point, and the
random gradient assigned to each point shapes the surrounding curve. This may
sound complicated to implement, but in practice it&#x27;s only 2 multiplications and
1 subtraction more than the value noise (for 1D at least):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float grad(int x) { // int lattice to random [-1,1)
    return u2f(hash(uint(x))) * 2.0 - 1.0;
}

float noise1(float p) {
    int i = int(floor(p));
    float g0 = grad(i);
    float g1 = grad(i + 1);

    float f = fract(p);
    float v0 = g0 * f;
    float v1 = g1 * (f - 1.0);

    float a = fade(f);
    return mix(v0, v1, a);
}
&lt;/code&gt;&lt;/pre&gt;
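&lt;p&gt;We can verify on the CPU that this construction behaves as described: the
signal is exactly zero at every lattice point, and right next to a lattice
point its slope matches the stored gradient. A hypothetical C port:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

static uint32_t hash(uint32_t x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadu;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97u;
    return x ^ (x &amp;gt;&amp;gt; 15);
}
static float u2f(uint32_t x) { return (float)(x &amp;gt;&amp;gt; 8) * 0x1p-24f; }
static float grad(int32_t x) { return u2f(hash((uint32_t)x)) * 2.0f - 1.0f; }
static float mixf(float a, float b, float t) { return a + (b - a) * t; }
static float fade(float t) { return ((6.0f*t - 15.0f)*t + 10.0f)*t*t*t; }

static float noise1(float p) {
    int32_t i = (int32_t)floorf(p);
    float g0 = grad(i), g1 = grad(i + 1);
    float f = p - floorf(p);
    float v0 = g0 * f, v1 = g1 * (f - 1.0f);
    return mixf(v0, v1, fade(f));
}

int main(void) {
    for (int32_t i = -50; i &amp;lt; 50; i++) {
        assert(noise1((float)i) == 0.0f); /* zero crossing at every lattice point */
        /* just after a lattice point, the slope matches the stored gradient */
        float e = 1e-3f;
        assert(fabsf(noise1((float)i + e) / e - grad(i)) &amp;lt; 1e-2f);
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;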
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise1dbg.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;1D gradient noise with slope indicators on the lattice coordinates&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Geometrically speaking, &lt;span class=&quot;math inline&quot;&gt;v_0&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;v_1&lt;/span&gt; are y-coordinates obtained by extending
the 2 slopes (the little sticks at each lattice) around our current target
point &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; and finding where the vertical line &lt;span class=&quot;math inline&quot;&gt;x=p&lt;/span&gt; intersects them.
Then we smoothly interpolate between them using our fade function.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;One important property of the gradient noise is that it passes through &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt;
at every lattice point, that is at regular intervals. In other words,
&lt;span class=&quot;math inline&quot;&gt;\textbf{noise}_1(p)=0&lt;/span&gt; whenever &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; is an integer.&lt;/p&gt;
&lt;/div&gt;
&lt;h2&gt;Expanding to 2 dimensions&lt;/h2&gt;
&lt;p&gt;In 2D, each lattice point stores a 2-component gradient vector. To evaluate
noise at a point &lt;span class=&quot;math inline&quot;&gt;p=(x,y)&lt;/span&gt;, instead of a simple multiplication we compute the
dot product of each gradient vector with the vector from the lattice corner
to &lt;span class=&quot;math inline&quot;&gt;(x,y)&lt;/span&gt;, then &lt;strong&gt;bilinearly interpolate&lt;/strong&gt; those four dot products using a 2D
fade function.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;#define bmix(a,b,c,d,x,y) mix(mix(a,b,x),mix(c,d,x),y) // bilinear interpolation

float noise2(vec2 p) {
    ivec2 i = ivec2(floor(p));
    vec2 g0 = grad(i);
    vec2 g1 = grad(i + ivec2(1, 0));
    vec2 g2 = grad(i + ivec2(0, 1));
    vec2 g3 = grad(i + ivec2(1, 1));

    vec2 f = fract(p);
    float v0 = dot(g0, f);
    float v1 = dot(g1, f - vec2(1.0, 0.0));
    float v2 = dot(g2, f - vec2(0.0, 1.0));
    float v3 = dot(g3, f - vec2(1.0, 1.0));

    vec2 a = fade(f);
    return bmix(v0, v1, v2, v3, a.x, a.y);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since gradients are now in 2D, the random gradient function needs to be
extended: we need &lt;strong&gt;a normalized vector generator&lt;/strong&gt;, that is something that
gives us a unit vector pointing in any direction.&lt;/p&gt;
&lt;p&gt;A first solution would be to call the hash function twice (typically on itself)
to obtain random &lt;span class=&quot;math inline&quot;&gt;(x,y)&lt;/span&gt; coordinates in a square, which we then normalize to get
them on a circle:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 grad(ivec2 x) { // ivec2 lattice to random 2D unit vector (normalized square point)
    uint h1 = hash(uvec2(x));
    uint h2 = hash(h1);
    return normalize(vec2(u2f(h1), u2f(h2)) * 2.0 - 1.0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But normalizing a point from a square biases the directions toward its
diagonals; we can do better with only one hash and some trigonometry:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float TAU = 6.283185307179586;

vec2 grad(ivec2 x) { // ivec2 lattice to random 2D unit vector (circle point)
    float angle = u2f(hash(uvec2(x))) * TAU;
    return vec2(cos(angle), sin(angle));
}
&lt;/code&gt;&lt;/pre&gt;
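&lt;p&gt;A quick hypothetical C check that this generator behaves as expected: every
gradient has unit length, and the directions average out to roughly zero over
many lattice points (a crude isotropy test):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

static uint32_t hash(uint32_t x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadu;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97u;
    return x ^ (x &amp;gt;&amp;gt; 15);
}
static uint32_t hash2(uint32_t x, uint32_t y) { return hash(x ^ hash(y)); }
static float u2f(uint32_t x) { return (float)(x &amp;gt;&amp;gt; 8) * 0x1p-24f; }

static const float TAU = 6.283185307179586f;

static void grad_circle(int32_t x, int32_t y, float g[2]) {
    float angle = u2f(hash2((uint32_t)x, (uint32_t)y)) * TAU;
    g[0] = cosf(angle);
    g[1] = sinf(angle);
}

int main(void) {
    double mx = 0.0, my = 0.0;
    int n = 0;
    for (int32_t x = 0; x &amp;lt; 100; x++)
        for (int32_t y = 0; y &amp;lt; 100; y++) {
            float g[2];
            grad_circle(x, y, g);
            assert(fabsf(g[0]*g[0] + g[1]*g[1] - 1.0f) &amp;lt; 1e-5f); /* always unit length */
            mx += g[0]; my += g[1]; n++;
        }
    /* crude isotropy check: no strongly preferred direction */
    assert(fabs(mx / n) &amp;lt; 0.1 &amp;amp;&amp;amp; fabs(my / n) &amp;lt; 0.1);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;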
&lt;p&gt;And this is enough to get our 2D gradient noise:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2D gradient noise (with y-axis coordinates that cover -10 to 10)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Expanding to 3 dimensions&lt;/h2&gt;
&lt;p&gt;Similarly, for 3D gradient noise we will need these 2 changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The interpolation happens between the 8 points of a cube: it&#x27;s a &lt;strong&gt;trilinear
interpolation&lt;/strong&gt;, a combination of 2 bilinear interpolations (itself being a
combination of 3 linear interpolations)&lt;/li&gt;
&lt;li&gt;The random unit vectors used as gradients need to be distributed evenly on a
&lt;strong&gt;sphere&lt;/strong&gt; instead of a circle since we are working in 3D&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 grad(ivec3 x) { // ivec3 lattice to random 3D unit vector (sphere point)
    uint h0 = hash(uvec3(x));
    uint h1 = hash(h0);
    // use the first random for the polar angle (latitude)
    float c = 2.0*u2f(h0) - 1.0, // c = cos(theta) = cos(acos(2x-1)) = 2x-1
          s = sqrt(1.0 - c*c);   // s = sin(theta) = sin(acos(c)) = sqrt(1-c*c)
    float phi = TAU * u2f(h1);   // use the 2nd random for the azimuth (longitude)
    return vec3(cos(phi) * s, sin(phi) * s, c);
}

#define tmix(a,b,c,d,e,f,g,h,x,y,z) mix(bmix(a,b,c,d,x,y),bmix(e,f,g,h,x,y),z) // trilinear interpolation

float noise3(vec3 p) {
    ivec3 i = ivec3(floor(p));
    vec3 g0 = grad(i);
    vec3 g1 = grad(i + ivec3(1, 0, 0));
    vec3 g2 = grad(i + ivec3(0, 1, 0));
    vec3 g3 = grad(i + ivec3(1, 1, 0));
    vec3 g4 = grad(i + ivec3(0, 0, 1));
    vec3 g5 = grad(i + ivec3(1, 0, 1));
    vec3 g6 = grad(i + ivec3(0, 1, 1));
    vec3 g7 = grad(i + ivec3(1, 1, 1));

    vec3 f = fract(p);
    float v0 = dot(g0, f);
    float v1 = dot(g1, f - vec3(1, 0, 0));
    float v2 = dot(g2, f - vec3(0, 1, 0));
    float v3 = dot(g3, f - vec3(1, 1, 0));
    float v4 = dot(g4, f - vec3(0, 0, 1));
    float v5 = dot(g5, f - vec3(1, 0, 1));
    float v6 = dot(g6, f - vec3(0, 1, 1));
    float v7 = dot(g7, f - vec3(1, 1, 1));

    vec3 a = fade(f);
    return tmix(v0, v1, v2, v3, v4, v5, v6, v7, a.x, a.y, a.z);
}
&lt;/code&gt;&lt;/pre&gt;
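&lt;p&gt;The &lt;code&gt;grad&lt;/code&gt; function above uses the classic trick of drawing
&lt;span class=&quot;math inline&quot;&gt;\cos\theta&lt;/span&gt; uniformly to get a uniform point on the sphere. A hypothetical
C port checking that it indeed returns unit vectors with a near-zero mean
&lt;span class=&quot;math inline&quot;&gt;z&lt;/span&gt; component:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

static uint32_t hash(uint32_t x) {
    x = (x ^ (x &amp;gt;&amp;gt; 16)) * 0x21f0aaadu;
    x = (x ^ (x &amp;gt;&amp;gt; 15)) * 0x735a2d97u;
    return x ^ (x &amp;gt;&amp;gt; 15);
}
static float u2f(uint32_t x) { return (float)(x &amp;gt;&amp;gt; 8) * 0x1p-24f; }
static uint32_t hash3(uint32_t x, uint32_t y, uint32_t z) {
    return hash(x ^ hash(y ^ hash(z))); /* same nesting as the GLSL uvec3 overload */
}

static void grad3(int32_t x, int32_t y, int32_t z, float g[3]) {
    uint32_t h0 = hash3((uint32_t)x, (uint32_t)y, (uint32_t)z);
    uint32_t h1 = hash(h0);
    float c = 2.0f*u2f(h0) - 1.0f; /* cos(theta), uniform in [-1,1) */
    float s = sqrtf(1.0f - c*c);   /* sin(theta) */
    float phi = 6.283185307179586f * u2f(h1);
    g[0] = cosf(phi)*s; g[1] = sinf(phi)*s; g[2] = c;
}

int main(void) {
    double mean_z = 0.0;
    int n = 0;
    for (int32_t x = -10; x &amp;lt; 10; x++)
    for (int32_t y = -10; y &amp;lt; 10; y++)
    for (int32_t z = -10; z &amp;lt; 10; z++) {
        float g[3];
        grad3(x, y, z, g);
        assert(fabsf(g[0]*g[0] + g[1]*g[1] + g[2]*g[2] - 1.0f) &amp;lt; 1e-5f);
        mean_z += g[2]; n++;
    }
    assert(fabs(mean_z / n) &amp;lt; 0.05); /* latitudes are balanced, no polar clustering */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;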
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise3.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;3D gradient noise on a sphere (with 10x unzoom)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The spherical representation has nothing to do with the selection of a
random 3D unit vector on a sphere. The noise in this implementation applies
to any 3D geometry. The choice of a sphere for the display was just a simple
3D shape to lay it on. If we were in 2D, we could also feed the 3D noise
function with &lt;span class=&quot;math inline&quot;&gt;p=(x,y,t)&lt;/span&gt; where &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; is the current time, giving us a
&amp;quot;cloudy&amp;quot; atmospheric rendering.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The random gradient in 3D might be a bit expensive, so we may consider simpler
approaches, like normalizing a random position in a cube, similar to what we&#x27;ve
initially suggested for 2D noise. From my observation, it doesn&#x27;t seem to have
any noticeable impact visually, but maybe it would have under certain
circumstances.&lt;/p&gt;
&lt;h2&gt;Fractal Brownian Motion (fBm)&lt;/h2&gt;
&lt;p&gt;The idea behind fBm is to sum multiple &amp;quot;octaves&amp;quot; of noise to construct
a more refined pattern. In the most common case, at each octave we raise the
frequency by doubling it (the &amp;quot;lacunarity&amp;quot; factor) and halve the amplitude
(the &amp;quot;gain&amp;quot; or &amp;quot;persistence&amp;quot; factor):&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;600&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise1multi.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Multiple signals&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The algorithm is pretty much the same in all dimensions: it&#x27;s simply a sum of
signals. In 2D for example we can write:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;const float LACUNARITY = 1.98;
const float GAIN = 0.51;

float fbm(vec2 p, int octaves) {
    float sum = 0.0;
    float amp = 1.0, freq = 1.0;
    for (int i = 0; i &amp;lt; octaves; i++) {
        sum += amp * noise2(p * freq);
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}
&lt;/code&gt;&lt;/pre&gt;
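&lt;p&gt;The octave weights alone tell us the range of the sum: with a gain close to
&lt;span class=&quot;math inline&quot;&gt;1/2&lt;/span&gt;, the total amplitude converges to about &lt;span class=&quot;math inline&quot;&gt;1/(1-\mathrm{GAIN})&lt;/span&gt;. A hypothetical
C sketch where the noise is stubbed out with a constant 1 to isolate the
weights (the real noise has a smaller amplitude):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;

#define LACUNARITY 1.98f
#define GAIN 0.51f

/* stub: replace the real noise2() with a constant to isolate the octave weights */
static float noise2_stub(float px, float py) { (void)px; (void)py; return 1.0f; }

static float fbm_stub(float px, float py, int octaves) {
    float sum = 0.0f, amp = 1.0f, freq = 1.0f;
    for (int i = 0; i &amp;lt; octaves; i++) {
        sum += amp * noise2_stub(px * freq, py * freq);
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}

int main(void) {
    /* with a constant noise of 1, fbm reduces to a geometric series of GAIN^i */
    float expected = (1.0f - powf(GAIN, 5.0f)) / (1.0f - GAIN);
    assert(fabsf(fbm_stub(0.0f, 0.0f, 5) - expected) &amp;lt; 1e-5f);
    /* the infinite sum is bounded by 1/(1-GAIN), about 2.04 */
    assert(fbm_stub(0.0f, 0.0f, 64) &amp;lt; 1.0f / (1.0f - GAIN) + 1e-4f);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;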
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2fbm.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2D gradient noise with 5 octaves (with y-axis coordinates that cover -2.5 to 2.5)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;We&#x27;re not using exactly 2.0 and 0.5 for lacunarity and gain to break
correlations.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;And in 3D for completeness (aside from the raytracer to get a sphere, the noise
code is the same, it just calls &lt;code&gt;noise3(p*freq)&lt;/code&gt; and uses a &lt;code&gt;vec3 p&lt;/code&gt; input):&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise3fbm.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;3D gradient noise with 5 octaves&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Derivatives&lt;/h2&gt;
&lt;p&gt;Derivatives (that is the rate of change of the signal) are useful in many
situations. For example, you may have noticed that the 1D signals in this
article tend to have a lighter color when the slope is steep compared to when
it&#x27;s flatter: the derivative is used to interpolate between the 2 colors.&lt;/p&gt;
&lt;p&gt;Still in the 1D case, to display the curve with a correct thickness, I&#x27;m also
using the derivatives for &lt;a href=&quot;https://iquilezles.org/articles/distance/&quot;&gt;Inigo&#x27;s distance to curve trick&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float dist = abs(v - p.y) / sqrt(1.0 + d*d); // v: curve value, p: position, d: derivative of curve
&lt;/code&gt;&lt;/pre&gt;
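&lt;p&gt;To build intuition for this formula: for a straight curve of slope &lt;span class=&quot;math inline&quot;&gt;d&lt;/span&gt; it is
the exact perpendicular point-to-line distance, which a brute-force search
confirms. A hypothetical C sketch with arbitrary values:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;math.h&amp;gt;

int main(void) {
    /* straight curve y = d*x, evaluated against an arbitrary point p */
    float d = 1.7f;               /* slope of the curve */
    float px = 0.4f, py = 2.0f;   /* the point we measure from */
    float v = d * px;             /* curve value at p.x */
    float dist = fabsf(v - py) / sqrtf(1.0f + d*d); /* the trick */

    /* brute force: minimum distance to finely sampled points on the line */
    float best = 1e9f;
    for (int i = -100000; i &amp;lt;= 100000; i++) {
        float x = (float)i * 1e-4f;
        float dx = x - px, dy = d*x - py;
        float dd = sqrtf(dx*dx + dy*dy);
        if (dd &amp;lt; best) best = dd;
    }
    assert(fabsf(dist - best) &amp;lt; 1e-3f); /* both agree */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;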
&lt;p&gt;Using this signed distance like any other, we can display the curve smoothly:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;150&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise1.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;1D gradient noise&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In higher dimensions, we may want to use them for lighting: indeed,
with the derivatives we can compute the normal, which is then used for
reflections:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;// Lambertian lighting hack
vec3 normal = normalize(vec3(-d, 1.0)); // normal from the 2D partial derivatives d
float a = 5.0*TAU/8.0; // light from south-west
vec3 light_direction = normalize(vec3(cos(a), sin(a), 2.0));
float lighting = max(dot(normal, light_direction), 0.0);
col *= lighting;
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2light.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2D gradient noise with (right) and without (left) lighting&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If you&#x27;re into terrain generation, another use case is to &lt;a href=&quot;https://iquilezles.org/articles/morenoise/&quot;&gt;fake
erosion&lt;/a&gt; by scaling the value of each noise layer of the fBm by
&lt;span class=&quot;math inline&quot;&gt;\frac{1}{1+\|\sum_{i=0}^{octaves}d_i\|^2}&lt;/span&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float fbm2e(vec2 p, int octaves) {
    float sum = 0.0;
    float amp = 1.0, freq = 1.0;
    vec2 d = vec2(0.0);
    for (int i = 0; i &amp;lt; octaves; i++) {
        vec3 n = noise2d(p * freq);    // adjusted noise2() returning the partial derivatives in .xy
        d += n.xy;                     // cumulated derivatives without frequency scaling
        float w = 1.0/(1.0+dot(d,d));  // gradient/slope based weight
        sum += amp * n.z * w;          // value damped down by the weight
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2erosion.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2D gradient noise with (right) and without (left) erosion&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The intuitive idea behind the formula is that each new layer gets &amp;quot;muted&amp;quot;
where the accumulated gradients indicate a steep mountain, preventing steep
areas from being &amp;quot;rugged&amp;quot; further by higher frequency noises, and thus giving
a sharper feel.&lt;/p&gt;
&lt;p&gt;Anyway, these are random things we can do with the derivatives, but there are
likely others I&#x27;m forgetting. The point is, they are particularly useful so
we&#x27;re going to study them.&lt;/p&gt;
&lt;h3&gt;Numerical vs analytical derivatives&lt;/h3&gt;
&lt;p&gt;We have two methods to get the derivatives:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;numerical&lt;/strong&gt; method, where we compute the rate of change between 2 close
points. It &amp;quot;works&amp;quot; with all curves (given enough precision to work with), but it
has an accuracy (and sometimes speed) issue.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;analytical&lt;/strong&gt; method, where we derive the exact mathematical formula.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The numerical method can be implemented on the GPU thanks to the &lt;code&gt;dFdx&lt;/code&gt; and
&lt;code&gt;dFdy&lt;/code&gt; functions. These functions use the local and neighboring fragments&#x27; data
to calculate a derivative of the specified value. They are among the rare
functions that communicate information across fragments, and as you can guess
this requires synchronization and thus has performance implications.&lt;/p&gt;
&lt;p&gt;For example, let&#x27;s take our last 2D noise scene and compare the
numerical vs the analytical derivatives:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2fbmderiv.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;2D gradient noise with its partial derivatives (both analytical and numerical)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;On the left we have our standard 2D gradient fBm noise, and on the right the
length of the derivatives of the noise: first the analytical version, then the
numerical one. The latter should appear pixelized.&lt;/p&gt;
&lt;h3&gt;Numerical derivatives&lt;/h3&gt;
&lt;p&gt;The numerical derivatives were obtained with the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float w = 2.0/resolution.y;           // size of a pixel
float v = noise2(p);                  // noise value at position p
vec2 d = vec2(dFdx(v), dFdy(v)) / w;  // numerical partial derivatives
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;It is also possible to compute the derivatives numerically ourselves with
finite differences, sampling the noise function at the neighboring positions
within the same fragment, but this is going to be extremely expensive.&lt;/p&gt;
&lt;/div&gt;
&lt;h3&gt;Analytical derivatives&lt;/h3&gt;
&lt;p&gt;To get the analytical derivatives we need to combine multiple derivatives found
in the noise functions, starting from the fading function derivative. This one
is easy:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
\mathrm{fade}(t)  &amp;amp;= 6t^5-15t^4+10t^3 \\
\mathrm{fade}&#x27;(t) &amp;amp;= 30t^2(t^2-2t+1)
\end{aligned}
&lt;/div&gt;
&lt;p&gt;But we will also need the derivatives of the interpolation functions for each
dimension:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;lerp: &lt;span class=&quot;math inline&quot;&gt;\mathrm{mix}(a,b,x) = (1-x)a + bx&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;bilerp: &lt;span class=&quot;math inline&quot;&gt;\mathrm{bmix}(a,b,c,d,x,y) = \mathrm{mix}(\mathrm{mix}(a,b,x),\mathrm{mix}(c,d,x),y)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;trilerp: &lt;span class=&quot;math inline&quot;&gt;\mathrm{tmix}(a,b,c,d,e,f,g,h,x,y,z) = \mathrm{mix}(\mathrm{bmix}(a,b,c,d,x,y),\mathrm{bmix}(e,f,g,h,x,y),z)&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So here they are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;\partial_x\mathrm{mix}=b-a&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;\nabla\mathrm{bmix}=\mathrm{mix}(\begin{bmatrix}b\\c\end{bmatrix}-a,d-\begin{bmatrix}c\\b\end{bmatrix},\begin{bmatrix}y\\x\end{bmatrix})&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;\nabla\mathrm{tmix}=\mathrm{bmix}(\begin{bmatrix}b\\c\\e\end{bmatrix}-a,\begin{bmatrix}d-c\\d-b\\f-b\end{bmatrix},\begin{bmatrix}f-e\\g-e\\g-c\end{bmatrix},h-\begin{bmatrix}g\\f\\d\end{bmatrix},\begin{bmatrix}y\\x\\x\end{bmatrix},\begin{bmatrix}z\\z\\y\end{bmatrix})&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Following is a proof for the bilinear and trilinear interpolation partial
derivatives; you can skip it if you&#x27;re not into a nasty orgy of math symbols.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;I&#x27;m introducing a new notation for the partial derivatives with values
because the usual ones are awful to make and read. An example is better than a
long explanation, so instead of something like
&lt;span class=&quot;math inline&quot;&gt;\frac{\partial_f(x,y,z)}{\partial_x}(17,u+v,v^2)&lt;/span&gt; or its horrendous
vertical bar version, I will instead use: &lt;span class=&quot;math inline&quot;&gt;\partial_xf(x=17,y=u+v,z=v^2)&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
\partial_x\mathrm{bmix} &amp;amp;= \partial_a\mathrm{mix}\cdot\partial_x\mathrm{mix} + \partial_b\mathrm{mix}\cdot\partial_x\mathrm{mix} \\
                        &amp;amp;= \partial_a\mathrm{mix}(a=\mathrm{mix}(a,b,x),b=\mathrm{mix}(c,d,x),x=y) \cdot \partial_x\mathrm{mix}(a=a,b=b,x=x) \\
                        &amp;amp;+ \partial_b\mathrm{mix}(a=\mathrm{mix}(a,b,x),b=\mathrm{mix}(c,d,x),x=y) \cdot \partial_x\mathrm{mix}(a=c,b=d,x=x) \\
                        &amp;amp;= (1-y)(b-a) + y(d-c) \\
                        &amp;amp;= \boxed{\mathrm{mix}(b-a,d-c,y)} \\
\\
\partial_y\mathrm{bmix} &amp;amp;= \partial_x\mathrm{mix}(a=\mathrm{mix}(a,b,x), b=\mathrm{mix}(c,d,x), x=y) \\
                        &amp;amp;= \mathrm{mix}(c,d,x) - \mathrm{mix}(a,b,x) \\
                        &amp;amp;= \boxed{\mathrm{mix}(c-a,d-b,x)} \\
\\
\partial_x \mathrm{tmix} &amp;amp;= \partial_a\mathrm{mix}\cdot\partial_x\mathrm{bmix} + \partial_b\mathrm{mix}\cdot\partial_x\mathrm{bmix} \\
                         &amp;amp;= \partial_a\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\
                         &amp;amp;\cdot \partial_x\mathrm{bmix}(a=a,b=b,c=c,d=d,x=x,y=y) \\
                         &amp;amp;+ \partial_b\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\
                         &amp;amp;\cdot \partial_x\mathrm{bmix}(a=e,b=f,c=g,d=h,x=x,y=y) \\
                         &amp;amp;= (1-z)\mathrm{mix}(b-a,d-c,y) + z\mathrm{mix}(f-e,h-g,y) \\
                         &amp;amp;= \mathrm{mix}(\mathrm{mix}(b-a,d-c,y), \mathrm{mix}(f-e,h-g,y), z) \\
                         &amp;amp;= \boxed{\mathrm{bmix}(b-a,d-c,f-e,h-g,y,z)} \\
\\
\partial_y \mathrm{tmix} &amp;amp;= \partial_a\mathrm{mix}\cdot\partial_y\mathrm{bmix} + \partial_b\mathrm{mix}\cdot\partial_y\mathrm{bmix} \\
                         &amp;amp;= \partial_a\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\
                         &amp;amp;\cdot \partial_y\mathrm{bmix}(a=a,b=b,c=c,d=d,x=x,y=y) \\
                         &amp;amp;+ \partial_b\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),z) \\
                         &amp;amp;\cdot \partial_y\mathrm{bmix}(a=e,b=f,c=g,d=h,x=x,y=y) \\
                         &amp;amp;= (1-z)\mathrm{mix}(c-a,d-b,x) + z\mathrm{mix}(g-e,h-f,x) \\
                         &amp;amp;= \mathrm{mix}(\mathrm{mix}(c-a,d-b,x), \mathrm{mix}(g-e,h-f,x), z) \\
                         &amp;amp;= \boxed{\mathrm{bmix}(c-a,d-b,g-e,h-f,x,z)} \\
\\
\partial_z\mathrm{tmix} &amp;amp;= \partial_x\mathrm{mix} \\
                        &amp;amp;= \partial_x\mathrm{mix}(a=\mathrm{bmix}(a,b,c,d,x,y),b=\mathrm{bmix}(e,f,g,h,x,y),x=z) \\
                        &amp;amp;= \mathrm{bmix}(e,f,g,h,x,y)-\mathrm{bmix}(a,b,c,d,x,y) \\
                        &amp;amp;= \boxed{\mathrm{bmix}(e-a,f-b,g-c,h-d,x,y)}
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Finally, given these partial derivatives, we can figure out how to compute the
derivatives of the noise itself. It helps to know that the derivative of
&lt;code&gt;fract(p)&lt;/code&gt; is 1 (mostly) and the derivative of &lt;code&gt;floor(p)&lt;/code&gt; is 0 (mostly), so
things simplify fairly cleanly. I&#x27;ll spare you the details this time:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{aligned}
\text{1D: } &amp;amp; \boxed{\textbf{mix}(g_0,g_1,\textbf{fade}(f))+\partial_x\textbf{mix}(v_0,v_1)\cdot\textbf{fade}&#x27;(f)} \\
\text{2D: } &amp;amp; \boxed{\textbf{bmix}(g_0,g_1,g_2,g_3,\textbf{fade}(f))+\nabla\mathrm{bmix}(v_0,v_1,v_2,v_3,\textbf{fade}(f))\cdot\textbf{fade}&#x27;(f)} \\
\text{3D: } &amp;amp; \boxed{\textbf{tmix}(g_0,g_1,g_2,g_3,g_4,g_5,g_6,g_7,\textbf{fade}(f))+\nabla\mathrm{tmix}(v_0,v_1,v_2,v_3,v_4,v_5,v_6,v_7,\textbf{fade}(f))\cdot\textbf{fade}&#x27;(f)}
\end{aligned}
&lt;/div&gt;
&lt;p&gt;Adjusting our gradient noise functions to return the derivatives along with the
noise value itself:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec2 noise1d(float p) {
    int i = int(floor(p));
    float g0 = grad(i);
    float g1 = grad(i + 1);

    float f = fract(p);
    float v0 = g0 * f;
    float v1 = g1 * (f - 1.0);

    float a = fade(f);
    float v = mix(v0, v1, a);

    float g = mix(g0, g1, a);
    float d = v1 - v0;                    // derivative of mix with respect to the interpolant
    float da = ((f-2.0)*f+1.0)*30.0*f*f;  // fade&#x27;(t): derivative of quintic interpolant

    return vec2(g + d*da, v);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec3 noise2d(vec2 p) {
    // [...]
    float v = bmix(v0, v1, v2, v3, a.x, a.y);

    vec2 g = bmix(g0, g1, g2, g3, a.x, a.y);
    vec2 d = mix(vec2(v1,v2)-v0, v3-vec2(v2,v1), a.yx); // derivatives of bmix with respect to the interpolant
    vec2 da = ((f-2.0)*f+1.0)*30.0*f*f;

    return vec3(g + d*da, v);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec4 noise3d(vec3 p) {
    // [...]
    float v = tmix(v0, v1, v2, v3, v4, v5, v6, v7, a.x, a.y, a.z);

    vec3 g = tmix(g0, g1, g2, g3, g4, g5, g6, g7, a.x, a.y, a.z);
    vec3 d = bmix(vec3(v1,v2,v4)-v0, vec3(v3-v2,v3-v1,v5-v1),  // derivatives of tmix with
                  vec3(v5-v4,v6-v4,v6-v2), v7-vec3(v6,v5,v3),  // respect to the interpolant
                  a.yxx, a.zzy);
    vec3 da = ((f-2.0)*f+1.0)*30.0*f*f;

    return vec4(g + d*da, v);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;The derivatives are in the first components so that given &lt;code&gt;n=noise3d(p)&lt;/code&gt; we
can write &lt;code&gt;vec3 d=n.xyz&lt;/code&gt; for the x/y/z partial derivatives instead of the more
awkward &lt;code&gt;vec3 d=n.yzw&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To get the equivalent derivatives for value noise, we just set &lt;span class=&quot;math inline&quot;&gt;g=0&lt;/span&gt; in the
final expression of the derivatives.&lt;/p&gt;
&lt;p&gt;It is possible to unroll the nested &lt;code&gt;mix&lt;/code&gt; expression to (maybe) make it faster,
but I find the mix expressions simple, elegant, and likely more numerically
stable.&lt;/p&gt;
&lt;h3&gt;Using the derivatives&lt;/h3&gt;
&lt;p&gt;There are important things to take into consideration when working with the
derivatives. Most notably, it&#x27;s important to follow the &lt;a href=&quot;https://en.wikipedia.org/wiki/Chain_rule&quot;&gt;chain rule&lt;/a&gt;
and the &lt;a href=&quot;https://en.wikipedia.org/wiki/Product_rule&quot;&gt;product rule&lt;/a&gt;. For example, let&#x27;s say we&#x27;re working with
a noise function that returns the derivatives:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;float freq = 0.1;
vec4 n = noise3d(x*freq);
float v = n.a;            // value
vec3 d = n.xyz * freq;    // partial derivatives
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we multiply the input of our function by the frequency, then we need to
multiply the derivatives by the same frequency factor.&lt;/p&gt;
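&lt;p&gt;As a sanity check, here is a minimal Python port of the 1D gradient noise (the
&lt;code&gt;grad&lt;/code&gt; hash below is a made-up placeholder, not the one from the demos), which
can be compared against finite differences to confirm the chain rule scaling by
&lt;code&gt;freq&lt;/code&gt;:&lt;/p&gt;

```python
import math

def grad(i):
    # placeholder integer hash giving a pseudo-random gradient roughly in [-1, 1]
    h = (i * 374761393 + 668265263) % 2**32
    h = ((h ^ (h >> 13)) * 1274126177) % 2**32
    return h % 65536 / 32767.5 - 1.0

def fade(t):
    return ((6.0*t - 15.0)*t + 10.0)*t*t*t

def dfade(t):  # fade'(t)
    return ((t - 2.0)*t + 1.0)*30.0*t*t

def noise1d(p):
    # returns (derivative, value), like the GLSL version
    i = math.floor(p)
    f = p - i
    g0, g1 = grad(i), grad(i + 1)
    v0, v1 = g0 * f, g1 * (f - 1.0)
    a = fade(f)
    v = v0 + (v1 - v0) * a
    d = g0 + (g1 - g0) * a + (v1 - v0) * dfade(f)
    return d, v

# chain rule: the derivative of noise1d(x*freq) w.r.t. x is freq times
# the derivative returned by noise1d at x*freq
freq, x = 0.1, 3.7
d, v = noise1d(x * freq)
d = d * freq
```

&lt;p&gt;The same finite-difference check generalizes component-wise to 2D and 3D.&lt;/p&gt;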
&lt;p&gt;Similarly, let&#x27;s say we want to move &lt;span class=&quot;math inline&quot;&gt;v&lt;/span&gt; from &lt;span class=&quot;math inline&quot;&gt;[-1,1]&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;[0,1]&lt;/span&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;v = (v + 1.0) / 2.0; // [-1,1] -&amp;gt; [0,1]
d = d / 2.0;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The intuitive explanation is that if we squeeze the value into half its
original range, then the derivatives (slopes) get flatter as well.&lt;/p&gt;
&lt;p&gt;This may not sound very important, but failing to propagate these scale factors
correctly often leads to subtle bugs. For example, in the fBm, if we want it to
return the correct derivatives, we need to write something like this (notice
the &lt;code&gt;freq&lt;/code&gt; factor):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-glsl&quot;&gt;vec4 fbm3d(vec3 p) {
    vec4 sum = vec4(0.0);
    float amp = 1.0, freq = 1.0;
    for (int i = 0; i &amp;lt; octaves; i++) {
        vec4 n = noise3d(p * freq);
        sum.xyz += amp * n.xyz * freq; // derivatives
        sum.a += amp * n.a; // value
        freq *= LACUNARITY;
        amp *= GAIN;
    }
    return sum;
}
&lt;/code&gt;&lt;/pre&gt;
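&lt;p&gt;To see why the extra &lt;code&gt;freq&lt;/code&gt; factor matters, here is a 1D Python sketch using
&lt;code&gt;sin&lt;/code&gt; as a stand-in for the noise (its derivative is &lt;code&gt;cos&lt;/code&gt;); the accumulated
derivative only matches a numerical derivative because each octave is scaled by
its frequency:&lt;/p&gt;

```python
import math

OCTAVES, LACUNARITY, GAIN = 4, 2.0, 0.5

def noise(p):
    # analytic stand-in for a 1D noise: returns (derivative, value)
    return math.cos(p), math.sin(p)

def fbm(p):
    dsum, vsum = 0.0, 0.0
    amp, freq = 1.0, 1.0
    for _ in range(OCTAVES):
        d, v = noise(p * freq)
        dsum += amp * d * freq  # chain rule: scale the derivative by freq
        vsum += amp * v
        freq *= LACUNARITY
        amp *= GAIN
    return dsum, vsum
```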
&lt;p&gt;Things get tricky when the noise is adjusted with the derivatives themselves,
as we saw before with the erosion, especially since this implies second-order
derivatives.&lt;/p&gt;
&lt;p&gt;Similarly, there is a technique involving the rotation of &lt;span class=&quot;math inline&quot;&gt;p&lt;/span&gt; at every iteration
of the fBm to reduce the correlations between noise layers. This rotation is a
linear transformation, but it requires careful adjustments to the derivatives as
well. Implementing these transformations correctly is left as an exercise;
careful bookkeeping of derivatives is essential.&lt;/p&gt;
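&lt;p&gt;For the curious, here is what that adjustment can look like in a 2D Python
sketch (again with an analytic stand-in for the noise): since the rotation is a
linear map, the chain rule amounts to multiplying the returned gradient by the
transpose of the rotation matrix.&lt;/p&gt;

```python
import math

def noise2(x, y):
    # analytic stand-in: value sin(x)*cos(y), and its gradient
    return (math.cos(x)*math.cos(y), -math.sin(x)*math.sin(y)), math.sin(x)*math.cos(y)

c, s = math.cos(0.6), math.sin(0.6)  # an arbitrary rotation angle

def rotated_noise(x, y):
    (gx, gy), v = noise2(c*x - s*y, s*x + c*y)
    # chain rule for the linear map: gradient times the transposed rotation
    return (c*gx + s*gy, -s*gx + c*gy), v
```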
&lt;h2&gt;Going further&lt;/h2&gt;
&lt;p&gt;With this introduction we only explored the tip of the iceberg. For example,
there are alternative noises such as &lt;a href=&quot;https://en.wikipedia.org/wiki/OpenSimplex_noise&quot;&gt;OpenSimplex&lt;/a&gt;, where the lattice
points lie on a simplex grid (triangles in 2D, tetrahedra in 3D) instead of a
rectangular grid. It has useful properties: in an fBm, for example, it yields
fewer directional artifacts, eliminating the need for ad-hoc rotations.&lt;/p&gt;
&lt;p&gt;Speaking of fBm, you may want to check out &lt;a href=&quot;https://iquilezles.org/articles/warp/&quot;&gt;domain warping&lt;/a&gt; where
nested fBm together make fancy effects:&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;600&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/noise/gnoise2warp.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Domain warping with fbm2(p+fbm2(p+fbm2(p+t)))&lt;/figcaption&gt;
&lt;/figure&gt;
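&lt;p&gt;A minimal Python sketch of the nesting pattern from the caption, with a crude
&lt;code&gt;sin&lt;/code&gt;/&lt;code&gt;cos&lt;/code&gt; based stand-in for a real 2D fBm (constants are arbitrary):&lt;/p&gt;

```python
import math

def fbm2(x, y):
    # stand-in for a 2D fBm, just to illustrate the nesting
    v, amp, freq = 0.0, 0.5, 1.0
    for _ in range(4):
        v += amp * math.sin(freq*x + 1.7) * math.cos(freq*y - 0.3)
        amp *= 0.5
        freq *= 2.0
    return v

def warp(x, y, t):
    # fbm2(p + fbm2(p + fbm2(p + t)))
    a = fbm2(x + t, y + t)
    b = fbm2(x + a, y + a)
    return fbm2(x + b, y + b)
```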
&lt;p&gt;Another thing you&#x27;ll be tempted to research is how to consistently scale the
noise so that it fits into a controlled range of values. Spoiler alert: it&#x27;s a
&lt;em&gt;particularly&lt;/em&gt; complex subject.&lt;/p&gt;
&lt;p&gt;Similarly, we stopped ourselves at 3D, but 4D is also useful: for example, we
may want to morph the 3D noise over time with &lt;span class=&quot;math inline&quot;&gt;p=(x,y,z,t)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;I hope this article was able to give a good overview of the concepts, and I&#x27;ll
see you next time for new adventures.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/41-fixing-the-iterative-damping-interpolation-in-video-games.html</guid>
 <link>http://blog.pkh.me/p/41-fixing-the-iterative-damping-interpolation-in-video-games.html</link>
 <title>Fixing the iterative damping interpolation in video games</title>
 <pubDate>Sat, 18 May 2024 12:22:15 -0000</pubDate>
 <description>&lt;p&gt;As I&#x27;m exploring the fantastic world of indie game development lately, I end up
watching a large number of video tutorials on the subject. Even though the
quality of the content is pretty variable, I&#x27;m very grateful to the creators
for it. That being said, I couldn&#x27;t help noticing this particular bit time and
time again:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;a = lerp(a, B, delta * RATE)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Behind this apparently banal call hides a terrible curse, forever perpetuated by
innocent souls on the Internet.&lt;/p&gt;
&lt;p&gt;In this article we will study what it&#x27;s trying to achieve, how it works, why
it&#x27;s wrong, and then we&#x27;ll come up with a good solution to the initial problem.&lt;/p&gt;
&lt;p&gt;The usual warning: I don&#x27;t have a mathematics or academic background, so the
article is addressed at other neanderthals like myself, who managed to
understand that pressing keys on a keyboard make pixels turn on and off.&lt;/p&gt;
&lt;h2&gt;What is it?&lt;/h2&gt;
&lt;p&gt;Let&#x27;s start from the beginning. We&#x27;re in a game engine main loop callback
called at a regular interval (roughly), passing down the time difference from
the last call.&lt;/p&gt;
&lt;p&gt;In Godot engine, it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;func _physics_process(delta: float):
    ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the game is configured to refresh at 60 FPS, we can expect this function to
be called around 60 times per second with &lt;code&gt;delta = 1/60 = 0.01666...&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As a game developer, we want some smooth animations for all kind of
transformations. For example, we may want the speed of the player to go down to
zero as they release the moving key. We could do that linearly, but to make the
stop less brutal and robotic we want to slow down the speed progressively.&lt;/p&gt;
&lt;figure&gt;
  &lt;canvas width=&quot;300&quot; height=&quot;300&quot; class=&quot;shader-canvas&quot; data-fragment=&quot;http://blog.pkh.me/frag/lin-vs-exp.frag&quot;&gt;&lt;/canvas&gt;
  &lt;figcaption&gt;Linear (top) versus smooth/exponential (bottom) animation&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Virtually every tutorial will suggest updating some variable with
something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;velocity = lerp(velocity, 0, delta * RATE)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At 60 FPS, with a decay &lt;code&gt;RATE&lt;/code&gt; set to &lt;code&gt;3.5&lt;/code&gt; and an initial &lt;code&gt;velocity&lt;/code&gt; of &lt;code&gt;100&lt;/code&gt;,
the &lt;code&gt;velocity&lt;/code&gt; will go down to &lt;code&gt;0&lt;/code&gt; following this curve:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/fixing-damp/velocity-curve.png&quot; alt=&quot;velocity curve&quot;&gt;
  &lt;figcaption&gt;Example curve of a decaying variable&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;code&gt;velocity&lt;/code&gt; is just an example of a variable name; the same pattern can be
found in many other contexts.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you&#x27;re familiar with &lt;code&gt;lerp()&lt;/code&gt; (&amp;quot;linear interpolation&amp;quot;) you may be wondering
why this is making a curve. Indeed, this &lt;code&gt;lerp()&lt;/code&gt; function, also known as
&lt;code&gt;mix()&lt;/code&gt;, is a simple linear function defined as &lt;code&gt;lerp(a,b,x) = x*(b-a) + a&lt;/code&gt; or
its alternative stable form &lt;code&gt;lerp(a,b,x) = a*(1-x) + b*x&lt;/code&gt;. For more information,
see a &lt;a href=&quot;http://blog.pkh.me/p/29-the-most-useful-math-formulas.html&quot;&gt;previous article&lt;/a&gt; about this particular function. But here
we are re-using the previous value, so this essentially means nesting &lt;code&gt;lerp()&lt;/code&gt;
function calls, which expands into a power formula, forming a curve composed of
a chain of small straight segments.&lt;/p&gt;
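&lt;p&gt;We can convince ourselves of this with a tiny Python experiment: iterating
&lt;code&gt;lerp&lt;/code&gt; toward &lt;code&gt;0&lt;/code&gt; with a constant factor &lt;code&gt;k&lt;/code&gt; is exactly the power formula
&lt;code&gt;a*(1-k)**n&lt;/code&gt;:&lt;/p&gt;

```python
def lerp(a, b, x):
    return a * (1.0 - x) + b * x

a, k = 100.0, 0.05
v = a
for _ in range(10):
    v = lerp(v, 0.0, k)
# v is now a * (1 - k)**10, about 59.87
```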
&lt;h2&gt;Why is it wrong?&lt;/h2&gt;
&lt;p&gt;The main issue is that the formula depends heavily on the refresh rate. If the
game is supposed to work at 30, 60, or 144 FPS, then the physics
engine is going to behave differently at each rate.&lt;/p&gt;
&lt;p&gt;Here is an illustration of the kind of instability we can expect:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/fixing-damp/problematic-formula.png&quot; alt=&quot;problematic formula&quot;&gt;
  &lt;figcaption&gt;Comparison of the curves at different frame rates with the problematic formula&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note that the inaccuracy when compared to an ideal curve is not the issue here.
The problem is that the game mechanics are different depending on the hardware,
the system, and the wind direction observed in a small island of Japan. Imagine
being able to jump further if we replace our 60Hz monitor with a 144Hz one,
that would be some nasty pay to win incentive.&lt;/p&gt;
&lt;p&gt;We may be able to get away with this by forcing a constant refresh rate for the
game and considering this a non-issue (I&#x27;m not convinced this is achievable on all
engines and platforms), but then we meet another problem: the device may not be
able to hold this requirement at all times because of potential lag (for
reasons that may be outside our control). So far we assumed
&lt;code&gt;delta=1/FPS&lt;/code&gt;, but that&#x27;s merely a target: it can fluctuate, causing mild to
dramatic situations gameplay-wise.&lt;/p&gt;
&lt;p&gt;One last issue with that formula is the situation of a huge delay spike
causing an overshoot of the target. For example, if we have &lt;code&gt;RATE=3&lt;/code&gt; and we
end up with a frame that takes 500ms for whatever random reason, we&#x27;re going to
interpolate with a value of 1.5, which is way above 1. This is easily fixed by
capping the 3rd argument of &lt;code&gt;lerp&lt;/code&gt; at 1, but we have to keep that issue in
mind.&lt;/p&gt;
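&lt;p&gt;These issues are easy to reproduce with a small Python simulation of the naive
update running for one second at different frame rates (the helper name is mine):&lt;/p&gt;

```python
def damp_naive(fps, seconds, rate=3.5, start=100.0):
    v, dt = start, 1.0 / fps
    for _ in range(round(seconds * fps)):
        v = v + (0.0 - v) * (dt * rate)  # lerp(v, 0, dt*rate)
    return v

# after one simulated second, the result depends on the frame rate:
# damp_naive(30, 1.0) is about 2.42 while damp_naive(144, 1.0) is about 2.89
```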
&lt;p&gt;To summarize, the formula is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;not frame rate agnostic ❌&lt;/li&gt;
&lt;li&gt;non deterministic ❌&lt;/li&gt;
&lt;li&gt;vulnerable to overshooting ❌&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you&#x27;re not interested in the gory details of the &lt;em&gt;how&lt;/em&gt;, you can now jump
straight to the conclusion for a better alternative.&lt;/p&gt;
&lt;h2&gt;Study&lt;/h2&gt;
&lt;p&gt;We&#x27;re going to switch to a more mathematical notation from now on. It&#x27;s only
going to be linear algebra, nothing particularly fancy, but we&#x27;re going to make
a mess of one-letter symbols, so bear with me.&lt;/p&gt;
&lt;p&gt;Let&#x27;s name the exhaustive list of inputs of our problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Initial value: &lt;span class=&quot;math inline&quot;&gt;a_0=\Alpha&lt;/span&gt; (from where we start, only used once)&lt;/li&gt;
&lt;li&gt;Target value: &lt;span class=&quot;math inline&quot;&gt;\Beta&lt;/span&gt; (where we are going, constant value)&lt;/li&gt;
&lt;li&gt;Time delta: &lt;span class=&quot;math inline&quot;&gt;\Delta_n&lt;/span&gt; (time difference from last call)&lt;/li&gt;
&lt;li&gt;The rate of change: &lt;span class=&quot;math inline&quot;&gt;R&lt;/span&gt; (arbitrary scaling user constant)&lt;/li&gt;
&lt;li&gt;Original sequence: &lt;span class=&quot;math inline&quot;&gt;a_{n+1} = \mathrm{lerp}(a_n, \Beta, R\Delta_n)&lt;/span&gt; (the code
in the main loop callback)&lt;/li&gt;
&lt;li&gt;Frame rate: &lt;span class=&quot;math inline&quot;&gt;F&lt;/span&gt; (the target frame rate, for example &lt;span class=&quot;math inline&quot;&gt;60&lt;/span&gt; FPS)&lt;/li&gt;
&lt;li&gt;Time: &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; (animation time elapsed)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What we are looking for is a new sequence formula &lt;span class=&quot;math inline&quot;&gt;u_n&lt;/span&gt; (&lt;span class=&quot;math inline&quot;&gt;u&lt;/span&gt; standing for
&lt;em&gt;purfect&lt;/em&gt;) that doesn&#x27;t have the 3 previously mentioned pitfalls.&lt;/p&gt;
&lt;p&gt;The first thing we can do is to transform this recursive sequence into the
expected ideal continuous time-based function. The original sequence was
designed for a given rate &lt;span class=&quot;math inline&quot;&gt;R&lt;/span&gt; and FPS &lt;span class=&quot;math inline&quot;&gt;F&lt;/span&gt;: this means that while &lt;span class=&quot;math inline&quot;&gt;\Delta_n&lt;/span&gt;
changes in practice every frame, the ideal function we are looking for is
constant: &lt;span class=&quot;math inline&quot;&gt;\Delta=1/F&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So instead of starting from &lt;span class=&quot;math inline&quot;&gt;a_{n+1} = \mathrm{lerp}(a_n, \Beta, R\Delta_n)&lt;/span&gt;,
we will look for &lt;span class=&quot;math inline&quot;&gt;u_n&lt;/span&gt; starting from &lt;span class=&quot;math inline&quot;&gt;u_{n+1} = \mathrm{lerp}(u_n, \Beta,
R\Delta)&lt;/span&gt; with &lt;span class=&quot;math inline&quot;&gt;u_0=a_0=\Alpha&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Since I&#x27;m lazy and incompetent, we are just going to ask WolframAlpha for help
finding the solution to the recursive sequence. But to feed its input we need
to simplify the terms a bit:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
u_{n+1} &amp;amp;= \mathrm{lerp}(u_n, \Beta, R\Delta) \\
      &amp;amp;= u_n(1-R\Delta) + \Beta R\Delta \\
      &amp;amp;= u_nP + Q
\end{split}
&lt;/div&gt;
&lt;p&gt;...with &lt;span class=&quot;math inline&quot;&gt;P=(1-R\Delta)&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;Q=\Beta R\Delta&lt;/span&gt;. We do that so we have a familiar &lt;span class=&quot;math inline&quot;&gt;ax+b&lt;/span&gt; linear form.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.wolframalpha.com/input?i=u%280%29%3DA+and+u%28n%2B1%29%3Du%28n%29*P%2BQ&quot;&gt;According to WolframAlpha&lt;/a&gt; this is equivalent to:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
u_n = \Alpha P^n + \frac{Q(P^n-1)}{P-1}
&lt;/div&gt;
&lt;p&gt;This is great because we now have the formula according to &lt;span class=&quot;math inline&quot;&gt;n&lt;/span&gt;, our frame
number. We can also express that discrete sequence as a continuous function
according to the time &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt;:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
f(t) = \Alpha P^{tF} + \frac{Q(P^{tF}-1)}{P-1}
&lt;/div&gt;
&lt;p&gt;Expanding our temporary &lt;span class=&quot;math inline&quot;&gt;P&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;Q&lt;/span&gt; placeholders with their values and
unrolling, we get:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
f(t) &amp;amp;= \Alpha P^{tF} + \frac{Q(P^{tF}-1)}{P-1} \\
     &amp;amp;= \Alpha(1-R\Delta)^{tF} + \frac{\Beta R\Delta((1-R\Delta)^{tF}-1)}{(1-R\Delta)-1} \\
     &amp;amp;= \Alpha(1-R\Delta)^{tF} - \Beta((1-R\Delta)^{tF}-1) \\
     &amp;amp;= \Alpha(1-R\Delta)^{tF} + \Beta(1-(1-R\Delta)^{tF}) \\
     &amp;amp;= \mathrm{lerp}(\Beta, \Alpha, (1-R\Delta)^{tF}) \\
     &amp;amp;= \mathrm{lerp}(\Beta, \Alpha, (1-R/F)^{tF}) \\
f(t) &amp;amp;= \boxed{\mathrm{lerp}(\Alpha, \Beta, 1-(1-R/F)^{tF})}
\end{split}
&lt;/div&gt;
&lt;p&gt;This function perfectly matches the initial &lt;span class=&quot;math inline&quot;&gt;\mathrm{lerp}()&lt;/span&gt; sequence in the
hypothetical situation where the frame rate is honored. Basically, it&#x27;s &lt;strong&gt;what
the sequence &lt;span class=&quot;math inline&quot;&gt;a_{n+1}&lt;/span&gt; was meant to emulate at a given frame rate &lt;span class=&quot;math inline&quot;&gt;F&lt;/span&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;We swapped the first 2 terms of &lt;span class=&quot;math inline&quot;&gt;\mathrm{lerp}()&lt;/span&gt; at the last step because
it makes more sense semantically to go from &lt;span class=&quot;math inline&quot;&gt;\Alpha&lt;/span&gt; to &lt;span class=&quot;math inline&quot;&gt;\Beta&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Let&#x27;s again summarize what we have and what we want: we&#x27;re in the game main
loop and we want our running value to stick to that &lt;span class=&quot;math inline&quot;&gt;f(t)&lt;/span&gt; function. We
have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;v=f(t)&lt;/span&gt;: the value previously computed (&lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; is the running duration so far,
but we don&#x27;t have it); in the original sequence this is known as &lt;span class=&quot;math inline&quot;&gt;a_n&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;\Delta_n&lt;/span&gt;: the delta time for the current frame&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are looking for a function &lt;span class=&quot;math inline&quot;&gt;\Eta(v,\Delta_n)&lt;/span&gt; which defines the position of
a new point on the curve, only knowing &lt;span class=&quot;math inline&quot;&gt;v&lt;/span&gt; and &lt;span class=&quot;math inline&quot;&gt;\Delta_n&lt;/span&gt;. It&#x27;s a &amp;quot;time
agnostic&amp;quot; version of &lt;span class=&quot;math inline&quot;&gt;f(t)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;Basically, it is defined as &lt;span class=&quot;math inline&quot;&gt;\Eta(v,\Delta_n)=f(t+\Delta_n)&lt;/span&gt;, but since we don&#x27;t have
&lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; it&#x27;s not very helpful. That being said, while we don&#x27;t have &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt;, we do have
&lt;span class=&quot;math inline&quot;&gt;f(t)&lt;/span&gt; (the previous value &lt;span class=&quot;math inline&quot;&gt;v&lt;/span&gt;).&lt;/p&gt;
&lt;p&gt;Looking at the curve, we know the y-value of the previous point, and we know
the difference between the new point and the previous point on the x-axis:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/fixing-damp/prev-to-current-curve.png&quot; alt=&quot;previous to current point&quot;&gt;
  &lt;figcaption&gt;Previous and current point in time&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If we want &lt;span class=&quot;math inline&quot;&gt;t&lt;/span&gt; (the total time elapsed at the previous point), we need the
inverse function &lt;span class=&quot;math inline&quot;&gt;f^{-1}&lt;/span&gt;. Indeed, &lt;span class=&quot;math inline&quot;&gt;t = f^{-1}(f(t))&lt;/span&gt;: taking the inverse of a
function gives back the input. We know &lt;span class=&quot;math inline&quot;&gt;f&lt;/span&gt; so we can invert it, relying on
WolframAlpha again (what a blessing this website is):&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
f^{-1}(x) = \frac{\ln{\frac{\Beta-x}{\Beta-\Alpha}}}{F \ln(1-R/F)}
&lt;/div&gt;
&lt;div class=&quot;admonition note&quot;&gt;
&lt;p class=&quot;admonition-title&quot;&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;math inline&quot;&gt;\ln&lt;/span&gt; stands for natural logarithm, sometimes also called &lt;span class=&quot;math inline&quot;&gt;\log&lt;/span&gt;. Careful
though, on Desmos for example &lt;span class=&quot;math inline&quot;&gt;\log&lt;/span&gt; is in base 10, not base &lt;span class=&quot;math inline&quot;&gt;e&lt;/span&gt; (while its
&lt;span class=&quot;math inline&quot;&gt;\exp&lt;/span&gt; is in base &lt;span class=&quot;math inline&quot;&gt;e&lt;/span&gt; for some reason).&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This complex formula may feel a bit intimidating but we can now find &lt;span class=&quot;math inline&quot;&gt;\Eta&lt;/span&gt;
only using its two parameters:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
\Eta(v,\Delta_n) &amp;amp;= f(t + \Delta_n) \\
                &amp;amp;= f(f^{-1}(f(t)) + \Delta_n) \\
                &amp;amp;= f(f^{-1}(v) + \Delta_n) \\
                &amp;amp;= f(\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{F \ln(1-R/F)} + \Delta_n) \\
                &amp;amp;= \mathrm{lerp}(\Alpha, \Beta, 1-(1-R/F)^{(\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{F \ln(1-R/F)} + \Delta_n) \times F}) \\
                &amp;amp;= \mathrm{lerp}(\Alpha, \Beta, 1-(1-R/F)^{\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{\ln(1-R/F)}} (1-R/F)^{F\Delta_n}) \\
                &amp;amp;= \mathrm{lerp}(\Alpha, \Beta, 1-\frac{\Beta-v}{\Beta-\Alpha} (1-R/F)^{F\Delta_n}) \\
                &amp;amp;= (1-\frac{\Beta-v}{\Beta-\Alpha} (1-R/F)^{F\Delta_n})(\Beta-\Alpha) + \Alpha \\
                &amp;amp;= (\Beta-\Alpha) - (\Beta-v) (1-R/F)^{F\Delta_n} + \Alpha \\
                &amp;amp;= (v-\Beta)(1-R/F)^{F\Delta_n} + \Beta \\
                &amp;amp;= \mathrm{lerp}(\Beta, v, (1-R/F)^{F\Delta_n}) \\
\Eta(v,\Delta_n) &amp;amp;= \mathrm{lerp}(v, \Beta, 1-(1-R/F)^{F\Delta_n})
\end{split}
&lt;/div&gt;
&lt;p&gt;Again we swapped the first 2 arguments of &lt;code&gt;lerp&lt;/code&gt; at the last step at the cost
of an additional subtraction: this is more readable because &lt;span class=&quot;math inline&quot;&gt;\Beta&lt;/span&gt; is our
destination point.&lt;/p&gt;
&lt;p&gt;An interesting property that is going to be helpful here is &lt;span class=&quot;math inline&quot;&gt;m^n = e^{n
\ln{m}}&lt;/span&gt;. For my fellow programmers getting tense here: &lt;code&gt;pow(m, n) == exp(n * log(m))&lt;/code&gt;. Replacing the power with the exponential may not seem like an
improvement at first, but it allows packing all the constant terms together:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
\Eta(v,\Delta_n) &amp;amp;= \mathrm{lerp}(v, \Beta, 1-(1-R/F)^{F\Delta_n}) \\
                &amp;amp;= \mathrm{lerp}(v, \Beta, 1-e^{F\ln(1-R/F)\Delta_n})
\end{split}
&lt;/div&gt;
&lt;p&gt;&lt;span class=&quot;math inline&quot;&gt;F\ln(1-R/F)&lt;/span&gt; can be pre-computed because it is constant: it&#x27;s our &lt;strong&gt;rate
conversion formula&lt;/strong&gt;, which we can extract:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
             R&#x27; &amp;amp;= F\ln(1-R/F) \\
\Eta(v,\Delta_n) &amp;amp;= \mathrm{lerp}(v, \Beta, 1-e^{R&#x27;\Delta_n})
\end{split}
&lt;/div&gt;
&lt;p&gt;Rewriting this in a sequence notation, we get:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\begin{split}
  R&#x27; &amp;amp;= F\ln(1-R/F) \\
u_{n+1} &amp;amp;= \mathrm{lerp}(u_n, \Beta, 1-e^{R&#x27;\Delta_n})
\end{split}
&lt;/div&gt;
&lt;p&gt;We&#x27;re going to make one last adjustment: &lt;span class=&quot;math inline&quot;&gt;R&#x27;&lt;/span&gt; is negative, which is not
exactly intuitive to work with as a user (in case it is defined arbitrarily and
not through the conversion formula), so we make a sign swap for convenience:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\boxed{\begin{split}
  R&#x27; &amp;amp;= -F\ln(1-R/F) \\
u_{n+1} &amp;amp;= \mathrm{lerp}(u_n, \Beta, 1-e^{-R&#x27;\Delta_n})
\end{split}}
&lt;/div&gt;
&lt;p&gt;The conversion formula is optional; it&#x27;s only needed to port previously
broken code to the new formula. One interesting thing here is that &lt;span class=&quot;math inline&quot;&gt;R&#x27;&lt;/span&gt; is
fairly close to &lt;span class=&quot;math inline&quot;&gt;R&lt;/span&gt; when &lt;span class=&quot;math inline&quot;&gt;R&lt;/span&gt; is small.&lt;/p&gt;
&lt;p&gt;For example, a rate factor &lt;span class=&quot;math inline&quot;&gt;R=5&lt;/span&gt; at 60 FPS gives us &lt;span class=&quot;math inline&quot;&gt;R&#x27; \approx 5.22&lt;/span&gt;. This
means that if the rate factors weren&#x27;t closely tuned, it is probably acceptable
to go with &lt;span class=&quot;math inline&quot;&gt;R&#x27;=R&lt;/span&gt; and not bother with any conversion. Still, having that
formula can be useful to update all the decay constants and check that
everything still works as expected.&lt;/p&gt;
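&lt;p&gt;In code, the conversion is a one-liner; as a Python sketch (the helper name is
mine):&lt;/p&gt;

```python
import math

def convert_rate(rate, fps):
    # rate conversion formula: R' = -F * ln(1 - R/F)
    return -fps * math.log(1.0 - rate / fps)

# convert_rate(5.0, 60.0) gives about 5.22, close to the original 5
```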
&lt;p&gt;Also, notice how if the delta gets very large, &lt;span class=&quot;math inline&quot;&gt;-R&#x27;\Delta_n&lt;/span&gt; is going toward
&lt;span class=&quot;math inline&quot;&gt;-\infty&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;e^{-R&#x27;\Delta_n}&lt;/span&gt; toward &lt;span class=&quot;math inline&quot;&gt;0&lt;/span&gt;, &lt;span class=&quot;math inline&quot;&gt;1-e^{-R&#x27;\Delta_n}&lt;/span&gt; toward &lt;span class=&quot;math inline&quot;&gt;1&lt;/span&gt;, and so
the interpolation is going to reach our final target &lt;span class=&quot;math inline&quot;&gt;\Beta&lt;/span&gt; without
overshooting. This means the formula doesn&#x27;t need any extra care with regard to
the 3rd issue we pointed out earlier.&lt;/p&gt;
&lt;p&gt;Looking at the previous curves but now with the new formula and an adjusted
rate:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/fixing-damp/new-formula.png&quot; alt=&quot;new formula&quot;&gt;
  &lt;figcaption&gt;Comparison of the curves at different frame rates with the new formula&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;So there we have it, the perfect formula, frame rate agnostic ✅,
deterministic ✅ and resilient to overshooting ✅. If you&#x27;ve quickly skimmed
through the maths, here is what you need to know:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;a = lerp(a, B, delta * RATE)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Should be changed to:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;a = lerp(a, B, 1.0 - exp(-delta * RATE2))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the precomputed &lt;code&gt;RATE2 = -FPS * log(1 - RATE/FPS)&lt;/code&gt; (where &lt;code&gt;log&lt;/code&gt; is the
natural logarithm), or simply using &lt;code&gt;RATE2 = RATE&lt;/code&gt; as a rough equivalent.&lt;/p&gt;
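&lt;p&gt;A quick Python simulation confirms the fix: with the corrected update, one
simulated second lands on the same value regardless of the frame rate, matching
the closed-form result (the helper name is mine):&lt;/p&gt;

```python
import math

def damp_fixed(fps, seconds, rate=3.5, target=0.0, start=100.0):
    v, dt = start, 1.0 / fps
    for _ in range(round(seconds * fps)):
        v = v + (target - v) * (1.0 - math.exp(-dt * rate))
    return v

# damp_fixed(30, 1.0) and damp_fixed(144, 1.0) both give 100*exp(-3.5), about 3.02
```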
&lt;p&gt;Also, any existing overshooting clamping can safely be dropped.&lt;/p&gt;
&lt;p&gt;Now please adjust your game to make the world a better and safer place for
everyone ♥&lt;/p&gt;
&lt;h2&gt;Going further&lt;/h2&gt;
&lt;p&gt;As &lt;a href=&quot;https://news.ycombinator.com/item?id=40401152&quot;&gt;suggested on HN&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For numerical stability it makes sense to &lt;a href=&quot;https://www.johndcook.com/blog/cpp_expm1/&quot;&gt;use &lt;code&gt;-expm1(x)&lt;/code&gt;&lt;/a&gt; instead of
&lt;code&gt;1-exp(x)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;API wise, proposing a &lt;a href=&quot;https://en.wikipedia.org/wiki/Exponential_smoothing#Time_constant&quot;&gt;time constant&lt;/a&gt; &lt;code&gt;T&lt;/code&gt; instead of the rate (where
&lt;code&gt;T=1/rate&lt;/code&gt;) might be more intuitive&lt;/li&gt;
&lt;li&gt;For performance reasons, the exponential could be expanded manually: for small
values of &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;1-exp(-x)&lt;/code&gt; is approximately &lt;code&gt;x-x²/2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
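&lt;p&gt;The first point is easy to check in Python: for a tiny &lt;code&gt;delta&lt;/code&gt;, &lt;code&gt;-expm1(-x)&lt;/code&gt;
keeps its precision while &lt;code&gt;1-exp(-x)&lt;/code&gt; suffers from cancellation:&lt;/p&gt;

```python
import math

x = 3.5e-10  # delta * RATE2 for a very small delta
naive = 1.0 - math.exp(-x)
stable = -math.expm1(-x)
# stable is accurate to full precision; naive can lose a large
# fraction of its significant digits to the subtraction from 1.0
```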

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/40-hacking-window-titles-to-help-obs.html</guid>
 <link>http://blog.pkh.me/p/40-hacking-window-titles-to-help-obs.html</link>
 <title>Hacking window titles to help OBS</title>
 <pubDate>Tue, 06 Jun 2023 09:27:10 -0000</pubDate>
 <description>&lt;p&gt;This write-up presents the rationale and technical details behind a
tiny project I wrote the other day, &lt;a href=&quot;https://github.com/ubitux/WindowTitleHack&quot;&gt;WTH, or WindowTitleHack&lt;/a&gt;, which
forces a constant window name for apps that keep changing it (I&#x27;m looking
specifically at Firefox and Krita, but there are probably many others).&lt;/p&gt;
&lt;h2&gt;Why tho?&lt;/h2&gt;
&lt;p&gt;I&#x27;ve been streaming on Twitch from Linux (X11) with a barebone &lt;a href=&quot;https://obsproject.com/&quot;&gt;OBS
Studio&lt;/a&gt; setup for a while now, and while most of the experience has been
relatively smooth, one particularly striking frustration has been dealing with
window detection.&lt;/p&gt;
&lt;p&gt;If we don&#x27;t want to capture the whole desktop for privacy reasons or simply to
have control over the scene layout depending on the currently focused app, we
need to rely on the &lt;code&gt;Window Capture (XComposite)&lt;/code&gt; source. This works mostly
fine, and it is actually able to track windows even when their title bar is
renamed. But obviously, upon restart it can&#x27;t find them again because both the
window titles and the window IDs changed, meaning we have to redo our setup by
reselecting the windows again.&lt;/p&gt;
&lt;p&gt;It would have been acceptable if that was the only issue I had, but one of the
more advanced features I&#x27;m using extensively is the &lt;code&gt;Advanced Scene Switcher&lt;/code&gt;
(the builtin one, available through the &lt;code&gt;Tools&lt;/code&gt; menu). This tool is a basic
window title pattern matching system that allows automatic scene switches
depending on the current window. Note that it does seem to support regex, which
could help with the problem, but there is no guarantee that the app would leave
a recognizable matchable pattern in its title. Also, if we want multiple
Firefox windows but only match one in particular, the regex wouldn&#x27;t help.&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/windowtitlehack/obs-automatic-scene-switcher.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;OBS native automatic scene switcher&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Hacking Windows&lt;/h2&gt;
&lt;p&gt;One unreliable hack would be to spam &lt;code&gt;xdotool&lt;/code&gt; commands to correct the window
title. This could be a resource hog, and it would create quite a few races. One
slight improvement over this would be to use &lt;code&gt;xprop -spy&lt;/code&gt;, but that
wouldn&#x27;t address the race conditions (since we would adjust the title &lt;em&gt;after&lt;/em&gt;
it has already been changed).&lt;/p&gt;
&lt;p&gt;So how do we deal with that properly? Well, on X11 with the reference library
(&lt;code&gt;Xlib&lt;/code&gt;) there are a lot of ways of changing the title bar. It took me a while
to identify which call(s) to target, but I ended up with the following call
graph, where each function is exposed publicly:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/windowtitlehack/x11-xchangeproperty.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;X11 XChangeProperty call tree&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;From this we can easily see that we only need to hook the deepest function
&lt;code&gt;XChangeProperty&lt;/code&gt;, and check if the property is &lt;code&gt;XA_WM_NAME&lt;/code&gt; (or its &amp;quot;modern&amp;quot;
sibling, &lt;code&gt;_NET_WM_NAME&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;How do we do that? With the help of the &lt;code&gt;LD_PRELOAD&lt;/code&gt; environment variable and a
dynamic library that implements a custom &lt;code&gt;XChangeProperty&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;First, we grab the original function:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;dlfcn.h&amp;gt;

/* A type matching the prototype of the target function */
typedef int (*XChangeProperty_func_type)(
    Display *display,
    Window w,
    Atom property,
    Atom type,
    int format,
    int mode,
    const unsigned char *data,
    int nelements
);

/* [...] */

XChangeProperty_func_type XChangeProperty_orig = dlsym(RTLD_NEXT, &amp;quot;XChangeProperty&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also need to craft a custom &lt;code&gt;_NET_WM_NAME&lt;/code&gt; atom:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;_NET_WM_NAME = XInternAtom(display, &amp;quot;_NET_WM_NAME&amp;quot;, 0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this we are now able to intercept all the &lt;code&gt;WM_NAME&lt;/code&gt; property updates and
override them with our own title:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;if (property == XA_WM_NAME || property == _NET_WM_NAME) {
    data = (const unsigned char *)new_title;
    nelements = (int)strlen(new_title);
}
return XChangeProperty_orig(display, w, property, type, format, mode, data, nelements);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We wrap all of this into our own redefinition of &lt;code&gt;XChangeProperty&lt;/code&gt; and… that&#x27;s
pretty much it.&lt;/p&gt;
&lt;p&gt;Now, due to a long history of development, &lt;code&gt;Xlib&lt;/code&gt; has been &amp;quot;deprecated&amp;quot; and
superseded by &lt;code&gt;libxcb&lt;/code&gt;. Both are widely used, but fortunately the APIs are more
or less similar. The function to hook is &lt;code&gt;xcb_change_property&lt;/code&gt;, and defining
&lt;code&gt;_NET_WM_NAME&lt;/code&gt; is slightly more cumbersome but not exactly challenging:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;const xcb_intern_atom_cookie_t cookie = xcb_intern_atom(conn, 0, strlen(&amp;quot;_NET_WM_NAME&amp;quot;), &amp;quot;_NET_WM_NAME&amp;quot;);
xcb_intern_atom_reply_t *reply = xcb_intern_atom_reply(conn, cookie, NULL);
if (reply)
    _NET_WM_NAME = reply-&amp;gt;atom;
free(reply);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Aside from that, the code is pretty much the same.&lt;/p&gt;
&lt;h2&gt;Configuration&lt;/h2&gt;
&lt;p&gt;To pass down the custom title to override, I&#x27;ve been relying on an environment
variable &lt;code&gt;WTH_TITLE&lt;/code&gt;. From a user point of view, it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;LD_PRELOAD=&amp;quot;builddir/libwth.so&amp;quot; WTH_TITLE=&amp;quot;Krita4ever&amp;quot; krita
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could probably improve the usability by creating a wrapping tool (so that we
could have something such as &lt;code&gt;./wth --title=Krita4ever krita&lt;/code&gt;). Unfortunately I
wasn&#x27;t yet able to make a self-referencing executable accepted by &lt;code&gt;LD_PRELOAD&lt;/code&gt;,
so for now, manually setting the &lt;code&gt;LD_PRELOAD&lt;/code&gt; and &lt;code&gt;WTH_TITLE&lt;/code&gt; environment
variables will do just fine.&lt;/p&gt;
&lt;h2&gt;Thread safety&lt;/h2&gt;
&lt;p&gt;To avoid a bunch of redundant function round-trips we need to globally cache a
few things: the new title (to avoid fetching it in the environment all the
time), the original functions (to save the &lt;code&gt;dlsym&lt;/code&gt; call), and &lt;code&gt;_NET_WM_NAME&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Those are loaded lazily at the first function call, but we have no guarantee
with regard to concurrent calls to the hooked function, so we must create our
own lock. I initially thought about using &lt;code&gt;pthread_once&lt;/code&gt;, but unfortunately its
initialization callback mechanism doesn&#x27;t allow a custom argument. Again,
this is merely a slight annoyance since we can implement our own in a few lines
of code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;/* The &amp;quot;once&amp;quot; API is similar to pthread_once but allows a custom function argument */
struct wth_once {
    pthread_mutex_t lock;
    int initialized;
};

#define WTH_ONCE_INITIALIZER {.lock=PTHREAD_MUTEX_INITIALIZER}

typedef void (*init_func_type)(void *user_arg);

void wth_init_once(struct wth_once *once, init_func_type init_func, void *user_arg)
{
    pthread_mutex_lock(&amp;amp;once-&amp;gt;lock);
    if (!once-&amp;gt;initialized) {
        init_func(user_arg);
        once-&amp;gt;initialized = 1;
    }
    pthread_mutex_unlock(&amp;amp;once-&amp;gt;lock);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which we use like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;static struct wth_once once = WTH_ONCE_INITIALIZER;

static void init_once(void *user_arg)
{
    Display *display = user_arg;
    /* [...] */
}

/* [...] */

wth_init_once(&amp;amp;once, init_once, display);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;The End?&lt;/h2&gt;
&lt;p&gt;I&#x27;ve been putting off this project for weeks because it felt complex at
first glance, but it actually took me only a few hours, probably about the same
amount of time it took me to write this article. While the project is
admittedly really small, it still feels like a nice accomplishment. I hope it&#x27;s
useful to other people.&lt;/p&gt;
&lt;p&gt;Now, Wayland support is probably the most obvious improvement the project
could receive, but I don&#x27;t have such a setup locally to test with yet, so this
is postponed for an undetermined amount of time.&lt;/p&gt;
&lt;p&gt;The code is released with a permissive license (MIT); if you want to contribute
you can open a pull request but getting in touch with me first is appreciated
to avoid unnecessary and overlapping efforts.&lt;/p&gt;

 </description>
</item>
<item>
 <guid>http://blog.pkh.me/p/39-improving-color-quantization-heuristics.html</guid>
 <link>http://blog.pkh.me/p/39-improving-color-quantization-heuristics.html</link>
 <title>Improving color quantization heuristics</title>
 <pubDate>Sat, 31 Dec 2022 12:00:43 -0000</pubDate>
 <description>&lt;p&gt;In 2015, I wrote an article about &lt;a href=&quot;http://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html&quot;&gt;how the palette color quantization was
improved in FFmpeg&lt;/a&gt; in order to make nice animated GIF files. For some
reason, to this day this is one of my most popular articles.&lt;/p&gt;
&lt;p&gt;As time passed, my experience with colors grew and I ended up being quite
ashamed and frustrated with the state of these filters. A lot of the code was
naive (when not terribly wrong), despite the apparent good results.&lt;/p&gt;
&lt;p&gt;One of the major changes I wanted to make was to evaluate the color distances
in a perceptually uniform color space, instead of using a naive Euclidean
distance between RGB triplets.&lt;/p&gt;
&lt;p&gt;As usual it felt like a week-end long project; after all, all I had to do was
change the distance function to work in a different space, right? Well, if
you&#x27;re following my blog you might have noticed I had numerous adventures
that stacked up on each other:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I had to work out the &lt;a href=&quot;http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html&quot;&gt;colorspace with integer arithmetic&lt;/a&gt; first&lt;/li&gt;
&lt;li&gt;...which forced me to look into &lt;a href=&quot;http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html&quot;&gt;integer division&lt;/a&gt; more deeply&lt;/li&gt;
&lt;li&gt;...which confronted me with all sorts of &lt;a href=&quot;http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html&quot;&gt;undefined behaviours&lt;/a&gt; in the
process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And when I finally reached the point where I could make the switch to
&lt;a href=&quot;https://bottosson.github.io/posts/oklab/&quot;&gt;OkLab&lt;/a&gt; (the perceptual colorspace), a few experiments showed that the
flavor of the core algorithm I was using might contain some fundamental flaws,
or at least was not implementing optimal heuristics. So here we go again:
quickly enough, I found myself starting a new research study in the pursuit of
understanding how to put pixels on the screen. This write-up is the story of
yet another self-inflicted struggle.&lt;/p&gt;
&lt;h2&gt;Palette quantization&lt;/h2&gt;
&lt;p&gt;But what is &lt;em&gt;palette quantization&lt;/em&gt;? It essentially refers to the process of
reducing the number of available colors of an image down to a smaller subset.
In sRGB, an image can have up to 16.7 million colors. In practice though it&#x27;s
generally much less, to the surprise of no one. Still, it&#x27;s not rare to have a
few hundred thousand different colors in a single picture. Our goal is to
reduce that to something like 256 colors that represent them best, and use
these colors to create a new picture.&lt;/p&gt;
&lt;p&gt;Why, you may ask? There are multiple reasons; here are some:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Improve compression (this is a lossy operation of course, and using
dithering on top might actually defeat the original purpose)&lt;/li&gt;
&lt;li&gt;Some codecs might not support anything other than limited palettes (GIF and
subtitle codecs are examples)&lt;/li&gt;
&lt;li&gt;Various artistic purposes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Following is an example of a picture quantized at different levels:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original (26125 colors)&lt;/th&gt;
&lt;th&gt;Quantized to 8bpp (256 colors)&lt;/th&gt;
&lt;th&gt;Quantized to 2bpp (4 colors)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/cat-orig.png&quot; alt=&quot;Cat (original)&quot; /&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/cat-256.png&quot; alt=&quot;Cat (8bpp)&quot; /&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/cat-4.png&quot; alt=&quot;Cat (2bpp)&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This color quantization process can be roughly summarized as a 4-step
process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Sample the input image: we build a histogram of all the colors in the
picture (basically a simple statistical analysis)&lt;/li&gt;
&lt;li&gt;Design a colormap: we build the palette through various means using the
histograms&lt;/li&gt;
&lt;li&gt;Create a pixel mapping which associates a color (one that can be found in
the input image) with another (one that can be found in the newly created
palette)&lt;/li&gt;
&lt;li&gt;Image quantizing: we use the color mapping to build our new image. This step
may also involve some &lt;a href=&quot;https://en.wikipedia.org/wiki/Dither&quot;&gt;dithering&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The study here will focus on step 2 (which itself relies on step 1).&lt;/p&gt;
&lt;h2&gt;Colormap design algorithms&lt;/h2&gt;
&lt;p&gt;A palette is simply a set of colors. It can be represented in various ways, for
example here in 2D and 3D:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/color-quant/pal-2d-3d.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;A 256 color palette represented in 2D and 3D&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;To generate such a palette, all sorts of algorithms exist. They are usually
classified into 2 large categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dividing/splitting algorithms (such as Median-Cut and its various flavors)&lt;/li&gt;
&lt;li&gt;Clustering algorithms (such as K-means, maximin distance, (E)LBG or pairwise
clustering)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The former are faster but non-optimal while the latter are slower but better.
The problem is &lt;a href=&quot;https://en.wikipedia.org/wiki/NP-completeness&quot;&gt;NP-complete&lt;/a&gt;, meaning it&#x27;s possible to find the
optimal solution but it can be extremely costly. On the other hand, it&#x27;s
possible to find &amp;quot;local optimums&amp;quot; at minimal cost.&lt;/p&gt;
&lt;p&gt;Since I&#x27;m working within FFmpeg, speed has always been a priority. This is
what motivated me to initially implement Median-Cut rather than a more
expensive algorithm.&lt;/p&gt;
&lt;p&gt;The rough picture of the algorithm is relatively easy to grasp. Assuming we
want a palette of &lt;code&gt;K&lt;/code&gt; colors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A set &lt;code&gt;S&lt;/code&gt; of all the colors in the input picture is constructed, along with
a respective set &lt;code&gt;W&lt;/code&gt; of the weight of each color (how much they appear)&lt;/li&gt;
&lt;li&gt;Since the colors are expressed as RGB triplets, they can be encapsulated
in one big cuboid, or box&lt;/li&gt;
&lt;li&gt;The box is cut in two along one of the axes (R, G or B) at the median
(hence the name of the algorithm)&lt;/li&gt;
&lt;li&gt;If we don&#x27;t have a total of &lt;code&gt;K&lt;/code&gt; boxes yet, pick one of them and go back to
the previous step&lt;/li&gt;
&lt;li&gt;All the colors in each of the &lt;code&gt;K&lt;/code&gt; boxes are then averaged to form the color
palette entries&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here is how the process looks visually:&lt;/p&gt;
&lt;p&gt;&lt;video src=&quot;http://blog.pkh.me/misc/mediancut-parrot-16.mp4&quot; controls=&quot;controls&quot; width=&quot;800&quot;&gt;Median-Cut algorithm targeting 16 boxes&lt;/video&gt;&lt;/p&gt;
&lt;p&gt;You may have spotted in this video that the colors are not expressed in RGB but
in Lab: this is because instead of representing the colors in a traditional RGB
colorspace, we are instead using the OkLab colorspace which has the property of
being perceptually uniform. It doesn&#x27;t really change the Median Cut algorithm,
but it definitely has an impact on the resulting palette.&lt;/p&gt;
&lt;p&gt;One striking limitation of this algorithm is that we are working exclusively
with cuboids: the cuts are limited to an axis, we are not cutting along an
arbitrary plane or a more complex shape. Think of it like working with voxels
instead of more free-form geometries. The main benefit is that the algorithm is
pretty simple to implement.&lt;/p&gt;
&lt;p&gt;Now, the description provided earlier conveniently avoided describing two
important aspects of steps 3 and 4:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;How do we choose the next box to split?&lt;/li&gt;
&lt;li&gt;How do we choose along which axis of the box we make the cut?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I pondered that for quite a long time.&lt;/p&gt;
&lt;h2&gt;An overview of the possible heuristics&lt;/h2&gt;
&lt;p&gt;In bulk, some of the heuristics I started thinking about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Should we take the box that has the longest axis across all boxes?&lt;/li&gt;
&lt;li&gt;Should we take the box that has the largest volume?&lt;/li&gt;
&lt;li&gt;Should we take the box that has the biggest &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_squared_error&quot;&gt;Mean Squared Error&lt;/a&gt; when
compared to its average color?&lt;/li&gt;
&lt;li&gt;Should we take the box that has the &lt;em&gt;axis&lt;/em&gt; with the biggest MSE?&lt;/li&gt;
&lt;li&gt;Assuming we choose to go with the MSE, should it be normalized across all
boxes?&lt;/li&gt;
&lt;li&gt;Should we even account for the weight of each color or consider them equal?&lt;/li&gt;
&lt;li&gt;What about the axis? Is it better to pick the longest one or the one with
the highest MSE?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I tried to formalize these questions mathematically to the best of my limited
abilities. So let&#x27;s start by saying that all the colors &lt;code&gt;c&lt;/code&gt; of a given box are
stored in a &lt;code&gt;N×M&lt;/code&gt; 2D-array following the matrix notation:&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;L₁&lt;/td&gt;&lt;td&gt;L₂&lt;/td&gt;&lt;td&gt;L₃&lt;/td&gt;&lt;td&gt;…&lt;/td&gt;&lt;td&gt;Lₘ&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;a₁&lt;/td&gt;&lt;td&gt;a₂&lt;/td&gt;&lt;td&gt;a₃&lt;/td&gt;&lt;td&gt;…&lt;/td&gt;&lt;td&gt;aₘ&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;b₁&lt;/td&gt;&lt;td&gt;b₂&lt;/td&gt;&lt;td&gt;b₃&lt;/td&gt;&lt;td&gt;…&lt;/td&gt;&lt;td&gt;bₘ&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;N&lt;/code&gt; is the number of components (3 in our case, whether it&#x27;s RGB or Lab), and
&lt;code&gt;M&lt;/code&gt; the number of colors in that box. You can visualize this as a list of
vectors as well, where &lt;code&gt;c_{i,j}&lt;/code&gt; is the color at row &lt;code&gt;i&lt;/code&gt; and column &lt;code&gt;j&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;With that in mind we can sketch the following diagram representing the tree of
heuristic possibilities to implement:&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/color-quant/diagram-heuristics.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Tree of potential heuristics for the Median-Cut algorithm&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Mathematicians are going to kill me for doodling random notes all over this
perfectly understandable gibberish of symbols, but I believe it&#x27;s required for
the human beings reading this article.&lt;/p&gt;
&lt;p&gt;In summary, we end up with a total of 24 combinations to try out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2 axis selection heuristics:
&lt;ul&gt;
&lt;li&gt;Cut the axis with the maximum error squared&lt;/li&gt;
&lt;li&gt;Cut the axis with the maximum length&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;3 operators:
&lt;ul&gt;
&lt;li&gt;Maximum measurement out of all the channels&lt;/li&gt;
&lt;li&gt;Product of the measurements of all the channels&lt;/li&gt;
&lt;li&gt;Sum of the measurements of all the channels&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;4 measurements:
&lt;ul&gt;
&lt;li&gt;Error squared, honoring weights&lt;/li&gt;
&lt;li&gt;Error squared, not honoring weights&lt;/li&gt;
&lt;li&gt;Error squared, honoring weights, normalized&lt;/li&gt;
&lt;li&gt;Length of the axis&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we start to think intuitively about which ones are likely going to perform
the best, we quickly realize that we haven&#x27;t actually formalized what we are
trying to achieve. Such a rookie mistake. Clarifying this will help us get a
better feeling for the likely outcome.&lt;/p&gt;
&lt;p&gt;I chose to target an output that minimizes the MSE against the reference image,
in a perceptual way. Said differently, trying to make the perceptual distance
between an input and output color pixel as minimal as possible. This is an
arbitrary and debatable target, but it&#x27;s relatively simple and objective to
evaluate if we have faith in the selected perceptual model. Another appropriate
metric could have been to find the ideal palette through another algorithm and
compare against that instead. Unfortunately, doing that would have required
trusting that other algorithm and its implementation, and having enough
computing power.&lt;/p&gt;
&lt;p&gt;So to summarize, we want to minimize the MSE between the input and output,
evaluated in the OkLab color space. This can be expressed with the following
formula:&lt;/p&gt;
&lt;div class=&quot;math block&quot;&gt;
\min_{P : |P| = K} \displaystyle\sum_{C \in P} \sum_{c \in C} w_c||c-\mu_C||^2
&lt;/div&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;P&lt;/span&gt; is a &lt;a href=&quot;https://en.m.wikipedia.org/wiki/Partition_of_a_set&quot;&gt;partition&lt;/a&gt;
(which we constrain to a box in our implementation)&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;C&lt;/span&gt; the set of colors in the partition &lt;span class=&quot;math inline&quot;&gt;P&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;w&lt;/span&gt; the weight of a color&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;c&lt;/span&gt; a single color&lt;/li&gt;
&lt;li&gt;&lt;span class=&quot;math inline&quot;&gt;\mu&lt;/span&gt; the average color of the set &lt;span class=&quot;math inline&quot;&gt;C&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Special thanks to &lt;code&gt;criver&lt;/code&gt; for helping me a ton on the math area, this last
formula is from them.&lt;/p&gt;
&lt;p&gt;Looking at the formula, we can see how similar it is to certain branches of the
heuristics tree, so we can start getting an intuition about the result of the
experiment.&lt;/p&gt;
&lt;h2&gt;Experiment language&lt;/h2&gt;
&lt;p&gt;A short digression from the main topic (feel free to skip to the next section):
working in C within FFmpeg quickly became more of a hurdle than anything. Aside
from the lack of flexibility, the implicit casts deceitfully destroying
precision, and the undefined behaviours, all kinds of C quirks got in the way
several times, which made me question my sanity. This one typically severely
messed me up while trying to average the colors:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

int main (void)
{
    const int32_t x = -30;
    const uint32_t y = 10;

    const uint32_t a = 30;
    const int32_t b = -10;

    printf(&amp;quot;%d×%u=%d\n&amp;quot;, x, y, x * y);
    printf(&amp;quot;%u×%d=%d\n&amp;quot;, a, b, a * b);
    printf(&amp;quot;%d/%u=%d\n&amp;quot;, x, y, x / y);
    printf(&amp;quot;%u/%d=%d\n&amp;quot;, a, b, a / b);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;% cc -Wall -Wextra -fsanitize=undefined test.c -o test &amp;amp;&amp;amp; ./test
-30×10=-300
30×-10=-300
-30/10=429496726
30/-10=0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anyway, I know this is obvious, but if you aren&#x27;t already doing so, I suggest
you build your experiments in another language, Python or whatever, and rewrite
them in C later once you&#x27;ve figured out your expected output.&lt;/p&gt;
&lt;p&gt;Re-implementing what I needed in Python didn&#x27;t take me long. It was, and still
is, obviously much slower at runtime, but that&#x27;s fine. There is a lot of room
for speed improvement, typically by relying on &lt;code&gt;numpy&lt;/code&gt; (which I didn&#x27;t bother
with).&lt;/p&gt;
&lt;h2&gt;Experiment results&lt;/h2&gt;
&lt;p&gt;I created a &lt;a href=&quot;https://github.com/ubitux/research/&quot;&gt;research repository&lt;/a&gt; for the occasion. The code to
reproduce and the results can be found in the &lt;a href=&quot;https://github.com/ubitux/research/tree/main/color-quantization&quot;&gt;color quantization
README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In short, based on the results, we can conclude that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Overall, the box that has the axis with the largest non-normalized weighted
sum of squared error is the best candidate in the box selection algorithm&lt;/li&gt;
&lt;li&gt;Overall, cutting the axis with the largest weighted sum of squared error is
the best axis cut selection algorithm&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To my surprise, normalizing the weights per box is not a good idea. I initially
observed that by trial and error, which was actually one of the main motivators
for this research. I initially thought normalizing each box was necessary in
order to compare them against each other (such that they are compared on a
common ground). My loose explanation of the phenomenon was that not normalizing
causes a bias towards boxes with many colors, but that&#x27;s actually exactly what
we want. I believe it can also be explained by our evaluation function: we want
to minimize the error across the whole set of colors, so small partitions (in
color counts) must not be made stronger. At least not in the context of the
target we chose.&lt;/p&gt;
&lt;p&gt;It&#x27;s also interesting to see how the &lt;code&gt;max()&lt;/code&gt; seems to perform better than the
&lt;code&gt;sum()&lt;/code&gt; of the variance of each component most of the time. Admittedly, my
sample set of pictures is not that big, which may mean that more experiments
are required to confirm that tendency.&lt;/p&gt;
&lt;p&gt;In retrospect, this might have been quickly predictable to someone with a
mathematical background. But since I don&#x27;t have that, nor do I trust my
abstract thinking much, I&#x27;m kind of forced to try things out often. This is
likely one of the many instances where I spent way too much energy on something
obvious from the beginning, but I have the hope it will actually provide some
useful information for other lost souls out there.&lt;/p&gt;
&lt;h2&gt;Known limitations&lt;/h2&gt;
&lt;p&gt;There are two main limitations I want to discuss before closing this article.
The first one is related to minimizing the MSE even more.&lt;/p&gt;
&lt;h3&gt;K-means refinement&lt;/h3&gt;
&lt;p&gt;We know the Median-Cut actually provides a rough estimate of the optimal
palette. One thing we could do is use it as a first step before refinement, for
example by running a few K-means iterations as post-processing (how much
refinement/iterations could be a user control). The general idea of K-means is
to progressively move each color individually to a more appropriate box, that
is, a box for which the color distance to the average color of that box is
smaller. I started implementing that in a very naive way, so it&#x27;s extremely
slow, but that&#x27;s something to investigate further because it definitely
improves the results.&lt;/p&gt;
&lt;p&gt;Most of the academic literature seems to suggest the use of K-means
clustering, but all of the approaches require some startup step. Some come up
with various heuristics, some use PCA, but I&#x27;ve yet to see one that relies on
Median-Cut as a first pass; maybe that&#x27;s not such a good idea, but who knows.&lt;/p&gt;
&lt;h3&gt;Bias toward perceived lightness&lt;/h3&gt;
&lt;p&gt;Another, more annoying problem for which I have no solution is that human
perception is much more sensitive to lightness changes than to hue. If you
look at the first demo with the parrot, you may have noticed the boxes are
kind of thin. This is because the &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; components (respectively how
green/red and blue/yellow the color is) have a much smaller amplitude compared
to the &lt;code&gt;L&lt;/code&gt; (perceived lightness).&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;http://blog.pkh.me/img/color-quant/oklab-axis-scaled.png&quot; alt=&quot;&quot;&gt;
  &lt;figcaption&gt;Side by side comparison of the spread of colors between a stretched and normalized view&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;You may rightfully question whether this is a problem or not. In practice, this
means that when &lt;code&gt;K&lt;/code&gt; is low (let&#x27;s say smaller than 8 or even 16), cuts along &lt;code&gt;L&lt;/code&gt;
will almost always be preferred, causing the picture to be heavily desaturated.
This is because it tries to preserve the most significant attribute in human
perception: the lightness.&lt;/p&gt;
&lt;p&gt;That particular picture is actually a pathological study case:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;4 colors&lt;/th&gt;
&lt;th&gt;8 colors&lt;/th&gt;
&lt;th&gt;12 colors&lt;/th&gt;
&lt;th&gt;16 colors&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/woman-4.png&quot; alt=&quot;Portrait K=4&quot; /&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/woman-8.png&quot; alt=&quot;Portrait K=8&quot; /&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/woman-12.png&quot; alt=&quot;Portrait K=12&quot; /&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src=&quot;http://blog.pkh.me/img/color-quant/woman-16.png&quot; alt=&quot;Portrait K=16&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We can see the hue timidly appearing around &lt;code&gt;K=16&lt;/code&gt; (specifically, it starts
being more strongly noticeable from the cut at &lt;code&gt;K=13&lt;/code&gt;).&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;For now, I&#x27;m mostly done with this &amp;quot;week-end long project&amp;quot; into which I
actually poured 2 or 3 months of lifetime. The FFmpeg patchset will likely be
upstreamed soon, so everyone should hopefully be able to benefit from it in the
next release. It will also come with &lt;a href=&quot;https://fosstodon.org/@bug/109602427382086789&quot;&gt;additional dithering
methods&lt;/a&gt;, whose implementation was actually a relaxing
distraction from all this hardship. There are still many ways of improving this
work, but it&#x27;s the end of the line for me, so I&#x27;ll trust the Internet with it.&lt;/p&gt;

 </description>
</item>
 </channel>
</rss>