<p>A small freedom area RSS — default feed for <a href="http://blog.pkh.me/">blog.pkh.me</a></p>
<h1>Fixing the iterative damping interpolation in video games</h1>
<p><em>Sat, 18 May 2024 — <a href="http://blog.pkh.me/p/41-fixing-the-iterative-damping-interpolation-in-video-games.html">permalink</a></em></p>
<p>As I'm exploring the fantastic world of indie game development lately, I end up
watching a large number of video tutorials on the subject. Even though the
quality of the content is pretty variable, I'm very grateful to the creators
for it. That being said, I couldn't help noticing this particular bit time and
time again:</p>
<pre><code class="language-python">a = lerp(a, B, delta * RATE)
</code></pre>
<p>Behind this apparently banal call hides a terrible curse, forever perpetuated by
innocent souls on the Internet.</p>
<p>In this article we will study what it's trying to achieve, how it works, why
it's wrong, and then we'll come up with a good solution to the initial problem.</p>
<p>The usual warning: I don't have a mathematics or academic background, so the
article is aimed at other neanderthals like myself, who managed to
understand that pressing keys on a keyboard makes pixels turn on and off.</p>
<h2>What is it?</h2>
<p>Let's start from the beginning. We're in a game engine main loop callback
called at a regular interval (roughly), passing down the time difference from
the last call.</p>
<p>In Godot engine, it looks like this:</p>
<pre><code class="language-text">func _physics_process(delta: float):
...
</code></pre>
<p>If the game is configured to refresh at 60 FPS, we can expect this function to
be called around 60 times per second with <code>delta = 1/60 = 0.01666...</code>.</p>
<p>As game developers, we want smooth animations for all kinds of
transformations. For example, we may want the speed of the player to go down to
zero as they release the moving key. We could do that linearly, but to make the
stop less brutal and robotic we want to slow down the speed progressively.</p>
<figure>
<img src="http://blog.pkh.me/img/fixing-damp/lin-vs-exp.gif" alt="linear vs exponential GIF">
<figcaption>Linear (top) versus smooth/exponential (bottom) animation</figcaption>
</figure>
<p>Virtually every tutorial will suggest updating some random variable with
something like that:</p>
<pre><code class="language-python">velocity = lerp(velocity, 0, delta * RATE)
</code></pre>
<p>At 60 FPS, with a decay <code>RATE</code> set to <code>3.5</code> and an initial <code>velocity</code> of <code>100</code>,
the <code>velocity</code> will go down to <code>0</code> following this curve:</p>
<figure>
<img src="http://blog.pkh.me/img/fixing-damp/velocity-curve.png" alt="velocity curve">
<figcaption>Example curve of a decaying variable</figcaption>
</figure>
<p><strong>Note</strong>: <code>velocity</code> is just an example variable name; the same pattern can be
found in many other contexts.</p>
<p>If you're familiar with <code>lerp()</code> ("linear interpolation") you may be wondering
why this is making a curve. Indeed, this <code>lerp()</code> function, also known as
<code>mix()</code>, is a simple linear function defined as <code>lerp(a,b,x) = x*(b-a) + a</code> or
its alternative stable form <code>lerp(a,b,x) = (1-x)a + xb</code>. For more information,
see a <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">previous article</a> about this particular function. But here
we are re-using the previous value, so this essentially means nesting <code>lerp()</code>
function calls, which expands into a power formula, forming a curve composed of
a chain of small straight segments.</p>
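<p>The nesting is easy to observe numerically. Here is a quick sketch (assuming a plain <code>lerp()</code> helper and an arbitrary constant factor <code>x = 0.25</code>): after <code>n</code> iterations toward a fixed target, the remaining distance has been scaled by <code>(1 - x)**n</code>, which is the power formula in disguise:</p>

```python
def lerp(a, b, x):
    return a + (b - a) * x

# Nesting lerp() toward a fixed target (0 here) collapses into a power
# formula: after n iterations with the same x, the distance to the target
# is scaled by (1 - x)**n.
a0, x = 100.0, 0.25
a = a0
for n in range(1, 6):
    a = lerp(a, 0.0, x)
    assert abs(a - a0 * (1.0 - x) ** n) < 1e-9
```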
<h2>Why is it wrong?</h2>
<p>The main issue is that the formula depends heavily on the refresh rate. If
the game is supposed to work at 30, 60, or 144 FPS, then it means the physics
engine is going to behave differently.</p>
<p>Here is an illustration of the kind of instability we can expect:</p>
<figure>
<img src="http://blog.pkh.me/img/fixing-damp/problematic-formula.png" alt="problematic formula">
<figcaption>Comparison of the curves at different frame rates with the problematic formula</figcaption>
</figure>
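<p>The instability can be reproduced in a few lines. The sketch below (assuming a plain <code>lerp()</code> helper and the <code>RATE = 3.5</code> from the earlier example) simulates one second of decay at three locked frame rates and lands on three different values:</p>

```python
def lerp(a, b, x):
    return a + (b - a) * x

RATE = 3.5  # the decay rate from the example above

def simulate(fps, duration=1.0):
    """Run the naive damping formula at a locked frame rate."""
    velocity, delta = 100.0, 1.0 / fps
    for _ in range(round(duration * fps)):
        velocity = lerp(velocity, 0.0, delta * RATE)
    return velocity

# One second of simulated time, three frame rates, three different results
v30, v60, v144 = simulate(30), simulate(60), simulate(144)
```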
<p>Note that the inaccuracy when compared to an ideal curve is not the issue here.
The problem is that the game mechanics are different depending on the hardware,
the system, and the wind direction observed on a small island of Japan. Imagine
being able to jump further after replacing our 60Hz monitor with a 144Hz one:
that would be some nasty pay-to-win incentive.</p>
<p>We may be able to get away with this by forcing a constant refresh rate for the
game and considering this a non-issue (I'm not convinced this is achievable on all
engines and platforms), but then we meet another problem: the device may not be
able to hold this requirement at all times because of potential lags (for
reasons that may be outside our control). That's right: so far we assumed
<code>delta=1/FPS</code>, but that's merely a target; it can fluctuate, causing mild to
dramatic situations gameplay-wise.</p>
<p>One last issue with that formula is the situation of a huge delay spike,
causing an overshoot of the target. For example, with <code>RATE=3</code>, if we
end up with a frame that takes 500ms for whatever random reason, we're going to
interpolate with a value of 1.5, which is way above 1. This is easily fixed by
clamping the 3rd argument of <code>lerp</code> to 1, but we have to keep that issue in
mind.</p>
<p>To summarize, the formula is:</p>
<ol>
<li>not frame rate agnostic ❌</li>
<li>non-deterministic ❌</li>
<li>vulnerable to overshooting ❌</li>
</ol>
<p>If you're not interested in the gory details on the <em>how</em>, you can now jump
straight to the conclusion for a better alternative.</p>
<h2>Study</h2>
<p>We're going to switch to a more mathematical notation from now on. It's only
going to be linear algebra, nothing particularly fancy, but we're going to make
a mess of one-letter symbols, so bear with me.</p>
<p>Let's name the exhaustive list of inputs of our problem:</p>
<ul>
<li>initial value: <span class="math inline">a_0=\Alpha</span> (from where we start, only used once)</li>
<li>target value: <span class="math inline">\Beta</span> (where we are going, constant value)</li>
<li>time delta: <span class="math inline">\Delta_n</span> (time difference from last call)</li>
<li>the rate of change: <span class="math inline">R</span> (arbitrary scaling user constant)</li>
<li>original sequence: <span class="math inline">a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n)</span> (the code in the main
loop callback)</li>
<li>frame rate: <span class="math inline">F</span> (the target frame rate, for example <span class="math inline">60</span> FPS)</li>
<li>time: <span class="math inline">t</span> (animation time elapsed)</li>
</ul>
<p>What we are looking for is a new sequence formula <span class="math inline">u_n</span> (<span class="math inline">u</span> standing for
<em>purfect</em>) that doesn't have the 3 previously mentioned pitfalls.</p>
<p>The first thing we can do is to transform this recursive sequence into the
expected ideal continuous time-based function. The original sequence was
designed for a given rate <span class="math inline">R</span> and FPS <span class="math inline">F</span>: this means that while <span class="math inline">\Delta_n</span>
changes in practice every frame, the ideal function we are looking for is
constant: <span class="math inline">\Delta=1/F</span>.</p>
<p>So instead of starting from <span class="math inline">a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n)</span>,
we will look for <span class="math inline">u_n</span> starting from <span class="math inline">u_{n+1} = \texttt{lerp}(u_n, \Beta,
R\Delta)</span> with <span class="math inline">u_0=a_0=\Alpha</span>.</p>
<p>Since I'm lazy and incompetent, we are just going to ask WolframAlpha for help
finding the solution to the recursive sequence. But to feed its input we need
to simplify the terms a bit:</p>
<div class="math block">
\begin{split}
u_{n+1} &= \texttt{lerp}(u_n, \Beta, R\Delta) \\
&= u_n(1-R\Delta) + \Beta R\Delta \\
&= u_nP + Q
\end{split}
</div>
<p>...with <span class="math inline">P=(1-R\Delta)</span> and <span class="math inline">Q=\Beta R\Delta</span>. We do that so we have a familiar <span class="math inline">ax+b</span> linear form.</p>
<p><a href="https://www.wolframalpha.com/input?i=u%280%29%3DA+and+u%28n%2B1%29%3Du%28n%29*P%2BQ">According to WolframAlpha</a> this is equivalent to:</p>
<div class="math block">
u_n = \Alpha P^n + \frac{Q(P^n-1)}{P-1}
</div>
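<p>We can sanity-check this closed form against the recursion with a few lines of Python (the constants below are arbitrary example values, not anything prescribed by the derivation):</p>

```python
# Numerical check of the closed form u_n = A*P**n + Q*(P**n - 1)/(P - 1)
# against the recursion u_{n+1} = u_n*P + Q, with A=100 decaying toward
# B=50 using R=3.5 at 60 FPS (arbitrary example values).
A, B, R, F = 100.0, 50.0, 3.5, 60.0
delta = 1.0 / F
P, Q = 1.0 - R * delta, B * R * delta

u = A
for n in range(1, 11):
    u = u * P + Q                                      # the recursion
    closed = A * P ** n + Q * (P ** n - 1.0) / (P - 1.0)  # the closed form
    assert abs(u - closed) < 1e-9
```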
<p>This is great because we now have the formula according to <span class="math inline">n</span>, our frame
number. We can also express that discrete sequence as a continuous function
according to the time <span class="math inline">t</span>:</p>
<div class="math block">
f(t) = \Alpha P^{tF} + \frac{Q(P^{tF}-1)}{P-1}
</div>
<p>Expanding our temporary <span class="math inline">P</span> and <span class="math inline">Q</span> placeholders with their values and
unrolling, we get:</p>
<div class="math block">
\begin{split}
f(t) &= \Alpha P^{tF} + \frac{Q(P^{tF}-1)}{P-1} \\
&= \Alpha(1-R\Delta)^{tF} + \frac{\Beta R\Delta((1-R\Delta)^{tF}-1)}{(1-R\Delta)-1} \\
&= \Alpha(1-R\Delta)^{tF} - \Beta((1-R\Delta)^{tF}-1) \\
&= \Alpha(1-R\Delta)^{tF} + \Beta(1-(1-R\Delta)^{tF}) \\
&= \texttt{lerp}(\Beta, \Alpha, (1-R\Delta)^{tF}) \\
&= \texttt{lerp}(\Beta, \Alpha, (1-R/F)^{tF}) \\
f(t) &= \boxed{\texttt{lerp}(\Alpha, \Beta, 1-(1-R/F)^{tF})}
\end{split}
</div>
<p>This function perfectly matches the initial <code>lerp()</code> sequence in the
hypothetical situation where the frame rate is honored. Basically, it's <strong>what
the sequence <span class="math inline">a_{n+1}</span> was meant to emulate at a given frame rate <span class="math inline">F</span></strong>.</p>
<p><strong>Note</strong>: we swapped the first 2 terms of <code>lerp()</code> at the last step because it
makes more sense semantically to go from <span class="math inline">\Alpha</span> to <span class="math inline">\Beta</span>.</p>
<p>Let's again summarize what we have and what we want: we're in the game main
loop and we want our running value to stick to that <span class="math inline">f(t)</span> function. We
have:</p>
<ul>
<li><span class="math inline">v=f(t)</span>: the value previously computed (<span class="math inline">t</span> is the running duration so far,
but we don't have it); in the original sequence this is known as <span class="math inline">a_n</span></li>
<li><span class="math inline">\Delta_n</span>: the delta time for the current frame</li>
</ul>
<p>We are looking for a function <span class="math inline">\Eta(v,\Delta_n)</span> which defines the position of
a new point on the curve, only knowing <span class="math inline">v</span> and <span class="math inline">\Delta_n</span>. It's a "time
agnostic" version of <span class="math inline">f(t)</span>.</p>
<p>Basically, it is defined as <span class="math inline">\Eta(v,\Delta_n)=f(t+\Delta_n)</span>, but since we don't have
<span class="math inline">t</span> it's not very helpful. That being said, while we don't have <span class="math inline">t</span>, we do have
<span class="math inline">f(t)</span> (the previous value <span class="math inline">v</span>).</p>
<p>Looking at the curve, we know the y-value of the previous point, and we know
the difference between the new point and the previous point on the x-axis:</p>
<figure>
<img src="http://blog.pkh.me/img/fixing-damp/prev-to-current-curve.png" alt="previous to current point">
<figcaption>Previous and current point in time</figcaption>
</figure>
<p>If we want <span class="math inline">t</span> (the total time elapsed at the previous point), we need the
inverse function <span class="math inline">f^{-1}</span>. Indeed, <span class="math inline">t = f^{-1}(f(t))</span>: taking the inverse of a
function gives back the input. We know <span class="math inline">f</span> so we can invert it, relying on
WolframAlpha again (what a blessing this website is):</p>
<div class="math block">
f^{-1}(x) = \frac{\ln{\frac{\Beta-x}{\Beta-\Alpha}}}{F \ln(1-R/F)}
</div>
<p><strong>Note</strong>: <span class="math inline">\ln</span> stands for natural logarithm, sometimes also called <span class="math inline">\log</span>.
Careful though, on Desmos for example <span class="math inline">\log</span> is in base 10, not base <span class="math inline">e</span> (while
its <span class="math inline">\exp</span> is in base <span class="math inline">e</span> for some reason).</p>
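<p>A quick numerical check that <span class="math inline">f^{-1}</span> really undoes <span class="math inline">f</span>, with arbitrary example constants (going from <span class="math inline">\Alpha=0</span> to <span class="math inline">\Beta=100</span> with <span class="math inline">R=3.5</span> at 60 FPS):</p>

```python
import math

# f is the continuous damping curve derived above, f_inv its inverse;
# A, B, R, F are arbitrary example constants (any value of f strictly
# between A and B keeps the log argument positive).
A, B, R, F = 0.0, 100.0, 3.5, 60.0

def f(t):
    return A + (B - A) * (1.0 - (1.0 - R / F) ** (t * F))

def f_inv(x):
    return math.log((B - x) / (B - A)) / (F * math.log(1.0 - R / F))

# Applying the inverse gives back the input time
for t in (0.1, 0.5, 2.0):
    assert abs(f_inv(f(t)) - t) < 1e-9
```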
<p>This complex formula may feel a bit intimidating, but we can now find <span class="math inline">\Eta</span>
using only its two parameters:</p>
<div class="math block">
\begin{split}
\Eta(v,\Delta_n) &= f(t + \Delta_n) \\
&= f(f^{-1}(f(t)) + \Delta_n) \\
&= f(f^{-1}(v) + \Delta_n) \\
&= f(\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{F \ln(1-R/F)} + \Delta_n) \\
&= \texttt{lerp}(\Alpha, \Beta, 1-(1-R/F)^{(\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{F \ln(1-R/F)} + \Delta_n) \times F}) \\
&= \texttt{lerp}(\Alpha, \Beta, 1-(1-R/F)^{\frac{\ln{\frac{\Beta-v}{\Beta-\Alpha}}}{\ln(1-R/F)}} (1-R/F)^{F\Delta_n}) \\
&= \texttt{lerp}(\Alpha, \Beta, 1-\frac{\Beta-v}{\Beta-\Alpha} (1-R/F)^{F\Delta_n}) \\
&= (1-\frac{\Beta-v}{\Beta-\Alpha} (1-R/F)^{F\Delta_n})(\Beta-\Alpha) + \Alpha \\
&= (\Beta-\Alpha) - (\Beta-v) (1-R/F)^{F\Delta_n} + \Alpha \\
&= (v-\Beta)(1-R/F)^{F\Delta_n} + \Beta \\
&= \texttt{lerp}(\Beta, v, (1-R/F)^{F\Delta_n}) \\
\Eta(v,\Delta_n) &= \texttt{lerp}(v, \Beta, 1-(1-R/F)^{F\Delta_n})
\end{split}
</div>
<p>Again we swapped the first 2 arguments of <code>lerp</code> at the last step at the cost
of an additional subtraction: this is more readable because <span class="math inline">\Beta</span> is our
destination point.</p>
<p>An interesting property that is going to be helpful here is <span class="math inline">m^n = e^{n
\ln{m}}</span>. For my fellow programmers getting tense here: <code>pow(m, n) == exp(n * log(m))</code>. Replacing the power with the exponential may not seem like an
improvement at first, but it allows packing all the constant terms together:</p>
<div class="math block">
\begin{split}
\Eta(v,\Delta_n) &= \texttt{lerp}(v, \Beta, 1-(1-R/F)^{F\Delta_n}) \\
&= \texttt{lerp}(v, \Beta, 1-e^{F\ln(1-R/F)\Delta_n})
\end{split}
</div>
<p><span class="math inline">F\ln(1-R/F)</span> can be pre-computed because it is constant: it's our <strong>rate
conversion formula</strong>, which we can extract:</p>
<div class="math block">
\begin{split}
R' &= F\ln(1-R/F) \\
\Eta(v,\Delta_n) &= \texttt{lerp}(v, \Beta, 1-e^{R'\Delta_n})
\end{split}
</div>
<p>Rewriting this in a sequence notation, we get:</p>
<div class="math block">
\begin{split}
R' &= F\ln(1-R/F) \\
u_{n+1} &= \texttt{lerp}(u_n, \Beta, 1-e^{R'\Delta_n})
\end{split}
</div>
<p>We're going to make one last adjustment: <span class="math inline">R'</span> is negative, which is not
exactly intuitive to work with as a user (in case it is defined arbitrarily and
not through the conversion formula), so we make a sign swap for convenience:</p>
<div class="math block">
\boxed{\begin{split}
R' &= -F\ln(1-R/F) \\
u_{n+1} &= \texttt{lerp}(u_n, \Beta, 1-e^{-R'\Delta_n})
\end{split}}
</div>
<p>The conversion formula is optional; it's only needed to port previously
broken code to the new formula. One interesting thing here is that <span class="math inline">R'</span> is
fairly close to <span class="math inline">R</span> when <span class="math inline">R</span> is small.</p>
<p>For example, a rate factor <span class="math inline">R=5</span> at 60 FPS gives us <span class="math inline">R' \approx 5.22</span>. This
means that if the rate factors weren't closely tuned, it is probably acceptable
to go with <span class="math inline">R'=R</span> and not bother with any conversion. Still, having that
formula can be useful to update all the decay constants and check that
everything still works as expected.</p>
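<p>As a sketch, the conversion is a one-liner (the <code>convert_rate</code> name is made up for the example):</p>

```python
import math

# Rate conversion R' = -F*ln(1 - R/F); for R=5 at F=60 it lands near 5.22,
# close enough to R that skipping the conversion is often acceptable.
def convert_rate(rate, fps):
    return -fps * math.log(1.0 - rate / fps)

RATE2 = convert_rate(5.0, 60.0)  # ≈ 5.22
```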
<p>Also, notice how if the delta gets very large, <span class="math inline">-R'\Delta_n</span> is going toward
<span class="math inline">-\infty</span>, <span class="math inline">e^{-R'\Delta_n}</span> toward <span class="math inline">0</span>, <span class="math inline">1-e^{-R'\Delta_n}</span> toward <span class="math inline">1</span>, and so
the interpolation is going to reach our final target <span class="math inline">\Beta</span> without
overshooting. This means the formula doesn't need any extra care with regard to
the 3rd issue we pointed out earlier.</p>
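<p>This saturation is easy to verify numerically (using a hypothetical converted rate of <code>5.22</code>):</p>

```python
import math

# With the exponential form the blend factor saturates at 1 for large
# deltas, so even a pathological 500 ms (or far worse) frame never
# produces an interpolation factor above 1, i.e. never overshoots.
R2 = 5.22  # hypothetical converted rate
for delta in (1 / 60, 0.5, 5.0, 1000.0):
    k = 1.0 - math.exp(-R2 * delta)
    assert 0.0 <= k <= 1.0  # always a valid interpolation factor
```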
<p>Looking at the previous curves but now with the new formula and an adjusted
rate:</p>
<figure>
<img src="http://blog.pkh.me/img/fixing-damp/new-formula.png" alt="new formula">
<figcaption>Comparison of the curves at different frame rates with the new formula</figcaption>
</figure>
<h2>Conclusion</h2>
<p>So there we have it, the perfect formula, frame rate agnostic ✅,
deterministic ✅ and resilient to overshooting ✅. If you've quickly skimmed
through the maths, here is what you need to know:</p>
<pre><code class="language-python">a = lerp(a, B, delta * RATE)
</code></pre>
<p>Should be changed to:</p>
<pre><code class="language-python">a = lerp(a, B, 1.0 - exp(-delta * RATE2))
</code></pre>
<p>With the precomputed <code>RATE2 = -FPS * log(1 - RATE/FPS)</code> (where <code>log</code> is the
natural logarithm), or simply using <code>RATE2 = RATE</code> as a rough equivalent.</p>
<p>Also, any existing overshooting clamping can safely be dropped.</p>
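<p>Putting the conclusion together in a small, runnable Python sketch (the helper names are made up; the decay goes from 100 toward 0 with the <code>RATE = 3.5</code> example from earlier):</p>

```python
import math

def lerp(a, b, x):
    return a + (b - a) * x

FPS, RATE = 60.0, 3.5
RATE2 = -FPS * math.log(1.0 - RATE / FPS)  # rate converted from a 60 FPS tuning

def simulate(fps, duration=1.0):
    """Decay from 100 toward 0 using the fixed, frame-rate-agnostic formula."""
    a, delta = 100.0, 1.0 / fps
    for _ in range(round(duration * fps)):
        a = lerp(a, 0.0, 1.0 - math.exp(-delta * RATE2))
    return a

# All frame rates now land on the same value after one second of animation
v30, v60, v144 = simulate(30), simulate(60), simulate(144)
```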
<p>Now please adjust your game to make the world a better and safer place for
everyone ♥</p>
<h2>Going further</h2>
<p>As <a href="https://news.ycombinator.com/item?id=40401152">suggested on HN</a>:</p>
<ul>
<li>for numerical stability it makes sense to <a href="https://www.johndcook.com/blog/cpp_expm1/">use <code>-expm1(x)</code></a> instead of
<code>1-exp(x)</code></li>
<li>API wise, proposing a <a href="https://en.wikipedia.org/wiki/Exponential_smoothing#Time_constant">time constant</a> <code>T</code> instead of the rate (where
<code>T=1/rate</code>) might be more intuitive</li>
<li>for performance reasons, the exponential could be expanded manually to
<code>x+x²/2</code> for small values of <code>x</code></li>
</ul>
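<p>For the first point, a minimal illustration of the <code>expm1</code> variant (using Python's <code>math.expm1</code> here):</p>

```python
import math

# -expm1(x) is the numerically stable way to compute 1 - e^x: both
# expressions agree, but expm1 avoids cancellation when x is tiny.
x = -5.22 * (1.0 / 60.0)   # -R'*delta for one hypothetical 60 FPS frame
naive = 1.0 - math.exp(x)
stable = -math.expm1(x)
assert abs(naive - stable) < 1e-12
```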
<h1>Hacking window titles to help OBS</h1>
<p><em>Tue, 06 Jun 2023 — <a href="http://blog.pkh.me/p/40-hacking-window-titles-to-help-obs.html">permalink</a></em></p>
<p>This write-up is meant to present the rationale and technical details behind a
tiny project I wrote the other day, <a href="https://github.com/ubitux/WindowTitleHack">WTH, or WindowTitleHack</a>, which is
meant to force a constant window name for apps that keep changing it (I'm
looking specifically at Firefox and Krita, but there are probably many others).</p>
<h2>Why tho?</h2>
<p>I've been streaming on Twitch from Linux (X11) with a barebones <a href="https://obsproject.com/">OBS
Studio</a> setup for a while now, and while most of the experience has been
relatively smooth, one particularly striking frustration has been dealing with
windows detection.</p>
<p>If we don't want to capture the whole desktop for privacy reasons or simply to
have control over the scene layout depending on the currently focused app, we
need to rely on the <code>Window Capture (XComposite)</code> source. This works mostly
fine, and it is actually able to track windows even when their title bar is
renamed. But obviously, upon restart it can't find them again because both the
window titles and the window IDs changed, meaning we have to redo our setup by
reselecting the windows again.</p>
<p>It would have been acceptable if that was the only issue I had, but one of the
more advanced features I'm using extensively is the <code>Advanced Scene Switcher</code>
(the built-in one, available through the <code>Tools</code> menu). This tool is a basic
window title pattern matching system that allows automatic scene switches
depending on the current window. Note that it does seem to support regex, which
could help with the problem, but there is no guarantee that the app would leave
a recognizable matchable pattern in its title. Also, if we want multiple
Firefox windows but only match one in particular, the regex wouldn't help.</p>
<p><img src="http://blog.pkh.me/img/windowtitlehack/obs-automatic-scene-switcher.png" alt="centerimg" /></p>
<h2>Hacking Windows</h2>
<p>One unreliable hack would be to spam <code>xdotool</code> commands to correct the window
title. This could be a resource hog, and it would create quite a few
races. One slight improvement over this would be to use <code>xprop -spy</code>, but that
wouldn't address the race conditions (since we would adjust the title <em>after</em>
it's been already changed).</p>
<p>So how do we deal with that properly? Well, on X11 with the reference library
(<code>Xlib</code>) there are actually a lot of ways of changing the
title bar. It took me a while to identify which call(s) to target, but I ended up
with the following call graph, where each function is actually exposed
publicly:</p>
<p><img src="http://blog.pkh.me/img/windowtitlehack/x11-xchangeproperty.png" alt="centerimg" /></p>
<p>From this we can easily see that we only need to hook the deepest function
<code>XChangeProperty</code>, and check if the property is <code>XA_WM_NAME</code> (or its "modern"
sibling, <code>_NET_WM_NAME</code>).</p>
<p>How do we do that? With the help of the <code>LD_PRELOAD</code> environment variable and a
dynamic library that implements a custom <code>XChangeProperty</code>.</p>
<p>First, we grab the original function:</p>
<pre><code class="language-c">#include <dlfcn.h>
/* A type matching the prototype of the target function */
typedef int (*XChangeProperty_func_type)(
Display *display,
Window w,
Atom property,
Atom type,
int format,
int mode,
const unsigned char *data,
int nelements
);
/* [...] */
XChangeProperty_func_type XChangeProperty_orig = dlsym(RTLD_NEXT, "XChangeProperty");
</code></pre>
<p>We also need to craft a custom <code>_NET_WM_NAME</code> atom:</p>
<pre><code class="language-c">_NET_WM_NAME = XInternAtom(display, "_NET_WM_NAME", 0);
</code></pre>
<p>With this we are now able to intercept all the <code>WM_NAME</code> events and override
them with our own:</p>
<pre><code class="language-c">if (property == XA_WM_NAME || property == _NET_WM_NAME) {
data = (const unsigned char *)new_title;
nelements = (int)strlen(new_title);
}
return XChangeProperty_orig(display, w, property, type, format, mode, data, nelements);
</code></pre>
<p>We wrap all of this into our own redefinition of <code>XChangeProperty</code> and… that's
pretty much it.</p>
<p>Now due to a long history of development, <code>Xlib</code> has been "deprecated" and
superseded by <code>libxcb</code>. Both are widely used, but fortunately the APIs are more
or less similar. The function to hook is <code>xcb_change_property</code>, and defining
<code>_NET_WM_NAME</code> is slightly more cumbersome but not exactly challenging:</p>
<pre><code class="language-c">const xcb_intern_atom_cookie_t cookie = xcb_intern_atom(conn, 0, strlen("_NET_WM_NAME"), "_NET_WM_NAME");
xcb_intern_atom_reply_t *reply = xcb_intern_atom_reply(conn, cookie, NULL);
if (reply)
_NET_WM_NAME = reply->atom;
free(reply);
</code></pre>
<p>Aside from that, the code is pretty much the same.</p>
<h2>Configuration</h2>
<p>To pass down the custom title to override, I've been relying on an environment
variable <code>WTH_TITLE</code>. From a user point of view, it looks like this:</p>
<pre><code class="language-sh">LD_PRELOAD="builddir/libwth.so" WTH_TITLE="Krita4ever" krita
</code></pre>
<p>We could probably improve the usability by creating a wrapping tool (so that we
could have something such as <code>./wth --title=Krita4ever krita</code>). Unfortunately I
wasn't yet able to make a self-referencing executable accepted by <code>LD_PRELOAD</code>,
so for now the manual <code>LD_PRELOAD</code> and <code>WTH_TITLE</code> environment variables
will do just fine.</p>
<h2>Thread safety</h2>
<p>To avoid a bunch of redundant function round-trips we need to globally cache a
few things: the new title (to avoid fetching it in the environment all the
time), the original functions (to save the <code>dlsym</code> call), and <code>_NET_WM_NAME</code>.</p>
<p>Those are loaded lazily at the first function call, but we have no guarantee
with regards to concurrent calls on that hooked function so we must create our
own lock. I initially thought about using <code>pthread_once</code> but unfortunately the
initialization callback mechanism doesn't allow any custom argument. Again,
this is merely a slight annoyance since we can implement our own in a few lines
of code:</p>
<pre><code class="language-c">/* The "once" API is similar to pthread_once but allows a custom function argument */
struct wth_once {
pthread_mutex_t lock;
int initialized;
};
#define WTH_ONCE_INITIALIZER {.lock=PTHREAD_MUTEX_INITIALIZER}
typedef void (*init_func_type)(void *user_arg);
void wth_init_once(struct wth_once *once, init_func_type init_func, void *user_arg)
{
pthread_mutex_lock(&once->lock);
if (!once->initialized) {
init_func(user_arg);
once->initialized = 1;
}
pthread_mutex_unlock(&once->lock);
}
</code></pre>
<p>Which we use like this:</p>
<pre><code class="language-c">static struct wth_once once = WTH_ONCE_INITIALIZER;
static void init_once(void *user_arg)
{
Display *display = user_arg;
/* [...] */
}
/* [...] */
wth_init_once(&once, init_once, display);
</code></pre>
<h2>The End?</h2>
<p>I've been delaying doing this project for weeks because it felt complex at
first glance, but it actually just took me a few hours. Probably the same
amount of time it took me to write this article. While the project is
admittedly really small, it still feels like a nice accomplishment. I hope it's
useful to other people.</p>
<p>Now, the Wayland support is probably the most obvious improvement the project
can receive, but I don't have such a setup locally to test yet, so this is
postponed for an undetermined amount of time.</p>
<p>The code is released with a permissive license (MIT); if you want to contribute
you can open a pull request but getting in touch with me first is appreciated
to avoid unnecessary and overlapping efforts.</p>
<h1>Improving color quantization heuristics</h1>
<p><em>Sat, 31 Dec 2022 — <a href="http://blog.pkh.me/p/39-improving-color-quantization-heuristics.html">permalink</a></em></p>
<p>In 2015, I wrote an article about <a href="http://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html">how the palette color quantization was
improved in FFmpeg</a> in order to make nice animated GIF files. For some
reason, to this day this is one of my most popular articles.</p>
<p>As time passed, my experience with colors grew and I ended up being quite
ashamed and frustrated with the state of these filters. A lot of the code was
naive (when not terribly wrong), despite the apparent good results.</p>
<p>One of the major changes I wanted to make was to evaluate the color distances
using a perceptually uniform colorspace, instead of using a naive Euclidean
distance of RGB triplets.</p>
<p>As usual it felt like a weekend-long project; after all, all I have to do is
change the distance function to work in a different space, right? Well, if
you're following my blog you might have noticed I've had numerous adventures
that stacked up on each other:</p>
<ul>
<li>I had to work out the <a href="http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html">colorspace with integer arithmetic</a> first</li>
<li>...which forced me to look into <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">integer division</a> more deeply</li>
<li>...which confronted me with all sorts of <a href="http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html">undefined behaviours</a> in the
process</li>
</ul>
<p>And when I finally reached the point where I could make the switch to
<a href="https://bottosson.github.io/posts/oklab/">OkLab</a> (the perceptual colorspace), a few experiments showed that the
flavour of the core algorithm I was using might contain some fundamental flaws,
or at least was not implementing optimal heuristics. So here we go again:
quickly enough I found myself starting a new research study in the pursuit of
understanding how to put pixels on the screen. This write-up is the story of
yet another self-inflicted struggle.</p>
<h2>Palette quantization</h2>
<p>But what is <em>palette quantization</em>? It essentially refers to the process of
reducing the number of available colors of an image down to a smaller subset.
In sRGB, an image can have up to 16.7 million colors. In practice though it's
generally much less, to the surprise of no one. Still, it's not rare to have a
few hundred thousand different colors in a single picture. Our goal is to
reduce that to something like 256 colors that represent them best, and use
these colors to create a new picture.</p>
<p>Why, you may ask? There are multiple reasons; here are some:</p>
<ul>
<li>Improve size compression (this is a lossy operation of course, and using
dithering on top might actually defeat the original purpose)</li>
<li>Some codecs might not support anything else than limited palettes (GIF or
subtitles codecs are examples)</li>
<li>Various artistic purposes</li>
</ul>
<p>Following is an example of a picture quantized at different levels:</p>
<table>
<thead>
<tr>
<th>Original (26125 colors)</th>
<th>Quantized to 8bpp (256 colors)</th>
<th>Quantized to 2bpp (4 colors)</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="http://blog.pkh.me/img/color-quant/cat-orig.png" alt="Cat (original)" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/cat-256.png" alt="Cat (8bpp)" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/cat-4.png" alt="Cat (2bpp)" /></td>
</tr>
</tbody>
</table>
<p>This color quantization process can be roughly summarized as a 4-step
process:</p>
<ol>
<li>Sample the input image: we build a histogram of all the colors in the
picture (basically a simple statistical analysis)</li>
<li>Design a colormap: we build the palette through various means using the
histograms</li>
<li>Create a pixel mapping which associates a color (one that can be found in
the input image) with another (one that can be found in the newly created
palette)</li>
<li>Image quantizing: we use the color mapping to build our new image. This step
may also involve some <a href="https://en.wikipedia.org/wiki/Dither">dithering</a>.</li>
</ol>
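<p>As an illustration of step 1, the sampling boils down to a histogram. This toy Python sketch (not FFmpeg's actual code) uses a hand-made 4-pixel "image" of (R, G, B) tuples:</p>

```python
from collections import Counter

# Step 1 (sampling): a histogram of the pixel colors; each entry maps a
# color to its number of occurrences, i.e. its weight for the next steps.
pixels = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (12, 34, 56)]
histogram = Counter(pixels)
```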
<p>The study here will focus on step 2 (which itself relies on step 1).</p>
<h2>Colormap design algorithms</h2>
<p>A palette is simply a set of colors. It can be represented in various ways, for
example here in 2D and 3D:</p>
<p><img src="http://blog.pkh.me/img/color-quant/pal-2d-3d.png" alt="centerimg" /></p>
<p>To generate such a palette, all sorts of algorithms exist. They are usually
classified into 2 large categories:</p>
<ul>
<li>Dividing/splitting algorithms (such as Median-Cut and its various flavors)</li>
<li>Clustering algorithms (such as K-means, maximin distance, (E)LBG or pairwise
clustering)</li>
</ul>
<p>The former are faster but non-optimal while the latter are slower but better.
The problem is <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-complete</a>, meaning it's possible to find the
optimal solution but it can be extremely costly. On the other hand, it's
possible to find "local optimums" at minimal cost.</p>
<p>Since I'm working within FFmpeg, speed has always been a priority. This was the
reason that motivated me to initially implement the Median-Cut over a more
expensive algorithm.</p>
<p>The rough picture of the algorithm is relatively easy to grasp. Assuming we
want a palette of <code>K</code> colors:</p>
<ol>
<li>A set <code>S</code> of all the colors in the input picture is constructed, along with
a respective set <code>W</code> of the weight of each color (how much they appear)</li>
<li>Since the colors are expressed as RGB triplets, they can be encapsulated
in one big cuboid, or box</li>
<li>The box is cut in two along one of the axes (R, G or B) on the median
(hence the name of the algorithm)</li>
<li>If we don't have a total of <code>K</code> boxes yet, pick one of them and go back to
the previous step</li>
<li>All the colors in each of the <code>K</code> boxes are then averaged to form the color
palette entries</li>
</ol>
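<p>Those five steps can be sketched in a handful of lines of Python (a toy illustration; the box and axis selections are deliberately simplistic here, using the longest axis in both cases, since those two heuristics are precisely what the rest of this article investigates):</p>
<pre><code class="language-python">def median_cut(colors, weights, K):
    # Steps 1 and 2: one box containing the whole set S with its weights W
    boxes = [list(zip(colors, weights))]

    def longest_axis(box):
        # Return (length, axis index) of the longest axis of the box
        pts = [c for c, _ in box]
        spans = [max(p[i] for p in pts) - min(p[i] for p in pts) for i in range(3)]
        return max(spans), spans.index(max(spans))

    while len(boxes) < K:  # step 4: repeat until we have K boxes
        # Simplistic selection: the splittable box with the longest axis
        box = max((b for b in boxes if len(b) > 1), key=lambda b: longest_axis(b)[0])
        axis = longest_axis(box)[1]
        box.sort(key=lambda cw: cw[0][axis])  # step 3: cut on the median of that axis
        mid = len(box) // 2
        boxes.remove(box)
        boxes += [box[:mid], box[mid:]]

    # Step 5: the weighted average of each box gives one palette entry
    palette = []
    for box in boxes:
        total = sum(w for _, w in box)
        palette.append(tuple(sum(c[i] * w for c, w in box) / total for i in range(3)))
    return palette
</code></pre>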
<p>Here is what the process looks like visually:</p>
<p><video src="http://blog.pkh.me/misc/mediancut-parrot-16.mp4" controls="controls" width="800">Median-Cut algorithm targeting 16 boxes</video></p>
<p>You may have spotted in this video that the colors are not expressed in RGB but
in Lab: this is because instead of representing the colors in a traditional RGB
colorspace, we are instead using the OkLab colorspace which has the property of
being perceptually uniform. It doesn't really change the Median-Cut algorithm,
but it definitely has an impact on the resulting palette.</p>
<p>One striking limitation of this algorithm is that we are working exclusively
with cuboids: the cuts are limited to an axis, we are not cutting along an
arbitrary plane or a more complex shape. Think of it like working with voxels
instead of more free-form geometries. The main benefit is that the algorithm is
pretty simple to implement.</p>
<p>Now the description provided earlier conveniently avoided describing two
important aspects happening in steps 3 and 4:</p>
<ol>
<li>How do we choose the next box to split?</li>
<li>How do we choose along which axis of the box we make the cut?</li>
</ol>
<p>I pondered that for quite a long time.</p>
<h2>An overview of the possible heuristics</h2>
<p>In bulk, here are some of the heuristics I started thinking of:</p>
<ul>
<li>should we take the box that has the longest axis across all boxes?</li>
<li>should we take the box that has the largest volume?</li>
<li>should we take the box that has the biggest <a href="https://en.wikipedia.org/wiki/Mean_squared_error">Mean Squared Error</a> when
compared to its average color?</li>
<li>should we take the box that has the <em>axis</em> with the biggest MSE?</li>
<li>assuming we choose to go with the MSE, should it be normalized across all
boxes?</li>
<li>should we even account for the weight of each color or consider them equal?</li>
<li>what about the axis? Is it better to pick the longest one or the one with the
highest MSE?</li>
</ul>
<p>I tried to formalize these questions mathematically to the best of my limited
abilities. So let's start by saying that all the colors <code>c</code> of a given box are
stored in an <code>N×M</code> 2D-array following the matrix notation:</p>
<table>
<tbody>
<tr><td>L₁</td><td>L₂</td><td>L₃</td><td>…</td><td>Lₘ</td></tr>
<tr><td>a₁</td><td>a₂</td><td>a₃</td><td>…</td><td>aₘ</td></tr>
<tr><td>b₁</td><td>b₂</td><td>b₃</td><td>…</td><td>bₘ</td></tr>
</tbody>
</table>
<p><code>N</code> is the number of components (3 in our case, whether it's RGB or Lab), and
<code>M</code> the number of colors in that box. You can visualize this as a list of
vectors as well, where <code>c_{i,j}</code> is the color at row <code>i</code> and column <code>j</code>.</p>
<p>With that in mind we can sketch the following diagram representing the tree of
heuristic possibilities to implement:</p>
<p><img src="http://blog.pkh.me/img/color-quant/diagram-heuristics.png" alt="centerimg" /></p>
<p>Mathematicians are going to kill me for doodling random notes all over this
perfectly understandable gibberish of symbols, but I believe this is required
for the human beings reading this article.</p>
<p>In summary, we end up with a total of 24 combinations to try out:</p>
<ul>
<li>2 axis selection heuristics:
<ul>
<li>cut the axis with the maximum error squared</li>
<li>cut the axis with the maximum length</li>
</ul>
</li>
<li>3 operators:
<ul>
<li>maximum measurement out of all the channels</li>
<li>product of the measurements of all the channels</li>
<li>sum of the measurements of all the channels</li>
</ul>
</li>
<li>4 measurements:
<ul>
<li>error squared, honoring weights</li>
<li>error squared, not honoring weights</li>
<li>error squared, honoring weights, normalized</li>
<li>length of the axis</li>
</ul>
</li>
</ul>
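<p>To make these combinations concrete, here is what the four measurements could look like for a single axis (component) of a box, in Python (my notation for illustration, not the actual experiment code; <code>colors</code> holds the values of one component and <code>weights</code> their occurrence counts):</p>
<pre><code class="language-python">def axis_measurements(colors, weights):
    total_w = sum(weights)
    # Weighted mean of the component
    mean = sum(c * w for c, w in zip(colors, weights)) / total_w
    # Error squared, honoring weights
    werr2 = sum(w * (c - mean) ** 2 for c, w in zip(colors, weights))
    # Error squared, not honoring weights (every color counts once)
    mean_u = sum(colors) / len(colors)
    err2 = sum((c - mean_u) ** 2 for c in colors)
    # Error squared, honoring weights, normalized by the box weight
    werr2_norm = werr2 / total_w
    # Length of the axis
    length = max(colors) - min(colors)
    return werr2, err2, werr2_norm, length
</code></pre>
<p>An operator (<code>max()</code>, product or sum) then combines these per-axis measurements into a single score per box.</p>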
<p>If we start to intuitively think about which ones are likely going to perform
the best, we quickly realize that we haven't actually formalized what we are
trying to achieve. Such a rookie mistake. Clarifying this will help us get a
better feel for the likely outcome.</p>
<p>I chose to target an output that minimizes the MSE against the reference image,
in a perceptual way. Said differently, trying to make the perceptual distance
between an input and output color pixel as minimal as possible. This is an
arbitrary and debatable target, but it's relatively simple and objective to
evaluate if we have faith in the selected perceptual model. Another appropriate
metric could have been to find the ideal palette through another algorithm, and
compare against that instead. Doing that would unfortunately have implied
trusting that other algorithm and its implementation, and having enough
computing power.</p>
<p>So to summarize, we want to minimize the MSE between the input and output,
evaluated in the OkLab colorspace. This can be expressed with the following
formula:</p>
<p><img src="http://blog.pkh.me/img/color-quant/evaluation.png" alt="centerimg" /></p>
<p>Where:</p>
<ul>
<li><code>P</code> is a <a href="https://en.m.wikipedia.org/wiki/Partition_of_a_set">partition</a>
(which we constrain to a box in our implementation)</li>
<li><code>C</code> the set of colors in the partition <code>P</code></li>
<li><code>w</code> the weight of a color</li>
<li><code>c</code> a single color</li>
<li><code>µ</code> the average color of the set <code>C</code></li>
</ul>
<p>Special thanks to <code>criver</code> for helping me a ton with the math, this last
formula is from them.</p>
<p>Looking at the formula, we can see how similar it is to certain branches of the
heuristics tree, so we can start getting an intuition about the result of the
experiment.</p>
<h2>Experiment language</h2>
<p>Short deviation from the main topic (feel free to skip to the next section):
working in C within FFmpeg quickly became a hurdle more than anything. Aside
from the lack of flexibility, the implicit casts deceitfully destroying
precision, and the undefined behaviours, all kinds of C quirks got in the way
several times, which made me question my sanity. This one in particular severely
messed me up while trying to average the colors:</p>
<pre><code class="language-c">#include <stdio.h>
#include <stdint.h>
int main (void)
{
const int32_t x = -30;
const uint32_t y = 10;
const uint32_t a = 30;
const int32_t b = -10;
printf("%d×%u=%d\n", x, y, x * y);
printf("%u×%d=%d\n", a, b, a * b);
printf("%d/%u=%d\n", x, y, x / y);
printf("%u/%d=%d\n", a, b, a / b);
return 0;
}
</code></pre>
<pre><code class="language-shell">% cc -Wall -Wextra -fsanitize=undefined test.c -o test && ./test
-30×10=-300
30×-10=-300
-30/10=429496726
30/-10=0
</code></pre>
<p>Anyway, I know this is obvious, but if you aren't already doing so I suggest
you build your experiments in another language, Python or whatever, and rewrite
them in C later once you've figured out your expected output.</p>
<p>Re-implementing what I needed in Python didn't take me long. It was, and
obviously still is, much slower at runtime, but that's fine. There is a lot of room
for speed improvement, typically by relying on <code>numpy</code> (which I didn't bother
with).</p>
<h2>Experiment results</h2>
<p>I created a <a href="https://github.com/ubitux/research/">research repository</a> for the occasion. The code to
reproduce and the results can be found in the <a href="https://github.com/ubitux/research/tree/main/color-quantization">color quantization
README</a>.</p>
<p>In short, based on the results, we can conclude that:</p>
<ul>
<li>Overall, the box that has the axis with the largest non-normalized weighted
sum of squared error is the best candidate in the box selection algorithm</li>
<li>Overall, cutting the axis with the largest weighted sum of squared error is
the best axis cut selection algorithm</li>
</ul>
<p>To my surprise, normalizing the weights per box is not a good idea. I initially
observed that by trial and error, and it was actually one of the main motivators
for this research. I initially thought normalizing each box was necessary in
order to compare them against each other (such that they are compared on
common ground). My loose explanation of the phenomenon was that not normalizing
causes a bias towards boxes with many colors, but that's actually exactly what
we want. I believe it can also be explained by our evaluation function: we want
to minimize the error across the whole set of colors, so small partitions (in
color counts) must not be made stronger. At least not in the context of the
target we chose.</p>
<p>It's also interesting to see how the <code>max()</code> seems to perform better than the
<code>sum()</code> of the variance of each component most of the time. Admittedly, my
set of sample pictures is not that big, so more experiments may be required to
confirm that tendency.</p>
<p>In retrospect, this might have been quickly predictable to someone with a
mathematical background. But since I don't have that, nor do I trust my
abstract thinking much, I'm kind of forced to try things out often. This is
likely one of the many instances where I spent way too much energy on something
obvious from the beginning, but I have the hope it will actually provide some
useful information for other lost souls out there.</p>
<h2>Known limitations</h2>
<p>There are two main limitations I want to discuss before closing this article.
The first one is related to minimizing the MSE even more.</p>
<h3>K-means refinement</h3>
<p>We know the Median-Cut actually provides a rough estimate of the optimal
palette. One thing we could do is use it as a first step before refinement, for
example by running a few K-means iterations as post-processing (how much
refinement/iterations could be a user control). The general idea of K-means is
to progressively move each colors individually to a more appropriate box, that
is a box for which the color distance to the average color of that box is
smaller. I started implementing that in a very naive way, so it's extremely
slow, but that's something to investigate further because it definitely
improves the results.</p>
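<p>For reference, a naive sketch of what one such refinement pass could look like in Python (this is just the general idea, not my actual implementation):</p>
<pre><code class="language-python">def kmeans_iteration(colors, weights, palette):
    """One K-means pass: reassign each color to the palette entry (box average)
    it is closest to, then recompute each entry as the weighted average of its
    cluster."""
    clusters = [[] for _ in palette]
    for c, w in zip(colors, weights):
        # Move the color to the box whose average color is closest
        best = min(range(len(palette)),
                   key=lambda k: sum((c[i] - palette[k][i]) ** 2 for i in range(3)))
        clusters[best].append((c, w))
    new_palette = []
    for k, cluster in enumerate(clusters):
        if not cluster:  # keep empty entries unchanged
            new_palette.append(palette[k])
            continue
        total = sum(w for _, w in cluster)
        new_palette.append(tuple(sum(c[i] * w for c, w in cluster) / total
                                 for i in range(3)))
    return new_palette
</code></pre>
<p>Repeating this until the palette stops moving (or for a fixed iteration budget) is the refinement step described above.</p>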
<p>Most of the academic literature seems to suggest the use of the K-means
clustering, but all of them require some startup step. Some come up with
various heuristics, some use PCA, but I've yet to see one that relies on
Median-Cut as a first pass; maybe that's not such a good idea, but who knows.</p>
<h3>Bias toward perceived lightness</h3>
<p>Another, more annoying, problem for which I have no solution is that human
perception is much more sensitive to light changes than to hue. If
you look at the first demo with the parrot, you may have observed the boxes are
kind of thin. This is because the <code>a</code> and <code>b</code> components (respectively how
green/red and blue/yellow the color is) have a much smaller amplitude compared
to the <code>L</code> (perceived lightness).</p>
<p>Here is a side by side comparison of the spread of colors between a stretched
and normalized view:</p>
<p><img src="http://blog.pkh.me/img/color-quant/oklab-axis-scaled.png" alt="centerimg" /></p>
<p>You may rightfully question whether this is a problem or not. In practice, this
means that when <code>K</code> is low (let's say smaller than 8 or even 16), cuts along <code>L</code>
will almost always be preferred, causing the picture to be heavily desaturated.
This is because it tries to preserve the most significant attribute in human
perception: the lightness.</p>
<p>That particular picture is actually a pathological study case:</p>
<table>
<thead>
<tr>
<th>4 colors</th>
<th>8 colors</th>
<th>12 colors</th>
<th>16 colors</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="http://blog.pkh.me/img/color-quant/woman-4.png" alt="Portrait K=4" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-8.png" alt="Portrait K=8" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-12.png" alt="Portrait K=12" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-16.png" alt="Portrait K=16" /></td>
</tr>
</tbody>
</table>
<p>We can see the hue timidly appearing around <code>K=16</code> (specifically, it starts
being more strongly noticeable from the cut at <code>K=13</code>).</p>
<h2>Conclusion</h2>
<p>For now, I'm mostly done with this "week-end long project" into which I
actually poured 2 or 3 months of lifetime. The FFmpeg patchset will likely be
upstreamed soon so everyone should hopefully be able to benefit from it in the
next release. It will also come with <a href="https://fosstodon.org/@bug/109602427382086789">additional dithering
methods</a>, which implementation actually was a relaxing
distraction from all this hardship. There are still many ways of improving this
work, but it's the end of the line for me, so I'll trust the Internet with it.</p>
http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html
http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html
Porting OkLab colorspace to integer arithmeticSun, 11 Dec 2022 22:01:17 -0000<p>For reasons I'll explain in a future write-up, I needed to make use of a
perceptually uniform colorspace in some computer vision code. <a href="https://bottosson.github.io/posts/oklab/">OkLab from Björn
Ottosson</a> was a great candidate given how simple the implementation is.</p>
<p><img src="http://blog.pkh.me/img/oklab-int/hue_oklab.png" alt="centerimg" title="OkLab hue" /></p>
<p>But there is a plot twist: I needed the code to be deterministic for the tests
to be portable across a large variety of architecture, systems and
configurations. Several solutions were offered to me, including reworking the
test framework to support a difference mechanism with threshold, but having
done that in another project I can confidently say that it's not trivial (when
not downright impossible in certain cases). Another approach would have been to
hardcode the libc math functions, but even then I wasn't confident the floating
point determinism would be guaranteed in all cases.</p>
<p>So I ended up choosing to port the code to integer arithmetic. I'm sure many
people would disagree with that approach, but:</p>
<ul>
<li>code determinism is guaranteed</li>
<li>not all FPUs are that efficient, typically on embedded</li>
<li>it can now be used in the kernel; while this is far-fetched for OkLab (though
maybe someone needs some color management in v4l2 or something), sRGB
transforms might have their use cases</li>
<li>it's a learning experience which can be re-used in other circumstances</li>
<li>working on the integer arithmetic versions unlocked various optimizations for
the normal case</li>
</ul>
<p><strong>Note</strong>: I'm following Björn Ottosson's will to have OkLab code in the public
domain as well as under MIT license, so this "dual licensing" applies to all
the code presented in this article.</p>
<p><strong>Warning</strong>: The integer arithmetics in this write-up can only work if your
language behaves the same as C99 (or more recent) with regard to integer
division. See <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">this previous article on integer division</a> for more
information.</p>
<h2>Quick summary of uniform colorspaces</h2>
<p>For those unfamiliar with color management, one of the main benefits of a
uniform colorspace like OkLab is that the Euclidean distance between two colors
is directly correlated with the human perception of these colors.</p>
<p>More concretely, if we want to evaluate the distance between the RGB triplets
<code>(R₀,G₀,B₀)</code> and <code>(R₁,G₁,B₁)</code>, one may naively compute the Euclidean distance
<code>√((R₀-R₁)²+(G₀-G₁)²+(B₀-B₁)²)</code>. Unfortunately, even if the RGB is gamma
expanded into linear values, the computed distance will actually be pretty far
from reflecting how the human eye perceives this difference. It typically isn't
going to be consistent when applied to another pair of colors.</p>
<p>With OkLab (and many other uniform colorspaces), the colors are also identified
with 3D coordinates, but instead of <code>(R,G,B)</code> we call them <code>(L,a,b)</code> (which is
an entirely different 3D space). In that space <code>√((L₀-L₁)²+(a₀-a₁)²+(b₀-b₁)²)</code>
(called <code>ΔE</code>, or <code>Delta-E</code>) is expected to be aligned with human perception of
color differences.</p>
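<p>Computing <code>ΔE</code> between two Lab triplets is then trivial (a quick sketch; real code would of course first convert the sRGB colors to OkLab):</p>
<pre><code class="language-python">import math

def delta_e(lab0, lab1):
    # Perceptual distance between two OkLab colors: plain Euclidean distance
    L0, a0, b0 = lab0
    L1, a1, b1 = lab1
    return math.sqrt((L0 - L1)**2 + (a0 - a1)**2 + (b0 - b1)**2)
</code></pre>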
<p>Of course, this is just one model, and it doesn't take into account many
parameters. For instance, the perception of a color depends a lot on the
surrounding colors. Still, these models are much better than working with RGB
triplets, which don't make much sense visually speaking.</p>
<h2>Reference code / diagram</h2>
<p>In this study case, we will be focusing on the transform that goes from sRGB to
OkLab, and back again into sRGB. Only the first part is interesting if we want
the color distance, but sometimes we also want to alter a color uniformly and
thus we need the 2nd part as well to reconstruct an sRGB color from it.</p>
<p>We are only considering the sRGB input and output for simplicity, which means
we will be inlining the sRGB color transfer in the pipeline. If you're not
familiar with gamma compression, there are <a href="https://en.wikipedia.org/wiki/Gamma_correction">many</a> <a href="https://bottosson.github.io/posts/colorwrong/">resources</a>
<a href="http://filmicworlds.com/blog/linear-space-lighting-i-e-gamma/">about</a> <a href="https://www.youtube.com/watch?v=LKnqECcg6Gw">it</a> on the Internet which you may want to look into
first.</p>
<p>Here is a diagram of the complete pipeline:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/pipeline.png" alt="centerimg" title="sRGB/OkLab pipeline" /></p>
<p>And the corresponding code (of the 4 circles in the diagram) we will be porting:</p>
<pre><code class="language-c">struct Lab { float L, a, b; };
uint8_t linear_f32_to_srgb_u8(float x)
{
if (x <= 0.0) {
return 0;
} else if (x >= 1.0) {
return 0xff;
} else {
const float v = x < 0.0031308f ? x*12.92f : 1.055f*powf(x, 1.f/2.4f) - 0.055f;
return lrintf(v * 255.f);
}
}
float srgb_u8_to_linear_f32(uint8_t x)
{
const float v = x / 255.f;
return v < 0.04045f ? v/12.92f : powf((v+0.055f)/1.055f, 2.4f);
}
struct Lab srgb_u8_to_oklab_f32(uint32_t srgb)
{
const float r = srgb_u8_to_linear_f32(srgb >> 16 & 0xff);
const float g = srgb_u8_to_linear_f32(srgb >> 8 & 0xff);
const float b = srgb_u8_to_linear_f32(srgb & 0xff);
const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b;
const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b;
const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b;
const float l_ = cbrtf(l);
const float m_ = cbrtf(m);
const float s_ = cbrtf(s);
const struct Lab ret = {
.L = 0.2104542553f * l_ + 0.7936177850f * m_ - 0.0040720468f * s_,
.a = 1.9779984951f * l_ - 2.4285922050f * m_ + 0.4505937099f * s_,
.b = 0.0259040371f * l_ + 0.7827717662f * m_ - 0.8086757660f * s_,
};
return ret;
}
uint32_t oklab_f32_to_srgb_u8(struct Lab c)
{
const float l_ = c.L + 0.3963377774f * c.a + 0.2158037573f * c.b;
const float m_ = c.L - 0.1055613458f * c.a - 0.0638541728f * c.b;
const float s_ = c.L - 0.0894841775f * c.a - 1.2914855480f * c.b;
const float l = l_*l_*l_;
const float m = m_*m_*m_;
const float s = s_*s_*s_;
const uint8_t r = linear_f32_to_srgb_u8(+4.0767416621f * l - 3.3077115913f * m + 0.2309699292f * s);
const uint8_t g = linear_f32_to_srgb_u8(-1.2684380046f * l + 2.6097574011f * m - 0.3413193965f * s);
const uint8_t b = linear_f32_to_srgb_u8(-0.0041960863f * l - 0.7034186147f * m + 1.7076147010f * s);
return r<<16 | g<<8 | b;
}
</code></pre>
<h2>sRGB to Linear</h2>
<p>The first step is converting the sRGB color to linear values. That sRGB
transfer function can be intimidating, but it's pretty much a simple power
function:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-eotf.png" alt="centerimg" title="sRGB EOTF" /></p>
<p>The input is 8-bit (<code>[0x00;0xff]</code> for each of the 3 channels) which means we
can use a simple 256 values lookup table containing the precomputed resulting
linear values. Note that we can already do that with the reference code with a
table remapping the 8-bit index into a float value.</p>
<p>For our integer version we need to pick an arbitrary precision for the linear
representation. <a href="https://blog.demofox.org/2018/03/10/dont-convert-srgb-u8-to-linear-u8/">8-bit is not going to be enough precision</a>, so
we're going to pick the next power of two to be space efficient: 16-bit. We
will be using the constant <code>K=(1<<16)-1=0xffff</code> to refer to this scale.</p>
<p>Alternatively we could rely on a fixed point mapping (an integer for the
decimal part and another integer for the fractional part), but in our case
pretty much everything is normalized so the decimal part doesn't really matter.</p>
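<p>The table below can be regenerated with a few lines of Python (my reproduction of the mapping formula documented in the table comment, not the original generator):</p>
<pre><code class="language-python">K = (1 << 16) - 1  # 0xffff

def srgb_eotf(x):
    # sRGB EOTF, same formula as in the table mapping comment below
    return x / 12.92 if x < 0.04045 else ((x + 0.055) / 1.055) ** 2.4

# Remap the normalized 8-bit index to [0;K] and round
srgb2linear = [round(srgb_eotf(i / 255) * K) for i in range(256)]
</code></pre>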
<pre><code class="language-c">/**
* Table mapping formula:
* f(x) = x < 0.04045 ? x/12.92 : ((x+0.055)/1.055)^2.4 (sRGB EOTF)
* Where x is the normalized index in the table and f(x) the value in the table.
* f(x) is remapped to [0;K] and rounded.
*/
static const uint16_t srgb2linear[256] = {
0x0000, 0x0014, 0x0028, 0x003c, 0x0050, 0x0063, 0x0077, 0x008b,
0x009f, 0x00b3, 0x00c7, 0x00db, 0x00f1, 0x0108, 0x0120, 0x0139,
0x0154, 0x016f, 0x018c, 0x01ab, 0x01ca, 0x01eb, 0x020e, 0x0232,
0x0257, 0x027d, 0x02a5, 0x02ce, 0x02f9, 0x0325, 0x0353, 0x0382,
0x03b3, 0x03e5, 0x0418, 0x044d, 0x0484, 0x04bc, 0x04f6, 0x0532,
0x056f, 0x05ad, 0x05ed, 0x062f, 0x0673, 0x06b8, 0x06fe, 0x0747,
0x0791, 0x07dd, 0x082a, 0x087a, 0x08ca, 0x091d, 0x0972, 0x09c8,
0x0a20, 0x0a79, 0x0ad5, 0x0b32, 0x0b91, 0x0bf2, 0x0c55, 0x0cba,
0x0d20, 0x0d88, 0x0df2, 0x0e5e, 0x0ecc, 0x0f3c, 0x0fae, 0x1021,
0x1097, 0x110e, 0x1188, 0x1203, 0x1280, 0x1300, 0x1381, 0x1404,
0x1489, 0x1510, 0x159a, 0x1625, 0x16b2, 0x1741, 0x17d3, 0x1866,
0x18fb, 0x1993, 0x1a2c, 0x1ac8, 0x1b66, 0x1c06, 0x1ca7, 0x1d4c,
0x1df2, 0x1e9a, 0x1f44, 0x1ff1, 0x20a0, 0x2150, 0x2204, 0x22b9,
0x2370, 0x242a, 0x24e5, 0x25a3, 0x2664, 0x2726, 0x27eb, 0x28b1,
0x297b, 0x2a46, 0x2b14, 0x2be3, 0x2cb6, 0x2d8a, 0x2e61, 0x2f3a,
0x3015, 0x30f2, 0x31d2, 0x32b4, 0x3399, 0x3480, 0x3569, 0x3655,
0x3742, 0x3833, 0x3925, 0x3a1a, 0x3b12, 0x3c0b, 0x3d07, 0x3e06,
0x3f07, 0x400a, 0x4110, 0x4218, 0x4323, 0x4430, 0x453f, 0x4651,
0x4765, 0x487c, 0x4995, 0x4ab1, 0x4bcf, 0x4cf0, 0x4e13, 0x4f39,
0x5061, 0x518c, 0x52b9, 0x53e9, 0x551b, 0x5650, 0x5787, 0x58c1,
0x59fe, 0x5b3d, 0x5c7e, 0x5dc2, 0x5f09, 0x6052, 0x619e, 0x62ed,
0x643e, 0x6591, 0x66e8, 0x6840, 0x699c, 0x6afa, 0x6c5b, 0x6dbe,
0x6f24, 0x708d, 0x71f8, 0x7366, 0x74d7, 0x764a, 0x77c0, 0x7939,
0x7ab4, 0x7c32, 0x7db3, 0x7f37, 0x80bd, 0x8246, 0x83d1, 0x855f,
0x86f0, 0x8884, 0x8a1b, 0x8bb4, 0x8d50, 0x8eef, 0x9090, 0x9235,
0x93dc, 0x9586, 0x9732, 0x98e2, 0x9a94, 0x9c49, 0x9e01, 0x9fbb,
0xa179, 0xa339, 0xa4fc, 0xa6c2, 0xa88b, 0xaa56, 0xac25, 0xadf6,
0xafca, 0xb1a1, 0xb37b, 0xb557, 0xb737, 0xb919, 0xbaff, 0xbce7,
0xbed2, 0xc0c0, 0xc2b1, 0xc4a5, 0xc69c, 0xc895, 0xca92, 0xcc91,
0xce94, 0xd099, 0xd2a1, 0xd4ad, 0xd6bb, 0xd8cc, 0xdae0, 0xdcf7,
0xdf11, 0xe12e, 0xe34e, 0xe571, 0xe797, 0xe9c0, 0xebec, 0xee1b,
0xf04d, 0xf282, 0xf4ba, 0xf6f5, 0xf933, 0xfb74, 0xfdb8, 0xffff,
};
int32_t srgb_u8_to_linear_int(uint8_t x)
{
return (int32_t)srgb2linear[x];
}
</code></pre>
<p>You may have noticed that we are returning the value in an <code>i32</code>: this is to
ease arithmetic operations (preserving the 16-bit unsigned type would have
overflow wrapping implications when working with the value).</p>
<h2>Linear to OkLab</h2>
<p>OkLab is expressed in a virtually continuous space (floats). If we feed all
16.7 million sRGB colors to the OkLab transform we get the following ranges in
output:</p>
<pre><code class="language-plaintext">min Lab: 0.000000 -0.233887 -0.311528
max Lab: 1.000000 0.276216 0.198570
</code></pre>
<p>We observe that <code>L</code> is always positive and neatly within <code>[0;1]</code> while <code>a</code> and
<code>b</code> are in a more restricted and signed range. Multiple choices are offered to
us with regard to the integer representation we pick.</p>
<p>Since we chose 16-bit for the input linear value, it makes sense to preserve
that precision for <code>Lab</code>. For the <code>L</code> component, this fits neatly (<code>[0;1]</code> in the ref
maps to <code>[0;0xffff]</code> in the integer version), but for the <code>a</code> and <code>b</code>
component, not so much. We could pick a signed 16-bit, but that would imply a
15-bit precision for the arithmetic and 1-bit for the sign, which is going to
be troublesome: we want to preserve the same precision for <code>L</code>, <code>a</code> and <code>b</code>
since the whole point of this operation is to have a uniform space.</p>
<p>Instead, I decided to go with 16 bits of precision, with one extra bit for the
sign (which will be used for <code>a</code> and <code>b</code>), and thus storing <code>Lab</code> in 3 signed
<code>i32</code>. Alternatively, we could decide to have a 15-bit precision with an extra
bit for the sign by using 3 <code>i16</code>. This should work mostly fine but having the
values fit exactly the boundaries of the storage can be problematic in various
situations, typically anything that involves boundary checks and overflows.
Picking a larger storage simplifies a bunch of things.</p>
<p>Looking at <code>srgb_u8_to_oklab_f32</code> we quickly see that for most of the function
it's simple arithmetic, but we have a cube root (<code>cbrt()</code>), so let's study that
first.</p>
<h3>Cube root</h3>
<p>All the <code>cbrt</code> inputs are driven by this:</p>
<pre><code class="language-c">const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b;
const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b;
const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b;
</code></pre>
<p>This might not be obvious at first glance but here <code>l</code>, <code>m</code> and <code>s</code> all are in
<code>[0;1]</code> range (the sum of the coefficients of each row is <code>1</code>), so we will only
need to deal with this range in our <code>cbrt</code> implementation. This greatly
simplifies the problem!</p>
<p>Now, what does it look like?</p>
<p><img src="http://blog.pkh.me/img/oklab-int/cbrt01.png" alt="centerimg" title="Cube root function between 0 and 1" /></p>
<p>This function is simply the inverse of <code>f(x)=x³</code>, which is a more convenient
function to work with. And I have some great news: not long ago, I wrote <a href="http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html">an
article on how to inverse a function</a>, so that's exactly what we
are going to do here: inverse <code>f(x)=x³</code>.</p>
<p>What we first need though is a good approximation of the curve. A straight line
is probably fine but we could try to use some symbolic regression in order to
get some sort of rough polynomial approximation. <a href="https://astroautomata.com/PySR/">PySR</a> can do that in a
few lines of code:</p>
<pre><code class="language-python">import numpy as np
from pysr import PySRRegressor
# 25 points of ³√x within [0;1]
x = np.linspace(0, 1, 25).reshape(-1, 1)
y = x ** (1/3)
model = PySRRegressor(model_selection="accuracy", binary_operators=["+", "-", "*"], niterations=200)
r = model.fit(x, y, variable_names=["x"])
print(r)
</code></pre>
<p>The output is not deterministic for some reason (which is quite annoying) and
the expressions provided usually follow a wonky form. Still, in my run it
seemed to take a liking to the following polynomial: <code>u₀ = x³ - 2.19893x² + 2.01593x + 0.219407</code> (reformatted in a sane polynomial form thanks to
WolframAlpha).</p>
<p>Note that increasing the number of data points is not really a good idea
because we quickly start being confronted with <a href="https://en.wikipedia.org/wiki/Runge%27s_phenomenon">Runge's phenomenon</a>. No
need to overthink it, 25 points is just fine.</p>
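<p>Out of curiosity, we can measure how good this initial guess is on its own, before throwing any iteration at it (a quick sketch using the coefficients from my PySR run; the error is largest near 0, where the polynomial gives ≈0.219 instead of 0):</p>
<pre><code class="language-python">def cbrt_approx(x):
    # Initial polynomial approximation of ³√x on [0;1] (coefficients from PySR)
    return x**3 - 2.19893 * x**2 + 2.01593 * x + 0.219407

# Maximum absolute error over a dense sampling of [0;1]
max_err = max(abs(cbrt_approx(i / 1000) - (i / 1000) ** (1 / 3))
              for i in range(1001))
</code></pre>
<p>This is a pretty rough starting point, which is exactly why we refine it with a couple of iterations.</p>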
<p>Now we can make a few Newton iterations. For that, we need the derivative of
<code>f(uₙ)=uₙ³-x</code>, so <code>f'(uₙ)=3uₙ²</code> and thus the iteration expressions can be
obtained easily:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - f(uₙ)/f'(uₙ)
     = uₙ - (uₙ³-x)/(3uₙ²)
     = (2uₙ³+x)/(3uₙ²)
</code></pre>
<p>If you don't understand what the hell is going on here, check <a href="http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html">the article
referred to earlier</a>, we're simply following the recipe here.</p>
<p>Now I had a look into how most libc compute <code>cbrt</code>, and <a href="https://twitter.com/insouris/status/1589649490075561984">despite sometimes
referring to Newton iterations, they were actually using Halley
iterations</a>. So we're going to do the same (not lying, just the
Halley part). To get the Halley iteration instead of Newton, we need the first
but also the second derivative of <code>f(uₙ)=uₙ³-x</code> (<code>f'(uₙ)=3uₙ²</code> and
<code>f"(uₙ)=6uₙ</code>) from which we deduce a relatively simple expression:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ-2f(uₙ)f'(uₙ)/(2f'(uₙ)²-f(uₙ)f"(uₙ))
= uₙ(2x+uₙ³)/(x+2uₙ³)
</code></pre>
<p>We have everything we need to approximate a cube root of a real between <code>0</code> and
<code>1</code>. In Python a complete implementation would be as simple as this snippet:</p>
<pre><code class="language-python">b, c, d = -2.19893, 2.01593, 0.219407
def cbrt01(x):
# We only support [0;1]
if x <= 0: return 0
if x >= 1: return 1
# Initial approximation
u = x**3 + b*x**2 + c*x + d
# 2 Halley iterations
u = u * (2*x+u**3) / (x+2*u**3)
u = u * (2*x+u**3) / (x+2*u**3)
return u
</code></pre>
<p>But now we need to scale the floating values up into 16-bit integers.</p>
<p>First of all, in the integer version our <code>x</code> is actually in <code>K</code> scale, which
means we want to express <code>u</code> according to <code>X=x·K</code>. Similarly, we want to use
<code>B=b·K</code>, <code>C=c·K</code> and <code>D=d·K</code> instead of <code>b</code>, <code>c</code> and <code>d</code> because we have no way
of expressing the former as integer otherwise. Finally, we're not actually
going to compute <code>u₀</code> but <code>u₀·K</code> because we're preserving the scale through the
function. We have:</p>
<pre><code class="language-plaintext">u₀·K = K·(x³ + bx² + cx + d)
     = K·((x·K)³/K³ + b(x·K)²/K² + c(x·K)/K + d)
     = K·(X³/K³ + bX²/K² + cX/K + d)
     = X³·K/K³ + bX²·K/K² + cX·K/K + d·K
     = X³/K² + BX²/K² + CX/K + D
     = (X³ + BX²)/K² + CX/K + D
     = ((X³ + BX²)/K + CX)/K + D
     = (X(X² + BX)/K + CX)/K + D
  U₀ = X(X(X + B)/K + C)/K + D
</code></pre>
<p>With this we have a relatively cheap expression where the <code>K</code> divisions would
still preserve enough precision even if evaluated as integer division.</p>
<p>We can do the same for the Halley iteration. I spare you the algebra, the
expression <code>u(2x+u³) / (x+2u³)</code> becomes <code>(U(2X+U³/K²)) / (X+2U³/K²)</code>.</p>
<p>Looking at this expression you may start to worry about overflows, and that
would be fair since even <code>K²</code> is getting dangerously close to the sun (it's
actually already larger than <code>INT32_MAX</code>). For this reason we're going to cheat
and simply use 64-bit arithmetic in this function. I believe we could reduce
the risk of overflow, but I don't think there is a way to remain in 32-bit
without nasty compromises anyway. This is also why in the code below you'll
notice the constants are suffixed with <code>LL</code> (to force long-long/64-bit
arithmetic).</p>
<p>Beware that overflows are a terrible predicament to get into as they will lead
to <a href="http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html">undefined behaviour</a>. <strong>Do not underestimate this risk</strong>. You might not
detect them early enough, and missing them may mislead you when interpreting
the results. For this reason, I strongly suggest <strong>always building with
<code>-fsanitize=undefined</code></strong> during test and development. I don't do that often,
but for this kind of research, I also highly recommend <strong>first writing tests
that cover all possible integer inputs</strong> (when applicable) so that overflows
are detected as soon as possible.</p>
<p>Before we write the integer version of our function, we need to address
rounding. In the case of the initial approximation I don't think we need to
bother, but for our Halley iteration we're going to need as much precision as
we can get. Since we know <code>U</code> is positive (remember we're evaluating <code>cbrt(x)</code>
where <code>x</code> is in <code>[0;1]</code>), we can use <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">the <code>(a+b/2)/b</code> rounding
formula</a>.</p>
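<p>For reference, this positive-only rounded division is a one-liner; a minimal sketch:</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

/* round(a/b) for a >= 0 and b > 0: bias the numerator by half the divisor
 * before the truncating division */
static int64_t div_round_pos(int64_t a, int64_t b)
{
    return (a + b/2) / b;
}
</code></pre>
<p>For example, <code>div_round_pos(7, 2)</code> gives <code>4</code> where the plain truncating <code>7/2</code> gives <code>3</code>.</p>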
<p>Our function finally just looks like:</p>
<pre><code class="language-c">#define K2 ((int64_t)K*K)

int32_t cbrt01_int(int32_t x)
{
    int64_t u;

    /* Approximation curve is for the [0;1] range */
    if (x <= 0) return 0;
    if (x >= K) return K;

    /* Initial approximation: x³ - 2.19893x² + 2.01593x + 0.219407 */
    u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL;

    /* Refine with 2 Halley iterations. */
    for (int i = 0; i < 2; i++) {
        const int64_t u3 = u*u*u;
        const int64_t den = x + (2*u3 + K2/2) / K2;
        u = (u * (2*x + (u3 + K2/2) / K2) + den/2) / den;
    }
    return u;
}
</code></pre>
<p>Cute, isn't it? If we test the accuracy of this function by calling it for all
the possible values, we actually get extremely good results. Here is the test
code:</p>
<pre><code class="language-c">int main(void)
{
    float max_diff = 0;
    float total_diff = 0;

    for (int i = 0; i <= K; i++) {
        const float ref = cbrtf(i / (float)K);
        const float out = cbrt01_int(i) / (float)K;
        const float d = fabs(ref - out);

        if (d > max_diff)
            max_diff = d;
        total_diff += d;
    }
    printf("max_diff=%f total_diff=%f avg_diff=%f\n",
           max_diff, total_diff, total_diff / (K + 1));
    return 0;
}
</code></pre>
<p>Output: <code>max_diff=0.030831 total_diff=0.816078 avg_diff=0.000012</code></p>
<p>If we want to trade precision for speed, we could adjust the function to use
Newton iterations, and maybe remove the rounding.</p>
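<p>For the curious, here is my own guess at what such a Newton-based variant could look like (this is not code from the article): same initial approximation and <code>K = 0xffff</code>, but iterating <code>u ← (2u + x·K²/u²)/3</code> without the rounding bias.</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define K 0xffff
#define K2 ((int64_t)K*K)

int32_t cbrt01_int_newton(int32_t x)
{
    int64_t u;

    if (x <= 0) return 0;
    if (x >= K) return K;

    /* Same initial approximation as the Halley version */
    u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL;

    /* Newton iterations for u³ = x·K²: u ← (2u + x·K²/u²)/3 */
    for (int i = 0; i < 2; i++)
        u = (2*u + x*K2 / (u*u)) / 3;

    return (int32_t)u;
}
</code></pre>
<p>Newton converges quadratically (versus cubically for Halley), so expect a few more units of error, especially near zero where the initial approximation is at its worst.</p>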
<h3>Back to the core</h3>
<p>Going back to our sRGB-to-OkLab function, everything should look
straightforward to implement now. There is one thing though: while the <code>lms</code>
computation (at the beginning of the function) works exclusively with
positive values, the output <code>Lab</code> value expression is signed. For this reason
we will need a more involved rounded division, so referring again to <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">my last
article</a> we will use:</p>
<pre><code class="language-c">static int64_t div_round64(int64_t a, int64_t b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }
</code></pre>
<p>And thus, we have:</p>
<pre><code class="language-c">struct LabInt { int32_t L, a, b; };

struct LabInt srgb_u8_to_oklab_int(uint32_t srgb)
{
    const int32_t r = (int32_t)srgb2linear[srgb >> 16 & 0xff];
    const int32_t g = (int32_t)srgb2linear[srgb >>  8 & 0xff];
    const int32_t b = (int32_t)srgb2linear[srgb       & 0xff];

    // Note: lms can actually be slightly over K due to rounded coefficients
    const int32_t l = (27015LL*r + 35149LL*g +  3372LL*b + K/2) / K;
    const int32_t m = (13887LL*r + 44610LL*g +  7038LL*b + K/2) / K;
    const int32_t s = ( 5787LL*r + 18462LL*g + 41286LL*b + K/2) / K;

    const int32_t l_ = cbrt01_int(l);
    const int32_t m_ = cbrt01_int(m);
    const int32_t s_ = cbrt01_int(s);

    const struct LabInt ret = {
        .L = div_round64( 13792LL*l_ +  52010LL*m_ -   267LL*s_, K),
        .a = div_round64(129628LL*l_ - 159158LL*m_ + 29530LL*s_, K),
        .b = div_round64(  1698LL*l_ +  51299LL*m_ - 52997LL*s_, K),
    };
    return ret;
}
</code></pre>
<p>The note in this code is here to remind us that we have to saturate <code>lms</code> to a
maximum of <code>K</code> (corresponding to <code>1.0</code> with floats), which is what we're doing
in <code>cbrt01_int()</code>.</p>
<p>At this point we can already work within the OkLab space but we're only
half-way through the pain. Fortunately, things are going to be easier from now
on.</p>
<h2>OkLab to sRGB</h2>
<p>Our OkLab-to-sRGB function relies on the Linear-to-sRGB function (at the end),
so we're going to deal with it first.</p>
<h3>Linear to sRGB</h3>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-oetf.png" alt="centerimg" title="sRGB OETF" /></p>
<p>Contrary to sRGB-to-Linear it's going to be tricky to rely on a table because
it would be way too large to hold all possible values (since it would require
<code>K</code> entries). I initially considered computing <code>powf(x, 1.f/2.4f)</code> with integer
arithmetic somehow, but this is much more involved than how we managed to
implement <code>cbrt</code>. So instead I thought about approximating the curve with a
bunch of points (stored in a table), and then approximating any intermediate
value with a linear interpolation, as if the points were joined by small
segments.</p>
<p>We gave 256 16-bit entries to <code>srgb2linear</code>, so if we were to give as much
storage to <code>linear2srgb</code> we could have a table of 512 8-bit entries (our output
is 8-bit). Here it is:</p>
<pre><code class="language-c">/**
* Table mapping formula:
* f(x) = x < 0.0031308 ? x*12.92 : (1.055)*x^(1/2.4)-0.055 (sRGB OETF)
* Where x is the normalized index in the table and f(x) the value in the table.
* f(x) is remapped to [0;0xff] and rounded.
*
* Since a 16-bit table is too large, we reduce its precision to 9-bit.
*/
static const uint8_t linear2srgb[P + 1] = {
0x00, 0x06, 0x0d, 0x12, 0x16, 0x19, 0x1c, 0x1f, 0x22, 0x24, 0x26, 0x28, 0x2a, 0x2c, 0x2e, 0x30,
0x32, 0x33, 0x35, 0x36, 0x38, 0x39, 0x3b, 0x3c, 0x3d, 0x3e, 0x40, 0x41, 0x42, 0x43, 0x45, 0x46,
0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56,
0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x5f, 0x60, 0x61, 0x62, 0x62,
0x63, 0x64, 0x65, 0x65, 0x66, 0x67, 0x67, 0x68, 0x69, 0x6a, 0x6a, 0x6b, 0x6c, 0x6c, 0x6d, 0x6e,
0x6e, 0x6f, 0x6f, 0x70, 0x71, 0x71, 0x72, 0x73, 0x73, 0x74, 0x74, 0x75, 0x76, 0x76, 0x77, 0x77,
0x78, 0x79, 0x79, 0x7a, 0x7a, 0x7b, 0x7b, 0x7c, 0x7d, 0x7d, 0x7e, 0x7e, 0x7f, 0x7f, 0x80, 0x80,
0x81, 0x81, 0x82, 0x82, 0x83, 0x84, 0x84, 0x85, 0x85, 0x86, 0x86, 0x87, 0x87, 0x88, 0x88, 0x89,
0x89, 0x8a, 0x8a, 0x8b, 0x8b, 0x8c, 0x8c, 0x8c, 0x8d, 0x8d, 0x8e, 0x8e, 0x8f, 0x8f, 0x90, 0x90,
0x91, 0x91, 0x92, 0x92, 0x93, 0x93, 0x93, 0x94, 0x94, 0x95, 0x95, 0x96, 0x96, 0x97, 0x97, 0x97,
0x98, 0x98, 0x99, 0x99, 0x9a, 0x9a, 0x9a, 0x9b, 0x9b, 0x9c, 0x9c, 0x9c, 0x9d, 0x9d, 0x9e, 0x9e,
0x9f, 0x9f, 0x9f, 0xa0, 0xa0, 0xa1, 0xa1, 0xa1, 0xa2, 0xa2, 0xa3, 0xa3, 0xa3, 0xa4, 0xa4, 0xa5,
0xa5, 0xa5, 0xa6, 0xa6, 0xa6, 0xa7, 0xa7, 0xa8, 0xa8, 0xa8, 0xa9, 0xa9, 0xa9, 0xaa, 0xaa, 0xab,
0xab, 0xab, 0xac, 0xac, 0xac, 0xad, 0xad, 0xae, 0xae, 0xae, 0xaf, 0xaf, 0xaf, 0xb0, 0xb0, 0xb0,
0xb1, 0xb1, 0xb1, 0xb2, 0xb2, 0xb3, 0xb3, 0xb3, 0xb4, 0xb4, 0xb4, 0xb5, 0xb5, 0xb5, 0xb6, 0xb6,
0xb6, 0xb7, 0xb7, 0xb7, 0xb8, 0xb8, 0xb8, 0xb9, 0xb9, 0xb9, 0xba, 0xba, 0xba, 0xbb, 0xbb, 0xbb,
0xbc, 0xbc, 0xbc, 0xbd, 0xbd, 0xbd, 0xbe, 0xbe, 0xbe, 0xbf, 0xbf, 0xbf, 0xc0, 0xc0, 0xc0, 0xc1,
0xc1, 0xc1, 0xc1, 0xc2, 0xc2, 0xc2, 0xc3, 0xc3, 0xc3, 0xc4, 0xc4, 0xc4, 0xc5, 0xc5, 0xc5, 0xc6,
0xc6, 0xc6, 0xc6, 0xc7, 0xc7, 0xc7, 0xc8, 0xc8, 0xc8, 0xc9, 0xc9, 0xc9, 0xc9, 0xca, 0xca, 0xca,
0xcb, 0xcb, 0xcb, 0xcc, 0xcc, 0xcc, 0xcc, 0xcd, 0xcd, 0xcd, 0xce, 0xce, 0xce, 0xce, 0xcf, 0xcf,
0xcf, 0xd0, 0xd0, 0xd0, 0xd0, 0xd1, 0xd1, 0xd1, 0xd2, 0xd2, 0xd2, 0xd2, 0xd3, 0xd3, 0xd3, 0xd4,
0xd4, 0xd4, 0xd4, 0xd5, 0xd5, 0xd5, 0xd6, 0xd6, 0xd6, 0xd6, 0xd7, 0xd7, 0xd7, 0xd7, 0xd8, 0xd8,
0xd8, 0xd9, 0xd9, 0xd9, 0xd9, 0xda, 0xda, 0xda, 0xda, 0xdb, 0xdb, 0xdb, 0xdc, 0xdc, 0xdc, 0xdc,
0xdd, 0xdd, 0xdd, 0xdd, 0xde, 0xde, 0xde, 0xde, 0xdf, 0xdf, 0xdf, 0xe0, 0xe0, 0xe0, 0xe0, 0xe1,
0xe1, 0xe1, 0xe1, 0xe2, 0xe2, 0xe2, 0xe2, 0xe3, 0xe3, 0xe3, 0xe3, 0xe4, 0xe4, 0xe4, 0xe4, 0xe5,
0xe5, 0xe5, 0xe5, 0xe6, 0xe6, 0xe6, 0xe6, 0xe7, 0xe7, 0xe7, 0xe7, 0xe8, 0xe8, 0xe8, 0xe8, 0xe9,
0xe9, 0xe9, 0xe9, 0xea, 0xea, 0xea, 0xea, 0xeb, 0xeb, 0xeb, 0xeb, 0xec, 0xec, 0xec, 0xec, 0xed,
0xed, 0xed, 0xed, 0xee, 0xee, 0xee, 0xee, 0xef, 0xef, 0xef, 0xef, 0xef, 0xf0, 0xf0, 0xf0, 0xf0,
0xf1, 0xf1, 0xf1, 0xf1, 0xf2, 0xf2, 0xf2, 0xf2, 0xf3, 0xf3, 0xf3, 0xf3, 0xf3, 0xf4, 0xf4, 0xf4,
0xf4, 0xf5, 0xf5, 0xf5, 0xf5, 0xf6, 0xf6, 0xf6, 0xf6, 0xf6, 0xf7, 0xf7, 0xf7, 0xf7, 0xf8, 0xf8,
0xf8, 0xf8, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xfa, 0xfa, 0xfa, 0xfa, 0xfb, 0xfb, 0xfb, 0xfb, 0xfb,
0xfc, 0xfc, 0xfc, 0xfc, 0xfd, 0xfd, 0xfd, 0xfd, 0xfd, 0xfe, 0xfe, 0xfe, 0xfe, 0xff, 0xff, 0xff,
};
</code></pre>
<p>Again we're going to start with the floating point version as it's easier to reason with.</p>
<p>We have a precision <code>P</code> of 9-bits: <code>P = (1<<9)-1 = 511 = 0x1ff</code>. But for the
sake of understanding the math, the following diagram will assume a <code>P</code> of <code>3</code>
so that we can clearly see the segment divisions:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-eotf-lut.png" alt="centerimg" title="sRGB EOTF with a LUT of P=3" /></p>
<p>The input of our table is an integer index which needs to be calculated
according to our input <code>x</code>. But as stated earlier, we won't need one but two
indices in order to interpolate a point between 2 discrete values from our
table. We will refer to these indices as <code>iₚ</code> and <code>iₙ</code>, which can be computed
like this:</p>
<pre><code class="language-plaintext">i = x·P
iₚ = ⌊i⌋
iₙ = iₚ + 1
</code></pre>
<p>(<code>⌊a⌋</code> means <code>floor(a)</code>)</p>
<p>In order to get an approximation of <code>y</code> according to <code>i</code>, we simply need a
linear remapping: the ratio of <code>i</code> between <code>iₚ</code> and <code>iₙ</code> is the same ratio as
<code>y</code> between <code>yₚ</code> and <code>yₙ</code>. So yet again we're going to rely on <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">the most useful
maths formulas</a>: <code>remap(iₚ,iₙ,yₚ,yₙ,i) = mix(yₚ,yₙ,linear(iₚ,iₙ,i))</code>.</p>
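<p>Translated into C helpers (my own naming, not from the article), these formulas are one-liners:</p>
<pre><code class="language-c">#include <assert.h>

/* mix (a.k.a. lerp): map a ratio t in [0;1] to the [a;b] range */
static float mix_f(float a, float b, float t) { return a*(1.f - t) + b*t; }

/* linear: inverse of mix, map x in [a;b] back to a ratio in [0;1] */
static float linear_f(float a, float b, float x) { return (x - a) / (b - a); }

/* remap: map x in [a;b] to the [c;d] range */
static float remap_f(float a, float b, float c, float d, float x)
{
    return mix_f(c, d, linear_f(a, b, x));
}
</code></pre>
<p>For instance, <code>remap_f(0, 1, 10, 20, 0.25f)</code> gives <code>12.5</code>.</p>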
<p>The ratio <code>r</code> we're computing as an input to the y-mix can be simplified a bit:</p>
<pre><code class="language-plaintext">r = linear(iₚ,iₙ,i)
= (i-iₚ) / (iₙ-iₚ)
= i-iₚ
= x·P - ⌊x·P⌋
= fract(x·P)
</code></pre>
<p>So in the end our formula is simply: <code>y = mix(yₚ,yₙ,fract(x·P))</code></p>
<p>Translated into C we can write it like this:</p>
<pre><code class="language-c">uint8_t linear_f32_to_srgb_u8_fast(float x)
{
    if (x <= 0.f) {
        return 0;
    } else if (x >= 1.f) {
        return 0xff;
    } else {
        const float i = x * P;
        const int32_t idx = (int32_t)floorf(i);
        const float y0 = linear2srgb[idx];
        const float y1 = linear2srgb[idx + 1];
        const float r = i - idx;
        return lrintf(mix(y0, y1, r));
    }
}
</code></pre>
<p><strong>Note</strong>: in case you are concerned about <code>idx+1</code> overflowing,
<code>floorf((1.0-FLT_EPSILON)*P)</code> is <code>P-1</code>, so this is safe.</p>
<h3>Linear to sRGB, integer version</h3>
<p>In the integer version, our function input <code>x</code> is within <code>[0;K]</code>, so we need to
make a few adjustments.</p>
<p>The first issue we have is that with integer arithmetic our <code>i</code> and <code>idx</code> are
the same. We have <code>X=x·K</code> as input, so <code>i = idx = X·P/K</code> because we are using
an integer division, which in this case is equivalent to the <code>floor()</code>
expression in the float version. So while it's a simple and fast way to get
<code>yₚ</code> and <code>yₙ</code>, we have an issue figuring out the ratio <code>r</code>.</p>
<p>One tool we have is the modulo operator: the integer division is destructive of
the fractional part, but fortunately the modulo (the rest of the division)
gives this information back. It can also be obtained for free most of the time,
because CPU division instructions tend to provide that modulo as well, without
extra computation.</p>
<p>If we let <code>m = (X·P) % K</code>, we have the fractional part of the division
expressed in the <code>K</code> scale, which means we can derive our ratio <code>r</code> from it:
<code>r = m / K</code>.</p>
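<p>This works because of the C division identity: for <code>b > 0</code> we always have <code>a == (a/b)*b + a%b</code>, so the modulo is exactly what the truncation dropped. A small illustration:</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

/* Difference between X·P and its reconstruction from the truncated
 * quotient and the modulo; always 0 for non-negative operands */
static int32_t split_loss(int32_t X, int32_t P, int32_t K)
{
    const int32_t xP = X * P;
    const int32_t i = xP / K;  /* the index (truncated quotient) */
    const int32_t m = xP % K;  /* the fractional part, in K scale */
    return xP - (i*K + m);
}
</code></pre>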
<p>Slipping the <code>K</code> division in our <code>mix()</code> expression we end up with the
following code:</p>
<pre><code class="language-c">uint8_t linear_int_to_srgb_u8(int32_t x)
{
    if (x <= 0) {
        return 0;
    } else if (x >= K) {
        return 0xff;
    } else {
        const int32_t xP = x * P;
        const int32_t i = xP / K;
        const int32_t m = xP % K;
        const int32_t y0 = linear2srgb[i];
        const int32_t y1 = linear2srgb[i + 1];
        return (m * (y1 - y0) + K/2) / K + y0;
    }
}
</code></pre>
<p>Testing this function for all possible values of <code>x</code>, the biggest inaccuracy
is an off-by-one, which concerns 6280 of the 65536 possible values (less than
10%): 2886 "off by -1" and 3394 "off by +1". It exactly matches the inaccuracy
of the float version of this function, so I think we can be pretty happy with it.</p>
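<p>The exhaustive check itself is short. Here is a sketch of it; to keep the snippet self-contained the 512-entry table is regenerated at runtime from the OETF rather than hardcoded, which should match the article's table (same formula, same rounding):</p>
<pre><code class="language-c">#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

#define K 0xffff
#define P 0x1ff

static uint8_t lut[P + 1];

/* sRGB OETF, the formula used to derive the article's table */
static float oetf(float x)
{
    return x < 0.0031308f ? x * 12.92f : 1.055f * powf(x, 1.f/2.4f) - 0.055f;
}

static void init_lut(void)
{
    for (int i = 0; i <= P; i++)
        lut[i] = (uint8_t)lrintf(oetf(i / (float)P) * 0xff);
}

static uint8_t linear_int_to_srgb_u8_lut(int32_t x)
{
    if (x <= 0) return 0;
    if (x >= K) return 0xff;
    const int32_t xP = x * P;
    const int32_t i = xP / K, m = xP % K;
    const int32_t y0 = lut[i], y1 = lut[i + 1];
    return (m * (y1 - y0) + K/2) / K + y0;
}

/* Exhaustive comparison against the float reference */
static int max_abs_err(void)
{
    int max_err = 0;
    init_lut();
    for (int x = 0; x <= K; x++) {
        const int ref = (int)lrintf(oetf(x / (float)K) * 0xff);
        const int err = abs(linear_int_to_srgb_u8_lut(x) - ref);
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
</code></pre>
<p><code>max_abs_err()</code> should report the same off-by-one worst case as with the article's table.</p>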
<p>Given how good this approach is, we could also consider applying the same
strategy for <code>cbrt</code>, so this is left as an exercise to the reader.</p>
<h3>Back to the core</h3>
<p>We're finally in our last function. Using everything we've learned so far, it
can be trivially converted to integer arithmetic:</p>
<pre><code class="language-c">uint32_t oklab_int_to_srgb_u8(struct LabInt c)
{
    const int64_t l_ = c.L + div_round64(25974LL * c.a, K) + div_round64( 14143LL * c.b, K);
    const int64_t m_ = c.L + div_round64(-6918LL * c.a, K) + div_round64( -4185LL * c.b, K);
    const int64_t s_ = c.L + div_round64(-5864LL * c.a, K) + div_round64(-84638LL * c.b, K);

    const int32_t l = l_*l_*l_ / K2;
    const int32_t m = m_*m_*m_ / K2;
    const int32_t s = s_*s_*s_ / K2;

    const uint8_t r = linear_int_to_srgb_u8((267169LL * l - 216771LL * m +  15137LL * s + K/2) / K);
    const uint8_t g = linear_int_to_srgb_u8((-83127LL * l + 171030LL * m -  22368LL * s + K/2) / K);
    const uint8_t b = linear_int_to_srgb_u8((  -275LL * l -  46099LL * m + 111909LL * s + K/2) / K);

    return r<<16 | g<<8 | b;
}
</code></pre>
<p>Important things to notice:</p>
<ul>
<li>we're storing <code>l_</code>, <code>m_</code> and <code>s_</code> in 64-bit values so that the following
cubes do not overflow</li>
<li>we're using <code>div_round64</code> for part of the expressions of <code>l_</code>, <code>m_</code> and <code>s_</code>
because they are using signed sub-expressions</li>
<li>we're using a naive integer division in <code>r</code>, <code>g</code> and <code>b</code> because the value is
expected to be positive</li>
</ul>
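<p>A cheap sanity check on the cubing step is to verify it approximately inverts <code>cbrt01_int()</code>; here is a sketch of it (re-declaring the function from earlier so the snippet stands alone):</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define K 0xffff
#define K2 ((int64_t)K*K)

/* cbrt01_int from earlier in the article */
static int32_t cbrt01_int(int32_t x)
{
    int64_t u;

    if (x <= 0) return 0;
    if (x >= K) return K;
    u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL;
    for (int i = 0; i < 2; i++) {
        const int64_t u3 = u*u*u;
        const int64_t den = x + (2*u3 + K2/2) / K2;
        u = (u * (2*x + (u3 + K2/2) / K2) + den/2) / den;
    }
    return (int32_t)u;
}

/* The cube used for l, m and s above */
static int32_t cube01_int(int32_t u)
{
    const int64_t u64 = u;
    return (int32_t)(u64*u64*u64 / K2);
}
</code></pre>
<p><code>cube01_int(cbrt01_int(x))</code> should land within a few units of <code>x</code> across the whole range.</p>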
<h2>Evaluation</h2>
<p>We're finally there. In the end the complete code is less than 200 lines of
code, and even fewer for the optimized float one (assuming we don't implement our
own <code>cbrt</code>). The complete code, test functions and benchmarks tools <a href="https://github.com/ubitux/oklab-int">can be
found on Github</a>.</p>
<h3>Accuracy</h3>
<p>Comparing the integer version to the reference float gives us the following results:</p>
<ul>
<li>sRGB to OkLab: <code>max_diff=0.000883 total_diff=0.051189</code></li>
<li>OkLab to sRGB: <code>max_diff_r=2 max_diff_g=1 max_diff_b=1</code></li>
</ul>
<p>I find these results pretty decent for an integer version, but you're free to
disagree and improve them.</p>
<h3>Speed</h3>
<p>The benchmarks are also interesting: on my main workstation (Intel® Core™
i7-12700, glibc 2.36, GCC 12.2.0), the integer arithmetic is slightly slower
than the optimized float version:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Command</th>
<th style="text-align:right">Mean [s]</th>
<th style="text-align:right">Min [s]</th>
<th style="text-align:right">Max [s]</th>
<th style="text-align:right">Relative</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reference</strong></td>
<td style="text-align:right">1.425 ± 0.008</td>
<td style="text-align:right">1.414</td>
<td style="text-align:right">1.439</td>
<td style="text-align:right">1.59 ± 0.01</td>
</tr>
<tr>
<td style="text-align:left"><strong>Fast float</strong></td>
<td style="text-align:right">0.897 ± 0.005</td>
<td style="text-align:right">0.888</td>
<td style="text-align:right">0.902</td>
<td style="text-align:right">1.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Integer arithmetic</strong></td>
<td style="text-align:right">0.937 ± 0.006</td>
<td style="text-align:right">0.926</td>
<td style="text-align:right">0.947</td>
<td style="text-align:right">1.04 ± 0.01</td>
</tr>
</tbody>
</table>
<p>Observations:</p>
<ul>
<li>The FPU is definitely fast in modern CPUs</li>
<li>Both integer and optimized float versions are destroying the reference code
(note that this is only because of the transfer function optimizations, as we
have no change in the OkLab functions themselves in the optimized float
version)</li>
</ul>
<p>On the other hand, on one of my random ARM boards (NanoPI NEO 2 with a Cortex
A53, glibc 2.35, GCC 12.1.0), I get different results:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Command</th>
<th style="text-align:right">Mean [s]</th>
<th style="text-align:right">Min [s]</th>
<th style="text-align:right">Max [s]</th>
<th style="text-align:right">Relative</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reference</strong></td>
<td style="text-align:right">27.678 ± 0.009</td>
<td style="text-align:right">27.673</td>
<td style="text-align:right">27.703</td>
<td style="text-align:right">2.04 ± 0.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Fast float</strong></td>
<td style="text-align:right">15.769 ± 0.001</td>
<td style="text-align:right">15.767</td>
<td style="text-align:right">15.772</td>
<td style="text-align:right">1.16 ± 0.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Integer arithmetic</strong></td>
<td style="text-align:right">13.551 ± 0.001</td>
<td style="text-align:right">13.550</td>
<td style="text-align:right">13.553</td>
<td style="text-align:right">1.00</td>
</tr>
</tbody>
</table>
<p>Not that much faster proportionally speaking, but the integer version is still
significantly faster overall on such a low-end device.</p>
<h2>Conclusion</h2>
<p>This took me ages to complete, way longer than I expected, but I'm pretty happy
with the end results and with everything I learned in the process. Also, you
may have noticed how much I referred to previous work; this has been
particularly satisfying from my point of view (re-using previous toolboxes
means they were actually useful). This write-up won't be an exception to the
rule: in a later article, I will make use of OkLab for another project I've
been working on for a while now. See you soon!</p>
http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html
http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html
GCC undefined behaviors are getting wildSun, 27 Nov 2022 22:13:26 -0000<p>Happy with my recent breakthrough in <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">understanding C integer divisions</a>
after weeks of struggle, I was minding my own business having fun writing
integer arithmetic code. Life was good, when suddenly… <code>zsh: segmentation fault (core dumped)</code>.</p>
<p>That code wasn't messing with memory much, so it was more likely a side
effect of an arithmetic overflow or something. Using <code>-fsanitize=undefined</code>
quickly identified the issue, which confirmed the presence of an integer
overflow. The fix was easy but something felt off. I was under the impression
my code was robust enough against that kind of honest mistake. Turns out, the
protecting condition I had in place should indeed have been enough, so I tried
to extract a minimal reproducible case:</p>
<pre><code class="language-c">#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t tab[0x1ff + 1];

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    int32_t i = x * 0x1ff / 0xffff;
    if (i >= 0 && i < sizeof(tab)) {
        printf("tab[%d] looks safe because %d is between [0;%d[\n", i, i, (int)sizeof(tab));
        return tab[i];
    }
    return 0;
}

int main(int ac, char **av)
{
    return f(atoi(av[1]));
}
</code></pre>
<p>The overflow can happen on <code>x * 0x1ff</code>. Since an integer overflow is undefined,
GCC makes the assumption that it cannot happen, ever. In practice in this case
it does, but the <code>i >= 0 && i < sizeof(tab)</code> condition should be enough to take
care of it, whatever crazy value it becomes, right? Well, I have bad news:</p>
<pre><code class="language-shell">% cc -Wall -O2 overflow.c -o overflow && ./overflow 50000000
tab[62183] looks safe because 62183 is between [0;512[
zsh: segmentation fault (core dumped) ./overflow 50000000
</code></pre>
<p><strong>Note</strong>: this is GCC <code>12.2.0</code> on x86-64.</p>
<p>We have <code>i=62183</code> as the result of the overflow, and nevertheless the execution
violates the gate condition, spouts a nonsensical lie, goes straight into
dereferencing <code>tab</code>, and dies miserably.</p>
<p>Let's study what GCC is doing here. Firing up Ghidra we observe the following
decompiled code:</p>
<pre><code class="language-c">uint8_t f(int x)
{
    int tmp;

    if (-1 < x) {
        tmp = x * 0x1ff;
        if (tmp < 0x1fffe00) {
            printf("tab[%d] looks safe because %d is between [0;%d[\n",
                   (ulong)(uint)tmp / 0xffff, (ulong)(uint)tmp / 0xffff, 0x200);
            return tab[(int)((uint)tmp / 0xffff)];
        }
    }
    return '\0';
}
</code></pre>
<p>When I said GCC makes the assumption that it cannot happen, this is what I
meant: <code>tmp</code> is not supposed to overflow, so part of the condition I had in
place was simply removed. More specifically, since <code>x</code> cannot be less than
<code>0</code>, and since GCC assumes a multiplication cannot overflow into a random value
(that could be negative) because it is undefined behaviour, it then decides to
drop the "redundant" <code>i >= 0</code> condition because "it cannot happen".</p>
<p>I <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890">reported that exact issue to GCC</a> to make sure it wasn't a bug, and
it was indeed confirmed to me that the undefined behaviour of an integer
overflow is not limited in scope to whatever insane value it could take: it is
apparently perfectly acceptable to mess up the code flow entirely.</p>
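<p>For the record, one possible fix (my own sketch, not necessarily how I fixed it at the time) is to widen before multiplying so the product simply cannot overflow:</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static uint8_t tab[0x1ff + 1];

static uint8_t f_fixed(int32_t x)
{
    if (x < 0)
        return 0;
    /* Widen before multiplying: INT32_MAX * 0x1ff fits comfortably in 64-bit */
    const int64_t i = (int64_t)x * 0x1ff / 0xffff;
    if (i < (int64_t)sizeof(tab))
        return tab[(size_t)i];
    return 0;
}
</code></pre>
<p>With the multiplication done in 64-bit, <code>f_fixed(50000000)</code> computes an index far beyond the table, fails the bound check, and returns 0 instead of crashing.</p>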
<p>While I understand how attractive it can be from an optimization point of view,
the paranoid developer in me is straight up terrified by the prospect of a
single integer overflow removing security protections and causing such havoc.
I've worked several years on a project where integer overflows were (and
probably still are) legion. Identifying and fixing all of them is likely the
lifetime mission of several opinionated individuals.</p>
<p>I'm expecting this article to make the Rust crew go on a crusade again, and I
think I might be with them this time.</p>
<p><strong>Edit</strong>: it was made clear to me while reading <a href="https://predr.ag/blog/falsehoods-programmers-believe-about-undefined-behavior/">Predrag's blog</a> that
the key to my misunderstanding boils down to this: "Undefined behavior is not
the same as implementation-defined behavior". While I was indeed talking about
undefined behaviour, subconsciously I was thinking that the behaviour of an
overflow on a multiplication would be "implementation-defined behaviour". This
is not the case, it is indeed an undefined behaviour, and yes the compiler is
free to do whatever it wants to because it is compliant with the
specifications. It's my mistake of course, but in my defense, despite the
arrogant comments I read, this confusion happens a lot, I believe because it
violates the <a href="https://en.wikipedia.org/wiki/Principle_of_least_astonishment">Principle of least astonishment</a>. To
illustrate this I'll take <a href="https://undeadly.org/cgi?action=article&sid=20060330071917">this interesting old OpenBSD developer blog
post</a>, which is concerned about the result of the multiplication
rather than the invalidation of any guarantee with regard to what's going to
happen to the execution flow (before and after). This is not uncommon and in my
opinion perfectly understandable.</p>
http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html
http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html
Figuring out round, floor and ceil with integer divisionFri, 25 Nov 2022 08:28:34 -0000<p>Lately I've been transforming a float based algorithm to integers in order to
make it bit-exact. Preserving the precision as well as possible was way more
challenging than I initially thought, which forced me to go deep down the rabbit
hole. During the process I realized I had many wrong assumptions about integer
divisions, and also discovered some remarkably useful mathematical properties.</p>
<p>This story is about a journey into figuring out equivalent functions to
<code>round(a/b)</code>, <code>floor(a/b)</code> and <code>ceil(a/b)</code> with <code>a</code> and <code>b</code> integers, while
staying in the integer domain (no intermediate <code>float</code> transformation allowed).</p>
<p><strong>Note</strong>: for the sake of conciseness (and to make a bridge with the
mathematics world), <code>floor(x)</code> and <code>ceil(x)</code> will sometimes respectively be
written <code>⌊x⌋</code> and <code>⌈x⌉</code>.</p>
<h2>Clarifying the mission</h2>
<p>Better than explained with words, here is how the functions we're looking for
behave with a real as input:</p>
<p><img src="http://blog.pkh.me/img/intdiv/round-floor-ceil.png" alt="centerimg" /></p>
<p>The dots indicate on which lines the stitching applies; for example <code>round(½)</code>
is <code>1</code>, not <code>0</code>.</p>
<h2>Language specificities (important!)</h2>
<p>Here are the corresponding prototypes, in C:</p>
<pre><code class="language-c">int div_round(int a, int b); // round(a/b)
int div_floor(int a, int b); // floor(a/b)
int div_ceil(int a, int b); // ceil(a/b)
</code></pre>
<p>We're going to work in C99 (or more recent), and this is actually the first
warning I have here. If you're working with a different language, you must
absolutely look into how its integer division works. In C, the integer division
is <strong>toward zero</strong>, for <strong>both positive and negative integers</strong>, and only
defined as such <strong>starting C99</strong> (it is implementation defined before that). Be
mindful about it if your codebase is in C89 or C90.</p>
<p>This means that in C:</p>
<pre><code class="language-c">printf("%d %d %d\n", 10/30, 15/30, 20/30);
printf("%d %d %d\n", -10/30, -15/30, -20/30);
</code></pre>
<p>We get:</p>
<pre><code class="language-plaintext">0 0 0
0 0 0
</code></pre>
<p>This is typically different in Python:</p>
<pre><code class="language-python">>>> 10//30, 15//30, 20//30
(0, 0, 0)
>>> -10//30, -15//30, -20//30
(-1, -1, -1)
</code></pre>
<p>In Python 2 and 3, the integer division is toward -∞, which means it is
directly equivalent to how the <code>floor()</code> function behaves.</p>
<p>In C, the integer division is equivalent to <code>floor()</code> <strong>only for positive
numbers</strong>, otherwise it behaves the same as <code>ceil()</code>. This is the division
behavior we will assume in this article:</p>
<p><img src="http://blog.pkh.me/img/intdiv/c-div.png" alt="centerimg" /></p>
<p>And again, I can't stress that enough: make sure you understand how the integer
division of your language works.</p>
<p>Similarly, you may have noticed we picked the <code>round</code> function as defined by
POSIX, meaning rounding half away from <code>0</code>. Again, in Python a different method
was selected:</p>
<pre><code class="language-python">>>> [round(x) for x in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5)]
[0, 2, 2, 4, 4, 6, 6]
</code></pre>
<p>Python is following the round toward even choice rule. This is not what we are
implementing here (<strong>Edit</strong>: a partial implementation is provided at the end
though). There are <a href="https://en.wikipedia.org/wiki/Rounding">many ways of rounding</a>, so make sure you've
clarified what method your language picked.</p>
<h2>Ceiling and flooring</h2>
<p>The integer division is symmetrical around <code>0</code> but <code>ceil</code> and <code>floor</code> aren't,
so we need a way to get the sign in order to branch in one direction or another.
If <code>a</code> and <code>b</code> have the same sign, then <code>a/b</code> is positive, otherwise it's
negative. This is well expressed with a <code>xor</code> operator, so we will be using the
sign of <code>(a^b)</code> (where <code>^</code> is a <code>xor</code> operator). Of course we only need to
<code>xor</code> the sign bit so we could instead use <code>(a<0)^(b<0)</code> but it is a bit more
complex.</p>
<p><strong>Edit</strong>: note that <code>(a^b)</code> is not <code>> 0</code> when <code>a == b</code>. Also, as <a href="https://lobste.rs/s/eggk4l/figuring_out_round_floor_ceil_with#c_okgqlh">pointed out
on lobste.rs</a> it's likely to rely on unspecified /
implementation-defined behavior (hopefully not undefined behaviour). We could
use the safer <code>(a<0)^(b<0)</code> form which only generates an extra shift
instruction on x86.</p>
<p>Looking at the graphics, we observe the following symmetries:</p>
<ul>
<li><code>floor(x)</code>:
<ul>
<li>For positive <code>x</code>, the C division works the same</li>
<li>For negative <code>x</code>, the C division is one step too high (with the exception
of the stitching point)</li>
</ul>
</li>
<li><code>ceil(x)</code>
<ul>
<li>For negative <code>x</code>, the C division works the same</li>
<li>For positive <code>x</code>, the C division is one step too low (with the exception
of the stitching point)</li>
</ul>
</li>
</ul>
<p>We can translate these observations into code using a modulo trick (whose
purpose is to <strong>not</strong> offset the stitching point when the division is exact):</p>
<pre><code class="language-c">int div_floor(int a, int b) { return a/b - (a%b!=0 && (a^b)<0); }
int div_ceil(int a, int b) { return a/b + (a%b!=0 && (a^b)>0); }
</code></pre>
<p>One may wonder about the double division (<code>a/b</code> and <code>a%b</code>), but fortunately CPU
architectures usually offer a division instruction that computes both at once
so this is not as expensive as it would seem in the first place.</p>
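<p>A brute-force check over a modest range against the libm <code>floor()</code>/<code>ceil()</code> is an easy way to gain confidence in these; a sketch using the safer <code>(a<0) != (b<0)</code> sign test:</p>
<pre><code class="language-c">#include <assert.h>
#include <math.h>

static int div_floor(int a, int b) { return a/b - (a%b != 0 && (a < 0) != (b < 0)); }
static int div_ceil (int a, int b) { return a/b + (a%b != 0 && (a < 0) == (b < 0)); }

/* Compare against the float floor()/ceil() for a small range of operands;
 * returns 1 if everything matches */
static int check_all(void)
{
    for (int a = -100; a <= 100; a++) {
        for (int b = -10; b <= 10; b++) {
            if (b == 0)
                continue;
            if (div_floor(a, b) != (int)floor(a / (double)b))
                return 0;
            if (div_ceil(a, b) != (int)ceil(a / (double)b))
                return 0;
        }
    }
    return 1;
}
</code></pre>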
<p>Now you also have an alternative without the modulo, but it generates less
efficient code (at least here on <code>x86-64</code> with a modern CPU, according to my
benchmarks):</p>
<pre><code class="language-c">int div_floor(int a, int b) { return (a^b)<0 && a ? (1-abs(a))/abs(b)-1 : a/b; }
int div_ceil(int a, int b) { return (a^b)>0 && a ? (abs(a)-1)/abs(b)+1 : a/b; }
</code></pre>
<p><strong>Edit</strong>: note that these versions suffer from undefined behaviour in the case
of <code>abs(INT_MIN)</code>, as pointed out by <code>nortti</code> in the previous comment about <code>xor</code>.</p>
<p>I have no hard proof to provide for these right now, so this is left as an
exercise to the reader, but some tools can be found in <em>Concrete Mathematics
(2nd ed)</em> by Ronald L. Graham, Donald E. Knuth and Oren Patashnik. In
particular:</p>
<ul>
<li>the reflection properties: <code>⌊-x⌋ = -⌈x⌉</code> and <code>⌈-x⌉ = -⌊x⌋</code></li>
<li><code>⌈n/m⌉ = ⌊(n-1)/m⌋+1</code> and <code>⌊n/m⌋ = ⌈(n+1)/m⌉-1</code></li>
</ul>
<h2>Rounding</h2>
<p>The <code>round()</code> function is the most useful one when trying to approximate float
operations with integers (typically what I was looking for initially:
converting an algorithm into a bit-exact one).</p>
<p>We are going to study the positive case only at first, and try to define the
function in terms of the integer C division (just like we did for <code>floor</code> and
<code>ceil</code>). Since we are on the positive side, the division is equivalent to a
<code>floor()</code>, which simplifies a bunch of things.</p>
<p>I initially used a <code>round</code> function defined as <code>round(a,b) = (a+b/2)/b</code> and
thought to myself: "if we are improving the accuracy of the division by <code>b</code>
using a <code>b/2</code> offset, why shouldn't we also improve the accuracy of <code>b/2</code> by
doing <code>(b+1)/2</code> instead?" Very proud of my deep insight, I went on with this,
until I realized it was causing more off-by-one errors (with a bias always in
the same direction). So <strong>don't do that</strong>, it's wrong; we will instead derive
the appropriate formula.</p>
<p>Looking at the <code>round</code> function, we can observe that it's pretty much the
<code>floor()</code> function with <code>x</code> offset by <code>½</code>: <code>round(x) = floor(x+½)</code></p>
<p>So we have:</p>
<pre><code class="language-plaintext">round(a/b) = ⌊a/b + ½⌋
= ⌊(2a+b)/(2b)⌋
</code></pre>
<p>We could stop right here, but this suffers from overflow limitations if
translated into C. We are lucky though, because we're about to discover the
most mind-blowing property of integer division:</p>
<p><img src="http://blog.pkh.me/img/intdiv/nested-division.png" alt="centerimg" /></p>
<p>This again comes from <em>Concrete Mathematics (2nd ed)</em>, page 72.</p>
<p>You may not immediately realize how insane and great this is, so let me
elaborate: it basically means <code>N</code> successive truncating divisions can be merged
into one <strong>without loss of precision</strong> (and the other way around).</p>
<p>Here is a concrete example:</p>
<pre><code class="language-python">>>> n = 5647817612937
>>> d = 712
>>> n//d//d//d == n//(d*d*d)
True
</code></pre>
<p>That's great but how does that help us? Well, we can do this now:</p>
<pre><code class="language-plaintext">round(a/b) = ⌊a/b + ½⌋
= ⌊(2a+b)/(2b)⌋
= ⌊⌊(2a+b)/2⌋/b⌋ <--- applying the nested division property to split in 2 floor expressions
= ⌊⌊a+b/2⌋/b⌋
= ⌊(a+⌊b/2⌋)/b⌋
</code></pre>
<p>How cute is that, we're back to the original formula I was using: <code>round(a,b) = (a+b/2)/b</code> (because again the C division is equivalent to <code>floor()</code> for
positive values).</p>
<p>Now how about the negative version, that is when <code>a/b < 0</code>? We can make a
similar observation: for a negative <code>x</code>, <code>round(x) = ceil(x-½)</code>, so we
have:</p>
<pre><code class="language-plaintext">round(a/b) = ⌈a/b - ½⌉
= ⌈(2a-b)/(2b)⌉
= ⌈⌈(2a-b)/2⌉/b⌉
= ⌈⌈a-b/2⌉/b⌉
= ⌈(a-⌈b/2⌉)/b⌉
</code></pre>
<p>And since <code>a/b</code> is negative, the C division is equivalent to <code>ceil()</code>. So in
the end we simply have:</p>
<pre><code class="language-c">int div_round(int a, int b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }
</code></pre>
<p>This is the generic version, but of course in many cases we can (and probably
should) simplify the expression appropriately.</p>
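<p>To illustrate, a few values run through this function (note that halves round away from zero, matching <code>roundf()</code>):</p>
<pre><code class="language-c">#include <assert.h>

static int div_round(int a, int b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }

int main(void)
{
    assert(div_round(7, 2)  ==  4); /*  3.5  →  4 */
    assert(div_round(-7, 2) == -4); /* -3.5  → -4 */
    assert(div_round(5, 3)  ==  2); /*  1.67 →  2 */
    assert(div_round(4, 3)  ==  1); /*  1.33 →  1 */
    assert(div_round(-5, 3) == -2);
    return 0;
}
</code></pre>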
<p>Let's say for example we want to remap a <code>u16</code> to a <code>u8</code>:
<code>remap(x,0,0xff,0,0xffff) = x*0xff/0xffff = x/257</code>. The appropriate way to
round this division is simply <code>(x+257/2)/257</code>, or just <code>(x+128)/257</code>.</p>
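<p>A small sketch of that last simplification (the helper name is mine):</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

/* Rounded 16-bit → 8-bit remap: x*0xff/0xffff == x/257, with the +b/2
 * offset baked in; both operands are positive so no sign handling is needed */
static uint8_t u16_to_u8(uint16_t x)
{
    return (x + 128) / 257;
}

int main(void)
{
    assert(u16_to_u8(0x0000) == 0x00);
    assert(u16_to_u8(0xffff) == 0xff);
    assert(u16_to_u8(0x8000) == 0x80); /* mid-point maps to mid-point */
    return 0;
}
</code></pre>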
<p><strong>Edit</strong>: it was pointed out several times on <a href="https://news.ycombinator.com/item?id=33751236">HackerNews</a> that
this function still suffers from overflows. Still, it remains more robust than
the previous version with <code>×2</code>.</p>
<h2>Bonus: partial round-half-to-even rounding</h2>
<p>For an equivalent of <code>lrintf</code>, this function provided by <a href="https://mathstodon.xyz/@antopatriarca/109408606503586148">Antonio on
Mastodon</a> can be used:</p>
<pre><code class="language-c">static int div_lrint(int a, int b)
{
    const int d = a/b;
    const int m = a%b;
    return m < b/2 + (b&1) ? d : m > b/2 ? d + 1 : (d + 1) & ~1;
}
</code></pre>
<p><strong>Warning</strong>: this only works with positive values.</p>
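<p>To see the round-half-to-even behaviour in action (on positive values only, as warned):</p>
<pre><code class="language-c">#include <assert.h>

static int div_lrint(int a, int b)
{
    const int d = a/b;
    const int m = a%b;
    return m < b/2 + (b&1) ? d : m > b/2 ? d + 1 : (d + 1) & ~1;
}

int main(void)
{
    assert(div_lrint(5, 2) == 2); /* 2.5  → 2 (nearest even) */
    assert(div_lrint(3, 2) == 2); /* 1.5  → 2 (nearest even) */
    assert(div_lrint(7, 2) == 4); /* 3.5  → 4 (nearest even) */
    assert(div_lrint(7, 3) == 2); /* 2.33 → 2 */
    assert(div_lrint(8, 3) == 3); /* 2.67 → 3 */
    return 0;
}
</code></pre>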
<h2>Verification</h2>
<p>Since you should definitely not trust my math nor my understanding of
computers, here is some test code demonstrating the correctness of the formulas:</p>
<pre><code class="language-c">#include <stdio.h>
#include <math.h>

static int div_floor(int a, int b) { return a/b - (a%b!=0 && (a^b)<0); }
static int div_ceil(int a, int b) { return a/b + (a%b!=0 && (a^b)>0); }
static int div_round(int a, int b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }

#define N 3000

int main()
{
    for (int a = -N; a <= N; a++) {
        for (int b = -N; b <= N; b++) {
            if (!b)
                continue;
            const float f = a / (float)b;
            const int ef = (int)floorf(f);
            const int er = (int)roundf(f);
            const int ec = (int)ceilf(f);
            const int of = div_floor(a, b);
            const int or = div_round(a, b);
            const int oc = div_ceil(a, b);
            const int df = ef != of;
            const int dr = er != or;
            const int dc = ec != oc;
            if (df || dr || dc) {
                fprintf(stderr, "%d/%d=%g%s\n", a, b, f, (a ^ b) < 0 ? " (diff sign)" : "");
                if (df) fprintf(stderr, "floor: %d ≠ %d\n", of, ef);
                if (dr) fprintf(stderr, "round: %d ≠ %d\n", or, er);
                if (dc) fprintf(stderr, "ceil: %d ≠ %d\n", oc, ec);
            }
        }
    }
    return 0;
}
</code></pre>
<h2>Conclusion</h2>
<p>These trivial code snippets have proven extremely useful to me so far, and I
hope they will benefit others as well. I spent an unreasonable amount of time
on this issue, and given the number of mistakes (or at the very least non-optimal
code) I've observed in the wild, I'm most certainly not the only one confused
about all of this.</p>
http://blog.pkh.me/p/35-investigating-why-steam-started-picking-a-random-font.html
http://blog.pkh.me/p/35-investigating-why-steam-started-picking-a-random-font.html
Investigating why Steam started picking a random fontFri, 18 Nov 2022 22:17:04 -0000<p>Out of the blue my Steam started picking a random font I had in my user fonts
dir: <a href="https://github.com/excalidraw/virgil/">Virgil</a>, the <a href="https://excalidraw.com/">Excalidraw</a> font.</p>
<p><img src="http://blog.pkh.me/img/steam-font-broken.png" alt="centerimg" /></p>
<p>That triggered me all sorts of emotions, ranging from laugh to total
incredulity. I initially thought the root cause was a random derping from Valve
but the Internet seemed quiet about it, so the unreasonable idea that it might
have been my fault surfaced.</p>
<p>To understand how it came to this, I have to tell you about <a href="https://store.steampowered.com/app/221910/The_Stanley_Parable/">The Stanley
Parable</a>, an incredibly funny game I highly recommend. One of the
achievements of the game is to not play it for 5 years.</p>
<p>To get it, I disabled NTP, set my system clock to 2030, started the game,
enjoyed my achievement, and restored NTP. So far so good, the mission was a
success, I could move on with my life.</p>
<p>But not satisfied with this first victory, I soon wanted to achieve the same in
<a href="https://store.steampowered.com/app/1703340/The_Stanley_Parable_Ultra_Deluxe/">the Ultra Deluxe</a> edition. This one comes with the same
achievement, except it's 10 years instead of 5. Since 2022+10 is too hard of a
mental calculation for me, I rounded it up to 2040 and followed the same
procedure as before. Achievement unlocked, easy peasy.</p>
<p>Problem is, Steam accessed many files during that short window of time, which
caused their access times to be updated to 2040. And you know what's
special about 2040? It's <strong>after 2038</strong>.</p>
<p>Get it yet? Here is a hint: <a href="https://en.wikipedia.org/wiki/Year_2038_problem">Year 2038 problem</a>.</p>
<p>This is the kind of error I was seeing in the console: <code>"/usr/share/fonts": Value too large for defined data type</code>.</p>
<p>What kind of error could that be?</p>
<pre><code class="language-shell">% errno -s "Value too large"
EOVERFLOW 75 Value too large for defined data type
</code></pre>
<p>Nice, so we're triggering an overflow somewhere. More precisely, 32-bit
fontconfig (or code underneath it, to be exact) was going mad because of this:</p>
<pre><code class="language-shell">% stat /etc/fonts/conf.d/*|grep 2040
Access: 2040-11-22 00:00:04.110328309 +0100
Access: 2040-11-22 00:00:04.110328309 +0100
Access: 2040-11-22 00:00:04.110328309 +0100
...
</code></pre>
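<p>For scale, here is a back-of-the-envelope check (leap days ignored, which doesn't change the conclusion) showing that a 2040 timestamp simply doesn't fit in a signed 32-bit <code>time_t</code>:</p>
<pre><code class="language-c">#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int64_t secs_2040 = 70LL * 365 * 24 * 3600; /* ~70 years past 1970 */
    printf("%lld > %d: %s\n", (long long)secs_2040, INT32_MAX,
           secs_2040 > INT32_MAX ? "overflow" : "fits");
    return 0;
}
</code></pre>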
<p>In order to fix this mess I had to be a bit brutal:</p>
<pre><code class="language-shell">% sudo mount -o remount,strictatime /
% sudo mount -o remount,strictatime /home
% sudo find / -newerat 2039-12-31 -exec touch -a {} +
% sudo mount -o remount,relatime /
% sudo mount -o remount,relatime /home
</code></pre>
<p>The remounts were needed because <code>relatime</code> is the default, under which an
access time only gets updated if the stored one is in the past relative to the
current time. And I had to remount both my root and home partitions because
Steam had touched files everywhere.</p>
<p>Not gonna lie, this self-inflicted bug brought quite a few life lessons to me:</p>
<ul>
<li>The Stanley Parable meta-game has no limit to madness</li>
<li>2038 is going to be a lot of fun</li>
<li>32-bit games preservation is in a sad state</li>
</ul>
http://blog.pkh.me/p/34-exploring-intricate-execution-mysteries-by-reversing-a-crackme.html
http://blog.pkh.me/p/34-exploring-intricate-execution-mysteries-by-reversing-a-crackme.html
Exploring intricate execution mysteries by reversing a crackmeThu, 27 Oct 2022 10:04:29 -0000<p>It's been a very long time since I've done some actual reverse engineering
work. Going through a difficult period currently, I needed to take a break from
the graphics world and go back to the roots: understanding obscure or
elementary tech stuff. One may argue that it was most certainly not the best
way to deal with a burnout, but apparently that was what I needed at that
moment. Put on your black hoodie and follow me, it's gonna be fun.</p>
<h2>The beginning and the start of the end</h2>
<p>So I started solving a few crackmes from <a href="https://crackmes.one">crackmes.one</a> to get the hang of
it. Most were solved in a relatively short time window, until I came across
<a href="https://crackmes.one/crackme/615888be33c5d4329c344f66">JCWasmx86's cm001</a>. I initially thought the most interesting part was
going to be reversing the key verification algorithm, and I couldn't be more
wrong. This article will be focusing on various other aspects (while still
covering the algorithm itself).</p>
<h2>The validation function</h2>
<p>After loading the executable into <a href="https://github.com/NationalSecurityAgency/ghidra">Ghidra</a> and following the entry
point, we can identify the <code>main</code> function quickly. A few renames later we
figure out that it's a pretty straightforward function (code adjusted manually
from the decompiled view):</p>
<pre><code class="language-c">int main(void)
{
char input[64+1] = {0};
puts("Input:");
fgets(input, sizeof(input), stdin);
validate_input(input, strlen(input));
return 0;
}
</code></pre>
<p>The <code>validate_input()</code> function on the other hand is quite a different beast.
According to the crackme description we can expect some parts written in
assembly. And indeed, it's hard to make Ghidra generate a sane decompiled code
out of it. For that reason, we are going to switch to a graph view
representation.</p>
<p>I'm going to use <a href="https://cutter.re/">Cutter</a> for… aesthetic reasons. Here it is, with a
few annotations to understand what is actually happening:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/validate_input.png" alt="centerimg" /></p>
<p>To summarize, we have a 64-byte input, split into 4 lanes of data, which
are followed by a series of checks. This flow is very odd for several reasons
though:</p>
<ol>
<li>We don't see any exit here: it basically ends with a division, and all other
exits lead to <code>failed_password</code> (the function that displays the error). What
we also don't see in the graph is that after the last instruction (<code>div</code>,
<code>Oddity #1</code>), the code falls through into the <code>failed_password</code> code, just
like the other exit code paths</li>
<li>We see an explicit check only for the first and second lanes; the 2 others
are somehow used in the division, but even there, only slices of them are
used, and the rest is stored at some random global location (in the <code>.bss</code>, at
<code>0x4040b0</code> and <code>0x4040a8</code> respectively)</li>
<li>128 bits of data are stored at <code>0x4040b0</code> (<code>Oddity #0</code>): we'll see later why
this is strange</li>
</ol>
<p>The only way I could see this flow going somewhere else would be through some
sort of exception/interruption. Looking through all the instructions again, the
only one I can see causing anything like this is the last <code>div</code> instruction,
with a floating point exception. But how could that even be caught and handled?
We didn't see anything about it in the main or in the validate function.</p>
<p>At some point, something grabbed my attention:</p>
<pre><code class="language-plaintext">Relocation section '.rela.plt' at offset 0x598 contains 6 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000404018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0
000000404020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 write@GLIBC_2.2.5 + 0
000000404028 000300000007 R_X86_64_JUMP_SLO 0000000000000000 strlen@GLIBC_2.2.5 + 0
000000404030 000500000007 R_X86_64_JUMP_SLO 0000000000000000 fgets@GLIBC_2.2.5 + 0
000000404038 000600000007 R_X86_64_JUMP_SLO 0000000000000000 signal@GLIBC_2.2.5 + 0
000000404040 000800000007 R_X86_64_JUMP_SLO 0000000000000000 exit@GLIBC_2.2.5 + 0
</code></pre>
<p>There is a <code>signal</code> symbol in the relocation section, so there must be code
somewhere calling this function, and it must certainly happen before the
<code>main</code>. Tracing back the function usage from Ghidra lands us here (again, code
reworked from its decompiled form):</p>
<pre><code class="language-c">void _INIT_1(void)
{
    signal(SIGFPE, handle_fpe);
    return;
}
</code></pre>
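<p>As a hedged, minimal reproduction of the same trick (assuming GCC or Clang on Linux; the crackme triggers the signal through an overflow rather than a plain division by zero):</p>
<pre><code class="language-c">#include <signal.h>
#include <unistd.h>

static void handle_fpe(int sig)
{
    (void)sig;
    write(STDOUT_FILENO, "caught SIGFPE\n", 14); /* async-signal-safe */
    _exit(0);
}

__attribute__((constructor))
static void init_sig(void)
{
    signal(SIGFPE, handle_fpe); /* installed before main(), like _INIT_1 */
}

int main(void)
{
    volatile int zero = 0; /* volatile prevents constant folding */
    return 1 / zero;       /* integer division by zero raises SIGFPE on x86 */
}
</code></pre>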
<p>But how does this function end up being called?</p>
<h2>Program entry point</h2>
<p>At this point I needed to dive quite extensively into the Linux program startup
procedure in order to understand what the hell was going on. I didn't need to
understand it all during the reverse, but I came back to it later on to clarify
the situation. I'll try to explain the best I can how it essentially works
because it's probably the most useful piece of information I got out of this
experience. Brace yourselves.</p>
<h3>Modern (glibc ≥ 2.34, 2021)</h3>
<p>On a Linux system with a modern glibc, if we try to compile <code>int main(){return 0;}</code> into an ELF binary (<code>cc test.c -o test</code>), the file <code>crt1.o</code> (for <em>Core
Runtime Object</em>) or one of its variant such as <code>Scrt1.o</code> (<code>S</code> for "shared") is
linked into the final executable by the toolchain linker. These object files
are distributed by our libc package, glibc being the most common one.</p>
<p>They contain the real entry point of the program, identified by the label
<code>_start</code>. Their bootstrap code is actually fairly short:</p>
<pre><code class="language-shell">% objdump -d -Mintel /usr/lib/Scrt1.o
/usr/lib/Scrt1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: f3 0f 1e fa endbr64
4: 31 ed xor ebp,ebp
6: 49 89 d1 mov r9,rdx
9: 5e pop rsi
a: 48 89 e2 mov rdx,rsp
d: 48 83 e4 f0 and rsp,0xfffffffffffffff0
11: 50 push rax
12: 54 push rsp
13: 45 31 c0 xor r8d,r8d
16: 31 c9 xor ecx,ecx
18: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 1f <_start+0x1f>
1f: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 25 <_start+0x25>
25: f4 hlt
</code></pre>
<p>If we look closely at the assembly above, we notice it's a skeleton with a few
placeholders. More specifically the <code>call</code> argument and the <code>rdi</code> register just
before. These are respectively going to be replaced at link time with a call to
the <code>__libc_start_main()</code> function, and a pointer to the <code>main</code> function. Using
<code>objdump -r</code> clarifies these relocation entries:</p>
<pre><code class="language-plaintext"> 18: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 1f <_start+0x1f>
1b: R_X86_64_REX_GOTPCRELX main-0x4
1f: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 25 <_start+0x25>
21: R_X86_64_GOTPCRELX __libc_start_main-0x4
</code></pre>
<p>Note that <code>__libc_start_main()</code> is an external function: it is located inside
the glibc itself (typically <code>/usr/lib/libc.so.6</code>).</p>
<p>Said in simpler terms, what this code essentially does is jump straight into
the libc by calling <code>__libc_start_main(main, <a few other args>)</code>. That
function will be responsible for calling <code>main</code> itself, using the transmitted
pointer.</p>
<p>Why not call the <code>main</code> directly? Well, there might be some stuff to initialize
before the <code>main</code>: either in externally linked libraries, or simply through
constructors.</p>
<p>Here is an example of a C code with such a construct:</p>
<pre><code class="language-c">#include <stdio.h>

__attribute__((constructor))
static void ctor(void)
{
    printf("ctor\n");
}

int main()
{
    printf("main\n");
    return 0;
}
</code></pre>
<pre><code class="language-shell">% cc test.c -o test && ./test
ctor
main
</code></pre>
<p>In this case, a pointer to <code>ctor</code> is stored in a table in one of the ELF
sections: <code>.init_array</code>. At some point in <code>__libc_start_main()</code>, all the
functions of that array are called one by one.</p>
<p>With this executable loaded into Ghidra, we can observe this table at that
particular section:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/init_array_ctor_example.png" alt="centerimg" /></p>
<p>So basically a table of 2 function pointers, the latter being our custom <code>ctor</code>
function.</p>
<p>The way that code is able to access the ELF header is for another story.
Similarly, even though related, I'm going to skip details about the dynamic
linker. I'll just point out that the program has an <code>.interp</code> section with a
string such as <code>"/lib64/ld-linux-x86-64.so.2"</code> identifying the dynamic linker
to use (which is also an ELF program, see <code>man ld.so</code> for more information).
This program is actually executed before our <code>main</code> as well since it is
responsible for loading the dynamic libraries.</p>
<h3>Legacy (glibc < 2.34)</h3>
<p>So far we've seen how a modern program is built and started, but it wasn't
always exactly like this. It actually changed "recently", with glibc 2.34
(2021). We have to study how it was before, because the crackme we're
interested in was compiled under these older conditions: the patterns we get
don't match the modern construct we just observed.</p>
<p>If we look at how the <code>Scrt1.o</code> of glibc was before 2.34, we get the following:</p>
<pre><code class="language-plaintext">0000000000000000 <_start>:
0: 31 ed xor ebp,ebp
2: 49 89 d1 mov r9,rdx
5: 5e pop rsi
6: 48 89 e2 mov rdx,rsp
9: 48 83 e4 f0 and rsp,0xfffffffffffffff0
d: 50 push rax
e: 54 push rsp
f: 4c 8b 05 00 00 00 00 mov r8,QWORD PTR [rip+0x0] # 16 <_start+0x16>
16: 48 8b 0d 00 00 00 00 mov rcx,QWORD PTR [rip+0x0] # 1d <_start+0x1d>
1d: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 24 <_start+0x24>
24: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 2a <_start+0x2a>
2a: f4 hlt
</code></pre>
<p>It's pretty similar to what we've seen before but we can see more relocation
entries (see <code>r8</code> and <code>rcx</code> registers). A grasp on the x86-64 calling
convention is going to be helpful here: a function is expected to read its
arguments in the following register order: <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code>,
<code>r9</code> (assuming no floats). In the dump above we can actually see all these
registers being loaded before the <code>call</code> instruction, so they're very likely
preparing the arguments for that <code>__libc_start_main</code> call.</p>
<p>At this point, we need to know more about <code>__libc_start_main</code> actual prototype.
Looking on the web for it, we may land on such a page:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/libc_start_main_web.png" alt="centerimg" /></p>
<p><strong>This is extremely outdated</strong>. It is actually a prototype from a long time
ago, when the <code>init</code> function passed as argument didn't receive any parameters.
The prototype for <code>__libc_start_main</code> in glibc now looks like this (extracted,
tweaked and commented for clarity from <code>glibc/csu/libc-start.c</code>):</p>
<pre><code class="language-c">int __libc_start_main(
    int (*main)(int, char **, char ** MAIN_AUXVEC_DECL), /* RDI */
    int argc,                                            /* RSI */
    char **argv,                                         /* RDX */
    __typeof (main) init,                                /* RCX */
    void (*fini)(void),                                  /* R8  */
    void (*rtld_fini)(void),                             /* R9  */
    void *stack_end                                      /* RSP (stack pointer) */
)
</code></pre>
<p>The <code>init</code> parameter now matches the prototype of the <code>main</code>. For those
interested in archaeology, this is true <a href="https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2b089f2101801ca2a3295fcd755261288ce6268e">since 2003</a>, which I
believe is around the Palaeolithic period.</p>
<p>Going back to our <code>__libc_start_main()</code> call at the entry point: there are now 2
extra arguments compared to the modern version: <code>rcx</code> (the <code>init</code> argument) and
<code>r8</code> (the <code>fini</code> argument). These point to two functions called
<code>__libc_csu_init</code> and <code>__libc_csu_fini</code> respectively. In Ghidra, if the binary
is not stripped, we observe the following:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/libc_csu_fini_init.png" alt="centerimg" /></p>
<p>Now here is the trick: where do you think these functions are located? One might
expect to find them in the glibc, just like <code>__libc_start_main</code>, but that's not
the case: they are actually embedded within our ELF binary. The reason for this
is still unclear to me.</p>
<p>The mechanism injecting that code inside the binary was also a mystery to
me: while the canonical <code>crt1.o</code> mechanism has been followed by build toolchains
since forever, that object doesn't contain <code>__libc_csu_init</code> and
<code>__libc_csu_fini</code>. So where the hell do they come from? Well, here is the
magic trick (thank you <code>strace</code>):</p>
<pre><code class="language-shell">% file /lib/libc.so
/lib/libc.so: ASCII text
% cat /lib/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /usr/lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED ( /usr/lib/ld-linux-x86-64.so.2 ) )
</code></pre>
<p>That's right: just as deceptively as <code>ld.so</code> being a program, <code>libc.so</code> is a linker
script. We see it instructing the linker to use <code>libc_nonshared.a</code>, which is
another file distributed by the glibc, containing a bunch of functions, notably
<code>__libc_csu_init</code> and <code>__libc_csu_fini</code>. This means that thanks to this script,
this static non-shared archive, containing yet another batch of weird init
routines, is dumped into every dynamically linked ELF executable. I'm still
having a hard time processing this.</p>
<p>Note that <code>libc_nonshared.a</code> still exists in the modern setup (as of 2.36 at
least), but it's much smaller and doesn't have those functions anymore.</p>
<p>So what are these functions doing? Well, they're responsible for calling the
pre- and post-main functions, just like <code>__libc_start_main</code> does in its
modern setup. Here is what they looked like before being removed in glibc
2.34 (extracted and simplified from <code>glibc/csu/elf-init.c</code> in 2.33):</p>
<pre><code class="language-c">void __libc_csu_init (int argc, char **argv, char **envp)
{
    _init ();

    const size_t size = __init_array_end - __init_array_start;
    for (size_t i = 0; i < size; i++)
        (*__init_array_start [i]) (argc, argv, envp);
}

void __libc_csu_fini (void)
{
    _fini ();
}
</code></pre>
<p><strong>Note</strong>: CSU likely stands for "C Start Up" or "Canonical Start Up".</p>
<p>The <a href="https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=035c012e32c11e84d64905efaf55e74f704d3668">commit removing these functions</a> is actually pretty damn
interesting and we can learn a lot from it:</p>
<ol>
<li>it has security implications: the ROP gadgets referred to are basically
snippets of instructions that are useful for exploitation, having them in
the binary is a liability</li>
<li><code>__libc_start_main()</code> kept its prototype for backward compatibility, so
<code>init</code> and <code>fini</code> arguments are still there, just passed as <code>NULL</code> (look at
the 2 <code>xor</code> instructions in the modern <code>Scrt1.o</code> shared earlier)</li>
<li>the forward compatibility on the other hand is not possible: we can run an
old executable on a modern system, but we cannot run a modern executable on
an old system</li>
</ol>
<p>With all that new knowledge we are now armed to decipher the startup mechanism
of our crackme.</p>
<h2>Within Ghidra</h2>
<p>After analysis, the entry point of our crackme looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/ghidra_entry.png" alt="centerimg" /></p>
<p>We recognize the <code>_start</code> pattern of our <code>crt1.o</code>. More specifically, we can
see that it's loading 2 pointers into <code>rcx</code> and <code>r8</code>, so we know we're in the
legacy pattern:</p>
<ul>
<li><code>r8</code>: <code>FUN_00401730</code> is <code>__libc_csu_fini</code></li>
<li><code>rcx</code>: <code>FUN_004016c0</code> is <code>__libc_csu_init</code></li>
<li><code>rdi</code>: <code>LAB_004010b0</code> is <code>main</code></li>
</ul>
<p>If we want to find the custom inits, we have to follow <code>__libc_csu_init</code>, where
we can see it matching the snippet shared earlier, except <code>__init_array_start</code>
is named <code>__DT_INIT_ARRAY</code> but still located at the <code>.init_array</code> ELF section.
And in that table, we find again our init callbacks:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/init_array.png" alt="centerimg" /></p>
<p><code>_INIT_0</code> corresponds to <code>frame_dummy</code>, and <code>_INIT_1</code> is the first user
constructor. So just like <code>ctor</code> in sample C code, we are interested in what's
happening in <code>_INIT_1</code>, which is the function shown earlier calling <code>signal</code>.</p>
<p>Of course, someone familiar with this pattern will go straight to the
<code>.init_array</code> section, but with crackmes you never know if they're actually
going to follow the expected path, so it's good to be familiar with the
complete execution path.</p>
<h2>Going deeper, uncovering Ghidra bugs</h2>
<p>We could stop our research on the init procedure here but I have to make a
detour to talk about some unfortunate things in x86-64 and Ghidra (as of
10.1.5).</p>
<p>If we look at the decompiler view of the entry point, we see a weird prototype:</p>
<pre><code class="language-c">void entry(undefined8 param_1, undefined8 param_2, undefined8 param_3)
{
    /* ... */
}
</code></pre>
<p>The thing is, when a program entry point is called, it's not supposed to have 3
arguments like that. According to glibc <code>sysdeps/x86_64/start.S</code> (which is the
source of <code>crt1.o</code>), here are the actual inputs for <code>_start</code>:</p>
<pre><code class="language-plaintext">This is the canonical entry point, usually the first thing in the text
segment. The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
point runs, most registers' values are unspecified, except for:
%rdx Contains a function pointer to be registered with `atexit'.
This is how the dynamic linker arranges to have DT_FINI
functions called for shared libraries that have been loaded
before this code runs.
%rsp The stack contains the arguments and environment:
0(%rsp) argc
LP_SIZE(%rsp) argv[0]
...
(LP_SIZE*argc)(%rsp) NULL
(LP_SIZE*(argc+1))(%rsp) envp[0]
...
</code></pre>
<p>Basically only the <code>rdx</code> register is expected to be set (along with the stack
and its register) which the program entry function usually forwards down to
<code>__libc_start_main</code> (as <code>rtld_fini</code> argument) which itself passes it down to
<code>atexit</code>. You will find similar information in the kernel in its ELF loader
code.</p>
<p>Do you remember the x86-64 calling convention from earlier? Function
arguments are passed in the following register order: <code>rdi</code>, <code>rsi</code>, <code>rdx</code>,
<code>rcx</code>, <code>r8</code>, <code>r9</code>. But as we just saw, the entry point code of the program is
expected to read only <code>rdx</code> (equivalent to the 3rd argument in the calling
convention), while the contents of <code>rdi</code> and <code>rsi</code> are undefined. Since the
program entry point usually respects that (reading <code>rdx</code> to get <code>rtld_fini</code>),
Ghidra infers that the 1st and 2nd arguments must also exist, and gets confused
when <code>rdi</code> and <code>rsi</code> are actually overwritten to set up the call to
<code>__libc_start_main</code> instead.</p>
<p>Now one may ask: why even use <code>rdx</code> in the first place if it conflicts with the
calling convention? Well, on 32-bit it uses <code>edx</code>, which makes a little more
sense since it doesn't overlap with the calling convention: all function
arguments are expected to be on the stack on 32-bit. And during the
move to 64-bit, they unfortunately just extended <code>edx</code> into <code>rdx</code>.</p>
<p>While not immediately problematic, I still don't know why the kernel decided to
use <code>edx</code> on 32-bit instead of the stack; apparently this is described in
"SVR4/i386 ABI (pages 3-31, 3-32)" but I couldn't find much information about
it.</p>
<p>Anyway, all of this to say that until the NSA fixes <a href="https://github.com/NationalSecurityAgency/ghidra/issues/4667">the bug</a>, I'd
recommend overriding the <code>_start</code> prototype: <code>void entry(undefined8 param_1,undefined8 param_2,undefined8 param_3)</code> should be <code>void _start(void)</code>,
and you should expect the code to read the <code>rdx</code> register.</p>
<h2>Remaining bits of the algorithm</h2>
<p>Alright, so we're back to our previous flow. Assuming the division raised a
floating point error, we're following the callback forwarded to <code>signal()</code>, and
we end up at another location, which after various renames and retyping in
Ghidra decompiler looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/handle_fpe.png" alt="centerimg" /></p>
<p>I'll spare you the details since it's an overly complex implementation of a
very simple routine:</p>
<ol>
<li>read back the 2 register halves stored earlier (remember, half of <code>lane2</code> and
half of <code>lane3</code> were stored for later use; this is where we read them back)</li>
<li>check that those are different</li>
<li>for each half, sum the elements of the data by slicing it into nibbles
(4 bits), with each nibble value permuted through a simple table</li>
<li>check that the checksums are the same</li>
</ol>
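<p>A sketch of steps 3 and 4 in C may help; note that the permutation table below is made up for illustration, the crackme ships its own:</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

/* Hypothetical permutation table (the real one is hardcoded in the binary) */
static const uint8_t perm[16] = { 5, 0, 9, 14, 3, 7, 12, 1, 10, 15, 4, 8, 13, 2, 11, 6 };

/* Sum a 64-bit half nibble by nibble, permuting each nibble through the table */
static unsigned nibble_checksum(uint64_t half)
{
    unsigned sum = 0;
    for (int i = 0; i < 16; i++) /* 64 bits = 16 nibbles */
        sum += perm[(half >> (4 * i)) & 0xf];
    return sum;
}

int main(void)
{
    /* two different halves with the same permuted nibble sum pass the check */
    assert(0x12 != 0x21 && nibble_checksum(0x12) == nibble_checksum(0x21));
    return 0;
}
</code></pre>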
<p>And that's pretty much it.</p>
<p>Now we roughly know how the 64 bytes of input are read and checked. There is
one thing we need to study more though: the <code>div</code> instruction.</p>
<h2>Oddity #1: the division</h2>
<p>We need to understand how the <code>div</code> instruction works since it's the trigger to
our success path. Here is what the relevant Intel documentation says about it:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/div.png" alt="centerimg" /></p>
<p>In English this means that if we have <code>div rbx</code>, then the registers <code>rdx</code> and
<code>rax</code> are combined together to form a single 128-bit value, which is then
divided by <code>rbx</code>.</p>
<p>As a reminder, the chunk doing the division looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/div_asm.png" alt="centerimg" /></p>
<ul>
<li>Our divisor is <code>rbx</code>, a large hardcoded number: <code>0xffff231203</code> (meaning the
exception cannot be a division by zero, but could be an overflow)</li>
<li><code>rax</code> contains the lower part of the <code>xmm3</code> register (the 4th lane) xored
with the higher part of the <code>xmm2</code> register (the 3rd lane)</li>
<li><code>rdx</code> contains… wait, what does it contain? We don't know.</li>
</ul>
<p>Looking through the code, the <code>rdx</code> value looks pretty much undefined. If it's
big enough, the result of the division will luckily not fit in a 64-bit register
and will overflow, causing the floating point exception. Under "normal"
conditions this seems to happen, but when run through, say, <code>valgrind</code>, <code>rdx</code>
will be initialized to something else and the overflow won't be triggered.</p>
<p>This is actually a bug, an undefined behaviour in the crackme. That's too bad
because the original idea was pretty good. But it also means we won't have to
think much about whatever data we put into that part of the input.</p>
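<p>To make the trigger concrete, here is a small Python model of the fault
condition (an illustration of the semantics described above, not code from the
crackme): <code>div</code> faults when the divisor is zero or when the 128-bit quotient
doesn't fit in a 64-bit register.</p>
<pre><code class="language-python">DIVISOR = 0xFFFF231203  # the hardcoded rbx value

def div_faults(rdx: int, rax: int, rbx: int = DIVISOR) -> bool:
    # rdx:rax form a single 128-bit dividend; the division raises the
    # exception on a zero divisor or when the quotient overflows 64 bits
    dividend = (rdx << 64) | rax
    return rbx == 0 or dividend // rbx >= 1 << 64
</code></pre>
<p>With <code>rdx</code> left uninitialized but large, the quotient overflows and the
exception fires; with <code>rdx</code> zeroed, it never does.</p>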
<h2>Oddity #0</h2>
<p>One last oddity before we're ready to write a keygen: the <code>Oddity #0</code> is a
write of a 128-bit register at an address where only 64 bits are available,
located at the end of the <code>.bss</code> section. For some reason the code still works
so I'm assuming we are lucky thanks to some padding in the memory map…</p>
<p>The issue can actually easily be noticed because it drives the decompiler nuts
in that area:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/invalid_write.png" alt="centerimg" /></p>
<p>If you patch the instruction from <code>xmmword ptr [0x004040b0],XMM1</code> to <code>xmmword ptr [0x004040a8],XMM1</code>, you'll observe everything going back to normal in the
decompiler view.</p>
<p>I later became aware of <a href="https://github.com/JCWasmx86/Crackme/">the source code of the crackme on GitHub</a>,
so I could see why the mistake happened in the first place. I <a href="https://github.com/JCWasmx86/Crackme/issues/2">reported the
issue</a> if you want more information on that topic.</p>
<h2>Writing the keygen</h2>
<p>Onto the final step: writing a keygen.</p>
<p>To summarize all the conditions that need to be met:</p>
<ol>
<li>the input must be 64 bytes long</li>
<li>xor-reducing all the characters of the 1st lane (after encoding with
the xor key) must give 0</li>
<li>the sum of all the characters of the 2nd lane must be equal to: <code>(lane0[11] ^ xor_key[11]) × 136 + 314</code></li>
<li>the first half of the 3rd lane and the 2nd half of the 4th lane must be
different</li>
<li>the sum of the permuted nibbles of the first half of the 3rd lane and the
2nd half of the 4th lane must be equal</li>
<li>the 2nd half of the 3rd lane and the 1st half of the 1st lane don't really
matter</li>
</ol>
<p>I don't think solving this part is the most interesting, particularly for a
reader, but I described the strategy I followed in the keygen code, so I'll
just share it as is:</p>
<pre><code class="language-python"># Range of allowed characters in the input; we'll use the xor key as part of
# the password so we're kind of constrained to its range
xor_key = bytes.fromhex("64 47 34 36 72 73 6b 6a 38 2d 34 35 37 28 7e 3a")
ord_min, ord_max = min(xor_key), max(xor_key)


def xor0(data: str) -> int:
    """Encode the data using the xor key"""
    assert len(data) == len(xor_key) == 16
    r = 0
    for c, x in zip(data, xor_key):
        r ^= ord(c) ^ x
    return r


def get_lane0(k11: str) -> str:
    """
    Compute lane0 of the input

    We have the following constraints on lane0:
    - the character at position 11 must be k11
    - xoring all characters must give 0
    - input characters must be within accepted range (self-imposed)

    Strategy:
    - start with the xor key itself because the xor reduce will give our
      perfect zero score
    - replace the 11th char with our k11 and figure out which bits get off
      because of it
    - go through each character to see if we can flip the off bits
    """
    lane0 = "".join(map(chr, xor_key))
    lane0 = lane0[:11] + k11 + lane0[12:]
    off = xor0(lane0)
    off_bits = [(1 << i) for i in range(8) if off & (1 << i)]
    fixed_lane0 = lane0
    for i, c in enumerate(lane0):
        if i == 11:
            continue
        remains = []
        for bit in list(off_bits):
            o = ord(c) ^ bit
            if ord_min <= o <= ord_max:
                c = chr(o)
            else:
                remains.append(bit)
        fixed_lane0 = fixed_lane0[:i] + c + fixed_lane0[i + 1:]
        off_bits = remains
        if not off_bits:
            break
    assert not off_bits
    assert xor0(fixed_lane0) == 0
    return fixed_lane0


def get_lane1(t: int) -> str:
    # First estimate by taking the average
    avg_ord = t // 16
    assert ord_min <= avg_ord <= ord_max
    lane1 = [avg_ord] * 16
    # Adjust with off-by-ones to reach the target if necessary
    off = sum(lane1) - t
    if off:
        sgn = [-1, 1][off < 0]
        for i in range(abs(off)):
            lane1[i] += sgn
    assert sum(lane1) == t
    return "".join(map(chr, lane1))


def get_divdata():
    # The div data doesn't really matter, so we just use some slashes to carry
    # the division meaning
    d0 = d1 = "/" * 8
    return d0, d1


def chksum4(data: str) -> int:
    """nibble (4-bit) checksum"""
    permutes4 = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
    return sum(permutes4[ord(c) >> 4] << 4 | permutes4[ord(c) & 0xF] for c in data)


def get_chksums4():
    # We need the values to be different but the checksums to be the same, so
    # we simply interleave 2 working characters differently
    c0 = (chr(ord_min) + chr(ord_max)) * 4
    c1 = (chr(ord_max) + chr(ord_min)) * 4
    assert c0 != c1
    assert chksum4(c0) == chksum4(c1)
    return c0, c1


def get_passwords():
    # The user input key is composed of 4x16B, which will be referred to as 4
    # lanes: lane[0..3]. The character at lane0[11] defines what is going to
    # be the target T that S=sum(lane1) will need to reach. Here we compute
    # all potential T values that can be obtained within our range of
    # characters.
    x11 = xor_key[11]
    allowed_ords = range(ord_min, ord_max + 1)
    all_t = {136 * (o ^ x11) + 314: o for o in allowed_ords}

    # Compute the extreme sums our input lane1 can reach and filter out T
    # values that land outside these boundaries
    min_t, max_t = ord_min * 16, ord_max * 16
    possible_t = {t: chr(k11) for t, k11 in all_t.items() if min_t <= t <= max_t}

    for t, k11 in possible_t.items():
        lane0 = get_lane0(k11)
        lane1 = get_lane1(t)
        d0, d1 = get_divdata()
        c0, c1 = get_chksums4()
        lane2 = c0 + d0
        lane3 = d1 + c1
        password = lane0 + lane1 + lane2 + lane3
        assert len(password) == 16 * 4
        yield password


for password in get_passwords():
    print(password)
</code></pre>
<p>It executes instantly and gives the following keys (the list is not
exhaustive):</p>
<pre><code class="language-shell">% python 615888be33c5d4329c344f66_cm001.py
aG46rskj8-407(~:??>>>>>>>>>>>>>>(~(~(~(~////////////////~(~(~(~(
`G46rskj8-417(~:6666666666555555(~(~(~(~////////////////~(~(~(~(
cG46rskj8-427(~:PPOOOOOOOOOOOOOO(~(~(~(~////////////////~(~(~(~(
bG46rskj8-437(~:GGGGGGGGGGFFFFFF(~(~(~(~////////////////~(~(~(~(
gG46rskj8-467(~:..--------------(~(~(~(~////////////////~(~(~(~(
hG46rskj8-497(~:zzzzzzzzzzyyyyyy(~(~(~(~////////////////~(~(~(~(
mG46rskj8-4<7(~:aa``````````````(~(~(~(~////////////////~(~(~(~(
lG46rskj8-4=7(~:XXXXXXXXXXWWWWWW(~(~(~(~////////////////~(~(~(~(
oG46rskj8-4>7(~:rrqqqqqqqqqqqqqq(~(~(~(~////////////////~(~(~(~(
nG46rskj8-4?7(~:iiiiiiiiiihhhhhh(~(~(~(~////////////////~(~(~(~(
</code></pre>
<p>All of these seem to be working keys. We can visually see how each segment
corresponds to a specific part of the algorithm. The keys are ugly, but at
least they're printable.</p>
<p>The trickiest part for me was anticipating the range of guaranteed keys, due
to the dependency between <code>lane0</code> and <code>lane1</code>; the rest was relatively simple.</p>
<h2>Conclusion</h2>
<p>To be honest, I didn't expect such a ride. There were just so many incentives
to dig down the rabbit hole of various intricacies. The bugs in the crackme
caused me a lot of confusion, but they're nowhere near the obfuscation level of
the glibc and its messy history of deceptive patterns.</p>
http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html
http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html
Deconstructing Bézier curvesTue, 16 Aug 2022 06:29:19 -0000<p>Graphists, animators, game programmers, font designers, and other graphics
professionals and enthusiasts are often working with Bézier curves. They're
popular, extensively documented, and used pretty much everywhere. That being
said, I find they are almost exclusively explained in 2 or 3 dimensions,
which can be a source of confusion in various situations. I'll try to
deconstruct them a bit further in this article. At the end of the post, we'll
conclude with a concrete example where this deconstruction is helpful.</p>
<h2>A Bézier curve in pop culture</h2>
<p>Most people are first confronted with Bézier curves through a UI that may look
like this:</p>
<p><img src="http://blog.pkh.me/img/bezier/b3-labels.png" alt="centerimg" /></p>
<p>In this case the curve is composed of 4 user controllable points, meaning it's
a Cubic Bézier.</p>
<p><code>C₀</code>, <code>C₁</code>, <code>C₂</code> and <code>C₃</code> are respectively the start point, the two control
points, and the end point, as 2D coordinates. Evaluating this formula for all
the <code>t</code> values within <code>[0;1]</code> will give all the points of the curve. Simple
enough.</p>
<p>Now this is obvious but the important takeaway here is that this formula
applies <strong>to each dimension</strong>. Since we are working in 2D here, it is evaluated
on both the x and y-axis. As a result, a more explicit writing of the formula
would be:</p>
<p><img src="http://blog.pkh.me/img/bezier/bezier-0.png" alt="centerimg" /></p>
<p><strong>Note</strong>: if we were working with Bézier in 3D space, the <code>C</code> vectors would be
in 3D as well.</p>
<p>Intuitively, you may start to see in the mathematical form how each point
contributes to the curve, but it involves some tricky mental gymnastics (at
least for me). So before diving into the multidimensional aspect, we will
simplify the problem by looking into lower degrees.</p>
<h2>Lower degrees</h2>
<p>As implied by its name, the <strong>Cubic</strong> curve <code>B₃(t)</code> is of the 3rd degree. The
2nd most popular curve is the <strong>Quadratic</strong> curve <code>B₂(t)</code> where instead of 2
control points, we only have one (<code>Q₁</code>, in the middle):</p>
<p><img src="http://blog.pkh.me/img/bezier/b2-labels.png" alt="centerimg" /></p>
<p>Can we go lower? Well, there is a "1st degree Bézier curve" but you won't hear
that term very often, because after removing the remaining control point:</p>
<p><img src="http://blog.pkh.me/img/bezier/b1-labels.png" alt="centerimg" /></p>
<p>The "curve" is now a simple line between the 2 points. Still, the concept of
interpolation between the points is consistent/symmetric with the cubic and the
quadratic.</p>
<p>Do you recognize the formula (see title of the figure)? Yes, this is <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">mix(),
one of the most useful math formulas</a>!
The contribution of each factor should make sense this time: <code>t</code> varies within
<code>[0;1]</code>, at <code>t=0</code> we have 100% of <code>L₀</code> (the starting point), at <code>t=1</code> we have
100% of <code>L₁</code>, in the middle at <code>t=½</code> we have 50% of each, etc. All intermediate
values of <code>t</code> define a straight line between these 2 points. We have a simple
linear interpolation.</p>
<p>The presence of this function in the 1st degree is not just a coincidence:
<strong>the <code>mix</code> function is actually the cornerstone of all the Bézier curves</strong>.
Indeed, we can build up the Bézier formulas using exclusively nested <code>mix()</code>:</p>
<ul>
<li><code>B₁(l₀,l₁,t) = mix(l₀,l₁,t)</code></li>
<li><code>B₂(q₀,q₁,q₂,t) = B₁(mix(q₀,q₁,t), mix(q₁,q₂,t), t)</code></li>
<li><code>B₃(c₀,c₁,c₂,c₃,t) = B₂(mix(c₀,c₁,t), mix(c₁,c₂,t), mix(c₂,c₃,t), t)</code></li>
</ul>
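<p>The nested construction can be written out directly in Python (a short
scalar sketch; the same helpers reappear in the demo script at the end of the
post):</p>
<pre><code class="language-python">def mix(a, b, t):
    return (1 - t) * a + t * b

def bezier1(l0, l1, t):
    return mix(l0, l1, t)

def bezier2(q0, q1, q2, t):
    # One level of nesting: interpolate between two first-degree curves
    return bezier1(mix(q0, q1, t), mix(q1, q2, t), t)

def bezier3(c0, c1, c2, c3, t):
    # Two levels of nesting: interpolate between two quadratic curves
    return bezier2(mix(c0, c1, t), mix(c1, c2, t), mix(c2, c3, t), t)
</code></pre>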
<p>This way of formulating the curves is basically <a href="https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm">De Casteljau's
algorithm</a>. You have no idea how much I love accidentally finding
yet again a relationship with my favourite mathematical function.</p>
<p>But back to our "Bézier 1st degree", remember that we are still in 2D:</p>
<p><img src="http://blog.pkh.me/img/bezier/bezier-1.png" alt="centerimg" /></p>
<p>This multi-dimensional graphic representation can be problematic because it is
<strong>exclusively spatial</strong>: if one is interested in the <code>t</code> parameter, it has to
be extrapolated visually from a twisted curve using mind bending powers, which
is not always practical.</p>
<h2>Mono-dimensional</h2>
<p>In order to represent <code>t</code>, we have to split each spatial dimension and draw
them according to <code>t</code> (defined within <code>[0;1]</code>).</p>
<p>Let's work this out with the following cubic curve (start point is
bottom-left):</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-2d.png" alt="centerimg" /></p>
<p>If we study this curve, we can see that the <code>x</code> is slightly decreasing, then
increasing for most of the curve, then slightly decreasing again. In
comparison, the <code>y</code> seems to be increasing, decreasing, then increasing again,
probably more strongly than with <code>x</code>. But can you tell for sure what their
respective curves actually look like precisely? I for sure can't, but my
computer can:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d.png" alt="centerimg" /></p>
<p>Just to be extra clear: the formula is unchanged, we're simply tracing the x
and y dimensions separately according to <code>t</code> instead of plotting the curve in a
xy plane. Note that this means <strong><code>C₀</code>, <code>C₁</code>, <code>C₂</code> and <code>C₃</code> can now only change
vertically</strong>: they are respectively placed at <code>t=0</code>, <code>t=⅓</code>, <code>t=⅔</code> and <code>t=1</code>.
The vertical axis corresponds to their value on their respective plane.</p>
<p>Similarly, with a quadratic we would have <code>Q₀</code> at <code>t=0</code>, <code>Q₁</code> at <code>t=½</code> and <code>Q₂</code>
at <code>t=1</code>.</p>
<p>So what's so great about this representation? Well, first of all the curves are
not going backward anymore; they can be understood with the left-to-right
reading everyone is familiar with: there are no shenanigans involved in the
interpretation anymore. Also, we are now going to be able to work them out in
algebraic form.</p>
<h2>Polynomial form</h2>
<p>So far we've looked at the curve under their Bézier form, but they can also be
expressed in their polynomial form:</p>
<pre><code class="language-plaintext">B₁(t) = (1-t)·L₀ + t·L₁
= (-L₀+L₁)·t + L₀
= a₁t + b₁
</code></pre>
<pre><code class="language-plaintext">B₂(t) = (1-t)²·Q₀ + 2(1-t)t·Q₁ + t²·Q₂
= (Q₀-2Q₁+Q₂)·t² + (-2Q₀+2Q₁)·t + Q₀
= a₂t² + b₂t + c₂
</code></pre>
<pre><code class="language-plaintext">B₃(t) = (1-t)³·C₀ + 3(1-t)²t·C₁ + 3(1-t)t²·C₂ + t³·C₃
= (-C₀+3C₁-3C₂+C₃)·t³ + (3C₀-6C₁+3C₂)·t² + (-3C₀+3C₁)·t + C₀
= a₃t³ + b₃t² + c₃t + d₃
</code></pre>
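<p>As a quick sanity check, we can expand a cubic into these coefficients in
Python and verify that evaluating the polynomial matches the Bernstein form
(helper names are mine):</p>
<pre><code class="language-python">def cubic_poly_coeffs(c0, c1, c2, c3):
    # Coefficients (a₃, b₃, c₃, d₃) of the expanded cubic, as derived above
    a = -c0 + 3 * c1 - 3 * c2 + c3
    b = 3 * c0 - 6 * c1 + 3 * c2
    c = -3 * c0 + 3 * c1
    return a, b, c, c0

def eval_cubic(coeffs, t):
    a, b, c, d = coeffs
    return ((a * t + b) * t + c) * t + d  # Horner evaluation
</code></pre>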
<p>This algebraic form is great because we can now plug the formula into a
polynomial root finding algorithm in order to identify the roots. Let's study a
concrete use case of this.</p>
<h2>Concrete use case: intersecting ray</h2>
<p>A fundamental problem of text rendering is figuring out whether a given pixel
<code>P</code> lands inside or outside the character shape (which is composed of a chain
of Bézier curves). The most common algorithms (<a href="https://en.wikipedia.org/wiki/Nonzero-rule">non-zero rule</a> or
<a href="https://en.wikipedia.org/wiki/Even-odd_rule">even-odd rule</a>) involve a <em>ray</em> going from the pixel position into an
arbitrary direction toward infinity (usually horizontal for simplicity). If we
can identify every intersection of this ray with each curve of the shape, we
can deduce if our pixel point <code>P=(Px,Py)</code> is inside or outside.</p>
<p>We will simplify the problem to the crossing of just one curve, using the one
from previous section. It would look like this with an arbitrary point <code>P</code>:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-2d-ray.png" alt="centerimg" /></p>
<p>We're looking for the intersection coordinates, but how can we do that in 2D
space? Well, with a horizontal ray, we would have to know when the
y-coordinate of the curve is the same as the y-coordinate of <code>P</code>, so we first
have to solve <code>By(t) = Py</code>, or <code>By(t)-Py=0</code>, where <code>By(t)</code> is the <code>y</code> component
of the given Bézier curve <code>B(t)</code>.</p>
<p>This is a schoolbook <a href="https://en.wikipedia.org/wiki/Root-finding_algorithms">root finding</a> problem, because given that
<code>B(t)</code> is of the third degree, we end up solving the equation: <code>a₃t³ + b₃t² + c₃t + d₃ - Py = 0</code> (the <code>d₃ - Py</code> part is constant, so it acts as the last
coefficient of the polynomial). This gives us the <code>t</code> values (or roots), that
is where the ray crosses our <code>y</code> component.</p>
<p>Since this is a 3rd degree polynomial (highest power is 3), we will have <em>at
most</em> 3 points where the ray crosses the curve. In our case, we do actually get
the maximum number of roots:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d-y-ray.png" alt="centerimg" /></p>
<p>Now that we have the <code>t</code> values on our curve (remember that <code>t</code> values are
common to both the x and y axes), we can simply evaluate the <code>x</code> component of
<code>B(t)</code> to obtain the <code>x</code> coordinate.</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d-x-ray.png" alt="centerimg" /></p>
<p>Using <code>Px</code>, we can filter which roots we want to keep. In this case,
<code>Px=-0.75</code>, so we're going to keep all the intersections (all the roots
x-coordinates are located above this value).</p>
<p>We could do exactly the same operation by solving <code>Bx(t)-Px=0</code> and evaluating
<code>By(t)</code> on the roots we found: this would give us the intersections with a
vertical ray instead of a horizontal one.</p>
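<p>Putting it together, here is a sketch of the horizontal-ray intersection
using NumPy's polynomial root finder (helper names and the tolerance are mine;
this is an illustration, not a production rasterizer):</p>
<pre><code class="language-python">import numpy as np

def cubic_coeffs(c0, c1, c2, c3):
    # Polynomial coefficients from highest to lowest degree, as np.roots expects
    return [-c0 + 3*c1 - 3*c2 + c3, 3*c0 - 6*c1 + 3*c2, -3*c0 + 3*c1, c0]

def ray_intersections(xs, ys, px, py):
    # Solve By(t) - Py = 0: Py folds into the constant coefficient
    a, b, c, d = cubic_coeffs(*ys)
    roots = np.roots([a, b, c, d - py])
    # Keep only the real roots that land on the curve segment (t in [0;1])
    ts = [r.real for r in roots if abs(r.imag) < 1e-9 and 0 <= r.real <= 1]
    # Evaluate Bx(t) at those roots and keep the crossings right of Px
    xa, xb, xc, xd = cubic_coeffs(*xs)
    crossings = [((xa*t + xb)*t + xc)*t + xd for t in ts]
    return [x for x in crossings if x >= px]
</code></pre>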
<p>I'm voluntarily omitting a lot of technical details here, such as the root
finding algorithm and floating point accuracy challenges: the point is to
illustrate how the 1D deconstruction is essential in understanding and
manipulating Bézier curves.</p>
<h2>Bonus</h2>
<p>During the writing of this article, I made a small <code>matplotlib</code> demo which got
quite popular on Twitter, so I'm sharing it again:</p>
<div style="text-align:center">
<video src="http://blog.pkh.me/misc/bezier.webm" controls="controls" width="800">Animated Bézier curves</video>
</div>
<p>The script used to generate this video:</p>
<pre><code class="language-python">import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation


def mix(a, b, x): return (1 - x) * a + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))
def bezier1(p0, p1, t): return mix(p0, p1, t)
def bezier2(p0, p1, p2, t): return bezier1(mix(p0, p1, t), mix(p1, p2, t), t)
def bezier3(p0, p1, p2, p3, t): return bezier2(mix(p0, p1, t), mix(p1, p2, t), mix(p2, p3, t), t)


def _main():
    pad = 0.05
    bmin, bmax = -1, 1
    x_color, y_color, xy_color = "#ff4444", "#44ff44", "#ffdd00"

    np.random.seed(0)
    r0, r1 = np.random.uniform(-1, 1, (2, 4))
    r2, r3 = np.random.uniform(0, 2 * np.pi, (2, 4))

    cfg = {
        "axes.facecolor": "333333",
        "figure.facecolor": "111111",
        "font.family": "monospace",
        "font.size": 9,
        "grid.color": "666666",
    }

    plt.style.use("dark_background")
    with plt.rc_context(cfg):
        fig = plt.figure(figsize=[8, 4.5])
        gs = fig.add_gridspec(nrows=2, ncols=3)

        ax_x = fig.add_subplot(gs[0, 0])
        ax_x.grid(True)
        for i in range(4):
            ax_x.axvline(x=i / 3, linestyle="--", alpha=0.5)
        ax_x.axhline(y=0, alpha=0.5)
        ax_x.set_xlabel("t")
        ax_x.set_ylabel("x", rotation=0, color=x_color)
        ax_x.set_xlim(0 - pad, 1 + pad)
        ax_x.set_ylim(bmin - pad, bmax + pad)
        (x_plt,) = ax_x.plot([], [], "-", color=x_color)
        (x_plt_c0,) = ax_x.plot([], [], "o:", color=x_color)
        (x_plt_c1,) = ax_x.plot([], [], "o:", color=x_color)

        ax_y = fig.add_subplot(gs[1, 0])
        ax_y.grid(True)
        for i in range(4):
            ax_y.axvline(x=i / 3, linestyle="--", alpha=0.5)
        ax_y.axhline(y=0, alpha=0.5)
        ax_y.set_xlabel("t")
        ax_y.set_ylabel("y", rotation=0, color=y_color)
        ax_y.set_xlim(0 - pad, 1 + pad)
        ax_y.set_ylim(bmin - pad, bmax + pad)
        (y_plt,) = ax_y.plot([], [], "-", color=y_color)
        (y_plt_c0,) = ax_y.plot([], [], "o:", color=y_color)
        (y_plt_c1,) = ax_y.plot([], [], "o:", color=y_color)

        ax_xy = fig.add_subplot(gs[0:2, 1:3])
        ax_xy.grid(True)
        ax_xy.axvline(x=0, alpha=0.8)
        ax_xy.axhline(y=0, alpha=0.8)
        ax_xy.set_aspect("equal", "box")
        ax_xy.set_xlabel("x", color=x_color)
        ax_xy.set_ylabel("y", rotation=0, color=y_color)
        ax_xy.set_xlim(bmin - pad, bmax + pad)
        ax_xy.set_ylim(bmin - pad, bmax + pad)
        (xy_plt,) = ax_xy.plot([], [], "-", color=xy_color)
        (xy_plt_c0,) = ax_xy.plot([], [], "o:", color=xy_color)
        (xy_plt_c1,) = ax_xy.plot([], [], "o:", color=xy_color)

        fig.tight_layout()

        def update(frame):
            px = remap(-1, 1, bmin, bmax, np.sin(r0 * frame + r2))
            py = remap(-1, 1, bmin, bmax, np.sin(r1 * frame + r3))
            t = np.linspace(0, 1)
            x = bezier3(px[0], px[1], px[2], px[3], t)
            y = bezier3(py[0], py[1], py[2], py[3], t)
            x_plt.set_data(t, x)
            x_plt_c0.set_data((0, 1 / 3), (px[0], px[1]))
            x_plt_c1.set_data((2 / 3, 1), (px[2], px[3]))
            y_plt.set_data(t, y)
            y_plt_c0.set_data((0, 1 / 3), (py[0], py[1]))
            y_plt_c1.set_data((2 / 3, 1), (py[2], py[3]))
            xy_plt.set_data(x, y)
            xy_plt_c0.set_data((px[0], px[1]), (py[0], py[1]))
            xy_plt_c1.set_data((px[2], px[3]), (py[2], py[3]))

        duration, fps, speed = 15, 60, 3
        frames = np.linspace(0, duration * speed, duration * fps)
        anim = FuncAnimation(fig, update, frames=frames)
        anim.save("/tmp/bezier.webm", fps=fps, codec="vp9", extra_args=["-preset", "veryslow", "-tune-content", "screen"])


if __name__ == "__main__":
    _main()
</code></pre>
http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html
http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html
Invert a function using Newton iterationsThu, 11 Aug 2022 06:59:53 -0000<p>Newton's method is probably one of the most popular algorithms for finding the
roots of a function through successive numeric approximations. In less cryptic
words, if you have an opaque function <code>f(x)</code>, and you need to solve <code>f(x)=0</code>
(finding where the function crosses the x-axis), the Newton-Raphson method
gives you a dead simple cookbook to achieve that (a few conditions need to be
met though).</p>
<p>I recently had to solve a similar problem where instead of finding the roots I
had to invert the function. At first glance these may sound like two entirely
different problems, but in practice it's almost the same thing. Since I barely
avoided a mental breakdown in the process of figuring that out, I thought it
would make sense to share the experience of walking the road to enlightenment.</p>
<h2>A function and its inverse</h2>
<p>We are given a funky function, let's say <code>f(x)=2/3(x+1)²-sin(x)-1</code>, and we want
to figure out its inverse <code>f¯¹()</code>:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-01.png" alt="centerimg" /></p>
<p>The diagonal is highlighted to make the symmetry more obvious. One thing you
may immediately wonder is how such an inverse function is even possible.
Indeed, if you look at <code>x=0</code>, the inverse function gives (at least) 2 <code>y</code>
values, which means it's impossible to trace according to the x-axis. What we
just did here is swap the axes: we simply drew <code>y=f(x)</code> and <code>x=f(y)</code>,
which means the axes do not correspond to the same thing depending on which
curve we are looking at. For <code>y=f(x)</code> (abbreviated <code>f</code> or <code>f(x)</code>), the
horizontal axis is the x-axis, and for <code>x=f(y)</code> (abbreviated <code>f¯¹</code> or <code>f¯¹(y)</code>)
the horizontal axis is the y-axis, because we actually drew the curve according
to the vertical axis.</p>
<p>What can we do here to bring this problem back to reality? Well, first of all
we can reduce the domain and focus on only one segment of the function where
the function can be inverted. This is one of the conditions that need to be
met, otherwise it is simply impossible to solve because it doesn't make any
sense. So we'll redefine our problem to make it solvable by assuming our
function is actually defined in the range <code>R=[R₀,R₁]</code> which we arbitrarily set
to <code>R=[0.1;1.5]</code> in our case (could be anything as long as we have no
discontinuity):</p>
<p><img src="http://blog.pkh.me/img/newton/newton-02.png" alt="centerimg" /></p>
<p>Now <code>f'</code> (the derivative of <code>f</code>) is never null, implying there won't be
multiple solutions for a given <code>x</code>, so we should be safe. Indeed, while we are
still tracing <code>f¯¹</code> by flipping the axis, we can see that it could also exist
in the same space as <code>f</code>, meaning we could now draw it according to the
horizontal axis, just like <code>f</code>.</p>
<p>What's so hard though? Bear with me for a moment, because this took me quite a
while to wrap my head around. The symmetry is such that it's trivial to go from
a point on <code>f</code> to a point on <code>f¯¹</code>:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-03.png" alt="centerimg" /></p>
<p>Transforming point <code>A</code> into point <code>B</code> is a matter of simply swapping the
coordinates. Said differently, if I have a <code>x</code> coordinate, evaluating <code>f(x)</code>
will give me the <code>A.y</code> coordinate, so we have <code>A=(x,f(x))</code> and we can get <code>B</code>
with <code>B=(A.y,A.x)=(f(x),x)</code>. But while we are going to use this property, this
is not actually what we are looking for in the first place: our input is the
<code>x</code> coordinate of <code>B</code> (or the <code>y</code> coordinate of <code>A</code>) and we want the other
component.</p>
<p>So how do we do that? This is where root finding actually comes into play.</p>
<h2>Root finding</h2>
<p>We are going to distance ourselves a bit from the graphic representation (it
can be quite confusing anyway) and try to reason with algebra. Not that I'm
much more comfortable with it but we can manage something with the basics here.</p>
<p>The key to not getting your mind mixed up in <code>x</code> and <code>y</code> confusion is to use
different terms because we associate <code>x</code> and <code>y</code> respectively with the
horizontal and vertical axis. So instead we are going to redefine our functions
according to <code>u</code> and <code>v</code>. We have:</p>
<ol>
<li><code>f(u)=v</code></li>
<li><code>f¯¹(v)=u</code> (reminder: <code>v</code> is our input and <code>u</code> is what we are looking for)</li>
</ol>
<p><strong>Note</strong>: writing <code>f¯¹</code> doesn't mean our function is anything special, the <code>¯¹</code>
simply acts as some sort of semantic tagging, we could very well have written
<code>h(v)=u</code>. Both functions <code>f</code> and <code>f¯¹</code> are simply mapping a real number to
another one.</p>
<p>In the previous section we've seen that <code>f(u)=v</code> is actually equivalent to
<code>f¯¹(v)=u</code>. This may sound like an arbitrary statement, so let me rephrase it
differently: for a given value of <code>u</code>, there exists only one corresponding value
of <code>v</code>. If we now feed that same <code>v</code> to <code>f¯¹</code> we will get <code>u</code> back. To
paraphrase this with algebra: <code>f¯¹(f(u)) = u</code>.</p>
<p>How does all of this help us? Well it means that <code>f¯¹(v)=u</code> is equivalent
to <code>f(u)=v</code>. So all we have to do is solve <code>f(u)=v</code>, or <code>f(u)-v=0</code>. <strong>The
process of solving this equation to find <code>u</code> is equivalent to evaluating
<code>f¯¹(v)</code>.</strong></p>
<p>And there we have it, with a simple subtraction of <code>v</code>, we're back into known
territory. We declare a new function <code>g(u)=f(u)-v</code> and we are going to find its
root by solving <code>g(u)=0</code> with the help of Newton's method.</p>
<p>Summary with less babble:</p>
<pre><code class="language-plaintext">f¯¹(v)=u ⬄ f(u)=v
⬄ f(u)-v=0
⬄ g(u)=0 with g(u)=f(u)-v
</code></pre>
<h2>Newton's method</h2>
<p>The Newton iterations are dead-ass simple: it's a sequence (or an iterative
loop if you prefer):</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - g(uₙ)/g'(uₙ)
</code></pre>
<p>…repeated as much as needed (it converges quickly).</p>
<ul>
<li><code>g</code> is the function from which we're trying to find the root</li>
<li><code>g'</code> its derivative</li>
<li><code>u</code> our current approximation, which gets closer to the truth at each
iteration</li>
</ul>
<p>We can evaluate <code>g</code> (<code>g(u)=f(u)-v</code>) but we need two more pieces to the puzzle:
<code>g'</code> and an initial value for <code>u</code>.</p>
<h3>Derivative</h3>
<p>There is actually something cool with the derivative <code>g'</code>: since <code>v</code> is a
constant term, the derivative of <code>g</code> is actually the derivative of <code>f</code>:
<code>g(u)=f(u)-v</code> so <code>g'(u)=f'(u)</code>.</p>
<p>This means that we can rewrite our iteration according to <code>f</code> instead of <code>g</code>:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - (f(uₙ)-v)/f'(uₙ)
</code></pre>
<p>Now for the derivative <code>f'</code> itself we have two choices. If we know the function
<code>f</code>, we can derive it analytically. This should be the preferred choice if you
can because it's faster and more accurate. In our case:</p>
<pre><code class="language-plaintext"> f(x) = 2/3(x+1)² - sin(x) - 1
f'(x) = 4x/3 - cos(x) + 4/3
</code></pre>
<p>You can rely on the <a href="https://www.mathsisfun.com/calculus/derivatives-rules.html">derivative rules</a> to figure out the analytic
formula for your function or… you can cheat by using "derivative …" on
<a href="https://www.wolframalpha.com/">WolframAlpha</a>.</p>
<p>But you may be in the situation where you don't actually have that information
because the function is opaque. In this case, you could use an approximation:
take a very small value <code>ε</code> (let's say <code>1e-6</code>) and approximate the derivative
with for example <code>f'(x)=(f(x+ε)-f(x-ε))/(2ε)</code>. It's a dumb trick: we're
basically figuring out the slope by taking two very close points around <code>x</code>.
This would also work by using <code>g</code> instead of <code>f</code>, but you would compute two
extra subtractions (the <code>- v</code>) for no benefit because they cancel each other.</p>
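<p>In code, this approximation is a one-liner (a generic sketch, not tied to our
specific <code>f</code>; the helper name is mine):</p>
<pre><code class="language-python">def numerical_derivative(f, x, eps=1e-6):
    # Central difference: slope between two very close points around x
    return (f(x + eps) - f(x - eps)) / (2 * eps)
</code></pre>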
<h3>Initial approximation</h3>
<p>For the 3rd and last piece of the puzzle, the initial <code>u</code>, we need to figure
out something more elaborate. The simplest we can do is to start with a first
approximation function <code>f₀¯¹</code> as a straight line between the point <code>(f(R₀),R₀)</code>
and <code>(f(R₁),R₁)</code>. How do we create a function that linearly links these 2 points
together? We of course use <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">one of the most useful math formulas</a>:
<code>remap(a,b,c,d,x) = mix(c,d,linear(a,b,x))</code>, and we evaluate it for our first
approximation value <code>u₀</code>:</p>
<pre><code class="language-plaintext">u₀ = remap(f(R₀),f(R₁),R₀,R₁,v)
</code></pre>
<p>If your boundaries are simpler, typically if <code>R=[0;1]</code>, this expression can be
dramatically simplified. A <code>linear()</code> might be enough, or even a simple
division. We have a pathological case here so we're using the generic
expression.</p>
<p>We get:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-04.png" alt="centerimg" /></p>
<p>Close enough, we can start iterating from here.</p>
<h3>Iterating</h3>
<p>If we do a single Newton iteration, <code>u₁ = u₀ - (f(u₀)-v)/f'(u₀)</code>, our straight
line becomes:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-05.png" alt="centerimg" /></p>
<p>With one more iteration:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-06.png" alt="centerimg" /></p>
<p>We're getting pretty close already, aren't we?</p>
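<p>To see the convergence concretely, here is a sketch of mine using the same <code>f</code>,
<code>f'</code> and <code>[R₀;R₁]</code> as above: we pick a known solution, ask Newton for its
antecedent back, and print the error at each step. The number of correct digits roughly
doubles at every iteration:</p>
<pre><code class="language-python">import numpy as np

def f(x): return 2 / 3 * (x + 1) ** 2 - np.sin(x) - 1
def d(x): return 4 / 3 * x - np.cos(x) + 4 / 3

def mix(a, b, x): return a * (1 - x) + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))

R0, R1 = 0.1, 1.5
x_true = 1.0   # pick a known solution...
v = f(x_true)  # ...and try to recover it from its image

u = remap(f(R0), f(R1), R0, R1, v)  # initial straight-line guess
for i in range(4):
    print(f"iteration {i}: error={abs(u - x_true):.2e}")
    u = u - (f(u) - v) / d(u)  # one Newton step
</code></pre>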
<p>If you want to converge even faster, you may want to consider <a href="https://en.wikipedia.org/wiki/Halley's_method">Halley's
method</a>. It's more expensive to
compute, but one iteration of Halley may cost less than two iterations of Newton.
It's up to you to study whether the trade-off is worth it.</p>
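<p>As a hedged sketch of what that looks like for our <code>f</code>: Halley additionally needs
the second derivative, which I derived by hand as <code>f''(x) = 4/3 + sin(x)</code> (worth
double-checking). Comparing one step of each method from the same starting point:</p>
<pre><code class="language-python">import numpy as np

def f(x): return 2 / 3 * (x + 1) ** 2 - np.sin(x) - 1
def d1(x): return 4 / 3 * x - np.cos(x) + 4 / 3  # f'
def d2(x): return 4 / 3 + np.sin(x)              # f'' (derived by hand)

v = f(1.0)  # target value whose antecedent (1.0) we pretend not to know
u0 = 0.7    # an arbitrary initial guess

# One Newton step: u₁ = u₀ - g/g' with g(u)=f(u)-v
newton = u0 - (f(u0) - v) / d1(u0)

# One Halley step: u₁ = u₀ - 2gg'/(2g'² - gg')
g, gp, gpp = f(u0) - v, d1(u0), d2(u0)
halley = u0 - 2 * g * gp / (2 * gp**2 - g * gpp)

print(abs(newton - 1.0), abs(halley - 1.0))  # Halley lands closer
</code></pre>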
<h2>Demo code</h2>
<p>If you want to play with this, here is a <code>matplotlib</code> demo generating a figure
pretty similar to the ones found in this post:</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt

N = 1  # Number of iterations
R0, R1 = (0.1, 1.5)  # Reduced domain

# The function to inverse and its derivative
def f(x): return 2 / 3 * (x + 1) ** 2 - np.sin(x) - 1
def d(x): return 4 / 3 * x - np.cos(x) + 4 / 3

# The most useful math functions
def mix(a, b, x): return a * (1 - x) + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))

# The inverse approximation using Newton-Raphson iterations
def inverse(v, n):
    u = remap(f(R0), f(R1), R0, R1, v)
    for _ in range(n):
        u = u - (f(u) - v) / d(u)
    return u

def _main():
    _, ax = plt.subplots()
    x = np.linspace(R0, R1)
    y = f(x)
    ax.plot((-1 / 2, 2), (-1 / 2, 2), "--", color="gray")
    ax.plot(x, y, "-", color="C0", label="f")
    ax.plot([R0, R1], [f(R0), f(R1)], "o", color="C0")
    ax.plot(y, x, "-", color="C1", label="f¯¹")
    v = np.linspace(f(R0), f(R1))
    u = inverse(v, N)
    ax.plot(v, u, "-", color="C3", label=f"f¯¹ approx in {N} iteration(s)")
    ax.plot([f(R0), f(R1)], [R0, R1], "o", color="C3")
    ax.set_aspect("equal", "box")
    ax.grid(True)
    ax.legend()
    plt.show()

_main()
</code></pre>