A small freedom area RSS: Default feed for blog.pkh.me
http://blog.pkh.me/
http://blog.pkh.me/p/39-improving-color-quantization-heuristics.html
Improving color quantization heuristics
Sat, 31 Dec 2022 12:00:43 -0000
<p>In 2015, I wrote an article about <a href="http://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html">how the palette color quantization was
improved in FFmpeg</a> in order to make nice animated GIF files. For some
reason, to this day this is one of my most popular articles.</p>
<p>As time passed, my experience with colors grew and I ended up being quite
ashamed and frustrated with the state of these filters. A lot of the code was
naive (when not terribly wrong), despite the apparent good results.</p>
<p>One of the major changes I wanted to make was to evaluate the color distances
using a perceptually uniform colorspace, instead of using a naive Euclidean
distance of RGB triplets.</p>
<p>As usual it felt like a weekend-long project; after all, all I had to do was
change the distance function to work in a different space, right? Well, if
you're following my blog you might have noticed I've had numerous adventures
that stacked up on each other:</p>
<ul>
<li>I had to work out the <a href="http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html">colorspace with integer arithmetic</a> first</li>
<li>...which forced me to look into <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">integer division</a> more deeply</li>
<li>...which confronted me with all sorts of <a href="http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html">undefined behaviours</a> in the
process</li>
</ul>
<p>And when I finally reached the point where I could make the switch to
<a href="https://bottosson.github.io/posts/oklab/">OkLab</a> (the perceptual colorspace), a few experiments showed that the
flavour of the core algorithm I was using might contain some fundamental flaws,
or at least was not implementing optimal heuristics. So here we go again:
quickly enough I found myself starting a new research study in the pursuit of
understanding how to put pixels on the screen. This write-up is the story of
yet another self-inflicted struggle.</p>
<h2>Palette quantization</h2>
<p>But what is <em>palette quantization</em>? It essentially refers to the process of
reducing the number of available colors of an image down to a smaller subset.
In sRGB, an image can have up to 16.7 million colors. In practice though it's
generally much less, to the surprise of no one. Still, it's not rare to have a
few hundred thousand different colors in a single picture. Our goal is to
reduce that to something like 256 colors that represent them best, and use
these colors to create a new picture.</p>
<p>Why, you may ask? There are multiple reasons; here are some:</p>
<ul>
<li>Improve size compression (this is a lossy operation of course, and using
dithering on top might actually defeat the original purpose)</li>
<li>Some codecs might not support anything other than a limited palette (GIF or
subtitle codecs, for example)</li>
<li>Various artistic purposes</li>
</ul>
<p>Following is an example of a picture quantized at different levels:</p>
<table>
<thead>
<tr>
<th>Original (26125 colors)</th>
<th>Quantized to 8bpp (256 colors)</th>
<th>Quantized to 2bpp (4 colors)</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="http://blog.pkh.me/img/color-quant/cat-orig.png" alt="Cat (original)" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/cat-256.png" alt="Cat (8bpp)" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/cat-4.png" alt="Cat (2bpp)" /></td>
</tr>
</tbody>
</table>
<p>This color quantization process can be roughly summarized in 4 steps:</p>
<ol>
<li>Sample the input image: we build a histogram of all the colors in the
picture (basically a simple statistical analysis)</li>
<li>Design a colormap: we build the palette through various means using the
histograms</li>
<li>Create a pixel mapping which associates a color (one that can be found in
the input image) with another (one that can be found in the newly created
palette)</li>
<li>Image quantizing: we use the color mapping to build our new image. This step
may also involve some <a href="https://en.wikipedia.org/wiki/Dither">dithering</a>.</li>
</ol>
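<p>As a toy illustration, here is how steps 1, 3 and 4 could be sketched in
Python (the names and structure are mine, not FFmpeg's; step 2 is the subject
of the rest of this article):</p>
<pre><code class="language-python">from collections import Counter

def quantize(pixels, palette):
    # pixels is a list of (R, G, B) tuples, palette a small list of such
    # tuples (the output of step 2); no dithering here
    hist = Counter(pixels)  # step 1: histogram (its weights would feed step 2)
    def nearest(color):     # step 3: closest palette entry, naive RGB distance
        return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))
    mapping = {color: nearest(color) for color in hist}  # one lookup per unique color
    return [mapping[color] for color in pixels]          # step 4: rebuild the image
</code></pre>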
<p>The study here will focus on step 2 (which itself relies on step 1).</p>
<h2>Colormap design algorithms</h2>
<p>A palette is simply a set of colors. It can be represented in various ways, for
example here in 2D and 3D:</p>
<p><img src="http://blog.pkh.me/img/color-quant/pal-2d-3d.png" alt="centerimg" /></p>
<p>To generate such a palette, all sorts of algorithms exist. They are usually
classified into 2 large categories:</p>
<ul>
<li>Dividing/splitting algorithms (such as Median-Cut and its various flavors)</li>
<li>Clustering algorithms (such as K-means, maximin distance, (E)LBG or pairwise
clustering)</li>
</ul>
<p>The former are faster but non-optimal while the latter are slower but better.
The problem is <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-complete</a>, meaning it's possible to find the
optimal solution but it can be extremely costly. On the other hand, it's
possible to find "local optima" at minimal cost.</p>
<p>Since I'm working within FFmpeg, speed has always been a priority. This is
what motivated me to initially implement Median-Cut over a more expensive
algorithm.</p>
<p>The rough picture of the algorithm is relatively easy to grasp. Assuming we
want a palette of <code>K</code> colors:</p>
<ol>
<li>A set <code>S</code> of all the colors in the input picture is constructed, along with
a respective set <code>W</code> of the weight of each color (how much they appear)</li>
<li>Since the colors are expressed as RGB triplets, they can be encapsulated
in one big cuboid, or box</li>
<li>The box is cut in two along one of the axes (R, G or B) at the median
(hence the name of the algorithm)</li>
<li>If we don't have a total of <code>K</code> boxes yet, pick one of them and go back to
the previous step</li>
<li>All the colors in each of the <code>K</code> boxes are then averaged to form the color
palette entries</li>
</ol>
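<p>The steps above can be sketched in a few lines of Python. This is a toy
version: the weights <code>W</code> are ignored and both the box and axis selection use
the naive "longest axis" heuristic, which is precisely what gets questioned
later in this article:</p>
<pre><code class="language-python">def extent(box):
    # length of the bounding cuboid along each of the 3 axes
    return [max(c[i] for c in box) - min(c[i] for c in box) for i in range(3)]

def median_cut(colors, k):
    # colors is a list of (R, G, B) tuples, k the target palette size
    boxes = [list(colors)]
    while len(boxes) < k:
        box = max(boxes, key=lambda b: max(extent(b)))  # box selection heuristic
        if len(box) < 2:
            break
        axis = extent(box).index(max(extent(box)))      # axis selection heuristic
        box.sort(key=lambda c: c[axis])
        mid = len(box) // 2                             # cut at the median
        boxes.remove(box)
        boxes += [box[:mid], box[mid:]]
    # average the colors of each box to form the palette entries
    return [tuple(round(sum(c[i] for c in b) / len(b)) for i in range(3)) for b in boxes]
</code></pre>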
<p>Here is what the process looks like visually:</p>
<p><video src="http://blog.pkh.me/misc/mediancut-parrot-16.mp4" controls="controls" width="800">Median-Cut algorithm targeting 16 boxes</video></p>
<p>You may have spotted in this video that the colors are not expressed in RGB but
in Lab: this is because instead of representing the colors in a traditional RGB
colorspace, we are instead using the OkLab colorspace which has the property of
being perceptually uniform. It doesn't really change the Median Cut algorithm,
but it definitely has an impact on the resulting palette.</p>
<p>One striking limitation of this algorithm is that we are working exclusively
with cuboids: the cuts are limited to an axis, we are not cutting along an
arbitrary plane or a more complex shape. Think of it like working with voxels
instead of more free-form geometries. The main benefit is that the algorithm is
pretty simple to implement.</p>
<p>Now the description provided earlier conveniently avoided describing two
important aspects happening in steps 3 and 4:</p>
<ol>
<li>How do we choose the next box to split?</li>
<li>How do we choose along which axis of the box we make the cut?</li>
</ol>
<p>I pondered over that for quite a long time.</p>
<h2>An overview of the possible heuristics</h2>
<p>In no particular order, here are some of the heuristics I started thinking of:</p>
<ul>
<li>should we take the box that has the longest axis across all boxes?</li>
<li>should we take the box that has the largest volume?</li>
<li>should we take the box that has the biggest <a href="https://en.wikipedia.org/wiki/Mean_squared_error">Mean Squared Error</a> when
compared to its average color?</li>
<li>should we take the box that has the <em>axis</em> with the biggest MSE?</li>
<li>assuming we choose to go with the MSE, should it be normalized across all
boxes?</li>
<li>should we even account for the weight of each color or consider them equal?</li>
<li>what about the axis? Is it better to pick the longest one or the one with the
highest MSE?</li>
</ul>
<p>I tried to formalize these questions mathematically to the best of my limited
abilities. So let's start by saying that all the colors <code>c</code> of a given box are
stored in an <code>N×M</code> 2D array following the matrix notation:</p>
<table>
<tbody>
<tr><td>L₁</td><td>L₂</td><td>L₃</td><td>…</td><td>Lₘ</td></tr>
<tr><td>a₁</td><td>a₂</td><td>a₃</td><td>…</td><td>aₘ</td></tr>
<tr><td>b₁</td><td>b₂</td><td>b₃</td><td>…</td><td>bₘ</td></tr>
</tbody>
</table>
<p><code>N</code> is the number of components (3 in our case, whether it's RGB or Lab), and
<code>M</code> the number of colors in that box. You can visualize this as a list of
column vectors as well, where <code>c_{i,j}</code> is the component at row <code>i</code> of the color in column <code>j</code>.</p>
<p>With that in mind we can sketch the following diagram representing the tree of
heuristic possibilities to implement:</p>
<p><img src="http://blog.pkh.me/img/color-quant/diagram-heuristics.png" alt="centerimg" /></p>
<p>Mathematicians are going to kill me for doodling random notes all over this
perfectly understandable gibberish of symbols, but I believe this is required
for the human beings reading this article.</p>
<p>In summary, we end up with a total of 24 combinations to try out:</p>
<ul>
<li>2 axis selection heuristics:
<ul>
<li>cut the axis with the maximum error squared</li>
<li>cut the axis with the maximum length</li>
</ul>
</li>
<li>3 operators:
<ul>
<li>maximum measurement out of all the channels</li>
<li>product of the measurements of all the channels</li>
<li>sum of the measurements of all the channels</li>
</ul>
</li>
<li>4 measurements:
<ul>
<li>error squared, honoring weights</li>
<li>error squared, not honoring weights</li>
<li>error squared, honoring weights, normalized</li>
<li>length of the axis</li>
</ul>
</li>
</ul>
<p>If we start to intuitively think about which ones are likely going to perform
the best, we quickly realize that we haven't actually formalized what we are
trying to achieve. Such a rookie mistake. Clarifying this will help us get a
better feel for the likely outcome.</p>
<p>I chose to target an output that minimizes the MSE against the reference image,
in a perceptual way. Said differently, we are trying to make the perceptual
distance between an input and output pixel color as small as possible. This is an
arbitrary and debatable target, but it's relatively simple and objective to
evaluate if we have faith in the selected perceptual model. Another appropriate
metric could have been to find the ideal palette through another algorithm, and
compare against that instead. Unfortunately, doing that would have implied
trusting that other algorithm and its implementation, and having enough
computing power.</p>
<p>So to summarize, we want to minimize the MSE between the input and output,
evaluated in the OkLab colorspace. This can be expressed with the following
formula:</p>
<p><img src="http://blog.pkh.me/img/color-quant/evaluation.png" alt="centerimg" /></p>
<p>Where:</p>
<ul>
<li><code>P</code> is a <a href="https://en.m.wikipedia.org/wiki/Partition_of_a_set">partition</a>
(which we constrain to a box in our implementation)</li>
<li><code>C</code> the set of colors in the partition <code>P</code></li>
<li><code>w</code> the weight of a color</li>
<li><code>c</code> a single color</li>
<li><code>µ</code> the average color of the set <code>C</code></li>
</ul>
<p>Special thanks to <code>criver</code> for helping me a ton with the math; this last
formula is from them.</p>
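<p>Transcribed into Python (my transcription, reusing the symbols listed
above), the error of a single partition looks like this:</p>
<pre><code class="language-python">def partition_error(C, w):
    # C is a list of (L, a, b) colors, w the matching list of weights;
    # summing this value over all partitions P gives the quantity to minimize
    wsum = sum(w)
    mu = [sum(wi * c[i] for c, wi in zip(C, w)) / wsum for i in range(3)]  # µ
    return sum(wi * sum((c[i] - mu[i]) ** 2 for i in range(3))
               for c, wi in zip(C, w))
</code></pre>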
<p>Looking at the formula, we can see how similar it is to certain branches of the
heuristics tree, so we can start getting an intuition about the result of the
experiment.</p>
<h2>Experiment language</h2>
<p>Short deviation from the main topic (feel free to skip to the next section):
working in C within FFmpeg quickly became more of a hurdle than anything. Aside
from the lack of flexibility, the implicit casts destroying the precision
deceitfully, and the undefined behaviours, all kinds of C quirks got in the way
several times, which made me question my sanity. This one in particular severely
messed me up while trying to average the colors:</p>
<pre><code class="language-c">#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int32_t  x = -30;
    const uint32_t y = 10;
    const uint32_t a = 30;
    const int32_t  b = -10;
    printf("%d×%u=%d\n", x, y, x * y);
    printf("%u×%d=%d\n", a, b, a * b);
    printf("%d/%u=%d\n", x, y, x / y);
    printf("%u/%d=%d\n", a, b, a / b);
    return 0;
}
</code></pre>
<pre><code class="language-shell">% cc -Wall -Wextra -fsanitize=undefined test.c -o test && ./test
-30×10=-300
30×-10=-300
-30/10=429496726
30/-10=0
</code></pre>
<p>Anyway, I know this is obvious, but if you aren't already doing that I suggest
you build your experiments in another language, Python or whatever, and rewrite
them in C later once you've figured out your expected output.</p>
<p>Re-implementing what I needed in Python didn't take me long. It was, and still
is, obviously much slower at runtime, but that's fine. There is a lot of room
for speed improvement, typically by relying on <code>numpy</code> (which I didn't bother
with).</p>
<h2>Experiment results</h2>
<p>I created a <a href="https://github.com/ubitux/research/">research repository</a> for the occasion. The code to
reproduce and the results can be found in the <a href="https://github.com/ubitux/research/tree/main/color-quantization">color quantization
README</a>.</p>
<p>In short, based on the results, we can conclude that:</p>
<ul>
<li>Overall, the box that has the axis with the largest non-normalized weighted
sum of squared error is the best candidate in the box selection algorithm</li>
<li>Overall, cutting the axis with the largest weighted sum of squared error is
the best axis cut selection algorithm</li>
</ul>
<p>To my surprise, normalizing the weights per box is not a good idea. I initially
observed that by trial and error, which was actually one of the main motivators
for this research. I initially thought normalizing each box was necessary in
order to compare them against each other (such that they are compared on a
common ground). My loose explanation of the phenomenon was that not normalizing
causes a bias towards boxes with many colors, but that's actually exactly what
we want. I believe it can also be explained by our evaluation function: we want
to minimize the error across the whole set of colors, so small partitions (in
color counts) must not be made stronger. At least not in the context of the
target we chose.</p>
<p>It's also interesting to see how the <code>max()</code> seems to perform better than the
<code>sum()</code> of the variance of each component most of the time. Admittedly my
set of picture samples is not that big, so more experiments may be required to
confirm that tendency.</p>
<p>In retrospect, this might have been quickly predictable to someone with a
mathematical background. But since I don't have that, nor do I trust my
abstract thinking much, I'm kind of forced to try things out often. This is
likely one of the many instances where I spent way too much energy on something
obvious from the beginning, but I have the hope it will actually provide some
useful information for other lost souls out there.</p>
<h2>Known limitations</h2>
<p>There are two main limitations I want to discuss before closing this article.
The first one is related to minimizing the MSE even more.</p>
<h3>K-means refinement</h3>
<p>We know the Median-Cut actually provides a rough estimate of the optimal
palette. One thing we could do is use it as a first step before refinement, for
example by running a few K-means iterations as post-processing (the amount of
refinement/iterations could be a user control). The general idea of K-means is
to progressively move each color individually to a more appropriate box, that
is, a box for which the color distance to the average color of that box is
smaller. I started implementing that in a very naive way, so it's extremely
slow, but that's something to investigate further because it definitely
improves the results.</p>
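<p>To give a rough idea, one such refinement pass could look like this (a
sketch of the principle with weights omitted, not the actual naive
implementation mentioned above):</p>
<pre><code class="language-python">def kmeans_pass(boxes):
    # boxes is a list of lists of (L, a, b) colors: reassign every color to
    # the box whose average color is closest; averages shift on the next pass
    def mean(box):
        return tuple(sum(c[i] for c in box) / len(box) for i in range(3))
    centers = [mean(b) for b in boxes]
    new_boxes = [[] for _ in boxes]
    for box in boxes:
        for c in box:
            best = min(range(len(centers)),
                       key=lambda k: sum((c[i] - centers[k][i]) ** 2 for i in range(3)))
            new_boxes[best].append(c)
    return new_boxes  # caveat: a box can end up empty, a real version must handle it
</code></pre>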
<p>Most of the academic literature seems to suggest the use of the K-means
clustering, but all of them require some startup step. Some come up with
various heuristics, some use PCA, but I've yet to see one that relies on
Median-Cut as a first pass; maybe that's not such a good idea, but who knows.</p>
<h3>Bias toward perceived lightness</h3>
<p>Another, more annoying, problem I have no solution for relates to human
perception being much more sensitive to lightness changes than to hue. If
you look at the first demo with the parrot, you may have observed the boxes are
kind of thin. This is because the <code>a</code> and <code>b</code> components (respectively how
green/red and blue/yellow the color is) have a much smaller amplitude compared
to the <code>L</code> (perceived lightness).</p>
<p>Here is a side by side comparison of the spread of colors between a stretched
and normalized view:</p>
<p><img src="http://blog.pkh.me/img/color-quant/oklab-axis-scaled.png" alt="centerimg" /></p>
<p>You may rightfully question whether this is a problem or not. In practice, this
means that when <code>K</code> is low (let's say smaller than 8 or even 16), cuts along <code>L</code>
will almost always be preferred, causing the picture to be heavily desaturated.
This is because it tries to preserve the most significant attribute in human
perception: the lightness.</p>
<p>That particular picture is actually a pathological study case:</p>
<table>
<thead>
<tr>
<th>4 colors</th>
<th>8 colors</th>
<th>12 colors</th>
<th>16 colors</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="http://blog.pkh.me/img/color-quant/woman-4.png" alt="Portrait K=4" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-8.png" alt="Portrait K=8" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-12.png" alt="Portrait K=12" /></td>
<td><img src="http://blog.pkh.me/img/color-quant/woman-16.png" alt="Portrait K=16" /></td>
</tr>
</tbody>
</table>
<p>We can see the hue timidly appearing around <code>K=16</code> (specifically it starts
being more strongly noticeable from the cut at <code>K=13</code>).</p>
<h2>Conclusion</h2>
<p>For now, I'm mostly done with this "weekend-long project" into which I
actually poured 2 or 3 months of lifetime. The FFmpeg patchset will likely be
upstreamed soon so everyone should hopefully be able to benefit from it in the
next release. It will also come with <a href="https://fosstodon.org/@bug/109602427382086789">additional dithering
methods</a>, whose implementation was actually a relaxing
distraction from all this hardship. There are still many ways of improving this
work, but it's the end of the line for me, so I'll trust the Internet with it.</p>
http://blog.pkh.me/p/38-porting-oklab-colorspace-to-integer-arithmetic.html
Porting OkLab colorspace to integer arithmetic
Sun, 11 Dec 2022 22:01:17 -0000
<p>For reasons I'll explain in a future write-up, I needed to make use of a
perceptually uniform colorspace in some computer vision code. <a href="https://bottosson.github.io/posts/oklab/">OkLab from Björn
Ottosson</a> was a great candidate given how simple the implementation is.</p>
<p><img src="http://blog.pkh.me/img/oklab-int/hue_oklab.png" alt="centerimg" title="OkLab hue" /></p>
<p>But there is a plot twist: I needed the code to be deterministic for the tests
to be portable across a large variety of architectures, systems and
configurations. Several solutions were offered to me, including reworking the
test framework to support a difference mechanism with a threshold, but having
done that in another project I can confidently say that it's not trivial (when
not downright impossible in certain cases). Another approach would have been to
hardcode the libc math functions, but even then I wasn't confident the floating
point arithmetic determinism would be guaranteed in all cases.</p>
<p>So I ended up choosing to port the code to integer arithmetic. I'm sure many
people would disagree with that approach, but:</p>
<ul>
<li>code determinism is guaranteed</li>
<li>not all FPUs are that efficient, typically on embedded systems</li>
<li>it can now be used in the kernel; while this is far-fetched for OkLab (though
maybe someone needs some color management in v4l2 or something), sRGB
transforms might have their use cases</li>
<li>it's a learning experience which can be re-used in other circumstances</li>
<li>working on the integer arithmetic versions unlocked various optimizations for
the normal case</li>
</ul>
<p><strong>Note</strong>: I'm following Björn Ottosson's will to have the OkLab code in the public
domain as well as under the MIT license, so this "dual licensing" applies to all
the code presented in this article.</p>
<p><strong>Warning</strong>: The integer arithmetic in this write-up can only work if your
language behaves the same as C99 (or more recent) with regard to integer
division. See <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">this previous article on integer division</a> for more
information.</p>
<h2>Quick summary of uniform colorspaces</h2>
<p>For those unfamiliar with color management, one of the main benefits of a
uniform colorspace like OkLab is that the Euclidean distance between two colors
is directly correlated with the human perception of these colors.</p>
<p>More concretely, if we want to evaluate the distance between the RGB triplets
<code>(R₀,G₀,B₀)</code> and <code>(R₁,G₁,B₁)</code>, one may naively compute the Euclidean distance
<code>√((R₀-R₁)²+(G₀-G₁)²+(B₀-B₁)²)</code>. Unfortunately, even if the RGB is gamma
expanded into linear values, the computed distance will actually be pretty far
from reflecting how the human eye perceives this difference. It typically isn't
going to be consistent when applied to another pair of colors.</p>
<p>With OkLab (and many other uniform colorspaces), the colors are also identified
with 3D coordinates, but instead of <code>(R,G,B)</code> we call them <code>(L,a,b)</code> (which is
an entirely different 3D space). In that space <code>√((L₀-L₁)²+(a₀-a₁)²+(b₀-b₁)²)</code>
(called <code>ΔE</code>, or <code>Delta-E</code>) is expected to be aligned with human perception of
color differences.</p>
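<p>In code, <code>ΔE</code> is nothing more than the following (a hypothetical helper for
illustration; OkLab itself doesn't prescribe one):</p>
<pre><code class="language-python">import math

def delta_e(lab0, lab1):
    # Euclidean distance between two (L, a, b) colors in a uniform space
    return math.sqrt(sum((x0 - x1) ** 2 for x0, x1 in zip(lab0, lab1)))
</code></pre>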
<p>Of course, this is just one model, and it doesn't take into account many
parameters. For instance, the perception of a color depends a lot on the
surrounding colors. Still, these models are much better than working with RGB
triplets, which don't make much sense visually speaking.</p>
<h2>Reference code / diagram</h2>
<p>In this case study, we will be focusing on the transform that goes from sRGB to
OkLab, and back again into sRGB. Only the first part is interesting if we want
the color distance, but sometimes we also want to alter a color uniformly and
thus we need the second part as well to reconstruct an sRGB color from it.</p>
<p>We are only considering the sRGB input and output for simplicity, which means
we will be inlining the sRGB color transfer in the pipeline. If you're not
familiar with gamma compression, there are <a href="https://en.wikipedia.org/wiki/Gamma_correction">many</a> <a href="https://bottosson.github.io/posts/colorwrong/">resources</a>
<a href="http://filmicworlds.com/blog/linear-space-lighting-i-e-gamma/">about</a> <a href="https://www.youtube.com/watch?v=LKnqECcg6Gw">it</a> on the Internet which you may want to look into
first.</p>
<p>Here is a diagram of the complete pipeline:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/pipeline.png" alt="centerimg" title="sRGB/OkLab pipeline" /></p>
<p>And the corresponding code (of the 4 circles in the diagram) we will be porting:</p>
<pre><code class="language-c">#include <math.h>
#include <stdint.h>

struct Lab { float L, a, b; };

uint8_t linear_f32_to_srgb_u8(float x)
{
    if (x <= 0.0) {
        return 0;
    } else if (x >= 1.0) {
        return 0xff;
    } else {
        const float v = x < 0.0031308f ? x*12.92f : 1.055f*powf(x, 1.f/2.4f) - 0.055f;
        return lrintf(v * 255.f);
    }
}

float srgb_u8_to_linear_f32(uint8_t x)
{
    const float v = x / 255.f;
    return v < 0.04045f ? v/12.92f : powf((v+0.055f)/1.055f, 2.4f);
}

struct Lab srgb_u8_to_oklab_f32(uint32_t srgb)
{
    const float r = srgb_u8_to_linear_f32(srgb >> 16 & 0xff);
    const float g = srgb_u8_to_linear_f32(srgb >>  8 & 0xff);
    const float b = srgb_u8_to_linear_f32(srgb       & 0xff);

    const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b;
    const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b;
    const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b;

    const float l_ = cbrtf(l);
    const float m_ = cbrtf(m);
    const float s_ = cbrtf(s);

    const struct Lab ret = {
        .L = 0.2104542553f * l_ + 0.7936177850f * m_ - 0.0040720468f * s_,
        .a = 1.9779984951f * l_ - 2.4285922050f * m_ + 0.4505937099f * s_,
        .b = 0.0259040371f * l_ + 0.7827717662f * m_ - 0.8086757660f * s_,
    };
    return ret;
}

uint32_t oklab_f32_to_srgb_u8(struct Lab c)
{
    const float l_ = c.L + 0.3963377774f * c.a + 0.2158037573f * c.b;
    const float m_ = c.L - 0.1055613458f * c.a - 0.0638541728f * c.b;
    const float s_ = c.L - 0.0894841775f * c.a - 1.2914855480f * c.b;

    const float l = l_*l_*l_;
    const float m = m_*m_*m_;
    const float s = s_*s_*s_;

    const uint8_t r = linear_f32_to_srgb_u8(+4.0767416621f * l - 3.3077115913f * m + 0.2309699292f * s);
    const uint8_t g = linear_f32_to_srgb_u8(-1.2684380046f * l + 2.6097574011f * m - 0.3413193965f * s);
    const uint8_t b = linear_f32_to_srgb_u8(-0.0041960863f * l - 0.7034186147f * m + 1.7076147010f * s);
    return r<<16 | g<<8 | b;
}
</code></pre>
<h2>sRGB to Linear</h2>
<p>The first step is converting the sRGB color to linear values. That sRGB
transfer function can be intimidating, but it's pretty much a simple power
function:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-eotf.png" alt="centerimg" title="sRGB EOTF" /></p>
<p>The input is 8-bit (<code>[0x00;0xff]</code> for each of the 3 channels) which means we
can use a simple 256 values lookup table containing the precomputed resulting
linear values. Note that we can already do that with the reference code with a
table remapping the 8-bit index into a float value.</p>
<p>For our integer version we need to pick an arbitrary precision for the linear
representation. <a href="https://blog.demofox.org/2018/03/10/dont-convert-srgb-u8-to-linear-u8/">8-bit is not going to be enough precision</a>, so
we're going to pick the next power of two to be space efficient: 16-bit. We
will be using the constant <code>K=(1<<16)-1=0xffff</code> to refer to this scale.</p>
<p>Alternatively we could rely on a fixed-point mapping (one integer for the
integer part and another for the fractional part), but in our case pretty much
everything is normalized so the integer part doesn't really matter.</p>
<pre><code class="language-c">/**
* Table mapping formula:
* f(x) = x < 0.04045 ? x/12.92 : ((x+0.055)/1.055)^2.4 (sRGB EOTF)
* Where x is the normalized index in the table and f(x) the value in the table.
* f(x) is remapped to [0;K] and rounded.
*/
static const uint16_t srgb2linear[256] = {
0x0000, 0x0014, 0x0028, 0x003c, 0x0050, 0x0063, 0x0077, 0x008b,
0x009f, 0x00b3, 0x00c7, 0x00db, 0x00f1, 0x0108, 0x0120, 0x0139,
0x0154, 0x016f, 0x018c, 0x01ab, 0x01ca, 0x01eb, 0x020e, 0x0232,
0x0257, 0x027d, 0x02a5, 0x02ce, 0x02f9, 0x0325, 0x0353, 0x0382,
0x03b3, 0x03e5, 0x0418, 0x044d, 0x0484, 0x04bc, 0x04f6, 0x0532,
0x056f, 0x05ad, 0x05ed, 0x062f, 0x0673, 0x06b8, 0x06fe, 0x0747,
0x0791, 0x07dd, 0x082a, 0x087a, 0x08ca, 0x091d, 0x0972, 0x09c8,
0x0a20, 0x0a79, 0x0ad5, 0x0b32, 0x0b91, 0x0bf2, 0x0c55, 0x0cba,
0x0d20, 0x0d88, 0x0df2, 0x0e5e, 0x0ecc, 0x0f3c, 0x0fae, 0x1021,
0x1097, 0x110e, 0x1188, 0x1203, 0x1280, 0x1300, 0x1381, 0x1404,
0x1489, 0x1510, 0x159a, 0x1625, 0x16b2, 0x1741, 0x17d3, 0x1866,
0x18fb, 0x1993, 0x1a2c, 0x1ac8, 0x1b66, 0x1c06, 0x1ca7, 0x1d4c,
0x1df2, 0x1e9a, 0x1f44, 0x1ff1, 0x20a0, 0x2150, 0x2204, 0x22b9,
0x2370, 0x242a, 0x24e5, 0x25a3, 0x2664, 0x2726, 0x27eb, 0x28b1,
0x297b, 0x2a46, 0x2b14, 0x2be3, 0x2cb6, 0x2d8a, 0x2e61, 0x2f3a,
0x3015, 0x30f2, 0x31d2, 0x32b4, 0x3399, 0x3480, 0x3569, 0x3655,
0x3742, 0x3833, 0x3925, 0x3a1a, 0x3b12, 0x3c0b, 0x3d07, 0x3e06,
0x3f07, 0x400a, 0x4110, 0x4218, 0x4323, 0x4430, 0x453f, 0x4651,
0x4765, 0x487c, 0x4995, 0x4ab1, 0x4bcf, 0x4cf0, 0x4e13, 0x4f39,
0x5061, 0x518c, 0x52b9, 0x53e9, 0x551b, 0x5650, 0x5787, 0x58c1,
0x59fe, 0x5b3d, 0x5c7e, 0x5dc2, 0x5f09, 0x6052, 0x619e, 0x62ed,
0x643e, 0x6591, 0x66e8, 0x6840, 0x699c, 0x6afa, 0x6c5b, 0x6dbe,
0x6f24, 0x708d, 0x71f8, 0x7366, 0x74d7, 0x764a, 0x77c0, 0x7939,
0x7ab4, 0x7c32, 0x7db3, 0x7f37, 0x80bd, 0x8246, 0x83d1, 0x855f,
0x86f0, 0x8884, 0x8a1b, 0x8bb4, 0x8d50, 0x8eef, 0x9090, 0x9235,
0x93dc, 0x9586, 0x9732, 0x98e2, 0x9a94, 0x9c49, 0x9e01, 0x9fbb,
0xa179, 0xa339, 0xa4fc, 0xa6c2, 0xa88b, 0xaa56, 0xac25, 0xadf6,
0xafca, 0xb1a1, 0xb37b, 0xb557, 0xb737, 0xb919, 0xbaff, 0xbce7,
0xbed2, 0xc0c0, 0xc2b1, 0xc4a5, 0xc69c, 0xc895, 0xca92, 0xcc91,
0xce94, 0xd099, 0xd2a1, 0xd4ad, 0xd6bb, 0xd8cc, 0xdae0, 0xdcf7,
0xdf11, 0xe12e, 0xe34e, 0xe571, 0xe797, 0xe9c0, 0xebec, 0xee1b,
0xf04d, 0xf282, 0xf4ba, 0xf6f5, 0xf933, 0xfb74, 0xfdb8, 0xffff,
};
int32_t srgb_u8_to_linear_int(uint8_t x)
{
    return (int32_t)srgb2linear[x];
}
</code></pre>
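<p>As a sanity check, the table above can be regenerated with a few lines of
Python, applying the exact formula from the comment:</p>
<pre><code class="language-python">K = (1 << 16) - 1  # 0xffff

def srgb_eotf(x):
    # sRGB EOTF on a normalized value in [0;1]
    return x / 12.92 if x < 0.04045 else ((x + 0.055) / 1.055) ** 2.4

# 256 entries: the normalized 8-bit index remapped to [0;K] and rounded
srgb2linear = [round(srgb_eotf(i / 255) * K) for i in range(256)]
</code></pre>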
<p>You may have noticed that we are returning the value in an <code>i32</code>: this is to
ease arithmetic operations (preserving the 16-bit unsigned storage would have
overflow wrapping implications when working with the value).</p>
<h2>Linear to OkLab</h2>
<p>OkLab is expressed in a virtually continuous space (floats). If we feed all
16.7 million sRGB colors to the OkLab transform we get the following ranges in
output:</p>
<pre><code class="language-plaintext">min Lab: 0.000000 -0.233887 -0.311528
max Lab: 1.000000 0.276216 0.198570
</code></pre>
<p>We observe that <code>L</code> is always positive and neatly within <code>[0;1]</code> while <code>a</code> and
<code>b</code> are in a more restricted and signed range. Multiple choices are offered to
us with regard to the integer representation we pick.</p>
<p>Since we chose 16-bit for the input linear value, it makes sense to preserve
that precision for <code>Lab</code>. For the <code>L</code> component, this fits neatly (<code>[0;1]</code> in the
reference maps to <code>[0;0xffff]</code> in the integer version), but for the <code>a</code> and <code>b</code>
components, not so much. We could pick a signed 16-bit, but that would imply a
15-bit precision for the arithmetic and 1-bit for the sign, which is going to
be troublesome: we want to preserve the same precision for <code>L</code>, <code>a</code> and <code>b</code>
since the whole point of this operation is to have a uniform space.</p>
<p>Instead, I decided to go with 16 bits of precision, with one extra bit for the
sign (which will be used for <code>a</code> and <code>b</code>), and thus storing <code>Lab</code> in 3 signed
<code>i32</code>. Alternatively, we could decide to have a 15-bit precision with an extra
bit for the sign by using 3 <code>i16</code>. This should work mostly fine but having the
values fit exactly the boundaries of the storage can be problematic in various
situations, typically anything that involves boundary checks and overflows.
Picking a larger storage simplifies a bunch of things.</p>
<p>Looking at <code>srgb_u8_to_oklab_f32</code> we quickly see that most of the function
is simple arithmetic, but we have a cube root (<code>cbrt()</code>), so let's study that
first.</p>
<h3>Cube root</h3>
<p>All the <code>cbrt</code> inputs are driven by this:</p>
<pre><code class="language-c">const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b;
const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b;
const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b;
</code></pre>
<p>This might not be obvious at first glance, but here <code>l</code>, <code>m</code> and <code>s</code> are all in the
<code>[0;1]</code> range (the coefficients of each row sum to <code>1</code>), so we will only
need to deal with this range in our <code>cbrt</code> implementation. This greatly
simplifies the problem!</p>
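<p>Here is a quick sanity check of that claim (my own snippet, not from the actual code base), with the matrix rows copied from the reference function:</p>
<pre><code class="language-c">#include <assert.h>
#include <math.h>

/* Rows of the linear-sRGB to LMS matrix: each row summing to 1 means that
 * with r, g and b in [0;1], l, m and s are convex combinations of the
 * inputs and therefore stay in [0;1] as well */
static const float lms_rows[3][3] = {
    {0.4122214708f, 0.5363325363f, 0.0514459929f},
    {0.2119034982f, 0.6806995451f, 0.1073969566f},
    {0.0883024619f, 0.2817188376f, 0.6299787005f},
};

static float lms_row_sum(int row)
{
    return lms_rows[row][0] + lms_rows[row][1] + lms_rows[row][2];
}
</code></pre>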
<p>Now, what does it look like?</p>
<p><img src="http://blog.pkh.me/img/oklab-int/cbrt01.png" alt="centerimg" title="Cube root function between 0 and 1" /></p>
<p>This function is simply the inverse of <code>f(x)=x³</code>, which is a much more convenient
function to work with. And I have some great news: not long ago, I wrote <a href="http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html">an
article on how to invert a function</a>, so that's exactly what we
are going to do here: invert <code>f(x)=x³</code>.</p>
<p>What we first need though is a good approximation of the curve. A straight line
is probably fine but we could try to use some symbolic regression in order to
get some sort of rough polynomial approximation. <a href="https://astroautomata.com/PySR/">PySR</a> can do that in a
few lines of code:</p>
<pre><code class="language-python">import numpy as np
from pysr import PySRRegressor
# 25 points of ³√x within [0;1]
x = np.linspace(0, 1, 25).reshape(-1, 1)
y = x ** (1/3)
model = PySRRegressor(model_selection="accuracy", binary_operators=["+", "-", "*"], niterations=200)
r = model.fit(x, y, variable_names=["x"])
print(r)
</code></pre>
<p>The output is not deterministic for some reason (which is quite annoying) and
the expressions provided usually follow a wonky form. Still, in my run it
seemed to take a liking to the following polynomial: <code>u₀ = x³ - 2.19893x² + 2.01593x + 0.219407</code> (reformatted into a sane polynomial form thanks to
WolframAlpha).</p>
<p>Note that increasing the number of data points is not really a good idea
because we quickly run into <a href="https://en.wikipedia.org/wiki/Runge%27s_phenomenon">Runge's phenomenon</a>. No
need to overthink it, 25 points is just fine.</p>
<p>Now we can make a few Newton iterations. For that, we need the derivative of
<code>f(uₙ)=uₙ³-x</code>, so <code>f'(uₙ)=3uₙ²</code> and thus the iteration expressions can be
obtained easily:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - f(uₙ)/f'(uₙ)
     = uₙ - (uₙ³-x)/(3uₙ²)
     = (2uₙ³+x)/(3uₙ²)
</code></pre>
<p>If you don't understand what the hell is going on here, check <a href="http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html">the article
referred to earlier</a>: we're simply following the recipe here.</p>
<p>Now I had a look into how most libcs compute <code>cbrt</code>, and <a href="https://twitter.com/insouris/status/1589649490075561984">despite sometimes
referring to Newton iterations, they were actually using Halley
iterations</a>. So we're going to do the same (not the lying part, just the
Halley part). To get the Halley iteration instead of Newton, we need not only the first
but also the second derivative of <code>f(uₙ)=uₙ³-x</code> (<code>f'(uₙ)=3uₙ²</code> and
<code>f"(uₙ)=6uₙ</code>), from which we deduce a relatively simple expression:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ-2f(uₙ)f'(uₙ)/(2f'(uₙ)²-f(uₙ)f"(uₙ))
     = uₙ - 6uₙ²(uₙ³-x)/(18uₙ⁴-6uₙ(uₙ³-x))
     = uₙ - uₙ(uₙ³-x)/(2uₙ³+x)
     = uₙ(2x+uₙ³)/(x+2uₙ³)
</code></pre>
<p>We have everything we need to approximate a cube root of a real between <code>0</code> and
<code>1</code>. In Python a complete implementation would be as simple as this snippet:</p>
<pre><code class="language-python">b, c, d = -2.19893, 2.01593, 0.219407

def cbrt01(x):
    # We only support [0;1]
    if x <= 0: return 0
    if x >= 1: return 1
    # Initial approximation
    u = x**3 + b*x**2 + c*x + d
    # 2 Halley iterations
    u = u * (2*x+u**3) / (x+2*u**3)
    u = u * (2*x+u**3) / (x+2*u**3)
    return u
</code></pre>
<p>But now we need to scale the floating values up into 16-bit integers.</p>
<p>First of all, in the integer version our <code>x</code> is actually in <code>K</code> scale, which
means we want to express <code>u</code> according to <code>X=x·K</code>. Similarly, we want to use
<code>B=b·K</code>, <code>C=c·K</code> and <code>D=d·K</code> instead of <code>b</code>, <code>c</code> and <code>d</code> because we have no way
of expressing the latter as integers otherwise. Finally, we're not actually
going to compute <code>u₀</code> but <code>u₀·K</code> because we're preserving the scale through the
function. We have:</p>
<pre><code class="language-plaintext">u₀·K = K·(x³ + bx² + cx + d)
     = K·((x·K)³/K³ + b(x·K)²/K² + c(x·K)/K + d)
     = K·(X³/K³ + bX²/K² + cX/K + d)
     = X³·K/K³ + bX²·K/K² + cX·K/K + d·K
     = X³/K² + BX²/K² + CX/K + D
     = (X³ + BX²)/K² + CX/K + D
     = ((X³ + BX²)/K + CX)/K + D
     = (X(X² + BX)/K + CX)/K + D
  U₀ = X(X(X + B)/K + C)/K + D
</code></pre>
<p>With this we have a relatively cheap expression where the <code>K</code> divisions would
still preserve enough precision even if evaluated as integer division.</p>
<p>We can do the same for the Halley iteration. I spare you the algebra, the
expression <code>u(2x+u³) / (x+2u³)</code> becomes <code>(U(2X+U³/K²)) / (X+2U³/K²)</code>.</p>
<p>Looking at this expression you may start to worry about overflows, and that
would be fair since even <code>K²</code> is getting dangerously close to the sun (it's
actually already larger than <code>INT32_MAX</code>). For this reason we're going to cheat
and simply use 64-bit arithmetic in this function. I believe we could reduce
the risk of overflow, but I don't think there is a way to remain in 32-bit
without nasty compromises anyway. This is also why in the code below you'll
notice the constants are suffixed with <code>LL</code> (to force long-long/64-bit
arithmetic).</p>
<p>Beware that overflows are a terrible predicament to get into as they will lead
to <a href="http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html">undefined behaviour</a>. <strong>Do not underestimate this risk</strong>. You might not
detect them early enough, and missing them may mislead you when interpreting
the results. For this reason, I strongly suggest <strong>always building with
<code>-fsanitize=undefined</code></strong> during tests and development. I don't do that often,
but for this kind of research, I also highly recommend <strong>first writing tests
that cover all possible integer inputs</strong> (when applicable) so that overflows
are detected as soon as possible.</p>
<p>Before we write the integer version of our function, we need to address
rounding. In the case of the initial approximation I don't think we need to
bother, but for our Halley iteration we're going to need as much precision as
we can get. Since we know <code>U</code> is positive (remember we're evaluating <code>cbrt(x)</code>
where <code>x</code> is in <code>[0;1]</code>), we can use <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">the <code>(a+b/2)/b</code> rounding
formula</a>.</p>
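<p>As a quick illustration of that formula (my own toy example, not code from the article):</p>
<pre><code class="language-c">#include <assert.h>

/* round(a/b) for a >= 0 and b > 0: adding half the divisor before the
 * truncating division turns the truncation into a round-to-nearest */
static int div_round_pos(int a, int b)
{
    return (a + b/2) / b;
}
</code></pre>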
<p>Our function finally just looks like:</p>
<pre><code class="language-c">#define K2 ((int64_t)K*K)

int32_t cbrt01_int(int32_t x)
{
    int64_t u;

    /* Approximation curve is for the [0;1] range */
    if (x <= 0) return 0;
    if (x >= K) return K;

    /* Initial approximation: x³ - 2.19893x² + 2.01593x + 0.219407 */
    u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL;

    /* Refine with 2 Halley iterations. */
    for (int i = 0; i < 2; i++) {
        const int64_t u3 = u*u*u;
        const int64_t den = x + (2*u3 + K2/2) / K2;
        u = (u * (2*x + (u3 + K2/2) / K2) + den/2) / den;
    }
    return u;
}
</code></pre>
<p>Cute, isn't it? If we test the accuracy of this function by calling it for all
the possible values we actually get extremely good results. Here is a test
code:</p>
<pre><code class="language-c">int main(void)
{
    float max_diff = 0;
    float total_diff = 0;
    for (int i = 0; i <= K; i++) {
        const float ref = cbrtf(i / (float)K);
        const float out = cbrt01_int(i) / (float)K;
        const float d = fabs(ref - out);
        if (d > max_diff)
            max_diff = d;
        total_diff += d;
    }
    printf("max_diff=%f total_diff=%f avg_diff=%f\n",
           max_diff, total_diff, total_diff / (K + 1));
    return 0;
}
</code></pre>
<p>Output: <code>max_diff=0.030831 total_diff=0.816078 avg_diff=0.000012</code></p>
<p>If we want to trade precision for speed, we could adjust the function to use
Newton iterations, and maybe remove the rounding.</p>
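<p>To give an idea, such a variant might look like this. This is a sketch of mine (not part of the actual code base), reusing the same initial polynomial but with the cheaper Newton iteration <code>uₙ₊₁=(2uₙ³+x)/(3uₙ²)</code>, which rescales to <code>U' = (2U³ + X·K²)/(3U²)</code>, and plain truncating divisions:</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

#define K  0xffff
#define K2 ((int64_t)K*K)

static int32_t cbrt01_int_newton(int32_t x)
{
    int64_t u;

    if (x <= 0) return 0;
    if (x >= K) return K;

    /* Same initial approximation as cbrt01_int() */
    u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL;

    /* 2 Newton iterations, scale preserved: U' = (2U³ + X·K²) / (3U²) */
    for (int i = 0; i < 2; i++)
        u = (2*u*u*u + x*K2) / (3*u*u);
    return u;
}
</code></pre>
<p>Like the Halley version, the accuracy of this sketch is worst near <code>0</code> where the curve is steepest; whether the precision loss is acceptable would have to be measured against the whole pipeline.</p>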
<h3>Back to the core</h3>
<p>Going back to our sRGB-to-OkLab function, everything should look
straightforward to implement now. There is one thing though: while the <code>lms</code>
computation (at the beginning of the function) works exclusively with
positive values, the output <code>Lab</code> value expression is signed. For this reason
we will need a more involved rounded division, so referring again to <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">my last
article</a> we will use:</p>
<pre><code class="language-c">static int64_t div_round64(int64_t a, int64_t b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }
</code></pre>
<p>And thus, we have:</p>
<pre><code class="language-c">struct LabInt { int32_t L, a, b; };

struct LabInt srgb_u8_to_oklab_int(uint32_t srgb)
{
    const int32_t r = (int32_t)srgb2linear[srgb >> 16 & 0xff];
    const int32_t g = (int32_t)srgb2linear[srgb >>  8 & 0xff];
    const int32_t b = (int32_t)srgb2linear[srgb       & 0xff];

    // Note: lms can actually be slightly over K due to rounded coefficients
    const int32_t l = (27015LL*r + 35149LL*g +  3372LL*b + K/2) / K;
    const int32_t m = (13887LL*r + 44610LL*g +  7038LL*b + K/2) / K;
    const int32_t s = ( 5787LL*r + 18462LL*g + 41286LL*b + K/2) / K;

    const int32_t l_ = cbrt01_int(l);
    const int32_t m_ = cbrt01_int(m);
    const int32_t s_ = cbrt01_int(s);

    const struct LabInt ret = {
        .L = div_round64( 13792LL*l_ +  52010LL*m_ -   267LL*s_, K),
        .a = div_round64(129628LL*l_ - 159158LL*m_ + 29530LL*s_, K),
        .b = div_round64(  1698LL*l_ +  51299LL*m_ - 52997LL*s_, K),
    };
    return ret;
}
</code></pre>
<p>The note in this code is here to remind us that we have to saturate <code>lms</code> to a
maximum of <code>K</code> (corresponding to <code>1.0</code> with floats), which is what we're doing
in <code>cbrt01_int()</code>.</p>
<p>At this point we can already work within the OkLab space but we're only
half-way through the pain. Fortunately, things are going to be easier from now
on.</p>
<h2>OkLab to sRGB</h2>
<p>Our OkLab-to-sRGB function relies on the Linear-to-sRGB function (at the end),
so we're going to deal with it first.</p>
<h3>Linear to sRGB</h3>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-oetf.png" alt="centerimg" title="sRGB OETF" /></p>
<p>Contrary to sRGB-to-Linear, it's going to be tricky to rely on a table here because
it would be way too large to hold all possible values (it would require
<code>K</code> entries). I initially considered computing <code>powf(x, 1.f/2.4f)</code> with integer
arithmetic somehow, but this is much more involved than what we did to
implement <code>cbrt</code>. So instead I thought about approximating the curve with a
bunch of points (stored in a table), and then approximating any intermediate
value with a linear interpolation, that is, as if the points were joined by
small segments.</p>
<p>We gave 256 16-bit entries to <code>srgb2linear</code>, so if we were to give as much
storage to <code>linear2srgb</code> we could have a table of 512 8-bit entries (our output
is 8-bit). Here it is:</p>
<pre><code class="language-c">/**
* Table mapping formula:
* f(x) = x < 0.0031308 ? x*12.92 : (1.055)*x^(1/2.4)-0.055 (sRGB OETF)
* Where x is the normalized index in the table and f(x) the value in the table.
* f(x) is remapped to [0;0xff] and rounded.
*
* Since a 16-bit table is too large, we reduce its precision to 9-bit.
*/
static const uint8_t linear2srgb[P + 1] = {
0x00, 0x06, 0x0d, 0x12, 0x16, 0x19, 0x1c, 0x1f, 0x22, 0x24, 0x26, 0x28, 0x2a, 0x2c, 0x2e, 0x30,
0x32, 0x33, 0x35, 0x36, 0x38, 0x39, 0x3b, 0x3c, 0x3d, 0x3e, 0x40, 0x41, 0x42, 0x43, 0x45, 0x46,
0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56,
0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x5f, 0x60, 0x61, 0x62, 0x62,
0x63, 0x64, 0x65, 0x65, 0x66, 0x67, 0x67, 0x68, 0x69, 0x6a, 0x6a, 0x6b, 0x6c, 0x6c, 0x6d, 0x6e,
0x6e, 0x6f, 0x6f, 0x70, 0x71, 0x71, 0x72, 0x73, 0x73, 0x74, 0x74, 0x75, 0x76, 0x76, 0x77, 0x77,
0x78, 0x79, 0x79, 0x7a, 0x7a, 0x7b, 0x7b, 0x7c, 0x7d, 0x7d, 0x7e, 0x7e, 0x7f, 0x7f, 0x80, 0x80,
0x81, 0x81, 0x82, 0x82, 0x83, 0x84, 0x84, 0x85, 0x85, 0x86, 0x86, 0x87, 0x87, 0x88, 0x88, 0x89,
0x89, 0x8a, 0x8a, 0x8b, 0x8b, 0x8c, 0x8c, 0x8c, 0x8d, 0x8d, 0x8e, 0x8e, 0x8f, 0x8f, 0x90, 0x90,
0x91, 0x91, 0x92, 0x92, 0x93, 0x93, 0x93, 0x94, 0x94, 0x95, 0x95, 0x96, 0x96, 0x97, 0x97, 0x97,
0x98, 0x98, 0x99, 0x99, 0x9a, 0x9a, 0x9a, 0x9b, 0x9b, 0x9c, 0x9c, 0x9c, 0x9d, 0x9d, 0x9e, 0x9e,
0x9f, 0x9f, 0x9f, 0xa0, 0xa0, 0xa1, 0xa1, 0xa1, 0xa2, 0xa2, 0xa3, 0xa3, 0xa3, 0xa4, 0xa4, 0xa5,
0xa5, 0xa5, 0xa6, 0xa6, 0xa6, 0xa7, 0xa7, 0xa8, 0xa8, 0xa8, 0xa9, 0xa9, 0xa9, 0xaa, 0xaa, 0xab,
0xab, 0xab, 0xac, 0xac, 0xac, 0xad, 0xad, 0xae, 0xae, 0xae, 0xaf, 0xaf, 0xaf, 0xb0, 0xb0, 0xb0,
0xb1, 0xb1, 0xb1, 0xb2, 0xb2, 0xb3, 0xb3, 0xb3, 0xb4, 0xb4, 0xb4, 0xb5, 0xb5, 0xb5, 0xb6, 0xb6,
0xb6, 0xb7, 0xb7, 0xb7, 0xb8, 0xb8, 0xb8, 0xb9, 0xb9, 0xb9, 0xba, 0xba, 0xba, 0xbb, 0xbb, 0xbb,
0xbc, 0xbc, 0xbc, 0xbd, 0xbd, 0xbd, 0xbe, 0xbe, 0xbe, 0xbf, 0xbf, 0xbf, 0xc0, 0xc0, 0xc0, 0xc1,
0xc1, 0xc1, 0xc1, 0xc2, 0xc2, 0xc2, 0xc3, 0xc3, 0xc3, 0xc4, 0xc4, 0xc4, 0xc5, 0xc5, 0xc5, 0xc6,
0xc6, 0xc6, 0xc6, 0xc7, 0xc7, 0xc7, 0xc8, 0xc8, 0xc8, 0xc9, 0xc9, 0xc9, 0xc9, 0xca, 0xca, 0xca,
0xcb, 0xcb, 0xcb, 0xcc, 0xcc, 0xcc, 0xcc, 0xcd, 0xcd, 0xcd, 0xce, 0xce, 0xce, 0xce, 0xcf, 0xcf,
0xcf, 0xd0, 0xd0, 0xd0, 0xd0, 0xd1, 0xd1, 0xd1, 0xd2, 0xd2, 0xd2, 0xd2, 0xd3, 0xd3, 0xd3, 0xd4,
0xd4, 0xd4, 0xd4, 0xd5, 0xd5, 0xd5, 0xd6, 0xd6, 0xd6, 0xd6, 0xd7, 0xd7, 0xd7, 0xd7, 0xd8, 0xd8,
0xd8, 0xd9, 0xd9, 0xd9, 0xd9, 0xda, 0xda, 0xda, 0xda, 0xdb, 0xdb, 0xdb, 0xdc, 0xdc, 0xdc, 0xdc,
0xdd, 0xdd, 0xdd, 0xdd, 0xde, 0xde, 0xde, 0xde, 0xdf, 0xdf, 0xdf, 0xe0, 0xe0, 0xe0, 0xe0, 0xe1,
0xe1, 0xe1, 0xe1, 0xe2, 0xe2, 0xe2, 0xe2, 0xe3, 0xe3, 0xe3, 0xe3, 0xe4, 0xe4, 0xe4, 0xe4, 0xe5,
0xe5, 0xe5, 0xe5, 0xe6, 0xe6, 0xe6, 0xe6, 0xe7, 0xe7, 0xe7, 0xe7, 0xe8, 0xe8, 0xe8, 0xe8, 0xe9,
0xe9, 0xe9, 0xe9, 0xea, 0xea, 0xea, 0xea, 0xeb, 0xeb, 0xeb, 0xeb, 0xec, 0xec, 0xec, 0xec, 0xed,
0xed, 0xed, 0xed, 0xee, 0xee, 0xee, 0xee, 0xef, 0xef, 0xef, 0xef, 0xef, 0xf0, 0xf0, 0xf0, 0xf0,
0xf1, 0xf1, 0xf1, 0xf1, 0xf2, 0xf2, 0xf2, 0xf2, 0xf3, 0xf3, 0xf3, 0xf3, 0xf3, 0xf4, 0xf4, 0xf4,
0xf4, 0xf5, 0xf5, 0xf5, 0xf5, 0xf6, 0xf6, 0xf6, 0xf6, 0xf6, 0xf7, 0xf7, 0xf7, 0xf7, 0xf8, 0xf8,
0xf8, 0xf8, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xfa, 0xfa, 0xfa, 0xfa, 0xfb, 0xfb, 0xfb, 0xfb, 0xfb,
0xfc, 0xfc, 0xfc, 0xfc, 0xfd, 0xfd, 0xfd, 0xfd, 0xfd, 0xfe, 0xfe, 0xfe, 0xfe, 0xff, 0xff, 0xff,
};
</code></pre>
<p>Again we're going to start with the floating point version, as it's easier to reason about.</p>
<p>We have a precision <code>P</code> of 9 bits: <code>P = (1<<9)-1 = 511 = 0x1ff</code>. But for the
sake of understanding the math, the following diagram will assume a <code>P</code> of <code>3</code>
so that we can clearly see the segment divisions:</p>
<p><img src="http://blog.pkh.me/img/oklab-int/srgb-eotf-lut.png" alt="centerimg" title="sRGB EOTF with a LUT of P=3" /></p>
<p>The input of our table is an integer index which needs to be calculated
according to our input <code>x</code>. But as stated earlier, we won't need one but two
indices in order to interpolate a point between 2 discrete values from our
table. We will refer to these indices as <code>iₚ</code> and <code>iₙ</code>, which can be computed
like this:</p>
<pre><code class="language-plaintext">i = x·P
iₚ = ⌊i⌋
iₙ = iₚ + 1
</code></pre>
<p>(<code>⌊a⌋</code> means <code>floor(a)</code>)</p>
<p>In order to get an approximation of <code>y</code> according to <code>i</code>, we simply need a
linear remapping: the ratio of <code>i</code> between <code>iₚ</code> and <code>iₙ</code> is the same ratio as
<code>y</code> between <code>yₚ</code> and <code>yₙ</code>. So yet again we're going to rely on <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">the most useful
maths formulas</a>: <code>remap(iₚ,iₙ,yₚ,yₙ,i) = mix(yₚ,yₙ,linear(iₚ,iₙ,i))</code>.</p>
<p>The ratio <code>r</code> we're computing as an input to the y-mix can be simplified a bit:</p>
<pre><code class="language-plaintext">r = linear(iₚ,iₙ,i)
= (i-iₚ) / (iₙ-iₚ)
= i-iₚ
= x·P - ⌊x·P⌋
= fract(x·P)
</code></pre>
<p>So in the end our formula is simply: <code>y = mix(yₚ,yₙ,fract(x·P))</code></p>
<p>Translated into C we can write it like this:</p>
<pre><code class="language-c">uint8_t linear_f32_to_srgb_u8_fast(float x)
{
    if (x <= 0.f) {
        return 0;
    } else if (x >= 1.f) {
        return 0xff;
    } else {
        const float i = x * P;
        const int32_t idx = (int32_t)floorf(i);
        const float y0 = linear2srgb[idx];
        const float y1 = linear2srgb[idx + 1];
        const float r = i - idx;
        return lrintf(mix(y0, y1, r));
    }
}
</code></pre>
<p><strong>Note</strong>: in case you are concerned about <code>idx+1</code> overflowing,
<code>floorf((1.0-FLT_EPSILON)*P)</code> is <code>P-1</code>, so this is safe.</p>
<h3>Linear to sRGB, integer version</h3>
<p>In the integer version, our function input <code>x</code> is within <code>[0;K]</code>, so we need to
make a few adjustments.</p>
<p>The first issue we have is that with integer arithmetic our <code>i</code> and <code>idx</code> are
the same. We have <code>X=x·K</code> as input, so <code>i = idx = X·P/K</code> because we are using
an integer division, which in this case is equivalent to the <code>floor()</code>
expression in the float version. So while it's a simple and fast way to get
<code>yₚ</code> and <code>yₙ</code>, we have an issue figuring out the ratio <code>r</code>.</p>
<p>One tool we have is the modulo operator: the integer division is destructive of
the fractional part, but fortunately the modulo (the remainder of the division)
gives this information back. It can also often be obtained for free
because CPU division instructions tend to provide that modulo as well
without extra computation.</p>
<p>If we take <code>m = (X·P) % K</code>, we have the fractional part of the division
expressed in the <code>K</code> scale, which means we can derive our ratio <code>r</code> from it:
<code>r = m / K</code>.</p>
<p>Slipping the <code>K</code> division in our <code>mix()</code> expression we end up with the
following code:</p>
<pre><code class="language-c">uint8_t linear_int_to_srgb_u8(int32_t x)
{
    if (x <= 0) {
        return 0;
    } else if (x >= K) {
        return 0xff;
    } else {
        const int32_t xP = x * P;
        const int32_t i = xP / K;
        const int32_t m = xP % K;
        const int32_t y0 = linear2srgb[i];
        const int32_t y1 = linear2srgb[i + 1];
        return (m * (y1 - y0) + K/2) / K + y0;
    }
}
</code></pre>
<p>Testing this function for all the possible inputs of <code>x</code>, the biggest inaccuracy
is an off-by-one, which concerns 6280 of the 65536 possible values (less than
10%): 2886 "off by -1" and 3394 "off by +1". It matches exactly the inaccuracy
of the float version of this function, so I think we can be pretty happy with it.</p>
<p>Given how good this approach is, we could also consider applying the same
strategy to <code>cbrt</code>, so this is left as an exercise for the reader.</p>
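<p>One possible take on that exercise (a sketch of mine, not from the actual code base; note the cube root is steep near <code>0</code>, where a uniform table interpolates poorly):</p>
<pre><code class="language-c">#include <assert.h>
#include <math.h>
#include <stdint.h>

#define K 0xffff
#define P 0x1ff

static uint16_t cbrt_lut[P + 1];

static int32_t cbrt01_int_lut(int32_t x)
{
    if (cbrt_lut[P] == 0) {
        /* Lazy one-time init; a real integer-only build would hardcode
         * the precomputed table instead of calling cbrtf() */
        for (int i = 0; i <= P; i++)
            cbrt_lut[i] = (uint16_t)lrintf(cbrtf(i / (float)P) * K);
    }
    if (x <= 0) return 0;
    if (x >= K) return K;
    const int32_t xP = x * P;   /* at most 65534·511, fits in 32-bit */
    const int32_t i  = xP / K;  /* i < P, so i+1 is a valid index */
    const int32_t m  = xP % K;
    const int32_t y0 = cbrt_lut[i];
    const int32_t y1 = cbrt_lut[i + 1];
    return (m * (y1 - y0) + K/2) / K + y0;
}
</code></pre>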
<h3>Back to the core</h3>
<p>We're finally in our last function. Using everything we've learned so far, it
can be trivially converted to integer arithmetic:</p>
<pre><code class="language-c">uint32_t oklab_int_to_srgb_u8(struct LabInt c)
{
    const int64_t l_ = c.L + div_round64(25974LL * c.a, K) + div_round64( 14143LL * c.b, K);
    const int64_t m_ = c.L + div_round64(-6918LL * c.a, K) + div_round64( -4185LL * c.b, K);
    const int64_t s_ = c.L + div_round64(-5864LL * c.a, K) + div_round64(-84638LL * c.b, K);

    const int32_t l = l_*l_*l_ / K2;
    const int32_t m = m_*m_*m_ / K2;
    const int32_t s = s_*s_*s_ / K2;

    const uint8_t r = linear_int_to_srgb_u8((267169LL * l - 216771LL * m +  15137LL * s + K/2) / K);
    const uint8_t g = linear_int_to_srgb_u8((-83127LL * l + 171030LL * m -  22368LL * s + K/2) / K);
    const uint8_t b = linear_int_to_srgb_u8((  -275LL * l -  46099LL * m + 111909LL * s + K/2) / K);
    return r<<16 | g<<8 | b;
}
</code></pre>
<p>Important things to notice:</p>
<ul>
<li>we're storing <code>l_</code>, <code>m_</code> and <code>s_</code> in 64-bit values so that the following
cubes do not overflow</li>
<li>we're using <code>div_round64</code> for parts of the expressions of <code>l_</code>, <code>m_</code> and <code>s_</code>
because they contain signed sub-expressions</li>
<li>we're using a naive rounded integer division for <code>r</code>, <code>g</code> and <code>b</code> because the values are
expected to be positive</li>
</ul>
<h2>Evaluation</h2>
<p>We're finally there. In the end the complete code is less than 200 lines of
code, and even less for the optimized float version (assuming we don't implement our
own <code>cbrt</code>). The complete code, test functions and benchmark tools <a href="https://github.com/ubitux/oklab-int">can be
found on GitHub</a>.</p>
<h3>Accuracy</h3>
<p>Comparing the integer version to the reference float version gives us the following results:</p>
<ul>
<li>sRGB to OkLab: <code>max_diff=0.000883 total_diff=0.051189</code></li>
<li>OkLab to sRGB: <code>max_diff_r=2 max_diff_g=1 max_diff_b=1</code></li>
</ul>
<p>I find these results pretty decent for an integer version, but you're free to
disagree and improve them.</p>
<h3>Speed</h3>
<p>The benchmarks are also interesting: on my main workstation (Intel® Core™
i7-12700, glibc 2.36, GCC 12.2.0), the integer arithmetic is slightly slower
than the optimized float version:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Command</th>
<th style="text-align:right">Mean [s]</th>
<th style="text-align:right">Min [s]</th>
<th style="text-align:right">Max [s]</th>
<th style="text-align:right">Relative</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reference</strong></td>
<td style="text-align:right">1.425 ± 0.008</td>
<td style="text-align:right">1.414</td>
<td style="text-align:right">1.439</td>
<td style="text-align:right">1.59 ± 0.01</td>
</tr>
<tr>
<td style="text-align:left"><strong>Fast float</strong></td>
<td style="text-align:right">0.897 ± 0.005</td>
<td style="text-align:right">0.888</td>
<td style="text-align:right">0.902</td>
<td style="text-align:right">1.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Integer arithmetic</strong></td>
<td style="text-align:right">0.937 ± 0.006</td>
<td style="text-align:right">0.926</td>
<td style="text-align:right">0.947</td>
<td style="text-align:right">1.04 ± 0.01</td>
</tr>
</tbody>
</table>
<p>Observations:</p>
<ul>
<li>The FPU is definitely fast in modern CPUs</li>
<li>Both the integer and optimized float versions are destroying the reference code
(note that this is only because of the transfer function optimizations, as we
have no change in the OkLab functions themselves in the optimized float
version)</li>
</ul>
<p>On the other hand, on one of my random ARM board (NanoPI NEO 2 with a Cortex
A53, glibc 2.35, GCC 12.1.0), I get different results:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Command</th>
<th style="text-align:right">Mean [s]</th>
<th style="text-align:right">Min [s]</th>
<th style="text-align:right">Max [s]</th>
<th style="text-align:right">Relative</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Reference</strong></td>
<td style="text-align:right">27.678 ± 0.009</td>
<td style="text-align:right">27.673</td>
<td style="text-align:right">27.703</td>
<td style="text-align:right">2.04 ± 0.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Fast float</strong></td>
<td style="text-align:right">15.769 ± 0.001</td>
<td style="text-align:right">15.767</td>
<td style="text-align:right">15.772</td>
<td style="text-align:right">1.16 ± 0.00</td>
</tr>
<tr>
<td style="text-align:left"><strong>Integer arithmetic</strong></td>
<td style="text-align:right">13.551 ± 0.001</td>
<td style="text-align:right">13.550</td>
<td style="text-align:right">13.553</td>
<td style="text-align:right">1.00</td>
</tr>
</tbody>
</table>
<p>Not that much faster proportionally speaking, but the integer version is still
significantly faster overall on such a low-end device.</p>
<h2>Conclusion</h2>
<p>This took me ages to complete, way longer than I expected but I'm pretty happy
with the end results and with everything I learned in the process. Also, you
may have noticed how much I referred to previous work; this has been
particularly satisfying from my point of view (re-using previous toolboxes
means they were actually useful). This write-up won't be an exception to the
rule: in a later article, I will make use of OkLab for another project I've
been working on for a while now. See you soon!</p>
http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html
http://blog.pkh.me/p/37-gcc-undefined-behaviors-are-getting-wild.html
GCC undefined behaviors are getting wildSun, 27 Nov 2022 22:13:26 -0000<p>Happy with my recent breakthrough in <a href="http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html">understanding C integer divisions</a>
after weeks of struggle, I was minding my own business having fun writing
integer arithmetic code. Life was good, when suddenly… <code>zsh: segmentation fault (core dumped)</code>.</p>
<p>That code wasn't messing with memory much so it was more likely to be a side
effect of an arithmetic overflow or something. Using <code>-fsanitize=undefined</code>
quickly identified the issue, which confirmed the presence of an integer
overflow. The fix was easy but something felt off. I was under the impression
my code was robust enough against that kind of honest mistake. Turns out, the
protecting condition I had in place should indeed have been enough, so I tried
to extract a minimal reproducible case:</p>
<pre><code class="language-c">#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t tab[0x1ff + 1];

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    int32_t i = x * 0x1ff / 0xffff;
    if (i >= 0 && i < sizeof(tab)) {
        printf("tab[%d] looks safe because %d is between [0;%d[\n", i, i, (int)sizeof(tab));
        return tab[i];
    }
    return 0;
}

int main(int ac, char **av)
{
    return f(atoi(av[1]));
}
</code></pre>
<p>The overflow can happen on <code>x * 0x1ff</code>. Since an integer overflow is undefined,
GCC makes the assumption that it cannot happen, ever. In practice in this case
it does, but the <code>i >= 0 && i < sizeof(tab)</code> condition should be enough to take
care of it, whatever crazy value it becomes, right? Well, I have bad news:</p>
<pre><code class="language-shell">% cc -Wall -O2 overflow.c -o overflow && ./overflow 50000000
tab[62183] looks safe because 62183 is between [0;512[
zsh: segmentation fault (core dumped) ./overflow 50000000
</code></pre>
<p><strong>Note</strong>: this is GCC <code>12.2.0</code> on x86-64.</p>
<p>We have <code>i=62183</code> as the result of the overflow, and nevertheless the execution
violates the gate condition, spouts a nonsensical lie, goes straight into
dereferencing <code>tab</code>, and dies miserably.</p>
<p>Let's study what GCC is doing here. Firing up Ghidra we observe the following
decompiled code:</p>
<pre><code class="language-c">uint8_t f(int x)
{
    int tmp;
    if (-1 < x) {
        tmp = x * 0x1ff;
        if (tmp < 0x1fffe00) {
            printf("tab[%d] looks safe because %d is between [0;%d[\n",
                   (ulong)(uint)tmp / 0xffff, (ulong)(uint)tmp / 0xffff, 0x200);
            return tab[(int)((uint)tmp / 0xffff)];
        }
    }
    return '\0';
}
</code></pre>
<p>When I said GCC makes the assumption that it cannot happen, this is what I
meant: <code>tmp</code> is not supposed to overflow, so part of the condition I had in
place was simply removed. More specifically, since <code>x</code> cannot be less than
<code>0</code>, and since GCC assumes a multiplication cannot overflow into a random value
(that could be negative) because it is undefined behaviour, it decides to
drop the "redundant" <code>i >= 0</code> condition because "it cannot happen".</p>
<p>I <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107890">reported that exact issue to GCC</a> to make sure it wasn't a bug, and
it was indeed confirmed to me that the undefined behaviour of an integer
overflow is not limited in scope to whatever insane value it could take: it is
apparently perfectly acceptable to mess up the code flow entirely.</p>
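<p>A defensive shape that avoids the problem entirely looks like this (a sketch, not necessarily the exact fix I made; the bound here is illustrative): instead of trying to detect the overflow after the fact, make it impossible before the multiplication happens.</p>
<pre><code class="language-c">#include <assert.h>
#include <stdint.h>

static uint8_t tab[0x1ff + 1];

static uint8_t f_fixed(int32_t x)
{
    /* Reject anything that could make x * 0x1ff overflow *before*
     * computing it: 0xffff * 0x1ff stays far below INT32_MAX */
    if (x < 0 || x > 0xffff)
        return 0;
    const int32_t i = x * 0x1ff / 0xffff;  /* now provably in [0;0x1ff] */
    return tab[i];
}
</code></pre>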
<p>While I understand how attractive it can be from an optimization point of view,
the paranoid developer in me is straight up terrified by the perspective of a
single integer overflow removing a security protection and causing such havoc.
I've worked several years on a project where integer overflows were (and
probably still are) legion. Identifying and fixing all of them is likely a
lifetime mission for several opinionated individuals.</p>
<p>I'm expecting this article to make the Rust crew go on a crusade again, and I
think I might be with them this time.</p>
<p><strong>Edit</strong>: it was made clear to me while reading <a href="https://predr.ag/blog/falsehoods-programmers-believe-about-undefined-behavior/">Predrag's blog</a> that
the key to my misunderstanding boils down to this: "Undefined behavior is not
the same as implementation-defined behavior". While I was indeed talking about
undefined behaviour, subconsciously I was thinking that the behaviour of an
overflow on a multiplication would be "implementation-defined behaviour". This
is not the case, it is indeed undefined behaviour, and yes the compiler is
free to do whatever it wants because it is compliant with the
specifications. It's my mistake of course, but in my defense, despite the
arrogant comments I read, this confusion happens a lot. I believe this happens
because it violates the <a href="https://en.wikipedia.org/wiki/Principle_of_least_astonishment">principle of least astonishment</a>. To
illustrate this I'll point to <a href="https://undeadly.org/cgi?action=article&sid=20060330071917">this interesting old OpenBSD developer blog
post</a>, which is concerned about the result of the multiplication
rather than the invalidation of any guarantee with regard to what's going to
happen to the execution flow (before and after). This is not uncommon and in my
opinion perfectly understandable.</p>
http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html
http://blog.pkh.me/p/36-figuring-out-round%2C-floor-and-ceil-with-integer-division.html
Figuring out round, floor and ceil with integer divisionFri, 25 Nov 2022 08:28:34 -0000<p>Lately I've been transforming a float based algorithm to integers in order to
make it bit-exact. Preserving the precision as best as possible was way more
challenging than I initially thought, which forced me to go deep down the rabbit
hole. During the process I realized I had many wrong assumptions about integer
divisions, and also discovered some remarkably useful mathematical properties.</p>
<p>This story is about a journey into figuring out equivalent functions to
<code>round(a/b)</code>, <code>floor(a/b)</code> and <code>ceil(a/b)</code> with <code>a</code> and <code>b</code> integers, while
staying in the integer domain (no intermediate <code>float</code> transformation allowed).</p>
<p><strong>Note</strong>: for the sake of conciseness (and to make a bridge with the
mathematics world), <code>floor(x)</code> and <code>ceil(x)</code> will sometimes respectively be
written <code>⌊x⌋</code> and <code>⌈x⌉</code>.</p>
<h2>Clarifying the mission</h2>
<p>Better than explained with words, here is how the functions we're looking for
behave with a real as input:</p>
<p><img src="http://blog.pkh.me/img/intdiv/round-floor-ceil.png" alt="centerimg" /></p>
<p>The dots indicate on which lines the stitching applies; for example <code>round(½)</code>
is <code>1</code>, not <code>0</code>.</p>
<h2>Language specificities (important!)</h2>
<p>Here are the corresponding prototypes, in C:</p>
<pre><code class="language-c">int div_round(int a, int b); // round(a/b)
int div_floor(int a, int b); // floor(a/b)
int div_ceil(int a, int b); // ceil(a/b)
</code></pre>
<p>We're going to work in C99 (or more recent), and this is actually the first
warning I have here. If you're working with a different language, you must
absolutely look into how its integer division works. In C, the integer division
is <strong>toward zero</strong>, for <strong>both positive and negative integers</strong>, and only
defined as such <strong>starting C99</strong> (it is implementation defined before that). Be
mindful about it if your codebase is in C89 or C90.</p>
<p>This means that in C:</p>
<pre><code class="language-c">printf("%d %d %d\n", 10/30, 15/30, 20/30);
printf("%d %d %d\n", -10/30, -15/30, -20/30);
</code></pre>
<p>We get:</p>
<pre><code class="language-plaintext">0 0 0
0 0 0
</code></pre>
<p>This is typically different in Python:</p>
<pre><code class="language-python">>>> 10//30, 15//30, 20//30
(0, 0, 0)
>>> -10//30, -15//30, -20//30
(-1, -1, -1)
</code></pre>
<p>In Python 2 and 3, the integer division is toward -∞, which means it is
directly equivalent to how the <code>floor()</code> function behaves.</p>
<p>In C, the integer division is equivalent to <code>floor()</code> <strong>only for positive
numbers</strong>, otherwise it behaves the same as <code>ceil()</code>. This is the division
behavior we will assume in this article:</p>
<p><img src="http://blog.pkh.me/img/intdiv/c-div.png" alt="centerimg" /></p>
<p>And again, I can't stress that enough: make sure you understand how the integer
division of your language works.</p>
<p>Similarly, you may have noticed we picked the <code>round</code> function as defined by
POSIX, meaning rounding half away from <code>0</code>. Again, in Python a different method
was selected:</p>
<pre><code class="language-python">>>> [round(x) for x in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5)]
[0, 2, 2, 4, 4, 6, 6]
</code></pre>
<p>Python follows the round-half-to-even rule (also known as banker's rounding). This is not what we are
implementing here (<strong>Edit</strong>: a partial implementation is provided at the end
though). There are <a href="https://en.wikipedia.org/wiki/Rounding">many ways of rounding</a>, so make sure you've
clarified what method your language picked.</p>
<h2>Ceiling and flooring</h2>
<p>The integer division is symmetrical around <code>0</code> but <code>ceil</code> and <code>floor</code> aren't,
so we need a way to get the sign in order to branch in one direction or another.
If <code>a</code> and <code>b</code> have the same sign, then <code>a/b</code> is positive, otherwise it's
negative. This is well expressed with a <code>xor</code> operator, so we will be using the
sign of <code>(a^b)</code> (where <code>^</code> is a <code>xor</code> operator). Of course we only need to
<code>xor</code> the sign bit so we could instead use <code>(a<0)^(b<0)</code> but it is a bit more
complex.</p>
<p><strong>Edit</strong>: note that <code>(a^b)</code> is not <code>> 0</code> when <code>a == b</code>. Also, as <a href="https://lobste.rs/s/eggk4l/figuring_out_round_floor_ceil_with#c_okgqlh">pointed out
on lobste.rs</a> it's likely to rely on unspecified /
implementation-defined behavior (hopefully not undefined behaviour). We could
use the safer <code>(a<0)^(b<0)</code> form which only generates an extra shift
instruction on x86.</p>
<p>Looking at the graphics, we observe the following symmetries:</p>
<ul>
<li><code>floor(x)</code>:
<ul>
<li>For positive <code>x</code>, the C division works the same</li>
<li>For negative <code>x</code>, the C division is one step too high (with the exception
of the stitching point)</li>
</ul>
</li>
<li><code>ceil(x)</code>
<ul>
<li>For negative <code>x</code>, the C division works the same</li>
<li>For positive <code>x</code>, the C division is one step too low (with the exception
of the stitching point)</li>
</ul>
</li>
</ul>
<p>We can translate these observations into code using a modulo trick (whose
purpose is to <strong>not</strong> offset the stitching point when the division is exact):</p>
<pre><code class="language-c">int div_floor(int a, int b) { return a/b - (a%b!=0 && (a^b)<0); }
int div_ceil(int a, int b) { return a/b + (a%b!=0 && (a^b)>0); }
</code></pre>
<p>One may wonder about the double division (<code>a/b</code> and <code>a%b</code>), but fortunately CPU
architectures usually offer a division instruction that computes both at once
so this is not as expensive as it might seem at first glance.</p>
<p>Now you also have an alternative without the modulo, but it generates less
efficient code (at least here on <code>x86-64</code> with a modern CPU according to my
benchmarks):</p>
<pre><code class="language-c">int div_floor(int a, int b) { return (a^b)<0 && a ? (1-abs(a))/abs(b)-1 : a/b; }
int div_ceil(int a, int b) { return (a^b)>0 && a ? (abs(a)-1)/abs(b)+1 : a/b; }
</code></pre>
<p><strong>Edit</strong>: note that these versions suffer from undefined behaviour in case of
<code>abs(INT_MIN)</code>, as pointed out by <code>nortti</code> in the earlier comment about <code>xor</code>.</p>
<p>I have no hard proof to provide for these right now, so this is left as an
exercise to the reader, but some tools can be found in <em>Concrete Mathematics
(2nd ed)</em> by Ronald L. Graham, Donald E. Knuth and Oren Patashnik. In
particular:</p>
<ul>
<li>the reflection properties: <code>⌊-x⌋ = -⌈x⌉</code> and <code>⌈-x⌉ = -⌊x⌋</code></li>
<li><code>⌈n/m⌉ = ⌊(n-1)/m⌋+1</code> and <code>⌊n/m⌋ = ⌈(n+1)/m⌉-1</code></li>
</ul>
<h2>Rounding</h2>
<p>The <code>round()</code> function is the most useful one when trying to approximate float
operations with integers (typically what I was looking for initially:
converting an algorithm into a bit-exact one).</p>
<p>We are going to study the positive case only at first, and try to define it
according to the integer C division (just like we did for <code>floor</code> and <code>ceil</code>).
Since we are on the positive side, the division is equivalent to a <code>floor()</code>,
which simplifies a bunch of things.</p>
<p>I initially used a <code>round</code> function defined as <code>round(a,b) = (a+b/2)/b</code> and
thought to myself "if we are improving the accuracy of the division by <code>b</code>
using a <code>b/2</code> offset, why shouldn't we also improve the accuracy of <code>b/2</code> by
doing <code>(b+1)/2</code> instead?" Very proud of my deep insight, I went on with this,
until I realized it was causing more off-by-one errors (with a bias always in the
same direction). So <strong>don't do that</strong>, it's wrong; we will instead try to find
the appropriate formula.</p>
<p>Looking at the <code>round</code> function we can make the observation that it's pretty
much the <code>floor()</code> function with the <code>x</code> offset by <code>½</code>: <code>round(x) = floor(x+½)</code></p>
<p>So we have:</p>
<pre><code class="language-plaintext">round(a/b) = ⌊a/b + ½⌋
= ⌊(2a+b)/(2b)⌋
</code></pre>
<p>We could stop right here but this suffers from overflow limitations if
translated into C. We are lucky though, because we're about to discover the
most mind-blowing property of integer division:
<p><img src="http://blog.pkh.me/img/intdiv/nested-division.png" alt="centerimg" /></p>
<p>This again comes from <em>Concrete Mathematics (2nd ed)</em>, page 72.</p>
<p>You may not immediately realize how insane and great this is, so let me
elaborate: it basically means <code>N</code> successive truncating divisions can be merged
into one <strong>without loss of precision</strong> (and the other way around).</p>
<p>Here is a concrete example:</p>
<pre><code class="language-python">>>> n = 5647817612937
>>> d = 712
>>> n//d//d//d == n//(d*d*d)
True
</code></pre>
<p>That's great but how does that help us? Well, we can do this now:</p>
<pre><code class="language-plaintext">round(a/b) = ⌊a/b + ½⌋
= ⌊(2a+b)/(2b)⌋
= ⌊⌊(2a+b)/2⌋/b⌋ <--- applying the nested division property to split in 2 floor expressions
= ⌊⌊a+b/2⌋/b⌋
= ⌊(a+⌊b/2⌋)/b⌋
</code></pre>
<p>How cute is that, we're back to the original formula I was using: <code>round(a,b) = (a+b/2)/b</code> (because again the C division is equivalent to <code>floor()</code> for
positive values).</p>
<p>Now how about the negative version, that is when <code>a/b < 0</code>? We can make a
similar observation: for a negative <code>x</code>, <code>round(x) = ceil(x-½)</code>, so we
have:</p>
<pre><code class="language-plaintext">round(a/b) = ⌈a/b - ½⌉
= ⌈(2a-b)/(2b)⌉
= ⌈⌈(2a-b)/2⌉/b⌉
= ⌈⌈a-b/2⌉/b⌉
= ⌈(a-⌈b/2⌉)/b⌉
</code></pre>
<p>And since <code>a/b</code> is negative, the C division is equivalent to <code>ceil()</code>. So in
the end we simply have:</p>
<pre><code class="language-c">int div_round(int a, int b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }
</code></pre>
<p>This is the generic version, but of course in many cases we can (and probably
should) simplify the expression appropriately.</p>
<p>Let's say for example we want to remap a <code>u16</code> to a <code>u8</code>:
<code>remap(x,0,0xff,0,0xffff) = x*0xff/0xffff = x/257</code>. The appropriate way to
round this division is simply: <code>(x+257/2)/257</code>, or just: <code>(x+128)/257</code>.</p>
<p><strong>Edit</strong>: it was pointed out several times on <a href="https://news.ycombinator.com/item?id=33751236">HackerNews</a> that
this function still suffers from overflows. It remains more robust, though, than
the previous version with <code>×2</code>.</p>
<h2>Bonus: partial round-half-to-even rounding</h2>
<p>Equivalent to <code>lrintf</code>, this function provided by <a href="https://mathstodon.xyz/@antopatriarca/109408606503586148">Antonio on
Mastodon</a> can be used:</p>
<pre><code class="language-c">static int div_lrint(int a, int b)
{
const int d = a/b;
const int m = a%b;
return m < b/2 + (b&1) ? d : m > b/2 ? d + 1 : (d + 1) & ~1;
}
</code></pre>
<p><strong>Warning</strong>: this only works with positive values.</p>
<h2>Verification</h2>
<p>Since you should definitely not trust my math nor my understanding of
computers, here is a test code demonstrating the correctness of the formulas:</p>
<pre><code class="language-c">#include <stdio.h>
#include <math.h>
static int div_floor(int a, int b) { return a/b - (a%b!=0 && (a^b)<0); }
static int div_ceil(int a, int b) { return a/b + (a%b!=0 && (a^b)>0); }
static int div_round(int a, int b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; }
#define N 3000
int main()
{
for (int a = -N; a <= N; a++) {
for (int b = -N; b <= N; b++) {
if (!b)
continue;
const float f = a / (float)b;
const int ef = (int)floorf(f);
const int er = (int)roundf(f);
const int ec = (int)ceilf(f);
const int of = div_floor(a, b);
const int or = div_round(a, b);
const int oc = div_ceil(a, b);
const int df = ef != of;
const int dr = er != or;
const int dc = ec != oc;
if (df || dr || dc) {
fprintf(stderr, "%d/%d=%g%s\n", a, b, f, (a ^ b) < 0 ? " (diff sign)" : "");
if (df) fprintf(stderr, "floor: %d ≠ %d\n", of, ef);
if (dr) fprintf(stderr, "round: %d ≠ %d\n", or, er);
if (dc) fprintf(stderr, "ceil: %d ≠ %d\n", oc, ec);
}
}
}
return 0;
}
</code></pre>
<h2>Conclusion</h2>
<p>These trivial code snippets have proven to be extremely useful to me so far,
and I hope they will benefit others as well. I spent an
unreasonable amount of time on this issue, and given the amount of mistakes (or
at the very least non optimal code) I've observed in the wild, I'm most
certainly not the only one being confused about all of this.</p>
http://blog.pkh.me/p/35-investigating-why-steam-started-picking-a-random-font.html
http://blog.pkh.me/p/35-investigating-why-steam-started-picking-a-random-font.html
Investigating why Steam started picking a random fontFri, 18 Nov 2022 22:17:04 -0000<p>Out of the blue my Steam started picking a random font I had in my user fonts
dir: <a href="https://github.com/excalidraw/virgil/">Virgil</a>, the <a href="https://excalidraw.com/">Excalidraw</a> font.</p>
<p><img src="http://blog.pkh.me/img/steam-font-broken.png" alt="centerimg" /></p>
<p>That triggered all sorts of emotions in me, ranging from laughter to total
incredulity. I initially thought the root cause was a random derping from Valve
but the Internet seemed quiet about it, so the unreasonable idea that it might
have been my fault surfaced.</p>
<p>To understand how it came to this, I have to tell you about <a href="https://store.steampowered.com/app/221910/The_Stanley_Parable/">The Stanley
Parable</a>, an incredibly funny game I highly recommend. One of the
achievements of the game is to not play it for 5 years.</p>
<p>To get it, I disabled NTP, changed my system clock to 2030, started the game,
enjoyed my achievement and restored NTP. So far so good, mission is a success,
I can move on with my life.</p>
<p>But not satisfied with this first victory I soon wanted to achieve the same in
<a href="https://store.steampowered.com/app/1703340/The_Stanley_Parable_Ultra_Deluxe/">the Ultra Deluxe</a> edition. This one comes with the same
achievement, except it's 10 years instead of 5. Since 2022+10 is too hard of a
mental calculation for me I rounded it up to 2040, and followed the same
procedure as previously. Achievement unlocked, easy peasy.</p>
<p>Problem is, Steam accessed many files during that short lapse of time, which
caused them to have their access time updated to 2040. And you know what's
special about 2040? It's <strong>after 2038</strong>.</p>
<p>Get it yet? Here is a hint: <a href="https://en.wikipedia.org/wiki/Year_2038_problem">Year 2038 problem</a>.</p>
<p>This is the kind of error I was seeing in the console: <code>"/usr/share/fonts": Value too large for defined data type</code>.</p>
<p>What kind of error could that be?</p>
<pre><code class="language-shell">% errno -s "Value too large"
EOVERFLOW 75 Value too large for defined data type
</code></pre>
<p>Nice, so we're triggering an overflow somewhere. More precisely, the 32-bit
fontconfig (or rather some code underneath it) was going crazy because of this:</p>
<pre><code class="language-shell">% stat /etc/fonts/conf.d/*|grep 2040
Access: 2040-11-22 00:00:04.110328309 +0100
Access: 2040-11-22 00:00:04.110328309 +0100
Access: 2040-11-22 00:00:04.110328309 +0100
...
</code></pre>
<p>In order to fix this mess I had to be a bit brutal:</p>
<pre><code class="language-shell">% sudo mount -o remount,strictatime /
% sudo mount -o remount,strictatime /home
% sudo find / -newerat 2039-12-31 -exec touch -a {} +
% sudo mount -o remount,relatime /
% sudo mount -o remount,relatime /home
</code></pre>
<p>The remounts were needed because <code>relatime</code> is the default, which means file
accesses get updated only if the current time is past the access time. And I
had to remount both my root and home partition because Steam touched files
everywhere.</p>
<p>Not gonna lie, this self-inflicted bug brought quite a few life lessons to me:</p>
<ul>
<li>The Stanley Parable meta-game has no limit to madness</li>
<li>2038 is going to be a lot of fun</li>
<li>32-bit games preservation is in a sad state of affairs</li>
</ul>
http://blog.pkh.me/p/34-exploring-intricate-execution-mysteries-by-reversing-a-crackme.html
http://blog.pkh.me/p/34-exploring-intricate-execution-mysteries-by-reversing-a-crackme.html
Exploring intricate execution mysteries by reversing a crackmeThu, 27 Oct 2022 10:04:29 -0000<p>It's been a very long time since I've done some actual reverse engineering
work. Going through a difficult period currently, I needed to take a break from
the graphics world and go back to the roots: understanding obscure or
elementary tech stuff. One may argue that it was most certainly not the best
way to deal with a burnout, but apparently that was what I needed at that
moment. Put on your black hoodie and follow me, it's gonna be fun.</p>
<h2>The beginning and the start of the end</h2>
<p>So I started solving a few crackmes from <a href="https://crackmes.one">crackmes.one</a> to get the hang of
it. Most were solved in a relatively short time window, until I came across
<a href="https://crackmes.one/crackme/615888be33c5d4329c344f66">JCWasmx86's cm001</a>. I initially thought the most interesting part was
going to be reversing the key verification algorithm, and I couldn't be more
wrong. This article will be focusing on various other aspects (while still
covering the algorithm itself).</p>
<h2>The validation function</h2>
<p>After loading the executable into <a href="https://github.com/NationalSecurityAgency/ghidra">Ghidra</a> and following the entry
point, we can identify the <code>main</code> function quickly. A few renames later we
figure out that it's a pretty straightforward function (code adjusted manually
from the decompiled view):</p>
<pre><code class="language-c">int main(void)
{
char input[64+1] = {0};
puts("Input:");
fgets(input, sizeof(input), stdin);
validate_input(input, strlen(input));
return 0;
}
</code></pre>
<p>The <code>validate_input()</code> function on the other hand is quite a different beast.
According to the crackme description we can expect some parts written in
assembly. And indeed, it's hard to make Ghidra generate a sane decompiled code
out of it. For that reason, we are going to switch to a graph view
representation.</p>
<p>I'm going to use <a href="https://cutter.re/">Cutter</a> for… aesthetic reasons. Here it is, with a
few annotations to understand what is actually happening:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/validate_input.png" alt="centerimg" /></p>
<p>To summarize, we have a 64-byte-long input, split into 4 lanes of data, which
are followed by a series of checks. This flow is very odd for several reasons
though:</p>
<ol>
<li>We don't see any exit here: it basically ends with a division, and all other
exits lead to <code>failed_password</code> (the function that displays the error). What
we also don't see in the graph is that after the last instruction (<code>div</code>,
<code>Oddity #1</code>), the code falls through into the <code>failed_password</code> code, just
like the other exit code paths</li>
<li>We see an explicit check only for the first and second lanes, the 2 others
are somehow used in the division, but even there, only slices of them are
used, the rest is stored at some random global location (in the <code>.bss</code>, at
<code>0x4040b0</code> and <code>0x4040a8</code> respectively)</li>
<li>128 bits of data are stored at <code>0x4040b0</code> (<code>Oddity #0</code>): we'll see later why
this is strange</li>
</ol>
<p>The only way I would see this flow go somewhere else would be some sort of
exception/interruption. Looking through all the instructions again, the only
one I see causing anything like this would be the last <code>div</code> instruction, with
a floating point exception. But how could that even be caught and handled? We
didn't see anything about it in the <code>main</code> or in the validate function.</p>
<p>At some point, something grabbed my attention:</p>
<pre><code class="language-plaintext">Relocation section '.rela.plt' at offset 0x598 contains 6 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000404018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0
000000404020 000200000007 R_X86_64_JUMP_SLO 0000000000000000 write@GLIBC_2.2.5 + 0
000000404028 000300000007 R_X86_64_JUMP_SLO 0000000000000000 strlen@GLIBC_2.2.5 + 0
000000404030 000500000007 R_X86_64_JUMP_SLO 0000000000000000 fgets@GLIBC_2.2.5 + 0
000000404038 000600000007 R_X86_64_JUMP_SLO 0000000000000000 signal@GLIBC_2.2.5 + 0
000000404040 000800000007 R_X86_64_JUMP_SLO 0000000000000000 exit@GLIBC_2.2.5 + 0
</code></pre>
<p>There is a <code>signal</code> symbol in the relocation section, so there must be code
somewhere calling this function, and it must certainly happen before the
<code>main</code>. Tracing back the function usage from Ghidra lands us here (again, code
reworked from its decompiled form):</p>
<pre><code class="language-c">void _INIT_1(void)
{
signal(SIGFPE, handle_fpe);
return;
}
</code></pre>
<p>But how does this function end up being called?</p>
<h2>Program entry point</h2>
<p>At this point I needed to dive quite extensively into the Linux program startup
procedure in order to understand what the hell was going on. I didn't need to
understand it all during the reverse, but I came back to it later on to clarify
the situation. I'll try to explain the best I can how it essentially works
because it's probably the most useful piece of information I got out of this
experience. Brace yourselves.</p>
<h3>Modern (glibc ≥ 2.34, 2021 and later)</h3>
<p>On a Linux system with a modern glibc, if we try to compile <code>int main(){return 0;}</code> into an ELF binary (<code>cc test.c -o test</code>), the file <code>crt1.o</code> (for <em>C
runtime</em>) or one of its variants such as <code>Scrt1.o</code> (<code>S</code> for "shared") is
linked into the final executable by the toolchain linker. These object files
are distributed by our libc package, glibc being the most common one.</p>
<p>They contain the real entry point of the program, identified by the label
<code>_start</code>. Their bootstrap code is actually fairly short:</p>
<pre><code class="language-shell">% objdump -d -Mintel /usr/lib/Scrt1.o
/usr/lib/Scrt1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
0: f3 0f 1e fa endbr64
4: 31 ed xor ebp,ebp
6: 49 89 d1 mov r9,rdx
9: 5e pop rsi
a: 48 89 e2 mov rdx,rsp
d: 48 83 e4 f0 and rsp,0xfffffffffffffff0
11: 50 push rax
12: 54 push rsp
13: 45 31 c0 xor r8d,r8d
16: 31 c9 xor ecx,ecx
18: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 1f <_start+0x1f>
1f: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 25 <_start+0x25>
25: f4 hlt
</code></pre>
<p>If we look closely at the assembly above, we notice it's a skeleton with a few
placeholders. More specifically the <code>call</code> argument and the <code>rdi</code> register just
before. These are respectively going to be replaced at link time with a call to
the <code>__libc_start_main()</code> function, and a pointer to the <code>main</code> function. Using
<code>objdump -r</code> clarifies these relocation entries:</p>
<pre><code class="language-plaintext"> 18: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 1f <_start+0x1f>
1b: R_X86_64_REX_GOTPCRELX main-0x4
1f: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 25 <_start+0x25>
21: R_X86_64_GOTPCRELX __libc_start_main-0x4
</code></pre>
<p>Note that <code>__libc_start_main()</code> is an external function: it is located inside
the glibc itself (typically <code>/usr/lib/libc.so.6</code>).</p>
<p>Put more simply, what this code essentially does is jump
straight into the libc by calling <code>__libc_start_main(main, <a few other args>)</code>. That function will be responsible for calling <code>main</code> itself, using the
transmitted pointer.</p>
<p>Why not call the <code>main</code> directly? Well, there might be some stuff to initialize
before the <code>main</code>: either in externally linked libraries, or simply through
constructors.</p>
<p>Here is an example of a C code with such a construct:</p>
<pre><code class="language-c">#include <stdio.h>
__attribute__((constructor))
static void ctor(void)
{
printf("ctor\n");
}
int main()
{
printf("main\n");
return 0;
}
</code></pre>
<pre><code class="language-shell">% cc test.c -o test && ./test
ctor
main
</code></pre>
<p>In this case, a pointer to <code>ctor</code> is stored in a table in one of the ELF
sections: <code>.init_array</code>. At some point in <code>__libc_start_main()</code>, all the
functions of that array are going to be called one by one.</p>
<p>With this executable loaded into Ghidra, we can observe this table at that
particular section:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/init_array_ctor_example.png" alt="centerimg" /></p>
<p>So basically a table of 2 function pointers, the latter being our custom <code>ctor</code>
function.</p>
<p>How that code is able to access the ELF header is a story for another time.
Similarly, even though related, I'm going to skip details about the dynamic
linker. I'll just point out that the program has an <code>.interp</code> section with a
string such as <code>"/lib64/ld-linux-x86-64.so.2"</code> identifying the dynamic linker
to use (which is also an ELF program, see <code>man ld.so</code> for more information).
This program is actually executed before our <code>main</code> as well since it is
responsible for loading the dynamic libraries.</p>
<h3>Legacy (glibc < 2.34)</h3>
<p>So far we've seen how a modern program is built and started, but it wasn't
always exactly like this. It actually changed "recently", with glibc 2.34 in
2021. We have to study how it was before because the crackme we're interested
in is actually compiled in these pre-2.34 conditions. The patterns we get don't
match the modern construct we just observed.</p>
<p>If we look at how the <code>Scrt1.o</code> of glibc was before 2.34, we get the following:</p>
<pre><code class="language-plaintext">0000000000000000 <_start>:
0: 31 ed xor ebp,ebp
2: 49 89 d1 mov r9,rdx
5: 5e pop rsi
6: 48 89 e2 mov rdx,rsp
9: 48 83 e4 f0 and rsp,0xfffffffffffffff0
d: 50 push rax
e: 54 push rsp
f: 4c 8b 05 00 00 00 00 mov r8,QWORD PTR [rip+0x0] # 16 <_start+0x16>
16: 48 8b 0d 00 00 00 00 mov rcx,QWORD PTR [rip+0x0] # 1d <_start+0x1d>
1d: 48 8b 3d 00 00 00 00 mov rdi,QWORD PTR [rip+0x0] # 24 <_start+0x24>
24: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 2a <_start+0x2a>
2a: f4 hlt
</code></pre>
<p>It's pretty similar to what we've seen before but we can see more relocation
entries (see the <code>r8</code> and <code>rcx</code> registers). A grasp of the x86-64 calling
convention is going to be helpful here: a function is expected to read its
arguments in the following register order: <code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>rcx</code>, <code>r8</code>,
<code>r9</code> (assuming no floats). In the dump above we can actually see all these
registers being loaded before the <code>call</code> instruction, so they're very likely
preparing the arguments for that <code>__libc_start_main</code> call.</p>
<p>At this point, we need to know more about <code>__libc_start_main</code>'s actual prototype.
Looking on the web for it, we may land on such a page:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/libc_start_main_web.png" alt="centerimg" /></p>
<p><strong>This is extremely outdated</strong>. It is actually a prototype from a long time
ago when the <code>init</code> function passed as argument didn't receive any parameter.
The prototype for <code>__libc_start_main</code> in glibc now looks like this (extracted,
tweaked and commented for clarity from <code>glibc/csu/libc-start.c</code>):</p>
<pre><code class="language-c">int __libc_start_main(
int (*main)(int, char **, char ** MAIN_AUXVEC_DECL), /* RDI */
int argc, /* RSI */
char **argv, /* RDX */
__typeof (main) init, /* RCX */
void (*fini)(void), /* R8 */
void (*rtld_fini)(void), /* R9 */
void *stack_end /* RSP (stack pointer) */
)
</code></pre>
<p>The <code>init</code> parameter now matches the prototype of the <code>main</code>. For those
interested in archaeology, this is true <a href="https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2b089f2101801ca2a3295fcd755261288ce6268e">since 2003</a>, which I
believe is around the Palaeolithic period.</p>
<p>Going back to our <code>__libc_start_main()</code> call at the entry point: there are now 2
extra arguments compared to the modern version: <code>rcx</code> (the <code>init</code> argument) and
<code>r8</code> (the <code>fini</code> argument). These are going to point to two
functions called <code>__libc_csu_init</code> and <code>__libc_csu_fini</code> respectively. In
Ghidra if the binary is not stripped we observe the following:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/libc_csu_fini_init.png" alt="centerimg" /></p>
<p>Now here is the trick: where do you think these functions are located? One may
expect to have them in the glibc, just like <code>__libc_start_main</code>, but that's not
the case. They are actually embedded within our ELF binary. The reason for this
is still unclear to me.</p>
<p>The mechanism of injecting that code inside the binary was also a mystery to
me: while the canonical <code>crt1.o</code> mechanism has been followed by build toolchains
since forever, that object doesn't contain <code>__libc_csu_init</code> and
<code>__libc_csu_fini</code>. So where the hell do they even come from? Well, here is the
magic trick (thank you <code>strace</code>):</p>
<pre><code class="language-shell">% file /lib/libc.so
/lib/libc.so: ASCII text
% cat /lib/libc.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /usr/lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED ( /usr/lib/ld-linux-x86-64.so.2 ) )
</code></pre>
<p>That's right: just as <code>ld.so</code> is deceptively a program, <code>libc.so</code> is a linker
script. We see it instructing the linker to use <code>libc_nonshared.a</code>, which is
another file distributed by the glibc, containing a bunch of functions, notably
<code>__libc_csu_init</code> and <code>__libc_csu_fini</code>. This means that thanks to this script,
this static non-shared archive, containing yet another batch of weird init
routines, is dumped into every dynamically linked ELF executable. I'm still
having a hard time processing this.</p>
<p>Note that <code>libc_nonshared.a</code> still exists in the modern setup (as of 2.36 at
least), but it's much smaller and doesn't have those functions anymore.</p>
<p>So what are these functions doing? Well, they're responsible for calling the
pre- and post-<code>main</code> functions, just like <code>__libc_start_main</code> is doing in its
modern setup. Here is what they looked like before getting removed in glibc
2.34 (extracted and simplified from <code>glibc/csu/elf-init.c</code> in 2.33):</p>
<pre><code class="language-c">void __libc_csu_init (int argc, char **argv, char **envp)
{
_init ();
const size_t size = __init_array_end - __init_array_start;
for (size_t i = 0; i < size; i++)
(*__init_array_start [i]) (argc, argv, envp);
}
void __libc_csu_fini (void)
{
_fini ();
}
</code></pre>
<p><strong>Note</strong>: CSU likely stands for "C Start Up" or "Canonical Start Up".</p>
<p>The <a href="https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=035c012e32c11e84d64905efaf55e74f704d3668">commit removing these functions</a> is actually pretty damn
interesting and we can learn a lot from it:</p>
<ol>
<li>it has security implications: the ROP gadgets referred to are basically
snippets of instructions that are useful for exploitation, having them in
the binary is a liability</li>
<li><code>__libc_start_main()</code> kept its prototype for backward compatibility, so
<code>init</code> and <code>fini</code> arguments are still there, just passed as <code>NULL</code> (look at
the 2 <code>xor</code> instructions in the modern <code>Scrt1.o</code> shared earlier)</li>
<li>the forward compatibility on the other hand is not possible: we can run an
old executable on a modern system, but we cannot run a modern executable on
an old system</li>
</ol>
<p>With all that new knowledge we are now armed to decipher the startup mechanism
of our crackme.</p>
<h2>Within Ghidra</h2>
<p>After analysis, the entry point of our crackme looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/ghidra_entry.png" alt="centerimg" /></p>
<p>We recognize the <code>_start</code> pattern of our <code>crt1.o</code>. More specifically, we can
see that it's loading 2 pointers in <code>rcx</code> and <code>r8</code>, so we know we're in the
pattern pre-2018:</p>
<ul>
<li><code>r8</code>: <code>FUN_00401730</code> is <code>__libc_csu_fini</code></li>
<li><code>rcx</code>: <code>FUN_004016c0</code> is <code>__libc_csu_init</code></li>
<li><code>rdi</code>: <code>LAB_004010b0</code> is <code>main</code></li>
</ul>
<p>If we want to find the custom inits, we have to follow <code>__libc_csu_init</code>, where
we can see it matching the snippet shared earlier, except <code>__init_array_start</code>
is named <code>__DT_INIT_ARRAY</code> but is still located in the <code>.init_array</code> ELF section.
And in that table, we find again our init callbacks:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/init_array.png" alt="centerimg" /></p>
<p><code>_INIT_0</code> corresponds to <code>frame_dummy</code>, and <code>_INIT_1</code> is the first user
constructor. So just like <code>ctor</code> in sample C code, we are interested in what's
happening in <code>_INIT_1</code>, which is the function shown earlier calling <code>signal</code>.</p>
<p>Of course, someone familiar with this pattern will go straight into the
<code>.init_array</code> section, but with crackmes you never know if they're actually
going to follow the expected path, so it's a good thing to be familiar with the
complete execution path.</p>
<h2>Going deeper, uncovering Ghidra bugs</h2>
<p>We could stop our research on the init procedure here but I have to make a
detour to talk about some unfortunate things in x86-64 and Ghidra (as of
10.1.5).</p>
<p>If we look at the decompiler view of the entry point, we see a weird prototype:</p>
<pre><code class="language-c">void entry(undefined8 param_1,undefined8 param_2,undefined8 param_3)
{
    /* ... */
}
</code></pre>
<p>The thing is, when a program entry point is called, it's not supposed to have 3
arguments like that. According to glibc <code>sysdeps/x86_64/start.S</code> (which is the
source of <code>crt1.o</code>), here are the actual inputs for <code>_start</code>:</p>
<pre><code class="language-plaintext">This is the canonical entry point, usually the first thing in the text
segment.  The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
point runs, most registers' values are unspecified, except for:

%rdx         Contains a function pointer to be registered with `atexit'.
             This is how the dynamic linker arranges to have DT_FINI
             functions called for shared libraries that have been loaded
             before this code runs.

%rsp         The stack contains the arguments and environment:
             0(%rsp)                  argc
             LP_SIZE(%rsp)            argv[0]
             ...
             (LP_SIZE*argc)(%rsp)     NULL
             (LP_SIZE*(argc+1))(%rsp) envp[0]
             ...
</code></pre>
<p>Basically, only the <code>rdx</code> register is expected to be set (along with the stack
through <code>rsp</code>); the program entry function usually forwards it down to
<code>__libc_start_main</code> (as the <code>rtld_fini</code> argument), which itself passes it down
to <code>atexit</code>. You will find similar information in the kernel's ELF loader
code.</p>
<p>Do you remember the x86-64 calling convention from earlier? The function
arguments are passed in the following register order: <code>rdi</code>, <code>rsi</code>, <code>rdx</code>,
<code>rcx</code>, <code>r8</code>, <code>r9</code>. But as we just saw, the entry point code of the program is
expected to only read <code>rdx</code> (equivalent to the 3rd argument in the calling
convention), while the content of <code>rdi</code> and <code>rsi</code> is undefined. Since the
program entry point usually respects that (reading <code>rdx</code> to get <code>rtld_fini</code>),
Ghidra infers that the 1st and 2nd arguments must also exist, and gets confused
when <code>rdi</code> and <code>rsi</code> are actually overwritten to set up the call to
<code>__libc_start_main</code> instead.</p>
<p>Now one may ask: why even use <code>rdx</code> in the first place if it conflicts with the
calling convention? Well, on 32-bit it uses <code>edx</code>, which makes a little more
sense since it doesn't overlap with the calling convention: all the
function arguments are expected to be on the stack on 32-bit. And during the
move to 64-bit they unfortunately just extended <code>edx</code> into <code>rdx</code>.</p>
<p>While not immediately problematic, I still don't know why they decided to use
<code>edx</code> on 32-bit in the kernel instead of the stack though; apparently this is
described in "SVR4/i386 ABI (pages 3-31, 3-32)" but I couldn't find much
information about it.</p>
<p>Anyway, all of this to say that until the NSA fixes <a href="https://github.com/NationalSecurityAgency/ghidra/issues/4667">the bug</a>, I'd
recommend overriding the <code>_start</code> prototype: <code>void entry(undefined8 param_1,undefined8 param_2,undefined8 param_3)</code> should become <code>void _start(void)</code>,
and you should expect the code to read the <code>rdx</code> register.</p>
<h2>Remaining bits of the algorithm</h2>
<p>Alright, so we're back to our previous flow. Assuming the division raised a
floating point error, we're following the callback forwarded to <code>signal()</code>, and
we end up at another location, which after various renames and retypings in the
Ghidra decompiler looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/handle_fpe.png" alt="centerimg" /></p>
<p>I'll spare you the details since it's an overly complex implementation of a
very simple routine:</p>
<ol>
<li>read the 2 halves of registers stored earlier (remember half of <code>lane2</code> and
<code>lane3</code> were stored for later use, here is where we read them back)</li>
<li>check that those are different</li>
<li>for each half, compute a checksum: slice the data into nibbles (4 bits),
permute each nibble value through a simple table, and sum the results</li>
<li>check that the checksums are the same</li>
</ol>
<p>And that's pretty much it.</p>
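<p>As a side note, the "simple table" in step 3 is in fact a 4-bit popcount
table (this can be checked against the permutation table in the keygen shared
further down). A minimal Python sketch of that checksum:</p>
<pre><code class="language-python"># The nibble checksum described above: permute each nibble through a popcount
# table, recombine the two halves of the byte, and sum over all bytes
PERMUTES4 = [bin(n).count("1") for n in range(16)]  # 0,1,1,2,1,2,2,3,...

def chksum4(data: bytes) -> int:
    return sum(PERMUTES4[b >> 4] << 4 | PERMUTES4[b & 0xF] for b in data)
</code></pre>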
<p>Now we roughly know how the 64 bytes of input are read and checked. There is
one thing we need to study more though: the <code>div</code> instruction.</p>
<h2>Oddity #1: the division</h2>
<p>We need to understand how the <code>div</code> instruction works since it's the trigger to
our success path. Here is what the relevant Intel documentation says about it:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/div.png" alt="centerimg" /></p>
<p>In English this means that if we have <code>div rbx</code>, then the registers <code>rdx</code> and
<code>rax</code> are combined together to form a single 128-bit value, which is then
divided by <code>rbx</code>.</p>
<p>As a reminder, the chunk doing the division looks like this:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/div_asm.png" alt="centerimg" /></p>
<ul>
<li>Our divisor is <code>rbx</code>, a large hardcoded number: <code>0xffff231203</code> (meaning the
exception cannot be a division by zero, but could be an overflow)</li>
<li><code>rax</code> contains the lower part of the <code>xmm3</code> register (the 4th lane) xored
with the higher part of the <code>xmm2</code> register (the 3rd lane)</li>
<li><code>rdx</code> contains… wait, what does it contain? We don't know.</li>
</ul>
<p>Looking through the code, the <code>rdx</code> value appears pretty much undefined. If it's
big enough, the result of the division won't fit in a 64-bit register
and will overflow, causing the floating point exception. Under "normal"
conditions this seems to happen, but if run through, say, <code>valgrind</code>, <code>rdx</code>
will be initialized to something else and the overflow won't be triggered.</p>
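<p>To make the overflow condition concrete, here is a small Python model of the
check the CPU performs (a sketch only; the actual #DE fault is delivered to the
process as <code>SIGFPE</code>):</p>
<pre><code class="language-python"># div rbx divides the 128-bit value rdx:rax by rbx; the CPU raises #DE when
# the quotient doesn't fit in 64 bits (or when the divisor is zero)
def div_overflows(rdx: int, rax: int, rbx: int) -> bool:
    if rbx == 0:
        return True  # division by zero also raises #DE
    dividend = (rdx << 64) | rax
    return dividend // rbx >= 1 << 64

# with the crackme divisor, a large leftover rdx easily triggers the fault...
assert div_overflows(1 << 40, 0, 0xffff231203)
# ...but a zeroed rdx does not
assert not div_overflows(0, 123, 0xffff231203)
</code></pre>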
<p>This is actually a bug, an undefined behaviour in the crackme. That's too bad
because the original idea was pretty good. But it also means we won't have to
think much about whatever data we put into that part of the input.</p>
<h2>Oddity #0</h2>
<p>One last oddity before we're ready to write a keygen: the <code>Oddity #0</code> is a
write of a 128-bit register at an address where only 64 bits are available,
located at the end of the <code>.bss</code> section. For some reason the code still works
so I'm assuming we are lucky thanks to some padding in the memory map…</p>
<p>The issue can actually easily be noticed because it drives the decompiler nuts
in that area:</p>
<p><img src="http://blog.pkh.me/img/re-cm001/invalid_write.png" alt="centerimg" /></p>
<p>If you patch the instruction from <code>xmmword ptr [0x004040b0],XMM1</code> to <code>xmmword ptr [0x004040a8],XMM1</code>, you'll observe everything going back to normal in the
decompiler view.</p>
<p>I later became aware of <a href="https://github.com/JCWasmx86/Crackme/">the source code of the crackme on GitHub</a>,
so I could see why the mistake happened in the first place. I <a href="https://github.com/JCWasmx86/Crackme/issues/2">reported the
issue</a> if you want more information on that topic.</p>
<h2>Writing the keygen</h2>
<p>Onto the final step: writing a keygen.</p>
<p>To summarize all the conditions that need to be met:</p>
<ol>
<li>the input must be 64 bytes long</li>
<li>xor'ing all the characters of the 1st lane together (after encoding with
the xor key) must give 0</li>
<li>the sum of all the characters of the 2nd lane must be equal to: <code>(lane0[11] ^ xor_key[11]) × 136 + 314</code></li>
<li>the first half of the 3rd lane and the 2nd half of the 4th lane must be
different</li>
<li>the sum of the permuted nibbles of the first half of the 3rd lane and the
2nd half of the 4th lane must be equal</li>
<li>the 2nd half of the 3rd lane and the 1st half of the 1st lane don't really
matter</li>
</ol>
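<p>Before going keygen, these conditions can be cross-checked with a small
validator; this is only a sketch following the lane layout and constants listed
above (lane0 is bytes 0-15, lane1 bytes 16-31, and so on):</p>
<pre><code class="language-python"># Hypothetical checker for the conditions summarized above
XOR_KEY = bytes.fromhex("64 47 34 36 72 73 6b 6a 38 2d 34 35 37 28 7e 3a")
POPCOUNT4 = [bin(n).count("1") for n in range(16)]

def nibble_chksum(data: bytes) -> int:
    return sum(POPCOUNT4[b >> 4] << 4 | POPCOUNT4[b & 0xF] for b in data)

def check(password: bytes) -> bool:
    if len(password) != 64:                            # condition 1
        return False
    lanes = [password[i * 16:(i + 1) * 16] for i in range(4)]
    xored = 0
    for c, x in zip(lanes[0], XOR_KEY):                # condition 2
        xored ^= c ^ x
    target = 136 * (lanes[0][11] ^ XOR_KEY[11]) + 314  # condition 3
    half2, half3 = lanes[2][:8], lanes[3][8:]          # conditions 4 and 5
    return (xored == 0 and sum(lanes[1]) == target
            and half2 != half3
            and nibble_chksum(half2) == nibble_chksum(half3))
</code></pre>
<p>Running it against the keys the keygen prints should return <code>True</code>.</p>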
<p>I don't think solving this part is the most interesting, particularly for a
reader, but I described the strategy I followed in the keygen code, so I'll
just share it as is:</p>
<pre><code class="language-python"># Range of allowed characters in the input; we'll use the xor key as part of
# the password so we're kind of constrained to its range
xor_key = bytes.fromhex("64 47 34 36 72 73 6b 6a 38 2d 34 35 37 28 7e 3a")
ord_min, ord_max = min(xor_key), max(xor_key)


def xor0(data: str) -> int:
    """Encode the data using the xor key"""
    assert len(data) == len(xor_key) == 16
    r = 0
    for c, x in zip(data, xor_key):
        r ^= ord(c) ^ x
    return r


def get_lane0(k11: str) -> str:
    """
    Compute lane0 of the input

    We have the following constraints on lane0:
    - the character at position 11 must be k11
    - xoring all characters must give 0
    - input characters must be within accepted range (self-imposed)

    Strategy:
    - start with the xor key itself because the xor reduce will give our
      perfect zero score
    - replace the 11th char with our k11 and figure out which bits get off
      because of it
    - go through each character to see if we can flip the off bits
    """
    lane0 = "".join(map(chr, xor_key))
    lane0 = lane0[:11] + k11 + lane0[12:]
    off = xor0(lane0)
    off_bits = [(1 << i) for i in range(8) if off & (1 << i)]
    fixed_lane0 = lane0
    for i, c in enumerate(lane0):
        if i == 11:
            continue
        remains = []
        for bit in list(off_bits):
            o = ord(c) ^ bit
            if ord_min <= o <= ord_max:
                c = chr(o)
            else:
                remains.append(bit)
        fixed_lane0 = fixed_lane0[:i] + c + fixed_lane0[i + 1 :]
        off_bits = remains
        if not off_bits:
            break
    assert not off_bits
    assert xor0(fixed_lane0) == 0
    return fixed_lane0


def get_lane1(t: int) -> str:
    # First estimate by taking the average
    avg_ord = t // 16
    assert ord_min <= avg_ord <= ord_max
    lane1 = [avg_ord] * 16

    # Adjust with off by ones to reach the target if necessary
    off = sum(lane1) - t
    if off:
        sgn = [-1, 1][off < 0]
        for i in range(abs(off)):
            lane1[i] += sgn
    assert sum(lane1) == t
    return "".join(map(chr, lane1))


def get_divdata():
    # The div data doesn't really matter, so we just use some slashes to carry
    # the division meaning
    d0 = d1 = "/" * 8
    return d0, d1


def chksum4(data: str) -> int:
    """nibble (4-bit) checksum"""
    permutes4 = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
    return sum(permutes4[ord(c) >> 4] << 4 | permutes4[ord(c) & 0xF] for c in data)


def get_chksums4():
    # We need the values to be different but the checksums to be the same, so
    # we simply interleave 2 working characters differently
    c0 = (chr(ord_min) + chr(ord_max)) * 4
    c1 = (chr(ord_max) + chr(ord_min)) * 4
    assert c0 != c1
    assert chksum4(c0) == chksum4(c1)
    return c0, c1


def get_passwords():
    # The user input key is composed of 4x16B, which will be referred to as 4
    # lanes: lane[0..3]. The character at lane0[11] defines what is going to be
    # the target T that S=sum(lane1) will need to reach. Here we compute all
    # potential T values that can be obtained within our range of characters.
    x11 = xor_key[11]
    allowed_ords = range(ord_min, ord_max + 1)
    all_t = {136 * (o ^ x11) + 314: o for o in allowed_ords}

    # Compute the extreme sums our input lane1 can reach and filter out T
    # values that land outside these boundaries
    min_t, max_t = ord_min * 16, ord_max * 16
    possible_t = {t: chr(k11) for t, k11 in all_t.items() if min_t <= t <= max_t}

    for t, k11 in possible_t.items():
        lane0 = get_lane0(k11)
        lane1 = get_lane1(t)
        d0, d1 = get_divdata()
        c0, c1 = get_chksums4()
        lane2 = c0 + d0
        lane3 = d1 + c1
        password = lane0 + lane1 + lane2 + lane3
        assert len(password) == 16 * 4
        yield password


for password in get_passwords():
    print(password)
</code></pre>
<p>It executes instantly and gives the following keys (the list is not exhaustive
of all possibilities):</p>
<pre><code class="language-shell">% python 615888be33c5d4329c344f66_cm001.py
aG46rskj8-407(~:??>>>>>>>>>>>>>>(~(~(~(~////////////////~(~(~(~(
`G46rskj8-417(~:6666666666555555(~(~(~(~////////////////~(~(~(~(
cG46rskj8-427(~:PPOOOOOOOOOOOOOO(~(~(~(~////////////////~(~(~(~(
bG46rskj8-437(~:GGGGGGGGGGFFFFFF(~(~(~(~////////////////~(~(~(~(
gG46rskj8-467(~:..--------------(~(~(~(~////////////////~(~(~(~(
hG46rskj8-497(~:zzzzzzzzzzyyyyyy(~(~(~(~////////////////~(~(~(~(
mG46rskj8-4<7(~:aa``````````````(~(~(~(~////////////////~(~(~(~(
lG46rskj8-4=7(~:XXXXXXXXXXWWWWWW(~(~(~(~////////////////~(~(~(~(
oG46rskj8-4>7(~:rrqqqqqqqqqqqqqq(~(~(~(~////////////////~(~(~(~(
nG46rskj8-4?7(~:iiiiiiiiiihhhhhh(~(~(~(~////////////////~(~(~(~(
</code></pre>
<p>All of these seem to be working keys. We can visually see how each segment
corresponds to a specific part of the algorithm. The keys are ugly, but at
least they're printable.</p>
<p>The trickiest part for me was to anticipate the range of guaranteed keys, due
to the dependency between <code>lane0</code> and <code>lane1</code>; the rest was relatively simple.</p>
<h2>Conclusion</h2>
<p>I didn't expect such a ride to be honest. There were just so many incentives to
dig down the rabbit hole of various intricacies. The bugs in the crackme caused
me a lot of confusion, but I don't think they come close to the obfuscation
level of glibc and its messy history of deceptive patterns.</p>
http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html
http://blog.pkh.me/p/33-deconstructing-be%CC%81zier-curves.html
Deconstructing Bézier curvesTue, 16 Aug 2022 06:29:19 -0000<p>Graphists, animators, game programmers, font designers, and other graphics
professionals and enthusiasts are often working with Bézier curves. They're
popular, extensively documented, and used pretty much everywhere. That being
said, I find that they are almost exclusively explained in 2 or 3 dimensions,
which can be a source of confusion in various situations. I'll try to
deconstruct them a bit further in this article. At the end of the post, we'll
conclude with a concrete example where this deconstruction is helpful.</p>
<h2>A Bézier curve in pop culture</h2>
<p>Most people first encounter Bézier curves through a UI that may look
like this:</p>
<p><img src="http://blog.pkh.me/img/bezier/b3-labels.png" alt="centerimg" /></p>
<p>In this case the curve is composed of 4 user controllable points, meaning it's
a Cubic Bézier.</p>
<p><code>C₀</code>, <code>C₁</code>, <code>C₂</code> and <code>C₃</code> are respectively the start point, the 2 control
points, and the end point, all as 2D coordinates. Evaluating this formula for all the <code>t</code> values within <code>[0;1]</code> gives
all the points of the curve. Simple enough.</p>
<p>Now this is obvious, but the important takeaway here is that this formula applies
<strong>to each dimension</strong>. Since we are working in 2D here, it is evaluated on both
the x and y-axis. As a result, a more explicit writing of the formula would be:</p>
<p><img src="http://blog.pkh.me/img/bezier/bezier-0.png" alt="centerimg" /></p>
<p><strong>Note</strong>: if we were working with Bézier in 3D space, the <code>C</code> vectors would be
in 3D as well.</p>
<p>Intuitively, you may start to see from the mathematical form how each point
contributes to the curve, but it involves some tricky mental gymnastics (at
least for me). So before diving into the multidimensional aspect, we will
simplify the problem by looking into lower degrees.</p>
<h2>Lower degrees</h2>
<p>As implied by its name, the <strong>Cubic</strong> curve <code>B₃(t)</code> is of the 3rd degree. The
2nd most popular curve is the <strong>Quadratic</strong> curve <code>B₂(t)</code> where instead of 2
control points, we only have one (<code>Q₁</code>, in the middle):</p>
<p><img src="http://blog.pkh.me/img/bezier/b2-labels.png" alt="centerimg" /></p>
<p>Can we go lower? Well, there is a "1st degree Bézier curve" but you won't hear
that term very often, because after removing the remaining control point:</p>
<p><img src="http://blog.pkh.me/img/bezier/b1-labels.png" alt="centerimg" /></p>
<p>The "curve" is now a simple line between the 2 points. Still, the concept of
interpolation between the points is consistent/symmetric with the cubic and the
quadratic.</p>
<p>Do you recognize the formula (see title of the figure)? Yes, this is <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">mix(),
one of the most useful math formulas</a>!
The contribution of each factor should make sense this time: <code>t</code> varies within
<code>[0;1]</code>, at <code>t=0</code> we have 100% of <code>L₀</code> (the starting point), at <code>t=1</code> we have
100% of <code>L₁</code>, in the middle at <code>t=½</code> we have 50% of each, etc. All intermediate
values of <code>t</code> define a straight line between these 2 points. We have a simple
linear interpolation.</p>
<p>The presence of this function in the 1st degree is not just a coincidence:
<strong>the <code>mix</code> function is actually the cornerstone of all the Bézier curves</strong>.
Indeed, we can build up the Bézier formulas using exclusively nested <code>mix()</code>:</p>
<ul>
<li><code>B₁(l₀,l₁,t) = mix(l₀,l₁,t)</code></li>
<li><code>B₂(q₀,q₁,q₂,t) = B₁(mix(q₀,q₁,t), mix(q₁,q₂,t), t)</code></li>
<li><code>B₃(c₀,c₁,c₂,c₃,t) = B₂(mix(c₀,c₁,t), mix(c₁,c₂,t), mix(c₂,c₃,t), t)</code></li>
</ul>
<p>This way of formulating the curves is basically <a href="https://en.wikipedia.org/wiki/De_Casteljau%27s_algorithm">De Casteljau's
algorithm</a>. You have no idea how much I love accidentally finding
yet again a relationship with my favourite mathematical function.</p>
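<p>As a sanity check, the nested construction above transcribes directly into a
few lines of Python:</p>
<pre><code class="language-python">def mix(a, b, t):
    return (1 - t) * a + t * b

def bezier3(c0, c1, c2, c3, t):
    # De Casteljau: three mixes, then two, then one
    q0, q1, q2 = mix(c0, c1, t), mix(c1, c2, t), mix(c2, c3, t)
    l0, l1 = mix(q0, q1, t), mix(q1, q2, t)
    return mix(l0, l1, t)
</code></pre>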
<p>But back to our "Bézier 1st degree", remember that we are still in 2D:</p>
<p><img src="http://blog.pkh.me/img/bezier/bezier-1.png" alt="centerimg" /></p>
<p>This multi-dimensional graphic representation can be problematic because it is
<strong>exclusively spatial</strong>: if one is interested in the <code>t</code> parameter, it has to
be extrapolated visually from a twisted curve using mind-bending powers, which
is not always practical.</p>
<h2>Mono-dimensional</h2>
<p>In order to represent <code>t</code>, we have to split each spatial dimension and draw
them according to <code>t</code> (defined within <code>[0;1]</code>).</p>
<p>Let's work this out with the following cubic curve (start point is
bottom-left):</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-2d.png" alt="centerimg" /></p>
<p>If we study this curve, we can see that the <code>x</code> is slightly decreasing, then
increasing for most of the curve, then slightly decreasing again. In
comparison, the <code>y</code> seems to be increasing, decreasing, then increasing again,
probably more strongly than with <code>x</code>. But can you tell for sure what their
respective curves actually look like precisely? I for sure can't, but my
computer can:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d.png" alt="centerimg" /></p>
<p>Just to be extra clear: the formula is unchanged, we're simply tracing the x
and y dimensions separately according to <code>t</code> instead of plotting the curve in a
xy plane. Note that this means <strong><code>C₀</code>, <code>C₁</code>, <code>C₂</code> and <code>C₃</code> can now only change
vertically</strong>: they are respectively placed at <code>t=0</code>, <code>t=⅓</code>, <code>t=⅔</code> and <code>t=1</code>.
The vertical axis corresponds to their value on their respective plane.</p>
<p>Similarly, with a quadratic we would have <code>Q₀</code> at <code>t=0</code>, <code>Q₁</code> at <code>t=½</code> and <code>Q₂</code>
at <code>t=1</code>.</p>
<p>So what's so great about this representation? Well, first of all the curves are
not going backward anymore, they can be understood by following a left-to-right
reading everyone is familiar with: there are no shenanigans involved in the
interpretation anymore. Also, we are now going to be able to work them out in
algebraic form.</p>
<h2>Polynomial form</h2>
<p>So far we've looked at the curves in their Bézier form, but they can also be
expressed in their polynomial form:</p>
<pre><code class="language-plaintext">B₁(t) = (1-t)·L₀ + t·L₁
= (-L₀+L₁)·t + L₀
= a₁t + b₁
</code></pre>
<pre><code class="language-plaintext">B₂(t) = (1-t)²·Q₀ + 2(1-t)t·Q₁ + t²·Q₂
= (Q₀-2Q₁+Q₂)·t² + (-2Q₀+2Q₁)·t + Q₀
= a₂t² + b₂t + c₂
</code></pre>
<pre><code class="language-plaintext">B₃(t) = (1-t)³·C₀ + 3(1-t)²t·C₁ + 3(1-t)t²·C₂ + t³·C₃
= (-C₀+3C₁-3C₂+C₃)·t³ + (3C₀-6C₁+3C₂)·t² + (-3C₀+3C₁)·t + C₀
= a₃t³ + b₃t² + c₃t + d₃
</code></pre>
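<p>A quick way to convince ourselves the cubic expansion is right is to compute
the coefficients from some control points and compare against the Bézier form
(the control point values below are made up for the check):</p>
<pre><code class="language-python">def cubic_coeffs(c0, c1, c2, c3):
    # coefficients a₃, b₃, c₃, d₃ from the expansion of B₃(t) above
    a = -c0 + 3*c1 - 3*c2 + c3
    b = 3*c0 - 6*c1 + 3*c2
    c = -3*c0 + 3*c1
    d = c0
    return a, b, c, d

def bezier3(c0, c1, c2, c3, t):
    return (1-t)**3*c0 + 3*(1-t)**2*t*c1 + 3*(1-t)*t**2*c2 + t**3*c3

a, b, c, d = cubic_coeffs(0.0, 0.8, 0.2, 1.0)
for t in (0.0, 0.3, 0.7, 1.0):
    assert abs((a*t**3 + b*t**2 + c*t + d) - bezier3(0.0, 0.8, 0.2, 1.0, t)) < 1e-12
</code></pre>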
<p>This algebraic form is great because we can now plug the formula into a
polynomial root finding algorithm in order to identify the roots. Let's study a
concrete use case of this.</p>
<h2>Concrete use case: intersecting ray</h2>
<p>A fundamental problem of text rendering is figuring out whether a given pixel
<code>P</code> lands inside or outside the character shape (which is composed of a chain
of Bézier curves). The most common algorithms (<a href="https://en.wikipedia.org/wiki/Nonzero-rule">non-zero rule</a> or
<a href="https://en.wikipedia.org/wiki/Even-odd_rule">even-odd rule</a>) involve a <em>ray</em> going from the pixel position into an
arbitrary direction toward infinity (usually horizontal for simplicity). If we
can identify every intersection of this ray with each curve of the shape, we
can deduce if our pixel point <code>P=(Px,Py)</code> is inside or outside.</p>
<p>We will simplify the problem to the crossing of just one curve, using the one
from the previous section. It would look like this with an arbitrary point <code>P</code>:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-2d-ray.png" alt="centerimg" /></p>
<p>We're looking for the intersection coordinates, but how can we do that in 2D
space? Well, with a horizontal ray, we would have to know when the
y-coordinate of the curve is the same as the y-coordinate of <code>P</code>, so we first
have to solve <code>By(t) = Py</code>, or <code>By(t)-Py=0</code>, where <code>By(t)</code> is the <code>y</code> component
of the given Bézier curve <code>B(t)</code>.</p>
<p>This is a schoolbook <a href="https://en.wikipedia.org/wiki/Root-finding_algorithms">root finding</a> problem, because given that
<code>B(t)</code> is of the third degree, we end up solving the equation: <code>a₃t³ + b₃t² + c₃t + d₃ - Py = 0</code> (the <code>d₃ - Py</code> part is constant, so it acts as the last
coefficient of the polynomial). This gives us the <code>t</code> values (or roots), that
is where the ray crosses our <code>y</code> component.</p>
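<p>With a polynomial root finder such as numpy's, this boils down to a few lines
(a sketch; the control points and <code>Py</code> in the check are arbitrary, not the ones
from the figure):</p>
<pre><code class="language-python">import numpy as np

def ray_intersections(c0, c1, c2, c3, py):
    # y-component coefficients of the cubic, with Py folded into the constant
    a = -c0 + 3*c1 - 3*c2 + c3
    b = 3*c0 - 6*c1 + 3*c2
    c = -3*c0 + 3*c1
    d = c0 - py
    roots = np.roots([a, b, c, d])
    # keep the real roots that fall within the curve parameter range [0;1]
    return sorted(t.real for t in roots if abs(t.imag) < 1e-9 and 0 <= t.real <= 1)

# a curve going from y=0 to y=1 crosses a horizontal ray at y=0.5 once
(r,) = ray_intersections(0.0, 0.0, 1.0, 1.0, 0.5)
assert abs(r - 0.5) < 1e-9
</code></pre>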
<p>Since this is a 3rd degree polynomial (highest power is 3), we will have <em>at
most</em> 3 points where the ray crosses the curve. In our case, we do actually get
the maximum number of roots:</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d-y-ray.png" alt="centerimg" /></p>
<p>Now that we have the <code>t</code> values on our curve (remember that <code>t</code> values are
common to both the x and y axes), we can simply evaluate the <code>x</code> component of
<code>B(t)</code> to obtain the <code>x</code> coordinates.</p>
<p><img src="http://blog.pkh.me/img/bezier/cubic-1d-x-ray.png" alt="centerimg" /></p>
<p>Using <code>Px</code>, we can filter which roots we want to keep. In this case,
<code>Px=-0.75</code>, so we're going to keep all the intersections (all the roots'
x-coordinates are located above this value).</p>
<p>We could do exactly the same operation by solving <code>Bx(t)-Px=0</code> and evaluating
<code>By(t)</code> on the roots we found: this would give us the intersections with a
vertical ray instead of a horizontal one.</p>
<p>I'm voluntarily omitting a lot of technical details here, such as the root
finding algorithm and the challenges of floating point inaccuracies: the point
is to illustrate how the 1D deconstruction is essential in understanding and
manipulating Bézier curves.</p>
<h2>Bonus</h2>
<p>During the writing of this article, I made a small <code>matplotlib</code> demo which got
quite popular on Twitter, so I'm sharing it again:</p>
<div style="text-align:center">
<video src="http://blog.pkh.me/misc/bezier.webm" controls="controls" width="800">Animated Bézier curves</video>
</div>
<p>The script used to generate this video:</p>
<pre><code class="language-python">import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation


def mix(a, b, x): return (1 - x) * a + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))
def bezier1(p0, p1, t): return mix(p0, p1, t)
def bezier2(p0, p1, p2, t): return bezier1(mix(p0, p1, t), mix(p1, p2, t), t)
def bezier3(p0, p1, p2, p3, t): return bezier2(mix(p0, p1, t), mix(p1, p2, t), mix(p2, p3, t), t)


def _main():
    pad = 0.05
    bmin, bmax = -1, 1
    x_color, y_color, xy_color = "#ff4444", "#44ff44", "#ffdd00"

    np.random.seed(0)
    r0, r1 = np.random.uniform(-1, 1, (2, 4))
    r2, r3 = np.random.uniform(0, 2 * np.pi, (2, 4))

    cfg = {
        "axes.facecolor": "333333",
        "figure.facecolor": "111111",
        "font.family": "monospace",
        "font.size": 9,
        "grid.color": "666666",
    }

    plt.style.use("dark_background")
    with plt.rc_context(cfg):
        fig = plt.figure(figsize=[8, 4.5])
        gs = fig.add_gridspec(nrows=2, ncols=3)

        ax_x = fig.add_subplot(gs[0, 0])
        ax_x.grid(True)
        for i in range(4):
            ax_x.axvline(x=i / 3, linestyle="--", alpha=0.5)
        ax_x.axhline(y=0, alpha=0.5)
        ax_x.set_xlabel("t")
        ax_x.set_ylabel("x", rotation=0, color=x_color)
        ax_x.set_xlim(0 - pad, 1 + pad)
        ax_x.set_ylim(bmin - pad, bmax + pad)
        (x_plt,) = ax_x.plot([], [], "-", color=x_color)
        (x_plt_c0,) = ax_x.plot([], [], "o:", color=x_color)
        (x_plt_c1,) = ax_x.plot([], [], "o:", color=x_color)

        ax_y = fig.add_subplot(gs[1, 0])
        ax_y.grid(True)
        for i in range(4):
            ax_y.axvline(x=i / 3, linestyle="--", alpha=0.5)
        ax_y.axhline(y=0, alpha=0.5)
        ax_y.set_xlabel("t")
        ax_y.set_ylabel("y", rotation=0, color=y_color)
        ax_y.set_xlim(0 - pad, 1 + pad)
        ax_y.set_ylim(bmin - pad, bmax + pad)
        (y_plt,) = ax_y.plot([], [], "-", color=y_color)
        (y_plt_c0,) = ax_y.plot([], [], "o:", color=y_color)
        (y_plt_c1,) = ax_y.plot([], [], "o:", color=y_color)

        ax_xy = fig.add_subplot(gs[0:2, 1:3])
        ax_xy.grid(True)
        ax_xy.axvline(x=0, alpha=0.8)
        ax_xy.axhline(y=0, alpha=0.8)
        ax_xy.set_aspect("equal", "box")
        ax_xy.set_xlabel("x", color=x_color)
        ax_xy.set_ylabel("y", rotation=0, color=y_color)
        ax_xy.set_xlim(bmin - pad, bmax + pad)
        ax_xy.set_ylim(bmin - pad, bmax + pad)
        (xy_plt,) = ax_xy.plot([], [], "-", color=xy_color)
        (xy_plt_c0,) = ax_xy.plot([], [], "o:", color=xy_color)
        (xy_plt_c1,) = ax_xy.plot([], [], "o:", color=xy_color)

        fig.tight_layout()

        def update(frame):
            px = remap(-1, 1, bmin, bmax, np.sin(r0 * frame + r2))
            py = remap(-1, 1, bmin, bmax, np.sin(r1 * frame + r3))
            t = np.linspace(0, 1)
            x = bezier3(px[0], px[1], px[2], px[3], t)
            y = bezier3(py[0], py[1], py[2], py[3], t)
            x_plt.set_data(t, x)
            x_plt_c0.set_data((0, 1 / 3), (px[0], px[1]))
            x_plt_c1.set_data((2 / 3, 1), (px[2], px[3]))
            y_plt.set_data(t, y)
            y_plt_c0.set_data((0, 1 / 3), (py[0], py[1]))
            y_plt_c1.set_data((2 / 3, 1), (py[2], py[3]))
            xy_plt.set_data(x, y)
            xy_plt_c0.set_data((px[0], px[1]), (py[0], py[1]))
            xy_plt_c1.set_data((px[2], px[3]), (py[2], py[3]))

        duration, fps, speed = 15, 60, 3
        frames = np.linspace(0, duration * speed, duration * fps)
        anim = FuncAnimation(fig, update, frames=frames)
        anim.save("/tmp/bezier.webm", fps=fps, codec="vp9", extra_args=["-preset", "veryslow", "-tune-content", "screen"])


if __name__ == "__main__":
    _main()
</code></pre>
http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html
http://blog.pkh.me/p/32-invert-a-function-using-newton-iterations.html
Invert a function using Newton iterationsThu, 11 Aug 2022 06:59:53 -0000<p>Newton's method is probably one of the most popular algorithms for finding the
roots of a function through successive numeric approximations. In less cryptic
words, if you have an opaque function <code>f(x)</code>, and you need to solve <code>f(x)=0</code>
(finding where the function crosses the x-axis), the Newton-Raphson method
gives you a dead simple cookbook to achieve that (a few conditions need to be
met though).</p>
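<p>To give an idea of how simple the cookbook is, here is a minimal sketch of the
iteration <code>xₙ₊₁ = xₙ - f(xₙ)/f'(xₙ)</code> (a fixed iteration count is used for
brevity; a real implementation would check for convergence and a vanishing
derivative):</p>
<pre><code class="language-python">def newton(f, fprime, x0, n=30):
    x = x0
    for _ in range(n):
        x -= f(x) / fprime(x)
    return x

# finding √2 as the positive root of x²-2
assert abs(newton(lambda x: x*x - 2, lambda x: 2*x, 1.0) - 2**0.5) < 1e-12
</code></pre>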
<p>I recently had to solve a similar problem where, instead of finding the roots, I
had to invert the function. At first glance this may sound like two entirely
different problems but in practice it's almost the same thing. Since I barely
avoided a mental breakdown in the process of figuring that out, I thought it
would make sense to share the experience of walking the road to enlightenment.</p>
<h2>A function and its inverse</h2>
<p>We are given a funky function, let's say <code>f(x)=2/3(x+1)²-sin(x)-1</code>, and we want
to figure out its inverse <code>f¯¹()</code>:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-01.png" alt="centerimg" /></p>
<p>The diagonal is highlighted for the symmetry to be more obvious. One thing you
may immediately wonder: how is such an inverse function even possible?
Indeed, if you look at <code>x=0</code>, the inverse function gives (at least) 2 <code>y</code>
values, which means it's impossible to trace according to the x-axis. What we
just did here is swap the axes: we simply drew <code>y=f(x)</code> and <code>x=f(y)</code>,
which means the axes do not correspond to the same thing whether we are looking
at one curve or the other. For <code>y=f(x)</code> (abbreviated <code>f</code> or <code>f(x)</code>), the
horizontal axis is the x-axis, and for <code>x=f(y)</code> (abbreviated <code>f¯¹</code> or <code>f¯¹(y)</code>)
the horizontal axis is the y-axis because we actually drew the curve according
to the vertical axis.</p>
<p>What can we do here to bring this problem back to reality? Well, first of all
we can reduce the domain and focus on only one segment of the function where
it can be inverted. This is one of the conditions that need to be
met, otherwise the problem is simply impossible to solve because it doesn't make
any sense. So we'll redefine our problem to make it solvable by assuming our
function is actually defined in the range <code>R=[R₀,R₁]</code> which we arbitrarily set
to <code>R=[0.1;1.5]</code> in our case (could be anything as long as we have no
discontinuity):</p>
<p><img src="http://blog.pkh.me/img/newton/newton-02.png" alt="centerimg" /></p>
<p>Now <code>f'</code> (the derivative of <code>f</code>) is never null, implying there won't be
multiple solutions for a given <code>x</code>, so we should be safe. Indeed, while we are
still tracing <code>f¯¹</code> by flipping the axes, we can see that it could also exist
in the same space as <code>f</code>, meaning we could now draw it according to the
horizontal axis, just like <code>f</code>.</p>
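<p>This claim is easy to check numerically; the derivative of our funky function
is <code>f'(x) = 4/3·(x+1) - cos(x)</code>, and sampling it over <code>R</code> shows it stays
strictly positive:</p>
<pre><code class="language-python">import math

def fprime(x):
    # derivative of f(x) = 2/3·(x+1)² - sin(x) - 1
    return 4/3 * (x + 1) - math.cos(x)

# sample the derivative over R=[0.1;1.5] and confirm it never gets close to 0
samples = [0.1 + i * (1.5 - 0.1) / 1000 for i in range(1001)]
assert all(fprime(x) > 0 for x in samples)
</code></pre>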
<p>What's so hard though? Bear with me for a moment, because this took me quite a
while to wrap my head around. The symmetry is such that it's trivial to go from
a point on <code>f</code> to a point on <code>f¯¹</code>:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-03.png" alt="centerimg" /></p>
<p>Transforming point <code>A</code> into point <code>B</code> is a matter of simply swapping the
coordinates. Said differently, if I have an <code>x</code> coordinate, evaluating <code>f(x)</code>
will give me the <code>A.y</code> coordinate, so we have <code>A=(x,f(x))</code> and we can get <code>B</code>
with <code>B=(A.y,A.x)=(f(x),x)</code>. But while we are going to use this property, this
is not actually what we are looking for in the first place: our input is the
<code>x</code> coordinate of <code>B</code> (or the <code>y</code> coordinate of <code>A</code>) and we want the other
component.</p>
<p>So how do we do that? This is where root finding actually comes into play.</p>
<h2>Root finding</h2>
<p>We are going to distance ourselves a bit from the graphic representation (it
can be quite confusing anyway) and try to reason with algebra. Not that I'm
much more comfortable with it but we can manage something with the basics here.</p>
<p>The key to not getting your mind mixed up in <code>x</code> and <code>y</code> confusion is to use
different terms because we associate <code>x</code> and <code>y</code> respectively with the
horizontal and vertical axis. So instead we are going to redefine our functions
according to <code>u</code> and <code>v</code>. We have:</p>
<ol>
<li><code>f(u)=v</code></li>
<li><code>f¯¹(v)=u</code> (reminder: <code>v</code> is our input and <code>u</code> is what we are looking for)</li>
</ol>
<p><strong>Note</strong>: writing <code>f¯¹</code> doesn't mean our function is anything special; the <code>¯¹</code>
simply acts as some sort of semantic tagging, we could very well have written
<code>h(v)=u</code>. Both functions <code>f</code> and <code>f¯¹</code> simply map one real number to
another.</p>
<p>In the previous section we've seen that <code>f(u)=v</code> is actually equivalent to
<code>f¯¹(v)=u</code>. This may sound like an arbitrary statement, so let me rephrase it
differently: for a given value of <code>u</code> there exists only one corresponding value
of <code>v</code>. If we now feed that same <code>v</code> to <code>f¯¹</code> we will get <code>u</code> back. To paraphrase
this with algebra: <code>f¯¹(f(u)) = u</code>.</p>
<p>How does all of this help us? Well, it means that <code>f¯¹(v)=u</code> is equivalent
to <code>f(u)=v</code>. So all we have to do is solve <code>f(u)=v</code>, or <code>f(u)-v=0</code>. <strong>The
process of solving this equation to find <code>u</code> is equivalent to evaluating
<code>f¯¹(v)</code>.</strong></p>
<p>And there we have it, with a simple subtraction of <code>v</code>, we're back into known
territory. We declare a new function <code>g(u)=f(u)-v</code> and we are going to find its
root by solving <code>g(u)=0</code> with the help of Newton's method.</p>
<p>Summary with less babble:</p>
<pre><code class="language-plaintext">f¯¹(v)=u ⬄ f(u)=v
⬄ f(u)-v=0
⬄ g(u)=0 with g(u)=f(u)-v
</code></pre>
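<p>To convince ourselves numerically (a throwaway sketch, with <code>u=0.7</code> picked arbitrarily inside our reduced domain): since <code>v</code> is defined as <code>f(u)</code>, the function <code>g(u)=f(u)-v</code> is zero exactly at the <code>u</code> we started from:</p>
<pre><code class="language-python">import math

def f(x):
    return 2 / 3 * (x + 1) ** 2 - math.sin(x) - 1

u = 0.7    # arbitrary point in our reduced domain R=[0.1;1.5]
v = f(u)   # pretend this is our only input

def g(x):
    return f(x) - v

# The root of g is, by construction, the u we started from
assert abs(g(u)) < 1e-12
</code></pre>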
<h2>Newton's method</h2>
<p>The Newton iterations are dead-ass simple: it's a sequence (or an iterative loop
if you prefer):</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - g(uₙ)/g'(uₙ)
</code></pre>
<p>…repeated as much as needed (it converges quickly).</p>
<ul>
<li><code>g</code> is the function from which we're trying to find the root</li>
<li><code>g'</code> its derivative</li>
<li><code>u</code> our current approximation, which gets closer to the truth at each
iteration</li>
</ul>
<p>We can evaluate <code>g</code> (<code>g(u)=f(u)-v</code>) but we need two more pieces of the puzzle:
<code>g'</code> and an initial value for <code>u</code>.</p>
<h3>Derivative</h3>
<p>There is actually something cool with the derivative <code>g'</code>: since <code>v</code> is a
constant term, the derivative of <code>g</code> is actually the derivative of <code>f</code>:
<code>g(u)=f(u)-v</code> so <code>g'(u)=f'(u)</code>.</p>
<p>This means that we can rewrite our iteration according to <code>f</code> instead of <code>g</code>:</p>
<pre><code class="language-plaintext">uₙ₊₁ = uₙ - (f(uₙ)-v)/f'(uₙ)
</code></pre>
<p>Now for the derivative <code>f'</code> itself we have two choices. If we know the function
<code>f</code>, we can derive it analytically. This should be the preferred choice if you
can because it's faster and more accurate. In our case:</p>
<pre><code class="language-plaintext"> f(x) = 2/3(x+1)² - sin(x) - 1
f'(x) = 4x/3 - cos(x) + 4/3
</code></pre>
<p>You can rely on the <a href="https://www.mathsisfun.com/calculus/derivatives-rules.html">derivative rules</a> to figure out the analytic
formula for your function or… you can cheat by using "derivative …" on
<a href="https://www.wolframalpha.com/">WolframAlpha</a>.</p>
<p>But you may be in the situation where you don't actually have that information
because the function is opaque. In this case, you could use an approximation:
take a very small value <code>ε</code> (let's say <code>1e-6</code>) and approximate the derivative
with for example <code>f'(x)=(f(x+ε)-f(x-ε))/(2ε)</code>. It's a dumb trick: we're
basically figuring out the slope by taking two very close points around <code>x</code>.
This would also work by using <code>g</code> instead of <code>f</code>, but you would have to compute
two extra subtractions (the <code>- v</code>) for no benefit because they cancel each other out.</p>
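<p>To make the approximation trick concrete, here is a throwaway sketch (the helper names are mine, not from any library) comparing the central difference against the analytic derivative of our <code>f</code>:</p>
<pre><code class="language-python">import math

def f(x):
    return 2 / 3 * (x + 1) ** 2 - math.sin(x) - 1

def f_prime(x):
    # Analytic derivative, as derived above
    return 4 / 3 * x - math.cos(x) + 4 / 3

def f_prime_approx(x, eps=1e-6):
    # Central difference: slope between two very close points around x
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.5
assert abs(f_prime(x) - f_prime_approx(x)) < 1e-8
</code></pre>
<p>The two values agree to far more digits than the Newton iterations will ever need.</p>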
<h3>Initial approximation</h3>
<p>For the 3rd and last piece of the puzzle, the initial <code>u</code>, we need to figure
out something more elaborate. The simplest thing we can do is to start with a first
approximation function <code>f₀¯¹</code>: a straight line between the points <code>(f(R₀),R₀)</code>
and <code>(f(R₁),R₁)</code>. How do we create a function that linearly links these two points
together? We of course use <a href="http://blog.pkh.me/p/29-the-most-useful-math-formulas.html">one of the most useful math formulas</a>:
<code>remap(a,b,c,d,x) = mix(c,d,linear(a,b,x))</code>, and we evaluate it for our first
approximation value <code>u₀</code>:</p>
<pre><code class="language-plaintext">u₀ = remap(f(R₀),f(R₁),R₀,R₁,v)
</code></pre>
<p>If your boundaries are simpler, typically if <code>R=[0;1]</code>, this expression can be
dramatically simplified. A <code>linear()</code> might be enough, or even a simple
division. We have a pathological case here so we're using the generic
expression.</p>
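<p>As a quick sanity check of that simplification claim (a sketch with made-up sample values): when <code>R=[0;1]</code>, the <code>mix(0,1,t)</code> part collapses to <code>t</code>, so <code>remap</code> degenerates into <code>linear</code>:</p>
<pre><code class="language-python">def mix(a, b, x): return a * (1 - x) + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))

# With c=0 and d=1, mix(0, 1, t) is just t, so remap reduces to linear
f0, f1, v = -1.0, 2.0, 0.5  # arbitrary sample values
assert remap(f0, f1, 0, 1, v) == linear(f0, f1, v)
</code></pre>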
<p>We get:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-04.png" alt="centerimg" /></p>
<p>Close enough, we can start iterating from here.</p>
<h3>Iterating</h3>
<p>If we do a single Newton iteration, <code>u₁ = u₀ - (f(u₀)-v)/f'(u₀)</code>, our straight
line becomes:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-05.png" alt="centerimg" /></p>
<p>With one more iteration:</p>
<p><img src="http://blog.pkh.me/img/newton/newton-06.png" alt="centerimg" /></p>
<p>Seems like we're getting pretty close, aren't we?</p>
<p>If you want to converge even faster, you may want to consider <a href="https://en.wikipedia.org/wiki/Halley's_method">Halley's
method</a>. It's more expensive to
compute, but 1 iteration of Halley may cost less than 2 iterations of Newton.
Up to you to study if the trade-off is worth it.</p>
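<p>If you're curious, here is a minimal Halley sketch for our problem. Careful: the second derivative <code>f''(x) = 4/3 + sin(x)</code> and the helper name are mine, derived by hand and not taken from anywhere, so double-check before reusing:</p>
<pre><code class="language-python">import math

def f(x):   return 2 / 3 * (x + 1) ** 2 - math.sin(x) - 1
def fp(x):  return 4 / 3 * x - math.cos(x) + 4 / 3  # f'
def fpp(x): return 4 / 3 + math.sin(x)              # f'' (hand-derived)

def inverse_halley(v, u, n=3):
    # Halley step: u - 2gg'/(2g'² - gg'') with g(u)=f(u)-v, g'=f', g''=f''
    for _ in range(n):
        g, gp, gpp = f(u) - v, fp(u), fpp(u)
        u = u - 2 * g * gp / (2 * gp * gp - g * gpp)
    return u

u_true = 0.9
u = inverse_halley(f(u_true), u=0.5)  # deliberately crude initial guess
assert abs(u - u_true) < 1e-6
</code></pre>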
<h2>Demo code</h2>
<p>If you want to play with this, here is a <code>matplotlib</code> demo generating a graphic
pretty similar to what's found in this post:</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt

N = 1  # Number of iterations
R0, R1 = (0.1, 1.5)  # Reduced domain

# The function to inverse and its derivative
def f(x): return 2 / 3 * (x + 1) ** 2 - np.sin(x) - 1
def d(x): return 4 / 3 * x - np.cos(x) + 4 / 3

# The most useful math functions
def mix(a, b, x): return a * (1 - x) + b * x
def linear(a, b, x): return (x - a) / (b - a)
def remap(a, b, c, d, x): return mix(c, d, linear(a, b, x))

# The inverse approximation using Newton-Raphson iterations
def inverse(v, n):
    u = remap(f(R0), f(R1), R0, R1, v)
    for _ in range(n):
        u = u - (f(u) - v) / d(u)
    return u

def _main():
    _, ax = plt.subplots()
    x = np.linspace(R0, R1)
    y = f(x)
    ax.plot((-1 / 2, 2), (-1 / 2, 2), "--", color="gray")
    ax.plot(x, y, "-", color="C0", label="f")
    ax.plot([R0, R1], [f(R0), f(R1)], "o", color="C0")
    ax.plot(y, x, "-", color="C1", label="f¯¹")
    v = np.linspace(f(R0), f(R1))
    u = inverse(v, N)
    ax.plot(v, u, "-", color="C3", label=f"f¯¹ approx in {N} iteration(s)")
    ax.plot([f(R0), f(R1)], [R0, R1], "o", color="C3")
    ax.set_aspect("equal", "box")
    ax.grid(True)
    ax.legend()
    plt.show()

_main()
</code></pre>
http://blog.pkh.me/p/31-from-roots-to-polynomials.html
http://blog.pkh.me/p/31-from-roots-to-polynomials.html
From roots to polynomialsSun, 07 Aug 2022 21:50:31 -0000<p>Polynomials can be represented in various forms. The most common ones are
those I call the "sum of powers" (for example <code>f(x)=ax³+bx²+cx+d</code>) and the
"root factors" (for example <code>f(x)=(x-r)(x-s)(x-t)</code>, where <code>r</code>, <code>s</code> and <code>t</code> are
the roots). The process of transforming the former into the latter is called
"root solving"; the goal is to find all the <code>x</code> that satisfy <code>f(x)=0</code>. This is
a field of research that has been going on for hundreds of years. But what
about the reverse operation?</p>
<h2>Roots finding</h2>
<p>Most of the literature circles around roots finding. Analytic solutions to
obtain these roots exist up to degree 4 (<code>f(x)=ax⁴+bx³+cx²+dx+e</code>), and at each
degree the complexity increases dramatically. Starting at degree 5 <a href="https://en.wikipedia.org/wiki/Abel%E2%80%93Ruffini_theorem" title="Abel-Ruffini theorem">it is proven
that no analytical solution can exist</a> and we must rely on trial-and-error
methods.</p>
<p>It is interesting to note that even though we've known how to find these roots
mathematically for about <a href="https://en.wikipedia.org/wiki/Ars_Magna_(Gerolamo_Cardano)" title="Ars Magna (Cardano book)">500 years</a>, it is still a huge challenge
for computers, mainly because of the arithmetic instabilities when working with
<a href="https://en.wikipedia.org/wiki/IEEE_754" title="IEEE-754">IEEE 754</a> floating points. Aside from the multiple analytic
solutions, many algorithms continue to appear to this day to address these
shortcomings, with mixed results.</p>
<p>In order to evaluate these algorithms, I need to build polynomials in their
"sum of powers" form using generated/known roots. More on that particular
project in a future blog post, but the point is that the automation of the
inverse operation is essential.</p>
<h2>Roots concealing</h2>
<p>If we were to transform the degree 4 polynomial <code>f(x)=(x-r)(x-s)(x-t)(x-u)</code>
into <code>f(x)=ax⁴+bx³+cx²+dx+e</code>, we could just do it manually with basic
arithmetic. It's a bit laborious, but there is nothing really difficult about
it. But can we find a generic way of doing this transformation, for all
degrees?</p>
<p>Since I'm a lazy engineer, my first reflex is to submit a 2nd degree polynomial
to <a href="https://www.wolframalpha.com">Wolfram|Alpha</a>:</p>
<p><img src="http://blog.pkh.me/img/wolframalpha-poly-xr-xs.png" alt="centerimg" /></p>
<p>So for <code>(x-r)(x-s)</code> we get the expanded form: <code>rs - rx - sx + x²</code></p>
<p>How about higher degrees?</p>
<ul>
<li><code>(x-r)(x-s)(x-t)</code> gives <code>-rst + rsx + rtx - rx² + stx - sx² - tx² + x³</code></li>
<li><code>(x-r)(x-s)(x-t)(x-u)</code> gives <code>rstu - rstx - rsux + rsx² - rtux + rtx² + rux² - rx³ - stux + stx² + sux² - sx³ + tux² - tx³ - ux³ + x⁴</code></li>
</ul>
<p>If we group the powers properly, we get:</p>
<ul>
<li>degree 2: <code>x² - (r+s)x + rs</code></li>
<li>degree 3: <code>x³ - (r+s+t)x² + (rs+rt+st)x - rst</code></li>
<li>degree 4: <code>x⁴ - (r+s+t+u)x³ + (rs+rt+ru+st+su+tu)x² - (rst+rsu+rtu+stu)x + rstu</code></li>
</ul>
<p>It looks like a pattern is emerging! Some observations:</p>
<ol>
<li>The first coefficient is always <code>1</code>: technically it could be anything, but
that anything would need to multiply every other coefficient. The
polynomial wouldn't be the same, but the solutions would remain identical.
This constant would act as some sort of scale on the curve, but it would
always cross <code>0</code> on the same <code>x</code> values. We will keep <code>1</code> for now.</li>
<li>The signs alternate between <code>-</code> and <code>+</code>, starting with <code>+</code></li>
<li>The 2nd coefficient is always the sum of all the roots</li>
<li>The last coefficient is always the product of all the roots</li>
</ol>
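<p>A quick numeric check of the grouped degree 3 form (the roots and the evaluation point are arbitrary values I picked, not from any dataset):</p>
<pre><code class="language-python">r, s, t = 1.5, -2.0, 0.5  # arbitrary roots
x = 0.3                   # arbitrary evaluation point

factored = (x - r) * (x - s) * (x - t)
expanded = x**3 - (r + s + t) * x**2 + (r*s + r*t + s*t) * x - r*s*t
assert abs(factored - expanded) < 1e-12
</code></pre>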
<p>If we focus on the more complex coefficients, we see that they're always a sum
of products of combinations of roots.</p>
<p>Let's see if we can figure out something with the help of the <a href="https://docs.python.org/3/library/itertools.html"><code>itertools</code>
module in Python</a>. Can we for example rebuild the expression <code>rst+rsu+rtu+stu</code>
(from degree 4) using <code>rstu</code> as input?</p>
<pre><code class="language-python-repl">>>> from itertools import combinations
>>> list(combinations("rstu", 3))
[('r', 's', 't'), ('r', 's', 'u'), ('r', 't', 'u'), ('s', 't', 'u')]
>>>
</code></pre>
<p>It looks like we can. Hell, the products are even ordered the same (not that it
matters). How about the <code>rs+rt+ru+st+su+tu</code> in the same degree?</p>
<pre><code class="language-python-repl">>>> list(combinations("rstu", 2))
[('r', 's'), ('r', 't'), ('r', 'u'), ('s', 't'), ('s', 'u'), ('t', 'u')]
</code></pre>
<p>Similarly, we confirm that it works with <code>1</code> and <code>4</code> (the 2nd and last
coefficients), and even for the first coefficient. So for degree 4, the
coefficients can trivially be obtained from:</p>
<ul>
<li><code>a</code>: <code>combinations("rstu", 0)</code> (this is empty so involving no root, and thus
leading to <code>1</code>)</li>
<li><code>b</code>: <code>combinations("rstu", 1)</code></li>
<li><code>c</code>: <code>combinations("rstu", 2)</code></li>
<li><code>d</code>: <code>combinations("rstu", 3)</code></li>
<li><code>e</code>: <code>combinations("rstu", 4)</code></li>
</ul>
<p>More tests also confirm that this works the same for lower (and higher!)
degrees.</p>
<h2>Magic formula</h2>
<p>In the end, we can build a very simple function:</p>
<pre><code class="language-python">from itertools import combinations
from math import prod

def coeffs_from_roots(roots):
    return [
        (1, -1)[i & 1] * sum(map(prod, combinations(roots, i)))
        for i in range(len(roots) + 1)
    ]
</code></pre>
<ul>
<li><code>(1, -1)[i & 1]</code> gives us the <code>+</code>/<code>-</code> juggling; it can be replaced with <code>-1 if i & 1 else 1</code>, <code>1-(i&1)*2</code> or even <code>(-1)**i</code></li>
<li><code>sum(map(prod, comb...))</code> as the name implies is the sum of the products of
the combinations</li>
</ul>
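<p>A small usage check (repeating the function so the snippet is self-contained; the expected coefficients come from expanding <code>(x-2)(x-3) = x²-5x+6</code> by hand):</p>
<pre><code class="language-python">from itertools import combinations
from math import prod

def coeffs_from_roots(roots):
    return [
        (1, -1)[i & 1] * sum(map(prod, combinations(roots, i)))
        for i in range(len(roots) + 1)
    ]

assert coeffs_from_roots([2, 3]) == [1, -5, 6]  # x² - 5x + 6

# Every root must evaluate the "sum of powers" form to (almost) 0
roots = [2, 3, -1, 0.5]
n = len(roots)
coeffs = coeffs_from_roots(roots)
for r in roots:
    assert abs(sum(c * r**(n - i) for i, c in enumerate(coeffs))) < 1e-9
</code></pre>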
<p><strong>Edit: thanks to <a href="https://twitter.com/raymondh">@raymondh</a> for the <a href="https://news.ycombinator.com/item?id=32400265">suggested
simplifications</a>. Following this
last link will give yet another improvement with regard to the sign handling.</strong></p>
<h2>Proof?</h2>
<p>Let's be clear, I have absolutely no proof that this will work for all degrees,
it was only inferred from observation. Instinctively, it looks pretty reliable
to me, but if someone knows a proof for this, don't hesitate to contact me.
<strong>Edit: apparently, these are <a href="https://en.wikipedia.org/wiki/Vieta%27s_formulas" title="Vieta's formulas">Vieta's formulas</a>, thanks
<a href="https://jix.one">jix</a>!</strong></p>
<p>I'd be very happy to hear if there is a named theorem and demonstration out
there about this particular property. I'm also curious about what the function I
wrote in Python would look like in mathematical notation. I know the sum and
product symbols, but I'm not sure how the combination would be expressed.</p>
http://blog.pkh.me/p/30-saving-a-restic-backup-the-hard-way.html
http://blog.pkh.me/p/30-saving-a-restic-backup-the-hard-way.html
Saving a restic backup the hard wayMon, 06 Sep 2021 21:39:08 -0000<p>This is the end of the holidays in which I spent a long time building a backup
infrastructure like a responsible adult. The villain of the following story is
the junk MIPS machine which held hostage most of my important data (about 1TB).
Its evil plan was to build up a corrupted backup snapshot for a whole week and
have me deal with it. Various options were available to me, but I decided to
use the scalpel and manually strike at the core.</p>
<p><img src="http://blog.pkh.me/img/dead-car.jpg" alt="centerimg" /></p>
<h2>The plot</h2>
<p>Long story short, this is the plot in technical terms:</p>
<ul>
<li><a href="https://restic.net/">restic</a> is a popular backup system: every time you call a
<code>backup</code> command, it creates a new snapshot of the pointed data, encrypts
it, and pushes it to a dedicated repository</li>
<li>the MIPS machine holding my data ran one <code>restic backup</code> command targeting a remote
(and more reliable) machine. This process took about a week because the MIPS
CPU is anemic.</li>
<li>the snapshot seemed to be fine (<code>restic check</code> raised no error)</li>
<li>trying to <code>restic copy</code> that repository to another location was unfortunately
causing an error at around 25%</li>
</ul>
<p>This was the error:</p>
<pre><code class="language-plaintext">LoadBlob(9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27) returned error
blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27 returned invalid hash
</code></pre>
<h2>The beginning of an adventure</h2>
<p>Before starting, we need some serious information gathering, and a clear
understanding of how things work. The <a href="https://restic.readthedocs.io/en/stable/100_references.html">restic references</a> page is a
gold mine for that. Here are a few interesting bits in order to understand the
rest of the story:</p>
<ul>
<li>in a restic repository, the data is stored into <em>packs</em>. A pack is simply one
file located in the <code>data</code> directory of a repository.</li>
<li>each pack contains a chain of independent <em>blobs</em> and a header at the end</li>
<li>a blob is a chunk of encrypted data, encapsulated with a bit of crypto
metadata</li>
</ul>
<p>With that much information we can already start investigating. We know one of
the blobs (a chunk of data in a given pack file) has an integrity error
according to our error message. We need to identify where it is located, and to
which data (file) it actually corresponds:</p>
<pre><code class="language-shell">% restic find --show-pack-id --blob 9f02880a
repository 390a6747 opened successfully, password is correct
Found blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27
... in file /saving/private/ryan.mp3
(tree f73fb24fa4f8c0885452a51c3d97912efe44fd8f72907eda446bcada4463a309)
... in snapshot cd60b511 (2021-08-29 00:57:08)
Object belongs to pack fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73
... Pack fdd48b5c: <Blob (data) 9f02880a, offset 3120083, length 3401765>
</code></pre>
<p>We indeed find a <code>data/fd/fdd48b5c...</code> file on the remote repository, which we
can grab right away in order to work with it later down the line.</p>
<p>After building <code>restic</code> in debug mode (<code>go build -tags debug ./cmd/restic</code>), we
can also dump the pack using <code>debug examine --extract-pack</code>. Here is what we
get:</p>
<pre><code class="language-shell">% ls dump
-rw-r--r-- 1 ux ux 606141 Sep 5 13:42 correct-bd30e59fb99f30794bb5c4c8d3460eb22627433a6f0a7a07087a4e56b9f2276c.bin
-rw-r--r-- 1 ux ux 1738374 Sep 5 13:42 correct-becb3f4e8308b26ff66d2ba5cf85c15f73daf432c3c61d9f61c55f212aa26b7e.bin
-rw-r--r-- 1 ux ux 775472 Sep 5 13:42 correct-f0def1ccba1c6efe30b4dc14b67d68db1a6b81e44e7d4cba18a1330477abe877.bin
-rw-r--r-- 1 ux ux 3401733 Sep 5 13:42 wrong-hash-f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47.bin
</code></pre>
<p><strong>Note</strong>: these blobs are in a decrypted state, but the encrypted data would be
the same size since restic is using AES256 CTR.</p>
<p>From the dump log we also get the layout (the order in which the blobs are
packed):</p>
<pre><code class="language-plaintext">data blob becb3f4e8308b26ff66d2ba5cf85c15f73daf432c3c61d9f61c55f212aa26b7e, offset 0 , raw length 1738406
data blob f0def1ccba1c6efe30b4dc14b67d68db1a6b81e44e7d4cba18a1330477abe877, offset 1738406, raw length 775504
data blob bd30e59fb99f30794bb5c4c8d3460eb22627433a6f0a7a07087a4e56b9f2276c, offset 2513910, raw length 606173
data blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27, offset 3120083, raw length 3401765
</code></pre>
<p>We also notice this information:</p>
<pre><code class="language-plaintext"> loading blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27 at 3120083 (length 3401765)
successfully decrypted blob (length 3401733), hash is f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47, ID does not match, wanted 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27
decrypt of blob f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47 stored at wrong-hash-f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47.bin
</code></pre>
<p>In particular: <code>9f02880a</code> is the expected hash, but we do get <code>f99b85db</code>
instead.</p>
<p>Since we have a mismatch between the hash and the data, we need to figure out
which one is wrong. Fortunately for me, I have archaic backups everywhere, so I
was able to grab the original file for comparison. This file will be called
<code>ryan.mp3</code> (as in "/saving/private/ryan.mp3") for the rest of the story.</p>
<p>We notice that <code>ryan.mp3</code> is bigger than the pack file (and so even bigger than
the blob it corresponds to):</p>
<pre><code class="language-plaintext">-rw------- 1 ux ux 6522032 Sep 5 14:12 fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73
-rw-r--r-- 1 ux ux 10418993 Sep 5 13:34 ryan.mp3
</code></pre>
<p>Before truncating our reference (<code>ryan.mp3</code>), we can check the impact of the
diff with the corresponding blob (<code>wrong-hash-f99b85db....bin</code>):</p>
<p><img src="http://blog.pkh.me/img/ryan-blob-bindiff.png" alt="centerimg" /></p>
<p>So the data blob (and not the hash) is wrong: 32 bytes are off. This could be
because of a hardware memory issue, typically a random bitflip during
encryption <a href="https://forum.restic.net/t/help-debugging-a-blob-invalid-hash/4318/4">as suggested in the help-wanted forum post I made</a>
(who needs ECC memory heh?).</p>
<h2>Data reconstruction</h2>
<p>The previous diff shows that the <code>wrong-hash</code> blob starts at the beginning of
the file. We can truncate our ref to obtain the desired correct blob:</p>
<pre><code class="language-shell">% dd if=ryan.mp3 of=ryan-cut.mp3 bs=3401733 count=1
1+0 records in
1+0 records out
3401733 bytes (3.4 MB, 3.2 MiB) copied, 0.00986717 s, 345 MB/s
</code></pre>
<p>A checksum shows that our blob now has the expected hash:</p>
<pre><code class="language-shell">% sha256sum ryan-cut.mp3
9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27 ryan-cut.mp3
</code></pre>
<p>This is perfect: it confirms that the metadata are correct but the encrypted
data blob isn't. We add our newly corrected data blob to the dump directory
(for later use):</p>
<pre><code class="language-shell">% cp ryan-cut.mp3 dump/correct-$(sha256sum ryan-cut.mp3|head -c64).bin
</code></pre>
<p>Encryption now. The documentation says:</p>
<blockquote>
<p>Apart from the files stored within the keys directory, all files are
encrypted with AES-256 in counter mode (CTR).</p>
</blockquote>
<p>The documentation also indicates that the layout of blobs (and most of the
objects) follows this simple structure:</p>
<table>
<thead>
<tr>
<th>Block</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>IV</td>
<td>16B</td>
</tr>
<tr>
<td>Ciphertext (encrypted data)</td>
<td><em>variable</em></td>
</tr>
<tr>
<td>MAC info</td>
<td>16B</td>
</tr>
</tbody>
</table>
<p>For the pack file, it is:</p>
<table>
<thead>
<tr>
<th>Block</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Blob #0</td>
<td><em>variable</em></td>
</tr>
<tr>
<td>Blob #1</td>
<td><em>variable</em></td>
</tr>
<tr>
<td>Blob #2</td>
<td><em>variable</em></td>
</tr>
<tr>
<td>...</td>
<td></td>
</tr>
<tr>
<td>Blob #N</td>
<td><em>variable</em></td>
</tr>
<tr>
<td>Header</td>
<td><code>Length</code></td>
</tr>
<tr>
<td>Length</td>
<td>4B (little-endian)</td>
</tr>
</tbody>
</table>
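<p>As a sketch, the trailing length field from the layout above can be read with a few lines of Python (the helper name and the synthetic bytes are mine, this is not a real restic pack):</p>
<pre><code class="language-python">import struct

def header_length(pack_bytes):
    # Per the layout: the last 4 bytes are the header length, little-endian
    return struct.unpack("<I", pack_bytes[-4:])[0]

# Synthetic pack: fake blob bytes, a fake 180-byte header, then the length
fake_pack = b"\x00" * 1000 + b"\x00" * 180 + struct.pack("<I", 180)
assert header_length(fake_pack) == 180  # 0x000000B4, as in our pack file
</code></pre>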
<p>Our pack file contains a header length of <code>0x000000B4</code> (180). If we sum all our
dumped blob sizes with it (accounting for each blob's IV and MAC) plus the 4B of
the header length, we have:</p>
<pre><code class="language-python-repl">>>> blob_sizes = [606141, 1738374, 775472, 3401733]
>>> s = sum(blob_sizes)
>>> s += len(blob_sizes) * (16 + 16)
>>> s += 180
>>> s += 4
>>> s
6522032
</code></pre>
<p>This exactly matches the size of our pack file, so we're on the right track.</p>
<p>Now to confirm that we have a proper understanding of the layout, we could
re-encrypt every blob from the dump directory, and compare them to what is
inside the raw packfile:</p>
<pre><code class="language-bash">#!/bin/bash
set -uex # -e mandatory to make sure `cmp` breaks the execution in case of error
# Fetch the master encryption key as a hex string using the restic API
blob_key=$(restic cat masterkey | jq -r .encrypt | base64 -d | xxd -p -c32)
broken_packfile=fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73
# Ordered list of blobs present in the broken packfile
blobs=$(cat << END
dump/correct-becb3f4e8308b26ff66d2ba5cf85c15f73daf432c3c61d9f61c55f212aa26b7e.bin
dump/correct-f0def1ccba1c6efe30b4dc14b67d68db1a6b81e44e7d4cba18a1330477abe877.bin
dump/correct-bd30e59fb99f30794bb5c4c8d3460eb22627433a6f0a7a07087a4e56b9f2276c.bin
dump/wrong-hash-f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47.bin
END
)
rm -f dump/*.enc dump/*.ref
iv_size=16
mac_size=16
offset=0
for blob in $blobs; do
    # Extract the blob IV
    blob_iv=$(cat $broken_packfile | xxd -seek $offset -p -l16)
    # Encrypt blob with AES-256 CTR using the blob IV and the master key
    openssl aes-256-ctr -e -in $blob -out $blob.enc -iv $blob_iv -K $blob_key
    blob_off=$(($offset+$iv_size))
    blob_cyphertext_size=$(stat -c '%s' $blob)
    # Extract the encrypted blob from the packfile to compare with our own
    dd if=$broken_packfile of=$blob.ref ibs=1 count=$blob_cyphertext_size skip=$blob_off 2>/dev/null
    # Compare what we encrypted with what's in the packfile
    cmp $blob.enc $blob.ref
    length=$(($iv_size+$blob_cyphertext_size+$mac_size))
    offset=$(($offset+$length))
done
</code></pre>
<p>Running this doesn't fail, which means we were able to re-encode every blob
exactly the same as they appear within that packfile. So now, how about we
replace the <code>wrong-hash-f99b85db....bin</code> file with our carefully crafted
<code>correct-9f02880a....bin</code> file and re-encapsulate a new packfile?</p>
<p>Let's adjust our script:</p>
<pre><code class="language-bash">#!/bin/bash
set -uex
# Fetch the master encryption key as a hex string using the restic API
blob_key=$(restic cat masterkey | jq -r .encrypt | base64 -d | xxd -p -c32)
broken_packfile=fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73
fixed_packfile=${broken_packfile}.fixed
# Ordered list of blobs to repack into a new pack file
blobs=$(cat << END
dump/correct-becb3f4e8308b26ff66d2ba5cf85c15f73daf432c3c61d9f61c55f212aa26b7e.bin
dump/correct-f0def1ccba1c6efe30b4dc14b67d68db1a6b81e44e7d4cba18a1330477abe877.bin
dump/correct-bd30e59fb99f30794bb5c4c8d3460eb22627433a6f0a7a07087a4e56b9f2276c.bin
dump/correct-9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27.bin
END
)
# Start with the broken pack file as base (so that we preserve all existing
# IV, MAC, headers)
cp $broken_packfile $fixed_packfile
rm -f dump/*.enc
iv_size=16
mac_size=16
offset=0
for blob in $blobs; do
    # Extract the blob IV
    blob_iv=$(cat $broken_packfile | xxd -seek $offset -p -l16)
    # Encrypt blob with AES-256 CTR using the blob IV and the master key
    openssl aes-256-ctr -e -in $blob -out $blob.enc -iv $blob_iv -K $blob_key
    blob_off=$(($offset+$iv_size))
    blob_cyphertext_size=$(stat -c '%s' $blob)
    # Insert the newly encoded blob into our pack file at its correct location
    dd if=$blob.enc of=$fixed_packfile bs=1 count=$blob_cyphertext_size seek=$blob_off conv=notrunc
    length=$(($iv_size+$blob_cyphertext_size+$mac_size))
    offset=$(($offset+$length))
done
</code></pre>
<p>We are now in possession of a packfile with the data corrected, where all the
blobs match their respective checksum. Is this a win?</p>
<p>Alas, replacing the data file with our new one brings this new error when
trying a <code>restic cat blob 9f02880a</code>: "<em>ciphertext verification failed</em>". Sad.</p>
<p>This is because while the checksum is now correct, the MAC signature (the 16B
at the end of the blob) is now wrong, so we need to correct it.</p>
<p>After trying to recompute it manually using openssl for an hour, I figured a
simpler way to achieve that would be to patch restic to leak the expected MAC:</p>
<pre><code class="language-diff">diff --git a/internal/crypto/crypto.go b/internal/crypto/crypto.go
index 56ee61db..f2b48a66 100644
--- a/internal/crypto/crypto.go
+++ b/internal/crypto/crypto.go
@@ -121,7 +121,14 @@ func poly1305Verify(msg []byte, nonce []byte, key *MACKey, mac []byte) bool {
 	var m [16]byte
 	copy(m[:], mac)
 
-	return poly1305.Verify(&m, msg, &k)
+	ret := poly1305.Verify(&m, msg, &k)
+	if !ret {
+		var tmp [16]byte
+		poly1305.Sum(&tmp, msg, &k)
+		fmt.Println("expected", tmp, "got", m)
+	}
+
+	return ret
 }
 
 // NewRandomKey returns new encryption and message authentication keys.
</code></pre>
<p>Running a <code>restic cat blob</code> again still fails, but it also prints the expected
MAC on stdout, which we can store in a <code>fixed.mac</code> file:</p>
<pre><code class="language-python-repl">>>> s = '60 8 175 15 140 206 212 132 102 59 123 192 61 19 34 36'
>>> with open('fixed.mac', 'wb') as f:
... f.write(b''.join(int(x).to_bytes(1, 'little') for x in s.split()))
...
16
</code></pre>
<p>We also need to find where exactly the MAC is located, so we add this to our
repacking script:</p>
<pre><code class="language-bash">mac_offset=$(($offset+$iv_size+$blob_cyphertext_size))
blob_mac=$(cat $broken_packfile | xxd -seek $mac_offset -p -l16)
echo "mac_offset:$mac_offset blob_mac:$blob_mac"
</code></pre>
<p>After double checking that the <code>blob_mac:</code> matches the restic "got" log, we can
patch it in our pack file using the <code>mac_offset</code> as <code>seek</code> argument to <code>dd</code>:</p>
<pre><code class="language-shell">% dd if=fixed.mac of=fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73.fixed bs=1 seek=6521832 count=16 conv=notrunc
</code></pre>
<p>And that's it, we can now replace <code>data/fd/fdd48b5c...</code> with our new fixed file
and profit, because this time you're goddamn right <strong>it works</strong>.</p>
<h2>EDIT 2021/09/12</h2>
<p>It works, we can read the file properly, checksum is fine, <em>but</em> restic will
still report index errors when running a <code>check --read-data</code>. One thing we
forgot is that the pack file checksum also changes:</p>
<pre><code class="language-shell">% sha256sum fdd48b5c*
fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73 fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73
81b4816c341d7f60f10e6a554d6b7053d1216d16dd688437c728479490e4eff6 fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73.fixed
</code></pre>
<p>This means we must add <code>data/81/81b4816c</code> and remove <code>data/fd/fdd48b5c...</code>,
then run <code>restic rebuild-index</code>. This gets rid of the remaining issue. Thanks
to Michael Eischer from the restic forum for the hints.</p>
<h2>Closing words</h2>
<p>One could argue that I could have relied on higher level tools provided by
restic and simply removed the packfile, re-indexed the repo and made a new
snapshot (or something along these lines). This would have been more generic,
simpler and probably more reliable. But it also has its drawbacks. First of
all, it implies that you trust the tools for repairing a repository, which are
probably based on various heuristics and likely less tested, since those are
pretty rare and specific scenarios.</p>
<p>Also, and this was important to me, I wanted to understand how restic worked
because I trust this system with the most precious data in my life. This issue
was actually a very good opportunity to get comfortable with its internals, and
it also reinforced the confidence I have in the tool and in myself to face
such issues in the future.</p>
<p>And finally, heck, it was fun.</p>
<p>Oh, and for the curious, the MIPS machine is a <a href="http://gnubee.org/">GnuBee</a>.
I'm not going to badmouth it because it served its purpose and it's an awesome
initiative (in particular because it allowed me to use my large stack of 2.5"
hard drives in a very compact form factor), but on the other hand it's going to
be hammered soon because I don't negotiate with terrorists.</p>