
GCC undefined behaviors are getting wild

2022-11-27

Happy with my recent breakthrough in understanding C integer divisions after weeks of struggle, I was minding my own business, having fun writing integer arithmetic code. Life was good, when suddenly… zsh: segmentation fault (core dumped).

That code wasn't messing with memory much, so the crash was more likely a side effect of an arithmetic overflow or something similar. Running it with -fsanitize=undefined quickly confirmed the presence of an integer overflow. The fix was easy, but something felt off: I was under the impression my code was robust enough against that kind of honest mistake. It turns out the guard condition I had in place should indeed have been enough, so I tried to extract a minimal reproducible case:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t tab[0x1ff + 1];

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    int32_t i = x * 0x1ff / 0xffff;
    if (i >= 0 && i < sizeof(tab)) {
        printf("tab[%d] looks safe because %d is between [0;%d[\n", i, i, (int)sizeof(tab));
        return tab[i];
    }

    return 0;
}

int main(int ac, char **av)
{
    return f(atoi(av[1]));
}

The overflow can happen on x * 0x1ff. Since signed integer overflow is undefined behaviour, GCC assumes it cannot happen, ever. In practice, in this case, it does happen, but the i >= 0 && i < sizeof(tab) condition should be enough to take care of it, whatever crazy value i ends up with, right? Well, I have bad news:

% cc -Wall -O2 overflow.c -o overflow && ./overflow 50000000
tab[62183] looks safe because 62183 is between [0;512[
zsh: segmentation fault (core dumped)  ./overflow 50000000

Note: this is GCC 12.2.0 on x86-64.
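
For reference, recompiling with -fsanitize=undefined pinpoints the offending operation; the diagnostic looks something along these lines (the exact source location may differ):

% cc -Wall -O2 -fsanitize=undefined overflow.c -o overflow && ./overflow 50000000
overflow.c:11:19: runtime error: signed integer overflow: 50000000 * 511 cannot be represented in type 'int'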

We have i=62183 as the result of the overflow, and nevertheless the execution violates the guard condition, spouts a nonsensical lie, goes straight into dereferencing tab, and dies miserably.
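
Out of curiosity, we can see where 62183 comes from by redoing the computation with unsigned arithmetic, where wraparound is well defined (a standalone sketch, not part of the original program):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 50000000 * 0x1ff = 25550000000, which does not fit in 32 bits,
     * so unsigned arithmetic wraps it modulo 2^32 */
    uint32_t wrapped = (uint32_t)50000000 * 0x1ff;
    printf("%u\n", wrapped);           /* 4075163520 */
    printf("%d\n", (int32_t)wrapped);  /* -219803776 on two's complement */
    printf("%u\n", wrapped / 0xffff);  /* 62183, the value we crashed with */
    return 0;
}

In other words, 62183 is the result of an unsigned division of the wrapped value, which, as we are about to see, is exactly what GCC emitted.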

Let's study what GCC is doing here. Firing up Ghidra, we observe the following decompiled code:

uint8_t f(int x)
{
  int tmp;

  if (-1 < x) {
    tmp = x * 0x1ff;
    if (tmp < 0x1fffe00) {
      printf("tab[%d] looks safe because %d is between [0;%d[\n", (ulong)(uint)tmp / 0xffff, (ulong)(uint)tmp / 0xffff, 0x200);
      return tab[(int)((uint)tmp / 0xffff)];
    }
  }
  return '\0';
}

When I said GCC assumes the overflow cannot happen, this is what I meant: tmp is not supposed to overflow, so part of the condition I had in place was simply removed. More specifically, since x cannot be less than 0, and since GCC assumes the multiplication cannot overflow into a random (possibly negative) value because that would be undefined behaviour, it decides to drop the "redundant" i >= 0 condition because "it cannot happen".
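
For completeness, one way to keep the guard meaningful is to make the overflow impossible in the first place, for instance by widening the intermediate computation to 64 bits (a sketch of one possible fix, not necessarily the one I applied):

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    /* (int64_t)x * 0x1ff cannot overflow for any int32_t x,
     * so there is no UB and the bounds check stays meaningful */
    int64_t i = (int64_t)x * 0x1ff / 0xffff;
    if (i >= 0 && i < (int64_t)sizeof(tab))
        return tab[i];
    return 0;
}

GCC and Clang also provide __builtin_mul_overflow() when an explicit, checked multiplication is preferred.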

I reported that exact issue to GCC to make sure it wasn't a bug, and it was indeed confirmed to me that the undefined behaviour of an integer overflow is not limited in scope to whatever insane value it could take: it is apparently perfectly acceptable to mess up the code flow entirely.

While I understand how attractive it can be from an optimization point of view, the paranoid developer in me is straight up terrified by the prospect of a single integer overflow removing a security protection and causing such havoc. I've worked several years on a project where integer overflows were (and probably still are) legion. Identifying and fixing all of them would likely be the lifetime mission of several opinionated individuals.
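
For code bases in that situation, a few GCC flags are worth knowing about: -fwrapv makes signed overflow wrap around (turning it into defined behaviour), -ftrapv makes it trap, and -fsanitize=undefined diagnoses it at runtime. With -fwrapv for instance, the repro above should behave the way I naively expected: i ends up negative, the i >= 0 guard catches it, and f() returns 0:

% cc -Wall -O2 -fwrapv overflow.c -o overflow && ./overflow 50000000
% echo $?
0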

I'm expecting this article to send the Rust crew on a crusade again, and I think I might join them this time.

Edit: it was made clear to me while reading Predrag's blog that the key to my misunderstanding boils down to this: "Undefined behavior is not the same as implementation-defined behavior". While I was indeed talking about undefined behaviour, subconsciously I was thinking that an overflow on a multiplication would be implementation-defined behaviour. This is not the case: it is undefined behaviour, and yes, the compiler is free to do whatever it wants because that is compliant with the specification. It's my mistake of course, but in my defense, despite the arrogant comments I read, this confusion happens a lot. I believe this happens because it violates the principle of least astonishment. To illustrate, I'll point to an interesting old blog post from an OpenBSD developer that worries about the result of the multiplication rather than about the invalidation of any guarantee regarding the execution flow (before and after it). This is not uncommon and, in my opinion, perfectly understandable.


