A small freedom area.

Fun and canny optim for a Canny Edge Detector

Sat 08 Sep 2012

fun, trick, prog, algo, ffmpeg, av

The canny edge detector is likely the most commonly implemented edge detection algorithm. It's generally a nice filter to write when starting with image processing as it is not much complicated and provide kind of useful results in some situations.

That's why I took a few days to implement it in FFmpeg. Following is a screenshot of a Lynx video by Stefan Sobotta filtered with the edgedetect filter:


Command line used: ffplay ~/samples/Lynx.mp4 -vf 'split[b], pad=iw:ih*2[src], [b]edgedetect, [src]overlay=0:h'

For those not familiar with the algorithm, it is essentially a few successive filters applied on a gray scale image:

For more details, I'd recommend the Canny edge detector Wikipedia page and its external links.

This article will focus on the second step, and especially the directions computation, since it involves some little math and can be expensive. The other steps are either trivial or widely documented so they won't be the topic of this post.

Direction computation

This step really is in my opinion the most interesting one. But first we need some little understanding of this Sobel operator. In input we have a blurred image, and for each pixel we get a gradient value G = |Gx|+|Gy|, where Gx and Gy are defined using 6 surrounding pixels (both use a 3x3 kernel with 3 zero values).

The kernel for Gx is:


And the kernel for Gy is:

0 0 0

The interesting thing about Gx and Gy is that they allow to define an edge direction angle Θ = atan2(|Gx|,|Gy|). In the Canny Edge Detector algorithm, this angle must be rounded to one of the four directions:

These directions, defined for each pixel, will be used in the next step when looking for maximums.

Naive implementation

Let's write a piece of code to test our direction rounder:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <math.h>

enum {

static const uint8_t colors[] = {
    0xff, 0x00, 0x00, // red     -> 45 up
    0x00, 0xff, 0x00, // green   -> 45 down
    0x00, 0x00, 0xff, // blue    -> horizontal
    0xff, 0xff, 0x00, // yellow  -> vertical
    0x00, 0x00, 0x00, // black   -> unknown

static int get_direction(int gx, int gy)
    /* TODO */

static void print_ppm(int m)
    int y, x, sz = m*2 + 1;

    printf("P3\n%d %d\n255\n", sz, sz);
    for (y = -m; y <= m; y++) {
        for (x = -m; x <= m; x++) {
            const uint8_t *c = colors + 3*get_direction(x, y);
            if (x != -m)
                printf("  ");
            printf("%3d %3d %3d", c[0], c[1], c[2]);

int main(int ac, char **av)
    if (ac != 2) {
        fprintf(stderr, "usage: %s limit\n", av[0]);
        return 0;
    print_ppm(strtol(av[1], NULL, 10));
    return 0;

That code takes a limit parameter used to limit the range of the x and y values. Basically, get_direction() will be called for all the integer values for x in [-limit;limit] and for y in [-limit;limit]. Each pixel will be attributed a color describing the direction (blue for horizontal, yellow for vertical, etc).

The usage is pretty simple:

% gcc -Wall -lm a.c && ./a.out 100 > out.ppm && feh out.ppm

The resulting output out.ppm is a picture of size 2xlimit x 2xlimit full of colors.

Here is a naive get_direction() implementation, where we compute Θ to compare it to reference angles (π/8 and 3π/8), and use the signs of Gx and Gy:

static int get_direction_naive(int gx, int gy)
    const double theta = atan2(abs(gy), abs(gx));
    if (gx) {
        if (theta < 3*M_PI/8 && theta > M_PI/8) {
            if ((gx > 0 && gy > 0) || (gx < 0 && gy < 0))
                return DIRECTION_45DOWN;
                return DIRECTION_45UP;
        if (theta < M_PI/8)
            return DIRECTION_HORIZONTAL;

The resulting out.ppm with limit=100:


So this is exactly what we want to achieve. Unfortunately, the call to atan2() for each pixel is quite troublesome.

Note: if Gx = 0 and Gy = 0, we are at the center of the picture.

Get rid of the tangent computation

If we want to get get rid of the atan2() we can just use tan(): basically the tangent of the angle Θ defined by Gx and Gy is Gy/Gx: tan(Θ) = Gy/Gx. So we just have to compare Gy/Gx against the tangent of the reference angle, which are the following constants:

And to avoid the division of Gy/Gx, we can simply compare Gy against Gx multiplied by these constants. Here is a new proposition of the get_direction() function:

static int get_direction_no_atan2(int gx, int gy)
    double tanpi8gx, tan3pi8gx;

#define SQRT2 1.4142135623730951

    if (gx) {
        if (gx < 0)
            gx = -gx, gy = -gy;
        tanpi8gx  = (SQRT2-1) * gx;
        tan3pi8gx = (SQRT2+1) * gx;
        if (gy > -tan3pi8gx && gy < -tanpi8gx)  return DIRECTION_45UP;
        if (gy > -tanpi8gx  && gy <  tanpi8gx)  return DIRECTION_HORIZONTAL;
        if (gy >  tanpi8gx  && gy <  tan3pi8gx) return DIRECTION_45DOWN;

You'll notice I also took the liberty to also change the branching logic to make the code somehow more readable.

So now that we dropped the dependency to libm and the atan2() call, let's do something even better.

Making the code bit exact

We are still using floating point operations here, which is kind of a problem (for performances, and for architecture independent testing at least). Let's try to make some integer arithmetic instead.

First, we need to analyze what will be the range of values Gx and Gy can take. Both are using the 3x3 kernels defined a bit earlier, and assuming the pixel is defined as a 8-bit gray-scale value (0 to 0xFF), we can conclude that Gx and Gy are both in the range: [(-1)x255 + (-2)x255 + (-1)x255; 1x255 + 2x255 + 1x255], or [-1020;1020].

This range is quite small, and multiplying these values by 1<<16 will largely fit in an integer, so it should be safe to use 16-bit arithmetic.

Instead of comparing Gy against:

...like we did previously, let's do the comparison between Gy x (1<<16) and:

The rounded values for these constants are:

These values will be multiplied to Gx, and 1020 x 158218 fits into a 32-bit integer so it's still OK. Following, the optimized version:

static int get_direction_integer(int gx, int gy)
    if (gx) {
        int tanpi8gx, tan3pi8gx;

        if (gx < 0)
            gx = -gx, gy = -gy;
        gy <<= 16;
        tanpi8gx  =  27146 * gx;
        tan3pi8gx = 158218 * gx;
        if (gy > -tan3pi8gx && gy < -tanpi8gx)  return DIRECTION_45UP;
        if (gy > -tanpi8gx  && gy <  tanpi8gx)  return DIRECTION_HORIZONTAL;
        if (gy >  tanpi8gx  && gy <  tan3pi8gx) return DIRECTION_45DOWN;

We should now have a way faster version than our first naive implementation.


Following is a naive time benchmark, on a i7 Intel CPU, and using gcc -O3:

Runnaiveno atan2integer
1 0.257 0.032 0.025
2 0.251 0.032 0.029
3 0.255 0.029 0.027

As usual, performances depend on a lot of various other parameters, but it looks like it was worth the effort since it's almost 10 times faster (and now bit exact).