The canny edge detector is likely the most commonly implemented edge detection algorithm. It's generally a nice filter to write when starting with image processing as it is not much complicated and provide kind of useful results in some situations.
That's why I took a few days to implement it in FFmpeg.
Following is a screenshot of a Lynx video by Stefan Sobotta
filtered with the edgedetect
filter:
Command line used: ffplay ~/samples/Lynx.mp4 -vf 'split[b], pad=iw:ih*2[src], [b]edgedetect, [src]overlay=0:h'
For those not familiar with the algorithm, it is essentially a few successive filters applied on a gray scale image:
- apply a gaussian filter to reduce noise (it blurs the image)
- compute the gradients, and the directions for the next step (see Sobel)
- non maximum suppressions
- double threshold to keep only the interesting values
For more details, I'd recommend the Canny edge detector Wikipedia page and its external links.
This article will focus on the second step, and especially the directions computation, since it involves some little math and can be expensive. The other steps are either trivial or widely documented so they won't be the topic of this post.
Direction computation
This step really is in my opinion the most interesting one. But first we need
some little understanding of this Sobel operator. In input we have a blurred
image, and for each pixel we get a gradient value G = |Gx|+|Gy|
, where Gx
and Gy
are defined using 6 surrounding pixels (both use a 3x3
kernel with 3
zero values).
The kernel for Gx
is:
-1 | 0 | +1 |
-2 | 0 | +2 |
-1 | 0 | +1 |
And the kernel for Gy
is:
-1 | -2 | -1 |
0 | 0 | 0 |
+1 | +2 | +1 |
The interesting thing about Gx
and Gy
is that they allow to define an edge
direction angle Θ = atan2(|Gx|,|Gy|)
. In the Canny Edge Detector algorithm, this
angle must be rounded to one of the four directions:
- horizontal
- vertical,
- 45° going "up"
- 45° going "down"
These directions, defined for each pixel, will be used in the next step when looking for maximums.
Naive implementation
Let's write a piece of code to test our direction rounder:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <math.h>
enum {
DIRECTION_45UP,
DIRECTION_45DOWN,
DIRECTION_HORIZONTAL,
DIRECTION_VERTICAL,
DIRECTION_UNKNWON,
};
static const uint8_t colors[] = {
0xff, 0x00, 0x00, // red -> 45 up
0x00, 0xff, 0x00, // green -> 45 down
0x00, 0x00, 0xff, // blue -> horizontal
0xff, 0xff, 0x00, // yellow -> vertical
0x00, 0x00, 0x00, // black -> unknown
};
static int get_direction(int gx, int gy)
{
/* TODO */
return DIRECTION_UNKNWON;
}
static void print_ppm(int m)
{
int y, x, sz = m*2 + 1;
printf("P3\n%d %d\n255\n", sz, sz);
for (y = -m; y <= m; y++) {
for (x = -m; x <= m; x++) {
const uint8_t *c = colors + 3*get_direction(x, y);
if (x != -m)
printf(" ");
printf("%3d %3d %3d", c[0], c[1], c[2]);
}
printf("\n");
}
}
int main(int ac, char **av)
{
if (ac != 2) {
fprintf(stderr, "usage: %s limit\n", av[0]);
return 0;
}
print_ppm(strtol(av[1], NULL, 10));
return 0;
}
That code takes a limit
parameter used to limit the range of the x
and y
values. Basically, get_direction()
will be called for all the integer values
for x
in [-limit;limit]
and for y
in [-limit;limit]
. Each pixel will be
attributed a color describing the direction (blue for horizontal, yellow for
vertical, etc).
The usage is pretty simple:
% gcc -Wall -lm a.c && ./a.out 100 > out.ppm && feh out.ppm
The resulting output out.ppm
is a picture of size 2xlimit
x 2xlimit
full
of colors.
Here is a naive get_direction()
implementation, where we compute Θ
to
compare it to reference angles (π/8
and 3π/8
), and use the signs of Gx
and Gy
:
static int get_direction_naive(int gx, int gy)
{
const double theta = atan2(abs(gy), abs(gx));
if (gx) {
if (theta < 3*M_PI/8 && theta > M_PI/8) {
if ((gx > 0 && gy > 0) || (gx < 0 && gy < 0))
return DIRECTION_45DOWN;
else
return DIRECTION_45UP;
}
if (theta < M_PI/8)
return DIRECTION_HORIZONTAL;
}
return DIRECTION_VERTICAL;
}
The resulting out.ppm
with limit=100
:
So this is exactly what we want to achieve. Unfortunately, the call to
atan2()
for each pixel is quite troublesome.
Note: if Gx = 0
and Gy = 0
, we are at the center of the picture.
Get rid of the tangent computation
If we want to get get rid of the atan2()
we can just use tan()
: basically
the tangent of the angle Θ
defined by Gx
and Gy
is Gy/Gx
: tan(Θ) = Gy/Gx
. So we just have to compare Gy/Gx
against the tangent of the reference
angle, which are the following constants:
tan( π/8)
=√2 - 1
tan(3π/8)
=√2 + 1
And to avoid the division of Gy/Gx
, we can simply compare Gy
against Gx
multiplied by these constants. Here is a new proposition of the
get_direction()
function:
#define SQRT2 1.4142135623730951
static int get_direction_no_atan2(int gx, int gy)
{
double tanpi8gx, tan3pi8gx;
if (gx) {
if (gx < 0)
gx = -gx, gy = -gy;
tanpi8gx = (SQRT2-1) * gx;
tan3pi8gx = (SQRT2+1) * gx;
if (gy > -tan3pi8gx && gy < -tanpi8gx) return DIRECTION_45UP;
if (gy > -tanpi8gx && gy < tanpi8gx) return DIRECTION_HORIZONTAL;
if (gy > tanpi8gx && gy < tan3pi8gx) return DIRECTION_45DOWN;
}
return DIRECTION_VERTICAL;
}
You'll notice I also took the liberty to also change the branching logic to make the code somehow more readable.
So now that we dropped the dependency to libm
and the atan2()
call, let's
do something even better.
Making the code bit exact
We are still using floating point operations here, which is kind of a problem (for performances, and for architecture independent testing at least). Let's try to make some integer arithmetic instead.
First, we need to analyze what will be the range of values Gx
and Gy
can
take. Both are using the 3x3
kernels defined a bit earlier, and
assuming the pixel is defined as a 8-bit gray-scale value (0
to 0xFF
), we
can conclude that Gx
and Gy
are both in the range: [(-1)x255 + (-2)x255 + (-1)x255; 1x255 + 2x255 + 1x255]
, or [-1020;1020]
.
This range is quite small, and multiplying these values by 1<<16
will largely
fit in an integer, so it should be safe to use 16-bit arithmetic.
Instead of comparing Gy
against:
Gx
x√2-1
Gx
x√2+1
...like we did previously, let's do the comparison between Gy
x (1<<16)
and:
Gx
x√2-1
x(1<<16)
Gx
x√2+1
x(1<<16)
The rounded values for these constants are:
round((sqrt(2)-1) * (1<<16)) = 27146
round((sqrt(2)+1) * (1<<16)) = 158218
These values will be multiplied to Gx
, and 1020 x 158218
fits into a 32-bit
integer so it's still OK. Following, the optimized version:
static int get_direction_integer(int gx, int gy)
{
if (gx) {
int tanpi8gx, tan3pi8gx;
if (gx < 0)
gx = -gx, gy = -gy;
gy <<= 16;
tanpi8gx = 27146 * gx;
tan3pi8gx = 158218 * gx;
if (gy > -tan3pi8gx && gy < -tanpi8gx) return DIRECTION_45UP;
if (gy > -tanpi8gx && gy < tanpi8gx) return DIRECTION_HORIZONTAL;
if (gy > tanpi8gx && gy < tan3pi8gx) return DIRECTION_45DOWN;
}
return DIRECTION_VERTICAL;
}
We should now have a way faster version than our first naive implementation.
Benchmark
Following is a naive time
benchmark, on a i7 Intel CPU, and using gcc -O3
:
Run | naive | no atan2 | integer |
---|---|---|---|
1 | 0.257 | 0.032 | 0.025 |
2 | 0.251 | 0.032 | 0.029 |
3 | 0.255 | 0.029 | 0.027 |
As usual, performances depend on a lot of various other parameters, but it looks like it was worth the effort since it's almost 10 times faster (and now bit exact).