The question of how to do templating in C is often raised, and I couldn't find a good overview of the different approaches so I'll try to make a relatively exhaustive list here.
Note: the methods presented here rely on the C preprocessor, which you can use
directly yourself by calling the cpp
command, paste your code and see the
output by sending an EOS
(pressing control + d
typically).
Simple C macro
So the most common method is to dump some code enclosed (or not) into the do { ... } while (0)
form:
#define DO_RANDOM_STUFF(type) do { \
int i; \
type *p = buf; \
\
for (i = 0; i < len; i++) \
p[i] = p[i] * k; \
} while (0)
... and using it directly:
int func(void *buf, int len, float k, int request)
{
if (request == INT8) DO_RANDOM_STUFF(int8_t);
else if (request == INT16) DO_RANDOM_STUFF(int16_t);
else if (request == INT32) DO_RANDOM_STUFF(int32_t);
}
Simple C macro, full function
It is also common to create the whole function that way:
#define DECLARE_FUNC(n) \
static void func_##n(int##n##_t *p, int len, float k) \
{ \
int i; \
\
for (i = 0; i < len; i++) \
p[i] = p[i] * k; \
}
DECLARE_FUNC(8)
DECLARE_FUNC(16)
DECLARE_FUNC(32)
... which you will then use by simply calling func_8()
, func_16()
and
func_32()
.
Alternative function creator
So far we used the templating just to avoid typing redundancy, but it is sometimes motivated by performance. Let's observe the following pattern:
int process_image(void *img, int width, int height, const int n)
{
int x, y;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) {
if (n == 0) foo(img, x, y);
else if (n == 1) bar(img, x, y);
else baz(img, x, y);
}
}
}
We will assume here that foo()
, bar()
and baz()
functions are meant to be
inlined for performance reasons (be it justified or not, we assume you do not
want a per-pixel function call), so you do not want to use an array of function
pointers and do func_lut[n](img, x, y)
in the inner loop. We could also
consider the scenario where the functions take a completely different set of
parameters.
Your compiler will sometimes be smart enough to see that the n
check in the
inner loop can instead enclose the whole logic just as if you had written this:
int process_image(void *img, int width, int height, const int n)
{
int x, y;
if (n == 0)
for (y = 0; y < height; y++)
for (x = 0; x < width; x++)
foo(img, x, y);
else if (n == 1)
for (y = 0; y < height; y++)
for (x = 0; x < width; x++)
bar(img, x, y);
else
for (y = 0; y < height; y++)
for (x = 0; x < width; x++)
baz(img, x, y);
}
Unfortunately, it is very likely that your code is much more complex (otherwise you could even just put the 2 loops into your inner functions) which will cause your compiler not to do it.
One solution for this is to create wrapper for each scenario. It would look like this:
static inline int process_image(void *img, int width, int height, const int n)
{
int x, y;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) {
if (n == 0) foo(img, x, y);
else if (n == 1) bar(img, x, y);
else baz(img, x, y);
}
}
}
int process_image_foo(void *img, int width, int height)
{
return process_image(img, width, height, 0);
}
int process_image_bar(void *img, int width, int height)
{
return process_image(img, width, height, 1);
}
int process_image_baz(void *img, int width, int height)
{
return process_image(img, width, height, 2);
}
Notice how the process_image()
function has been marked as static inline
in
order to make sure it is fully inlined in each wrapper.
By doing so, the compiler can very easily see the dead paths (because n
is
now a constant in the wrappers) and generate 3 standalone functions with zero
call nor inner condition. These functions can now be put into a function lookup
table mapped to n
(process_image_lut[n](img, width, height)
).
One great thing about this method is that you can also do type-specific code in
each of the function (by interpreting img
as int16_t
, float
, uint64_t
,
etc for example).
As a side effect, you will notice that the main logic is not under a huge hardly maintainable macro, and the redundancy overhead is also very small.
Note: you might want to rely on __attribute__((always_inline))
if your
compiler supports it and you want the inline to be effective at every
optimization level.
Mixing full functions mechanism and macros
If the redundancy overhead in the previous example is still too much for you, you can mix it with a small macro mechanism:
static inline int process_image(void *img, int width, int height, const int n)
{
int x, y;
for (y = 0; y < height; y++) {
for (x = 0; x < width; x++) {
if (n == 0) foo(img, x, y);
else if (n == 1) bar(img, x, y);
else baz(img, x, y);
}
}
}
#define DECLARE_PROCESS_IMAGE_FUNC(name, n) \
int process_image_##name(void *img, int width, int height) \
{ \
return process_image(img, width, height, n); \
}
DECLARE_PROCESS_IMAGE_FUNC(foo, 0)
DECLARE_PROCESS_IMAGE_FUNC(bar, 1)
DECLARE_PROCESS_IMAGE_FUNC(baz, 2)
This avoids the painful redundancy and still benefits from the same advantages as previously.
FFmpeg's XBR filter and paletteuse filter are a few real cases examples.
External file
One alternative to all of this is to use external files.
The logic is pretty simple; let's start with the content of caller.c
:
#include <stdint.h>
#define TEMPLATE_U16
#include "evil_template.c"
#undef TEMPLATE_U16
#define TEMPLATE_U32
#include "evil_template.c"
#undef TEMPLATE_U32
#define TEMPLATE_FLT
#include "evil_template.c"
#undef TEMPLATE_FLT
#define TEMPLATE_DBL
#include "evil_template.c"
#undef TEMPLATE_DBL
The content of evil_template.c
could look like something like this:
#if defined(TEMPLATE_U16)
# define RENAME(N) N ## _u16
# define TYPE uint16_t
# define SUM_TYPE uint32_t
# define XDIV(x, n) (((x) + ((1<<(n))-1)) >> (n))
#elif defined(TEMPLATE_U32)
# define RENAME(N) N ## _u32
# define TYPE uint32_t
# define SUM_TYPE uint64_t
# define XDIV(x, n) (((x) + ((1<<(n))-1)) >> (n))
#elif defined(TEMPLATE_FLT)
# define RENAME(N) N ## _flt
# define TYPE float
# define SUM_TYPE float
# define XDIV(x, n) ((x) / (float)(1<<(n)))
#elif defined(TEMPLATE_DBL)
# define RENAME(N) N ## _dbl
# define TYPE double
# define SUM_TYPE double
# define XDIV(x, n) ((x) / (double)(1<<(n)))
#endif
TYPE RENAME(func)(const TYPE *p, int n)
{
int i;
SUM_TYPE sum = 0;
for (i = 0; i < 1<<n; i++)
sum += p[i];
return XDIV(sum, n);
}
#undef RENAME
#undef TYPE
#undef SUM_TYPE
#undef XDIV
This will produce the following functions:
% gcc -Wall -c caller.c -o caller.o && readelf -s caller.o | grep func
9: 0000000000000000 110 FUNC GLOBAL DEFAULT 1 func_u16
10: 000000000000006e 119 FUNC GLOBAL DEFAULT 1 func_u32
11: 00000000000000e5 128 FUNC GLOBAL DEFAULT 1 func_flt
12: 0000000000000165 131 FUNC GLOBAL DEFAULT 1 func_dbl
This way of templating could be relatively handy if you have a set of functions in the template but can be considered by many as the root of all evils. Nevertheless, this is still a few order of magnitude better than C++ templating.
Note: make sure your build system properly handles the dependency of
evil_template.c
from caller.c
.
Resampling code in libswresample (which inspired this example) is one of the many real case example.
Conclusion
Mixing the full functions mechanism and macros covers most of the cases and can lead to very decent code with high level of readability and maintenance.
For more complex cases, you also can simply rely on another language, generally a scripting one (but you can use C to generate C as well of course). This can often be the most appropriate approach if the complexity is increasing and your project already has a dependency on such a language.
Edit: As pointed by someone on HN, it is also worth
mentioning __Generic
in C11 that can be used for similar purpose if your
project can afford the dependency on C11.