=head1 LIBECB - e-C-Builtins

=head2 ABOUT LIBECB

Libecb is currently a simple header file that doesn't require any
configuration to use or include in your project.

It's part of the e-suite of libraries, other members of which include
libev and libeio.

Its homepage can be found here:

   http://software.schmorp.de/pkg/libecb

It mainly provides a number of wrappers around GCC built-ins, together
with replacement functions for other compilers. In addition to this,
it provides a number of other low-level C utilities, such as endianness
detection, byte swapping or bit rotations.

Or in other words, things that should be built into any standard C system,
but aren't, implemented as efficiently as possible with GCC, and still
correct with other compilers.

More might come.

=head2 ABOUT THE HEADER

At the moment, all you have to do is copy F<ecb.h> somewhere where your
compiler can find it and include it:

   #include <ecb.h>

The header should work fine for both C and C++ compilation, and gives you
all of F<inttypes.h> in addition to the ECB symbols.

There are currently no object files to link to - future versions might
come with an (optional) object code library to link against, to reduce
code size or gain access to additional features.

It also currently includes everything from F<inttypes.h>.

=head2 ABOUT THIS MANUAL / CONVENTIONS

This manual mainly describes each (public) function available after
including the F<ecb.h> header. The header might define other symbols than
these, but these are not part of the public API, and not supported in any
way.

When the manual mentions a "function" then this could be defined either as
an inline function, a macro, or an external symbol.

When functions use a concrete standard type, such as C<int> or
C<uint32_t>, then the corresponding function works only with that type. If
only a generic name is used (C<expr>, C<cond>, C<value> and so on), then
the corresponding function relies on C to implement the correct types, and
is usually implemented as a macro. Specifically, a "bool" in this manual
refers to any kind of boolean value, not a specific type.

=head2 TYPES / TYPE SUPPORT

ecb.h makes sure that the following types are defined (in the expected way):

   int8_t uint8_t int16_t uint16_t
   int32_t uint32_t int64_t uint64_t
   intptr_t uintptr_t

The macro C<ECB_PTRSIZE> is defined to the size of a pointer on this
platform (currently C<4> or C<8>) and can be used in preprocessor
expressions.

For C<ptrdiff_t> and C<size_t> use C<stddef.h>/C<cstddef>.

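The point of a macro like C<ECB_PTRSIZE> is that it can select code at preprocessing time, which C<sizeof> cannot. The sketch below only illustrates that pattern. It derives a hypothetical C<MY_PTRSIZE> from C<UINTPTR_MAX> and then uses it in an C<#if>. This assumes the platform provides that optional macro and that its value reflects the pointer width.

```c
   #include <stdint.h>

   /* Hypothetical stand-in for ECB_PTRSIZE, derived from UINTPTR_MAX
      (an optional macro, assumed to be present here). */
   #if UINTPTR_MAX == 0xffffffffu
   # define MY_PTRSIZE 4
   #elif UINTPTR_MAX == 0xffffffffffffffffu
   # define MY_PTRSIZE 8
   #else
   # error "unexpected pointer size"
   #endif

   /* sizeof cannot be used in #if, but a macro like MY_PTRSIZE can:
      pick an unsigned integer type exactly as wide as a pointer. */
   #if MY_PTRSIZE == 8
   typedef uint64_t my_word;
   #else
   typedef uint32_t my_word;
   #endif
```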
=head2 LANGUAGE/ENVIRONMENT/COMPILER VERSIONS

All the following symbols expand to an expression that can be tested in
preprocessor instructions as well as treated as a boolean (use C<!!> to
ensure it's either C<0> or C<1> if you need that).

=over 4

=item ECB_C

True if the implementation defines the C<__STDC__> macro to a true value,
while not claiming to be C++.

=item ECB_C99

True if the implementation claims to be compliant to C99 (ISO/IEC
9899:1999) or any later version, while not claiming to be C++.

Note that later versions (ECB_C11) make some C99 core features optional
again (for example, variable length arrays).

=item ECB_C11, ECB_C17

True if the implementation claims to be compliant to C11/C17 (ISO/IEC
9899:2011, :2018) or any later version, while not claiming to be C++.

=item ECB_CPP

True if the implementation defines the C<__cplusplus> macro to a true
value, which is typically true for C++ compilers.

=item ECB_CPP11, ECB_CPP14, ECB_CPP17

True if the implementation claims to be compliant to C++11/C++14/C++17
(ISO/IEC 14882:2011, :2014, :2017) or any later version.

=item ECB_GCC_VERSION (major, minor)

Expands to a true value (suitable for testing by the preprocessor)
if the compiler used is GNU C and the version is the given version, or
higher.

This macro tries to return false on compilers that claim to be GCC
compatible but aren't.

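The core comparison behind a macro like this can be sketched with the predefined C<__GNUC__> and C<__GNUC_MINOR__> macros. C<MY_GCC_VERSION> is a hypothetical name. Unlike C<ECB_GCC_VERSION>, this sketch makes no attempt to reject compilers that only claim GCC compatibility.

```c
   /* Hypothetical version test in the spirit of ECB_GCC_VERSION: true if
      __GNUC__.__GNUC_MINOR__ is at least major.minor. It trusts any
      compiler that defines __GNUC__. */
   #if defined (__GNUC__) && defined (__GNUC_MINOR__)
   # define MY_GCC_VERSION(major, minor) \
       (__GNUC__ > (major) || (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
   #else
   # define MY_GCC_VERSION(major, minor) 0
   #endif

   /* Example use: __builtin_unreachable first appeared in GCC 4.5. */
   #if MY_GCC_VERSION (4, 5)
   # define MY_HAVE_BUILTIN_UNREACHABLE 1
   #else
   # define MY_HAVE_BUILTIN_UNREACHABLE 0
   #endif
```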
=item ECB_EXTERN_C

Expands to C<extern "C"> in C++, and a simple C<extern> in C.

This can be used to declare a single external C function:

   ECB_EXTERN_C int printf (const char *format, ...);

=item ECB_EXTERN_C_BEG / ECB_EXTERN_C_END

These two macros can be used to wrap multiple C<extern "C"> definitions -
they expand to nothing in C.

They are most useful in header files:

   ECB_EXTERN_C_BEG

   int mycfun1 (int x);
   int mycfun2 (int x);

   ECB_EXTERN_C_END

=item ECB_STDFP

If this evaluates to a true value (suitable for testing by the
preprocessor), then C<float> and C<double> use IEEE 754 single/binary32
and double/binary64 representations internally I<and> the endianness of
both types matches the endianness of C<uint32_t> and C<uint64_t>.

This means you can just copy the bits of a C<float> (or C<double>) to a
C<uint32_t> (or C<uint64_t>) and get the raw IEEE 754 bit representation
without having to think about format or endianness.

This is true for basically all modern platforms, although F<ecb.h> might
not be able to deduce this correctly everywhere and might err on the safe
side.

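Here is a minimal sketch of that bit copy. It assumes a platform where C<ECB_STDFP> would be true, and the helper names are hypothetical. C<memcpy> is used rather than a pointer cast, because a cast would violate C's aliasing rules.

```c
   #include <stdint.h>
   #include <string.h>

   /* Only meaningful where float is IEEE 754 binary32 with the same byte
      order as uint32_t, i.e. where ECB_STDFP would be true. memcpy avoids
      the undefined behaviour of reading a float through a uint32_t pointer. */
   static uint32_t
   float_bits (float f)
   {
     uint32_t u;

     memcpy (&u, &f, sizeof u);
     return u;
   }

   static float
   bits_float (uint32_t u)
   {
     float f;

     memcpy (&f, &u, sizeof f);
     return f;
   }
```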
=item ECB_AMD64, ECB_AMD64_X32

These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32
ABI, respectively, and undefined elsewhere.

The designers of the new X32 ABI for some inexplicable reason decided to
make it look exactly like amd64, even though it's completely incompatible
with that ABI, breaking just about every piece of software that assumed
that C<__x86_64> stands for, well, the x86-64 ABI, making these macros
necessary.

=back

=head2 MACRO TRICKERY

=over 4

=item ECB_CONCAT (a, b)

Expands any macros in C<a> and C<b>, then concatenates the result to form
a single token. This is mainly useful to form identifiers from components,
e.g.:

   #define S1 str
   #define S2 cpy

   ECB_CONCAT (S1, S2)(dst, src); // == strcpy (dst, src);

=item ECB_STRINGIFY (arg)

Expands any macros in C<arg> and returns the stringified version of
it. This is mainly useful to get the contents of a macro in string form,
e.g.:

   #define SQL_LIMIT 100
   sql_exec ("select * from table limit " ECB_STRINGIFY (SQL_LIMIT));

=item ECB_STRINGIFY_EXPR (expr)

Like C<ECB_STRINGIFY>, but additionally evaluates C<expr> to make sure it
is a valid expression. This is useful to catch typos or cases where the
macro isn't available:

   #include <errno.h>

   ECB_STRINGIFY (EDOM); // "33" (on my system at least)
   ECB_STRINGIFY_EXPR (EDOM); // "33"

   // now imagine we had a typo:

   ECB_STRINGIFY (EDAM); // "EDAM"
   ECB_STRINGIFY_EXPR (EDAM); // error: EDAM undefined

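The usual way to get the "expand first" behaviour of these macros is an extra level of indirection, because C<#> and C<##> suppress macro expansion of their operands. The following sketch re-creates the idea under hypothetical C<MY_> names. It illustrates the technique and is not necessarily how F<ecb.h> defines these macros.

```c
   #include <string.h>

   /* The inner macros apply ## and #; the outer ones exist only so that
      their arguments are macro-expanded before reaching the inner ones. */
   #define MY_CONCAT_(a, b) a ## b
   #define MY_CONCAT(a, b) MY_CONCAT_ (a, b)

   #define MY_STRINGIFY_(a) # a
   #define MY_STRINGIFY(a) MY_STRINGIFY_ (a)

   /* Evaluates expr (an undefined identifier becomes a compile error),
      then yields the stringified expansion via the comma operator. */
   #define MY_STRINGIFY_EXPR(expr) ((void) (expr), MY_STRINGIFY_ (expr))

   #define S1 str
   #define S2 cpy
   #define LIMIT 100
```

With these definitions, C<MY_STRINGIFY_ (LIMIT)> yields C<"LIMIT">, while C<MY_STRINGIFY (LIMIT)> and C<MY_STRINGIFY_EXPR (LIMIT)> both yield C<"100">.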
=back

=head2 ATTRIBUTES

A major part of libecb deals with additional attributes that can be
assigned to functions, variables and sometimes even types - much like
C<const> or C<volatile> in C. They are implemented using either GCC
attributes or other compiler/language specific features. Attribute
declarations must be put before the whole declaration:

   ecb_const int mysqrt (int a);
   ecb_unused int i;

=over 4

=item ecb_unused

Marks a function or a variable as "unused", which simply suppresses a
warning by GCC when it detects it as unused. This is useful when you e.g.
declare a variable but do not always use it:

   {
     ecb_unused int var;

   #ifdef SOMECONDITION
     var = ...;
     return var;
   #else
     return 0;
   #endif
   }

=item ecb_deprecated

Similar to C<ecb_unused>, but marks a function, variable or type as
deprecated. This makes some compilers warn when the function, variable or
type is used.

=item ecb_deprecated_message (message)

Same as C<ecb_deprecated>, but if possible, the specified diagnostic is
used instead of a generic deprecation message when the object is being
used.

=item ecb_inline

Expands either to (a compiler-specific equivalent of) C<static inline> or
to just C<static>, if inline isn't supported. It should be used to declare
functions that should be inlined, for code size or speed reasons.

Example: inline this function, it surely will reduce codesize.

   ecb_inline int
   negmul (int a, int b)
   {
     return - (a * b);
   }

=item ecb_noinline

Prevents a function from being inlined - it might be optimised away, but
not inlined into other functions. This is useful if you know your function
is rarely called and large enough for inlining not to be helpful.

=item ecb_noreturn

Marks a function as "not returning, ever". Some typical functions that
don't return are C<exit> or C<abort> (which really works hard to not
return), and now you can make your own:

   #include <stdio.h>  /* puts */
   #include <stdlib.h> /* abort */

   ecb_noreturn void
   my_abort (const char *errline)
   {
     puts (errline);
     abort ();
   }

In this case, the compiler would probably be smart enough to deduce it on
its own, so this is mainly useful for declarations.

=item ecb_restrict

Expands to the C<restrict> keyword or equivalent on compilers that support
it, and to nothing on others. Must be specified on a pointer type (after
the C<*>) or an array index to indicate that the memory doesn't alias with
any other restricted pointer in the same scope.

Example: multiply a vector, and allow the compiler to parallelise the
loop, because it knows it doesn't overwrite input values.

   void
   multiply (float *ecb_restrict src,
             float *ecb_restrict dst,
             int len, float factor)
   {
     int i;

     for (i = 0; i < len; ++i)
       dst [i] = src [i] * factor;
   }

=item ecb_const

Declares that the function only depends on the values of its arguments,
much like a mathematical function. It specifically does not read or write
any memory any arguments might point to, global variables, or call any
non-const functions. It also must not have any side effects.

Such a function can be optimised much more aggressively by the compiler -
for example, multiple calls with the same arguments can be optimised into
a single call, which wouldn't be possible if the compiler had to expect
any side effects.

It is best suited for functions in the sense of mathematical functions,
such as a function returning the square root of its input argument.

Not suited would be a function that calculates the hash of some memory
area you pass in, prints some messages or looks at a global variable to
decide on rounding.

See C<ecb_pure> for a slightly less restrictive class of functions.

=item ecb_pure

Similar to C<ecb_const>, declares a function that has no side
effects. Unlike C<ecb_const>, the function is allowed to examine global
variables and any other memory areas (such as the ones passed to it via
pointers).

While these functions cannot be optimised as aggressively as C<ecb_const>
functions, they can still be optimised away on many occasions, and the
compiler has more freedom in moving calls to them around.

Typical examples for such functions would be C<strlen> or C<memcmp>. A
function that calculates the MD5 sum of some input and updates some MD5
state passed as argument would I<NOT> be pure, however, as it would modify
some memory area that is not the return value.

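Attributes like these are typically thin wrappers around GCC's C<__attribute__> syntax that expand to nothing on other compilers. The sketch below shows that pattern with hypothetical C<MY_CONST> and C<MY_PURE> macros, which are not the libecb definitions. Each macro is applied to one suitable function.

```c
   #include <stddef.h>

   #if defined (__GNUC__)
   # define MY_CONST __attribute__ ((__const__))
   # define MY_PURE  __attribute__ ((__pure__))
   #else
   # define MY_CONST
   # define MY_PURE
   #endif

   /* Depends only on its argument value: a candidate for MY_CONST. */
   static MY_CONST int
   square (int a)
   {
     return a * a;
   }

   /* Reads memory through its argument but has no side effects, so it
      may be MY_PURE but not MY_CONST. */
   static MY_PURE size_t
   count_spaces (const char *s)
   {
     size_t n = 0;

     for (; *s; ++s)
       n += *s == ' ';

     return n;
   }
```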
=item ecb_hot

This declares a function as "hot" with regards to the cache - the function
is used so often that it is very beneficial to keep it in the cache if
possible.

The compiler reacts by trying to place hot functions near to each other in
memory.

Whether a function is hot or not often depends on the whole program,
and less on the function itself. C<ecb_cold> is likely more useful in
practise.

=item ecb_cold

The opposite of C<ecb_hot> - declares a function as "cold" with regards to
the cache, or in other words, this function is not called often, or not at
speed-critical times, and keeping it in the cache might be a waste of said
cache.

In addition to placing cold functions together (or at least away from hot
functions), this knowledge can be used in other ways, for example, the
function will be optimised for size, as opposed to speed, and codepaths
leading to calls to those functions can automatically be marked as if
C<ecb_expect_false> had been used to reach them.

Good examples for such functions would be error reporting functions, or
functions only called in exceptional or rare cases.

=item ecb_artificial

Declares the function as "artificial", in this case meaning that this
function is not really meant to be a function, but more like an accessor
- many methods in C++ classes are mere accessor functions, and having a
crash reported in such a method, or single-stepping through them, is not
usually so helpful, especially when it's inlined to just a few instructions.

Marking them as artificial will instruct the debugger about just this,
leading to happier debugging and thus happier lives.

Example: in some kind of smart-pointer class, mark the pointer accessor as
artificial, so that the whole class acts more like a pointer and less like
some C++ abstraction monster.

   template<typename T>
   struct my_smart_ptr
   {
     T *value;

     ecb_artificial
     operator T *()
     {
       return value;
     }
   };

=back

=head2 OPTIMISATION HINTS

=over 4

=item bool ecb_is_constant (expr)

Returns true iff the expression can be deduced to be a compile-time
constant, and false otherwise.

For example, when you have a C<rndm16> function that returns a 16 bit
random number, and you have a function that maps this to a range from
0..n-1, then you could use this inline function in a header file:

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return (n * (uint32_t)rndm16 ()) >> 16;
   }

However, for powers of two, you could use a normal mask, but that is only
worth it if, at compile time, you can detect this case. This is the case
when the passed number is a constant and also a power of two
(C<(n & (n - 1)) == 0>):

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return ecb_is_constant (n) && !(n & (n - 1))
            ? rndm16 () & (n - 1)
            : (n * (uint32_t)rndm16 ()) >> 16;
   }

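Here is a self-contained variant of the C<rndm> example, as a sketch. C<MY_IS_CONSTANT> is a hypothetical wrapper around GCC's C<__builtin_constant_p>; on other compilers it conservatively answers "no". The linear congruential C<rndm16> below is only a placeholder generator, not a recommendation.

```c
   #include <stdint.h>

   #if defined (__GNUC__)
   # define MY_IS_CONSTANT(expr) __builtin_constant_p (expr)
   #else
   # define MY_IS_CONSTANT(expr) 0 /* "not known to be constant" */
   #endif

   /* Placeholder 16-bit generator (a simple LCG), for illustration only. */
   static uint32_t rndm_state = 1;

   static uint32_t
   rndm16 (void)
   {
     rndm_state = rndm_state * 1103515245u + 12345u;
     return (rndm_state >> 16) & 0xffffu;
   }

   /* Map a 16-bit random number to 0..n-1 for 1 <= n <= 65536. When n is
      known at compile time to be a power of two, a mask replaces the
      multiply and shift. */
   static inline uint32_t
   rndm (uint32_t n)
   {
     return MY_IS_CONSTANT (n) && !(n & (n - 1))
            ? rndm16 () & (n - 1)
            : (n * rndm16 ()) >> 16;
   }
```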
=item ecb_expect (expr, value)

Evaluates C<expr> and returns it. In addition, it tells the compiler that
the C<expr> evaluates to C<value> a lot, which can be used for static
branch optimisations.

Usually, you want to use the more intuitive C<ecb_expect_true> and
C<ecb_expect_false> functions instead.

=item bool ecb_expect_true (cond)

=item bool ecb_expect_false (cond)

These two functions expect an expression that is true or false and return
C<1> if it is true and C<0> otherwise, so when used in the condition of an
C<if> or other conditional statement, they will not change the program:

   /* these two do the same thing */
   if (some_condition) ...;
   if (ecb_expect_true (some_condition)) ...;

However, by using C<ecb_expect_true>, you tell the compiler that the
condition is likely to be true (and for C<ecb_expect_false>, that it is
unlikely to be true).

For example, when you check for a null pointer and expect this to be a
rare, exceptional, case, then use C<ecb_expect_false>:

   void my_free (void *ptr)
   {
     if (ecb_expect_false (ptr == 0))
       return;
   }

Consistent use of these functions to mark away exceptional cases or to
tell the compiler what the hot path through a function is can increase
performance considerably.

You might know these functions under the names C<likely> and C<unlikely>
- while these are common aliases, we find that the expect name is easier
to understand when quickly skimming code. If you wish, you can use
C<ecb_likely> instead of C<ecb_expect_true> and C<ecb_unlikely> instead of
C<ecb_expect_false> - these are simply aliases.

A very good example is in a function that reserves more space for some
memory block (for example, inside an implementation of a string stream) -
each time something is added, you have to check for a buffer overrun, but
you expect that most checks will turn out to be false:

   /* make sure we have "size" extra room in our buffer */
   ecb_inline void
   reserve (int size)
   {
     if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
   }

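Hints like these are commonly built on GCC's C<__builtin_expect>. The following is a sketch of such wrappers under hypothetical C<MY_> names, not the libecb definitions. C<!!> normalises the condition to C<0> or C<1> before it is compared with the expected value.

```c
   #include <stddef.h>

   #if defined (__GNUC__)
   # define MY_EXPECT(expr, value) __builtin_expect ((expr), (value))
   #else
   # define MY_EXPECT(expr, value) (expr)
   #endif

   #define MY_EXPECT_TRUE(cond)  MY_EXPECT (!!(cond), 1)
   #define MY_EXPECT_FALSE(cond) MY_EXPECT (!!(cond), 0)

   /* Sum the ints pointed to by v, skipping null entries, which are
      expected to be rare. */
   static long
   sum_nonnull (int *const *v, size_t n)
   {
     long sum = 0;
     size_t i;

     for (i = 0; i < n; ++i)
       {
         if (MY_EXPECT_FALSE (v[i] == NULL))
           continue;

         sum += *v[i];
       }

     return sum;
   }
```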
=item ecb_assume (cond)

Tries to tell the compiler that some condition is true, even if it's not
obvious. This is not a function, but a statement: it cannot be used in
another expression.

This can be used to teach the compiler about invariants or other
conditions that might improve code generation, but which are impossible to
deduce from the code itself.

For example, the example reservation function from the C<ecb_expect_false>
description could be written thus (only C<ecb_assume> was added):

   ecb_inline void
   reserve (int size)
   {
     if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */

     ecb_assume (current + size <= end);
   }

If you then call this function twice, like this:

   reserve (10);
   reserve (1);

then the compiler I<might> be able to optimise out the second call
completely, as it knows that C<< current + 1 > end >> is false and the
call will never be executed.

=item ecb_unreachable ()

This function does nothing itself, except tell the compiler that it will
never be executed. Apart from suppressing a warning in some cases, this
function can be used to implement C<ecb_assume> or similar functionality.

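The last sentence suggests building an "assume" out of an "unreachable" marker. Here is a sketch of that idea with hypothetical macro names. It assumes the C<__builtin_unreachable> provided by GCC (since 4.5) and Clang. On other compilers the assumption is simply dropped.

```c
   #if defined (__GNUC__)
   # define MY_UNREACHABLE() __builtin_unreachable ()
   #else
   # define MY_UNREACHABLE() ((void) 0)
   #endif

   /* A statement, not an expression: if cond were false, control would
      reach a point the compiler has been told is unreachable. */
   #define MY_ASSUME(cond) do { if (!(cond)) MY_UNREACHABLE (); } while (0)

   /* The caller promises 0 <= idx < 4; the assumption lets the compiler
      drop any range handling it can derive from that promise. */
   static int
   lookup (const int *table, int idx)
   {
     MY_ASSUME (idx >= 0 && idx < 4);

     switch (idx)
       {
         case 0: return table[0];
         case 1: return table[1];
         case 2: return table[2];
         case 3: return table[3];
       }

     MY_UNREACHABLE ();
     return 0; /* reached only if the promise is broken on non-GCC compilers */
   }
```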
=item ecb_prefetch (addr, rw, locality)

Tells the compiler to try to prefetch memory at the given C<addr>ess
for either reading (C<rw> = 0) or writing (C<rw> = 1). A C<locality> of
C<0> means that there will only be one access later, C<3> means that
the data will likely be accessed very often, and values in between mean
something... in between. The memory pointed to by the address does not
need to be accessible (it could be a null pointer for example), but C<rw>
and C<locality> must be compile-time constants.

This is a statement, not a function: you cannot use it as part of an
expression.

An obvious way to use this is to prefetch some data far away, in a big
array you loop over. This prefetches memory some 128 array elements later,
in the hope that it will be ready when the CPU arrives at that location.

   int i, sum = 0;

   for (i = 0; i < N; ++i)
     {
       sum += arr [i];
       ecb_prefetch (arr + i + 128, 0, 0);
     }

It's hard to predict how far to prefetch, and most CPUs that can prefetch
are often good enough to predict this kind of behaviour themselves. It
gets more interesting with linked lists, especially when you do some fair
processing on each list element:

   for (node *n = start; n; n = n->next)
     {
       ecb_prefetch (n->next, 0, 0);
       ... do medium amount of work with *n
     }

After processing the node, (part of) the next node might already be in
cache.

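Here is a compilable version of the array loop, as a sketch. C<MY_PREFETCH> is a hypothetical wrapper around GCC's C<__builtin_prefetch>, whose C<rw> and C<locality> arguments must also be constants. The sketch also guards the prefetch so that it only forms addresses inside the array. Computing a pointer far past the end is undefined in C, even if the pointer is never dereferenced.

```c
   #include <stddef.h>

   #if defined (__GNUC__)
   # define MY_PREFETCH(addr, rw, locality) \
       __builtin_prefetch ((addr), (rw), (locality))
   #else
   # define MY_PREFETCH(addr, rw, locality) ((void) 0)
   #endif

   #define MY_PREFETCH_DISTANCE 128 /* a guess; tune per machine */

   static long
   sum_array (const int *arr, size_t n)
   {
     long sum = 0;
     size_t i;

     for (i = 0; i < n; ++i)
       {
         /* read-only (rw = 0), used once (locality = 0) */
         if (i + MY_PREFETCH_DISTANCE < n)
           MY_PREFETCH (&arr[i + MY_PREFETCH_DISTANCE], 0, 0);

         sum += arr[i];
       }

     return sum;
   }
```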
=back

=head2 BIT FIDDLING / BIT WIZARDRY

=over 4

=item bool ecb_big_endian ()

=item bool ecb_little_endian ()

These two functions return true if the byte order is big endian
(most-significant byte first) or little endian (least-significant byte
first) respectively.

On systems that are neither, their return values are unspecified.

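When the byte order cannot be determined from predefined macros, it can be probed at run time. Here is a minimal sketch with hypothetical names. An optimising compiler can often fold these checks to constants, but that is not guaranteed.

```c
   #include <stdint.h>
   #include <string.h>

   /* Store a known 32-bit pattern and inspect the byte at the lowest
      address: 0x44 means little endian, 0x11 means big endian. */
   static unsigned char
   first_byte_of_probe (void)
   {
     uint32_t probe = 0x11223344u;
     unsigned char first;

     memcpy (&first, &probe, 1);
     return first;
   }

   static int
   my_little_endian (void)
   {
     return first_byte_of_probe () == 0x44;
   }

   static int
   my_big_endian (void)
   {
     return first_byte_of_probe () == 0x11;
   }
```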
=item int ecb_ctz32 (uint32_t x)

=item int ecb_ctz64 (uint64_t x)

Returns the index of the least significant bit set in C<x> (or
equivalently the number of bits set to 0 before the least significant bit
set), starting from 0. If C<x> is 0 the result is undefined.

For smaller types than C<uint32_t> you can safely use C<ecb_ctz32>.

For example:

   ecb_ctz32 (3) = 0
   ecb_ctz32 (6) = 1

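Where no builtin is available, a portable fallback is possible. The following is a minimal sketch, with the hypothetical name C<my_ctz32>, and it is not necessarily how F<ecb.h> implements the function. It first isolates the lowest set bit, then locates that bit with a fixed binary search over bit masks.

```c
   #include <stdint.h>

   /* Count trailing zeros; like ecb_ctz32, meaningless for x == 0. */
   static int
   my_ctz32 (uint32_t x)
   {
     int r = 0;

     x &= 0u - x; /* keep only the least significant set bit */

     /* each mask selects the bit positions that have a given index bit set */
     if (x & 0xffff0000u) r += 16;
     if (x & 0xff00ff00u) r +=  8;
     if (x & 0xf0f0f0f0u) r +=  4;
     if (x & 0xccccccccu) r +=  2;
     if (x & 0xaaaaaaaau) r +=  1;

     return r;
   }
```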
=item bool ecb_is_pot32 (uint32_t x)

=item bool ecb_is_pot64 (uint64_t x)

Returns true iff C<x> is a power of two or C<x == 0>.

For smaller types than C<uint32_t> you can safely use C<ecb_is_pot32>.

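The usual test behind this definition is a classic bit trick: C<x & (x - 1)> clears the lowest set bit. Here is a sketch with hypothetical names.

```c
   #include <stdint.h>

   /* x & (x - 1) is zero exactly when x has at most one bit set, i.e. x
      is a power of two or zero (matching the documented x == 0 case). */
   static int
   my_is_pot32 (uint32_t x)
   {
     return !(x & (x - 1));
   }

   static int
   my_is_pot64 (uint64_t x)
   {
     return !(x & (x - 1));
   }
```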
=item int ecb_ld32 (uint32_t x)

=item int ecb_ld64 (uint64_t x)

Returns the index of the most significant bit set in C<x>, which is one
less than the number of digits the number requires in binary (so that
C<< 2**ld <= x < 2**(ld+1) >>). If C<x> is 0 the result is undefined. A
common use case is to compute the integer binary logarithm, i.e.
C<floor (log2 (n))>, for example to see how many bits (C<ld + 1>) a
certain number requires to be encoded.

This function is similar to the "count leading zero bits" function, except
that that one returns how many zero bits are "in front" of the number (in
the given data type), while C<ecb_ld> returns the position of the highest
set bit, i.e. one less than the number of bits the number itself requires.

For smaller types than C<uint32_t> you can safely use C<ecb_ld32>.

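The same value can be computed portably. The sketch below uses the hypothetical name C<my_ld32>. A fallback of this kind is one option when no count-leading-zeros builtin exists, but it is not claimed to be the libecb code.

```c
   #include <stdint.h>

   /* Index of the most significant set bit, found by halving the search
      range. Undefined for x == 0 (this sketch happens to return 0). */
   static int
   my_ld32 (uint32_t x)
   {
     int r = 0;

     if (x >> 16) { x >>= 16; r += 16; }
     if (x >>  8) { x >>=  8; r +=  8; }
     if (x >>  4) { x >>=  4; r +=  4; }
     if (x >>  2) { x >>=  2; r +=  2; }
     if (x >>  1) {           r +=  1; }

     return r;
   }
```

For example, C<my_ld32 (255)> is C<7>, so 255 needs C<7 + 1 = 8> bits.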
=item int ecb_popcount32 (uint32_t x)

=item int ecb_popcount64 (uint64_t x)

Returns the number of bits set to 1 in C<x>.

For smaller types than C<uint32_t> you can safely use C<ecb_popcount32>.

For example:

   ecb_popcount32 (7) = 3
   ecb_popcount32 (255) = 8

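Without a hardware instruction, the classic SWAR ("SIMD within a register") method counts bits in parallel. Here is a sketch under a hypothetical name.

```c
   #include <stdint.h>

   /* Sum bits in 2-, 4- and 8-bit fields in parallel, then add the four
      byte counts together with a single multiply. */
   static int
   my_popcount32 (uint32_t x)
   {
     x -= (x >> 1) & 0x55555555u;                      /* 2-bit sums */
     x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
     x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
     return (int) ((uint32_t) (x * 0x01010101u) >> 24); /* total in top byte */
   }
```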
=item uint8_t ecb_bitrev8 (uint8_t x)

=item uint16_t ecb_bitrev16 (uint16_t x)

=item uint32_t ecb_bitrev32 (uint32_t x)

Reverses the bits in C<x>, i.e. the MSB becomes the LSB, MSB-1 becomes
LSB+1 and so on.

Example:

   ecb_bitrev8 (0xa7) = 0xe5
   ecb_bitrev32 (0xffcc4411) = 0x882233ff

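One common way to reverse bits is to swap progressively larger groups: first adjacent bits, then bit pairs, then nibbles, and so on. The sketch below uses hypothetical names, and its test reproduces both examples above.

```c
   #include <stdint.h>

   static uint8_t
   my_bitrev8 (uint8_t x)
   {
     x = (uint8_t) (((x & 0x55) << 1) | ((x >> 1) & 0x55)); /* swap bits    */
     x = (uint8_t) (((x & 0x33) << 2) | ((x >> 2) & 0x33)); /* swap pairs   */
     x = (uint8_t) ((x << 4) | (x >> 4));                   /* swap nibbles */
     return x;
   }

   static uint32_t
   my_bitrev32 (uint32_t x)
   {
     x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
     x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
     x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
     x = ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
     return (x << 16) | (x >> 16);                         /* swap halves */
   }
```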
=item uint32_t ecb_bswap16 (uint32_t x)

=item uint32_t ecb_bswap32 (uint32_t x)

=item uint64_t ecb_bswap64 (uint64_t x)

These functions return the value of the 16-bit (32-bit, 64-bit) value
C<x> after reversing the order of bytes (0x11223344 becomes 0x44332211 in
C<ecb_bswap32>).

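Here is a shift-and-mask byte swap, as a sketch with hypothetical names. Compilers such as GCC often recognise this pattern and emit a single byte-swap instruction, though that is not guaranteed.

```c
   #include <stdint.h>

   static uint16_t
   my_bswap16 (uint16_t x)
   {
     return (uint16_t) ((x << 8) | (x >> 8));
   }

   /* Move each byte to its mirrored position. */
   static uint32_t
   my_bswap32 (uint32_t x)
   {
     return (x << 24)
          | ((x <<  8) & 0x00ff0000u)
          | ((x >>  8) & 0x0000ff00u)
          | (x >> 24);
   }
```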
=item uint8_t ecb_rotl8 (uint8_t x, unsigned int count)

=item uint16_t ecb_rotl16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotl64 (uint64_t x, unsigned int count)

=item uint8_t ecb_rotr8 (uint8_t x, unsigned int count)

=item uint16_t ecb_rotr16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotr64 (uint64_t x, unsigned int count)

These two families of functions return the value of C<x> after rotating
all the bits by C<count> positions to the right (C<ecb_rotr>) or left
(C<ecb_rotl>).

Current GCC versions understand these functions and usually compile them
to "optimal" code (e.g. a single C<rol> or a combination of C<shld> on
x86).

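A portable rotation must avoid shifting a 32-bit value by 32, which is undefined in C. Here is a sketch with hypothetical names. It reduces C<count> modulo 32, which may not match how libecb treats out-of-range counts.

```c
   #include <stdint.h>

   /* Masking keeps both shift amounts in 0..31, so count == 0 works
      without an undefined shift by 32. */
   static uint32_t
   my_rotl32 (uint32_t x, unsigned int count)
   {
     count &= 31;
     return (x << count) | (x >> ((32 - count) & 31));
   }

   static uint32_t
   my_rotr32 (uint32_t x, unsigned int count)
   {
     count &= 31;
     return (x >> count) | (x << ((32 - count) & 31));
   }
```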
=back

=head2 FLOATING POINT FIDDLING

=over 4

=item ECB_INFINITY [-UECB_NO_LIBM]

Evaluates to positive infinity if supported by the platform, otherwise to
a truly huge number.

=item ECB_NAN [-UECB_NO_LIBM]

Evaluates to a quiet NaN if supported by the platform, otherwise to
C<ECB_INFINITY>.

=item float ecb_ldexpf (float x, int exp) [-UECB_NO_LIBM]

Same as C<ldexpf>, but always available.

705 |
root |
1.71 |
=item uint32_t ecb_float_to_binary16 (float x) [-UECB_NO_LIBM] |
706 |
|
|
|
707 |
root |
1.50 |
=item uint32_t ecb_float_to_binary32 (float x) [-UECB_NO_LIBM] |
708 |
|
|
|
709 |
|
|
=item uint64_t ecb_double_to_binary64 (double x) [-UECB_NO_LIBM] |
710 |
|
|
|
711 |
|
|
These functions each take an argument in the native C<float> or C<double> |
712 |
root |
1.71 |
type and return the IEEE 754 bit representation of it (binary16/half, |
713 |
|
|
binary32/single or binary64/double precision). |
714 |
root |
1.50 |
|
715 |
|
|
The bit representation is just as IEEE 754 defines it, i.e. the sign bit |
716 |
|
|
will be the most significant bit, followed by exponent and mantissa. |
717 |
|
|
|
718 |
|
|
This function should work even when the native floating point format isn't |
719 |
|
|
IEEE compliant, of course at a speed and code size penalty, and of course |
720 |
|
|
also within reasonable limits (it tries to convert NaNs, infinities and |
721 |
|
|
denormals, but will likely convert negative zero to positive zero). |
722 |
|
|
|
723 |
|
|
On all modern platforms (where C<ECB_STDFP> is true), the compiler should |
724 |
|
|
be able to optimise away this function completely. |
725 |
|
|
|
726 |
|
|
These functions can be helpful when serialising floats to the network - you |
727 |
root |
1.71 |
can serialise the return value like a normal uint16_t/uint32_t/uint64_t. |
728 |
root |
1.50 |
|
729 |
|
|
Another use for these functions is to manipulate floating point values |
730 |
|
|
directly. |
731 |
|
|
|
732 |
|
|
Silly example: toggle the sign bit of a float. |
733 |
|
|
|
734 |
|
|
/* On gcc-4.7 on amd64, */ |
735 |
|
|
/* this results in a single add instruction to toggle the bit, and 4 extra */ |
736 |
|
|
/* instructions to move the float value to an integer register and back. */ |
737 |
|
|
|
738 |
|
|
x = ecb_binary32_to_float (ecb_float_to_binary32 (x) ^ 0x80000000U) |
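On a platform whose C<float> already is IEEE 754 binary32, converting to and from the bit representation only reinterprets the same 32 bits. The sketch below relies on that assumption and uses C<memcpy>, which is the well-defined way to reinterpret an object's bytes in C. The names C<float_to_bits>, C<bits_to_float> and C<toggle_sign> are hypothetical. Unlike the libecb functions, the sketch makes no attempt to support non-IEEE platforms.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, valid only if float is IEEE 754 binary32 */
/* (4 bytes with the standard sign/exponent/mantissa layout). */
static uint32_t float_to_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u); /* copy the bytes: no strict-aliasing problems */
  return u;
}

static float bits_to_float (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* The "silly example" from above: flip the sign bit (bit 31). */
static float toggle_sign (float x)
{
  return bits_to_float (float_to_bits (x) ^ 0x80000000U);
}
```

C<toggle_sign> mirrors the silly example above. Flipping bit 31 turns C<1.5f> into C<-1.5f> and C<0.0f> into C<-0.0f>.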

=item float ecb_binary16_to_float (uint16_t x) [-UECB_NO_LIBM]

=item float ecb_binary32_to_float (uint32_t x) [-UECB_NO_LIBM]

=item double ecb_binary64_to_double (uint64_t x) [-UECB_NO_LIBM]

The reverse operation of the previous functions - these take the bit
representation of an IEEE binary16, binary32 or binary64 number (half,
single or double precision) and convert it to the native C<float> or
C<double> format.

These functions should work even when the native floating point format
isn't IEEE compliant, at a speed and code size penalty and within
reasonable limits (they try to convert normals and denormals, might be
lucky for infinities, and with extraordinary luck also for negative zero).

On all modern platforms (where C<ECB_STDFP> is true), the compiler should
be able to optimise these functions away completely.
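The forward conversions were described as useful for sending floats over the network, and the receiver applies these reverse conversions. The sketch below shows the round trip. It writes the binary32 bit pattern in big-endian (network) byte order and reads it back. It is an illustration, not part of libecb, and C<store_be32>, C<load_be32>, C<put_float> and C<get_float> are hypothetical names. The C<memcpy> calls stand in for C<ecb_float_to_binary32> and C<ecb_binary32_to_float>, on the assumption that C<float> is IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/* Write v as 4 bytes, most significant first (network byte order). */
static void store_be32 (unsigned char out[4], uint32_t v)
{
  out[0] = (unsigned char)(v >> 24);
  out[1] = (unsigned char)(v >> 16);
  out[2] = (unsigned char)(v >>  8);
  out[3] = (unsigned char) v;
}

/* Reassemble 4 big-endian bytes into a host uint32_t. */
static uint32_t load_be32 (const unsigned char in[4])
{
  return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
       | ((uint32_t)in[2] <<  8) |  (uint32_t)in[3];
}

/* Sender: float -> binary32 bits -> bytes.  memcpy stands in for */
/* ecb_float_to_binary32 and assumes an IEEE binary32 float. */
static void put_float (unsigned char out[4], float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  store_be32 (out, u);
}

/* Receiver: bytes -> binary32 bits -> float (stand-in for ecb_binary32_to_float). */
static float get_float (const unsigned char in[4])
{
  uint32_t u = load_be32 (in);
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}
```

C<store_be32> and C<load_be32> fix the byte order, not the host. Sender and receiver therefore agree on the wire format regardless of their native endianness.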

=item uint16_t ecb_binary32_to_binary16 (uint32_t x)

=item uint32_t ecb_binary16_to_binary32 (uint16_t x)

Convert an IEEE binary32/single precision value to binary16/half format,
and vice versa, handling all details (round-to-nearest-even, subnormals,
infinity and NaNs) correctly.

These functions are available even with C<-DECB_NO_LIBM>, since they do
not rely on the platform floating point format. The
C<ecb_float_to_binary16> and C<ecb_binary16_to_float> functions are
usually what you want.
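The sketch below makes "handling all details" concrete. It is an independent, integer-only implementation of both conversions, not libecb's code, and C<half_from_single_bits> and C<single_from_half_bits> are hypothetical names. It does the following:

=over 4

=item * rounds to nearest, ties to even

=item * lets overflow round to infinity

=item * produces binary16 subnormals where needed

=item * keeps NaNs as quiet NaNs

=back

Because it only manipulates bit patterns, it needs no math library. That is the same property that lets the libecb functions work under C<-DECB_NO_LIBM>.

```c
#include <stdint.h>

/* binary32 bit pattern -> binary16 bit pattern, round to nearest even. */
static uint16_t half_from_single_bits (uint32_t x)
{
  uint32_t sign = (x >> 16) & 0x8000u;
  uint32_t exp  = (x >> 23) & 0xffu;
  uint32_t mant = x & 0x7fffffu;
  int32_t  hexp = (int32_t)exp - 127 + 15;   /* rebias 127 -> 15 */

  if (exp == 0xffu)                          /* infinity, or NaN forced quiet */
    return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u | (mant >> 13) : 0u));

  if (hexp >= 0x1f)                          /* too large even before rounding */
    return (uint16_t)(sign | 0x7c00u);

  if (hexp > 0)                              /* normal result */
    {
      uint32_t h    = ((uint32_t)hexp << 10) | (mant >> 13);
      uint32_t rest = mant & 0x1fffu;        /* the 13 dropped bits */

      /* round to nearest even; a carry out of the mantissa bumps the */
      /* exponent, and a carry into 0x7c00 correctly yields infinity */
      if (rest > 0x1000u || (rest == 0x1000u && (h & 1u)))
        ++h;

      return (uint16_t)(sign | h);
    }

  /* subnormal or zero result: the value is sig * 2^(exp - 150) and a */
  /* binary16 subnormal is q * 2^-24, so q = sig / 2^(126 - exp) */
  uint32_t shift = (uint32_t)(14 - hexp);    /* = 126 - exp, at least 14 */

  if (shift > 25)                            /* far too small: signed zero */
    return (uint16_t)sign;

  uint32_t sig     = mant | 0x800000u;       /* restore the implicit leading 1 */
  uint32_t q       = sig >> shift;
  uint32_t rest    = sig & ((1u << shift) - 1u);
  uint32_t halfway = 1u << (shift - 1);

  if (rest > halfway || (rest == halfway && (q & 1u)))
    ++q;                                     /* may carry into the smallest normal */

  return (uint16_t)(sign | q);
}

/* binary16 bit pattern -> binary32 bit pattern (always exact). */
static uint32_t single_from_half_bits (uint16_t h)
{
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  int32_t  exp  = (h >> 10) & 0x1f;
  uint32_t mant = h & 0x3ffu;

  if (exp == 0x1f)                           /* infinity or NaN */
    return sign | 0x7f800000u | (mant << 13);

  if (exp == 0)
    {
      if (mant == 0)                         /* signed zero */
        return sign;

      /* subnormal: shift until the implicit bit (0x400) appears */
      exp = 1;
      while (!(mant & 0x400u))
        {
          mant <<= 1;
          --exp;
        }
      mant &= 0x3ffu;
    }

  return sign | ((uint32_t)(exp + 127 - 15) << 23) | (mant << 13);
}
```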

=back

=head2 ARITHMETIC

=over 4

=item x = ecb_mod (m, n)

Returns C<m> modulo C<n>, which is the same as the non-negative remainder
of the division operation between C<m> and C<n>, using floored
division. Unlike the C remainder operator C<%>, this function ensures that
the return value is never negative and that the two numbers I<m> and
I<m' = m + i * n> result in the same value modulo I<n> - in other words,
C<ecb_mod> implements the mathematical modulo operation, which is missing
in the language.

C<n> must be strictly positive (i.e. C<< >= 1 >>), while C<m> must be
negatable, that is, both C<m> and C<-m> must be representable in its
type (this typically excludes the minimum signed integer value, the same
limitation as for C</> and C<%> in C).

Current GCC versions compile this into an efficient branchless sequence on
almost all CPUs.

For example, if you want to cycle forward through the members of an
array for increasing C<m> (which might be negative), you should use
C<ecb_mod>, as the C<%> operator might give negative results, or
change direction for negative values:

   for (m = -100; m <= 100; ++m)
     {
       int elem = myarray [ecb_mod (m, ecb_array_length (myarray))];
     }
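The floored modulo described above can be sketched in portable C as follows. This is an illustration with a hypothetical name (C<mod_floor>), not libecb's macro, and it is specialised to C<int>, whereas C<ecb_mod> works on any integer type. Since C99, C<%> truncates towards zero, so a negative remainder only needs C<n> added once.

```c
#include <assert.h>

/* Hypothetical int-only illustration of floored modulo (not libecb code). */
static int mod_floor (int m, int n)
{
  assert (n >= 1);            /* same precondition as ecb_mod */

  int r = m % n;              /* C99: r has the sign of m and |r| < n */
  return r < 0 ? r + n : r;   /* move negative remainders into 0..n-1 */
}
```

With an array of 5 elements and C<m = -1>, this yields index 4 (the last element), whereas C<-1 % 5> evaluates to C<-1>.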

=item x = ecb_div_rd (val, div)

=item x = ecb_div_ru (val, div)

Returns C<val> divided by C<div>, rounded down (towards negative infinity)
or up (towards positive infinity), respectively. C<val> and C<div> must
have integer types and C<div> must be strictly positive. Note that these
functions are implemented as macros in C and as function templates in C++.
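The sketch below shows the same rounding for C<int>. It uses the remainder of C's truncating division to detect when the quotient needs adjusting. The names C<div_rd_sketch> and C<div_ru_sketch> are hypothetical, and this is not how libecb's macros are written.

```c
#include <assert.h>

/* Hypothetical int versions of ecb_div_rd / ecb_div_ru (not libecb code). */
/* C's / truncates towards zero, so the remainder tells us how to fix it up. */
static int div_rd_sketch (int val, int div)
{
  assert (div >= 1);          /* div must be strictly positive */

  int q = val / div;
  int r = val % div;
  return r < 0 ? q - 1 : q;   /* negative remainder: truncation rounded up */
}

static int div_ru_sketch (int val, int div)
{
  assert (div >= 1);

  int q = val / div;
  int r = val % div;
  return r > 0 ? q + 1 : q;   /* positive remainder: truncation rounded down */
}
```

For example, with C<val = -7> and C<div = 2>, C's C</> gives C<-3>, while rounding down gives C<-4> and rounding up gives C<-3>.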

=back

=head2 UTILITY

=over 4

=item element_count = ecb_array_length (name)

Returns the number of elements in the array C<name>. For example:

   int primes[] = { 2, 3, 5, 7, 11 };
   int sum = 0;

   for (i = 0; i < ecb_array_length (primes); i++)
     sum += primes [i];
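In C, a macro of the following shape is the usual way to compute this count. It is a sketch with the hypothetical name C<ARRAY_LENGTH>, not necessarily how F<ecb.h> defines C<ecb_array_length>. It only works on real arrays. Applied to a pointer, including an array parameter of a function, C<sizeof> measures the pointer instead and the result is meaningless.

```c
#include <stddef.h>

/* Hypothetical sketch: size of the whole array divided by the size of one element. */
#define ARRAY_LENGTH(name) (sizeof (name) / sizeof ((name)[0]))

/* The example from the text, with the loop index declared. */
static int sum_primes (void)
{
  int primes[] = { 2, 3, 5, 7, 11 };
  int sum = 0;
  size_t i;

  for (i = 0; i < ARRAY_LENGTH (primes); i++)
    sum += primes[i];

  return sum;
}
```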

=back

=head2 SYMBOLS GOVERNING COMPILATION OF ECB.H ITSELF

These symbols need to be defined before including F<ecb.h> the first time.

=over 4

=item ECB_NO_THREADS

If F<ecb.h> is never used from multiple threads, then this symbol can
be defined, in which case memory fences (and similar constructs) are
completely removed, leading to more efficient code and fewer dependencies.

Setting this symbol to a true value implies C<ECB_NO_SMP>.

=item ECB_NO_SMP

The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from
multiple threads, but never concurrently (e.g. if the system the program
runs on has only a single CPU with a single core, no hyperthreading and so
on), then this symbol can be defined, leading to more efficient code and
fewer dependencies.

=item ECB_NO_LIBM

When defined to C<1>, do not export any functions that might introduce
dependencies on the math library (usually linked with C<-lm>) - these are
marked with [-UECB_NO_LIBM].

=back

=head1 UNDOCUMENTED FUNCTIONALITY

F<ecb.h> is full of undocumented functionality as well, some of which is
intended to be internal-use only, some of which we forgot to document, and
some of which we hide because we are not sure we will keep the interface
stable.

While you are welcome to rummage around and use whatever you find useful
(we can't stop you), keep in mind that we will change undocumented
functionality in incompatible ways without thinking twice, while we are
considerably more conservative with documented things.

=head1 AUTHORS

C<libecb> is designed and maintained by:

   Emanuele Giaquinta <e.giaquinta@glauco.it>
   Marc Alexander Lehmann <schmorp@schmorp.de>
