=head1 LIBECB - e-C-Builtins

=head2 ABOUT LIBECB

Libecb is currently a simple header file that doesn't require any
configuration to use or include in your project.

It's part of the e-suite of libraries, other members of which include
libev and libeio.

Its homepage can be found here:

   http://software.schmorp.de/pkg/libecb

It mainly provides a number of wrappers around many compiler built-ins,
together with replacement functions for other compilers. In addition
to this, it provides a number of other low-level C utilities, such as
endianness detection, byte swapping or bit rotations.

Or in other words, things that should be built into any standard C
system, but aren't, implemented as efficiently as possible with GCC (clang,
MSVC...), and still correct with other compilers.

More might come.
 
=head2 ABOUT THE HEADER

At the moment, all you have to do is copy F<ecb.h> somewhere where your
compiler can find it and include it:

   #include <ecb.h>

The header should work fine for both C and C++ compilation, and gives you
all of F<inttypes.h> in addition to the ECB symbols.

There are currently no object files to link to - future versions might
come with an (optional) object code library to link against, to reduce
code size or gain access to additional features.

It also currently includes everything from F<inttypes.h>.
 
=head2 ABOUT THIS MANUAL / CONVENTIONS

This manual mainly describes each (public) function available after
including the F<ecb.h> header. The header might define other symbols than
these, but these are not part of the public API, and not supported in any
way.

When the manual mentions a "function" then this could be defined either as
an inline function, a macro, or an external symbol.

When functions use a concrete standard type, such as C<int> or
C<uint32_t>, then the corresponding function works only with that type. If
only a generic name is used (C<expr>, C<cond>, C<value> and so on), then
the corresponding function relies on C to implement the correct types, and
is usually implemented as a macro. Specifically, a "bool" in this manual
refers to any kind of boolean value, not a specific type.
 
=head2 TYPES / TYPE SUPPORT

F<ecb.h> makes sure that the following types are defined (in the expected way):

   int8_t        uint8_t
   int16_t       uint16_t
   int32_t       uint32_t
   int64_t       uint64_t
   int_fast8_t   uint_fast8_t
   int_fast16_t  uint_fast16_t
   int_fast32_t  uint_fast32_t
   int_fast64_t  uint_fast64_t
   intptr_t      uintptr_t

The macro C<ECB_PTRSIZE> is defined to the size of a pointer on this
platform (currently C<4> or C<8>) and can be used in preprocessor
expressions.

For C<ptrdiff_t> and C<size_t> use C<stddef.h>/C<cstddef>.
 
=head2 LANGUAGE/ENVIRONMENT/COMPILER VERSIONS

All the following symbols expand to an expression that can be tested in
preprocessor directives as well as treated as a boolean (use C<!!> to
ensure it's either C<0> or C<1> if you need that).

=over

=item ECB_C

True if the implementation defines the C<__STDC__> macro to a true value,
while not claiming to be C++, i.e. C, but not C++.

=item ECB_C99

True if the implementation claims to be compliant to C99 (ISO/IEC
9899:1999) or any later version, while not claiming to be C++.

Note that later versions (ECB_C11) make some core features optional again
(for example, variable length arrays).

=item ECB_C11, ECB_C17

True if the implementation claims to be compliant to C11/C17 (ISO/IEC
9899:2011, :2018) or any later version, while not claiming to be C++.

=item ECB_CPP

True if the implementation defines the C<__cplusplus> macro to a true
value, which is typically true for C++ compilers.

=item ECB_CPP11, ECB_CPP14, ECB_CPP17

True if the implementation claims to be compliant to C++11/C++14/C++17
(ISO/IEC 14882:2011, :2014, :2017) or any later version.

Note that many C++20 features will likely have their own feature test
macros (see e.g. L<http://eel.is/c++draft/cpp.predefined#1.8>).

=item ECB_OPTIMIZE_SIZE

Is C<1> when the compiler optimizes for size, C<0> otherwise. This symbol
can also be defined before including F<ecb.h>, in which case it will be
left unchanged.

=item ECB_GCC_VERSION (major, minor)

Expands to a true value (suitable for testing by the preprocessor) if the
compiler used is GNU C and the version is the given version, or higher.

This macro tries to return false on compilers that claim to be GCC
compatible but aren't.
 
=item ECB_EXTERN_C

Expands to C<extern "C"> in C++, and a simple C<extern> in C.

This can be used to declare a single external C function:

   ECB_EXTERN_C int printf (const char *format, ...);

=item ECB_EXTERN_C_BEG / ECB_EXTERN_C_END

These two macros can be used to wrap multiple C<extern "C"> definitions -
they expand to nothing in C.

They are most useful in header files:

   ECB_EXTERN_C_BEG

   int mycfun1 (int x);
   int mycfun2 (int x);

   ECB_EXTERN_C_END
 
=item ECB_STDFP

If this evaluates to a true value (suitable for testing by the
preprocessor), then C<float> and C<double> use IEEE 754 single/binary32
and double/binary64 representations internally I<and> the endianness of
both types matches the endianness of C<uint32_t> and C<uint64_t>.

This means you can just copy the bits of a C<float> (or C<double>) to an
C<uint32_t> (or C<uint64_t>) and get the raw IEEE 754 bit representation
without having to think about format or endianness.

This is true for basically all modern platforms, although F<ecb.h> might
not be able to deduce this correctly everywhere and might err on the safe
side.
 
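The bit copy described for C<ECB_STDFP> can be sketched as follows. This is a minimal illustration that does not include F<ecb.h>; the helper name C<float_bits> is made up for this example, and the value mentioned below assumes a platform on which C<ECB_STDFP> would be true (IEEE 754 binary32 C<float>).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of libecb): return the object
   representation of a float as a 32 bit unsigned integer.  The result
   is the raw IEEE 754 pattern only where float is binary32 and shares
   the endianness of uint32_t, the property ECB_STDFP reports. */
static uint32_t
float_bits (float f)
{
  uint32_t u;

  /* memcpy copies the bits without running into aliasing problems */
  memcpy (&u, &f, sizeof u);
  return u;
}
```

On such a platform, C<float_bits (1.0f)> yields C<0x3f800000>.
 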
=item ECB_64BIT_NATIVE

Evaluates to a true value (suitable for both preprocessor and C code
testing) if 64 bit integer types on this architecture are evaluated
"natively", that is, with similar speeds as 32 bit integers. While 64 bit
integer support is very common (and in fact required by libecb), 32 bit
CPUs have to emulate operations on them, so you might want to avoid them.

=item ECB_AMD64, ECB_AMD64_X32

These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32
ABI, respectively, and undefined elsewhere.

The designers of the new X32 ABI for some inexplicable reason decided to
make it look exactly like amd64, even though it's completely incompatible
to that ABI, breaking about every piece of software that assumed that
C<__x86_64> stands for, well, the x86-64 ABI, making these macros
necessary.

=back
 
=head2 MACRO TRICKERY

=over

=item ECB_CONCAT (a, b)

Expands any macros in C<a> and C<b>, then concatenates the result to form
a single token. This is mainly useful to form identifiers from components,
e.g.:

   #define S1 str
   #define S2 cpy

   ECB_CONCAT (S1, S2)(dst, src); // == strcpy (dst, src);

=item ECB_STRINGIFY (arg)

Expands any macros in C<arg> and returns the stringified version of
it. This is mainly useful to get the contents of a macro in string form,
e.g.:

   #define SQL_LIMIT 100
   sql_exec ("select * from table limit " ECB_STRINGIFY (SQL_LIMIT));

=item ECB_STRINGIFY_EXPR (expr)

Like C<ECB_STRINGIFY>, but additionally evaluates C<expr> to make sure it
is a valid expression. This is useful to catch typos or cases where the
macro isn't available:

   #include <errno.h>

   ECB_STRINGIFY      (EDOM); // "33" (on my system at least)
   ECB_STRINGIFY_EXPR (EDOM); // "33"

   // now imagine we had a typo:

   ECB_STRINGIFY      (EDAM); // "EDAM"
   ECB_STRINGIFY_EXPR (EDAM); // error: EDAM undefined

=back
 
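Both C<ECB_CONCAT> and C<ECB_STRINGIFY> expand their arguments before pasting or stringifying. The usual way to get this behaviour in C is a two-level macro. The sketch below shows that technique with made-up C<MY_*> names; it is not necessarily how F<ecb.h> defines its macros.

```c
#include <stddef.h>
#include <string.h>

/* The inner macros apply ## and # directly, which suppresses expansion
   of their operands; the outer macros force one round of expansion
   before the inner ones see the arguments. */
#define MY_CONCAT_(a, b) a ## b
#define MY_CONCAT(a, b) MY_CONCAT_ (a, b)
#define MY_STRINGIFY_(x) # x
#define MY_STRINGIFY(x) MY_STRINGIFY_ (x)

#define S1 str
#define S2 len
#define SQL_LIMIT 100

/* MY_CONCAT (S1, S2) forms the identifier strlen */
static size_t
concat_demo (const char *s)
{
  return MY_CONCAT (S1, S2) (s);
}

/* the two-level macro expands SQL_LIMIT first, giving "100" */
static const char *
stringify_expanded (void)
{
  return MY_STRINGIFY (SQL_LIMIT);
}

/* the one-level macro stringifies the macro name itself */
static const char *
stringify_raw (void)
{
  return MY_STRINGIFY_ (SQL_LIMIT);
}
```

Calling the one-level C<MY_STRINGIFY_> directly gives C<"SQL_LIMIT"> rather than C<"100">, which is the pitfall the two-level form avoids.
 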
=head2 ATTRIBUTES

A major part of libecb deals with additional attributes that can be
assigned to functions, variables and sometimes even types - much like
C<const> or C<volatile> in C. They are implemented using either GCC
attributes or other compiler/language specific features. Attribute
declarations must be placed before the whole declaration:

   ecb_const int mysqrt (int a);
   ecb_unused int i;

=over

=item ecb_unused

Marks a function or a variable as "unused", which simply suppresses a
warning by the compiler when it detects it as unused. This is useful when
you e.g. declare a variable but do not always use it:

   {
     ecb_unused int var;

     #ifdef SOMECONDITION
        var = ...;
        return var;
     #else
        return 0;
     #endif
   }

=item ecb_deprecated

Similar to C<ecb_unused>, but marks a function, variable or type as
deprecated. This makes some compilers warn when the marked function,
variable or type is used.

=item ecb_deprecated_message (message)

Same as C<ecb_deprecated>, but if possible, the specified diagnostic is
used instead of a generic deprecation message when the object is being
used.
 
=item ecb_inline

Expands either to (a compiler-specific equivalent of) C<static inline> or
to just C<static>, if inline isn't supported. It should be used to declare
functions that should be inlined, for code size or speed reasons.

Example: inline this function, it surely will reduce code size.

   ecb_inline int
   negmul (int a, int b)
   {
     return - (a * b);
   }

=item ecb_noinline

Prevents a function from being inlined - it might be optimised away, but
not inlined into other functions. This is useful if you know your function
is rarely called and large enough for inlining not to be helpful.

=item ecb_noreturn

Marks a function as "not returning, ever". Some typical functions that
don't return are C<exit> or C<abort> (which really works hard to not
return), and now you can make your own:

   ecb_noreturn void
   my_abort (const char *errline)
   {
     puts (errline);
     abort ();
   }

In this case, the compiler would probably be smart enough to deduce it on
its own, so this is mainly useful for declarations.
 
=item ecb_restrict

Expands to the C<restrict> keyword or equivalent on compilers that support
it, and to nothing on others. Must be specified on a pointer type or
an array index to indicate that the memory doesn't alias with any other
restricted pointer in the same scope. As with the C99 C<restrict> keyword,
it qualifies the pointer itself, so it is written after the C<*>.

Example: multiply a vector, and allow the compiler to parallelise the
loop, because it knows it doesn't overwrite input values.

   void
   multiply (float *ecb_restrict src,
             float *ecb_restrict dst,
             int len, float factor)
   {
     int i;

     for (i = 0; i < len; ++i)
       dst [i] = src [i] * factor;
   }
 
=item ecb_const

Declares that the function only depends on the values of its arguments,
much like a mathematical function. It specifically does not read or write
any memory any arguments might point to, global variables, or call any
non-const functions. It also must not have any side effects.

Such a function can be optimised much more aggressively by the compiler -
for example, multiple calls with the same arguments can be optimised into
a single call, which wouldn't be possible if the compiler had to
expect any side effects.

It is best suited for functions in the sense of mathematical functions,
such as a function returning the square root of its input argument.

Not suited would be a function that calculates the hash of some memory
area you pass in, prints some messages or looks at a global variable to
decide on rounding.

See C<ecb_pure> for a slightly less restrictive class of functions.

=item ecb_pure

Similar to C<ecb_const>, declares a function that has no side
effects. Unlike C<ecb_const>, the function is allowed to examine global
variables and any other memory areas (such as the ones passed to it via
pointers).

While these functions cannot be optimised as aggressively as C<ecb_const>
functions, they can still be optimised away on many occasions, and the
compiler has more freedom in moving calls to them around.

Typical examples for such functions would be C<strlen> or C<memcmp>. A
function that calculates the MD5 sum of some input and updates some MD5
state passed as argument would I<NOT> be pure, however, as it would modify
some memory area that is not the return value.
 
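To make the difference between the two attributes concrete, here is a small sketch. It uses made-up C<my_const> and C<my_pure> macros that map to the GCC attributes of the same name, on the assumption that this is roughly what C<ecb_const> and C<ecb_pure> do on GCC-compatible compilers; with F<ecb.h> you would use the C<ecb_> attributes directly.

```c
#include <stddef.h>

/* Stand-ins for ecb_const/ecb_pure so the example compiles without ecb.h */
#if defined __GNUC__
# define my_const __attribute__ ((__const__))
# define my_pure  __attribute__ ((__pure__))
#else
# define my_const
# define my_pure
#endif

/* depends only on the argument value: a candidate for "const" */
static my_const int
square (int x)
{
  return x * x;
}

/* reads memory through a pointer but has no side effects: "pure" only */
static my_pure size_t
count_char (const char *s, char c)
{
  size_t n = 0;

  for (; *s; ++s)
    n += *s == c;

  return n;
}
```

Marking C<count_char> as const instead would be wrong, because its result depends on the memory C<s> points to and not just on the pointer value.
 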
=item ecb_hot

This declares a function as "hot" with regards to the cache - the function
is used so often that it is very beneficial to keep it in the cache if
possible.

The compiler reacts by trying to place hot functions near to each other in
memory.

Whether a function is hot or not often depends on the whole program,
and less on the function itself. C<ecb_cold> is likely more useful in
practice.

=item ecb_cold

The opposite of C<ecb_hot> - declares a function as "cold" with regards to
the cache, or in other words, this function is not called often, or not at
speed-critical times, and keeping it in the cache might be a waste of said
cache.

In addition to placing cold functions together (or at least away from hot
functions), this knowledge can be used in other ways, for example, the
function will be optimised for size, as opposed to speed, and code paths
leading to calls to those functions can automatically be marked as if
C<ecb_expect_false> had been used to reach them.

Good examples for such functions would be error reporting functions, or
functions only called in exceptional or rare cases.

=item ecb_artificial

Declares the function as "artificial", in this case meaning that this
function is not really meant to be a function, but more like an accessor
- many methods in C++ classes are mere accessor functions, and having a
crash reported in such a method, or single-stepping through them, is not
usually so helpful, especially when it's inlined to just a few instructions.

Marking them as artificial will instruct the debugger about just this,
leading to happier debugging and thus happier lives.

Example: in some kind of smart-pointer class, mark the pointer accessor as
artificial, so that the whole class acts more like a pointer and less like
some C++ abstraction monster.

   template<typename T>
   struct my_smart_ptr
   {
     T *value;

     ecb_artificial
     operator T *()
     {
       return value;
     }
   };

=back
 
=head2 OPTIMISATION HINTS

=over

=item bool ecb_is_constant (expr)

Returns true iff the expression can be deduced to be a compile-time
constant, and false otherwise.

For example, when you have a C<rndm16> function that returns a 16 bit
random number, and you have a function that maps this to a range from
0..n-1, then you could use this inline function in a header file:

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return (n * (uint32_t)rndm16 ()) >> 16;
   }

However, for powers of two, you could use a normal mask, but that is only
worth it if, at compile time, you can detect this case. This is the case
when the passed number is a constant and also a power of two
(C<(n & (n - 1)) == 0>):

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return ecb_is_constant (n) && !(n & (n - 1))
            ? rndm16 () & (n - 1)
            : (n * (uint32_t)rndm16 ()) >> 16;
   }
 
=item ecb_expect (expr, value)

Evaluates C<expr> and returns it. In addition, it tells the compiler that
the C<expr> evaluates to C<value> a lot, which can be used for static
branch optimisations.

Usually, you want to use the more intuitive C<ecb_expect_true> and
C<ecb_expect_false> functions instead.

=item bool ecb_expect_true (cond)

=item bool ecb_expect_false (cond)

These two functions expect an expression that is true or false and return
C<1> or C<0>, respectively, so when used in the condition of an C<if> or
other conditional statement, they will not change the program:

   /* these two do the same thing */
   if (some_condition) ...;
   if (ecb_expect_true (some_condition)) ...;

However, by using C<ecb_expect_true>, you tell the compiler that the
condition is likely to be true (and for C<ecb_expect_false>, that it is
unlikely to be true).

For example, when you check for a null pointer and expect this to be a
rare, exceptional, case, then use C<ecb_expect_false>:

   void my_free (void *ptr)
   {
     if (ecb_expect_false (ptr == 0))
       return;
   }

Consistent use of these functions to mark exceptional cases or to
tell the compiler what the hot path through a function is can increase
performance considerably.

You might know these functions under the names C<likely> and C<unlikely>
- while these are common aliases, we find that the expect name is easier
to understand when quickly skimming code. If you wish, you can use
C<ecb_likely> instead of C<ecb_expect_true> and C<ecb_unlikely> instead of
C<ecb_expect_false> - these are simply aliases.

A very good example is in a function that reserves more space for some
memory block (for example, inside an implementation of a string stream) -
each time something is added, you have to check for a buffer overrun, but
you expect that most checks will turn out to be false:

   /* make sure we have "size" extra room in our buffer */
   ecb_inline void
   reserve (int size)
   {
     if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
   }
 
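On GCC-compatible compilers, hints like these are commonly expressed with C<__builtin_expect>. The following sketch uses made-up C<my_expect*> macros and a hypothetical C<safe_div> function to show the idea without including F<ecb.h>; treat it as an assumption about the general technique, not as the library's definition.

```c
/* Stand-ins for ecb_expect and friends; on other compilers the hint is
   simply dropped and only the value remains. */
#if defined __GNUC__
# define my_expect(expr, value) __builtin_expect ((expr), (value))
#else
# define my_expect(expr, value) (expr)
#endif

/* !! normalises any true value to 1, so the result is always 0 or 1 */
#define my_expect_true(cond)  my_expect (!!(cond), 1)
#define my_expect_false(cond) my_expect (!!(cond), 0)

/* divide, treating a zero divisor as the rare case */
static int
safe_div (int a, int b)
{
  if (my_expect_false (b == 0))
    return 0;

  return a / b;
}
```

The hint only influences code layout and branch prediction defaults; C<safe_div> returns the same values with or without it.
 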
=item ecb_assume (cond)

Tries to tell the compiler that some condition is true, even if it's not
obvious. This is not a function, but a statement: it cannot be used in
another expression.

This can be used to teach the compiler about invariants or other
conditions that might improve code generation, but which are impossible to
deduce from the code itself.

For example, the example reservation function from the C<ecb_expect_false>
description could be written thus (only C<ecb_assume> was added):

   ecb_inline void
   reserve (int size)
   {
     if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */

     ecb_assume (current + size <= end);
   }

If you then call this function twice, like this:

   reserve (10);
   reserve (1);

Then the compiler I<might> be able to optimise out the second call
completely, as it knows that C<< current + 1 > end >> is false and the
call will never be executed.

=item ecb_unreachable ()

This function does nothing itself, except tell the compiler that it will
never be executed. Apart from suppressing a warning in some cases, this
function can be used to implement C<ecb_assume> or similar functionality.
 
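The remark that C<ecb_unreachable> can be used to implement C<ecb_assume> can be illustrated with a short sketch. The C<my_*> macros and the C<mod8> function below are made up for this example and rely on GCC's C<__builtin_unreachable>; whether F<ecb.h> uses exactly this construction is an assumption.

```c
#if defined __GNUC__
# define my_unreachable() __builtin_unreachable ()
#else
# define my_unreachable() ((void)0)
#endif

/* If cond is false, control would reach "unreachable" code, so the
   compiler may assume cond is true from here on.  Violating the
   assumption is undefined behaviour. */
#define my_assume(cond) do { if (!(cond)) my_unreachable (); } while (0)

/* callers promise that n is 8, so the compiler may reduce the modulo
   to a simple mask */
static unsigned int
mod8 (unsigned int x, unsigned int n)
{
  my_assume (n == 8);
  return x % n;
}
```

The C<do { ... } while (0)> wrapper makes the macro behave like a single statement, which matches the description of C<ecb_assume> as a statement rather than an expression.
 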
=item ecb_prefetch (addr, rw, locality)

Tells the compiler to try to prefetch memory at the given I<addr>ess
for either reading (I<rw> = 0) or writing (I<rw> = 1). A I<locality> of
C<0> means that there will only be one access later, C<3> means that
the data will likely be accessed very often, and values in between mean
something... in between. The memory pointed to by the address does not
need to be accessible (it could be a null pointer for example), but C<rw>
and C<locality> must be compile-time constants.

This is a statement, not a function: you cannot use it as part of an
expression.

An obvious way to use this is to prefetch some data far away, in a big
array you loop over. This prefetches memory some 128 array elements later,
in the hope that it will be ready when the CPU arrives at that location.

   int sum = 0;

   for (i = 0; i < N; ++i)
     {
       sum += arr [i];
       ecb_prefetch (arr + i + 128, 0, 0);
     }

It's hard to predict how far to prefetch, and most CPUs that can prefetch
are often good enough to predict this kind of behaviour themselves. It
gets more interesting with linked lists, especially when you do some fair
processing on each list element:

   for (node *n = start; n; n = n->next)
     {
       ecb_prefetch (n->next, 0, 0);
       ... do medium amount of work with *n
     }

After processing the node, (part of) the next node might already be in
cache.
 
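A compilable variant of the linked-list loop might look like this. The C<my_prefetch> macro is a stand-in that maps to GCC's C<__builtin_prefetch> so that the sketch works without F<ecb.h>, and the C<node> type and C<sum_list> function are made up for illustration.

```c
#include <stddef.h>

#if defined __GNUC__
# define my_prefetch(addr, rw, locality) __builtin_prefetch ((addr), (rw), (locality))
#else
# define my_prefetch(addr, rw, locality) ((void)0)
#endif

struct node
{
  struct node *next;
  long value;
};

/* sum all values, asking for the next node while working on this one */
static long
sum_list (const struct node *start)
{
  long sum = 0;
  const struct node *n;

  for (n = start; n; n = n->next)
    {
      /* n->next may be a null pointer, which is fine for a prefetch */
      my_prefetch (n->next, 0, 0);
      sum += n->value;
    }

  return sum;
}
```

With so little work per node the prefetch is unlikely to pay off here; the pattern becomes interesting once the loop body is more expensive.
 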
=back

=head2 BIT FIDDLING / BIT WIZARDRY

=over

=item bool ecb_big_endian ()

=item bool ecb_little_endian ()

These two functions return true if the byte order is big endian
(most-significant byte first) or little endian (least-significant byte
first) respectively.

On systems that are neither, their return values are unspecified.

=item int ecb_ctz32 (uint32_t x)

=item int ecb_ctz64 (uint64_t x)

=item int ecb_ctz (T x) [C++]

Returns the index of the least significant bit set in C<x> (or
equivalently the number of bits set to 0 before the least significant bit
set), starting from 0. If C<x> is 0 the result is undefined.

For smaller types than C<uint32_t> you can safely use C<ecb_ctz32>.

The overloaded C++ C<ecb_ctz> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

For example:

   ecb_ctz32 (3) = 0
   ecb_ctz32 (6) = 1
 
=item int ecb_clz32 (uint32_t x)

=item int ecb_clz64 (uint64_t x)

Counts the number of leading zero bits in C<x>. If C<x> is 0 the result is
undefined.

It is often simpler to use one of the C<ecb_ld*> functions instead, whose
result only depends on the value and not the size of the type. This is
also the reason why there is no C++ overload.

For example:

   ecb_clz32 (3) = 30
   ecb_clz32 (6) = 29

=item bool ecb_is_pot32 (uint32_t x)

=item bool ecb_is_pot64 (uint64_t x)

=item bool ecb_is_pot (T x) [C++]

Returns true iff C<x> is a power of two or C<x == 0>.

For smaller types than C<uint32_t> you can safely use C<ecb_is_pot32>.

The overloaded C++ C<ecb_is_pot> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.
 
=item int ecb_ld32 (uint32_t x)

=item int ecb_ld64 (uint64_t x)

=item int ecb_ld (T x) [C++]

Returns the index of the most significant bit set in C<x>, which is one
less than the number of digits the number requires in binary (so that
C<< 2**ld <= x < 2**(ld+1) >>). If C<x> is 0 the result is undefined. A
common use case is to compute the integer binary logarithm, i.e.
C<floor (log2 (n))>, for example to see how many bits a certain number
requires to be encoded (C<ld + 1>).

This function is similar to the "count leading zero bits" function, except
that that one returns how many zero bits are "in front" of the number (in
the given data type), while C<ecb_ld> describes how many bits the number
itself requires (the result plus one), independent of the data type.

For smaller types than C<uint32_t> you can safely use C<ecb_ld32>.

The overloaded C++ C<ecb_ld> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

=item int ecb_popcount32 (uint32_t x)

=item int ecb_popcount64 (uint64_t x)

=item int ecb_popcount (T x) [C++]

Returns the number of bits set to 1 in C<x>.

For smaller types than C<uint32_t> you can safely use C<ecb_popcount32>.

The overloaded C++ C<ecb_popcount> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

For example:

   ecb_popcount32 (7) = 3
   ecb_popcount32 (255) = 8
 
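The semantics of the counting functions above can be pinned down with simple loop-based reference versions. These C<ref_*> functions are made up for this example; they are slow and only meant to show what is being computed, whereas F<ecb.h> is expected to map to compiler built-ins where available.

```c
#include <assert.h>
#include <stdint.h>

/* number of bits set to 1 */
static int
ref_popcount32 (uint32_t x)
{
  int n = 0;

  for (; x; x >>= 1)
    n += x & 1;

  return n;
}

/* index of the least significant set bit; x must not be 0 */
static int
ref_ctz32 (uint32_t x)
{
  int n = 0;

  assert (x != 0);

  while (!(x & 1))
    {
      x >>= 1;
      ++n;
    }

  return n;
}

/* index of the most significant set bit; x must not be 0 */
static int
ref_ld32 (uint32_t x)
{
  int n = 0;

  assert (x != 0);

  while (x >>= 1)
    ++n;

  return n;
}

/* number of zero bits above the most significant set bit */
static int
ref_clz32 (uint32_t x)
{
  return 31 - ref_ld32 (x);
}
```

With these definitions, C<ref_clz32 (x) + ref_ld32 (x)> is always C<31> for non-zero C<x>, which shows why the "ld" result does not depend on the width of the type while the "clz" result does.
 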
=item uint8_t  ecb_bitrev8  (uint8_t  x)

=item uint16_t ecb_bitrev16 (uint16_t x)

=item uint32_t ecb_bitrev32 (uint32_t x)

=item T ecb_bitrev (T x) [C++]

Reverses the bits in x, i.e. the MSB becomes the LSB, MSB-1 becomes LSB+1
and so on.

The overloaded C++ C<ecb_bitrev> function supports C<uint8_t>,
C<uint16_t> and C<uint32_t> types.

Example:

   ecb_bitrev8 (0xa7) = 0xe5
   ecb_bitrev32 (0xffcc4411) = 0x882233ff
 
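Bit reversal can be described by a short loop. The C<ref_bitrev*> helpers below are made-up reference versions for illustration, not the (presumably much faster) implementations in F<ecb.h>.

```c
#include <stdint.h>

/* reverse the lowest "bits" bits of x, one bit at a time */
static uint32_t
ref_bitrev (uint32_t x, int bits)
{
  uint32_t r = 0;
  int i;

  for (i = 0; i < bits; ++i)
    {
      r = (r << 1) | (x & 1); /* move the current LSB of x into r */
      x >>= 1;
    }

  return r;
}

static uint8_t
ref_bitrev8 (uint8_t x)
{
  return (uint8_t)ref_bitrev (x, 8);
}

static uint32_t
ref_bitrev32 (uint32_t x)
{
  return ref_bitrev (x, 32);
}
```

For example, C<ref_bitrev8 (0xa7)> is C<0xe5>: binary C<10100111> read backwards is C<11100101>.
 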
=item uint32_t ecb_bswap16 (uint32_t x)

=item uint32_t ecb_bswap32 (uint32_t x)

=item uint64_t ecb_bswap64 (uint64_t x)

=item T ecb_bswap (T x) [C++]

These functions return the value of the 16-bit (32-bit, 64-bit) value
C<x> after reversing the order of bytes (0x11223344 becomes 0x44332211 in
C<ecb_bswap32>).

The overloaded C++ C<ecb_bswap> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.
 
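What byte swapping does can be shown with plain shifts and masks. The C<ref_bswap*> helpers are made up for this sketch, and their parameter types were chosen for the example rather than copied from the prototypes above; F<ecb.h> is expected to use compiler built-ins where they exist.

```c
#include <stdint.h>

static uint16_t
ref_bswap16 (uint16_t x)
{
  return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t
ref_bswap32 (uint32_t x)
{
  return  (x >> 24)
        | ((x >>  8) & 0x0000ff00U)
        | ((x <<  8) & 0x00ff0000U)
        |  (x << 24);
}

/* a 64 bit swap is two 32 bit swaps with the halves exchanged */
static uint64_t
ref_bswap64 (uint64_t x)
{
  return ((uint64_t)ref_bswap32 ((uint32_t)x) << 32)
       | ref_bswap32 ((uint32_t)(x >> 32));
}
```

Applying any of these functions twice returns the original value, which makes them suitable for converting in both directions between byte orders.
 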
=item uint8_t  ecb_rotl8  (uint8_t  x, unsigned int count)

=item uint16_t ecb_rotl16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotl64 (uint64_t x, unsigned int count)

=item uint8_t  ecb_rotr8  (uint8_t  x, unsigned int count)

=item uint16_t ecb_rotr16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotr64 (uint64_t x, unsigned int count)

These two families of functions return the value of C<x> after rotating
all the bits by C<count> positions to the right (C<ecb_rotr>) or left
(C<ecb_rotl>). There are no restrictions on the value C<count>, i.e. both
zero and values equal to or larger than the word width work correctly. Also,
notwithstanding C<count> being unsigned, negative numbers work and shift
to the opposite direction.

Current GCC/clang versions understand these functions and usually compile
them to "optimal" code (e.g. a single C<rol> or a combination of C<shld>
on x86).
 
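That any C<count> works, including values at or above the word width and "negative" ones, follows from reducing the count modulo the width. A portable sketch, with made-up C<ref_rotl32> and C<ref_rotr32> names and no claim to match the code in F<ecb.h>, could look like this:

```c
#include <stdint.h>

/* rotate left; masking keeps both shift counts in 0..31, which avoids
   the undefined behaviour of shifting by the full width */
static uint32_t
ref_rotl32 (uint32_t x, unsigned int count)
{
  return (x << (count & 31)) | (x >> (-count & 31));
}

/* rotate right, using the same masking */
static uint32_t
ref_rotr32 (uint32_t x, unsigned int count)
{
  return (x >> (count & 31)) | (x << (-count & 31));
}
```

Because the range of C<unsigned int> is a multiple of 32, a negative count converted to unsigned keeps its value modulo 32, so passing C<(unsigned int)-4> to C<ref_rotl32> rotates right by 4.
 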
=item T ecb_rotl (T x, unsigned int count) [C++]

=item T ecb_rotr (T x, unsigned int count) [C++]

Overloaded C++ rotl/rotr functions.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

=item uint_fast8_t  ecb_gray_encode8  (uint_fast8_t  b)

=item uint_fast16_t ecb_gray_encode16 (uint_fast16_t b)

=item uint_fast32_t ecb_gray_encode32 (uint_fast32_t b)

=item uint_fast64_t ecb_gray_encode64 (uint_fast64_t b)

Encode an unsigned integer into its corresponding (reflective) gray code -
the kind of gray code meant when just talking about "gray code". These
functions are very fast and all have identical implementations, so there is
no need to use a smaller type, as long as your CPU can handle it natively.

=item T ecb_gray_encode (T b) [C++]

Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>.

=item uint_fast8_t  ecb_gray_decode8  (uint_fast8_t  b)

=item uint_fast16_t ecb_gray_decode16 (uint_fast16_t b)

=item uint_fast32_t ecb_gray_decode32 (uint_fast32_t b)

=item uint_fast64_t ecb_gray_decode64 (uint_fast64_t b)

Decode a gray code back into linear index form (the reverse of
C<ecb_gray_encode*>). Unlike the encode functions, the decode functions
have higher time complexity for larger types, so it can pay off to use a
smaller type here.

=item T ecb_gray_decode (T b) [C++]

Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>.
 
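The reflected binary gray code has a well-known closed form, which also explains why encoding is cheap while decoding takes more steps for wider types. The C<ref_gray_*> functions below are a sketch under that textbook definition; their names are made up, and F<ecb.h> may implement the functions differently.

```c
#include <stdint.h>

/* encoding is a single shift and xor, regardless of width */
static uint32_t
ref_gray_encode32 (uint32_t b)
{
  return b ^ (b >> 1);
}

/* decoding xors together all right-shifted copies of the code; the
   doubling steps need log2 (width) rounds, so wider types cost more */
static uint32_t
ref_gray_decode32 (uint32_t g)
{
  g ^= g >> 16;
  g ^= g >>  8;
  g ^= g >>  4;
  g ^= g >>  2;
  g ^= g >>  1;

  return g;
}
```

Encoding the indices 0, 1, 2, 3, 4 gives 0, 1, 3, 2, 6, so consecutive codes differ in exactly one bit.
 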
=back

=head2 HILBERT CURVES

These functions deal with (square, pseudo) Hilbert curves. The parameter
I<order> indicates the size of the square and is specified in bits, that
means for order C<8>, the coordinates range from C<0>..C<255>, and the
curve index ranges from C<0>..C<65535>.

The 32 bit variants of these functions map a 32 bit index to two 16 bit
coordinates, stored in a 32 bit variable, where the high order bits are
the x-coordinate, and the low order bits are the y-coordinate. Thus,
these functions map a 32 bit linear index on the curve to a 32 bit packed
coordinate pair, and vice versa.

The 64 bit variants work similarly.

The I<order> can go from C<1> to C<16> for the 32 bit curve, and C<1> to
C<32> for the 64 bit curve.

When going from one order to the next higher order, these functions
replace the curve segments by smaller versions of the generating shape,
while doubling the size (since they use integer coordinates), which is
what you would expect mathematically. This means that the curve will be
mirrored at the diagonal. If your goal is to simply cover more area while
retaining existing point coordinates you should increase or decrease the
I<order> by C<2> or, in the case of C<ecb_hilbert2d_index_to_coord>,
simply specify the maximum I<order> of C<16> or C<32>, respectively, as
these are constant-time.
 
833 |
=over |
834 |
|
835 |
=item uint32_t ecb_hilbert2d_index_to_coord32 (int order, uint32_t index) |
836 |
|
837 |
=item uint64_t ecb_hilbert2d_index_to_coord64 (int order, uint64_t index) |
838 |
|
839 |
Map a point on a pseudo Hilbert curve from its linear distance from the |
840 |
origin on the curve to a x|y coordinate pair. The result is a packed |
841 |
coordinate pair, to get the actual x and < coordinates, you could do |
842 |
something like this: |
843 |
|
844 |
uint32_t xy = ecb_hilbert2d_index_to_coord32 (16, 255); |
845 |
uint16_t x = xy >> 16; |
846 |
uint16_t y = xy & 0xffffU; |
847 |
|
848 |
uint64_t xy = ecb_hilbert2d_index_to_coord64 (32, 255); |
849 |
uint32_t x = xy >> 32; |
850 |
uint32_t y = xy & 0xffffffffU; |
851 |
|
852 |
These functions work in constant time, so for many applications it is |
853 |
preferable to simply hard-code the order to the maximum (C<16> or C<32>). |
854 |
|
855 |
This (production-ready, i.e. never run) example generates an SVG image of |
856 |
an order 8 pseudo Hilbert curve: |
857 |
|
858 |
printf ("<svg xmlns='http://www.w3.org/2000/svg' width='%d' height='%d'>\n", 64 * 8, 64 * 8); |
859 |
printf ("<g transform='translate(4) scale(8)' stroke-width='0.25' stroke='black'>\n"); |
860 |
for (uint32_t i = 0; i < 64*64 - 1; ++i) |
861 |
{ |
862 |
uint32_t p1 = ecb_hilbert2d_index_to_coord32 (6, i ); |
863 |
uint32_t p2 = ecb_hilbert2d_index_to_coord32 (6, i + 1); |
864 |
printf ("<line x1='%d' y1='%d' x2='%d' y2='%d'/>\n", |
865 |
p1 >> 16, p1 & 0xffff, |
866 |
p2 >> 16, p2 & 0xffff); |
867 |
} |
868 |
printf ("</g>\n"); |
869 |
printf ("</svg>\n"); |
870 |
|
871 |
=item uint32_t ecb_hilbert2d_coord_to_index32 (int order, uint32_t xy) |
872 |
|
873 |
=item uint64_t ecb_hilbert2d_coord_to_index64 (int order, uint64_t xy) |
874 |
|
875 |
The reverse of C<ecb_hilbert2d_index_to_coord> - map a packed pair of |
876 |
coordinates to their linear index on the pseudo Hilbert curve of order |
877 |
I<order>. |
878 |
|
879 |
They are an exact inverse of the C<ecb_hilbert2d_coord_to_index> functions |
880 |
for the same I<order>: |
881 |
|
882 |
assert ( |
883 |
u == ecb_hilbert2d_coord_to_index (32, |
884 |
ecb_hilbert2d_index_to_coord32 (32, |
885 |
u))); |
886 |
|
887 |
Packing coordinates is done the same way, as well, from I<x> and I<y>: |
888 |
|
889 |
uint32_t xy = ((uint32_t)x << 16) | y; // for ecb_hilbert2d_coord_to_index32 |
890 |
uint64_t xy = ((uint64_t)x << 32) | y; // for ecb_hilbert2d_coord_to_index64 |
891 |
|
892 |
Unlike C<ecb_hilbert2d_coord_to_index>, these functions are O(I<order>), |
893 |
so it is preferable to use the lowest possible order. |
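
To illustrate the packed coordinate convention and the inverse
relationship between the two directions, the following sketch implements
the well-known textbook Hilbert curve mapping. It is a stand-in under
stated assumptions, not libecb's code: the C<sketch_*> names are
hypothetical, both directions are O(I<order>) here, and the orientation of
the resulting curve is not guaranteed to match what the
C<ecb_hilbert2d_*> functions produce.

```c
#include <stdint.h>

/* Textbook helper: rotate/flip a quadrant of side length n. */
static void
sketch_rot (uint32_t n, uint32_t *x, uint32_t *y, uint32_t rx, uint32_t ry)
{
  if (ry == 0)
    {
      if (rx == 1)
        {
          *x = n - 1 - *x;
          *y = n - 1 - *y;
        }

      uint32_t t = *x; *x = *y; *y = t; /* swap x and y */
    }
}

/* Hypothetical: curve index -> packed (x << 16) | y, 1 <= order <= 16. */
static uint32_t
sketch_index_to_coord32 (int order, uint32_t index)
{
  uint32_t x = 0, y = 0, t = index;

  for (uint32_t s = 1; s < (UINT32_C (1) << order); s <<= 1)
    {
      uint32_t rx = 1 & (t >> 1);
      uint32_t ry = 1 & (t ^ rx);
      sketch_rot (s, &x, &y, rx, ry);
      x += s * rx;
      y += s * ry;
      t >>= 2;
    }

  return (x << 16) | y;
}

/* Hypothetical: packed (x << 16) | y -> curve index, 1 <= order <= 16. */
static uint32_t
sketch_coord_to_index32 (int order, uint32_t xy)
{
  uint32_t n = UINT32_C (1) << order;
  uint32_t x = xy >> 16, y = xy & 0xffffU, d = 0;

  for (uint32_t s = n >> 1; s > 0; s >>= 1)
    {
      uint32_t rx = (x & s) != 0;
      uint32_t ry = (y & s) != 0;
      d += s * s * ((3 * rx) ^ ry);
      sketch_rot (n, &x, &y, rx, ry);
    }

  return d;
}
```

As with the real functions, converting an index to coordinates and back
with the same I<order> yields the original index, and consecutive indices
map to neighbouring cells.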

=back

=head2 BIT MIXING, HASHING

Sometimes you have an integer and want to distribute its bits well, for
example, to use it as a hash in a hash table. A common example is pointer
values, which often only have a limited range (e.g. low and high bits are
often zero).

The following functions try to mix the bits to get a good bias-free
distribution. They were mainly made for pointers, but the underlying
integer functions are exposed as well.

As an added benefit, the functions are reversible, so if you find it
convenient to store only the hash value, you can recover the original
pointer from the hash ("unmix"), as long as your pointers are 32 or 64 bit
(if this isn't the case on your platform, drop us a note and we will add
functions for other bit widths).

The unmix functions are very slightly slower than the mix functions, so
it is very slightly preferable to store the original values when
convenient.

The underlying algorithm is subject to change, so currently these
functions are not suitable for persistent hash tables, as their result
value can change between different versions of libecb.

=over

=item uintptr_t ecb_ptrmix (void *ptr)

Mixes the bits of a pointer so the result is suitable for hash table
lookups. In other words, this hashes the pointer value.

=item uintptr_t ecb_ptrmix (T *ptr) [C++]

Overload the C<ecb_ptrmix> function to work for any pointer in C++.

=item void *ecb_ptrunmix (uintptr_t v)

Unmix the hash value into the original pointer. This only works as long
as the hash value is not truncated, i.e. you used C<uintptr_t> (or
equivalent) throughout to store it.

=item T *ecb_ptrunmix<T> (uintptr_t v) [C++]

The somewhat less useful template version of C<ecb_ptrunmix> for
C++. Example:

   sometype *myptr;
   uintptr_t hash = ecb_ptrmix (myptr);
   sometype *orig = ecb_ptrunmix<sometype> (hash);

=item uint32_t ecb_mix32 (uint32_t v)

=item uint64_t ecb_mix64 (uint64_t v)

Sometimes you don't have a pointer but an integer whose values are very
badly distributed. In this case you can use these integer versions of the
mixing function. No C++ template is provided currently.

=item uint32_t ecb_unmix32 (uint32_t v)

=item uint64_t ecb_unmix64 (uint64_t v)

The reverse of the C<ecb_mix> functions - they take a mixed/hashed value
and recover the original value.
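
To show how a mixing function can be reversible at all, here is a minimal
sketch built from two invertible steps: an xor-shift by half the word
size (which is its own inverse) and a multiplication by an odd constant
(which has a multiplicative inverse modulo 2^32). The algorithm, the
constant and all C<sketch_*> names are assumptions made for this
illustration; libecb's own algorithm is different and, as noted above,
subject to change.

```c
#include <stdint.h>

/* Assumed constant for this sketch: any odd value is invertible mod 2^32. */
#define SKETCH_MIX_MUL UINT32_C (0x9e3779b1)

/* Multiplicative inverse of an odd a modulo 2^32, via Newton iteration. */
static uint32_t
sketch_mul_inverse32 (uint32_t a)
{
  uint32_t inv = a; /* a * a == 1 (mod 8), so 3 bits are already correct */

  for (int i = 0; i < 4; ++i)
    inv *= UINT32_C (2) - a * inv; /* each step doubles the correct bits */

  return inv;
}

/* Hypothetical mixer, not libecb's algorithm. */
static uint32_t
sketch_mix32 (uint32_t v)
{
  v ^= v >> 16;
  v *= SKETCH_MIX_MUL;
  v ^= v >> 16;
  return v;
}

/* Undo sketch_mix32 by applying the inverse steps in reverse order. */
static uint32_t
sketch_unmix32 (uint32_t v)
{
  v ^= v >> 16;
  v *= sketch_mul_inverse32 (SKETCH_MIX_MUL);
  v ^= v >> 16;
  return v;
}
```

Because every step is a bijection on 32 bit values, no two inputs can
collide, which is what makes the "unmix" direction possible.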

=back

=head2 HOST ENDIANNESS CONVERSION

=over

=item uint_fast16_t ecb_be_u16_to_host (uint_fast16_t v)

=item uint_fast32_t ecb_be_u32_to_host (uint_fast32_t v)

=item uint_fast64_t ecb_be_u64_to_host (uint_fast64_t v)

=item uint_fast16_t ecb_le_u16_to_host (uint_fast16_t v)

=item uint_fast32_t ecb_le_u32_to_host (uint_fast32_t v)

=item uint_fast64_t ecb_le_u64_to_host (uint_fast64_t v)

Convert an unsigned 16, 32 or 64 bit value from big or little endian to
host byte order.

The naming convention is C<ecb_>(C<be>|C<le>)C<_u>(C<16>|C<32>|C<64>)C<_to_host>,
where C<be> and C<le> stand for big endian and little endian, respectively.

=item uint_fast16_t ecb_host_to_be_u16 (uint_fast16_t v)

=item uint_fast32_t ecb_host_to_be_u32 (uint_fast32_t v)

=item uint_fast64_t ecb_host_to_be_u64 (uint_fast64_t v)

=item uint_fast16_t ecb_host_to_le_u16 (uint_fast16_t v)

=item uint_fast32_t ecb_host_to_le_u32 (uint_fast32_t v)

=item uint_fast64_t ecb_host_to_le_u64 (uint_fast64_t v)

Like above, but converts I<from> host byte order to the specified
endianness.
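
What such a conversion boils down to can be sketched in portable C as
follows. The C<sketch_*> names are hypothetical, the run-time endianness
probe is an assumption of this sketch (it ignores mixed-endian hosts), and
F<ecb.h> itself may well use compiler built-ins and compile-time detection
instead.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
static int
sketch_host_is_little_endian (void)
{
  const uint16_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 1;
}

/* Reverse the order of the four bytes in v. */
static uint32_t
sketch_bswap32 (uint32_t v)
{
  return (v >> 24)
       | ((v >>  8) & UINT32_C (0x0000ff00))
       | ((v <<  8) & UINT32_C (0x00ff0000))
       | (v << 24);
}

/* Hypothetical counterpart of a host-to-big-endian conversion. */
static uint32_t
sketch_host_to_be_u32 (uint32_t v)
{
  return sketch_host_is_little_endian () ? sketch_bswap32 (v) : v;
}

/* The reverse direction is the same operation: a swap is its own inverse. */
static uint32_t
sketch_be_u32_to_host (uint32_t v)
{
  return sketch_host_is_little_endian () ? sketch_bswap32 (v) : v;
}
```

After C<sketch_host_to_be_u32>, the bytes of the value appear in memory
most significant byte first, regardless of the host byte order.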

=back

In C++ the following additional template functions are supported:

=over

=item T ecb_be_to_host (T v)

=item T ecb_le_to_host (T v)

=item T ecb_host_to_be (T v)

=item T ecb_host_to_le (T v)

=back

These functions work like their C counterparts, above, but use templates,
which make them useful in generic code.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>
(so unlike their C counterparts, there is a version for C<uint8_t>, which
again can be useful in generic code).

=head2 UNALIGNED LOAD/STORE

These functions load or store unaligned multi-byte values.

=over

=item uint_fast16_t ecb_peek_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_u64_u (const void *ptr)

These functions load an unaligned, unsigned 16, 32 or 64 bit value from
memory.

=item uint_fast16_t ecb_peek_be_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_be_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_be_u64_u (const void *ptr)

=item uint_fast16_t ecb_peek_le_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_le_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_le_u64_u (const void *ptr)

Like above, but additionally convert from big endian (C<be>) or little
endian (C<le>) byte order to host byte order while doing so.

=item ecb_poke_u16_u (void *ptr, uint16_t v)

=item ecb_poke_u32_u (void *ptr, uint32_t v)

=item ecb_poke_u64_u (void *ptr, uint64_t v)

These functions store an unaligned, unsigned 16, 32 or 64 bit value to
memory.

=item ecb_poke_be_u16_u (void *ptr, uint_fast16_t v)

=item ecb_poke_be_u32_u (void *ptr, uint_fast32_t v)

=item ecb_poke_be_u64_u (void *ptr, uint_fast64_t v)

=item ecb_poke_le_u16_u (void *ptr, uint_fast16_t v)

=item ecb_poke_le_u32_u (void *ptr, uint_fast32_t v)

=item ecb_poke_le_u64_u (void *ptr, uint_fast64_t v)

Like above, but additionally convert from host byte order to big endian
(C<be>) or little endian (C<le>) byte order while doing so.
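
The portable way to express such accesses in plain C is C<memcpy> for
host byte order and byte-wise assembly for a fixed byte order. The
following sketch uses hypothetical C<sketch_*> names and makes no claim
about how F<ecb.h> implements its functions; it only shows the semantics
described above for the 32 bit case.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32 bit value in host byte order from a possibly unaligned address. */
static uint32_t
sketch_peek_u32_u (const void *ptr)
{
  uint32_t v;

  memcpy (&v, ptr, sizeof v);
  return v;
}

/* Store a 32 bit value in host byte order to a possibly unaligned address. */
static void
sketch_poke_u32_u (void *ptr, uint32_t v)
{
  memcpy (ptr, &v, sizeof v);
}

/* Load a big endian 32 bit value, independent of host byte order. */
static uint32_t
sketch_peek_be_u32_u (const void *ptr)
{
  const unsigned char *p = ptr;

  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Store a 32 bit value in little endian order, independent of host byte order. */
static void
sketch_poke_le_u32_u (void *ptr, uint32_t v)
{
  unsigned char *p = ptr;

  p[0] =  v        & 0xff;
  p[1] = (v >>  8) & 0xff;
  p[2] = (v >> 16) & 0xff;
  p[3] = (v >> 24) & 0xff;
}
```

Compilers commonly turn both patterns into single load or store
instructions on CPUs that allow unaligned access.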

=back

In C++ the following additional template functions are supported:

=over

=item T ecb_peek<T> (const void *ptr)

=item T ecb_peek_be<T> (const void *ptr)

=item T ecb_peek_le<T> (const void *ptr)

=item T ecb_peek_u<T> (const void *ptr)

=item T ecb_peek_be_u<T> (const void *ptr)

=item T ecb_peek_le_u<T> (const void *ptr)

Similarly to their C counterparts, these functions load an unsigned 8, 16,
32 or 64 bit value from memory, with optional conversion from big/little
endian.

Since the type cannot be deduced, it has to be specified explicitly, e.g.

   uint_fast16_t v = ecb_peek<uint16_t> (ptr);

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

Unlike their C counterparts, these functions support 8 bit quantities
(C<uint8_t>) and also have an aligned version (without the C<_u> suffix),
all of which hopefully makes them more useful in generic code.

=item ecb_poke (void *ptr, T v)

=item ecb_poke_be (void *ptr, T v)

=item ecb_poke_le (void *ptr, T v)

=item ecb_poke_u (void *ptr, T v)

=item ecb_poke_be_u (void *ptr, T v)

=item ecb_poke_le_u (void *ptr, T v)

Again, similarly to their C counterparts, these functions store an
unsigned 8, 16, 32 or 64 bit value to memory, with optional conversion to
big/little endian.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

Unlike their C counterparts, these functions support 8 bit quantities
(C<uint8_t>) and also have an aligned version (without the C<_u> suffix),
all of which hopefully makes them more useful in generic code.

=back

=head2 FAST INTEGER TO STRING

Libecb defines a set of very fast integer to decimal string (or integer
to ASCII, short C<i2a>) functions. These work by converting the integer
to a fixed point representation and then successively multiplying out
the topmost digits. Unlike some other, also very fast, libraries, ecb's
algorithm should be completely branchless per digit, and does not rely on
the presence of special CPU functions (such as C<clz>).

There is a high level API that takes an C<int32_t>, C<uint32_t>,
C<int64_t> or C<uint64_t> as argument, and a low-level API, which is
harder to use but supports slightly more formatting options.
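
The fixed point idea can be sketched for the simplest case, exactly three
digits with leading zeroes. The helper name C<sketch_i2a_03> and the
scaling constant are assumptions chosen for this illustration and are not
taken from F<ecb.h>: the value is multiplied by ceil (2^24 / 100), so the
hundreds digit ends up above bit 24, and each further digit is obtained by
masking off the integer part and multiplying the remaining fraction by
ten.

```c
#include <stdint.h>

/* Hypothetical sketch, not libecb's code: format value (0..999) as */
/* exactly three decimal digits, without branches per digit. */
static char *
sketch_i2a_03 (char *ptr, uint32_t value)
{
  /* 167773 == ceil (2^24 / 100), i.e. value / 100 as 8.24 fixed point */
  uint32_t t = value * UINT32_C (167773);

  *ptr++ = (char)('0' + (t >> 24)); /* integer part is the hundreds digit */
  t = (t & UINT32_C (0xffffff)) * 10; /* keep the fraction, scale by ten */

  *ptr++ = (char)('0' + (t >> 24)); /* tens digit */
  t = (t & UINT32_C (0xffffff)) * 10;

  *ptr++ = (char)('0' + (t >> 24)); /* ones digit */

  return ptr; /* points just after the last digit, no terminator written */
}
```

Rounding the constant up keeps the accumulated error small enough that
it never reaches the next digit for inputs below C<1000>; larger digit
counts need wider intermediates, which matches the 32 bit versus 64 bit
distinction made for the low-level functions.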

=head3 HIGH LEVEL API

The high level API consists of four functions, one each for C<int32_t>,
C<uint32_t>, C<int64_t> and C<uint64_t>:

Example:

   char buf[ECB_I2A_MAX_DIGITS + 1];
   char *end = ecb_i2a_i32 (buf, 17262);
   *end = 0;
   // buf now contains "17262"

=over

=item ECB_I2A_U32_DIGITS (=10)

=item char *ecb_i2a_u32 (char *ptr, uint32_t value)

Takes an C<uint32_t> I<value> and formats it as a decimal number starting
at I<ptr>, using at most C<ECB_I2A_U32_DIGITS> characters. Returns a
pointer to just after the generated string, where you would normally put
the terminating C<0> character. This function outputs the minimum number
of digits.

=item ECB_I2A_I32_DIGITS (=11)

=item char *ecb_i2a_i32 (char *ptr, int32_t value)

Same as C<ecb_i2a_u32>, but formats a C<int32_t> value, including a minus
sign if needed.

=item ECB_I2A_U64_DIGITS (=21)

=item char *ecb_i2a_u64 (char *ptr, uint64_t value)

=item ECB_I2A_I64_DIGITS (=20)

=item char *ecb_i2a_i64 (char *ptr, int64_t value)

Similar to their 32 bit counterparts, these take a 64 bit argument.

=item ECB_I2A_MAX_DIGITS (=21)

Instead of using a type specific length macro, you can just use
C<ECB_I2A_MAX_DIGITS>, which is good enough for any C<ecb_i2a> function.

=back

=head3 LOW-LEVEL API

The functions above use a number of low-level APIs which have some strict
limitations, but can be used as building blocks (studying C<ecb_i2a_i32>
and related functions is recommended).

There are three families of functions: functions that convert a number
to a fixed number of digits with leading zeroes (C<ecb_i2a_0N>, C<0>
for "leading zeroes"), functions that generate up to N digits, skipping
leading zeroes (C<_N>), and functions that can generate more digits, but
the leading digit has limited range (C<_xN>).

None of the functions deal with negative numbers.

Example: convert an IP address in an C<uint32_t> into dotted-quad:

   uint32_t ip = 0x0a000164; // 10.0.1.100
   char ips[3 * 4 + 3 + 1];
   char *ptr = ips;
   ptr = ecb_i2a_3 (ptr,  ip >> 24        ); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr, (ip >> 16) & 0xff); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr, (ip >>  8) & 0xff); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr,  ip        & 0xff); *ptr++ = 0;
   printf ("ip: %s\n", ips); // prints "ip: 10.0.1.100"

=over

=item char *ecb_i2a_02 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_03 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_04 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_05 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_06 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_07 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_08 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_09 (char *ptr, uint32_t value) // 64 bit

The C<< ecb_i2a_0I<N> >> functions take an unsigned I<value> and convert
it to exactly I<N> digits, returning a pointer to the first character
after the digits. The I<value> must be in range. The functions marked with
I<32 bit> do their calculations internally in 32 bit, the ones marked with
I<64 bit> internally use 64 bit integers, which might be slow on 32 bit
architectures (the high level API decides on 32 vs. 64 bit versions using
C<ECB_64BIT_NATIVE>).

=item char *ecb_i2a_2 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_3 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_4 (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_5 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_6 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_7 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_8 (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_9 (char *ptr, uint32_t value) // 64 bit

Similarly, the C<< ecb_i2a_I<N> >> functions take an unsigned I<value>
and convert it to at most I<N> digits, suppressing leading zeroes, and
returning a pointer to the first character after the digits.

=item ECB_I2A_MAX_X5 (=59074)

=item char *ecb_i2a_x5 (char *ptr, uint32_t value) // 32 bit

=item ECB_I2A_MAX_X10 (=2932500665)

=item char *ecb_i2a_x10 (char *ptr, uint32_t value) // 64 bit

The C<< ecb_i2a_xI<N> >> functions are similar to the C<< ecb_i2a_I<N> >>
functions, but they can generate one digit more, as long as the number
is within range, which is given by the symbols C<ECB_I2A_MAX_X5> (almost
16 bit range) and C<ECB_I2A_MAX_X10> (a bit more than 31 bit range),
respectively.

For example, the digit part of a 32 bit signed integer just fits into the
C<ECB_I2A_MAX_X10> range, so while C<ecb_i2a_x10> cannot convert every 10
digit number, it can convert all 32 bit signed numbers. Sadly, it's not
good enough for 32 bit unsigned numbers.

=back

=head2 FLOATING POINT FIDDLING

=over

=item ECB_INFINITY [-UECB_NO_LIBM]

Evaluates to positive infinity if supported by the platform, otherwise to
a truly huge number.

=item ECB_NAN [-UECB_NO_LIBM]

Evaluates to a quiet NAN if supported by the platform, otherwise to
C<ECB_INFINITY>.

=item float ecb_ldexpf (float x, int exp) [-UECB_NO_LIBM]

Same as C<ldexpf>, but always available.

=item uint32_t ecb_float_to_binary16 (float x) [-UECB_NO_LIBM]

=item uint32_t ecb_float_to_binary32 (float x) [-UECB_NO_LIBM]

=item uint64_t ecb_double_to_binary64 (double x) [-UECB_NO_LIBM]

These functions each take an argument in the native C<float> or C<double>
type and return the IEEE 754 bit representation of it (binary16/half,
binary32/single or binary64/double precision).

The bit representation is just as IEEE 754 defines it, i.e. the sign bit
will be the most significant bit, followed by exponent and mantissa.

These functions should work even when the native floating point format
isn't IEEE compliant, of course at a speed and code size penalty, and of
course also within reasonable limits (they try to convert NaNs, infinities
and denormals, but will likely convert negative zero to positive zero).

On all modern platforms (where C<ECB_STDFP> is true), the compiler should
be able to completely optimise away the 32 and 64 bit functions.

These functions can be helpful when serialising floats to the network - you
can serialise the return value like a normal C<uint16_t>/C<uint32_t>/C<uint64_t>.

Another use for these functions is to manipulate floating point values
directly.

Silly example: toggle the sign bit of a float.

   /* On gcc-4.7 on amd64, */
   /* this results in a single add instruction to toggle the bit, and 4 extra */
   /* instructions to move the float value to an integer register and back. */

   x = ecb_binary32_to_float (ecb_float_to_binary32 (x) ^ 0x80000000U);
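
On platforms where C<float> already is an IEEE 754 binary32 value, the
conversion amounts to copying the bits. The following sketch assumes
exactly that (a 32 bit IEEE C<float>), uses hypothetical C<sketch_*>
names, and does not attempt the fallback path for non-IEEE formats that
the text above describes:

```c
#include <stdint.h>
#include <string.h>

/* Assumes float is a 32 bit IEEE 754 binary32 value on this platform. */
static uint32_t
sketch_float_to_binary32 (float x)
{
  uint32_t bits;

  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* The reverse direction, under the same assumption. */
static float
sketch_binary32_to_float (uint32_t bits)
{
  float x;

  memcpy (&x, &bits, sizeof x);
  return x;
}

/* The sign bit is the most significant bit, so XOR with it negates x. */
static float
sketch_toggle_sign (float x)
{
  return sketch_binary32_to_float (sketch_float_to_binary32 (x) ^ UINT32_C (0x80000000));
}
```

Using C<memcpy> instead of pointer casts avoids aliasing problems, and
compilers typically reduce it to a register move.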

=item float ecb_binary16_to_float (uint16_t x) [-UECB_NO_LIBM]

=item float ecb_binary32_to_float (uint32_t x) [-UECB_NO_LIBM]

=item double ecb_binary64_to_double (uint64_t x) [-UECB_NO_LIBM]

The reverse operation of the previous functions - takes the bit
representation of an IEEE binary16, binary32 or binary64 number (half,
single or double precision) and converts it to the native C<float> or
C<double> format.

These functions should work even when the native floating point format
isn't IEEE compliant, of course at a speed and code size penalty, and of
course also within reasonable limits (they try to convert normals and
denormals, and might be lucky for infinities, and with extraordinary luck,
also for negative zero).

On all modern platforms (where C<ECB_STDFP> is true), the compiler should
be able to optimise away these functions completely.

=item uint16_t ecb_binary32_to_binary16 (uint32_t x)

=item uint32_t ecb_binary16_to_binary32 (uint16_t x)

Convert an IEEE binary32/single precision to binary16/half format, and vice
versa, handling all details (round-to-nearest-even, subnormals, infinity
and NaNs) correctly.

These functions are available even under C<-DECB_NO_LIBM>, since
they do not rely on the platform floating point format. The
C<ecb_float_to_binary16> and C<ecb_binary16_to_float> functions are
usually what you want.

=back

=head2 ARITHMETIC

=over

=item x = ecb_mod (m, n)

Returns C<m> modulo C<n>, which is the same as the non-negative remainder
of the division operation between C<m> and C<n>, using floored
division. Unlike the C remainder operator C<%>, this function ensures that
the return value is never negative and that the two numbers I<m> and
I<m' = m + i * n> result in the same value modulo I<n> - in other words,
C<ecb_mod> implements the mathematical modulo operation, which is missing
in the language.

C<n> must be strictly positive (i.e. C<< >= 1 >>), while C<m> must be
negatable, that is, both C<m> and C<-m> must be representable in its
type (this typically excludes the minimum signed integer value, the same
limitation as for C</> and C<%> in C).

Current GCC/clang versions compile this into an efficient branchless
sequence on almost all CPUs.

For example, when you want to rotate forward through the members of an
array for increasing C<m> (which might be negative), then you should use
C<ecb_mod>, as the C<%> operator might give either negative results, or
change direction for negative values:

   for (m = -100; m <= 100; ++m)
     {
       int elem = myarray [ecb_mod (m, ecb_array_length (myarray))];
     }
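
The difference to the C<%> operator is easiest to see in a small sketch
for C<int> arguments. The name C<sketch_mod> is hypothetical and the code
is only one straightforward way to obtain floored modulo semantics, not
necessarily the expression F<ecb.h> uses:

```c
/* Hypothetical sketch: floored modulo for int, n must be >= 1. */
static int
sketch_mod (int m, int n)
{
  int r = m % n; /* C truncates towards zero, so r may be negative */

  return r < 0 ? r + n : r;
}
```

For I<n> = C<5>, the inputs C<-1>, C<4> and C<9> all yield C<4>, whereas
C<-1 % 5> evaluates to C<-1> in C99 and later.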

=item x = ecb_div_rd (val, div)

=item x = ecb_div_ru (val, div)

Returns C<val> divided by C<div> rounded down or up, respectively.
C<val> and C<div> must have integer types and C<div> must be strictly
positive. Note that these functions are implemented with macros in C
and with function templates in C++.
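
Since C's C</> truncates towards zero, rounding down and rounding up only
differ from it for one sign of the remainder each. The following sketch
for C<int> arguments uses hypothetical C<sketch_*> names and is merely one
possible formulation, assuming C<div> is strictly positive as required
above:

```c
/* Hypothetical sketch: val / div rounded towards negative infinity. */
static int
sketch_div_rd (int val, int div)
{
  return val / div - (val % div < 0);
}

/* Hypothetical sketch: val / div rounded towards positive infinity. */
static int
sketch_div_ru (int val, int div)
{
  return val / div + (val % div > 0);
}
```

For example, C<-7> divided by C<2> gives C<-4> when rounded down and
C<-3> when rounded up, while C<7> divided by C<2> gives C<3> and C<4>.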

=back

=head2 UTILITY

=over

=item element_count = ecb_array_length (name)

Returns the number of elements in the array C<name>. For example:

   int primes[] = { 2, 3, 5, 7, 11 };
   int sum = 0;

   for (int i = 0; i < ecb_array_length (primes); i++)
     sum += primes [i];
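
A macro like this is conventionally built on C<sizeof>. The following is
a sketch with a hypothetical name, not necessarily how F<ecb.h> defines
C<ecb_array_length>; note that this formulation only works on true
arrays, not on pointers to their first element:

```c
#include <stddef.h>

/* Hypothetical sketch: total size divided by the size of one element. */
#define SKETCH_ARRAY_LENGTH(name) (sizeof (name) / sizeof ((name)[0]))

/* Sums the array from the example above using the sketch macro. */
static int
sketch_sum_primes (void)
{
  static const int primes[] = { 2, 3, 5, 7, 11 };
  int sum = 0;

  for (size_t i = 0; i < SKETCH_ARRAY_LENGTH (primes); i++)
    sum += primes [i];

  return sum;
}
```

The result of the macro is a compile-time constant of type C<size_t>, so
it can also be used to size other arrays.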

=back

=head2 SYMBOLS GOVERNING COMPILATION OF ECB.H ITSELF

These symbols need to be defined before including F<ecb.h> the first time.

=over

=item ECB_NO_THREADS

If F<ecb.h> is never used from multiple threads, then this symbol can
be defined, in which case memory fences (and similar constructs) are
completely removed, leading to more efficient code and fewer dependencies.

Setting this symbol to a true value implies C<ECB_NO_SMP>.

=item ECB_NO_SMP

The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from
multiple threads, but never concurrently (e.g. if the system the program
runs on has only a single CPU with a single core, no hyper-threading and so
on), then this symbol can be defined, leading to more efficient code and
fewer dependencies.

=item ECB_NO_LIBM

When defined to C<1>, do not export any functions that might introduce
dependencies on the math library (usually called F<-lm>) - these are
marked with [-UECB_NO_LIBM].
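
As a configuration fragment, a single-threaded program that must not
depend on the math library might start like this. The combination of
symbols is only an example; the one firm requirement, as stated above, is
that the definitions come before the first inclusion of F<ecb.h>:

```c
/* Example configuration: single-threaded build without libm. */
#define ECB_NO_THREADS 1 /* also implies ECB_NO_SMP */
#define ECB_NO_LIBM    1 /* hides everything marked [-UECB_NO_LIBM] */

#include <ecb.h>
```

The same effect can usually be had by passing the definitions on the
compiler command line, e.g. C<-DECB_NO_LIBM>.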

=back

=head1 UNDOCUMENTED FUNCTIONALITY

F<ecb.h> is full of undocumented functionality as well, some of which is
intended to be internal-use only, some of which we forgot to document, and
some of which we hide because we are not sure we will keep the interface
stable.

While you are welcome to rummage around and use whatever you find useful
(we don't want to stop you), keep in mind that we will change undocumented
functionality in incompatible ways without thinking twice, while we are
considerably more conservative with documented things.

=head1 AUTHORS

C<libecb> is designed and maintained by:

   Emanuele Giaquinta <e.giaquinta@glauco.it>
   Marc Alexander Lehmann <schmorp@schmorp.de>