=head1 LIBECB

You suck, we don't(tm)

=head2 ABOUT THE HEADER

- how to include it
- it includes inttypes.h
- no .a
- what's a bool
- function means macro or function
- macro means untyped

=head2 GCC ATTRIBUTES

blabla where to put, what others

=over 4

=item ecb_attribute ((attrs...))

A simple wrapper that expands to C<__attribute__((attrs))> on GCC, and
to nothing on other compilers, so the effect is that only GCC sees these.

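As a rough illustration of how such a wrapper can work, the sketch below
defines a hypothetical C<my_attribute> macro (a stand-in, not the actual
libecb definition) and assumes that testing C<__GNUC__> is enough to detect
a GCC-compatible compiler. The C<add_one> function exists only for the
example:

```c
/* my_attribute is a hypothetical stand-in for ecb_attribute,
 * not the definition libecb actually uses. */
#if defined(__GNUC__)
  #define my_attribute(attrlist) __attribute__ (attrlist)
#else
  #define my_attribute(attrlist) /* expands to nothing elsewhere */
#endif

/* The inner parentheses make the whole attribute list one macro argument. */
my_attribute ((noinline))
static int
add_one (int x)
{
  return x + 1;
}
```

Because the macro takes a single parenthesised argument, several attributes
can be passed at once, e.g. C<my_attribute ((noinline, unused))>.
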
=item ecb_unused

Marks a function or a variable as "unused", which simply suppresses a
warning by GCC when it detects it as unused. This is useful when you e.g.
declare a variable but do not always use it:

   {
     int var ecb_unused;

   #ifdef SOMECONDITION
      var = ...;
      return var;
   #else
      return 0;
   #endif
   }

=item ecb_noinline

Prevent a function from being inlined - it might be optimised away, but
not inlined into other functions. This is useful if you know your function
is rarely called and large enough for inlining not to be helpful.

=item ecb_noreturn

=item ecb_const

=item ecb_pure

=item ecb_hot

=item ecb_cold

=item ecb_artificial

=back

=head2 OPTIMISATION HINTS

=over 4

=item bool ecb_is_constant(expr) [MACRO]

Returns true iff the expression can be deduced to be a compile-time
constant, and false otherwise.

For example, when you have a C<rndm16> function that returns a 16 bit
random number, and you have a function that maps this to a range from
0..n-1, then you could use this inline function in a header file:

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return (n * (uint32_t)rndm16 ()) >> 16;
   }

However, for powers of two, you could use a normal mask, but that is only
worth it if, at compile time, you can detect this case. This is the case
when the passed number is a constant and also a power of two
(C<(n & (n - 1)) == 0>):

   ecb_inline uint32_t
   rndm (uint32_t n)
   {
     return ecb_is_constant (n) && !(n & (n - 1))
            ? rndm16 () & (n - 1)
            : (n * (uint32_t)rndm16 ()) >> 16;
   }

=item bool ecb_expect (expr, value) [MACRO]

Evaluates C<expr> and returns it. In addition, it tells the compiler that
C<expr> evaluates to C<value> most of the time, which can be used for
static branch optimisations.

Usually, you want to use the more intuitive C<ecb_likely> and
C<ecb_unlikely> functions instead.

=item bool ecb_likely (bool) [MACRO]

=item bool ecb_unlikely (bool) [MACRO]

These two functions expect an expression that is true or false and return
C<1> or C<0>, respectively, so when used in the condition of an C<if> or
other conditional statement, they will not change the program:

   /* these two do the same thing */
   if (some_condition) ...;
   if (ecb_likely (some_condition)) ...;

However, by using C<ecb_likely>, you tell the compiler that the condition
is likely to be true (and for C<ecb_unlikely>, that it is unlikely to be
true).

For example, when you check for a null pointer and expect this to be a
rare, exceptional, case, then use C<ecb_unlikely>:

   void my_free (void *ptr)
   {
     if (ecb_unlikely (ptr == 0))
       return;
   }

Consistent use of these functions to mark exceptional cases or to
tell the compiler what the hot path through a function is can increase
performance considerably.

A very good example is in a function that reserves more space for some
memory block (for example, inside an implementation of a string stream) -
each time something is added, you have to check for a buffer overrun, but
you expect that most checks will turn out to be false:

   /* make sure we have "size" extra room in our buffer */
   ecb_inline void
   reserve (int size)
   {
     if (ecb_unlikely (current + size > end))
       real_reserve_method (size); /* presumably noinline */
   }

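The relationship between C<ecb_expect>, C<ecb_likely> and C<ecb_unlikely>
can be sketched as follows. This is only an illustration that assumes a
GCC-compatible compiler providing C<__builtin_expect>; the C<my_*> macros
and the C<safe_div> helper are hypothetical names, not part of libecb:

```c
#if defined(__GNUC__)
  #define my_expect(expr, value) __builtin_expect ((expr), (value))
#else
  #define my_expect(expr, value) (expr) /* no hint available */
#endif

/* "!!" normalises any true value to 1, so the result is always 0 or 1 */
#define my_likely(expr)   my_expect (!!(expr), 1)
#define my_unlikely(expr) my_expect (!!(expr), 0)

/* hypothetical helper: division by zero is treated as the rare case */
static int
safe_div (int a, int b)
{
  if (my_unlikely (b == 0))
    return 0;

  return a / b;
}
```

The hint only influences code layout and branch prediction defaults; the
value of the condition, and therefore the behaviour of C<safe_div>, is the
same with or without it.
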
=item bool ecb_assume (cond) [MACRO]

Try to tell the compiler that some condition is true, even if it's not
obvious.

This can be used to teach the compiler about invariants or other
conditions that might improve code generation, but which are impossible to
deduce from the code itself.

For example, the example reservation function from the C<ecb_unlikely>
description could be written thus (only C<ecb_assume> was added):

   ecb_inline void
   reserve (int size)
   {
     if (ecb_unlikely (current + size > end))
       real_reserve_method (size); /* presumably noinline */

     ecb_assume (current + size <= end);
   }

If you then call this function twice, like this:

   reserve (10);
   reserve (1);

Then the compiler I<might> be able to optimise out the second call
completely, as it knows that C<< current + 1 > end >> is false and the
call will never be executed.

=item bool ecb_unreachable ()

This function does nothing itself, except tell the compiler that it will
never be executed. Apart from suppressing a warning in some cases, this
function can be used to implement C<ecb_assume> or similar functions.

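To make the connection to C<ecb_assume> concrete, here is a minimal sketch
of how an assume-style macro could be layered on an unreachable hint. It
assumes a GCC-compatible compiler with C<__builtin_unreachable>; the
C<my_*> macros and C<quarter_of_nibble> are hypothetical and do not show
how libecb itself implements this:

```c
#if defined(__GNUC__)
  #define my_unreachable() __builtin_unreachable ()
#else
  #define my_unreachable() ((void)0) /* no hint available */
#endif

/* if cond were false we would reach "unreachable" code, so the
 * compiler is free to treat cond as true from here on */
#define my_assume(cond) do { if (!(cond)) my_unreachable (); } while (0)

/* hypothetical helper: the caller promises that x is below 16 */
static unsigned int
quarter_of_nibble (unsigned int x)
{
  my_assume (x < 16);
  return x / 4;
}
```

With this construction the condition should be free of side effects, and
calling the function with an argument that violates the assumption is
undefined behaviour on compilers that honour the hint.
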
=item bool ecb_prefetch (addr, rw, locality) [MACRO]

Tells the compiler to try to prefetch memory at the given C<addr>ess
for either reading (C<rw> = 0) or writing (C<rw> = 1). A C<locality> of
C<0> means that there will only be one access later, C<3> means that
the data will likely be accessed very often, and values in between mean
something... in between. The memory pointed to by the address does not
need to be accessible (it could be a null pointer for example), but C<rw>
and C<locality> must be compile-time constants.

An obvious way to use this is to prefetch some data far away, in a big
array you loop over. This prefetches memory some 128 array elements later,
in the hope that it will be ready when the CPU arrives at that location.

   int sum = 0;

   for (i = 0; i < N; ++i)
     {
       sum += arr [i];
       ecb_prefetch (arr + i + 128, 0, 0);
     }

It's hard to predict how far to prefetch, and CPUs that can prefetch
are often good enough to predict this kind of behaviour themselves. It
gets more interesting with linked lists, especially when you do a fair
amount of processing on each list element:

   for (node *n = start; n; n = n->next)
     {
       ecb_prefetch (n->next, 0, 0);
       ... do medium amount of work with *n
     }

After processing the node, (part of) the next node might already be in
cache.

=back

=head2 BIT FIDDLING / BITSTUFFS

=over 4

=item bool ecb_big_endian ()

=item bool ecb_little_endian ()

These two functions return true if the byte order is big endian
(most-significant byte first) or little endian (least-significant byte
first) respectively.

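One common way to perform such a check at run time is to look at the first
byte of a known multi-byte value. The following sketch uses hypothetical
C<my_big_endian> and C<my_little_endian> functions and is not necessarily
how libecb detects the byte order:

```c
#include <stdint.h>
#include <string.h>

/* return the byte stored at the lowest address of a 32-bit probe value */
static unsigned char
first_byte_of_probe (void)
{
  const uint32_t probe = 0x11223344;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first;
}

/* little endian: the least significant byte (0x44) comes first */
static int
my_little_endian (void)
{
  return first_byte_of_probe () == 0x44;
}

/* big endian: the most significant byte (0x11) comes first */
static int
my_big_endian (void)
{
  return first_byte_of_probe () == 0x11;
}
```

On a platform with an unusual mixed byte order, both functions in this
sketch would return false.
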
=item int ecb_ctz32 (uint32_t x)

Returns the index of the least significant bit set in C<x> (or
equivalently the number of bits set to 0 before the least significant
bit set), starting from 0. If C<x> is 0 the result is undefined. A
common use case is to compute the binary logarithm of a power of two,
since for such C<x> the result equals log2(x). For example:

   ecb_ctz32(3) = 0
   ecb_ctz32(6) = 1

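The definition can be illustrated with a simple portable loop. The
C<my_ctz32> function below is a hypothetical reference version written for
clarity, not libecb's implementation, which can be expected to be faster:

```c
#include <stdint.h>

/* count the zero bits below the least significant set bit;
 * precondition: x != 0 (the loop would never end for x == 0) */
static int
my_ctz32 (uint32_t x)
{
  int n = 0;

  while (!(x & 1))
    {
      x >>= 1;
      ++n;
    }

  return n;
}
```

For a power of two such as 1024 this yields 10, i.e. its binary logarithm,
while for 6 (binary 110) it yields 1.
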
=item int ecb_popcount32 (uint32_t x)

Returns the number of bits set to 1 in C<x>. For example:

   ecb_popcount32(7) = 3
   ecb_popcount32(255) = 8

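A portable way to compute the same value is to clear the lowest set bit
repeatedly and count the iterations. The C<my_popcount32> function is a
hypothetical sketch for illustration only, not the libecb implementation:

```c
#include <stdint.h>

/* count set bits by clearing the lowest one until none are left */
static int
my_popcount32 (uint32_t x)
{
  int n = 0;

  while (x)
    {
      x &= x - 1; /* clears the least significant set bit */
      ++n;
    }

  return n;
}
```

The loop runs once per set bit, so sparse values are handled quickly.
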
=item uint32_t ecb_bswap16 (uint32_t x)

=item uint32_t ecb_bswap32 (uint32_t x)

These two functions return the value of the 16-bit (32-bit) variable
C<x> after reversing the order of bytes.

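Byte reversal can be written portably with shifts and masks. The sketch
below uses hypothetical C<my_bswap16> and C<my_bswap32> functions that
mirror the prototypes above; it assumes that only the low 16 bits of the
argument matter in the 16-bit case, which the text above does not spell
out:

```c
#include <stdint.h>

/* swap the two low bytes; higher bits of x are ignored (assumption) */
static uint32_t
my_bswap16 (uint32_t x)
{
  return ((x >> 8) & 0xff) | ((x & 0xff) << 8);
}

/* reverse all four bytes of a 32-bit value */
static uint32_t
my_bswap32 (uint32_t x)
{
  return  (x >> 24)
       | ((x >>  8) & 0x0000ff00)
       | ((x <<  8) & 0x00ff0000)
       |  (x << 24);
}
```

Applying either function twice gives back the original value (restricted
to the low 16 bits for C<my_bswap16>).
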
=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)

=item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)

These two functions return the value of C<x> after rotating all the bits
by C<count> positions to the right or left respectively. Unlike a plain
shift, bits that leave one end of the value re-enter at the other end.

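A rotation can be built from two shifts. The following sketch defines
hypothetical C<my_rotl32> and C<my_rotr32> functions; reducing C<count>
modulo 32 is an assumption made here to keep the shifts well defined, not
documented libecb behaviour:

```c
#include <stdint.h>

/* rotate left: bits shifted out at the top re-enter at the bottom */
static uint32_t
my_rotl32 (uint32_t x, unsigned int count)
{
  count &= 31; /* assumption: counts are taken modulo 32 */
  return (x << count) | (x >> ((32 - count) & 31));
}

/* rotate right: bits shifted out at the bottom re-enter at the top */
static uint32_t
my_rotr32 (uint32_t x, unsigned int count)
{
  count &= 31; /* assumption: counts are taken modulo 32 */
  return (x >> count) | (x << ((32 - count) & 31));
}
```

The extra C<& 31> on the second shift avoids shifting by 32 when C<count>
is 0, which would be undefined in C.
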
=back

=head2 ARITHMETIC

=over 4

=item x = ecb_mod (m, n) [MACRO]

Returns the non-negative remainder of the modulo operation between C<m>
and C<n>.

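The difference from C's built-in C<%> operator shows up for negative
C<m>, where C<%> yields a negative result. A minimal sketch of the idea,
using a hypothetical C<my_mod> function on C<int> and assuming C<n> is
strictly positive (the real macro is untyped and may differ):

```c
/* remainder in the range 0 .. n-1; assumes n > 0 */
static int
my_mod (int m, int n)
{
  int r = m % n; /* has the sign of m in C99 and later */

  return r < 0 ? r + n : r;
}
```

For example, C<-7 % 3> is C<-1> in C, whereas C<my_mod (-7, 3)> is C<2>.
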
=back

=head2 UTILITY

=over 4

=item element_count = ecb_array_length (name) [MACRO]

Returns the number of elements in the array C<name>. For example:

   int primes[] = { 2, 3, 5, 7, 11 };
   int sum = 0;

   for (i = 0; i < ecb_array_length (primes); i++)
     sum += primes [i];

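Such a macro is typically built on C<sizeof>. The sketch below uses a
hypothetical C<my_array_length> macro and a C<sum_primes> helper invented
for the example; it is not taken from libecb:

```c
#include <stddef.h>

/* total size of the array divided by the size of one element */
#define my_array_length(name) (sizeof (name) / sizeof ((name)[0]))

static const int primes[] = { 2, 3, 5, 7, 11 };

/* hypothetical helper that sums all elements of the array above */
static int
sum_primes (void)
{
  int sum = 0;

  for (size_t i = 0; i < my_array_length (primes); i++)
    sum += primes[i];

  return sum;
}
```

This only works on real arrays: applied to a pointer (for instance an
array passed as a function parameter), C<sizeof> measures the pointer and
the result is meaningless.
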
=back
