--- libecb/ecb.pod	2011/05/26 23:23:08	1.14
+++ libecb/ecb.pod	2012/05/29 14:35:43	1.45
@@ -5,7 +5,7 @@
 Libecb is currently a simple header file that doesn't require any
 configuration to use or include in your project.
 
-It's part of the e-suite of libraries, other memembers of which include
+It's part of the e-suite of libraries, other members of which include
 libev and libeio.
 
 Its homepage can be found here:
@@ -14,9 +14,13 @@
 
 It mainly provides a number of wrappers around GCC built-ins, together
 with replacement functions for other compilers. In addition to this,
-it provides a number of other lowlevel C utilities, such endienness
+it provides a number of other lowlevel C utilities, such as endianness
 detection, byte swapping or bit rotations.
 
+Or in other words, things that should be built into any standard C system,
+but aren't, implemented as efficiently as possible with GCC, and still
+correct with other compilers.
+
 More might come.
 
 =head2 ABOUT THE HEADER
@@ -29,7 +33,7 @@
 The header should work fine for both C and C++ compilation, and gives you
 all of F<inttypes.h> in addition to the ECB symbols.
 
-There are currently no objetc files to link to - future versions might
+There are currently no object files to link to - future versions might
 come with an (optional) object code library to link against, to reduce
 code size or gain access to additional features.
 
@@ -52,16 +56,95 @@
 is usually implemented as a macro. Specifically, a "bool" in this manual
 refers to any kind of boolean value, not a specific type.
 
+=head2 TYPES / TYPE SUPPORT
+
+ecb.h makes sure that the following types are defined (in the expected way):
+
+   int8_t   uint8_t   int16_t  uint16_t
+   int32_t  uint32_t  int64_t  uint64_t
+   intptr_t uintptr_t ptrdiff_t
+
+The macro C<ECB_PTRSIZE> is defined to the size of a pointer on this
+platform (currently C<4> or C<8>) and can be used in preprocessor
+expressions.
+
+=head2 LANGUAGE/COMPILER VERSIONS
+
+All the following symbols expand to an expression that can be tested in
+preprocessor instructions as well as treated as a boolean (use C<!!> to
+ensure it's either C<0> or C<1> if you need that).
+
+=over 4
+
+=item ECB_C
+
+True if the implementation defines the C<__STDC__> macro to a true value,
+which is typically true for both C and C++ compilers.
+
+=item ECB_C99
+
+True if the implementation claims to be C99 compliant.
+
+=item ECB_C11
+
+True if the implementation claims to be C11 compliant.
+
+=item ECB_CPP
+
+True if the implementation defines the C<__cplusplus> macro to a true
+value, which is typically true for C++ compilers.
+
+=item ECB_CPP98
+
+True if the implementation claims to be compliant with ISO/IEC 14882:1998
+(the first C++ ISO standard) or any later version. Typically true for all
+C++ compilers.
+
+=item ECB_CPP11
+
+True if the implementation claims to be compliant with ISO/IEC 14882:2011
+(C++11) or any later version.
+
+=item ECB_GCC_VERSION(major,minor)
+
+Expands to a true value (suitable for testing by the preprocessor)
+if the compiler used is GNU C and the version is the given version, or
+higher.
+
+This macro tries to return false on compilers that claim to be GCC
+compatible but aren't.
+
+=back
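As a rough illustration of how such tests can be built from standard
predefined macros, the following sketch defines hypothetical macros named
SKETCH_C99, SKETCH_GCC_VERSION and SKETCH_HAVE_ATTRIBUTES. The names and
definitions are assumptions made for this example and are not taken from
ecb.h. In particular, the sketch makes no attempt to reject compilers that
merely claim to be GCC compatible.

```c
/* Hypothetical stand-ins, not the actual ecb.h definitions. */

/* 1 if the compiler claims C99 or later, 0 otherwise. */
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
  #define SKETCH_C99 1
#else
  #define SKETCH_C99 0
#endif

/* True if __GNUC__.__GNUC_MINOR__ is at least major.minor. */
#if defined __GNUC__ && defined __GNUC_MINOR__
  #define SKETCH_GCC_VERSION(major,minor) \
    (__GNUC__ > (major) || (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
  #define SKETCH_GCC_VERSION(major,minor) 0
#endif

/* The result is usable in preprocessor conditionals and in expressions. */
#if SKETCH_GCC_VERSION (3, 1)
  #define SKETCH_HAVE_ATTRIBUTES 1
#else
  #define SKETCH_HAVE_ATTRIBUTES 0
#endif
```

Because every macro expands to a plain integer expression, the same symbol
works both after #if and inside an ordinary if statement.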
+
 =head2 GCC ATTRIBUTES
 
-blabla where to put, what others
+A major part of libecb deals with GCC attributes. These are additional
+attributes that you can assign to functions, variables and sometimes even
+types - much like C<const> or C<volatile> in C.
+
+GCC allows attribute declarations to show up in many surprising places,
+but not in many expected places, so the safest way is to put attribute
+declarations before the whole declaration:
+
+   ecb_const int mysqrt (int a);
+   ecb_unused int i;
+
+For variables, it is often nicer to put the attribute after the name, and
+avoid multiple declarations using commas:
+
+   int i ecb_unused;
 
 =over 4
 
 =item ecb_attribute ((attrs...))
 
-A simple wrapper that expands to C<__attribute__((attrs))> on GCC, and
-to nothing on other compilers, so the effect is that only GCC sees these.
+A simple wrapper that expands to C<__attribute__((attrs))> on GCC, and to
+nothing on other compilers, so the effect is that only GCC sees these.
+
+Example: use the C<deprecated> attribute on a function.
+
+  ecb_attribute((__deprecated__)) void
+  do_not_use_me_anymore (void);
 
 =item ecb_unused
 
@@ -69,15 +152,30 @@
 warning by GCC when it detects it as unused. This is useful when you e.g.
 declare a variable but do not always use it:
 
-   {
-     int var ecb_unused;
+  {
+    int var ecb_unused;
 
-     #ifdef SOMECONDITION
-        var = ...;
-        return var;
-     #else
-        return 0;
-     #endif
+    #ifdef SOMECONDITION
+       var = ...;
+       return var;
+    #else
+       return 0;
+    #endif
+  }
+
+=item ecb_inline
+
+This is not actually an attribute, but you use it like one. It expands
+either to C<static inline> or to just C<static>, if inline isn't
+supported. It should be used to declare functions that should be inlined,
+for code size or speed reasons.
+
+Example: inline this function, it surely will reduce codesize.
+
+   ecb_inline int
+   negmul (int a, int b)
+   {
+     return - (a * b);
    }
 
 =item ecb_noinline
@@ -88,16 +186,113 @@
 
 =item ecb_noreturn
 
+Marks a function as "not returning, ever". Some typical functions that
+don't return are C<exit> or C<abort> (which really works hard to not
+return), and now you can make your own:
+
+   ecb_noreturn void
+   my_abort (const char *errline)
+   {
+     puts (errline);
+     abort ();
+   }
+
+In this case, the compiler would probably be smart enough to deduce it on
+its own, so this is mainly useful for declarations.
+
 =item ecb_const
 
+Declares that the function only depends on the values of its arguments,
+much like a mathematical function. It specifically does not read or write
+any memory any arguments might point to, global variables, or call any
+non-const functions. It also must not have any side effects.
+
+Such a function can be optimised much more aggressively by the compiler -
+for example, multiple calls with the same arguments can be optimised into
+a single call, which wouldn't be possible if the compiler had to expect
+any side effects.
+
+It is best suited for functions in the sense of mathematical functions,
+such as a function returning the square root of its input argument.
+
+Not suited would be a function that calculates the hash of some memory
+area you pass in, prints some messages or looks at a global variable to
+decide on rounding.
+
+See C<ecb_pure> for a slightly less restrictive class of functions.
+
 =item ecb_pure
 
+Similar to C<ecb_const>, declares a function that has no side
+effects. Unlike C<ecb_const>, the function is allowed to examine global
+variables and any other memory areas (such as the ones passed to it via
+pointers).
+
+While these functions cannot be optimised as aggressively as C<ecb_const>
+functions, they can still be optimised away on many occasions, and the
+compiler has more freedom in moving calls to them around.
+
+Typical examples for such functions would be C<strlen> or C<memcmp>. A
+function that calculates the MD5 sum of some input and updates some MD5
+state passed as argument would I<NOT> be pure, however, as it would modify
+some memory area that is not the return value.
+
 =item ecb_hot
 
+This declares a function as "hot" with regard to the cache - the function
+is used so often that it is very beneficial to keep it in the cache if
+possible.
+
+The compiler reacts by trying to place hot functions near to each other in
+memory.
+
+Whether a function is hot or not often depends on the whole program,
+and less on the function itself. C<ecb_cold> is likely more useful in
+practice.
+
 =item ecb_cold
 
+The opposite of C<ecb_hot> - declares a function as "cold" with regard to
+the cache, or in other words, this function is not called often, or not at
+speed-critical times, and keeping it in the cache might be a waste of said
+cache.
+
+In addition to placing cold functions together (or at least away from hot
+functions), this knowledge can be used in other ways, for example, the
+function will be optimised for size, as opposed to speed, and codepaths
+leading to calls to those functions can automatically be marked as if
+C<ecb_expect_false> had been used to reach them.
+
+Good examples for such functions would be error reporting functions, or
+functions only called in exceptional or rare cases.
+
 =item ecb_artificial
 
+Declares the function as "artificial", in this case meaning that this
+function is not really meant to be a function, but more like an accessor
+- many methods in C++ classes are mere accessor functions, and having a
+crash reported in such a method, or single-stepping through them, is not
+usually so helpful, especially when it's inlined to just a few instructions.
+
+Marking them as artificial will instruct the debugger about just this,
+leading to happier debugging and thus happier lives.
+
+Example: in some kind of smart-pointer class, mark the pointer accessor as
+artificial, so that the whole class acts more like a pointer and less like
+some C++ abstraction monster.
+
+  template<typename T>
+  struct my_smart_ptr
+  {
+    T *value;
+
+    ecb_artificial
+    operator T *()
+    {
+      return value;
+    }
+  };
+
 =back
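To make the wrapper idea concrete, here is a minimal sketch of how
attribute macros of this kind could be defined and used. The names
sketch_attribute, sketch_const, sketch_unused and square are hypothetical
and chosen for this example only; the real definitions in ecb.h may differ.

```c
/* Hypothetical wrapper in the spirit of ecb_attribute; not the ecb.h source. */
#if defined __GNUC__
  #define sketch_attribute(attrlist) __attribute__ (attrlist)
#else
  #define sketch_attribute(attrlist) /* other compilers see nothing */
#endif

#define sketch_const  sketch_attribute ((__const__))
#define sketch_unused sketch_attribute ((__unused__))

/* The attribute is placed before the whole declaration. */
sketch_const int square (int a);

int
square (int a)
{
  int scratch sketch_unused; /* never used, yet GCC does not warn about it */

  return a * a;
}
```

The double parentheses at the call site are what allow a whole attribute
list to be passed as a single macro argument.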
 
 =head2 OPTIMISATION HINTS
@@ -138,12 +333,12 @@
 the C<expr> evaluates to C<value> a lot, which can be used for static
 branch optimisations.
 
-Usually, you want to use the more intuitive C<ecb_likely> and
-C<ecb_unlikely> functions instead.
+Usually, you want to use the more intuitive C<ecb_expect_true> and
+C<ecb_expect_false> functions instead.
 
-=item bool ecb_likely (bool)
+=item bool ecb_expect_true (cond)
 
-=item bool ecb_unlikely (bool)
+=item bool ecb_expect_false (cond)
 
 These two functions expect a expression that is true or false and return
 C<1> or C<0>, respectively, so when used in the condition of an C<if> or
@@ -151,18 +346,18 @@
 
   /* these two do the same thing */
   if (some_condition) ...;
-  if (ecb_likely (some_condition)) ...;
+  if (ecb_expect_true (some_condition)) ...;
 
-However, by using C<ecb_likely>, you tell the compiler that the condition
-is likely to be true (and for C<ecb_unlikely>, that it is unlikely to be
-true).
+However, by using C<ecb_expect_true>, you tell the compiler that the
+condition is likely to be true (and for C<ecb_expect_false>, that it is
+unlikely to be true).
 
 For example, when you check for a null pointer and expect this to be a
-rare, exceptional, case, then use C<ecb_unlikely>:
+rare, exceptional, case, then use C<ecb_expect_false>:
 
   void my_free (void *ptr)
   {
-    if (ecb_unlikely (ptr == 0))
+    if (ecb_expect_false (ptr == 0))
       return;
   }
 
@@ -170,6 +365,12 @@
 tell the compiler what the hot path through a function is can increase
 performance considerably.
 
+You might know these functions under the names C<likely> and C<unlikely>
+- while these are common aliases, we find that the expect name is easier
+to understand when quickly skimming code. If you wish, you can use
+C<ecb_likely> instead of C<ecb_expect_true> and C<ecb_unlikely> instead of
+C<ecb_expect_false> - these are simply aliases.
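For readers who want to see what such hints can look like underneath, the
following sketch builds them on GCC's __builtin_expect and falls back to
the plain expression elsewhere. The sketch_ names and the sum function are
hypothetical and only serve this example; this is not the ecb.h source.

```c
/* Hypothetical stand-ins built on GCC's __builtin_expect. */
#if defined __GNUC__
  #define sketch_expect(expr,value) __builtin_expect ((expr), (value))
#else
  #define sketch_expect(expr,value) (expr)
#endif

/* !! normalises any true value to 1 before it is compared to the hint. */
#define sketch_expect_true(cond)  sketch_expect (!!(cond), 1)
#define sketch_expect_false(cond) sketch_expect (!!(cond), 0)

/* Sums an array, treating the "no data" case as the rare path. */
static long
sum (const int *data, int count)
{
  long total = 0;
  int i;

  if (sketch_expect_false (data == 0))
    return 0;

  for (i = 0; i < count; ++i)
    total += data [i];

  return total;
}
```

The hint never changes the value of the condition, it only tells the
compiler which branch to lay out as the fall-through path.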
+
 A very good example is in a function that reserves more space for some
 memory block (for example, inside an implementation of a string stream) -
 each time something is added, you have to check for a buffer overrun, but
@@ -179,7 +380,7 @@
   ecb_inline void
   reserve (int size)
   {
-    if (ecb_unlikely (current + size > end))
+    if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
   }
 
@@ -192,13 +393,13 @@
 conditions that might improve code generation, but which are impossible to
 deduce form the code itself.
 
-For example, the example reservation function from the C<ecb_unlikely>
+For example, the example reservation function from the C<ecb_expect_false>
 description could be written thus (only C<ecb_assume> was added):
 
   ecb_inline void
   reserve (int size)
   {
-    if (ecb_unlikely (current + size > end))
+    if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
 
     ecb_assume (current + size <= end);
@@ -257,7 +458,7 @@
 
 =back
 
-=head2 BIT FIDDLING / BITSTUFFS
+=head2 BIT FIDDLING / BIT WIZARDRY
 
 =over 4
 
@@ -269,37 +470,108 @@
 (most-significant byte first) or little endian (least-significant byte
 first) respectively.
 
+On systems that are neither, their return values are unspecified.
+
 =item int ecb_ctz32 (uint32_t x)
 
+=item int ecb_ctz64 (uint64_t x)
+
 Returns the index of the least significant bit set in C<x> (or
-equivalently the number of bits set to 0 before the least significant
-bit set), starting from 0. If C<x> is 0 the result is undefined. A
-common use case is to compute the integer binary logarithm, i.e.,
-floor(log2(n)). For example:
+equivalently the number of bits set to 0 before the least significant bit
+set), starting from 0. If C<x> is 0 the result is undefined.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_ctz32>.
+
+For example:
+
+  ecb_ctz32 (3) = 0
+  ecb_ctz32 (6) = 1
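For compilers without a suitable built-in, counting trailing zero bits can
be done with a simple loop. The function below is a hypothetical portable
sketch named sketch_ctz32, written for this example; it is not the
replacement code that ecb.h itself uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable fallback; slower than a built-in, but simple. */
static int
sketch_ctz32 (uint32_t x)
{
  int r = 0;

  assert (x != 0); /* the result is undefined for 0 */

  /* shift right until the lowest bit is set, counting the steps */
  while (!(x & 1))
    {
      x >>= 1;
      ++r;
    }

  return r;
}
```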
+
+=item bool ecb_is_pot32 (uint32_t x)
+
+=item bool ecb_is_pot64 (uint64_t x)
+
+Returns true iff C<x> is a power of two or C<x == 0>.
 
-  ecb_ctz32(3) = 0
-  ecb_ctz32(6) = 1
+For smaller types than C<uint32_t> you can safely use C<ecb_is_pot32>.
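The usual way to express this test is a single bit trick, which also
explains why 0 is reported as true. The function name sketch_is_pot32 is
hypothetical and the code is only a sketch of the technique, not
necessarily what ecb.h does.

```c
#include <stdint.h>

/* Hypothetical sketch: true for powers of two and for 0. */
static int
sketch_is_pot32 (uint32_t x)
{
  /* x & (x - 1) clears the lowest set bit; nothing remains if x had
     at most one bit set */
  return !(x & (x - 1));
}
```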
+
+=item int ecb_ld32 (uint32_t x)
+
+=item int ecb_ld64 (uint64_t x)
+
+Returns the index of the most significant bit set in C<x>, which is one
+less than the number of digits the number requires in binary (so that
+C<< 2**ld <= x < 2**(ld+1) >>). If C<x> is 0 the result is undefined. A
+common use case is to compute the integer binary logarithm, i.e.
+C<floor (log2 (n))>, for example to derive how many bits a certain number
+requires to be encoded (C<ld + 1>).
+
+This function is similar to the "count leading zero bits" function, except
+that that one returns how many zero bits are "in front" of the number (in
+the given data type), while C<ecb_ld> returns the position of the highest
+bit the number itself uses.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_ld32>.
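The integer binary logarithm can be sketched with a shift loop. The
function name sketch_ld32 is hypothetical and the code is an illustration
of the described behaviour (index of the most significant set bit), not
the implementation in ecb.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: index of the most significant set bit. */
static int
sketch_ld32 (uint32_t x)
{
  int r = -1;

  assert (x != 0); /* the result is undefined for 0 */

  /* each shift that still leaves bits behind raises the index by one */
  while (x)
    {
      x >>= 1;
      ++r;
    }

  return r;
}
```

For a 32-bit value, the number of leading zero bits is then 31 minus this
result.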
 
 =item int ecb_popcount32 (uint32_t x)
 
-Returns the number of bits set to 1 in C<x>. For example:
+=item int ecb_popcount64 (uint64_t x)
+
+Returns the number of bits set to 1 in C<x>.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_popcount32>.
+
+For example:
+
+  ecb_popcount32 (7) = 3
+  ecb_popcount32 (255) = 8
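A compact portable way to count set bits is to clear the lowest set bit
repeatedly. The function sketch_popcount32 below is a hypothetical
illustration of that technique and makes no claim about how ecb.h
implements its fallback.

```c
#include <stdint.h>

/* Hypothetical sketch: counts the bits set to 1 in x. */
static int
sketch_popcount32 (uint32_t x)
{
  int count = 0;

  /* x &= x - 1 clears the lowest set bit, so the loop runs once per 1 bit */
  for (; x; x &= x - 1)
    ++count;

  return count;
}
```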
+
+=item uint8_t  ecb_bitrev8  (uint8_t  x)
+
+=item uint16_t ecb_bitrev16 (uint16_t x)
+
+=item uint32_t ecb_bitrev32 (uint32_t x)
+
+Reverses the bits in x, i.e. the MSB becomes the LSB, MSB-1 becomes LSB+1
+and so on.
 
-  ecb_popcount32(7) = 3
-  ecb_popcount32(255) = 8
+Example:
+
+   ecb_bitrev8 (0xa7) = 0xe5
+   ecb_bitrev32 (0xffcc4411) = 0x882233ff
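Bit reversal can be sketched by moving one bit at a time from the low end
of the input to the low end of the result. The functions sketch_bitrev8
and sketch_bitrev32 are hypothetical names for this example; a real
implementation would likely use a faster swap-based scheme.

```c
#include <stdint.h>

/* Hypothetical sketch: reverses the 8 bits of x. */
static uint8_t
sketch_bitrev8 (uint8_t x)
{
  uint8_t r = 0;
  int i;

  for (i = 0; i < 8; ++i)
    {
      r = (uint8_t)((r << 1) | (x & 1)); /* append the lowest bit of x */
      x >>= 1;
    }

  return r;
}

/* Hypothetical sketch: the same loop for 32 bits. */
static uint32_t
sketch_bitrev32 (uint32_t x)
{
  uint32_t r = 0;
  int i;

  for (i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }

  return r;
}
```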
 
 =item uint32_t ecb_bswap16 (uint32_t x)
 
 =item uint32_t ecb_bswap32 (uint32_t x)
 
-These two functions return the value of the 16-bit (32-bit) variable
-C<x> after reversing the order of bytes.
+=item uint64_t ecb_bswap64 (uint64_t x)
 
-=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)
+These functions return the value of the 16-bit (32-bit, 64-bit) value
+C<x> after reversing the order of bytes (0x11223344 becomes 0x44332211 in
+C<ecb_bswap32>).
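A byte swap can be written with shifts and masks alone. The function
sketch_bswap32 is a hypothetical portable sketch for illustration; on GCC
a built-in would normally be preferred.

```c
#include <stdint.h>

/* Hypothetical sketch: reverses the order of the four bytes in x. */
static uint32_t
sketch_bswap32 (uint32_t x)
{
  return  (x >> 24)                   /* byte 3 moves to byte 0 */
        | ((x >>  8) & 0x0000ff00u)   /* byte 2 moves to byte 1 */
        | ((x <<  8) & 0x00ff0000u)   /* byte 1 moves to byte 2 */
        |  (x << 24);                 /* byte 0 moves to byte 3 */
}
```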
+
+=item uint8_t  ecb_rotl8  (uint8_t  x, unsigned int count)
+
+=item uint16_t ecb_rotl16 (uint16_t x, unsigned int count)
 
 =item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)
 
-These two functions return the value of C<x> after shifting all the bits
-by C<count> positions to the right or left respectively.
+=item uint64_t ecb_rotl64 (uint64_t x, unsigned int count)
+
+=item uint8_t  ecb_rotr8  (uint8_t  x, unsigned int count)
+
+=item uint16_t ecb_rotr16 (uint16_t x, unsigned int count)
+
+=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)
+
+=item uint64_t ecb_rotr64 (uint64_t x, unsigned int count)
+
+These two families of functions return the value of C<x> after rotating
+all the bits by C<count> positions to the right (C<ecb_rotr>) or left
+(C<ecb_rotl>).
+
+Current GCC versions understand these functions and usually compile them
+to "optimal" code (e.g. a single C<rol> or a combination of C<shld> on
+x86).
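A rotation is typically written as two shifts combined with an OR. The
sketch below uses the hypothetical names sketch_rotl32 and sketch_rotr32
and masks the count so that a count of 0 or 32 does not shift by the full
width, which would be undefined in C. The masking is a choice made for
this example and not necessarily how ecb.h handles such counts.

```c
#include <stdint.h>

/* Hypothetical sketch: rotate x left by count bits. */
static uint32_t
sketch_rotl32 (uint32_t x, unsigned int count)
{
  count &= 31; /* keep both shift amounts in the range 0..31 */

  return (x << count) | (x >> (-count & 31));
}

/* Hypothetical sketch: rotate x right by count bits. */
static uint32_t
sketch_rotr32 (uint32_t x, unsigned int count)
{
  count &= 31;

  return (x >> count) | (x << (-count & 31));
}
```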
 
 =back
 
@@ -309,13 +581,38 @@
 
 =item x = ecb_mod (m, n)
 
-Returns the positive remainder of the modulo operation between C<m> and
-C<n>. Unlike the C moduloe operator C<%>, this function ensures that the
-return value is always positive).
+Returns C<m> modulo C<n>, which is the same as the non-negative remainder
+of the division operation between C<m> and C<n>, using floored
+division. Unlike the C remainder operator C<%>, this function ensures that
+the return value is never negative and that the two numbers I<m> and
+I<m' = m + i * n> result in the same value modulo I<n> - in other words,
+C<ecb_mod> implements the mathematical modulo operation, which is missing
+in the language.
 
-C<n> must be strictly positive (i.e. C<< >1 >>), while C<m> must be
+C<n> must be strictly positive (i.e. C<< >= 1 >>), while C<m> must be
 negatable, that is, both C<m> and C<-m> must be representable in its
-type.
+type (this typically excludes the minimum signed integer value, the same
+limitation as for C</> and C<%> in C).
+
+Current GCC versions compile this into an efficient branchless sequence on
+almost all CPUs.
+
+For example, when you want to rotate forward through the members of an
+array for increasing C<m> (which might be negative), then you should use
+C<ecb_mod>, as the C<%> operator might give either negative results, or
+change direction for negative values:
+
+   for (m = -100; m <= 100; ++m)
+     elem = myarray [ecb_mod (m, ecb_array_length (myarray))];
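The difference to the C<%> operator can be sketched with a small function
that corrects negative remainders. The name sketch_mod is hypothetical,
the code assumes C99 semantics for % (the remainder takes the sign of the
dividend), and it uses a branch for clarity rather than the branchless
form mentioned above.

```c
#include <assert.h>

/* Hypothetical sketch of a floored modulo for int. */
static int
sketch_mod (int m, int n)
{
  int r;

  assert (n > 0); /* n must be strictly positive */

  r = m % n; /* in C99, r is 0 or has the sign of m */

  return r < 0 ? r + n : r;
}
```

With n = 5, the inputs -6, -1, 4 and 9 all differ by a multiple of n and
therefore all map to the same array slot.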
+
+=item x = ecb_div_rd (val, div)
+
+=item x = ecb_div_ru (val, div)
+
+Returns C<val> divided by C<div> rounded down or up, respectively.
+C<val> and C<div> must have integer types and C<div> must be strictly
+positive. Note that these functions are implemented with macros in C
+and with function templates in C++.
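Rounding a division down or up can be sketched by adjusting the truncated
quotient according to the sign of the remainder. The functions
sketch_div_rd and sketch_div_ru are hypothetical, restricted to int for
this example, and assume C99 truncating division; they are not the macros
or templates that ecb.h provides.

```c
#include <assert.h>

/* Hypothetical sketch: val / div rounded towards negative infinity. */
static int
sketch_div_rd (int val, int div)
{
  int q, r;

  assert (div > 0); /* div must be strictly positive */

  q = val / div; /* truncates towards zero in C99 */
  r = val % div;

  return r < 0 ? q - 1 : q;
}

/* Hypothetical sketch: val / div rounded towards positive infinity. */
static int
sketch_div_ru (int val, int div)
{
  int q, r;

  assert (div > 0);

  q = val / div;
  r = val % div;

  return r > 0 ? q + 1 : q;
}
```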
 
 =back
 
@@ -323,7 +620,7 @@
 
 =over 4
 
-=item element_count = ecb_array_length (name) [MACRO]
+=item element_count = ecb_array_length (name)
 
 Returns the number of elements in the array C<name>. For example:
 
@@ -335,4 +632,28 @@
 
 =back
 
+=head2 SYMBOLS GOVERNING COMPILATION OF ECB.H ITSELF
+
+These symbols need to be defined before including F<ecb.h> the first time.
+
+=over 4
+
+=item ECB_NO_THREADS
+
+If F<ecb.h> is never used from multiple threads, then this symbol can
+be defined, in which case memory fences (and similar constructs) are
+completely removed, leading to more efficient code and fewer dependencies.
+
+Setting this symbol to a true value implies C<ECB_NO_SMP>.
+
+=item ECB_NO_SMP
+
+The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from
+multiple threads, but never concurrently (e.g. if the system the program
+runs on has only a single CPU with a single core, no hyperthreading and so
+on), then this symbol can be defined, leading to more efficient code and
+fewer dependencies.
+
+=back
+