--- libecb/ecb.pod	2011/05/31 21:52:31	1.25
+++ libecb/ecb.pod	2014/10/16 14:47:20	1.57
@@ -56,14 +56,123 @@
 is usually implemented as a macro. Specifically, a "bool" in this manual
 refers to any kind of boolean value, not a specific type.
 
+=head2 TYPES / TYPE SUPPORT
+
+ecb.h makes sure that the following types are defined (in the expected way):
+
+   int8_t   uint8_t   int16_t  uint16_t
+   int32_t  uint32_t  int64_t  uint64_t
+   intptr_t uintptr_t
+
+The macro C<ECB_PTRSIZE> is defined to the size of a pointer on this
+platform (currently C<4> or C<8>) and can be used in preprocessor
+expressions.
+
+For C<ptrdiff_t> and C<size_t> use C<stddef.h>.
+
+=head2 LANGUAGE/COMPILER VERSIONS
+
+All the following symbols expand to an expression that can be tested in
+preprocessor directives as well as treated as a boolean (use C<!!> to
+ensure it's either C<0> or C<1> if you need that).
+
+=over 4
+
+=item ECB_C
+
+True if the implementation defines the C<__STDC__> macro to a true value,
+while not claiming to be C++.
+
+=item ECB_C99
+
+True if the implementation claims to be compliant with C99 (ISO/IEC
+9899:1999) or any later version, while not claiming to be C++.
+
+Note that later versions (ECB_C11) make some C99 core features optional
+(for example, variable length arrays).
+
+=item ECB_C11
+
+True if the implementation claims to be compliant with C11 (ISO/IEC
+9899:2011) or any later version, while not claiming to be C++.
+
+=item ECB_CPP
+
+True if the implementation defines the C<__cplusplus> macro to a true
+value, which is typically true for C++ compilers.
+
+=item ECB_CPP11
+
+True if the implementation claims to be compliant with ISO/IEC 14882:2011
+(C++11) or any later version.
+
+=item ECB_GCC_VERSION (major, minor)
+
+Expands to a true value (suitable for testing by the preprocessor)
+if the compiler used is GNU C and the version is the given version, or
+higher.
+
+This macro tries to return false on compilers that claim to be GCC
+compatible but aren't.
+
+=item ECB_EXTERN_C
+
+Expands to C<extern "C"> in C++, and a simple C<extern> in C.
+
+This can be used to declare a single external C function:
+
+   ECB_EXTERN_C int printf (const char *format, ...);
+
+=item ECB_EXTERN_C_BEG / ECB_EXTERN_C_END
+
+These two macros can be used to wrap multiple C<extern "C"> definitions -
+they expand to nothing in C.
+
+They are most useful in header files:
+
+   ECB_EXTERN_C_BEG
+
+   int mycfun1 (int x);
+   int mycfun2 (int x);
+
+   ECB_EXTERN_C_END
+
+=item ECB_STDFP
+
+If this evaluates to a true value (suitable for testing by the
+preprocessor), then C<float> and C<double> use IEEE 754 single/binary32
+and double/binary64 representations internally I<and> the endianness of
+both types matches the endianness of C<uint32_t> and C<uint64_t>.
+
+This means you can just copy the bits of a C<float> (or C<double>) to an
+C<uint32_t> (or C<uint64_t>) and get the raw IEEE 754 bit representation
+without having to think about format or endianness.
+
+This is true for basically all modern platforms, although F<ecb.h> might
+not be able to deduce this correctly everywhere and might err on the safe
+side.
+
+=item ECB_AMD64, ECB_AMD64_X32
+
+These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32
+ABI, respectively, and undefined elsewhere.
+
+The designers of the new X32 ABI for some inexplicable reason decided to
+make it look exactly like amd64, even though it's completely incompatible
+with that ABI, breaking about every piece of software that assumed that
+C<__x86_64> stands for, well, the x86-64 ABI, making these macros
+necessary.
+
+=back
+
 =head2 GCC ATTRIBUTES
 
 A major part of libecb deals with GCC attributes. These are additional
-attributes that you cna assign to functions, variables and sometimes even
+attributes that you can assign to functions, variables and sometimes even
 types - much like C<const> or C<volatile> in C.
 
 While GCC allows declarations to show up in many surprising places,
-but not in many expeted places, the safest way is to put attribute
+but not in many expected places, the safest way is to put attribute
 declarations before the whole declaration:
 
    ecb_const int mysqrt (int a);
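
Note on the C<ECB_STDFP> entry in the hunk above: the bit copy it describes can be sketched in plain C roughly as follows. This is a minimal illustration, not code taken from F<ecb.h>. The name C<my_float_bits> is hypothetical, and the result is only the IEEE 754 binary32 pattern on platforms where the guarantees described for C<ECB_STDFP> hold and C<float> is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of ecb.h: copy the object representation
   of a float into a uint32_t. Assumes sizeof (float) == 4 and the
   format/endianness guarantees that ECB_STDFP is documented to signal. */
static uint32_t
my_float_bits (float f)
{
  uint32_t u;

  memcpy (&u, &f, sizeof u); /* byte copy, avoids pointer-cast aliasing issues */
  return u;
}
```

On such a platform, C<my_float_bits (1.0f)> should give C<0x3f800000> (sign 0, biased exponent 127, mantissa 0).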
@@ -78,8 +187,9 @@
 
 =item ecb_attribute ((attrs...))
 
-A simple wrapper that expands to C<__attribute__((attrs))> on GCC, and to
-nothing on other compilers, so the effect is that only GCC sees these.
+A simple wrapper that expands to C<__attribute__((attrs))> on GCC 3.1+ and
+Clang 2.8+, and to nothing on other compilers, so the effect is that only
+GCC and Clang see these.
 
 Example: use the C<deprecated> attribute on a function.
 
@@ -103,6 +213,26 @@
     #endif
   }
 
+=item ecb_deprecated
+
+Similar to C<ecb_unused>, but marks a function, variable or type as
+deprecated. This makes some compilers warn when it is used.
+
+=item ecb_inline
+
+This is not actually an attribute, but you use it like one. It expands
+either to C<static inline> or to just C<static>, if inline isn't
+supported. It should be used to declare functions that should be inlined,
+for code size or speed reasons.
+
+Example: inline this function, it surely will reduce codesize.
+
+   ecb_inline int
+   negmul (int a, int b)
+   {
+     return - (a * b);
+   }
+
 =item ecb_noinline
 
 Prevent a function from being inlined - it might be optimised away, but
@@ -125,6 +255,27 @@
 In this case, the compiler would probably be smart enough to deduce it on
 its own, so this is mainly useful for declarations.
 
+=item ecb_restrict
+
+Expands to the C<restrict> keyword or equivalent on compilers that support
+it, and to nothing on others. Must be specified on a pointer type or
+an array index to indicate that the memory doesn't alias with any other
+restricted pointer in the same scope.
+
+Example: multiply a vector, and allow the compiler to parallelise the
+loop, because it knows it doesn't overwrite input values.
+
+   void
+   multiply (float *ecb_restrict src,
+             float *ecb_restrict dst,
+             int len, float factor)
+   {
+     int i;
+
+     for (i = 0; i < len; ++i)
+       dst [i] = src [i] * factor;
+   }
+
 =item ecb_const
 
 Declares that the function only depends on the values of its arguments,
@@ -186,7 +337,7 @@
 functions), this knowledge can be used in other ways, for example, the
 function will be optimised for size, as opposed to speed, and codepaths
 leading to calls to those functions can automatically be marked as if
-C<ecb_unlikely> had been used to reach them.
+C<ecb_expect_false> had been used to reach them.
 
 Good examples for such functions would be error reporting functions, or
 functions only called in exceptional or rare cases.
@@ -194,7 +345,7 @@
 =item ecb_artificial
 
 Declares the function as "artificial", in this case meaning that this
-function is not really mean to be a function, but more like an accessor
+function is not really meant to be a function, but more like an accessor
 - many methods in C++ classes are mere accessor functions, and having a
 crash reported in such a method, or single-stepping through them, is not
 usually so helpful, especially when it's inlined to just a few instructions.
@@ -258,12 +409,12 @@
 the C<expr> evaluates to C<value> a lot, which can be used for static
 branch optimisations.
 
-Usually, you want to use the more intuitive C<ecb_likely> and
-C<ecb_unlikely> functions instead.
+Usually, you want to use the more intuitive C<ecb_expect_true> and
+C<ecb_expect_false> functions instead.
 
-=item bool ecb_likely (cond)
+=item bool ecb_expect_true (cond)
 
-=item bool ecb_unlikely (cond)
+=item bool ecb_expect_false (cond)
 
 These two functions expect a expression that is true or false and return
 C<1> or C<0>, respectively, so when used in the condition of an C<if> or
@@ -271,18 +422,18 @@
 
   /* these two do the same thing */
   if (some_condition) ...;
-  if (ecb_likely (some_condition)) ...;
+  if (ecb_expect_true (some_condition)) ...;
 
-However, by using C<ecb_likely>, you tell the compiler that the condition
-is likely to be true (and for C<ecb_unlikely>, that it is unlikely to be
-true).
+However, by using C<ecb_expect_true>, you tell the compiler that the
+condition is likely to be true (and for C<ecb_expect_false>, that it is
+unlikely to be true).
 
 For example, when you check for a null pointer and expect this to be a
-rare, exceptional, case, then use C<ecb_unlikely>:
+rare, exceptional, case, then use C<ecb_expect_false>:
 
   void my_free (void *ptr)
   {
-    if (ecb_unlikely (ptr == 0))
+    if (ecb_expect_false (ptr == 0))
       return;
   }
 
@@ -290,6 +441,12 @@
 tell the compiler what the hot path through a function is can increase
 performance considerably.
 
+You might know these functions under the names C<likely> and C<unlikely>
+- while these are common aliases, we find that the expect names are easier
+to understand when quickly skimming code. If you wish, you can use
+C<ecb_likely> instead of C<ecb_expect_true> and C<ecb_unlikely> instead of
+C<ecb_expect_false> - these are simply aliases.
+
 A very good example is in a function that reserves more space for some
 memory block (for example, inside an implementation of a string stream) -
 each time something is added, you have to check for a buffer overrun, but
@@ -299,7 +456,7 @@
   ecb_inline void
   reserve (int size)
   {
-    if (ecb_unlikely (current + size > end))
+    if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
   }
 
@@ -312,13 +469,13 @@
 conditions that might improve code generation, but which are impossible to
 deduce form the code itself.
 
-For example, the example reservation function from the C<ecb_unlikely>
+For example, the example reservation function from the C<ecb_expect_false>
 description could be written thus (only C<ecb_assume> was added):
 
   ecb_inline void
   reserve (int size)
   {
-    if (ecb_unlikely (current + size > end))
+    if (ecb_expect_false (current + size > end))
       real_reserve_method (size); /* presumably noinline */
 
     ecb_assume (current + size <= end);
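
Note on the C<ecb_expect_true> / C<ecb_expect_false> hunks above: macros of this kind are commonly built on GCC's C<__builtin_expect>. The sketch below shows that general technique under that assumption. It is not the definition used by F<ecb.h>, and all C<my_> names are hypothetical.

```c
/* Hypothetical stand-ins; the real definitions in ecb.h may differ. */
#if defined(__GNUC__) || defined(__clang__)
  /* tell the compiler which value expr usually has */
  #define my_expect(expr, value) __builtin_expect ((expr), (value))
#else
  /* unknown compiler: no hint, just evaluate the expression */
  #define my_expect(expr, value) (expr)
#endif

/* !! normalises any true value to 1, so the result is always 0 or 1 */
#define my_expect_true(cond)  my_expect (!!(cond), 1)
#define my_expect_false(cond) my_expect (!!(cond), 0)

/* Returns the length of s, treating a null pointer as the rare case. */
static int
my_strlen_or_zero (const char *s)
{
  int n = 0;

  if (my_expect_false (s == 0))
    return 0;

  while (s [n])
    ++n;

  return n;
}
```

The hint only affects code layout and branch prediction hints. The value of the condition is unchanged, so removing the macro never changes program behaviour.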
@@ -377,7 +534,7 @@
 
 =back
 
-=head2 BIT FIDDLING / BITSTUFFS
+=head2 BIT FIDDLING / BIT WIZARDRY
 
 =over 4
 
@@ -393,38 +550,159 @@
 
 =item int ecb_ctz32 (uint32_t x)
 
+=item int ecb_ctz64 (uint64_t x)
+
 Returns the index of the least significant bit set in C<x> (or
 equivalently the number of bits set to 0 before the least significant bit
-set), starting from 0. If C<x> is 0 the result is undefined. A common use
-case is to compute the integer binary logarithm, i.e.,  C<floor (log2
-(n))>. For example:
+set), starting from 0. If C<x> is 0 the result is undefined.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_ctz32>.
+
+For example:
 
   ecb_ctz32 (3) = 0
   ecb_ctz32 (6) = 1
 
+=item bool ecb_is_pot32 (uint32_t x)
+
+=item bool ecb_is_pot64 (uint64_t x)
+
+Return true iff C<x> is a power of two or C<x == 0>.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_is_pot32>.
+
+=item int ecb_ld32 (uint32_t x)
+
+=item int ecb_ld64 (uint64_t x)
+
+Returns the index of the most significant bit set in C<x>, which is one
+less than the number of binary digits C<x> requires (so that C<< 2**ld <= x <
+2**(ld+1) >>). If C<x> is 0 the result is undefined. A common use case is
+to compute the integer binary logarithm, i.e. C<floor (log2 (x))>, for
+example to see how many bits a certain number requires to be encoded.
+
+This function is similar to the "count leading zero bits" function, except
+that that one returns how many zero bits are "in front" of the number (in
+the given data type), while C<ecb_ld> returns the position of the highest
+set bit, counted from the least significant bit.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_ld32>.
+
 =item int ecb_popcount32 (uint32_t x)
 
-Returns the number of bits set to 1 in C<x>. For example:
+=item int ecb_popcount64 (uint64_t x)
+
+Returns the number of bits set to 1 in C<x>.
+
+For smaller types than C<uint32_t> you can safely use C<ecb_popcount32>.
+
+For example:
 
   ecb_popcount32 (7) = 3
   ecb_popcount32 (255) = 8
 
+=item uint8_t  ecb_bitrev8  (uint8_t  x)
+
+=item uint16_t ecb_bitrev16 (uint16_t x)
+
+=item uint32_t ecb_bitrev32 (uint32_t x)
+
+Reverses the bits in x, i.e. the MSB becomes the LSB, MSB-1 becomes LSB+1
+and so on.
+
+Example:
+
+   ecb_bitrev8 (0xa7) = 0xe5
+   ecb_bitrev32 (0xffcc4411) = 0x882233ff
+
 =item uint32_t ecb_bswap16 (uint32_t x)
 
 =item uint32_t ecb_bswap32 (uint32_t x)
 
-These two functions return the value of the 16-bit (32-bit) value C<x>
-after reversing the order of bytes (0x11223344 becomes 0x44332211).
+=item uint64_t ecb_bswap64 (uint64_t x)
 
-=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)
+These functions return the value of the 16-bit (32-bit, 64-bit) value
+C<x> after reversing the order of bytes (0x11223344 becomes 0x44332211 in
+C<ecb_bswap32>).
+
+=item uint8_t  ecb_rotl8  (uint8_t  x, unsigned int count)
+
+=item uint16_t ecb_rotl16 (uint16_t x, unsigned int count)
 
 =item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)
 
-These two functions return the value of C<x> after rotating all the bits
-by C<count> positions to the right or left respectively.
+=item uint64_t ecb_rotl64 (uint64_t x, unsigned int count)
+
+=item uint8_t  ecb_rotr8  (uint8_t  x, unsigned int count)
+
+=item uint16_t ecb_rotr16 (uint16_t x, unsigned int count)
+
+=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)
+
+=item uint64_t ecb_rotr64 (uint64_t x, unsigned int count)
+
+These two families of functions return the value of C<x> after rotating
+all the bits by C<count> positions to the right (C<ecb_rotr>) or left
+(C<ecb_rotl>).
 
 Current GCC versions understand these functions and usually compile them
-to "optimal" code (e.g. a single C<roll> on x86).
+to "optimal" code (e.g. a single C<rol> or a combination of C<shld> on
+x86).
+
+=back
+
+=head2 FLOATING POINT FIDDLING
+
+=over 4
+
+=item uint32_t ecb_float_to_binary32  (float  x) [-UECB_NO_LIBM]
+
+=item uint64_t ecb_double_to_binary64 (double x) [-UECB_NO_LIBM]
+
+These functions each take an argument in the native C<float> or C<double>
+type and return the IEEE 754 bit representation of it.
+
+The bit representation is just as IEEE 754 defines it, i.e. the sign bit
+will be the most significant bit, followed by exponent and mantissa.
+
+This function should work even when the native floating point format isn't
+IEEE compliant, of course at a speed and code size penalty, and of course
+also within reasonable limits (it tries to convert NaNs, infinities and
+denormals, but will likely convert negative zero to positive zero).
+
+On all modern platforms (where C<ECB_STDFP> is true), the compiler should
+be able to optimise away this function completely.
+
+These functions can be helpful when serialising floats to the network - you
+can serialise the return value like a normal uint32_t/uint64_t.
+
+Another use for these functions is to manipulate floating point values
+directly.
+
+Silly example: toggle the sign bit of a float.
+
+   /* On gcc-4.7 on amd64, */
+   /* this results in a single add instruction to toggle the bit, and 4 extra */
+   /* instructions to move the float value to an integer register and back. */
+
+   x = ecb_binary32_to_float (ecb_float_to_binary32 (x) ^ 0x80000000U);
+
+=item float  ecb_binary32_to_float  (uint32_t x) [-UECB_NO_LIBM]
+
+=item double ecb_binary64_to_double (uint64_t x) [-UECB_NO_LIBM]
+
+The reverse operation of the previous functions - takes the bit representation
+of an IEEE binary32 or binary64 number and converts it to the native C<float>
+or C<double> format.
+
+This function should work even when the native floating point format isn't
+IEEE compliant, of course at a speed and code size penalty, and of course
+also within reasonable limits (it tries to convert normals and denormals,
+and might be lucky for infinities, and with extraordinary luck, also for
+negative zero).
+
+On all modern platforms (where C<ECB_STDFP> is true), the compiler should
+be able to optimise away this function completely.
 
 =back
 
@@ -444,11 +722,11 @@
 
 C<n> must be strictly positive (i.e. C<< >= 1 >>), while C<m> must be
 negatable, that is, both C<m> and C<-m> must be representable in its
-type (this typically includes the minimum signed integer value, the same
+type (this typically excludes the minimum signed integer value, the same
 limitation as for C</> and C<%> in C).
 
 Current GCC versions compile this into an efficient branchless sequence on
-many systems.
+almost all CPUs.
 
 For example, when you want to rotate forward through the members of an
 array for increasing C<m> (which might be negative), then you should use
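
Note on the BIT FIDDLING hunk earlier in this diff: the documented semantics can be cross-checked against simple loop-based reference versions such as the ones below. These are unoptimised sketches with hypothetical C<my_> names, not the implementations in F<ecb.h>, which may use compiler builtins instead.

```c
#include <stdint.h>

/* Hypothetical reference versions. x must be non-zero for my_ctz32 and
   my_ld32, matching the "result is undefined for 0" rule in the manual. */

/* index of the least significant set bit */
static int
my_ctz32 (uint32_t x)
{
  int r = 0;

  while (!(x & 1)) { x >>= 1; ++r; }
  return r;
}

/* index of the most significant set bit, i.e. floor (log2 (x)) */
static int
my_ld32 (uint32_t x)
{
  int r = 0;

  while (x >>= 1) ++r;
  return r;
}

/* number of set bits; each iteration clears the lowest set bit */
static int
my_popcount32 (uint32_t x)
{
  int r = 0;

  for (; x; x &= x - 1) ++r;
  return r;
}

/* true for powers of two and for 0 */
static int
my_is_pot32 (uint32_t x)
{
  return !(x & (x - 1));
}

/* reverse the order of the 8 bits in x */
static uint8_t
my_bitrev8 (uint8_t x)
{
  uint8_t r = 0;
  int i;

  for (i = 0; i < 8; ++i) { r = (uint8_t)((r << 1) | (x & 1)); x >>= 1; }
  return r;
}

/* rotate left; masking keeps both shift counts in the range 0..31 */
static uint32_t
my_rotl32 (uint32_t x, unsigned int count)
{
  count &= 31;
  return (x << count) | (x >> ((32 - count) & 31));
}
```

Working through C<0xa7> (binary 10100111) by hand gives 11100101, i.e. C<0xe5>, for the 8-bit reversal.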
@@ -458,6 +736,15 @@
    for (m = -100; m <= 100; ++m)
      int elem = myarray [ecb_mod (m, ecb_array_length (myarray))];
 
+=item x = ecb_div_rd (val, div)
+
+=item x = ecb_div_ru (val, div)
+
+Returns C<val> divided by C<div> rounded down or up, respectively.
+C<val> and C<div> must have integer types and C<div> must be strictly
+positive. Note that these functions are implemented with macros in C
+and with function templates in C++.
+
 =back
 
 =head2 UTILITY
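
Note on the C<ecb_div_rd> / C<ecb_div_ru> hunk above, and on C<ecb_mod> before it: the rounding behaviour can be sketched for plain C<int> as below. This assumes C99 or later, where C</> truncates toward zero and C<%> takes the sign of the dividend. The C<my_> functions are hypothetical and are not how F<ecb.h> implements its macros.

```c
/* Hypothetical int-only sketches; div and n must be strictly positive. */

/* val / div rounded toward negative infinity */
static int
my_div_rd (int val, int div)
{
  int q = val / div; /* truncated toward zero */

  return (val % div < 0) ? q - 1 : q; /* negative remainder: step down */
}

/* val / div rounded toward positive infinity */
static int
my_div_ru (int val, int div)
{
  int q = val / div; /* truncated toward zero */

  return (val % div > 0) ? q + 1 : q; /* positive remainder: step up */
}

/* m modulo n, with the result always in the range 0 .. n-1 */
static int
my_mod (int m, int n)
{
  int r = m % n; /* may be negative when m is negative */

  return r < 0 ? r + n : r;
}
```

For example, -7 divided by 2 is -3.5, so rounding down gives -4 and rounding up gives -3, whereas plain C</> would give -3 in both cases.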
@@ -476,4 +763,34 @@
 
 =back
 
+=head2 SYMBOLS GOVERNING COMPILATION OF ECB.H ITSELF
+
+These symbols need to be defined before including F<ecb.h> the first time.
+
+=over 4
+
+=item ECB_NO_THREADS
+
+If F<ecb.h> is never used from multiple threads, then this symbol can
+be defined, in which case memory fences (and similar constructs) are
+completely removed, leading to more efficient code and fewer dependencies.
+
+Setting this symbol to a true value implies C<ECB_NO_SMP>.
+
+=item ECB_NO_SMP
+
+The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from
+multiple threads, but never concurrently (e.g. if the system the program
+runs on has only a single CPU with a single core, no hyperthreading and so
+on), then this symbol can be defined, leading to more efficient code and
+fewer dependencies.
+
+=item ECB_NO_LIBM
+
+When defined to C<1>, do not export any functions that might introduce
+dependencies on the math library (usually called F<-lm>) - these are
+marked with [-UECB_NO_LIBM].
+
+=back
+