libecb/ecb.pod

=head1 LIBECB - e-C-Builtins

=head2 ABOUT LIBECB

Libecb is currently a simple header file that doesn't require any
configuration to use or include in your project.

It's part of the e-suite of libraries, other members of which include
libev and libeio.

Its homepage can be found here:

    http://software.schmorp.de/pkg/libecb

It mainly provides a number of wrappers around many compiler built-ins,
together with replacement functions for other compilers. In addition
to this, it provides a number of other low-level C utilities, such as
endianness detection, byte swapping or bit rotations.

Or in other words, things that should be built into any standard C
system, but aren't, implemented as efficient as possible with GCC (clang,
MSVC...), and still correct with other compilers.

More might come.

=head2 ABOUT THE HEADER

At the moment, all you have to do is copy F<ecb.h> somewhere where your
compiler can find it and include it:

   #include <ecb.h>

The header should work fine for both C and C++ compilation, and gives you
all of F<inttypes.h> in addition to the ECB symbols.

There are currently no object files to link to - future versions might
come with an (optional) object code library to link against, to reduce
code size or gain access to additional features.

It also currently includes everything from F<inttypes.h>.

=head2 ABOUT THIS MANUAL / CONVENTIONS

This manual mainly describes each (public) function available after
including the F<ecb.h> header. The header might define other symbols than
these, but these are not part of the public API, and not supported in any
way.

When the manual mentions a "function" then this could be defined either as
as inline function, a macro, or an external symbol.

When functions use a concrete standard type, such as C<int> or
C<uint32_t>, then the corresponding function works only with that type. If
only a generic name is used (C<expr>, C<cond>, C<value> and so on), then
the corresponding function relies on C to implement the correct types, and
is usually implemented as a macro. Specifically, a "bool" in this manual
refers to any kind of boolean value, not a specific type.

=head2 TYPES / TYPE SUPPORT

F<ecb.h> makes sure that the following types are defined (in the expected way):

   int8_t       uint8_
   int16_t      uint16_t
   int32_t      uint32_
   int64_t      uint64_t
   int_fast8_t  uint_fast8_t
   int_fast16_t uint_fast16_t
   int_fast32_t uint_fast32_t
   int_fast64_t uint_fast64_t
   intptr_t     uintptr_t

The macro C<ECB_PTRSIZE> is defined to the size of a pointer on this
platform (currently C<4> or C<8>) and can be used in preprocessor
expressions.

For C<ptrdiff_t> and C<size_t> use C<stddef.h>/C<cstddef>.

=head2 LANGUAGE/ENVIRONMENT/COMPILER VERSIONS

All the following symbols expand to an expression that can be tested in
preprocessor instructions as well as treated as a boolean (use C<!!> to
ensure it's either C<0> or C<1> if you need that).

=over

=item ECB_C

True if the implementation defines the C<__STDC__> macro to a true value,
while not claiming to be C++, i..e C, but not C++.

=item ECB_C99

True if the implementation claims to be compliant to C99 (ISO/IEC
9899:1999) or any later version, while not claiming to be C++.

Note that later versions (ECB_C11) remove core features again (for
example, variable length arrays).

=item ECB_C11, ECB_C17

True if the implementation claims to be compliant to C11/C17 (ISO/IEC
9899:2011, :20187) or any later version, while not claiming to be C++.

=item ECB_CPP

True if the implementation defines the C<__cplusplus__> macro to a true
value, which is typically true for C++ compilers.

=item ECB_CPP11, ECB_CPP14, ECB_CPP17

True if the implementation claims to be compliant to C++11/C++14/C++17
(ISO/IEC 14882:2011, :2014, :2017) or any later version.

Note that many C++20 features will likely have their own feature test
macros (see e.g. L<http://eel.is/c++draft/cpp.predefined#1.8>).

=item ECB_OPTIMIZE_SIZE

Is C<1> when the compiler optimizes for size, C<0> otherwise. This symbol
can also be defined before including F<ecb.h>, in which case it will be
unchanged.

=item ECB_GCC_VERSION (major, minor)

Expands to a true value (suitable for testing by the preprocessor) if the
compiler used is GNU C and the version is the given version, or higher.

This macro tries to return false on compilers that claim to be GCC
compatible but aren't.

=item ECB_EXTERN_C

Expands to C<extern "C"> in C++, and a simple C<extern> in C.

This can be used to declare a single external C function:

   ECB_EXTERN_C int printf (const char *format, ...);

=item ECB_EXTERN_C_BEG / ECB_EXTERN_C_END

These two macros can be used to wrap multiple C<extern "C"> definitions -
they expand to nothing in C.

They are most useful in header files:

   ECB_EXTERN_C_BEG

   int mycfun1 (int x);
   int mycfun2 (int x);

   ECB_EXTERN_C_END

=item ECB_STDFP

If this evaluates to a true value (suitable for testing by the
preprocessor), then C<float> and C<double> use IEEE 754 single/binary32
and double/binary64 representations internally I<and> the endianness of
both types match the endianness of C<uint32_t> and C<uint64_t>.

This means you can just copy the bits of a C<float> (or C<double>) to an
C<uint32_t> (or C<uint64_t>) and get the raw IEEE 754 bit representation
without having to think about format or endianness.

This is true for basically all modern platforms, although F<ecb.h> might
not be able to deduce this correctly everywhere and might err on the safe
side.

=item ECB_64BIT_NATIVE

Evaluates to a true value (suitable for both preprocessor and C code
testing) if 64 bit integer types on this architecture are evaluated
"natively", that is, with similar speeds as 32 bit integers. While 64 bit
integer support is very common (and in fact required by libecb), 32 bit
CPUs have to emulate operations on them, so you might want to avoid them.

=item ECB_AMD64, ECB_AMD64_X32

These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32
ABI, respectively, and undefined elsewhere.

The designers of the new X32 ABI for some inexplicable reason decided to
make it look exactly like amd64, even though it's completely incompatible
to that ABI, breaking about every piece of software that assumed that
C<__x86_64> stands for, well, the x86-64 ABI, making these macros
necessary.

=back

=head2 MACRO TRICKERY

=over

=item ECB_CONCAT (a, b)

Expands any macros in C<a> and C<b>, then concatenates the result to form
a single token. This is mainly useful to form identifiers from components,
e.g.:

   #define S1 str
   #define S2 cpy

   ECB_CONCAT (S1, S2)(dst, src); // == strcpy (dst, src);

=item ECB_STRINGIFY (arg)

Expands any macros in C<arg> and returns the stringified version of
it. This is mainly useful to get the contents of a macro in string form,
e.g.:

   #define SQL_LIMIT 100
   sql_exec ("select * from table limit " ECB_STRINGIFY (SQL_LIMIT));

=item ECB_STRINGIFY_EXPR (expr)

Like C<ECB_STRINGIFY>, but additionally evaluates C<expr> to make sure it
is a valid expression. This is useful to catch typos or cases where the
macro isn't available:

   #include <errno.h>

   ECB_STRINGIFY      (EDOM); // "33" (on my system at least)
   ECB_STRINGIFY_EXPR (EDOM); // "33"

   // now imagine we had a typo:

   ECB_STRINGIFY      (EDAM); // "EDAM"
   ECB_STRINGIFY_EXPR (EDAM); // error: EDAM undefined

=back

=head2 ATTRIBUTES

A major part of libecb deals with additional attributes that can be
assigned to functions, variables and sometimes even types - much like
C<const> or C<volatile> in C. They are implemented using either GCC
attributes or other compiler/language specific features. Attributes
declarations must be put before the whole declaration:

   ecb_const int mysqrt (int a);
   ecb_unused int i;

=over

=item ecb_unused

Marks a function or a variable as "unused", which simply suppresses a
warning by the compiler when it detects it as unused. This is useful when
you e.g. declare a variable but do not always use it:

  {
    ecb_unused int var;

    #ifdef SOMECONDITION
       var = ...;
       return var;
    #else
       return 0;
    #endif
  }

=item ecb_deprecated

Similar to C<ecb_unused>, but marks a function, variable or type as
deprecated. This makes some compilers warn when the type is used.

=item ecb_deprecated_message (message)

Same as C<ecb_deprecated>, but if possible, the specified diagnostic is
used instead of a generic depreciation message when the object is being
used.

=item ecb_inline

Expands either to (a compiler-specific equivalent of) C<static inline> or
to just C<static>, if inline isn't supported. It should be used to declare
functions that should be inlined, for code size or speed reasons.

Example: inline this function, it surely will reduce code size.

   ecb_inline int
   negmul (int a, int b)
   {
     return - (a * b);
   }

=item ecb_noinline

Prevents a function from being inlined - it might be optimised away, but
not inlined into other functions. This is useful if you know your function
is rarely called and large enough for inlining not to be helpful.

=item ecb_noreturn

Marks a function as "not returning, ever". Some typical functions that
don't return are C<exit> or C<abort> (which really works hard to not
return), and now you can make your own:

   ecb_noreturn void
   my_abort (const char *errline)
   {
     puts (errline);
     abort ();
   }

In this case, the compiler would probably be smart enough to deduce it on
its own, so this is mainly useful for declarations.

=item ecb_restrict

Expands to the C<restrict> keyword or equivalent on compilers that support
them, and to nothing on others. Must be specified on a pointer type or
an array index to indicate that the memory doesn't alias with any other
restricted pointer in the same scope.

Example: multiply a vector, and allow the compiler to parallelise the
loop, because it knows it doesn't overwrite input values.

   void
   multiply (ecb_restrict float *src,
             ecb_restrict float *dst,
             int len, float factor)
   {
     int i;

     for (i = 0; i < len; ++i)
       dst [i] = src [i] * factor;
   }

=item ecb_const

Declares that the function only depends on the values of its arguments,
much like a mathematical function. It specifically does not read or write
any memory any arguments might point to, global variables, or call any
non-const functions. It also must not have any side effects.

Such a function can be optimised much more aggressively by the compiler -
for example, multiple calls with the same arguments can be optimised into
a single call, which wouldn't be possible if the compiler would have to
expect any side effects.

It is best suited for functions in the sense of mathematical functions,
such as a function returning the square root of its input argument.

Not suited would be a function that calculates the hash of some memory
area you pass in, prints some messages or looks at a global variable to
decide on rounding.

See C<ecb_pure> for a slightly less restrictive class of functions.

=item ecb_pure

Similar to C<ecb_const>, declares a function that has no side
effects. Unlike C<ecb_const>, the function is allowed to examine global
variables and any other memory areas (such as the ones passed to it via
pointers).

While these functions cannot be optimised as aggressively as C<ecb_const>
functions, they can still be optimised away in many occasions, and the
compiler has more freedom in moving calls to them around.

Typical examples for such functions would be C<strlen> or C<memcmp>. A
function that calculates the MD5 sum of some input and updates some MD5
state passed as argument would I<NOT> be pure, however, as it would modify
some memory area that is not the return value.

=item ecb_hot

This declares a function as "hot" with regards to the cache - the function
is used so often, that it is very beneficial to keep it in the cache if
possible.

The compiler reacts by trying to place hot functions near to each other in
memory.

Whether a function is hot or not often depends on the whole program,
and less on the function itself. C<ecb_cold> is likely more useful in
practise.

=item ecb_cold

The opposite of C<ecb_hot> - declares a function as "cold" with regards to
the cache, or in other words, this function is not called often, or not at
speed-critical times, and keeping it in the cache might be a waste of said
cache.

In addition to placing cold functions together (or at least away from hot
functions), this knowledge can be used in other ways, for example, the
function will be optimised for size, as opposed to speed, and code paths
leading to calls to those functions can automatically be marked as if
C<ecb_expect_false> had been used to reach them.

Good examples for such functions would be error reporting functions, or
functions only called in exceptional or rare cases.

=item ecb_artificial

Declares the function as "artificial", in this case meaning that this
function is not really meant to be a function, but more like an accessor
- many methods in C++ classes are mere accessor functions, and having a
crash reported in such a method, or single-stepping through them, is not
usually so helpful, especially when it's inlined to just a few instructions.

Marking them as artificial will instruct the debugger about just this,
leading to happier debugging and thus happier lives.

Example: in some kind of smart-pointer class, mark the pointer accessor as
artificial, so that the whole class acts more like a pointer and less like
some C++ abstraction monster.

  template<typename T>
  struct my_smart_ptr
  {
    T *value;

    ecb_artificial
    operator T *()
    {
      return value;
    }
  };

=back

=head2 OPTIMISATION HINTS

=over

=item bool ecb_is_constant (expr)

Returns true iff the expression can be deduced to be a compile-time
constant, and false otherwise.

For example, when you have a C<rndm16> function that returns a 16 bit
random number, and you have a function that maps this to a range from
0..n-1, then you could use this inline function in a header file:

  ecb_inline uint32_t
  rndm (uint32_t n)
  {
    return (n * (uint32_t)rndm16 ()) >> 16;
  }

However, for powers of two, you could use a normal mask, but that is only
worth it if, at compile time, you can detect this case. This is the case
when the passed number is a constant and also a power of two (C<n & (n -
1) == 0>):

  ecb_inline uint32_t
  rndm (uint32_t n)
  {
    return is_constant (n) && !(n & (n - 1))
      ? rndm16 () & (num - 1)
      : (n * (uint32_t)rndm16 ()) >> 16;
  }

=item ecb_expect (expr, value)

Evaluates C<expr> and returns it. In addition, it tells the compiler that
the C<expr> evaluates to C<value> a lot, which can be used for static
branch optimisations.

Usually, you want to use the more intuitive C<ecb_expect_true> and
C<ecb_expect_false> functions instead.

=item bool ecb_expect_true (cond)

=item bool ecb_expect_false (cond)

These two functions expect a expression that is true or false and return
C<1> or C<0>, respectively, so when used in the condition of an C<if> or
other conditional statement, it will not change the program:

  /* these two do the same thing */
  if (some_condition) ...;
  if (ecb_expect_true (some_condition)) ...;

However, by using C<ecb_expect_true>, you tell the compiler that the
condition is likely to be true (and for C<ecb_expect_false>, that it is
unlikely to be true).

For example, when you check for a null pointer and expect this to be a
rare, exceptional, case, then use C<ecb_expect_false>:

  void my_free (void *ptr)
  {
    if (ecb_expect_false (ptr == 0))
      return;
  }

Consequent use of these functions to mark away exceptional cases or to
tell the compiler what the hot path through a function is can increase
performance considerably.

You might know these functions under the name C<likely> and C<unlikely>
- while these are common aliases, we find that the expect name is easier
to understand when quickly skimming code. If you wish, you can use
C<ecb_likely> instead of C<ecb_expect_true> and C<ecb_unlikely> instead of
C<ecb_expect_false> - these are simply aliases.

A very good example is in a function that reserves more space for some
memory block (for example, inside an implementation of a string stream) -
each time something is added, you have to check for a buffer overrun, but
you expect that most checks will turn out to be false:

  /* make sure we have "size" extra room in our buffer */
  ecb_inline void
  reserve (int size)
  {
    if (ecb_expect_false (current + size > end))
      real_reserve_method (size); /* presumably noinline */
  }

=item ecb_assume (cond)

Tries to tell the compiler that some condition is true, even if it's not
obvious. This is not a function, but a statement: it cannot be used in
another expression.

This can be used to teach the compiler about invariants or other
conditions that might improve code generation, but which are impossible to
deduce form the code itself.

For example, the example reservation function from the C<ecb_expect_false>
description could be written thus (only C<ecb_assume> was added):

  ecb_inline void
  reserve (int size)
  {
    if (ecb_expect_false (current + size > end))
      real_reserve_method (size); /* presumably noinline */

    ecb_assume (current + size <= end);
  }

If you then call this function twice, like this:

  reserve (10);
  reserve (1);

Then the compiler I<might> be able to optimise out the second call
completely, as it knows that C<< current + 1 > end >> is false and the
call will never be executed.

=item ecb_unreachable ()

This function does nothing itself, except tell the compiler that it will
never be executed. Apart from suppressing a warning in some cases, this
function can be used to implement C<ecb_assume> or similar functionality.

=item ecb_prefetch (addr, rw, locality)

Tells the compiler to try to prefetch memory at the given I<addr>ess
for either reading (I<rw> = 0) or writing (I<rw> = 1). A I<locality> of
C<0> means that there will only be one access later, C<3> means that
the data will likely be accessed very often, and values in between mean
something... in between. The memory pointed to by the address does not
need to be accessible (it could be a null pointer for example), but C<rw>
and C<locality> must be compile-time constants.

This is a statement, not a function: you cannot use it as part of an
expression.

An obvious way to use this is to prefetch some data far away, in a big
array you loop over. This prefetches memory some 128 array elements later,
in the hope that it will be ready when the CPU arrives at that location.

  int sum = 0;

  for (i = 0; i < N; ++i)
    {
      sum += arr [i]
      ecb_prefetch (arr + i + 128, 0, 0);
    }

It's hard to predict how far to prefetch, and most CPUs that can prefetch
are often good enough to predict this kind of behaviour themselves. It
gets more interesting with linked lists, especially when you do some fair
processing on each list element:

  for (node *n = start; n; n = n->next)
    {
      ecb_prefetch (n->next, 0, 0);
      ... do medium amount of work with *n
    }

After processing the node, (part of) the next node might already be in
cache.

=back

=head2 BIT FIDDLING / BIT WIZARDRY

=over

=item bool ecb_big_endian ()

=item bool ecb_little_endian ()

These two functions return true if the byte order is big endian
(most-significant byte first) or little endian (least-significant byte
first) respectively.

On systems that are neither, their return values are unspecified.

=item int ecb_ctz32 (uint32_t x)

=item int ecb_ctz64 (uint64_t x)

=item int ecb_ctz (T x) [C++]

Returns the index of the least significant bit set in C<x> (or
equivalently the number of bits set to 0 before the least significant bit
set), starting from 0. If C<x> is 0 the result is undefined.

For smaller types than C<uint32_t> you can safely use C<ecb_ctz32>.

The overloaded C++ C<ecb_ctz> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

For example:

  ecb_ctz32 (3) = 0
  ecb_ctz32 (6) = 1

=item int ecb_clz32 (uint32_t x)

=item int ecb_clz64 (uint64_t x)

Counts the number of leading zero bits in C<x>. If C<x> is 0 the result is
undefined.

It is often simpler to use one of the C<ecb_ld*> functions instead, whose
result only depends on the value and not the size of the type. This is
also the reason why there is no C++ overload.

For example:

  ecb_clz32 (3) = 30
  ecb_clz32 (6) = 29

=item bool ecb_is_pot32 (uint32_t x)

=item bool ecb_is_pot64 (uint32_t x)

=item bool ecb_is_pot (T x) [C++]

Returns true iff C<x> is a power of two or C<x == 0>.

For smaller types than C<uint32_t> you can safely use C<ecb_is_pot32>.

The overloaded C++ C<ecb_is_pot> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

=item int ecb_ld32 (uint32_t x)

=item int ecb_ld64 (uint64_t x)

=item int ecb_ld64 (T x) [C++]

Returns the index of the most significant bit set in C<x>, or the number
of digits the number requires in binary (so that C<< 2**ld <= x <
2**(ld+1) >>). If C<x> is 0 the result is undefined. A common use case is
to compute the integer binary logarithm, i.e. C<floor (log2 (n))>, for
example to see how many bits a certain number requires to be encoded.

This function is similar to the "count leading zero bits" function, except
that that one returns how many zero bits are "in front" of the number (in
the given data type), while C<ecb_ld> returns how many bits the number
itself requires.

For smaller types than C<uint32_t> you can safely use C<ecb_ld32>.

The overloaded C++ C<ecb_ld> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

=item int ecb_popcount32 (uint32_t x)

=item int ecb_popcount64 (uint64_t x)

=item int ecb_popcount (T x) [C++]

Returns the number of bits set to 1 in C<x>.

For smaller types than C<uint32_t> you can safely use C<ecb_popcount32>.

The overloaded C++ C<ecb_popcount> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

For example:

  ecb_popcount32 (7) = 3
  ecb_popcount32 (255) = 8

=item uint8_t  ecb_bitrev8  (uint8_t  x)

=item uint16_t ecb_bitrev16 (uint16_t x)

=item uint32_t ecb_bitrev32 (uint32_t x)

=item T ecb_bitrev (T x) [C++]

Reverses the bits in x, i.e. the MSB becomes the LSB, MSB-1 becomes LSB+1
and so on.

The overloaded C++ C<ecb_bitrev> function supports C<uint8_t>, C<uint16_t> and C<uint32_t> types.

Example:

   ecb_bitrev8 (0xa7) = 0xea
   ecb_bitrev32 (0xffcc4411) = 0x882233ff

=item T ecb_bitrev (T x) [C++]

Overloaded C++ bitrev function.

C<T> must be one of C<uint8_t>, C<uint16_t> or C<uint32_t>.

=item uint32_t ecb_bswap16 (uint32_t x)

=item uint32_t ecb_bswap32 (uint32_t x)

=item uint64_t ecb_bswap64 (uint64_t x)

=item T ecb_bswap (T x)

These functions return the value of the 16-bit (32-bit, 64-bit) value
C<x> after reversing the order of bytes (0x11223344 becomes 0x44332211 in
C<ecb_bswap32>).

The overloaded C++ C<ecb_bswap> function supports C<uint8_t>, C<uint16_t>,
C<uint32_t> and C<uint64_t> types.

=item uint8_t  ecb_rotl8  (uint8_t  x, unsigned int count)

=item uint16_t ecb_rotl16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotl32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotl64 (uint64_t x, unsigned int count)

=item uint8_t  ecb_rotr8  (uint8_t  x, unsigned int count)

=item uint16_t ecb_rotr16 (uint16_t x, unsigned int count)

=item uint32_t ecb_rotr32 (uint32_t x, unsigned int count)

=item uint64_t ecb_rotr64 (uint64_t x, unsigned int count)

These two families of functions return the value of C<x> after rotating
all the bits by C<count> positions to the right (C<ecb_rotr>) or left
(C<ecb_rotl>). There are no restrictions on the value C<count>, i.e. both
zero and values equal or larger than the word width work correctly. Also,
notwithstanding C<count> being unsigned, negative numbers work and shift
to the opposite direction.

Current GCC/clang versions understand these functions and usually compile
them to "optimal" code (e.g. a single C<rol> or a combination of C<shld>
on x86).

=item T ecb_rotl (T x, unsigned int count) [C++]

=item T ecb_rotr (T x, unsigned int count) [C++]

Overloaded C++ rotl/rotr functions.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

=item uint_fast8_t  ecb_gray_encode8  (uint_fast8_t  b)

=item uint_fast16_t ecb_gray_encode16 (uint_fast16_t b)

=item uint_fast32_t ecb_gray_encode32 (uint_fast32_t b)

=item uint_fast64_t ecb_gray_encode64 (uint_fast64_t b)

Encode an unsigned into its corresponding (reflective) gray code - the
kind of gray code meant when just talking about "gray code". These
functions are very fast and all have identical implementation, so there is
no need to use a smaller type, as long as your CPU can handle it natively.

=item T ecb_gray_encode (T b) [C++]

Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>.

=item uint_fast8_t  ecb_gray_decode8  (uint_fast8_t  b)

=item uint_fast16_t ecb_gray_decode16 (uint_fast16_t b)

=item uint_fast32_t ecb_gray_decode32 (uint_fast32_t b)

=item uint_fast64_t ecb_gray_decode64 (uint_fast64_t b)

Decode a gray code back into linear index form (the reverse of
C<ecb_gray*_encode>. Unlike the encode functions, the decode functions
have higher time complexity for larger types, so it can pay off to use a
smaller type here.

=item T ecb_gray_decode (T b) [C++]

Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>.

=back

=head2 HILBERT CURVES

These functions deal with (square, pseudo) Hilbert curves. The parameter
I<order> indicates the size of the square and is specified in bits, that
means for order C<8>, the coordinates range from C<0>..C<255>, and the
curve index ranges from C<0>..C<65535>.

The 32 bit variants of these functions map a 32 bit index to two 16 bit
coordinates, stored in a 32 bit variable, where the high order bits are
the x-coordinate, and the low order bits are the y-coordinate, thus,
these functions map 32 bit linear index on the curve to a 32 bit packed
coordinate pair, and vice versa.

The 64 bit variants work similarly.

The I<order> can go from C<1> to C<16> for the 32 bit curve, and C<1> to
C<32> for the 64 bit curve.

When going from one order to the next higher order, these functions
replace the curve segments by smaller versions of the generating shape,
while doubling the size (since they use integer coordinates), which is
what you would expect mathematically. This means that the curve will be
mirrored at the diagonal. If your goal is to simply cover more area while
retaining existing point coordinates you should increase or decrease the
I<order> by C<2> or, in the case of C<ecb_hilbert2d_index_to_coord>,
simply specify the maximum I<order> of C<16> or C<32>, respectively, as
these are constant-time.

=over

=item uint32_t ecb_hilbert2d_index_to_coord32 (int order, uint32_t index)

=item uint64_t ecb_hilbert2d_index_to_coord64 (int order, uint64_t index)

Map a point on a pseudo Hilbert curve from its linear distance from the
origin on the curve to a x|y coordinate pair. The result is a packed
coordinate pair, to get the actual x and < coordinates, you could do
something like this:

   uint32_t xy = ecb_hilbert2d_index_to_coord32 (16, 255);
   uint16_t x = xy >> 16;
   uint16_t y = xy & 0xffffU;

   uint64_t xy = ecb_hilbert2d_index_to_coord64 (32, 255);
   uint32_t x = xy >> 32;
   uint32_t y = xy & 0xffffffffU;

These functions work in constant time, so for many applications it is
preferable to simply hard-code the order to the maximum (C<16> or C<32>).

This (production-ready, i.e. never run) example generates an SVG image of
an order 8 pseudo Hilbert curve:

   printf ("<svg xmlns='http://www.w3.org/2000/svg' width='%d' height='%d'>\n", 64 * 8, 64 * 8);
   printf ("<g transform='translate(4) scale(8)' stroke-width='0.25' stroke='black'>\n");
   for (uint32_t i = 0; i < 64*64 - 1; ++i)
     {
       uint32_t p1 = ecb_hilbert2d_index_to_coord32 (6, i    );
       uint32_t p2 = ecb_hilbert2d_index_to_coord32 (6, i + 1);
       printf ("<line x1='%d' y1='%d' x2='%d' y2='%d'/>\n",
         p1 >> 16, p1 & 0xffff,
         p2 >> 16, p2 & 0xffff);
     }
   printf ("</g>\n");
   printf ("</svg>\n");

=item uint32_t ecb_hilbert2d_coord_to_index32 (int order, uint32_t xy)

=item uint64_t ecb_hilbert2d_coord_to_index64 (int order, uint64_t xy)

The reverse of C<ecb_hilbert2d_index_to_coord> - map a packed pair of
coordinates to their linear index on the pseudo Hilbert curve of order
I<order>.

They are an exact inverse of the C<ecb_hilbert2d_coord_to_index> functions
for the same I<order>:

   assert (
      u == ecb_hilbert2d_coord_to_index (32,
             ecb_hilbert2d_index_to_coord32 (32,
               u)));

Packing coordinates is done the same way, as well, from I<x> and I<y>:

   uint32_t xy = ((uint32_t)x << 16) | y; // for ecb_hilbert2d_coord_to_index32
   uint64_t xy = ((uint64_t)x << 32) | y; // for ecb_hilbert2d_coord_to_index64

Unlike C<ecb_hilbert2d_coord_to_index>, these functions are O(I<order>),
so it is preferable to use the lowest possible order.

=back

=head2 BIT MIXING, HASHING

Sometimes you have an integer and want to distribute its bits well, for
example, to use it as a hash in a hash table. A common example is pointer
values, which often only have a limited range (e.g. low and high bits are
often zero).

The following functions try to mix the bits to get a good bias-free
distribution. They were mainly made for pointers, but the underlying
integer functions are exposed as well.

As an added benefit, the functions are reversible, so if you find it
convenient to store only the hash value, you can recover the original
pointer from the hash ("unmix"), as long as your pointers are 32 or 64 bit
(if this isn't the case on your platform, drop us a note and we will add
functions for other bit widths).

The unmix functions are very slightly slower than the mix functions, so
it is equally very slightly preferable to store the original values wehen
convenient.

The underlying algorithm if subject to change, so currently these
functions are not suitable for persistent hash tables, as their result
value can change between different versions of libecb.

=over

=item uintptr_t ecb_ptrmix (void *ptr)

Mixes the bits of a pointer so the result is suitable for hash table
lookups. In other words, this hashes the pointer value.

=item uintptr_t ecb_ptrmix (T *ptr) [C++]

Overload the C<ecb_ptrmix> function to work for any pointer in C++.

=item void *ecb_ptrunmix (uintptr_t v)

Unmix the hash value into the original pointer. This only works as long
as the hash value is not truncated, i.e. you used C<uintptr_t> (or
equivalent) throughout to store it.

=item T *ecb_ptrunmix<T> (uintptr_t v) [C++]

The somewhat less useful template version of C<ecb_ptrunmix> for
C++. Example:

   sometype *myptr;
   uintptr_t hash = ecb_ptrmix (myptr);
   sometype *orig = ecb_ptrunmix<sometype> (hash);

=item uint32_t ecb_mix32 (uint32_t v)

=item uint64_t ecb_mix64 (uint64_t v)

Sometimes you don't have a pointer but an integer whose values are very
badly distributed. In this case you can use these integer versions of the
mixing function. No C++ template is provided currently.

=item uint32_t ecb_unmix32 (uint32_t v)

=item uint64_t ecb_unmix64 (uint64_t v)

The reverse of the C<ecb_mix> functions - they take a mixed/hashed value
and recover the original value.

=back

=head2 HOST ENDIANNESS CONVERSION

=over

=item uint_fast16_t ecb_be_u16_to_host (uint_fast16_t v)

=item uint_fast32_t ecb_be_u32_to_host (uint_fast32_t v)

=item uint_fast64_t ecb_be_u64_to_host (uint_fast64_t v)

=item uint_fast16_t ecb_le_u16_to_host (uint_fast16_t v)

=item uint_fast32_t ecb_le_u32_to_host (uint_fast32_t v)

=item uint_fast64_t ecb_le_u64_to_host (uint_fast64_t v)

Convert an unsigned 16, 32 or 64 bit value from big or little endian to host byte order.

The naming convention is C<ecb_>(C<be>|C<le>)C<_u>C<16|32|64>C<_to_host>,
where C<be> and C<le> stand for big endian and little endian, respectively.

=item uint_fast16_t ecb_host_to_be_u16 (uint_fast16_t v)

=item uint_fast32_t ecb_host_to_be_u32 (uint_fast32_t v)

=item uint_fast64_t ecb_host_to_be_u64 (uint_fast64_t v)

=item uint_fast16_t ecb_host_to_le_u16 (uint_fast16_t v)

=item uint_fast32_t ecb_host_to_le_u32 (uint_fast32_t v)

=item uint_fast64_t ecb_host_to_le_u64 (uint_fast64_t v)

Like above, but converts I<from> host byte order to the specified
endianness.

=back

In C++ the following additional template functions are supported:

=over

=item T ecb_be_to_host (T v)

=item T ecb_le_to_host (T v)

=item T ecb_host_to_be (T v)

=item T ecb_host_to_le (T v)

=back

These functions work like their C counterparts, above, but use templates,
which make them useful in generic code.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>
(so unlike their C counterparts, there is a version for C<uint8_t>, which
again can be useful in generic code).

=head2 UNALIGNED LOAD/STORE

These function load or store unaligned multi-byte values.

=over

=item uint_fast16_t ecb_peek_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_u64_u (const void *ptr)

These functions load an unaligned, unsigned 16, 32 or 64 bit value from
memory.

=item uint_fast16_t ecb_peek_be_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_be_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_be_u64_u (const void *ptr)

=item uint_fast16_t ecb_peek_le_u16_u (const void *ptr)

=item uint_fast32_t ecb_peek_le_u32_u (const void *ptr)

=item uint_fast64_t ecb_peek_le_u64_u (const void *ptr)

Like above, but additionally convert from big endian (C<be>) or little
endian (C<le>) byte order to host byte order while doing so.

=item ecb_poke_u16_u (void *ptr, uint16_t v)

=item ecb_poke_u32_u (void *ptr, uint32_t v)

=item ecb_poke_u64_u (void *ptr, uint64_t v)

These functions store an unaligned, unsigned 16, 32 or 64 bit value to
memory.

=item ecb_poke_be_u16_u (void *ptr, uint_fast16_t v)

=item ecb_poke_be_u32_u (void *ptr, uint_fast32_t v)

=item ecb_poke_be_u64_u (void *ptr, uint_fast64_t v)

=item ecb_poke_le_u16_u (void *ptr, uint_fast16_t v)

=item ecb_poke_le_u32_u (void *ptr, uint_fast32_t v)

=item ecb_poke_le_u64_u (void *ptr, uint_fast64_t v)

Like above, but additionally convert from host byte order to big endian
(C<be>) or little endian (C<le>) byte order while doing so.

=back

In C++ the following additional template functions are supported:

=over

=item T ecb_peek<T>      (const void *ptr)

=item T ecb_peek_be<T>   (const void *ptr)

=item T ecb_peek_le<T>   (const void *ptr)

=item T ecb_peek_u<T>    (const void *ptr)

=item T ecb_peek_be_u<T> (const void *ptr)

=item T ecb_peek_le_u<T> (const void *ptr)

Similarly to their C counterparts, these functions load an unsigned 8, 16,
32 or 64 bit value from memory, with optional conversion from big/little
endian.

Since the type cannot be deduced, it has to be specified explicitly, e.g.

   uint_fast16_t v = ecb_peek<uint16_t> (ptr);

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

Unlike their C counterparts, these functions support 8 bit quantities
(C<uint8_t>) and also have an aligned version (without the C<_u> prefix),
all of which hopefully makes them more useful in generic code.

=item ecb_poke      (void *ptr, T v)

=item ecb_poke_be   (void *ptr, T v)

=item ecb_poke_le   (void *ptr, T v)

=item ecb_poke_u    (void *ptr, T v)

=item ecb_poke_be_u (void *ptr, T v)

=item ecb_poke_le_u (void *ptr, T v)

Again, similarly to their C counterparts, these functions store an
unsigned 8, 16, 32 or 64 bit value to memory, with optional conversion to
big/little endian.

C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>.

Unlike their C counterparts, these functions support 8 bit quantities
(C<uint8_t>) and also have an aligned version (without the C<_u> prefix),
all of which hopefully makes them more useful in generic code.

=back

=head2 FAST INTEGER TO STRING

Libecb defines a set of very fast integer to decimal string (or integer
to ASCII, short C<i2a>) functions.  These work by converting the integer
to a fixed point representation and then successively multiplying out
the topmost digits. Unlike some other, also very fast, libraries, ecb's
algorithm should be completely branchless per digit, and does not rely on
the presence of special CPU functions (such as C<clz>).

There is a high level API that takes an C<int32_t>, C<uint32_t>,
C<int64_t> or C<uint64_t> as argument, and a low-level API, which is
harder to use but supports slightly more formatting options.

=head3 HIGH LEVEL API

The high level API consists of four functions, one each for C<int32_t>,
C<uint32_t>, C<int64_t> and C<uint64_t>:

Example:

   char buf[ECB_I2A_MAX_DIGITS + 1];
   char *end = ecb_i2a_i32 (buf, 17262);
   *end = 0;
   // buf now contains "17262"

=over

=item ECB_I2A_I32_DIGITS (=11)

=item char *ecb_i2a_u32 (char *ptr, uint32_t value)

Takes an C<uint32_t> I<value> and formats it as a decimal number starting
at I<ptr>, using at most C<ECB_I2A_I32_DIGITS> characters. Returns a
pointer to just after the generated string, where you would normally put
the terminating C<0> character. This function outputs the minimum number
of digits.

=item ECB_I2A_U32_DIGITS (=10)

=item char *ecb_i2a_i32 (char *ptr, int32_t value)

Same as C<ecb_i2a_u32>, but formats a C<int32_t> value, including a minus
sign if needed.

=item ECB_I2A_I64_DIGITS (=20)

=item char *ecb_i2a_u64 (char *ptr, uint64_t value)

=item ECB_I2A_U64_DIGITS (=21)

=item char *ecb_i2a_i64 (char *ptr, int64_t value)

Similar to their 32 bit counterparts, these take a 64 bit argument.

=item ECB_I2A_MAX_DIGITS (=21)

Instead of using a type specific length macro, you can just use
C<ECB_I2A_MAX_DIGITS>, which is good enough for any C<ecb_i2a> function.

=back

=head3 LOW-LEVEL API

The functions above use a number of low-level APIs which have some strict
limitations, but can be used as building blocks (studying C<ecb_i2a_i32>
and related functions is recommended).

There are three families of functions: functions that convert a number
to a fixed number of digits with leading zeroes (C<ecb_i2a_0N>, C<0>
for "leading zeroes"), functions that generate up to N digits, skipping
leading zeroes (C<_N>), and functions that can generate more digits, but
the leading digit has limited range (C<_xN>).

None of the functions deal with negative numbers.

Example: convert an IP address in an C<uint32_t> into dotted-quad:

   uint32_t ip = 0x0a000164; // 10.0.1.100
   char ips[3 * 4 + 3 + 1];
   char *ptr = ips;
   ptr = ecb_i2a_3 (ptr,  ip >> 24        ); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr, (ip >> 16) & 0xff); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr, (ip >>  8) & 0xff); *ptr++ = '.';
   ptr = ecb_i2a_3 (ptr,  ip        & 0xff); *ptr++ = 0;
   printf ("ip: %s\n", ips); // prints "ip: 10.0.1.100"

=over

=item char *ecb_i2a_02  (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_03  (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_04  (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_05  (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_06  (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_07  (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_08  (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_09  (char *ptr, uint32_t value) // 64 bit

The C<< ecb_i2a_0I<N> >> functions take an unsigned I<value> and convert
them to exactly I<N> digits, returning a pointer to the first character
after the digits. The I<value> must be in range. The functions marked with
I<32 bit> do their calculations internally in 32 bit, the ones marked with
I<64 bit> internally use 64 bit integers, which might be slow on 32 bit
architectures (the high level API decides on 32 vs. 64 bit versions using
C<ECB_64BIT_NATIVE>).

=item char *ecb_i2a_2   (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_3   (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_4   (char *ptr, uint32_t value) // 32 bit

=item char *ecb_i2a_5   (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_6   (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_7   (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_8   (char *ptr, uint32_t value) // 64 bit

=item char *ecb_i2a_9   (char *ptr, uint32_t value) // 64 bit

Similarly, the C<< ecb_i2a_I<N> >> functions take an unsigned I<value>
and convert them to at most I<N> digits, suppressing leading zeroes, and
returning a pointer to the first character after the digits.

=item ECB_I2A_MAX_X5 (=59074)

=item char *ecb_i2a_x5  (char *ptr, uint32_t value) // 32 bit

=item ECB_I2A_MAX_X10 (=2932500665)

=item char *ecb_i2a_x10 (char *ptr, uint32_t value) // 64 bit

The C<< ecb_i2a_xI<N> >> functions are similar to the C<< ecb_i2a_I<N> >>
functions, but they can generate one digit more, as long as the number
is within range, which is given by the symbols C<ECB_I2A_MAX_X5> (almost
16 bit range) and C<ECB_I2A_MAX_X10> (a bit more than 31 bit range),
respectively.

For example, the digit part of a 32 bit signed integer just fits into the
C<ECB_I2A_MAX_X10> range, so while C<ecb_i2a_x10> cannot convert a 10
digit number, it can convert all 32 bit signed numbers. Sadly, it's not
good enough for 32 bit unsigned numbers.

=back

=head2 FLOATING POINT FIDDLING

=over

=item ECB_INFINITY [-UECB_NO_LIBM]

Evaluates to positive infinity if supported by the platform, otherwise to
a truly huge number.

=item ECB_NAN [-UECB_NO_LIBM]

Evaluates to a quiet NAN if supported by the platform, otherwise to
C<ECB_INFINITY>.

=item float ecb_ldexpf (float x, int exp) [-UECB_NO_LIBM]

Same as C<ldexpf>, but always available.

=item uint32_t ecb_float_to_binary16  (float  x) [-UECB_NO_LIBM]

=item uint32_t ecb_float_to_binary32  (float  x) [-UECB_NO_LIBM]

=item uint64_t ecb_double_to_binary64 (double x) [-UECB_NO_LIBM]

These functions each take an argument in the native C<float> or C<double>
type and return the IEEE 754 bit representation of it (binary16/half,
binary32/single or binary64/double precision).

The bit representation is just as IEEE 754 defines it, i.e. the sign bit
will be the most significant bit, followed by exponent and mantissa.

This function should work even when the native floating point format isn't
IEEE compliant, of course at a speed and code size penalty, and of course
also within reasonable limits (it tries to convert NaNs, infinities and
denormals, but will likely convert negative zero to positive zero).

On all modern platforms (where C<ECB_STDFP> is true), the compiler should
be able to completely optimise away the 32 and 64 bit functions.

These functions can be helpful when serialising floats to the network - you
can serialise the return value like a normal uint16_t/uint32_t/uint64_t.

Another use for these functions is to manipulate floating point values
directly.

Silly example: toggle the sign bit of a float.

   /* On gcc-4.7 on amd64, */
   /* this results in a single add instruction to toggle the bit, and 4 extra */
   /* instructions to move the float value to an integer register and back. */

   x = ecb_binary32_to_float (ecb_float_to_binary32 (x) ^ 0x80000000U)

=item float  ecb_binary16_to_float  (uint16_t x) [-UECB_NO_LIBM]

=item float  ecb_binary32_to_float  (uint32_t x) [-UECB_NO_LIBM]

=item double ecb_binary64_to_double (uint64_t x) [-UECB_NO_LIBM]

The reverse operation of the previous function - takes the bit
representation of an IEEE binary16, binary32 or binary64 number (half,
single or double precision) and converts it to the native C<float> or
C<double> format.

This function should work even when the native floating point format isn't
IEEE compliant, of course at a speed and code size penalty, and of course
also within reasonable limits (it tries to convert normals and denormals,
and might be lucky for infinities, and with extraordinary luck, also for
negative zero).

On all modern platforms (where C<ECB_STDFP> is true), the compiler should
be able to optimise away this function completely.

=item uint16_t ecb_binary32_to_binary16 (uint32_t x)

=item uint32_t ecb_binary16_to_binary32 (uint16_t x)

Convert a IEEE binary32/single precision to binary16/half format, and vice
versa, handling all details (round-to-nearest-even, subnormals, infinity
and NaNs) correctly.

These are functions are available under C<-DECB_NO_LIBM>, since
they do not rely on the platform floating point format. The
C<ecb_float_to_binary16> and C<ecb_binary16_to_float> functions are
usually what you want.

=back

=head2 ARITHMETIC

=over

=item x = ecb_mod (m, n)

Returns C<m> modulo C<n>, which is the same as the positive remainder
of the division operation between C<m> and C<n>, using floored
division. Unlike the C remainder operator C<%>, this function ensures that
the return value is always positive and that the two numbers I<m> and
I<m' = m + i * n> result in the same value modulo I<n> - in other words,
C<ecb_mod> implements the mathematical modulo operation, which is missing
in the language.

C<n> must be strictly positive (i.e. C<< >= 1 >>), while C<m> must be
negatable, that is, both C<m> and C<-m> must be representable in its
type (this typically excludes the minimum signed integer value, the same
limitation as for C</> and C<%> in C).

Current GCC/clang versions compile this into an efficient branchless
sequence on almost all CPUs.

For example, when you want to rotate forward through the members of an
array for increasing C<m> (which might be negative), then you should use
C<ecb_mod>, as the C<%> operator might give either negative results, or
change direction for negative values:

   for (m = -100; m <= 100; ++m)
     int elem = myarray [ecb_mod (m, ecb_array_length (myarray))];

=item x = ecb_div_rd (val, div)

=item x = ecb_div_ru (val, div)

Returns C<val> divided by C<div> rounded down or up, respectively.
C<val> and C<div> must have integer types and C<div> must be strictly
positive. Note that these functions are implemented with macros in C
and with function templates in C++.

=back

=head2 UTILITY

=over

=item element_count = ecb_array_length (name)

Returns the number of elements in the array C<name>. For example:

  int primes[] = { 2, 3, 5, 7, 11 };
  int sum = 0;

  for (i = 0; i < ecb_array_length (primes); i++)
    sum += primes [i];

=back

=head2 SYMBOLS GOVERNING COMPILATION OF ECB.H ITSELF

These symbols need to be defined before including F<ecb.h> the first time.

=over

=item ECB_NO_THREADS

If F<ecb.h> is never used from multiple threads, then this symbol can
be defined, in which case memory fences (and similar constructs) are
completely removed, leading to more efficient code and fewer dependencies.

Setting this symbol to a true value implies C<ECB_NO_SMP>.

=item ECB_NO_SMP

The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from
multiple threads, but never concurrently (e.g. if the system the program
runs on has only a single CPU with a single core, no hyper-threading and so
on), then this symbol can be defined, leading to more efficient code and
fewer dependencies.

=item ECB_NO_LIBM

When defined to C<1>, do not export any functions that might introduce
dependencies on the math library (usually called F<-lm>) - these are
marked with [-UECB_NO_LIBM].

=back

=head1 UNDOCUMENTED FUNCTIONALITY

F<ecb.h> is full of undocumented functionality as well, some of which is
intended to be internal-use only, some of which we forgot to document, and
some of which we hide because we are not sure we will keep the interface
stable.

While you are welcome to rummage around and use whatever you find useful
(we don't want to stop you), keep in mind that we will change undocumented
functionality in incompatible ways without thinking twice, while we are
considerably more conservative with documented things.

=head1 AUTHORS

C<libecb> is designed and maintained by:

   Emanuele Giaquinta <e.giaquinta@glauco.it>
   Marc Alexander Lehmann <schmorp@schmorp.de>
Revision:	1.107
Committed:	Fri Mar 25 15:28:08 2022 UTC (2 years, 4 months ago) by root
Branch:	MAIN
CVS Tags:	HEAD
Changes since 1.106:	+8 -8 lines
Log Message:	* empty log message *