--- libecb/ecb.pod 2021/07/31 16:13:30 1.94
+++ libecb/ecb.pod 2021/11/22 17:15:50 1.101
@@ -734,7 +734,9 @@
 These two families of functions return the value of C after rotating
 all the bits by C positions to the right (C) or left
 (C). There are no restrictions on the value C, i.e. both
-zero and values equal or larger than the word width work correctly.
+zero and values equal or larger than the word width work correctly. Also,
+notwithstanding C being unsigned, negative numbers work and shift
+to the opposite direction.
 
 Current GCC/clang versions understand these functions and usually compile
 them to "optimal" code (e.g. a single C or a combination of C
@@ -750,6 +752,74 @@
 
 =back
 
+=head2 BIT MIXING, HASHING
+
+Sometimes you have an integer and want to distribute its bits well, for
+example, to use it as a hash in a hashtable. A common example is pointer
+values, which often only have a limited range (e.g. low and high bits are
+often zero).
+
+The following functions try to mix the bits to get a good bias-free
+distribution. They were mainly made for pointers, but the underlying
+integer functions are exposed as well.
+
+As an added benefit, the functions are reversible, so if you find it
+convenient to store only the hash value, you can recover the original
+pointer from the hash ("unmix"), as long as your pointers are 32 or 64 bit
+(if this isn't the case on your platform, drop us a note and we will add
+functions for other bit widths).
+
+The unmix functions are very slightly slower than the mix functions, so
+it is equally very slightly preferable to store the original values when
+convenient.
+
+The underlying algorithm is subject to change, so currently these
+functions are not suitable for persistent hash tables, as their result
+value can change between different versions of libecb.
+
+=over
+
+=item uintptr_t ecb_ptrmix (void *ptr)
+
+Mixes the bits of a pointer so the result is suitable for hash table
+lookups. In other words, this hashes the pointer value.
+
+=item uintptr_t ecb_ptrmix (T *ptr) [C++]
+
+Overload the C function to work for any pointer in C++.
+
+=item void *ecb_ptrunmix (uintptr_t v)
+
+Unmix the hash value into the original pointer. This only works as long
+as the hash value is not truncated, i.e. you used C (or
+equivalent) throughout to store it.
+
+=item T *ecb_ptrunmix (uintptr_t v) [C++]
+
+The somewhat less useful template version of C for
+C++. Example:
+
+ sometype *myptr;
+ uintptr_t hash = ecb_ptrmix (myptr);
+ sometype *orig = ecb_ptrunmix (hash);
+
+=item uint32_t ecb_mix32 (uint32_t v)
+
+=item uint64_t ecb_mix64 (uint64_t v)
+
+Sometimes you don't have a pointer but an integer whose values are very
+badly distributed. In this case you can use these integer versions of the
+mixing function. No C++ template is provided currently.
+
+=item uint32_t ecb_unmix32 (uint32_t v)
+
+=item uint64_t ecb_unmix64 (uint64_t v)
+
+The reverse of the C functions - they take a mixed/hashed value
+and recover the original value.
+
+=back
+
 =head2 HOST ENDIANNESS CONVERSION
 
 =over
@@ -975,7 +1045,7 @@
 
 =item ECB_I2A_MAX_DIGITS (=21)
 
-Instead of using a type specific length macro, youi can just use
+Instead of using a type specific length macro, you can just use
 C, which is good enough for any C function.
 
 =back
@@ -983,7 +1053,7 @@
 =head3 LOW-LEVEL API
 
 The functions above use a number of low-level APIs which have some strict
-limitations, but can be used as building blocks (study of C
+limitations, but can be used as building blocks (studying C
 and related functions is recommended).
 
 There are three families of functions: functions that convert a number
@@ -1023,7 +1093,7 @@
 
 =item char *ecb_i2a_09 (char *ptr, uint32_t value) // 64 bit
 
-The C<< ecb_i2a_0I > functions take an unsigned I and convert
+The C<< ecb_i2a_0I >> functions take an unsigned I and convert
 them to exactly I digits, returning a pointer to the first character
 after the digits. The I must be in range. The functions marked with
 I<32 bit> do their calculations internally in 32 bit, the ones marked with
@@ -1047,7 +1117,7 @@
 
 =item char *ecb_i2a_9 (char *ptr, uint32_t value) // 64 bit
 
-Similarly, the C<< ecb_i2a_I > functions take an unsigned I
+Similarly, the C<< ecb_i2a_I >> functions take an unsigned I
 and convert them to at most I digits, suppressing leading zeroes, and
 returning a pointer to the first character after the digits.
 
@@ -1059,7 +1129,7 @@
 
 =item char *ecb_i2a_x10 (char *ptr, uint32_t value) // 64 bit
 
-The C<< ecb_i2a_xI >> functions are similar to the C<< ecb_i2a_I >
+The C<< ecb_i2a_xI >> functions are similar to the C<< ecb_i2a_I >>
 functions, but they can generate one digit more, as long as the number
 is within range, which is given by the symbols C (almost 16
 bit range) and C (a bit more than 31 bit range),
@@ -1109,7 +1179,7 @@
 denormals, but will likely convert negative zero to positive zero).
 
 On all modern platforms (where C is true), the compiler should
-be able to optimise away this function completely.
+be able to completely optimise away the 32 and 64 bit functions.
 
 These functions can be helpful when serialising floats to the network - you
 can serialise the return value like a normal uint16_t/uint32_t/uint64_t.
@@ -1255,7 +1325,7 @@
 stable.
 
 While you are welcome to rummage around and use whatever you find useful
-(we can't stop you), keep in mind that we will change undocumented
+(we don't want to stop you), keep in mind that we will change undocumented
 functionality in incompatible ways without thinking twice, while we are
 considerably more conservative with documented things.
 
@@ -1265,5 +1335,3 @@
 
 Emanuele Giaquinta
 Marc Alexander Lehmann
-
-
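
Note on the BIT MIXING, HASHING section added by this diff: the text says the mix functions are reversible but does not say how. The sketch below shows one common way a 32 bit mixer can be made reversible, by combining xor-shifts with a multiplication by an odd constant. It is an assumption for illustration only and is not libecb's algorithm (which the text says is subject to change). All names (sketch_mix32, sketch_unmix32, sketch_inv32, SKETCH_MUL) and the constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical multiplier: any odd constant is invertible modulo 2**32. */
#define SKETCH_MUL 0x9e3779b1U

/* Multiplicative inverse of an odd number modulo 2**32 (Newton iteration).
 * Assumes uint32_t arithmetic wraps modulo 2**32 (int no wider than 32 bit). */
static uint32_t
sketch_inv32 (uint32_t a)
{
  uint32_t x = a; /* a * a == 1 (mod 8) for odd a, so 3 low bits are right */
  int i;

  for (i = 0; i < 5; ++i)
    x *= 2 - a * x; /* each step doubles the number of correct low bits */

  return x;
}

/* Forward direction: xor-shift, multiply, xor-shift. */
static uint32_t
sketch_mix32 (uint32_t v)
{
  v ^= v >> 16;
  v *= SKETCH_MUL;
  v ^= v >> 16;
  return v;
}

/* Reverse direction: undo the steps of sketch_mix32 in opposite order. */
static uint32_t
sketch_unmix32 (uint32_t v)
{
  v ^= v >> 16;                  /* v ^= v >> 16 is its own inverse at 32 bit */
  v *= sketch_inv32 (SKETCH_MUL); /* undo the multiplication */
  v ^= v >> 16;
  return v;
}

/* Round-trip check over a few sample values. */
static void
sketch_selftest (void)
{
  static const uint32_t samples[] = { 0, 1, 0x1000, 0xdeadbeefU, 0xffffffffU };
  unsigned i;

  for (i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    assert (sketch_unmix32 (sketch_mix32 (samples[i])) == samples[i]);
}
```

Calling sketch_selftest () from a main function exercises the round trip. The reverse direction has to compute (or store) the inverse multiplier, which is one plausible reason an unmix function could be slightly slower than the corresponding mix function.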