… | |
… | |
12 | |
12 | |
13 | http://software.schmorp.de/pkg/libecb |
13 | http://software.schmorp.de/pkg/libecb |
14 | |
14 | |
15 | It mainly provides a number of wrappers around many compiler built-ins, |
15 | It mainly provides a number of wrappers around many compiler built-ins, |
16 | together with replacement functions for other compilers. In addition |
16 | together with replacement functions for other compilers. In addition |
17 | to this, it provides a number of other lowlevel C utilities, such as |
17 | to this, it provides a number of other low-level C utilities, such as |
18 | endianness detection, byte swapping or bit rotations. |
18 | endianness detection, byte swapping or bit rotations. |
19 | |
19 | |
20 | Or in other words, things that should be built into any standard C |
20 | Or in other words, things that should be built into any standard C |
21 | system, but aren't, implemented as efficient as possible with GCC (clang, |
21 | system, but aren't, implemented as efficient as possible with GCC (clang, |
22 | msvc...), and still correct with other compilers. |
22 | MSVC...), and still correct with other compilers. |
23 | |
23 | |
24 | More might come. |
24 | More might come. |
25 | |
25 | |
26 | =head2 ABOUT THE HEADER |
26 | =head2 ABOUT THE HEADER |
27 | |
27 | |
… | |
… | |
56 | is usually implemented as a macro. Specifically, a "bool" in this manual |
56 | is usually implemented as a macro. Specifically, a "bool" in this manual |
57 | refers to any kind of boolean value, not a specific type. |
57 | refers to any kind of boolean value, not a specific type. |
58 | |
58 | |
59 | =head2 TYPES / TYPE SUPPORT |
59 | =head2 TYPES / TYPE SUPPORT |
60 | |
60 | |
61 | ecb.h makes sure that the following types are defined (in the expected way): |
61 | F<ecb.h> makes sure that the following types are defined (in the expected way): |
62 | |
62 | |
63 | int8_t uint8_ |
63 | int8_t uint8_ |
64 | int16_t uint16_t |
64 | int16_t uint16_t |
65 | int32_t uint32_ |
65 | int32_t uint32_ |
66 | int64_t uint64_t |
66 | int64_t uint64_t |
… | |
… | |
170 | |
170 | |
171 | Evaluates to a true value (suitable for both preprocessor and C code |
171 | Evaluates to a true value (suitable for both preprocessor and C code |
172 | testing) if 64 bit integer types on this architecture are evaluated |
172 | testing) if 64 bit integer types on this architecture are evaluated |
173 | "natively", that is, with similar speeds as 32 bit integers. While 64 bit |
173 | "natively", that is, with similar speeds as 32 bit integers. While 64 bit |
174 | integer support is very common (and in fact required by libecb), 32 bit |
174 | integer support is very common (and in fact required by libecb), 32 bit |
175 | cpus have to emulate operations on them, so you might want to avoid them. |
175 | CPUs have to emulate operations on them, so you might want to avoid them. |
176 | |
176 | |
177 | =item ECB_AMD64, ECB_AMD64_X32 |
177 | =item ECB_AMD64, ECB_AMD64_X32 |
178 | |
178 | |
179 | These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32 |
179 | These two macros are defined to C<1> on the x86_64/amd64 ABI and the X32 |
180 | ABI, respectively, and undefined elsewhere. |
180 | ABI, respectively, and undefined elsewhere. |
… | |
… | |
274 | |
274 | |
275 | Expands either to (a compiler-specific equivalent of) C<static inline> or |
275 | Expands either to (a compiler-specific equivalent of) C<static inline> or |
276 | to just C<static>, if inline isn't supported. It should be used to declare |
276 | to just C<static>, if inline isn't supported. It should be used to declare |
277 | functions that should be inlined, for code size or speed reasons. |
277 | functions that should be inlined, for code size or speed reasons. |
278 | |
278 | |
279 | Example: inline this function, it surely will reduce codesize. |
279 | Example: inline this function, it surely will reduce code size. |
280 | |
280 | |
281 | ecb_inline int |
281 | ecb_inline int |
282 | negmul (int a, int b) |
282 | negmul (int a, int b) |
283 | { |
283 | { |
284 | return - (a * b); |
284 | return - (a * b); |
… | |
… | |
384 | speed-critical times, and keeping it in the cache might be a waste of said |
384 | speed-critical times, and keeping it in the cache might be a waste of said |
385 | cache. |
385 | cache. |
386 | |
386 | |
387 | In addition to placing cold functions together (or at least away from hot |
387 | In addition to placing cold functions together (or at least away from hot |
388 | functions), this knowledge can be used in other ways, for example, the |
388 | functions), this knowledge can be used in other ways, for example, the |
389 | function will be optimised for size, as opposed to speed, and codepaths |
389 | function will be optimised for size, as opposed to speed, and code paths |
390 | leading to calls to those functions can automatically be marked as if |
390 | leading to calls to those functions can automatically be marked as if |
391 | C<ecb_expect_false> had been used to reach them. |
391 | C<ecb_expect_false> had been used to reach them. |
392 | |
392 | |
393 | Good examples for such functions would be error reporting functions, or |
393 | Good examples for such functions would be error reporting functions, or |
394 | functions only called in exceptional or rare cases. |
394 | functions only called in exceptional or rare cases. |
… | |
… | |
548 | never be executed. Apart from suppressing a warning in some cases, this |
548 | never be executed. Apart from suppressing a warning in some cases, this |
549 | function can be used to implement C<ecb_assume> or similar functionality. |
549 | function can be used to implement C<ecb_assume> or similar functionality. |
550 | |
550 | |
551 | =item ecb_prefetch (addr, rw, locality) |
551 | =item ecb_prefetch (addr, rw, locality) |
552 | |
552 | |
553 | Tells the compiler to try to prefetch memory at the given C<addr>ess |
553 | Tells the compiler to try to prefetch memory at the given I<addr>ess |
554 | for either reading (C<rw> = 0) or writing (C<rw> = 1). A C<locality> of |
554 | for either reading (I<rw> = 0) or writing (I<rw> = 1). A I<locality> of |
555 | C<0> means that there will only be one access later, C<3> means that |
555 | C<0> means that there will only be one access later, C<3> means that |
556 | the data will likely be accessed very often, and values in between mean |
556 | the data will likely be accessed very often, and values in between mean |
557 | something... in between. The memory pointed to by the address does not |
557 | something... in between. The memory pointed to by the address does not |
558 | need to be accessible (it could be a null pointer for example), but C<rw> |
558 | need to be accessible (it could be a null pointer for example), but C<rw> |
559 | and C<locality> must be compile-time constants. |
559 | and C<locality> must be compile-time constants. |
… | |
… | |
784 | |
784 | |
785 | Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>. |
785 | Overloaded C++ version of the above, for C<uint{8,16,32,64}_t>. |
786 | |
786 | |
787 | =back |
787 | =back |
788 | |
788 | |
|
|
789 | =head2 HILBERT CURVES |
|
|
790 | |
|
|
791 | These functions deal with (square, pseudo) Hilbert curves. The parameter |
|
|
792 | I<order> indicates the size of the square and is specified in bits, that |
|
|
793 | means for order C<8>, the coordinates range from C<0>..C<255>, and the |
|
|
794 | curve index ranges from C<0>..C<65535>. |
|
|
795 | |
|
|
796 | The 32 bit variants of these functions map a 32 bit index to two 16 bit |
|
|
797 | coordinates, stored in a 32 bit variable, where the high order bits are |
|
|
798 | the x-coordinate, and the low order bits are the y-coordinate, thus, |
|
|
799 | these functions map 32 bit linear index on the curve to a 32 bit packed |
|
|
800 | coordinate pair, and vice versa. |
|
|
801 | |
|
|
802 | The 64 bit variants work similarly. |
|
|
803 | |
|
|
804 | The I<order> can go from C<1> to C<16> for the 32 bit curve, and C<1> to |
|
|
805 | C<32> for the 64 bit curve. |
|
|
806 | |
|
|
807 | When going from one order to the next higher order, these functions |
|
|
808 | replace the curve segments by smaller versions of the generating shape, |
|
|
809 | while doubling the size (since they use integer coordinates), which is |
|
|
810 | what you would expect mathematically. This means that the curve will be |
|
|
811 | mirrored at the diagonal. If your goal is to simply cover more area while |
|
|
812 | retaining existing point coordinates you should increase or decrease the |
|
|
813 | I<order> by C<2> or, in the case of C<ecb_hilbert2d_index_to_coord>, |
|
|
814 | simply specify the maximum I<order> of C<16> or C<32>, respectively, as |
|
|
815 | these are constant-time. |
|
|
816 | |
|
|
817 | =over |
|
|
818 | |
|
|
819 | =item uint32_t ecb_hilbert2d_index_to_coord32 (int order, uint32_t index) |
|
|
820 | |
|
|
821 | =item uint64_t ecb_hilbert2d_index_to_coord64 (int order, uint64_t index) |
|
|
822 | |
|
|
823 | Map a point on a pseudo Hilbert curve from its linear distance from the |
|
|
824 | origin on the curve to a x|y coordinate pair. The result is a packed |
|
|
825 | coordinate pair, to get the actual x and < coordinates, you could do |
|
|
826 | something like this: |
|
|
827 | |
|
|
828 | uint32_t xy = ecb_hilbert2d_index_to_coord32 (16, 255); |
|
|
829 | uint16_t x = xy >> 16; |
|
|
830 | uint16_t y = xy & 0xffffU; |
|
|
831 | |
|
|
832 | uint64_t xy = ecb_hilbert2d_index_to_coord64 (32, 255); |
|
|
833 | uint32_t x = xy >> 32; |
|
|
834 | uint32_t y = xy & 0xffffffffU; |
|
|
835 | |
|
|
836 | These functions work in constant time, so for many applications it is |
|
|
837 | preferable to simply hard-code the order to the maximum (C<16> or C<32>). |
|
|
838 | |
|
|
839 | This (production-ready, i.e. never run) example generates an SVG image of |
|
|
840 | an order 8 pseudo Hilbert curve: |
|
|
841 | |
|
|
842 | printf ("<svg xmlns='http://www.w3.org/2000/svg' width='%d' height='%d'>\n", 64 * 8, 64 * 8); |
|
|
843 | printf ("<g transform='translate(4) scale(8)' stroke-width='0.25' stroke='black'>\n"); |
|
|
844 | for (uint32_t i = 0; i < 64*64 - 1; ++i) |
|
|
845 | { |
|
|
846 | uint32_t p1 = ecb_hilbert2d_index_to_coord32 (6, i ); |
|
|
847 | uint32_t p2 = ecb_hilbert2d_index_to_coord32 (6, i + 1); |
|
|
848 | printf ("<line x1='%d' y1='%d' x2='%d' y2='%d'/>\n", |
|
|
849 | p1 >> 16, p1 & 0xffff, |
|
|
850 | p2 >> 16, p2 & 0xffff); |
|
|
851 | } |
|
|
852 | printf ("</g>\n"); |
|
|
853 | printf ("</svg>\n"); |
|
|
854 | |
|
|
855 | =item uint32_t ecb_hilbert2d_coord_to_index32 (int order, uint32_t xy) |
|
|
856 | |
|
|
857 | =item uint64_t ecb_hilbert2d_coord_to_index64 (int order, uint64_t xy) |
|
|
858 | |
|
|
859 | The reverse of C<ecb_hilbert2d_index_to_coord> - map a packed pair of |
|
|
860 | coordinates to their linear index on the pseudo Hilbert curve of order |
|
|
861 | I<order>. |
|
|
862 | |
|
|
863 | They are an exact inverse of the C<ecb_hilbert2d_coord_to_index> functions |
|
|
864 | for the same I<order>: |
|
|
865 | |
|
|
866 | assert ( |
|
|
867 | u == ecb_hilbert2d_coord_to_index (32, |
|
|
868 | ecb_hilbert2d_index_to_coord32 (32, |
|
|
869 | u))); |
|
|
870 | |
|
|
871 | Packing coordinates is done the same way, as well, from I<x> and I<y>: |
|
|
872 | |
|
|
873 | uint32_t xy = ((uint32_t)x << 16) | y; // for ecb_hilbert2d_coord_to_index32 |
|
|
874 | uint64_t xy = ((uint64_t)x << 32) | y; // for ecb_hilbert2d_coord_to_index64 |
|
|
875 | |
|
|
876 | Unlike C<ecb_hilbert2d_coord_to_index>, these functions are O(I<order>), |
|
|
877 | so it is preferable to use the lowest possible order. |
|
|
878 | |
|
|
879 | =back |
|
|
880 | |
789 | =head2 BIT MIXING, HASHING |
881 | =head2 BIT MIXING, HASHING |
790 | |
882 | |
791 | Sometimes you have an integer and want to distribute its bits well, for |
883 | Sometimes you have an integer and want to distribute its bits well, for |
792 | example, to use it as a hash in a hashtable. A common example is pointer |
884 | example, to use it as a hash in a hash table. A common example is pointer |
793 | values, which often only have a limited range (e.g. low and high bits are |
885 | values, which often only have a limited range (e.g. low and high bits are |
794 | often zero). |
886 | often zero). |
795 | |
887 | |
796 | The following functions try to mix the bits to get a good bias-free |
888 | The following functions try to mix the bits to get a good bias-free |
797 | distribution. They were mainly made for pointers, but the underlying |
889 | distribution. They were mainly made for pointers, but the underlying |
798 | integer functions are exposed as well. |
890 | integer functions are exposed as well. |
799 | |
891 | |
800 | As an added benefit, the functions are reversible, so if you find it |
892 | As an added benefit, the functions are reversible, so if you find it |
801 | convenient to store only the hash value, you can recover the original |
893 | convenient to store only the hash value, you can recover the original |
802 | pointer from the hash ("unmix"), as long as your pinters are 32 or 64 bit |
894 | pointer from the hash ("unmix"), as long as your pointers are 32 or 64 bit |
803 | (if this isn't the case on your platform, drop us a note and we will add |
895 | (if this isn't the case on your platform, drop us a note and we will add |
804 | functions for other bit widths). |
896 | functions for other bit widths). |
805 | |
897 | |
806 | The unmix functions are very slightly slower than the mix functions, so |
898 | The unmix functions are very slightly slower than the mix functions, so |
807 | it is equally very slightly preferable to store the original values wehen |
899 | it is equally very slightly preferable to store the original values wehen |
808 | convenient. |
900 | convenient. |
809 | |
901 | |
810 | The underlying algorithm if subject to change, so currently these |
902 | The underlying algorithm if subject to change, so currently these |
811 | functions are not suitable for persistent hash tables, as their result |
903 | functions are not suitable for persistent hash tables, as their result |
812 | value can change between diferent versions of libecb. |
904 | value can change between different versions of libecb. |
813 | |
905 | |
814 | =over |
906 | =over |
815 | |
907 | |
816 | =item uintptr_t ecb_ptrmix (void *ptr) |
908 | =item uintptr_t ecb_ptrmix (void *ptr) |
817 | |
909 | |
… | |
… | |
840 | =item uint32_t ecb_mix32 (uint32_t v) |
932 | =item uint32_t ecb_mix32 (uint32_t v) |
841 | |
933 | |
842 | =item uint64_t ecb_mix64 (uint64_t v) |
934 | =item uint64_t ecb_mix64 (uint64_t v) |
843 | |
935 | |
844 | Sometimes you don't have a pointer but an integer whose values are very |
936 | Sometimes you don't have a pointer but an integer whose values are very |
845 | badly distributed. In this case you cna sue these integer versions of the |
937 | badly distributed. In this case you can use these integer versions of the |
846 | mixing function. No C++ template is provided currently. |
938 | mixing function. No C++ template is provided currently. |
847 | |
939 | |
848 | =item uint32_t ecb_unmix32 (uint32_t v) |
940 | =item uint32_t ecb_unmix32 (uint32_t v) |
849 | |
941 | |
850 | =item uint64_t ecb_unmix64 (uint64_t v) |
942 | =item uint64_t ecb_unmix64 (uint64_t v) |
… | |
… | |
1010 | =item ecb_poke_be_u (void *ptr, T v) |
1102 | =item ecb_poke_be_u (void *ptr, T v) |
1011 | |
1103 | |
1012 | =item ecb_poke_le_u (void *ptr, T v) |
1104 | =item ecb_poke_le_u (void *ptr, T v) |
1013 | |
1105 | |
1014 | Again, similarly to their C counterparts, these functions store an |
1106 | Again, similarly to their C counterparts, these functions store an |
1015 | unsigned 8, 16, 32 or z64 bit value to memory, with optional conversion to |
1107 | unsigned 8, 16, 32 or 64 bit value to memory, with optional conversion to |
1016 | big/little endian. |
1108 | big/little endian. |
1017 | |
1109 | |
1018 | C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>. |
1110 | C<T> must be one of C<uint8_t>, C<uint16_t>, C<uint32_t> or C<uint64_t>. |
1019 | |
1111 | |
1020 | Unlike their C counterparts, these functions support 8 bit quantities |
1112 | Unlike their C counterparts, these functions support 8 bit quantities |
… | |
… | |
1024 | =back |
1116 | =back |
1025 | |
1117 | |
1026 | =head2 FAST INTEGER TO STRING |
1118 | =head2 FAST INTEGER TO STRING |
1027 | |
1119 | |
1028 | Libecb defines a set of very fast integer to decimal string (or integer |
1120 | Libecb defines a set of very fast integer to decimal string (or integer |
1029 | to ascii, short C<i2a>) functions. These work by converting the integer |
1121 | to ASCII, short C<i2a>) functions. These work by converting the integer |
1030 | to a fixed point representation and then successively multiplying out |
1122 | to a fixed point representation and then successively multiplying out |
1031 | the topmost digits. Unlike some other, also very fast, libraries, ecb's |
1123 | the topmost digits. Unlike some other, also very fast, libraries, ecb's |
1032 | algorithm should be completely branchless per digit, and does not rely on |
1124 | algorithm should be completely branchless per digit, and does not rely on |
1033 | the presence of special cpu functions (such as clz). |
1125 | the presence of special CPU functions (such as C<clz>). |
1034 | |
1126 | |
1035 | There is a high level API that takes an C<int32_t>, C<uint32_t>, |
1127 | There is a high level API that takes an C<int32_t>, C<uint32_t>, |
1036 | C<int64_t> or C<uint64_t> as argument, and a low-level API, which is |
1128 | C<int64_t> or C<uint64_t> as argument, and a low-level API, which is |
1037 | harder to use but supports slightly more formatting options. |
1129 | harder to use but supports slightly more formatting options. |
1038 | |
1130 | |
… | |
… | |
1096 | leading zeroes (C<_N>), and functions that can generate more digits, but |
1188 | leading zeroes (C<_N>), and functions that can generate more digits, but |
1097 | the leading digit has limited range (C<_xN>). |
1189 | the leading digit has limited range (C<_xN>). |
1098 | |
1190 | |
1099 | None of the functions deal with negative numbers. |
1191 | None of the functions deal with negative numbers. |
1100 | |
1192 | |
1101 | Example: convert an IP address in an u32 into dotted-quad: |
1193 | Example: convert an IP address in an C<uint32_t> into dotted-quad: |
1102 | |
1194 | |
1103 | uint32_t ip = 0x0a000164; // 10.0.1.100 |
1195 | uint32_t ip = 0x0a000164; // 10.0.1.100 |
1104 | char ips[3 * 4 + 3 + 1]; |
1196 | char ips[3 * 4 + 3 + 1]; |
1105 | char *ptr = ips; |
1197 | char *ptr = ips; |
1106 | ptr = ecb_i2a_3 (ptr, ip >> 24 ); *ptr++ = '.'; |
1198 | ptr = ecb_i2a_3 (ptr, ip >> 24 ); *ptr++ = '.'; |
… | |
… | |
1337 | |
1429 | |
1338 | =item ECB_NO_SMP |
1430 | =item ECB_NO_SMP |
1339 | |
1431 | |
1340 | The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from |
1432 | The weaker version of C<ECB_NO_THREADS> - if F<ecb.h> is used from |
1341 | multiple threads, but never concurrently (e.g. if the system the program |
1433 | multiple threads, but never concurrently (e.g. if the system the program |
1342 | runs on has only a single CPU with a single core, no hyperthreading and so |
1434 | runs on has only a single CPU with a single core, no hyper-threading and so |
1343 | on), then this symbol can be defined, leading to more efficient code and |
1435 | on), then this symbol can be defined, leading to more efficient code and |
1344 | fewer dependencies. |
1436 | fewer dependencies. |
1345 | |
1437 | |
1346 | =item ECB_NO_LIBM |
1438 | =item ECB_NO_LIBM |
1347 | |
1439 | |