… | |
… | |
72 | usually works best, so you usually do not specify this parameter. |
72 | usually works best, so you usually do not specify this parameter. |
73 | |
73 | |
74 | If something is found, the associated data blob (always a binary |
74 | If something is found, the associated data blob (always a binary |
75 | string) is returned, otherwise you receive "undef". |
75 | string) is returned, otherwise you receive "undef". |
76 | |
76 | |
77 | Unless you specify a cusotrm format, the data blob is actually a |
77 | Unless you specify a custom format/extractor when building your |
78 | UTF-8 string, so you might want to call "utf8::decode" on it to get |
78 | database, the data blob is actually a UTF-8 string, so you might |
79 | a unicode astring. |
79 | want to call "utf8::decode" on it to get a unicode string: |
80 | |
80 | |
81 | At the moment, the implementation is in pure perl, but will |
81 | my $res = $db->lookup (47, 37); # near mariupol, UA |
82 | eventually move to C. |
82 | if (defined $res) {
83 | utf8::decode $res;
84 | # $res now contains the unicode result
85 | }
83 | |
86 | |
84 | ALGORITHM |
87 | ALGORITHM |
85 | The algorithm that this module implements consists of two parts: binning |
88 | The algorithm that this module implements consists of two parts: binning |
86 | and weighting (done when writing the database) and then finding the |
89 | and weighting (done when writing the database) and then finding the |
87 | nearest point. |
90 | nearest point. |
… | |
… | |
96 | |
99 | |
97 | It will then calculate the (squared) distance to the search coordinate |
100 | It will then calculate the (squared) distance to the search coordinate |
98 | using an approximate euclidean distance on an equireactangular |
101 | using an approximate euclidean distance on an equireactangular |
99 | projection. The squared distance is multiplied with a weight (1..25 for |
102 | projection. The squared distance is multiplied with a weight (1..25 for |
100 | the geonames database, based on population and adminstrative status, |
103 | the geonames database, based on population and adminstrative status, |
101 | always 1 for postcal codes), and the minimum distance wins. |
104 | always 1 for postal codes), and the minimum distance wins. |
102 | |
105 | |
103 | Binning should not introduce errors, but bigger bins can slow down |
106 | Binning should not introduce errors, but bigger bins can slow down |
104 | lookup times due to having to look at more places. The lookup assumes a |
107 | lookup times due to having to look at more places. The lookup assumes a |
105 | spherical shape for the earth, the equirectangular projection stretches |
108 | spherical shape for the earth, the equirectangular projection stretches |
106 | distances unevenly and the euclidean distance calculation introduces |
109 | distances unevenly and the euclidean distance calculation introduces |
107 | further errors. For typical distance (<< 100km) and the intended usage, |
110 | further errors. For typical distance (<< 100km) and the intended usage, |
108 | these errors should be considered negligible. |
111 | these errors should be considered negligible. |
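
As an illustration of the distance step described above, here is a minimal Perl sketch of a weighted nearest-candidate search on an equirectangular projection. It is not the module's actual code: the function names "approx_dist2" and "nearest_weighted", the [lat, lon, weight, data] candidate layout, and the choice of the mean latitude for scaling longitude differences are all assumptions made for this example.

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Squared distance (in squared radians) between two points, treating
# the equirectangular projection as a flat plane. Longitude differences
# are scaled by the cosine of the mean latitude (an assumption here).
sub approx_dist2 {
    my ($lat1, $lon1, $lat2, $lon2) = @_;
    my $rad  = PI / 180;
    my $dlat = ($lat2 - $lat1) * $rad;
    my $dlon = $lon2 - $lon1;

    # wrap the longitude difference into [-180, 180] degrees
    $dlon -= 360 while $dlon >  180;
    $dlon += 360 while $dlon < -180;

    $dlon *= $rad * cos(($lat1 + $lat2) / 2 * $rad);
    return $dlat * $dlat + $dlon * $dlon;
}

# Return the data of the candidate with the smallest weighted squared
# distance, or undef if there are no candidates. Each candidate is a
# hypothetical [lat, lon, weight, data] array reference.
sub nearest_weighted {
    my ($lat, $lon, @candidates) = @_;
    my ($best, $best_score);

    for my $c (@candidates) {
        my $score = approx_dist2($lat, $lon, $c->[0], $c->[1]) * $c->[2];
        ($best, $best_score) = ($c, $score)
            if !defined $best_score || $score < $best_score;
    }

    return $best ? $best->[3] : undef;
}
```

Because the squared distance is multiplied by the weight, a candidate with a small weight can win over a candidate that is geometrically closer but carries a larger weight.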
109 | |
112 | |
110 | SPEED |
113 | SPEED |
111 | The current implementation is written in pure perl, and on my machine, |
114 | On my machine, "lookup" typically does more than a million lookups per |
112 | typically does 10000-200000 lookups per second. The goal for version 1.0 |
115 | second - performance varies depending on result density and number of |
113 | is to move the lookup to C. |
116 | indexed points. |
114 | |
117 | |
115 | TENTATIVE ROADMAP |
118 | TENTATIVE ROADMAP |
116 | The database writer should be accessible via a module, so you cna easily |
119 | The database writer should be accessible via a module, so you can easily |
117 | generate your own databases without having to run an external command. |
120 | generate your own databases without having to run an external command. |
118 | |
121 | |
119 | The api might be extended to allow for multiple returns, or nearest |
122 | The API might be extended to allow for multiple lookups, multiple |
120 | neighbour search. |
123 | returns, or nearest neighbour search, or more return values (distance,
124 | coordinates).
125 | |
126 | Longer lookups will take advantage of perlmulticore.
127 | |
128 | PERL MULTICORE SUPPORT
129 | This is not yet implemented:
130 | |
131 | This module supports the perl multicore specification
132 | (<http://perlmulticore.schmorp.de/>) when doing lookups.
121 | |
133 | |
122 | SEE ALSO |
134 | SEE ALSO |
123 | geo-latlon2place-makedb to create databases from common formats. |
135 | geo-latlon2place-makedb to create databases from common formats. |
124 | |
136 | |
125 | AUTHOR |
137 | AUTHOR |