/cvs/Geo-LatLon2Place/LatLon2Place.pm

Comparing Geo-LatLon2Place/LatLon2Place.pm (file contents):
Revision 1.1 by root, Mon Mar 14 02:41:51 2022 UTC vs.
Revision 1.5 by root, Tue Mar 15 07:33:40 2022 UTC

--- LatLon2Place.pm (revision 1.1)
+++ LatLon2Place.pm (revision 1.5)
@@ -8,15 +8,18 @@
 
    my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb");
 
 =head1 DESCRIPTION
 
-This is a simple-purpose module that tries to do one job: find the nearest
+This is a single-purpose module that tries to do one job: find the nearest
 placename for a point on earth. It doesn't claim to do a perfect job, but
-it tries to be simple to set up, simple to use and be fast.
+it tries to be simple to set up, simple to use and be fast. It doesn't
+attempt to provide many features or nifty algorithms, and is meant to be
+used in situations where you simply need a name for a coordinate without
+becoming a GIS expert first.
 
-=head2 BUILDING AND SETTING UP
+=head2 BUILDING, SETTING UP AND USAGE
 
 To build this module, you need tinycdb, a cdb implementation by Michael
 Tokarev, or a compatible library. On GNU/Debian-based systems you can get
 this by executing F<apt-get install libcdb-dev>.
 
@@ -27,18 +30,42 @@
 (L<https://www.geonames.org/export/>, note the license), for example,
 F<cities500.zip>, which lists all places with population 500 or more:
 
    wget https://download.geonames.org/export/dump/cities500.zip
    unzip cities500.zip
-   geo-latlon2place-makedb --geonames-gazetteer cities500.txt ll2place.cdb
+   geo-latlon2place-makedb cities500.txt cities500.ll2p
 
-This will create a file F<ll2place.cdb> that you can use for lookups
+This will create a file F<cities500.ll2p> that you can use for lookups
 with this module. At the time of this writing, the F<cities500> database
 results in about a 10MB file while the F<allCountries> database results in
 about 120MB.
 
+Lookups will return a string of the form C<placename, countrycode>.
+
+If you want to use the geonames postal code database (from
+L<https://www.geonames.org/zip/>), use these commands:
+
+   wget https://download.geonames.org/export/zip/allCountries.zip
+   unzip allCountries.zip
+   geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p
+
+You can then use the resulting database like this:
+
+   my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p");
+
+   # and then do as many queries as you wish:
+   my $res = $lookup->lookup (49, 8.4);
+   if (defined $res) {
+      utf8::decode $res; # convert $res from utf-8 to unicode
+      print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames
+   } else {
+      print "nothing found at 49, 8.4\n";
+   }
+
+=head1 THE Geo::LatLon2Place CLASS
+
-=over 4
+=over
 
 =cut
 
 package Geo::LatLon2Place;
 
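The POD added in revision 1.5 states that lookups return a byte string of the form "placename, countrycode" and suggests utf8::decode to turn it into a character string. A minimal sketch of that post-processing, assuming the format holds exactly and that the country code follows the last comma; split_place is a hypothetical helper, not part of Geo::LatLon2Place:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Geo::LatLon2Place: turns a raw lookup
# result (UTF-8 bytes of the form "placename, countrycode") into a
# character-string place name and a country code.
sub split_place {
   my ($res) = @_;
   return unless defined $res;               # lookup found nothing

   utf8::decode ($res);                      # UTF-8 bytes -> characters

   # split at the LAST ", " so names containing commas stay intact
   my ($name, $cc) = $res =~ /^(.*), ([^,]*)\z/s
      or return ($res, undef);               # unexpected format

   ($name, $cc)
}

my ($name, $cc) = split_place ("Karlsruhe, DE");
print "$name / $cc\n";   # prints "Karlsruhe / DE"
```

Splitting on the last comma rather than the first keeps place names that themselves contain a comma intact. If the string does not match, the sketch returns it unchanged with an undefined country code.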
@@ -48,14 +75,24 @@
 
 BEGIN {
    our $VERSION = 0.01;
 
    require XSLoader;
-   XSLoader::load __PACKAGE__, $VERSION;
+   XSLoader::load (__PACKAGE__, $VERSION);
 
-   eval 'sub TORAD() { ' . ((atan2 1,0) / 180) . ' }';
+   eval 'sub TORAD() { ' . ((atan2 1,0) / 90) . ' }';
 }
+
+=item $lookup = Geo::LatLon2Place->new ($path)
+
+Opens a database created by F<geo-latlon2place-makedb> and returns an
+object that allows you to run queries against it.
+
+The database will be mmapped, so it will not be loaded into memory up front,
+but your operating system will cache it appropriately.
+
+=cut
 
 sub new {
    my ($class, $path) = @_;
 
    open my $fh, "<", $path
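The TORAD change in this hunk is a unit fix: atan2 (1, 0) is pi/2, so dividing by 90 gives pi/180, the degrees-to-radians factor, whereas the old divisor of 180 gave pi/360 and made every cos call in lookup see half the intended angle. A quick standalone check of that arithmetic, which does not load the module:

```perl
use strict;
use warnings;

my $pi = 2 * atan2 (1, 0);               # atan2 (1, 0) is pi/2

my $torad_old = atan2 (1, 0) / 180;      # revision 1.1: pi/360, half too small
my $torad_new = atan2 (1, 0) / 90;       # revision 1.5: pi/180

printf "TORAD * 180 = %.6f (pi = %.6f)\n", $torad_new * 180, $pi;
printf "old: cos (60 deg) = %.4f\n", cos (60 * $torad_old);   # 0.8660, i.e. cos 30 deg
printf "new: cos (60 deg) = %.4f\n", cos (60 * $torad_new);   # 0.5000
```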
@@ -69,38 +106,65 @@
    (my ($magic, $version), $self->[2], $self->[3]) = unpack "a4VVV", cdb_get $self->[1], "";
 
    $magic eq "SRGL"
       or Carp::croak "$path: not a Geo::LatLon2Place file";
 
-   $version == 1
-      or Carp::croak "$path: version mismatch (got $version, expected 1)";
+   $version == 2
+      or Carp::croak "$path: version mismatch (got $version, expected 2)";
 
    $self
 }
 
 sub DESTROY {
    my ($self) = @_;
 
    cdb_free $self->[1];
 }
 
+=item $res = $lookup->lookup ($lat, $lon[, $radius])
+
+Looks up the point in the database that is "nearest" to C<$lat, $lon>,
+searching at least up to C<$radius> kilometres. The default for C<$radius>
+is the cell size the database is built with, and this usually works best,
+so you usually do not specify this parameter.
+
+If something is found, the associated data blob (always a binary string)
+is returned, otherwise you receive C<undef>.
+
+Unless you specify a custom format, the data blob is actually a UTF-8
+string, so you might want to call C<utf8::decode> on it to get a unicode
+string.
+
+At the moment, the implementation is in pure perl, but will eventually
+move to C.
+
+=cut
+
+sub lookup_xs {
+   my ($self, $lat, $lon, $radius) = @_;
+
+   lookup_ext_ $self->[1], $self->[2], $self->[3], $lat, $lon, 0, $radius, 0
+}
+
 sub lookup {
    my ($self, $lat, $lon, $radius) = @_;
 
    $radius ||= $self->[2];
    $radius = int +($radius + $self->[2] - 1) / $self->[2];
 
-   my $coslat = cos abs $lat * TORAD;
+   my $coslat = cos $lat * TORAD;
 
    my $blat = int $self->[3] * $coslat;
    my $cx = int (($lon + 180) * $blat / 360);
    my $cy = int (($lat + 90) * $self->[3] / 180);
 
    my ($min, $res) = (1e00);
 
    for my $y ($cy - $radius .. $cy + $radius) {
       for my $x ($cx - $radius .. $cx + $radius) {
+         warn unpack "H*", pack "s< s<", $x, $y;
+         warn $blat;
          for (unpack "(C/a*)*", cdb_get $self->[1], pack "s< s<", $x, $y) {
             my ($plat, $plon, $w, $data) = unpack "s< s< C a*";
             $plat = $plat * ( 90 / 32767);
             $plon = $plon * (180 / 32767);
 
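The cell addressing in lookup() can be read off the hunk above: latitude is split into $self->[3] rows, each row gets int ($self->[3] * cos (lat)) columns, and the radius is rounded up to whole cells of size $self->[2]. Below is a standalone restatement of that arithmetic, plus a decoder for the record layout lookup() unpacks ("s< s< C a*": two little-endian 16-bit coordinates scaled by 90/32767 and 180/32767, a weight byte, then the data). Reading $self->[2] as the cell size in kilometres and $self->[3] as the row count is an inference from this code. The sub names and the 1800-row grid are made up, and encode_record only fabricates test input; it is not taken from geo-latlon2place-makedb.

```perl
use strict;
use warnings;

use constant TORAD => atan2 (1, 0) / 90;   # degrees -> radians

# Round a search radius in km up to whole cells, as lookup() does.
# $cellsize is assumed to be the database cell size in km ($self->[2]).
sub radius_in_cells {
   my ($radius, $cellsize) = @_;
   $radius ||= $cellsize;                           # default: one cell
   int (($radius + $cellsize - 1) / $cellsize)      # ceiling for whole-km radii
}

# Map a coordinate to its grid cell, mirroring lookup().
# $rows is assumed to be the number of latitude rows ($self->[3]).
sub cell_for {
   my ($lat, $lon, $rows) = @_;
   my $blat = int ($rows * cos ($lat * TORAD));     # columns in this row
   my $cx   = int (($lon + 180) * $blat / 360);
   my $cy   = int (($lat +  90) * $rows / 180);
   ($cx, $cy, $blat)
}

# Hypothetical encoder, only to produce test input: one record in the
# layout lookup() unpacks ("s< s< C a*", coordinates scaled to 16 bits).
sub encode_record {
   my ($lat, $lon, $w, $data) = @_;
   pack "s< s< C a*", int ($lat * 32767 / 90), int ($lon * 32767 / 180), $w, $data
}

# Decode one cell value like lookup() does: a run of length-prefixed
# records, each holding scaled lat/lon, a weight byte and the data blob.
sub decode_cell {
   my ($cell) = @_;
   my @points;

   for (unpack "(C/a*)*", $cell) {
      my ($plat, $plon, $w, $data) = unpack "s< s< C a*", $_;
      push @points, [$plat * (90 / 32767), $plon * (180 / 32767), $w, $data];
   }

   @points
}

my ($cx, $cy, $blat) = cell_for (49, 8.4, 1800);   # 1800 rows: made-up value
print "cell ($cx, $cy), $blat columns in this row\n";
print "25 km radius with 10 km cells -> ", radius_in_cells (25, 10), " cells\n";

my $cell = "";
$cell .= pack "C/a*", $_
   for encode_record (49, 8.4, 1, "Karlsruhe, DE"),
       encode_record (48.78, 9.18, 1, "Stuttgart, DE");

printf "%.4f %.4f w=%d %s\n", @$_ for decode_cell ($cell);
```

Because each record is prefixed by a single length byte (C/a*), a record, including its five header bytes, can be at most 255 bytes long. The 16-bit scaling limits coordinate precision to about 0.003 degrees of latitude and 0.005 degrees of longitude.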
@@ -115,10 +179,61 @@
    }
 
    $res
 }
 
+=back
+
+=head1 ALGORITHM
+
+The algorithm that this module implements consists of two parts: binning
+and weighting (done when writing the database) and then finding the
+nearest point.
+
+The first part bins all data points into a grid which has its minimum cell
+size at the equator and poles, with somewhat larger cells in between.
+
+The lookup part will then read the cell that the coordinate is in and some
+neighbouring cells (depending on the search radius, by default it will
+read the eight cells around it).
+
+It will then calculate the (squared) distance to the search coordinate
+using an approximate euclidean distance on an equirectangular
+projection. The squared distance is multiplied by a weight (1..25 for
+the geonames database, based on population and administrative status,
+always 1 for postal codes), and the smallest weighted distance wins.
+
+Binning should not introduce errors, but bigger bins can slow down lookup
+times due to having to look at more places. The lookup assumes a spherical
+shape for the earth; the equirectangular projection stretches distances
+unevenly and the euclidean distance calculation introduces further
+errors. For typical distances (<< 100km) and the intended usage, these
+errors should be considered negligible.
+
+=head1 SPEED
+
+The current implementation is written in pure perl, and on my machine,
+typically does 10000-200000 lookups per second. The goal for version 1.0
+is to move the lookup to C.
+
+=head1 TENTATIVE ROADMAP
+
+The database writer should be accessible via a module, so you can easily
+generate your own databases without having to run an external command.
+
+The API might be extended to allow for multiple returns, or nearest
+neighbour search, or more return values (distance, coordinates).
+
+=head1 PERL MULTICORE SUPPORT
+
+This module supports the perl multicore specification
+(L<http://perlmulticore.schmorp.de/>) when doing lookups.
+
+=head1 SEE ALSO
+
+L<geo-latlon2place-makedb> to create databases from common formats.
+
 =head1 AUTHOR
 
    Marc Lehmann <schmorp@schmorp.de>
    http://home.schmorp.de/
 
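The ALGORITHM section describes the final step as a squared distance on an equirectangular projection, multiplied by a per-place weight, with the minimum winning. The lines of lookup() that compute this distance fall outside the hunks shown here. The formula below is an assumption chosen to match that description, not a copy of the module's code: the longitude difference is scaled by the cosine of the query latitude and distances are in degrees. The sub name nearest is hypothetical.

```perl
use strict;
use warnings;

use constant TORAD => atan2 (1, 0) / 90;       # degrees -> radians

# Hypothetical: return the data of the candidate with the smallest weighted
# squared equirectangular distance to ($lat, $lon). Candidates are
# [lat, lon, weight, data] with coordinates in degrees.
sub nearest {
   my ($lat, $lon, @candidates) = @_;
   my $coslat = cos ($lat * TORAD);            # shrink longitude differences
   my ($min, $res);

   for (@candidates) {
      my ($plat, $plon, $w, $data) = @$_;
      my $dx = ($plon - $lon) * $coslat;       # east-west offset, in degrees
      my $dy =  $plat - $lat;                  # north-south offset, in degrees
      my $d2 = ($dx * $dx + $dy * $dy) * $w;   # weighted squared distance

      ($min, $res) = ($d2, $data)
         if !defined $min || $d2 < $min;
   }

   $res                                        # undef without candidates
}

my @places = (
   [49.02, 8.4,  1, "A"],    # 0.02 deg away, weight 1
   [49.01, 8.4, 25, "B"],    # 0.01 deg away, weight 25
);

my $winner = nearest (49, 8.4, @places);
print "$winner\n";   # prints "A": 0.0004 * 1 < 0.0001 * 25
```

Because the weight multiplies a squared distance, a place with weight 25 has to be more than five times closer than a weight-1 place to win (25 is 5 squared). The sketch also ignores the ±180° longitude seam.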
