ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/Geo-LatLon2Place/LatLon2Place.pm
Revision: 1.5
Committed: Tue Mar 15 07:33:40 2022 UTC (2 years, 2 months ago) by root
Branch: MAIN
Changes since 1.4: +12 -4 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 Geo::LatLon2Place - convert latitude and longitude to nearest place
4
5 =head1 SYNOPSIS
6
7 use Geo::LatLon2Place;
8
9 my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb");
10
11 =head1 DESCRIPTION
12
13 This is a single-purpose module that tries to do one job: find the nearest
14 placename for a point on earth. It doesn't claim to do a perfect job, but
15 it tries to be simple to set up, simple to use and be fast. It doesn't
16 attempt to provide many features or nifty algorithms, and is meant to be
17 used in situations where you simply need a name for a coordinate without
18 becoming a GIS expert first.
19
20 =head2 BUILDING, SETTING UP AND USAGE
21
22 To build this module, you need tinycdb, a cdb implementation by Michael
23 Tokarev, or a compatible library. On GNU/Debian-based systems you can get
24 this by executing F<apt-get install libcdb-dev>.
25
26 After install the module, you need to generate a database using the
27 F<geo-latlon2place-makedb> command.
28
29 Currently, it accepts various databases from geonames
30 (L<https://www.geonames.org/export/>, note the license), for example,
31 F<cities500.zip>, which lists all places with population 500 or more:
32
33 wget https://download.geonames.org/export/dump/cities500.zip
34 unzip cities500.zip
35 geo-latlon2place-makedb cities500.txt cities500.ll2p
36
37 This will create a file F<ll2p.cdb> that you can use for lookups
38 with this module. At the time of this writing, the F<cities500> database
39 results in about a 10MB file while the F<allCountries> database results in
40 about 120MB.
41
42 Lookups will return a string of the form C<placename, countrycode>.
43
44 If you want to use the geonames postal code database (from
45 L<https://www.geonames.org/zip/>), use these commands:
46
47 wget https://download.geonames.org/export/zip/allCountries.zip
48 unzip allCountries.zip
49 geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p
50
51 You can then use the resulting database like this:
52
53 my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p");
54
55 # and then do as many queries as you wish:
56 my $res = $lookup->(49, 8.4);
57 if (defined $res) {
58 utf8::decode $res; # convert $res from utf-8 to unicode
59 print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames
60 } else {
61 print "nothing found at 49, 8.4\n";
62 }
63
64 =head1 THE Geo::LatLon2Place CLASS
65
66 =over
67
68 =cut
69
70 package Geo::LatLon2Place;
71
72 use common::sense;
73
74 use Carp ();
75
76 BEGIN {
77 our $VERSION = 0.01;
78
79 require XSLoader;
80 XSLoader::load (__PACKAGE__, $VERSION);
81
82 eval 'sub TORAD() { ' . ((atan2 1,0) / 90) . ' }';
83 }
84
85 =item $lookup = Geo::LatLon2Place->new ($path)
86
87 Opens a database created by F<geo-latlon2place-makedb> and return an
88 object that allows you to run queries against it.
89
90 The database will be mmaped, so it will not be loaded into memory, but
91 your operating system will cache it appropriately.
92
93 =cut
94
95 sub new {
96 my ($class, $path) = @_;
97
98 open my $fh, "<", $path
99 or Carp::croak "$path: $!\n";
100
101 my $self = bless [$fh, ""], $class;
102
103 cdb_init $self->[1], fileno $self->[0]
104 and Carp::croak "$path: unable to open as cdb file\n";
105
106 (my ($magic, $version), $self->[2], $self->[3]) = unpack "a4VVV", cdb_get $self->[1], "";
107
108 $magic eq "SRGL"
109 or Carp::croak "$path: not a Geo::LatLon2Place file";
110
111 $version == 2
112 or Carp::croak "$path: version mismatch (got $version, expected 2)";
113
114 $self
115 }
116
117 sub DESTROY {
118 my ($self) = @_;
119
120 cdb_free $self->[1];
121 }
122
123 =item $res = $lookup->lookup ($lat $lon[, $radius])
124
125 Looks up the point in the database that is "nearest" to C<$lat, $lon>,
126 search at leats up to C<$radius> kilometres. The default for C<$radius> is
127 the cell size the database is built with, and this usually works best, so
128 you usually do not specify this parameter.
129
130 If something is found, the associated data blob (always a binary string)
131 is returned, otherwise you receive C<undef>.
132
133 Unless you specify a cusotrm format, the data blob is actually a UTF-8
134 string, so you might want to call C<utf8::decode> on it to get a unicode
135 astring.
136
137 At the moment, the implementation is in pure perl, but will eventually
138 move to C.
139
140 =cut
141
142 sub lookup_xs {
143 my ($self, $lat, $lon, $radius) = @_;
144
145 lookup_ext_ $self->[1], $self->[2], $self->[3], $lat, $lon, 0, $radius, 0
146 }
147
148 sub lookup {
149 my ($self, $lat, $lon, $radius) = @_;
150
151 $radius ||= $self->[2];
152 $radius = int +($radius + $self->[2] - 1) / $self->[2];
153
154 my $coslat = cos $lat * TORAD;
155
156 my $blat = int $self->[3] * $coslat;
157 my $cx = int (($lon + 180) * $blat / 360);
158 my $cy = int (($lat + 90) * $self->[3] / 180);
159
160 my ($min, $res) = (1e00);
161
162 for my $y ($cy - $radius .. $cy + $radius) {
163 for my $x ($cx - $radius .. $cx + $radius) {
164 warn unpack "H*", pack "s< s<", $x, $y;
165 warn $blat;
166 for (unpack "(C/a*)*", cdb_get $self->[1], pack "s< s<", $x, $y) {
167 my ($plat, $plon, $w, $data) = unpack "s< s< C a*";
168 $plat = $plat * ( 90 / 32767);
169 $plon = $plon * (180 / 32767);
170
171 my $dx = ($lon - $plon) * TORAD * $coslat;
172 my $dy = ($lat - $plat) * TORAD;
173 my $d2 = ($dx * $dx + $dy * $dy) * $w;
174
175 $d2 >= $min
176 or ($min, $res) = ($d2, $data);
177 }
178 }
179 }
180
181 $res
182 }
183
184 =back
185
186 =head1 ALGORITHM
187
188 The algorithm that this module implements consists of two parts: binning
189 and weighting (done when writing the database) and then finding the
190 nearest point.
191
192 The first part bins all data points into a grid which has its minimum cell
193 size at the equator and poles, with somewhat larger cells in between.
194
195 The lookup part will then read the cell that the coordinate is in and some
196 neighbouring cells (depending on the search radius, by default it will
197 read the eight cells around it).
198
199 It will then calculate the (squared) distance to the search coordinate
200 using an approximate euclidean distance on an equireactangular
201 projection. The squared distance is multiplied with a weight (1..25 for
202 the geonames database, based on population and adminstrative status,
203 always 1 for postcal codes), and the minimum distance wins.
204
205 Binning should not introduce errors, but bigger bins can slow down lookup
206 times due to having to look at more places. The lookup assumes a spherical
207 shape for the earth, the equirectangular projection stretches distances
208 unevenly and the euclidean distance calculation introduces further
209 errors. For typical distance (<< 100km) and the intended usage, these
210 errors should be considered negligible.
211
212 =head1 SPEED
213
214 The current implementation is written in pure perl, and on my machine,
215 typically does 10000-200000 lookups per second. The goal for version 1.0
216 is to move the lookup to C.
217
218 =head1 TENTATIVE ROADMAP
219
220 The database writer should be accessible via a module, so you cna easily
221 generate your own databases without having to run an external command.
222
223 The api might be extended to allow for multiple returns, or nearest
224 neighbour search, or more return values (distance, coordinates).
225
226 =head1 PERL MULTICORE SUPPORT
227
228 This module supports the perl multicore specification
229 (L<http://perlmulticore.schmorp.de/>) when doing lookups.
230
231 =head1 SEE ALSO
232
233 L<geo-latlon2place-makedb> to create databases from common formats.
234
235 =head1 AUTHOR
236
237 Marc Lehmann <schmorp@schmorp.de>
238 http://home.schmorp.de/
239
240 =cut
241
242 1
243