ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/Geo-LatLon2Place/LatLon2Place.pm
Revision: 1.5
Committed: Tue Mar 15 07:33:40 2022 UTC (2 years, 2 months ago) by root
Branch: MAIN
Changes since 1.4: +12 -4 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     Geo::LatLon2Place - convert latitude and longitude to nearest place
4    
5     =head1 SYNOPSIS
6    
7     use Geo::LatLon2Place;
8    
9     my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb");
10    
11     =head1 DESCRIPTION
12    
13 root 1.2 This is a single-purpose module that tries to do one job: find the nearest
14 root 1.1 placename for a point on earth. It doesn't claim to do a perfect job, but
15 root 1.2 it tries to be simple to set up, simple to use and be fast. It doesn't
16     attempt to provide many features or nifty algorithms, and is meant to be
17     used in situations where you simply need a name for a coordinate without
18     becoming a GIS expert first.
19 root 1.1
20 root 1.2 =head2 BUILDING, SETTING UP AND USAGE
21 root 1.1
22     To build this module, you need tinycdb, a cdb implementation by Michael
23     Tokarev, or a compatible library. On GNU/Debian-based systems you can get
24     this by executing F<apt-get install libcdb-dev>.
25    
26     After install the module, you need to generate a database using the
27     F<geo-latlon2place-makedb> command.
28    
29     Currently, it accepts various databases from geonames
30     (L<https://www.geonames.org/export/>, note the license), for example,
31     F<cities500.zip>, which lists all places with population 500 or more:
32    
33     wget https://download.geonames.org/export/dump/cities500.zip
34     unzip cities500.zip
35 root 1.2 geo-latlon2place-makedb cities500.txt cities500.ll2p
36 root 1.1
37 root 1.2 This will create a file F<ll2p.cdb> that you can use for lookups
38 root 1.1 with this module. At the time of this writing, the F<cities500> database
39     results in about a 10MB file while the F<allCountries> database results in
40     about 120MB.
41    
42 root 1.2 Lookups will return a string of the form C<placename, countrycode>.
43    
44     If you want to use the geonames postal code database (from
45     L<https://www.geonames.org/zip/>), use these commands:
46    
47     wget https://download.geonames.org/export/zip/allCountries.zip
48     unzip allCountries.zip
49     geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p
50    
51     You can then use the resulting database like this:
52    
53     my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p");
54    
55     # and then do as many queries as you wish:
56     my $res = $lookup->(49, 8.4);
57     if (defined $res) {
58     utf8::decode $res; # convert $res from utf-8 to unicode
59     print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames
60     } else {
61     print "nothing found at 49, 8.4\n";
62     }
63    
64     =head1 THE Geo::LatLon2Place CLASS
65    
66     =over
67 root 1.1
68     =cut
69    
70     package Geo::LatLon2Place;
71    
72     use common::sense;
73    
74     use Carp ();
75    
76     BEGIN {
77     our $VERSION = 0.01;
78    
79     require XSLoader;
80 root 1.2 XSLoader::load (__PACKAGE__, $VERSION);
81 root 1.1
82 root 1.5 eval 'sub TORAD() { ' . ((atan2 1,0) / 90) . ' }';
83 root 1.1 }
84    
85 root 1.2 =item $lookup = Geo::LatLon2Place->new ($path)
86    
87     Opens a database created by F<geo-latlon2place-makedb> and return an
88     object that allows you to run queries against it.
89    
90     The database will be mmaped, so it will not be loaded into memory, but
91     your operating system will cache it appropriately.
92    
93     =cut
94    
95 root 1.1 sub new {
96     my ($class, $path) = @_;
97    
98     open my $fh, "<", $path
99     or Carp::croak "$path: $!\n";
100    
101     my $self = bless [$fh, ""], $class;
102    
103     cdb_init $self->[1], fileno $self->[0]
104     and Carp::croak "$path: unable to open as cdb file\n";
105    
106     (my ($magic, $version), $self->[2], $self->[3]) = unpack "a4VVV", cdb_get $self->[1], "";
107    
108     $magic eq "SRGL"
109     or Carp::croak "$path: not a Geo::LatLon2Place file";
110    
111 root 1.5 $version == 2
112     or Carp::croak "$path: version mismatch (got $version, expected 2)";
113 root 1.1
114     $self
115     }
116    
117     sub DESTROY {
118     my ($self) = @_;
119    
120     cdb_free $self->[1];
121     }
122    
123 root 1.2 =item $res = $lookup->lookup ($lat $lon[, $radius])
124    
125     Looks up the point in the database that is "nearest" to C<$lat, $lon>,
126     search at leats up to C<$radius> kilometres. The default for C<$radius> is
127     the cell size the database is built with, and this usually works best, so
128     you usually do not specify this parameter.
129    
130     If something is found, the associated data blob (always a binary string)
131     is returned, otherwise you receive C<undef>.
132    
133     Unless you specify a cusotrm format, the data blob is actually a UTF-8
134     string, so you might want to call C<utf8::decode> on it to get a unicode
135     astring.
136    
137     At the moment, the implementation is in pure perl, but will eventually
138     move to C.
139    
140     =cut
141    
142 root 1.5 sub lookup_xs {
143     my ($self, $lat, $lon, $radius) = @_;
144    
145     lookup_ext_ $self->[1], $self->[2], $self->[3], $lat, $lon, 0, $radius, 0
146     }
147    
148 root 1.1 sub lookup {
149     my ($self, $lat, $lon, $radius) = @_;
150    
151     $radius ||= $self->[2];
152     $radius = int +($radius + $self->[2] - 1) / $self->[2];
153    
154 root 1.5 my $coslat = cos $lat * TORAD;
155 root 1.1
156     my $blat = int $self->[3] * $coslat;
157     my $cx = int (($lon + 180) * $blat / 360);
158     my $cy = int (($lat + 90) * $self->[3] / 180);
159    
160     my ($min, $res) = (1e00);
161    
162     for my $y ($cy - $radius .. $cy + $radius) {
163     for my $x ($cx - $radius .. $cx + $radius) {
164 root 1.5 warn unpack "H*", pack "s< s<", $x, $y;
165     warn $blat;
166 root 1.1 for (unpack "(C/a*)*", cdb_get $self->[1], pack "s< s<", $x, $y) {
167     my ($plat, $plon, $w, $data) = unpack "s< s< C a*";
168     $plat = $plat * ( 90 / 32767);
169     $plon = $plon * (180 / 32767);
170    
171     my $dx = ($lon - $plon) * TORAD * $coslat;
172     my $dy = ($lat - $plat) * TORAD;
173     my $d2 = ($dx * $dx + $dy * $dy) * $w;
174    
175     $d2 >= $min
176     or ($min, $res) = ($d2, $data);
177     }
178     }
179     }
180    
181     $res
182     }
183    
184 root 1.2 =back
185    
186     =head1 ALGORITHM
187    
188     The algorithm that this module implements consists of two parts: binning
189     and weighting (done when writing the database) and then finding the
190     nearest point.
191    
192     The first part bins all data points into a grid which has its minimum cell
193     size at the equator and poles, with somewhat larger cells in between.
194    
195     The lookup part will then read the cell that the coordinate is in and some
196     neighbouring cells (depending on the search radius, by default it will
197     read the eight cells around it).
198    
199     It will then calculate the (squared) distance to the search coordinate
200     using an approximate euclidean distance on an equireactangular
201     projection. The squared distance is multiplied with a weight (1..25 for
202     the geonames database, based on population and adminstrative status,
203     always 1 for postcal codes), and the minimum distance wins.
204    
205     Binning should not introduce errors, but bigger bins can slow down lookup
206     times due to having to look at more places. The lookup assumes a spherical
207     shape for the earth, the equirectangular projection stretches distances
208     unevenly and the euclidean distance calculation introduces further
209     errors. For typical distance (<< 100km) and the intended usage, these
210     errors should be considered negligible.
211    
212     =head1 SPEED
213    
214     The current implementation is written in pure perl, and on my machine,
215     typically does 10000-200000 lookups per second. The goal for version 1.0
216     is to move the lookup to C.
217    
218     =head1 TENTATIVE ROADMAP
219    
220     The database writer should be accessible via a module, so you cna easily
221     generate your own databases without having to run an external command.
222    
223     The api might be extended to allow for multiple returns, or nearest
224 root 1.4 neighbour search, or more return values (distance, coordinates).
225 root 1.2
226 root 1.3 =head1 PERL MULTICORE SUPPORT
227    
228     This module supports the perl multicore specification
229     (L<http://perlmulticore.schmorp.de/>) when doing lookups.
230    
231 root 1.2 =head1 SEE ALSO
232    
233     L<geo-latlon2place-makedb> to create databases from common formats.
234    
235 root 1.1 =head1 AUTHOR
236    
237     Marc Lehmann <schmorp@schmorp.de>
238     http://home.schmorp.de/
239    
240     =cut
241    
242     1
243