Geo-LatLon2Place/LatLon2Place.pm

=head1 NAME

Geo::LatLon2Place - convert latitude and longitude to nearest place

=head1 SYNOPSIS

 use Geo::LatLon2Place;

 my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb");

=head1 DESCRIPTION

This is a single-purpose module that tries to do one job: find the nearest
placename for a point on earth. It doesn't claim to do a perfect job, but
it tries to be simple to set up, simple to use and fast. It doesn't
attempt to provide many features or nifty algorithms, and is meant to be
used in situations where you simply need a name for a coordinate without
becoming a GIS expert first.

=head2 BUILDING, SETTING UP AND USAGE

To build this module, you need tinycdb, a cdb implementation by Michael
Tokarev, or a compatible library. On Debian-based systems you can get
this by executing F<apt-get install libcdb-dev>.

After installing the module, you need to generate a database using the
F<geo-latlon2place-makedb> command.

Currently, it accepts various databases from geonames
(L<https://www.geonames.org/export/>, note the license), for example,
F<cities500.zip>, which lists all places with population 500 or more:

   wget https://download.geonames.org/export/dump/cities500.zip
   unzip cities500.zip
   geo-latlon2place-makedb cities500.txt cities500.ll2p

This will create a file F<cities500.ll2p> that you can use for lookups
with this module. At the time of this writing, the F<cities500> database
results in a file of about 10MB, while the F<allCountries> database results
in about 120MB.

Lookups will return a string of the form C<placename, countrycode>.
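
Because place names can themselves contain commas, a result of this form is
best split at the last C<", "> separator. The following is a minimal sketch,
assuming the default format described above; C<split_place> is a
hypothetical helper and not part of this module:

   use strict;
   use warnings;

   # hypothetical helper: split "placename, countrycode" at the last ", "
   sub split_place {
      my ($res) = @_;

      # the greedy .* makes the match use the last ", " in the string
      my ($place, $cc) = $res =~ /^(.*), ([^,]*)$/s
         or return;

      ($place, $cc)
   }

   my ($place, $cc) = split_place "Karlsruhe, DE";
   print "$place is in $cc\n";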

If you want to use the geonames postal code database (from
L<https://www.geonames.org/zip/>), use these commands:

   wget https://download.geonames.org/export/zip/allCountries.zip
   unzip allCountries.zip
   geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p

You can then use the resulting database like this:

   my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p");

   # and then do as many queries as you wish:
   my $res = $lookup->lookup (49, 8.4);
   if (defined $res) {
      utf8::decode $res; # convert $res from utf-8 to unicode
      print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames
   } else {
      print "nothing found at 49, 8.4\n";
   }

=head1 THE Geo::LatLon2Place CLASS

=over

=cut

package Geo::LatLon2Place;

use common::sense;

use Carp ();

BEGIN {
   our $VERSION = '1.0';

   require XSLoader;
   XSLoader::load (__PACKAGE__, $VERSION);

   eval 'sub TORAD() { ' . ((atan2 1,0) / 90) . ' }';
}

=item $lookup = Geo::LatLon2Place->new ($path)

Opens a database created by F<geo-latlon2place-makedb> and returns an
object that allows you to run queries against it.

The database will be mmapped, so it will not be loaded into memory, but
your operating system will cache it appropriately.
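
Since C<new> croaks when the file cannot be opened or is not a valid
database, callers that prefer to carry on without a database can wrap the
call in C<eval>. This is only a sketch; C<open_db> is a hypothetical helper
name and not part of this module:

   use strict;
   use warnings;

   # hypothetical helper: returns the lookup object, or undef on any failure
   sub open_db {
      my ($path) = @_;

      my $db = eval {
         require Geo::LatLon2Place;
         Geo::LatLon2Place->new ($path)
      };

      warn "cannot use $path: $@" unless defined $db;

      $db
   }

   my $db = open_db "/nonexistent/example.ll2p";
   print "falling back to raw coordinates\n" unless defined $db;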

=cut

sub new {
   my ($class, $path) = @_;

   open my $fh, "<", $path
      or Carp::croak "$path: $!\n";

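   # slot 0 holds the file handle, slot 1 the scalar handed to cdb_init/cdb_free
   my $self = bless [$fh, ""], $class;

   cdb_init $self->[1], fileno $self->[0]
      and Carp::croak "$path: unable to open as cdb file\n";

   # the record under the empty key is the header: a 4-byte magic string, the
   # format version and two more 32-bit values that are kept for lookups
   (my ($magic, $version), $self->[2], $self->[3]) = unpack "a4VVV", cdb_get $self->[1], "";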

   $magic eq "SRGL"
      or Carp::croak "$path: not a Geo::LatLon2Place file";

   $version == 2
      or Carp::croak "$path: version mismatch (got $version, expected 2)";

   $self
}

sub DESTROY {
   my ($self) = @_;

   cdb_free $self->[1];
}

=item $res = $lookup->lookup ($lat, $lon[, $radius])

Looks up the point in the database that is "nearest" to C<$lat, $lon>,
searching at least up to C<$radius> kilometres. The default for C<$radius>
is the cell size the database was built with. This usually works best, so
you normally do not need to specify this parameter.

If something is found, the associated data blob (always a binary string)
is returned, otherwise you receive C<undef>.

Unless you specify a custom format/extractor when building your database,
the data blob is actually a UTF-8 string, so you might want to call
C<utf8::decode> on it to get a unicode string:

   my $res = $db->lookup (47, 37); # near Mariupol, UA
   if (defined $res) {
      utf8::decode $res;
      # $res now contains the unicode result
   }
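
When a wider search is wanted, the optional radius can be passed as the
third argument. The sketch below wraps this in a hypothetical helper,
C<place_name>, which accepts any object that provides the C<lookup> method
documented above and returns a decoded string or C<undef>:

   use strict;
   use warnings;

   # hypothetical helper, not part of this module
   sub place_name {
      my ($db, $lat, $lon, $radius) = @_;

      # only pass a radius when the caller supplied one
      my $res = defined $radius
         ? $db->lookup ($lat, $lon, $radius)
         : $db->lookup ($lat, $lon);

      return undef unless defined $res;

      utf8::decode $res; # assumes the default UTF-8 data format
      $res
   }

   # with $db opened via Geo::LatLon2Place->new, search up to 50 km:
   # my $name = place_name ($db, 47, 37, 50) // "unknown";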

=cut

sub lookup {
   my ($self, $lat, $lon, $radius) = @_;

   lookup_ext_ $self->[1], $self->[2], $self->[3], $lat, $lon, 0, $radius, 0
}

=back

=head1 ALGORITHM

The algorithm that this module implements consists of two parts: binning
and weighting (done when writing the database) and then finding the
nearest point.

The first part bins all data points into a grid which has its minimum cell
size at the equator and poles, with somewhat larger cells in between.

The lookup part will then read the cell that the coordinate is in and some
neighbouring cells (depending on the search radius, by default it will
read the eight cells around it).
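
To illustrate the idea of reading a cell plus its eight neighbours, here is
a sketch on a deliberately simplified grid. It assumes uniform cells of
C<$cell_deg> degrees, which is not how this module sizes its cells, and
C<neighbour_cells> is a hypothetical function that does not reflect the
actual database layout:

   use strict;
   use warnings;
   use POSIX ();

   # returns [$x, $y] index pairs for the cell containing the point and
   # its neighbours on a simplified uniform latitude/longitude grid
   sub neighbour_cells {
      my ($lat, $lon, $cell_deg) = @_;

      my $nx = int (360 / $cell_deg + 0.5); # cells per row
      my $ny = int (180 / $cell_deg + 0.5); # number of rows

      my $cx = POSIX::floor (($lon + 180) / $cell_deg) % $nx;
      my $cy = POSIX::floor (($lat +  90) / $cell_deg);
      $cy = $ny - 1 if $cy >= $ny; # the north pole belongs to the last row

      my @cells;
      for my $dy (-1 .. 1) {
         my $y = $cy + $dy;
         next if $y < 0 || $y >= $ny; # there are no rows beyond the poles

         for my $dx (-1 .. 1) {
            # columns wrap around at the 180 degree meridian
            push @cells, [($cx + $dx) % $nx, $y];
         }
      }

      @cells
   }

   printf "%d cells to scan\n", scalar (() = neighbour_cells 49, 8.4, 1);

Near the poles fewer than nine cells are returned, because rows above or
below the grid are skipped rather than wrapped.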

It will then calculate the (squared) distance to the search coordinate
using an approximate euclidean distance on an equirectangular
projection. The squared distance is multiplied by a weight (1..25 for
the geonames database, based on population and administrative status,
always 1 for postal codes), and the minimum weighted distance wins.
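
The weighting step can be sketched in plain Perl as follows. This is an
illustration only and not the module's XS implementation: the earth radius
of 6371 km, the candidate layout and the function names C<weighted_dist2>
and C<nearest> are assumptions, and wrap-around at the 180 degree meridian
is ignored:

   use strict;
   use warnings;

   use constant TORAD    => atan2 (1, 0) / 90; # degrees to radians
   use constant EARTH_KM => 6371;              # assumed mean earth radius

   # squared distance in km^2 on an equirectangular projection, times weight
   sub weighted_dist2 {
      my ($lat, $lon, $plat, $plon, $weight) = @_;

      my $dlat = ($plat - $lat) * TORAD;
      my $dlon = ($plon - $lon) * TORAD;

      # shrink east-west distances by the cosine of the mean latitude
      my $x = $dlon * cos (($lat + $plat) * 0.5 * TORAD);

      ($x * $x + $dlat * $dlat) * EARTH_KM * EARTH_KM * $weight
   }

   # candidates are [$name, $lat, $lon, $weight]; returns the winning name
   sub nearest {
      my ($lat, $lon, @candidates) = @_;

      my ($best, $best_d2);
      for my $c (@candidates) {
         my $d2 = weighted_dist2 $lat, $lon, @$c[1, 2, 3];
         ($best, $best_d2) = ($c->[0], $d2)
            if !defined $best_d2 || $d2 < $best_d2;
      }

      $best
   }

   # the closer hamlet (weight 25) loses against the farther city (weight 1)
   my $winner = nearest 49, 8.4, ["hamlet", 49, 8.5, 25], ["city", 49, 8.7, 1];
   print "$winner\n"; # prints "city"

Because the weight multiplies the squared distance, a weight of 25
corresponds to a place appearing five times further away than it is.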

Binning should not introduce errors, but bigger bins can slow down lookup
times due to having to look at more places. The lookup assumes a spherical
shape for the earth, the equirectangular projection stretches distances
unevenly, and the euclidean distance calculation introduces further
errors. For typical distances (<< 100km) and the intended usage, these
errors should be considered negligible.
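
To get a feeling for the size of the projection error, the approximation
can be compared against the great-circle distance on a sphere. The sketch
below is not taken from the module: the function names, the earth radius of
6371 km and the sample coordinates are assumptions chosen for illustration:

   use strict;
   use warnings;

   use constant TORAD    => atan2 (1, 0) / 90; # degrees to radians
   use constant EARTH_KM => 6371;              # assumed mean earth radius

   # great-circle distance on a sphere (haversine formula), in km
   sub haversine_km {
      my ($lat1, $lon1, $lat2, $lon2) = map $_ * TORAD, @_;

      my $h = sin (($lat2 - $lat1) / 2) ** 2
            + cos ($lat1) * cos ($lat2) * sin (($lon2 - $lon1) / 2) ** 2;

      2 * EARTH_KM * atan2 (sqrt ($h), sqrt (1 - $h))
   }

   # euclidean distance on an equirectangular projection, in km
   sub equirect_km {
      my ($lat1, $lon1, $lat2, $lon2) = map $_ * TORAD, @_;

      my $x = ($lon2 - $lon1) * cos (($lat1 + $lat2) / 2);
      my $y = $lat2 - $lat1;

      EARTH_KM * sqrt ($x * $x + $y * $y)
   }

   my @pair = (49, 8.4, 49.3, 8.9); # two points well under 100 km apart
   printf "haversine %.3f km, equirectangular %.3f km\n",
      haversine_km (@pair), equirect_km (@pair);

Neither function accounts for the earth's ellipsoidal shape, so this only
isolates the error of the flat projection, not that of the spherical
assumption.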

=head1 SPEED

On my machine, C<lookup> typically does more than a million lookups per
second - performance varies depending on result density and number of
indexed points.
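
Since throughput depends on the machine and the database, it is worth
measuring on your own setup. A rough sketch using the core C<Time::HiRes>
module follows; C<lookups_per_second> is a hypothetical helper that times
any code reference taking a latitude and a longitude:

   use strict;
   use warnings;
   use Time::HiRes ();

   # calls $lookup with $count random coordinates, returns calls per second
   sub lookups_per_second {
      my ($lookup, $count) = @_;

      my $t0 = Time::HiRes::time ();

      for (1 .. $count) {
         $lookup->(rand (180) - 90, rand (360) - 180);
      }

      my $dt = Time::HiRes::time () - $t0;

      $dt > 0 ? $count / $dt : undef
   }

   # with $db opened via Geo::LatLon2Place->new:
   # my $rate = lookups_per_second (sub { $db->lookup (@_) }, 1_000_000);

Uniformly random coordinates mostly fall into oceans and other sparse
areas, so a realistic benchmark should replay coordinates resembling the
real workload. The figure also includes the overhead of the Perl loop and
the random number generation.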

=head1 TENTATIVE ROADMAP

The database writer should be accessible via a module, so you can easily
generate your own databases without having to run an external command.

The API might be extended to allow for multiple lookups, multiple
returns, or nearest neighbour search, or more return values (distance,
coordinates).

Longer lookups will take advantage of perlmulticore.

=head1 PERL MULTICORE SUPPORT

This is not yet implemented. Once it is, this module will support the
perl multicore specification (L<http://perlmulticore.schmorp.de/>) when
doing lookups.

=head1 SEE ALSO

L<geo-latlon2place-makedb> to create databases from common formats.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1

Revision:	1.9
Committed:	Fri Oct 13 11:34:55 2023 UTC (7 months, 2 weeks ago) by root
Branch:	MAIN
CVS Tags:	HEAD
Changes since 1.8:	+1 -1 lines
Log Message:	* empty log message *