… | |
… | |
37 | The extraction method: the default is C<geonames>, which expects a |
37 | The extraction method: the default is C<geonames>, which expects a |
38 | geonames database (L<https://download.geonames.org/export/dump/>, for |
38 | geonames database (L<https://download.geonames.org/export/dump/>, for |
39 | example F<DE.txt>, F<cities500.txt> or F<allCountries.txt>) and extracts |
39 | example F<DE.txt>, F<cities500.txt> or F<allCountries.txt>) and extracts |
40 | I<placename, countrycode> strings from it. |
40 | I<placename, countrycode> strings from it. |
41 | |
41 | |
42 | The method C<geonames-postalcodes> (not yet implemented) |
42 | The method C<geonames-postalcodes> does the same, but for a geonames |
43 | does the same, but for a geonames postal code database |
|
|
44 | L<https://download.geonames.org/export/zip>. |
43 | postal code database L<https://download.geonames.org/export/zip>, and |
|
|
44 | extracts C<zip name, countrycode> strings. |
45 | |
45 | |
46 | Lastly, you can specify a perl fragment that implements your own filtering |
46 | Lastly, you can specify a perl fragment that implements your own filtering |
47 | and extraction. |
47 | and extraction. |
48 | |
48 | |
49 | =back |
49 | =back |
… | |
… | |
59 | in C<$_>. The file is opened using the C<:perlio> layer, so if your input |
59 | in C<$_>. The file is opened using the C<:perlio> layer, so if your input |
60 | file is in UTF-8, so will be C<$_>. |
60 | file is in UTF-8, so will be C<$_>. |
61 | |
61 | |
62 | For example, the following would expect an input file with space separated |
62 | For example, the following would expect an input file with space separated |
63 | latitude, longitude, weight and name, where name can contain spaces, which |
63 | latitude, longitude, weight and name, where name can contain spaces, which |
64 | is useful when you wat to provide your own input data: |
64 | is useful when you want to provide your own input data: |
65 | |
65 | |
66 | geo-latlon2place-makedb --extract 'chomp; split / /, 4' input output |
66 | geo-latlon2place-makedb --extract 'chomp; split / /, 4' input output |
67 | |
67 | |
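The effect of the limit argument in the C<split / /, 4> fragment above can be checked in isolation. The following is a minimal sketch; the sample line and its values are made up for illustration and are not taken from any real input file.

```perl
use strict;
use warnings;

# Hypothetical input line: latitude, longitude, weight, then a name
# that itself contains a space.
local $_ = "47.76 11.56 10 Bad Toelz\n";

chomp;

# The limit of 4 stops splitting after the third space, so everything
# that follows ends up in the fourth field, spaces included.
my @fields = split / /, 4;

print join("|", @fields), "\n";
```

Without the limit, the name would be split into two separate fields.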
68 | A slightly more verbose example expecting only latitude, longitude and a |
68 | A slightly more verbose example expecting only latitude, longitude and a |
69 | name would be: |
69 | name would be: |
… | |
… | |
90 | weight, these should be self-explanatory. The weight is used during search |
90 | weight, these should be self-explanatory. The weight is used during search |
91 | and will be multiplied by the square of the distance, and is used to make |
91 | and will be multiplied by the square of the distance, and is used to make |
92 | larger cities win over small ones when the coordinate is somewhere between |
92 | larger cities win over small ones when the coordinate is somewhere between |
93 | them. |
93 | them. |
94 | |
94 | |
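To see how the weight tips the balance, here is a rough sketch of the scoring described above. It reuses the weight formula from the C<geonames> fragment; the C<weight> and C<score> helpers, the flat distances in kilometres and the population figures are assumptions made for illustration, not the module's actual search code.

```perl
use strict;
use warnings;

# Weight formula as used in the geonames fragment: 25 / (1 + sqrt(pop / 5000)).
sub weight {
    my ($pop) = @_;
    return 25 / (1 + sqrt($pop / 5000));
}

# Assumed scoring: squared distance multiplied by the weight, lowest score wins.
sub score {
    my ($dist_km, $pop) = @_;
    return $dist_km ** 2 * weight($pop);
}

my $city    = score(6, 500_000);   # large city, 6 km away (made-up numbers)
my $village = score(4, 500);       # small village, 4 km away (made-up numbers)

printf "city %.1f, village %.1f => %s wins\n",
    $city, $village, $city < $village ? "city" : "village";
```

With these numbers the city gets the lower score and wins, even though the village is closer.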
|
|
95 | The standard extractors (C<geonames> and C<geonames-postalcodes>) provide |
|
|
96 | a UTF-8-encoded string as blob, but any binary data will do, for example, |
|
|
97 | if you want to associate your coordinate pairs with some short-ish |
|
|
98 | integer codes, you could do this: |
|
|
99 | |
|
|
100 | geo-latlon2place-makedb --extract ' |
|
|
101 | chomp; |
|
|
102 | my ($lat, $lon, $id) = split / /, 4; |
|
|
103 | ($lat, $lon, 1, pack "w", $id) |
|
|
104 | ' input output |
|
|
105 | |
|
|
106 | And later use C<unpack "w"> on the data returned by C<lookup>. |
|
|
107 | |
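The round trip through C<pack "w"> (a BER compressed integer) can be tried on its own. This is only a sketch: C<$blob> stands in for whatever C<lookup> would return later, and the id value is arbitrary.

```perl
use strict;
use warnings;

my $id = 300;                      # arbitrary example id

# "w" packs an unsigned integer as a BER compressed integer, so small
# values need only a byte or two.
my $blob = pack "w", $id;

# Later, decode the blob again (here $blob plays the role of the lookup result).
my ($decoded) = unpack "w", $blob;

printf "%d byte(s), id %d\n", length $blob, $decoded;
```

The format only handles non-negative integers, so signed ids would need a different encoding.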
95 | The C<geonames> filter looks similar to this fragment, which shows off |
108 | The C<geonames> filter looks similar to the following fragment, which |
96 | more possibilities: |
109 | shows off some more filtering possibilities: |
97 | |
110 | |
98 | my ($id, $name, undef, undef, $lat, $lon, $t1, $t2, $cc, undef, $a1, $s2, $a3, $a4, $pop, undef) = split /\t/; |
111 | my ($id, $name, undef, undef, $lat, $lon, $t1, $t2, $cc, undef, $a1, $s2, $a3, $a4, $pop, undef) = split /\t/; |
99 | |
112 | |
100 | return if $t1 ne "P"; # only places |
113 | return if $t1 ne "P"; # only places |
101 | |
114 | |
… | |
… | |
114 | # actually place names, so ignore very long names |
127 | # actually place names, so ignore very long names |
115 | 60 > length $name |
128 | 60 > length $name |
116 | or return; |
129 | or return; |
117 | |
130 | |
118 | # we estimate a weight by dividing 25 by the radius of the place, |
131 | # we estimate a weight by dividing 25 by the radius of the place, |
119 | # which we get by assuming a fixed population density of 5000 people/km², |
132 | # which we get by assuming a fixed population density of 5000 people |
120 | # which is almost always a considerable over-estimate. |
133 | # per square km, which is almost always a considerable over-estimate. |
121 | # 25 and 5000 are pretty much made-up, feel free to improve and |
134 | # 25 and 5000 are pretty much made-up, feel free to improve and |
122 | # send me the results. |
135 | # send me the results. |
123 | my $w = 25 / (1 + sqrt $pop / 5000); |
136 | my $w = 25 / (1 + sqrt $pop / 5000); |
124 | |
137 | |
125 | # administrative centers get a fixed low weight |
138 | # administrative centers get a fixed low weight |