1 | NAME |
1 | NAME |
2 | Convert::Scalar - convert between different representations of perl |
2 | Geo::LatLon2Place - convert latitude and longitude to nearest place |
3 | scalars |
|
|
4 | |
3 | |
5 | SYNOPSIS |
4 | SYNOPSIS |
6 | use Convert::Scalar; |
5 | use Geo::LatLon2Place; |
|
|
6 | |
|
|
7 | my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb"); |
7 | |
8 | |
8 | DESCRIPTION |
9 | DESCRIPTION |
9 | This module exports various internal perl methods that change the |
10 | This is a single-purpose module that tries to do one job: find the |
10 | internal representation or state of a perl scalar. All of these work |
11 | nearest placename for a point on earth. It doesn't claim to do a perfect |
11 | in-place, that is, they modify their scalar argument. No functions are |
12 | job, but it tries to be simple to set up, simple to use and fast. It |
12 | exported by default. |
13 | doesn't attempt to provide many features or nifty algorithms, and is |
|
|
14 | meant to be used in situations where you simply need a name for a |
|
|
15 | coordinate without becoming a GIS expert first. |
13 | |
16 | |
14 | The following export tags exist: |
17 | BUILDING, SETTING UP AND USAGE |
|
|
18 | To build this module, you need tinycdb, a cdb implementation by Michael |
|
|
19 | Tokarev, or a compatible library. On Debian GNU/Linux-based systems you can |
|
|
20 | get this by executing apt-get install libcdb-dev. |
15 | |
21 | |
16 | :utf8 all functions with utf8 in their name |
22 | After installing the module, you need to generate a database using the |
17 | :taint all functions with taint in their name |
23 | geo-latlon2place-makedb command. |
18 | :refcnt all functions with refcnt in their name |
|
|
19 | :ok all *ok-functions. |
|
|
20 | |
24 | |
21 | utf8 scalar[, mode] |
25 | Currently, it accepts various databases from geonames |
22 | Returns true when the given scalar is marked as utf8, false |
26 | (<https://www.geonames.org/export/>, note the license), for example, |
23 | otherwise. If the optional mode argument is given, also forces the |
27 | cities500.zip, which lists all places with population 500 or more: |
24 | interpretation of the string to utf8 (mode true) or plain bytes |
|
|
25 | (mode false). The actual (byte-) content is not changed. The return |
|
|
26 | value always reflects the state before any modification is done. |
|
|
27 | |
28 | |
28 | This function is useful when you "import" utf8-data into perl, or |
29 | wget https://download.geonames.org/export/dump/cities500.zip |
29 | when some external function (e.g. storing/retrieving from a |
30 | unzip cities500.zip |
30 | database) removes the utf8-flag. |
31 | geo-latlon2place-makedb cities500.txt cities500.ll2p |
31 | |
32 | |
32 | utf8_on scalar |
33 | This will create a file cities500.ll2p that you can use for lookups with this |
33 | Similar to "utf8 scalar, 1", but additionally returns the scalar |
34 | module. At the time of this writing, the cities500 database results in |
34 | (the argument is still modified in-place). |
35 | about a 10MB file while the allCountries database results in about |
|
|
36 | 120MB. |
35 | |
37 | |
36 | utf8_off scalar |
38 | Lookups will return a string of the form "placename, countrycode". |
37 | Similar to "utf8 scalar, 0", but additionally returns the scalar |
|
|
38 | (the argument is still modified in-place). |
|
|
39 | |
39 | |
40 | utf8_valid scalar [Perl 5.7] |
40 | If you want to use the geonames postal code database (from |
41 | Returns true if the bytes inside the scalar form a valid utf8 |
41 | <https://www.geonames.org/zip/>), use these commands: |
42 | string, false otherwise (the check is independent of the actual |
|
|
43 | encoding perl thinks the string is in). |
|
|
44 | |
42 | |
45 | utf8_upgrade scalar |
43 | wget https://download.geonames.org/export/zip/allCountries.zip |
46 | Convert the string content of the scalar in-place to its |
44 | unzip allCountries.zip |
47 | UTF8-encoded form (and also returns it). |
45 | geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p |
48 | |
46 | |
49 | utf8_downgrade scalar[, fail_ok=0] |
47 | You can then use the resulting database like this: |
50 | Attempt to convert the string content of the scalar from |
|
|
51 | UTF8-encoded to ISO-8859-1. This may not be possible if the string |
|
|
52 | contains characters that cannot be represented in a single byte; if |
|
|
53 | this is the case, it leaves the scalar unchanged and either returns |
|
|
54 | false or, if "fail_ok" is not true (the default), croaks. |
|
|
55 | |
48 | |
56 | utf8_encode scalar |
49 | my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p"); |
57 | Convert the string value of the scalar to UTF8-encoded, but then |
|
|
58 | turn off the "SvUTF8" flag so that it looks like bytes to perl |
|
|
59 | again. (Might be removed in future versions). |
|
|
60 | |
50 | |
61 | utf8_length scalar |
51 | # and then do as many queries as you wish: |
62 | Returns the number of characters in the string, counting wide UTF8 |
52 | my $res = $lookup->lookup (49, 8.4); |
63 | characters as a single character, independent of whether the scalar |
53 | if (defined $res) { |
64 | is marked as containing bytes or multibyte characters. |
54 | utf8::decode $res; # convert $res from utf-8 to unicode |
|
|
55 | print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames |
|
|
56 | } else { |
|
|
57 | print "nothing found at 49, 8.4\n"; |
|
|
58 | } |
65 | |
59 | |
66 | $old = readonly scalar[, $new] |
60 | THE Geo::LatLon2Place CLASS |
67 | Returns whether the scalar is currently readonly, and sets or clears |
61 | $lookup = Geo::LatLon2Place->new ($path) |
68 | the readonly status if a new status is given. |
62 | Opens a database created by geo-latlon2place-makedb and returns an |
|
|
63 | object that allows you to run queries against it. |
69 | |
64 | |
70 | readonly_on scalar |
65 | The database will be mmaped, so it will not be loaded into memory, |
71 | Sets the readonly flag on the scalar. |
66 | but your operating system will cache it appropriately. |
72 | |
67 | |
73 | readonly_off scalar |
68 | $res = $lookup->lookup ($lat, $lon[, $radius]) |
74 | Clears the readonly flag on the scalar. |
69 | Looks up the point in the database that is "nearest" to "$lat, |
|
|
70 | $lon", searching at least up to $radius kilometres. The default for |
|
|
71 | $radius is the cell size the database is built with, and this |
|
|
72 | usually works best, so you usually do not specify this parameter. |
75 | |
73 | |
76 | unmagic scalar, type |
74 | If something is found, the associated data blob (always a binary |
77 | Remove the specified magic from the scalar (DANGEROUS!). |
75 | string) is returned, otherwise you receive "undef". |
78 | |
76 | |
79 | weaken scalar |
77 | Unless you specify a custom format, the data blob is actually a |
80 | Weaken a reference. (See also WeakRef). |
78 | UTF-8 string, so you might want to call "utf8::decode" on it to get |
|
|
79 | a unicode string. |
81 | |
80 | |
82 | taint scalar |
81 | At the moment, the implementation is in pure perl, but will |
83 | Taint the scalar. |
82 | eventually move to C. |
84 | |
83 | |
85 | tainted scalar |
84 | ALGORITHM |
86 | Returns true when the scalar is tainted, false otherwise. |
85 | The algorithm that this module implements consists of two parts: binning |
|
|
86 | and weighting (done when writing the database) and then finding the |
|
|
87 | nearest point. |
87 | |
88 | |
88 | untaint scalar |
89 | The first part bins all data points into a grid which has its minimum |
89 | Remove the tainted flag from the specified scalar. |
90 | cell size at the equator and poles, with somewhat larger cells in |
|
|
91 | between. |
90 | |
92 | |
91 | length = len scalar |
93 | The lookup part will then read the cell that the coordinate is in and |
92 | Returns SvLEN (scalar), that is, the actual number of bytes |
94 | some neighbouring cells (depending on the search radius, by default it |
93 | allocated to the string value, or "undef", if the scalar has no |
95 | will read the eight cells around it). |
94 | string value. |
|
|
95 | |
96 | |
96 | scalar = grow scalar, newlen |
97 | It will then calculate the (squared) distance to the search coordinate |
97 | Sets the memory area used for the scalar to the given length, if the |
98 | using an approximate euclidean distance on an equirectangular |
98 | current length is less than the new value. This does not affect the |
99 | projection. The squared distance is multiplied with a weight (1..25 for |
99 | contents of the scalar, but is only useful to "pre-allocate" memory |
100 | the geonames database, based on population and administrative status, |
100 | space if you know the scalar will grow. The return value is the |
101 | always 1 for postal codes), and the minimum distance wins. |
101 | modified scalar (the scalar is modified in-place). |
|
|
102 | |
102 | |
103 | scalar = extend scalar, addlen=64 |
103 | Binning should not introduce errors, but bigger bins can slow down |
104 | Reserves enough space in the scalar so that addlen bytes can be |
104 | lookup times due to having to look at more places. The lookup assumes a |
105 | appended without reallocating it. The actual contents of the scalar |
105 | spherical shape for the earth, the equirectangular projection stretches |
106 | will not be affected. The modified scalar will also be returned. |
106 | distances unevenly and the euclidean distance calculation introduces |
|
|
107 | further errors. For typical distances (<< 100km) and the intended usage, |
|
|
108 | these errors should be considered negligible. |
107 | |
109 | |
108 | This function is meant to make append workloads efficient - if you |
110 | SPEED |
109 | append a short string to a scalar many times (millions of times), |
111 | The current implementation is written in pure perl, and on my machine, |
110 | then perl will have to reallocate and copy the scalar basically |
112 | typically does 10000-200000 lookups per second. The goal for version 1.0 |
111 | every time. |
113 | is to move the lookup to C. |
112 | |
114 | |
113 | If you instead use "extend $scalar, length $shortstring", then |
115 | TENTATIVE ROADMAP |
114 | Convert::Scalar will use a "size to next power of two, roughly" |
116 | The database writer should be accessible via a module, so you can easily |
115 | algorithm, so as the scalar grows, perl will have to resize and copy |
117 | generate your own databases without having to run an external command. |
116 | it less and less often. |
|
|
117 | |
118 | |
118 | nread = extend_read fh, scalar, addlen=64 |
119 | The api might be extended to allow for multiple returns, or nearest |
119 | Calls "extend scalar, addlen" to ensure some space is available, |
120 | neighbour search. |
120 | then does the equivalent of "sysread" to the end, to try to fill the |
|
|
121 | extra space. Returns how many bytes have been read, 0 on EOF or |
|
|
122 | "undef" on error, just like "sysread". |
|
|
123 | |
121 | |
124 | This function is useful to implement many protocols where you read |
122 | SEE ALSO |
125 | some data, see if it is enough to decode, and if not, read some |
123 | geo-latlon2place-makedb to create databases from common formats. |
126 | more, where the naive or easy way of doing this would result in bad |
|
|
127 | performance. |
|
|
128 | |
|
|
129 | nread = read_all fh, scalar, length |
|
|
130 | Tries to read "length" bytes into "scalar". Unlike "read" or |
|
|
131 | "sysread", it will try to read more bytes if not all bytes could be |
|
|
132 | read in one go (this is often called "xread" in C). |
|
|
133 | |
|
|
134 | Returns the total number of bytes read (normally "length", unless |
|
|
135 | an error or EOF occurred), 0 on EOF and "undef" on errors. |
|
|
136 | |
|
|
137 | nwritten = write_all fh, scalar |
|
|
138 | Like "read_all", but for writes - the equivalent of the "xwrite" |
|
|
139 | function often seen in C. |
|
|
140 | |
|
|
141 | refcnt scalar[, newrefcnt] |
|
|
142 | Returns the current reference count of the given scalar and |
|
|
143 | optionally sets it to the given reference count. |
|
|
144 | |
|
|
145 | refcnt_inc scalar |
|
|
146 | Increments the reference count of the given scalar in-place. |
|
|
147 | |
|
|
148 | refcnt_dec scalar |
|
|
149 | Decrements the reference count of the given scalar in-place. Use |
|
|
150 | "weaken" instead if you understand what this function is for. |
|
|
151 | Better yet: don't use this module in this case. |
|
|
152 | |
|
|
153 | refcnt_rv scalar[, newrefcnt] |
|
|
154 | Works like "refcnt", but dereferences the given reference first. |
|
|
155 | This is useful to find the reference count of arrays or hashes, |
|
|
156 | which cannot be passed directly. Remember that taking a reference of |
|
|
157 | some object increases its reference count, so the reference count |
|
|
158 | used by the *_rv-functions tends to be one higher. |
|
|
159 | |
|
|
160 | refcnt_inc_rv scalar |
|
|
161 | Works like "refcnt_inc", but dereferences the given reference first. |
|
|
162 | |
|
|
163 | refcnt_dec_rv scalar |
|
|
164 | Works like "refcnt_dec", but dereferences the given reference first. |
|
|
165 | |
|
|
166 | ok scalar |
|
|
167 | uok scalar |
|
|
168 | rok scalar |
|
|
169 | pok scalar |
|
|
170 | nok scalar |
|
|
171 | niok scalar |
|
|
172 | Calls SvOK, SvUOK, SvROK, SvPOK, SvNOK or SvNIOK on the given |
|
|
173 | scalar, respectively. |
|
|
174 | |
|
|
175 | CANDIDATES FOR FUTURE RELEASES |
|
|
176 | The following API functions (perlapi) are considered for future |
|
|
177 | inclusion in this module. If you want them, write me. |
|
|
178 | |
|
|
179 | sv_upgrade |
|
|
180 | sv_pvn_force |
|
|
181 | sv_pvutf8n_force |
|
|
182 | the sv2xx family |
|
|
183 | |
124 | |
184 | AUTHOR |
125 | AUTHOR |
185 | Marc Lehmann <schmorp@schmorp.de> |
126 | Marc Lehmann <schmorp@schmorp.de> |
186 | http://home.schmorp.de/ |
127 | http://home.schmorp.de/ |
187 | |
128 | |