--- Geo-LatLon2Place/README 2022/03/14 02:41:52 1.1
+++ Geo-LatLon2Place/README 2022/03/14 03:14:41 1.2
@@ -1,185 +1,126 @@
 NAME
-    Convert::Scalar - convert between different representations of perl
-    scalars
+    Geo::LatLon2Place - convert latitude and longitude to nearest place
 
 SYNOPSIS
-     use Convert::Scalar;
+     use Geo::LatLon2Place;
+
+     my $db = Geo::LatLon2Place->new ("/var/lib/mydb.cdb");
 
 DESCRIPTION
-    This module exports various internal perl methods that change the
-    internal representation or state of a perl scalar. All of these work
-    in-place, that is, they modify their scalar argument. No functions are
-    exported by default.
-
-    The following export tags exist:
-
-     :utf8   all functions with utf8 in their name
-     :taint  all functions with taint in their name
-     :refcnt all functions with refcnt in their name
-     :ok     all *ok-functions.
-
-    utf8 scalar[, mode]
-        Returns true when the given scalar is marked as utf8, false
-        otherwise. If the optional mode argument is given, also forces the
-        interpretation of the string to utf8 (mode true) or plain bytes
-        (mode false). The actual (byte-) content is not changed. The return
-        value always reflects the state before any modification is done.
-
-        This function is useful when you "import" utf8-data into perl, or
-        when some external function (e.g. storing/retrieving from a
-        database) removes the utf8-flag.
-
-    utf8_on scalar
-        Similar to "utf8 scalar, 1", but additionally returns the scalar
-        (the argument is still modified in-place).
-
-    utf8_off scalar
-        Similar to "utf8 scalar, 0", but additionally returns the scalar
-        (the argument is still modified in-place).
-
-    utf8_valid scalar [Perl 5.7]
-        Returns true if the bytes inside the scalar form a valid utf8
-        string, false otherwise (the check is independent of the actual
-        encoding perl thinks the string is in).
-
-    utf8_upgrade scalar
-        Convert the string content of the scalar in-place to its
-        UTF8-encoded form (and also returns it).
-
-    utf8_downgrade scalar[, fail_ok=0]
-        Attempt to convert the string content of the scalar from
-        UTF8-encoded to ISO-8859-1. This may not be possible if the string
-        contains characters that cannot be represented in a single byte; if
-        this is the case, it leaves the scalar unchanged and either returns
-        false or, if "fail_ok" is not true (the default), croaks.
-
-    utf8_encode scalar
-        Convert the string value of the scalar to UTF8-encoded, but then
-        turn off the "SvUTF8" flag so that it looks like bytes to perl
-        again. (Might be removed in future versions).
-
-    utf8_length scalar
-        Returns the number of characters in the string, counting wide UTF8
-        characters as a single character, independent of whether the scalar
-        is marked as containing bytes or multibyte characters.
-
-    $old = readonly scalar[, $new]
-        Returns whether the scalar is currently readonly, and sets or clears
-        the readonly status if a new status is given.
-
-    readonly_on scalar
-        Sets the readonly flag on the scalar.
-
-    readonly_off scalar
-        Clears the readonly flag on the scalar.
-
-    unmagic scalar, type
-        Remove the specified magic from the scalar (DANGEROUS!).
-
-    weaken scalar
-        Weaken a reference. (See also WeakRef).
-
-    taint scalar
-        Taint the scalar.
-
-    tainted scalar
-        Returns true when the scalar is tainted, false otherwise.
-
-    untaint scalar
-        Remove the tainted flag from the specified scalar.
-
-    length = len scalar
-        Returns SvLEN (scalar), that is, the actual number of bytes
-        allocated to the string value, or "undef" if the scalar has no
-        string value.
-
-    scalar = grow scalar, newlen
-        Sets the memory area used for the scalar to the given length, if the
-        current length is less than the new value. This does not affect the
-        contents of the scalar, but is only useful to "pre-allocate" memory
-        space if you know the scalar will grow. The return value is the
-        modified scalar (the scalar is modified in-place).
-
-    scalar = extend scalar, addlen=64
-        Reserves enough space in the scalar so that addlen bytes can be
-        appended without reallocating it. The actual contents of the scalar
-        will not be affected. The modified scalar will also be returned.
-
-        This function is meant to make append workloads efficient - if you
-        append a short string to a scalar many times (millions of times),
-        then perl will have to reallocate and copy the scalar basically
-        every time.
-
-        If you instead use "extend $scalar, length $shortstring", then
-        Convert::Scalar will use a "size to next power of two, roughly"
-        algorithm, so as the scalar grows, perl will have to resize and copy
-        it less and less often.
-
-    nread = extend_read fh, scalar, addlen=64
-        Calls "extend scalar, addlen" to ensure some space is available,
-        then does the equivalent of "sysread" to the end, to try to fill the
-        extra space. Returns how many bytes have been read, 0 on EOF or
-        "undef" on error, just like "sysread".
-
-        This function is useful to implement many protocols where you read
-        some data, see if it is enough to decode, and if not, read some
-        more, where the naive or easy way of doing this would result in bad
-        performance.
-
-    nread = read_all fh, scalar, length
-        Tries to read "length" bytes into "scalar". Unlike "read" or
-        "sysread", it will try to read more bytes if not all bytes could be
-        read in one go (this is often called "xread" in C).
-
-        Returns the total number of bytes read (normally "length", unless
-        an error or EOF occurred), 0 on EOF and "undef" on errors.
-
-    nwritten = write_all fh, scalar
-        Like "read_all", but for writes - the equivalent of the "xwrite"
-        function often seen in C.
-
-    refcnt scalar[, newrefcnt]
-        Returns the current reference count of the given scalar and
-        optionally sets it to the given reference count.
-
-    refcnt_inc scalar
-        Increments the reference count of the given scalar in-place.
-
-    refcnt_dec scalar
-        Decrements the reference count of the given scalar in-place. Use
-        "weaken" instead if you understand what this function is for.
-        Better yet: don't use this module in this case.
-
-    refcnt_rv scalar[, newrefcnt]
-        Works like "refcnt", but dereferences the given reference first.
-        This is useful to find the reference count of arrays or hashes,
-        which cannot be passed directly. Remember that taking a reference of
-        some object increases its reference count, so the reference count
-        used by the *_rv-functions tends to be one higher.
-
-    refcnt_inc_rv scalar
-        Works like "refcnt_inc", but dereferences the given reference first.
-
-    refcnt_dec_rv scalar
-        Works like "refcnt_dec", but dereferences the given reference first.
-
-    ok scalar
-    uok scalar
-    rok scalar
-    pok scalar
-    nok scalar
-    niok scalar
-        Calls SvOK, SvUOK, SvROK, SvPOK, SvNOK or SvNIOK on the given
-        scalar, respectively.
-
-  CANDIDATES FOR FUTURE RELEASES
-    The following API functions (perlapi) are considered for future
-    inclusion in this module. If you want them, write me.
-
-     sv_upgrade
-     sv_pvn_force
-     sv_pvutf8n_force
-     the sv2xx family
+    This is a single-purpose module that tries to do one job: find the
+    nearest placename for a point on earth. It doesn't claim to do a perfect
+    job, but it tries to be simple to set up, simple to use, and fast. It
+    doesn't attempt to provide many features or nifty algorithms, and is
+    meant to be used in situations where you simply need a name for a
+    coordinate without becoming a GIS expert first.
+
+  BUILDING, SETTING UP AND USAGE
+    To build this module, you need tinycdb, a cdb implementation by Michael
+    Tokarev, or a compatible library. On GNU/Debian-based systems you can
+    get this by executing apt-get install libcdb-dev.
+
+    After installing the module, you need to generate a database using the
+    geo-latlon2place-makedb command.
+
+    Currently, it accepts various databases from geonames
+    (, note the license), for example,
+    cities500.zip, which lists all places with a population of 500 or more:
+
+     wget https://download.geonames.org/export/dump/cities500.zip
+     unzip cities500.zip
+     geo-latlon2place-makedb cities500.txt cities500.ll2p
+
+    This will create a file cities500.ll2p that you can use for lookups
+    with this module. At the time of this writing, the cities500 database
+    results in a file of about 10MB, while the allCountries database
+    results in about 120MB.
+
+    Lookups will return a string of the form "placename, countrycode".
+
+    If you want to use the geonames postal code database (from
+    ), use these commands:
+
+     wget https://download.geonames.org/export/zip/allCountries.zip
+     unzip allCountries.zip
+     geo-latlon2place-makedb --extract geonames-postalcodes allCountries.txt allCountries.ll2p
+
+    You can then use the resulting database like this:
+
+     my $lookup = Geo::LatLon2Place->new ("allCountries.ll2p");
+
+     # and then do as many queries as you wish:
+     my $res = $lookup->lookup (49, 8.4);
+     if (defined $res) {
+        utf8::decode $res; # convert $res from UTF-8 to Unicode
+        print "49, 8.4 found $res\n"; # should be Karlsruhe, DE for geonames
+     } else {
+        print "nothing found at 49, 8.4\n";
+     }
+
+THE Geo::LatLon2Place CLASS
+    $lookup = Geo::LatLon2Place->new ($path)
+        Opens a database created by geo-latlon2place-makedb and returns an
+        object that allows you to run queries against it.
+
+        The database will be mmapped, so it will not be loaded into memory,
+        but your operating system will cache it appropriately.
+
+    $res = $lookup->lookup ($lat, $lon[, $radius])
+        Looks up the point in the database that is "nearest" to "$lat,
+        $lon", searching at least up to $radius kilometres. The default for
+        $radius is the cell size the database is built with, and this
+        usually works best, so you usually do not specify this parameter.
+
+        If something is found, the associated data blob (always a binary
+        string) is returned; otherwise you receive "undef".
+
+        Unless you specify a custom format, the data blob is actually a
+        UTF-8 string, so you might want to call "utf8::decode" on it to get
+        a Unicode string.
+
+        At the moment, the implementation is in pure perl, but it will
+        eventually move to C.
+
+ALGORITHM
+    The algorithm that this module implements consists of two parts: binning
+    and weighting (done when writing the database), and then finding the
+    nearest point.
+
+    The first part bins all data points into a grid which has its minimum
+    cell size at the equator and poles, with somewhat larger cells in
+    between.
+
+    The lookup part will then read the cell that the coordinate is in and
+    some neighbouring cells (depending on the search radius; by default it
+    will read the eight cells around it).
+
+    It will then calculate the (squared) distance to the search coordinate
+    using an approximate euclidean distance on an equirectangular
+    projection. The squared distance is multiplied by a weight (1..25 for
+    the geonames database, based on population and administrative status;
+    always 1 for postal codes), and the minimum weighted distance wins.
+
+    Binning should not introduce errors, but bigger bins can slow down
+    lookups because more places have to be examined. The lookup assumes a
+    spherical earth, the equirectangular projection stretches distances
+    unevenly, and the euclidean distance calculation introduces further
+    errors. For typical distances (<< 100km) and the intended usage, these
+    errors should be considered negligible.
+
+SPEED
+    The current implementation is written in pure perl, and on my machine
+    typically does 10000-200000 lookups per second. The goal for version 1.0
+    is to move the lookup to C.
+
+TENTATIVE ROADMAP
+    The database writer should be accessible via a module, so you can easily
+    generate your own databases without having to run an external command.
+
+    The API might be extended to allow for multiple returns, or nearest
+    neighbour search.
+
+SEE ALSO
+    geo-latlon2place-makedb to create databases from common formats.
 
 AUTHOR
     Marc Lehmann
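The weighted nearest-place search described in the ALGORITHM section can be illustrated with a small, self-contained Perl sketch. It is not Geo::LatLon2Place's implementation and uses none of its API. It assumes a hypothetical uniform 0.5-degree grid (the module uses its own latitude-dependent binning), keeps the bins in an in-memory hash instead of a cdb file, ignores the antimeridian and the poles, and uses made-up places and weights. The names cell_of, build_grid, dist2 and lookup are illustrative only.

```perl
#!/usr/bin/perl
# Sketch of a weighted nearest-place lookup (NOT the module's own code).
use strict;
use warnings;
use POSIX qw(floor);

use constant PI              => 4 * atan2 (1, 1);
use constant EARTH_RADIUS_KM => 6371;  # spherical earth, as in the README
use constant CELL_DEG        => 0.5;   # hypothetical uniform cell size

# Map a coordinate to the integer (row, column) of its grid cell.
sub cell_of {
   my ($lat, $lon) = @_;
   return (floor ($lat / CELL_DEG), floor ($lon / CELL_DEG));
}

# Binning: group [lat, lon, weight, name] records by grid cell.
sub build_grid {
   my %grid;
   for my $place (@_) {
      my ($y, $x) = cell_of ($place->[0], $place->[1]);
      push @{ $grid{"$y,$x"} }, $place;
   }
   return \%grid;
}

# Approximate squared distance in km^2 on an equirectangular projection:
# longitude differences are scaled by cos(mean latitude), then Pythagoras.
sub dist2 {
   my ($lat1, $lon1, $lat2, $lon2) = @_;
   my $dx = ($lon2 - $lon1) * cos (($lat1 + $lat2) / 2 * PI / 180);
   my $dy = $lat2 - $lat1;
   return ($dx * $dx + $dy * $dy) * (EARTH_RADIUS_KM * PI / 180) ** 2;
}

# Lookup: scan the point's cell plus its eight neighbours and return the
# name with the smallest weighted squared distance (undef if none).
sub lookup {
   my ($grid, $lat, $lon) = @_;
   my ($cy, $cx) = cell_of ($lat, $lon);
   my ($best, $best_d);
   for my $y ($cy - 1 .. $cy + 1) {
      for my $x ($cx - 1 .. $cx + 1) {
         for my $place (@{ $grid->{"$y,$x"} || [] }) {
            my $d = dist2 ($lat, $lon, $place->[0], $place->[1]) * $place->[2];
            ($best, $best_d) = ($place->[3], $d)
               if !defined $best_d || $d < $best_d;
         }
      }
   }
   return $best;
}

# Made-up sample data: [lat, lon, weight, name]; a lower weight favours a place.
my $grid = build_grid (
   [49.01,  8.40,  1, "Place A"],
   [49.00,  8.41, 25, "Place B"],
   [52.50, 13.40,  1, "Place C"],  # far away, never in the scanned cells
);

printf "%s\n", lookup ($grid, 49, 8.4) // "nothing found";  # prints "Place A"
```

Place B is geometrically closer to (49, 8.4), so with all weights set to 1 it would win; its weight of 25 is what makes Place A win here, mirroring the rule that the squared distance is multiplied by the weight and the minimum wins.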