… | |
… | |
15 | |
15 | |
16 | =head1 SYNOPSIS |
16 | =head1 SYNOPSIS |
17 | |
17 | |
18 | use Tree::M; |
18 | use Tree::M; |
19 | |
19 | |
|
|
20 | $M = new Tree::M |
|
|
21 | |
20 | =head1 DESCRIPTION |
22 | =head1 DESCRIPTION |
21 | |
23 | |
22 | (not yet) |
24 | (not yet) |
23 | |
25 | |
24 | Ever had the problem of managing multi-dimensional (spatial) data but your |
26 | Ever had the problem of managing multi-dimensional (spatial) data but your |
25 | database only had one-dimensional indices (b-tree etc.)? Queries like |
27 | database only had one-dimensional indices (b-tree etc.)? Queries like |
26 | |
28 | |
27 | select data from table where latitude > 40 and latitude < 50 |
29 | select data from table where latitude > 40 and latitude < 50 |
28 | and longitude> 50 and longitude< 70; |
30 | and longitude> 50 and longitude< 60; |
29 | |
31 | |
30 | are quite inefficient, unless longitude and latitude are part of the same |
32 | are quite inefficient, unless longitude and latitude are part of the same |
31 | spatial index (e.g. a r-tree). |
33 | spatial index (e.g. an R-tree). |
32 | |
34 | |
33 | An M-tree etc.. etc... |
35 | An M-tree is an index tree that does not directly look at the stored keys |
|
|
36 | but rather requires a I<distance> (a metric, e.g. a vector norm) function |
|
|
37 | to be defined that sorts keys according to their distance. In the example |
|
|
38 | above the distance function could be the maximum norm (C<max(x1-x2, |
|
|
39 | y1-y2)>). The lookup above would then be something like this: |
|
|
40 | |
|
|
41 | my $res = $M->range([45,55], 5); |
|
|
42 | |
|
|
43 | This module implements an M-tree. Although the data structure and the |
|
|
44 | distance function is arbitrary, the current version only implements |
|
|
45 | n-dimensional discrete vectors and hardwires the distance function to the |
|
|
46 | suared euclidean metric (i.e. C<(x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2 + |
|
|
47 | ...>). Evolution towards more freedom is expected ;) |
34 | |
48 | |
35 | =head2 THE Tree::M CLASS |
49 | =head2 THE Tree::M CLASS |
36 | |
50 | |
37 | =over 4 |
51 | =over 4 |
38 | |
52 | |
39 | =item $M = new Tree::M arg => value, ... |
53 | =item $M = new Tree::M ndims, min, max, steps |
40 | |
54 | |
41 | ... |
55 | Creates a new M-Tree. Before it can be used you have to call one of the |
|
|
56 | C<create> or C<open> methods. |
42 | |
57 | |
43 | distance => specify a distance function. the default distance |
58 | ndims the number of dimensions each vector has |
44 | method uses this function in some way. |
59 | min the lowest allowable scalar value in each dimension |
45 | keyof => the keyof method uses this argument to determine |
60 | max the maximum allowable number |
46 | the key of an object |
61 | steps the number of discrete steps (used when stored externally) |
47 | min => the minimum number of objects stored in a node |
62 | |
48 | max => the maximum number of objects stored in a node |
63 | Example: create an M-Tree that stores 8-bit rgb-values: |
|
|
64 | |
|
|
65 | $M = new Tree::M 3, 0, 255, 256; |
|
|
66 | |
|
|
67 | Example: create an M-Tree that stores coordinates from -1..1 with 100 different steps: |
|
|
68 | |
|
|
69 | $M = new Tree::M 2, -1, 1, 100; |
|
|
70 | |
|
|
71 | =item $M->open(path) |
|
|
72 | |
|
|
73 | =item $M->create($path) |
|
|
74 | |
|
|
75 | Open or create the external storage file C<$path> and associate it with the tree. |
|
|
76 | |
|
|
77 | [this braindamaged API will go away ;)] |
|
|
78 | |
|
|
79 | =item $M->insert(\@v, $data) |
|
|
80 | |
|
|
81 | Insert a vector (given by an array reference) into the index and associate |
|
|
82 | it with the value C<$data> (a 32-bit integer). |
|
|
83 | |
|
|
84 | =item $res = $M->range(\@v, $radius) |
|
|
85 | |
|
|
86 | Search all entries not farther away from C<@v> then C<$radius> and return |
|
|
87 | an arrayref containing the searchresults. |
|
|
88 | |
|
|
89 | Each result is again anarrayref composed like this: |
|
|
90 | |
|
|
91 | [\@v, $data] |
|
|
92 | |
|
|
93 | e.g. the same as given to the C<insert> method. |
|
|
94 | |
|
|
95 | =item $res = $M->top(\@v, $n) |
|
|
96 | |
|
|
97 | Return the C<$n> "nearest neighbours". The results arrayref (see C<range>) |
|
|
98 | contains the C<$n> index values nearest to C<@v>, sorted for distance. |
|
|
99 | |
|
|
100 | =item $distance = $M->distance(\@v1, \@v2) |
|
|
101 | |
|
|
102 | Calculcate the distance between two vectors, just as they databse engine |
|
|
103 | would do it. |
|
|
104 | |
|
|
105 | =item $depth = $M->maxlevel |
|
|
106 | |
|
|
107 | Return the maximum height of the tree (usually a small integer specifying |
|
|
108 | the length of the path from the root to the farthest leaf) |
49 | |
109 | |
50 | =cut |
110 | =cut |
51 | |
111 | |
52 | =back |
112 | =back |
|
|
113 | |
|
|
114 | =head1 BUGS |
|
|
115 | |
|
|
116 | Inserting too many duplicate keys into the tree cause the C++ library to |
|
|
117 | die, so don't do that. |
53 | |
118 | |
54 | =head1 AUTHOR |
119 | =head1 AUTHOR |
55 | |
120 | |
56 | Marc Lehmann <pcg@goof.com>. |
121 | Marc Lehmann <pcg@goof.com>. |
57 | |
122 | |