String-Similarity/Similarity.pm

=head1 NAME

String::Similarity - calculate the similarity of two strings

=head1 SYNOPSIS

 use String::Similarity;

 $similarity = similarity $string1, $string2;
 $similarity = similarity $string1, $string2, $limit;

=head1 DESCRIPTION

=over 4

=cut

package String::Similarity;

require Exporter;
require DynaLoader;

$VERSION = '1.02';
@ISA = qw/Exporter DynaLoader/;
@EXPORT = qw(similarity);
@EXPORT_OK = qw(fstrcmp);

bootstrap String::Similarity $VERSION;

=item $factor = similarity $string1, $string2, [$limit]

The C<similarity> function calculates the similarity index of
its two arguments. A value of C<0> means that the strings are
entirely different, and a value of C<1> means that they are
identical. Every other value lies between 0 and 1 and describes how
similar the strings are.

Roughly speaking, it works by finding the smallest number of edits needed
to change one string into the other.
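
As an illustration of the edit-based idea above, the following pure-Perl
sketch derives an index from the longest common subsequence (LCS) of the
two strings. It assumes the index is the fraction of characters that
survive a minimal insert/delete edit script, that is
C<2 * LCS / (length1 + length2)>; the module's XS implementation may differ
in details. C<lcs_length> and C<similarity_sketch> are hypothetical names
used only here and are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of String::Similarity: length of the
# longest common subsequence, using the classic O(N*M) dynamic programme.
sub lcs_length {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0) x (scalar(@t) + 1);

    for my $i (1 .. scalar @s) {
        my @cur = (0);
        for my $j (1 .. scalar @t) {
            if ($s[$i - 1] eq $t[$j - 1]) {
                $cur[$j] = $prev[$j - 1] + 1;
            } else {
                $cur[$j] = $prev[$j] > $cur[$j - 1] ? $prev[$j] : $cur[$j - 1];
            }
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Assumed formula: characters kept on both sides divided by all characters.
# Equivalently (total - edits) / total, where edits counts the insertions
# and deletions of a minimal edit script.
sub similarity_sketch {
    my ($s, $t) = @_;
    my $total = length($s) + length($t);
    return 1 if $total == 0;    # two empty strings are treated as identical
    return 2 * lcs_length($s, $t) / $total;
}
```

Under this assumption, C<similarity_sketch("hello", "hallo")> keeps the four
characters C<hllo> on each side out of ten in total. The quadratic table is
chosen for readability only; it is much slower than the O(ND) approach cited
under SEE ALSO.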

You can pass an optional argument C<$limit> (default 0) that gives the
minimum similarity the two strings must satisfy. C<similarity> stops
analyzing the strings as soon as the result drops below the given limit,
in which case the returned value is not a valid similarity index but is
lower than the given C<$limit>. You can use this to speed up the common
case of searching for the most similar string in a set by specifying the
highest similarity found so far as the limit.
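
The search described above might be sketched as follows. C<most_similar>
is a hypothetical helper, not part of this module. To keep the sketch
self-contained it takes the scoring function as a code reference with the
same calling convention as C<similarity>; with the module loaded, passing
C<\&String::Similarity::similarity> should work.

```perl
use strict;
use warnings;

# most_similar() is a hypothetical helper, not part of String::Similarity.
# $score is any function called as $score->($string1, $string2, $limit).
sub most_similar {
    my ($score, $needle, @candidates) = @_;
    my ($best, $best_score) = (undef, 0);

    for my $candidate (@candidates) {
        # Pass the best score so far as the limit, so that hopeless
        # candidates can be abandoned early.
        my $s = $score->($needle, $candidate, $best_score);

        # A result below the limit is not a valid index, so only a
        # strictly better score replaces the current best.
        ($best, $best_score) = ($candidate, $s) if $s > $best_score;
    }

    return ($best, $best_score);
}
```

A call could look like
C<my ($word, $score) = most_similar(\&String::Similarity::similarity, "hello", @words);>.
If no candidate scores above 0, the sketch returns C<undef> as the best match.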

=cut

# For historical reasons, "fstrcmp" is kept as the original name;
# "similarity" is an alias for it.
*similarity = *fstrcmp;

1;

=back

=head1 SEE ALSO

 The basic algorithm is described in:
 "An O(ND) Difference Algorithm and its Variations", Eugene Myers,
 Algorithmica Vol. 1 No. 2, 1986, pp. 251-266;
 see especially section 4.2, which describes the variation used below.

 The basic algorithm was independently discovered as described in:
 "Algorithms for Approximate String Matching", E. Ukkonen,
 Information and Control Vol. 64, 1985, pp. 100-118.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

 (The underlying fstrcmp function was taken from GNU diffutils and
 modified by Peter Miller <pmiller@agso.gov.au> and Marc Lehmann
 <schmorp@schmorp.de>.)


Revision:	1.1
Committed:	Sat Jun 25 09:55:53 2005 UTC (19 years, 1 month ago) by root
Branch:	MAIN
78