[ViewVC] Contents of: cvs/Convert-UUlib/README

NAME
    Convert::UUlib - decode uu/xx/b64/mime/yenc/etc-encoded data from a
    massive number of files

SYNOPSIS
     use Convert::UUlib ':all';
 
     # read all the files named on the commandline and decode them
     # into the CURRENT directory. See below for a longer example.
     LoadFile $_ for @ARGV;

     for my $uu (GetFileList) {
        if ($uu->state & FILE_OK) {
          $uu->decode;
          print $uu->filename, "\n";
        }
     }

DESCRIPTION
    This module started as an interface to the uulib/uudeview library by
    Frank Pilhofer that can be used to decode all kinds of usenet (and
    other) binary messages.

    After upstream abandoned the project, the library was continuously
    bugfixed and improved in this module, with major focuses on security
    fixes, correctness and speed (that does not mean that this library is
    considered safe with untrusted data, but it surely is safer than the
    original uudeview).

    For in-depth information about the C library used in this interface,
    read the file doc/library.pdf from the distribution. Also read the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite ex. files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle Mime preambles/epilogues
      OPT_TINYB64   detect short B64 outside of Mime
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes
      OPT_WBUF      set default write I/O buffer size in bytes
      OPT_AUTOCHECK automatically check file list after every loadfile

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
     This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

     The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists
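
     As a sketch of how these or'ed flags can be inspected, each flag is
     tested with a bitwise "&". The helper name "state_names" is made up
     for this illustration and is not part of the module:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: translate a state bitmask into flag names.
sub state_names {
    my ($state) = @_;

    # a zero state means UUFILE_READ: read in, not processed further
    return ("UUFILE_READ") unless $state;

    my @flags = (
        [FILE_MISPART(), "FILE_MISPART"],
        [FILE_NOBEGIN(), "FILE_NOBEGIN"],
        [FILE_NOEND(),   "FILE_NOEND"],
        [FILE_NODATA(),  "FILE_NODATA"],
        [FILE_OK(),      "FILE_OK"],
        [FILE_ERROR(),   "FILE_ERROR"],
        [FILE_DECODED(), "FILE_DECODED"],
        [FILE_TMPFILE(), "FILE_TMPFILE"],
    );

    # keep the names of all flags whose bit is set in $state
    return map { $_->[1] } grep { $state & $_->[0] } @flags;
}

print join (" ", state_names (FILE_OK | FILE_TMPFILE)), "\n";
```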

  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is automatically called when the module is loaded and
    allocates quite a small amount of memory for today's machines ;) CleanUp
    releases that again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    CleanUp
        Release memory, file items and clean up files. Should be called
        after a decoding run, if you want to start a new one.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.
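
    A minimal sketch of reading and writing options. The directory name
    "uudst/" is only an example value, and the read-back of
    "OPT_SAVEPATH" assumes that "GetOption" returns string options as
    strings:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# read-only option: the version of the underlying library
print "uulib version: ", GetOption (OPT_VERSION), "\n";

# writable options take the option constant and the new value
SetOption OPT_VERBOSE, 0;           # no informative messages
SetOption OPT_SAVEPATH, "uudst/";   # example prefix for decoded files

# read a value back to confirm what the library will use
print "save path: ", GetOption (OPT_SAVEPATH), "\n";
```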

  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.
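
        A small sketch of checking the two return values, using only
        the constants and "strerror" described in this document. The
        sub name "load_all" is hypothetical:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: load every path, report problems, and
# return the number of files that did not load with RET_OK.
sub load_all {
    my $failed = 0;

    for my $path (@_) {
        my ($retval, $count) = LoadFile $path;

        if ($retval == RET_OK) {
            print "$path: $count part(s) found\n";
        } else {
            # e.g. RET_NODATA when no encoded data was found
            warn "$path: ", strerror ($retval), "\n";
            $failed++;
        }
    }

    return $failed;
}

my $failed = load_all (@ARGV);
print "$failed file(s) could not be loaded\n" if $failed;
```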

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would not
        have been merged.

        Most probably this will result in garbled files, so never do this by
        default, except:

        If the "OPT_AUTOCHECK" option has been disabled (by default it is
        enabled) to speed up file loading, then you *have* to call "Smerge
        -1" after loading all files as an additional pre-pass (which is
        normally done by "LoadFile").
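
        The exception described above could look like this sketch. The
        sub name "load_batch" is made up for the example:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: load many files without the per-file check,
# then run the check once, and return the number of result files.
sub load_batch {
    my @paths = @_;

    SetOption OPT_AUTOCHECK, 0;   # skip the check after each LoadFile

    LoadFile $_ for @paths;

    Smerge -1;                    # mandatory pre-pass when autocheck is off

    my @items = GetFileList;
    return scalar @items;
}

print load_batch (@ARGV), " result file(s) found\n";
```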

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".
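
        A sketch of that index-based iteration (the sub name
        "filenames_by_index" is hypothetical):

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: collect the filenames of all items by
# counting up from zero until GetFileListItem returns undef.
sub filenames_by_index {
    my @names;
    my $i = 0;

    while (defined (my $item = GetFileListItem ($i++))) {
        push @names, $item->filename;
    }

    return @names;
}

LoadFile $_ for @ARGV;
print "$_\n" for grep defined, filenames_by_index ();
```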

        This function has to walk the linear list of files on each access, so
        if you want to iterate over all items, it is usually faster to use
        "GetFileList".

    @items = GetFileList
        Similar to "GetFileListItem", but returns all files in one go, which
        is very much faster for a large number of items, and has no
        drawbacks when used for a small number of items.

  Decoding files
    $retval = $item->rename ($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->infile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode ([$target_path])
        Decode the file to its destination, or the given target path.
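
        A sketch of decoding to an explicit target path. The directory
        argument and the sub name "decode_into" are assumptions of this
        example, and the directory is assumed to exist already:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: decode every complete item into $dir and
# return how many were written successfully.
sub decode_into {
    my ($dir) = @_;
    my $done = 0;

    for my $item (GetFileList) {
        next unless $item->state & FILE_OK;   # skip incomplete files

        my $target = "$dir/" . $item->filename;
        my $err    = $item->decode ($target);

        if ($err == RET_OK) {
            $done++;
        } else {
            warn "$target: ", strerror ($err), "\n";
        }
    }

    return $done;
}

LoadFile $_ for @ARGV;
print decode_into ("uudst"), " file(s) decoded\n";
```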

    $retval = $item->info (callback-function)

  Querying (and setting) item attributes
    $state = $item->state
    $mode = $item->mode ([newmode])
    $uudet = $item->uudet
    $size = $item->size
    $filename = $item->filename ([newfilename])
    $subfname = $item->subfname
    $mimeid = $item->mimeid
    $mimetype = $item->mimetype
    $binfile = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the ondisk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }
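
        A minimal sketch of walking this list for every result file.
        The sub name "part_lines" and the placeholder strings are
        inventions of this example:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: one line of text per source part of every item.
sub part_lines {
    my @lines;

    for my $item (GetFileList) {
        my $name = $item->filename // "(unnamed)";

        for my $part ($item->parts) {
            # only "partno" is always present, "sfname" may be missing
            my $src = $part->{sfname} // "(unknown source)";
            push @lines, "${name}: part $part->{partno} from $src";
        }
    }

    return @lines;
}

LoadFile $_ for @ARGV;
print "$_\n" for part_lines ();
```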

        Usually you are interested mostly in the "sfname" and possibly the
        "partno" and "filename" members.

  Functions below are not documented and not very well tested - feedback welcome
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

  EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Returns the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If it
        doesn't return anything (not even "undef"!), then nothing happens,
        so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

          sub cb {
             return unless $_[1]; # skip "Re:"-plies et al.
             my ($subject, $filename) = @_;
             # if we find some *.rar, take it
             return $1 if $subject =~ /(\w+\.rar)/;
             # otherwise just pass what we have
             return ();
          }
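
        Because the three outcomes (nothing, "undef", a filename) are
        easy to mix up, a callback can be tried out in plain Perl before
        it is installed. This sketch does not need the module at all;
        the helper "classify" is hypothetical and simply calls the
        callback in list context and counts what comes back:

```perl
use strict;
use warnings;

# the callback from above, unchanged in behaviour
sub cb {
    return unless $_[1]; # skip "Re:"-plies et al.
    my ($subject, $filename) = @_;
    return $1 if $subject =~ /(\w+\.rar)/;
    return ();
}

# Hypothetical helper: report which of the three outcomes a
# callback produces for a subject line and filename candidate.
sub classify {
    my ($callback, $subject, $candidate) = @_;
    my @ret = $callback->($subject, $candidate);

    return "no-op"       if @ret == 0;          # returned nothing
    return "no filename" if !defined $ret[0];   # returned undef
    return "filename=$ret[0]";
}

print classify (\&cb, "holiday archive.rar (1/3)", "archive"), "\n";
print classify (\&cb, "Re: holiday", undef), "\n";
print classify (sub { undef }, "anything", "x"), "\n";
```

        The three calls print "filename=archive.rar", "no-op" and "no
        filename", in that order.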

LARGE EXAMPLE DECODER
    The general workflow for decoding is like this:

    1. Configure options with "SetOption" or "SetXXXCallback".
    2. Load all source files with "LoadFile".
    3. Optionally "Smerge".
    4. Iterate over all "GetFileList" items (i.e. result files).
    5. "CleanUp" to delete files and free items.

    What follows is the file "example-decoder" from the distribution that
    illustrates the above workflow in a non-trivial example.

       #!/usr/bin/perl

       # decode all the files in the directory uusrc/ and copy
       # the resulting files to uudst/

       use Convert::UUlib ':all';

       sub namefilter {
          my ($path) = @_;

          $path=~s/^.*[\/\\]//;

          $path
       }

       sub busycb {
          my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
          $_[0]=straction($action);
          print "busy_callback(", (join ",",@_), ")\n";
          0
       }

       SetOption OPT_RBUF, 128*1024;
       SetOption OPT_WBUF, 1024*1024;
       SetOption OPT_IGNMODE, 1;
       SetOption OPT_VERBOSE, 1;
       SetOption OPT_AUTOCHECK, 0;

       # show the three ways you can set callback functions. I normally
       # prefer the one with the sub inplace.
       SetFNameFilter \&namefilter;

       SetBusyCallback "busycb", 333;

       SetMsgCallback sub {
          my ($msg, $level) = @_;
          print uc(strmsglevel($level)), ": $msg\n";
       };

       # the following non-trivial FileNameCallback takes care
       # of some subject lines not detected properly by uulib:
       SetFileNameCallback sub {
          return unless $_[1]; # skip "Re:"-plies et al.
          local $_ = $_[0];

          # the following rules are rather effective on some newsgroups,
          # like alt.binaries.games.anime, where non-mime, uuencoded data
          # is very common

          # if we find some *.rar, take it as the filename
          return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

          # one common subject format
          return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

          # - filename.par (04/55)
          return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

          # - (xxx) No. 1 sayuri81.jpg 756565 bytes
          # - (20 files) No.17 Roseanne.jpg [2/2]
          return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

          # try to detect some common forms of filenames
          return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

          # otherwise just pass what we have
          ()
       };

       # now read all files in the directory uusrc/*
       for (<uusrc/*>) {
          my ($retval, $count) = LoadFile ($_, $_, 1);
          print "file($_), status(", strerror $retval, ") parts($count)\n";
       }

       Smerge -1;

       SetOption OPT_SAVEPATH, "uudst/";

       # now wade through all files and their source parts
       for my $uu (GetFileList) {
          print "file ", $uu->filename, "\n";
          print " state ", $uu->state, "\n";
          print " mode ", $uu->mode, "\n";
          print " uudet ", strencoding $uu->uudet, "\n";
          print " size ", $uu->size, "\n";
          print " subfname ", $uu->subfname, "\n";
          print " mimeid ", $uu->mimeid, "\n";
          print " mimetype ", $uu->mimetype, "\n";

          # print additional info about all parts
          print " parts";
          for ($uu->parts) {
             for my $k (sort keys %$_) {
                print " $k=$_->{$k}";
             }
             print "\n";
          }

          $uu->remove_temp;

          if (my $err = $uu->decode) {
             print " ERROR ", strerror $err, "\n";
          } else {
             print " successfully saved as uudst/", $uu->filename, "\n";
          }
       }

       print "cleanup...\n";

       CleanUp;

PERLMULTICORE SUPPORT
    This module supports the perlmulticore standard (see
    <http://perlmulticore.schmorp.de/> for more info) for the following
    functions - generally these are functions accessing the disk and/or
    using considerable CPU time:

       LoadFile
       $item->decode
       $item->decode_temp
       $item->remove_temp
       $item->info

    The perl interpreter will be reacquired/released on every callback
    invocation, so for performance reasons, callbacks should be avoided if
    that is costly.

    Future versions might enable multicore support for more functions.

BUGS AND LIMITATIONS
    The original uulib library this module uses was written at a time when
    main memory was measured in megabytes and buffer overflows as a security
    thing didn't exist. While a lot of security fixes have been applied over
    the years (including some defense-in-depth mechanisms that can shield
    against a lot of as-of-yet undetected bugs), using this library for
    security purposes requires care.

    Likewise, file sizes when the uulib library was written were tiny
    compared to today, so do not expect this library to handle files larger
    than 2GB.

    Lastly, this module uses a very "C-like" interface, which means it
    doesn't protect you from invalid pointers as you might expect from "more
    perlish" modules - for example, accessing a file item object after
    calling "CleanUp" will likely result in crashes, memory corruption, or
    worse.
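
    One way to stay clear of that last problem is to copy whatever is
    needed out of the item objects before calling "CleanUp". The
    following is only a sketch; the sub name "decode_and_cleanup" is
    hypothetical, and it assumes that the strings returned by
    "$item->filename" are ordinary Perl copies:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: decode all complete files, remember their
# names as plain strings, then release the library state.
sub decode_and_cleanup {
    my @saved;

    {
        # item objects are only used inside this block
        for my $item (GetFileList) {
            next unless $item->state & FILE_OK;
            push @saved, $item->filename
                if $item->decode () == RET_OK;
        }
    }

    CleanUp;          # no item object may be touched after this point

    return @saved;    # plain strings, safe to use after CleanUp
}

LoadFile $_ for @ARGV;
print "saved $_\n" for decode_and_cleanup ();
```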

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at <http://www.fpx.de/fp/Software/UUDeview/>.

Revision:	1.7
Committed:	Thu Dec 17 01:24:59 2020 UTC (3 years, 4 months ago) by root
Branch:	MAIN
CVS Tags:	rel-1_8, HEAD
Changes since 1.6:	+18 -3 lines
Log Message:	1.8