/cvs/Convert-UUlib/README
Revision: 1.6
Committed: Sun Mar 1 05:14:55 2020 UTC (4 years, 3 months ago) by root
Branch: MAIN
CVS Tags: rel-1_71, rel-1_7
Changes since 1.5: +81 -37 lines
Log Message:
1.7

NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
     use Convert::UUlib ':all';

     # read all the files named on the commandline and decode them
     # into the CURRENT directory. See below for a longer example.
     LoadFile $_ for @ARGV;

     for my $uu (GetFileList) {
        if ($uu->state & FILE_OK) {
           $uu->decode;
           print $uu->filename, "\n";
        }
     }

DESCRIPTION
    For in-depth information about the C library used in this interface,
    read the file doc/library.pdf from the distribution. Then read the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite existing files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle Mime preambles/epilogues
      OPT_TINYB64   detect short B64 outside of Mime
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes
      OPT_WBUF      set default write I/O buffer size in bytes
      OPT_AUTOCHECK automatically check file list after every LoadFile

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
    This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

    The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists

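    Because the non-zero state codes are bit flags, a state value is
    normally tested with "&" rather than compared with "==". The sketch
    below lists the flags set in a value. The helper "flag_names" and the
    lookup table passed to it are hypothetical and not part of the module,
    and the bit values in the demonstration are placeholders, not uulib's
    real values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Convert::UUlib): return the names,
# in sorted order, of all flags from %$flags whose bit is set in $state.
sub flag_names {
    my ($state, $flags) = @_;
    my @names = grep { $state & $flags->{$_} } sort keys %$flags;
    return @names;
}

# Placeholder bit values for demonstration only. With the module loaded,
# build the table from the real constants instead, for example:
#   my %flags = (FILE_OK => FILE_OK, FILE_MISPART => FILE_MISPART);
my %demo = (FLAG_A => 1, FLAG_B => 2, FLAG_C => 4);

print join (",", flag_names (5, \%demo)), "\n";   # prints "FLAG_A,FLAG_C"
```

    Passing the table as an argument keeps the helper independent of the
    actual constant values, which are best taken from the module itself.
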
  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    "Initialize" is automatically called when the module is loaded and
    allocates an amount of memory that is quite small for today's machines
    ;) "CleanUp" releases that again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    CleanUp
        Release memory, file items and clean up files. Should be called
        after a decoding run, if you want to start a new one.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.

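    As a rough illustration, options might be queried and set like this.
    The chosen options and the "uudst/" directory are examples only, and
    treating 0 as "do not overwrite" for "OPT_OVERWRITE" is an assumption
    based on its description in the list of constants.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# OPT_VERSION is marked read-only (ro), so it can only be queried.
print "uulib version: ", GetOption (OPT_VERSION), "\n";

# Illustrative settings (assumptions, adjust to taste): do not overwrite
# existing files, and save decoded files below the example dir uudst/.
SetOption OPT_OVERWRITE, 0;
SetOption OPT_SAVEPATH, "uudst/";
```
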
  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, call "Smerge" with increasing $pass values,
        beginning at 0, to try to merge parts that usually would not have
        been merged.

        Most probably this will result in garbled files, so never do this by
        default, with one exception:

        If the "OPT_AUTOCHECK" option has been disabled (by default it is
        enabled) to speed up file loading, then you *have* to call "Smerge
        -1" after loading all files as an additional pre-pass (which is
        normally done by "LoadFile").

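    The "OPT_AUTOCHECK" case described above could look roughly like the
    following sketch. It uses only functions documented in this file and
    takes the source files from @ARGV, as the SYNOPSIS does. It has not
    been run against a real set of postings.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Disable the per-file check to speed up loading many files.
SetOption OPT_AUTOCHECK, 0;

LoadFile $_ for @ARGV;

# Mandatory pre-pass once all files are loaded, because LoadFile
# no longer performs it while OPT_AUTOCHECK is off.
Smerge (-1);

# List the files that are complete and ready to decode.
for my $uu (GetFileList) {
    print $uu->filename, "\n" if $uu->state & FILE_OK;
}
```
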
    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

        This function has to walk the linear list of files on each access,
        so if you want to iterate over all items, it is usually faster to
        use "GetFileList".

    @items = GetFileList
        Similar to "GetFileListItem", but returns all files in one go.

  Decoding files
    $retval = $item->rename ($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->infile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode ([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info (callback-function)

  Querying (and setting) item attributes
    $state = $item->state
    $mode = $item->mode ([newmode])
    $uudet = $item->uudet
    $size = $item->size
    $filename = $item->filename ([newfilename])
    $subfname = $item->subfname
    $mimeid = $item->mimeid
    $mimetype = $item->mimetype
    $binfile = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the ondisk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.

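        Since every member except "partno" may be missing, code that
        reads a part should check for existence first. A minimal sketch
        follows; "describe_part" is a hypothetical helper and the sample
        hashref is made-up data in the documented shape.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Convert::UUlib): format one part
# hashref as a single line. Members other than "partno" may be absent,
# so only the keys that exist are printed.
sub describe_part {
    my ($part) = @_;
    my @fields = map  { "$_=$part->{$_}" }
                 grep { exists $part->{$_} } qw(sfname filename subject);
    return join " ", "part $part->{partno}:", @fields;
}

# Made-up sample data; with the module loaded, the hashrefs would
# come from $item->parts instead.
my $sample = { partno => 1, sfname => "uusrc/msg001.txt" };
print describe_part ($sample), "\n";   # prints "part 1: sfname=uusrc/msg001.txt"
```
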
  Functions below are not documented and not very well tested - feedback welcome
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Return the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If the
        callback doesn't return anything (not even "undef"!), then nothing
        happens, so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

           sub cb {
              return unless $_[1]; # skip "Re:"-plies et al.
              my ($subject, $filename) = @_;
              # if we find some *.rar, take it
              return $1 if $subject =~ /(\w+\.rar)/;
              # otherwise just pass what we have
              return ();
           }

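        The difference between returning nothing and returning "undef" is
        easy to miss, so here is a small sketch that makes the three
        outcomes visible by calling a callback by hand. The helper
        "classify" is hypothetical, and inspecting the return value in
        list context is an assumption about how the result is
        distinguished, not a description of uulib's internals.

```perl
use strict;
use warnings;

# Example callback following the documented return conventions.
sub cb {
    my ($subject, $filename) = @_;
    return undef if $subject =~ /^Re:/i;        # undef: no filename found
    return $1    if $subject =~ /(\w+\.rar)/;   # string: use this filename
    return ();                                  # empty list: no-op
}

# Hypothetical helper: classify what a callback returned in list context.
sub classify {
    my @result = @_;
    return "no-op"       unless @result;
    return "no filename" unless defined $result[0];
    return "filename $result[0]";
}

print classify (cb ("Re: hello", undef)), "\n";              # no filename
print classify (cb ("backup.rar (1/3)", undef)), "\n";       # filename backup.rar
print classify (cb ("holiday pictures", "pic.jpg")), "\n";   # no-op
```
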
LARGE EXAMPLE DECODER
    The general workflow for decoding is like this:

    1. Configure options with "SetOption" or "SetXXXCallback".
    2. Load all source files with "LoadFile".
    3. Optionally "Smerge".
    4. Iterate over all "GetFileList" items (i.e. result files).
    5. "CleanUp" to delete files and free items.

    What follows is the file "example-decoder" from the distribution that
    illustrates the above workflow in a non-trivial example.

     #!/usr/bin/perl

     # decode all the files in the directory uusrc/ and copy
     # the resulting files to uudst/

     use Convert::UUlib ':all';

     # strip any leading directory components (Unix or DOS style)
     sub namefilter {
        my ($path) = @_;

        $path=~s/^.*[\/\\]//;

        $path
     }

     # print every progress report, with the action code as a string
     sub busycb {
        my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
        $_[0]=straction($action);
        print "busy_callback(", (join ",",@_), ")\n";
        0
     }

     SetOption OPT_RBUF, 128*1024;
     SetOption OPT_WBUF, 1024*1024;
     SetOption OPT_IGNMODE, 1;
     SetOption OPT_VERBOSE, 1;

     # show the three ways you can set callback functions. I normally
     # prefer the one with the sub inplace.
     SetFNameFilter \&namefilter;

     SetBusyCallback "busycb", 333;

     SetMsgCallback sub {
        my ($msg, $level) = @_;
        print uc (strmsglevel $level), ": $msg\n";
     };

     # the following non-trivial FileNameCallback takes care
     # of some subject lines not detected properly by uulib:
     SetFileNameCallback sub {
        return unless $_[1]; # skip "Re:"-plies et al.
        local $_ = $_[0];

        # the following rules are rather effective on some newsgroups,
        # like alt.binaries.games.anime, where non-mime, uuencoded data
        # is very common

        # if we find some *.rar, take it as the filename
        return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

        # one common subject format
        return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

        # - filename.par (04/55)
        return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

        # - (xxx) No. 1 sayuri81.jpg 756565 bytes
        # - (20 files) No.17 Roseanne.jpg [2/2]
        return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

        # try to detect some common forms of filenames
        return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

        # otherwise just pass what we have
        ()
     };

     # now read all files in the directory uusrc/*
     # (the third argument is $delflag, so source files are deleted
     # once they are no longer necessary)
     for (<uusrc/*>) {
        my ($retval, $count) = LoadFile ($_, $_, 1);
        print "file($_), status(", strerror ($retval), ") parts($count)\n";
     }

     SetOption OPT_SAVEPATH, "uudst/";

     # now wade through all files and their source parts
     for my $uu (GetFileList) {
        print "file ", $uu->filename, "\n";
        print " state ", $uu->state, "\n";
        print " mode ", $uu->mode, "\n";
        print " uudet ", strencoding ($uu->uudet), "\n";
        print " size ", $uu->size, "\n";
        print " subfname ", $uu->subfname, "\n";
        print " mimeid ", $uu->mimeid, "\n";
        print " mimetype ", $uu->mimetype, "\n";

        # print additional info about all parts
        print " parts";
        for ($uu->parts) {
           for my $k (sort keys %$_) {
              print " $k=$_->{$k}";
           }
           print "\n";
        }

        $uu->remove_temp;

        if (my $err = $uu->decode) {
           print " ERROR ", strerror ($err), "\n";
        } else {
           print " successfully saved as uudst/", $uu->filename, "\n";
        }
     }

     print "cleanup...\n";

     CleanUp;

PERLMULTICORE SUPPORT
    This module supports the perlmulticore standard (see
    <http://perlmulticore.schmorp.de/> for more info) for the following
    functions - generally these are functions accessing the disk and/or
    using considerable CPU time:

       LoadFile
       $item->decode
       $item->decode_temp
       $item->remove_temp
       $item->info

    The perl interpreter is released and reacquired around every callback
    invocation, so if that overhead matters for performance, avoid
    callbacks.

    Future versions might enable multicore support for more functions.

BUGS AND LIMITATIONS
    The original uulib library this module uses was written at a time when
    main memory was measured in megabytes and buffer overflows as a
    security thing didn't exist. While a lot of security fixes have been
    applied over the years (including some defense in depth mechanism that
    can shield against a lot of as-of-yet undetected bugs), using this
    library for security purposes requires care.

    Likewise, file sizes when the uulib library was written were tiny
    compared to today, so do not expect this library to handle files larger
    than 2GB.

    Lastly, this module uses a very "C-like" interface, which means it
    doesn't protect you from invalid pointers as you might expect from "more
    perlish" modules - for example, accessing a file item object after
    calling "CleanUp" will likely result in crashes, memory corruption, or
    worse.

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at <http://www.fpx.de/fp/Software/UUDeview/>.
