/cvs/Convert-UUlib/README
Revision: 1.4
Committed: Mon Oct 13 12:12:56 2008 UTC (15 years, 7 months ago) by root
Branch: MAIN
CVS Tags: rel-1_12
Changes since 1.3: +112 -97 lines
Log Message:
1.12

NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
    use Convert::UUlib ':all';

    # read all the files named on the commandline and decode them
    # into the CURRENT directory. See below for a longer example.
    LoadFile $_ for @ARGV;
    for (my $i = 0; my $uu = GetFileListItem $i; $i++) {
       if ($uu->state & FILE_OK) {
          $uu->decode;
          print $uu->filename, "\n";
       }
    }

DESCRIPTION
    For in-depth information about the C library used by this interface,
    read the file doc/library.pdf from the distribution, then the rest of
    this document, and especially the non-trivial decoder program at the
    end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite existing files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle MIME preambles/epilogues
      OPT_TINYB64   detect short B64 outside of MIME
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes *EXPERIMENTAL*
      OPT_WBUF      set default write I/O buffer size in bytes *EXPERIMENTAL*

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
    This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

    The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists

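    Because the state is a bit mask, individual flags are tested with a
    bitwise AND, while the zero code can only be detected by comparing the
    whole value. The helpers below are a minimal sketch of that idea.
    "has_flag" and "is_unprocessed" are illustrative names, not part of
    Convert::UUlib, and the bit values in the demo lines are made up.

```perl
use strict;
use warnings;

# True if $flag (a single bit) is set in $state. Passing the zero code
# always yields false, which is why UUFILE_READ needs is_unprocessed().
sub has_flag {
    my ($state, $flag) = @_;
    return ($state & $flag) ? 1 : 0;
}

# True if no state bit is set at all, i.e. the state equals the zero
# code (UUFILE_READ in the table above).
sub is_unprocessed {
    my ($state) = @_;
    return $state == 0 ? 1 : 0;
}

# Demo with made-up bit values (NOT the real FILE_* constants):
my ($BIT_A, $BIT_B) = (1, 4);
my $state = $BIT_A | $BIT_B;
print has_flag($state, $BIT_B) ? "B set\n" : "B clear\n";

# With the module loaded, the same test would read:
#   print "ready\n" if has_flag($uu->state, FILE_OK);
```
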
  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is automatically called when the module is loaded and
    allocates quite a small amount of memory for today's machines ;)
    CleanUp releases that memory again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    Initialize
        Not normally necessary; (re-)initializes the library.

    CleanUp
        Not normally necessary; can be called at the end to release memory
        before starting a new decoding round.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.

  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

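    A filename filter is an ordinary sub that receives a name and returns
    the name to use. The following is a minimal sketch, assuming the filter
    gets the filename as its only argument (as the namefilter sub in the
    large example at the end does). "strip_dirs" is an illustrative name.

```perl
use strict;
use warnings;

# Drop any leading directory components, Unix- or DOS-style, which
# helps keep decoded files inside the intended save path.
sub strip_dirs {
    my ($path) = @_;
    $path =~ s{^.*[/\\]}{};
    return $path;
}

print strip_dirs("C:\\temp\\pic.jpg"), "\n";   # prints pic.jpg
print strip_dirs("../../etc/passwd"), "\n";    # prints passwd

# Registering it would look like this once Convert::UUlib is loaded:
#   SetFNameFilter \&strip_dirs;
```
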
  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would not
        have been merged.

        Most probably this will result in garbled files, so never do this by
        default.

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

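    The zero-based, hole-free numbering makes collecting all items a simple
    loop. The sketch below keeps the loop independent of the module by
    taking the lookup as a code reference. "collect_items" is an
    illustrative helper, and the array stands in for uulib's file list.

```perl
use strict;
use warnings;

# Call $get->(0), $get->(1), ... and gather the results until the
# lookup returns undef.
sub collect_items {
    my ($get) = @_;
    my @items;
    my $i = 0;
    while (defined(my $item = $get->($i))) {
        push @items, $item;
        $i++;
    }
    return @items;
}

# Stand-in data source for the demo.
my @fake  = ("first", "second", "third");
my @found = collect_items(sub { $fake[$_[0]] });
print scalar(@found), " items\n";   # prints 3 items

# With the module loaded:
#   my @files = collect_items(sub { GetFileListItem $_[0] });
```
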
  Decoding files
    $retval = $item->rename($newname)
        Change the on-disk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location; use "$item->infile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info(callback-function)

  Querying (and setting) item attributes
    $state    = $item->state
    $mode     = $item->mode([newmode])
    $uudet    = $item->uudet
    $size     = $item->size
    $filename = $item->filename([newfilename])
    $subfname = $item->subfname
    $mimeid   = $item->mimeid
    $mimetype = $item->mimetype
    $binfile  = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the on-disk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.

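    Since most keys are optional, code that walks the parts should check
    for their presence. The following is a minimal sketch with hand-written
    sample data standing in for the result of "$item->parts".
    "describe_part" is an illustrative name.

```perl
use strict;
use warnings;

# Build a one-line description from a part hashref, skipping keys
# that are absent.
sub describe_part {
    my ($part) = @_;
    my $line = "part $part->{partno}";
    $line .= " from $part->{sfname}" if defined $part->{sfname};
    $line .= " -> $part->{filename}" if defined $part->{filename};
    return $line;
}

# Hypothetical sample data in the documented structure.
my @parts = (
    { partno => 1, sfname => "uusrc/msg1.txt", filename => "pic.jpg" },
    { partno => 2, sfname => "uusrc/msg2.txt" },
);
print describe_part($_), "\n" for @parts;
```

    With a real item, the loop would run over "$item->parts" instead of
    the sample array.
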
  Functions below are not documented and not very well tested
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Return the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If the
        callback doesn't return anything (not even "undef"!), then nothing
        happens, so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

           sub cb {
              return unless $_[1]; # skip "Re:"-plies et al.
              my ($subject, $filename) = @_;
              # if we find some *.rar, take it
              return $1 if $subject =~ /(\w+\.rar)/;
              # otherwise just pass what we have
              return ();
           }

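    The three return conventions (empty list, "undef", or a string) are
    easy to mix up, so here is a small sketch that exercises them without
    the library. "pick_name" is an illustrative callback and the subject
    lines are invented. The number of returned values is what separates
    "leave things alone" from "there is no filename".

```perl
use strict;
use warnings;

sub pick_name {
    my ($subject, $candidate) = @_;
    return ()    unless defined $candidate;     # nothing returned: no-op
    return undef if $subject =~ /^\s*Re:/i;     # explicit "no filename"
    return $1    if $subject =~ /(\w+\.rar)/;   # use our own find
    return ();                                  # keep the current candidate
}

for my $case (
    [ "Re: holiday pics", "pics.zip" ],
    [ "backup.rar (1/3)", "backup"   ],
    [ "misc posting",     "misc.txt" ],
) {
    my @ret  = pick_name(@$case);
    my $what = !@ret            ? "no change"
             : !defined $ret[0] ? "no filename"
             :                    "use $ret[0]";
    print "$case->[0]: $what\n";
}
```
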
LARGE EXAMPLE DECODER
    This is the file "example-decoder" from the distribution, put here
    instead of more thorough documentation.

     #!/usr/bin/perl

     # decode all the files in the directory uusrc/ and copy
     # the resulting files to uudst/

     use Convert::UUlib ':all';

     # filename filter: strip any leading directory components
     sub namefilter {
        my ($path) = @_;

        $path =~ s/^.*[\/\\]//;

        $path
     }

     # busy callback: print the progress arguments, with the action
     # code replaced by its human readable name
     sub busycb {
        my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
        $_[0] = straction($action);
        print "busy_callback(", (join ",", @_), ")\n";
        0
     }

     SetOption OPT_RBUF, 128*1024;
     SetOption OPT_WBUF, 1024*1024;
     SetOption OPT_IGNMODE, 1;
     SetOption OPT_VERBOSE, 1;

     # show the three ways you can set callback functions. I normally
     # prefer the one with the sub inplace.
     SetFNameFilter \&namefilter;

     SetBusyCallback "busycb", 333;

     SetMsgCallback sub {
        my ($msg, $level) = @_;
        # explicit parentheses keep the string out of strmsglevel's arguments
        print uc(strmsglevel($level)), ": $msg\n";
     };

     # the following non-trivial FileNameCallback takes care
     # of some subject lines not detected properly by uulib:
     SetFileNameCallback sub {
        return unless $_[1]; # skip "Re:"-plies et al.
        local $_ = $_[0];

        return $1 if /(\S+\s+IMG_\d+\.jpg)/i;

        # the following rules are rather effective on some newsgroups,
        # like alt.binaries.games.anime, where non-mime, uuencoded data
        # is very common

        # if we find some *.rar, take it as the filename
        return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

        # one common subject format
        return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

        # - filename.par (04/55)
        return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

        # - (xxx) No. 1 sayuri81.jpg 756565 bytes
        # - (20 files) No.17 Roseanne.jpg [2/2]
        return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

        # try to detect some common forms of filenames
        return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

        # otherwise just pass what we have
        ()
     };

     # now read all files in the directory uusrc/*
     # (the third argument, $delflag, asks uulib to delete each source
     # file once it is no longer needed)
     for (<uusrc/*>) {
        my ($retval, $count) = LoadFile($_, $_, 1);
        print "file($_), status(", strerror($retval), ") parts($count)\n";
     }

     SetOption OPT_SAVEPATH, "uudst/";

     # now wade through all files and their source parts
     my $i = 0;
     while (my $uu = GetFileListItem $i) {
        $i++;
        print "file nr. $i";
        print " state ", $uu->state;
        print " mode ", $uu->mode;
        print " uudet ", strencoding($uu->uudet);
        print " size ", $uu->size;
        print " filename ", $uu->filename;
        print " subfname ", $uu->subfname;
        print " mimeid ", $uu->mimeid;
        print " mimetype ", $uu->mimetype;
        print "\n";

        # print additional info about all parts
        for ($uu->parts) {
           while (my ($k, $v) = each %$_) {
              print "$k > $v, ";
           }
           print "\n";
        }

        print $uu->filename;

        $uu->remove_temp;

        # decode returns a false value on success, an error code otherwise
        if (my $err = $uu->decode ()) {
           print ", ", strerror($err), "\n";
        } else {
           print ", saved as uudst/", $uu->filename, "\n";
        }
     }

     print "cleanup...\n";

     CleanUp;

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>. The original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at http://www.uni-frankfurt.de/~fp/uudeview/.