/cvs/Convert-UUlib/README
Revision: 1.2
Committed: Mon Dec 5 23:56:21 2005 UTC (18 years, 5 months ago) by root
Branch: MAIN
CVS Tags: rel-1_09, rel-0_7, rel-1_06, rel-1_08
Changes since 1.1: +2 -0 lines

NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
     use Convert::UUlib ':all';

     # read all the files named on the command line and decode them
     # into the CURRENT directory. See below for a longer example.
     LoadFile $_ for @ARGV;
     for (my $i = 0; my $uu = GetFileListItem $i; $i++) {
        if ($uu->state & FILE_OK) {
           $uu->decode;
           print $uu->filename, "\n";
        }
     }

DESCRIPTION
    For in-depth information about the C library used by this interface,
    read the file doc/library.pdf from the distribution. Then read the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite existing files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle Mime preambles/epilogues
      OPT_TINYB64   detect short B64 outside of Mime
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes *EXPERIMENTAL*
      OPT_WBUF      set default write I/O buffer size in bytes *EXPERIMENTAL*

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
     This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

     The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists

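    Because the state codes above are or'ed together, a single state value
    can carry several of them at once. The following is a minimal sketch of
    how such a value could be turned into readable labels. The helper
    "describe_state" and the short labels are assumptions made for this
    illustration and are not part of Convert::UUlib; only the FILE_xxx
    constants come from the list above.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Pair each documented state flag with a short label. Each constant is the
# last element of its own array, so the list parses unambiguously.
my @STATE_FLAGS = (
    [ MISPART => FILE_MISPART ],
    [ NOBEGIN => FILE_NOBEGIN ],
    [ NOEND   => FILE_NOEND   ],
    [ NODATA  => FILE_NODATA  ],
    [ OK      => FILE_OK      ],
    [ ERROR   => FILE_ERROR   ],
    [ DECODED => FILE_DECODED ],
    [ TMPFILE => FILE_TMPFILE ],
);

# describe_state is a hypothetical helper, not part of Convert::UUlib.
# It returns the labels of all flags set in $state, joined by "|",
# or "READ" when no flag is set (the state is zero).
sub describe_state {
    my ($state) = @_;
    my @set = map { $_->[0] } grep { $state & $_->[1] } @STATE_FLAGS;
    return @set ? join("|", @set) : "READ";
}
```

    With an item obtained from GetFileListItem, this could be used as
    "print describe_state($uu->state)".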
  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is called automatically when the module is loaded and
    allocates an amount of memory that is quite small for today's
    machines ;) CleanUp releases it again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    Initialize
        Not normally necessary, (re-)initializes the library.

    CleanUp
        Not normally necessary, could be called at the end to release memory
        before starting a new decoding round.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.

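    As a short illustration of the two calls above, the following sketch
    reads the read-only version option and then sets and reads back
    "OPT_DESPERATE". It assumes that GetOption returns the value previously
    passed to SetOption for this option; the choice of options is arbitrary.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Read-only option: the version string of the underlying uulib.
my $version = GetOption(OPT_VERSION);
print "uulib version: $version\n";

# Switch on "desperate" mode, then read the setting back.
SetOption(OPT_DESPERATE, 1);
my $desperate = GetOption(OPT_DESPERATE);
print "OPT_DESPERATE is now $desperate\n";
```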
  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would not
        have been merged.

        Most probably this will result in garbled files, so never do this by
        default.

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

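    The loading functions above can be combined with result-code checks, as
    in the following sketch. The helper "load_and_decode" is an assumption
    made for this illustration and is not part of Convert::UUlib. It only
    uses LoadFile, GetFileListItem, strerror and the item methods documented
    in this file, and it decodes into the current directory or the
    configured save path.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# load_and_decode is a hypothetical helper, not part of Convert::UUlib.
# It scans the given source files, decodes every file whose parts are
# complete, and returns the names of the files that were decoded.
sub load_and_decode {
    my (@sources) = @_;

    for my $fname (@sources) {
        my ($retval, $count) = LoadFile($fname);
        warn "$fname: ", strerror($retval), "\n" unless $retval == RET_OK;
    }

    my @decoded;
    # Item numbers start at 0 and have no holes; undef ends the loop.
    for (my $i = 0; my $uu = GetFileListItem($i); $i++) {
        next unless $uu->state & FILE_OK;    # skip incomplete files
        my $retval = $uu->decode;
        if ($retval == RET_OK) {
            push @decoded, $uu->filename;
        } else {
            warn $uu->filename, ": ", strerror($retval), "\n";
        }
    }
    return @decoded;
}
```

    A caller could then write "print "$_\n" for load_and_decode(@ARGV)".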
  Decoding files
    $retval = $item->rename($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->binfile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info(callback-function)

  Querying (and setting) item attributes
    $state = $item->state
    $mode = $item->mode([newmode])
    $uudet = $item->uudet
    $size = $item->size
    $filename = $item->filename([newfilename])
    $subfname = $item->subfname
    $mimeid = $item->mimeid
    $mimetype = $item->mimetype
    $binfile = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the ondisk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }

        Usually you are interested mostly in the "sfname" and possibly the
        "partno" and "filename" members.

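    To show how the part hashrefs described above might be consumed, the
    following sketch lists the source file of each part in part-number
    order. The helper "part_sources" and the sample data are assumptions
    made for this illustration; real hashrefs would come from
    "$item->parts". The sketch allows for a missing "sfname", since only
    "partno" is described as always present.

```perl
use strict;
use warnings;

# part_sources is a hypothetical helper, not part of Convert::UUlib.
# Given hashrefs shaped like those returned by $item->parts, it returns
# one line per part, ordered by part number.
sub part_sources {
    my (@parts) = @_;
    return map {
        my $src = defined $_->{sfname} ? $_->{sfname} : "(unknown source)";
        "part $_->{partno}: $src";
    } sort { $a->{partno} <=> $b->{partno} } @parts;
}

# Illustrative data only; the paths and names are made up.
my @sample = (
    { partno => 2, sfname => "uusrc/msg2.txt" },
    { partno => 1, sfname => "uusrc/msg1.txt", filename => "picture.jpg" },
    { partno => 3 },
);
print "$_\n" for part_sources(@sample);
```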
  The functions below are not documented and not very well tested
    QuickDecode
    EncodeMulti
    EncodePartial
    EncodeToStream
    EncodeToFile
    E_PrepSingle
    E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Return the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If the
        callback doesn't return anything (not even "undef"!), then nothing
        happens, so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

           sub cb {
              return unless $_[1]; # skip "Re:"-plies et al.
              my ($subject, $filename) = @_;
              # if we find some *.rar, take it
              return $1 if $subject =~ /(\w+\.rar)/;
              # otherwise just pass what we have
              return ();
           }

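    The three kinds of return value described above (nothing, "undef", a
    filename) can be told apart by calling a callback in list context. The
    following sketch does this in plain Perl without loading the library.
    The helper "classify" and the subject lines are assumptions made for
    this illustration; how the library itself invokes the callback is not
    shown here.

```perl
use strict;
use warnings;

# The callback from the text: prefer a *.rar name found in the subject.
sub cb {
    return unless $_[1];                    # no candidate: change nothing
    my ($subject, $filename) = @_;
    return $1 if $subject =~ /(\w+\.rar)/;  # take the *.rar name
    return ();                              # keep the current candidate
}

# classify is a hypothetical helper, not part of Convert::UUlib. It calls
# a callback in list context and names the kind of result it produced.
sub classify {
    my ($callback, @args) = @_;
    my @ret = $callback->(@args);
    return "unchanged"   unless @ret;             # nothing returned
    return "no filename" unless defined $ret[0];  # undef returned
    return "filename: $ret[0]";
}

# Made-up subject lines and candidates, for illustration only.
print classify(\&cb, "backup.rar (1/3)", "backup.r"), "\n";
print classify(\&cb, "holiday pictures (1/3)", "pic.jpg"), "\n";
print classify(\&cb, "Re: holiday pictures", undef), "\n";
print classify(sub { undef }, "anything", "x.bin"), "\n";
```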
LARGE EXAMPLE DECODER
    This is the file "example-decoder" from the distribution, put here
    instead of more thorough documentation.

     # decode all the files in the directory uusrc/ and copy
     # the resulting files to uudst/

     use Convert::UUlib ':all';

     sub namefilter {
        my($path)=@_;
        $path=~s/^.*[\/\\]//;
        $path;
     }

     sub busycb {
        my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
        $_[0]=straction($action);
        print "busy_callback(", (join ",",@_), ")\n";
        0;
     }

     SetOption OPT_IGNMODE, 1;
     SetOption OPT_VERBOSE, 1;

     # show the three ways you can set callback functions. I normally
     # prefer the one with the sub inplace.
     SetFNameFilter \&namefilter;

     SetBusyCallback "busycb", 333;

     SetMsgCallback sub {
        my ($msg, $level) = @_;
        print uc strmsglevel $_[1], ": $msg\n";
     };

     # the following non-trivial FileNameCallback takes care
     # of some subject lines not detected properly by uulib:
     SetFileNameCallback sub {
        return unless $_[1]; # skip "Re:"-plies et al.
        local $_ = $_[0];

        # the following rules are rather effective on some newsgroups,
        # like alt.binaries.games.anime, where non-mime, uuencoded data
        # is very common

        # if we find some *.rar, take it as the filename
        return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

        # one common subject format
        return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

        # - filename.par (04/55)
        return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

        # - (xxx) No. 1 sayuri81.jpg 756565 bytes
        # - (20 files) No.17 Roseanne.jpg [2/2]
        return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

        # otherwise just pass what we have
        return ();
     };

     # now read all files in the directory uusrc/*
     # (the third argument is $delflag: the source files are deleted
     # once they are no longer necessary)
     for(<uusrc/*>) {
        my($retval,$count)=LoadFile ($_, $_, 1);
        print "file($_), status(", strerror $retval, ") parts($count)\n";
     }

     SetOption OPT_SAVEPATH, "uudst/";

     # now wade through all files and their source parts
     $i = 0;
     while ($uu = GetFileListItem($i)) {
        $i++;
        print "file nr. $i";
        print " state ", $uu->state;
        print " mode ", $uu->mode;
        print " uudet ", strencoding $uu->uudet;
        print " size ", $uu->size;
        print " filename ", $uu->filename;
        print " subfname ", $uu->subfname;
        print " mimeid ", $uu->mimeid;
        print " mimetype ", $uu->mimetype;
        print "\n";

        # print additional info about all parts
        for ($uu->parts) {
           while (my ($k, $v) = each %$_) {
              print "$k > $v, ";
           }
           print "\n";
        }

        $uu->decode_temp;
        print " temporarily decoded to ", $uu->binfile, "\n";
        $uu->remove_temp;

        print strerror $uu->decode;
        print " saved as uudst/", $uu->filename, "\n";
     }

     print "cleanup...\n";

     CleanUp();

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at http://www.uni-frankfurt.de/~fp/uudeview/.
