/cvs/Convert-UUlib/README
Revision: 1.4
Committed: Mon Oct 13 12:12:56 2008 UTC by root
Branch: MAIN
CVS Tags: rel-1_12
Changes since 1.3: +112 -97 lines
Log Message:
1.12

NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
     use Convert::UUlib ':all';

     # read all the files named on the command line and decode them
     # into the current directory. See below for a longer example.
     LoadFile $_ for @ARGV;
     for (my $i = 0; my $uu = GetFileListItem $i; $i++) {
        if ($uu->state & FILE_OK) {
           $uu->decode;
           print $uu->filename, "\n";
        }
     }

DESCRIPTION
    Read the file doc/library.pdf from the distribution for in-depth
    information about the C library used by this interface, then the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite existing files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle Mime preambles/epilogues
      OPT_TINYB64   detect short B64 outside of Mime
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes *EXPERIMENTAL*
      OPT_WBUF      set default write I/O buffer size in bytes *EXPERIMENTAL*

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
    This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

    The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists

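    Because the FILE_xxx codes are bit flags, a state is tested with "&",
    as in the SYNOPSIS. The sketch below expands a state into the names of
    the flags that are set. "state_names" and "uulib_state_flags" are
    hypothetical helpers, not part of Convert::UUlib; the first takes the
    name-to-value table as an argument so it does not depend on the real
    bit values, and the second assumes the constants can be called by name
    in the Convert::UUlib package.

```perl
use strict;
use warnings;

# Hypothetical helper: list the FILE_xxx names whose bits are set in $state.
# $flags is a hashref mapping constant names to their numeric values.
sub state_names {
    my ($state, $flags) = @_;
    return ('UUFILE_READ') unless $state;    # zero: read in, not processed further
    return grep { $state & $flags->{$_} } sort keys %$flags;
}

# Hypothetical helper: build the table from Convert::UUlib's constants,
# loading the module only when this is called.
sub uulib_state_flags {
    require Convert::UUlib;
    no strict 'refs';
    my %flags = map { ($_ => &{"Convert::UUlib::$_"}()) }
                qw(FILE_MISPART FILE_NOBEGIN FILE_NOEND FILE_NODATA
                   FILE_OK FILE_ERROR FILE_DECODED FILE_TMPFILE);
    return \%flags;
}
```

    With the module available, "state_names($item->state, uulib_state_flags())"
    gives a readable summary of an item's state.
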
  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is called automatically when the module is loaded and
    allocates a fairly small amount of memory by today's standards ;)
    CleanUp releases it again.

    On my machine, a fairly complete decode with the DBI backend needs about
    10MB RSS to decode 20000 files.

    Initialize
        Not normally necessary; (re-)initializes the library.

    CleanUp
        Not normally necessary; can be called at the end to release memory
        before starting a new decoding round.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value
        See the "OPT_xxx" constants above for the available options.

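    A minimal sketch of configuring the library before any LoadFile call.
    The helper name "setup_uulib" and the chosen values are illustrative
    only. The module is loaded at run time and the constants are called
    with "()", so the snippet still compiles where Convert::UUlib is not
    installed.

```perl
use strict;
use warnings;

# Hypothetical helper: set the save path and a few common options,
# then report the library version (OPT_VERSION is read-only).
sub setup_uulib {
    my ($savepath) = @_;
    SetOption(OPT_SAVEPATH(), $savepath);   # prefix for decoded files
    SetOption(OPT_IGNMODE(), 1);            # ignore the original file mode
    SetOption(OPT_VERBOSE(), 1);            # generate informative messages
    return GetOption(OPT_VERSION());
}

# Only run when Convert::UUlib is available; ':all' imports the functions used above.
if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    print "uulib version: ", setup_uulib("uudst/"), "\n";
}
```
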
  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

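    A filename filter receives a filename and returns the name to use
    instead, as "namefilter" in the example decoder at the end does. The
    hypothetical "safe_name" below strips any Unix or DOS directory part and
    replaces unusual characters; which characters count as safe is an
    assumption to adjust for your system.

```perl
use strict;
use warnings;

# Hypothetical FNameFilter callback: keep only the last path component
# and map anything outside [A-Za-z0-9_.+-] to "_".
sub safe_name {
    my ($path) = @_;
    $path =~ s{^.*[/\\]}{};             # drop directories (both / and \ separators)
    $path =~ s/[^A-Za-z0-9_.+\-]/_/g;   # replace spaces and other odd characters
    return $path;
}

# Register it only if Convert::UUlib is available.
if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    SetFNameFilter(\&safe_name);
}
```
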
  Loading source files, optionally fuzzy merging, and starting to decode
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

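    The $retval from LoadFile can be compared against RET_OK, and "strerror"
    (see "EXTENSION FUNCTIONS") turns a failure into a readable message. A
    minimal sketch with a hypothetical "load_inputs" helper; it assumes, as
    the example decoder's output suggests, that $count is the number of
    encoded parts found in the file.

```perl
use strict;
use warnings;

# Hypothetical helper: load each file, warn about failures,
# and return the total number of parts found.
sub load_inputs {
    my @files = @_;
    my $total = 0;
    for my $file (@files) {
        my ($retval, $count) = LoadFile($file);
        if ($retval != RET_OK()) {
            warn "$file: ", strerror($retval), "\n";
            next;
        }
        $total += $count;
    }
    return $total;
}

if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    printf "%d part(s) found\n", load_inputs(@ARGV);
}
```
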
    $retval = Smerge $pass
        If you are desperate, try calling "Smerge" with increasing $pass
        values, beginning at 0, to merge parts that would not normally be
        merged.

        Most probably this will result in garbled files, so never do this by
        default.

    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

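    The iteration contract of GetFileListItem (numbers start at 0, no
    holes, "undef" past the end) can be wrapped in a small generic helper.
    "collect_items" is hypothetical and takes the lookup function as a code
    reference, so it works with GetFileListItem or any other index-based
    fetch.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->(0), $fetch->(1), ... until it
# returns undef, and return everything it produced.
sub collect_items {
    my ($fetch) = @_;
    my @items;
    for (my $i = 0; defined(my $item = $fetch->($i)); $i++) {
        push @items, $item;
    }
    return @items;
}

if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    # items that have all their parts and are ready to decode
    my @ready = grep { $_->state & FILE_OK() }
                collect_items(sub { GetFileListItem($_[0]) });
    print scalar(@ready), " file(s) ready to decode\n";
}
```
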
  Decoding files
    $retval = $item->rename($newname)
        Change the on-disk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location; use "$item->infile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode([$target_path])
        Decode the file to its destination, or to the given target path.

    $retval = $item->info(callback-function)

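    Together, decode_temp, infile and remove_temp let you inspect the
    decoded data before deciding whether to keep it. A sketch, assuming that
    infile returns the temporary file's path once decode_temp has succeeded
    and that a nonzero return value is an error code for "strerror";
    "peek_decoded" is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: decode $item to a temporary file, read its first
# $nbytes bytes, remove the temporary file again, and return
# (header, undef) on success or (undef, error message) on failure.
sub peek_decoded {
    my ($item, $nbytes) = @_;
    if (my $err = $item->decode_temp) {
        return (undef, strerror($err));
    }
    my $fh;
    unless (open $fh, '<:raw', $item->infile) {
        my $why = "cannot open temp file: $!";
        $item->remove_temp;
        return (undef, $why);
    }
    my $head = '';
    read $fh, $head, $nbytes;
    close $fh;
    $item->remove_temp;
    return ($head, undef);
}
```

    For example, peek_decoded($item, 6) would return "GIF89a" for a decoded
    GIF89a image.
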
  Querying (and setting) item attributes
    $state = $item->state
    $mode = $item->mode([newmode])
    $uudet = $item->uudet
    $size = $item->size
    $filename = $item->filename([newfilename])
    $subfname = $item->subfname
    $mimeid = $item->mimeid
    $mimetype = $item->mimetype
    $binfile = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

          {
            partno => <integer describing the part number, starting with 1>,
            # the following members only exist when they contain useful information
            sfname => <local pathname of the file where this part is from>,
            filename => <the on-disk filename of the decoded file>,
            subfname => <used to cluster postings, possibly the posting filename>,
            subject => <the subject of the posting/mail>,
            origin => <the possible source (From) address>,
            mimetype => <the possible mimetype of the decoded file>,
            mimeid => <the id part of the Content-Type>,
          }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.

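    Because each part carries its "partno", the list returned by parts can
    be checked for gaps. A minimal sketch with a hypothetical "part_report"
    that sorts the parts, lists their source files, and reports which part
    numbers between 1 and the highest one seen are missing. Since "sfname"
    only exists when it is useful, it falls back to "?".

```perl
use strict;
use warnings;

# Hypothetical helper: takes the hashrefs from $item->parts and returns
# (\@source_files_in_part_order, \@missing_part_numbers).
sub part_report {
    my @parts   = sort { $a->{partno} <=> $b->{partno} } @_;
    my %seen    = map { ($_->{partno} => 1) } @parts;
    my $max     = @parts ? $parts[-1]{partno} : 0;
    my @missing = grep { !$seen{$_} } 1 .. $max;
    my @sources = map { defined $_->{sfname} ? $_->{sfname} : '?' } @parts;
    return (\@sources, \@missing);
}

# Usage with Convert::UUlib:  my ($files, $missing) = part_report($item->parts);
```

    Note that it cannot notice parts missing after the highest part number
    it was given.
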
  Functions below are not documented and not very well tested
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Return the message level as a string.

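    These helpers are handy inside callbacks. A sketch of a message callback
    that suppresses messages below a chosen severity and labels the rest
    with "strmsglevel". It assumes the MSG_xxx constants increase with
    severity in the order listed under "Message severity levels";
    "make_msg_callback" is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical factory: returns a SetMsgCallback callback that ignores
# messages whose level is below $min_level and prints the rest to STDERR.
sub make_msg_callback {
    my ($min_level) = @_;
    return sub {
        my ($msg, $level) = @_;
        return if $level < $min_level;    # assumes numerically ordered levels
        print STDERR uc(strmsglevel($level)), ": $msg\n";
        return;
    };
}

if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    SetMsgCallback(make_msg_callback(MSG_WARNING()));
}
```
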
    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If it
        doesn't return anything (not even "undef"!), then nothing happens,
        so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

           sub cb {
              return unless $_[1]; # skip "Re:"-plies et al.
              my ($subject, $filename) = @_;
              # if we find some *.rar, take it
              return $1 if $subject =~ /(\w+\.rar)/;
              # otherwise just pass what we have
              return ();
           }

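    The three possible outcomes (return nothing, return "undef", return a
    name) can be combined in one callback. In this sketch, the rule that
    subjects containing the word "sample" have no usable file is a made-up
    policy for illustration, and "pick_filename" is a hypothetical name.
    The explicit "return undef" is deliberate: a bare "return" would mean
    "no change" rather than "no filename".

```perl
use strict;
use warnings;

# Hypothetical FileNameCallback: ($subject, $candidate) in;
# () = keep what uulib has, undef = no filename, string = use this filename.
sub pick_filename {
    my ($subject, $candidate) = @_;
    return () unless defined $candidate;             # nothing found: leave it alone
    return undef if $subject =~ /\bsample\b/i;       # illustrative policy: no file here
    return $1 if $subject =~ /"([^"]+\.\w{2,4})"/;   # prefer a quoted filename
    return ();                                       # otherwise keep the candidate
}

if (eval { require Convert::UUlib; Convert::UUlib->import(':all'); 1 }) {
    SetFileNameCallback(\&pick_filename);
}
```
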
LARGE EXAMPLE DECODER
    This is the file "example-decoder" from the distribution, put here
    instead of more thorough documentation.

     #!/usr/bin/perl

     # decode all the files in the directory uusrc/ and copy
     # the resulting files to uudst/

     use Convert::UUlib ':all';

     sub namefilter {
        my ($path) = @_;

        $path=~s/^.*[\/\\]//;

        $path
     }

     sub busycb {
        my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
        $_[0]=straction($action);
        print "busy_callback(", (join ",",@_), ")\n";
        0
     }

     SetOption OPT_RBUF, 128*1024;
     SetOption OPT_WBUF, 1024*1024;
     SetOption OPT_IGNMODE, 1;
     SetOption OPT_VERBOSE, 1;

     # show the three ways you can set callback functions. I normally
     # prefer the one with the sub in place.
     SetFNameFilter \&namefilter;

     SetBusyCallback "busycb", 333;

     SetMsgCallback sub {
        my ($msg, $level) = @_;
        print uc(strmsglevel($level)), ": $msg\n";
     };

     # the following non-trivial FileNameCallback takes care
     # of some subject lines not detected properly by uulib:
     SetFileNameCallback sub {
        return unless $_[1]; # skip "Re:"-plies et al.
        local $_ = $_[0];

        return $1 if /(\S+\s+IMG_\d+\.jpg)/i;

        # the following rules are rather effective on some newsgroups,
        # like alt.binaries.games.anime, where non-mime, uuencoded data
        # is very common

        # if we find some *.rar, take it as the filename
        return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

        # one common subject format
        return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

        # - filename.par (04/55)
        return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

        # - (xxx) No. 1 sayuri81.jpg 756565 bytes
        # - (20 files) No.17 Roseanne.jpg [2/2]
        return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

        # try to detect some common forms of filenames
        return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;

        # otherwise just pass what we have
        ()
     };

     # now read all files in the directory uusrc/*
     for (<uusrc/*>) {
        # id = the filename; delflag = 1 deletes the source once no longer needed
        my ($retval, $count) = LoadFile ($_, $_, 1);
        print "file($_), status(", strerror($retval), ") parts($count)\n";
     }

     SetOption OPT_SAVEPATH, "uudst/";

     # now wade through all files and their source parts
     my $i = 0;
     while (my $uu = GetFileListItem $i) {
        $i++;
        print "file nr. $i";
        print " state ", $uu->state;
        print " mode ", $uu->mode;
        print " uudet ", strencoding($uu->uudet);
        print " size ", $uu->size;
        print " filename ", $uu->filename;
        print " subfname ", $uu->subfname;
        print " mimeid ", $uu->mimeid;
        print " mimetype ", $uu->mimetype;
        print "\n";

        # print additional info about all parts
        for ($uu->parts) {
           while (my ($k, $v) = each %$_) {
              print "$k > $v, ";
           }
           print "\n";
        }

        print $uu->filename;

        $uu->remove_temp;

        if (my $err = $uu->decode ()) {
           print ", ", strerror($err), "\n";
        } else {
           print ", saved as uudst/", $uu->filename, "\n";
        }
     }

     print "cleanup...\n";

     CleanUp;

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>. The original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at http://www.uni-frankfurt.de/~fp/uudeview/.
