/cvs/cvsroot/Convert-UUlib/README
Revision: 1.6
Committed: Sun Mar 1 05:14:55 2020 UTC (4 years, 4 months ago) by root
Branch: MAIN
CVS Tags: rel-1_71, rel-1_7
Changes since 1.5: +81 -37 lines
Log Message:
1.7

1 NAME
2 Convert::UUlib - Perl interface to the uulib library (a.k.a.
3 uudeview/uuenview).
4
5 SYNOPSIS
6 use Convert::UUlib ':all';
7
8 # read all the files named on the commandline and decode them
9 # into the CURRENT directory. See below for a longer example.
10 LoadFile $_ for @ARGV;
11
12 for my $uu (GetFileList) {
13 if ($uu->state & FILE_OK) {
14 $uu->decode;
15 print $uu->filename, "\n";
16 }
17 }
18
19 DESCRIPTION
20 Read the file doc/library.pdf from the distribution for in-depth
21 information about the C library used in this interface, then read the
22 rest of this document, especially the non-trivial decoder program at the end.
23
24 EXPORTED CONSTANTS
25 Action code constants
26 ACT_IDLE we don't do anything
27 ACT_SCANNING scanning an input file
28 ACT_DECODING decoding into a temp file
29 ACT_COPYING copying temp to target
30 ACT_ENCODING encoding a file
31
32 Message severity levels
33 MSG_MESSAGE just a message, nothing important
34 MSG_NOTE something that should be noticed
35 MSG_WARNING important msg, processing continues
36 MSG_ERROR processing has been terminated
37 MSG_FATAL decoder cannot process further requests
38 MSG_PANIC recovery impossible, app must terminate
39
40 Options
41 OPT_VERSION version number MAJOR.MINORplPATCH (ro)
42 OPT_FAST assumes only one part per file
43 OPT_DUMBNESS switch off the program's intelligence
44 OPT_BRACKPOL give numbers in [] higher precedence
45 OPT_VERBOSE generate informative messages
46 OPT_DESPERATE try to decode incomplete files
47 OPT_IGNREPLY ignore RE:plies (off by default)
48 OPT_OVERWRITE whether it's OK to overwrite ex. files
49 OPT_SAVEPATH prefix to save-files on disk
50 OPT_IGNMODE ignore the original file mode
51 OPT_DEBUG print messages with FILE/LINE info
52 OPT_ERRNO get last error code for RET_IOERR (ro)
53 OPT_PROGRESS retrieve progress information
54 OPT_USETEXT handle text messages
55 OPT_PREAMB handle Mime preambles/epilogues
56 OPT_TINYB64 detect short B64 outside of Mime
57 OPT_ENCEXT extension for single-part encoded files
58 OPT_REMOVE remove input files after decoding (dangerous)
59 OPT_MOREMIME strict MIME adherence
60 OPT_DOTDOT ".."-unescaping has not yet been done on input files
61 OPT_RBUF set default read I/O buffer size in bytes
62 OPT_WBUF set default write I/O buffer size in bytes
63 OPT_AUTOCHECK automatically check file list after every loadfile
64
65 Result/Error codes
66 RET_OK everything went fine
67 RET_IOERR I/O Error - examine errno
68 RET_NOMEM not enough memory
69 RET_ILLVAL illegal value for operation
70 RET_NODATA decoder didn't find any data
71 RET_NOEND encoded data wasn't ended properly
72 RET_UNSUP unsupported function (encoding)
73 RET_EXISTS file exists (decoding)
74 RET_CONT continue -- special from ScanPart
75 RET_CANCEL operation canceled
76
77 File States
78 This code is zero, i.e. "false":
79
80 UUFILE_READ Read in, but not further processed
81
82 The following state codes are or'ed together:
83
84 FILE_MISPART Missing Part(s) detected
85 FILE_NOBEGIN No 'begin' found
86 FILE_NOEND No 'end' found
87 FILE_NODATA File does not contain valid uudata
88 FILE_OK All Parts found, ready to decode
89 FILE_ERROR Error while decoding
90 FILE_DECODED Successfully decoded
91 FILE_TMPFILE Temporary decoded file exists
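
Because the non-zero state codes are bit flags, a state value is tested with a bitwise "&" rather than compared with "==". A minimal sketch, assuming the module is installed; the helper name state_names and its short labels are made up for illustration:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Label/flag pairs for the documented state bits (labels are illustrative).
my @flags = (
    [ MISPART => FILE_MISPART ],
    [ NOBEGIN => FILE_NOBEGIN ],
    [ NOEND   => FILE_NOEND ],
    [ NODATA  => FILE_NODATA ],
    [ OK      => FILE_OK ],
    [ ERROR   => FILE_ERROR ],
    [ DECODED => FILE_DECODED ],
    [ TMPFILE => FILE_TMPFILE ],
);

# Return the labels of all bits set in $state, in table order.
sub state_names {
    my ($state) = @_;
    my @names = map { $_->[0] } grep { $state & $_->[1] } @flags;
    return @names ? @names : ('READ');   # no bits set means UUFILE_READ (zero)
}

print join('|', state_names(FILE_TMPFILE | FILE_OK)), "\n";
```

In real use the argument would be "$item->state" for an item returned by "GetFileList".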
92
93 Encoding types
94 UU_ENCODED UUencoded data
95 B64_ENCODED Mime-Base64 data
96 XX_ENCODED XXencoded data
97 BH_ENCODED Binhex encoded
98 PT_ENCODED Plain-Text encoded (MIME)
99 QP_ENCODED Quoted-Printable (MIME)
100 YENC_ENCODED yEnc encoded (non-MIME)
101
102 EXPORTED FUNCTIONS
103 Initializing and cleanup
104 Initialize is automatically called when the module is loaded and
105 allocates quite a small amount of memory for today's machines ;) CleanUp
106 releases that again.
107
108 On my machine, a fairly complete decode with DBI backend needs about
109 10MB RSS to decode 20000 files.
110
111 CleanUp
112 Release memory, file items and clean up files. Should be called
113 after a decoding run, if you want to start a new one.
114
115 Setting and querying options
116 $option = GetOption OPT_xxx
117 SetOption OPT_xxx, opt-value
118
119 See the "OPT_xxx" constants above to see which options exist.
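
As a rough illustration of the two calls, the sketch below sets two options and reads values back. The "out/" prefix is a made-up value, and the module is assumed to be installed:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

SetOption OPT_OVERWRITE, 0;       # do not overwrite existing files
SetOption OPT_SAVEPATH, 'out/';   # hypothetical prefix for saved files

# Options marked (ro) above, such as OPT_VERSION, can only be queried.
print 'uulib version: ', GetOption(OPT_VERSION), "\n";
print 'save path:     ', GetOption(OPT_SAVEPATH), "\n";
```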
120
121 Setting various callbacks
122 SetMsgCallback [callback-function]
123 SetBusyCallback [callback-function]
124 SetFileCallback [callback-function]
125 SetFNameFilter [callback-function]
126
127 Call the currently selected FNameFilter
128 $file = FNameFilter $file
129
130 Loading sourcefiles, optionally fuzzy merge and start decoding
131 ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
132 Load the given file and scan it for encoded contents. Optionally tag
133 it with the given id, and if $delflag is true, delete the file after
134 it is no longer necessary. If you are certain of the part number,
135 you can specify it as the last argument.
136
137 A better (usually faster) way of doing this is using the
138 "SetFNameFilter" functionality.
139
140 $retval = Smerge $pass
141 If you are desperate, try to call "Smerge" with increasing $pass
142 values, beginning at 0, to try to merge parts that usually would not
143 have been merged.
144
145 Most probably this will result in garbled files, so never do this by
146 default, except:
147
148 If the "OPT_AUTOCHECK" option has been disabled (by default it is
149 enabled) to speed up file loading, then you *have* to call "Smerge
150 -1" after loading all files as an additional pre-pass (which is
151 normally done by "LoadFile").
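
The "OPT_AUTOCHECK" case described above might look like the following sketch. It assumes the input files are named on the command line and only reports how many result files were found:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

SetOption OPT_AUTOCHECK, 0;   # skip the file list check after every LoadFile

for my $file (@ARGV) {
    my ($retval, $count) = LoadFile($file);
    warn "$file: ", strerror($retval), "\n" if $retval != RET_OK;
}

# With autocheck disabled, this pre-pass has to be run by hand.
Smerge(-1);

my @items = GetFileList;
printf "%d result file(s) found\n", scalar @items;
```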
152
153 $item = GetFileListItem $item_number
154 Return the $item structure for the $item_number'th found file, or
155 "undef" if no file with that number exists.
156
157 The first file has number 0, and the series has no holes, so you can
158 iterate over all files by starting with zero and incrementing until
159 you hit "undef".
160
161 This function has to walk the linear list of files on each access, so
162 if you want to iterate over all items, it is usually faster to use
163 "GetFileList".
164
165 @items = GetFileList
166 Similar to "GetFileListItem", but returns all files in one go.
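
The two access styles can be compared side by side. This is only a sketch, assuming input files are passed on the command line; the "(no name)" placeholder is made up:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile($_) for @ARGV;

# Index-based walk: start at 0 and stop at the first undef.
my $i = 0;
while (defined(my $item = GetFileListItem($i))) {
    my $name = $item->filename;
    $name = '(no name)' unless defined $name;
    print "$i: $name\n";
    $i++;
}

# The same items in one call, which avoids re-walking the list per index.
my @items = GetFileList;
print scalar(@items), " item(s) via GetFileList\n";
```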
167
168 Decoding files
169 $retval = $item->rename ($newname)
170 Change the ondisk filename where the decoded file will be saved.
171
172 $retval = $item->decode_temp
173 Decode the file into a temporary location, use "$item->infile" to
174 retrieve the temporary filename.
175
176 $retval = $item->remove_temp
177 Remove the temporarily decoded file again.
178
179 $retval = $item->decode ([$target_path])
180 Decode the file to its destination, or the given target path.
181
182 $retval = $item->info (callback-function)
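
A hedged sketch of decoding to an explicit target path and checking the result code follows. The "decoded-" naming scheme and the "unnamed" fallback are assumptions made for the example, not something the library prescribes:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile($_) for @ARGV;

my $decoded = 0;
for my $uu (GetFileList) {
    next unless $uu->state & FILE_OK;   # only files with all parts present

    # Strip any directory part from the suggested name before using it.
    my $name = $uu->filename;
    $name = 'unnamed' unless defined $name && length $name;
    $name =~ s{^.*[/\\]}{};
    my $target = "decoded-$name";       # hypothetical naming scheme

    my $ret = $uu->decode($target);
    if ($ret == RET_OK) {
        print "saved $target\n";
        $decoded++;
    } else {
        warn "$target: ", strerror($ret), "\n";
    }
}

CleanUp;
print "$decoded file(s) decoded\n";
```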
183
184 Querying (and setting) item attributes
185 $state = $item->state
186 $mode = $item->mode ([newmode])
187 $uudet = $item->uudet
188 $size = $item->size
189 $filename = $item->filename ([newfilename])
190 $subfname = $item->subfname
191 $mimeid = $item->mimeid
192 $mimetype = $item->mimetype
193 $binfile = $item->binfile
194
195 Information about source parts
196 $parts = $item->parts
197 Return information about all parts (source files) used to decode the
198 file as a list of hashrefs with the following structure:
199
200 {
201 partno => <integer describing the part number, starting with 1>,
202 # the following members only exist when they contain useful information
203 sfname => <local pathname of the file where this part is from>,
204 filename => <the ondisk filename of the decoded file>,
205 subfname => <used to cluster postings, possibly the posting filename>,
206 subject => <the subject of the posting/mail>,
207 origin => <the possible source (From) address>,
208 mimetype => <the possible mimetype of the decoded file>,
209 mimeid => <the id part of the Content-Type>,
210 }
211
212 Usually you are interested mostly in the "sfname" and possibly the
213 "partno" and "filename" members.
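
For instance, the part list can be used to show which source file each part came from. A small sketch, assuming input files on the command line; the "(unknown source)" placeholder is made up, since "sfname" is only present when known:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

LoadFile($_) for @ARGV;

my $total_parts = 0;
for my $uu (GetFileList) {
    my $name = $uu->filename;
    print defined $name ? $name : '(no name)', "\n";

    # List the parts in part-number order together with their source file.
    for my $part (sort { $a->{partno} <=> $b->{partno} } $uu->parts) {
        my $src = $part->{sfname};
        $src = '(unknown source)' unless defined $src;
        printf "  part %d from %s\n", $part->{partno}, $src;
        $total_parts++;
    }
}
print "$total_parts part(s) in total\n";
```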
214
215 Functions below are not documented and not very well tested - feedback welcome
216 QuickDecode
217 EncodeMulti
218 EncodePartial
219 EncodeToStream
220 EncodeToFile
221 E_PrepSingle
222 E_PrepPartial
223
224 EXTENSION FUNCTIONS
225 Functions found in this module but not documented in the uulib
226 documentation:
227
228 $msg = straction ACT_xxx
229 Return a human readable string representing the given action code.
230
231 $msg = strerror RET_xxx
232 Return a human readable string representing the given error code.
233
234 $str = strencoding xxx_ENCODED
235 Return the name of the encoding type as a string.
236
237 $str = strmsglevel MSG_xxx
238 Returns the message level as a string.
239
240 SetFileNameCallback $cb
241 Sets (or queries) the FileNameCallback, which is called whenever the
242 decoding library can't find a filename and wants to extract a
243 filename from the subject line of a posting. The callback will be
244 called with two arguments, the subject line and the current
245 candidate for the filename. The latter argument can be "undef",
246 which means that no filename could be found (and likely none
247 exists, so it is safe to also return "undef" in this case). If it
248 doesn't return anything (not even "undef"!), then nothing happens,
249 so this is a no-op callback:
250
251 sub cb {
252 return ();
253 }
254
255 If it returns "undef", then this indicates that no filename could be
256 found. In all other cases, the return value is taken to be the
257 filename.
258
259 This is a slightly more useful callback:
260
261 sub cb {
262 return unless $_[1]; # skip "Re:"-plies et al.
263 my ($subject, $filename) = @_;
264 # if we find some *.rar, take it
265 return $1 if $subject =~ /(\w+\.rar)/;
266 # otherwise just pass what we have
267 return ();
268 }
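
The three return conventions (empty list, "undef", or a filename) are easy to mix up, so it can help to exercise a callback on its own before registering it. The sketch below copies the callback above and feeds it made-up subject lines; the describe helper is hypothetical and exists only for this demonstration:

```perl
use strict;
use warnings;

# Same logic as the callback above: arguments are (subject, candidate).
sub cb {
    return unless $_[1];                        # no candidate: return nothing
    my ($subject, $filename) = @_;
    return $1 if $subject =~ /(\w+\.rar)/;      # prefer a *.rar name
    return ();                                  # keep the current candidate
}

# Turn the callback's return list into a readable description.
sub describe {
    my @ret = @_;
    return 'candidate kept (empty list)' unless @ret;
    return 'no filename (undef)'         unless defined $ret[0];
    return "filename '$ret[0]'";
}

# Made-up sample inputs: [subject line, current candidate].
for my $case (
    [ 'pics - archive01.rar (1/5)', 'archive01' ],
    [ 'Re: pics',                   undef       ],
    [ 'notes.txt (1/1)',            'notes.txt' ],
) {
    printf "%-28s => %s\n", $case->[0], describe(cb(@$case));
}
```

Once the results look right, the same sub can be registered with "SetFileNameCallback \&cb".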
269
270 LARGE EXAMPLE DECODER
271 The general workflow for decoding is like this:
272
273 1. Configure options with "SetOption" or "SetXXXCallback".
274 2. Load all source files with "LoadFile".
275 3. Optionally "Smerge".
276 4. Iterate over all "GetFileList" items (i.e. result files).
277 5. "CleanUp" to delete files and free items.
278
279 What follows is the file "example-decoder" from the distribution that
280 illustrates the above workflow in a non-trivial example.
281
282 #!/usr/bin/perl
283
284 # decode all the files in the directory uusrc/ and copy
285 # the resulting files to uudst/
286
287 use Convert::UUlib ':all';
288
289 sub namefilter {
290 my ($path) = @_;
291
292 $path=~s/^.*[\/\\]//;
293
294 $path
295 }
296
297 sub busycb {
298 my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
299 $_[0]=straction($action);
300 print "busy_callback(", (join ",",@_), ")\n";
301 0
302 }
303
304 SetOption OPT_RBUF, 128*1024;
305 SetOption OPT_WBUF, 1024*1024;
306 SetOption OPT_IGNMODE, 1;
307 SetOption OPT_IGNMODE, 1;
308 SetOption OPT_VERBOSE, 1;
309
310 # show the three ways you can set callback functions. I normally
311 # prefer the one with the sub inplace.
312 SetFNameFilter \&namefilter;
313
314 SetBusyCallback "busycb", 333;
315
316 SetMsgCallback sub {
317 my ($msg, $level) = @_;
318 print uc strmsglevel $_[1], ": $msg\n";
319 };
320
321 # the following non-trivial FileNameCallback takes care
322 # of some subject lines not detected properly by uulib:
323 SetFileNameCallback sub {
324 return unless $_[1]; # skip "Re:"-plies et al.
325 local $_ = $_[0];
326
327 # the following rules are rather effective on some newsgroups,
328 # like alt.binaries.games.anime, where non-mime, uuencoded data
329 # is very common
330
331 # if we find some *.rar, take it as the filename
332 return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;
333
334 # one common subject format
335 return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;
336
337 # - filename.par (04/55)
338 return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;
339
340 # - (xxx) No. 1 sayuri81.jpg 756565 bytes
341 # - (20 files) No.17 Roseanne.jpg [2/2]
342 return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;
343
344 # try to detect some common forms of filenames
345 return $1 if /([a-z0-9_\-+.]{3,}\.[a-z]{3,4}(?:.\d+))/i;
346
347 # otherwise just pass what we have
348 ()
349 };
350
351 # now read all files in the directory uusrc/*
352 for (<uusrc/*>) {
353 my ($retval, $count) = LoadFile ($_, $_, 1);
354 print "file($_), status(", strerror $retval, ") parts($count)\n";
355 }
356
357 SetOption OPT_SAVEPATH, "uudst/";
358
359 # now wade through all files and their source parts
360 for my $uu (GetFileList) {
361 print "file ", $uu->filename, "\n";
362 print " state ", $uu->state, "\n";
363 print " mode ", $uu->mode, "\n";
364 print " uudet ", strencoding $uu->uudet, "\n";
365 print " size ", $uu->size, "\n";
366 print " subfname ", $uu->subfname, "\n";
367 print " mimeid ", $uu->mimeid, "\n";
368 print " mimetype ", $uu->mimetype, "\n";
369
370 # print additional info about all parts
371 print " parts";
372 for ($uu->parts) {
373 for my $k (sort keys %$_) {
374 print " $k=$_->{$k}";
375 }
376 print "\n";
377 }
378
379 $uu->remove_temp;
380
381 if (my $err = $uu->decode) {
382 print " ERROR ", strerror $err, "\n";
383 } else {
384 print " successfully saved as uudst/", $uu->filename, "\n";
385 }
386 }
387
388 print "cleanup...\n";
389
390 CleanUp;
391
392 PERLMULTICORE SUPPORT
393 This module supports the perlmulticore standard (see
394 <http://perlmulticore.schmorp.de/> for more info) for the following
395 functions - generally these are functions accessing the disk and/or
396 using considerable CPU time:
397
398 LoadFile
399 $item->decode
400 $item->decode_temp
401 $item->remove_temp
402 $item->info
403
404 The perl interpreter will be reacquired/released on every callback
405 invocation, so for performance reasons, callbacks should be avoided if
406 that is costly.
407
408 Future versions might enable multicore support for more functions.
409
410 BUGS AND LIMITATIONS
411 The original uulib library this module uses was written at a time when
412 main memory was measured in megabytes and buffer overflows as a security
413 thing didn't exist. While a lot of security fixes have been applied over
414 the years (including some defense in depth mechanism that can shield
415 against a lot of as-of-yet undetected bugs), using this library for
416 security purposes requires care.
417
418 Likewise, file sizes when the uulib library was written were tiny
419 compared to today, so do not expect this library to handle files larger
420 than 2GB.
421
422 Lastly, this module uses a very "C-like" interface, which means it
423 doesn't protect you from invalid pointers as you might expect from "more
424 perlish" modules - for example, accessing a file item object after
425 calling "CleanUp" will likely result in crashes, memory corruption, or
426 worse.
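
One way to reduce the risk of touching stale items is to keep them inside a narrow scope and copy out only plain Perl values before calling "CleanUp". This is a sketch of that habit, not a guarantee provided by the module; the function name run_once is made up:

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Load the given files and return the names of all complete result files.
sub run_once {
    my (@files) = @_;
    LoadFile($_) for @files;

    my @names;
    for my $uu (GetFileList) {
        push @names, $uu->filename if $uu->state & FILE_OK;
    }

    # No item object is reachable any more here, only plain strings.
    CleanUp;
    return @names;
}

my @names = run_once(@ARGV);
print scalar(@names), " complete file(s)\n";
```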
427
428 AUTHOR
429 Marc Lehmann <schmorp@schmorp.de>, the original uulib library was
430 written by Frank Pilhofer <fp@informatik.uni-frankfurt.de>, and later
431 heavily bugfixed by Marc Lehmann.
432
433 SEE ALSO
434 perl(1), uudeview homepage at <http://www.fpx.de/fp/Software/UUDeview/>.
435