/cvs/Convert-UUlib/README
Revision: 1.3
Committed: Fri Jun 13 13:27:51 2008 UTC (15 years, 11 months ago) by root
Branch: MAIN
CVS Tags: rel-1_11, rel-1_10
Changes since 1.2: +2 -2 lines

NAME
    Convert::UUlib - Perl interface to the uulib library (a.k.a.
    uudeview/uuenview).

SYNOPSIS
    use Convert::UUlib ':all';

    # read all the files named on the commandline and decode them
    # into the CURRENT directory. See below for a longer example.
    LoadFile $_ for @ARGV;
    for (my $i = 0; my $uu = GetFileListItem $i; $i++) {
      if ($uu->state & FILE_OK) {
        $uu->decode;
        print $uu->filename, "\n";
      }
    }

DESCRIPTION
    For in-depth information about the C library used by this interface,
    read the file doc/library.pdf from the distribution. Then read the rest
    of this document, especially the non-trivial decoder program at the end.

EXPORTED CONSTANTS
  Action code constants
      ACT_IDLE      we don't do anything
      ACT_SCANNING  scanning an input file
      ACT_DECODING  decoding into a temp file
      ACT_COPYING   copying temp to target
      ACT_ENCODING  encoding a file

  Message severity levels
      MSG_MESSAGE   just a message, nothing important
      MSG_NOTE      something that should be noticed
      MSG_WARNING   important msg, processing continues
      MSG_ERROR     processing has been terminated
      MSG_FATAL     decoder cannot process further requests
      MSG_PANIC     recovery impossible, app must terminate

  Options
      OPT_VERSION   version number MAJOR.MINORplPATCH (ro)
      OPT_FAST      assumes only one part per file
      OPT_DUMBNESS  switch off the program's intelligence
      OPT_BRACKPOL  give numbers in [] higher precedence
      OPT_VERBOSE   generate informative messages
      OPT_DESPERATE try to decode incomplete files
      OPT_IGNREPLY  ignore RE:plies (off by default)
      OPT_OVERWRITE whether it's OK to overwrite existing files
      OPT_SAVEPATH  prefix to save-files on disk
      OPT_IGNMODE   ignore the original file mode
      OPT_DEBUG     print messages with FILE/LINE info
      OPT_ERRNO     get last error code for RET_IOERR (ro)
      OPT_PROGRESS  retrieve progress information
      OPT_USETEXT   handle text messages
      OPT_PREAMB    handle MIME preambles/epilogues
      OPT_TINYB64   detect short B64 outside of MIME
      OPT_ENCEXT    extension for single-part encoded files
      OPT_REMOVE    remove input files after decoding (dangerous)
      OPT_MOREMIME  strict MIME adherence
      OPT_DOTDOT    ".."-unescaping has not yet been done on input files
      OPT_RBUF      set default read I/O buffer size in bytes *EXPERIMENTAL*
      OPT_WBUF      set default write I/O buffer size in bytes *EXPERIMENTAL*

  Result/Error codes
      RET_OK        everything went fine
      RET_IOERR     I/O Error - examine errno
      RET_NOMEM     not enough memory
      RET_ILLVAL    illegal value for operation
      RET_NODATA    decoder didn't find any data
      RET_NOEND     encoded data wasn't ended properly
      RET_UNSUP     unsupported function (encoding)
      RET_EXISTS    file exists (decoding)
      RET_CONT      continue -- special from ScanPart
      RET_CANCEL    operation canceled

  File States
    This code is zero, i.e. "false":

      UUFILE_READ   Read in, but not further processed

    The following state codes are or'ed together:

      FILE_MISPART  Missing Part(s) detected
      FILE_NOBEGIN  No 'begin' found
      FILE_NOEND    No 'end' found
      FILE_NODATA   File does not contain valid uudata
      FILE_OK       All Parts found, ready to decode
      FILE_ERROR    Error while decoding
      FILE_DECODED  Successfully decoded
      FILE_TMPFILE  Temporary decoded file exists

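    Because the non-zero state codes are or'ed together, a state value is
    tested with bitwise "&" rather than "==". The following is a minimal
    sketch of that idea. The helper name "state_names" is hypothetical and
    not part of the module, and the sketch assumes the FILE_xxx constants
    are exported by ':all' as listed above.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper (not part of Convert::UUlib): return the names of
# all state flags that are set in $state.
sub state_names {
    my ($state) = @_;

    # explicit () so the constants are always parsed as function calls
    my @flags = (
        [FILE_MISPART(), "FILE_MISPART"],
        [FILE_NOBEGIN(), "FILE_NOBEGIN"],
        [FILE_NOEND(),   "FILE_NOEND"],
        [FILE_NODATA(),  "FILE_NODATA"],
        [FILE_OK(),      "FILE_OK"],
        [FILE_ERROR(),   "FILE_ERROR"],
        [FILE_DECODED(), "FILE_DECODED"],
        [FILE_TMPFILE(), "FILE_TMPFILE"],
    );

    # keep every flag whose bit is set in $state
    return map { $_->[1] } grep { $state & $_->[0] } @flags;
}

print join("|", state_names(FILE_OK() | FILE_TMPFILE())), "\n";
```

    A state of zero yields an empty list, which matches the "read in, but
    not further processed" case described above.
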
  Encoding types
      UU_ENCODED    UUencoded data
      B64_ENCODED   Mime-Base64 data
      XX_ENCODED    XXencoded data
      BH_ENCODED    Binhex encoded
      PT_ENCODED    Plain-Text encoded (MIME)
      QP_ENCODED    Quoted-Printable (MIME)
      YENC_ENCODED  yEnc encoded (non-MIME)

EXPORTED FUNCTIONS
  Initializing and cleanup
    Initialize is called automatically when the module is loaded and
    allocates an amount of memory that is quite small for today's machines
    ;) CleanUp releases it again.

    On my machine, a fairly complete decode with DBI backend needs about
    10MB RSS to decode 20000 files.

    Initialize
        Not normally necessary; (re-)initializes the library.

    CleanUp
        Not normally necessary; can be called at the end to release memory
        before starting a new decoding round.

  Setting and querying options
    $option = GetOption OPT_xxx
    SetOption OPT_xxx, opt-value

    See the "OPT_xxx" constants above to see which options exist.

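    As a short illustration of the two calls, the sketch below sets one
    integer-valued and one string-valued option and reads them back. The
    directory name "uudst/" is only an example value, and which options
    take strings is inferred from their descriptions above.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Integer-valued option: switch on informative messages.
SetOption(OPT_VERBOSE(), 1);

# String-valued option: prefix for files written when decoding.
# "uudst/" is an illustrative directory name.
SetOption(OPT_SAVEPATH(), "uudst/");

# Read the options back. OPT_VERSION is marked read-only (ro) above,
# so it is only queried here, never set.
printf "uulib version: %s\n", GetOption(OPT_VERSION());
printf "verbose:       %s\n", GetOption(OPT_VERBOSE());
printf "save path:     %s\n", GetOption(OPT_SAVEPATH());
```
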
  Setting various callbacks
    SetMsgCallback [callback-function]
    SetBusyCallback [callback-function]
    SetFileCallback [callback-function]
    SetFNameFilter [callback-function]

  Call the currently selected FNameFilter
    $file = FNameFilter $file

  Loading sourcefiles, optionally fuzzy merge and start decoding
    ($retval, $count) = LoadFile $fname, [$id, [$delflag, [$partno]]]
        Load the given file and scan it for encoded contents. Optionally tag
        it with the given id, and if $delflag is true, delete the file after
        it is no longer necessary. If you are certain of the part number,
        you can specify it as the last argument.

        A better (usually faster) way of doing this is using the
        "SetFNameFilter" functionality.

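    The two return values of "LoadFile" can be checked as in the sketch
    below. The wrapper name "load_checked" is hypothetical. Note that
    "RET_NODATA" (no encoded data found) is reported as a warning here
    too, which may or may not be what an application wants.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical wrapper (not part of Convert::UUlib): load one file,
# warn on any result other than RET_OK, and return the part count.
sub load_checked {
    my ($fname) = @_;

    my ($retval, $count) = LoadFile($fname);

    if ($retval != RET_OK()) {
        # strerror turns the RET_xxx code into a readable string
        warn "$fname: ", strerror($retval), "\n";
        return 0;
    }

    return $count;
}

my $total = 0;
$total += load_checked($_) for @ARGV;
print "found $total part(s)\n";
```
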
    $retval = Smerge $pass
        If you are desperate, try to call "Smerge" with increasing $pass
        values, beginning at 0, to try to merge parts that usually would not
        have been merged.

        Most probably this will result in garbled files, so never do this by
        default.

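    One way to keep "Smerge" strictly opt-in is sketched below: passes are
    tried in increasing order, and only while some file is still not
    "FILE_OK". Both helper names and the $max_pass upper bound are
    assumptions made for this sketch, not something the library prescribes.

```perl
use strict;
use warnings;
use Convert::UUlib ':all';

# Hypothetical helper: count files that are not yet ready to decode.
sub count_incomplete {
    my $n = 0;
    for (my $i = 0; my $uu = GetFileListItem($i); $i++) {
        $n++ unless $uu->state & FILE_OK();
    }
    return $n;
}

# Hypothetical helper: try merge passes 0 .. $max_pass, stopping as soon
# as every file is complete. Returns the number of files still incomplete.
sub desperate_merge {
    my ($max_pass) = @_;

    for my $pass (0 .. $max_pass) {
        last unless count_incomplete();
        my $retval = Smerge($pass);
        warn "Smerge($pass): ", strerror($retval), "\n"
            if $retval != RET_OK();
    }

    return count_incomplete();
}
```

    A caller would invoke, for example, "desperate_merge(1)" after all
    "LoadFile" calls, and only when the user has explicitly asked for it.
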
    $item = GetFileListItem $item_number
        Return the $item structure for the $item_number'th found file, or
        "undef" if no file with that number exists.

        The first file has number 0, and the series has no holes, so you can
        iterate over all files by starting with zero and incrementing until
        you hit "undef".

  Decoding files
    $retval = $item->rename($newname)
        Change the ondisk filename where the decoded file will be saved.

    $retval = $item->decode_temp
        Decode the file into a temporary location, use "$item->binfile" to
        retrieve the temporary filename.

    $retval = $item->remove_temp
        Remove the temporarily decoded file again.

    $retval = $item->decode([$target_path])
        Decode the file to its destination, or the given target path.

    $retval = $item->info(callback-function)

  Querying (and setting) item attributes
    $state    = $item->state
    $mode     = $item->mode([newmode])
    $uudet    = $item->uudet
    $size     = $item->size
    $filename = $item->filename([newfilename])
    $subfname = $item->subfname
    $mimeid   = $item->mimeid
    $mimetype = $item->mimetype
    $binfile  = $item->binfile

  Information about source parts
    $parts = $item->parts
        Return information about all parts (source files) used to decode the
        file as a list of hashrefs with the following structure:

         {
           partno   => <integer describing the part number, starting with 1>,
           # the following members only exist when they contain useful information
           sfname   => <local pathname of the file where this part is from>,
           filename => <the ondisk filename of the decoded file>,
           subfname => <used to cluster postings, possibly the posting filename>,
           subject  => <the subject of the posting/mail>,
           origin   => <the possible source (From) address>,
           mimetype => <the possible mimetype of the decoded file>,
           mimeid   => <the id part of the Content-Type>,
         }

        Usually you are mostly interested in the "sfname" and possibly the
        "partno" and "filename" members.

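    Since only "partno" is guaranteed to be present, code that walks the
    list should test the other members before using them. The sketch below
    shows this on made-up sample data in the documented shape, so it runs
    without the module. The helper name "describe_parts" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build one summary line per part, ordered by
# part number. Takes a list of hashrefs as described above.
sub describe_parts {
    my (@parts) = @_;

    return map {
        my $p = $_;
        # partno is always present; the other members are optional
        my @fields = ("part $p->{partno}");
        for my $key (qw(sfname filename subject)) {
            push @fields, "$key=$p->{$key}" if defined $p->{$key};
        }
        join ", ", @fields;
    } sort { $a->{partno} <=> $b->{partno} } @parts;
}

# Made-up sample data; with the module loaded you would pass
# $item->parts instead.
my @sample = (
    { partno => 2, sfname => "uusrc/msg2.txt" },
    { partno => 1, sfname => "uusrc/msg1.txt", filename => "pic.jpg" },
);

print "$_\n" for describe_parts(@sample);
```
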
  Functions below are not documented and not very well tested
      QuickDecode
      EncodeMulti
      EncodePartial
      EncodeToStream
      EncodeToFile
      E_PrepSingle
      E_PrepPartial

EXTENSION FUNCTIONS
    Functions found in this module but not documented in the uulib
    documentation:

    $msg = straction ACT_xxx
        Return a human readable string representing the given action code.

    $msg = strerror RET_xxx
        Return a human readable string representing the given error code.

    $str = strencoding xxx_ENCODED
        Return the name of the encoding type as a string.

    $str = strmsglevel MSG_xxx
        Returns the message level as a string.

    SetFileNameCallback $cb
        Sets (or queries) the FileNameCallback, which is called whenever the
        decoding library can't find a filename and wants to extract a
        filename from the subject line of a posting. The callback will be
        called with two arguments, the subject line and the current
        candidate for the filename. The latter argument can be "undef",
        which means that no filename could be found (and likely none
        exists, so it is safe to also return "undef" in this case). If the
        callback doesn't return anything (not even "undef"!), then nothing
        happens, so this is a no-op callback:

           sub cb {
              return ();
           }

        If it returns "undef", then this indicates that no filename could be
        found. In all other cases, the return value is taken to be the
        filename.

        This is a slightly more useful callback:

           sub cb {
              return unless $_[1]; # skip "Re:"-plies et al.
              my ($subject, $filename) = @_;
              # if we find some *.rar, take it
              return $1 if $subject =~ /(\w+\.rar)/;
              # otherwise just pass what we have
              return ();
           }

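    A callback like this is plain Perl, so its three possible outcomes
    (override, "no filename", no change) can be tried out on sample subject
    lines before it is installed with "SetFileNameCallback". The sketch
    below does that without loading the module. The subject lines are made
    up, and the sub name "filename_cb" is hypothetical.

```perl
use strict;
use warnings;

# Same logic as the "slightly more useful" callback above.
sub filename_cb {
    my ($subject, $filename) = @_;
    return unless $filename;                  # no candidate: return nothing
    return $1 if $subject =~ /(\w+\.rar)/;    # prefer a *.rar name
    return ();                                # keep the current candidate
}

# Made-up inputs: [subject line, current filename candidate]
my @cases = (
    ['backup.rar (1/3)',   'backup'],
    ['holiday pics (2/5)', 'pic.jpg'],
    ['Re: backup.rar',     undef],
);

for my $case (@cases) {
    my ($subject, $candidate) = @$case;

    # call in list context so "returned nothing" and "returned undef"
    # can be told apart
    my @ret = filename_cb($subject, $candidate);

    my $shown = !@ret            ? "(no change)"
              : defined $ret[0]  ? $ret[0]
              :                    "(no filename)";

    printf "%-20s => %s\n", $subject, $shown;
}
```
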
LARGE EXAMPLE DECODER
    This is the file "example-decoder" from the distribution, put here
    instead of more thorough documentation.

    # decode all the files in the directory uusrc/ and copy
    # the resulting files to uudst/

    use Convert::UUlib ':all';

    sub namefilter {
      my($path)=@_;
      $path=~s/^.*[\/\\]//;
      $path;
    }

    sub busycb {
      my ($action, $curfile, $partno, $numparts, $percent, $fsize) = @_;
      $_[0]=straction($action);
      print "busy_callback(", (join ",",@_), ")\n";
      0;
    }

    SetOption OPT_IGNMODE, 1;
    SetOption OPT_VERBOSE, 1;

    # show the three ways you can set callback functions. I normally
    # prefer the one with the sub inplace.
    SetFNameFilter \&namefilter;

    SetBusyCallback "busycb", 333;

    SetMsgCallback sub {
      my ($msg, $level) = @_;
      print uc strmsglevel $_[1], ": $msg\n";
    };

    # the following non-trivial FileNameCallback takes care
    # of some subject lines not detected properly by uulib:
    SetFileNameCallback sub {
      return unless $_[1]; # skip "Re:"-plies et al.
      local $_ = $_[0];

      # the following rules are rather effective on some newsgroups,
      # like alt.binaries.games.anime, where non-mime, uuencoded data
      # is very common

      # if we find some *.rar, take it as the filename
      return $1 if /(\S{3,}\.(?:[rstuvwxyz]\d\d|rar))\s/i;

      # one common subject format
      return $1 if /- "(.{2,}?\..+?)" (?:yenc )?\(\d+\/\d+\)/i;

      # - filename.par (04/55)
      return $1 if /- "?(\S{3,}\.\S+?)"? (?:yenc )?\(\d+\/\d+\)/i;

      # - (xxx) No. 1 sayuri81.jpg 756565 bytes
      # - (20 files) No.17 Roseanne.jpg [2/2]
      return $1 if /No\.[ 0-9]+ (\S+\....) (?:\d+ bytes )?\[/;

      # otherwise just pass what we have
      return ();
    };

    # now read all files in the directory uusrc/*
    for(<uusrc/*>) {
      my($retval,$count)=LoadFile ($_, $_, 1);
      print "file($_), status(", strerror $retval, ") parts($count)\n";
    }

    SetOption OPT_SAVEPATH, "uudst/";

    # now wade through all files and their source parts
    $i = 0;
    while ($uu = GetFileListItem($i)) {
      $i++;
      print "file nr. $i";
      print " state ", $uu->state;
      print " mode ", $uu->mode;
      print " uudet ", strencoding $uu->uudet;
      print " size ", $uu->size;
      print " filename ", $uu->filename;
      print " subfname ", $uu->subfname;
      print " mimeid ", $uu->mimeid;
      print " mimetype ", $uu->mimetype;
      print "\n";

      # print additional info about all parts
      for ($uu->parts) {
        while (my ($k, $v) = each %$_) {
          print "$k > $v, ";
        }
        print "\n";
      }

      $uu->decode_temp;
      print " temporarily decoded to ", $uu->binfile, "\n";
      $uu->remove_temp;

      print strerror $uu->decode;
      print " saved as uudst/", $uu->filename, "\n";
    }

    print "cleanup...\n";

    CleanUp();

AUTHOR
    Marc Lehmann <schmorp@schmorp.de>. The original uulib library was
    written by Frank Pilhofer <fp@informatik.uni-frankfurt.de> and later
    heavily bugfixed by Marc Lehmann.

SEE ALSO
    perl(1), uudeview homepage at http://www.uni-frankfurt.de/~fp/uudeview/.