ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/Perl-LibExtractor/LibExtractor.pm
Revision: 1.27
Committed: Tue Sep 24 09:32:16 2013 UTC (10 years, 8 months ago) by root
Branch: MAIN
CVS Tags: HEAD
Changes since 1.26: +33 -5 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     Perl::LibExtractor - determine perl library subsets for building distributions
4    
5     =head1 SYNOPSIS
6    
7     use Perl::LibExtractor;
8    
9     =head1 DESCRIPTION
10    
11     The purpose of this module is to determine subsets of your perl library,
12     that is, a set of files needed to satisfy certain dependencies (e.g. of a
13     program).
14    
15     The goal is to extract a part of your perl installation including
16 root 1.2 dependencies. A typical use case for this module would be to find out
17 root 1.1 which files are needed to be build a L<PAR> distribution, to link into
18     an L<App::Staticperl> binary, or to pack with L<Urlader>, to create
19     stand-alone distributions tailormade to run your app.
20    
21     =head1 METHODS
22    
23     To use this module, first call the C<new>-constructor and then as many
24 root 1.5 other methods as you want, to generate a set of files. Then query the set
25     of files and do whatever you want with them.
26 root 1.2
27 root 1.23 The command-line utility F<perl-libextractor> can be a convenient
28 root 1.2 alternative to using this module directly, and offers a few extra options,
29     such as to copy out the files into a new directory, strip them and/or
30     manipulate them in other ways.
31 root 1.1
32     =cut
33    
34     package Perl::LibExtractor;
35    
36 root 1.22 our $VERSION = '1.1';
37 root 1.1
38     use Config;
39 root 1.25 use Cwd ();
40 root 1.7 use File::Spec ();
41 root 1.1 use File::Temp ();
42    
43     use common::sense;
44    
45     sub I_SRC () { 0 }
46     sub I_DEP () { 1 }
47    
48     sub croak($) {
49     require Carp;
50     Carp::croak "(Perl::LibExtractor) $_[0]";
51     }
52    
53 root 1.8 my $canonpath = File::Spec->can ("canonpath");
54     my $case_tolerant = File::Spec->case_tolerant;
55 root 1.7
56     sub canonpath($) {
57     local $_ = $canonpath->(File::Spec::, $_[0]);
58     s%\\%/%g;
59 root 1.8 # $_ = lc if $case_tolerant; # we assume perl file name case is always the same
60 root 1.7 $_
61     }
62    
63 root 1.5 =head2 CREATION
64    
65     =over 4
66    
67 root 1.1 =item $extractor = new Perl::LibExtractor [key => value...]
68    
69     Creates a new extractor object. Each extractor object stores some
70     configuration options and a subset of files that can be queried at any
71     time,.
72    
73 root 1.12 Binary executables (such as the perl interpreter) are stored inside
74     F<bin/>, perl scripts are stored under F<script/>, perl library files are
75     stored under F<lib/> and shared libraries are stored under F<dll/>.
76    
77 root 1.1 The following key-value pairs exist, with default values as specified.
78    
79     =over 4
80    
81 root 1.25 =item inc => \@INC without any initial "."
82 root 1.1
83     An arrayref with paths to perl library directories. The default is
84 root 1.25 C<\@INC>, with the first entry removed if it is F<.>. Internally, all
85     paths will be changed to absolute paths.
86 root 1.1
87     To prepend custom dirs just do this:
88    
89     inc => ["mydir", @INC],
90    
91 root 1.3 =item use_packlist => 1
92 root 1.1
93 root 1.3 Enable (if true) or disable the use of C<.packlist> files. If enabled,
94 root 1.5 then each time a file is traced, the complete distribution that contains
95     it is included (but not traced).
96    
97     If disabled, only shared objects and autoload files will be added.
98 root 1.1
99 root 1.13 Debian GNU/Linux doesn't completely package perl or any perl modules, so
100     this option will fail. Other perls should be fine.
101    
102 root 1.16 =item extra_deps => { file => [files...] }
103    
104     Some (mainly runtime dependencies in the perl core library) cannot be
105     detected automatically by this module, especially if you don't use
106     packlists and C<add_core>.
107    
108     This module comes with a set of default dependencies (such as L<Carp>
109     requiring L<Carp::Heavy>), which you cna override with this parameter.
110    
111     To see the default set of dependencies that come with this module, use
112     this:
113    
114     perl -MPerl::LibExtractor -MData::Dumper -e 'print Dumper $Perl::LibExtractor::EXTRA_DEPS'
115    
116 root 1.1 =back
117    
118     =cut
119    
120 root 1.16 our $EXTRA_DEPS = {
121 root 1.17 'bytes.pm' => ['bytes_heavy.pl'],
122     'utf8.pm' => ['utf8_heavy.pl'],
123     'Config.pm' => ['Config_heavy.pl', 'Config_git.pl'],
124     'Carp.pm' => ['Carp/Heavy.pm'],
125     'Exporter.pm' => ['Exporter/Heavy.pm'],
126 root 1.16 };
127    
128 root 1.1 sub new {
129     my ($class, %kv) = @_;
130    
131     my $self = bless {
132 root 1.3 use_packlist => 1,
133 root 1.16 extra_deps => $EXTRA_DEPS,
134 root 1.1 %kv,
135 root 1.16 set => {},
136 root 1.1 }, $class;
137    
138 root 1.25 unless (exists $self->{inc}) {
139     my @inc = @INC;
140     shift @inc if $inc[0] eq ".";
141     $self->{inc} = \@INC;
142     }
143    
144 root 1.1 my %inc_seen;
145 root 1.25 my @inc = grep !$inc_seen{$_}++ && -d "$_/.", map Cwd::abs_path ($_), @{ $self->{inc} };
146 root 1.1 $self->{inc} = \@inc;
147    
148 root 1.14 # maybe not inc, but these?
149     # sitearchexp
150     # sitelib
151     # vendorarchexp
152     # vendorlibexp
153     # archlibexp
154     # privlibexp
155    
156 root 1.1 $self->_set_inc;
157    
158     $self
159     }
160    
161 root 1.4 sub _perl_path() {
162     my $secure_perl_path = $Config{perlpath};
163    
164     if ($^O ne 'VMS') {
165     $secure_perl_path .= $Config{_exe}
166     unless $secure_perl_path =~ m/$Config{_exe}$/i;
167     }
168    
169     $secure_perl_path
170     }
171    
172 root 1.8 sub _path2match {
173     my $re = join "|", map "\Q$_", @_;
174    
175 root 1.19 $re = "^(?:$re)\\/";
176 root 1.8 $re =~ s%\\[/\\]%[/\\\\]%g; # we support / and \ on all OSes, keep your fingers crossed
177    
178     $case_tolerant
179     ? qr<$re>i
180     : qr<$re>
181     }
182    
183 root 1.11 sub _read_packlist {
184 root 1.15 my ($self, $path) = @_;
185 root 1.11
186 root 1.15 my $pfxmatch = $self->{pfxmatch};
187     my $lib = $self->{lib};
188 root 1.11
189     my @packlist;
190    
191 root 1.15 open my $fh, "<:perlio", $path
192     or die "$path: $!";
193 root 1.11
194     while (<$fh>) {
195     chomp;
196     s/ .*$//; # newer-style .packlists might contain key=value pairs
197    
198 root 1.19 s%\\%/%g; # we only do unix-style paths internally
199    
200 root 1.15 s/$pfxmatch// and exists $lib->{$_}
201 root 1.11 or next;
202    
203     push @packlist, canonpath $_;
204     }
205    
206     \@packlist
207     }
208    
209 root 1.1 sub _set_inc {
210     my ($self) = @_;
211    
212 root 1.15 my $pfxmatch = _path2match @{ $self->{inc }};
213 root 1.2
214 root 1.1 my %lib;
215 root 1.2 my @packlists;
216 root 1.1
217     # find all files in all libdirs, earlier ones overwrite later ones
218 root 1.2 my @scan = map [$_, ""], @{ $self->{inc} };
219    
220     while (@scan) {
221     my ($root, $dir) = @{ pop @scan };
222    
223     my $pfx = length $dir ? "$dir/" : "";
224    
225     for (do {
226     opendir my $fh, "$root/$dir"
227     or croak "$root/$dir: $!";
228     grep !/^\.\.?$/, readdir $fh
229     }) {
230     if (-d "$root/$dir/$_/.") {
231     $lib{"$pfx$_/"} = "$root/$pfx$_";
232     push @scan, [$root, "$pfx$_"];
233     } elsif ($_ eq ".packlist" && $pfx =~ m%^auto/%) {
234 root 1.15 push @packlists, "$root/$pfx.packlist";
235 root 1.2 } elsif (/\.bs$/ && $pfx =~ m%^auto/% && !-s "$root/$dir/$_") {
236     # skip empty .bs files
237     # } elsif (/\.(?:pod|h|html)$/) {
238     # # not interested in those
239     } else {
240     #push @files, $_;
241     $lib{"$pfx$_"} = "$root/$pfx$_";
242 root 1.1 }
243 root 1.2 }
244    
245     #$lib{"$_[1]/"} = [\@dirs, \@files]; # won't work nice with overwrite
246     }
247    
248 root 1.15 $self->{lib} = \%lib;
249     $self->{pfxmatch} = $pfxmatch;
250 root 1.14
251 root 1.2 my %packlist;
252 root 1.1
253 root 1.2 # need to go forward here
254     for (@packlists) {
255 root 1.15 my $packlist = $self->_read_packlist ($_);
256 root 1.2
257 root 1.11 $packlist{$_} = $packlist
258     for @$packlist;
259 root 1.1 }
260    
261 root 1.14 $self->{packlist} = \%packlist;
262 root 1.1 }
263    
264 root 1.5 =back
265    
266     =head2 TRACE/PACKLIST BASED ADDING
267    
268     The following methods add various things to the set of files.
269    
270     Each time a perl file is added, it is scanned by tracing either loading,
271     execution or compiling it, and seeing which other perl modules and
272     libraries have been loaded.
273    
274     For each library file found this way, additional dependencies are added:
275     if packlists are enabled, then all files of the distribution that contains
276     the file will be added. If packlists are disabled, then only shared
277     objects and autoload files for modules will be added.
278    
279     Only files from perl library directories will be added automatically. Any
280     other files (such as manpages or scripts installed in the F<bin>
281     directory) are skipped.
282    
283     If there is an error, such as a module not being found, then this module
284     croaks (as opposed to silently skipping). If you want to add something of
285     which you are not sure it exists, then you can wrap the call into C<eval
286     {}>. In some cases, you can avoid this by executing the code you want
287     to work later using C<add_eval> - see C<add_core_support> for an actual
288     example of this technique.
289    
290     Note that packlists are meant to add files not covered by other
291     mechanisms, such as resource files and other data files loaded directly by
292     a module - they are not meant to add dependencies that are missed because
293     they only happen at runtime.
294    
295     For example, with packlists, when using L<AnyEvent>, then all event loop
296     backends are automatically added as well, but I<not> any event loops
297     (i.e. L<AnyEvent::Impl::POE> is added, but L<POE> itself is not). Without
298     packlists, only the backend that is being used is added (i.e. normally
299     none, as loading AnyEvent does not instantly load any backend).
300    
301     To catch the extra event loop dependencies, you can either initialise
302     AnyEvent so it picks a suitable backend:
303    
304     $extractor->add_eval ("use AnyEvent; AnyEvent::detect");
305    
306     Or you can directly load the backend modules you plan to use:
307    
308     $extractor->add_mod ("AnyEvent::Impl::EV", "AnyEvent::Impl::Perl");
309    
310     An example of a program (or module) that has extra resource files is
311     L<Deliantra::Client> - the normal tracing (without packlist usage) will
312     correctly add all submodules, but miss the fonts and textures. By using
313     the packlist, those files are added correctly.
314    
315     =over 4
316    
317     =cut
318    
319 root 1.1 sub _add {
320     my ($self, $add) = @_;
321    
322     my $lib = $self->{lib};
323     my $path;
324    
325     for (@$add) {
326 root 1.12 $path = "lib/$_";
327 root 1.1
328     $self->{set}{$path} ||= do {
329     my @info;
330    
331     $info[I_SRC] = $lib->{$_}
332     or croak "$_: unable to locate file in perl library";
333    
334 root 1.3 if ($self->{use_packlist} && exists $self->{packlist}{$_}) {
335 root 1.12 $self->{set}{"lib/$_"} ||= [$self->{lib}{$_} or die]
336 root 1.2 for @{ $self->{packlist}{$_} };
337 root 1.3
338     # for (grep /\.pm$/, @{ $self->{packlist}{$_} }) {
339     # s/\.pm$//;
340     # s%/%::%g;
341 root 1.27 # my $pkg = "Perl::LibExtractor::trace::_add_" . ++$self->{count};
342 root 1.3 # $self->add_eval ("{ package $pkg; eval 'use $_' }")
343     # unless $self->{_add_do}{$_}++;
344     # }
345     #
346     # $self->{_add_do}{$_}++ or $self->add_eval ("do q\x00$_\x00")
347     # for grep /\.pl$/, @{ $self->{packlist}{$_} };
348    
349 root 1.2 } elsif (/^(.*)\.pm$/) {
350 root 1.1 (my $auto = "auto/$1/") =~ s%::%/%g;
351     $auto =~ m%/([^/]+)/$% or die;
352     my $base = $1;
353    
354     if (exists $lib->{$auto}) {
355     # auto dir exists, scan it for cool stuff
356    
357     # 1. shared object, others are of no interest to us
358     my $so = "$auto$base.$Config{dlext}";
359     if (my $src = $lib->{$so}) {
360 root 1.12 $so = "lib/$so";
361 root 1.1 push @{ $info[I_DEP] }, $so; $self->{set}{$so} = [$src];
362     }
363    
364     # 2. autoloader/autosplit
365     my $ix = "${auto}autosplit.ix";
366     if (my $src = $lib->{$ix}) {
367 root 1.12 $ix = "lib/$ix";
368 root 1.1 push @{ $info[I_DEP] }, $ix; $self->{set}{$ix} = [$src];
369    
370     open my $fh, "<:perlio", $src
371     or croak "$src: $!";
372    
373     my $package;
374    
375     while (<$fh>) {
376     if (/^\s*sub\s+ ([^[:space:];]+) \s* (?:\([^)]*\))? \s*;?\s*$/x) {
377     my $al = "auto/$package/$1.al";
378     my $src = $lib->{$al}
379     or croak "$al: autoload file not found, but should be there.";
380    
381 root 1.12 $al = "lib/$al";
382 root 1.1 push @{ $info[I_DEP] }, $al; $self->{set}{$al} = [$src];
383    
384     } elsif (/^\s*package\s+([^[:space:];]+)\s*;?\s*$/) {
385     ($package = $1) =~ s/::/\//g;
386     } elsif (/^\s*(?:#|1?\s*;?\s*$)/) {
387     # nop
388     } else {
389     warn "WARNING: $src: unparsable line, please report: $_";
390     }
391     }
392     }
393    
394     skip:
395     }
396     }
397    
398 root 1.16 if (exists $self->{extra_deps}{$_}) {
399     # we require it again, because many extra dependencies require the main module to be loaded
400     $self->add_eval ("require q\x00$_\x00");
401    
402     exists $lib->{$_} and $self->add_require ($_)
403     for @{ $self->{extra_deps}{$_} };
404     }
405    
406 root 1.1 \@info
407     };
408     }
409     }
410    
411     sub _trace {
412     my ($self, $file, $eval) = @_;
413    
414 root 1.3 $self->{trace_begin} .= "\n#line \"$file\" 1\n$eval;\n";
415 root 1.1 }
416    
417     sub _trace_flush {
418     my ($self) = @_;
419    
420 root 1.3 # ->_add might add additional files to trace
421     while (exists $self->{trace_begin} or exists $self->{trace_check}) {
422     my $tmpdir = newdir File::Temp;
423     my $dir = $tmpdir->dirname;
424    
425     open my $fh, ">:perlio", "$dir/eval"
426     or croak "$dir/eval: $!";
427     syswrite $fh,
428 root 1.27 'BEGIN { '
429     . '@INC = (' . (join ", ", map "q\x00$_\x00", @{ $self->{inc} }) . ');'
430     . "chdir q\x00$dir\x00 or die q\x00$dir\x00 . \": \$!\";"
431     . '$Perl::LibExtractor::EXTRACTING = 1;'
432     . '}'
433 root 1.3 . 'BEGIN { ' . (delete $self->{trace_begin}) . "}\n"
434     . "CHECK {\n"
435     . 'open STDOUT, ">:raw", "out" or die "out: $!";'
436     . 'print join "\x00", values %INC;'
437     . 'open STDERR, ">stderr";' # suppress "syntax OK" message from perl
438     . "}\n"
439     . (delete $self->{trace_check});
440     close $fh;
441    
442 root 1.27 system "cat $dir/eval >/tmp/x";#d#
443    
444 root 1.4 system _perl_path, "-c", "$dir/eval"
445 root 1.3 and croak "trace failure, check trace process output - caught";
446 root 1.1
447 root 1.3 my @inc = split /\x00/, do {
448     open my $fh, "<:perlio", "$dir/out"
449     or croak "$dir/out: $!";
450     local $/;
451     scalar readline $fh
452     };
453 root 1.1
454 root 1.15 my $pfxmatch = $self->{pfxmatch};
455 root 1.1
456 root 1.3 # remove the library directory prefix, hope for the best
457 root 1.15 s/$pfxmatch//
458 root 1.3 or croak "$_: file outside any library directory"
459     for @inc;
460 root 1.1
461 root 1.3 $self->_add (\@inc);
462     }
463 root 1.1 }
464    
465     =item $extractor->add_mod ($module[, $module...])
466    
467     Adds the given module(s) to the file set - the module name must be specified
468     as in C<use>, i.e. with C<::> as separators and without F<.pm>.
469    
470     The program will be loaded with the default import list, any dependent
471     files, such as the shared object implementing xs functions, or autoload
472     files, will also be added.
473    
474 root 1.5 If you want to use a different import list (for those rare modules wghere
475     import lists trigger different backend modules to be loaded for example),
476     you can use C<add_eval> instead:
477    
478     $extractor->add_eval ("use Module qw(a b c)");
479    
480 root 1.1 Example: add F<Coro.pm> and F<AnyEvent/AIO.pm>, and all relevant files
481     from the distribution they are part of.
482    
483     $extractor->add_mod ("Coro", "AnyEvent::AIO");
484    
485     =cut
486    
487     sub add_mod {
488     my $self = shift;
489    
490 root 1.3 for (@_) {
491 root 1.27 my $pkg = "Perl::LibExtractor::trace::mod_" . ++$self->{count};
492 root 1.3 $self->_trace ("use $_", "{ package $pkg; use $_ }")
493     unless $self->{add_mod}{$_}++;
494     }
495 root 1.1 }
496    
497 root 1.16 =item $extractor->add_require ($name[, $name...])
498    
499     Works like C<add_mod>, but uses C<require $name> to load the module, i.e.
500     the name must be a filename.
501    
502     Example: load Coro and AnyEvent::AIO, but using C<add_require> instead of C<add_mod>.
503    
504     $extractor->add_require ("Coro.pm", "AnyEvent/AIO.pm");
505    
506     =cut
507    
508     sub add_require {
509     my $self = shift;
510    
511     for (@_) {
512     $self->add_eval ("require q\x00$_\x00")
513     unless $self->{add_require}{$_}++;
514     }
515     }
516    
517 root 1.12 =item $extractor->add_bin ($name[, $name...])
518 root 1.1
519     Adds the given (perl) program(s) to the file set, that is, a program
520     installed by some perl module, written in perl (an example would be the
521 root 1.23 L<perl-libextractor> program that is part of the C<Perl::LibExtractor>
522 root 1.1 distribution).
523    
524 root 1.26 The program should be installed in one of the standard paths for perl
525     programs - C<$Config{sitebin}>, C<$Config{vendorbin}> or C<$Config{bin}>,
526     or must be specified via an absolute path (starting with F</>).
527    
528 root 1.5 Example: add the deliantra client program installed by the
529 root 1.12 L<Deliantra::Client> module and put it under F<bin/deliantra>.
530 root 1.1
531 root 1.12 $extractor->add_bin ("deliantra");
532 root 1.1
533     =cut
534    
535 root 1.12 sub add_bin {
536 root 1.1 my $self = shift;
537    
538     exe:
539     for my $exe (@_) {
540 root 1.26 for my $dir ($exe =~ m%^/% ? "" : (grep length, $Config{sitebinexp}, $Config{vendorbinexp}, $Config{binexp})) {
541 root 1.1 if (open my $fh, "<:perlio", "$dir/$exe") {
542 root 1.12 if (-f $fh) {
543     my $file = do { local $/; readline $fh };
544 root 1.1
545 root 1.12 $self->_trace_flush if exists $self->{trace_check};
546     $self->{trace_check} = $file;
547 root 1.1
548 root 1.12 $self->{set}{"bin/$exe"} = ["$dir/$exe"];
549     next exe;
550     }
551 root 1.1 }
552     }
553    
554 root 1.12 croak "add_bin ($exe): executable not found";
555 root 1.1 }
556     }
557    
558     =item $extractor->add_eval ($string)
559    
560 root 1.5 Evaluates the string as perl code and adds all modules that are loaded
561     by it. For example, this would add L<AnyEvent> and the default backend
562     implementation module and event loop module:
563 root 1.1
564     $extractor->add_eval ("use AnyEvent; AnyEvent::detect");
565    
566 root 1.5 Each code snippet will be executed in its own package and under C<use
567     strict>.
568    
569 root 1.1 =cut
570    
571     sub add_eval {
572     my ($self, $eval) = @_;
573    
574 root 1.21 (my $file = substr $eval, 0, 64) =~ s/\015?\012/\\n/g;
575    
576 root 1.27 my $pkg = "Perl::LibExtractor::trace::eval_" . ++$self->{count};
577 root 1.1 $eval =~ s/\x00/\x00."\\x00".q\x00/g;
578 root 1.21 $self->_trace ($file,
579 root 1.3 "local \$^H = \$^H;" # vvvvvvvvvvvvvvvvvvvv = use strict; use utf8
580 root 1.5 . "eval q\x00package $pkg; BEGIN { \$^H = \$^H | 0x800600 } $eval\x00; die \"\$\@\" if \$\@;\n"
581 root 1.3 );
582 root 1.1 }
583    
584 root 1.5 =back
585    
586     =head2 OTHER METHODS FOR ADDING FILES
587    
588     The following methods add commonly used files that are either not covered
589     by other methods or add commonly-used dependencies.
590    
591     =over 4
592    
593 root 1.1 =item $extractor->add_perl
594    
595     Adds the perl binary itself to the file set, including the libperl dll, if
596     needed.
597    
598 root 1.12 For example, on UNIX systems, this usually adds a F<exe/perl> and possibly
599     some F<dll/libperl.so.XXX>.
600 root 1.5
601 root 1.4 =cut
602    
603     sub add_perl {
604     my ($self) = @_;
605    
606 root 1.12 $self->{set}{"exe/perl$Config{_exe}"} = [_perl_path];
607 root 1.4
608     # on debian, we have the special case of a perl binary linked against
609     # a static libperl.a (which is not available), but the Config says to use
610     # a shared library, which is in the wrong directory, too (which breaks
611     # every other perl installation on the system - they are so stupid).
612    
613     # that means we can't find the libperl.so, because dbeian actively breaks
614     # their perl install, and we don't need it. we work around this by silently
615     # not including the libperl if we cannot find it.
616    
617     if ($Config{useshrplib} eq "true") {
618 root 1.8 my ($libperl, $libpath);
619    
620     if ($^O eq "cygwin") {
621     $libperl = $Config{libperl};
622     $libpath = "$Config{binexp}/$libperl";
623     } elsif ($^O eq "MSWin32") {
624     ($libperl = $Config{libperl}) =~ s/\Q$Config{_a}\E$/.$Config{so}/;
625     $libpath = "$Config{binexp}/$libperl";
626     } else {
627     $libperl = $Config{libperl};
628     $libpath = $self->{lib}{"CORE/$libperl"};
629 root 1.4 }
630 root 1.8
631 root 1.18 $self->{set}{"dll/$libperl"} = [$libpath]
632 root 1.8 if length $libpath && -e $libpath;
633 root 1.4 }
634     }
635 root 1.3
636     =item $extractor->add_core_support
637    
638     Try to add modules and files needed to support commonly-used builtin
639     language features. For example to open a scalar for I/O you need the
640     L<PerlIO::scalar> module:
641    
642     open $fh, "<", \$scalar
643    
644     A number of regex and string features (e.g. C<ucfirst>) need some unicore
645     files, e.g.:
646    
647     'my $x = chr 1234; "\u$x\U$x\l$x\L$x"; $x =~ /\d|\w|\s|\b|$x/i';
648    
649     This call adds these files (simply by executing code similar to the above
650     code fragments).
651    
652     Notable things that are missing are other PerlIO layers, such as
653     L<PerlIO::encoding>, and named character and character class matches.
654    
655     =cut
656    
657     sub add_core_support {
658     my ($self) = @_;
659    
660 root 1.21 $self->add_eval ('
661     # PerlIO::Scalar
662     my $v; open my $fh, "<", \$v;
663    
664     # various unicore regex/builtin gambits
665     my $x = chr 1234;
666     "\u$x\U$x\l$x\L$x";
667     $x =~ /$_$x?/i
668     for qw(\d \w \s \b \R \h \v);
669     split " ", $x; # usually covered by the regex above
670     ');
671    
672 root 1.10 $self->add_eval ('/\x{1234}(?<a>)\g{a}/') if $] >= 5.010; # usually covered by the regex above
673 root 1.3 }
674 root 1.1
675 root 1.3 =item $extractor->add_unicore
676    
677 root 1.11 Adds (hopefully) all files from the unicore database that will ever be
678 root 1.5 needed.
679    
680     If you are not sure which unicode character classes and similar unicore
681     databases you need, and you do not care about an extra one thousand(!)
682     files comprising 4MB of data, then you can just call this method, which
683     adds basically all files from perl's unicode database.
684    
685 root 1.11 Note that C<add_core_support> also adds some unicore files, but it's not a
686     subset of C<add_unicore> - the former adds all files neccessary to support
687     core builtins (which includes some unicore files and other things), while
688     the latter adds all unicore files (but nothing else).
689    
690     When in doubt, use both.
691    
692 root 1.5 =cut
693    
694     sub add_unicore {
695     my ($self) = @_;
696    
697     $self->_add ([grep m%^unicore/.*\.pl$%, keys %{ $self->{lib} }]);
698     }
699    
700 root 1.11 =item $extractor->add_core
701    
702     This adds all files from the perl core distribution, that is, all library
703     files that come with perl.
704    
705     This is a superset of C<add_core_support> and C<add_unicore>.
706    
707     This is quite a lot, but on the plus side, you can be sure nothing is
708     missing.
709    
710 root 1.13 This requires a full perl installation - Debian GNU/Linux doesn't package
711     the full perl library, so this function will not work there.
712 root 1.11
713     =cut
714    
715     sub add_core {
716     my ($self) = @_;
717    
718     my $lib = $self->{lib};
719    
720     for (@{
721 root 1.15 $self->_read_packlist (".packlist")
722 root 1.11 }) {
723     $self->{set}{$_} ||= [
724 root 1.12 "lib/"
725 root 1.11 . ($lib->{$_} or croak "$_: unable to locate file in perl library")
726     ];
727     }
728     }
729    
730 root 1.5 =back
731    
732     =head2 GLOB-BASED ADDING AND FILTERING
733 root 1.1
734 root 1.5 These methods add or manipulate files by using glob-based patterns.
735    
736     These glob patterns work similarly to glob patterns in the shell:
737    
738     =over 4
739    
740     =item /
741    
742     A F</> at the start of the pattern interprets the pattern as a file
743     path inside the file set, almost the same as in the shell. For example,
744     F</bin/perl*> would match all files whose names starting with F<perl>
745     inside the F<bin> directory in the set.
746    
747     If the F</> is missing, then the pattern is interpreted as a module name
748 root 1.12 (a F<.pm> file). For example, F<Coro> matches the file F<lib/Coro.pm> ,
749     while F<Coro::*> would match F<lib/Coro/*.pm>.
750 root 1.5
751     =item *
752    
753     A single star matches anything inside a single directory component. For
754     example, F</lib/Coro/*.pm> would match all F<.pm> files inside the
755     F<lib/Coro/> directory, but not any files deeper in the hierarchy.
756 root 1.1
757 root 1.5 Another way to look at it is that a single star matches anything but a
758     slash (F</>).
759 root 1.1
760 root 1.5 =item **
761 root 1.1
762 root 1.5 A double star matches any number of characters in the path, including F</>.
763 root 1.1
764 root 1.5 For example, F<AnyEvent::**> would match all modules whose names start
765     with C<AnyEvent::>, no matter how deep in the hierarchy they are.
766 root 1.1
767 root 1.5 =back
768 root 1.1
769 root 1.2 =cut
770    
771     sub _extglob2re {
772 root 1.5 for (quotemeta $_[1]) {
773 root 1.2 s/\\\*\\\*/.*/g;
774     s/\\\*/[^\/]*/g;
775     s/\\\?/[^\/]/g;
776    
777     unless (s%^\\/%%) {
778     s%\\:\\:%/%g;
779 root 1.12 $_ = "lib/$_\\.pm";
780 root 1.2 }
781    
782     $_ .= '$';
783     s/(?: \[\^\/\] | \. ) \*\$$//x; # remove ** at end
784    
785     return qr<^$_>s
786     }
787     }
788    
789 root 1.5 =over 4
790    
791     =item $extractor->add_glob ($modglob[, $modglob...])
792    
793     Adds all files from the perl library that match the given glob pattern.
794    
795     For example, you could implement C<add_unicore> yourself like this:
796    
797     $extractor->add_glob ("/unicore/**.pl");
798    
799     =cut
800    
801     sub add_glob {
802     my $self = shift;
803    
804     for (@_) {
805     my $pat = $self->_extglob2re ($_);
806     $self->_add ([grep /$pat/, keys %{ $self->{lib} }]);
807     }
808     }
809    
810     =item $extractor->filter ($pattern[, $pattern...])
811    
812     Applies a series of include/exclude filters. Each filter must start with
813     either C<+> or C<->, to designate the pattern as I<include> or I<exclude>
814     pattern. The rest of the pattern is a normal glob pattern.
815    
816     An exclude pattern (C<->) instantly removes all matching files from
817     the set. An include pattern (C<+>) protects matching files from later
818     removals.
819    
820     That is, if you have an include pattern then all files that were matched
821     by it will be included in the set, regardless of any further exclude
822     patterns matching the same files.
823    
824     Likewise, any file excluded by a pattern will not be included in the set,
825     even if matched by later include patterns.
826    
827     Any files not matched by any expression will simply stay in the set.
828    
829     For example, to remove most of the useless autoload functions by the POSIX
830     module (they either do the same thing as a builtin or always raise an
831 root 1.12 error), you would use this:
832 root 1.5
833     $extractor->filter ("-/lib/auto/POSIX/*.al");
834    
835     This does not remove all autoload files, only the ones not defined by a
836     subclass (e.g. it leaves C<POSIX::SigRt::xxx> alone).
837    
838     =cut
839    
840 root 1.2 sub filter {
841     my ($self, @patterns) = @_;
842    
843 root 1.3 $self->_trace_flush;
844    
845     my $set = $self->{set};
846     my %include;
847 root 1.2
848     for my $pat (@patterns) {
849     $pat =~ s/^([+\-])//
850     or croak "$_: not a valid filter pattern (missing + or - prefix)";
851     my $inc = $1 eq "+";
852     $pat = $self->_extglob2re ($pat);
853 root 1.3
854     my @match = grep /$pat/, keys %$set;
855    
856     if ($inc) {
857     @include{@match} = delete @$set{@match};
858     } else {
859     delete @$set{@{ $_->[I_DEP] }} # remove dependents
860     for delete @$set{@match};
861     }
862 root 1.2 }
863 root 1.3
864     my @include = keys %include;
865     @$set{@include} = delete @include{@include};
866 root 1.2 }
867 root 1.1
868 root 1.3 =item $extractor->runtime_only
869    
870     This removes all files that are not needed at runtime, such as static
871     archives, header and other files needed only for compilation of modules,
872     and pod and html files (which are unlikely to be needed at runtime).
873 root 1.1
874 root 1.8 This is quite useful when you want to have only files actually needed to
875 root 1.3 execute a program.
876    
877     =cut
878    
879     sub runtime_only {
880     my ($self) = @_;
881    
882     $self->_trace_flush;
883    
884     my $set = $self->{set};
885    
886 root 1.8 # delete all static libraries, also windows stuff
887 root 1.12 delete @$set{ grep m%^lib/auto/(?:.+/)?([^\/]+)/\1(?:\Q$Config{_a}\E|\.pdb|\.exp)$%s, keys %$set };
888 root 1.3
889     # delete all extralibs.ld and extralibs.all (no clue what the latter is for)
890 root 1.12 delete @$set{ grep m%^lib/auto/.*/extralibs\.(?:ld|all)$%s, keys %$set };
891 root 1.3
892     # delete all .pod, .h, .html files (hopefully none of them are used at runtime)
893 root 1.24 # also typemap
894     delete @$set{ grep m%^lib/.*\.(?:pod|h|html|typemap)$%s, keys %$set };
895 root 1.11
896     # delete unneeded unicore files
897 root 1.12 delete @$set{ grep m%^lib/unicore/(?:mktables(?:\.lst)?|.*\.txt)$%s, keys %$set };
898 root 1.3 }
899 root 1.1
900     =back
901    
902 root 1.5 =head2 RESULT SET
903    
904     =over 4
905 root 1.1
906 root 1.5 =item $set = $extractor->set
907 root 1.1
908 root 1.5 Returns a hash reference that represents the result set. The hash is the
909     actual internal storage hash and can only be modified as described below.
910 root 1.1
911 root 1.5 Each key in the hash is the path inside the set, without a leading slash,
912     e.g.:
913 root 1.1
914 root 1.5 bin/perl
915     lib/unicore/lib/Blk/Superscr.pl
916     lib/AnyEvent/Impl/EV.pm
917    
918     The value is an array reference with mostly unspecified contents, except
919     the first element, which is the file system path where the actual file can
920     be found.
921    
922     This code snippet lists all files inside the set:
923    
924     print "$_\n"
925     for sort keys %{ $extractor->set });
926    
927     This code fragment prints C<< filesystem_path => set_path >> pairs for all
928     files in the set:
929    
930     my $set = $extractor->set;
931     while (my ($set,$fspath) = each %$set) {
932     print "$fspath => $set\n";
933     }
934 root 1.1
935 root 1.5 You can implement your own filtering by asking for the result set with
936     C<< $extractor->set >>, and then deleting keys from the referenced hash
937     - since you can ask for the result set at any time you can add things,
938     filter them out this way, and add additional things.
939    
940     =back
941    
942     =cut
943    
944     sub set {
945 root 1.6 $_[0]->_trace_flush;
946 root 1.5 $_[0]{set}
947     }
948    
949 root 1.27 =head1 EXTRACTION PROCESS
950    
951     This module uses two perl execution modes to trace dependencies:
952     compiling only (as with the C<-c> switch) and running (the normal mode of
953     execution).
954    
955     currently, scripts/binaries are compiled only, while everything else is
956     executed.
957    
958     The variable C<$Perl::LibExtractor::EXTRACTING> exists and has a true
959     value when code is executed for tracing, and can be used to avoid
960     executing code that shouldn't be executed when determining dependencies,
961     e.g. with C<defined>:
962    
963     unless (defined $Perl::LibExtractor::EXTRACTING) {
964     open my $fh, "<some/resource/file"
965     or die "$!";
966     }
967    
968     You can test the truth value of C<$Perl::LibExtractor::EXTRACTING>
969     directly, but using C<defined> doesn't issue a warning when uninitialized
970     warnings are enabled.
971    
972 root 1.5 =head1 EXAMPLE
973    
974     To package he deliantra client (L<Deliantra::Client>), finding all
975     (perl) files needed to run it is a first step. This can be done by using
976     something like the following code snippet:
977    
978 root 1.12 my $ex = new Perl::LibExtractor;
979 root 1.5
980     $ex->add_perl;
981     $ex->add_core_support;
982 root 1.12 $ex->add_bin ("deliantra");
983 root 1.5 $ex->add_mod ("AnyEvent::Impl::EV");
984     $ex->add_mod ("AnyEvent::Impl::Perl");
985     $ex->add_mod ("Urlader");
986     $ex->filter ("-/*/auto/POSIX/**.al");
987     $ex->runtime_only;
988    
989     First it sets the perl library directory to F<pm> and F<.> (the latter
990     to work around some AutoLoader bugs), so perl uses only the perl library
991     files that came with the binary package.
992    
993     Then it sets some environment variable to override the system default
994     (which might be incompatible).
995    
996     Then it runs the client itself, using C<require>. Since C<require> only
997     looks in the perl library directory this is the reaosn why the scripts
998     were put there (of course, since F<.> is also included it doesn't matter,
999     but I refuse to yield to bugs).
1000    
1001     Finally it exits with a clean status to signal "ok" to Urlader.
1002    
1003     Back to the original C<Perl::LibExtractor> script: after initialising a
1004     new set, the script simply adds the F<perl> interpreter and core support
1005     files (just in case, not all are needed, but some are, and I am too lazy
1006     to find out which ones exactly).
1007    
1008     Then it adds the deliantra executable itself, which in turn adds most of
1009     the required modules. After that, the AnyEvent implementation modules are
1010     added because these dependencies are not picked up automatically.
1011    
1012     The L<Urlader> module is added because the client itself does not depend
1013     on it at all, but the wrapper does.
1014    
1015     At this point, all required files are present, and it's time to slim
1016     down: most of the ueseless POSIX autoloaded functions are removed,
1017     not because they are so big, but because creating files is a costly
1018     operation in itself, so even small fiels have considerable overhead when
1019     unpacking. Then files not required for running the client are removed.
1020    
1021     And that concludes it, the set is now ready.
1022 root 1.1
1023     =head1 SEE ALSO
1024    
1025 root 1.23 The utility program that comes with this module: L<perl-libextractor>.
1026 root 1.1
1027     L<App::Staticperl>, L<Urlader>, L<Perl::Squish>.
1028    
1029 root 1.5 =head1 LICENSE
1030    
1031     This software package is licensed under the GPL version 3 or any later
1032     version, see COPYING for details.
1033    
1034     This license does not, of course, apply to any output generated by this
1035     software.
1036    
1037 root 1.1 =head1 AUTHOR
1038    
1039     Marc Lehmann <schmorp@schmorp.de>
1040     http://home.schmorp.de/
1041    
1042     =cut
1043    
1044     1;
1045