ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/Perl-LibExtractor/LibExtractor.pm
Revision: 1.27
Committed: Tue Sep 24 09:32:16 2013 UTC (10 years, 7 months ago) by root
Branch: MAIN
CVS Tags: HEAD
Changes since 1.26: +33 -5 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 Perl::LibExtractor - determine perl library subsets for building distributions
4
5 =head1 SYNOPSIS
6
7 use Perl::LibExtractor;
8
9 =head1 DESCRIPTION
10
11 The purpose of this module is to determine subsets of your perl library,
12 that is, a set of files needed to satisfy certain dependencies (e.g. of a
13 program).
14
15 The goal is to extract a part of your perl installation including
16 dependencies. A typical use case for this module would be to find out
17 which files are needed to be build a L<PAR> distribution, to link into
18 an L<App::Staticperl> binary, or to pack with L<Urlader>, to create
19 stand-alone distributions tailormade to run your app.
20
21 =head1 METHODS
22
23 To use this module, first call the C<new>-constructor and then as many
24 other methods as you want, to generate a set of files. Then query the set
25 of files and do whatever you want with them.
26
27 The command-line utility F<perl-libextractor> can be a convenient
28 alternative to using this module directly, and offers a few extra options,
29 such as to copy out the files into a new directory, strip them and/or
30 manipulate them in other ways.
31
32 =cut
33
34 package Perl::LibExtractor;
35
36 our $VERSION = '1.1';
37
38 use Config;
39 use Cwd ();
40 use File::Spec ();
41 use File::Temp ();
42
43 use common::sense;
44
45 sub I_SRC () { 0 }
46 sub I_DEP () { 1 }
47
48 sub croak($) {
49 require Carp;
50 Carp::croak "(Perl::LibExtractor) $_[0]";
51 }
52
53 my $canonpath = File::Spec->can ("canonpath");
54 my $case_tolerant = File::Spec->case_tolerant;
55
56 sub canonpath($) {
57 local $_ = $canonpath->(File::Spec::, $_[0]);
58 s%\\%/%g;
59 # $_ = lc if $case_tolerant; # we assume perl file name case is always the same
60 $_
61 }
62
63 =head2 CREATION
64
65 =over 4
66
67 =item $extractor = new Perl::LibExtractor [key => value...]
68
69 Creates a new extractor object. Each extractor object stores some
70 configuration options and a subset of files that can be queried at any
71 time,.
72
73 Binary executables (such as the perl interpreter) are stored inside
74 F<bin/>, perl scripts are stored under F<script/>, perl library files are
75 stored under F<lib/> and shared libraries are stored under F<dll/>.
76
77 The following key-value pairs exist, with default values as specified.
78
79 =over 4
80
81 =item inc => \@INC without any initial "."
82
83 An arrayref with paths to perl library directories. The default is
84 C<\@INC>, with the first entry removed if it is F<.>. Internally, all
85 paths will be changed to absolute paths.
86
87 To prepend custom dirs just do this:
88
89 inc => ["mydir", @INC],
90
91 =item use_packlist => 1
92
93 Enable (if true) or disable the use of C<.packlist> files. If enabled,
94 then each time a file is traced, the complete distribution that contains
95 it is included (but not traced).
96
97 If disabled, only shared objects and autoload files will be added.
98
99 Debian GNU/Linux doesn't completely package perl or any perl modules, so
100 this option will fail. Other perls should be fine.
101
102 =item extra_deps => { file => [files...] }
103
104 Some (mainly runtime dependencies in the perl core library) cannot be
105 detected automatically by this module, especially if you don't use
106 packlists and C<add_core>.
107
108 This module comes with a set of default dependencies (such as L<Carp>
109 requiring L<Carp::Heavy>), which you cna override with this parameter.
110
111 To see the default set of dependencies that come with this module, use
112 this:
113
114 perl -MPerl::LibExtractor -MData::Dumper -e 'print Dumper $Perl::LibExtractor::EXTRA_DEPS'
115
116 =back
117
118 =cut
119
120 our $EXTRA_DEPS = {
121 'bytes.pm' => ['bytes_heavy.pl'],
122 'utf8.pm' => ['utf8_heavy.pl'],
123 'Config.pm' => ['Config_heavy.pl', 'Config_git.pl'],
124 'Carp.pm' => ['Carp/Heavy.pm'],
125 'Exporter.pm' => ['Exporter/Heavy.pm'],
126 };
127
128 sub new {
129 my ($class, %kv) = @_;
130
131 my $self = bless {
132 use_packlist => 1,
133 extra_deps => $EXTRA_DEPS,
134 %kv,
135 set => {},
136 }, $class;
137
138 unless (exists $self->{inc}) {
139 my @inc = @INC;
140 shift @inc if $inc[0] eq ".";
141 $self->{inc} = \@INC;
142 }
143
144 my %inc_seen;
145 my @inc = grep !$inc_seen{$_}++ && -d "$_/.", map Cwd::abs_path ($_), @{ $self->{inc} };
146 $self->{inc} = \@inc;
147
148 # maybe not inc, but these?
149 # sitearchexp
150 # sitelib
151 # vendorarchexp
152 # vendorlibexp
153 # archlibexp
154 # privlibexp
155
156 $self->_set_inc;
157
158 $self
159 }
160
161 sub _perl_path() {
162 my $secure_perl_path = $Config{perlpath};
163
164 if ($^O ne 'VMS') {
165 $secure_perl_path .= $Config{_exe}
166 unless $secure_perl_path =~ m/$Config{_exe}$/i;
167 }
168
169 $secure_perl_path
170 }
171
172 sub _path2match {
173 my $re = join "|", map "\Q$_", @_;
174
175 $re = "^(?:$re)\\/";
176 $re =~ s%\\[/\\]%[/\\\\]%g; # we support / and \ on all OSes, keep your fingers crossed
177
178 $case_tolerant
179 ? qr<$re>i
180 : qr<$re>
181 }
182
183 sub _read_packlist {
184 my ($self, $path) = @_;
185
186 my $pfxmatch = $self->{pfxmatch};
187 my $lib = $self->{lib};
188
189 my @packlist;
190
191 open my $fh, "<:perlio", $path
192 or die "$path: $!";
193
194 while (<$fh>) {
195 chomp;
196 s/ .*$//; # newer-style .packlists might contain key=value pairs
197
198 s%\\%/%g; # we only do unix-style paths internally
199
200 s/$pfxmatch// and exists $lib->{$_}
201 or next;
202
203 push @packlist, canonpath $_;
204 }
205
206 \@packlist
207 }
208
209 sub _set_inc {
210 my ($self) = @_;
211
212 my $pfxmatch = _path2match @{ $self->{inc }};
213
214 my %lib;
215 my @packlists;
216
217 # find all files in all libdirs, earlier ones overwrite later ones
218 my @scan = map [$_, ""], @{ $self->{inc} };
219
220 while (@scan) {
221 my ($root, $dir) = @{ pop @scan };
222
223 my $pfx = length $dir ? "$dir/" : "";
224
225 for (do {
226 opendir my $fh, "$root/$dir"
227 or croak "$root/$dir: $!";
228 grep !/^\.\.?$/, readdir $fh
229 }) {
230 if (-d "$root/$dir/$_/.") {
231 $lib{"$pfx$_/"} = "$root/$pfx$_";
232 push @scan, [$root, "$pfx$_"];
233 } elsif ($_ eq ".packlist" && $pfx =~ m%^auto/%) {
234 push @packlists, "$root/$pfx.packlist";
235 } elsif (/\.bs$/ && $pfx =~ m%^auto/% && !-s "$root/$dir/$_") {
236 # skip empty .bs files
237 # } elsif (/\.(?:pod|h|html)$/) {
238 # # not interested in those
239 } else {
240 #push @files, $_;
241 $lib{"$pfx$_"} = "$root/$pfx$_";
242 }
243 }
244
245 #$lib{"$_[1]/"} = [\@dirs, \@files]; # won't work nice with overwrite
246 }
247
248 $self->{lib} = \%lib;
249 $self->{pfxmatch} = $pfxmatch;
250
251 my %packlist;
252
253 # need to go forward here
254 for (@packlists) {
255 my $packlist = $self->_read_packlist ($_);
256
257 $packlist{$_} = $packlist
258 for @$packlist;
259 }
260
261 $self->{packlist} = \%packlist;
262 }
263
264 =back
265
266 =head2 TRACE/PACKLIST BASED ADDING
267
268 The following methods add various things to the set of files.
269
270 Each time a perl file is added, it is scanned by tracing either loading,
271 execution or compiling it, and seeing which other perl modules and
272 libraries have been loaded.
273
274 For each library file found this way, additional dependencies are added:
275 if packlists are enabled, then all files of the distribution that contains
276 the file will be added. If packlists are disabled, then only shared
277 objects and autoload files for modules will be added.
278
279 Only files from perl library directories will be added automatically. Any
280 other files (such as manpages or scripts installed in the F<bin>
281 directory) are skipped.
282
283 If there is an error, such as a module not being found, then this module
284 croaks (as opposed to silently skipping). If you want to add something of
285 which you are not sure it exists, then you can wrap the call into C<eval
286 {}>. In some cases, you can avoid this by executing the code you want
287 to work later using C<add_eval> - see C<add_core_support> for an actual
288 example of this technique.
289
290 Note that packlists are meant to add files not covered by other
291 mechanisms, such as resource files and other data files loaded directly by
292 a module - they are not meant to add dependencies that are missed because
293 they only happen at runtime.
294
295 For example, with packlists, when using L<AnyEvent>, then all event loop
296 backends are automatically added as well, but I<not> any event loops
297 (i.e. L<AnyEvent::Impl::POE> is added, but L<POE> itself is not). Without
298 packlists, only the backend that is being used is added (i.e. normally
299 none, as loading AnyEvent does not instantly load any backend).
300
301 To catch the extra event loop dependencies, you can either initialise
302 AnyEvent so it picks a suitable backend:
303
304 $extractor->add_eval ("use AnyEvent; AnyEvent::detect");
305
306 Or you can directly load the backend modules you plan to use:
307
308 $extractor->add_mod ("AnyEvent::Impl::EV", "AnyEvent::Impl::Perl");
309
310 An example of a program (or module) that has extra resource files is
311 L<Deliantra::Client> - the normal tracing (without packlist usage) will
312 correctly add all submodules, but miss the fonts and textures. By using
313 the packlist, those files are added correctly.
314
315 =over 4
316
317 =cut
318
319 sub _add {
320 my ($self, $add) = @_;
321
322 my $lib = $self->{lib};
323 my $path;
324
325 for (@$add) {
326 $path = "lib/$_";
327
328 $self->{set}{$path} ||= do {
329 my @info;
330
331 $info[I_SRC] = $lib->{$_}
332 or croak "$_: unable to locate file in perl library";
333
334 if ($self->{use_packlist} && exists $self->{packlist}{$_}) {
335 $self->{set}{"lib/$_"} ||= [$self->{lib}{$_} or die]
336 for @{ $self->{packlist}{$_} };
337
338 # for (grep /\.pm$/, @{ $self->{packlist}{$_} }) {
339 # s/\.pm$//;
340 # s%/%::%g;
341 # my $pkg = "Perl::LibExtractor::trace::_add_" . ++$self->{count};
342 # $self->add_eval ("{ package $pkg; eval 'use $_' }")
343 # unless $self->{_add_do}{$_}++;
344 # }
345 #
346 # $self->{_add_do}{$_}++ or $self->add_eval ("do q\x00$_\x00")
347 # for grep /\.pl$/, @{ $self->{packlist}{$_} };
348
349 } elsif (/^(.*)\.pm$/) {
350 (my $auto = "auto/$1/") =~ s%::%/%g;
351 $auto =~ m%/([^/]+)/$% or die;
352 my $base = $1;
353
354 if (exists $lib->{$auto}) {
355 # auto dir exists, scan it for cool stuff
356
357 # 1. shared object, others are of no interest to us
358 my $so = "$auto$base.$Config{dlext}";
359 if (my $src = $lib->{$so}) {
360 $so = "lib/$so";
361 push @{ $info[I_DEP] }, $so; $self->{set}{$so} = [$src];
362 }
363
364 # 2. autoloader/autosplit
365 my $ix = "${auto}autosplit.ix";
366 if (my $src = $lib->{$ix}) {
367 $ix = "lib/$ix";
368 push @{ $info[I_DEP] }, $ix; $self->{set}{$ix} = [$src];
369
370 open my $fh, "<:perlio", $src
371 or croak "$src: $!";
372
373 my $package;
374
375 while (<$fh>) {
376 if (/^\s*sub\s+ ([^[:space:];]+) \s* (?:\([^)]*\))? \s*;?\s*$/x) {
377 my $al = "auto/$package/$1.al";
378 my $src = $lib->{$al}
379 or croak "$al: autoload file not found, but should be there.";
380
381 $al = "lib/$al";
382 push @{ $info[I_DEP] }, $al; $self->{set}{$al} = [$src];
383
384 } elsif (/^\s*package\s+([^[:space:];]+)\s*;?\s*$/) {
385 ($package = $1) =~ s/::/\//g;
386 } elsif (/^\s*(?:#|1?\s*;?\s*$)/) {
387 # nop
388 } else {
389 warn "WARNING: $src: unparsable line, please report: $_";
390 }
391 }
392 }
393
394 skip:
395 }
396 }
397
398 if (exists $self->{extra_deps}{$_}) {
399 # we require it again, because many extra dependencies require the main module to be loaded
400 $self->add_eval ("require q\x00$_\x00");
401
402 exists $lib->{$_} and $self->add_require ($_)
403 for @{ $self->{extra_deps}{$_} };
404 }
405
406 \@info
407 };
408 }
409 }
410
411 sub _trace {
412 my ($self, $file, $eval) = @_;
413
414 $self->{trace_begin} .= "\n#line \"$file\" 1\n$eval;\n";
415 }
416
417 sub _trace_flush {
418 my ($self) = @_;
419
420 # ->_add might add additional files to trace
421 while (exists $self->{trace_begin} or exists $self->{trace_check}) {
422 my $tmpdir = newdir File::Temp;
423 my $dir = $tmpdir->dirname;
424
425 open my $fh, ">:perlio", "$dir/eval"
426 or croak "$dir/eval: $!";
427 syswrite $fh,
428 'BEGIN { '
429 . '@INC = (' . (join ", ", map "q\x00$_\x00", @{ $self->{inc} }) . ');'
430 . "chdir q\x00$dir\x00 or die q\x00$dir\x00 . \": \$!\";"
431 . '$Perl::LibExtractor::EXTRACTING = 1;'
432 . '}'
433 . 'BEGIN { ' . (delete $self->{trace_begin}) . "}\n"
434 . "CHECK {\n"
435 . 'open STDOUT, ">:raw", "out" or die "out: $!";'
436 . 'print join "\x00", values %INC;'
437 . 'open STDERR, ">stderr";' # suppress "syntax OK" message from perl
438 . "}\n"
439 . (delete $self->{trace_check});
440 close $fh;
441
442 system "cat $dir/eval >/tmp/x";#d#
443
444 system _perl_path, "-c", "$dir/eval"
445 and croak "trace failure, check trace process output - caught";
446
447 my @inc = split /\x00/, do {
448 open my $fh, "<:perlio", "$dir/out"
449 or croak "$dir/out: $!";
450 local $/;
451 scalar readline $fh
452 };
453
454 my $pfxmatch = $self->{pfxmatch};
455
456 # remove the library directory prefix, hope for the best
457 s/$pfxmatch//
458 or croak "$_: file outside any library directory"
459 for @inc;
460
461 $self->_add (\@inc);
462 }
463 }
464
465 =item $extractor->add_mod ($module[, $module...])
466
467 Adds the given module(s) to the file set - the module name must be specified
468 as in C<use>, i.e. with C<::> as separators and without F<.pm>.
469
470 The program will be loaded with the default import list, any dependent
471 files, such as the shared object implementing xs functions, or autoload
472 files, will also be added.
473
474 If you want to use a different import list (for those rare modules wghere
475 import lists trigger different backend modules to be loaded for example),
476 you can use C<add_eval> instead:
477
478 $extractor->add_eval ("use Module qw(a b c)");
479
480 Example: add F<Coro.pm> and F<AnyEvent/AIO.pm>, and all relevant files
481 from the distribution they are part of.
482
483 $extractor->add_mod ("Coro", "AnyEvent::AIO");
484
485 =cut
486
487 sub add_mod {
488 my $self = shift;
489
490 for (@_) {
491 my $pkg = "Perl::LibExtractor::trace::mod_" . ++$self->{count};
492 $self->_trace ("use $_", "{ package $pkg; use $_ }")
493 unless $self->{add_mod}{$_}++;
494 }
495 }
496
497 =item $extractor->add_require ($name[, $name...])
498
499 Works like C<add_mod>, but uses C<require $name> to load the module, i.e.
500 the name must be a filename.
501
502 Example: load Coro and AnyEvent::AIO, but using C<add_require> instead of C<add_mod>.
503
504 $extractor->add_require ("Coro.pm", "AnyEvent/AIO.pm");
505
506 =cut
507
508 sub add_require {
509 my $self = shift;
510
511 for (@_) {
512 $self->add_eval ("require q\x00$_\x00")
513 unless $self->{add_require}{$_}++;
514 }
515 }
516
517 =item $extractor->add_bin ($name[, $name...])
518
519 Adds the given (perl) program(s) to the file set, that is, a program
520 installed by some perl module, written in perl (an example would be the
521 L<perl-libextractor> program that is part of the C<Perl::LibExtractor>
522 distribution).
523
524 The program should be installed in one of the standard paths for perl
525 programs - C<$Config{sitebin}>, C<$Config{vendorbin}> or C<$Config{bin}>,
526 or must be specified via an absolute path (starting with F</>).
527
528 Example: add the deliantra client program installed by the
529 L<Deliantra::Client> module and put it under F<bin/deliantra>.
530
531 $extractor->add_bin ("deliantra");
532
533 =cut
534
535 sub add_bin {
536 my $self = shift;
537
538 exe:
539 for my $exe (@_) {
540 for my $dir ($exe =~ m%^/% ? "" : (grep length, $Config{sitebinexp}, $Config{vendorbinexp}, $Config{binexp})) {
541 if (open my $fh, "<:perlio", "$dir/$exe") {
542 if (-f $fh) {
543 my $file = do { local $/; readline $fh };
544
545 $self->_trace_flush if exists $self->{trace_check};
546 $self->{trace_check} = $file;
547
548 $self->{set}{"bin/$exe"} = ["$dir/$exe"];
549 next exe;
550 }
551 }
552 }
553
554 croak "add_bin ($exe): executable not found";
555 }
556 }
557
558 =item $extractor->add_eval ($string)
559
560 Evaluates the string as perl code and adds all modules that are loaded
561 by it. For example, this would add L<AnyEvent> and the default backend
562 implementation module and event loop module:
563
564 $extractor->add_eval ("use AnyEvent; AnyEvent::detect");
565
566 Each code snippet will be executed in its own package and under C<use
567 strict>.
568
569 =cut
570
571 sub add_eval {
572 my ($self, $eval) = @_;
573
574 (my $file = substr $eval, 0, 64) =~ s/\015?\012/\\n/g;
575
576 my $pkg = "Perl::LibExtractor::trace::eval_" . ++$self->{count};
577 $eval =~ s/\x00/\x00."\\x00".q\x00/g;
578 $self->_trace ($file,
579 "local \$^H = \$^H;" # vvvvvvvvvvvvvvvvvvvv = use strict; use utf8
580 . "eval q\x00package $pkg; BEGIN { \$^H = \$^H | 0x800600 } $eval\x00; die \"\$\@\" if \$\@;\n"
581 );
582 }
583
584 =back
585
586 =head2 OTHER METHODS FOR ADDING FILES
587
588 The following methods add commonly used files that are either not covered
589 by other methods or add commonly-used dependencies.
590
591 =over 4
592
593 =item $extractor->add_perl
594
595 Adds the perl binary itself to the file set, including the libperl dll, if
596 needed.
597
598 For example, on UNIX systems, this usually adds a F<exe/perl> and possibly
599 some F<dll/libperl.so.XXX>.
600
601 =cut
602
603 sub add_perl {
604 my ($self) = @_;
605
606 $self->{set}{"exe/perl$Config{_exe}"} = [_perl_path];
607
608 # on debian, we have the special case of a perl binary linked against
609 # a static libperl.a (which is not available), but the Config says to use
610 # a shared library, which is in the wrong directory, too (which breaks
611 # every other perl installation on the system - they are so stupid).
612
613 # that means we can't find the libperl.so, because dbeian actively breaks
614 # their perl install, and we don't need it. we work around this by silently
615 # not including the libperl if we cannot find it.
616
617 if ($Config{useshrplib} eq "true") {
618 my ($libperl, $libpath);
619
620 if ($^O eq "cygwin") {
621 $libperl = $Config{libperl};
622 $libpath = "$Config{binexp}/$libperl";
623 } elsif ($^O eq "MSWin32") {
624 ($libperl = $Config{libperl}) =~ s/\Q$Config{_a}\E$/.$Config{so}/;
625 $libpath = "$Config{binexp}/$libperl";
626 } else {
627 $libperl = $Config{libperl};
628 $libpath = $self->{lib}{"CORE/$libperl"};
629 }
630
631 $self->{set}{"dll/$libperl"} = [$libpath]
632 if length $libpath && -e $libpath;
633 }
634 }
635
636 =item $extractor->add_core_support
637
638 Try to add modules and files needed to support commonly-used builtin
639 language features. For example to open a scalar for I/O you need the
640 L<PerlIO::scalar> module:
641
642 open $fh, "<", \$scalar
643
644 A number of regex and string features (e.g. C<ucfirst>) need some unicore
645 files, e.g.:
646
647 'my $x = chr 1234; "\u$x\U$x\l$x\L$x"; $x =~ /\d|\w|\s|\b|$x/i';
648
649 This call adds these files (simply by executing code similar to the above
650 code fragments).
651
652 Notable things that are missing are other PerlIO layers, such as
653 L<PerlIO::encoding>, and named character and character class matches.
654
655 =cut
656
657 sub add_core_support {
658 my ($self) = @_;
659
660 $self->add_eval ('
661 # PerlIO::Scalar
662 my $v; open my $fh, "<", \$v;
663
664 # various unicore regex/builtin gambits
665 my $x = chr 1234;
666 "\u$x\U$x\l$x\L$x";
667 $x =~ /$_$x?/i
668 for qw(\d \w \s \b \R \h \v);
669 split " ", $x; # usually covered by the regex above
670 ');
671
672 $self->add_eval ('/\x{1234}(?<a>)\g{a}/') if $] >= 5.010; # usually covered by the regex above
673 }
674
675 =item $extractor->add_unicore
676
677 Adds (hopefully) all files from the unicore database that will ever be
678 needed.
679
680 If you are not sure which unicode character classes and similar unicore
681 databases you need, and you do not care about an extra one thousand(!)
682 files comprising 4MB of data, then you can just call this method, which
683 adds basically all files from perl's unicode database.
684
685 Note that C<add_core_support> also adds some unicore files, but it's not a
686 subset of C<add_unicore> - the former adds all files neccessary to support
687 core builtins (which includes some unicore files and other things), while
688 the latter adds all unicore files (but nothing else).
689
690 When in doubt, use both.
691
692 =cut
693
694 sub add_unicore {
695 my ($self) = @_;
696
697 $self->_add ([grep m%^unicore/.*\.pl$%, keys %{ $self->{lib} }]);
698 }
699
700 =item $extractor->add_core
701
702 This adds all files from the perl core distribution, that is, all library
703 files that come with perl.
704
705 This is a superset of C<add_core_support> and C<add_unicore>.
706
707 This is quite a lot, but on the plus side, you can be sure nothing is
708 missing.
709
710 This requires a full perl installation - Debian GNU/Linux doesn't package
711 the full perl library, so this function will not work there.
712
713 =cut
714
715 sub add_core {
716 my ($self) = @_;
717
718 my $lib = $self->{lib};
719
720 for (@{
721 $self->_read_packlist (".packlist")
722 }) {
723 $self->{set}{$_} ||= [
724 "lib/"
725 . ($lib->{$_} or croak "$_: unable to locate file in perl library")
726 ];
727 }
728 }
729
730 =back
731
732 =head2 GLOB-BASED ADDING AND FILTERING
733
734 These methods add or manipulate files by using glob-based patterns.
735
736 These glob patterns work similarly to glob patterns in the shell:
737
738 =over 4
739
740 =item /
741
742 A F</> at the start of the pattern interprets the pattern as a file
743 path inside the file set, almost the same as in the shell. For example,
744 F</bin/perl*> would match all files whose names starting with F<perl>
745 inside the F<bin> directory in the set.
746
747 If the F</> is missing, then the pattern is interpreted as a module name
748 (a F<.pm> file). For example, F<Coro> matches the file F<lib/Coro.pm> ,
749 while F<Coro::*> would match F<lib/Coro/*.pm>.
750
751 =item *
752
753 A single star matches anything inside a single directory component. For
754 example, F</lib/Coro/*.pm> would match all F<.pm> files inside the
755 F<lib/Coro/> directory, but not any files deeper in the hierarchy.
756
757 Another way to look at it is that a single star matches anything but a
758 slash (F</>).
759
760 =item **
761
762 A double star matches any number of characters in the path, including F</>.
763
764 For example, F<AnyEvent::**> would match all modules whose names start
765 with C<AnyEvent::>, no matter how deep in the hierarchy they are.
766
767 =back
768
769 =cut
770
771 sub _extglob2re {
772 for (quotemeta $_[1]) {
773 s/\\\*\\\*/.*/g;
774 s/\\\*/[^\/]*/g;
775 s/\\\?/[^\/]/g;
776
777 unless (s%^\\/%%) {
778 s%\\:\\:%/%g;
779 $_ = "lib/$_\\.pm";
780 }
781
782 $_ .= '$';
783 s/(?: \[\^\/\] | \. ) \*\$$//x; # remove ** at end
784
785 return qr<^$_>s
786 }
787 }
788
789 =over 4
790
791 =item $extractor->add_glob ($modglob[, $modglob...])
792
793 Adds all files from the perl library that match the given glob pattern.
794
795 For example, you could implement C<add_unicore> yourself like this:
796
797 $extractor->add_glob ("/unicore/**.pl");
798
799 =cut
800
801 sub add_glob {
802 my $self = shift;
803
804 for (@_) {
805 my $pat = $self->_extglob2re ($_);
806 $self->_add ([grep /$pat/, keys %{ $self->{lib} }]);
807 }
808 }
809
810 =item $extractor->filter ($pattern[, $pattern...])
811
812 Applies a series of include/exclude filters. Each filter must start with
813 either C<+> or C<->, to designate the pattern as I<include> or I<exclude>
814 pattern. The rest of the pattern is a normal glob pattern.
815
816 An exclude pattern (C<->) instantly removes all matching files from
817 the set. An include pattern (C<+>) protects matching files from later
818 removals.
819
820 That is, if you have an include pattern then all files that were matched
821 by it will be included in the set, regardless of any further exclude
822 patterns matching the same files.
823
824 Likewise, any file excluded by a pattern will not be included in the set,
825 even if matched by later include patterns.
826
827 Any files not matched by any expression will simply stay in the set.
828
829 For example, to remove most of the useless autoload functions by the POSIX
830 module (they either do the same thing as a builtin or always raise an
831 error), you would use this:
832
833 $extractor->filter ("-/lib/auto/POSIX/*.al");
834
835 This does not remove all autoload files, only the ones not defined by a
836 subclass (e.g. it leaves C<POSIX::SigRt::xxx> alone).
837
838 =cut
839
840 sub filter {
841 my ($self, @patterns) = @_;
842
843 $self->_trace_flush;
844
845 my $set = $self->{set};
846 my %include;
847
848 for my $pat (@patterns) {
849 $pat =~ s/^([+\-])//
850 or croak "$_: not a valid filter pattern (missing + or - prefix)";
851 my $inc = $1 eq "+";
852 $pat = $self->_extglob2re ($pat);
853
854 my @match = grep /$pat/, keys %$set;
855
856 if ($inc) {
857 @include{@match} = delete @$set{@match};
858 } else {
859 delete @$set{@{ $_->[I_DEP] }} # remove dependents
860 for delete @$set{@match};
861 }
862 }
863
864 my @include = keys %include;
865 @$set{@include} = delete @include{@include};
866 }
867
868 =item $extractor->runtime_only
869
870 This removes all files that are not needed at runtime, such as static
871 archives, header and other files needed only for compilation of modules,
872 and pod and html files (which are unlikely to be needed at runtime).
873
874 This is quite useful when you want to have only files actually needed to
875 execute a program.
876
877 =cut
878
879 sub runtime_only {
880 my ($self) = @_;
881
882 $self->_trace_flush;
883
884 my $set = $self->{set};
885
886 # delete all static libraries, also windows stuff
887 delete @$set{ grep m%^lib/auto/(?:.+/)?([^\/]+)/\1(?:\Q$Config{_a}\E|\.pdb|\.exp)$%s, keys %$set };
888
889 # delete all extralibs.ld and extralibs.all (no clue what the latter is for)
890 delete @$set{ grep m%^lib/auto/.*/extralibs\.(?:ld|all)$%s, keys %$set };
891
892 # delete all .pod, .h, .html files (hopefully none of them are used at runtime)
893 # also typemap
894 delete @$set{ grep m%^lib/.*\.(?:pod|h|html|typemap)$%s, keys %$set };
895
896 # delete unneeded unicore files
897 delete @$set{ grep m%^lib/unicore/(?:mktables(?:\.lst)?|.*\.txt)$%s, keys %$set };
898 }
899
900 =back
901
902 =head2 RESULT SET
903
904 =over 4
905
906 =item $set = $extractor->set
907
908 Returns a hash reference that represents the result set. The hash is the
909 actual internal storage hash and can only be modified as described below.
910
911 Each key in the hash is the path inside the set, without a leading slash,
912 e.g.:
913
914 bin/perl
915 lib/unicore/lib/Blk/Superscr.pl
916 lib/AnyEvent/Impl/EV.pm
917
918 The value is an array reference with mostly unspecified contents, except
919 the first element, which is the file system path where the actual file can
920 be found.
921
922 This code snippet lists all files inside the set:
923
924 print "$_\n"
925 for sort keys %{ $extractor->set });
926
927 This code fragment prints C<< filesystem_path => set_path >> pairs for all
928 files in the set:
929
930 my $set = $extractor->set;
931 while (my ($set,$fspath) = each %$set) {
932 print "$fspath => $set\n";
933 }
934
935 You can implement your own filtering by asking for the result set with
936 C<< $extractor->set >>, and then deleting keys from the referenced hash
937 - since you can ask for the result set at any time you can add things,
938 filter them out this way, and add additional things.
939
940 =back
941
942 =cut
943
944 sub set {
945 $_[0]->_trace_flush;
946 $_[0]{set}
947 }
948
949 =head1 EXTRACTION PROCESS
950
951 This module uses two perl execution modes to trace dependencies:
952 compiling only (as with the C<-c> switch) and running (the normal mode of
953 execution).
954
955 currently, scripts/binaries are compiled only, while everything else is
956 executed.
957
958 The variable C<$Perl::LibExtractor::EXTRACTING> exists and has a true
959 value when code is executed for tracing, and can be used to avoid
960 executing code that shouldn't be executed when determining dependencies,
961 e.g. with C<defined>:
962
963 unless (defined $Perl::LibExtractor::EXTRACTING) {
964 open my $fh, "<some/resource/file"
965 or die "$!";
966 }
967
968 You can test the truth value of C<$Perl::LibExtractor::EXTRACTING>
969 directly, but using C<defined> doesn't issue a warning when uninitialized
970 warnings are enabled.
971
972 =head1 EXAMPLE
973
974 To package he deliantra client (L<Deliantra::Client>), finding all
975 (perl) files needed to run it is a first step. This can be done by using
976 something like the following code snippet:
977
978 my $ex = new Perl::LibExtractor;
979
980 $ex->add_perl;
981 $ex->add_core_support;
982 $ex->add_bin ("deliantra");
983 $ex->add_mod ("AnyEvent::Impl::EV");
984 $ex->add_mod ("AnyEvent::Impl::Perl");
985 $ex->add_mod ("Urlader");
986 $ex->filter ("-/*/auto/POSIX/**.al");
987 $ex->runtime_only;
988
989 First it sets the perl library directory to F<pm> and F<.> (the latter
990 to work around some AutoLoader bugs), so perl uses only the perl library
991 files that came with the binary package.
992
993 Then it sets some environment variable to override the system default
994 (which might be incompatible).
995
996 Then it runs the client itself, using C<require>. Since C<require> only
997 looks in the perl library directory this is the reaosn why the scripts
998 were put there (of course, since F<.> is also included it doesn't matter,
999 but I refuse to yield to bugs).
1000
1001 Finally it exits with a clean status to signal "ok" to Urlader.
1002
1003 Back to the original C<Perl::LibExtractor> script: after initialising a
1004 new set, the script simply adds the F<perl> interpreter and core support
1005 files (just in case, not all are needed, but some are, and I am too lazy
1006 to find out which ones exactly).
1007
1008 Then it adds the deliantra executable itself, which in turn adds most of
1009 the required modules. After that, the AnyEvent implementation modules are
1010 added because these dependencies are not picked up automatically.
1011
1012 The L<Urlader> module is added because the client itself does not depend
1013 on it at all, but the wrapper does.
1014
1015 At this point, all required files are present, and it's time to slim
1016 down: most of the ueseless POSIX autoloaded functions are removed,
1017 not because they are so big, but because creating files is a costly
1018 operation in itself, so even small fiels have considerable overhead when
1019 unpacking. Then files not required for running the client are removed.
1020
1021 And that concludes it, the set is now ready.
1022
1023 =head1 SEE ALSO
1024
1025 The utility program that comes with this module: L<perl-libextractor>.
1026
1027 L<App::Staticperl>, L<Urlader>, L<Perl::Squish>.
1028
1029 =head1 LICENSE
1030
1031 This software package is licensed under the GPL version 3 or any later
1032 version, see COPYING for details.
1033
1034 This license does not, of course, apply to any output generated by this
1035 software.
1036
1037 =head1 AUTHOR
1038
1039 Marc Lehmann <schmorp@schmorp.de>
1040 http://home.schmorp.de/
1041
1042 =cut
1043
1044 1;
1045