--- App-Staticperl/staticperl.pod 2010/12/06 21:12:21 1.4 +++ App-Staticperl/staticperl.pod 2010/12/10 20:29:17 1.20 @@ -1,6 +1,6 @@ =head1 NAME -staticperl - perl, libc, 50 modules all in one 500kb file +staticperl - perl, libc, 100 modules, all in one 500kb file =head1 SYNOPSIS @@ -16,6 +16,7 @@ staticperl instcpan modulename... # install modules from CPAN staticperl mkbundle # see documentation staticperl mkperl # see documentation + staticperl mkapp appname # see documentation Typical Examples: @@ -24,19 +25,26 @@ staticperl mkperl -M '"Config_heavy.pl"' # build a perl that supports -V staticperl mkperl -MAnyEvent::Impl::Perl -MAnyEvent::HTTPD -MURI -MURI::http # build a perl with the above modules linked in + staticperl mkapp myapp --boot mainprog mymodules + # build a binary "myapp" from mainprog and mymodules =head1 DESCRIPTION -This script helps you creating single-file perl interpreters, or embedding -a perl interpreter in your applications. Single-file means that it is -fully self-contained - no separate shared objects, no autoload fragments, -no .pm or .pl files are needed. And when linking statically, you can -create (or embed) a single file that contains perl interpreter, libc, all -the modules you need and all the libraries you need. - -With F and F on x86, you can create a single 500kb binary that -contains perl and 50 modules such as AnyEvent, EV, IO::AIO, Coro and so -on. Or any other choice of modules. +This script helps you to create single-file perl interpreters +or applications, or embedding a perl interpreter in your +applications. Single-file means that it is fully self-contained - no +separate shared objects, no autoload fragments, no .pm or .pl files are +needed. And when linking statically, you can create (or embed) a single +file that contains perl interpreter, libc, all the modules you need, all +the libraries you need and of course your actual program. 
+ +With F and F on x86, you can create a single 500kb binary +that contains perl and 100 modules such as POSIX, AnyEvent, EV, IO::AIO, +Coro and so on. Or any other choice of modules. + +To see how this turns out, you can try out smallperl and bigperl, two +pre-built static and compressed perl binaries with many and even more +modules: just follow the links at L. The created files do not need write access to the file system (like PAR does). In fact, since this script is in many ways similar to PAR::Packer, @@ -65,17 +73,21 @@ F loads all required files directly from memory. There is no need to unpack files into a temporary directory. -=item * More control over included files. +=item * More control over included files, more burden. PAR tries to be maintenance and hassle-free - it tries to include more -files than necessary to make sure everything works out of the box. The -extra files (such as the unicode database) can take substantial amounts of -memory and file size. +files than necessary to make sure everything works out of the box. It +mostly succeeds at this, but the extra files (such as the unicode database) +can take substantial amounts of memory and file size. With F, the burden is mostly with the developer - only direct compile-time dependencies and L are handled automatically. This means the modules to include often need to be tweaked manually. +All this does not preclude more permissive modes to be implemented in +the future, but right now, you have to resolve hidden dependencies +manually. + =item * PAR works out of the box, F does not. Maintaining your own custom perl build can be a pain in the ass, and while @@ -83,6 +95,11 @@ build and possibly fiddling with some modules. PAR is likely to produce results faster. +Ok, PAR never has worked for me out of the box, and for some people, +F does work out of the box, as they don't count "fiddling with +module use lists" against it, but nevertheless, F is certainly +a bit more difficult to use. 
+ =back =head1 HOW DOES IT WORK? @@ -100,9 +117,9 @@ except everything is compiled in), or you create bundle files (basically C sources you can use to embed all files into your project). -This step is very fast (a few seconds if PPI is not used for stripping, -more seconds otherwise, as PPI is very slow), and can be tweaked and -repeated as often as necessary. +This step is very fast (a few seconds if PPI is not used for stripping, or +the stripped files are in the cache), and can be tweaked and repeated as +often as necessary. =head1 THE F SCRIPT @@ -142,6 +159,10 @@ =over 4 +=item F + +Prints some info about the version of the F script you are using. + =item F Runs only the download and unpack phase, unless this has already happened. @@ -186,9 +207,12 @@ =item F -Runs F in the perl source directory (and potentially -cleans up other intermediate files). This can be used to clean up -intermediate files without removing the installed perl interpreter. +Deletes the perl source directory (and potentially cleans up other +intermediate files). This can be used to clean up files only needed for +building perl, without removing the installed perl interpreter, or to +force a re-build from scratch. + +At the moment, it doesn't delete downloaded tarballs. =item F @@ -238,6 +262,16 @@ to include that module. I found out about these dependencies by carefully watching any error messages about missing modules... +Instead of building a new perl binary, you can also build a standalone +application: + + # build the app + staticperl mkapp app --boot eg/httpd \ + -MAnyEvent::Impl::Perl -MAnyEvent::HTTPD -MURI::http + + # run it + ./app + =head3 OPTION PROCESSING All options can be given as arguments on the command line (typically @@ -262,6 +296,44 @@ order given on the command line (that affects the C<--use> and C<--eval> options at the moment). +=head3 PACKAGE SELECTION WORKFLOW + +F has a number of options to control package +selection. 
This section describes how they interact with each other. Also, +since I am still a newbie w.r.t. these issues, maybe future versions of +F will change this, so watch out :) + +The idiom "in order" means "in the order in which they are specified on the +command line". If you use a bundle specification file, then the options +will be processed as if they were given in place of the bundle file name. + +=over 4 + +=item 1. apply all C<--use>, C<--eval>, C<--add>, C<--addbin> and +C<--incglob> options, in order. + +In addition, C<--use> and C<--eval> dependencies will be added when the +options are processed. + +=item 2. apply all C<--include> and C<--exclude> options, in order. + +All this step does is potentially reduce the number of files already +selected or found in step 1. + +=item 3. find all modules (== F<.pm> files), gather their static archives +(F<.a>) and AutoLoader splitfiles (F<.ix> and F<.al> files), find any +extra libraries they need for linking (F) and optionally +evaluate any F<.packlist> files. + +This step is required to link against XS extensions and also adds files +required for L to do its job. + +=back + +After this, all the files selected for bundling will be read and processed +(stripped), the bundle files will be written, and optionally a new F +or application binary will be linked. + =head3 MKBUNDLE OPTIONS =over 4 @@ -283,15 +355,17 @@ pod documentation, which is very fast and reduces file size a lot. The C method uses L to parse and condense the perl sources. This -saves a lot more than just L, and is generally safer, but -is also a lot slower, so is best used for production builds. Note that -this method doesn't optimise for raw file size, but for best compression -(that means that the uncompressed file size is a bit larger, but the files -compress better, e.g. with F). - -Last not least, in the unlikely case where C is too slow, or some -module gets mistreated, you can specify C to not mangle included -perl sources in any way. 
+saves a lot more than just L, and is generally safer, +but is also a lot slower (some files take almost a minute to strip - +F maintains a cache of stripped files to speed up subsequent +runs for this reason). Note that this method doesn't optimise for raw file +size, but for best compression (that means that the uncompressed file size +is a bit larger, but the files compress better, e.g. with F). + +Last not least, if you need accurate line numbers in error messages, +or in the unlikely case where C is too slow, or some module gets +mistreated, you can specify C to not mangle included perl sources in +any way. =item --perl @@ -305,6 +379,28 @@ # build a new ./perl with only common::sense in it - very small :) staticperl mkperl -Mcommon::sense +=item --app name + +After writing out the bundle files, try to link a new standalone +program. It will be called C, and the bundle files get removed after +linking it. + +The difference to the (mutually exclusive) C<--perl> option is that the +binary created by this option will not try to act as a perl interpreter - +instead it will simply initialise the perl interpreter, clean it up and +exit. + +This switch is automatically used when F is invoked with the +C command (instead of C): + +To let it do something useful you I add some boot code, e.g. with +the C<--boot> option. + +Example: create a standalone perl binary that will execute F when +it is started. + + staticperl mkbundle --app myexe --boot appfile + =item --use module | -Mmodule Include the named module and all direct dependencies. This is done by @@ -368,7 +464,41 @@ the perl interpreter executes scripts given on the command line (or via C<-e>). This works even in an embedded interpreter. -=item --add "file" | --add "file alias" +=item --usepacklist + +Read F<.packlist> files for each distribution that happens to match a +module name you specified. Sounds weird, and it is, so expect semantics to +change somehow in the future. 
+ +The idea is that most CPAN distributions have a F<.pm> file that matches +the name of the distribution (which is rather reasonable after all). + +If this switch is enabled, then if any of the F<.pm> files that have been +selected match an install distribution, then all F<.pm>, F<.pl>, F<.al> +and F<.ix> files installed by this distribution are also included. + +For example, using this switch, when the L module is specified, then +all L submodules that have been installed via the CPAN distribution +are included as well, so you don't have to manually specify them. + +=item --incglob pattern + +This goes through all library directories and tries to match any F<.pm> +and F<.pl> files against the extended glob pattern (see below). If a file +matches, it is added. This switch will automatically detect L +files and the required link libraries for XS modules, but it will I +scan the file for dependencies (at the moment). + +This is mainly useful to include "everything": + + --incglob '*' + +Or to include perl libraries, or trees of those, such as the unicode +database files needed by many other modules: + + --incglob '/unicore/**.pl' + +=item --add file | --add "file alias" Adds the given (perl) file into the bundle (and optionally call it "alias"). This is useful to include any custom files into the bundle. @@ -384,6 +514,31 @@ add file2 myfiles/file2 add file3 myfiles/file3 +=item --binadd file | --add "file alias" + +Just like C<--add>, except that it treats the file as binary and adds it +without any processing. + +You should probably add a C prefix to avoid clashing with embedded +perl files (whose paths do not start with C), and/or use a special +directory, such as C. + +You can later get a copy of these files by calling C. + +=item --include pattern | -i pattern | --exclude pattern | -x pattern + +These two options define an include/exclude filter that is used after all +files selected by the other options have been found. 
Each include/exclude +is applied to all files found so far - an include makes sure that the +given files will be part of the resulting file set, an exclude will +exclude files. The patterns are "extended glob patterns" (see below). + +For example, to include everything, except C modules, but still +include F, you could use this: + + --incglob '*' -i '/Devel/PPPort.pm' -x '/Devel/**' + =item --static When C<--perl> is also given, link statically instead of dynamically. The @@ -397,6 +552,24 @@ executables, or try the C<--staticlibs> option to link only some libraries statically. +=item --staticlib libname + +When not linking fully statically, this option allows you to link specific +libraries statically. What it does is simply replace all occurrences of +C<-llibname> with the GCC-specific C<-Wl,-Bstatic -llibname -Wl,-Bdynamic> +option. + +This will have no effect unless the library is actually linked against, +specifically, C<--staticlib> will not link against the named library +unless it would be linked against anyway. + +Example: link libcrypt statically into the binary. + + staticperl mkperl -MIO::AIO --staticlib crypt + + # ldopts might now contain: + # -lm -Wl,-Bstatic -lcrypt -Wl,-Bdynamic -lpthread + =item any other argument Any other argument is interpreted as a bundle specification file, which @@ -404,18 +577,62 @@ =back -=head2 F CONFIGURATION AND HOOKS +=head3 EXTENDED GLOB PATTERNS + +Some options of F expect an I. This is neither a normal shell glob nor a regex, but something +in between. The idea has been copied from rsync, and these are the current +matching rules: + +=over 4 + +=item Patterns starting with F will be anchored at the root of the library tree. + +That is, F will match the F directory in C<@INC>, but +nothing inside it, nor any other file or directory called F +anywhere else in the hierarchy. + +=item Patterns not starting with F will be anchored at the end of the path. 
+ +That is, F will match any file called F anywhere in the +hierarchy, but not any directories of the same name. + +=item A F<*> matches any single component. + +That is, F would match all F<.pl> files directly inside +C, not any deeper level F<.pl> files. Or in other words, F<*> +will not match slashes. + +=item A F<**> matches anything. + +That is, F would match all F<.pl> files under F, +no matter how deeply nested they are inside subdirectories. -During (each) startup, F tries to source the following shell -files in order: +=item A F matches a single character within a component. + +That is, F matches F, but not the +hypothetical F, as F does not match F. + +=back + +=head2 F CONFIGURATION AND HOOKS + +During (each) startup, F tries to source some shell files to +allow you to fine-tune/override configuration settings. + +In them you can override shell variables, or define shell functions +("hooks") to be called at specific phases during installation. For +example, you could define a C hook to install additional +modules from CPAN each time you start from scratch. + +If the env variable C<$STATICPERLRC> is set, then F will try +to source the file named with it only. Otherwise, it tries the following +shell files in order: /etc/staticperlrc ~/.staticperlrc $STATICPERL/rc -They can be used to override shell variables, or define functions to be -called at specific phases. - Note that the last file is erased during F, so generally should not be used. @@ -430,61 +647,73 @@ The e-mail address of the person who built this binary. Has no good default, so should be specified by you. -=back +=item C -=head4 Variables you I to override +The URL of the CPAN mirror to use (e.g. L). -=over 4 +=item C -=item C +Additional modules installed during F. Here you can +set which modules you want have to installed from CPAN. 
-The perl version to install - default is currently C<5.12.2>, but C<5.8.9> -is also a good choice (5.8.9 is much smaller than 5.12.2, while 5.10.1 is -about as big as 5.12.2). +Example: I really really need EV, AnyEvent, Coro and AnyEvent::AIO. -=item C + EXTRA_MODULES="EV AnyEvent Coro AnyEvent::AIO" -The URL of the CPAN mirror to use (e.g. L). +Note that you can also use a C hook to achieve this, and +more. -=item C, C, C, C +=back -These flags are passed to perl's F script, and are generally -optimised for small size (at the cost of performance). Since they also -contain subtle workarounds around various build issues, changing these -usually requires understanding their default values - best look at the top -of the F script for more info on these. +=head4 Variables you might I to override + +=over 4 =item C The directory where staticperl stores all its files (default: F<~/.staticperl>). -=item C - -The prefix where perl gets installed (default: F<$STATICPERL/perl>), -i.e. where the F and F subdirectories will end up. - -=item C, C, others +=item C, C, ... Usually set to C<1> to make modules "less inquisitive" during their installation, you can set any environment variable you want - some modules (such as L or L) use environment variables for further tweaking. -=item C +=item C -Additional modules installed during F. Here you can -set which modules you want have to installed from CPAN. +The perl version to install - default is currently C<5.12.2>, but C<5.8.9> +is also a good choice (5.8.9 is much smaller than 5.12.2, while 5.10.1 is +about as big as 5.12.2). -Example: I really really need EV, AnyEvent, Coro and IO::AIO. +=item C - EXTRA_MODULES="EV AnyEvent Coro IO::AIO" +The prefix where perl gets installed (default: F<$STATICPERL/perl>), +i.e. where the F and F subdirectories will end up. -Note that you can also use a C hook to achieve this, and -more. +=item C + +Additional Configure options - these are simply passed to the perl +Configure script. 
For example, if you wanted to enable dynamic loading, +you could pass C<-Dusedl>. To enable ithreads (Why would you want that +insanity? Don't! Use L instead!) you would pass C<-Duseithreads> +and so on. + +More commonly, you would either activate 64 bit integer support +(C<-Duse64bitint>), or disable large files support (C<-Uuselargefiles>), to +reduce filesize further. + +=item C, C, C, C + +These flags are passed to perl's F script, and are generally +optimised for small size (at the cost of performance). Since they also +contain subtle workarounds around various build issues, changing these +usually requires understanding their default values - best look at the top +of the F script for more info on these. =back -=head4 Variables you I to override +=head4 Variables you probably I to override =over 4 @@ -519,13 +748,21 @@ =over 4 +=item preconfigure + +Called just before running F<./Configure> in the perl source +directory. Current working directory is the perl source directory. + +This can be used to set any C variables, which might be costly +to compute. + =item postconfigure Called after configuring, but before building perl. Current working directory is the perl source directory. -Could be used to tailor/patch config.sh (followed by F<./Configure -S>) or -do any other modifications. +Could be used to tailor/patch config.sh (followed by F) +or do any other modifications. =item postbuild @@ -550,6 +787,282 @@ =back +=head1 ANATOMY OF A BUNDLE + +When not building a new perl binary, C will leave a number of +files in the current working directory, which can be used to embed a perl +interpreter in your program. + +Intimate knowledge of L and preferably some experience with +embedding perl are highly recommended. 
+ +C (or the C<--perl> option) basically does this to link the new +interpreter (it also adds a main program to F): + + $Config{cc} $(cat bundle.ccopts) -o perl bundle.c $(cat bundle.ldopts) + +=over 4 + +=item bundle.h + +A header file that contains the prototypes of the few symbols "exported" +by bundle.c, and also exposes the perl headers to the application. + +=over 4 + +=item staticperl_init () + +Initialises the perl interpreter. You can use the normal perl functions +after calling this function, for example, to define extra functions or +to load a .pm file that contains some initialisation code, or the main +program function: + + XS (xsfunction) + { + dXSARGS; + + // now we have items, ST(i) etc. + } + + static void + run_myapp(void) + { + staticperl_init (); + newXSproto ("myapp::xsfunction", xsfunction, __FILE__, "$$;$"); + eval_pv ("require myapp::main", 1); // executes "myapp/main.pm" + } + +=item staticperl_xs_init (pTHX) + +Sometimes you need direct control over C and C, in +which case you do not want to use C but call them on your +own. + +Then you need this function - either pass it directly as the C +function to C, or call it from your own C function. + +=item staticperl_cleanup () + +In the unlikely case that you want to destroy the perl interpreter, here +is the corresponding function. + +=item PerlInterpreter *staticperl + +The perl interpreter pointer used by staticperl. Not normally so useful, +but there it is. + +=back + +=item bundle.ccopts + +Contains the compiler options required to compile at least F and +any file that includes F - you should probably use it in your +C. + +=item bundle.ldopts + +The linker options needed to link the final program. + +=back + +=head1 RUNTIME FUNCTIONALITY + +Binaries created with C/C contain extra functions, which +are required to access the bundled perl sources, but might be useful for +other purposes. + +In addition, for the embedded loading of perl files to work, F +overrides the C<@INC> array. 
+ +=over 4 + +=item $file = staticperl::find $path + +Returns the data associated with the given C<$path> +(e.g. C, C), which is basically +the UNIX path relative to the perl library directory. + +Returns C if the file isn't embedded. + +=item @paths = staticperl::list + +Returns the list of all paths embedded in this binary. + +=back + +=head1 FULLY STATIC BINARIES - BUILDROOT + +To make truly static (Linux-) libraries, you might want to have a look at +buildroot (L). + +Buildroot is primarily meant to set up a cross-compile environment (which +is not so useful as perl doesn't quite like cross compiles), but it can also compile +a chroot environment where you can use F. + +To do so, download buildroot, and enable "Build options => development +files in target filesystem" and optionally "Build options => gcc +optimization level (optimize for size)". At the time of writing, I had +good experiences with GCC 4.4.x but not GCC 4.5. + +To minimise code size, I used C<-pipe -ffunction-sections -fdata-sections +-finline-limit=8 -fno-builtin-strlen -mtune=i386>. The C<-mtune=i386> +doesn't decrease codesize much, but it makes the file much more +compressible. + +If you don't need Coro or threads, you can go with "linuxthreads.old" (or +no thread support). For Coro, it is highly recommended to switch to a +uClibc newer than 0.9.31 (at the time of this writing, I used the 20101201 +snapshot) and enable NPTL, otherwise Coro needs to be configured with the +ultra-slow pthreads backend to work around linuxthreads bugs (it also uses +twice the address space needed for stacks). + +If you use C, then you should also be aware that +uClibc shares C between all threads when statically linking. See +L for a +workaround (And L for discussion). + +C support is also recommended, especially if you want +to play around with buildroot options. Enabling the C +package will probably enable all options required for a successful +perl build. 
F itself additionally needs either C +(recommended, for CPAN) or C. + +As for shells, busybox should provide all that is needed, but the default +busybox configuration doesn't include F which is needed by perl - +either make a custom busybox config, or compile coreutils. + +For the latter route, you might find that bash has some bugs that keep +it from working properly in a chroot - either use dash (and link it to +F inside the chroot) or link busybox to F, using its +built-in ash shell. + +Finally, you need F inside the chroot for many scripts to work +- F or bind-mounting your F will +both provide this. + +After you have compiled and set up your buildroot target, you can copy +F from the C distribution or from your +perl f directory (if you installed it) into the F +filesystem, chroot inside and run it. + +=head1 RECIPES / SPECIFIC MODULES + +This section contains some common(?) recipes and information about +problems with some common modules or perl constructs that require extra +files to be included. + +=head2 MODULES + +=over 4 + +=item utf8 + +Some functionality in the utf8 module, such as swash handling (used +for unicode character ranges in regexes) is implemented in the +C<"utf8_heavy.pl"> library: + + -M'"utf8_heavy.pl"' + +Many Unicode properties in turn are defined in separate modules, +such as C<"unicore/Heavy.pl"> and more specific data tables such as +C<"unicore/To/Digit.pl"> or C<"unicore/lib/Perl/Word.pl">. These tables +are big (7MB uncompressed, although F contains special +handling for those files), so including them on demand by your application +only might pay off. + +To simply include the whole unicode database, use: + + --incglob '/unicore/**.pl' + +=item AnyEvent + +AnyEvent needs a backend implementation that it will load in a delayed +fashion. The L backend is the default choice +for AnyEvent if it can't find anything else, and is usually a safe +fallback. If you plan to use e.g. L (L...), then you need to +include the L (L...) 
backend as +well. + +If you want to handle IRIs or IDNs (L punycode and idn +functions), you also need to include C<"AnyEvent/Util/idna.pl"> and +C<"AnyEvent/Util/uts46data.pl">. + +Or you can use C<--usepacklist> and specify C<-MAnyEvent> to include +everything. + +=item Carp + +Carp had (in older versions of perl) a dependency on L. As of +perl 5.12.2 (maybe earlier), this dependency no longer exists. + +=item Config + +The F switch (as well as many modules) needs L, which in +turn might need L<"Config_heavy.pl">. Including the latter gives you +both. + +=item Term::ReadLine::Perl + +Also needs L, or C<--usepacklist>. + +=item URI + +URI implements schemes as separate modules - the generic URL scheme is +implemented in L, HTTP is implemented in L. If +you need to use any of these schemes, you should include these manually, +or use C<--usepacklist>. + +=back + +=head2 RECIPES + +=over 4 + +=item Linking everything in + +To link just about everything installed in the perl library into a new +perl, try this: + + staticperl mkperl --strip ppi --incglob '*' + +=item Getting rid of netdb functions + +The perl core has lots of netdb functions (C, C +and so on) that few applications use. 
You can avoid compiling them in by +putting the following fragment into a C hook: + + preconfigure() { + for sym in \ + d_getgrnam_r d_endgrent d_endgrent_r d_endhent \ + d_endhostent_r d_endnent d_endnetent_r d_endpent \ + d_endprotoent_r d_endpwent d_endpwent_r d_endsent \ + d_endservent_r d_getgrent d_getgrent_r d_getgrgid_r \ + d_getgrnam_r d_gethbyaddr d_gethent d_getsbyport \ + d_gethostbyaddr_r d_gethostbyname_r d_gethostent_r \ + d_getlogin_r d_getnbyaddr d_getnbyname d_getnent \ + d_getnetbyaddr_r d_getnetbyname_r d_getnetent_r \ + d_getpent d_getpbyname d_getpbynumber d_getprotobyname_r \ + d_getprotobynumber_r d_getprotoent_r d_getpwent \ + d_getpwent_r d_getpwnam_r d_getpwuid_r d_getsent \ + d_getservbyname_r d_getservbyport_r d_getservent_r \ + d_getspnam_r d_getsbyname + # d_gethbyname + do + PERL_CONFIGURE="$PERL_CONFIGURE -U$sym" + done + } + +This mostly gains space when linking statically, as the functions will +likely not be linked in. The gain for dynamically-linked binaries is +smaller. + +Also, this leaves C in - not only is it actually used +often, the L module also exposes it, so leaving it out usually +gains little. Why Socket exposes a C function that is in the core already +is anybody's guess. + +=back + =head1 AUTHOR Marc Lehmann