/cvs/perlmulticore/perlmulticore.pod
Revision: 1.16
Committed: Mon Mar 4 15:41:29 2019 UTC (5 years, 7 months ago) by root
Branch: MAIN
CVS Tags: HEAD
Changes since 1.15: +0 -41 lines

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     The Perl Multicore Specification and Implementation
4    
5     =head1 SYNOPSIS
6    
7 root 1.6 #include "perlmulticore.h"
8 root 1.1
9     // in your XS function:
10    
11     perlinterp_release ();
12     do_the_C_thing ();
13     perlinterp_acquire ();
14    
15     =head1 DESCRIPTION
16    
17 root 1.2 This specification describes a simple mechanism for XS modules to allow
18 root 1.1 re-use of the perl interpreter for other threads while doing some lengthy
19     operation, such as cryptography, SQL queries, disk I/O and so on.
20    
21 root 1.15 This is essentially the same mechanism that practically
22     all other scripting languages (e.g. python) use when implementing real
23     threads.
24    
25 root 1.2 The design goals for this mechanism were to be simple to use, to have
26     extremely low overhead when not active, to have low code and data size
27     overhead, and to be broadly applicable.
28 root 1.1
29     The newest version of this document can be found at
30     L<http://perlmulticore.schmorp.de/>.
31    
32 root 1.2 The newest version of the header file that implements this specification
33     can be downloaded from L<http://perlmulticore.schmorp.de/perlmulticore.h>.
34 root 1.1
35 root 1.3 =head2 XS? HOW DO I USE THIS FROM PERL?
36    
37     This document is only about the XS-level mechanism that defines generic
38     callbacks - to make use of this, you need a module that provides an
39     implementation for these callbacks, for example
40     L<Coro::Multicore|http://pod.tst.eu/http://cvs.schmorp.de/Coro-Multicore/Multicore.pm>.
41    
42     =head2 WHICH MODULES SUPPORT IT?
43    
44     You can check L<the perl multicore registry|http://perlmulticore.schmorp.de/registry>
45     for a list of modules that support this specification.
46    
47 root 1.1 =head1 HOW DO I USE THIS IN MY MODULES?
48    
49     The usage is very simple - you include this header file in your XS module. Then, before you
50     do your lengthy operation, you release the perl interpreter:
51    
52     perlinterp_release ();
53    
54     And when you are done with your computation, you acquire it again:
55    
56     perlinterp_acquire ();
57    
58     And that's it. This doesn't load any modules and consists of only a few
59     machine instructions when no module to take advantage of it is loaded.
60    
61     Here is a simple example, a C<flock> wrapper implemented in XS. Unlike
62     perl's built-in C<flock>, it allows other threads (for example, those
63     provided by L<Coro>) to execute, instead of blocking the whole perl
64     interpreter. For the sake of this example, it requires a file descriptor
65     instead of a handle.
66    
67     #include "perlmulticore.h" // this header file
68    
69     // and in the XS portion
70     int flock (int fd, int operation)
71     CODE:
72     perlinterp_release ();
73     RETVAL = flock (fd, operation);
74     perlinterp_acquire ();
75     OUTPUT:
76     RETVAL
77    
78 root 1.3 You can find more examples in the L<Case Studies> appendix.
79 root 1.1
80     =head2 HOW ABOUT NOT-SO LONG WORK?
81    
82     Sometimes you don't know how long your code will take - in a compression
83     library for example, compressing a few hundred kilobytes of data can take
84     a while, while 50 bytes will compress so fast that even attempting to do
85     something else could be more costly than just doing it.
86    
87     This is a very hard problem to solve. The best you can do at the moment is
88     to release the perl interpreter only when you think the work to be done
89     justifies the expense.
90    
91     As a rule of thumb, if you expect to need more than a few thousand cycles,
92     you should release the interpreter, else you shouldn't. When in doubt,
93     release.
94    
95     For example, in a compression library, you might want to do this:
96    
97     if (bytes_to_be_compressed > 2000) perlinterp_release ();
98     do_compress (...);
99     if (bytes_to_be_compressed > 2000) perlinterp_acquire ();
100    
101     Make sure the if conditions are exactly the same and don't change, so you
102     always call acquire when you release, and vice versa.
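
One way to keep the two conditions from drifting apart is to evaluate
the test once and keep the result in a local variable. The following is
a minimal standalone sketch, not code from F<perlmulticore.h>: the
C<perlinterp_release> and C<perlinterp_acquire> functions here are mock
stand-ins so that it compiles without perl, and C<compress_wrapper>,
C<do_compress> and C<RELEASE_THRESHOLD> are hypothetical names.

```c
#include <stddef.h>

/* Mock stand-ins (assumption): real code would include perlmulticore.h. */
static int mock_released;      /* 1 while the "interpreter" is released */
static int mock_release_calls; /* number of release calls so far */

static void perlinterp_release (void) { mock_released = 1; ++mock_release_calls; }
static void perlinterp_acquire (void) { mock_released = 0; }

#define RELEASE_THRESHOLD 2000 /* bytes; an arbitrary cut-off, as in the text */

/* Hypothetical placeholder for the real work: just sums the input bytes. */
static unsigned long
do_compress (const unsigned char *src, size_t len)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0; i < len; ++i)
    sum += src[i];

  return sum;
}

unsigned long
compress_wrapper (const unsigned char *src, size_t len)
{
  /* decide once, so release and acquire can never disagree */
  int do_release = len > RELEASE_THRESHOLD;
  unsigned long res;

  if (do_release) perlinterp_release ();
  res = do_compress (src, len);
  if (do_release) perlinterp_acquire ();

  return res;
}
```

With this shape, a later change to the threshold or to the length
variable cannot leave the interpreter released by accident.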
103    
104     When you don't have a handy indicator, you might still do something
105     useful. For example, if you do some file locking with C<fcntl> and you
106     expect the lock to be available immediately in most cases, you could try
107     with C<F_SETLK> (which doesn't wait), and only release/wait/acquire when
108     the lock couldn't be set:
109    
110     int res = fcntl (fd, F_SETLK, &flock);
111    
112     if (res)
113     {
114     // error, assume lock is held by another process and do it the slow way
115     perlinterp_release ();
116     res = fcntl (fd, F_SETLKW, &flock);
117     perlinterp_acquire ();
118     }
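
The fragment above can be fleshed out into a complete function. This is
only a sketch under a few assumptions: C<lock_whole_file> is a
hypothetical helper name, the two C<perlinterp_*> functions are mock
stand-ins for the real macros, and, unlike the fragment above, it only
takes the slow path when C<errno> is C<EACCES> or C<EAGAIN>, which are
the errors POSIX specifies for a conflicting lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Mock stand-ins (assumption): real code would include perlmulticore.h. */
static int mock_release_calls;
static void perlinterp_release (void) { ++mock_release_calls; }
static void perlinterp_acquire (void) { }

/* Hypothetical helper: write-lock the whole file.
 * Returns 0 on success, -1 on error (with errno set by fcntl). */
int
lock_whole_file (int fd)
{
  struct flock fl;
  int res;

  memset (&fl, 0, sizeof (fl));
  fl.l_type   = F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start  = 0;
  fl.l_len    = 0; /* 0 means "until end of file" */

  res = fcntl (fd, F_SETLK, &fl); /* fast path, never waits */

  if (res == -1 && (errno == EACCES || errno == EAGAIN))
    {
      /* lock is held elsewhere: wait for it without blocking perl */
      perlinterp_release ();
      res = fcntl (fd, F_SETLKW, &fl);
      perlinterp_acquire ();
    }

  return res;
}
```

Other errors, such as an invalid file descriptor, are returned directly
without ever releasing the interpreter.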
119    
120     =head1 THE HARD AND FAST RULES
121    
122     As with everything, there are a number of rules to follow.
123    
124     =over 4
125    
126     =item I<Never> touch any perl data structures after calling C<perlinterp_release>.
127    
128     Possibly the most important rule of them all: anything perl is
129     completely off-limits after C<perlinterp_release>, until you call
130     C<perlinterp_acquire>, after which you can access perl stuff again.
131    
132     That includes anything in the perl interpreter that you didn't prove to be
133     safe, and didn't prove to be safe in older and future versions of perl:
134     global variables, local perl scalars, even if you are sure nobody accesses
135     them and you only try to "read" their value, and so on.
136    
137     If you need to access perl things, do it before releasing the
138     interpreter with C<perlinterp_release>, or after acquiring it again with
139     C<perlinterp_acquire>.
140    
141     =item I<Always> call C<perlinterp_release> and C<perlinterp_acquire> in pairs.
142    
143     For each C<perlinterp_release> call there must be a C<perlinterp_acquire>
144     call. They don't have to be in the same function, and you can have
145     multiple calls to them, as long as every C<perlinterp_release> call is
146     followed by exactly one C<perlinterp_acquire> call.
147    
148     For example, this would be fine:
149    
150     perlinterp_release ();
151    
152     if (!function_that_fails_with_0_return_value ())
153     {
154     perlinterp_acquire ();
155     croak ("error");
156     // croak doesn't return
157     }
158    
159     perlinterp_acquire ();
160     // do other stuff
161    
162     =item I<Never> nest calls to C<perlinterp_release> and C<perlinterp_acquire>.
163    
164     That simply means that after calling C<perlinterp_release>, you must
165     call C<perlinterp_acquire> before calling C<perlinterp_release>
166     again. Likewise, after C<perlinterp_acquire>, you can call
167     C<perlinterp_release> but not another C<perlinterp_acquire>.
168    
169     =item I<Always> call C<perlinterp_release> first.
170    
171     Also simple: you I<must not> call C<perlinterp_acquire> without having
172     called C<perlinterp_release> before.
173    
174     =item I<Never> underestimate threads.
175    
176     While it's easy to add parallel execution ability to your XS module, it
177     doesn't mean it is safe. After you release the perl interpreter, it's
178     perfectly possible that it will call your XS function in another thread,
179     even while your original function still executes. In other words: your C
180     code must be thread safe, and if you use any library, that library must be
181     thread-safe, too.
182    
183     Always assume that the code between C<perlinterp_release> and
184     C<perlinterp_acquire> is executed in parallel on multiple CPUs at the same
185     time. If your code can't cope with that, you could consider using a mutex
186     to only allow one such execution, which is still better than blocking
187     everybody else from doing anything:
188    
189     static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
190    
191     perlinterp_release ();
192     pthread_mutex_lock (&my_mutex);
193     do_your_non_thread_safe_thing ();
194     pthread_mutex_unlock (&my_mutex);
195     perlinterp_acquire ();
196    
197     =item I<Don't> get confused by having to release first.
198    
199     In many real world scenarios, you acquire a resource, do something, then
200     release it again. Don't let this confuse you: here, you already own
201     the resource (the perl interpreter) so you have to I<release> first, and
202     I<acquire> it again later, not the other way around.
203    
204     =back
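
While developing, the pairing, nesting and ordering rules above can be
checked mechanically. The following is a hypothetical debugging aid, not
part of F<perlmulticore.h>: C<checked_release>, C<checked_acquire> and
the C<violations> counter are invented for this sketch, and the calls to
the real macros are only indicated in comments so that the code compiles
on its own.

```c
/* Hypothetical debugging wrappers; not part of perlmulticore.h. */
enum interp_state { INTERP_HELD, INTERP_RELEASED };

static enum interp_state state = INTERP_HELD; /* XS code starts out holding perl */
static int violations;                        /* rule violations seen so far */

static void
checked_release (void)
{
  if (state != INTERP_HELD)
    ++violations; /* release while already released (nesting) */

  state = INTERP_RELEASED;
  /* a real wrapper would call perlinterp_release () here */
}

static void
checked_acquire (void)
{
  /* a real wrapper would call perlinterp_acquire () here */
  if (state != INTERP_RELEASED)
    ++violations; /* acquire without a preceding release */

  state = INTERP_HELD;
}
```

A single global state variable is a simplification: it is only
meaningful while one thread at a time runs the wrapped code, so real
code would have to keep the state per call or per thread.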
205    
206    
207     =head1 DESIGN PRINCIPLES
208    
209     This section discusses how the design goals were reached (you be the
210     judge), how it is implemented, and what overheads this implies.
211    
212     =over 4
213    
214     =item Simple to Use
215    
216     All you have to do is identify the place in your existing code where you
217     stop touching perl stuff, do your actual work, and start touching perl
218     stuff again.
219    
220     Then slap C<perlinterp_release ()> and C<perlinterp_acquire ()> around the
221     actual work code.
222    
223     You have to include F<perlmulticore.h> and distribute it with your XS
224     code, but all these things border on the trivial.
225    
226     =item Very Efficient
227    
228     The definitions of C<perlinterp_release> and C<perlinterp_acquire> are very
229     short:
230    
231     #define perlinterp_release() perl_multicore_api->pmapi_release ()
232     #define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()
233    
234     Both are macros that read a pointer from memory (perl_multicore_api),
235     dereference a function pointer stored at that place, and call the
236     function, which takes no arguments and returns nothing.
237    
238     The first call to C<perlinterp_release> will check for the presence
239     of any supporting module, and if none is loaded, will create a dummy
240     implementation where both C<pmapi_release> and C<pmapi_acquire> execute
241     this function:
242    
243     static void perl_multicore_nop (void) { }
244    
245     So in the case of no magical module being loaded, all calls except the
246     first are two memory accesses and a predictable function call of an empty
247     function.
248    
249     Of course, the overhead is much higher when these functions actually
250     implement anything useful, but you always get what you pay for.
251    
252     With L<Coro::Multicore>, every release/acquire involves two pthread
253     switches, two coro thread switches, a bunch of syscalls, and sometimes
254     interacting with the event loop.
255    
256     A dedicated thread pool such as the one L<IO::AIO> uses could reduce
257     these overheads, and would also reduce the dependencies (L<AnyEvent> is a
258     smaller and more portable dependency than L<Coro>), but it would require a
259     lot more work on the side of the module author wanting to support it than
260     this solution.
261    
262     =item Low Code and Data Size Overhead
263    
264     On a 64 bit system, F<perlmulticore.h> uses exactly C<8> octets (one
265     pointer) of your data segment, to store the C<perl_multicore_api>
266     pointer. In addition it creates a C<16> octet perl string to store the
267     function pointers in, and stores it in a hash provided by perl for this
268     purpose.
269    
270     This is pretty much the equivalent of executing this code:
271    
272     $existing_hash{perl_multicore_api} = "123456781234567812345678";
273    
274     And that's it, which is, I think, very little indeed.
275    
276 root 1.10 As for code size and speed, on my amd64 system, every call to
277     C<perlinterp_release> or C<perlinterp_acquire> results in a variation of
278     the following 9-10 octet sequence which is easy to predict for modern
279     CPUs, as the function pointer is constant after initialisation:
280 root 1.1
281     150> mov 0x200f23(%rip),%rax # <perl_multicore_api>
282     157> callq *0x8(%rax)
283    
284 root 1.10 The actual function being called when no backend is installed or enabled
285     looks like this:
286    
287     1310> retq
288    
289 root 1.7 The biggest part is the initialisation code, which consists of 11 lines of
290 root 1.1 typical XS code. On my system, all the code in F<perlmulticore.h> compiles
291     to less than 160 octets of read-only data.
292    
293     =item Broad Applicability
294    
295     While there are alternative ways to achieve the goal of parallel execution
296     with threads that might be more efficient, this mechanism was chosen
297     because it is very simple to retrofit existing modules with it, and it
298    
299     The design goals for this mechanism were to be simple to use, very
300     efficient when not needed, low code and data size overhead and broad
301     applicability.
302    
303     =back
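
The dispatch scheme described under "Very Efficient" can be modelled in
a few lines of plain C. This is a mock-up under stated assumptions, not
the contents of F<perlmulticore.h>: all C<mock_*> names are invented,
the two-pointer struct layout is assumed from the description above, and
the lookup of a loaded backend module is left out entirely.

```c
/* Assumed layout: one function pointer each for release and acquire. */
struct mock_multicore_api
{
  void (*pmapi_release) (void);
  void (*pmapi_acquire) (void);
};

static int mock_init_calls; /* how often the one-time setup ran */

static void mock_nop (void) { }
static void mock_init_release (void);

/* Before the first release, the release slot points at the setup code. */
static struct mock_multicore_api mock_api_initial = { mock_init_release, mock_nop };
static struct mock_multicore_api mock_api_dummy   = { mock_nop, mock_nop };
static struct mock_multicore_api *mock_api = &mock_api_initial;

#define mock_release() mock_api->pmapi_release ()
#define mock_acquire() mock_api->pmapi_acquire ()

static void
mock_init_release (void)
{
  ++mock_init_calls;
  /* the real header would look for a loaded backend here (not shown);
   * this sketch always falls back to the dummy implementation */
  mock_api = &mock_api_dummy;
  mock_release ();
}
```

After the first C<mock_release ()>, every further call is one pointer
load, one function pointer load and a call to an empty function, which
is the cost described in the text.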
304    
305    
306     =head1 DISABLING PERL MULTICORE AT COMPILE TIME
307    
308     You can disable the complete perl multicore API by defining the
309     symbol C<PERL_MULTICORE_DISABLE> to C<1> (e.g. by specifying
310     F<-DPERL_MULTICORE_DISABLE> as compiler argument).
311    
312     This will leave no traces of the API in the compiled code: suitable
313     "empty" C<perlinterp_release> and C<perlinterp_acquire> definitions will be provided.
314    
315     This could be added to perl's C<CPPFLAGS> when configuring perl on
316 root 1.8 platforms that do not support threading at all for example, and would
317     reduce the overhead to nothing. It is by no means required, though, as the
318     header will compile and work just fine without any thread support.
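
How such a compile-time switch can make the calls vanish is sketched
below. The names C<MOCK_MULTICORE_DISABLE>, C<mock_release>,
C<mock_acquire>, C<mock_backend_call> and C<work> are hypothetical and
only illustrate the idea; the actual definitions in F<perlmulticore.h>
may differ.

```c
/* Sketch only: all names here are hypothetical. */
#define MOCK_MULTICORE_DISABLE 1

int mock_calls; /* counts calls that reach the (mock) backend */

void mock_backend_call (void) { ++mock_calls; }

#if MOCK_MULTICORE_DISABLE
/* empty statements: the compiler generates no code for these */
# define mock_release() do { } while (0)
# define mock_acquire() do { } while (0)
#else
# define mock_release() mock_backend_call ()
# define mock_acquire() mock_backend_call ()
#endif

int
work (int x)
{
  mock_release ();
  x *= 2; /* stands in for the lengthy operation */
  mock_acquire ();

  return x;
}
```

The calling code stays the same in both configurations; only the macro
definitions change.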
319 root 1.1
320    
321 root 1.4 =head1 APPENDIX: CASE STUDIESX<Case Studies>
322 root 1.3
323 root 1.4 This appendix contains some case studies on how to patch existing
324 root 1.3 modules. Unless they are available on CPAN, the patched modules (including
325     diffs) can be found at the perl multicore repository (see L<the
326     perlmulticore registry|http://perlmulticore.schmorp.de/registry>).
327    
328     In addition to the patches shown, the
329     L<perlmulticore.h|http://perlmulticore.schmorp.de/perlmulticore.h> header
330     must be added to the module and included in any XS or C file that uses it.
331    
332    
333     =head2 Case Study: C<Digest::MD5>
334    
335     The C<Digest::MD5> module presents some unique challenges because it mixes
336     Perl-I/O and CPU-based processing.
337    
338     So first let's identify the easy cases - set up (in C<new>) and
339     calculating the final digest are very fast operations and are unlikely
340     to profit from running in a separate thread. Which leaves the C<add>
341     method and the C<md5> (C<md5_hex>, C<md5_base64>) functions.
342    
343     They are both very easy to update - the C<MD5Update> call
344     doesn't access any perl data structures, so you can slap
345     C<perlinterp_release>/C<perlinterp_acquire> around it:
346    
347     if (len > 8000) perlinterp_release ();
348     MD5Update(context, data, len);
349     if (len > 8000) perlinterp_acquire ();
350    
351     This works for both C<add> and C<md5> XS functions. The C<8000> is
352     somewhat arbitrary.
353    
354     This leaves C<addfile>, which would normally be the ideal candidate,
355     because it is often used on large files and needs to wait both for I/O and
356     the CPU. Unfortunately, it is implemented like this (only the inner loop
357     is shown):
358    
359     unsigned char buffer[4096];
360    
361     while ( (n = PerlIO_read(fh, buffer, sizeof(buffer))) > 0) {
362     MD5Update(context, buffer, n);
363     }
364    
365     That is, it uses a 4KB buffer per C<MD5Update>. Putting
366     C<perlinterp_release>/C<perlinterp_acquire> calls around it would be way
367     too inefficient. Ideally, you would want to put them around the whole
368     loop.
369    
370     Unfortunately, C<Digest::MD5> uses C<PerlIO> for the actual I/O, and
371     C<PerlIO> is not thread-safe. We can't even use a mutex, as we would have
372     to protect against all other C<PerlIO> calls.
373    
374     As a compromise, we can use the C<USE_HEAP_INSTEAD_OF_STACK> option that
375     C<Digest::MD5> provides, which puts the buffer onto the heap, and use a
376     far larger buffer:
377    
378     #define USE_HEAP_INSTEAD_OF_STACK
379    
380     New(0, buffer, 1024 * 1024, unsigned char);
381    
382     while ( (n = PerlIO_read(fh, buffer, 1024 * 1024)) > 0) { /* buffer is a pointer now, so sizeof(buffer) would be wrong */
383     if (n > 8000) perlinterp_release ();
384     MD5Update(context, buffer, n);
385     if (n > 8000) perlinterp_acquire ();
386     }
387    
388     This will unfortunately still block on I/O, and allocate a large block of
389     memory, but it is better than nothing.
390    
391    
392     =head2 Case Study: C<DBD::mysql>
393    
394     Another example would be to modify C<DBD::mysql> to allow other
395     threads to execute while executing SQL queries.
396    
397     The actual code that needs to be patched is not actually in an F<.xs>
398     file, but in the F<dbdimp.c> file, which is included in an XS file.
399    
400     While there are many calls, the most important ones are the statement
401     execute calls. There are only two in F<dbdimp.c>, one call in
402     C<mysql_st_internal_execute41>, and one in C<dbd_st_execute>, both calling
403     the undocumented internal C<mysql_st_internal_execute> function.
404    
405     The difference is that the former is used with mysql 4.1+ and prepared
406     statements.
407    
408     The call in C<dbd_st_execute> is easy, as it does all the important work
409     and doesn't access any perl data structures (I checked C<DBIc_NUM_PARAMS>
410     manually to make sure):
411    
412     perlinterp_release ();
413     imp_sth->row_num= mysql_st_internal_execute(
414     sth,
415     *statement,
416     NULL,
417     DBIc_NUM_PARAMS(imp_sth),
418     imp_sth->params,
419     &imp_sth->result,
420     imp_dbh->pmysql,
421     imp_sth->use_mysql_use_result
422     );
423     perlinterp_acquire ();
424    
425     Despite the name, C<mysql_st_internal_execute41> isn't actually from
426     F<libmysqlclient>, but a long function in F<dbdimp.c>. Here is an abridged version, with
427     C<perlinterp_release>/C<perlinterp_acquire> calls:
428    
429     int i;
430     enum enum_field_types enum_type;
431     dTHX;
432     int execute_retval;
433     my_ulonglong rows=0;
434     D_imp_xxh(sth);
435    
436     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
437     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
438     "\t-> mysql_st_internal_execute41\n");
439    
440     perlinterp_release ();
441    
442     if (num_params > 0 && !(*has_been_bound))
443     {
444     if (mysql_stmt_bind_param(stmt,bind))
445     goto error;
446     }
447    
448     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
449     {
450 root 1.5 perlinterp_acquire ();
451 root 1.3 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
452     "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
453     num_params);
454 root 1.5 perlinterp_release ();
455 root 1.3 }
456    
457    
458     execute_retval= mysql_stmt_execute(stmt);
459    
460     if (execute_retval)
461     goto error;
462    
463     /*
464     This statement does not return a result set (INSERT, UPDATE...)
465     */
466     if (!(*result= mysql_stmt_result_metadata(stmt)))
467     {
468     if (mysql_stmt_errno(stmt))
469     goto error;
470    
471     rows= mysql_stmt_affected_rows(stmt);
472     }
473     /*
474     This statement returns a result set (SELECT...)
475     */
476     else
477     {
478     for (i = mysql_stmt_field_count(stmt) - 1; i >=0; --i) {
479     enum_type = mysql_to_perl_type(stmt->fields[i].type);
480     if (enum_type != MYSQL_TYPE_DOUBLE && enum_type != MYSQL_TYPE_LONG)
481     {
482     /* mysql_stmt_store_result to update MYSQL_FIELD->max_length */
483     my_bool on = 1;
484     mysql_stmt_attr_set(stmt, STMT_ATTR_UPDATE_MAX_LENGTH, &on);
485     break;
486     }
487     }
488     /* Get the total rows affected and return */
489     if (mysql_stmt_store_result(stmt))
490     goto error;
491     else
492     rows= mysql_stmt_num_rows(stmt);
493     }
494     perlinterp_acquire ();
495     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
496     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
497     "\t<- mysql_internal_execute_41 returning %d rows\n",
498     (int) rows);
499     return(rows);
500    
501     error:
502     if (*result)
503     {
504     mysql_free_result(*result);
505     *result= 0;
506     }
507     perlinterp_acquire ();
508     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
509     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
510     " errno %d err message %s\n",
511     mysql_stmt_errno(stmt),
512     mysql_stmt_error(stmt));
513    
514     So C<perlinterp_release> is called after some logging, but before the
515     C<mysql_free_result> call.
516    
517     To make things more interesting, the function has multiple calls to
518     C<PerlIO> to log things, all of which aren't thread-safe, and need to be
519     surrounded with C<perlinterp_acquire> and C<perlinterp_release> calls
520     to temporarily re-acquire the interpreter. This is slow, but logging is
521     normally off:
522    
523     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
524     {
525 root 1.5 perlinterp_acquire ();
526 root 1.3 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
527     "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
528     num_params);
529 root 1.5 perlinterp_release ();
530 root 1.3 }
531    
532     The function has a normal exit and a separate error exit, each of which needs its own
533     C<perlinterp_acquire> call. First the normal function exit:
534    
535     perlinterp_acquire ();
536     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
537     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
538     "\t<- mysql_internal_execute_41 returning %d rows\n",
539     (int) rows);
540     return(rows);
541    
542     And this is the error exit:
543    
544     error:
545     if (*result)
546     {
547     mysql_free_result(*result);
548     *result= 0;
549     }
550     perlinterp_acquire ();
551    
552     This is enough to run DBI's C<execute> calls in separate threads.
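
The overall shape of the patched function (release early, temporarily
re-acquire for logging, acquire again on every exit path) can be reduced
to the following standalone sketch. It does not use C<DBD::mysql> or
F<libmysqlclient> at all: C<execute_sketch>, C<slow_library_call> and
C<log_line> are hypothetical placeholders, and the C<perlinterp_*>
functions are mocks that count rule violations instead of doing real
work.

```c
/* Mock stand-ins (assumption): they track state instead of calling a backend. */
static int interp_held = 1; /* 1 while we "own" the perl interpreter */
static int pairing_errors;  /* rule violations detected by the mocks */
static int log_lines;       /* number of log messages written */
static int trace_level;     /* stands in for DBIc_TRACE_LEVEL */

static void perlinterp_release (void) { if (!interp_held) ++pairing_errors; interp_held = 0; }
static void perlinterp_acquire (void) { if (interp_held) ++pairing_errors; interp_held = 1; }

/* Placeholder for PerlIO logging: only legal while the interpreter is held. */
static void
log_line (const char *msg)
{
  (void)msg;
  if (!interp_held) ++pairing_errors;
  ++log_lines;
}

/* Placeholder for the slow library call; returns non-zero on failure. */
static int slow_library_call (int should_fail) { return should_fail ? -1 : 0; }

int
execute_sketch (int should_fail)
{
  int rows;

  perlinterp_release ();

  if (trace_level >= 2)
    {
      perlinterp_acquire (); /* temporarily re-acquire for logging */
      log_line ("calling library");
      perlinterp_release ();
    }

  if (slow_library_call (should_fail))
    goto error;

  rows = 42; /* stands in for the real row count */

  perlinterp_acquire (); /* normal exit */
  return rows;

error:
  perlinterp_acquire (); /* error exit needs its own acquire */
  return -1;
}
```

Whichever path is taken, the interpreter is held again by the time the
function returns.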
553    
554     =head3 Interlude: the various C<DBD::mysql> async mechanisms
555    
556     Here is a short discussion of the four principal ways to run
557     C<DBD::mysql> SQL queries asynchronously.
558    
559     =over 4
560    
561     =item in a separate process
562    
563     Both C<AnyEvent::DBI> and C<DBD::Gofer> (via
564     C<DBD::Gofer::Transport::corostream>) can run C<DBI> calls in a separate
565     process, and this is not limited to mysql. This has to be paid with more
566     complex management, some limitations in what can be done, and an extra
567     serialisation/deserialisation step for all data.
568    
569     =item C<DBD::mysql>'s async support
570    
571     This lets you execute the SQL query, while waiting for the results
572     via an event loop or similar mechanism. This is reasonably fast and
573     very compatible, but the disadvantages are that C<DBD::mysql> requires
574     undocumented internal functions to do this, and more importantly, this
575     only covers the actual execution phase, not the data transfer phase:
576     for statements with large results, the program blocks till all of it is
577     transferred, which can include large amounts of disk I/O.
578    
579     =item C<Coro::Mysql>
580    
581     This module actually works quite similarly to perl multicore, but uses
582     Coro threads exclusively. It shares the advantages of C<DBD::mysql>'s
583     async mode, but not, at least in theory, its disadvantages. In practice,
584     the mechanism it uses isn't undocumented, but distributions often don't
585     come with the correct header file needed to use it, and Oracle's mysql
586     has broken this mechanism multiple times (MariaDB supports it), so it's
587     actually less reliably available than C<DBD::mysql>'s async mode or perl
588     multicore.
589    
590     It also requires C<Coro>.
591    
592     =item perl multicore
593    
594     This method has all the advantages of C<Coro::Mysql> without most of its
595     disadvantages, except that it incurs higher overhead due to the extra
596     thread switching.
597    
598     =back
599    
600     Pick your poison.
601    
602    
603 root 1.11 =head1 SEE ALSO
604    
605     This document's canonical web address: L<http://perlmulticore.schmorp.de/>
606    
607     The header file you need in your XS module: L<http://perlmulticore.schmorp.de/perlmulticore.h>
608    
609     Status of CPAN modules, and pre-patched module tarballs: L<http://perlmulticore.schmorp.de/registry>
610    
611    
612 root 1.1 =head1 AUTHOR
613    
614     Marc A. Lehmann <perlmulticore@schmorp.de>
615     http://perlmulticore.schmorp.de/
616    
617     =head1 LICENSE
618    
619 root 1.4 The F<perlmulticore.h> header file itself is in the public
620 root 1.1 domain. Where this is legally not possible, or at your
621 root 1.9 option, it can be licensed under the creative commons CC0
622 root 1.1 license: L<https://creativecommons.org/publicdomain/zero/1.0/>.
623    
624 root 1.4 This document is licensed under the General Public License, version
625     3.0, or any later version.
626