/cvs/perlmulticore/perlmulticore.pod
Revision: 1.9
Committed: Sun Mar 3 10:59:48 2019 UTC (5 years, 7 months ago) by root
Branch: MAIN
Changes since 1.8: +1 -1 lines

=head1 NAME

The Perl Multicore Specification and Implementation

=head1 SYNOPSIS

   #include "perlmulticore.h"

   // in your XS function:

      perlinterp_release ();
      do_the_C_thing ();
      perlinterp_acquire ();

=head1 DESCRIPTION

This specification describes a simple mechanism for XS modules to allow
re-use of the perl interpreter for other threads while doing some lengthy
operation, such as cryptography, SQL queries, disk I/O and so on.

The design goals for this mechanism were to be simple to use, to have
extremely low overhead when not active, to have low code and data size
overhead, and to be broadly applicable.

The newest version of this document can be found at
L<http://perlmulticore.schmorp.de/>.

The newest version of the header file that implements this specification
can be downloaded from L<http://perlmulticore.schmorp.de/perlmulticore.h>.

=head2 XS? HOW DO I USE THIS FROM PERL?

This document is only about the XS-level mechanism that defines generic
callbacks - to make use of this, you need a module that provides an
implementation for these callbacks, for example
L<Coro::Multicore|http://pod.tst.eu/http://cvs.schmorp.de/Coro-Multicore/Multicore.pm>.

=head2 WHICH MODULES SUPPORT IT?

You can check L<the perl multicore registry|http://perlmulticore.schmorp.de/registry>
for a list of modules that support this specification.

=head1 HOW DO I USE THIS IN MY MODULES?

The usage is very simple - you include this header file in your XS
module. Then, before you do your lengthy operation, you release the perl
interpreter:

   perlinterp_release ();

And when you are done with your computation, you acquire it again:

   perlinterp_acquire ();

And that's it. This doesn't load any modules and consists of only a few
machine instructions when no module that takes advantage of it is loaded.

Here is a simple example, a C<flock> wrapper implemented in XS. Unlike
perl's built-in C<flock>, it allows other threads (for example, those
provided by L<Coro>) to execute, instead of blocking the whole perl
interpreter. For the sake of this example, it requires a file descriptor
instead of a handle.

   #include "perlmulticore.h" // this header file

   // and in the XS portion
   int flock (int fd, int operation)
           CODE:
           perlinterp_release ();
           RETVAL = flock (fd, operation);
           perlinterp_acquire ();
           OUTPUT:
           RETVAL

You can find more examples in the L<Case Studies> appendix.

=head2 HOW ABOUT NOT-SO LONG WORK?

Sometimes you don't know how long your code will take - in a compression
library for example, compressing a few hundred kilobytes of data can take
a while, while 50 bytes will compress so fast that even attempting to do
something else could be more costly than just doing it.

This is a very hard problem to solve. The best you can do at the moment is
to release the perl interpreter only when you think the work to be done
justifies the expense.

As a rule of thumb, if you expect to need more than a few thousand cycles,
you should release the interpreter, else you shouldn't. When in doubt,
release.

For example, in a compression library, you might want to do this:

   if (bytes_to_be_compressed > 2000) perlinterp_release ();
   do_compress (...);
   if (bytes_to_be_compressed > 2000) perlinterp_acquire ();

Make sure the if conditions are exactly the same and don't change, so you
always call acquire when you release, and vice versa.
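
One way to keep the two conditions in lockstep is to evaluate the
condition once and reuse the stored result. The following is a minimal
sketch, not code from F<perlmulticore.h>: the two macros are replaced by
counting stand-ins so the fragment compiles without perl, and
C<compress_buffer>, C<do_compress> and the C<2000> byte threshold are
purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real macros from perlmulticore.h (assumption:   */
/* they only count calls, so this sketch compiles without perl).     */
static int release_calls, acquire_calls;
#define perlinterp_release() (++release_calls)
#define perlinterp_acquire() (++acquire_calls)

/* Hypothetical worker: it modifies the length while it works, which */
/* is exactly what would break a repeated "len > 2000" test.         */
static size_t
do_compress (const char *data, size_t *len)
{
  size_t consumed = *len;

  (void)data;
  *len = 0;
  return consumed;
}

size_t
compress_buffer (const char *data, size_t len)
{
  /* decide exactly once, before the work can change anything */
  int release = len > 2000;
  size_t result;

  if (release) perlinterp_release ();
  result = do_compress (data, &len);
  if (release) perlinterp_acquire (); /* same flag, not "len > 2000" */

  assert (release_calls == acquire_calls);
  return result;
}
```

Because the second test reads the stored flag rather than C<len>, the
calls stay paired even though the worker changed C<len> in between.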

When you don't have a handy indicator, you might still do something
useful. For example, if you do some file locking with C<fcntl> and you
expect the lock to be available immediately in most cases, you could try
with C<F_SETLK> (which doesn't wait), and only release/wait/acquire when
the lock couldn't be set:

   int res = fcntl (fd, F_SETLK, &flock);

   if (res)
     {
       // error, assume lock is held by another process and do it the slow way
       perlinterp_release ();
       res = fcntl (fd, F_SETLKW, &flock);
       perlinterp_acquire ();
     }
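
A fuller version of this fragment might look as follows. This is only a
sketch under stated assumptions: C<lock_whole_file> is a hypothetical
helper, the two macros are counting stand-ins for the real ones so that
the code compiles without perl, and, as in the fragment above, any
failure of C<F_SETLK> is treated as "lock is held elsewhere".

```c
#define _POSIX_C_SOURCE 200809L /* ask for POSIX declarations */

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>  /* SEEK_SET */
#include <string.h>
#include <unistd.h>

/* Stand-ins for the real macros from perlmulticore.h (assumption). */
static int release_calls, acquire_calls;
#define perlinterp_release() (++release_calls)
#define perlinterp_acquire() (++acquire_calls)

/* Hypothetical helper: take an exclusive lock on the whole file.   */
/* Returns 0 on success and -1 on failure, like fcntl itself.       */
int
lock_whole_file (int fd)
{
  struct flock lock;
  int res;

  memset (&lock, 0, sizeof (lock));
  lock.l_type   = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start  = 0;
  lock.l_len    = 0; /* 0 means "up to the end of the file" */

  /* fast path: does not wait, so the interpreter is kept */
  res = fcntl (fd, F_SETLK, &lock);

  if (res)
    {
      /* slow path: may wait for a long time, so release first */
      perlinterp_release ();
      res = fcntl (fd, F_SETLKW, &lock);
      perlinterp_acquire ();
    }

  assert (release_calls == acquire_calls);
  return res;
}
```

In the common uncontended case, the interpreter is never released, so
the only added cost is the failed C<F_SETLK> attempt in the rare
contended case.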

=head1 THE HARD AND FAST RULES

As with everything, there are a number of rules to follow.

=over 4

=item I<Never> touch any perl data structures after calling C<perlinterp_release>.

This is possibly the most important rule of them all: anything perl is
completely off-limits after C<perlinterp_release>, until you call
C<perlinterp_acquire>, after which you can access perl stuff again.

That includes anything in the perl interpreter that you haven't proven to
be safe, both in this version of perl and in older and future versions:
global variables, local perl scalars, even if you are sure nobody accesses
them and you only try to "read" their value, and so on.

If you need to access perl things, do it before releasing the
interpreter with C<perlinterp_release>, or after acquiring it again with
C<perlinterp_acquire>.

=item I<Always> call C<perlinterp_release> and C<perlinterp_acquire> in pairs.

For each C<perlinterp_release> call there must be a C<perlinterp_acquire>
call. They don't have to be in the same function, and you can have
multiple calls to them, as long as every C<perlinterp_release> call is
followed by exactly one C<perlinterp_acquire> call.

For example, this would be fine:

   perlinterp_release ();

   if (!function_that_fails_with_0_return_value ())
     {
       perlinterp_acquire ();
       croak ("error");
       // croak doesn't return
     }

   perlinterp_acquire ();
   // do other stuff

=item I<Never> nest calls to C<perlinterp_release> and C<perlinterp_acquire>.

That simply means that after calling C<perlinterp_release>, you must
call C<perlinterp_acquire> before calling C<perlinterp_release>
again. Likewise, after C<perlinterp_acquire>, you can call
C<perlinterp_release> but not another C<perlinterp_acquire>.

=item I<Always> call C<perlinterp_release> first.

Also simple: you I<must not> call C<perlinterp_acquire> without having
called C<perlinterp_release> before.

=item I<Never> underestimate threads.

While it's easy to add parallel execution ability to your XS module, it
doesn't mean it is safe. After you release the perl interpreter, it's
perfectly possible that it will call your XS function in another thread,
even while your original function still executes. In other words: your C
code must be thread-safe, and if you use any library, that library must be
thread-safe, too.

Always assume that the code between C<perlinterp_release> and
C<perlinterp_acquire> is executed in parallel on multiple CPUs at the same
time. If your code can't cope with that, you could consider using a mutex
to only allow one such execution, which is still better than blocking
everybody else from doing anything:

   static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;

   perlinterp_release ();
   pthread_mutex_lock (&my_mutex);
   do_your_non_thread_safe_thing ();
   pthread_mutex_unlock (&my_mutex);
   perlinterp_acquire ();

=item I<Don't> get confused by having to release first.

In many real world scenarios, you acquire a resource, do something, then
release it again. Don't let this confuse you: here, you already own
the resource (the perl interpreter), so you have to I<release> it first
and I<acquire> it again later, not the other way around.

=back
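
During development, the pairing and nesting rules above can be checked
mechanically. The following sketch is not part of F<perlmulticore.h>:
C<checked_release>, C<checked_acquire> and C<example_xs_body> are
hypothetical names, and the two macros are counting stand-ins so that the
code compiles without perl. It also keeps a single global flag, so it only
suits code called from one OS thread; per-thread state would be needed
otherwise.

```c
#include <assert.h>

/* Stand-ins for the real macros from perlmulticore.h (assumption). */
static int release_calls, acquire_calls;
#define perlinterp_release() (++release_calls)
#define perlinterp_acquire() (++acquire_calls)

/* 1 while this code owns the interpreter, which is the initial state */
static int interp_held = 1;

/* Both wrappers return 0 on success and -1 on a rule violation. */
int
checked_release (void)
{
  if (!interp_held)
    return -1; /* nested release */

  interp_held = 0;
  perlinterp_release ();
  return 0;
}

int
checked_acquire (void)
{
  if (interp_held)
    return -1; /* acquire without release, or nested acquire */

  perlinterp_acquire ();
  interp_held = 1;
  return 0;
}

/* Example caller: perl data may only be touched while held. */
int
example_xs_body (void)
{
  assert (interp_held); /* safe to read perl arguments here */

  if (checked_release ())
    return -1;

  /* ... lengthy work that touches no perl data ... */

  if (checked_acquire ())
    return -1;

  assert (interp_held); /* safe to build the return value here */
  return 0;
}
```

The wrappers refuse the forbidden call instead of forwarding it, so a
violation shows up as an error return at the offending call site.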


=head1 DESIGN PRINCIPLES

This section discusses how the design goals were reached (you be the
judge), how it is implemented, and what overheads this implies.

=over 4

=item Simple to Use

All you have to do is identify the place in your existing code where you
stop touching perl stuff, do your actual work, and start touching perl
stuff again.

Then slap C<perlinterp_release ()> and C<perlinterp_acquire ()> around the
actual work code.

You have to include F<perlmulticore.h> and distribute it with your XS
code, but all these things border on the trivial.

=item Very Efficient

The definition for C<perlinterp_release> and C<perlinterp_acquire> is very
short:

   #define perlinterp_release() perl_multicore_api->pmapi_release ()
   #define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()

Both are macros that read a pointer from memory (C<perl_multicore_api>),
dereference a function pointer stored at that place, and call the
function, which takes no arguments and returns nothing.

The first call to C<perlinterp_release> will check for the presence
of any supporting module, and if none is loaded, will create a dummy
implementation where both C<pmapi_release> and C<pmapi_acquire> execute
this function:

   static void perl_multicore_nop (void) { }

So in the case of no magical module being loaded, all calls except the
first are two memory accesses and a predictable function call of an empty
function.

Of course, the overhead is much higher when these functions actually
implement anything useful, but you always get what you pay for.

With L<Coro::Multicore>, every release/acquire involves two pthread
switches, two coro thread switches, a bunch of syscalls, and sometimes
interacting with the event loop.

A dedicated thread pool such as the one L<IO::AIO> uses could reduce
these overheads, and would also reduce the dependencies (L<AnyEvent> is a
smaller and more portable dependency than L<Coro>), but it would require a
lot more work on the side of the module author wanting to support it than
this solution.

=item Low Code and Data Size Overhead

On a 64 bit system, F<perlmulticore.h> uses exactly C<8> octets (one
pointer) of your data segment, to store the C<perl_multicore_api>
pointer. In addition it creates a C<16> octet perl string to store the
function pointers in, and stores it in a hash provided by perl for this
purpose.

This is pretty much the equivalent of executing this code:

   $existing_hash{perl_multicore_api} = "1234567812345678";

And that's it, which is, I think, very little indeed.

As for code size, on my amd64 system, every call to C<perlinterp_release>
or C<perlinterp_acquire> results in a variation of the following 9-10
octet sequence:

   150>   mov    0x200f23(%rip),%rax  # <perl_multicore_api>
   157>   callq  *0x8(%rax)

The biggest part is the initialisation code, which consists of 11 lines of
typical XS code. On my system, all the code in F<perlmulticore.h> compiles
to less than 160 octets of read-only data.

=item Broad Applicability

While there are alternative ways to achieve the goal of parallel execution
with threads that might be more efficient, this mechanism was chosen
because it is very simple to retrofit existing modules with it.

The design goals for this mechanism were to be simple to use, very
efficient when not needed, to have low code and data size overhead, and
to be broadly applicable.

=back
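
The dispatch mechanism described under "Very Efficient" can be sketched
as follows. This is an illustration of the idea, not the contents of
F<perlmulticore.h>: the struct layout (just the two function pointers
named above), C<perl_multicore_init>, C<find_provider>, C<bootstrap_api>,
C<nop_api>, C<init_runs> and C<release_acquire_once> are all assumptions
made for this example, and the lookup of a supporting module is replaced
by a stub that never finds one.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: only the two function pointers named in the text. */
struct perl_multicore_api
{
  void (*pmapi_release)(void);
  void (*pmapi_acquire)(void);
};

static int init_runs; /* counts how often the slow first-call path ran */

static void perl_multicore_nop (void) { }
static void perl_multicore_init (void);

/* Until the first release, both pointers lead to the initialiser. */
static struct perl_multicore_api bootstrap_api =
  { perl_multicore_init, perl_multicore_init };

/* The dummy implementation used when no supporting module is found. */
static struct perl_multicore_api nop_api =
  { perl_multicore_nop, perl_multicore_nop };

static struct perl_multicore_api *perl_multicore_api = &bootstrap_api;

#define perlinterp_release() perl_multicore_api->pmapi_release ()
#define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()

/* Hypothetical lookup stub: pretend no supporting module is loaded. */
static struct perl_multicore_api *
find_provider (void)
{
  return NULL;
}

static void
perl_multicore_init (void)
{
  struct perl_multicore_api *provider = find_provider ();

  ++init_runs;
  perl_multicore_api = provider ? provider : &nop_api;
  perl_multicore_api->pmapi_release (); /* finish the pending release */
}

/* Runs one release/acquire pair and reports how often init has run. */
int
release_acquire_once (void)
{
  perlinterp_release ();
  perlinterp_acquire ();

  assert (perl_multicore_api != &bootstrap_api);
  return init_runs;
}
```

After the first pair, every further call is a pointer load followed by an
indirect call to the empty function, which matches the cost described
above for the case where no supporting module is loaded.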


=head1 DISABLING PERL MULTICORE AT COMPILE TIME

You can disable the complete perl multicore API by defining the
symbol C<PERL_MULTICORE_DISABLE> to C<1> (e.g. by specifying
F<-DPERL_MULTICORE_DISABLE> as compiler argument).

This will leave no traces of the API in the compiled code: suitable
"empty" C<perlinterp_release> and C<perlinterp_acquire> definitions will
be provided.

This could be added to perl's C<CPPFLAGS> when configuring perl on
platforms that do not support threading at all, for example, and would
reduce the overhead to nothing. It is by no means required, though, as the
header will compile and work just fine without any thread support.
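
The effect of such "empty" definitions can be sketched like this. The
exact definitions used by F<perlmulticore.h> are not reproduced here; the
C<do { } while (0)> form, the counting stand-ins in the C<#else> branch
and the C<lengthy_operation> function are assumptions made for
illustration only.

```c
#include <assert.h>

/* Normally passed on the command line as -DPERL_MULTICORE_DISABLE. */
#ifndef PERL_MULTICORE_DISABLE
# define PERL_MULTICORE_DISABLE 1
#endif

static int api_calls; /* counts calls that reach the (stand-in) API */

#if PERL_MULTICORE_DISABLE
/* empty definitions: the calls vanish from the compiled code */
# define perlinterp_release() do { } while (0)
# define perlinterp_acquire() do { } while (0)
#else
/* stand-ins for the real, pointer-based definitions */
# define perlinterp_release() (++api_calls)
# define perlinterp_acquire() (++api_calls)
#endif

/* Hypothetical lengthy operation: sums the integers from 1 to n. */
int
lengthy_operation (int n)
{
  int i, sum = 0;

  perlinterp_release ();
  for (i = 1; i <= n; ++i)
    sum += i;
  perlinterp_acquire ();

  /* with the API disabled, nothing may have been called */
  assert (!PERL_MULTICORE_DISABLE || api_calls == 0);
  return sum;
}
```

The calling code is identical in both configurations, so a module does
not need its own C<#ifdef> around each release/acquire pair.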


=head1 APPENDIX: CASE STUDIESX<Case Studies>

This appendix contains some case studies on how to patch existing
modules. Unless they are available on CPAN, the patched modules (including
diffs) can be found at the perl multicore repository (see L<the
perlmulticore registry|http://perlmulticore.schmorp.de/registry>).

In addition to the patches shown, the
L<perlmulticore.h|http://perlmulticore.schmorp.de/perlmulticore.h> header
must be added to the module and included in any XS or C file that uses it.


=head2 Case Study: C<Digest::MD5>

The C<Digest::MD5> module presents some unique challenges because it mixes
Perl-I/O and CPU-based processing.

So first let's identify the easy cases - set up (in C<new>) and
calculating the final digest are very fast operations and are unlikely to
profit from running them in a separate thread. That leaves the C<add>
method and the C<md5> (C<md5_hex>, C<md5_base64>) functions.

They are both very easy to update - the C<MD5Update> call
doesn't access any perl data structures, so you can slap
C<perlinterp_release>/C<perlinterp_acquire> around it:

   if (len > 8000) perlinterp_release ();
   MD5Update(context, data, len);
   if (len > 8000) perlinterp_acquire ();

This works for both C<add> and C<md5> XS functions. The C<8000> is
somewhat arbitrary.

This leaves C<addfile>, which would normally be the ideal candidate,
because it is often used on large files and needs to wait both for I/O and
the CPU. Unfortunately, it is implemented like this (only the inner loop
is shown):

   unsigned char buffer[4096];

   while ( (n = PerlIO_read(fh, buffer, sizeof(buffer))) > 0) {
       MD5Update(context, buffer, n);
   }

That is, it uses a 4KB buffer per C<MD5Update>. Putting
C<perlinterp_release>/C<perlinterp_acquire> calls around it would be way
too inefficient. Ideally, you would want to put them around the whole
loop.

Unfortunately, C<Digest::MD5> uses C<PerlIO> for the actual I/O, and
C<PerlIO> is not thread-safe. We can't even use a mutex, as we would have
to protect against all other C<PerlIO> calls.

As a compromise, we can use the C<USE_HEAP_INSTEAD_OF_STACK> option that
C<Digest::MD5> provides, which puts the buffer onto the heap, and use a
far larger buffer:

   #define USE_HEAP_INSTEAD_OF_STACK
   #define BUFFER_SIZE (1024 * 1024) // illustrative name, not from Digest::MD5

   New(0, buffer, BUFFER_SIZE, unsigned char);

   // buffer is a pointer now, so sizeof(buffer) would only be the pointer size
   while ( (n = PerlIO_read(fh, buffer, BUFFER_SIZE)) > 0) {
       if (n > 8000) perlinterp_release ();
       MD5Update(context, buffer, n);
       if (n > 8000) perlinterp_acquire ();
   }

This will unfortunately still block on I/O, and allocate a large block of
memory, but it is better than nothing.


=head2 Case Study: C<DBD::mysql>

Another example would be to modify C<DBD::mysql> to allow other
threads to execute while executing SQL queries.

The code that needs to be patched is not actually in an F<.xs>
file, but in the F<dbdimp.c> file, which is included in an XS file.

While there are many calls, the most important ones are the statement
execute calls. There are only two in F<dbdimp.c>, one call in
C<mysql_st_internal_execute41>, and one in C<dbd_st_execute>, both calling
the undocumented internal C<mysql_st_internal_execute> function.

The difference is that the former is used with mysql 4.1+ and prepared
statements.

The call in C<dbd_st_execute> is easy, as it does all the important work
and doesn't access any perl data structures (I checked C<DBIc_NUM_PARAMS>
manually to make sure):

   perlinterp_release ();
   imp_sth->row_num= mysql_st_internal_execute(
                                               sth,
                                               *statement,
                                               NULL,
                                               DBIc_NUM_PARAMS(imp_sth),
                                               imp_sth->params,
                                               &imp_sth->result,
                                               imp_dbh->pmysql,
                                               imp_sth->use_mysql_use_result
                                              );
   perlinterp_acquire ();

Despite the name, C<mysql_st_internal_execute41> isn't actually from
F<libmysqlclient>, but a long function in F<dbdimp.c>. Here is an abridged
version, with C<perlinterp_release>/C<perlinterp_acquire> calls:

   int i;
   enum enum_field_types enum_type;
   dTHX;
   int execute_retval;
   my_ulonglong rows=0;
   D_imp_xxh(sth);

   if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                   "\t-> mysql_st_internal_execute41\n");

   perlinterp_release ();

   if (num_params > 0 && !(*has_been_bound))
   {
     if (mysql_stmt_bind_param(stmt,bind))
       goto error;
   }

   if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
   {
     perlinterp_acquire ();
     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                   "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
                   num_params);
     perlinterp_release ();
   }

   execute_retval= mysql_stmt_execute(stmt);

   if (execute_retval)
     goto error;

   /*
     This statement does not return a result set (INSERT, UPDATE...)
   */
   if (!(*result= mysql_stmt_result_metadata(stmt)))
   {
     if (mysql_stmt_errno(stmt))
       goto error;

     rows= mysql_stmt_affected_rows(stmt);
   }
   /*
     This statement returns a result set (SELECT...)
   */
   else
   {
     for (i = mysql_stmt_field_count(stmt) - 1; i >=0; --i) {
       enum_type = mysql_to_perl_type(stmt->fields[i].type);
       if (enum_type != MYSQL_TYPE_DOUBLE && enum_type != MYSQL_TYPE_LONG)
       {
         /* mysql_stmt_store_result to update MYSQL_FIELD->max_length */
         my_bool on = 1;
         mysql_stmt_attr_set(stmt, STMT_ATTR_UPDATE_MAX_LENGTH, &on);
         break;
       }
     }
     /* Get the total rows affected and return */
     if (mysql_stmt_store_result(stmt))
       goto error;
     else
       rows= mysql_stmt_num_rows(stmt);
   }
   perlinterp_acquire ();
   if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                   "\t<- mysql_internal_execute_41 returning %d rows\n",
                   (int) rows);
   return(rows);

   error:
     if (*result)
     {
       mysql_free_result(*result);
       *result= 0;
     }
     perlinterp_acquire ();
     if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
       PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                     "     errno %d err message %s\n",
                     mysql_stmt_errno(stmt),
                     mysql_stmt_error(stmt));

So C<perlinterp_release> is called after some logging, but before the
C<mysql_free_result> call.

To make things more interesting, the function has multiple calls to
C<PerlIO> to log things, none of which are thread-safe, so they need to be
surrounded with C<perlinterp_acquire> and C<perlinterp_release> calls
to temporarily re-acquire the interpreter. This is slow, but logging is
normally off:

   if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
   {
     perlinterp_acquire ();
     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                   "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
                   num_params);
     perlinterp_release ();
   }

The function has a normal exit and a separate error exit, each of which
needs its own C<perlinterp_acquire> call. First the normal function exit:

   perlinterp_acquire ();
   if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
     PerlIO_printf(DBIc_LOGPIO(imp_xxh),
                   "\t<- mysql_internal_execute_41 returning %d rows\n",
                   (int) rows);
   return(rows);

And this is the error exit:

   error:
     if (*result)
     {
       mysql_free_result(*result);
       *result= 0;
     }
     perlinterp_acquire ();

This is enough to run DBI's C<execute> calls in separate threads.
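
Reduced to its control flow, the patched function follows the pattern
sketched below: one release at the top, a temporary re-acquire around
logging, and one acquire on each exit path. This is a simplified
illustration rather than C<DBD::mysql> code: C<execute_statement>,
C<slow_library_call> and C<trace_level> are hypothetical stand-ins,
C<fprintf> takes the place of C<PerlIO_printf>, and the two macros only
count calls so that the sketch compiles without perl.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the real macros from perlmulticore.h (assumption). */
static int release_calls, acquire_calls;
#define perlinterp_release() (++release_calls)
#define perlinterp_acquire() (++acquire_calls)

static int trace_level; /* stand-in for DBIc_TRACE_LEVEL */

/* Hypothetical stand-in for a blocking client library call; it    */
/* returns nonzero on failure, like mysql_stmt_execute.            */
static int
slow_library_call (int should_fail)
{
  return should_fail;
}

/* Returns the number of rows, or -1 on error. */
int
execute_statement (int should_fail)
{
  int rows = 0;

  perlinterp_release ();

  if (trace_level >= 2)
    {
      /* logging needs the interpreter: re-acquire it briefly */
      perlinterp_acquire ();
      fprintf (stderr, "calling slow_library_call\n");
      perlinterp_release ();
    }

  if (slow_library_call (should_fail))
    goto error;

  rows = 42; /* pretend the statement returned 42 rows */

  perlinterp_acquire (); /* normal exit */
  assert (release_calls == acquire_calls);
  return rows;

error:
  perlinterp_acquire (); /* the error exit needs its own acquire */
  assert (release_calls == acquire_calls);
  return -1;
}
```

Whichever path is taken, the number of release and acquire calls is the
same when the function returns, which is what the pairing rule demands.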

=head3 Interlude: the various C<DBD::mysql> async mechanisms

Here is a short discussion of the four principal ways to run
C<DBD::mysql> SQL queries asynchronously.

=over 4

=item in a separate process

Both C<AnyEvent::DBI> and C<DBD::Gofer> (via
C<DBD::Gofer::Transport::corostream>) can run C<DBI> calls in a separate
process, and this is not limited to mysql. This is paid for with more
complex management, some limitations in what can be done, and an extra
serialisation/deserialisation step for all data.

=item C<DBD::mysql>'s async support

This lets you execute the SQL query while waiting for the results
via an event loop or similar mechanism. This is reasonably fast and
very compatible, but the disadvantages are that C<DBD::mysql> requires
undocumented internal functions to do this, and more importantly, that
this only covers the actual execution phase, not the data transfer phase:
for statements with large results, the program blocks until all of it is
transferred, which can include large amounts of disk I/O.

=item C<Coro::Mysql>

This module actually works quite similarly to perl multicore, but uses
Coro threads exclusively. It shares the advantages of C<DBD::mysql>'s
async mode, but not, at least in theory, its disadvantages. In practice,
the mechanism it uses isn't undocumented, but distributions often don't
come with the correct header file needed to use it, and Oracle's mysql
has broken this mechanism multiple times (mariadb supports it), so it's
actually less reliably available than C<DBD::mysql>'s async mode or perl
multicore.

It also requires C<Coro>.

=item perl multicore

This method has all the advantages of C<Coro::Mysql> without most
disadvantages, except that it incurs higher overhead due to the extra
thread switching.

=back

Pick your poison.


=head1 AUTHOR

   Marc A. Lehmann <perlmulticore@schmorp.de>
   http://perlmulticore.schmorp.de/

=head1 LICENSE

The F<perlmulticore.h> header file itself is in the public
domain. Where this is legally not possible, or at your
option, it can be licensed under the Creative Commons CC0
license: L<https://creativecommons.org/publicdomain/zero/1.0/>.

This document is licensed under the General Public License, version
3.0, or any later version.