/cvs/perlmulticore/perlmulticore.pod
Revision: 1.16
Committed: Mon Mar 4 15:41:29 2019 UTC (5 years, 6 months ago) by root
Branch: MAIN
CVS Tags: HEAD
Changes since 1.15: +0 -41 lines

1 =head1 NAME
2
3 The Perl Multicore Specification and Implementation
4
5 =head1 SYNOPSIS
6
7 #include "perlmulticore.h"
8
9 // in your XS function:
10
11 perlinterp_release ();
12 do_the_C_thing ();
13 perlinterp_acquire ();
14
15 =head1 DESCRIPTION
16
17 This specification describes a simple mechanism for XS modules to allow
18 re-use of the perl interpreter for other threads while doing some lengthy
19 operation, such as cryptography, SQL queries, disk I/O and so on.
20
21 This is essentially the same mechanism that practically all other
22 scripting languages (e.g. python) use when implementing real
23 threads.
24
25 The design goals for this mechanism were to be simple to use, to have
26 extremely low overhead when not active, to have low code and data size
27 overhead, and to be broadly applicable.
28
29 The newest version of this document can be found at
30 L<http://perlmulticore.schmorp.de/>.
31
32 The newest version of the header file that implements this specification
33 can be downloaded from L<http://perlmulticore.schmorp.de/perlmulticore.h>.
34
35 =head2 XS? HOW DO I USE THIS FROM PERL?
36
37 This document is only about the XS-level mechanism that defines generic
38 callbacks - to make use of this, you need a module that provides an
39 implementation for these callbacks, for example
40 L<Coro::Multicore|http://pod.tst.eu/http://cvs.schmorp.de/Coro-Multicore/Multicore.pm>.
41
42 =head2 WHICH MODULES SUPPORT IT?
43
44 You can check L<the perl multicore registry|http://perlmulticore.schmorp.de/registry>
45 for a list of modules that support this specification.
46
47 =head1 HOW DO I USE THIS IN MY MODULES?
48
49 The usage is very simple - you include this header file in your XS module. Then, before you
50 do your lengthy operation, you release the perl interpreter:
51
52 perlinterp_release ();
53
54 And when you are done with your computation, you acquire it again:
55
56 perlinterp_acquire ();
57
58 And that's it. This doesn't load any modules and consists of only a few
59 machine instructions when no module to take advantage of it is loaded.
60
61 Here is a simple example, a C<flock> wrapper implemented in XS. Unlike
62 perl's built-in C<flock>, it allows other threads (for example, those
63 provided by L<Coro>) to execute, instead of blocking the whole perl
64 interpreter. For the sake of this example, it requires a file descriptor
65 instead of a handle.
66
67 #include "perlmulticore.h" // this header file
68
69 // and in the XS portion
70 int flock (int fd, int operation)
71 CODE:
72 perlinterp_release ();
73 RETVAL = flock (fd, operation);
74 perlinterp_acquire ();
75 OUTPUT:
76 RETVAL
77
78 You can find more examples in the L<Case Studies> appendix.
79
80 =head2 HOW ABOUT NOT-SO LONG WORK?
81
82 Sometimes you don't know how long your code will take - in a compression
83 library for example, compressing a few hundred kilobytes of data can take
84 a while, while 50 bytes will compress so fast that even attempting to do
85 something else could be more costly than just doing it.
86
87 This is a very hard problem to solve. The best you can do at the moment is
88 to release the perl interpreter only when you think the work to be done
89 justifies the expense.
90
91 As a rule of thumb, if you expect to need more than a few thousand cycles,
92 you should release the interpreter, else you shouldn't. When in doubt,
93 release.
94
95 For example, in a compression library, you might want to do this:
96
97 if (bytes_to_be_compressed > 2000) perlinterp_release ();
98 do_compress (...);
99 if (bytes_to_be_compressed > 2000) perlinterp_acquire ();
100
101 Make sure the if conditions are exactly the same and don't change, so you
102 always call acquire when you release, and vice versa.
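
One way to keep the two conditions identical is to evaluate the condition once and reuse the result. The following is a minimal, self-contained sketch under stated assumptions: the two C<perlinterp_*> functions are counting stand-ins for the real macros from F<perlmulticore.h> (so it compiles without perl), and C<do_compress_sketch>, C<compress_sketch> and C<RELEASE_THRESHOLD> are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* stand-ins for the real macros, so the sketch compiles without perl */
static int release_calls, acquire_calls;
static void perlinterp_release (void) { ++release_calls; }
static void perlinterp_acquire (void) { ++acquire_calls; }

enum { RELEASE_THRESHOLD = 2000 }; /* hypothetical cut-off, in bytes */

/* hypothetical workload: "compresses" by halving the length in place */
static void
do_compress_sketch (size_t *len)
{
  *len /= 2;
}

static size_t
compress_sketch (size_t bytes_to_be_compressed)
{
  /* decide once: the work below changes the length, so testing it again
     afterwards could skip the matching acquire */
  int release = bytes_to_be_compressed > RELEASE_THRESHOLD;

  if (release) perlinterp_release ();
  do_compress_sketch (&bytes_to_be_compressed);
  if (release) perlinterp_acquire ();

  return bytes_to_be_compressed;
}
```

With an input of 3000 bytes the length drops below the threshold during the work, which is exactly the case where re-evaluating the condition would leave the interpreter released.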
103
104 When you don't have a handy indicator, you might still do something
105 useful. For example, if you do some file locking with C<fcntl> and you
106 expect the lock to be available immediately in most cases, you could try
107 with C<F_SETLK> (which doesn't wait), and only release/wait/acquire when
108 the lock couldn't be set:
109
110 int res = fcntl (fd, F_SETLK, &flock);
111
112 if (res)
113 {
114 // error, assume lock is held by another process and do it the slow way
115 perlinterp_release ();
116 res = fcntl (fd, F_SETLKW, &flock);
117 perlinterp_acquire ();
118 }
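
The fragment above can be fleshed out into a complete function. This is only a sketch under stated assumptions: the C<perlinterp_*> functions are counting stand-ins for the real macros, C<lock_whole_file> is a hypothetical name, and the lock covers the whole file. It also only falls back to the slow path when C<errno> indicates that the lock is held elsewhere (C<EACCES> or C<EAGAIN>), so that other errors are reported immediately.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* stand-ins for the real macros, so the sketch compiles without perl */
static int release_calls, acquire_calls;
static void perlinterp_release (void) { ++release_calls; }
static void perlinterp_acquire (void) { ++acquire_calls; }

/* hypothetical helper: take an exclusive lock on the whole file */
static int
lock_whole_file (int fd)
{
  struct flock lck = { 0 };
  int res;

  lck.l_type   = F_WRLCK;
  lck.l_whence = SEEK_SET;
  lck.l_start  = 0;
  lck.l_len    = 0; /* 0 means "until end of file" */

  /* fast path: try without waiting, keeping the interpreter */
  res = fcntl (fd, F_SETLK, &lck);

  if (res == -1 && (errno == EACCES || errno == EAGAIN))
    {
      /* lock is held elsewhere: wait for it with the interpreter released */
      perlinterp_release ();
      res = fcntl (fd, F_SETLKW, &lck);
      perlinterp_acquire ();
    }

  return res;
}
```

When the lock is free, the function returns without ever calling C<perlinterp_release>, so the common case pays for one extra system call at most.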
119
120 =head1 THE HARD AND FAST RULES
121
122 As with everything, there are a number of rules to follow.
123
124 =over 4
125
126 =item I<Never> touch any perl data structures after calling C<perlinterp_release>.
127
128 Possibly the most important rule of them all: anything perl is
129 completely off-limits after C<perlinterp_release>, until you call
130 C<perlinterp_acquire>, after which you can access perl stuff again.
131
132 That includes anything in the perl interpreter that you didn't prove to be
133 safe, and didn't prove to be safe in older and future versions of perl:
134 global variables, local perl scalars, even if you are sure nobody accesses
135 them and you only try to "read" their value, and so on.
136
137 If you need to access perl things, do it before releasing the
138 interpreter with C<perlinterp_release>, or after acquiring it again with
139 C<perlinterp_acquire>.
140
141 =item I<Always> call C<perlinterp_release> and C<perlinterp_acquire> in pairs.
142
143 For each C<perlinterp_release> call there must be a C<perlinterp_acquire>
144 call. They don't have to be in the same function, and you can have
145 multiple calls to them, as long as every C<perlinterp_release> call is
146 followed by exactly one C<perlinterp_acquire> call.
147
148 For example, this would be fine:
149
150 perlinterp_release ();
151
152 if (!function_that_fails_with_0_return_value ())
153 {
154 perlinterp_acquire ();
155 croak ("error");
156 // croak doesn't return
157 }
158
159 perlinterp_acquire ();
160 // do other stuff
161
162 =item I<Never> nest calls to C<perlinterp_release> and C<perlinterp_acquire>.
163
164 That simply means that after calling C<perlinterp_release>, you must
165 call C<perlinterp_acquire> before calling C<perlinterp_release>
166 again. Likewise, after C<perlinterp_acquire>, you can call
167 C<perlinterp_release> but not another C<perlinterp_acquire>.
168
169 =item I<Always> call C<perlinterp_release> first.
170
171 Also simple: you I<must not> call C<perlinterp_acquire> without having
172 called C<perlinterp_release> before.
173
174 =item I<Never> underestimate threads.
175
176 While it's easy to add parallel execution ability to your XS module, it
177 doesn't mean it is safe. After you release the perl interpreter, it's
178 perfectly possible that it will call your XS function in another thread,
179 even while your original function still executes. In other words: your C
180 code must be thread safe, and if you use any library, that library must be
181 thread-safe, too.
182
183 Always assume that the code between C<perlinterp_release> and
184 C<perlinterp_acquire> is executed in parallel on multiple CPUs at the same
185 time. If your code can't cope with that, you could consider using a mutex
186 to only allow one such execution, which is still better than blocking
187 everybody else from doing anything:
188
189 static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
190
191 perlinterp_release ();
192 pthread_mutex_lock (&my_mutex);
193 do_your_non_thread_safe_thing ();
194 pthread_mutex_unlock (&my_mutex);
195 perlinterp_acquire ();
196
197 =item I<Don't> get confused by having to release first.
198
199 In many real world scenarios, you acquire a resource, do something, then
200 release it again. Don't let this confuse you: here, you already own
201 the resource (the perl interpreter) so you have to I<release> first, and
202 I<acquire> it again later, not the other way around.
203
204 =back
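
The pairing, nesting and ordering rules above can be checked mechanically while developing. The following is a debugging sketch, not part of F<perlmulticore.h>: it assumes you temporarily replace the real macros with checking versions, and the names C<checked_release>, C<checked_acquire>, C<interp_state> and C<work_with_error_exit> are hypothetical. It tracks a single flag only, so it catches rule violations within one thread of control, not data races.

```c
#include <assert.h>

/* who owns the (pretend) interpreter right now */
enum interp_state { INTERP_OWNED, INTERP_RELEASED };
static enum interp_state interp_state = INTERP_OWNED;

static void
checked_release (void)
{
  assert (interp_state == INTERP_OWNED);    /* no nested release */
  interp_state = INTERP_RELEASED;
}

static void
checked_acquire (void)
{
  assert (interp_state == INTERP_RELEASED); /* release must come first */
  interp_state = INTERP_OWNED;
}

/* debugging replacements for the real macros */
#define perlinterp_release() checked_release ()
#define perlinterp_acquire() checked_acquire ()

/* hypothetical function with two exits, each with its own acquire */
static int
work_with_error_exit (int fail)
{
  perlinterp_release ();

  if (fail)
    {
      perlinterp_acquire ();
      return -1; /* the real code would croak here */
    }

  perlinterp_acquire ();
  return 0;
}
```

Compiling with C<NDEBUG> defined turns the checks into no-ops, so the checking versions are only useful in debug builds.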
205
206
207 =head1 DESIGN PRINCIPLES
208
209 This section discusses how the design goals were reached (you be the
210 judge), how it is implemented, and what overheads this implies.
211
212 =over 4
213
214 =item Simple to Use
215
216 All you have to do is identify the place in your existing code where you
217 stop touching perl stuff, do your actual work, and start touching perl
218 stuff again.
219
220 Then slap C<perlinterp_release ()> and C<perlinterp_acquire ()> around the
221 actual work code.
222
223 You have to include F<perlmulticore.h> and distribute it with your XS
224 code, but all these things border on the trivial.
225
226 =item Very Efficient
227
228 The definitions of C<perlinterp_release> and C<perlinterp_acquire> are very
229 short:
230
231 #define perlinterp_release() perl_multicore_api->pmapi_release ()
232 #define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()
233
234 Both are macros that read a pointer from memory (perl_multicore_api),
235 dereference a function pointer stored at that place, and call the
236 function, which takes no arguments and returns nothing.
237
238 The first call to C<perlinterp_release> will check for the presence
239 of any supporting module, and if none is loaded, will create a dummy
240 implementation where both C<pmapi_release> and C<pmapi_acquire> execute
241 this function:
242
243 static void perl_multicore_nop (void) { }
244
245 So in the case of no magical module being loaded, all calls except the
246 first are two memory accesses and a predictable function call of an empty
247 function.
248
249 Of course, the overhead is much higher when these functions actually
250 implement anything useful, but you always get what you pay for.
251
252 With L<Coro::Multicore>, every release/acquire involves two pthread
253 switches, two coro thread switches, a bunch of syscalls, and sometimes
254 interacting with the event loop.
255
256 A dedicated thread pool such as the one L<IO::AIO> uses could reduce
257 these overheads, and would also reduce the dependencies (L<AnyEvent> is a
258 smaller and more portable dependency than L<Coro>), but it would require a
259 lot more work on the side of the module author wanting to support it than
260 this solution.
261
262 =item Low Code and Data Size Overhead
263
264 On a 64 bit system, F<perlmulticore.h> uses exactly C<8> octets (one
265 pointer) of your data segment, to store the C<perl_multicore_api>
266 pointer. In addition it creates a C<16> octet perl string to store the
267 function pointers in, and stores it in a hash provided by perl for this
268 purpose.
269
270 This is pretty much the equivalent of executing this code:
271
272 $existing_hash{perl_multicore_api} = "1234567812345678";
273
274 And that's it, which is, I think, very little indeed.
275
276 As for code size and speed, on my amd64 system, every call to
277 C<perlinterp_release> or C<perlinterp_acquire> results in a variation of
278 the following 9-10 octet sequence which is easy to predict for modern
279 CPUs, as the function pointer is constant after initialisation:
280
281 150> mov 0x200f23(%rip),%rax # <perl_multicore_api>
282 157> callq *0x8(%rax)
283
284 The actual function being called when no backend is installed or enabled
285 looks like this:
286
287 1310> retq
288
289 The biggest part is the initialisation code, which consists of 11 lines of
290 typical XS code. On my system, all the code in F<perlmulticore.h> compiles
291 to less than 160 octets of read-only data.
292
293 =item Broad Applicability
294
295 While there are alternative ways to achieve the goal of parallel execution
296 with threads that might be more efficient, this mechanism was chosen
297 because it is very simple to retrofit existing modules with it.
298
299 The design goals for this mechanism were to be simple to use, to be very
300 efficient when not needed, to have low code and data size overhead, and
301 to be broadly applicable.
302
303 =back
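
The dispatch scheme described under "Very Efficient" can be modelled in a few lines of plain C. This is a simplified sketch and not the contents of F<perlmulticore.h>: all names (C<multicore_api>, C<api_first_release>, C<registered_backend>, C<sketch_release>, C<sketch_acquire>) are hypothetical, and the lookup for a supporting module is modelled by a plain pointer instead of a hash provided by perl.

```c
#include <assert.h>
#include <stddef.h>

/* table of two function pointers, as described in the text */
struct multicore_api
{
  void (*release) (void);
  void (*acquire) (void);
};

static void api_nop (void) { }
static void api_first_release (void);

/* dummy implementation used when no backend is loaded */
static struct multicore_api api_dummy = { api_nop, api_nop };
/* initial table: the first release triggers initialisation */
static struct multicore_api api_init  = { api_first_release, api_nop };

static struct multicore_api *api = &api_init;
static struct multicore_api *registered_backend; /* NULL: none loaded */
static int init_runs;

static void
api_first_release (void)
{
  ++init_runs;
  /* pick the backend if there is one, else the dummy, then forward */
  api = registered_backend ? registered_backend : &api_dummy;
  api->release ();
}

/* each call: load the pointer, load the function pointer, call it */
#define sketch_release() api->release ()
#define sketch_acquire() api->acquire ()
```

After the first C<sketch_release ()> the pointer no longer refers to the initial table, so every later call goes straight to the empty function (or to the backend, if one was registered).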
304
305
306 =head1 DISABLING PERL MULTICORE AT COMPILE TIME
307
308 You can disable the complete perl multicore API by defining the
309 symbol C<PERL_MULTICORE_DISABLE> to C<1> (e.g. by specifying
310 F<-DPERL_MULTICORE_DISABLE> as compiler argument).
311
312 This will leave no traces of the API in the compiled code; suitable
313 "empty" C<perlinterp_release> and C<perlinterp_acquire> definitions will be provided.
314
315 This could be added to perl's C<CPPFLAGS> when configuring perl on
316 platforms that do not support threading at all, for example, and would
317 reduce the overhead to nothing. It is by no means required, though, as the
318 header will compile and work just fine without any thread support.
319
320
321 =head1 APPENDIX: CASE STUDIESX<Case Studies>
322
323 This appendix contains some case studies on how to patch existing
324 modules. Unless they are available on CPAN, the patched modules (including
325 diffs) can be found at the perl multicore repository (see L<the
326 perlmulticore registry|http://perlmulticore.schmorp.de/registry>).
327
328 In addition to the patches shown, the
329 L<perlmulticore.h|http://perlmulticore.schmorp.de/perlmulticore.h> header
330 must be added to the module and included in any XS or C file that uses it.
331
332
333 =head2 Case Study: C<Digest::MD5>
334
335 The C<Digest::MD5> module presents some unique challenges because it mixes
336 Perl-I/O and CPU-based processing.
337
338 So first let's identify the easy cases - set up (in C<new>) and
339 calculating the final digest are very fast operations and are unlikely
340 to profit from running in a separate thread. That leaves the C<add>
341 method and the C<md5> (C<md5_hex>, C<md5_base64>) functions.
342
343 They are both very easy to update - the C<MD5Update> call
344 doesn't access any perl data structures, so you can slap
345 C<perlinterp_release>/C<perlinterp_acquire> around it:
346
347 if (len > 8000) perlinterp_release ();
348 MD5Update(context, data, len);
349 if (len > 8000) perlinterp_acquire ();
350
351 This works for both C<add> and C<md5> XS functions. The C<8000> is
352 somewhat arbitrary.
353
354 This leaves C<addfile>, which would normally be the ideal candidate,
355 because it is often used on large files and needs to wait both for I/O and
356 the CPU. Unfortunately, it is implemented like this (only the inner loop
357 is shown):
358
359 unsigned char buffer[4096];
360
361 while ( (n = PerlIO_read(fh, buffer, sizeof(buffer))) > 0) {
362 MD5Update(context, buffer, n);
363 }
364
365 That is, it uses a 4KB buffer per C<MD5Update>. Putting
366 C<perlinterp_release>/C<perlinterp_acquire> calls around it would be way
367 too inefficient. Ideally, you would want to put them around the whole
368 loop.
369
370 Unfortunately, C<Digest::MD5> uses C<PerlIO> for the actual I/O, and
371 C<PerlIO> is not thread-safe. We can't even use a mutex, as we would have
372 to protect against all other C<PerlIO> calls.
373
374 As a compromise, we can use the C<USE_HEAP_INSTEAD_OF_STACK> option that
375 C<Digest::MD5> provides, which puts the buffer onto the heap, and use a
376 far larger buffer:
377
378 #define USE_HEAP_INSTEAD_OF_STACK
379
380 New(0, buffer, 1024 * 1024, unsigned char);
381 /* buffer is a pointer here, so sizeof(buffer) would only be the pointer size */
382 while ( (n = PerlIO_read(fh, buffer, 1024 * 1024)) > 0) {
383 if (n > 8000) perlinterp_release ();
384 MD5Update(context, buffer, n);
385 if (n > 8000) perlinterp_acquire ();
386 }
387
388 This will unfortunately still block on I/O, and allocate a large block of
389 memory, but it is better than nothing.
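
The shape of this loop can be tried outside of perl. The following sketch makes several assumptions: it uses stdio's C<fread> instead of C<PerlIO_read>, a trivial byte sum (C<digest_update>) instead of C<MD5Update>, counting stand-ins for the C<perlinterp_*> macros, and the hypothetical names C<CHUNK_SIZE>, C<RELEASE_THRESHOLD> and C<digest_stream>. It mainly illustrates that a heap buffer needs an explicit size constant, because C<sizeof> applied to the pointer does not give the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* stand-ins for the real macros, so the sketch compiles without perl */
static int release_calls, acquire_calls;
static void perlinterp_release (void) { ++release_calls; }
static void perlinterp_acquire (void) { ++acquire_calls; }

enum { CHUNK_SIZE = 1024 * 1024, RELEASE_THRESHOLD = 8000 };

/* hypothetical stand-in for MD5Update: just sums the bytes */
static void
digest_update (unsigned long *ctx, const unsigned char *data, size_t len)
{
  while (len--)
    *ctx += *data++;
}

static int
digest_stream (FILE *fh, unsigned long *ctx)
{
  unsigned char *buffer = malloc (CHUNK_SIZE);
  size_t n;

  if (!buffer)
    return -1;

  /* pass CHUNK_SIZE explicitly: sizeof (buffer) is the pointer size */
  while ((n = fread (buffer, 1, CHUNK_SIZE, fh)) > 0)
    {
      if (n > RELEASE_THRESHOLD) perlinterp_release ();
      digest_update (ctx, buffer, n);
      if (n > RELEASE_THRESHOLD) perlinterp_acquire ();
    }

  free (buffer);
  return ferror (fh) ? -1 : 0;
}
```

As in the patched module, the read itself still happens with the interpreter held; only the computation on each chunk runs with it released.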
390
391
392 =head2 Case Study: C<DBD::mysql>
393
394 Another example would be to modify C<DBD::mysql> to allow other
395 threads to execute while executing SQL queries.
396
397 The actual code that needs to be patched is not actually in an F<.xs>
398 file, but in the F<dbdimp.c> file, which is included in an XS file.
399
400 While there are many calls, the most important ones are the statement
401 execute calls. There are only two in F<dbdimp.c>, one call in
402 C<mysql_st_internal_execute41>, and one in C<dbd_st_execute>, both calling
403 the undocumented internal C<mysql_st_internal_execute> function.
404
405 The difference is that the former is used with mysql 4.1+ and prepared
406 statements.
407
408 The call in C<dbd_st_execute> is easy, as it does all the important work
409 and doesn't access any perl data structures (I checked C<DBIc_NUM_PARAMS>
410 manually to make sure):
411
412 perlinterp_release ();
413 imp_sth->row_num= mysql_st_internal_execute(
414 sth,
415 *statement,
416 NULL,
417 DBIc_NUM_PARAMS(imp_sth),
418 imp_sth->params,
419 &imp_sth->result,
420 imp_dbh->pmysql,
421 imp_sth->use_mysql_use_result
422 );
423 perlinterp_acquire ();
424
425 Despite the name, C<mysql_st_internal_execute41> isn't actually from
426 F<libmysqlclient>, but a long function in F<dbdimp.c>. Here is an abridged version, with
427 C<perlinterp_release>/C<perlinterp_acquire> calls:
428
429 int i;
430 enum enum_field_types enum_type;
431 dTHX;
432 int execute_retval;
433 my_ulonglong rows=0;
434 D_imp_xxh(sth);
435
436 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
437 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
438 "\t-> mysql_st_internal_execute41\n");
439
440 perlinterp_release ();
441
442 if (num_params > 0 && !(*has_been_bound))
443 {
444 if (mysql_stmt_bind_param(stmt,bind))
445 goto error;
446 }
447
448 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
449 {
450 perlinterp_acquire ();
451 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
452 "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
453 num_params);
454 perlinterp_release ();
455 }
456
457
458 execute_retval= mysql_stmt_execute(stmt);
459
460 if (execute_retval)
461 goto error;
462
463 /*
464 This statement does not return a result set (INSERT, UPDATE...)
465 */
466 if (!(*result= mysql_stmt_result_metadata(stmt)))
467 {
468 if (mysql_stmt_errno(stmt))
469 goto error;
470
471 rows= mysql_stmt_affected_rows(stmt);
472 }
473 /*
474 This statement returns a result set (SELECT...)
475 */
476 else
477 {
478 for (i = mysql_stmt_field_count(stmt) - 1; i >=0; --i) {
479 enum_type = mysql_to_perl_type(stmt->fields[i].type);
480 if (enum_type != MYSQL_TYPE_DOUBLE && enum_type != MYSQL_TYPE_LONG)
481 {
482 /* mysql_stmt_store_result to update MYSQL_FIELD->max_length */
483 my_bool on = 1;
484 mysql_stmt_attr_set(stmt, STMT_ATTR_UPDATE_MAX_LENGTH, &on);
485 break;
486 }
487 }
488 /* Get the total rows affected and return */
489 if (mysql_stmt_store_result(stmt))
490 goto error;
491 else
492 rows= mysql_stmt_num_rows(stmt);
493 }
494 perlinterp_acquire ();
495 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
496 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
497 "\t<- mysql_internal_execute_41 returning %d rows\n",
498 (int) rows);
499 return(rows);
500
501 error:
502 if (*result)
503 {
504 mysql_free_result(*result);
505 *result= 0;
506 }
507 perlinterp_acquire ();
508 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
509 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
510 " errno %d err message %s\n",
511 mysql_stmt_errno(stmt),
512 mysql_stmt_error(stmt));
513
514 So C<perlinterp_release> is called after some logging, but before the
515 C<mysql_stmt_bind_param> call.
516
517 To make things more interesting, the function has multiple calls to
518 C<PerlIO> to log things, none of which are thread-safe, and which need to be
519 surrounded with C<perlinterp_acquire> and C<perlinterp_release> calls
520 to temporarily re-acquire the interpreter. This is slow, but logging is
521 normally off:
522
523 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
524 {
525 perlinterp_acquire ();
526 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
527 "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
528 num_params);
529 perlinterp_release ();
530 }
531
532 The function has a normal exit and a separate error exit, each of which needs its own
533 C<perlinterp_acquire> call. First the normal function exit:
534
535 perlinterp_acquire ();
536 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
537 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
538 "\t<- mysql_internal_execute_41 returning %d rows\n",
539 (int) rows);
540 return(rows);
541
542 And this is the error exit:
543
544 error:
545 if (*result)
546 {
547 mysql_free_result(*result);
548 *result= 0;
549 }
550 perlinterp_acquire ();
551
552 This is enough to run DBI's C<execute> calls in separate threads.
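
The overall control flow of the patched function (release once, temporarily re-acquire for logging, one acquire per exit) can be reduced to a small model. This is a sketch under stated assumptions, not C<DBD::mysql> code: C<WITH_INTERP>, C<trace_log> and C<execute_sketch> are hypothetical names, the C<perlinterp_*> functions are checking stand-ins for the real macros, and the row count of 42 is arbitrary.

```c
#include <assert.h>

/* checking stand-ins: "released" is 1 while the interpreter is given up */
static int released;
static void perlinterp_release (void) { assert (!released); released = 1; }
static void perlinterp_acquire (void) { assert (released); released = 0; }

/* hypothetical helper: run one statement that needs the interpreter */
#define WITH_INTERP(stmt) \
  do { perlinterp_acquire (); stmt; perlinterp_release (); } while (0)

/* stand-in for PerlIO logging: must only run while we own the interpreter */
static int log_lines;
static void
trace_log (const char *msg)
{
  (void)msg;
  assert (!released);
  ++log_lines;
}

static int
execute_sketch (int trace, int fail)
{
  int rows = 0;

  if (trace) trace_log ("-> execute_sketch");

  perlinterp_release ();

  /* logging in the middle of the work: re-acquire temporarily */
  if (trace) WITH_INTERP (trace_log ("calling backend"));

  if (fail)
    goto error;

  rows = 42; /* arbitrary result of the pretend query */

  perlinterp_acquire (); /* normal exit */
  if (trace) trace_log ("<- execute_sketch");
  return rows;

error:
  perlinterp_acquire (); /* error exit needs its own acquire */
  if (trace) trace_log ("error");
  return -1;
}
```

Every path through C<execute_sketch> ends with the interpreter owned again, which is the property the two exits of the real function have to preserve.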
553
554 =head3 Interlude: the various C<DBD::mysql> async mechanisms
555
556 Here is a short discussion of the four principal ways to run
557 C<DBD::mysql> SQL queries asynchronously.
558
559 =over 4
560
561 =item in a separate process
562
563 Both C<AnyEvent::DBI> and C<DBD::Gofer> (via
564 C<DBD::Gofer::Transport::corostream>) can run C<DBI> calls in a separate
565 process, and this is not limited to mysql. This is paid for with more
566 complex management, some limitations in what can be done, and an extra
567 serialisation/deserialisation step for all data.
568
569 =item C<DBD::mysql>'s async support
570
571 This lets you execute the SQL query, while waiting for the results
572 via an event loop or similar mechanism. This is reasonably fast and
573 very compatible, but the disadvantages are that C<DBD::mysql> requires
574 undocumented internal functions to do this, and more importantly, this
575 only covers the actual execution phase, not the data transfer phase:
576 for statements with large results, the program blocks till all of it is
577 transferred, which can include large amounts of disk I/O.
578
579 =item C<Coro::Mysql>
580
581 This module actually works quite similarly to perl multicore, but uses
582 Coro threads exclusively. It shares the advantages of C<DBD::mysql>'s
583 async mode, but not, at least in theory, its disadvantages. In practice,
584 the mechanism it uses isn't undocumented, but distributions often don't
585 come with the correct header file needed to use it, and oracle's mysql
586 has broken this mechanism multiple times (mariadb supports it), so it's
587 actually less reliably available than C<DBD::mysql>'s async mode or perl
588 multicore.
589
590 It also requires C<Coro>.
591
592 =item perl multicore
593
594 This method has all the advantages of C<Coro::Mysql> without most
595 disadvantages, except that it incurs higher overhead due to the extra
596 thread switching.
597
598 =back
599
600 Pick your poison.
601
602
603 =head1 SEE ALSO
604
605 This document's canonical web address: L<http://perlmulticore.schmorp.de/>
606
607 The header file you need in your XS module: L<http://perlmulticore.schmorp.de/perlmulticore.h>
608
609 Status of CPAN modules, and pre-patched module tarballs: L<http://perlmulticore.schmorp.de/registry>
610
611
612 =head1 AUTHOR
613
614 Marc A. Lehmann <perlmulticore@schmorp.de>
615 http://perlmulticore.schmorp.de/
616
617 =head1 LICENSE
618
619 The F<perlmulticore.h> header file itself is in the public
620 domain. Where this is legally not possible, or at your
621 option, it can be licensed under the creative commons CC0
622 license: L<https://creativecommons.org/publicdomain/zero/1.0/>.
623
624 This document is licensed under the General Public License, version
625 3.0, or any later version.
626