/cvs/perlmulticore/perlmulticore.pod
Revision: 1.10
Committed: Mon Mar 4 06:34:59 2019 UTC (5 years, 7 months ago) by root
Branch: MAIN
Changes since 1.9: +9 -3 lines

1 =head1 NAME
2
3 The Perl Multicore Specification and Implementation
4
5 =head1 SYNOPSIS
6
7 #include "perlmulticore.h"
8
9 // in your XS function:
10
11 perlinterp_release ();
12 do_the_C_thing ();
13 perlinterp_acquire ();
14
15 =head1 DESCRIPTION
16
17 This specification describes a simple mechanism for XS modules to allow
18 re-use of the perl interpreter for other threads while doing some lengthy
19 operation, such as cryptography, SQL queries, disk I/O and so on.
20
21 The design goals for this mechanism were to be simple to use, to have
22 extremely low overhead when not active, to add little code and data size
23 overhead, and to be broadly applicable.
24
25 The newest version of this document can be found at
26 L<http://perlmulticore.schmorp.de/>.
27
28 The newest version of the header file that implements this specification
29 can be downloaded from L<http://perlmulticore.schmorp.de/perlmulticore.h>.
30
31 =head2 XS? HOW DO I USE THIS FROM PERL?
32
33 This document is only about the XS-level mechanism that defines generic
34 callbacks - to make use of this, you need a module that provides an
35 implementation for these callbacks, for example
36 L<Coro::Multicore|http://pod.tst.eu/http://cvs.schmorp.de/Coro-Multicore/Multicore.pm>.
37
38 =head2 WHICH MODULES SUPPORT IT?
39
40 You can check L<the perl multicore registry|http://perlmulticore.schmorp.de/registry>
41 for a list of modules that support this specification.
42
43 =head1 HOW DO I USE THIS IN MY MODULES?
44
45 The usage is very simple - you include this header file in your XS module. Then, before you
46 do your lengthy operation, you release the perl interpreter:
47
48 perlinterp_release ();
49
50 And when you are done with your computation, you acquire it again:
51
52 perlinterp_acquire ();
53
54 And that's it. This doesn't load any modules and consists of only a few
55 machine instructions when no module to take advantage of it is loaded.
56
57 Here is a simple example, a C<flock> wrapper implemented in XS. Unlike
58 perl's built-in C<flock>, it allows other threads (for example, those
59 provided by L<Coro>) to execute, instead of blocking the whole perl
60 interpreter. For the sake of this example, it requires a file descriptor
61 instead of a handle.
62
63 #include "perlmulticore.h" // this header file
64
65 // and in the XS portion
66 int flock (int fd, int operation)
67 CODE:
68 perlinterp_release ();
69 RETVAL = flock (fd, operation);
70 perlinterp_acquire ();
71 OUTPUT:
72 RETVAL
73
74 You can find more examples in the L<Case Studies> appendix.
75
76 =head2 HOW ABOUT NOT-SO LONG WORK?
77
78 Sometimes you don't know how long your code will take - in a compression
79 library for example, compressing a few hundred kilobytes of data can take
80 a while, while 50 bytes will compress so fast that even attempting to do
81 something else could be more costly than just doing it.
82
83 This is a very hard problem to solve. The best you can do at the moment is
84 to release the perl interpreter only when you think the work to be done
85 justifies the expense.
86
87 As a rule of thumb, if you expect to need more than a few thousand cycles,
88 you should release the interpreter, else you shouldn't. When in doubt,
89 release.
90
91 For example, in a compression library, you might want to do this:
92
93 if (bytes_to_be_compressed > 2000) perlinterp_release ();
94 do_compress (...);
95 if (bytes_to_be_compressed > 2000) perlinterp_acquire ();
96
97 Make sure the if conditions are exactly the same and don't change, so you
98 always call acquire when you release, and vice versa.
99
100 When you don't have a handy indicator, you might still do something
101 useful. For example, if you do some file locking with C<fcntl> and you
102 expect the lock to be available immediately in most cases, you could try
103 with C<F_SETLK> (which doesn't wait), and only release/wait/acquire when
104 the lock couldn't be set:
105
106 int res = fcntl (fd, F_SETLK, &flock);
107
108 if (res)
109 {
110 // error, assume lock is held by another process and do it the slow way
111 perlinterp_release ();
112 res = fcntl (fd, F_SETLKW, &flock);
113 perlinterp_acquire ();
114 }
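A slightly more careful variant might fall back to the blocking call only
when C<F_SETLK> fails because the lock is held elsewhere, which POSIX
reports as C<EACCES> or C<EAGAIN>, and pass other errors straight back. This
is a sketch under assumptions: C<lock_whole_file> is a hypothetical helper,
and the two macros are counting stand-ins for the real ones from
F<perlmulticore.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Stand-ins for the macros from perlmulticore.h, so this compiles alone. */
static int release_calls, acquire_calls;
#define perlinterp_release() ((void)++release_calls)
#define perlinterp_acquire() ((void)++acquire_calls)

/* Hypothetical helper: write-lock the whole file, returns 0 or -1. */
static int
lock_whole_file (int fd)
{
  struct flock fl;
  int res;

  memset (&fl, 0, sizeof fl);
  fl.l_type   = F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start  = 0;
  fl.l_len    = 0; /* 0 means "up to the end of the file" */

  /* Fast path: try without waiting, interpreter still held. */
  res = fcntl (fd, F_SETLK, &fl);

  /* Slow path: only when the lock is held by somebody else. */
  if (res == -1 && (errno == EACCES || errno == EAGAIN))
    {
      perlinterp_release ();
      res = fcntl (fd, F_SETLKW, &fl);
      perlinterp_acquire ();
    }

  return res;
}
```

With this shape, a bad file descriptor or similar error never causes a
release/acquire round trip.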
115
116 =head1 THE HARD AND FAST RULES
117
118 As with everything, there are a number of rules to follow.
119
120 =over 4
121
122 =item I<Never> touch any perl data structures after calling C<perlinterp_release>.
123
124 Possibly the most important rule of them all: anything perl is
125 completely off-limits after C<perlinterp_release>, until you call
126 C<perlinterp_acquire>, after which you can access perl stuff again.
127
128 That includes anything in the perl interpreter that you didn't prove to be
129 safe, and didn't prove to be safe in older and future versions of perl:
130 global variables, local perl scalars, even if you are sure nobody accesses
131 them and you only try to "read" their value, and so on.
132
133 If you need to access perl things, do it before releasing the
134 interpreter with C<perlinterp_release>, or after acquiring it again with
135 C<perlinterp_acquire>.
136
137 =item I<Always> call C<perlinterp_release> and C<perlinterp_acquire> in pairs.
138
139 For each C<perlinterp_release> call there must be a C<perlinterp_acquire>
140 call. They don't have to be in the same function, and you can have
141 multiple calls to them, as long as every C<perlinterp_release> call is
142 followed by exactly one C<perlinterp_acquire> call.
143
144 For example, this would be fine:
145
146 perlinterp_release ();
147
148 if (!function_that_fails_with_0_return_value ())
149 {
150 perlinterp_acquire ();
151 croak ("error");
152 // croak doesn't return
153 }
154
155 perlinterp_acquire ();
156 // do other stuff
157
158 =item I<Never> nest calls to C<perlinterp_release> and C<perlinterp_acquire>.
159
160 That simply means that after calling C<perlinterp_release>, you must
161 call C<perlinterp_acquire> before calling C<perlinterp_release>
162 again. Likewise, after C<perlinterp_acquire>, you can call
163 C<perlinterp_release> but not another C<perlinterp_acquire>.
164
165 =item I<Always> call C<perlinterp_release> first.
166
167 Also simple: you I<must not> call C<perlinterp_acquire> without having
168 called C<perlinterp_release> before.
169
170 =item I<Never> underestimate threads.
171
172 While it's easy to add parallel execution ability to your XS module, it
173 doesn't mean it is safe. After you release the perl interpreter, it's
174 perfectly possible that it will call your XS function in another thread,
175 even while your original function still executes. In other words: your C
176 code must be thread-safe, and if you use any library, that library must be
177 thread-safe, too.
178
179 Always assume that the code between C<perlinterp_release> and
180 C<perlinterp_acquire> is executed in parallel on multiple CPUs at the same
181 time. If your code can't cope with that, you could consider using a mutex
182 to only allow one such execution, which is still better than blocking
183 everybody else from doing anything:
184
185 static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
186
187 perlinterp_release ();
188 pthread_mutex_lock (&my_mutex);
189 do_your_non_thread_safe_thing ();
190 pthread_mutex_unlock (&my_mutex);
191 perlinterp_acquire ();
192
193 =item I<Don't> get confused by having to release first.
194
195 In many real world scenarios, you acquire a resource, do something, then
196 release it again. Don't let this confuse you: here, you already own
197 the resource (the perl interpreter) so you have to I<release> first, and
198 I<acquire> it again later, not the other way around.
199
200 =back
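The pairing, nesting and ordering rules above can be checked mechanically
during development. The following is a minimal sketch, not part of
F<perlmulticore.h>: C<checked_release> and C<checked_acquire> are hypothetical
wrappers, the two macros are counting stand-ins for the real ones, and the
per-thread flag assumes that the code between release and acquire stays on
the same OS thread and that a C11 compiler (C<_Thread_local>) is available.

```c
/* Stand-ins for the macros from perlmulticore.h, so this compiles alone. */
static int release_calls, acquire_calls;
#define perlinterp_release() ((void)++release_calls)
#define perlinterp_acquire() ((void)++acquire_calls)

/* Per-thread state: nonzero while this thread has released the interpreter. */
static _Thread_local int interp_released;

/* Returns 0 on success, -1 if this would be a nested release. */
static int
checked_release (void)
{
  if (interp_released)
    return -1; /* release after release: nesting */

  interp_released = 1;
  perlinterp_release ();
  return 0;
}

/* Returns 0 on success, -1 if there was no preceding release. */
static int
checked_acquire (void)
{
  if (!interp_released)
    return -1; /* acquire without release */

  perlinterp_acquire ();
  interp_released = 0;
  return 0;
}
```

In a debug build the error returns could be turned into assertions. The flag
is plain C data, so touching it while the interpreter is released does not
violate the first rule.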
201
202
203 =head1 DESIGN PRINCIPLES
204
205 This section discusses how the design goals were reached (you be the
206 judge), how it is implemented, and what overheads this implies.
207
208 =over 4
209
210 =item Simple to Use
211
212 All you have to do is identify the place in your existing code where you
213 stop touching perl stuff, do your actual work, and start touching perl
214 stuff again.
215
216 Then slap C<perlinterp_release ()> and C<perlinterp_acquire ()> around the
217 actual work code.
218
219 You have to include F<perlmulticore.h> and distribute it with your XS
220 code, but all these things border on the trivial.
221
222 =item Very Efficient
223
224 The definition for C<perlinterp_release> and C<perlinterp_acquire> is very
225 short:
226
227 #define perlinterp_release() perl_multicore_api->pmapi_release ()
228 #define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()
229
230 Both are macros that read a pointer from memory (perl_multicore_api),
231 dereference a function pointer stored at that place, and call the
232 function, which takes no arguments and returns nothing.
233
234 The first call to C<perlinterp_release> will check for the presence
235 of any supporting module, and if none is loaded, will create a dummy
236 implementation where both C<pmapi_release> and C<pmapi_acquire> execute
237 this function:
238
239 static void perl_multicore_nop (void) { }
240
241 So in the case of no magical module being loaded, all calls except the
242 first are two memory accesses and a predictable function call of an empty
243 function.
244
245 Of course, the overhead is much higher when these functions actually
246 implement anything useful, but you always get what you pay for.
247
248 With L<Coro::Multicore>, every release/acquire involves two pthread
249 switches, two coro thread switches, a bunch of syscalls, and sometimes
250 interacting with the event loop.
251
252 A dedicated thread pool such as the one L<IO::AIO> uses could reduce
253 these overheads, and would also reduce the dependencies (L<AnyEvent> is a
254 smaller and more portable dependency than L<Coro>), but it would require a
255 lot more work on the side of the module author wanting to support it than
256 this solution.
257
258 =item Low Code and Data Size Overhead
259
260 On a 64 bit system, F<perlmulticore.h> uses exactly C<8> octets (one
261 pointer) of your data segment, to store the C<perl_multicore_api>
262 pointer. In addition it creates a C<16> octet perl string to store the
263 function pointers in, and stores it in a hash provided by perl for this
264 purpose.
265
266 This is pretty much the equivalent of executing this code:
267
268 $existing_hash{perl_multicore_api} = "1234567812345678";
269
270 And that's it, which is, I think, very little indeed.
271
272 As for code size and speed, on my amd64 system, every call to
273 C<perlinterp_release> or C<perlinterp_acquire> results in a variation of
274 the following 9-10 octet sequence which is easy to predict for modern
275 CPUs, as the function pointer is constant after initialisation:
276
277 150> mov 0x200f23(%rip),%rax # <perl_multicore_api>
278 157> callq *0x8(%rax)
279
280 The actual function being called when no backend is installed or enabled
281 looks like this:
282
283 1310> retq
284
285 The biggest part is the initialisation code, which consists of 11 lines of
286 typical XS code. On my system, all the code in F<perlmulticore.h> compiles
287 to less than 160 octets of read-only data.
288
289 =item Broad Applicability
290
291 While there are alternative ways to achieve the goal of parallel execution
292 with threads that might be more efficient, this mechanism was chosen
293 because it is very simple to retrofit existing modules with it.
294
295 The design goals for this mechanism were to be simple to use, very
296 efficient when not needed, low code and data size overhead and broad
297 applicability.
298
299 =back
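The dispatch scheme described under "Very Efficient" can be modelled in a
few lines of plain C. This is an illustrative model only, not the text of
F<perlmulticore.h>: all C<demo_> names are hypothetical, and C<demo_backend>
stands in for whatever a supporting module would register.

```c
/* Two function pointers, as in the macros shown above. */
struct demo_api
{
  void (*release)(void);
  void (*acquire)(void);
};

/* Hypothetical: set by a supporting module, NULL if none is loaded. */
static struct demo_api *demo_backend;

static void demo_nop (void) { }
static void demo_first_release (void);

/* Initial table: the first release call runs the one-time check. */
static struct demo_api demo_bootstrap = { demo_first_release, demo_nop };
/* Dummy table installed when no backend is present. */
static struct demo_api demo_dummy = { demo_nop, demo_nop };

static struct demo_api *demo_api_ptr = &demo_bootstrap;

#define demo_release() demo_api_ptr->release ()
#define demo_acquire() demo_api_ptr->acquire ()

static void
demo_first_release (void)
{
  /* Pick the backend if there is one, else the dummy, then forward. */
  demo_api_ptr = demo_backend ? demo_backend : &demo_dummy;
  demo_api_ptr->release ();
}
```

After the first C<demo_release>, every further call is one pointer load, one
function pointer load and an indirect call, which matches the cost described
above for the case where no backend is loaded.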
300
301
302 =head1 DISABLING PERL MULTICORE AT COMPILE TIME
303
304 You can disable the complete perl multicore API by defining the
305 symbol C<PERL_MULTICORE_DISABLE> to C<1> (e.g. by specifying
306 F<-DPERL_MULTICORE_DISABLE> as compiler argument).
307
308 This will leave no traces of the API in the compiled code; suitable
309 "empty" C<perlinterp_release> and C<perlinterp_acquire> definitions will be provided.
310
311 This could be added to perl's C<CPPFLAGS> when configuring perl on
312 platforms that do not support threading at all, for example, and would
313 reduce the overhead to nothing. It is by no means required, though, as the
314 header will compile and work just fine without any thread support.
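To illustrate what "empty" definitions mean for calling code, here is a
guess at the general shape of such a compile-time switch. It is a sketch,
not a copy of the header: only the C<PERL_MULTICORE_DISABLE> symbol and the
two macro names come from this document, and C<twice> is a hypothetical
caller.

```c
/* Normally set on the command line with -DPERL_MULTICORE_DISABLE. */
#ifndef PERL_MULTICORE_DISABLE
# define PERL_MULTICORE_DISABLE 1
#endif

#if PERL_MULTICORE_DISABLE
/* Empty definitions: the calls vanish entirely at compile time. */
# define perlinterp_release() do { } while (0)
# define perlinterp_acquire() do { } while (0)
#else
# include "perlmulticore.h"
#endif

/* Hypothetical caller: its source is the same in both configurations. */
static int
twice (int x)
{
  int result;

  perlinterp_release ();
  result = x * 2; /* stands in for lengthy work that touches no perl data */
  perlinterp_acquire ();

  return result;
}
```

The point is that module code does not need its own C<#ifdef> guards around
the release and acquire calls.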
315
316
317 =head1 APPENDIX: CASE STUDIESX<Case Studies>
318
319 This appendix contains some case studies on how to patch existing
320 modules. Unless they are available on CPAN, the patched modules (including
321 diffs) can be found at the perl multicore repository (see L<the
322 perlmulticore registry|http://perlmulticore.schmorp.de/registry>).
323
324 In addition to the patches shown, the
325 L<perlmulticore.h|http://perlmulticore.schmorp.de/perlmulticore.h> header
326 must be added to the module and included in any XS or C file that uses it.
327
328
329 =head2 Case Study: C<Digest::MD5>
330
331 The C<Digest::MD5> module presents some unique challenges because it mixes
332 Perl-I/O and CPU-based processing.
333
334 So first let's identify the easy cases - set up (in C<new>) and
335 calculating the final digest are very fast operations and are unlikely
336 to profit from running in a separate thread. That leaves the C<add>
337 method and the C<md5> (C<md5_hex>, C<md5_base64>) functions.
338
339 They are both very easy to update - the C<MD5Update> call
340 doesn't access any perl data structures, so you can slap
341 C<perlinterp_release>/C<perlinterp_acquire> around it:
342
343 if (len > 8000) perlinterp_release ();
344 MD5Update(context, data, len);
345 if (len > 8000) perlinterp_acquire ();
346
347 This works for both C<add> and C<md5> XS functions. The C<8000> is
348 somewhat arbitrary.
349
350 This leaves C<addfile>, which would normally be the ideal candidate,
351 because it is often used on large files and needs to wait both for I/O and
352 the CPU. Unfortunately, it is implemented like this (only the inner loop
353 is shown):
354
355 unsigned char buffer[4096];
356
357 while ( (n = PerlIO_read(fh, buffer, sizeof(buffer))) > 0) {
358 MD5Update(context, buffer, n);
359 }
360
361 That is, it uses a 4KB buffer per C<MD5Update>. Putting
362 C<perlinterp_release>/C<perlinterp_acquire> calls around it would be way
363 too inefficient. Ideally, you would want to put them around the whole
364 loop.
365
366 Unfortunately, C<Digest::MD5> uses C<PerlIO> for the actual I/O, and
367 C<PerlIO> is not thread-safe. We can't even use a mutex, as we would have
368 to protect against all other C<PerlIO> calls.
369
370 As a compromise, we can use the C<USE_HEAP_INSTEAD_OF_STACK> option that
371 C<Digest::MD5> provides, which puts the buffer onto the heap, and use a
372 far larger buffer:
373
374 #define USE_HEAP_INSTEAD_OF_STACK
375
376 New(0, buffer, 1024 * 1024, unsigned char);
377
378 while ( (n = PerlIO_read(fh, buffer, 1024 * 1024)) > 0) { /* not sizeof(buffer): buffer is a pointer here */
379 if (n > 8000) perlinterp_release ();
380 MD5Update(context, buffer, n);
381 if (n > 8000) perlinterp_acquire ();
382 }
383
384 This will unfortunately still block on I/O, and allocate a large block of
385 memory, but it is better than nothing.
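The same loop shape can be tried outside of perl. The following is a
self-contained sketch under assumptions: C<mem_read> stands in for
C<PerlIO_read>, C<digest_update> stands in for C<MD5Update> (it only sums
bytes), C<BUFFER_SIZE> and C<digest_source> are hypothetical names, and the
two macros are counting stand-ins for the real ones from F<perlmulticore.h>.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the macros from perlmulticore.h, so this compiles alone. */
static int release_calls, acquire_calls;
#define perlinterp_release() ((void)++release_calls)
#define perlinterp_acquire() ((void)++acquire_calls)

#define BUFFER_SIZE (1024 * 1024)

/* Stand-in for a PerlIO handle: reads from a block of memory. */
struct mem_source { const unsigned char *data; size_t left; };

static long
mem_read (struct mem_source *src, unsigned char *buf, size_t size)
{
  size_t n = src->left < size ? src->left : size;

  memcpy (buf, src->data, n);
  src->data += n;
  src->left -= n;
  return (long)n;
}

/* Stand-in for MD5Update: accumulates a byte sum in *ctx. */
static void
digest_update (unsigned long *ctx, const unsigned char *buf, long n)
{
  long i;

  for (i = 0; i < n; ++i)
    *ctx += buf[i];
}

static int
digest_source (unsigned long *ctx, struct mem_source *src)
{
  unsigned char *buffer = malloc (BUFFER_SIZE);
  long n;

  if (!buffer)
    return -1;

  /* Pass BUFFER_SIZE, not sizeof (buffer): buffer is a pointer. */
  while ((n = mem_read (src, buffer, BUFFER_SIZE)) > 0)
    {
      if (n > 8000) perlinterp_release ();
      digest_update (ctx, buffer, n);
      if (n > 8000) perlinterp_acquire ();
    }

  free (buffer);
  return 0;
}
```

The read itself still happens while the interpreter is held, as in the
patched C<addfile>; only the digest computation runs released.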
386
387
388 =head2 Case Study: C<DBD::mysql>
389
390 Another example would be to modify C<DBD::mysql> to allow other
391 threads to execute while executing SQL queries.
392
393 The actual code that needs to be patched is not actually in an F<.xs>
394 file, but in the F<dbdimp.c> file, which is included in an XS file.
395
396 While there are many calls, the most important ones are the statement
397 execute calls. There are only two in F<dbdimp.c>, one call in
398 C<mysql_st_internal_execute41>, and one in C<dbd_st_execute>, both calling
399 the undocumented internal C<mysql_st_internal_execute> function.
400
401 The difference is that the former is used with mysql 4.1+ and prepared
402 statements.
403
404 The call in C<dbd_st_execute> is easy, as it does all the important work
405 and doesn't access any perl data structures (I checked C<DBIc_NUM_PARAMS>
406 manually to make sure):
407
408 perlinterp_release ();
409 imp_sth->row_num= mysql_st_internal_execute(
410 sth,
411 *statement,
412 NULL,
413 DBIc_NUM_PARAMS(imp_sth),
414 imp_sth->params,
415 &imp_sth->result,
416 imp_dbh->pmysql,
417 imp_sth->use_mysql_use_result
418 );
419 perlinterp_acquire ();
420
421 Despite the name, C<mysql_st_internal_execute41> isn't actually from
422 F<libmysqlclient>, but a long function in F<dbdimp.c>. Here is an abridged version, with
423 C<perlinterp_release>/C<perlinterp_acquire> calls:
424
425 int i;
426 enum enum_field_types enum_type;
427 dTHX;
428 int execute_retval;
429 my_ulonglong rows=0;
430 D_imp_xxh(sth);
431
432 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
433 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
434 "\t-> mysql_st_internal_execute41\n");
435
436 perlinterp_release ();
437
438 if (num_params > 0 && !(*has_been_bound))
439 {
440 if (mysql_stmt_bind_param(stmt,bind))
441 goto error;
442 }
443
444 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
445 {
446 perlinterp_acquire ();
447 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
448 "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
449 num_params);
450 perlinterp_release ();
451 }
452
453
454 execute_retval= mysql_stmt_execute(stmt);
455
456 if (execute_retval)
457 goto error;
458
459 /*
460 This statement does not return a result set (INSERT, UPDATE...)
461 */
462 if (!(*result= mysql_stmt_result_metadata(stmt)))
463 {
464 if (mysql_stmt_errno(stmt))
465 goto error;
466
467 rows= mysql_stmt_affected_rows(stmt);
468 }
469 /*
470 This statement returns a result set (SELECT...)
471 */
472 else
473 {
474 for (i = mysql_stmt_field_count(stmt) - 1; i >=0; --i) {
475 enum_type = mysql_to_perl_type(stmt->fields[i].type);
476 if (enum_type != MYSQL_TYPE_DOUBLE && enum_type != MYSQL_TYPE_LONG)
477 {
478 /* mysql_stmt_store_result to update MYSQL_FIELD->max_length */
479 my_bool on = 1;
480 mysql_stmt_attr_set(stmt, STMT_ATTR_UPDATE_MAX_LENGTH, &on);
481 break;
482 }
483 }
484 /* Get the total rows affected and return */
485 if (mysql_stmt_store_result(stmt))
486 goto error;
487 else
488 rows= mysql_stmt_num_rows(stmt);
489 }
490 perlinterp_acquire ();
491 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
492 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
493 "\t<- mysql_internal_execute_41 returning %d rows\n",
494 (int) rows);
495 return(rows);
496
497 error:
498 if (*result)
499 {
500 mysql_free_result(*result);
501 *result= 0;
502 }
503 perlinterp_acquire ();
504 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
505 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
506 " errno %d err message %s\n",
507 mysql_stmt_errno(stmt),
508 mysql_stmt_error(stmt));
509
510 So C<perlinterp_release> is called after some logging, but before the
511 C<mysql_free_result> call.
512
513 To make things more interesting, the function has multiple calls to
514 C<PerlIO> to log things, all of which aren't thread-safe, and need to be
515 surrounded with C<perlinterp_acquire> and C<perlinterp_release> calls
516 to temporarily re-acquire the interpreter. This is slow, but logging is
517 normally off:
518
519 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
520 {
521 perlinterp_acquire ();
522 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
523 "\t\tmysql_st_internal_execute41 calling mysql_execute with %d num_params\n",
524 num_params);
525 perlinterp_release ();
526 }
527
528 The function has both a normal exit and a separate error exit, each of which needs its own
529 C<perlinterp_acquire> call. First the normal function exit:
530
531 perlinterp_acquire ();
532 if (DBIc_TRACE_LEVEL(imp_xxh) >= 2)
533 PerlIO_printf(DBIc_LOGPIO(imp_xxh),
534 "\t<- mysql_internal_execute_41 returning %d rows\n",
535 (int) rows);
536 return(rows);
537
538 And this is the error exit:
539
540 error:
541 if (*result)
542 {
543 mysql_free_result(*result);
544 *result= 0;
545 }
546 perlinterp_acquire ();
547
548 This is enough to run DBI's C<execute> calls in separate threads.
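Stripped of the mysql details, the pattern used in this case study is: one
release at the top, a temporary re-acquire around each logging call, and one
acquire on every exit path. Here is a minimal, self-contained sketch of that
control flow. It is not C<DBD::mysql> code: C<run_query> and C<log_msg> are
hypothetical, and the two macros are stand-ins that merely track whether the
interpreter is currently released.

```c
/* Stand-ins that track whether the interpreter is currently released. */
static int released;
#define perlinterp_release() ((void)(released = 1))
#define perlinterp_acquire() ((void)(released = 0))

static int log_enabled; /* plays the role of the trace level check */
static int log_lines;   /* log calls made while holding the interpreter */
static int violations;  /* log calls made while it was released */

/* Stand-in for PerlIO_printf, which needs the interpreter. */
static void
log_msg (const char *msg)
{
  (void)msg;

  if (released)
    ++violations;
  else
    ++log_lines;
}

/* Hypothetical query: returns a row count, or -1 on failure. */
static int
run_query (int fail)
{
  int rows;

  perlinterp_release ();

  if (log_enabled)
    {
      perlinterp_acquire (); /* temporarily take the interpreter back */
      log_msg ("executing");
      perlinterp_release ();
    }

  if (fail)
    goto error;

  rows = 3; /* pretend the lengthy work produced three rows */

  perlinterp_acquire (); /* normal exit */
  return rows;

error:
  perlinterp_acquire (); /* error exit */
  return -1;
}
```

Whichever path is taken, the function returns with the interpreter held
again, and logging never runs while it is released.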
549
550 =head3 Interlude: the various C<DBD::mysql> async mechanisms
551
552 Here is a short discussion of the four principal ways to run
553 C<DBD::mysql> SQL queries asynchronously.
554
555 =over 4
556
557 =item in a separate process
558
559 Both C<AnyEvent::DBI> and C<DBD::Gofer> (via
560 C<DBD::Gofer::Transport::corostream>) can run C<DBI> calls in a separate
561 process, and this is not limited to mysql. This has to be paid for with more
562 complex management, some limitations in what can be done, and an extra
563 serialisation/deserialisation step for all data.
564
565 =item C<DBD::mysql>'s async support
566
567 This lets you execute the SQL query, while waiting for the results
568 via an event loop or similar mechanism. This is reasonably fast and
569 very compatible, but the disadvantages are that C<DBD::mysql> requires
570 undocumented internal functions to do this, and more importantly, this
571 only covers the actual execution phase, not the data transfer phase:
572 for statements with large results, the program blocks till all of it is
573 transferred, which can include large amounts of disk I/O.
574
575 =item C<Coro::Mysql>
576
577 This module actually works quite similarly to perl multicore, but uses
578 Coro threads exclusively. It shares the advantages of C<DBD::mysql>'s
579 async mode, but not, at least in theory, its disadvantages. In practice,
580 the mechanism it uses isn't undocumented, but distributions often don't
581 come with the correct header file needed to use it, and Oracle's mysql
582 has broken this mechanism multiple times (mariadb supports it), so it's
583 actually less reliably available than C<DBD::mysql>'s async mode or perl
584 multicore.
585
586 It also requires C<Coro>.
587
588 =item perl multicore
589
590 This method has all the advantages of C<Coro::Mysql> without most of its
591 disadvantages, except that it incurs higher overhead due to the extra
592 thread switching.
593
594 =back
595
596 Pick your poison.
597
598
599 =head1 AUTHOR
600
601 Marc A. Lehmann <perlmulticore@schmorp.de>
602 http://perlmulticore.schmorp.de/
603
604 =head1 LICENSE
605
606 The F<perlmulticore.h> header file itself is in the public
607 domain. Where this is legally not possible, or at your
608 option, it can be licensed under the creative commons CC0
609 license: L<https://creativecommons.org/publicdomain/zero/1.0/>.
610
611 This document is licensed under the General Public License, version
612 3.0, or any later version.
613