/cvs/perlmulticore/perlmulticore.pod
Revision: 1.2
Committed: Thu Jul 2 22:42:24 2015 UTC (9 years, 3 months ago) by root
Branch: MAIN
=head1 NAME

The Perl Multicore Specification and Implementation

=head1 SYNOPSIS

   #include "perlmulticore.h"

   // in your XS function:

   perlinterp_release ();
   do_the_C_thing ();
   perlinterp_acquire ();

=head1 DESCRIPTION

This specification describes a simple mechanism for XS modules to allow
re-use of the perl interpreter for other threads while doing some lengthy
operation, such as cryptography, SQL queries, disk I/O and so on.

The design goals for this mechanism were to be simple to use, to have
extremely low overhead when not active, to keep code and data size
overhead low, and to be broadly applicable.

The newest version of this document can be found at
L<http://perlmulticore.schmorp.de/>.

The newest version of the header file that implements this specification
can be downloaded from L<http://perlmulticore.schmorp.de/perlmulticore.h>.

=head1 HOW DO I USE THIS IN MY MODULES?

The usage is very simple: you include this header file in your XS
module. Then, before you do your lengthy operation, you release the perl
interpreter:

   perlinterp_release ();

And when you are done with your computation, you acquire it again:

   perlinterp_acquire ();

And that's it. This doesn't load any modules and consists of only a few
machine instructions when no module that takes advantage of it is loaded.

Here is a simple example, a C<flock> wrapper implemented in XS. Unlike
perl's built-in C<flock>, it allows other threads (for example, those
provided by L<Coro>) to execute, instead of blocking the whole perl
interpreter. For the sake of this example, it requires a file descriptor
instead of a handle.

   #include "perlmulticore.h" // this header file

   // and in the XS portion
   int flock (int fd, int operation)
           CODE:
             perlinterp_release ();
             RETVAL = flock (fd, operation);
             perlinterp_acquire ();
           OUTPUT:
             RETVAL

Another example would be to modify L<DBD::mysql> to allow other
threads to execute while executing SQL queries. One way to do this
is to find all C<mysql_st_internal_execute> and similar calls (such as
C<mysql_st_internal_execute41>), and adorn them with release/acquire
calls:

   {
     perlinterp_release ();
     imp_sth->row_num= mysql_st_internal_execute(sth, ...);
     perlinterp_acquire ();
   }

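Real XS functions usually get their input from perl. The following is a
minimal sketch of how the release/acquire pair could be placed so that
all perl access happens outside of it. The names C<slow_checksum> and
C<checksum_released> are hypothetical, and the C<perlinterp_*> macros are
empty stand-ins so the sketch compiles without perl; a real module would
include F<perlmulticore.h> instead:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles on its own; a real XS module would
   include perlmulticore.h instead. */
#define perlinterp_release() ((void)0)
#define perlinterp_acquire() ((void)0)

/* Hypothetical lengthy operation: a simple additive checksum. */
static unsigned long
slow_checksum (const unsigned char *buf, size_t len)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0; i < len; ++i)
    sum += buf[i];

  return sum;
}

/* data/len are assumed to have been fetched from perl (for example with
   SvPV) before this function is called. Returns 0 on success, -1 on
   allocation failure. */
static int
checksum_released (const char *data, size_t len, unsigned long *result)
{
  unsigned char *copy;

  assert (data && result);

  copy = malloc (len ? len : 1);
  if (!copy)
    return -1;

  /* take a private copy while we still own the interpreter */
  memcpy (copy, data, len);

  perlinterp_release ();
  *result = slow_checksum (copy, len); /* touches only C data */
  perlinterp_acquire ();

  free (copy);
  return 0;
}
```

The copy is made because other threads may run perl code while the
interpreter is released, so memory owned by perl should not be relied
upon during that time.
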
=head2 HOW ABOUT NOT-SO LONG WORK?

Sometimes you don't know how long your code will take. In a compression
library, for example, compressing a few hundred kilobytes of data can take
a while, while 50 bytes will compress so fast that even attempting to do
something else could be more costly than just doing it.

This is a very hard problem to solve. The best you can do at the moment is
to release the perl interpreter only when you think the work to be done
justifies the expense.

As a rule of thumb, if you expect to need more than a few thousand cycles,
you should release the interpreter, else you shouldn't. When in doubt,
release.

For example, in a compression library, you might want to do this:

   if (bytes_to_be_compressed > 2000) perlinterp_release ();
   do_compress (...);
   if (bytes_to_be_compressed > 2000) perlinterp_acquire ();

Make sure the if conditions are exactly the same and don't change, so you
always call acquire when you release, and vice versa.
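
One way to keep the two conditions from drifting apart is to evaluate the
condition once and remember the result in a local variable. This is only a
sketch: C<compress_buffer> and C<RELEASE_THRESHOLD> are hypothetical
names, the "compression" is a plain copy, and the C<perlinterp_*> macros
are stand-ins that merely count calls so the sketch compiles without
F<perlmulticore.h>:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins that only count calls; a real XS module would include
   perlmulticore.h instead. */
static int release_calls, acquire_calls;
#define perlinterp_release() ((void)++release_calls)
#define perlinterp_acquire() ((void)++acquire_calls)

#define RELEASE_THRESHOLD 2000 /* hypothetical cut-off, tune as needed */

/* Hypothetical worker: copies the data, standing in for real compression. */
static size_t
compress_buffer (char *dst, const char *src, size_t len)
{
  /* decide once, so release and acquire can never disagree */
  int released = len > RELEASE_THRESHOLD;

  assert (dst && src);

  if (released) perlinterp_release ();
  memcpy (dst, src, len);
  if (released) perlinterp_acquire ();

  return len;
}
```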

When you don't have a handy indicator, you might still do something
useful. For example, if you do some file locking with C<fcntl> and you
expect the lock to be available immediately in most cases, you could try
with C<F_SETLK> (which doesn't wait), and only release/wait/acquire when
the lock couldn't be set:

   int res = fcntl (fd, F_SETLK, &flock);

   if (res)
     {
       // error, assume lock is held by another process and do it the slow way
       perlinterp_release ();
       res = fcntl (fd, F_SETLKW, &flock);
       perlinterp_acquire ();
     }
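
A more complete version of this idea might look like the following
sketch. The helper name C<lock_whole_file> is hypothetical, the
C<perlinterp_*> macros are empty stand-ins so the code compiles without
F<perlmulticore.h>, and, unlike the fragment above, it only falls back to
the blocking call when C<errno> indicates that the lock is held elsewhere
(POSIX permits either C<EACCES> or C<EAGAIN> in that case):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-ins so the sketch compiles on its own; a real XS module would
   include perlmulticore.h instead. */
#define perlinterp_release() ((void)0)
#define perlinterp_acquire() ((void)0)

/* Hypothetical helper: take an exclusive lock on the whole file.
   Returns 0 on success, -1 on error (with errno set by fcntl). */
static int
lock_whole_file (int fd)
{
  struct flock fl = { 0 };
  int res;

  assert (fd >= 0);

  fl.l_type   = F_WRLCK;
  fl.l_whence = SEEK_SET;
  fl.l_start  = 0;
  fl.l_len    = 0; /* 0 means "up to the end of the file" */

  /* fast path: try without waiting, interpreter still held */
  res = fcntl (fd, F_SETLK, &fl);

  if (res == -1 && (errno == EAGAIN || errno == EACCES))
    {
      /* lock is held by somebody else: wait with the interpreter released */
      perlinterp_release ();
      res = fcntl (fd, F_SETLKW, &fl);
      perlinterp_acquire ();
    }

  return res;
}
```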

=head1 THE HARD AND FAST RULES

As with everything, there are a number of rules to follow.

=over 4

=item I<Never> touch any perl data structures after calling C<perlinterp_release>.

Possibly the most important rule of them all: anything perl is
completely off-limits after C<perlinterp_release>, until you call
C<perlinterp_acquire>, after which you can access perl stuff again.

That includes anything in the perl interpreter that you didn't prove to be
safe, and didn't prove to be safe in older and future versions of perl:
global variables, local perl scalars (even if you are sure nobody accesses
them and you only try to "read" their value), and so on.

If you need to access perl things, do it before releasing the
interpreter with C<perlinterp_release>, or after acquiring it again with
C<perlinterp_acquire>.

=item I<Always> call C<perlinterp_release> and C<perlinterp_acquire> in pairs.

For each C<perlinterp_release> call there must be a C<perlinterp_acquire>
call. They don't have to be in the same function, and you can have
multiple calls to them, as long as every C<perlinterp_release> call is
followed by exactly one C<perlinterp_acquire> call.

For example, this would be fine:

   perlinterp_release ();

   if (!function_that_fails_with_0_return_value ())
     {
       perlinterp_acquire ();
       croak ("error");
       // croak doesn't return
     }

   perlinterp_acquire ();
   // do other stuff


=item I<Never> nest calls to C<perlinterp_release> and C<perlinterp_acquire>.

That simply means that after calling C<perlinterp_release>, you must
call C<perlinterp_acquire> before calling C<perlinterp_release>
again. Likewise, after C<perlinterp_acquire>, you can call
C<perlinterp_release> but not another C<perlinterp_acquire>.

=item I<Always> call C<perlinterp_release> first.

Also simple: you I<must not> call C<perlinterp_acquire> without having
called C<perlinterp_release> before.

=item I<Never> underestimate threads.

While it's easy to add parallel execution ability to your XS module, it
doesn't mean it is safe. After you release the perl interpreter, it's
perfectly possible that it will call your XS function in another thread,
even while your original function still executes. In other words: your C
code must be thread-safe, and if you use any library, that library must be
thread-safe, too.

Always assume that the code between C<perlinterp_release> and
C<perlinterp_acquire> is executed in parallel on multiple CPUs at the same
time. If your code can't cope with that, you could consider using a mutex
to only allow one such execution, which is still better than blocking
everybody else from doing anything:

   static pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;

   perlinterp_release ();
   pthread_mutex_lock (&my_mutex);
   do_your_non_thread_safe_thing ();
   pthread_mutex_unlock (&my_mutex);
   perlinterp_acquire ();

=item I<Don't> get confused by having to release first.

In many real world scenarios, you acquire a resource, do something, then
release it again. Don't let this confuse you: here, you already own
the resource (the perl interpreter), so you have to I<release> first and
I<acquire> it again later, not the other way around.

=back

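While developing, the pairing, nesting and ordering rules above could be
checked mechanically with thin wrappers. The following is a sketch only:
C<checked_release>, C<checked_acquire> and C<interp_released> are
hypothetical names that are not part of F<perlmulticore.h>, the
C<perlinterp_*> macros are empty stand-ins so the code compiles on its
own, and it assumes a C11 compiler (for C<_Thread_local>) and that the
code between release and acquire stays on the same OS thread:

```c
#include <assert.h>

/* Stand-ins so the sketch compiles on its own; a real XS module would
   include perlmulticore.h instead. */
#define perlinterp_release() ((void)0)
#define perlinterp_acquire() ((void)0)

/* 1 while this thread has released the interpreter, 0 otherwise. */
static _Thread_local int interp_released;

static void
checked_release (void)
{
  assert (!interp_released); /* catches a nested release */
  interp_released = 1;
  perlinterp_release ();
}

static void
checked_acquire (void)
{
  assert (interp_released); /* catches acquire without prior release */
  perlinterp_acquire ();
  interp_released = 0;
}
```

Since C<assert> compiles to nothing when C<NDEBUG> is defined, the checks
themselves cost nothing in release builds, although the flag updates
remain.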

=head1 DESIGN PRINCIPLES

This section discusses how the design goals were reached (you be the
judge), how it is implemented, and what overheads this implies.

=over 4

=item Simple to Use

All you have to do is identify the place in your existing code where you
stop touching perl stuff, do your actual work, and start touching perl
stuff again.

Then slap C<perlinterp_release ()> and C<perlinterp_acquire ()> around the
actual work code.

You have to include F<perlmulticore.h> and distribute it with your XS
code, but all these things border on the trivial.

=item Very Efficient

The definition for C<perlinterp_release> and C<perlinterp_acquire> is very
short:

   #define perlinterp_release() perl_multicore_api->pmapi_release ()
   #define perlinterp_acquire() perl_multicore_api->pmapi_acquire ()

Both are macros that read a pointer from memory (perl_multicore_api),
dereference a function pointer stored at that place, and call the
function, which takes no arguments and returns nothing.

The first call to C<perlinterp_release> will check for the presence
of any supporting module, and if none is loaded, will create a dummy
implementation where both C<pmapi_release> and C<pmapi_acquire> execute
this function:

   static void perl_multicore_nop (void) { }

So in the case of no magical module being loaded, all calls except the
first are two memory accesses and a predictable function call of an empty
function.

Of course, the overhead is much higher when these functions actually
implement anything useful, but you always get what you pay for.

With L<Coro::Multicore>, every release/acquire involves two pthread
switches, two coro thread switches, a bunch of syscalls, and sometimes
interacting with the event loop.

A dedicated thread pool such as the one L<IO::AIO> uses could reduce
these overheads, and would also reduce the dependencies (L<AnyEvent> is a
smaller and more portable dependency than L<Coro>), but it would require a
lot more work on the side of the module author wanting to support it than
this solution.

=item Low Code and Data Size Overhead

On a 64 bit system, F<perlmulticore.h> uses exactly C<8> octets (one
pointer) of your data segment, to store the C<perl_multicore_api>
pointer. In addition it creates a C<16> octet perl string to store the
function pointers in, and stores it in a hash provided by perl for this
purpose.

This is pretty much the equivalent of executing this code:

   $existing_hash{perl_multicore_api} = "1234567812345678";

And that's it, which is, I think, indeed very little.

As for code size, on my amd64 system, every call to C<perlinterp_release>
or C<perlinterp_acquire> results in a variation of the following 9-10
octet sequence:

   150> mov 0x200f23(%rip),%rax # <perl_multicore_api>
   157> callq *0x8(%rax)

The biggest part is the initialisation code, which consists of 11 lines of
typical XS code. On my system, all the code in F<perlmulticore.h> compiles
to less than 160 octets of read-only data.

=item Broad Applicability

While there are alternative ways to achieve the goal of parallel execution
with threads that might be more efficient, this mechanism was chosen
because it is very simple to retrofit existing modules with it.

The design goals for this mechanism were to be simple to use, to be very
efficient when not needed, to keep code and data size overhead low, and to
be broadly applicable.

=back

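The dispatch scheme described under "Very Efficient" can be modelled in a
few lines of plain C. This is a simplified sketch and not the contents of
F<perlmulticore.h>: all C<sketch_*> names are made up, and the lookup of a
supporting module (which the real header does through a hash provided by
perl) is replaced by a plain C<sketch_provider> pointer that is assumed to
be set by such a module:

```c
#include <assert.h>

struct multicore_api_sketch
{
  void (*pmapi_release)(void);
  void (*pmapi_acquire)(void);
};

static void sketch_nop (void) { }
static void sketch_init (void);

/* initially, "release" points at the one-time initialisation function */
static struct multicore_api_sketch sketch_initial = { sketch_init, sketch_nop };
static struct multicore_api_sketch *sketch_api = &sketch_initial;

/* hypothetical hook: a supporting module would set this before first use */
static struct multicore_api_sketch *sketch_provider;

static void
sketch_init (void)
{
  /* dummy implementation, used when no supporting module is present */
  static struct multicore_api_sketch dummy = { sketch_nop, sketch_nop };

  sketch_api = sketch_provider ? sketch_provider : &dummy;
  assert (sketch_api->pmapi_release && sketch_api->pmapi_acquire);

  /* forward the release call that triggered the initialisation */
  sketch_api->pmapi_release ();
}

/* each call: load the pointer, load the function pointer, call it */
#define sketch_release() sketch_api->pmapi_release ()
#define sketch_acquire() sketch_api->pmapi_acquire ()
```

After the first C<sketch_release ()>, and with no provider set, both
macros end up calling the empty function, which mirrors the "two memory
accesses and a predictable function call" cost mentioned above.
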

=head1 DISABLING PERL MULTICORE AT COMPILE TIME

You can disable the complete perl multicore API by defining the
symbol C<PERL_MULTICORE_DISABLE> to C<1> (e.g. by specifying
F<-DPERL_MULTICORE_DISABLE> as compiler argument).

This will leave no traces of the API in the compiled code; suitable
"empty" C<perlinterp_release> and C<perlinterp_acquire> definitions will
be provided.

This could be added to perl's C<CPPFLAGS> when configuring perl on
platforms that do not support threading at all, for example.
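
To illustrate what "empty" definitions could look like, here is a sketch
under the assumption that they expand to a statement that does nothing.
The exact definitions used by F<perlmulticore.h> may differ, and
C<add_released> is a hypothetical function that only exists to show that
calling code compiles unchanged:

```c
/* normally given on the command line as -DPERL_MULTICORE_DISABLE */
#define PERL_MULTICORE_DISABLE 1

#if PERL_MULTICORE_DISABLE
/* assumed shape of the empty definitions: valid statements with no effect */
# define perlinterp_release() do { } while (0)
# define perlinterp_acquire() do { } while (0)
#endif

/* Hypothetical caller: identical source with or without the API. */
static int
add_released (int a, int b)
{
  int sum;

  perlinterp_release ();
  sum = a + b;
  perlinterp_acquire ();

  return sum;
}
```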


=head1 AUTHOR

   Marc A. Lehmann <perlmulticore@schmorp.de>
   http://perlmulticore.schmorp.de/

=head1 LICENSE

The F<perlmulticore.h> header file is put into the public
domain. Where this is legally not possible, or at your
option, it can be licensed under the Creative Commons CC0
license: L<https://creativecommons.org/publicdomain/zero/1.0/>.
