… | |
… | |
12 | This module is an AnyEvent user, you need to make sure that you use and |
12 | This module is an AnyEvent user, you need to make sure that you use and |
13 | run a supported event loop. |
13 | run a supported event loop. |
14 | |
14 | |
15 | This module implements a simple, stateless and non-blocking HTTP client. |
15 | This module implements a simple, stateless and non-blocking HTTP client. |
16 | It supports GET, POST and other request methods, cookies and more, all |
16 | It supports GET, POST and other request methods, cookies and more, all |
17 | on a very low level. It can follow redirects supports proxies and |
17 | on a very low level. It can follow redirects, supports proxies, and |
18 | automatically limits the number of connections to the values specified |
18 | automatically limits the number of connections to the values specified |
19 | in the RFC. |
19 | in the RFC. |
20 | |
20 | |
21 | It should generally be a "good client" that is enough for most HTTP |
21 | It should generally be a "good client" that is enough for most HTTP |
22 | tasks. Simple tasks should be simple, but complex tasks should still be |
22 | tasks. Simple tasks should be simple, but complex tasks should still be |
… | |
… | |
50 | object at least alive until the callback gets called. If the object |
50 | object at least alive until the callback gets called. If the object |
51 | gets destroyed before the callback is called, the request will be |
51 | gets destroyed before the callback is called, the request will be |
52 | cancelled. |
52 | cancelled. |
53 | |
53 | |
54 | The callback will be called with the response body data as first |
54 | The callback will be called with the response body data as first |
55 | argument (or "undef" if an error occured), and a hash-ref with |
55 | argument (or "undef" if an error occurred), and a hash-ref with |
56 | response headers as second argument. |
56 | response headers (and trailers) as second argument. |
57 | |
57 | |
58 | All the headers in that hash are lowercased. In addition to the |
58 | All the headers in that hash are lowercased. In addition to the |
59 | response headers, the "pseudo-headers" (uppercase to avoid clashing |
59 | response headers, the "pseudo-headers" (uppercase to avoid clashing |
60 | with possible response headers) "HTTPVersion", "Status" and "Reason" |
60 | with possible response headers) "HTTPVersion", "Status" and "Reason" |
61 | contain the three parts of the HTTP Status-Line of the same name. If |
61 | contain the three parts of the HTTP Status-Line of the same name. If |
… | |
… | |
79 | If the server sends a header multiple times, then their contents |
79 | If the server sends a header multiple times, then their contents |
80 | will be joined together with a comma (","), as per the HTTP spec. |
80 | will be joined together with a comma (","), as per the HTTP spec. |
81 | |
81 | |
82 | If an internal error occurs, such as not being able to resolve a |
82 | If an internal error occurs, such as not being able to resolve a |
83 | hostname, then $data will be "undef", "$headers->{Status}" will be |
83 | hostname, then $data will be "undef", "$headers->{Status}" will be |
84 | "59x" (usually 599) and the "Reason" pseudo-header will contain an |
84 | 590-599 and the "Reason" pseudo-header will contain an error |
85 | error message. |
85 | message. Currently the following status codes are used: |
|
|
86 | |
|
|
87 | 595 - errors during connection establishment, proxy handshake. |
|
|
88 | 596 - errors during TLS negotiation, request sending and header |
|
|
89 | processing. |
|
|
90 | 597 - errors during body receiving or processing. |
|
|
91 | 598 - user aborted request via "on_header" or "on_body". |
|
|
92 | 599 - other, usually nonretryable, errors (garbled URL etc.). |
86 | |
93 | |
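As a rough illustration of how a program might turn these pseudo-status codes into log messages, here is a minimal sketch. The helper name "describe_status" and the message texts are assumptions made for this example; they are not part of AnyEvent::HTTP. Only the "Status" and "Reason" pseudo-headers described above are used.

```perl
use strict;
use warnings;

# Hypothetical lookup table (not part of AnyEvent::HTTP): short texts for
# the internal 59x pseudo-status codes listed above.
my %INTERNAL_ERROR = (
   595 => "connection establishment or proxy handshake failed",
   596 => "TLS negotiation, request sending or header processing failed",
   597 => "body receiving or processing failed",
   598 => "request aborted by on_header or on_body",
   599 => "other, usually nonretryable, error",
);

# Takes the header hash-ref passed to the completion callback and returns
# a one-line description of the outcome.
sub describe_status {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   # anything outside 590-599 is a status sent by the server
   return "HTTP status $status" unless $status =~ /^59\d$/;

   return ($INTERNAL_ERROR{$status} || "unspecified internal error")
        . ": $hdr->{Reason}";
}

my $msg = describe_status({ Status => 595, Reason => "Connection refused" });
print "$msg\n";
```

Codes in the 590-599 range that are not in the list fall back to a generic text, since the set of codes in use is described as current rather than final.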
87 | A typical callback might look like this: |
94 | A typical callback might look like this: |
88 | |
95 | |
89 | sub { |
96 | sub { |
90 | my ($body, $hdr) = @_; |
97 | my ($body, $hdr) = @_; |
… | |
… | |
99 | Additional parameters are key-value pairs, and are fully optional. |
106 | Additional parameters are key-value pairs, and are fully optional. |
100 | They include: |
107 | They include: |
101 | |
108 | |
102 | recurse => $count (default: $MAX_RECURSE) |
109 | recurse => $count (default: $MAX_RECURSE) |
103 | Whether to recurse requests or not, e.g. on redirects, |
110 | Whether to recurse requests or not, e.g. on redirects, |
104 | authentication retries and so on, and how often to do so. |
111 | authentication and other retries and so on, and how often to do |
|
|
112 | so. |
|
|
113 | |
|
|
114 | Only redirects to http and https URLs are supported. While most |
|
|
115 | common redirection forms are handled entirely within this |
|
|
116 | module, some require the use of the optional URI module. If it |
|
|
117 | is required but missing, then the request will fail with an |
|
|
118 | error. |
105 | |
119 | |
106 | headers => hashref |
120 | headers => hashref |
107 | The request headers to use. Currently, "http_request" may |
121 | The request headers to use. Currently, "http_request" may |
108 | provide its own "Host:", "Content-Length:", "Connection:" and |
122 | provide its own "Host:", "Content-Length:", "Connection:" and |
109 | "Cookie:" headers and will provide defaults for "User-Agent:" |
123 | "Cookie:" headers and will provide defaults at least for "TE:", |
110 | and "Referer:" (this can be suppressed by using "undef" for |
124 | "Referer:" and "User-Agent:" (this can be suppressed by using |
111 | these headers in which case they won't be sent at all). |
125 | "undef" for these headers in which case they won't be sent at |
|
|
126 | all). |
|
|
127 | |
|
|
128 | You really should provide your own "User-Agent:" header value |
|
|
129 | that is appropriate for your program - I wouldn't be surprised |
|
|
130 | if the default AnyEvent string gets blocked by webservers sooner |
|
|
131 | or later. |
|
|
132 | |
|
|
133 | Also, make sure that your header names and values do not |
|
|
134 | contain any embedded newlines. |
112 | |
135 | |
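The newline warning above can be checked before a request is made. The following is a minimal sketch; "unsafe_headers" is a hypothetical helper written for this example, not something AnyEvent::HTTP provides, and the sample header values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of all headers whose name or value
# contains a carriage return or a line feed.
sub unsafe_headers {
   my ($headers) = @_;

   my @bad = grep {
      /[\r\n]/ || (defined $headers->{$_} && $headers->{$_} =~ /[\r\n]/)
   } sort keys %$headers;

   return @bad;
}

my %headers = (
   "user-agent" => "MyClient 1.0",
   "x-note"     => "first line\r\nx-injected: 1", # embedded newline
   "referer"    => undef, # undef only suppresses the default header
);

my @bad = unsafe_headers(\%headers);
print "unsafe headers: @bad\n";
```

A caller could refuse to send the request when the returned list is non-empty.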
113 | timeout => $seconds |
136 | timeout => $seconds |
114 | The time-out to use for various stages - each connect attempt |
137 | The time-out to use for various stages - each connect attempt |
115 | will reset the timeout, as will read or write activity, i.e. |
138 | will reset the timeout, as will read or write activity, i.e. |
116 | this is not an overall timeout. |
139 | this is not an overall timeout. |
117 | |
140 | |
118 | Default timeout is 5 minutes. |
141 | Default timeout is 5 minutes. |
119 | |
142 | |
120 | proxy => [$host, $port[, $scheme]] or undef |
143 | proxy => [$host, $port[, $scheme]] or undef |
121 | Use the given http proxy for all requests. If not specified, |
144 | Use the given http proxy for all requests, or no proxy if |
122 | then the default proxy (as specified by $ENV{http_proxy}) is |
145 | "undef" is used. |
|
|
146 | |
|
|
147 | $scheme must be either missing or must be "http" for HTTP. |
|
|
148 | |
|
|
149 | If not specified, then the default proxy is used (see |
|
|
150 | "AnyEvent::HTTP::set_proxy"). |
|
|
151 | |
|
|
152 | Currently, if your proxy requires authorization, you have to |
|
|
153 | specify an appropriate "Proxy-Authorization" header in every |
123 | used. |
154 | request. |
124 | |
155 | |
125 | $scheme must be either missing, "http" for HTTP or "https" for |
156 | Note that this module will prefer an existing persistent |
126 | HTTPS. |
157 | connection, even if that connection was made using another |
|
|
158 | proxy. If you need to ensure that a new connection is made in |
|
|
159 | this case, you can either force "persistent" to false or e.g. |
|
|
160 | use the proxy address in your "sessionid". |
127 | |
161 | |
128 | body => $string |
162 | body => $string |
129 | The request body, usually empty. Will be-sent as-is (future |
163 | The request body, usually empty. Will be sent as-is (future |
130 | versions of this module might offer more options). |
164 | versions of this module might offer more options). |
131 | |
165 | |
132 | cookie_jar => $hash_ref |
166 | cookie_jar => $hash_ref |
133 | Passing this parameter enables (simplified) cookie-processing, |
167 | Passing this parameter enables (simplified) cookie-processing, |
134 | loosely based on the original netscape specification. |
168 | loosely based on the original netscape specification. |
135 | |
169 | |
136 | The $hash_ref must be an (initially empty) hash reference which |
170 | The $hash_ref must be an (initially empty) hash reference which |
137 | will get updated automatically. It is possible to save the |
171 | will get updated automatically. It is possible to save the |
138 | cookie_jar to persistent storage with something like JSON or |
172 | cookie jar to persistent storage with something like JSON or |
139 | Storable, but this is not recommended, as expiry times are |
173 | Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function |
140 | currently being ignored. |
174 | if you wish to remove expired or session-only cookies, and also |
|
|
175 | for documentation on the format of the cookie jar. |
141 | |
176 | |
142 | Note that this cookie implementation is not of very high |
177 | Note that this cookie implementation is not meant to be |
143 | quality, nor meant to be complete. If you want complete cookie |
178 | complete. If you want complete cookie management you have to do |
144 | management you have to do that on your own. "cookie_jar" is |
179 | that on your own. "cookie_jar" is meant as a quick fix to get |
145 | meant as a quick fix to get some cookie-using sites working. |
180 | most cookie-using sites working. Cookies are a privacy disaster, |
146 | Cookies are a privacy disaster, do not use them unless required |
181 | do not use them unless required to. |
147 | to. |
182 | |
|
|
183 | When cookie processing is enabled, the "Cookie:" and |
|
|
184 | "Set-Cookie:" headers will be set and handled by this module, |
|
|
185 | otherwise they will be left untouched. |
148 | |
186 | |
149 | tls_ctx => $scheme | $tls_ctx |
187 | tls_ctx => $scheme | $tls_ctx |
150 | Specifies the AnyEvent::TLS context to be used for https |
188 | Specifies the AnyEvent::TLS context to be used for https |
151 | connections. This parameter follows the same rules as the |
189 | connections. This parameter follows the same rules as the |
152 | "tls_ctx" parameter to AnyEvent::Handle, but additionally, the |
190 | "tls_ctx" parameter to AnyEvent::Handle, but additionally, the |
… | |
… | |
155 | and high-security (CA and common-name verification) TLS context. |
193 | and high-security (CA and common-name verification) TLS context. |
156 | |
194 | |
157 | The default for this option is "low", which could be interpreted |
195 | The default for this option is "low", which could be interpreted |
158 | as "give me the page, no matter what". |
196 | as "give me the page, no matter what". |
159 | |
197 | |
|
|
198 | See also the "sessionid" parameter. |
|
|
199 | |
|
|
200 | sessionid => $string |
|
|
201 | The module might reuse connections to the same host internally |
|
|
202 | (regardless of other settings, such as "tcp_connect" or |
|
|
203 | "proxy"). Sometimes (e.g. when using TLS or a specific proxy), |
|
|
204 | you do not want to reuse connections from other sessions. This |
|
|
205 | can be achieved by setting this parameter to some unique ID |
|
|
206 | (such as the address of an object storing your state data or the |
|
|
207 | TLS context, or the proxy IP) - only connections using the same |
|
|
208 | unique ID will be reused. |
|
|
209 | |
160 | on_prepare => $callback->($fh) |
210 | on_prepare => $callback->($fh) |
161 | In rare cases you need to "tune" the socket before it is used to |
211 | In rare cases you need to "tune" the socket before it is used to |
162 | connect (for exmaple, to bind it on a given IP address). This |
212 | connect (for example, to bind it on a given IP address). This |
163 | parameter overrides the prepare callback passed to |
213 | parameter overrides the prepare callback passed to |
164 | "AnyEvent::Socket::tcp_connect" and behaves exactly the same way |
214 | "AnyEvent::Socket::tcp_connect" and behaves exactly the same way |
165 | (e.g. it has to provide a timeout). See the description for the |
215 | (e.g. it has to provide a timeout). See the description for the |
166 | $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for |
216 | $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for |
167 | details. |
217 | details. |
… | |
… | |
173 | AnyEvent::Socket::tcp_connect to do this, but you can provide |
223 | AnyEvent::Socket::tcp_connect to do this, but you can provide |
174 | your own "tcp_connect" function - obviously, it has to follow |
224 | your own "tcp_connect" function - obviously, it has to follow |
175 | the same calling conventions, except that it may always return a |
225 | the same calling conventions, except that it may always return a |
176 | connection guard object. |
226 | connection guard object. |
177 | |
227 | |
|
|
228 | The connections made by this hook will be treated as equivalent |
|
|
229 | to connections made the built-in way, specifically, they will be |
|
|
230 | put into and taken from the persistent connection cache. If your |
|
|
231 | $tcp_connect function is incompatible with this kind of re-use, |
|
|
232 | consider switching off "persistent" connections and/or providing |
|
|
233 | a "sessionid" identifier. |
|
|
234 | |
178 | There are probably lots of weird uses for this function, |
235 | There are probably lots of weird uses for this function, |
179 | starting from tracing the hosts "http_request" actually tries to |
236 | starting from tracing the hosts "http_request" actually tries to |
180 | connect, to (inexact but fast) host => IP address caching or |
237 | connect, to (inexact but fast) host => IP address caching or |
181 | even socks protocol support. |
238 | even socks protocol support. |
182 | |
239 | |
… | |
… | |
192 | |
249 | |
193 | This callback is useful, among other things, to quickly reject |
250 | This callback is useful, among other things, to quickly reject |
194 | unwanted content, which, if it is supposed to be rare, can be |
251 | unwanted content, which, if it is supposed to be rare, can be |
195 | faster than first doing a "HEAD" request. |
252 | faster than first doing a "HEAD" request. |
196 | |
253 | |
|
|
254 | The downside is that cancelling the request makes it impossible |
|
|
255 | to re-use the connection. Also, the "on_header" callback will |
|
|
256 | not receive any trailer (headers sent after the response body). |
|
|
257 | |
197 | Example: cancel the request unless the content-type is |
258 | Example: cancel the request unless the content-type is |
198 | "text/html". |
259 | "text/html". |
199 | |
260 | |
200 | on_header => sub { |
261 | on_header => sub { |
201 | $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ |
262 | $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ |
… | |
… | |
208 | |
269 | |
209 | It has to return either true (in which case AnyEvent::HTTP will |
270 | It has to return either true (in which case AnyEvent::HTTP will |
210 | continue), or false, in which case AnyEvent::HTTP will cancel |
271 | continue), or false, in which case AnyEvent::HTTP will cancel |
211 | the download (and call the completion callback with an error |
272 | the download (and call the completion callback with an error |
212 | code of 598). |
273 | code of 598). |
|
|
274 | |
|
|
275 | The downside to cancelling the request is that it makes it |
|
|
276 | impossible to re-use the connection. |
213 | |
277 | |
214 | This callback is useful when the data is too large to be held in |
278 | This callback is useful when the data is too large to be held in |
215 | memory (so the callback writes it to a file) or when only some |
279 | memory (so the callback writes it to a file) or when only some |
216 | information should be extracted, or when the body should be |
280 | information should be extracted, or when the body should be |
217 | processed incrementally. |
281 | processed incrementally. |
… | |
… | |
231 | AnyEvent::Handle object associated with the connection. In error |
295 | AnyEvent::Handle object associated with the connection. In error |
232 | cases, "undef" will be passed. When there is no body (e.g. |
296 | cases, "undef" will be passed. When there is no body (e.g. |
233 | status 304), the empty string will be passed. |
297 | status 304), the empty string will be passed. |
234 | |
298 | |
235 | The handle object might or might not be in TLS mode, might be |
299 | The handle object might or might not be in TLS mode, might be |
236 | connected to a proxy, be a persistent connection etc., and |
300 | connected to a proxy, be a persistent connection, use chunked |
237 | configured in unspecified ways. The user is responsible for this |
301 | transfer encoding etc., and configured in unspecified ways. The |
238 | handle (it will not be used by this module anymore). |
302 | user is responsible for this handle (it will not be used by this |
|
|
303 | module anymore). |
239 | |
304 | |
240 | This is useful with some push-type services, where, after the |
305 | This is useful with some push-type services, where, after the |
241 | initial headers, an interactive protocol is used (typical |
306 | initial headers, an interactive protocol is used (typical |
242 | example would be the push-style twitter API which starts a |
307 | example would be the push-style twitter API which starts a |
243 | JSON/XML stream). |
308 | JSON/XML stream). |
244 | |
309 | |
245 | If you think you need this, first have a look at "on_body", to |
310 | If you think you need this, first have a look at "on_body", to |
246 | see if that doesn't solve your problem in a better way. |
311 | see if that doesn't solve your problem in a better way. |
247 | |
312 | |
|
|
313 | persistent => $boolean |
|
|
314 | Try to create/reuse a persistent connection. When this flag is |
|
|
315 | set (default: true for idempotent requests, false for all |
|
|
316 | others), then "http_request" tries to re-use an existing |
|
|
317 | (previously-created) persistent connection to same host (i.e. |
|
|
318 | identical URL scheme, hostname, port and sessionid) and, failing |
|
|
319 | that, tries to create a new one. |
|
|
320 | |
|
|
321 | Requests failing in certain ways will be automatically retried |
|
|
322 | once, which is dangerous for non-idempotent requests, which is |
|
|
323 | why it defaults to off for them. The reason for this is that |
|
|
324 | the bozos who designed HTTP/1.1 made it impossible to |
|
|
325 | distinguish between a fatal error and a normal connection |
|
|
326 | timeout, so you never know whether there was a problem with your |
|
|
327 | request or not. |
|
|
328 | |
|
|
329 | When reusing an existing connection, many parameters (such as |
|
|
330 | TLS context) will be ignored. See the "sessionid" parameter for |
|
|
331 | a workaround. |
|
|
332 | |
|
|
333 | keepalive => $boolean |
|
|
334 | Only used when "persistent" is also true. This parameter decides |
|
|
335 | whether "http_request" tries to handshake a HTTP/1.0-style |
|
|
336 | keep-alive connection (as opposed to only a HTTP/1.1 persistent |
|
|
337 | connection). |
|
|
338 | |
|
|
339 | The default is true, except when using a proxy, in which case it |
|
|
340 | defaults to false, as HTTP/1.0 proxies cannot support this in a |
|
|
341 | meaningful way. |
|
|
342 | |
|
|
343 | handle_params => { key => value ... } |
|
|
344 | The key-value pairs in this hash will be passed to any |
|
|
345 | AnyEvent::Handle constructor that is called - not all requests |
|
|
346 | will create a handle, and sometimes more than one is created, so |
|
|
347 | this parameter is only good for setting hints. |
|
|
348 | |
|
|
349 | Example: set the maximum read size to 4096, to potentially |
|
|
350 | conserve memory at the cost of speed. |
|
|
351 | |
|
|
352 | handle_params => { |
|
|
353 | max_read_size => 4096, |
|
|
354 | }, |
|
|
355 | |
248 | Example: make a simple HTTP GET request for http://www.nethype.de/ |
356 | Example: do a simple HTTP GET request for http://www.nethype.de/ and |
|
|
357 | print the response body. |
249 | |
358 | |
250 | http_request GET => "http://www.nethype.de/", sub { |
359 | http_request GET => "http://www.nethype.de/", sub { |
251 | my ($body, $hdr) = @_; |
360 | my ($body, $hdr) = @_; |
252 | print "$body\n"; |
361 | print "$body\n"; |
253 | }; |
362 | }; |
254 | |
363 | |
255 | Example: make a HTTP HEAD request on https://www.google.com/, use a |
364 | Example: do a HTTP HEAD request on https://www.google.com/, use a |
256 | timeout of 30 seconds. |
365 | timeout of 30 seconds. |
257 | |
366 | |
258 | http_request |
367 | http_request |
259 | GET => "https://www.google.com", |
368 | HEAD => "https://www.google.com", |
|
|
369 | headers => { "user-agent" => "MySearchClient 1.0" }, |
260 | timeout => 30, |
370 | timeout => 30, |
261 | sub { |
371 | sub { |
262 | my ($body, $hdr) = @_; |
372 | my ($body, $hdr) = @_; |
263 | use Data::Dumper; |
373 | use Data::Dumper; |
264 | print Dumper $hdr; |
374 | print Dumper $hdr; |
265 | } |
375 | } |
266 | ; |
376 | ; |
267 | |
377 | |
268 | Example: make another simple HTTP GET request, but immediately try |
378 | Example: do another simple HTTP GET request, but immediately try to |
269 | to cancel it. |
379 | cancel it. |
270 | |
380 | |
271 | my $request = http_request GET => "http://www.nethype.de/", sub { |
381 | my $request = http_request GET => "http://www.nethype.de/", sub { |
272 | my ($body, $hdr) = @_; |
382 | my ($body, $hdr) = @_; |
273 | print "$body\n"; |
383 | print "$body\n"; |
274 | }; |
384 | }; |
… | |
… | |
279 | AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the |
389 | AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the |
280 | actual connection, which in turn uses AnyEvent::DNS to resolve |
390 | actual connection, which in turn uses AnyEvent::DNS to resolve |
281 | hostnames. The latter is a simple stub resolver and does no caching on |
391 | hostnames. The latter is a simple stub resolver and does no caching on |
282 | its own. If you want DNS caching, you currently have to provide your own |
392 | its own. If you want DNS caching, you currently have to provide your own |
283 | default resolver (by storing a suitable resolver object in |
393 | default resolver (by storing a suitable resolver object in |
284 | $AnyEvent::DNS::RESOLVER). |
394 | $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback. |
285 | |
395 | |
286 | GLOBAL FUNCTIONS AND VARIABLES |
396 | GLOBAL FUNCTIONS AND VARIABLES |
287 | AnyEvent::HTTP::set_proxy "proxy-url" |
397 | AnyEvent::HTTP::set_proxy "proxy-url" |
288 | Sets the default proxy server to use. The proxy-url must begin with |
398 | Sets the default proxy server to use. The proxy-url must begin with |
289 | a string of the form "http://host:port" (optionally "https:..."), |
399 | a string of the form "http://host:port", croaks otherwise. |
290 | croaks otherwise. |
|
|
291 | |
400 | |
292 | To clear an already-set proxy, use "undef". |
401 | To clear an already-set proxy, use "undef". |
|
|
402 | |
|
|
403 | When AnyEvent::HTTP is loaded for the first time it will query the |
|
|
404 | default proxy from the operating system, currently by looking at |
|
|
405 | "$ENV{http_proxy}". |
|
|
406 | |
|
|
407 | AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] |
|
|
408 | Remove all cookies from the cookie jar that have expired. If |
|
|
409 | $session_end is given and true, then additionally remove all session |
|
|
410 | cookies. |
|
|
411 | |
|
|
412 | You should call this function (with a true $session_end) before you |
|
|
413 | save cookies to disk, and you should call this function after |
|
|
414 | loading them again. If you have a long-running program you can |
|
|
415 | additionally call this function from time to time. |
|
|
416 | |
|
|
417 | A cookie jar is initially an empty hash-reference that is managed by |
|
|
418 | this module. Its format is subject to change, but currently it is as |
|
|
419 | follows: |
|
|
420 | |
|
|
421 | The key "version" has to contain 2, otherwise the hash gets cleared. |
|
|
422 | All other keys are hostnames or IP addresses pointing to |
|
|
423 | hash-references. The key for these inner hash references is the |
|
|
424 | server path for which this cookie is meant, and the values are again |
|
|
425 | hash-references. Each key of those hash-references is a cookie name, |
|
|
426 | and the value, you guessed it, is another hash-reference, this time |
|
|
427 | with the key-value pairs from the cookie, except for "expires" and |
|
|
428 | "max-age", which have been replaced by a "_expires" key that |
|
|
429 | contains the cookie expiry timestamp. Session cookies are indicated |
|
|
430 | by not having an "_expires" key. |
|
|
431 | |
|
|
432 | Here is an example of a cookie jar with a single cookie, so you have |
|
|
433 | a chance of understanding the above paragraph: |
|
|
434 | |
|
|
435 | { |
|
|
436 | version => 2, |
|
|
437 | "10.0.0.1" => { |
|
|
438 | "/" => { |
|
|
439 | "mythweb_id" => { |
|
|
440 | _expires => 1293917923, |
|
|
441 | value => "ooRung9dThee3ooyXooM1Ohm", |
|
|
442 | }, |
|
|
443 | }, |
|
|
444 | }, |
|
|
445 | } |
293 | |
446 | |
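To make the nesting described above concrete, here is a minimal sketch that walks a jar in the version 2 layout and reports cookies whose "_expires" timestamp has passed. "expired_cookies" is a hypothetical helper for illustration only. It does not modify the jar and is not a substitute for "AnyEvent::HTTP::cookie_jar_expire". Since the jar format is documented as subject to change, treat the layout as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return "host path name" for every cookie in $jar
# whose _expires timestamp lies before $now. Session cookies (those
# without an _expires key) are skipped.
sub expired_cookies {
   my ($jar, $now) = @_;
   my @expired;

   for my $host (sort keys %$jar) {
      next if $host eq "version"; # not a hostname, just the format marker

      my $paths = $jar->{$host};
      for my $path (sort keys %$paths) {
         my $cookies = $paths->{$path};
         for my $name (sort keys %$cookies) {
            my $expires = $cookies->{$name}{_expires};
            push @expired, "$host $path $name"
               if defined $expires && $expires < $now;
         }
      }
   }

   return @expired;
}

my $jar = {
   version => 2,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => { _expires => 1293917923, value => "ooRung9dThee3ooyXooM1Ohm" },
         "session"    => { value => "no _expires key, so a session cookie" },
      },
   },
};

print "$_\n" for expired_cookies($jar, 1293917924);
```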
294 | $date = AnyEvent::HTTP::format_date $timestamp |
447 | $date = AnyEvent::HTTP::format_date $timestamp |
295 | Takes a POSIX timestamp (seconds since the epoch) and formats it as |
448 | Takes a POSIX timestamp (seconds since the epoch) and formats it as |
296 | a HTTP Date (RFC 2616). |
449 | a HTTP Date (RFC 2616). |
297 | |
450 | |
298 | $timestamp = AnyEvent::HTTP::parse_date $date |
451 | $timestamp = AnyEvent::HTTP::parse_date $date |
299 | Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX |
452 | Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec) |
|
|
453 | or a bunch of minor variations of those, and returns the |
300 | timestamp, or "undef" if the date cannot be parsed. |
454 | corresponding POSIX timestamp, or "undef" if the date cannot be |
|
|
455 | parsed. |
301 | |
456 | |
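For readers unfamiliar with the date format involved, the following stand-alone sketch produces the fixed-length RFC 1123 style date that RFC 2616 prefers. It uses only core Perl and is not the module's implementation; "rfc1123_date" is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;

my @DAY   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTH = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical stand-alone helper: format a POSIX timestamp as an
# RFC 1123 style HTTP date, always in GMT.
sub rfc1123_date {
   my ($timestamp) = @_;
   my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $timestamp;

   return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $DAY[$wday], $mday, $MONTH[$mon], $year + 1900, $hour, $min, $sec;
}

my $date = rfc1123_date(1293917923);
print "$date\n"; # Sat, 01 Jan 2011 21:38:43 GMT
```

The day and month names are taken from fixed tables rather than from the locale, because HTTP dates always use the English abbreviations.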
302 | $AnyEvent::HTTP::MAX_RECURSE |
457 | $AnyEvent::HTTP::MAX_RECURSE |
303 | The default value for the "recurse" request parameter (default: 10). |
458 | The default value for the "recurse" request parameter (default: 10). |
|
|
459 | |
|
|
460 | $AnyEvent::HTTP::TIMEOUT |
|
|
461 | The default timeout for connection operations (default: 300). |
304 | |
462 | |
305 | $AnyEvent::HTTP::USERAGENT |
463 | $AnyEvent::HTTP::USERAGENT |
306 | The default value for the "User-Agent" header (the default is |
464 | The default value for the "User-Agent" header (the default is |
307 | "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; |
465 | "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; |
308 | +http://software.schmorp.de/pkg/AnyEvent)"). |
466 | +http://software.schmorp.de/pkg/AnyEvent)"). |
309 | |
467 | |
310 | $AnyEvent::HTTP::MAX_PER_HOST |
468 | $AnyEvent::HTTP::MAX_PER_HOST |
311 | The maximum number of concurrent connections to the same host |
469 | The maximum number of concurrent connections to the same host |
312 | (identified by the hostname). If the limit is exceeded, then the |
470 | (identified by the hostname). If the limit is exceeded, then |
313 | additional requests are queued until previous connections are |
471 | additional requests are queued until previous connections are |
314 | closed. |
472 | closed. Both persistent and non-persistent connections are counted |
|
|
473 | in this limit. |
315 | |
474 | |
316 | The default value for this is 4, and it is highly advisable to not |
475 | The default value for this is 4, and it is highly advisable to not |
317 | increase it. |
476 | increase it much. |
|
|
477 | |
|
|
478 | For comparison: the RFCs recommend 4 non-persistent or 2 persistent |
|
|
479 | connections, older browsers used 2, newer ones (such as firefox 3) |
|
|
480 | typically use 6, and Opera uses 8 because like, they have the |
|
|
481 | fastest browser and give a shit for everybody else on the planet. |
|
|
482 | |
|
|
483 | $AnyEvent::HTTP::PERSISTENT_TIMEOUT |
|
|
484 | The time after which idle persistent connections get closed by |
|
|
485 | AnyEvent::HTTP (default: 3). |
318 | |
486 | |
319 | $AnyEvent::HTTP::ACTIVE |
487 | $AnyEvent::HTTP::ACTIVE |
320 | The number of active connections. This is not the number of |
488 | The number of active connections. This is not the number of |
321 | currently running requests, but the number of currently open and |
489 | currently running requests, but the number of currently open and |
322 | non-idle TCP connections. This number of can be useful for |
490 | non-idle TCP connections. This number can be useful for |
323 | load-leveling. |
491 | load-leveling. |
324 | |
492 | |
|
|
493 | SHOWCASE |
|
|
494 | This section contains some more elaborate "real-world" examples or code |
|
|
495 | snippets. |
|
|
496 | |
|
|
497 | HTTP/1.1 FILE DOWNLOAD |
|
|
498 | Downloading files with HTTP can be quite tricky, especially when |
|
|
499 | something goes wrong and you want to resume. |
|
|
500 | |
|
|
501 | Here is a function that initiates and resumes a download. It uses the |
|
|
502 | last modified time to check for file content changes, and works with |
|
|
503 | many HTTP/1.0 servers as well, and usually falls back to a complete |
|
|
504 | re-download on older servers. |
|
|
505 | |
|
|
506 | It calls the completion callback with "undef", which means a |
|
|
507 | nonretryable error occurred, 0 when the download was partial and should |
|
|
508 | be retried, and 1 if it was successful. |
|
|
509 | |
|
|
510 | use AnyEvent::HTTP; |
|
|
511 | |
|
|
512 | sub download($$$) { |
|
|
513 | my ($url, $file, $cb) = @_; |
|
|
514 | |
|
|
515 | open my $fh, "+<", $file |
|
|
516 | or die "$file: $!"; |
|
|
517 | |
|
|
518 | my %hdr; |
|
|
519 | my $ofs = 0; |
|
|
520 | |
|
|
521 | if (stat $fh and -s _) { |
|
|
522 | $ofs = -s _; |
|
|
523 | warn "-s is ", $ofs; |
|
|
524 | $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; |
|
|
525 | $hdr{"range"} = "bytes=$ofs-"; |
|
|
526 | } |
|
|
527 | |
|
|
528 | http_get $url, |
|
|
529 | headers => \%hdr, |
|
|
530 | on_header => sub { |
|
|
531 | my ($hdr) = @_; |
|
|
532 | |
|
|
533 | if ($hdr->{Status} == 200 && $ofs) { |
|
|
534 | # resume failed |
|
|
535 | truncate $fh, $ofs = 0; |
|
|
536 | } |
|
|
537 | |
|
|
538 | sysseek $fh, $ofs, 0; |
|
|
539 | |
|
|
540 | 1 |
|
|
541 | }, |
|
|
542 | on_body => sub { |
|
|
543 | my ($data, $hdr) = @_; |
|
|
544 | |
|
|
545 | if ($hdr->{Status} =~ /^2/) { |
|
|
546 | length $data == syswrite $fh, $data |
|
|
547 | or return; # abort on write errors |
|
|
548 | } |
|
|
549 | |
|
|
550 | 1 |
|
|
551 | }, |
|
|
552 | sub { |
|
|
553 | my (undef, $hdr) = @_; |
|
|
554 | |
|
|
555 | my $status = $hdr->{Status}; |
|
|
556 | |
|
|
557 | if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { |
|
|
558 | utime $time, $time, $fh; |
|
|
559 | } |
|
|
560 | |
|
|
561 | if ($status == 200 || $status == 206 || $status == 416) { |
|
|
562 | # download ok || resume ok || file already fully downloaded |
|
|
563 | $cb->(1, $hdr); |
|
|
564 | |
|
|
565 | } elsif ($status == 412) { |
|
|
566 | # file has changed while resuming, delete and retry |
|
|
567 | unlink $file; |
|
|
568 | $cb->(0, $hdr); |
|
|
569 | |
|
|
570 | } elsif ($status == 500 or $status == 503 or $status =~ /^59/) { |
|
|
571 | # retry later |
|
|
572 | $cb->(0, $hdr); |
|
|
573 | |
|
|
574 | } else { |
|
|
575 | $cb->(undef, $hdr); |
|
|
576 | } |
|
|
577 | } |
|
|
578 | ; |
|
|
579 | } |
|
|
580 | |
|
|
581 | download "http://server/somelargefile", "/tmp/somelargefile", sub { |
|
|
582 | if ($_[0]) { |
|
|
583 | print "OK!\n"; |
|
|
584 | } elsif (defined $_[0]) { |
|
|
585 | print "please retry later\n"; |
|
|
586 | } else { |
|
|
587 | print "ERROR\n"; |
|
|
588 | } |
|
|
589 | }; |
|
|
590 | |
325 | SOCKS PROXIES |
591 | SOCKS PROXIES |
326 | Socks proxies are not directly supported by AnyEvent::HTTP. You can |
592 | Socks proxies are not directly supported by AnyEvent::HTTP. You can |
327 | compile your perl to support socks, or use an external program such as |
593 | compile your perl to support socks, or use an external program such as |
328 | socksify (dante) or tsocks to make your program use a socks proxy |
594 | socksify (dante) or tsocks to make your program use a socks proxy |
329 | transparently. |
595 | transparently. |
330 | |
596 | |
… | |
… | |
382 | |
648 | |
383 | AUTHOR |
649 | AUTHOR |
384 | Marc Lehmann <schmorp@schmorp.de> |
650 | Marc Lehmann <schmorp@schmorp.de> |
385 | http://home.schmorp.de/ |
651 | http://home.schmorp.de/ |
386 | |
652 | |
387 | With many thanks to Дмитрий Шалашов, who provided |
653 | With many thanks to Дмитрий Шалашов, who provided countless testcases |
388 | countless testcases and bugreports. |
654 | and bugreports. |
389 | |
655 | |