
Comparing AnyEvent-HTTP/README (file contents):
Revision 1.13 by root, Wed Jun 16 19:17:30 2010 UTC vs.
Revision 1.28 by root, Mon Apr 27 12:14:12 2020 UTC

12 This module is an AnyEvent user, you need to make sure that you use and 12 This module is an AnyEvent user, you need to make sure that you use and
13 run a supported event loop. 13 run a supported event loop.
14 14
15 This module implements a simple, stateless and non-blocking HTTP client. 15 This module implements a simple, stateless and non-blocking HTTP client.
16 It supports GET, POST and other request methods, cookies and more, all 16 It supports GET, POST and other request methods, cookies and more, all
17 on a very low level. It can follow redirects supports proxies and 17 on a very low level. It can follow redirects, supports proxies, and
18 automatically limits the number of connections to the values specified 18 automatically limits the number of connections to the values specified
19 in the RFC. 19 in the RFC.
20 20
21 It should generally be a "good client" that is enough for most HTTP 21 It should generally be a "good client" that is enough for most HTTP
22 tasks. Simple tasks should be simple, but complex tasks should still be 22 tasks. Simple tasks should be simple, but complex tasks should still be
46 URL must be an absolute http or https URL. 46 URL must be an absolute http or https URL.
47 47
48 When called in void context, nothing is returned. In other contexts, 48 When called in void context, nothing is returned. In other contexts,
49 "http_request" returns a "cancellation guard" - you have to keep the 49 "http_request" returns a "cancellation guard" - you have to keep the
50 object at least alive until the callback gets called. If the object 50 object at least alive until the callback gets called. If the object
51 gets destroyed before the callbakc is called, the request will be 51 gets destroyed before the callback is called, the request will be
52 cancelled. 52 cancelled.
53 53
54 The callback will be called with the response body data as first 54 The callback will be called with the response body data as first
55 argument (or "undef" if an error occured), and a hash-ref with 55 argument (or "undef" if an error occurred), and a hash-ref with
56 response headers as second argument. 56 response headers (and trailers) as second argument.
57 57
58 All the headers in that hash are lowercased. In addition to the 58 All the headers in that hash are lowercased. In addition to the
59 response headers, the "pseudo-headers" (uppercase to avoid clashing 59 response headers, the "pseudo-headers" (uppercase to avoid clashing
60 with possible response headers) "HTTPVersion", "Status" and "Reason" 60 with possible response headers) "HTTPVersion", "Status" and "Reason"
61 contain the three parts of the HTTP Status-Line of the same name. 61 contain the three parts of the HTTP Status-Line of the same name. If
62 an error occurs during the body phase of a request, then the
63 original "Status" and "Reason" values from the header are available
64 as "OrigStatus" and "OrigReason".
62 65
63 The pseudo-header "URL" contains the actual URL (which can differ 66 The pseudo-header "URL" contains the actual URL (which can differ
64 from the requested URL when following redirects - for example, you 67 from the requested URL when following redirects - for example, you
65 might get an error that your URL scheme is not supported even though 68 might get an error that your URL scheme is not supported even though
66 your URL is a valid http URL because it redirected to an ftp URL, in 69 your URL is a valid http URL because it redirected to an ftp URL, in
76 If the server sends a header multiple times, then their contents 79 If the server sends a header multiple times, then their contents
77 will be joined together with a comma (","), as per the HTTP spec. 80 will be joined together with a comma (","), as per the HTTP spec.
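The joining rule described above can be illustrated with a small stand-alone sketch. This is not the module's own parser: "fold_headers" is a hypothetical helper, written only to show what lowercased names and comma-joined repeats look like in the resulting hash.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent::HTTP): fold raw "Name: value"
# lines into a hash with lowercased names, joining repeats with ",".
sub fold_headers {
   my (@lines) = @_;
   my %hdr;

   for my $line (@lines) {
      my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*?)\s*$/
         or next; # skip anything that does not look like a header line

      $name = lc $name;
      $hdr{$name} = exists $hdr{$name} ? "$hdr{$name},$value" : $value;
   }

   \%hdr
}

my $folded = fold_headers(
   "Cache-Control: no-cache",
   "Content-Type: text/html",
   "cache-control: no-store",
);

print "$_: $folded->{$_}\n" for sort keys %$folded;
```

A callback that needs the individual values back has to split on the comma itself, which is only safe for headers whose values cannot contain commas.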
78 81
79 If an internal error occurs, such as not being able to resolve a 82 If an internal error occurs, such as not being able to resolve a
80 hostname, then $data will be "undef", "$headers->{Status}" will be 83 hostname, then $data will be "undef", "$headers->{Status}" will be
81 "59x" (usually 599) and the "Reason" pseudo-header will contain an 84 590-599 and the "Reason" pseudo-header will contain an error
82 error message. 85 message. Currently the following status codes are used:
86
87 595 - errors during connection establishment, proxy handshake.
88 596 - errors during TLS negotiation, request sending and header
89 processing.
90 597 - errors during body receiving or processing.
91 598 - user aborted request via "on_header" or "on_body".
92 599 - other, usually nonretryable, errors (garbled URL etc.).
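As a rough sketch of how a callback might tell these internal codes apart from real server responses: the descriptions below are copied from the list above, while "describe_status" and its output format are assumptions made for this example and not part of the module.

```perl
use strict;
use warnings;

# Phase descriptions taken from the status list above.
my %INTERNAL_ERROR = (
   595 => "connection establishment or proxy handshake",
   596 => "TLS negotiation, request sending or header processing",
   597 => "body receiving or processing",
   598 => "request aborted by on_header or on_body",
   599 => "other, usually nonretryable, error",
);

# Hypothetical helper: turn the Status/Reason pseudo-headers into a message.
sub describe_status {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   # anything outside 590-599 came from the server, not from the module
   return "server response $status" unless $status =~ /^59\d$/;

   my $phase = $INTERNAL_ERROR{$status} || "unspecified internal error";
   "internal error $status ($phase): $hdr->{Reason}"
}

for my $hdr (
   { Status => 596, Reason => "Connection timed out" },
   { Status => 404, Reason => "Not Found" },
) {
   my $msg = describe_status($hdr);
   print "$msg\n";
}
```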
83 93
84 A typical callback might look like this: 94 A typical callback might look like this:
85 95
86 sub { 96 sub {
87 my ($body, $hdr) = @_; 97 my ($body, $hdr) = @_;
96 Additional parameters are key-value pairs, and are fully optional. 106 Additional parameters are key-value pairs, and are fully optional.
97 They include: 107 They include:
98 108
99 recurse => $count (default: $MAX_RECURSE) 109 recurse => $count (default: $MAX_RECURSE)
100 Whether to recurse requests or not, e.g. on redirects, 110 Whether to recurse requests or not, e.g. on redirects,
101 authentication retries and so on, and how often to do so. 111 authentication and other retries and so on, and how often to do
112 so.
113
114 Only redirects to http and https URLs are supported. While most
115 common redirection forms are handled entirely within this
116 module, some require the use of the optional URI module. If it
117 is required but missing, then the request will fail with an
118 error.
102 119
103 headers => hashref 120 headers => hashref
104 The request headers to use. Currently, "http_request" may 121 The request headers to use. Currently, "http_request" may
105 provide its own "Host:", "Content-Length:", "Connection:" and 122 provide its own "Host:", "Content-Length:", "Connection:" and
106 "Cookie:" headers and will provide defaults for "User-Agent:" 123 "Cookie:" headers and will provide defaults at least for "TE:",
107 and "Referer:" (this can be suppressed by using "undef" for 124 "Referer:" and "User-Agent:" (this can be suppressed by using
108 these headers in which case they won't be sent at all). 125 "undef" for these headers in which case they won't be sent at
126 all).
127
128 You really should provide your own "User-Agent:" header value
129 that is appropriate for your program - I wouldn't be surprised
130 if the default AnyEvent string gets blocked by webservers sooner
131 or later.
132
133 Also, make sure that your header names and values do not
134 contain any embedded newlines.
109 135
110 timeout => $seconds 136 timeout => $seconds
111 The time-out to use for various stages - each connect attempt 137 The time-out to use for various stages - each connect attempt
112 will reset the timeout, as will read or write activity, i.e. 138 will reset the timeout, as will read or write activity, i.e.
113 this is not an overall timeout. 139 this is not an overall timeout.
114 140
115 Default timeout is 5 minutes. 141 Default timeout is 5 minutes.
116 142
117 proxy => [$host, $port[, $scheme]] or undef 143 proxy => [$host, $port[, $scheme]] or undef
118 Use the given http proxy for all requests. If not specified, 144 Use the given http proxy for all requests, or no proxy if
119 then the default proxy (as specified by $ENV{http_proxy}) is 145 "undef" is used.
146
147 $scheme must be either missing or must be "http" for HTTP.
148
149 If not specified, then the default proxy is used (see
150 "AnyEvent::HTTP::set_proxy").
151
152 Currently, if your proxy requires authorization, you have to
153 specify an appropriate "Proxy-Authorization" header in every
120 used. 154 request.
121 155
122 $scheme must be either missing, "http" for HTTP or "https" for 156 Note that this module will prefer an existing persistent
123 HTTPS. 157 connection, even if that connection was made using another
158 proxy. If you need to ensure that a new connection is made in
159 this case, you can either force "persistent" to false or e.g.
160 use the proxy address in your "sessionid".
124 161
125 body => $string 162 body => $string
126 The request body, usually empty. Will be-sent as-is (future 163 The request body, usually empty. Will be sent as-is (future
127 versions of this module might offer more options). 164 versions of this module might offer more options).
128 165
129 cookie_jar => $hash_ref 166 cookie_jar => $hash_ref
130 Passing this parameter enables (simplified) cookie-processing, 167 Passing this parameter enables (simplified) cookie-processing,
131 loosely based on the original netscape specification. 168 loosely based on the original netscape specification.
132 169
133 The $hash_ref must be an (initially empty) hash reference which 170 The $hash_ref must be an (initially empty) hash reference which
134 will get updated automatically. It is possible to save the 171 will get updated automatically. It is possible to save the
135 cookie_jar to persistent storage with something like JSON or 172 cookie jar to persistent storage with something like JSON or
136 Storable, but this is not recommended, as expiry times are 173 Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
137 currently being ignored. 174 if you wish to remove expired or session-only cookies, and also
175 for documentation on the format of the cookie jar.
138 176
139 Note that this cookie implementation is not of very high 177 Note that this cookie implementation is not meant to be
140 quality, nor meant to be complete. If you want complete cookie 178 complete. If you want complete cookie management you have to do
141 management you have to do that on your own. "cookie_jar" is 179 that on your own. "cookie_jar" is meant as a quick fix to get
142 meant as a quick fix to get some cookie-using sites working. 180 most cookie-using sites working. Cookies are a privacy disaster,
143 Cookies are a privacy disaster, do not use them unless required 181 do not use them unless required to.
144 to. 182
183 When cookie processing is enabled, the "Cookie:" and
184 "Set-Cookie:" headers will be set and handled by this module,
185 otherwise they will be left untouched.
145 186
146 tls_ctx => $scheme | $tls_ctx 187 tls_ctx => $scheme | $tls_ctx
147 Specifies the AnyEvent::TLS context to be used for https 188 Specifies the AnyEvent::TLS context to be used for https
148 connections. This parameter follows the same rules as the 189 connections. This parameter follows the same rules as the
149 "tls_ctx" parameter to AnyEvent::Handle, but additionally, the 190 "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
152 and high-security (CA and common-name verification) TLS context. 193 and high-security (CA and common-name verification) TLS context.
153 194
154 The default for this option is "low", which could be interpreted 195 The default for this option is "low", which could be interpreted
155 as "give me the page, no matter what". 196 as "give me the page, no matter what".
156 197
198 See also the "sessionid" parameter.
199
200 sessionid => $string
201 The module might reuse connections to the same host internally
202 (regardless of other settings, such as "tcp_connect" or
203 "proxy"). Sometimes (e.g. when using TLS or a specific proxy),
204 you do not want to reuse connections from other sessions. This
205 can be achieved by setting this parameter to some unique ID
206 (such as the address of an object storing your state data or the
207 TLS context, or the proxy IP) - only connections using the same
208 unique ID will be reused.
209
157 on_prepare => $callback->($fh) 210 on_prepare => $callback->($fh)
158 In rare cases you need to "tune" the socket before it is used to 211 In rare cases you need to "tune" the socket before it is used to
159 connect (for exmaple, to bind it on a given IP address). This 212 connect (for example, to bind it on a given IP address). This
160 parameter overrides the prepare callback passed to 213 parameter overrides the prepare callback passed to
161 "AnyEvent::Socket::tcp_connect" and behaves exactly the same way 214 "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
162 (e.g. it has to provide a timeout). See the description for the 215 (e.g. it has to provide a timeout). See the description for the
163 $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for 216 $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
164 details. 217 details.
165 218
219 tcp_connect => $callback->($host, $service, $connect_cb,
220 $prepare_cb)
221 In even rarer cases you want total control over how
222 AnyEvent::HTTP establishes connections. Normally it uses
223 AnyEvent::Socket::tcp_connect to do this, but you can provide
224 your own "tcp_connect" function - obviously, it has to follow
225 the same calling conventions, except that it may always return a
226 connection guard object.
227
228 The connections made by this hook will be treated as equivalent
229 to connections made the built-in way, specifically, they will be
230 put into and taken from the persistent connection cache. If your
231 $tcp_connect function is incompatible with this kind of re-use,
232 consider switching off "persistent" connections and/or providing
233 a "sessionid" identifier.
234
235 There are probably lots of weird uses for this function,
236 starting from tracing the hosts "http_request" actually tries to
237 connect, to (inexact but fast) host => IP address caching or
238 even socks protocol support.
239
166 on_header => $callback->($headers) 240 on_header => $callback->($headers)
167 When specified, this callback will be called with the header 241 When specified, this callback will be called with the header
168 hash as soon as headers have been successfully received from the 242 hash as soon as headers have been successfully received from the
169 remote server (not on locally-generated errors). 243 remote server (not on locally-generated errors).
170 244
175 249
176 This callback is useful, among other things, to quickly reject 250 This callback is useful, among other things, to quickly reject
177 unwanted content, which, if it is supposed to be rare, can be 251 unwanted content, which, if it is supposed to be rare, can be
178 faster than first doing a "HEAD" request. 252 faster than first doing a "HEAD" request.
179 253
254 The downside is that cancelling the request makes it impossible
255 to re-use the connection. Also, the "on_header" callback will
256 not receive any trailer (headers sent after the response body).
257
180 Example: cancel the request unless the content-type is 258 Example: cancel the request unless the content-type is
181 "text/html". 259 "text/html".
182 260
183 on_header => sub { 261 on_header => sub {
184 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ 262 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
191 269
192 It has to return either true (in which case AnyEvent::HTTP will 270 It has to return either true (in which case AnyEvent::HTTP will
193 continue), or false, in which case AnyEvent::HTTP will cancel 271 continue), or false, in which case AnyEvent::HTTP will cancel
194 the download (and call the completion callback with an error 272 the download (and call the completion callback with an error
195 code of 598). 273 code of 598).
274
275 The downside to cancelling the request is that it makes it
276 impossible to re-use the connection.
196 277
197 This callback is useful when the data is too large to be held in 278 This callback is useful when the data is too large to be held in
198 memory (so the callback writes it to a file) or when only some 279 memory (so the callback writes it to a file) or when only some
199 information should be extracted, or when the body should be 280 information should be extracted, or when the body should be
200 processed incrementally. 281 processed incrementally.
214 AnyEvent::Handle object associated with the connection. In error 295 AnyEvent::Handle object associated with the connection. In error
215 cases, "undef" will be passed. When there is no body (e.g. 296 cases, "undef" will be passed. When there is no body (e.g.
216 status 304), the empty string will be passed. 297 status 304), the empty string will be passed.
217 298
218 The handle object might or might not be in TLS mode, might be 299 The handle object might or might not be in TLS mode, might be
219 connected to a proxy, be a persistent connection etc., and 300 connected to a proxy, be a persistent connection, use chunked
220 configured in unspecified ways. The user is responsible for this 301 transfer encoding etc., and configured in unspecified ways. The
221 handle (it will not be used by this module anymore). 302 user is responsible for this handle (it will not be used by this
303 module anymore).
222 304
223 This is useful with some push-type services, where, after the 305 This is useful with some push-type services, where, after the
224 initial headers, an interactive protocol is used (typical 306 initial headers, an interactive protocol is used (typical
225 example would be the push-style twitter API which starts a 307 example would be the push-style twitter API which starts a
226 JSON/XML stream). 308 JSON/XML stream).
227 309
228 If you think you need this, first have a look at "on_body", to 310 If you think you need this, first have a look at "on_body", to
229 see if that doesn't solve your problem in a better way. 311 see if that doesn't solve your problem in a better way.
230 312
313 persistent => $boolean
314 Try to create/reuse a persistent connection. When this flag is
315 set (default: true for idempotent requests, false for all
316 others), then "http_request" tries to re-use an existing
317 (previously-created) persistent connection to same host (i.e.
318 identical URL scheme, hostname, port and sessionid) and, failing
319 that, tries to create a new one.
320
321 Requests failing in certain ways will be automatically retried
322 once, which is dangerous for non-idempotent requests, which is
323 why it defaults to off for them. The reason for this is because
324 the bozos who designed HTTP/1.1 made it impossible to
325 distinguish between a fatal error and a normal connection
326 timeout, so you never know whether there was a problem with your
327 request or not.
328
329 When reusing an existing connection, many parameters (such as
330 TLS context) will be ignored. See the "sessionid" parameter for
331 a workaround.
332
333 keepalive => $boolean
334 Only used when "persistent" is also true. This parameter decides
335 whether "http_request" tries to handshake a HTTP/1.0-style
336 keep-alive connection (as opposed to only a HTTP/1.1 persistent
337 connection).
338
339 The default is true, except when using a proxy, in which case it
340 defaults to false, as HTTP/1.0 proxies cannot support this in a
341 meaningful way.
342
343 handle_params => { key => value ... }
344 The key-value pairs in this hash will be passed to any
345 AnyEvent::Handle constructor that is called - not all requests
346 will create a handle, and sometimes more than one is created, so
347 this parameter is only good for setting hints.
348
349 Example: set the maximum read size to 4096, to potentially
350 conserve memory at the cost of speed.
351
352 handle_params => {
353 max_read_size => 4096,
354 },
355
231 Example: make a simple HTTP GET request for http://www.nethype.de/ 356 Example: do a simple HTTP GET request for http://www.nethype.de/ and
357 print the response body.
232 358
233 http_request GET => "http://www.nethype.de/", sub { 359 http_request GET => "http://www.nethype.de/", sub {
234 my ($body, $hdr) = @_; 360 my ($body, $hdr) = @_;
235 print "$body\n"; 361 print "$body\n";
236 }; 362 };
237 363
238 Example: make a HTTP HEAD request on https://www.google.com/, use a 364 Example: do a HTTP HEAD request on https://www.google.com/, use a
239 timeout of 30 seconds. 365 timeout of 30 seconds.
240 366
241 http_request 367 http_request
242 GET => "https://www.google.com", 368 HEAD => "https://www.google.com",
369 headers => { "user-agent" => "MySearchClient 1.0" },
243 timeout => 30, 370 timeout => 30,
244 sub { 371 sub {
245 my ($body, $hdr) = @_; 372 my ($body, $hdr) = @_;
246 use Data::Dumper; 373 use Data::Dumper;
247 print Dumper $hdr; 374 print Dumper $hdr;
248 } 375 }
249 ; 376 ;
250 377
251 Example: make another simple HTTP GET request, but immediately try 378 Example: do another simple HTTP GET request, but immediately try to
252 to cancel it. 379 cancel it.
253 380
254 my $request = http_request GET => "http://www.nethype.de/", sub { 381 my $request = http_request GET => "http://www.nethype.de/", sub {
255 my ($body, $hdr) = @_; 382 my ($body, $hdr) = @_;
256 print "$body\n"; 383 print "$body\n";
257 }; 384 };
262 AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the 389 AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
263 actual connection, which in turn uses AnyEvent::DNS to resolve 390 actual connection, which in turn uses AnyEvent::DNS to resolve
264 hostnames. The latter is a simple stub resolver and does no caching on 391 hostnames. The latter is a simple stub resolver and does no caching on
265 its own. If you want DNS caching, you currently have to provide your own 392 its own. If you want DNS caching, you currently have to provide your own
266 default resolver (by storing a suitable resolver object in 393 default resolver (by storing a suitable resolver object in
267 $AnyEvent::DNS::RESOLVER). 394 $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.
268 395
269 GLOBAL FUNCTIONS AND VARIABLES 396 GLOBAL FUNCTIONS AND VARIABLES
270 AnyEvent::HTTP::set_proxy "proxy-url" 397 AnyEvent::HTTP::set_proxy "proxy-url"
271 Sets the default proxy server to use. The proxy-url must begin with 398 Sets the default proxy server to use. The proxy-url must begin with
272 a string of the form "http://host:port" (optionally "https:..."), 399 a string of the form "http://host:port", croaks otherwise.
273 croaks otherwise.
274 400
275 To clear an already-set proxy, use "undef". 401 To clear an already-set proxy, use "undef".
402
403 When AnyEvent::HTTP is loaded for the first time it will query the
404 default proxy from the operating system, currently by looking at
405 "$ENV{http_proxy}".
406
407 AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
408 Remove all cookies from the cookie jar that have expired. If
409 $session_end is given and true, then additionally remove all session
410 cookies.
411
412 You should call this function (with a true $session_end) before you
413 save cookies to disk, and you should call this function after
414 loading them again. If you have a long-running program you can
415 additionally call this function from time to time.
416
417 A cookie jar is initially an empty hash-reference that is managed by
418 this module. Its format is subject to change, but currently it is as
419 follows:
420
421 The key "version" has to contain 2, otherwise the hash gets cleared.
422 All other keys are hostnames or IP addresses pointing to
423 hash-references. The key for these inner hash references is the
424 server path for which this cookie is meant, and the values are again
425 hash-references. Each key of those hash-references is a cookie name,
426 and the value, you guessed it, is another hash-reference, this time
427 with the key-value pairs from the cookie, except for "expires" and
428 "max-age", which have been replaced by a "_expires" key that
429 contains the cookie expiry timestamp. Session cookies are indicated
430 by not having an "_expires" key.
431
432 Here is an example of a cookie jar with a single cookie, so you have
433 a chance of understanding the above paragraph:
434
435 {
436 version => 2,
437 "10.0.0.1" => {
438 "/" => {
439 "mythweb_id" => {
440 _expires => 1293917923,
441 value => "ooRung9dThee3ooyXooM1Ohm",
442 },
443 },
444 },
445 }
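The nesting described above (host, then path, then cookie name) can be walked with three loops. The following is only a sketch against the currently documented format, which is subject to change; "list_cookies" is a hypothetical helper and not a function provided by the module.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a version-2 cookie jar into
# [host, path, name, value, expiry-or-undef] tuples.
sub list_cookies {
   my ($jar) = @_;
   my @cookies;

   # only the documented version 2 layout is understood here
   return @cookies unless ($jar->{version} || 0) == 2;

   for my $host (sort(grep { $_ ne "version" } keys %$jar)) {
      my $paths = $jar->{$host};

      for my $path (sort keys %$paths) {
         my $names = $paths->{$path};

         for my $name (sort keys %$names) {
            my $cookie = $names->{$name};
            push @cookies, [$host, $path, $name, $cookie->{value}, $cookie->{_expires}];
         }
      }
   }

   @cookies
}

my $jar = {
   version    => 2,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => {
            _expires => 1293917923,
            value    => "ooRung9dThee3ooyXooM1Ohm",
         },
      },
   },
};

for my $c (list_cookies($jar)) {
   my ($host, $path, $name, $value, $expires) = @$c;
   my $kind = defined $expires ? "expires at $expires" : "session cookie";
   print "$host$path $name=$value ($kind)\n";
}
```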
446
447 $date = AnyEvent::HTTP::format_date $timestamp
448 Takes a POSIX timestamp (seconds since the epoch) and formats it as
449 a HTTP Date (RFC 2616).
450
451 $timestamp = AnyEvent::HTTP::parse_date $date
452 Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec)
453 or a bunch of minor variations of those, and returns the
454 corresponding POSIX timestamp, or "undef" if the date cannot be
455 parsed.
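For readers unfamiliar with the date format involved, here is a stand-alone sketch that produces the fixed-length RFC 2616 form from a POSIX timestamp using only core Perl. "http_date" is a hypothetical stand-in for illustration; real code should call "AnyEvent::HTTP::format_date" and not rely on this sketch matching its output in every detail.

```perl
use strict;
use warnings;

# English names are spelled out so the result does not depend on the locale.
my @WDAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON  = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical stand-in, for illustration only.
sub http_date {
   my ($timestamp) = @_;
   my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $timestamp;

   sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $WDAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec
}

my $epoch_date = http_date(0);
print "$epoch_date\n"; # Thu, 01 Jan 1970 00:00:00 GMT
```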
276 456
277 $AnyEvent::HTTP::MAX_RECURSE 457 $AnyEvent::HTTP::MAX_RECURSE
278 The default value for the "recurse" request parameter (default: 10). 458 The default value for the "recurse" request parameter (default: 10).
459
460 $AnyEvent::HTTP::TIMEOUT
461 The default timeout for connection operations (default: 300).
279 462
280 $AnyEvent::HTTP::USERAGENT 463 $AnyEvent::HTTP::USERAGENT
281 The default value for the "User-Agent" header (the default is 464 The default value for the "User-Agent" header (the default is
282 "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; 465 "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
283 +http://software.schmorp.de/pkg/AnyEvent)"). 466 +http://software.schmorp.de/pkg/AnyEvent)").
284 467
285 $AnyEvent::HTTP::MAX_PER_HOST 468 $AnyEvent::HTTP::MAX_PER_HOST
286 The maximum number of concurrent connections to the same host 469 The maximum number of concurrent connections to the same host
287 (identified by the hostname). If the limit is exceeded, then the 470 (identified by the hostname). If the limit is exceeded, then
288 additional requests are queued until previous connections are 471 additional requests are queued until previous connections are
289 closed. 472 closed. Both persistent and non-persistent connections are counted
473 in this limit.
290 474
291 The default value for this is 4, and it is highly advisable to not 475 The default value for this is 4, and it is highly advisable to not
292 increase it. 476 increase it much.
477
478 For comparison: the RFCs recommend 4 non-persistent or 2 persistent
479 connections, older browsers used 2, newer ones (such as firefox 3)
480 typically use 6, and Opera uses 8 because like, they have the
481 fastest browser and give a shit for everybody else on the planet.
482
483 $AnyEvent::HTTP::PERSISTENT_TIMEOUT
484 The time after which idle persistent connections get closed by
485 AnyEvent::HTTP (default: 3).
293 486
294 $AnyEvent::HTTP::ACTIVE 487 $AnyEvent::HTTP::ACTIVE
295 The number of active connections. This is not the number of 488 The number of active connections. This is not the number of
296 currently running requests, but the number of currently open and 489 currently running requests, but the number of currently open and
297 non-idle TCP connections. This number of can be useful for 490 non-idle TCP connections. This number can be useful for
298 load-leveling. 491 load-leveling.
492
493 SHOWCASE
494 This section contains some more elaborate "real-world" examples or code
495 snippets.
496
497 HTTP/1.1 FILE DOWNLOAD
498 Downloading files with HTTP can be quite tricky, especially when
499 something goes wrong and you want to resume.
500
501 Here is a function that initiates and resumes a download. It uses the
502 last modified time to check for file content changes, and works with
503 many HTTP/1.0 servers as well, and usually falls back to a complete
504 re-download on older servers.
505
506 It calls the completion callback with "undef" when a nonretryable
507 error occurred, 0 when the download was partial and should be retried,
508 or 1 when it was successful.
509
510 use AnyEvent::HTTP;
511
512 sub download($$$) {
513 my ($url, $file, $cb) = @_;
514
515 open my $fh, "+<", $file
516 or die "$file: $!";
517
518 my %hdr;
519 my $ofs = 0;
520
521 if (stat $fh and -s _) {
522 $ofs = -s _;
523 warn "-s is ", $ofs;
524 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
525 $hdr{"range"} = "bytes=$ofs-";
526 }
527
528 http_get $url,
529 headers => \%hdr,
530 on_header => sub {
531 my ($hdr) = @_;
532
533 if ($hdr->{Status} == 200 && $ofs) {
534 # resume failed
535 truncate $fh, $ofs = 0;
536 }
537
538 sysseek $fh, $ofs, 0;
539
540 1
541 },
542 on_body => sub {
543 my ($data, $hdr) = @_;
544
545 if ($hdr->{Status} =~ /^2/) {
546 length $data == syswrite $fh, $data
547 or return; # abort on write errors
548 }
549
550 1
551 },
552 sub {
553 my (undef, $hdr) = @_;
554
555 my $status = $hdr->{Status};
556
557 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
558 utime $time, $time, $fh;
559 }
560
561 if ($status == 200 || $status == 206 || $status == 416) {
562 # download ok || resume ok || file already fully downloaded
563 $cb->(1, $hdr);
564
565 } elsif ($status == 412) {
566 # file has changed while resuming, delete and retry
567 unlink $file;
568 $cb->(0, $hdr);
569
570 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
571 # retry later
572 $cb->(0, $hdr);
573
574 } else {
575 $cb->(undef, $hdr);
576 }
577 }
578 ;
579 }
580
581 download "http://server/somelargefile", "/tmp/somelargefile", sub {
582 if ($_[0]) {
583 print "OK!\n";
584 } elsif (defined $_[0]) {
585 print "please retry later\n";
586 } else {
587 print "ERROR\n";
588 }
589 };
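The decision made in the completion callback above can be isolated into a small pure function, which makes the three outcomes easier to see and to test without a network. This is a sketch of the same status mapping only; "download_result" is a hypothetical name and the file handling (utime, unlink) is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the status checks of the download example:
# 1 = finished, 0 = retry later, undef = nonretryable error.
sub download_result {
   my ($status) = @_;

   # download ok || resume ok || file already fully downloaded
   return 1 if $status == 200 || $status == 206 || $status == 416;

   # file changed while resuming (the example also deletes the partial file)
   return 0 if $status == 412;

   # server trouble or internal 59x error: worth another attempt
   return 0 if $status == 500 || $status == 503 || $status =~ /^59/;

   undef
}

for my $status (206, 412, 599, 404) {
   my $result = download_result($status);
   my $text = $result ? "OK" : defined $result ? "retry later" : "ERROR";
   print "$status: $text\n";
}
```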
590
591 SOCKS PROXIES
592 Socks proxies are not directly supported by AnyEvent::HTTP. You can
593 compile your perl to support socks, or use an external program such as
594 socksify (dante) or tsocks to make your program use a socks proxy
595 transparently.
596
597 Alternatively, for AnyEvent::HTTP only, you can use your own
598 "tcp_connect" function that does the proxy handshake - here is an
599 example that works with socks4a proxies:
600
601 use Errno;
602 use AnyEvent::Util;
603 use AnyEvent::Socket;
604 use AnyEvent::Handle;
605
606 # host, port and username of/for your socks4a proxy
607 my $socks_host = "10.0.0.23";
608 my $socks_port = 9050;
609 my $socks_user = "";
610
611 sub socks4a_connect {
612 my ($host, $port, $connect_cb, $prepare_cb) = @_;
613
614 my $hdl = new AnyEvent::Handle
615 connect => [$socks_host, $socks_port],
616 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
617 on_error => sub { $connect_cb->() },
618 ;
619
620 $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);
621
622 $hdl->push_read (chunk => 8, sub {
623 my ($hdl, $chunk) = @_;
624 my ($status, $port, $ipn) = unpack "xCna4", $chunk;
625
626 if ($status == 0x5a) {
627 $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
628 } else {
629 $! = Errno::ENXIO; $connect_cb->();
630 }
631 });
632
633 $hdl
634 }
635
636 Use "socks4a_connect" instead of "tcp_connect" when doing
637 "http_request"s, possibly after switching off other proxy types:
638
639 AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies
640
641 http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
642 my ($data, $headers) = @_;
643 ...
644 };
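To see what the "pack" and "unpack" templates in "socks4a_connect" put on the wire, the following stand-alone sketch builds a request and decodes a hand-made reply without opening any connection. The helper names and the sample host, port and address are assumptions for this example.

```perl
use strict;
use warnings;

# Build the SOCKS4a CONNECT request with the same template as above:
# version 4, command 1 (connect), port, the placeholder IP 0.0.0.1 that
# signals "hostname follows", then NUL-terminated user id and hostname.
sub socks4a_request {
   my ($host, $port, $user) = @_;
   pack "CCnNZ*Z*", 4, 1, $port, 1, $user, $host
}

# Decode the 8-byte reply: skip one byte, then status, port, packed IPv4.
sub socks4a_reply {
   my ($chunk) = @_;
   my ($status, $port, $ipn) = unpack "xCna4", $chunk;

   # 0x5a means "request granted"
   (($status == 0x5a ? 1 : 0), $port, join ".", unpack "C4", $ipn)
}

my $request = socks4a_request("www.example.com", 80, "");
my $hex     = unpack "H*", $request;
print "request: $hex\n";

# a made-up successful reply, as a proxy might send it
my ($granted, $bound_port, $bound_ip) = socks4a_reply("\x00\x5a\x00\x50\x0a\x00\x00\x17");
print "granted=$granted bound to $bound_ip:$bound_port\n";
```

An empty user id still occupies one NUL byte, so the hostname always starts at offset 9 of the request.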
299 645
300SEE ALSO 646SEE ALSO
301 AnyEvent. 647 AnyEvent.
302 648
303AUTHOR 649AUTHOR
304 Marc Lehmann <schmorp@schmorp.de> 650 Marc Lehmann <schmorp@schmorp.de>
305 http://home.schmorp.de/ 651 http://home.schmorp.de/
306 652
307 With many thanks to Дмитрий Шалашов, who provided 653 With many thanks to Дмитрий Шалашов, who provided countless testcases
308 countless testcases and bugreports. 654 and bugreports.
309 655
