
Comparing AnyEvent-HTTP/README (file contents):
Revision 1.9 by root, Sat Jul 25 01:29:09 2009 UTC vs.
Revision 1.28 by root, Mon Apr 27 12:14:12 2020 UTC

12 This module is an AnyEvent user, you need to make sure that you use and 12 This module is an AnyEvent user, you need to make sure that you use and
13 run a supported event loop. 13 run a supported event loop.
14 14
15 This module implements a simple, stateless and non-blocking HTTP client. 15 This module implements a simple, stateless and non-blocking HTTP client.
16 It supports GET, POST and other request methods, cookies and more, all 16 It supports GET, POST and other request methods, cookies and more, all
17 on a very low level. It can follow redirects supports proxies and 17 on a very low level. It can follow redirects, supports proxies, and
18 automatically limits the number of connections to the values specified 18 automatically limits the number of connections to the values specified
19 in the RFC. 19 in the RFC.
20 20
21 It should generally be a "good client" that is enough for most HTTP 21 It should generally be a "good client" that is enough for most HTTP
22 tasks. Simple tasks should be simple, but complex tasks should still be 22 tasks. Simple tasks should be simple, but complex tasks should still be
46 URL must be an absolute http or https URL. 46 URL must be an absolute http or https URL.
47 47
48 When called in void context, nothing is returned. In other contexts, 48 When called in void context, nothing is returned. In other contexts,
49 "http_request" returns a "cancellation guard" - you have to keep the 49 "http_request" returns a "cancellation guard" - you have to keep the
50 object at least alive until the callback get called. If the object 50 object at least alive until the callback get called. If the object
51 gets destroyed before the callbakc is called, the request will be 51 gets destroyed before the callback is called, the request will be
52 cancelled. 52 cancelled.
53 53
54 The callback will be called with the response body data as first 54 The callback will be called with the response body data as first
55 argument (or "undef" if an error occured), and a hash-ref with 55 argument (or "undef" if an error occurred), and a hash-ref with
56 response headers as second argument. 56 response headers (and trailers) as second argument.
57 57
58 All the headers in that hash are lowercased. In addition to the 58 All the headers in that hash are lowercased. In addition to the
59 response headers, the "pseudo-headers" "HTTPVersion", "Status" and 59 response headers, the "pseudo-headers" (uppercase to avoid clashing
60 with possible response headers) "HTTPVersion", "Status" and "Reason"
60 "Reason" contain the three parts of the HTTP Status-Line of the same 61 contain the three parts of the HTTP Status-Line of the same name. If
62 an error occurs during the body phase of a request, then the
63 original "Status" and "Reason" values from the header are available
64 as "OrigStatus" and "OrigReason".
65
61 name. The pseudo-header "URL" contains the original URL (which can 66 The pseudo-header "URL" contains the actual URL (which can differ
62 differ from the requested URL when following redirects). 67 from the requested URL when following redirects - for example, you
68 might get an error that your URL scheme is not supported even though
69 your URL is a valid http URL because it redirected to an ftp URL, in
70 which case you can look at the URL pseudo header).
71
72 The pseudo-header "Redirect" only exists when the request was a
73 result of an internal redirect. In that case it is an array
74 reference with the "($data, $headers)" from the redirect response.
75 Note that this response could in turn be the result of a redirect
76 itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
77 the original response, and so on.
63 78
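The nested "Redirect" pseudo-headers described above can be unwound with a short loop. This is a minimal sketch that assumes only the "[$data, $headers]" layout stated in the text; the helper name "redirect_chain" and the hand-made header hashes are hypothetical and stand in for what a real callback would receive.

```perl
use strict;
use warnings;

# Walk the "Redirect" pseudo-headers from the final response back to the
# original one. Returns the header hashes, oldest response first.
sub redirect_chain {
    my ($hdr) = @_;

    my @chain = ($hdr);
    while (my $prev = $hdr->{Redirect}) {
        $hdr = $prev->[1];       # [$data, $headers] of the earlier response
        unshift @chain, $hdr;
    }

    @chain
}

# hand-made header hashes standing in for a real callback's $hdr
my $first  = { Status => 301, URL => "http://example.org/a" };
my $second = { Status => 302, URL => "http://example.org/b", Redirect => ["", $first] };
my $final  = { Status => 200, URL => "http://example.org/c", Redirect => ["", $second] };

print join (" -> ", map { "$_->{Status} $_->{URL}" } redirect_chain ($final)), "\n";
```

Inside a real completion callback you would pass the second callback argument to the helper instead of "$final".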
64 If the server sends a header multiple times, then their contents 79 If the server sends a header multiple times, then their contents
65 will be joined together with a comma (","), as per the HTTP spec. 80 will be joined together with a comma (","), as per the HTTP spec.
66 81
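To illustrate the two rules above (lowercased names, repeated headers joined with a comma), here is a small stand-alone sketch. It is not the module's parser; "fold_headers" and the sample header pairs are made up for the illustration.

```perl
use strict;
use warnings;

# Fold a list of [name, value] pairs into a hash the way the text above
# describes: names lowercased, repeated headers joined with ",".
sub fold_headers {
    my %hdr;

    for my $pair (@_) {
        my ($name, $value) = @$pair;
        $name = lc $name;
        $hdr{$name} = exists $hdr{$name} ? "$hdr{$name},$value" : $value;
    }

    \%hdr
}

my $hdr = fold_headers (
    ["Cache-Control" => "no-cache"],
    ["Content-Type"  => "text/html"],
    ["cache-control" => "no-store"],
);

print "$hdr->{'cache-control'}\n";
```

A callback that needs the individual values of such a header has to split the joined string again, which only works for headers whose values cannot themselves contain commas.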
67 If an internal error occurs, such as not being able to resolve a 82 If an internal error occurs, such as not being able to resolve a
68 hostname, then $data will be "undef", "$headers->{Status}" will be 83 hostname, then $data will be "undef", "$headers->{Status}" will be
69 "59x" (usually 599) and the "Reason" pseudo-header will contain an 84 590-599 and the "Reason" pseudo-header will contain an error
70 error message. 85 message. Currently the following status codes are used:
86
87 595 - errors during connection establishment, proxy handshake.
88 596 - errors during TLS negotiation, request sending and header
89 processing.
90 597 - errors during body receiving or processing.
91 598 - user aborted request via "on_header" or "on_body".
92 599 - other, usually nonretryable, errors (garbled URL etc.).
71 93
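The status list above lends itself to a small dispatch helper in the completion callback. The following is only a sketch: the category names are invented for this example, and treating every remaining 59x value like 599 is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Map the "Status" pseudo-header to a coarse category. The category names
# are made up for this sketch; the 59x meanings follow the list above.
sub classify_status {
    my ($status) = @_;

    return "ok"            if $status =~ /^2/;
    return "connect-error" if $status == 595;   # connection establishment, proxy handshake
    return "request-error" if $status == 596;   # TLS, request sending, header processing
    return "body-error"    if $status == 597;   # body receiving or processing
    return "user-abort"    if $status == 598;   # on_header/on_body returned false
    return "internal"      if $status =~ /^59/; # 599 and any other 59x (assumption)

    "http-error"
}

print "$_ => ", classify_status ($_), "\n" for 200, 404, 595, 598, 599;
```

In a callback this would be called as "classify_status ($hdr->{Status})", for example to decide whether a retry is worthwhile.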
72 A typical callback might look like this: 94 A typical callback might look like this:
73 95
74 sub { 96 sub {
75 my ($body, $hdr) = @_; 97 my ($body, $hdr) = @_;
84 Additional parameters are key-value pairs, and are fully optional. 106 Additional parameters are key-value pairs, and are fully optional.
85 They include: 107 They include:
86 108
87 recurse => $count (default: $MAX_RECURSE) 109 recurse => $count (default: $MAX_RECURSE)
88 Whether to recurse requests or not, e.g. on redirects, 110 Whether to recurse requests or not, e.g. on redirects,
89 authentication retries and so on, and how often to do so. 111 authentication and other retries and so on, and how often to do
112 so.
113
114 Only redirects to http and https URLs are supported. While most
115 common redirection forms are handled entirely within this
116 module, some require the use of the optional URI module. If it
117 is required but missing, then the request will fail with an
118 error.
90 119
91 headers => hashref 120 headers => hashref
92 The request headers to use. Currently, "http_request" may 121 The request headers to use. Currently, "http_request" may
93 provide its own "Host:", "Content-Length:", "Connection:" and 122 provide its own "Host:", "Content-Length:", "Connection:" and
94 "Cookie:" headers and will provide defaults for "User-Agent:" 123 "Cookie:" headers and will provide defaults at least for "TE:",
95 and "Referer:". 124 "Referer:" and "User-Agent:" (this can be suppressed by using
125 "undef" for these headers in which case they won't be sent at
126 all).
127
128 You really should provide your own "User-Agent:" header value
129 that is appropriate for your program - I wouldn't be surprised
130 if the default AnyEvent string gets blocked by webservers sooner
131 or later.
132
133 Also, make sure that your headers names and values do not
134 contain any embedded newlines.
96 135
97 timeout => $seconds 136 timeout => $seconds
98 The time-out to use for various stages - each connect attempt 137 The time-out to use for various stages - each connect attempt
99 will reset the timeout, as will read or write activity. Default 138 will reset the timeout, as will read or write activity, i.e.
139 this is not an overall timeout.
140
100 timeout is 5 minutes. 141 Default timeout is 5 minutes.
101 142
102 proxy => [$host, $port[, $scheme]] or undef 143 proxy => [$host, $port[, $scheme]] or undef
103 Use the given http proxy for all requests. If not specified, 144 Use the given http proxy for all requests, or no proxy if
104 then the default proxy (as specified by $ENV{http_proxy}) is
105 used. 145 "undef" is used.
106 146
107 $scheme must be either missing or "http" for HTTP, or "https" 147 $scheme must be either missing or must be "http" for HTTP.
108 for HTTPS. 148
149 If not specified, then the default proxy is used (see
150 "AnyEvent::HTTP::set_proxy").
151
152 Currently, if your proxy requires authorization, you have to
153 specify an appropriate "Proxy-Authorization" header in every
154 request.
155
156 Note that this module will prefer an existing persistent
157 connection, even if that connection was made using another
158 proxy. If you need to ensure that a new connection is made in
159 this case, you can either force "persistent" to false or e.g.
160 use the proxy address in your "sessionid".
109 161
110 body => $string 162 body => $string
111 The request body, usually empty. Will be-sent as-is (future 163 The request body, usually empty. Will be sent as-is (future
112 versions of this module might offer more options). 164 versions of this module might offer more options).
113 165
114 cookie_jar => $hash_ref 166 cookie_jar => $hash_ref
115 Passing this parameter enables (simplified) cookie-processing, 167 Passing this parameter enables (simplified) cookie-processing,
116 loosely based on the original netscape specification. 168 loosely based on the original netscape specification.
117 169
118 The $hash_ref must be an (initially empty) hash reference which 170 The $hash_ref must be an (initially empty) hash reference which
119 will get updated automatically. It is possible to save the 171 will get updated automatically. It is possible to save the
120 cookie_jar to persistent storage with something like JSON or 172 cookie jar to persistent storage with something like JSON or
121 Storable, but this is not recommended, as expiry times are 173 Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
122 currently being ignored. 174 if you wish to remove expired or session-only cookies, and also
175 for documentation on the format of the cookie jar.
123 176
124 Note that this cookie implementation is not of very high 177 Note that this cookie implementation is not meant to be
125 quality, nor meant to be complete. If you want complete cookie 178 complete. If you want complete cookie management you have to do
126 management you have to do that on your own. "cookie_jar" is 179 that on your own. "cookie_jar" is meant as a quick fix to get
127 meant as a quick fix to get some cookie-using sites working. 180 most cookie-using sites working. Cookies are a privacy disaster,
128 Cookies are a privacy disaster, do not use them unless required 181 do not use them unless required to.
129 to. 182
183 When cookie processing is enabled, the "Cookie:" and
184 "Set-Cookie:" headers will be set and handled by this module,
185 otherwise they will be left untouched.
130 186
131 tls_ctx => $scheme | $tls_ctx 187 tls_ctx => $scheme | $tls_ctx
132 Specifies the AnyEvent::TLS context to be used for https 188 Specifies the AnyEvent::TLS context to be used for https
133 connections. This parameter follows the same rules as the 189 connections. This parameter follows the same rules as the
134 "tls_ctx" parameter to AnyEvent::Handle, but additionally, the 190 "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
137 and high-security (CA and common-name verification) TLS context. 193 and high-security (CA and common-name verification) TLS context.
138 194
139 The default for this option is "low", which could be interpreted 195 The default for this option is "low", which could be interpreted
140 as "give me the page, no matter what". 196 as "give me the page, no matter what".
141 197
198 See also the "sessionid" parameter.
199
200 sessionid => $string
201 The module might reuse connections to the same host internally
202 (regardless of other settings, such as "tcp_connect" or
203 "proxy"). Sometimes (e.g. when using TLS or a specific proxy),
204 you do not want to reuse connections from other sessions. This
205 can be achieved by setting this parameter to some unique ID
206 (such as the address of an object storing your state data or the
207 TLS context, or the proxy IP) - only connections using the same
208 unique ID will be reused.
209
210 on_prepare => $callback->($fh)
211 In rare cases you need to "tune" the socket before it is used to
212 connect (for example, to bind it on a given IP address). This
213 parameter overrides the prepare callback passed to
214 "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
215 (e.g. it has to provide a timeout). See the description for the
216 $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
217 details.
218
219 tcp_connect => $callback->($host, $service, $connect_cb,
220 $prepare_cb)
221 In even rarer cases you want total control over how
222 AnyEvent::HTTP establishes connections. Normally it uses
223 AnyEvent::Socket::tcp_connect to do this, but you can provide
224 your own "tcp_connect" function - obviously, it has to follow
225 the same calling conventions, except that it may always return a
226 connection guard object.
227
228 The connections made by this hook will be treated as equivalent
229 to connections made the built-in way, specifically, they will be
230 put into and taken from the persistent connection cache. If your
231 $tcp_connect function is incompatible with this kind of re-use,
232 consider switching off "persistent" connections and/or providing
233 a "sessionid" identifier.
234
235 There are probably lots of weird uses for this function,
236 starting from tracing the hosts "http_request" actually tries to
237 connect, to (inexact but fast) host => IP address caching or
238 even socks protocol support.
239
142 on_header => $callback->($headers) 240 on_header => $callback->($headers)
143 When specified, this callback will be called with the header 241 When specified, this callback will be called with the header
144 hash as soon as headers have been successfully received from the 242 hash as soon as headers have been successfully received from the
145 remote server (not on locally-generated errors). 243 remote server (not on locally-generated errors).
146 244
151 249
152 This callback is useful, among other things, to quickly reject 250 This callback is useful, among other things, to quickly reject
153 unwanted content, which, if it is supposed to be rare, can be 251 unwanted content, which, if it is supposed to be rare, can be
154 faster than first doing a "HEAD" request. 252 faster than first doing a "HEAD" request.
155 253
254 The downside is that cancelling the request makes it impossible
255 to re-use the connection. Also, the "on_header" callback will
256 not receive any trailer (headers sent after the response body).
257
156 Example: cancel the request unless the content-type is 258 Example: cancel the request unless the content-type is
157 "text/html". 259 "text/html".
158 260
159 on_header => sub { 261 on_header => sub {
160 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ 262 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
167 269
168 It has to return either true (in which case AnyEvent::HTTP will 270 It has to return either true (in which case AnyEvent::HTTP will
169 continue), or false, in which case AnyEvent::HTTP will cancel 271 continue), or false, in which case AnyEvent::HTTP will cancel
170 the download (and call the completion callback with an error 272 the download (and call the completion callback with an error
171 code of 598). 273 code of 598).
274
275 The downside to cancelling the request is that it makes it
276 impossible to re-use the connection.
172 277
173 This callback is useful when the data is too large to be held in 278 This callback is useful when the data is too large to be held in
174 memory (so the callback writes it to a file) or when only some 279 memory (so the callback writes it to a file) or when only some
175 information should be extracted, or when the body should be 280 information should be extracted, or when the body should be
176 processed incrementally. 281 processed incrementally.
190 AnyEvent::Handle object associated with the connection. In error 295 AnyEvent::Handle object associated with the connection. In error
191 cases, "undef" will be passed. When there is no body (e.g. 296 cases, "undef" will be passed. When there is no body (e.g.
192 status 304), the empty string will be passed. 297 status 304), the empty string will be passed.
193 298
194 The handle object might or might not be in TLS mode, might be 299 The handle object might or might not be in TLS mode, might be
195 connected to a proxy, be a persistent connection etc., and 300 connected to a proxy, be a persistent connection, use chunked
196 configured in unspecified ways. The user is responsible for this 301 transfer encoding etc., and configured in unspecified ways. The
197 handle (it will not be used by this module anymore). 302 user is responsible for this handle (it will not be used by this
303 module anymore).
198 304
199 This is useful with some push-type services, where, after the 305 This is useful with some push-type services, where, after the
200 initial headers, an interactive protocol is used (typical 306 initial headers, an interactive protocol is used (typical
201 example would be the push-style twitter API which starts a 307 example would be the push-style twitter API which starts a
202 JSON/XML stream). 308 JSON/XML stream).
203 309
204 If you think you need this, first have a look at "on_body", to 310 If you think you need this, first have a look at "on_body", to
205 see if that doesn't solve your problem in a better way. 311 see if that doesn't solve your problem in a better way.
206 312
313 persistent => $boolean
314 Try to create/reuse a persistent connection. When this flag is
315 set (default: true for idempotent requests, false for all
316 others), then "http_request" tries to re-use an existing
317 (previously-created) persistent connection to same host (i.e.
318 identical URL scheme, hostname, port and sessionid) and, failing
319 that, tries to create a new one.
320
321 Requests failing in certain ways will be automatically retried
322 once, which is dangerous for non-idempotent requests, which is
323 why it defaults to off for them. The reason for this is because
324 the bozos who designed HTTP/1.1 made it impossible to
325 distinguish between a fatal error and a normal connection
326 timeout, so you never know whether there was a problem with your
327 request or not.
328
329 When reusing an existent connection, many parameters (such as
330 TLS context) will be ignored. See the "sessionid" parameter for
331 a workaround.
332
333 keepalive => $boolean
334 Only used when "persistent" is also true. This parameter decides
335 whether "http_request" tries to handshake a HTTP/1.0-style
336 keep-alive connection (as opposed to only a HTTP/1.1 persistent
337 connection).
338
339 The default is true, except when using a proxy, in which case it
340 defaults to false, as HTTP/1.0 proxies cannot support this in a
341 meaningful way.
342
343 handle_params => { key => value ... }
344 The key-value pairs in this hash will be passed to any
345 AnyEvent::Handle constructor that is called - not all requests
346 will create a handle, and sometimes more than one is created, so
347 this parameter is only good for setting hints.
348
349 Example: set the maximum read size to 4096, to potentially
350 conserve memory at the cost of speed.
351
352 handle_params => {
353 max_read_size => 4096,
354 },
355
207 Example: make a simple HTTP GET request for http://www.nethype.de/ 356 Example: do a simple HTTP GET request for http://www.nethype.de/ and
357 print the response body.
208 358
209 http_request GET => "http://www.nethype.de/", sub { 359 http_request GET => "http://www.nethype.de/", sub {
210 my ($body, $hdr) = @_; 360 my ($body, $hdr) = @_;
211 print "$body\n"; 361 print "$body\n";
212 }; 362 };
213 363
214 Example: make a HTTP HEAD request on https://www.google.com/, use a 364 Example: do a HTTP HEAD request on https://www.google.com/, use a
215 timeout of 30 seconds. 365 timeout of 30 seconds.
216 366
217 http_request 367 http_request
218 GET => "https://www.google.com", 368 HEAD => "https://www.google.com",
369 headers => { "user-agent" => "MySearchClient 1.0" },
219 timeout => 30, 370 timeout => 30,
220 sub { 371 sub {
221 my ($body, $hdr) = @_; 372 my ($body, $hdr) = @_;
222 use Data::Dumper; 373 use Data::Dumper;
223 print Dumper $hdr; 374 print Dumper $hdr;
224 } 375 }
225 ; 376 ;
226 377
227 Example: make another simple HTTP GET request, but immediately try 378 Example: do another simple HTTP GET request, but immediately try to
228 to cancel it. 379 cancel it.
229 380
230 my $request = http_request GET => "http://www.nethype.de/", sub { 381 my $request = http_request GET => "http://www.nethype.de/", sub {
231 my ($body, $hdr) = @_; 382 my ($body, $hdr) = @_;
232 print "$body\n"; 383 print "$body\n";
233 }; 384 };
234 385
235 undef $request; 386 undef $request;
236 387
388 DNS CACHING
389 AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
390 actual connection, which in turn uses AnyEvent::DNS to resolve
391 hostnames. The latter is a simple stub resolver and does no caching on
392 its own. If you want DNS caching, you currently have to provide your own
393 default resolver (by storing a suitable resolver object in
394 $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.
395
237 GLOBAL FUNCTIONS AND VARIABLES 396 GLOBAL FUNCTIONS AND VARIABLES
238 AnyEvent::HTTP::set_proxy "proxy-url" 397 AnyEvent::HTTP::set_proxy "proxy-url"
239 Sets the default proxy server to use. The proxy-url must begin with 398 Sets the default proxy server to use. The proxy-url must begin with
240 a string of the form "http://host:port" (optionally "https:..."). 399 a string of the form "http://host:port", croaks otherwise.
400
401 To clear an already-set proxy, use "undef".
402
403 When AnyEvent::HTTP is loaded for the first time it will query the
404 default proxy from the operating system, currently by looking at
405 "$ENV{http_proxy}".
406
407 AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
408 Remove all cookies from the cookie jar that have been expired. If
409 $session_end is given and true, then additionally remove all session
410 cookies.
411
412 You should call this function (with a true $session_end) before you
413 save cookies to disk, and you should call this function after
414 loading them again. If you have a long-running program you can
415 additionally call this function from time to time.
416
417 A cookie jar is initially an empty hash-reference that is managed by
418 this module. Its format is subject to change, but currently it is as
419 follows:
420
421 The key "version" has to contain 2, otherwise the hash gets cleared.
422 All other keys are hostnames or IP addresses pointing to
423 hash-references. The key for these inner hash references is the
424 server path for which this cookie is meant, and the values are again
425 hash-references. Each key of those hash-references is a cookie name,
426 and the value, you guessed it, is another hash-reference, this time
427 with the key-value pairs from the cookie, except for "expires" and
428 "max-age", which have been replaced by a "_expires" key that
429 contains the cookie expiry timestamp. Session cookies are indicated
430 by not having an "_expires" key.
431
432 Here is an example of a cookie jar with a single cookie, so you have
433 a chance of understanding the above paragraph:
434
435 {
436 version => 2,
437 "10.0.0.1" => {
438 "/" => {
439 "mythweb_id" => {
440 _expires => 1293917923,
441 value => "ooRung9dThee3ooyXooM1Ohm",
442 },
443 },
444 },
445 }
446
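Given the layout just described, a jar can be inspected with three nested loops. This is a sketch against the version-2 format shown above, which the text says is subject to change; "list_cookies" is a hypothetical helper and the jar is the example from the text.

```perl
use strict;
use warnings;

# List the cookies of a version-2 jar as "host path name (kind)" strings,
# where kind tells session cookies (no "_expires" key) from expiring ones.
sub list_cookies {
    my ($jar) = @_;
    my @out;

    # every key except "version" is a hostname or IP address
    for my $host (sort grep { $_ ne "version" } keys %$jar) {
        for my $path (sort keys %{ $jar->{$host} }) {
            for my $name (sort keys %{ $jar->{$host}{$path} }) {
                my $cookie = $jar->{$host}{$path}{$name};
                my $kind   = exists $cookie->{_expires}
                           ? "expires at $cookie->{_expires}"
                           : "session cookie";
                push @out, "$host $path $name ($kind)";
            }
        }
    }

    @out
}

my $jar = {
    version    => 2,
    "10.0.0.1" => {
        "/" => {
            "mythweb_id" => {
                _expires => 1293917923,
                value    => "ooRung9dThee3ooyXooM1Ohm",
            },
        },
    },
};

print "$_\n" for list_cookies ($jar);
```

The helper only reads the jar; removing expired or session-only cookies is what "AnyEvent::HTTP::cookie_jar_expire" is for.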
447 $date = AnyEvent::HTTP::format_date $timestamp
448 Takes a POSIX timestamp (seconds since the epoch) and formats it as
449 a HTTP Date (RFC 2616).
450
451 $timestamp = AnyEvent::HTTP::parse_date $date
452 Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec)
453 or a bunch of minor variations of those, and returns the
454 corresponding POSIX timestamp, or "undef" if the date cannot be
455 parsed.
241 456
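For readers unfamiliar with the date format involved, the following sketch produces the fixed-length RFC 1123 form that RFC 2616 prefers, using only core Perl. It is an illustration of the format, not the implementation of "format_date"; "http_date" is a hypothetical name.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format a POSIX timestamp as an RFC 1123 style date, always in GMT.
# Day and month names are hard-coded so the result is locale-independent.
sub http_date {
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime shift;

    sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec
}

my $date = http_date (784111777);   # the example date used in RFC 2616
print "$date\n";
```

In a program that already loads AnyEvent::HTTP, the documented "format_date" and "parse_date" functions are the better choice, since "parse_date" also accepts the cookie date variants mentioned above.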
242 $AnyEvent::HTTP::MAX_RECURSE 457 $AnyEvent::HTTP::MAX_RECURSE
243 The default value for the "recurse" request parameter (default: 10). 458 The default value for the "recurse" request parameter (default: 10).
459
460 $AnyEvent::HTTP::TIMEOUT
461 The default timeout for connection operations (default: 300).
244 462
245 $AnyEvent::HTTP::USERAGENT 463 $AnyEvent::HTTP::USERAGENT
246 The default value for the "User-Agent" header (the default is 464 The default value for the "User-Agent" header (the default is
247 "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; 465 "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
248 +http://software.schmorp.de/pkg/AnyEvent)"). 466 +http://software.schmorp.de/pkg/AnyEvent)").
249 467
250 $AnyEvent::HTTP::MAX_PER_HOST 468 $AnyEvent::HTTP::MAX_PER_HOST
251 The maximum number of concurrent conenctions to the same host 469 The maximum number of concurrent connections to the same host
252 (identified by the hostname). If the limit is exceeded, then the 470 (identified by the hostname). If the limit is exceeded, then
253 additional requests are queued until previous connections are 471 additional requests are queued until previous connections are
254 closed. 472 closed. Both persistent and non-persistent connections are counted
473 in this limit.
255 474
256 The default value for this is 4, and it is highly advisable to not 475 The default value for this is 4, and it is highly advisable to not
257 increase it. 476 increase it much.
477
478 For comparison: the RFCs recommend 4 non-persistent or 2 persistent
479 connections, older browsers used 2, newer ones (such as firefox 3)
480 typically use 6, and Opera uses 8 because like, they have the
481 fastest browser and give a shit for everybody else on the planet.
482
483 $AnyEvent::HTTP::PERSISTENT_TIMEOUT
484 The time after which idle persistent connections get closed by
485 AnyEvent::HTTP (default: 3).
258 486
259 $AnyEvent::HTTP::ACTIVE 487 $AnyEvent::HTTP::ACTIVE
260 The number of active connections. This is not the number of 488 The number of active connections. This is not the number of
261 currently running requests, but the number of currently open and 489 currently running requests, but the number of currently open and
262 non-idle TCP connections. This number of can be useful for 490 non-idle TCP connections. This number can be useful for
263 load-leveling. 491 load-leveling.
492
493 SHOWCASE
494 This section contains some more elaborate "real-world" examples or code
495 snippets.
496
497 HTTP/1.1 FILE DOWNLOAD
498 Downloading files with HTTP can be quite tricky, especially when
499 something goes wrong and you want to resume.
500
501 Here is a function that initiates and resumes a download. It uses the
502 last modified time to check for file content changes, and works with
503 many HTTP/1.0 servers as well, and usually falls back to a complete
504 re-download on older servers.
505
506 It calls the completion callback with either "undef", which means a
507 nonretryable error occurred, 0 when the download was partial and should
508 be retried, and 1 if it was successful.
509
510 use AnyEvent::HTTP;
511
512 sub download($$$) {
513 my ($url, $file, $cb) = @_;
514
515 open my $fh, "+<", $file
516 or die "$file: $!";
517
518 my %hdr;
519 my $ofs = 0;
520
521 if (stat $fh and -s _) {
522 $ofs = -s _;
523 warn "-s is ", $ofs;
524 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
525 $hdr{"range"} = "bytes=$ofs-";
526 }
527
528 http_get $url,
529 headers => \%hdr,
530 on_header => sub {
531 my ($hdr) = @_;
532
533 if ($hdr->{Status} == 200 && $ofs) {
534 # resume failed
535 truncate $fh, $ofs = 0;
536 }
537
538 sysseek $fh, $ofs, 0;
539
540 1
541 },
542 on_body => sub {
543 my ($data, $hdr) = @_;
544
545 if ($hdr->{Status} =~ /^2/) {
546 length $data == syswrite $fh, $data
547 or return; # abort on write errors
548 }
549
550 1
551 },
552 sub {
553 my (undef, $hdr) = @_;
554
555 my $status = $hdr->{Status};
556
557 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
558 utime $time, $time, $fh;
559 }
560
561 if ($status == 200 || $status == 206 || $status == 416) {
562 # download ok || resume ok || file already fully downloaded
563 $cb->(1, $hdr);
564
565 } elsif ($status == 412) {
566 # file has changed while resuming, delete and retry
567 unlink $file;
568 $cb->(0, $hdr);
569
570 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
571 # retry later
572 $cb->(0, $hdr);
573
574 } else {
575 $cb->(undef, $hdr);
576 }
577 }
578 ;
579 }
580
581 download "http://server/somelargefile", "/tmp/somelargefile", sub {
582 if ($_[0]) {
583 print "OK!\n";
584 } elsif (defined $_[0]) {
585 print "please retry later\n";
586 } else {
587 print "ERROR\n";
588 }
589 };
590
591 SOCKS PROXIES
592 Socks proxies are not directly supported by AnyEvent::HTTP. You can
593 compile your perl to support socks, or use an external program such as
594 socksify (dante) or tsocks to make your program use a socks proxy
595 transparently.
596
597 Alternatively, for AnyEvent::HTTP only, you can use your own
598 "tcp_connect" function that does the proxy handshake - here is an
599 example that works with socks4a proxies:
600
601 use Errno;
602 use AnyEvent::Util;
603 use AnyEvent::Socket;
604 use AnyEvent::Handle;
605
606 # host, port and username of/for your socks4a proxy
607 my $socks_host = "10.0.0.23";
608 my $socks_port = 9050;
609 my $socks_user = "";
610
611 sub socks4a_connect {
612 my ($host, $port, $connect_cb, $prepare_cb) = @_;
613
614 my $hdl = new AnyEvent::Handle
615 connect => [$socks_host, $socks_port],
616 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
617 on_error => sub { $connect_cb->() },
618 ;
619
620 $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);
621
622 $hdl->push_read (chunk => 8, sub {
623 my ($hdl, $chunk) = @_;
624 my ($status, $port, $ipn) = unpack "xCna4", $chunk;
625
626 if ($status == 0x5a) {
627 $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
628 } else {
629 $! = Errno::ENXIO; $connect_cb->();
630 }
631 });
632
633 $hdl
634 }
635
636 Use "socks4a_connect" instead of "tcp_connect" when doing
637 "http_request"s, possibly after switching off other proxy types:
638
639 AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies
640
641 http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
642 my ($data, $headers) = @_;
643 ...
644 };
264 645
265 SEE ALSO 646 SEE ALSO
266 AnyEvent. 647 AnyEvent.
267 648
268 AUTHOR 649 AUTHOR
269 Marc Lehmann <schmorp@schmorp.de> 650 Marc Lehmann <schmorp@schmorp.de>
270 http://home.schmorp.de/ 651 http://home.schmorp.de/
271 652
272 With many thanks to Дмитрий Шалашов, who provided 653 With many thanks to Дмитрий Шалашов, who provided countless testcases
273 countless testcases and bugreports. 654 and bugreports.
274 655