NAME
    AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS
       use AnyEvent::HTTP;

       http_get "http://www.nethype.de/", sub { print $_[1] };

       # ... do something else here

DESCRIPTION
    This module is an AnyEvent user; you need to make sure that you use and
    run a supported event loop.

    This module implements a simple, stateless and non-blocking HTTP client.
    It supports GET, POST and other request methods, cookies and more, all
    on a very low level. It can follow redirects, supports proxies, and
    automatically limits the number of connections to the values specified
    in the RFC.

    It should generally be a "good client" that is enough for most HTTP
    tasks. Simple tasks should be simple, but complex tasks should still be
    possible as the user retains control over request and response headers.

    The caller is responsible for authentication management, cookies (if the
    simplistic implementation in this module doesn't suffice), referer and
    other high-level protocol details for which this module offers only
    limited support.

METHODS
    http_get $url, key => value..., $cb->($data, $headers)
        Executes an HTTP-GET request. See the http_request function for
        details on additional parameters and the return value.

    http_head $url, key => value..., $cb->($data, $headers)
        Executes an HTTP-HEAD request. See the http_request function for
        details on additional parameters and the return value.

    http_post $url, $body, key => value..., $cb->($data, $headers)
        Executes an HTTP-POST request with a request body of $body. See the
        http_request function for details on additional parameters and the
        return value.

    http_request $method => $url, key => value..., $cb->($data, $headers)
        Executes an HTTP request of type $method (e.g. "GET", "POST"). The
        URL must be an absolute http or https URL.

        When called in void context, nothing is returned. In other contexts,
        "http_request" returns a "cancellation guard" - you have to keep the
        object alive at least until the callback gets called. If the object
        gets destroyed before the callback is called, the request will be
        cancelled.

        The callback will be called with the response body data as first
        argument (or "undef" if an error occurred), and a hash-ref with
        response headers (and trailers) as second argument.

        All the headers in that hash are lowercased. In addition to the
        response headers, the "pseudo-headers" (uppercase to avoid clashing
        with possible response headers) "HTTPVersion", "Status" and "Reason"
        contain the three parts of the HTTP Status-Line of the same name. If
        an error occurs during the body phase of a request, then the
        original "Status" and "Reason" values from the header are available
        as "OrigStatus" and "OrigReason".

        The pseudo-header "URL" contains the actual URL (which can differ
        from the requested URL when following redirects - for example, you
        might get an error that your URL scheme is not supported even though
        your URL is a valid http URL because it redirected to an ftp URL, in
        which case you can look at the URL pseudo header).

        The pseudo-header "Redirect" only exists when the request was a
        result of an internal redirect. In that case it is an array
        reference with the "($data, $headers)" from the redirect response.
        Note that this response could in turn be the result of a redirect
        itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
        the original response, and so on.

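The nesting of "Redirect" pseudo-headers can be unrolled with a small loop. The following is only a sketch: the helper name redirect_chain is hypothetical (it is not part of AnyEvent::HTTP), and it assumes that every header hash in the chain carries its own "URL" pseudo-header.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: returns the URLs that
# were visited, oldest first, by following the nested "Redirect" entries.
sub redirect_chain {
   my ($hdr) = @_;
   my @urls;

   while ($hdr) {
      unshift @urls, $hdr->{URL};

      # Redirect is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   @urls
}
```

Inside a completion callback this could be called as "redirect_chain ($_[1])" to log how the final URL was reached.
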
        If the server sends a header multiple times, then its values will
        be joined together with a comma (","), as per the HTTP spec.

        If an internal error occurs, such as not being able to resolve a
        hostname, then $data will be "undef", "$headers->{Status}" will be
        590-599 and the "Reason" pseudo-header will contain an error
        message. Currently the following status codes are used:

        595 - errors during connection establishment, proxy handshake.
        596 - errors during TLS negotiation, request sending and header
              processing.
        597 - errors during body receiving or processing.
        598 - user aborted request via "on_header" or "on_body".
        599 - other, usually nonretryable, errors (garbled URL etc.).

        A typical callback might look like this:

           sub {
              my ($body, $hdr) = @_;

              if ($hdr->{Status} =~ /^2/) {
                 # ... everything should be ok
              } else {
                 print "error, $hdr->{Status} $hdr->{Reason}\n";
              }
           }

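The internal 59x codes listed above can drive a simple retry decision. This is a sketch only: classify_response is a hypothetical helper, and the policy it encodes (retry on 595-597 and on 500/503, give up otherwise) is an assumption, not something the module prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: maps the "Status"
# pseudo-header of a finished request to "ok", "retry" or "fail".
sub classify_response {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   return "ok"    if $status =~ /^2/;
   return "fail"  if $status == 598 || $status == 599;  # user abort, usually nonretryable
   return "retry" if $status >= 595 && $status <= 597;   # connect, TLS/header, body errors
   return "retry" if $status == 500 || $status == 503;   # assumed transient server errors

   "fail"
}
```

A completion callback could then branch on "classify_response ($hdr)" and report "$hdr->{Reason}" in the failure case.
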
        Additional parameters are key-value pairs, and are fully optional.
        They include:

        recurse => $count (default: $MAX_RECURSE)
            Whether to recurse requests or not, e.g. on redirects,
            authentication and other retries and so on, and how often to do
            so.

            Only redirects to http and https URLs are supported. While most
            common redirection forms are handled entirely within this
            module, some require the use of the optional URI module. If it
            is required but missing, then the request will fail with an
            error.

        headers => hashref
            The request headers to use. Currently, "http_request" may
            provide its own "Host:", "Content-Length:", "Connection:" and
            "Cookie:" headers and will provide defaults at least for "TE:",
            "Referer:" and "User-Agent:" (this can be suppressed by using
            "undef" for these headers in which case they won't be sent at
            all).

            You really should provide your own "User-Agent:" header value
            that is appropriate for your program - I wouldn't be surprised
            if the default AnyEvent string gets blocked by webservers sooner
            or later.

            Also, make sure that your header names and values do not
            contain any embedded newlines.

        timeout => $seconds
            The timeout to use for various stages - each connect attempt
            will reset the timeout, as will read or write activity, i.e.
            this is not an overall timeout.

            Default timeout is 5 minutes.

        proxy => [$host, $port[, $scheme]] or undef
            Use the given http proxy for all requests, or no proxy if
            "undef" is used.

            $scheme must be either missing or must be "http" for HTTP.

            If not specified, then the default proxy is used (see
            "AnyEvent::HTTP::set_proxy").

            Currently, if your proxy requires authorization, you have to
            specify an appropriate "Proxy-Authorization" header in every
            request.

            Note that this module will prefer an existing persistent
            connection, even if that connection was made using another
            proxy. If you need to ensure that a new connection is made in
            this case, you can either force "persistent" to false or e.g.
            use the proxy address in your "sessionid".

        body => $string
            The request body, usually empty. Will be sent as-is (future
            versions of this module might offer more options).

        cookie_jar => $hash_ref
            Passing this parameter enables (simplified) cookie-processing,
            loosely based on the original Netscape specification.

            The $hash_ref must be an (initially empty) hash reference which
            will get updated automatically. It is possible to save the
            cookie jar to persistent storage with something like JSON or
            Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
            if you wish to remove expired or session-only cookies, and also
            for documentation on the format of the cookie jar.

            Note that this cookie implementation is not meant to be
            complete. If you want complete cookie management you have to do
            that on your own. "cookie_jar" is meant as a quick fix to get
            most cookie-using sites working. Cookies are a privacy disaster,
            do not use them unless required to.

            When cookie processing is enabled, the "Cookie:" and
            "Set-Cookie:" headers will be set and handled by this module,
            otherwise they will be left untouched.

        tls_ctx => $scheme | $tls_ctx
            Specifies the AnyEvent::TLS context to be used for https
            connections. This parameter follows the same rules as the
            "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
            two strings "low" or "high" can be specified, which give you a
            predefined low-security (no verification, highest compatibility)
            and high-security (CA and common-name verification) TLS context.

            The default for this option is "low", which could be interpreted
            as "give me the page, no matter what".

            See also the "sessionid" parameter.

        sessionid => $string
            The module might reuse connections to the same host internally
            (regardless of other settings, such as "tcp_connect" or
            "proxy"). Sometimes (e.g. when using TLS or a specific proxy),
            you do not want to reuse connections from other sessions. This
            can be achieved by setting this parameter to some unique ID
            (such as the address of an object storing your state data or the
            TLS context, or the proxy IP) - only connections using the same
            unique ID will be reused.

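One way to follow this advice is to derive the "sessionid" from the proxy address, so that a cached connection made through a different proxy is never picked up. A minimal sketch; proxy_request_options is a hypothetical helper and the "proxy:host:port" ID format is an arbitrary choice:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: returns request
# parameters that tie connection reuse to one specific proxy.
sub proxy_request_options {
   my ($host, $port) = @_;

   (
      proxy     => [$host, $port],
      sessionid => "proxy:$host:$port",   # distinct ID per proxy
   )
}

# Intended usage (needs AnyEvent::HTTP and a running event loop):
#
#    http_get $url, proxy_request_options ("10.0.0.5", 3128), sub { ... };
```

Any string works as the ID, as long as requests that must not share connections use different values.
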
        on_prepare => $callback->($fh)
            In rare cases you need to "tune" the socket before it is used to
            connect (for example, to bind it on a given IP address). This
            parameter overrides the prepare callback passed to
            "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
            (e.g. it has to provide a timeout). See the description for the
            $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
            details.

        tcp_connect => $callback->($host, $service, $connect_cb,
        $prepare_cb)
            In even rarer cases you want total control over how
            AnyEvent::HTTP establishes connections. Normally it uses
            AnyEvent::Socket::tcp_connect to do this, but you can provide
            your own "tcp_connect" function - obviously, it has to follow
            the same calling conventions, except that it may always return a
            connection guard object.

            The connections made by this hook will be treated as equivalent
            to connections made the built-in way, specifically, they will be
            put into and taken from the persistent connection cache. If your
            $tcp_connect function is incompatible with this kind of re-use,
            consider switching off "persistent" connections and/or providing
            a "sessionid" identifier.

            There are probably lots of weird uses for this function,
            starting from tracing the hosts "http_request" actually tries to
            connect, to (inexact but fast) host => IP address caching or
            even socks protocol support.

        on_header => $callback->($headers)
            When specified, this callback will be called with the header
            hash as soon as headers have been successfully received from the
            remote server (not on locally-generated errors).

            It has to return either true (in which case AnyEvent::HTTP will
            continue), or false, in which case AnyEvent::HTTP will cancel
            the download (and call the finish callback with an error code of
            598).

            This callback is useful, among other things, to quickly reject
            unwanted content, which, if it is supposed to be rare, can be
            faster than first doing a "HEAD" request.

            The downside is that cancelling the request makes it impossible
            to re-use the connection. Also, the "on_header" callback will
            not receive any trailer (headers sent after the response body).

            Example: cancel the request unless the content-type is
            "text/html".

               on_header => sub {
                  $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
               },

        on_body => $callback->($partial_body, $headers)
            When specified, all body data will be passed to this callback
            instead of to the completion callback. The completion callback
            will get the empty string instead of the body data.

            It has to return either true (in which case AnyEvent::HTTP will
            continue), or false, in which case AnyEvent::HTTP will cancel
            the download (and call the completion callback with an error
            code of 598).

            The downside to cancelling the request is that it makes it
            impossible to re-use the connection.

            This callback is useful when the data is too large to be held in
            memory (so the callback writes it to a file) or when only some
            information should be extracted, or when the body should be
            processed incrementally.

            It is usually preferred over doing your own body handling via
            "want_body_handle", but in case of streaming APIs, where HTTP is
            only used to create a connection, "want_body_handle" is the
            better alternative, as it allows you to install your own event
            handler, reducing resource usage.

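As an illustration of the true/false protocol of "on_body", here is a sketch of a callback factory that collects the body but aborts once a size limit would be exceeded. The name limited_body_collector and the limit handling are assumptions made for this example, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: returns an on_body
# callback that appends each chunk to $$buf_ref and returns false (which
# cancels the request with status 598) once $limit bytes would be exceeded.
sub limited_body_collector {
   my ($limit, $buf_ref) = @_;

   sub {
      my ($partial_body, $hdr) = @_;

      return if length ($$buf_ref) + length ($partial_body) > $limit;

      $$buf_ref .= $partial_body;

      1
   }
}

# Intended usage (needs AnyEvent::HTTP and a running event loop):
#
#    my $body = "";
#    http_get $url, on_body => limited_body_collector (1_000_000, \$body), sub { ... };
```

Remember that aborting this way also prevents the connection from being re-used.
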
        want_body_handle => $enable
            When enabled (default is disabled), the behaviour of
            AnyEvent::HTTP changes considerably: after parsing the headers,
            and instead of downloading the body (if any), the completion
            callback will be called. Instead of the $body argument
            containing the body data, the callback will receive the
            AnyEvent::Handle object associated with the connection. In error
            cases, "undef" will be passed. When there is no body (e.g.
            status 304), the empty string will be passed.

            The handle object might or might not be in TLS mode, might be
            connected to a proxy, be a persistent connection, use chunked
            transfer encoding etc., and may be configured in unspecified
            ways. The user is responsible for this handle (it will not be
            used by this module anymore).

            This is useful with some push-type services, where, after the
            initial headers, an interactive protocol is used (a typical
            example would be the push-style Twitter API which starts a
            JSON/XML stream).

            If you think you need this, first have a look at "on_body", to
            see if that doesn't solve your problem in a better way.

        persistent => $boolean
            Try to create/reuse a persistent connection. When this flag is
            set (default: true for idempotent requests, false for all
            others), then "http_request" tries to re-use an existing
            (previously-created) persistent connection to the same host
            (i.e. identical URL scheme, hostname, port and sessionid) and,
            failing that, tries to create a new one.

            Requests failing in certain ways will be automatically retried
            once, which is dangerous for non-idempotent requests, which is
            why it defaults to off for them. The reason for this is that
            the bozos who designed HTTP/1.1 made it impossible to
            distinguish between a fatal error and a normal connection
            timeout, so you never know whether there was a problem with your
            request or not.

            When reusing an existing connection, many parameters (such as
            TLS context) will be ignored. See the "sessionid" parameter for
            a workaround.

        keepalive => $boolean
            Only used when "persistent" is also true. This parameter decides
            whether "http_request" tries to handshake an HTTP/1.0-style
            keep-alive connection (as opposed to only an HTTP/1.1 persistent
            connection).

            The default is true, except when using a proxy, in which case it
            defaults to false, as HTTP/1.0 proxies cannot support this in a
            meaningful way.

        handle_params => { key => value ... }
            The key-value pairs in this hash will be passed to any
            AnyEvent::Handle constructor that is called - not all requests
            will create a handle, and sometimes more than one is created, so
            this parameter is only good for setting hints.

            Example: set the maximum read size to 4096, to potentially
            conserve memory at the cost of speed.

               handle_params => {
                  max_read_size => 4096,
               },

        Example: do a simple HTTP GET request for http://www.nethype.de/ and
        print the response body.

           http_request GET => "http://www.nethype.de/", sub {
              my ($body, $hdr) = @_;
              print "$body\n";
           };

        Example: do an HTTP HEAD request on https://www.google.com/, use a
        timeout of 30 seconds.

           http_request
              HEAD    => "https://www.google.com",
              headers => { "user-agent" => "MySearchClient 1.0" },
              timeout => 30,
              sub {
                 my ($body, $hdr) = @_;
                 use Data::Dumper;
                 print Dumper $hdr;
              }
           ;

        Example: do another simple HTTP GET request, but immediately try to
        cancel it.

           my $request = http_request GET => "http://www.nethype.de/", sub {
              my ($body, $hdr) = @_;
              print "$body\n";
           };

           undef $request;

  DNS CACHING
    AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
    actual connection, which in turn uses AnyEvent::DNS to resolve
    hostnames. The latter is a simple stub resolver and does no caching on
    its own. If you want DNS caching, you currently have to provide your own
    default resolver (by storing a suitable resolver object in
    $AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.

  GLOBAL FUNCTIONS AND VARIABLES
    AnyEvent::HTTP::set_proxy "proxy-url"
        Sets the default proxy server to use. The proxy-url must begin with
        a string of the form "http://host:port", croaks otherwise.

        To clear an already-set proxy, use "undef".

        When AnyEvent::HTTP is loaded for the first time it will query the
        default proxy from the operating system, currently by looking at
        "$ENV{http_proxy}".

    AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
        Remove all cookies from the cookie jar that have expired. If
        $session_end is given and true, then additionally remove all session
        cookies.

        You should call this function (with a true $session_end) before you
        save cookies to disk, and you should call this function after
        loading them again. If you have a long-running program you can
        additionally call this function from time to time.

        A cookie jar is initially an empty hash-reference that is managed by
        this module. Its format is subject to change, but currently it is as
        follows:

        The key "version" has to contain 2, otherwise the hash gets cleared.
        All other keys are hostnames or IP addresses pointing to
        hash-references. The key for these inner hash references is the
        server path for which this cookie is meant, and the values are again
        hash-references. Each key of those hash-references is a cookie name,
        and the value, you guessed it, is another hash-reference, this time
        with the key-value pairs from the cookie, except for "expires" and
        "max-age", which have been replaced by a "_expires" key that
        contains the cookie expiry timestamp. Session cookies are indicated
        by not having an "_expires" key.

        Here is an example of a cookie jar with a single cookie, so you have
        a chance of understanding the above paragraph:

           {
              version    => 2,
              "10.0.0.1" => {
                 "/" => {
                    "mythweb_id" => {
                       _expires => 1293917923,
                       value    => "ooRung9dThee3ooyXooM1Ohm",
                    },
                 },
              },
           }

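To make the nesting (host, then path, then cookie name) concrete, the following sketch flattens a jar into one line per cookie. The helper list_cookies is hypothetical, and it relies on the version 2 layout described above, which the module explicitly reserves the right to change.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: walks a version-2
# cookie jar and returns one descriptive string per cookie.
sub list_cookies {
   my ($jar) = @_;
   my @lines;

   # every key except "version" is a hostname or IP address
   for my $host (sort grep { $_ ne "version" } keys %$jar) {
      for my $path (sort keys %{ $jar->{$host} }) {
         for my $name (sort keys %{ $jar->{$host}{$path} }) {
            my $cookie = $jar->{$host}{$path}{$name};

            # session cookies have no _expires key
            my $kind = exists $cookie->{_expires}
               ? "expires $cookie->{_expires}"
               : "session";

            push @lines, "$host $path $name=$cookie->{value} ($kind)";
         }
      }
   }

   @lines
}
```

Applied to the example jar above, this yields a single entry for "mythweb_id" on host 10.0.0.1 and path "/".
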
    $date = AnyEvent::HTTP::format_date $timestamp
        Takes a POSIX timestamp (seconds since the epoch) and formats it as
        an HTTP Date (RFC 2616).

    $timestamp = AnyEvent::HTTP::parse_date $date
        Takes an HTTP Date (RFC 2616) or a Cookie date (Netscape cookie
        spec) or a bunch of minor variations of those, and returns the
        corresponding POSIX timestamp, or "undef" if the date cannot be
        parsed.

    $AnyEvent::HTTP::MAX_RECURSE
        The default value for the "recurse" request parameter (default: 10).

    $AnyEvent::HTTP::TIMEOUT
        The default timeout for connection operations, in seconds (default:
        300).

    $AnyEvent::HTTP::USERAGENT
        The default value for the "User-Agent" header (the default is
        "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
        +http://software.schmorp.de/pkg/AnyEvent)").

    $AnyEvent::HTTP::MAX_PER_HOST
        The maximum number of concurrent connections to the same host
        (identified by the hostname). If the limit is exceeded, then
        additional requests are queued until previous connections are
        closed. Both persistent and non-persistent connections are counted
        in this limit.

        The default value for this is 4, and it is highly advisable to not
        increase it much.

        For comparison: the RFCs recommend 4 non-persistent or 2 persistent
        connections, older browsers used 2, newer ones (such as Firefox 3)
        typically use 6, and Opera uses 8 because like, they have the
        fastest browser and give a shit for everybody else on the planet.

    $AnyEvent::HTTP::PERSISTENT_TIMEOUT
        The time, in seconds, after which idle persistent connections get
        closed by AnyEvent::HTTP (default: 3).

    $AnyEvent::HTTP::ACTIVE
        The number of active connections. This is not the number of
        currently running requests, but the number of currently open and
        non-idle TCP connections. This number can be useful for
        load-leveling.

SHOWCASE
    This section contains some more elaborate "real-world" examples or code
    snippets.

  HTTP/1.1 FILE DOWNLOAD
    Downloading files with HTTP can be quite tricky, especially when
    something goes wrong and you want to resume.

    Here is a function that initiates and resumes a download. It uses the
    last modified time to check for file content changes, and works with
    many HTTP/1.0 servers as well, and usually falls back to a complete
    re-download on older servers.

    It calls the completion callback with "undef" if a nonretryable error
    occurred, with 0 if the download was partial and should be retried, and
    with 1 if it was successful.

       use AnyEvent::HTTP;

       sub download($$$) {
          my ($url, $file, $cb) = @_;

          # "+<" opens read-write without truncating; the file must already exist
          open my $fh, "+<", $file
             or die "$file: $!";

          my %hdr;
          my $ofs = 0;

          # resume from the current file size, if there is any data yet
          if (stat $fh and -s _) {
             $ofs = -s _;
             warn "-s is ", $ofs;
             $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
             $hdr{"range"} = "bytes=$ofs-";
          }

          http_get $url,
             headers => \%hdr,
             on_header => sub {
                my ($hdr) = @_;

                if ($hdr->{Status} == 200 && $ofs) {
                   # resume failed
                   truncate $fh, $ofs = 0;
                }

                sysseek $fh, $ofs, 0;

                1
             },
             on_body => sub {
                my ($data, $hdr) = @_;

                if ($hdr->{Status} =~ /^2/) {
                   length $data == syswrite $fh, $data
                      or return; # abort on write errors
                }

                1
             },
             sub {
                my (undef, $hdr) = @_;

                my $status = $hdr->{Status};

                if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                   utime $time, $time, $fh;
                }

                if ($status == 200 || $status == 206 || $status == 416) {
                   # download ok || resume ok || file already fully downloaded
                   $cb->(1, $hdr);

                } elsif ($status == 412) {
                   # file has changed while resuming, delete and retry
                   unlink $file;
                   $cb->(0, $hdr);

                } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                   # retry later
                   $cb->(0, $hdr);

                } else {
                   $cb->(undef, $hdr);
                }
             }
          ;
       }

       download "http://server/somelargefile", "/tmp/somelargefile", sub {
          if ($_[0]) {
             print "OK!\n";
          } elsif (defined $_[0]) {
             print "please retry later\n";
          } else {
             print "ERROR\n";
          }
       };

  SOCKS PROXIES
    Socks proxies are not directly supported by AnyEvent::HTTP. You can
    compile your perl to support socks, or use an external program such as
    socksify (dante) or tsocks to make your program use a socks proxy
    transparently.

    Alternatively, for AnyEvent::HTTP only, you can use your own
    "tcp_connect" function that does the proxy handshake - here is an
    example that works with socks4a proxies:

       use Errno;
       use AnyEvent::Util;
       use AnyEvent::Socket;
       use AnyEvent::Handle;

       # host, port and username of/for your socks4a proxy
       my $socks_host = "10.0.0.23";
       my $socks_port = 9050;
       my $socks_user = "";

       sub socks4a_connect {
          my ($host, $port, $connect_cb, $prepare_cb) = @_;

          my $hdl = new AnyEvent::Handle
             connect    => [$socks_host, $socks_port],
             on_prepare => sub { $prepare_cb->($_[0]{fh}) },
             on_error   => sub { $connect_cb->() },
          ;

          $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

          $hdl->push_read (chunk => 8, sub {
             my ($hdl, $chunk) = @_;
             my ($status, $port, $ipn) = unpack "xCna4", $chunk;

             if ($status == 0x5a) {
                $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
             } else {
                $! = Errno::ENXIO; $connect_cb->();
             }
          });

          $hdl
       }

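The two pack/unpack templates in socks4a_connect encode the SOCKS4a wire format. The sketch below isolates them so the bytes can be inspected without a proxy; the function names socks4a_request and socks4a_reply_ok are made up for this illustration.

```perl
use strict;
use warnings;

# Build the SOCKS4a CONNECT request with the same template as above.
sub socks4a_request {
   my ($host, $port, $user) = @_;

   pack "CCnNZ*Z*",
      4,       # protocol version
      1,       # command: CONNECT
      $port,   # destination port, 16 bit big-endian
      1,       # IP 0.0.0.1: asks the proxy to resolve the hostname itself
      $user,   # NUL-terminated user ID
      $host;   # NUL-terminated hostname
}

# Decode the 8-byte reply: one ignored byte, status, port, IPv4 address.
sub socks4a_reply_ok {
   my ($chunk) = @_;
   my ($status, $port, $ipn) = unpack "xCna4", $chunk;

   $status == 0x5a   # 0x5a means "request granted"
}
```

For example, 'unpack "H*", socks4a_request ("example.com", 80, "")' shows the request as a hex string, which can help when comparing against a packet capture.
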
    Use "socks4a_connect" instead of "tcp_connect" when doing
    "http_request"s, possibly after switching off other proxy types:

       AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

       http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
          my ($data, $headers) = @_;
          ...
       };

SEE ALSO
    AnyEvent.

AUTHOR
       Marc Lehmann <schmorp@schmorp.de>
       http://home.schmorp.de/

    With many thanks to Дмитрий Шалашов, who provided countless test cases
    and bug reports.
