NAME
AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client

SYNOPSIS
    use AnyEvent::HTTP;

    http_get "http://www.nethype.de/", sub { print $_[1] };

    # ... do something else here

DESCRIPTION
This module is an AnyEvent user; you need to make sure that you use and
run a supported event loop.

This module implements a simple, stateless and non-blocking HTTP client.
It supports GET, POST and other request methods, cookies and more, all
on a very low level. It can follow redirects, supports proxies, and
automatically limits the number of connections to the values specified
in the RFC.

It should generally be a "good client" that is sufficient for most HTTP
tasks. Simple tasks should be simple, but complex tasks should still be
possible as the user retains control over request and response headers.

The caller is responsible for authentication management, cookies (if the
simplistic implementation in this module doesn't suffice), referer and
other high-level protocol details for which this module offers only
limited support.

METHODS
http_get $url, key => value..., $cb->($data, $headers)
Executes an HTTP-GET request. See the http_request function for
details on additional parameters and the return value.

http_head $url, key => value..., $cb->($data, $headers)
Executes an HTTP-HEAD request. See the http_request function for
details on additional parameters and the return value.

http_post $url, $body, key => value..., $cb->($data, $headers)
Executes an HTTP-POST request with a request body of $body. See the
http_request function for details on additional parameters and the
return value.

http_request $method => $url, key => value..., $cb->($data, $headers)
Executes an HTTP request of type $method (e.g. "GET", "POST"). The
URL must be an absolute http or https URL.

When called in void context, nothing is returned. In other contexts,
"http_request" returns a "cancellation guard" - you have to keep the
object alive at least until the callback gets called. If the object
gets destroyed before the callback is called, the request will be
cancelled.

The callback will be called with the response body data as first
argument (or "undef" if an error occurred), and a hash-ref with
response headers (and trailers) as second argument.

All the headers in that hash are lowercased. In addition to the
response headers, the "pseudo-headers" (which contain uppercase
letters, so they cannot clash with the lowercased response headers)
"HTTPVersion", "Status" and "Reason" contain the three parts of the
HTTP Status-Line of the same name. If an error occurs during the body
phase of a request, then the original "Status" and "Reason" values
from the header are available as "OrigStatus" and "OrigReason".

The pseudo-header "URL" contains the actual URL (which can differ
from the requested URL when following redirects - for example, you
might get an error that your URL scheme is not supported even though
your URL is a valid http URL because it redirected to an ftp URL, in
which case you can look at the URL pseudo header).

The pseudo-header "Redirect" only exists when the request was a
result of an internal redirect. In that case it is an array
reference with the "($data, $headers)" from the redirect response.
Note that this response could in turn be the result of a redirect
itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
the original response, and so on.

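The nesting of the "Redirect" pseudo-header can be unrolled with a small
loop. The following is only a sketch: "redirect_chain" is a hypothetical
helper that is not part of this module, and the sample hash is hand-written
data shaped like the $headers structure described above.

```perl
use strict;
use warnings;

# Collect "status url" for every response in a redirect chain, newest
# first, by following the nested "Redirect" pseudo-headers.
sub redirect_chain {
   my ($hdr) = @_;
   my @chain;

   while ($hdr) {
      push @chain, "$hdr->{Status} $hdr->{URL}";

      # Redirect is [$data, $headers] of the previous response, if any
      $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
   }

   @chain
}

# hand-written sample data, not a real response
my $sample = {
   Status   => 200,
   URL      => "http://example.com/final",
   Redirect => [ "", { Status => 302, URL => "http://example.com/start" } ],
};

print "$_\n" for redirect_chain($sample);
```

In a real completion callback, the second argument would be passed to the
helper instead of the sample hash.
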
If the server sends a header multiple times, then its values
will be joined together with a comma (","), as per the HTTP spec.

If an internal error occurs, such as not being able to resolve a
hostname, then $data will be "undef", "$headers->{Status}" will be
in the range 590-599 and the "Reason" pseudo-header will contain an
error message. Currently the following status codes are used:

    595 - errors during connection establishment, proxy handshake.
    596 - errors during TLS negotiation, request sending and header
          processing.
    597 - errors during body receiving or processing.
    598 - user aborted request via "on_header" or "on_body".
    599 - other, usually nonretryable, errors (garbled URL etc.).

A typical callback might look like this:

    sub {
       my ($body, $hdr) = @_;

       if ($hdr->{Status} =~ /^2/) {
          # ... everything should be ok
       } else {
          print "error, $hdr->{Status} $hdr->{Reason}\n";
       }
    }

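Callers that want to react differently to the internal 59x codes could map
the "Status" pseudo-header to a coarse outcome first. This is a minimal
sketch: "classify_status" is a hypothetical helper, and treating 595-597 as
retryable is an assumption of the example, not a policy of this module.

```perl
use strict;
use warnings;

# Map a Status value to a coarse outcome (policy is an assumption).
sub classify_status {
   my ($status) = @_;

   return "ok"      if $status =~ /^2/;
   return "aborted" if $status == 598;                    # on_header/on_body said stop
   return "retry"   if $status >= 595 && $status <= 597;  # connection-level trouble
   return "failed";                                       # 599 and HTTP error responses
}

printf "%d => %s\n", $_, classify_status($_)
   for 200, 404, 596, 598, 599;
```
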
Additional parameters are key-value pairs, and are fully optional.
They include:

recurse => $count (default: $MAX_RECURSE)
Whether to recurse requests or not, e.g. on redirects,
authentication retries and so on, and how often to do so.

headers => hashref
The request headers to use. Currently, "http_request" may
provide its own "Host:", "Content-Length:", "Connection:" and
"Cookie:" headers and will provide defaults at least for "TE:",
"Referer:" and "User-Agent:" (this can be suppressed by using
"undef" for these headers in which case they won't be sent at
all).

You really should provide your own "User-Agent:" header value
that is appropriate for your program - I wouldn't be surprised
if the default AnyEvent string gets blocked by webservers sooner
or later.

Also, make sure that your header names and values do not
contain any embedded newlines.

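As an illustration of the "headers" parameter, the following sketch sends
its own "User-Agent:" value and suppresses the default "Referer:" header
by passing "undef". The user agent string and its URL are made up for the
example, and the 10 second timeout is an arbitrary choice.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $done = AnyEvent->condvar;

http_get "http://www.nethype.de/",
   headers => {
      "User-Agent" => "MyCrawler/0.1 (+http://example.com/bot)",  # hypothetical
      "Referer"    => undef,   # undef: do not send this header at all
   },
   timeout => 10,
   sub {
      my ($body, $hdr) = @_;

      # also reached for internal errors, then with a 59x status
      print "$hdr->{Status} $hdr->{Reason}\n";
      $done->send;
   };

$done->recv;   # run the event loop until the callback has fired
```
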
timeout => $seconds
The time-out to use for various stages - each connect attempt
will reset the timeout, as will read or write activity, i.e.
this is not an overall timeout.

Default timeout is 5 minutes.

proxy => [$host, $port[, $scheme]] or undef
Use the given http proxy for all requests, or no proxy if
"undef" is used.

$scheme must be either missing or must be "http" for HTTP.

If not specified, then the default proxy is used (see
"AnyEvent::HTTP::set_proxy").

body => $string
The request body, usually empty. Will be sent as-is (future
versions of this module might offer more options).

cookie_jar => $hash_ref
Passing this parameter enables (simplified) cookie-processing,
loosely based on the original Netscape specification.

The $hash_ref must be an (initially empty) hash reference which
will get updated automatically. It is possible to save the
cookie jar to persistent storage with something like JSON or
Storable - see the "AnyEvent::HTTP::cookie_jar_expire" function
if you wish to remove expired or session-only cookies, and also
for documentation on the format of the cookie jar.

Note that this cookie implementation is not meant to be
complete. If you want complete cookie management you have to do
that on your own. "cookie_jar" is meant as a quick fix to get
most cookie-using sites working. Cookies are a privacy disaster;
do not use them unless required to.

When cookie processing is enabled, the "Cookie:" and
"Set-Cookie:" headers will be set and handled by this module,
otherwise they will be left untouched.

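Saving the jar to persistent storage could look roughly like the sketch
below. It assumes JSON::PP as the serialiser, and the file name is
hypothetical. The jar is expired after loading and again, including
session cookies, before saving.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;
use JSON::PP;

my $jar_file = "/tmp/anyevent-http-cookies.json";   # hypothetical location

# load a previously saved jar, falling back to an empty one
my $jar = {};
if (open my $in, "<", $jar_file) {
   my $json   = do { local $/; <$in> };
   my $loaded = eval { decode_json $json };
   $jar = $loaded if ref $loaded eq "HASH";
}

# drop cookies that expired while the jar was stored on disk
AnyEvent::HTTP::cookie_jar_expire($jar);

my $done = AnyEvent->condvar;

http_get "http://www.nethype.de/",
   cookie_jar => $jar,
   timeout    => 10,
   sub {
      my ($body, $hdr) = @_;
      print "$hdr->{Status} $hdr->{Reason}\n";
      $done->send;
   };

$done->recv;

# remove expired and session-only cookies, then save the rest
AnyEvent::HTTP::cookie_jar_expire($jar, 1);

open my $out, ">", $jar_file
   or die "$jar_file: $!";
print {$out} encode_json($jar);
close $out;
```
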
tls_ctx => $scheme | $tls_ctx
Specifies the AnyEvent::TLS context to be used for https
connections. This parameter follows the same rules as the
"tls_ctx" parameter to AnyEvent::Handle, but additionally, the
two strings "low" or "high" can be specified, which give you a
predefined low-security (no verification, highest compatibility)
and high-security (CA and common-name verification) TLS context.

The default for this option is "low", which could be interpreted
as "give me the page, no matter what".

See also the "sessionid" parameter.

session => $string
The module might reuse connections to the same host internally.
Sometimes (e.g. when using TLS), you do not want to reuse
connections from other sessions. This can be achieved by setting
this parameter to some unique ID (such as the address of an
object storing your state data, or the TLS context) - only
connections using the same unique ID will be reused.

on_prepare => $callback->($fh)
In rare cases you need to "tune" the socket before it is used to
connect (for example, to bind it on a given IP address). This
parameter overrides the prepare callback passed to
"AnyEvent::Socket::tcp_connect" and behaves exactly the same way
(e.g. it has to provide a timeout). See the description for the
$prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
details.

tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)
In even rarer cases you want total control over how
AnyEvent::HTTP establishes connections. Normally it uses
AnyEvent::Socket::tcp_connect to do this, but you can provide
your own "tcp_connect" function - obviously, it has to follow
the same calling conventions, except that it may always return a
connection guard object.

There are probably lots of weird uses for this function,
starting from tracing the hosts "http_request" actually tries to
connect, to (inexact but fast) host => IP address caching or
even socks protocol support.

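The tracing use mentioned above might be sketched as follows.
"tracing_connect" and @attempts are names invented for this example. The
wrapper only records what it is asked to connect to and then delegates to
AnyEvent::Socket::tcp_connect.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;
use AnyEvent::Socket ();

my @attempts;   # "host:service" of every connection attempt

# same calling convention as AnyEvent::Socket::tcp_connect
sub tracing_connect {
   my ($host, $service, $connect_cb, $prepare_cb) = @_;

   push @attempts, "$host:$service";

   # last statement, so the guard object is handed back to the caller
   AnyEvent::Socket::tcp_connect($host, $service, $connect_cb, $prepare_cb)
}

my $done = AnyEvent->condvar;

http_get "http://www.nethype.de/",
   tcp_connect => \&tracing_connect,
   timeout     => 10,
   sub {
      my ($body, $hdr) = @_;
      print "$hdr->{Status} $hdr->{Reason}\n";
      $done->send;
   };

$done->recv;

print "tried: $_\n" for @attempts;
```

Redirects and retries show up as additional entries, which is what makes
such a wrapper useful for debugging.
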
on_header => $callback->($headers)
When specified, this callback will be called with the header
hash as soon as headers have been successfully received from the
remote server (not on locally-generated errors).

It has to return either true (in which case AnyEvent::HTTP will
continue), or false, in which case AnyEvent::HTTP will cancel
the download (and call the completion callback with an error
code of 598).

This callback is useful, among other things, to quickly reject
unwanted content, which, if it is supposed to be rare, can be
faster than first doing a "HEAD" request.

The downside is that cancelling the request makes it impossible
to re-use the connection. Also, the "on_header" callback will
not receive any trailer (headers sent after the response body).

Example: cancel the request unless the content-type is
"text/html".

    on_header => sub {
       $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
    },

on_body => $callback->($partial_body, $headers)
When specified, all body data will be passed to this callback
instead of to the completion callback. The completion callback
will get the empty string instead of the body data.

It has to return either true (in which case AnyEvent::HTTP will
continue), or false, in which case AnyEvent::HTTP will cancel
the download (and call the completion callback with an error
code of 598).

The downside to cancelling the request is that it makes it
impossible to re-use the connection.

This callback is useful when the data is too large to be held in
memory (so the callback writes it to a file) or when only some
information should be extracted, or when the body should be
processed incrementally.

It is usually preferred over doing your own body handling via
"want_body_handle", but in case of streaming APIs, where HTTP is
only used to create a connection, "want_body_handle" is the
better alternative, as it allows you to install your own event
handler, reducing resource usage.

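A small sketch of incremental processing with "on_body": it counts the
bytes as they arrive and cancels the download once a cap is exceeded. The
64 KiB cap and the variable names are arbitrary choices for the example.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $limit = 64 * 1024;   # arbitrary cap chosen for this sketch
my $seen  = 0;           # bytes received so far

my $done = AnyEvent->condvar;

http_get "http://www.nethype.de/",
   timeout => 10,
   on_body => sub {
      my ($chunk, $hdr) = @_;

      $seen += length $chunk;

      # a false return value cancels the download (status 598)
      $seen <= $limit
   },
   sub {
      my ($body, $hdr) = @_;

      # $body is the empty string on success, undef on errors
      print "status $hdr->{Status}, $seen bytes seen\n";
      $done->send;
   };

$done->recv;
```
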
want_body_handle => $enable
When enabled (default is disabled), the behaviour of
AnyEvent::HTTP changes considerably: after parsing the headers,
and instead of downloading the body (if any), the completion
callback will be called. Instead of the $body argument
containing the body data, the callback will receive the
AnyEvent::Handle object associated with the connection. In error
cases, "undef" will be passed. When there is no body (e.g.
status 304), the empty string will be passed.

The handle object might or might not be in TLS mode, might be
connected to a proxy, be a persistent connection, use chunked
transfer encoding etc., and configured in unspecified ways. The
user is responsible for this handle (it will not be used by this
module anymore).

This is useful with some push-type services, where, after the
initial headers, an interactive protocol is used (typical
example would be the push-style Twitter API which starts a
JSON/XML stream).

If you think you need this, first have a look at "on_body", to
see if that doesn't solve your problem in a better way.

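Taking over the handle could be sketched like this. It is a rough
illustration under several assumptions: the example stops after the first
chunk of raw data, it disables persistent connections to keep things
simple, and it makes no attempt to decode chunked transfer encoding. The
variable $stream exists only to keep the handle alive, as the caller owns
it from then on.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $done = AnyEvent->condvar;
my $stream;   # we own the handle now, so we must keep a reference to it

http_get "http://www.nethype.de/",
   want_body_handle => 1,
   persistent       => 0,
   timeout          => 10,
   sub {
      my ($hdl, $hdr) = @_;

      # undef signals an error, the empty string a response without body
      unless (ref $hdl) {
         print "no handle: $hdr->{Status} $hdr->{Reason}\n";
         return $done->send;
      }

      $stream = $hdl;

      # tear down the connection and leave the event loop
      my $finish = sub {
         $stream->destroy if $stream;
         undef $stream;
         $done->send;
      };

      $hdl->on_error($finish);
      $hdl->on_eof($finish);
      $hdl->on_read(sub {
         my ($h) = @_;

         print "received ", length($h->{rbuf}), " raw bytes\n";
         $h->{rbuf} = "";   # consume the data
         $finish->();       # one chunk is enough for this sketch
      });
   };

$done->recv;
```
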
persistent => $boolean
Try to create/reuse a persistent connection. When this flag is
set (default: true for idempotent requests, false for all
others), then "http_request" tries to re-use an existing
(previously-created) persistent connection to the host and,
failing that, tries to create a new one.

Requests failing in certain ways will be automatically retried
once, which is dangerous for non-idempotent requests, which is
why it defaults to off for them. The reason for this is because
the bozos who designed HTTP/1.1 made it impossible to
distinguish between a fatal error and a normal connection
timeout, so you never know whether there was a problem with your
request or not.

When reusing an existing connection, many parameters (such as
TLS context) will be ignored. See the "session" parameter for a
workaround.

keepalive => $boolean
Only used when "persistent" is also true. This parameter decides
whether "http_request" tries to handshake an HTTP/1.0-style
keep-alive connection (as opposed to only an HTTP/1.1 persistent
connection).

The default is true, except when using a proxy, in which case it
defaults to false, as HTTP/1.0 proxies cannot support this in a
meaningful way.

handle_params => { key => value ... }
The key-value pairs in this hash will be passed to any
AnyEvent::Handle constructor that is called - not all requests
will create a handle, and sometimes more than one is created, so
this parameter is only good for setting hints.

Example: set the maximum read size to 4096, to potentially
conserve memory at the cost of speed.

    handle_params => {
       max_read_size => 4096,
    },

Example: do a simple HTTP GET request for http://www.nethype.de/ and
print the response body.

    http_request GET => "http://www.nethype.de/", sub {
       my ($body, $hdr) = @_;
       print "$body\n";
    };

Example: do an HTTP HEAD request on https://www.google.com/, use a
timeout of 30 seconds.

    http_request
       HEAD    => "https://www.google.com",
       headers => { "user-agent" => "MySearchClient 1.0" },
       timeout => 30,
       sub {
          my ($body, $hdr) = @_;
          use Data::Dumper;
          print Dumper $hdr;
       }
    ;

Example: do another simple HTTP GET request, but immediately try to
cancel it.

    my $request = http_request GET => "http://www.nethype.de/", sub {
       my ($body, $hdr) = @_;
       print "$body\n";
    };

    undef $request;

DNS CACHING
AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
actual connection, which in turn uses AnyEvent::DNS to resolve
hostnames. The latter is a simple stub resolver and does no caching on
its own. If you want DNS caching, you currently have to provide your own
default resolver (by storing a suitable resolver object in
$AnyEvent::DNS::RESOLVER) or your own "tcp_connect" callback.

GLOBAL FUNCTIONS AND VARIABLES
AnyEvent::HTTP::set_proxy "proxy-url"
Sets the default proxy server to use. The proxy-url must begin with
a string of the form "http://host:port", croaks otherwise.

To clear an already-set proxy, use "undef".

When AnyEvent::HTTP is loaded for the first time it will query the
default proxy from the operating system, currently by looking at
"$ENV{http_proxy}".

AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
Remove all cookies from the cookie jar that have expired. If
$session_end is given and true, then additionally remove all session
cookies.

You should call this function (with a true $session_end) before you
save cookies to disk, and you should call this function after
loading them again. If you have a long-running program you can
additionally call this function from time to time.

A cookie jar is initially an empty hash-reference that is managed by
this module. Its format is subject to change, but currently it is
like this:

The key "version" has to contain 1, otherwise the hash gets emptied.
All other keys are hostnames or IP addresses pointing to
hash-references. The key for these inner hash references is the
server path for which this cookie is meant, and the values are again
hash-references. The keys of those hash-references are the cookie
names, and the value, you guessed it, is another hash-reference, this
time with the key-value pairs from the cookie, except for "expires"
and "max-age", which have been replaced by a "_expires" key that
contains the cookie expiry timestamp.

Here is an example of a cookie jar with a single cookie, so you have
a chance of understanding the above paragraph:

    {
       version    => 1,
       "10.0.0.1" => {
          "/" => {
             "mythweb_id" => {
                _expires => 1293917923,
                value    => "ooRung9dThee3ooyXooM1Ohm",
             },
          },
       },
    }

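Given that layout, walking a jar takes three nested loops. The sketch
below reuses the sample jar from above. "list_cookies" is a hypothetical
helper, and since the jar format is subject to change, code like this
should be treated as fragile.

```perl
use strict;
use warnings;

# the sample jar shown above
my $jar = {
   version    => 1,
   "10.0.0.1" => {
      "/" => {
         "mythweb_id" => {
            _expires => 1293917923,
            value    => "ooRung9dThee3ooyXooM1Ohm",
         },
      },
   },
};

# Flatten a jar into "host path name=value" lines.
sub list_cookies {
   my ($jar) = @_;
   my @lines;

   # every key except "version" is a hostname or IP address
   for my $host (sort grep { $_ ne "version" } keys %$jar) {
      for my $path (sort keys %{ $jar->{$host} }) {
         for my $name (sort keys %{ $jar->{$host}{$path} }) {
            my $cookie = $jar->{$host}{$path}{$name};
            push @lines, "$host $path $name=$cookie->{value}";
         }
      }
   }

   @lines
}

print "$_\n" for list_cookies($jar);
```
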
$date = AnyEvent::HTTP::format_date $timestamp
Takes a POSIX timestamp (seconds since the epoch) and formats it as
an HTTP Date (RFC 2616).

$timestamp = AnyEvent::HTTP::parse_date $date
Takes an HTTP Date (RFC 2616) or a Cookie date (Netscape cookie spec)
or a bunch of minor variations of those, and returns the
corresponding POSIX timestamp, or "undef" if the date cannot be
parsed.

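The two date functions are inverses of each other for well-formed input,
which the following short sketch checks. The timestamp is the "_expires"
value from the cookie jar example, and "not a date" is deliberately
invalid input.

```perl
use strict;
use warnings;
use AnyEvent::HTTP;

my $timestamp = 1293917923;   # the _expires value from the cookie jar example

my $date = AnyEvent::HTTP::format_date($timestamp);
my $back = AnyEvent::HTTP::parse_date($date);

print "$date\n";
print "round trip ", ($back == $timestamp ? "ok" : "failed"), "\n";

# unparseable input yields undef rather than an exception
print "garbage is rejected\n"
   unless defined AnyEvent::HTTP::parse_date("not a date");
```
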
$AnyEvent::HTTP::MAX_RECURSE
The default value for the "recurse" request parameter (default: 10).

$AnyEvent::HTTP::TIMEOUT
The default timeout for connection operations, in seconds (default:
300).

$AnyEvent::HTTP::USERAGENT
The default value for the "User-Agent" header (the default is
"Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
+http://software.schmorp.de/pkg/AnyEvent)").

$AnyEvent::HTTP::MAX_PER_HOST
The maximum number of concurrent connections to the same host
(identified by the hostname). If the limit is exceeded, then the
additional requests are queued until previous connections are
closed. Both persistent and non-persistent connections are counted
in this limit.

The default value for this is 4, and it is highly advisable to not
increase it much.

For comparison: the RFCs recommend 4 non-persistent or 2 persistent
connections, older browsers used 2, newer ones (such as Firefox 3)
typically use 6, and Opera uses 8 because like, they have the
fastest browser and give a shit for everybody else on the planet.

$AnyEvent::HTTP::PERSISTENT_TIMEOUT
The time, in seconds, after which idle persistent connections get
closed by AnyEvent::HTTP (default: 3).

$AnyEvent::HTTP::ACTIVE
The number of active connections. This is not the number of
currently running requests, but the number of currently open and
non-idle TCP connections. This number can be useful for
load-leveling.

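These variables are ordinary package variables, so they can be assigned
once at program start. The values below are arbitrary examples for a
conservative client, and the user agent string is made up.

```perl
use strict;
use warnings;
use AnyEvent::HTTP;

# be a conservative client: fewer parallel connections, shorter timeouts
$AnyEvent::HTTP::MAX_PER_HOST       = 2;
$AnyEvent::HTTP::TIMEOUT            = 60;
$AnyEvent::HTTP::PERSISTENT_TIMEOUT = 10;

# identify the program (name and URL are invented for this sketch)
$AnyEvent::HTTP::USERAGENT = "MyCrawler/0.1 (+http://example.com/bot)";

# ACTIVE can be polled, e.g. to decide whether to start more requests
print "open connections: $AnyEvent::HTTP::ACTIVE\n";
```

Per-request parameters such as "timeout" or a "User-Agent" entry in
"headers" still take precedence over these defaults.
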
SHOWCASE
This section contains some more elaborate "real-world" examples or code
snippets.

HTTP/1.1 FILE DOWNLOAD
Downloading files with HTTP can be quite tricky, especially when
something goes wrong and you want to resume.

Here is a function that initiates and resumes a download. It uses the
last modified time to check for file content changes, and works with
many HTTP/1.0 servers as well, and usually falls back to a complete
re-download on older servers.

It calls the completion callback with either "undef", which means a
nonretryable error occurred, 0 when the download was partial and should
be retried, and 1 if it was successful.

    use AnyEvent::HTTP;
    use Fcntl qw(O_RDWR O_CREAT);

    sub download($$$) {
       my ($url, $file, $cb) = @_;

       # open read/write without truncating, create the file if missing
       sysopen my $fh, $file, O_RDWR | O_CREAT
          or die "$file: $!";

       my %hdr;
       my $ofs = 0;

       # a non-empty file means we try to resume an earlier download
       if (stat $fh and -s _) {
          $ofs = -s _;
          $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
          $hdr{"range"} = "bytes=$ofs-";
       }

       http_get $url,
          headers   => \%hdr,
          on_header => sub {
             my ($hdr) = @_;

             if ($hdr->{Status} == 200 && $ofs) {
                # resume failed
                truncate $fh, $ofs = 0;
             }

             sysseek $fh, $ofs, 0;

             1
          },
          on_body   => sub {
             my ($data, $hdr) = @_;

             if ($hdr->{Status} =~ /^2/) {
                length $data == syswrite $fh, $data
                   or return; # abort on write errors
             }

             1
          },
          sub {
             my (undef, $hdr) = @_;

             my $status = $hdr->{Status};

             # utime takes the two timestamps first, then the file(s)
             if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
                utime $time, $time, $fh;
             }

             if ($status == 200 || $status == 206 || $status == 416) {
                # download ok || resume ok || file already fully downloaded
                $cb->(1, $hdr);

             } elsif ($status == 412) {
                # file has changed while resuming, delete and retry
                unlink $file;
                $cb->(0, $hdr);

             } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
                # retry later
                $cb->(0, $hdr);

             } else {
                $cb->(undef, $hdr);
             }
          }
       ;
    }

    download "http://server/somelargefile", "/tmp/somelargefile", sub {
       if ($_[0]) {
          print "OK!\n";
       } elsif (defined $_[0]) {
          print "please retry later\n";
       } else {
          print "ERROR\n";
       }
    };

SOCKS PROXIES
Socks proxies are not directly supported by AnyEvent::HTTP. You can
compile your perl to support socks, or use an external program such as
socksify (dante) or tsocks to make your program use a socks proxy
transparently.

Alternatively, for AnyEvent::HTTP only, you can use your own
"tcp_connect" function that does the proxy handshake - here is an
example that works with socks4a proxies:

    use Errno;
    use AnyEvent::Util;
    use AnyEvent::Socket;
    use AnyEvent::Handle;

    # host, port and username of/for your socks4a proxy
    my $socks_host = "10.0.0.23";
    my $socks_port = 9050;
    my $socks_user = "";

    sub socks4a_connect {
       my ($host, $port, $connect_cb, $prepare_cb) = @_;

       my $hdl = AnyEvent::Handle->new (
          connect    => [$socks_host, $socks_port],
          on_prepare => sub { $prepare_cb->($_[0]{fh}) },
          on_error   => sub { $connect_cb->() },
       );

       # socks4a request: version 4, command 1 (connect), port, the
       # invalid IP 0.0.0.1, the user name and then the hostname
       $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);

       # the reply is always 8 octets long
       $hdl->push_read (chunk => 8, sub {
          my ($hdl, $chunk) = @_;
          my ($status, $port, $ipn) = unpack "xCna4", $chunk;

          if ($status == 0x5a) {
             # request granted
             $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
          } else {
             $! = Errno::ENXIO; $connect_cb->();
          }
       });

       # returned as connection guard
       $hdl
    }

Use "socks4a_connect" instead of "tcp_connect" when doing
"http_request"s, possibly after switching off other proxy types:

    AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies

    http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
       my ($data, $headers) = @_;
       ...
    };

SEE ALSO
AnyEvent.

AUTHOR
Marc Lehmann <schmorp@schmorp.de>
http://home.schmorp.de/

With many thanks to Дмитрий Шалашов, who provided
countless testcases and bugreports.