/cvs/AnyEvent-HTTP/README
Revision: 1.13
Committed: Wed Jun 16 19:17:30 2010 UTC (13 years, 11 months ago) by root
Branch: MAIN
CVS Tags: rel-1_46, rel-1_45
Changes since 1.12: +24 -4 lines
Log Message:
1.45

File Contents

# User Rev Content
1 root 1.1 NAME
2 root 1.2 AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
3 root 1.1
4     SYNOPSIS
5 root 1.2 use AnyEvent::HTTP;
6 root 1.1
7 root 1.3 http_get "http://www.nethype.de/", sub { print $_[1] };
8    
9     # ... do something else here
10    
11 root 1.1 DESCRIPTION
12     This module is an AnyEvent user; you need to make sure that you use and
13     run a supported event loop.
14    
15 root 1.2 This module implements a simple, stateless and non-blocking HTTP client.
16     It supports GET, POST and other request methods, cookies and more, all
17     on a very low level. It can follow redirects, supports proxies and
18     automatically limits the number of connections to the values specified
19     in the RFC.
20    
21     It should generally be a "good client" that is enough for most HTTP
22     tasks. Simple tasks should be simple, but complex tasks should still be
23     possible as the user retains control over request and response headers.
24    
25     The caller is responsible for authentication management, cookies (if the
26     simplistic implementation in this module doesn't suffice), referer and
27     other high-level protocol details for which this module offers only
28     limited support.
29    
30     METHODS
31     http_get $url, key => value..., $cb->($data, $headers)
32     Executes an HTTP-GET request. See the http_request function for
33 root 1.5 details on additional parameters and the return value.
34 root 1.2
35     http_head $url, key => value..., $cb->($data, $headers)
36     Executes an HTTP-HEAD request. See the http_request function for
37 root 1.5 details on additional parameters and the return value.
38 root 1.2
39     http_post $url, $body, key => value..., $cb->($data, $headers)
40 root 1.4 Executes an HTTP-POST request with a request body of $body. See the
41 root 1.5 http_request function for details on additional parameters and the
42     return value.
43 root 1.2
44     http_request $method => $url, key => value..., $cb->($data, $headers)
45     Executes an HTTP request of type $method (e.g. "GET", "POST"). The
46     URL must be an absolute http or https URL.
47    
48 root 1.5 When called in void context, nothing is returned. In other contexts,
49     "http_request" returns a "cancellation guard" - you have to keep the
50     object alive at least until the callback gets called. If the object
51     gets destroyed before the callback is called, the request will be
52     cancelled.
53    
54 root 1.8 The callback will be called with the response body data as first
55     argument (or "undef" if an error occurred), and a hash-ref with
56     response headers as second argument.
57 root 1.2
58     All the headers in that hash are lowercased. In addition to the
59 root 1.13 response headers, the "pseudo-headers" (capitalised to avoid clashing
60     with possible response headers) "HTTPVersion", "Status" and "Reason"
61     contain the three parts of the HTTP Status-Line of the same name.
62    
63     The pseudo-header "URL" contains the actual URL (which can differ
64     from the requested URL when following redirects - for example, you
65     might get an error that your URL scheme is not supported even though
66     your URL is a valid http URL because it redirected to an ftp URL, in
67     which case you can look at the URL pseudo header).
68    
69     The pseudo-header "Redirect" only exists when the request was a
70     result of an internal redirect. In that case it is an array
71     reference with the "($data, $headers)" from the redirect response.
72     Note that this response could in turn be the result of a redirect
73     itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
74     the original response, and so on.
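
The nesting described above can be unwound with a short loop. The following is a minimal sketch, not part of AnyEvent::HTTP: the helper name redirect_chain is hypothetical, and it assumes that every response in the chain carries its own "URL" pseudo-header.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by AnyEvent::HTTP): returns the URL of
# every response in a redirect chain, final response first.
sub redirect_chain {
    my ($hdr) = @_;
    my @urls;
    while ($hdr) {
        push @urls, $hdr->{URL};
        # "Redirect" holds [$data, $headers] of the previous response, if any
        $hdr = $hdr->{Redirect} ? $hdr->{Redirect}[1] : undef;
    }
    return @urls;
}
```

Inside a completion callback this could be used as follows:

    http_get "http://www.nethype.de/", sub {
       my ($body, $hdr) = @_;
       print join (" <- ", redirect_chain ($hdr)), "\n";
    };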
75 root 1.3
76 root 1.6 If the server sends a header multiple times, then its values
77     will be joined together with a comma (","), as per the HTTP spec.
78 root 1.2
79     If an internal error occurs, such as not being able to resolve a
80     hostname, then $data will be "undef", "$headers->{Status}" will be
81 root 1.8 "59x" (usually 599) and the "Reason" pseudo-header will contain an
82     error message.
83 root 1.2
84     A typical callback might look like this:
85    
86     sub {
87        my ($body, $hdr) = @_;
88    
89        if ($hdr->{Status} =~ /^2/) {
90           # ... everything should be ok
91        } else {
92           print "error, $hdr->{Status} $hdr->{Reason}\n";
93        }
94     }
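
When a callback needs to tell locally generated errors apart from responses sent by the server, the "Status" pseudo-header can be sorted into coarse categories. This is only a sketch: the helper name classify_status and its category names are hypothetical, and it assumes that servers do not themselves send 59x codes.

```perl
use strict;
use warnings;

# Hypothetical helper: map the Status pseudo-header to a coarse category.
sub classify_status {
    my ($hdr) = @_;
    my $status = $hdr->{Status};
    return "ok"        if $status =~ /^2/;
    return "cancelled" if $status == 598;    # on_header/on_body returned false
    return "internal"  if $status =~ /^59/;  # generated locally, e.g. DNS failure
    return "other";                          # any other status
}
```

For "internal" and "cancelled" results, the "Reason" pseudo-header carries the error message rather than a server-supplied reason phrase.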
95    
96     Additional parameters are key-value pairs, and are fully optional.
97     They include:
98    
99     recurse => $count (default: $MAX_RECURSE)
100     Whether to recurse requests or not, e.g. on redirects,
101     authentication retries and so on, and how often to do so.
102    
103     headers => hashref
104     The request headers to use. Currently, "http_request" may
105     provide its own "Host:", "Content-Length:", "Connection:" and
106     "Cookie:" headers and will provide defaults for "User-Agent:"
107 root 1.10 and "Referer:" (this can be suppressed by using "undef" for
108     these headers in which case they won't be sent at all).
109 root 1.2
110     timeout => $seconds
111     The time-out to use for various stages - each connect attempt
112 root 1.11 will reset the timeout, as will read or write activity, i.e.
113     this is not an overall timeout.
114    
115     Default timeout is 5 minutes.
116 root 1.2
117     proxy => [$host, $port[, $scheme]] or undef
118     Use the given http proxy for all requests. If not specified,
119     then the default proxy (as specified by $ENV{http_proxy}) is
120     used.
121    
122 root 1.10 $scheme must be either missing, "http" for HTTP or "https" for
123     HTTPS.
124 root 1.2
125     body => $string
126     The request body, usually empty. Will be sent as-is (future
127     versions of this module might offer more options).
128    
129     cookie_jar => $hash_ref
130     Passing this parameter enables (simplified) cookie-processing,
131     loosely based on the original Netscape specification.
132    
133     The $hash_ref must be an (initially empty) hash reference which
134     will get updated automatically. It is possible to save the
135     cookie_jar to persistent storage with something like JSON or
136 root 1.8 Storable, but this is not recommended, as expiry times are
137 root 1.2 currently being ignored.
138    
139     Note that this cookie implementation is not of very high
140     quality, nor meant to be complete. If you want complete cookie
141     management you have to do that on your own. "cookie_jar" is
142     meant as a quick fix to get some cookie-using sites working.
143     Cookies are a privacy disaster, do not use them unless required
144     to.
145    
146 root 1.8 tls_ctx => $scheme | $tls_ctx
147     Specifies the AnyEvent::TLS context to be used for https
148     connections. This parameter follows the same rules as the
149     "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
150     two strings "low" or "high" can be specified, which give you a
151     predefined low-security (no verification, highest compatibility)
152     and high-security (CA and common-name verification) TLS context.
153    
154     The default for this option is "low", which could be interpreted
155     as "give me the page, no matter what".
156    
157 root 1.11 on_prepare => $callback->($fh)
158     In rare cases you need to "tune" the socket before it is used to
159     connect (for example, to bind it on a given IP address). This
160     parameter overrides the prepare callback passed to
161     "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
162     (e.g. it has to provide a timeout). See the description for the
163     $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
164     details.
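
As an illustration of the binding use case, the following sketch builds an "on_prepare" callback using only the core Socket module. The factory name bind_to is hypothetical, the code handles IPv4 sockets only, and it assumes the callback's return value is used as the connect timeout in seconds, as described for the prepare callback of "AnyEvent::Socket::tcp_connect".

```perl
use strict;
use warnings;
use Socket qw(inet_aton pack_sockaddr_in);

# Hypothetical factory: returns an on_prepare callback that binds the
# not-yet-connected socket to $local_ip (IPv4 only) and then returns the
# connect timeout in seconds.
sub bind_to {
    my ($local_ip, $connect_timeout) = @_;
    my $addr = inet_aton($local_ip)
        or die "cannot parse address $local_ip";
    return sub {
        my ($fh) = @_;
        # port 0 lets the operating system pick the source port
        bind $fh, pack_sockaddr_in(0, $addr)
            or die "bind to $local_ip failed: $!";
        return $connect_timeout;
    };
}
```

It could then be passed along with a request (10.0.0.1 stands in for a local address of your machine):

    http_get "http://www.nethype.de/",
       on_prepare => bind_to ("10.0.0.1", 30),
       sub { print $_[1]{Status}, "\n" };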
165    
166 root 1.8 on_header => $callback->($headers)
167     When specified, this callback will be called with the header
168     hash as soon as headers have been successfully received from the
169     remote server (not on locally-generated errors).
170    
171     It has to return either true (in which case AnyEvent::HTTP will
172     continue), or false, in which case AnyEvent::HTTP will cancel
173     the download (and call the completion callback with an error
174     code of 598).
175    
176     This callback is useful, among other things, to quickly reject
177     unwanted content, which, if it is supposed to be rare, can be
178     faster than first doing a "HEAD" request.
179    
180     Example: cancel the request unless the content-type is
181     "text/html".
182    
183     on_header => sub {
184        $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
185     },
186    
187     on_body => $callback->($partial_body, $headers)
188     When specified, all body data will be passed to this callback
189     instead of to the completion callback. The completion callback
190     will get the empty string instead of the body data.
191    
192     It has to return either true (in which case AnyEvent::HTTP will
193     continue), or false, in which case AnyEvent::HTTP will cancel
194     the download (and call the completion callback with an error
195     code of 598).
196    
197     This callback is useful when the data is too large to be held in
198     memory (so the callback writes it to a file) or when only some
199     information should be extracted, or when the body should be
200     processed incrementally.
201    
202     It is usually preferred over doing your own body handling via
203 root 1.9 "want_body_handle", but in case of streaming APIs, where HTTP is
204     only used to create a connection, "want_body_handle" is the
205     better alternative, as it allows you to install your own event
206     handler, reducing resource usage.
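
The file-writing case mentioned above could look roughly like this. It is a sketch under assumptions: the factory name save_body_to and the size limit are hypothetical additions, not features of AnyEvent::HTTP.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical factory: returns an on_body callback that appends each chunk
# to $path and cancels the download once more than $max_bytes have arrived.
sub save_body_to {
    my ($path, $max_bytes) = @_;
    open my $out, ">:raw", $path
        or die "cannot open $path: $!";
    my $seen = 0;
    return sub {
        my ($chunk, $hdr) = @_;
        $seen += length $chunk;
        return 0 if $seen > $max_bytes;    # false: cancel the download
        print {$out} $chunk or return 0;   # treat a write error as a reason to cancel
        $out->flush;
        return 1;                          # true: keep downloading
    };
}
```

Used as "on_body => save_body_to ("/tmp/page.html", 1_000_000)", the completion callback still runs at the end, with an empty body on success or a status of 598 if the callback returned false.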
207 root 1.8
208     want_body_handle => $enable
209     When enabled (default is disabled), the behaviour of
210     AnyEvent::HTTP changes considerably: after parsing the headers,
211     and instead of downloading the body (if any), the completion
212     callback will be called. Instead of the $body argument
213     containing the body data, the callback will receive the
214     AnyEvent::Handle object associated with the connection. In error
215     cases, "undef" will be passed. When there is no body (e.g.
216     status 304), the empty string will be passed.
217    
218     The handle object might or might not be in TLS mode, might be
219     connected to a proxy, be a persistent connection etc., and
220     configured in unspecified ways. The user is responsible for this
221     handle (it will not be used by this module anymore).
222    
223     This is useful with some push-type services, where, after the
224     initial headers, an interactive protocol is used (typical
225     example would be the push-style Twitter API which starts a
226     JSON/XML stream).
227    
228     If you think you need this, first have a look at "on_body", to
229 root 1.9 see if that doesn't solve your problem in a better way.
230 root 1.8
231 root 1.2 Example: make a simple HTTP GET request for http://www.nethype.de/
232    
233     http_request GET => "http://www.nethype.de/", sub {
234        my ($body, $hdr) = @_;
235        print "$body\n";
236     };
237    
238     Example: make an HTTP HEAD request on https://www.google.com/, use a
239     timeout of 30 seconds.
240    
241     http_request
242        HEAD    => "https://www.google.com",
243        timeout => 30,
244        sub {
245           my ($body, $hdr) = @_;
246           use Data::Dumper;
247           print Dumper $hdr;
248        }
249     ;
250    
251 root 1.5 Example: make another simple HTTP GET request, but immediately try
252     to cancel it.
253    
254     my $request = http_request GET => "http://www.nethype.de/", sub {
255        my ($body, $hdr) = @_;
256        print "$body\n";
257     };
258    
259     undef $request;
260    
261 root 1.13 DNS CACHING
262     AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
263     actual connection, which in turn uses AnyEvent::DNS to resolve
264     hostnames. The latter is a simple stub resolver and does no caching on
265     its own. If you want DNS caching, you currently have to provide your own
266     default resolver (by storing a suitable resolver object in
267     $AnyEvent::DNS::RESOLVER).
268    
269 root 1.2 GLOBAL FUNCTIONS AND VARIABLES
270     AnyEvent::HTTP::set_proxy "proxy-url"
271     Sets the default proxy server to use. The proxy-url must begin with
272 root 1.12 a string of the form "http://host:port" (optionally "https:..."),
273     croaks otherwise.
274    
275     To clear an already-set proxy, use "undef".
276 root 1.2
277     $AnyEvent::HTTP::MAX_RECURSE
278     The default value for the "recurse" request parameter (default: 10).
279    
280     $AnyEvent::HTTP::USERAGENT
281     The default value for the "User-Agent" header (the default is
282 root 1.8 "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
283 root 1.2 +http://software.schmorp.de/pkg/AnyEvent)").
284    
285 root 1.8 $AnyEvent::HTTP::MAX_PER_HOST
286 root 1.10 The maximum number of concurrent connections to the same host
287 root 1.8 (identified by the hostname). If the limit is exceeded, then the
288     additional requests are queued until previous connections are
289     closed.
290 root 1.2
291 root 1.8 The default value for this is 4, and it is highly advisable to not
292     increase it.
293 root 1.2
294     $AnyEvent::HTTP::ACTIVE
295     The number of active connections. This is not the number of
296     currently running requests, but the number of currently open and
297     non-idle TCP connections. This number can be useful for
298     load-leveling.
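
One way to use this counter for load-leveling is to check it before starting further requests. The helper below is a hypothetical sketch, not part of the module. It only reads the variable and treats it as 0 when AnyEvent::HTTP has not set it yet.

```perl
use strict;
use warnings;

# Hypothetical helper: true while fewer than $limit connections are active.
# $AnyEvent::HTTP::ACTIVE is maintained by the module and is only read here.
sub has_capacity {
    my ($limit) = @_;
    no warnings 'once';
    return ($AnyEvent::HTTP::ACTIVE || 0) < $limit;
}
```

A program with its own queue of URLs could start the next request from a completion callback only while "has_capacity (8)" is true, and leave the rest queued.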
299 root 1.1
300     SEE ALSO
301 root 1.2 AnyEvent.
302 root 1.1
303     AUTHOR
304 root 1.3 Marc Lehmann <schmorp@schmorp.de>
305     http://home.schmorp.de/
306 root 1.1
307 root 1.7 With many thanks to Дмитрий Шалашов, who provided
308     countless testcases and bugreports.
309