--- AnyEvent-HTTP/README 2008/10/23 02:46:20 1.5
+++ AnyEvent-HTTP/README 2010/06/16 19:17:30 1.13
@@ -51,22 +51,35 @@
         gets destroyed before the callbakc is called, the request will be
         cancelled.
 
-        The callback will be called with the response data as first argument
-        (or "undef" if it wasn't available due to errors), and a hash-ref
-        with response headers as second argument.
+        The callback will be called with the response body data as first
+        argument (or "undef" if an error occured), and a hash-ref with
+        response headers as second argument.
 
         All the headers in that hash are lowercased. In addition to the
-        response headers, the "pseudo-headers" "HTTPVersion", "Status" and
-        "Reason" contain the three parts of the HTTP Status-Line of the same
-        name. The pseudo-header "URL" contains the original URL (which can
-        differ from the requested URL when following redirects).
+        response headers, the "pseudo-headers" (uppercase to avoid clashing
+        with possible response headers) "HTTPVersion", "Status" and "Reason"
+        contain the three parts of the HTTP Status-Line of the same name.
+
+        The pseudo-header "URL" contains the actual URL (which can differ
+        from the requested URL when following redirects - for example, you
+        might get an error that your URL scheme is not supported even though
+        your URL is a valid http URL because it redirected to an ftp URL, in
+        which case you can look at the URL pseudo header).
+
+        The pseudo-header "Redirect" only exists when the request was a
+        result of an internal redirect. In that case it is an array
+        reference with the "($data, $headers)" from the redirect response.
+        Note that this response could in turn be the result of a redirect
+        itself, and "$headers->{Redirect}[1]{Redirect}" will then contain
+        the original response, and so on.
 
-        If the server sends a header multiple lines, then their contents
-        will be joined together with "\x00".
+        If the server sends a header multiple times, then their contents
+        will be joined together with a comma (","), as per the HTTP spec.
 
         If an internal error occurs, such as not being able to resolve a
         hostname, then $data will be "undef", "$headers->{Status}" will be
-        599 and the "Reason" pseudo-header will contain an error message.
+        "59x" (usually 599) and the "Reason" pseudo-header will contain an
+        error message.
 
         A typical callback might look like this:
 
@@ -91,20 +104,23 @@
             The request headers to use. Currently, "http_request" may
             provide its own "Host:", "Content-Length:", "Connection:" and
             "Cookie:" headers and will provide defaults for "User-Agent:"
-            and "Referer:".
+            and "Referer:" (this can be suppressed by using "undef" for
+            these headers in which case they won't be sent at all).
 
         timeout => $seconds
             The time-out to use for various stages - each connect attempt
-            will reset the timeout, as will read or write activity. Default
-            timeout is 5 minutes.
+            will reset the timeout, as will read or write activity, i.e.
+            this is not an overall timeout.
+
+            Default timeout is 5 minutes.
 
         proxy => [$host, $port[, $scheme]] or undef
             Use the given http proxy for all requests. If not specified,
             then the default proxy (as specified by $ENV{http_proxy}) is
             used.
 
-            $scheme must be either missing or "http" for HTTP, or "https"
-            for HTTPS.
+            $scheme must be either missing, "http" for HTTP or "https" for
+            HTTPS.
 
         body => $string
             The request body, usually empty. Will be-sent as-is (future
@@ -117,7 +133,7 @@
             The $hash_ref must be an (initially empty) hash reference which
             will get updated automatically. It is possible to save the
             cookie_jar to persistent storage with something like JSON or
-            Storable, but this is not recommended, as expire times are
+            Storable, but this is not recommended, as expiry times are
             currently being ignored.
 
             Note that this cookie implementation is not of very high
@@ -127,6 +143,91 @@
             Cookies are a privacy disaster, do not use them unless required
             to.
 
+        tls_ctx => $scheme | $tls_ctx
+            Specifies the AnyEvent::TLS context to be used for https
+            connections. This parameter follows the same rules as the
+            "tls_ctx" parameter to AnyEvent::Handle, but additionally, the
+            two strings "low" or "high" can be specified, which give you a
+            predefined low-security (no verification, highest compatibility)
+            and high-security (CA and common-name verification) TLS context.
+
+            The default for this option is "low", which could be interpreted
+            as "give me the page, no matter what".
+
+        on_prepare => $callback->($fh)
+            In rare cases you need to "tune" the socket before it is used to
+            connect (for exmaple, to bind it on a given IP address). This
+            parameter overrides the prepare callback passed to
+            "AnyEvent::Socket::tcp_connect" and behaves exactly the same way
+            (e.g. it has to provide a timeout). See the description for the
+            $prepare_cb argument of "AnyEvent::Socket::tcp_connect" for
+            details.
+
+        on_header => $callback->($headers)
+            When specified, this callback will be called with the header
+            hash as soon as headers have been successfully received from the
+            remote server (not on locally-generated errors).
+
+            It has to return either true (in which case AnyEvent::HTTP will
+            continue), or false, in which case AnyEvent::HTTP will cancel
+            the download (and call the finish callback with an error code of
+            598).
+
+            This callback is useful, among other things, to quickly reject
+            unwanted content, which, if it is supposed to be rare, can be
+            faster than first doing a "HEAD" request.
+
+            Example: cancel the request unless the content-type is
+            "text/html".
+
+               on_header => sub {
+                  $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
+               },
+
+        on_body => $callback->($partial_body, $headers)
+            When specified, all body data will be passed to this callback
+            instead of to the completion callback. The completion callback
+            will get the empty string instead of the body data.
+
+            It has to return either true (in which case AnyEvent::HTTP will
+            continue), or false, in which case AnyEvent::HTTP will cancel
+            the download (and call the completion callback with an error
+            code of 598).
+
+            This callback is useful when the data is too large to be held in
+            memory (so the callback writes it to a file) or when only some
+            information should be extracted, or when the body should be
+            processed incrementally.
+
+            It is usually preferred over doing your own body handling via
+            "want_body_handle", but in case of streaming APIs, where HTTP is
+            only used to create a connection, "want_body_handle" is the
+            better alternative, as it allows you to install your own event
+            handler, reducing resource usage.
+
+        want_body_handle => $enable
+            When enabled (default is disabled), the behaviour of
+            AnyEvent::HTTP changes considerably: after parsing the headers,
+            and instead of downloading the body (if any), the completion
+            callback will be called. Instead of the $body argument
+            containing the body data, the callback will receive the
+            AnyEvent::Handle object associated with the connection. In error
+            cases, "undef" will be passed. When there is no body (e.g.
+            status 304), the empty string will be passed.
+
+            The handle object might or might not be in TLS mode, might be
+            connected to a proxy, be a persistent connection etc., and
+            configured in unspecified ways. The user is responsible for this
+            handle (it will not be used by this module anymore).
+
+            This is useful with some push-type services, where, after the
+            initial headers, an interactive protocol is used (typical
+            example would be the push-style twitter API which starts a
+            JSON/XML stream).
+
+            If you think you need this, first have a look at "on_body", to
+            see if that doesn't solve your problem in a better way.
+
         Example: make a simple HTTP GET request for http://www.nethype.de/
 
            http_request GET => "http://www.nethype.de/", sub {
@@ -157,30 +258,38 @@
 
            undef $request;
 
+  DNS CACHING
+    AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for the
+    actual connection, which in turn uses AnyEvent::DNS to resolve
+    hostnames. The latter is a simple stub resolver and does no caching on
+    its own. If you want DNS caching, you currently have to provide your own
+    default resolver (by storing a suitable resolver object in
+    $AnyEvent::DNS::RESOLVER).
+
   GLOBAL FUNCTIONS AND VARIABLES
     AnyEvent::HTTP::set_proxy "proxy-url"
         Sets the default proxy server to use. The proxy-url must begin with
-        a string of the form "http://host:port" (optionally "https:...").
+        a string of the form "http://host:port" (optionally "https:..."),
+        croaks otherwise.
+
+        To clear an already-set proxy, use "undef".
 
     $AnyEvent::HTTP::MAX_RECURSE
         The default value for the "recurse" request parameter (default: 10).
 
     $AnyEvent::HTTP::USERAGENT
         The default value for the "User-Agent" header (the default is
-        "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION;
+        "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION;
         +http://software.schmorp.de/pkg/AnyEvent)").
 
-    $AnyEvent::HTTP::MAX_PERSISTENT
-        The maximum number of persistent connections to keep open (default:
-        8).
-
-        Not implemented currently.
+    $AnyEvent::HTTP::MAX_PER_HOST
+        The maximum number of concurrent connections to the same host
+        (identified by the hostname). If the limit is exceeded, then the
+        additional requests are queued until previous connections are
+        closed.
 
-    $AnyEvent::HTTP::PERSISTENT_TIMEOUT
-        The maximum time to cache a persistent connection, in seconds
-        (default: 2).
-
-        Not implemented currently.
+        The default value for this is 4, and it is highly advisable to not
+        increase it.
 
     $AnyEvent::HTTP::ACTIVE
         The number of active connections. This is not the number of
@@ -195,3 +304,6 @@
        Marc Lehmann
        http://home.schmorp.de/
 
+    With many thanks to Дмитрий Шалашов, who provided
+    countless testcases and bugreports.
+