HTTPX: a next-generation HTTP client that builds on requests and rivals it

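HTTPX is a next-generation HTTP client for Python. It offers a broadly requests-compatible API and adds an async API and HTTP/2 support. According to the official site, its main features are:

A standard synchronous interface, with async support
HTTP/1.1 and HTTP/2
The ability to make requests directly to WSGI/ASGI applications
Strict timeouts
Full type annotations
100% test coverage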

Quick start

>>> import httpx
>>> r = httpx.get('https://github.com')
>>> r
<Response [200 OK]>
>>> r.status_code
200
>>> r.text
'<!DOCTYPE html>\n<html lang="en"  class="html-fluid"> ...'

Or use the async API (top-level await needs an async-capable REPL such as IPython or python -m asyncio):

>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://github.com')
...
>>> r
<Response [200 OK]>

And with HTTP/2:

>>> client = httpx.Client(http2=True)
>>> r = client.get('https://github.com')
>>> r
<Response [200 OK]>
>>> r.http_version
'HTTP/2'

Installation

Install with pip:

pip install httpx

[Optional] HTTP/2 support:

pip install httpx[http2]

[Optional] Brotli decoder support:

pip install httpx[brotli]

Basic usage

Making requests

>>> httpx.get('*')
>>> httpx.post('*')
>>> httpx.put('*')
>>> httpx.delete('*')
>>> httpx.head('*')
>>> httpx.options('*')

Passing parameters

# query parameters (GET)
httpx.get(url, params={'key1': 'value1', 'key2': ['1', '2']})

# form data (POST)
httpx.post(url, data={'username': '123'})

# JSON body
httpx.post(url, json={'query': 'hello'})

# file upload
httpx.post(url, files={'file': open('report.xls', 'rb')})

# headers
httpx.get(url, headers={'User-agent': 'baiduspider'})

# cookies
httpx.get(url, cookies={'sessionid': '***'})

Responses

>>> r = httpx.get('https://github.com')

>>> r.url
URL('https://github.com')

>>> r.encoding
'utf-8'

>>> r.status_code
200

# text content
>>> r.text

# binary content
>>> r.content

# JSON content
>>> r.json()

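Redirects

In HTTPX 0.18, the version used in this article's examples, redirects are followed by default for all HTTP methods. Since HTTPX 0.20, redirects are no longer followed by default and have to be enabled with follow_redirects=True.

The history attribute of a response shows the redirects that led to it. It holds the list of redirect responses, in the order they were received.

For example, GitHub automatically redirects http requests to https: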

>>> r = httpx.get('http://github.com')
>>> r.url
URL('https://github.com/')
>>> r.status_code
200
>>> r.history
[<Response [301 Moved Permanently]>]

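The allow_redirects parameter disables the default redirect handling (from HTTPX 0.20 on, the parameter is named follow_redirects):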

>>> r = httpx.get('http://github.com', allow_redirects=False)
>>> r.url
URL('http://github.com/')
>>> r.status_code
301
>>> r.history
[]
>>> r.next_request
<Request('GET', 'https://github.com/')>

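Timeouts

HTTPX lets you control timeouts at a finer granularity, with four separate types: connect, read, write, and pool.

connect: the maximum time to wait for a socket connection to be established; raises ConnectTimeout when exceeded
read: the maximum time to wait for a chunk of data to be received; raises ReadTimeout when exceeded
write: the maximum time to wait for a chunk of data to be sent; raises WriteTimeout when exceeded
pool: the maximum time to wait to acquire a connection from the connection pool; raises PoolTimeout when exceeded

# 60-second connect timeout, 10 seconds for everything else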
timeout = httpx.Timeout(10, connect=60)
r = httpx.get(url, timeout=timeout)

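Advanced usage

Client

When you make requests through top-level functions such as httpx.get, HTTPX has to establish a new connection for every request (connections are not reused). This quickly becomes inefficient as the number of requests grows.

A Client uses HTTP connection pooling: when you make several requests to the same host, the client reuses the underlying TCP connection. This can bring significant performance improvements, including:

Lower request latency (no repeated handshake)
Reduced CPU usage and fewer round trips
Reduced network congestion

A Client also gives you:

Cookie persistence across requests
Configuration applied to all requests
HTTP proxy support
HTTP/2 support

If you are coming from requests, httpx.Client is what you use in place of requests.Session.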

Usage

(1) As a context manager

with httpx.Client() as client:
    ...

(2) Closing the client explicitly

client = httpx.Client()
try:
    ...
finally:
    client.close()

Shared configuration

A Client accepts parameters that are then applied to every request it sends:

headers = {'user-agent': 'httpx/0.18.1'}
with httpx.Client(headers=headers) as client:
    r = client.get(url)

Merging configuration

When a parameter is set on both the client and the request, one of two things happens:

For headers, params, and cookies, the values are combined:

>>> headers = {'user-agent': 'httpx/0.18.1'}
>>> with httpx.Client(headers=headers) as client:
...     r = client.get(url, headers={'Content-Type': 'application/json'})
...     print(r.request.headers)
...
Headers({..., 'user-agent': 'httpx/0.18.1', 'content-type': 'application/json'})

For all other parameters, the request-level value takes precedence.

Event hooks

HTTPX lets you register hook functions on a client. They are called automatically whenever an event of the given type occurs.

There are currently two event types:

request: called when a request is about to be sent
response: called once the response has come back

Event hooks are read-only.

>>> def log_request(request):
...     print(f'Request event hook: {request.method} - {request.url} sending..')
...
>>> def log_response(response):
...     request = response.request
...     print(f"Response event hook: {request.method} {request.url} - Status {response.status_code}")
...
>>> hooks = {'request': [log_request], 'response': [log_response]}
>>> with httpx.Client(event_hooks=hooks) as client:
...     r = client.get('https://github.com')
...     print(r.status_code)
...
Request event hook: GET - https://github.com sending..
Response event hook: GET https://github.com - Status 200
200

Each entry in event_hooks is a list, so you can register several hook functions for each event type.

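HTTP proxies

The proxies argument shown here matches HTTPX 0.18; recent releases deprecate it in favor of proxy and mounts.

# route all requests through a proxy
httpx.Client(proxies="http://localhost:8030")

# choose a proxy per scheme
proxies = {
    "http://": "http://localhost:8030",
    "https://": "http://localhost:8031",
}

# more specific routing
proxies = {
    # proxy all requests to port 1234
    "all://*:1234": "http://localhost:8030",
    # proxy all requests to subdomains of example.com
    "all://*.example.com": "http://localhost:8030",
    # do not proxy plain http requests
    "http://*": None
}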

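Streaming

When requesting a large file, there is no need to read it into memory all at once. You can stream it instead, reading data as it arrives until the whole body has been consumed.

You can stream the binary content of the response:

>>> with httpx.stream("GET", "https://www.example.com") as r:
...     for data in r.iter_bytes():
...         print(data)

When streaming, response.content and response.text are not available until the body has been read. You can still load the body conditionally inside the stream block:

>>> with httpx.stream("GET", "https://www.example.com") as r:
...     if int(r.headers['Content-Length']) < TOO_LONG:
...         r.read()
...         print(r.text)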