Python requests: scraping web data and downloading images
Environment setup
PyCharm development environment
Python version: Python 3.7
Anaconda distribution
General approach of a crawler: the main steps
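The crawler in this post follows a common pattern: request a page, parse out the links of interest, then request and save each resource. The parsing step can be tried offline. The following is a minimal sketch that uses only the standard library's html.parser as a stand-in for the parsel XPath query used later in this post. The class ImageSrcParser, the function extract_image_urls, and the sample markup are hypothetical and were made up for illustration; real Tieba pages may be structured differently.

```python
from html.parser import HTMLParser


class ImageSrcParser(HTMLParser):
    """Collect the src of every <img> tag that carries a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("src"):
            self.sources.append(attr_map["src"])


def extract_image_urls(html_text, css_class="BDE_Image"):
    """Return the src values of all matching images, in document order."""
    parser = ImageSrcParser(css_class)
    parser.feed(html_text)
    return parser.sources


# Made-up sample markup (an assumption), not real Tieba output.
SAMPLE_HTML = """
<cc><div>
  <img class="BDE_Image" src="https://example.com/a/1.jpg">
  <img class="emoticon" src="https://example.com/e/smile.png">
  <img class="BDE_Image" src="https://example.com/a/2.jpg">
</div></cc>
"""

print(extract_image_urls(SAMPLE_HTML))
```

Only the two images with the BDE_Image class are kept; the emoticon image is skipped. Unlike the XPath query, this sketch does not check that the image sits inside a cc/div element.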
Install the third-party modules
pip install requests
pip install parsel
Code walkthrough
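import os

import requests
import parsel

# Tieba listing page; the kw parameter is the URL-encoded forum name.
base_url = 'https://tieba.baidu.com/f?kw=%D4%BC%BB%E1'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}

# Make sure the output directory exists before writing into it.
os.makedirs('image', exist_ok=True)

# Request the listing page and extract the relative link of every thread.
response = requests.get(url=base_url, headers=headers)
html_str = response.text
html = parsel.Selector(html_str)
title_url = html.xpath('//div[@class="threadlist_lz clearfix"]/div/a/@href').extract()

second_url = 'https://tieba.baidu.com'
for url in title_url:
    # Thread links are relative, so prepend the site root.
    all_url = second_url + url
    print('Current thread link:', all_url)

    # Request the thread page and collect the image URLs in its posts.
    response_2 = requests.get(all_url, headers=headers).text
    response_2_data = parsel.Selector(response_2)
    result_list = response_2_data.xpath('//cc/div/img[@class="BDE_Image"]/@src').extract()

    for li in result_list:
        # Download the image bytes and name the file after the last URL segment.
        img_data = requests.get(li, headers=headers).content
        file_name = li.split('/')[-1]
        print(file_name)
        with open(os.path.join('image', file_name), 'wb') as f:
            print('Downloading image:', file_name)
            f.write(img_data)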
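The code above builds each thread link by string concatenation and takes the file name from the last '/'-separated segment of the image URL. If an image URL carried a query string, that segment would include it. The following is a minimal sketch of a more defensive variant that uses only the standard library; the helper names absolute_url and file_name_from_url, the default file name, and the example URLs are assumptions made for illustration, not part of the original code.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def absolute_url(base, href):
    """Resolve a relative link such as '/p/123' against the site root."""
    return urljoin(base, href)


def file_name_from_url(url, default="image.jpg"):
    """Take the last path segment of the URL, ignoring any query string."""
    name = posixpath.basename(urlparse(url).path)
    # Fall back to a default when the path ends with '/' or is empty.
    return name or default


# Hypothetical inputs used only to show the behaviour.
print(absolute_url("https://tieba.baidu.com", "/p/123456"))
print(file_name_from_url("https://example.com/pic/abc.jpg?size=large"))
```

In the download loop, these helpers could stand in for second_url + url and li.split('/')[-1] respectively.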
Result