Python Web Scraping: Video Download
Introduction
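- Module installation (third-party package)
pip install requests
json is part of the Python standard library and does not need to be installed with pip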
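As a quick check that JSON handling needs no extra install, here is a minimal sketch that uses only the standard-library json module. The sample string is made up for illustration and is not an actual response from the site.

```python
import json

# A made-up JSON string for illustration; not a real API response.
sample_text = '{"title": "demo", "play_url": "https://example.com/demo.mp4"}'

# json ships with Python, so this import works without any pip install.
sample = json.loads(sample_text)
print(sample["title"])
```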
- Basic workflow (approach)
Case study
- Open the site to be scraped in a browser: https://haokan.baidu.com/
Pick any category, for example Entertainment: https://haokan.baidu.com/tab/yule
The browser used here is Google Chrome
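- In Chrome, right-click the page and choose Inspect
- Select the Network tab
- Reload the page and select the XHR filter
- Click the videorec?tab… entry; it has two main parts
- Headers
- Preview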
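The Request URL shown in the Headers panel can be split into its path and query parameters, which makes it easier to see what the page is asking for. The sketch below uses the URL that appears later in this article. What each parameter means to the server is not documented here, so the notes in the comments are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL used later in this article
request_url = ('https://haokan.baidu.com/videoui/api/videorec'
               '?tab=dongman&act=pcFeed&pd=pc&num=20&shuaxin_id=1586786048409')

parts = urlsplit(request_url)
# parse_qs returns a list per key; keep the first value of each
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)   # the API path
print(query)        # assumption: tab looks like the category, num the item count
```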
- Create a Python project
- Environment used: PyCharm, Python 3.7.6, Anaconda
Code walkthrough
import requests
base_url = 'https://haokan.baidu.com/videoui/api/videorec?tab=dongman&act=pcFeed&pd=pc&num=20&shuaxin_id=1586786048409'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
'cookie': 'BAIDUID=D77C61722C38FDD3B0BFA8B2A820D953:FG=1; BIDUPSID=D77C61722C38FDD3B0BFA8B2A820D953; PSTM=1585266931; BDUSS=lV0S3J5WUpoQVpjc1dzSTd2WDdFRFVJcWxxSm1zWmYxOXJvR3ZjNUlRd1NDN1plSVFBQUFBJCQAAAAAABAAAAEAAAAx~5v6bWlrZWFwawAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABJ-jl4Sfo5eR; BDORZ=FFFB88E999055A3F8A630C64834BD6D0; ai-studio-ticket=F4CFFDA9FF2746AF92ABE82F307177F75D82A97D5CDB468D986D3E8F239B685A; PC_TAB_LOG=haokan_website_page; Hm_lvt_4aadd610dfd2f5972f1efee2653a2bc5=1586780010,1586781215; Hm_lpvt_4aadd610dfd2f5972f1efee2653a2bc5=1586781215; reptileData=%7B%22data%22%3A%228ae556604f8334e690c6df18585d95fd66da1768f08b6ef4500a1f442661606743ece5594100dd732b5b7051563e865f31ecc62ed625c9baeb91b86afee8f1f79a81b01972873f7ff06a74b8073c635a0615b26b0790e9afa06686141a80a6de2ca66c7d36af97e2183fd9e72e44bd8b21c7bad462e6fc48f4f2422df70d9ed8%22%2C%22key_id%22%3A%2230%22%2C%22sign%22%3A%22612c0e81%22%7D'
}
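# Pass the headers by keyword; the second positional argument of requests.get is params
response = requests.get(base_url, headers=headers)
data = response.json()
print(data)
# The video entries sit under data -> response -> videos in the returned JSON
data_list = data['data']['response']['videos']
for item in data_list:
    video_title = item['title'] + '.mp4'
    video_url = item['play_url']
    print('start download.....:', video_title)
    # Fetch the raw bytes of the video file
    video_data = requests.get(video_url, headers=headers).content
    # The video folder must already exist, otherwise open() fails
    with open('video\\' + video_title, mode='wb') as f:
        f.write(video_data)
    print('download finished ....\n')
print('Download complete')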
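The script above indexes data['data']['response']['videos'] directly, which raises KeyError if the response has a different shape. A slightly more defensive way to pull out the titles and URLs might look like the sketch below. The function name extract_videos and the sample dictionary are hypothetical, and the sample only imitates the keys the script relies on.

```python
def extract_videos(data):
    """Return (title, play_url) pairs from a response-shaped dict."""
    videos = data.get('data', {}).get('response', {}).get('videos', [])
    pairs = []
    for item in videos:
        title = item.get('title')
        play_url = item.get('play_url')
        # Skip entries that lack either field instead of raising KeyError
        if title and play_url:
            pairs.append((title, play_url))
    return pairs


# Made-up sample that imitates the keys used by the script; not real data
sample = {'data': {'response': {'videos': [
    {'title': 'clip one', 'play_url': 'https://example.com/1.mp4'},
    {'title': 'no url'},
]}}}

print(extract_videos(sample))
```

Returning an empty list for an unexpected response lets the download loop simply do nothing instead of crashing.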
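1. base_url: the video site URL that is requested
2. headers: simulates a logged-in browser by sending user-agent and cookie
3. Saving the file: each video is written in binary mode into the video folder, which must already exist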
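For the file-saving step, video titles may contain characters that are not allowed in file names, and the target folder may not exist yet. The following is a minimal sketch under those assumptions. The helper names safe_filename and save_bytes are hypothetical and not part of the original script.

```python
import os
import re


def safe_filename(title, extension='.mp4'):
    # Replace characters that Windows does not allow in file names
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    # Fall back to a placeholder name if nothing usable is left
    return (cleaned or 'untitled') + extension


def save_bytes(directory, title, payload):
    # Create the target folder if it is missing
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, safe_filename(title))
    with open(path, mode='wb') as f:
        f.write(payload)
    return path
```

With these helpers, the write step could become a call such as save_bytes('video', title, video_data), where title is the raw title without the .mp4 suffix and video_data is the downloaded bytes.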
References
- Bilibili: https://www.bilibili.com/video/BV16741127po
- Runoob Python tutorial: https://www.runoob.com/python3/python3-tutorial.html
- Blog: https://lemonhubs.github.io/
- Zhihu: https://www.zhihu.com/people/bi-shi-san-2-81/posts