
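The goal is to download the hot images from Duitang (堆糖网). Opening the page and scrolling down shows that the images are loaded via Ajax. Press F12 to open the developer tools, select the Network tab and filter by XHR, then keep scrolling the page to find the API that the Ajax requests call, as shown in the figure below.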

 

[Figure: the Ajax (XHR) request to the hot-images API, as shown in the developer tools Network panel]

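With the API identified, we can construct requests to fetch the JSON data that contains the image URLs. For I/O-bound tasks such as network requests, running the downloads in a process pool can speed things up.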
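To make the request construction concrete, here is a minimal sketch of how the paging parameters and the JSON parsing fit together. The helper names `build_params` and `extract_covers` and the sample payload are hypothetical, invented for illustration. The payload only mirrors the fields that the full script reads (`data.object_list[].album.name` and `album.covers`); the real response may well contain more fields or differ in detail.

```python
PAGE_SIZE = 24  # items requested per call, matching 'limit' in the script


def build_params(page):
    """Return the query parameters for a zero-based page (hypothetical helper)."""
    return {
        'include_fields': 'top_comments,is_root,source_link,item,buyable,'
                          'root_id,status,like_count,sender,album',
        'limit': str(PAGE_SIZE),
        'start': PAGE_SIZE * page,  # offset of the first item on this page
    }


def extract_covers(json_data):
    """Collect (album name, first cover URL) pairs from one page of JSON."""
    pairs = []
    for item in json_data['data']['object_list']:
        album = item['album']
        pairs.append((album['name'], album['covers'][0]))
    return pairs


# Invented sample payload for illustration; not real data from the site.
sample = {
    'data': {
        'object_list': [
            {'album': {'name': 'cats', 'covers': ['https://example.com/a.jpg']}},
            {'album': {'name': 'dogs', 'covers': ['https://example.com/b.png']}},
        ]
    }
}

print(build_params(2)['start'])
print(extract_covers(sample))
```

Keeping the parsing in a small pure function like this makes it possible to check the field access against a saved response without hitting the network.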

The complete code is as follows:

import os
import re
from multiprocessing import Pool

import requests
from requests import exceptions


def get_pic_info():
    """Yield a dict with the file name and URL of each hot image."""
    url = 'https://www.duitang.com/napi/index/hot/?'
    for i in range(1000):
        params = {
            'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
            'limit': '24',
            'start': 24 * i,  # offset of the first item on page i
        }
        response = requests.get(url, params=params)
        json_data = response.json()
        pic_list = json_data['data']['object_list']
        for pic_ in pic_list:
            image = {}
            pic_info = pic_['album']
            pic_url = pic_info['covers'][0]
            # Strip characters that are invalid or awkward in file names,
            # then append the extension taken from the image URL.
            image['pic_name'] = re.sub(r'[\\/:*?"<>|\r\n。,.? ]+', '', pic_info['name']) + '.' + pic_url.split('.')[-1]
            image['pic_url'] = pic_url
            yield image


def download_pic(image):
    """Download one image into ./img unless the file already exists."""
    path = f'./img/{image["pic_name"]}'
    if not os.path.exists(path):
        try:
            resp = requests.get(image['pic_url'])
            if resp.status_code == 200:
                with open(path, 'wb') as f:
                    f.write(resp.content)
        except exceptions.RequestException:
            # `exceptions` is a module and cannot be caught directly;
            # RequestException is the base class of the errors requests raises.
            return None
    else:
        print(image['pic_name'] + ' has already been downloaded')


if __name__ == '__main__':
    if not os.path.exists('./img'):
        os.mkdir('./img')
    pool = Pool()
    # Note: Pool.map consumes the whole generator before dispatching work.
    pool.map(download_pic, get_pic_info())
    pool.close()
    pool.join()
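The script above uses a process pool. Because downloads spend most of their time waiting on the network, a thread pool is a commonly used alternative for this kind of work. The sketch below is a swapped-in variation and not the author's method. It uses `multiprocessing.pool.ThreadPool` from the standard library, and `fake_download` and `download_all` are hypothetical names. The download is simulated with `time.sleep` so the example runs without network access.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_download(name):
    """Stand-in for a network request: wait briefly, then return the name."""
    time.sleep(0.1)  # simulates time spent waiting on the network
    return name


def download_all(names, workers=8):
    """Run fake_download over names in a thread pool, preserving input order."""
    with ThreadPool(workers) as pool:
        return pool.map(fake_download, names)


if __name__ == '__main__':
    names = [f'pic_{i}.jpg' for i in range(16)]
    started = time.perf_counter()
    results = download_all(names)
    elapsed = time.perf_counter() - started
    print(len(results), f'{elapsed:.2f}s')
```

In a real script, `fake_download` would be replaced by a function like `download_pic`. Threads share memory, so the task arguments do not need to be pickled and sent to worker processes.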