01-regex 练习「音乐抓取」
原创2022年8月8日大约 2 分钟...约 464 字
1. 需求
- 抓取目标网站:https://bornforthis.cn/web_runing/crawler/regex/index.html
- 技术限制:
- requests
- re
- Python 基础语法
- 抓取目标音乐
- 存储制定路径:/data/music/
2. 导入所需库
import requests
import re
3. 编写代码
import requests
import re
from requests.exceptions import RequestException
from urllib.parse import urljoin
BASE = "https://bornforthis.cn/web_runing/crawler/regex/"
def requests_fun(url, binary=False):
try:
response = requests.get(url)
if response.status_code == 200:
if binary:
return response.content
else:
return response.text
return None
except RequestException as e:
return None
def save_music(path, binary):
with open(path, "wb") as f:
f.write(binary)
def parse(html):
pattern = '<a.*?href="(.*?)".*?</a>'
result = re.findall(pattern, html)
return result
def joint(url_lst):
url_list = []
for url in url_lst:
url = urljoin(BASE, url)
url_list.append(url)
return url_list
def postfix(url):
music_name = url.split("/")[-1]
return music_name
def main():
url = "https://bornforthis.cn/web_runing/crawler/regex/index.html"
html = requests_fun(url)
# url_lst = parse(html)
url_list = joint(parse(html))
# print(url_list)
for url in url_list:
# print(url)
binary_content = requests_fun(url, binary=True)
# print(binary_content)
save_music(f"data/music/{postfix(url)}", binary_content)
if __name__ == '__main__':
main()
欢迎关注我公众号:AI悦创,有更多更好玩的等你发现!
公众号:AI悦创【二维码】
AI悦创·编程一对一
AI悦创·推出辅导班啦,包括「Python 语言辅导班、C++ 辅导班、java 辅导班、算法/数据结构辅导班、少儿编程、pygame 游戏开发」,全部都是一对一教学:一对一辅导 + 一对一答疑 + 布置作业 + 项目实践等。当然,还有线下线上摄影课程、Photoshop、Premiere 一对一教学、QQ、微信在线,随时响应!微信:Jiabcdefh
C++ 信息奥赛题解,长期更新!长期招收一对一中小学信息奥赛集训,莆田、厦门地区有机会线下上门,其他地区线上。微信:Jiabcdefh
方法一:QQ
方法二:微信:Jiabcdefh
你认为这篇文章怎么样?
- 0
- 0
- 0
- 0
- 0
- 0