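Here is the code straight away; I use it to back up the images my posts load from the Medium CDN.
It parses the exported posts with bs4, then downloads each image with requests.
urlretrieve still exists in Python 3 (as urllib.request.urlretrieve), but it did not work for me here, apparently the HTTP 403 problem described in the references, so the files are downloaded with requests instead.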
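For anyone who prefers to stay in the standard library: one common cause of a 403 from urlretrieve is a server rejecting the default Python-urllib User-Agent. The following is a minimal sketch under that assumption, not verified against the Medium CDN. The function name fetch and the User-Agent string are made up for illustration. The script further down sticks with requests.

```python
import shutil
from urllib.request import Request, urlopen


def fetch(url, dest, user_agent="Mozilla/5.0", timeout=30):
    """Hypothetical helper: download url to dest with an explicit User-Agent."""
    # Assumption: the server only objects to the default "Python-urllib/3.x" agent.
    req = Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk in chunks
    return dest
```

If the server still answers 403 with a browser-like User-Agent, the cause is something else and this sketch will not help.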
import os
import random
import time

import requests
from bs4 import BeautifulSoup as bs

out_folder = "D:/BlogBackup"
mypath = "D:/BlogBackup/medium-export/posts"   # exported Medium posts (HTML)
img_folder = os.path.join(out_folder, "MeduimImg")
os.makedirs(img_folder, exist_ok=True)

onlyfiles = [f for f in os.listdir(mypath) if os.path.isfile(os.path.join(mypath, f))]

for i in onlyfiles:
    # Read one exported post and parse its HTML.
    with open(os.path.join(mypath, i), "r", encoding="utf-8") as f:
        soup = bs(f.read(), "html.parser")
    for image in soup.find_all("img"):
        url = image.get("src")
        if not url:
            continue  # skip <img> tags without a src attribute
        filename = url.split("/")[-1]
        print(filename)
        r = requests.get(url)
        # "*" is not allowed in Windows file names, so swap it for "$".
        with open(os.path.join(img_folder, filename.replace("*", "$")), "wb") as outfile:
            outfile.write(r.content)
        # Wait a random 0-10 seconds between downloads to go easy on the CDN.
        time.sleep(random.random() * 10)
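The script takes whatever follows the last "/" as the file name and only swaps "*" for "$". A slightly more defensive sketch follows, assuming some image URLs may carry a query string or other characters that Windows rejects. safe_filename and its "unnamed" fallback are hypothetical and not part of the script above.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

# Characters Windows does not allow in file names.
_INVALID = re.compile(r'[\\/:*?"<>|]')


def safe_filename(url, replacement="$"):
    """Hypothetical helper: derive a Windows-safe file name from an image URL."""
    path = urlsplit(url).path                  # drops "?query" and "#fragment"
    name = unquote(posixpath.basename(path))   # last path segment, %XX decoded
    name = _INVALID.sub(replacement, name)
    return name or "unnamed"                   # e.g. when the URL ends with "/"
```

In the loop it would replace the split/replace pair, for example filename = safe_filename(url).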
References:
https://stackoverflow.com/questions/257409/download-image-file-from-the-html-page-source-using-python
https://stackoverflow.com/questions/34957748/http-error-403-forbidden-with-urlretrieve
https://blog.csdn.net/fengzhizi76506/article/details/59229846
Max Chen (159)