
Python crawler examples (detailed Python crawler cases)

2024-03-12 09:03:36

Python crawler example
Key points:

1. Fetch the page source with the Requests library


import requests
url = "http://www.example.com/"
# Without a timeout, requests.get can wait on an unresponsive server indefinitely
response = requests.get(url, timeout=10)
html = response.text
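The fetch step above can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the function name `fetch_html`, the `HEADERS` value, and the 10-second default timeout are all assumptions chosen for illustration.

```python
import requests

# Hypothetical User-Agent value; some sites reject the library's default one.
HEADERS = {"User-Agent": "example-crawler/0.1"}


def fetch_html(url, timeout=10):
    """Return the decoded body of url, raising requests.HTTPError on 4xx/5xx."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    # For text/* responses with no declared charset, requests falls back to
    # ISO-8859-1; apparent_encoding guesses the charset from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

Calling `fetch_html("http://www.example.com/")` would then return the page as a string. The encoding check is mainly useful for pages that contain non-Latin text but do not declare a charset.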

2. Parse the HTML with the BeautifulSoup library


from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

3. Find and extract the data you need


titles = soup.find_all("h1")
for title in titles:
    print(title.text)
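Headings are only one kind of data. As a further illustration, the sketch below collects link text and absolute URLs from an inline HTML sample, so it runs without a network request. `SAMPLE_HTML` and `extract_links` are hypothetical names introduced here, not part of the original example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up sample page used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1>First title</h1>
  <a href="/page1">Page one</a>
  <a href="http://www.example.com/page2">Page two</a>
  <a>No href here</a>
</body></html>
"""


def extract_links(html, base_url):
    """Return (text, absolute_url) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors without an href attribute
    for a in soup.find_all("a", href=True):
        # urljoin resolves relative paths such as "/page1" against base_url
        links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return links


links = extract_links(SAMPLE_HTML, "http://www.example.com/")
for text, href in links:
    print(text, href)
```

Resolving relative links with `urljoin` matters when the extracted URLs are later fetched or stored, since a bare `/page1` is not usable on its own.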

4. Process and store the data


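# Write to a CSV file
import csv
# An explicit encoding keeps non-ASCII titles intact regardless of platform defaults
with open("data.csv", "w", newline="", encoding="utf-8") as csvfile: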
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Title", "URL"])
    for title in titles:
        csvwriter.writerow([title.text, url])
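When each record has named fields, `csv.DictWriter` can be easier to maintain than positional rows. The sketch below is one possible variant, assuming the scraped records are dictionaries with `Title` and `URL` keys; it writes to an in-memory buffer instead of a file so it can be run anywhere. `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records; the comma in the first title forces the writer to quote it.
rows = [
    {"Title": "Hello, world", "URL": "http://www.example.com/"},
    {"Title": "Café", "URL": "http://www.example.com/"},
]
csv_text = rows_to_csv(rows, ["Title", "URL"])
print(csv_text)
```

To write to disk instead, the same `DictWriter` calls work on a file opened with `newline=""` and `encoding="utf-8"`.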

5. Handle errors and response status codes


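try:
    response = requests.get(url, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
except requests.RequestException as e:
    # RequestException also covers connection errors and timeouts
    print(e)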
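Transient failures such as timeouts are often worth retrying rather than only printing. The following is a minimal sketch of retrying with exponential backoff, not something the original example does; the helper name `retry` and its parameters are assumptions. It is written against a generic callable so that it does not depend on any particular HTTP library.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Call func() until it succeeds or attempts run out.

    Sleeps `delay` seconds after a failure and multiplies the delay by
    `backoff` each time. Only exception types listed in `exceptions` are retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay)
            delay *= backoff
```

With Requests, one way to use it would be `retry(lambda: requests.get(url, timeout=10), exceptions=(requests.RequestException,))`, so that unrelated errors such as a `TypeError` in your own code are not retried.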

6. Use multithreading or a concurrency library to improve speed


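# Multithreading with ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor
def get_html(url):
    response = requests.get(url, timeout=10)
    html = response.text
    return html
# Placeholders: replace with real URLs before running
urls = ["url1", "url2", "url3"]
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns an iterator; list() collects the pages in the same order as urls
    results = list(executor.map(get_html, urls))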
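With `executor.map`, the first failed request raises when its result is read, and the remaining results are lost. A possible alternative, sketched below under the assumption that you want per-URL error reporting, uses `submit` and `as_completed`. `fetch_all` is a hypothetical helper, and the fetch function is passed in as a parameter so the sketch can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL in a thread pool.

    Returns (results, errors): two dicts keyed by URL, so one failing
    URL does not discard the pages that were fetched successfully.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed yields each future as soon as it finishes, in any order
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

In a crawler this might be called as `fetch_all(urls, get_html)`. Threads help here because the work is dominated by waiting on the network, not by CPU time.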