Tutorial: Storing Python Web Scraper Data in a Database

admin Views: 622 2024-09-04

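Storing scraped data in a database with Python involves five steps: establishing a database connection, preparing the SQL INSERT statement, executing the insert, committing the transaction, and closing the connection.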

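Introduction

Storing scraper output means saving the scraped data to a database such as MySQL or MongoDB. This step is essential for downstream tasks such as data analysis, machine learning, and data visualization. This tutorial walks step by step through saving scraped data to a database with Python.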

Database Setup

Create a database (for example, in MySQL) and a table in it to hold the scraped data.
Make sure the database server is running.
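
As a minimal sketch of the table-creation step, the helper below issues a CREATE TABLE IF NOT EXISTS statement through any DB-API cursor, such as the one returned by mysql.connector. The table name my_table and the title and content columns mirror the example at the end of this tutorial; the column types, the id column, and the helper name create_table are assumptions, not something the tutorial specifies.

```python
# Hypothetical schema: the column types and the id column are assumptions.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS my_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT
)
"""


def create_table(cursor):
    """Run the CREATE TABLE statement on an open DB-API cursor."""
    cursor.execute(CREATE_TABLE_SQL)
```

Because of IF NOT EXISTS, calling create_table on every run of the scraper should be harmless: the statement does nothing when the table is already there.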

Python Scraper Setup

Install the Python scraping libraries (for example, BeautifulSoup and Requests).
Write the scraper code that fetches the data you need.

Storing the Data

1. Establish a database connection

import mysql.connector as mysql

db = mysql.connect(
    host="localhost",
    user="root",
    password="rootpassword",  # replace with your password
    database="my_database",
)
cursor = db.cursor()
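
Hard-coding the password is fine for a local test but awkward anywhere else. One option, sketched here, is to read the connection settings from environment variables and pass them to mysql.connect as keyword arguments. The variable names (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) and the helper load_db_config are hypothetical; the defaults simply repeat the values used above.

```python
import os


def load_db_config(env=None):
    """Build mysql.connect keyword arguments from environment variables.

    The DB_* variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "localhost"),
        "user": env.get("DB_USER", "root"),
        "password": env["DB_PASSWORD"],  # no default: raises KeyError if unset
        "database": env.get("DB_NAME", "my_database"),
    }
```

The connection would then be opened with db = mysql.connect(**load_db_config()).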

2. Prepare the SQL INSERT statement

sql = "INSERT INTO my_table (field1, field2, field3) VALUES (%s, %s, %s)"
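
The %s markers are placeholders that the connector fills in from the values passed to execute, so scraped text never has to be pasted into the SQL string by hand. As a sketch, a statement like the one above can be generated from a table name and a column list. build_insert_sql is a hypothetical helper, and the table and column names must come from trusted code, because identifiers cannot be passed as parameters.

```python
def build_insert_sql(table, columns):
    """Return a parameterized INSERT statement with one %s per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


sql = build_insert_sql("my_table", ["field1", "field2", "field3"])
print(sql)
# prints: INSERT INTO my_table (field1, field2, field3) VALUES (%s, %s, %s)
```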

3. Execute the insert

data = ("value1", "value2", "value3")
cursor.execute(sql, data)

4. Commit the transaction

db.commit()
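
With the connector's default settings (autocommit off) and a transactional storage engine such as InnoDB, inserted rows become permanent only when commit is called, and a failed batch can be discarded with rollback. A minimal sketch, assuming conn is an open DB-API connection such as the db object above; insert_rows is a hypothetical helper name.

```python
def insert_rows(conn, sql, rows):
    """Insert all rows in one transaction; undo everything on failure."""
    cursor = conn.cursor()
    try:
        cursor.executemany(sql, rows)
        conn.commit()
    except Exception:
        conn.rollback()  # discard the partially inserted batch
        raise            # let the caller see the original error
    finally:
        cursor.close()   # runs on both success and failure
    return len(rows)
```

Re-raising after the rollback keeps the failure visible to the caller, so a broken batch is not silently dropped.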

5. Close the connection

cursor.close()
db.close()
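
If an exception is raised between connecting and closing, the two close calls above are never reached. One way to make the cleanup unconditional, sketched here, is contextlib.closing from the standard library, which calls close() on exit whatever happens. run_with_connection, connect, and work are hypothetical names: connect stands for any zero-argument callable that returns a connection (for example, a lambda wrapping the mysql.connect call from step 1), and work is a function that receives the cursor.

```python
from contextlib import closing


def run_with_connection(connect, work):
    """Open a connection, run work(cursor), commit, and always clean up."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cursor:
            result = work(cursor)
            conn.commit()  # reached only if work() did not raise
            return result
```

The cursor is closed first and the connection second, matching the order used in step 5.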

Example

The following example uses Requests and BeautifulSoup to scrape a web page and store the extracted data in a MySQL database:

import requests
from bs4 import BeautifulSoup
import mysql.connector as mysql

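# Fetch the page
url = "https://example.com"  # Requests needs a full URL, including the scheme
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, "html.parser")

# Extract the data and prepare the SQL INSERT statement
sql = "INSERT INTO my_table (title, content) VALUES (%s, %s)"
data = []
for article in soup.find_all("article"):
    heading = article.find("h1")
    paragraph = article.find("p")
    if heading is None or paragraph is None:
        continue  # skip articles that lack a title or a body paragraph
    data.append((heading.get_text(strip=True), paragraph.get_text(strip=True)))
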
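# Connect to the database and run the insert
db = mysql.connect(
    host="localhost",
    user="root",
    password="rootpassword",  # replace with your password
    database="my_database",
)  # same parameters as in step 1
cursor = db.cursor()
cursor.executemany(sql, data)
db.commit()

# Close the connection
cursor.close()
db.close()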
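
For larger crawls, sending every row in a single executemany call may be more than you want in one statement. A sketch of splitting the rows into fixed-size batches follows; chunked and insert_in_batches are hypothetical helpers, and the default batch size of 500 is an arbitrary assumption rather than a recommendation.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(cursor, sql, rows, batch_size=500):
    """Call executemany once per batch and return the number of batches."""
    batches = 0
    for batch in chunked(rows, batch_size):
        cursor.executemany(sql, batch)
        batches += 1
    return batches
```

The caller still commits afterwards, for example insert_in_batches(cursor, sql, data) followed by db.commit().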
