How to Get Started Quickly with BeautifulSoup
2020-07-15

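BeautifulSoup is a flexible, convenient library for parsing HTML and XML, and it is commonly used to parse such documents and extract data from them. It works on top of several interchangeable parsers, so its speed depends largely on the parser you choose, and it lets you pull information out of a web page quickly without writing regular expressions.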

1. BeautifulSoup compared with other scraping tools

2. Parsers
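
As a rough illustration of what choosing a parser means in practice, the sketch below feeds the same malformed fragment to three parser names that BeautifulSoup accepts: `html.parser` from the Python standard library, plus `lxml` and `html5lib`, which are separate installs. The helper name `parse_with` and the sample markup are made up for this example, and any parser that is not installed is simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: a stray closing </p> and no closing </a>.
BROKEN = "<a></p>"


def parse_with(parser_name, markup=BROKEN):
    """Return the repaired markup as a string, or None if the parser is missing."""
    try:
        return str(BeautifulSoup(markup, parser_name))
    except FeatureNotFound:
        # Raised by bs4 when the requested parser library is not installed.
        return None


# Hypothetical comparison table: parser name -> repaired markup (or None).
results = {name: parse_with(name) for name in ("html.parser", "lxml", "html5lib")}

for name, repaired in results.items():
    print(f"{name:12} -> {repaired if repaired is not None else 'not installed'}")
```

Each parser repairs broken markup in its own way, so the same input can produce different trees. Naming the parser explicitly keeps results consistent from one machine to the next.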

3. Installation

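(1) pip3 install beautifulsoup4 (the PyPI package is named beautifulsoup4; the lxml parser used below is installed separately with pip3 install lxml)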

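(2) Import the module: from bs4 import BeautifulSoup

(3) Create a BeautifulSoup object

Parameter 1: the text to parse

Parameter 2: the parser to use, usually lxml (always specify it; otherwise BeautifulSoup emits a warning)

(4) Pretty-print the contents of the soup object with soup.prettify()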
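
Steps (2) through (4) can be sketched as follows. To keep the snippet runnable without extra installs, it uses Python's built-in `html.parser` rather than lxml, which is a substitution made for this example. The sample markup and variable names are also illustrative.

```python
from bs4 import BeautifulSoup  # step (2): import the class

# Illustrative input: note the missing </p>, </body> and </html> closing tags.
markup = "<html><body><p class='intro'>Hello, soup"

# Step (3): the first argument is the text to parse, the second is the parser name.
# 'html.parser' ships with Python; swap in 'lxml' if it is installed.
soup = BeautifulSoup(markup, "html.parser")

# Step (4): prettify() returns an indented string in which open tags are closed.
pretty = soup.prettify()
print(pretty)
```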

4. Basic usage

html = """

<html><head><title>The Dormouse's story</title></head>

<body>

<p class="title" name="dromouse"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were

<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,

<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and

<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

and they lived at the bottom of a well.</p>

<p class="story">...</p>

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # pass the parser name: lxml

print(soup.prettify())  # pretty-print the document; missing closing tags are filled in

print(soup.title.string)  # text inside the <title> tag
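
Beyond reading a single tag, the same kind of soup object can be searched. The sketch below works on a simplified copy of the sample document (the first link holds plain text instead of a comment) and again uses the built-in `html.parser` so that it runs without lxml. Both are choices made for this example. It relies only on standard bs4 calls: `find_all`, `find`, dictionary-style attribute access, and `get_text`.

```python
from bs4 import BeautifulSoup

# Simplified copy of the sample document above (illustrative).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns the first matching tag.
title_text = soup.title.string
first_p_classes = soup.p["class"]  # class is multi-valued, so this is a list

# find_all() returns every matching tag; attributes read like dict keys.
links = [(a["id"], a["href"]) for a in soup.find_all("a", class_="sister")]

# find() returns the first match; get_text() gives the text inside it.
lacie = soup.find(id="link2").get_text()

print(title_text)
print(first_p_classes)
print(links)
print(lacie)
```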
