How to Get Started Quickly with BeautifulSoup (CDA Data Analyst official site)

2020-07-15

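BeautifulSoup is a flexible and convenient HTML/XML parsing library, commonly used to parse HTML/XML documents and extract data from them. It supports several parsers, so its speed depends largely on the parser chosen (lxml is the usual choice when speed matters), and it lets you extract information from web pages quickly without writing regular expressions.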
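As a minimal sketch of extraction without regular expressions, the snippet below collects the text and URL of every link in a small page fragment. The fragment and the variable names are made up for illustration, and the standard-library "html.parser" is assumed here so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this illustration.
page = """
<ul>
  <li><a href="/news/1">First headline</a></li>
  <li><a href="/news/2">Second headline</a></li>
  <li><span>No link here</span></li>
</ul>
"""

# Assumption: the built-in "html.parser" is used instead of lxml.
soup = BeautifulSoup(page, "html.parser")

# Collect (text, href) pairs for every <a> tag that has an href attribute.
headlines = [
    (a.get_text(strip=True), a["href"])
    for a in soup.find_all("a", href=True)
]

for text, href in headlines:
    print(text, href)
```

The same lookup with a regular expression would have to account for attribute order, quoting, and whitespace; find_all works on the parsed tree, so none of that matters.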

1. BeautifulSoup compared with other scraping tools:

2. Parser libraries

3. Installation

(1) Install the package: pip3 install beautifulsoup4 (the lxml parser used below is a separate package: pip3 install lxml)

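(2) Import the module: from bs4 import BeautifulSoup

(3) Create a BeautifulSoup object

Argument 1: the text content to parse

Argument 2: the parser to use, usually lxml (always specify it; otherwise BeautifulSoup issues a warning)

(4) Pretty-print the contents of the soup object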
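Steps (3) and (4) above can be sketched roughly as follows. The fragment is a made-up example with unclosed tags, and "html.parser" from the standard library is assumed in place of lxml so the sketch runs without extra installs.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with unclosed <p> and <b> tags.
fragment = "<p>Hello <b>world"

# Argument 1: the text to parse. Argument 2: the parser name.
# Assumption: "html.parser" (standard library) stands in for lxml here.
soup = BeautifulSoup(fragment, "html.parser")

# prettify() returns the parse tree as an indented string,
# with the missing closing tags filled in.
pretty = soup.prettify()
print(pretty)
```

Passing the parser name explicitly also keeps the result consistent across machines, since different parsers can repair broken markup in different ways.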

4. Basic usage

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html, 'lxml')  # pass in the parser: lxml
print(soup.prettify())  # pretty-print the markup; missing closing tags are filled in automatically
print(soup.title.string)  # get the text inside the <title> tag
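Beyond the title, the same soup object can be queried for attributes and for specific tags. The following is a minimal sketch under a few assumptions: it repeats a shortened copy of the sample document so it runs on its own, it uses "html.parser" rather than lxml, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, so this sketch is self-contained.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

# Assumption: the built-in "html.parser" is used so lxml is not required.
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p                    # the first <p> tag in the document
p_classes = first_p["class"]        # class is multi-valued, so this is a list
p_name = first_p["name"]            # other attributes come back as plain strings

links = soup.find_all("a")          # every <a> tag, in document order
hrefs = [a["href"] for a in links]  # the URL of each link

tillie = soup.find(id="link3")      # look up a single tag by its id attribute
print(p_classes, p_name)
print(hrefs)
print(tillie.get_text())
```

Attribute access such as soup.p returns only the first matching tag; find_all is the call to use when every match is needed.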
