Regular Expressions in R
2017-02-16


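Regular expressions are typically used to search for, and replace, text that fits a certain pattern (rule). As I see it, they have two main uses: (1) finding specific information, and (2) finding and editing specific information, which is the replacing we do all the time. Pressing Ctrl+F in Word or Notepad to find or replace a particular string is the everyday version of this, although those tools search for literal text by default, whereas a regular expression describes a whole family of strings.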

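Regular expressions are very powerful, especially for processing text data. In R, the functions grep, grepl, sub, gsub, regexpr, gregexpr and regexec all match according to regular expression rules. Their prototypes are: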

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)

gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)

regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

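Here is what the arguments mean.

pattern: the regular expression to search for (or, with fixed = TRUE, a literal string).
x, text: the character vector to search.
replacement: the replacement string used by sub and gsub.
ignore.case: if TRUE, matching ignores letter case.
perl: if TRUE, Perl-compatible regular expressions are used.
value: grep only; if TRUE, the matching elements are returned instead of their indices.
fixed: if TRUE, pattern is treated as a literal string rather than a regular expression.
useBytes: if TRUE, matching is done byte by byte rather than character by character.
invert: grep only; if TRUE, the elements that do not match are returned.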
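To make the arguments a little more concrete, here is a small sketch that toggles a few of them one at a time. The vector x and the variable names are made up purely for illustration; only the base R functions themselves come from the prototypes above.

```r
# Hypothetical sample vector, used only for illustration.
x <- c("Apple pie", "apple.tart", "banana", "a+b")

# ignore.case = TRUE: letter case no longer matters.
idx <- grep("apple", x, ignore.case = TRUE)                      # 1 2

# value = TRUE: return the matching elements instead of their indices.
vals <- grep("apple", x, value = TRUE)                           # "apple.tart"

# invert = TRUE: return the elements that do NOT match.
rest <- grep("apple", x, ignore.case = TRUE, invert = TRUE,
             value = TRUE)                                       # "banana" "a+b"

# fixed = TRUE: the pattern is a literal string, so "+" is not a quantifier.
lit <- grepl("a+b", x, fixed = TRUE)                             # FALSE FALSE FALSE TRUE

# perl = TRUE: Perl-compatible syntax; here \\d stands for a digit.
dig <- grepl("\\d", c("abc", "a1c"), perl = TRUE)                # FALSE TRUE
```

Without fixed = TRUE, the pattern "a+b" means "one or more a followed by b", so it would not match the literal string "a+b" at all.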

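Next, how these functions differ. grep returns the indices of the elements that match (or the elements themselves with value = TRUE), while grepl returns a logical vector with one TRUE or FALSE per element. sub replaces only the first match in each element; gsub replaces every match. regexpr gives the position and length of the first match in each element, gregexpr gives the same for all matches, and regexec additionally reports the positions of parenthesized sub-expressions.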
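A side-by-side run is probably the quickest way to see how the functions differ. The sketch below calls each of them on the same made-up vector s (hypothetical data, not from the original post) and uses regmatches, also from base R, to turn match positions back into substrings.

```r
s <- c("banana", "cherry", "papaya")   # hypothetical sample data

i  <- grep("an", s)        # 1: indices of the elements that match
l  <- grepl("an", s)       # TRUE FALSE FALSE: one logical per element
s1 <- sub("a", "A", s)     # "bAnana" "cherry" "pApaya": first match only
s2 <- gsub("a", "A", s)    # "bAnAnA" "cherry" "pApAyA": every match

m <- regexpr("a", s)       # position of the first match per element, -1 if none
first <- regmatches(s, m)  # "a" "a": elements without a match are dropped

g <- gregexpr("a", s)                # list with the positions of all matches
counts <- lengths(regmatches(s, g))  # 3 0 3: number of matches per element

r <- regexec("(b)(an)", s)       # full match plus each parenthesized group
groups <- regmatches(s, r)[[1]]  # "ban" "b" "an"
```

The match.length attribute on the results of regexpr and gregexpr is what lets regmatches cut out the right substrings.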

 

 

 

Now for a few examples.

First, use a bracket expression [] to find the words that contain the letter combination "do".

text <- c("Don't", "aim", "for", "success", "if", "you", "want", "it", "just", "do", "what", "you", "love",
          "and", "believe", "in", "and", "it", "will", "come", "naturally")

# Find the words that contain "do"
grep("[Dd]o", text)  # matches either "Do" or "do"
grep("[D]o", text)   # uppercase D only
grep("[d]o", text)   # lowercase d only

The output is:

> text <- c("Don't", "aim", "for", "success", "if", "you", "want", "it", "just", "do", "what",
+           "you", "love", "and", "believe", "in", "and", "it", "will", "come", "naturally")
> # Find the words that contain "do"
> grep("[Dd]o", text)  # matches either "Do" or "do"
[1]  1 10
> grep("[D]o", text)   # uppercase D only
[1] 1
> grep("[d]o", text)   # lowercase d only
[1] 10
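A bracket expression matches exactly one character out of the set it lists, which is why [Dd]o accepts both "Do" and "do". The sketch below shows a few closely related variations on a shorter, made-up vector called words (an assumption for illustration, not the vector from the example above).

```r
words <- c("Don't", "aim", "for", "do", "undo", "DO")  # hypothetical sample

# [Dd] matches one character, either "D" or "d"; the "o" must be lowercase.
a <- grep("[Dd]o", words, value = TRUE)                 # "Don't" "do" "undo"

# ignore.case = TRUE relaxes the case of every letter, so "DO" matches too.
b <- grep("do", words, ignore.case = TRUE, value = TRUE)
# "Don't" "do" "undo" "DO"

# ^ anchors the match to the start of the string, which excludes "undo".
c1 <- grep("^[Dd]o", words, value = TRUE)               # "Don't" "do"

# POSIX character classes also go inside brackets.
d <- grep("[[:upper:]]", words, value = TRUE)           # "Don't" "DO"
```

Note that [Dd]o and ignore.case = TRUE are not interchangeable: the bracket only relaxes the case of the one character it stands for.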

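Email matching:

# Email matching
text2 <- c("704232753@qq.com is my email address.")
grepl("[0-9.*]+@[a-z.*].[a-z.*]", text2)

The result:

> text2 <- c("704232753@qq.com is my email address.")
> grepl("[0-9.*]+@[a-z.*].[a-z.*]", text2)
[1] TRUE

So the address was found. Note that this pattern is looser than it looks: inside [] the characters . and * are literals rather than metacharacters, and the unescaped . between the two bracket expressions matches any single character.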
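Because the pattern above treats . and * as ordinary characters inside the brackets, it also accepts strings that are clearly not email addresses. The sketch below contrasts it with a stricter pattern. That stricter pattern is my own assumption, not part of the original example, and it only covers addresses with a purely numeric local part such as QQ mail. It is nowhere near a general email validator.

```r
text2 <- "704232753@qq.com is my email address."

# The loose pattern from the example, for comparison.
loose <- "[0-9.*]+@[a-z.*].[a-z.*]"
loose_junk <- grepl(loose, "*@*?*")        # TRUE: junk input is accepted

# Assumed stricter pattern: digits, "@", a lowercase domain label,
# an escaped (literal) dot, and a lowercase suffix.
strict <- "[0-9]+@[a-z]+\\.[a-z]+"
strict_ok   <- grepl(strict, text2)        # TRUE
strict_junk <- grepl(strict, "*@*?*")      # FALSE

# Extract the matched address instead of only testing for it.
addr <- regmatches(text2, regexpr(strict, text2))   # "704232753@qq.com"
```

The double backslash is needed because R string literals consume one level of escaping before the regex engine sees the pattern.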

