詹惠儿

2018-11-19

Data Analyst, R

Regular Expression Practice Examples (Part 1)


1. Extract digits from a string

# extract digits: all four approaches work
string <- "My roll number is 1006781"
gsub(pattern = "[^0-9]",replacement = "",x = string)
stringi::stri_extract_all_regex(str = string,pattern = "\\d+") #list
regmatches(string, regexpr("[0-9]+",string))
regmatches(string, regexpr("[[:digit:]]+",string))
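
The four calls above agree only because the string contains a single run of digits. The sketch below shows how they differ when there are several runs; the input `s` is a hypothetical example that is not part of the original exercise.

```r
# Hypothetical input with two separate runs of digits
s <- "Roll number 1006781, room 42"

# gsub() deletes every non-digit, so the two runs are glued together
all_digits <- gsub(pattern = "[^0-9]", replacement = "", x = s)

# regexpr() locates only the first match
first_run <- regmatches(s, regexpr("[0-9]+", s))

# gregexpr() locates every match; regmatches() then returns a list,
# with one character vector per input string
every_run <- regmatches(s, gregexpr("[0-9]+", s))[[1]]

print(all_digits)  # "100678142"
print(first_run)   # "1006781"
print(every_run)   # "1006781" "42"
```

If each number has to be kept separate, `gregexpr()` (or `stringi::stri_extract_all_regex()`) is probably the safer choice.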

2. Remove whitespace from a string

#remove space
gsub(pattern = "[[:space:]]",replacement = "",x = "and going there today tomorrow")
gsub(pattern = "[[:blank:]]",replacement = "",x = "and going there today tomorrow")
gsub(pattern = "\\s",replacement = "",x = "and going there today tomorrow")
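
The three patterns give the same result here because the input contains only ordinary spaces. A minimal sketch of where they diverge, using a hypothetical string `s` that also contains a tab and a newline: `[[:blank:]]` covers only spaces and tabs, while `[[:space:]]` and `\\s` also cover newlines.

```r
# Hypothetical input containing a space, a tab and a newline
s <- "and going\tthere\ntoday"

no_space <- gsub(pattern = "[[:space:]]", replacement = "", x = s)
no_blank <- gsub(pattern = "[[:blank:]]", replacement = "", x = s)
no_s     <- gsub(pattern = "\\s", replacement = "", x = s)

print(no_space)  # "andgoingtheretoday"
print(no_blank)  # "andgoingthere\ntoday" (the newline is kept)
print(no_s)      # "andgoingtheretoday"
```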

3. Return the values if they are present in a vector

#match values
det <- c("A1","A2","A3","A4","A5","A6","A7")
grep(pattern = "A1|A4",x = det,value = TRUE)
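
Note that the pattern is unanchored, so it matches any element that merely contains "A1" or "A4". The sketch below uses a hypothetical vector `det2` to show the difference, along with two ways to require an exact match.

```r
# Hypothetical vector with values that only contain "A1" or "A4"
det2 <- c("A1", "A4", "A10", "BA4")

# Unanchored: matches anywhere inside each element
loose <- grep(pattern = "A1|A4", x = det2, value = TRUE)

# Anchored with ^ and $: the whole element must be A1 or A4
exact <- grep(pattern = "^(A1|A4)$", x = det2, value = TRUE)

# Without a regular expression: plain set membership
by_set <- det2[det2 %in% c("A1", "A4")]

print(loose)   # "A1" "A4" "A10" "BA4"
print(exact)   # "A1" "A4"
print(by_set)  # "A1" "A4"
```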

4. Extract the strings that are in key-value form

d <- c("(monday :: 0.1231313213)","tomorrow","(tuesday :: 0.1434343412)")
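grep(pattern = "\\([a-z]+ :: (0\\.[0-9]+)\\)",x = d,value = TRUE)
regmatches(d,regexpr(pattern = "\\((.*) :: (0\\.[0-9]+)\\)",text = d))

Explanation: the pattern may look complicated at first, so let's go through it piece by piece. "\\(" escapes the metacharacter "(" so that it matches a literal opening parenthesis, and "\\)" does the same for the closing one. "[a-z]+" matches one or more lowercase letters. "(0\\.[0-9]+)" matches the decimal value: the metacharacter "." (period) is escaped with a double backslash so that it matches a literal period, and "[0-9]+" matches one or more digits. The second call uses "(.*)" in place of "[a-z]+", which accepts any characters before " :: ".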
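
Both calls above return the whole matched string. To pull the key and the value out separately, one option is to use the capture groups through base R's `regexec()`. This is a minimal sketch; the names `m`, `matched`, `keys` and `values` are introduced here for illustration only.

```r
d <- c("(monday :: 0.1231313213)", "tomorrow", "(tuesday :: 0.1434343412)")

# regexec() records the positions of the full match and of each group.
# regmatches() then returns a list: c(full match, group 1, group 2)
# for a matching element, and character(0) for a non-matching one.
m <- regmatches(d, regexec("\\(([a-z]+) :: (0\\.[0-9]+)\\)", d))

# Drop the elements that did not match ("tomorrow")
matched <- m[lengths(m) > 0]

# Group 1 is the key, group 2 is the decimal value
keys   <- vapply(matched, function(x) x[2], character(1))
values <- as.numeric(vapply(matched, function(x) x[3], character(1)))

print(keys)    # "monday" "tuesday"
print(values)
```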
