A professional Q&A community for data science, where a good article is worth its weight in gold -- CDA Q&A Community

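The corrected code is as follows:
select * from
	-- one row per customer, with all of that customer's SKUs concatenated ('客户ID' means "customer ID")
	(select CustomerID as '客户ID', group_concat(SKU) as 'SKU' from Orderinfo, OrderDetail
	where Orderinfo.OrderID = OrderDetail.OrderID
	group by CustomerID) aa
	-- keep only customers whose SKU list contains both SKU1 and SKU2
	where SKU like '%SKU1%'
	and SKU like '%SKU2%';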
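
One caveat with the LIKE approach: '%SKU1%' also matches SKU10, SKU11 and so on inside the concatenated string. Below is a minimal sketch of an exact-match alternative, run from Python against an in-memory SQLite database so it can be tried without a MySQL server. The sample rows are made up, and the column layout (CustomerID in Orderinfo, SKU in OrderDetail) is an assumption about the asker's schema. The HAVING COUNT(DISTINCT ...) form uses only standard SQL, so it should carry over to MySQL.

```python
import sqlite3

# Hypothetical sample data; the real Orderinfo/OrderDetail schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orderinfo (OrderID INTEGER, CustomerID TEXT);
    CREATE TABLE OrderDetail (OrderID INTEGER, SKU TEXT);
    INSERT INTO Orderinfo VALUES (1, 'C1'), (2, 'C1'), (3, 'C2'), (4, 'C3'), (5, 'C3');
    INSERT INTO OrderDetail VALUES
        (1, 'SKU1'), (2, 'SKU2'),   -- C1 bought both SKUs
        (3, 'SKU1'),                -- C2 bought only SKU1
        (4, 'SKU10'), (5, 'SKU2');  -- C3 bought SKU10, not SKU1
""")

# Pattern matching on the concatenated list: 'SKU10' also matches '%SKU1%'.
like_query = """
    SELECT CustomerID FROM (
        SELECT CustomerID, group_concat(SKU) AS SKU
        FROM Orderinfo JOIN OrderDetail ON Orderinfo.OrderID = OrderDetail.OrderID
        GROUP BY CustomerID
    ) aa
    WHERE SKU LIKE '%SKU1%' AND SKU LIKE '%SKU2%'
    ORDER BY CustomerID
"""

# Exact matching: keep customers that have both distinct SKUs.
exact_query = """
    SELECT CustomerID
    FROM Orderinfo JOIN OrderDetail ON Orderinfo.OrderID = OrderDetail.OrderID
    WHERE SKU IN ('SKU1', 'SKU2')
    GROUP BY CustomerID
    HAVING COUNT(DISTINCT SKU) = 2
    ORDER BY CustomerID
"""

like_customers = [row[0] for row in conn.execute(like_query)]
exact_customers = [row[0] for row in conn.execute(exact_query)]
print(like_customers)   # ['C1', 'C3']
print(exact_customers)  # ['C1']
```

If the SKU codes never share a prefix, the LIKE version is fine as written.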


朝阳Tim

2019-02-25

How should I approach a multi-condition lookup in MySQL?

You need Index.difference:
B.loc[B.index.difference(A.index)]
Edit:
A = pd.DataFrame({'A':range(10)}, index=pd.date_range('2019-02-01', periods=10))
B = pd.DataFrame({'A':range(10, 20)}, index=pd.date_range('2019-01-27', periods=10))
df = pd.concat([A, B.loc[B.index.difference(A.index)]]).sort_index()
print (df)
A
2019-01-27 10
2019-01-28 11
2019-01-29 12
2019-01-30 13
2019-01-31 14
2019-02-01 0
2019-02-02 1
2019-02-03 2
2019-02-04 3
2019-02-05 4
2019-02-06 5
2019-02-07 6
2019-02-08 7
2019-02-09 8
2019-02-10 9
df1= pd.concat([A, B])
df1 = df1[~df1.index.duplicated()].sort_index()
print (df1)
A
2019-01-27 10
2019-01-28 11
2019-01-29 12
2019-01-30 13
2019-01-31 14
2019-02-01 0
2019-02-02 1
2019-02-03 2
2019-02-04 3
2019-02-05 4
2019-02-06 5
2019-02-07 6
2019-02-08 7
2019-02-09 8
2019-02-10 9
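
A related built-in that may be worth trying: `DataFrame.combine_first` keeps the caller's values and fills missing index labels from the other frame, which is the same "keep A, top up from B" union. A small sketch reusing the A and B from above; the cast back to int is there because combine_first can upcast the column to float.

```python
import pandas as pd

A = pd.DataFrame({'A': range(10)}, index=pd.date_range('2019-02-01', periods=10))
B = pd.DataFrame({'A': range(10, 20)}, index=pd.date_range('2019-01-27', periods=10))

# Approach from the answer: append only the rows of B whose index is not in A.
df = pd.concat([A, B.loc[B.index.difference(A.index)]]).sort_index()

# combine_first keeps A's values and fills the missing index labels from B.
# The result may be upcast to float, so cast back to int before comparing.
df2 = A.combine_first(B).astype(int)

same_values = df['A'].tolist() == df2['A'].tolist()
same_index = df.index.equals(df2.index)
print(len(df), same_values, same_index)
```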


啊啊啊啊啊吖

2019-02-25

Pandas: joining (merging?) DataFrames while keeping only unique index values

Method 4: In the working folder, Shift + right-click to open a command prompt at the current path, then run jupyter notebook


PGC123

2019-02-23

Changing the Jupyter working path

geom_point, geom_line, geom_hline, and geom_abline. To remove these lines from the legend, we need

geom_abline(aes(color = "yellow", intercept = 0, slope = 1), show.legend = FALSE)

and for the points we have to add

guides(color = guide_legend(override.aes = list(shape = c(19, NA, NA))))


啊啊啊啊啊吖

2019-02-20

How do I remove the slanted lines added to the legend?

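Solution:
First, import and subset the data
#Packages used below: plyr provides ddply, colorspace provides rainbow_hcl.
library(plyr)
library(ggplot2)
library(colorspace)
#I called mine Ancovas. --> Note, export your df as .csv to work with it in R.
Ancovas <- read.csv("~/Dropbox/YOUR DATAFILE NAME.csv")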
#Next, subset your data by the two conditions (e.g. "l"=light, "d"=dark), and both treatments (e.g. "MQ"=water, "DOM"=media)
AncovasL <- Ancovas[(Ancovas$UV == "Light"), ]
AncovasL.MQ <- AncovasL[(AncovasL$DOM == "MQ"), ]
AncovasL.DOM <- AncovasL[(AncovasL$DOM == "DOM"), ]
AncovasD <- Ancovas[(Ancovas$UV == "Dark"), ]
AncovasD.MQ <- AncovasD[(AncovasD$DOM == "MQ"), ] #This code only keeps what is inside the brackets
AncovasD.DOM <- AncovasD[(AncovasD$DOM == "DOM"), ] #Note, adding a "!" after the opening square bracket removes what is in " " instead of keeping it.
Create the regression function
#--> this code was gathered from several sites.
#note: summary(fit) is a list. For an unweighted fit its 8th and 9th elements are r.squared and adj.r.squared, so they are accessed by name here rather than by position.
regression = function(Ancovas){
fit <- lm(AvgBio ~ Exposure, data=Ancovas)
slope <- round(coef(fit)[2],1)
intercept <- round(coef(fit)[1],0)
R2 <- round(summary(fit)$r.squared,3)
R2.Adj <- round(summary(fit)$adj.r.squared,3)
p.val <- signif(summary(fit)$coef[2,4], 3)
c(slope,intercept,R2,R2.Adj, p.val) }
Now split the data by Treatment and apply the regression function
#Call your column "Treatments"
regressions_dataL.MQ <- ddply(AncovasL.MQ, "Treatment", regression) #For light samples using water
regressions_dataL.DOM <- ddply(AncovasL.DOM, "Treatment", regression) #For light samples using media
regressions_dataD.MQ <- ddply(AncovasD.MQ, "Treatment", regression) #For dark samples using water
regressions_dataD.DOM <- ddply(AncovasD.DOM, "Treatment", regression) #For dark samples using media
#Rename columns
colnames(regressions_dataL.MQ) <-c ("Treatment","slope","intercept","R2","R2.Adj","p.val")
colnames(regressions_dataL.DOM) <-c ("Treatment","slope","intercept","R2","R2.Adj","p.val")
colnames(regressions_dataD.MQ) <-c ("Treatment","slope","intercept","R2","R2.Adj","p.val")
colnames(regressions_dataD.DOM) <-c ("Treatment","slope","intercept","R2","R2.Adj","p.val")
Create a theme for the figures
#Yes I like to hyper control every aspect of my theme
theme_new <- theme(panel.background = element_rect(fill = "white", linetype = "solid", colour = "black"),
legend.key = element_rect(fill = "white"), panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
axis.text.x=element_text(size = 11, angle = 0, hjust=0.5), #axis numbers (set it to 1 to place it on left side, 0.5 for middle and 0 for right side)
axis.text.y=element_text(size = 13, angle = 0),
plot.title=element_text(size=15, vjust=0, hjust=0), #hjust 0.5 to center title
axis.title.x=element_text(size=14), #X-axis title
axis.title.y=element_text(size=14, vjust=1.5), #Y-axis title
legend.position = "top",
legend.title = element_text(size = 11, colour = "black"), #Legend title
legend.text = element_text(size = 8, colour = "black", angle = 0), #Legend text
strip.text.x = element_text(size = 9, colour = "black", angle = 0), #Facet x text size
strip.text.y = element_text(size = 9, colour = "black", angle = 270)) #Facet y text size
guides_new <- guides(color = guide_legend(reverse=F), fill = guide_legend(reverse=F)) #Controls the order of your legend
Colours <-
rainbow_hcl(length(levels(factor(StackedTable$DOM))), start = 30, end = 300) #Yes I am Canadian so Colours has a "u"
Colours[5] <- "#47984c" #Green
Colours[4] <- "#7b64b4" #Purple-grey
Colours[3] <- "#ff7f50" #Orange
Colours[2] <- "#cc3636" #Red
Colours[1] <- "#4783ba" #Blue
Create the two figures that will be combined later
#Making plot for panel A ("Dark condition")
PlotA <-
ggplot(AncovasD, aes(x=as.numeric(Time.h), y=as.numeric(Measurement), fill=as.factor(Treatment))) +
geom_smooth(data=subset(AncovasD,Treatment =="MQ"), aes(Time.h,Measurement,color=factor(Treatment)),method="lm", formula = y~x, se=T, show.legend = F) +
geom_smooth(data=subset(AncovasD,Treatment =="DOM"), aes(Time.h,Measurement,color=factor(Treatment)),method="lm", formula = y~x, se=T, show.legend = F) + #You need this line twice, once for each condition
geom_errorbar(data=AncovasD, aes(ymin=Measurement-SD, ymax=Measurement+SD), width=0.2, colour="#73777a", size = 0.5) + #Change width based on the size of your X-axis
geom_point(shape = 21, size = 3, colour = "black", stroke = 1) + #colour is the outline of the circle, stroke is the thickness of that outline
facet_grid(Treatment ~ UV) + #This places all your treatments into a grid. Change the order if you want them horizontal. Use "." if you do not want a label.
geom_label(data=regressions_dataD.MQ, inherit.aes=FALSE, size=0.7, colour=Colours[1], #Add label for DOM regressions, specify same colour as your legend, change size depending on how large you want the text
aes(x=-0.1, y=41, label=paste(" ", "m == ", slope, "\n " , #replace this line with the values you want: e.g. R-squared=("R2 == ", R2.Adj) ; intercept=("b == ", intercept). The "\n " makes a second line
" ", "p == ", p.val ))) + #This completes the first label. Repeat same process for second label.
geom_label(data=regressions_dataD.DOM, inherit.aes=FALSE, size=0.7, colour=Colours[2],
aes(x=-0.1, y=4, label=paste(" ", "m == ", slope, "\n " ,
" ", "p == ", p.val )))
#Now for the irradiated samples "light" plot (Panel B)
PlotB <-
ggplot(AncovasL, aes(x=as.numeric(Time.h), y=as.numeric(Measurement), fill=as.factor(Treatment))) + #Same as above but use your second dataframe.
geom_smooth(data=subset(AncovasL,Treatment =="MQ"), aes(Time.h,Measurement,color=factor(Treatment)),method="lm", formula = y~x, se=T, show.legend = F) +
geom_smooth(data=subset(AncovasL,Treatment =="DOM"), aes(Time.h,Measurement,color=factor(Treatment)),method="lm", formula = y~x, se=T, show.legend = F) +
geom_errorbar(data=AncovasL, aes(ymin=Measurement-SD, ymax=Measurement+SD), width=0.2, colour="#73777a", size = 0.5) +
geom_point(shape = 21, size = 3, colour = "black", stroke = 1) +
facet_grid(Treatment ~ UV) +
geom_label(data=regressions_dataL.MQ, inherit.aes=FALSE, size=0.7, colour=Colours[1],
aes(x=-0.1, y=41, label=paste(" ", "m == ", slope, "\n " ,
" ", "p == ", p.val ))) +
geom_label(data=regressions_dataL.DOM, inherit.aes=FALSE, size=0.7, colour=Colours[2],
aes(x=-0.1, y=4, label=paste(" ", "m == ", slope, "\n " ,
" ", "p == ", p.val )))


啊啊啊啊啊吖

2019-02-20

ggplot2: faceting multiple regressions with slope and p-value labels

Correction to the first screenshot: it should read From Database --> From MySQL Database


朝阳Tim

2019-02-19

How can Excel read data from a MySQL database?

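I finally found a solution. The colorbar can be positioned manually in code, but I wanted to keep everything at its original spacing. My final solution is outlined below.
Step 1. Create the plot with a single colorbar on the bottom subplot.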
figure('color', 'white', 'DefaultAxesFontSize', fontSize, 'pos', posVec)
ax(1) = subplot2(2,1,1);
pcolor(x2d, t2d, dataMat1)
shading interp
ylim([0 10])
xlim([-0.3 0.3])
xticklabels({})
set(gca, 'clim', [-20 0])
colormap(flipud(gray))
set(gca,'layer','top')
axis ij
ax(2) = subplot2(2,1,2);
pcolor(x2d, t2d, dataMat2);
xlabel('x')
ylabel('y')
shading interp
ylim([0 10])
xlim([-0.3 0.3])
set(gca, 'clim', [-20 0])
yticklabels({})
cbar = colorbar;
cbar.Label.String = 'Normalized Unit';
colormap(flipud(gray))
set(gca,'layer','top')
axis ij
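Step 2. Save the position vectors of the two subplots and the colorbar.
`pos1 = ax(1).Position; % Position vector = [x y width height]`
`pos2 = ax(2).Position;`
`pos3 = cbar.Position;`
Step 3. Update the colorbar position so that it extends to the top of the top subplot.
`cbar.Position = [pos3(1:3) (pos1(2)-pos3(2))+pos1(4)];`
Step 4. Update the width of the top subplot to make room for the colorbar.
`ax(1).Position = [pos1(1) pos1(2) pos2(3) pos1(4)];`
Step 5. Update the width of the bottom subplot to make room for the colorbar.
`ax(2).Position = pos2;`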


啊啊啊啊啊吖

2019-02-19

A single colorbar for vertically stacked subplots

I'm just here to read the comments.


PGC123

2019-02-19

Usage of the AVG() function in SQL

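Solution:
One thing you can try is pd.Series.value_counts():
import numpy as np
import pandas as pd
# Mock df
df = pd.DataFrame({key:np.random.randint(1, 6, 5) for key in "abcde"})
   a  b  c  d  e
0  5  5  2  4  5
1  1  1  2  3  4
2  1  1  1  4  4
3  2  1  1  1  4
4  5  2  4  5  3
cols = ["a", "b", "c"]
# Pass axis by keyword (newer pandas versions require it) and use keys= so the result columns keep the names a, b, c.
new_df = pd.concat([df[c].value_counts() for c in cols], axis=1, keys=cols).fillna(0).astype(int).sort_index()
print(new_df)
   a  b  c
1  2  3  2
2  1  1  2
4  0  0  1
5  2  1  0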
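
To check the counts reproducibly, the sketch below hard-codes the columns a, b and c of the mock frame printed above (instead of drawing random numbers) and uses the equivalent `DataFrame.apply` form, which counts every selected column in one call. Treat it as one possible variant rather than the only way to do it.

```python
import pandas as pd

# Columns a, b, c copied from the mock frame printed in the answer.
df = pd.DataFrame({
    "a": [5, 1, 1, 2, 5],
    "b": [5, 1, 1, 1, 2],
    "c": [2, 2, 1, 1, 4],
})
cols = ["a", "b", "c"]

# value_counts runs once per column; values missing from a column become NaN,
# which is then replaced by 0 before casting back to int.
new_df = df[cols].apply(pd.Series.value_counts).fillna(0).astype(int).sort_index()
print(new_df)
```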


啊啊啊啊啊吖

2019-02-18

Useful, thanks

Fixing "Activate method of class OLEOBJECT failed"

Useful, thank you

Fixing "Activate method of class OLEOBJECT failed"

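How should moderation effects and interaction effects be distinguished? Beyond whether the variables have equal status in the model, what other differences are there? In particular, how do the two relate as genus and species: is moderation a sub-concept of interaction? That is, for M in the relationship X --> Y, if M is a moderator then it must be an interaction, but an interaction is not necessarily moderation.

What is the difference between moderation and interaction?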

In Python 3, map is lazy: unless you consume the result, for example with [*map(func,[1,2,3,4,5,6,7])] or list(map(func,[1,2,3,4,5,6,7])), func is never actually called. Keep this in mind!
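
A short sketch of both points, the laziness and the ways to hand map more than one argument. The function `add` and the `calls` list are made up for illustration; `functools.partial` and `itertools.starmap` are standard-library helpers.

```python
from functools import partial
from itertools import starmap

calls = []

def add(x, y):
    calls.append((x, y))  # record each call so the laziness is visible
    return x + y

# map accepts several iterables and passes one item from each per call.
lazy = map(add, [1, 2, 3], [10, 20, 30])
calls_before = len(calls)   # nothing has run yet
sums = list(lazy)           # consuming the iterator triggers the calls
calls_after = len(calls)

# Fix one argument with functools.partial.
plus_100 = list(map(partial(add, 100), [1, 2, 3]))

# Unpack pre-built argument tuples with itertools.starmap.
from_pairs = list(starmap(add, [(1, 2), (3, 4)]))

print(calls_before, calls_after)  # 0 3
print(sums)        # [11, 22, 33]
print(plus_100)    # [101, 102, 103]
print(from_pairs)  # [3, 7]
```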

How do I pass multiple arguments to the map function?

How to group pandas rows when a cumsum threshold is crossed

You can compute the 6-hour mean like this:
df.set_index('datetime').resample('6h').mean()
This gives one value per 6 hours. If you want a rolling mean instead, check out pd.DataFrame.rolling
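
A small sketch of the difference between the two, assuming hourly data in a column named `value` (both the data and the column name are made up for illustration; only `datetime` comes from the answer).

```python
import pandas as pd

# Hypothetical hourly data: 12 rows starting at midnight.
df = pd.DataFrame({
    'datetime': pd.date_range('2019-02-01', periods=12, freq='h'),
    'value': range(12),
})
indexed = df.set_index('datetime')

# resample: one value per non-overlapping 6-hour bin.
six_hour_mean = indexed.resample('6h').mean()

# rolling: one value per row, averaged over the trailing 6 hours ending at that row.
rolling_mean = indexed.rolling('6h').mean()

print(six_hour_mean)
print(rolling_mean.tail(3))
```

With 12 hourly rows, resample returns 2 rows while rolling keeps all 12.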

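Calculations on a pandas DataFrame

How to split a df into smaller dfs based on the unique values in one column

Iterating over rows and columns with nested data

How do I count/sum the previous values for certain IDs?

How to generate plots using different matrix powers

Counting rows in a df to find the daily survival rate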

A function that creates a new df by counting occurrences of distinct values across multiple columns
