算出来的列类型是pd.Timedelta,实际上可以视为时间戳,可以通过其他方式转换为数值
import pandas as pd
import datetime
delay_time = datetime.datetime.strptime('2020-10-31', '%Y-%m-%d')
df =pd.read_excel("C:\\Users\\Administrator\\Downloads\\1604563522_368980.xlsx")
df['time'] = pd.to_datetime(df['time'], unit='d')
df.loc[df['标识']=='推迟','天数_py'] = delay_time + datetime.timedelta(days=1) - df.loc[df['标识']=='推迟','time']
df.loc[df['标识']!='推迟','天数_py'] = pd.Timedelta(seconds=0)wangxishi
2020-11-05
使用excel的话,在C1输入 =IF(B2="推迟",44135-A2+1,0),用py的话源是csv导入到Dataframe吗?
wangxishi
2020-11-05
import pandas as pd
data=pd.read_excel("D:\\360安全浏览器下载\\1604548797_924061.xlsx")
data["计算列1"]=data["time"].map(lambda x:1 if x>pd.Timestamp("2020-1-1") else 0)
重点在于字符串变成Timestamp对象之后才能和Timestamp对象进行大小比较
wangxishi
2020-11-05
你的错误在于这一句:
b = CustomBusinessDay(holidays) CustomBusinessDay这个类里面的holidays参数是一个关键字参数,不是一个位置参数,所以你进行实例化类的时候应该这样写
b = CustomBusinessDay(holidays=holidays)
这个类的部分帮助文件如下
help(CustomBusinessDay)
Help on class CustomBusinessDay in module pandas.tseries.offsets:
class CustomBusinessDay(_CustomMixin, BusinessDay)
| CustomBusinessDay(n=1, normalize=False, weekmask='Mon Tue Wed Thu Fri', holidays=None, calendar=None, offset=datetime.timedelta(0))
|
| DateOffset subclass representing possibly n custom business days,
| excluding holidays.
ermutuxia
2020-11-05
int() argument must be a string, a bytes-like object or a number, not 'list'
During handling of the above exception, another exception occurred:
TypeError: `n` argument must be an integer, got <class 'list'>
ermutuxia
2020-11-05
TypeError Traceback (most recent call last)pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()TypeError: int() argument must be a string, a bytes-like object or a number, not 'list' During handling of the above exception, another exception occurred:TypeError Traceback (most recent call last)<ipython-input-100-eab4994fcd77> in <module> 22 count_businessday(start_day,end_day) 23 ---> 24 sheet['工作日'] = sheet.apply(lambda x:count_businessday(x['出院时间'],x['归档日期']),axis=1)D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds) 7546 kwds=kwds, 7547 )-> 7548 return op.get_result() 7549 7550 def applymap(self, func) -> "DataFrame":D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self) 178 return self.apply_raw() 179 --> 180 return self.apply_standard() 181 182 def apply_empty_result(self):D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self) 269 270 def apply_standard(self):--> 271 results, res_index = self.apply_series_generator() 272 273 # wrap resultsD:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self) 298 for i, v in enumerate(series_gen): 299 # ignore SettingWithCopy here in case the user mutates--> 300 results[i] = self.f(v) 301 if isinstance(results[i], ABCSeries): 302 # If we have a view on v, we need to make a copy because<ipython-input-100-eab4994fcd77> in <lambda>(x) 22 count_businessday(start_day,end_day) 23 ---> 24 sheet['工作日'] = sheet.apply(lambda x:count_businessday(x['出院时间'],x['归档日期']),axis=1)<ipython-input-100-eab4994fcd77> in count_businessday(start_day, end_day) 4 '2020-10-05','2020-10-06','2020-10-07','2020-10-08'] 5 holidays=[pd.Timestamp(i) for i in holidays]----> 6 b = CustomBusinessDay(holidays) 7 bus_day = pd.date_range(start=start_day, end=end_day, freq=b) 8 length = len(bus_day)pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.CustomBusinessDay.__init__()pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BusinessMixin.__init__()pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.__init__()pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()TypeError: `n` argument must be an integer, got <class 'list'>
ermutuxia
2020-11-05
from pandas.tseries.offsets import CustomBusinessDay
def count_businessday(start_day,end_day):
holidays=['2020-04-06','2020-05-01','2020-05-04','2020-05-05','2020-06-25','2020-10-01',
'2020-10-05','2020-10-06','2020-10-07','2020-10-08']
holidays=[pd.Timestamp(i) for i in holidays]
b = CustomBusinessDay(holidays)
bus_day = pd.date_range(start=start_day, end=end_day, freq=b)
length = len(bus_day)
extra_work_day = ['2020-04-26','2020-05-09','2020-06-28','2020-09-27','2020-10-10']
extra_work_day = [pd.Timestamp(i) for i in extra_work_day]
extra_len = 0
for i in extra_work_day:
if i >= start_day and i <= end_day:
extra_len = extra_len+1
return(length+extra_len-1)
if __name__=='__main__':
count_businessday(start_day,end_day)
sheet['工作日'] = sheet.apply(lambda x:count_businessday(x['出院时间'],x['归档日期']),axis=1)ermutuxia
2020-11-05
问题在于你的extra_work_day= ['2020-04-26','2020-05-09','2020-06-28','2020-09-27','2020-10-10'] 这个列表里面的元素是字符串,所以不能和时间戳对象进行比较,你需要变一下,把他变成时间戳对象,可以通过下面的方式: extra_work_day= ['2020-04-26','2020-05-09','2020-06-28','2020-09-27','2020-10-10']
extra_work_day=[pd.Timestamp(i) for i in extra_work_day]

然后你就可以用原来的命令比较啦
ermutuxia
2020-11-05
from pandas.tseries.offsets import CustomBusinessDay
def count_businessday(start_day,end_day):
b = CustomBusinessDay(holidays=['2020-04-06','2020-05-01','2020-05-04','2020-05-05','2020-06-25','2020-10-01',
'2020-10-05','2020-10-06','2020-10-07','2020-10-08'])
bus_day = pd.date_range(start=start_day, end=end_day, freq=b)
length = len(bus_day)
extra_work_day = ['2020-04-26','2020-05-09','2020-06-28','2020-09-27','2020-10-10']
extra_len = 0
dayday = Interval(start_day,end_day)
for i in extra_work_day:
if i in dayday:
extra_len = extra_len+1
return(length+extra_len-1)
if __name__=='__main__':
count_businessday(start_day,end_day)sheet['工作日'] = sheet.apply(lambda x:count_businessday(x['出院时间'],x['归档日期']),axis=1)
出现这个错误:
'<' not supported between instances of 'str' and 'Timestamp'
ermutuxia
2020-11-05