Python pandas: change duplicate timestamps to unique
I have a file containing duplicate timestamps, at most two for each timestamp. They are not really duplicates; the second timestamp just needs a millisecond component added. For example, I have these in the file:
....
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01
2011/1/4 9:15:02
2011/1/4 9:15:02
2011/1/4 9:15:03
2011/1/4 9:15:03
2011/1/4 9:15:04
....
I want to change them into:
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01.500
2011/1/4 9:15:02
2011/1/4 9:15:02.500
2011/1/4 9:15:03
2011/1/4 9:15:03.500
2011/1/4 9:15:04
....
What is an efficient way to perform such a task?
Setup

In [69]: df = DataFrame(dict(time = x))

In [70]: df
Out[70]:
                 time
0 2013-01-01 09:01:00
1 2013-01-01 09:01:00
2 2013-01-01 09:01:01
3 2013-01-01 09:01:01
4 2013-01-01 09:01:02
5 2013-01-01 09:01:02
6 2013-01-01 09:01:03
7 2013-01-01 09:01:03
8 2013-01-01 09:01:04
9 2013-01-01 09:01:04
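The session does not show how x was built. A minimal sketch of one way to get a comparable frame, assuming x is simply a run of one-second timestamps with each value repeated twice (the construction below is an assumption, not part of the original answer):

```python
import pandas as pd

# Assumption: x is not defined in the session; build five consecutive
# one-second timestamps and repeat each of them twice.
base = pd.date_range("2013-01-01 09:01:00", periods=5,
                     freq=pd.Timedelta(seconds=1))
x = base.repeat(2)

df = pd.DataFrame(dict(time=x))
print(df)
```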
Find the locations where the difference in time from the previous row is 0 seconds
In [71]: mask = (df.time-df.time.shift()) == np.timedelta64(0,'s')

In [72]: mask
Out[72]:
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: time, dtype: bool
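The same mask can be written in a couple of other ways. The sketch below uses a small made-up sample (not the data from the session) and compares a diff-based mask, which mirrors the shift comparison, with Series.duplicated, which flags any repeat of an earlier value and so does not rely on the repeats being adjacent:

```python
import pandas as pd

# Small illustrative sample; the values are made up for this sketch.
times = pd.to_datetime([
    "2013-01-01 09:01:00", "2013-01-01 09:01:00",
    "2013-01-01 09:01:01", "2013-01-01 09:01:01",
    "2013-01-01 09:01:02",
])
df = pd.DataFrame({"time": times})

# Zero gap to the previous row. The first row's diff is NaT, which
# compares as False, so it is never flagged.
mask_diff = df["time"].diff() == pd.Timedelta(0)

# Every occurrence after the first of a given timestamp.
mask_dup = df["time"].duplicated()

print(mask_diff.tolist())
print(mask_dup.tolist())
```

For sorted input with at most two copies of each timestamp, as in the question, both masks select the same rows.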
Set those locations to use an offset of 5 milliseconds (the question used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax is changing in 0.13 to allow the more direct df.loc[mask,'time'] += pd.offsets.Milli(5).)
In [73]: df.loc[mask,'time'] = df.time[mask].apply(lambda x: x+pd.offsets.Milli(5))

In [74]: df
Out[74]:
                        time
0        2013-01-01 09:01:00
1 2013-01-01 09:01:00.005000
2        2013-01-01 09:01:01
3 2013-01-01 09:01:01.005000
4        2013-01-01 09:01:02
5 2013-01-01 09:01:02.005000
6        2013-01-01 09:01:03
7 2013-01-01 09:01:03.005000
8        2013-01-01 09:01:04
9 2013-01-01 09:01:04.005000