1-year rolling mean in pandas on a date column

0 votes

I would like to compute the 1-year rolling average for each row of the DataFrame below:

test:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2018-03-12  2.854749e-06
2316    7034    2018-03-09  3.907458e-06
2317    7034    2018-03-08  1.662412e-06
2318    7034    2018-03-07  1.346433e-06
2319    7034    2018-03-06  8.731700e-06
2320    7034    2018-03-05  7.145597e-06
2321    7034    2018-03-02  4.893283e-06
...

For example, I would need to calculate:

  • mean of variation of 7034 between 2018-03-14 and 2017-08-14
  • mean of variation of 7034 between 2018-03-13 and 2017-08-13
  • etc.
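To pin down exactly what is being asked, here is a minimal brute-force sketch. It does not use the pandas rolling API. For every row it averages `variation` over the rows with the same `id` whose `date` falls in the closed interval [date - 365 days, date]. The 365-day lookback and the inclusive endpoints are assumptions (change `days` if a different lookback is meant), and `trailing_mean_bruteforce` is a hypothetical helper name. It is O(n²), so it is only useful as a reference for checking a faster version.

```python
import pandas as pd


def trailing_mean_bruteforce(df, days=365):
    """Hypothetical helper: for each row, average `variation` over rows with
    the same `id` dated within [date - days, date], both ends inclusive."""
    dates = pd.to_datetime(df["date"])
    means = []
    for idx in df.index:
        end = dates.at[idx]
        start = end - pd.Timedelta(days=days)
        in_window = (df["id"] == df.at[idx, "id"]) & (dates >= start) & (dates <= end)
        means.append(df.loc[in_window, "variation"].mean())
    return pd.Series(means, index=df.index, name="variation_1y_mean")


# The three rows from the question's EDIT 1
test = pd.DataFrame(
    {
        "id": [7034, 7034, 7034],
        "date": ["2018-03-14", "2018-03-13", "2017-03-13"],
        "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06],
    },
    index=[2313, 2314, 2315],
)
test["variation_1y_mean"] = trailing_mean_bruteforce(test)
print(test)
```

Because the window only looks backwards, the 2018-03-13 row does not include the 2018-03-14 value, and the 2018-03-14 row does not reach back to 2017-03-13 (366 days earlier).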

I tried:

test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()

but I got the error message:

ValueError: invalid on specified as date, must be a column (if DataFrame) or None

How can I use the pandas rolling() function in this case?
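The ValueError comes from passing `on='date'` to `rolling` on the selected `variation` Series: `on` must name a column of a DataFrame, and a Series has no columns. Grouping by both `id` and `date` would also typically put every row in its own group. Below is a minimal sketch of one way around both, assuming a reasonably recent pandas (1.x or later). It groups by `id` only, moves the sorted `date` column into the index, and passes a time offset string (`'365D'`) as the window so that the window is measured in days rather than rows. The sample data is made up.

```python
import pandas as pd

# Made-up sample: two ids, irregular dates spanning more than a year
test = pd.DataFrame(
    {
        "id": [7034, 7034, 7034, 8000, 8000],
        "date": ["2018-03-14", "2018-03-13", "2017-01-02", "2018-03-14", "2018-01-01"],
        "variation": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)
test["date"] = pd.to_datetime(test["date"])

rolled = (
    test.sort_values(["id", "date"])  # time-based windows need ascending dates in each group
    .set_index("date")                # the offset window is measured on the DatetimeIndex
    .groupby("id")["variation"]
    .rolling("365D")                  # default window is (date - 365 days, date]
    .mean()
)
print(rolled)  # Series indexed by (id, date)
```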

[EDIT 1]

Thank you, Sacul.

I tested:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']

But freq='Y' doesn't work (I got: ValueError: Invalid frequency: Y), so I used window=365, freq='D' instead.

But there is another issue: because there are never 365 consecutive dates for any id, the result is always empty. Even if there are missing dates, I would like to ignore them and take into account all dates between the current date and the current date minus 365 days to compute the rolling mean. For instance, imagine I have:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2017-03-13  2.854749e-06

Then,

  • for 7034 2018-03-14: I would like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)
  • for 7034 2018-03-13: I would also like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)

How can I do that?
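The empty result is expected with an integer window: window=365 means 365 rows, and min_periods defaults to the window size, so every window with fewer than 365 non-missing values yields NaN. An offset string such as '365D' counts calendar days instead, so gaps in the dates do not matter. Here is a minimal sketch on the three rows above, assuming closed='both' so that a date exactly 365 days back is included. Note that a trailing window only looks backwards: the 2018-03-13 row cannot include the 2018-03-14 value as in the MEAN above, and that would need a two-sided window, which is not shown here.

```python
import pandas as pd

s = pd.Series(
    [2.854749e-06, 4.953194e-07, 4.139148e-06],
    index=pd.to_datetime(["2017-03-13", "2018-03-13", "2018-03-14"]),
    name="variation",
)

# Integer window: 365 *rows*; min_periods defaults to 365, so only NaN here
by_rows = s.rolling(365).mean()

# Offset window: 365 *days*; closed="both" keeps the day exactly 365 days back
by_days = s.rolling("365D", closed="both").mean()
print(by_days)
```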

[EDIT 2]

Finally, I used the code below to calculate the rolling median, mean and standard deviation over 1 year, ignoring missing values:

pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
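pd.rolling_median, pd.rolling_mean and pd.rolling_std (and the freq argument) were deprecated in pandas 0.18 and have since been removed, so the three lines above fail on current versions. Here is a minimal sketch of what should be an equivalent on current pandas, assuming the intent is a trailing 365-day window per id. After daily resampling, a 365-row window covers the current day plus the 364 days before it, which corresponds to the default right-closed '365D' offset window. The data is made up.

```python
import pandas as pd

# Made-up data in the question's layout (id, date, variation)
df = pd.DataFrame(
    {
        "id": [7034, 7034, 7034, 7034],
        "date": pd.to_datetime(["2017-01-02", "2017-06-01", "2018-03-13", "2018-03-14"]),
        "variation": [4.0, 2.0, 6.0, 1.0],
    }
)

grouped = df.sort_values(["id", "date"]).set_index("date").groupby("id")["variation"]
window = grouped.rolling("365D", min_periods=1)  # 365 calendar days, gaps ignored

stats = pd.DataFrame(
    {
        "median": window.median(),
        "mean": window.mean(),
        "std": window.std(),  # sample std (ddof=1): NaN when the window holds one value
    }
)
print(stats)
```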

Sep 6, 2018 in Python by bug_seeker

1 answer to this question.

0 votes

I believe this should work for you:

# First make sure that `date` is a datetime object:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']

Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as shown in one of the documentation's examples).

I can't really test whether it computes the yearly average correctly on your example DataFrame, since it contains only one year and only one ID, but it should work.

Arguably Better Solution:

[EDIT] As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:

df.set_index('date').groupby('id')['variation'].resample('A').mean()

You can take a look at the resample documentation for more details on how this works, and at this link regarding the frequency arguments.

answered Sep 6, 2018 by Priyaj