Saturday, July 9, 2016

reindex sorted pandas dataframe

I have a dataframe like so:

    Column A  Column B        Date  Value
1          A         1  2011-01-01     10
2          B         1  2011-01-01     10
3          A         2  2011-01-01     10
4          B         2  2011-01-01     10
5          A         1  2011-01-02     10
6          B         1  2011-01-02     10
7          A         2  2011-01-02     10
8          B         2  2011-01-02     10
9          A         1  2011-01-03     10
10         B         1  2011-01-03     10
11         B         2  2011-01-03     10

I want to find the missing dates for every combination of Column A and Column B (in this case, the combination A, 2 has no row for 2011-01-03) and insert NaN there. I tried the reindex function:

import pandas as pd

df.sort_values(['Column A', 'Column B'], ascending=[True, True], inplace=True)
df.index = range(1, len(df) + 1)
dates = pd.date_range('2011-01-01', '2011-01-03')
df = df.reindex(dates, fill_value=None)
print(df)

But it gives me NaN in every column. Does anyone have any suggestions as to how I can flag these missing values?
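
For reference, the all-NaN result is expected: reindex matches the new labels against the existing index, which here is the integers 1 to 11, so none of the dates match. One possible way around it, sketched below, is to make the three key columns the index and reindex against every combination of them. This is only a sketch under some assumptions: the frame is rebuilt by hand from the table above, the Date column is converted to datetimes, and the names full_index, filled and missing are made up for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (every Value is 10).
rows = [
    ("A", 1, "2011-01-01"), ("B", 1, "2011-01-01"),
    ("A", 2, "2011-01-01"), ("B", 2, "2011-01-01"),
    ("A", 1, "2011-01-02"), ("B", 1, "2011-01-02"),
    ("A", 2, "2011-01-02"), ("B", 2, "2011-01-02"),
    ("A", 1, "2011-01-03"), ("B", 1, "2011-01-03"),
    ("B", 2, "2011-01-03"),
]
df = pd.DataFrame(rows, columns=["Column A", "Column B", "Date"])
df["Date"] = pd.to_datetime(df["Date"])  # assumption: dates are parsed, not strings
df["Value"] = 10

dates = pd.date_range("2011-01-01", "2011-01-03")

# Every (Column A, Column B, Date) combination that should be present.
full_index = pd.MultiIndex.from_product(
    [df["Column A"].unique(), df["Column B"].unique(), dates],
    names=["Column A", "Column B", "Date"],
)

# Reindex on the key columns; combinations with no row come back as NaN.
filled = (
    df.set_index(["Column A", "Column B", "Date"])
      .reindex(full_index)
      .reset_index()
)

# The rows where Value is NaN are the missing combinations.
missing = filled[filled["Value"].isna()]
print(missing)
```

With the sample data this should leave a single flagged row, for A, 2 on 2011-01-03. Note that Value becomes a float column once NaN is introduced.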
