mercredi 22 juin 2016

Pandas Sort Multiindex by Group Sum

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
                'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
                'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
                'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
                        '2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index=['County','Hospital'],columns=['Year'])#.sort_columns

d2
        Enrollment
        Year   2012     2013
County  Hospital        
A       a      44.0     NaN
        c      NaN      1.0
        d      NaN      89.0
        e      88.0     NaN
B       a      54.0     54.0
        b      55.0     NaN
        e      NaN      71.5
C       a      NaN      34.0
        b      27.0     65.0
        c      42.0     NaN
D       b      NaN      12.0
        c      55.0     23.0
        d      57.0     NaN

I need to sort the data frame such that County is sorted descendingly by the sum of Enrollment for the most recent year (I want to avoid using '2013' directly) like this:

        Enrollment  
    Year          2012  2013
County  Hospital        
B       a         54    54
        b         55    NaN
        e         NaN   71.5
C       a         NaN   34
        b         27    65
        c         42    NaN
A       a         44    NaN
        c         NaN   1
        d         NaN   89
        e         88    NaN
D       b         NaN   12
        c         55    23
        d         57    NaN

Then, I'd like each hospital sorted within each county, descendingly, but 2013 enrollments like this:

        Enrollment  
        Year    2012    2013
County  Hospital        
B       e       NaN 71.5
        a       54  54
        b       55  NaN
C       b       27  65
        a       NaN 34
        c       42  NaN
A       d       NaN 89
        c       NaN 1
        a       44  NaN
        e       88  NaN
D       c       55  23
        b       NaN 12
        d       57  NaN

So far, I've tried using groupby to get the sums and merge the back but have not had any luck:

d2.groupby('County').sum()

Thanks in advance!

Aucun commentaire:

Enregistrer un commentaire