I need to winsorize two columns in my dataframe of 12 columns.
Say I have columns 'A', 'B', 'C', and 'D', each with a series of values. Because I dropped some rows containing NaN, the number of rows was reduced from 100 to 80, but the index still runs up to 100 with gaps (e.g. row 5 is missing).
I want to transform only columns 'A' and 'B' using the winsorize method. To do this, I convert those columns to a NumPy array.
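import pandas as pd
from scipy.stats import mstats

# df has columns 'A', 'B', 'C', 'D' (values omitted)
ab_df = df[['A', 'B']]
# .values returns a plain array, so the index labels are lost here
X = mstats.winsorize(ab_df.values, limits=0.01)
new_ab_df = pd.DataFrame(X, columns=['A', 'B'])
# join_axes was removed in pandas 1.0; reindex is its replacement
df = pd.concat([df[['C', 'D']], new_ab_df], axis=1).reindex(df.index)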
When I convert to a NumPy array and then back to a pd.DataFrame, its len() is correct at 80, but the index has been reset to 0 through 79. How can I ensure that my transformed 'A' and 'B' columns are indexed correctly? I don't think I can use apply(), which would preserve the index and simply swap out the values. My approach instead creates a transformed copy of my df with only two columns, then concatenates them with the rest of my untransformed columns.
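For illustration, here is a minimal sketch of two ways the index could be kept, under some assumptions: the data is made up (random values, with rows 5 through 24 dropped to leave 80 rows with gaps), `winsorize_series` is a hypothetical helper name, and each column is assumed to be winsorized on its own (hence `axis=0` in the second option, since `winsorize` works on the flattened array when no axis is given). The first option uses apply() with a function that returns a Series carrying the original index; the second keeps the array round trip but passes `index=df.index` when rebuilding the DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.stats import mstats

# Hypothetical data: 100 rows, then drop 20 so the index has gaps.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=['A', 'B', 'C', 'D'])
df = df.drop(index=range(5, 25))  # 80 rows remain; labels 5..24 are missing
original_index = df.index.copy()


def winsorize_series(col, limits=0.01):
    """Winsorize one column and return it with its original index labels.

    Hypothetical helper, not part of pandas or SciPy.
    """
    clipped = mstats.winsorize(col.to_numpy(), limits=limits)
    return pd.Series(np.asarray(clipped), index=col.index, name=col.name)


# Option 1: apply() column by column, then assign back.
# Each returned Series keeps the gappy index, so assignment aligns on labels.
df_apply = df.copy()
df_apply[['A', 'B']] = df[['A', 'B']].apply(winsorize_series)

# Option 2: keep the array round trip, but reuse the original index
# when rebuilding the DataFrame, so concat has matching labels.
X = mstats.winsorize(df[['A', 'B']].to_numpy(), limits=0.01, axis=0)
new_ab_df = pd.DataFrame(np.asarray(X), columns=['A', 'B'], index=df.index)
df_concat = pd.concat([df[['C', 'D']], new_ab_df], axis=1)
```

In both results the index should match the original, gaps included. The first option leaves the column order untouched, while the second puts 'C' and 'D' ahead of 'A' and 'B'.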