I have a Pandas Series that needs to be log-transformed to be normally distributed. But I can't log-transform it yet, because there are 1 < values < 1. Therefore I want to normalize the Series first. I have heard of StandardScaler (scikit-learn), Z-score standardization, and Min-Max scaling (normalization). I want to cluster the data later. Which would be the best method? StandardScaler and Z-score standardization use the mean, variance, etc. Can I use them on data that is not yet normally distributed?
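
To make the options concrete, here is a minimal sketch of what I understand each method to compute. The data values and the variable names (`s`, `z`, `minmax`, `logged`) are made up for illustration and are not my real Series. The `log1p` line is an extra assumption on my side: it is only a possible workaround if the problem is zeros or values between 0 and 1, not one of the methods listed above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: right-skewed, with a zero and values below 1.
s = pd.Series([0.0, 0.2, 0.5, 1.0, 3.0, 10.0, 50.0, 400.0], name="x")

# Z-score standardization: subtract the mean, divide by the standard deviation.
# ddof=0 (population standard deviation) is what scikit-learn's
# StandardScaler uses, so both names describe the same computation.
z = (s - s.mean()) / s.std(ddof=0)

# Min-max scaling: map the smallest value to 0 and the largest to 1.
minmax = (s - s.min()) / (s.max() - s.min())

# Assumption: log1p computes log(1 + x), which is defined for x > -1,
# so zeros and values between 0 and 1 do not cause problems.
logged = np.log1p(s)

print(pd.DataFrame({"raw": s, "zscore": z, "minmax": minmax, "log1p": logged}))
```

Both scalers are linear rescalings, so as far as I can tell they shift and stretch the values without changing the shape of the distribution. The z-scores also contain negative values and the min-max result contains an exact 0, so a plain `np.log` would still fail on either output.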