Monday, 11 July 2016

Getting relative frequencies for a categorical variable (filtered on a count)?

I've got a DataFrame of student test results, where the two columns that interest me are country and result, as in:

country    result
FR         Pass
FR         Fail
US         Pass
US         Pass
DK         Fail
DK         Fail
SE         Pass
...        ...

What I'm trying to figure out is how to get the relative "Fail" frequency per country, sorted in descending order (that is, the students from each country who failed, as a proportion of all the students from that country), but only for countries where more than, say, 200 students took the test:

country    % fail    students
FR         0.056     997
US         0.051     855
DK         0.042     627
NL         0.032     511

I've seen colleagues at work do it with a very short SQL query, but for the life of me I can't figure out how to do it with pandas!
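
For reference, here is a minimal sketch of one way this might be done in pandas, assuming the columns are named exactly `country` and `result` and that failures are recorded as the string "Fail". The function name `fail_rates`, its `min_students` parameter, and the tiny sample frame are made up for illustration; they are not from my real data. The idea is that the mean of a boolean "failed" column within each group is the fail rate, and the count of that column is the number of students.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame has many more rows.
df = pd.DataFrame({
    "country": ["FR", "FR", "US", "US", "DK", "DK", "SE"],
    "result": ["Pass", "Fail", "Pass", "Pass", "Fail", "Fail", "Pass"],
})

def fail_rates(frame, min_students=200):
    """Fail rate per country, for countries with more than min_students rows."""
    summary = (
        frame.assign(fail=frame["result"].eq("Fail"))  # True where the student failed
        .groupby("country")["fail"]
        .agg(["mean", "count"])  # mean of booleans = fail rate; count = students
        .rename(columns={"mean": "% fail", "count": "students"})
    )
    # Keep only countries above the threshold, highest fail rate first.
    summary = summary[summary["students"] > min_students]
    return summary.sort_values("% fail", ascending=False)

# The sample is tiny, so use a low threshold here; the default is 200.
print(fail_rates(df, min_students=1))
```

With the default threshold of 200, the sample frame above gives an empty result, since no country has that many rows. On the real data the call would simply be `fail_rates(df)`.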
