Let's use the famous Titanic dataset, found here:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls
Read it into a pandas DataFrame named df.
The features that I'm interested in are 'age' (a float) and 'survived' (also a float, but binary 1 or 0).
I know how to plot a histogram to show the distribution of ages for the passengers:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")
df_age = df[df.age >= 0]
age = df_age.age
fig = plt.figure(figsize=(12,6))
plt.hist(age.values, facecolor='gray', bins=20, alpha=.8)
plt.axvline(age.median(), color='gray', linestyle='dashed', linewidth=2)
plt.xlabel('age')
plt.ylabel('passenger count')
plt.show()
Easy! But what I want is a line chart of survival rate for passengers in each histogram bin on top of that, like this:
How can I do that?
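One possible approach, as a rough sketch rather than a definitive answer: compute the per-bin survival rate with np.histogram, using the same bin edges as the histogram, and draw it on a second y-axis created with twinx(). The helper names survival_rate_by_bin and plot_age_hist_with_survival are hypothetical (made up for this sketch), and it assumes df has the 'age' and 'survived' columns described above, with 'survived' coded as 1 or 0.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def survival_rate_by_bin(age, survived, bins=20):
    """Hypothetical helper: per-bin passenger counts and survival rates."""
    age = np.asarray(age, dtype=float)
    survived = np.asarray(survived, dtype=float)
    # Same binning that plt.hist would use for these ages.
    counts, edges = np.histogram(age, bins=bins)
    # Weighting by the 1/0 'survived' flag sums the survivors in each bin.
    survivors, _ = np.histogram(age, bins=edges, weights=survived)
    # Empty bins have no defined rate; keep them as NaN so the line breaks there.
    rates = np.full(len(counts), np.nan)
    nonempty = counts > 0
    rates[nonempty] = survivors[nonempty] / counts[nonempty]
    return edges, counts, rates


def plot_age_hist_with_survival(df, bins=20):
    """Hypothetical helper: age histogram with a survival-rate line on top."""
    df_age = df[df.age >= 0]  # also drops rows where age is missing
    edges, counts, rates = survival_rate_by_bin(df_age.age, df_age.survived, bins)
    centers = (edges[:-1] + edges[1:]) / 2

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.hist(df_age.age.values, bins=edges, facecolor='gray', alpha=.8)
    ax.axvline(df_age.age.median(), color='gray', linestyle='dashed', linewidth=2)
    ax.set_xlabel('age')
    ax.set_ylabel('passenger count')

    # Second y-axis sharing the same x-axis, scaled from 0 to 1 for the rate.
    ax2 = ax.twinx()
    ax2.plot(centers, rates, color='black', marker='o')
    ax2.set_ylim(0, 1)
    ax2.set_ylabel('survival rate')
    return fig, rates
```

With df loaded as above, calling plot_age_hist_with_survival(df) followed by plt.show() should display both layers. Passing the computed edges to hist keeps the bars and the line aligned bin for bin.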