DataFrame histogram visualization with seaborn using countplot


We often need to plot histograms to visualize the distribution of a feature or variable.
How can we quickly get a useful plot and get the work done? If what we care about is the frequency of each value of a categorical feature, seaborn provides
a convenient function, countplot(), that draws the plot directly, so we don't have to count the data ourselves and then build the bar chart.
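To make that comparison concrete, here is a minimal sketch of both routes on a small made-up DataFrame. The `port` column and its values are illustrative assumptions, not the data used in this post. The left panel counts by hand with `value_counts()` and draws the bars with matplotlib; the right panel gets the same bars from a single `sns.countplot` call.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data (an assumption for illustration only)
toy = pd.DataFrame({"port": ["S", "C", "S", "Q", "S", "C"]})

fig, (ax_manual, ax_sns) = plt.subplots(1, 2, figsize=(8, 3))

# Manual route: count each value ourselves, then draw the bars
manual_counts = toy["port"].value_counts()
ax_manual.bar(manual_counts.index, manual_counts.values)
ax_manual.set_title("value_counts() + bar")

# seaborn route: countplot does the counting internally
sns.countplot(x="port", data=toy, ax=ax_sns)
ax_sns.set_title("sns.countplot")

# Collect the heights of the bars seaborn drew (ignore any zero-height patches)
seaborn_heights = sorted(
    int(p.get_height()) for p in ax_sns.patches if p.get_height() > 0
)
```

Both panels should show the same bar heights; only the default bar order and styling differ.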

Check the following example:

Get the data and draw a count plot

%matplotlib inline
import seaborn as sns


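# load the example Titanic dataset (fetched online on first use)
titanic = sns.load_dataset("titanic")
# cast 'class' from category dtype to plain strings; otherwise categories
# filtered out later would still appear on the axis as empty bars
titanic['class'] = titanic['class'].astype('str')
display(titanic)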

     survived  pclass     sex   age  sibsp  parch     fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
  0         0       3    male  22.0      1      0   7.2500         S   Third    man        True   NaN  Southampton     no  False
  1         1       1  female  38.0      1      0  71.2833         C   First  woman       False     C    Cherbourg    yes  False
  2         1       3  female  26.0      0      0   7.9250         S   Third  woman       False   NaN  Southampton    yes   True
  3         1       1  female  35.0      1      0  53.1000         S   First  woman       False     C  Southampton    yes  False
  4         0       3    male  35.0      0      0   8.0500         S   Third    man        True   NaN  Southampton     no   True
...       ...     ...     ...   ...    ...    ...      ...       ...     ...    ...         ...   ...          ...    ...    ...
886         0       2    male  27.0      0      0  13.0000         S  Second    man        True   NaN  Southampton     no   True
887         1       1  female  19.0      0      0  30.0000         S   First  woman       False     B  Southampton    yes   True
888         0       3  female   NaN      1      2  23.4500         S   Third  woman       False   NaN  Southampton     no  False
889         1       1    male  26.0      0      0  30.0000         C   First    man        True     C    Cherbourg    yes   True
890         0       3    male  32.0      0      0   7.7500         Q   Third    man        True   NaN   Queenstown     no   True

891 rows × 15 columns

sns.set_theme(style="darkgrid")
ax = sns.countplot(x="embark_town", data=titanic)

[Figure: count plot of embark_town]

What if the feature has too many distinct values, and we can't show all of them in one plot?

# count the distinct values first (value_counts sorts by frequency), then keep the top n most frequent; here n = 2 as an example

sub_index = titanic['class'].value_counts().index[:2]
sub_data = titanic[titanic['class'].isin(sub_index)]
sub_data = sub_data.reset_index(drop=True)

ax = sns.countplot(x="class", data=sub_data)


[Figure: count plot of the two most frequent class values]
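If we need this top-n filter for several columns, the same steps can be wrapped in a small helper. The sketch below is one possible way to do it; the function name `top_n_subset` and the toy `city` column are hypothetical, introduced only for illustration.

```python
import pandas as pd
import seaborn as sns


def top_n_subset(df, column, n):
    """Hypothetical helper: keep rows whose `column` value is among the n most frequent."""
    # value_counts() sorts by frequency, highest first
    top_values = df[column].value_counts().index[:n].tolist()
    subset = df[df[column].isin(top_values)].reset_index(drop=True)
    return subset, top_values


# Hypothetical toy data (an assumption for illustration only)
toy = pd.DataFrame({"city": ["A", "B", "A", "C", "A", "B", "D"]})

subset, top_values = top_n_subset(toy, "city", n=2)

# Pass the same labels as `order` so the bars go from most to least frequent
ax = sns.countplot(x="city", data=subset, order=top_values)
```

Returning the labels together with the filtered rows lets us reuse them for the `order` argument instead of recomputing the counts.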

# we can also set the bar order explicitly; reversing sub_index gives ascending order by count
ax = sns.countplot(x="class", data=sub_data, order=sub_index[::-1])

[Figure: count plot of class with bars in ascending order of count]
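The `order` argument takes the category labels in the order the bars should be drawn. As an alternative to reversing an index, a minimal sketch (on a made-up `grade` column, which is an assumption for illustration) can ask `value_counts` for ascending order directly:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data (an assumption for illustration only)
toy = pd.DataFrame({"grade": ["b", "a", "b", "c", "b", "a"]})

# Labels sorted from least to most frequent
ascending_order = toy["grade"].value_counts(ascending=True).index.tolist()

fig, ax = plt.subplots()
sns.countplot(x="grade", data=toy, order=ascending_order, ax=ax)

# Render once so the tick labels are populated, then read them back
fig.canvas.draw()
tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

The tick labels on the x axis should follow the list we passed in, so any custom ordering (alphabetical, by count, or hand-picked) works the same way.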

Now, how do we show the value counts for two categorical variables at once? Pass the second variable as hue:

ax = sns.countplot(x="class", hue="who", data=titanic)

[Figure: count plot of class, grouped by who]
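With `hue`, each bar is the count of one combination of the two variables. To check the numbers behind the bars, we can build the same counts as a table with `pd.crosstab`. The sketch below uses made-up `cabin` and `group` columns, which are assumptions for illustration only.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data (an assumption for illustration only)
toy = pd.DataFrame({
    "cabin": ["First", "First", "Third", "Third", "Third", "First"],
    "group": ["man", "woman", "man", "man", "child", "woman"],
})

# Tabular view of the counts that the grouped bars encode:
# one row per `cabin` value, one column per `group` value
counts = pd.crosstab(toy["cabin"], toy["group"])
print(counts)

# One cluster of bars per `cabin` value, one color per `group` value
ax = sns.countplot(x="cabin", hue="group", data=toy)
```

Combinations that never occur show up as 0 in the table and as a missing bar in the plot.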


Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.