Data Visualization

Seaborn

Hyerang Raina Kim

4 min readAug 14, 2020

✔️ Having identical statistics doesn’t mean the individual data sets are all equal!!

Visualizing data by types: Numeric x Numeric

Visualizing data by types: Numeric x Categorical

import seaborn as sns# example datasets are given by seaborn. Imported datasets can be used just the same as we used earlier with pandas.raw = sns.load_datatset('tips')

Seaborn Basic function structure

sns.scatterplot(data=dataframe, x='total_bill', y='tip', hue='sex')

Data Distribution (Numeric vs. Numeric)

relplot(data=dataframe, x=<column>, y=<column>, hue=<column>, kind='scatter)

kind options: ‘scatter’(default), ‘line’

sns.relplot(data=raw, x='tip', y='total_bill')

jointplot(data = df, x = <coloumn>, y=<column>, kind = 'scatter)

kind options

❓ ‘scatter’(default): point

❓ ‘reg’: point + regression

❓ ‘kde’: cumulative distribution chart like map

sns.jointplot(data = raw, x = 'tip', y = 'total_bill')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'kde')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'regg')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'hex')

Pairplot(data = df)

Visualize the relationship between each two column in the entire numeric data column in data frame

sns.pairplot(data = raw)

sns.pairplot(data = raw, hue = 'sex')

Data Distribution (Numeric vs. Categorical)

sns.boxplot(data = raw, x = 'day', y = 'tip)

The line in the box indicates where the datasets are heavily weighted and the dots above (could be placed at the bottom) indicates unusual data.

sns.boxplot(data = raw, x = 'day, y = 'tip', hue = 'smoker')

Boxplot does not show the individual value for each data so if the amount of data is low, we can not roughly estimate with boxplot.

sns.swarmplot(data = raw, x = 'day', y = 'tip')

sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge = True)

sns.barplot(data = raw, x = 'size', y = 'tip')

sns.barplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

Data Distribution (Numeric vs. Categorical vs. Categorical)

If using heatmap, we can see the entire two categorical data distribution of numerical data value all in one by using color.

df = raw.pivot_table(index = 'day', columns = 'size', values = 'tip', aggfunc = 'mean')

sns.heatmap(data = df)

sns.heatmap(data = df, annot = True)

sns.heatmap(data = df, annot = True, fmt = '.2f')

sns.heatmap(data = df, annot = True, fmt = '.2f', cmap = 'Blues')

fmt options: ‘.1f’, ‘.2f’, ‘.3f’ …
cmap options: Reds, Blues, vlag, Pastel1