Data Visualization
Seaborn
✔️ Having identical statistics doesn’t mean the individual data sets are all equal!!

Visualizing data by types: Numeric x Numeric

Visualizing data by types: Numeric x Categorical

import seaborn as sns# example datasets are given by seaborn. Imported datasets can be used just the same as we used earlier with pandas.raw = sns.load_datatset('tips')

Seaborn Basic function structure
sns.scatterplot(data=dataframe, x='total_bill', y='tip', hue='sex')
Data Distribution (Numeric vs. Numeric)
relplot(data=dataframe, x=<column>, y=<column>, hue=<column>, kind='scatter)
- kind options: ‘scatter’(default), ‘line’
sns.relplot(data=raw, x='tip', y='total_bill')

jointplot(data = df, x = <coloumn>, y=<column>, kind = 'scatter)
- kind options
❓ ‘scatter’(default): point
❓ ‘reg’: point + regression
❓ ‘kde’: cumulative distribution chart like map
sns.jointplot(data = raw, x = 'tip', y = 'total_bill')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'kde')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'regg')

sns.jointplot(data = raw, x = 'tip', y = 'total_bill', kind = 'hex')

Pairplot(data = df)
Visualize the relationship between each two column in the entire numeric data column in data frame
sns.pairplot(data = raw)

sns.pairplot(data = raw, hue = 'sex')

Data Distribution (Numeric vs. Categorical)
sns.boxplot(data = raw, x = 'day', y = 'tip)

The line in the box indicates where the datasets are heavily weighted and the dots above (could be placed at the bottom) indicates unusual data.
sns.boxplot(data = raw, x = 'day, y = 'tip', hue = 'smoker')

Boxplot does not show the individual value for each data so if the amount of data is low, we can not roughly estimate with boxplot.
sns.swarmplot(data = raw, x = 'day', y = 'tip')

sns.swarmplot(data = raw, x = 'day', y = 'tip', hue = 'smoker', dodge = True)

sns.barplot(data = raw, x = 'size', y = 'tip')

sns.barplot(data = raw, x = 'size', y = 'tip', hue = 'sex')

Data Distribution (Numeric vs. Categorical vs. Categorical)
If using heatmap, we can see the entire two categorical data distribution of numerical data value all in one by using color.
df = raw.pivot_table(index = 'day', columns = 'size', values = 'tip', aggfunc = 'mean')

sns.heatmap(data = df)

sns.heatmap(data = df, annot = True)

sns.heatmap(data = df, annot = True, fmt = '.2f')

sns.heatmap(data = df, annot = True, fmt = '.2f', cmap = 'Blues')

- fmt options: ‘.1f’, ‘.2f’, ‘.3f’ …
- cmap options: Reds, Blues, vlag, Pastel1