Principal Components Analysis Recitation 🍕

Week 10

Author

Jessica Cooperstone

Introduction

Today is the first recitation for Module 4, where we put together much of the material from the first three modules of this course. Today's material covers conducting a principal components analysis (PCA) in R and visualizing the results, using some tools we've already learned plus a few new wrangling and visualization tips along the way.

library(tidyverse) # core wrangling and plotting packages (readr, dplyr, ggplot2, ...)
library(readxl) # reading in Excel sheets
library(factoextra) # easy PCA plotting
library(glue) # easy pasting
library(ggrepel) # repelling labels away from their points
library(patchwork) # for combining and arranging plots

Read in data

We will use a pizza dataset containing nutritional information for 300 grocery store pizzas from 10 brands, compiled by f-imp and posted to GitHub.

pizza <- read_csv(file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv")
Rows: 300 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): brand
dbl (8): id, mois, prot, fat, ash, sodium, carb, cal

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
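Before running the PCA, note that only numeric measurement columns belong in it: the column specification above shows that `id` is read in as a number, but it is a row identifier rather than a nutritional measurement, and `brand` is a text label. Below is a minimal base R sketch of steps 1 and 2, assuming a scaled PCA via `prcomp()` is appropriate because the nutrition variables are likely on different numeric scales. So that it runs offline, it uses R's built-in `iris` data as a stand-in (four numeric columns plus the `Species` label); it is not the pizza data.

```r
# Stand-in data: R's built-in iris data (four numeric measurements plus a
# Species label), used only so this sketch runs offline. For the pizza data,
# you would drop the identifier `id` and the label `brand` instead.
label_cols <- c("Species")
numeric_data <- iris[, setdiff(names(iris), label_cols)]

# Every column passed to the PCA should be a numeric measurement
stopifnot(all(vapply(numeric_data, is.numeric, logical(1))))

# Center each variable and scale it to unit variance so that variables on
# larger numeric scales do not dominate the components
pca <- prcomp(numeric_data, center = TRUE, scale. = TRUE)

# Percent of total variance explained by each PC (the scree plot values);
# sdev^2 are the variances (eigenvalues) of the components
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var, 1))

# Quick base R scree plot
barplot(pct_var, names.arg = colnames(pca$x),
        ylab = "% variance explained", main = "Scree plot")
```

For the pizza data, the analogous call would be something like `prcomp(select(pizza, -id, -brand), scale. = TRUE)`, and calling `summary()` on the result prints the same proportions in its "Proportion of Variance" row. factoextra's `fviz_eig()` can also draw a scree plot directly from a `prcomp` object.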

How different are the pizza brands from one another overall?

1. Run a PCA

2. Make a scree plot of the percent variance explained by each component

3. Make a scores plot of samples, coloring each sample by its brand

4. Make a loadings plot of the variables

5. Create either a biplot or a visualization that shows your scores and loadings plots together.
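For steps 3 to 5, the two parts of a `prcomp` result to plot are the scores (`$x`, one row per sample) and the loadings (`$rotation`, one row per original variable). The following is a hedged base R sketch, again using the built-in `iris` data as an offline stand-in, with `Species` playing the role of `brand`; the helper `axis_lab()` is hypothetical, introduced only to label each axis with its percent variance explained.

```r
# Stand-in data: built-in iris, with Species in the role of pizza `brand`
groups <- iris$Species
pca <- prcomp(iris[, setdiff(names(iris), "Species")],
              center = TRUE, scale. = TRUE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Scores: one row per sample, one column per principal component
scores <- data.frame(pca$x, group = groups)

# Loadings: one row per original variable, one column per principal component
loadings <- data.frame(variable = rownames(pca$rotation), pca$rotation,
                       row.names = NULL)

# Hypothetical helper: builds an axis label of the form "PC1 (xx.x%)"
axis_lab <- function(k) sprintf("PC%d (%.1f%%)", k, pct_var[k])

op <- par(mfrow = c(1, 2))

# Step 3: scores plot, each sample colored by its group
plot(scores$PC1, scores$PC2, col = as.integer(scores$group), pch = 19,
     xlab = axis_lab(1), ylab = axis_lab(2), main = "Scores")
legend("topright", legend = levels(scores$group),
       col = seq_along(levels(scores$group)), pch = 19)

# Step 4: loadings plot, one arrow from the origin per variable
plot(loadings$PC1, loadings$PC2, type = "n", xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = axis_lab(1), ylab = axis_lab(2), main = "Loadings")
arrows(0, 0, loadings$PC1, loadings$PC2, length = 0.08)
text(loadings$PC1, loadings$PC2, labels = loadings$variable, pos = 3)

par(op)

# Step 5: base R biplot, overlaying scores and loadings on one set of axes
biplot(pca)
```

factoextra's `fviz_pca_ind()`, `fviz_pca_var()`, and `fviz_pca_biplot()` produce ggplot2 versions of these same plots, and patchwork can place separate ggplot2 scores and loadings plots side by side.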
