Principal Components Analysis Recitation 🍕

Week 10

Author

Jessica Cooperstone

Introduction

Today is the first recitation for Module 4, where we bring together much of the material from the first three modules of this course. We will conduct a principal components analysis (PCA) in R and visualize the results using tools we have already learned, picking up some new wrangling and visualization tips along the way.

A picture of NY style gooey pizza — Source

library(tidyverse) # everything
library(readxl) # reading in excel sheets
library(factoextra) # easy PCA plotting
library(glue) # easy pasting
library(ggrepel) # repelling labels away from their points
library(patchwork) # for combining and arranging plots

Read in data

We will be using pizza data: nutritional information collected on 300 grocery store pizzas from 10 brands, compiled by f-imp and posted to GitHub.

pizza <- read_csv(file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv")

Rows: 300 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): brand
dbl (8): id, mois, prot, fat, ash, sodium, carb, cal

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Overall, how different are the pizza brands in this dataset from one another?

1. Run a PCA
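
One possible way to approach this step is sketched below. It assumes you want to drop the `id` column (an identifier rather than a measurement) and to center and scale the remaining numeric variables, since they are recorded in different units. The object name `pizza_pca` is a placeholder, and `prcomp()` is only one of several PCA functions available in R.

```r
library(tidyverse)

# Same file as in "Read in data"
pizza <- read_csv(
  file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv",
  show_col_types = FALSE
)

# Keep only the numeric measurements; id is a label, not a variable (assumption)
pizza_pca <- pizza %>%
  select(where(is.numeric), -id) %>%
  prcomp(center = TRUE, scale. = TRUE) # scaling is an assumption, not a requirement

# Standard deviation and proportion of variance for each component
summary(pizza_pca)
```

The scores live in `pizza_pca$x`, the loadings in `pizza_pca$rotation`, and the component standard deviations in `pizza_pca$sdev`.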

2. Make a scree plot of the percent variance explained by each component
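
A minimal sketch of a scree plot follows. It repeats the data import and PCA so that it runs on its own, and the names `pizza_pca` and `importance` are placeholders. The percent variance is computed by hand from the component standard deviations, and `factoextra::fviz_eig()` is shown as a shorter alternative.

```r
library(tidyverse)
library(factoextra)

pizza <- read_csv(
  file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv",
  show_col_types = FALSE
)

# Centered, scaled PCA on the numeric columns without id (assumption)
pizza_pca <- pizza %>%
  select(where(is.numeric), -id) %>%
  prcomp(center = TRUE, scale. = TRUE)

# Variance of each PC is sdev^2; convert to percent of the total
importance <- tibble(
  PC = factor(seq_along(pizza_pca$sdev)),
  percent_var = 100 * pizza_pca$sdev^2 / sum(pizza_pca$sdev^2)
)

scree_manual <- ggplot(importance, aes(x = PC, y = percent_var)) +
  geom_col() +
  labs(x = "Principal component",
       y = "Percent variance explained",
       title = "Scree plot of pizza PCA") +
  theme_minimal()

scree_manual

# One-line alternative from factoextra
fviz_eig(pizza_pca, addlabels = TRUE)
```

Building the plot by hand takes a few more lines, but it returns an ordinary ggplot object that you can restyle with the tools from earlier modules.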

3. Make a scores plot of samples, coloring each sample by its brand
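
The sketch below shows one way to build a scores plot with ggplot2. It again repeats the import and PCA so it is self-contained. The names `scores` and `pct` are placeholders, and plotting PC1 against PC2 is an assumption about which components are of interest.

```r
library(tidyverse)
library(glue)

pizza <- read_csv(
  file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv",
  show_col_types = FALSE
)

pizza_pca <- pizza %>%
  select(where(is.numeric), -id) %>%
  prcomp(center = TRUE, scale. = TRUE)

# Re-attach the metadata to the scores (row order is unchanged by prcomp)
scores <- bind_cols(
  pizza %>% select(brand, id),
  as_tibble(pizza_pca$x)
)

# Percent variance per component, used for the axis labels
pct <- round(100 * pizza_pca$sdev^2 / sum(pizza_pca$sdev^2), 1)

scores_plot <- ggplot(scores, aes(x = PC1, y = PC2, color = brand)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey60") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey60") +
  geom_point(alpha = 0.8) +
  labs(x = glue("PC1 ({pct[1]}%)"),
       y = glue("PC2 ({pct[2]}%)"),
       color = "Brand",
       title = "Scores plot of pizza PCA") +
  theme_minimal()

scores_plot
```

Binding `brand` back onto the scores works here because `prcomp()` returns one row of scores per input row, in the same order.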

4. Make a loadings plot of variables
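
A loadings plot shows how each original variable contributes to the components. The sketch below is one way to draw it. It repeats the import and PCA so it runs alone, and `loadings` is a placeholder name. The loadings are taken from `pizza_pca$rotation`, and `ggrepel` keeps the variable labels off the points.

```r
library(tidyverse)
library(ggrepel)

pizza <- read_csv(
  file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv",
  show_col_types = FALSE
)

pizza_pca <- pizza %>%
  select(where(is.numeric), -id) %>%
  prcomp(center = TRUE, scale. = TRUE)

# One row per variable; the matrix row names become a column
loadings <- as_tibble(pizza_pca$rotation, rownames = "variable")

loadings_plot <- ggplot(loadings, aes(x = PC1, y = PC2, label = variable)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey60") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey60") +
  geom_point() +
  geom_text_repel() +
  labs(x = "PC1", y = "PC2", title = "Loadings plot of pizza PCA") +
  theme_minimal()

loadings_plot
```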

5. Create either a biplot, or a visualization that shows both your scores and loadings plot together.
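
Two options are sketched here, both under the same assumptions as before (centered, scaled PCA without `id`, placeholder object names). The first places a scores plot and a loadings plot side by side with patchwork. The second uses `fviz_pca_biplot()` from factoextra to overlay samples and variables on one set of axes.

```r
library(tidyverse)
library(factoextra)
library(ggrepel)
library(patchwork)

pizza <- read_csv(
  file = "https://raw.githubusercontent.com/f-imp/Principal-Component-Analysis-PCA-over-3-datasets/master/datasets/Pizza.csv",
  show_col_types = FALSE
)

pizza_pca <- pizza %>%
  select(where(is.numeric), -id) %>%
  prcomp(center = TRUE, scale. = TRUE)

scores <- bind_cols(pizza %>% select(brand), as_tibble(pizza_pca$x))
loadings <- as_tibble(pizza_pca$rotation, rownames = "variable")

scores_plot <- ggplot(scores, aes(x = PC1, y = PC2, color = brand)) +
  geom_point(alpha = 0.8) +
  labs(title = "Scores", color = "Brand") +
  theme_minimal()

loadings_plot <- ggplot(loadings, aes(x = PC1, y = PC2, label = variable)) +
  geom_point() +
  geom_text_repel() +
  labs(title = "Loadings") +
  theme_minimal()

# Option 1: side-by-side panels with patchwork
combined <- scores_plot + loadings_plot
combined

# Option 2: a single biplot, samples colored by brand, only variables labeled
fviz_pca_biplot(pizza_pca,
                habillage = factor(pizza$brand),
                label = "var",
                repel = TRUE)
```

The side-by-side version keeps scores and loadings on their own scales, which can be easier to read. The biplot makes it easier to see which variables point toward which brands.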