# Visualizing Correlations

Week 8

## Introduction

We will be building on our ggplot101 and ggplot102 lessons, which focused on an overall understanding of the grammar of graphics, basic syntax, adding data, aesthetic mappings, geoms, facets, scales, labels, and themes. Today we are going to apply what we learned to better understand and visualize correlations within our data. To do this, we will also use some ggplot extension packages.

### Load libraries and data

Before we get started, let's load our libraries.

```
library(tidyverse)
```

Today we are going to continue using the same real research data from my group that we used last week. We will be reading in the supplementary data from a paper written by Michael Dzakovich and published in The Plant Genome. The data is in an Excel worksheet, so we will use the function `read_excel()` from the tidyverse (but not core tidyverse) package `readxl`. We want to import Supplemental Table 1. You can indicate which sheet you want to import in the arguments to `read_excel()`.

```
alkaloids <- readxl::read_excel("tpg220192-sup-0002-supmat.xlsx",
                                sheet = "S1 Raw Data Diversity Panel")
```

```
knitr::kable(head(alkaloids))
```

ID | Year | Environment | Block | Genotype | Plot_Source | Class | Origin | Provence | Blanca_Cluster1 | Blanca_Cluster2 | Passport_Species | Passport_Classification | Sim_Grouping | Latitude | Longitude | Dehydrotomatidine | Tomatidine | Dehydrotomatine1 | Dehydrotomatine2 | TotalDehydrotomatine | Tomatine | Hydroxytomatine1 | Hydroxytomatine2 | Hydroxytomatine3 | Hydroxytomatine4 | TotalHydroxytomatine | Acetoxytomatine1 | Acetoxytomatine2 | Acetoxytomatine3 | TotalAcetoxytomatine | DehydrolycoperosideFGdehydroesculeosideA | LycoperosideFGEsculeosideA1 | LycoperosideFGEsculeosideA2 | TotalLycoperosideFGEsculeosideA | EsculeosideB1 | EsculeosideB2 | EsculeosideB3 | TotalEsculeosideB | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7805 | 2018 | FreEarly18 | 1 | CULBPT_05_11 | 2K17-7724 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.000000 | 0.000000 | 5.726010 | 0.350331 | 6.076341 | 172.66244 | 1.079190 | 86.72742 | 17.831892 | 9.142607 | 114.78111 | 18.902399 | 56.307182 | 1.890053 | 77.099634 | 5.125904 | 10.277325 | 336.8893 | 347.1666 | 3.787979 | 0.924195 | 3.943230 | 8.655404 | 731.5675 |
7898 | 2017 | Fre17 | 2 | CULBPT_05_11 | 2K9-8584 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.000000 | 0.169068 | 0.000000 | 0.000000 | 0.000000 | 55.47329 | 0.000000 | 53.32292 | 13.630697 | 4.841762 | 71.79538 | 3.557348 | 4.107289 | 0.000000 | 7.664637 | 2.905500 | 5.548102 | 199.6694 | 205.2175 | 8.978931 | 1.897850 | 6.794690 | 17.671471 | 360.8969 |
7523 | 2018 | FreLate18 | 2 | CULBPT_05_11 | 2K17-7724 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.135675 | 0.680554 | 5.073552 | 0.000000 | 5.073552 | 123.85835 | 0.000000 | 50.90989 | 6.503939 | 1.368847 | 58.78268 | 3.931461 | 4.123222 | 0.623340 | 8.678023 | 2.185082 | 5.104115 | 259.0177 | 264.1218 | 4.049145 | 0.000000 | 6.749386 | 10.798531 | 474.3143 |
7724 | 2017 | Fre17 | 1 | CULBPT_05_11 | 2K9-8584 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.054300 | 0.497261 | 19.419087 | 0.000000 | 19.419087 | 239.01264 | 0.000000 | 36.02318 | 8.557673 | 7.483933 | 52.06478 | 3.341048 | 16.415426 | 1.057100 | 20.813574 | 0.000000 | 0.000000 | 203.0061 | 203.0061 | 1.678210 | 0.000000 | 2.349633 | 4.027843 | 538.8955 |
7427 | 2018 | FreLate18 | 1 | CULBPT_05_11 | 2K17-7724 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.139454 | 0.553801 | 0.000000 | 0.000000 | 0.000000 | 64.31783 | 0.879435 | 39.91027 | 7.228388 | 3.015298 | 51.03339 | 0.000000 | 3.131685 | 0.000000 | 3.131685 | 0.000000 | 4.054211 | 299.5687 | 303.6229 | 10.146857 | 0.000000 | 4.882339 | 15.029197 | 437.8283 |
7854 | 2018 | FreEarly18 | 2 | CULBPT_05_11 | 2K17-7724 | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 | SLL | SLL_processing_NY | Arid | 40.712800000000001 | -74.006 | 0.049700 | 0.262174 | 3.737579 | 0.000000 | 3.737579 | 68.44913 | 0.000000 | 23.86864 | 13.506299 | 1.456982 | 38.83192 | 4.657902 | 4.259007 | 0.605729 | 9.522638 | 9.832149 | 11.595595 | 459.5205 | 471.1161 | 6.839930 | 0.486236 | 5.595751 | 12.921917 | 614.7233 |

This dataset has 605 observations, with data about different steroidal alkaloids in the fruits of different tomato germplasm grown in 3 locations across 2 years. It also includes some other metadata.

For those who are chemistry-minded, here is a little pathway context for the compounds we are investigating today.

## Scatterplots

A very simple first-pass way to find out whether your data contain relationships is to make scatterplots of the variables you think might be correlated. Let's start by investigating how the different alkaloid concentrations are correlated with each other. First we will see how alpha-tomatine content (`Tomatine`) is related to total steroidal alkaloid content (`Total`).

```
alkaloids %>%
  ggplot(aes(x = Total, y = Tomatine)) +
  geom_point() +
  labs(x = "Total Steroidal Alkaloids (µg/100 g)",
       y = "Alpha-Tomatine (µg/100 g)")
```

It seems like there are two separate groups here: points with a steeper slope and points with a shallower slope. We can color our points by the `Class` of tomato each data point comes from; maybe that will reveal something. In the meantime, let's make this plot look a bit nicer. The package `scales` has some nice functions that help you control the scaling of your plots, in this case formatting the numbers on each axis with `comma_format()`. I am also using the hex codes for a color-blind-friendly qualitative color scheme developed by Paul Tol.

```
library(scales)
```

```
Attaching package: 'scales'
```

```
The following object is masked from 'package:purrr':
discard
```

```
The following object is masked from 'package:readr':
col_factor
```

```
alkaloids %>%
  ggplot(aes(x = Total, y = Tomatine, color = Class)) +
  geom_point(alpha = 0.8) +
  scale_x_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_y_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_color_manual(values = c("#4477AA", "#EE6677", "#228833", "#CCBB44", "#66CCEE")) +
  theme_minimal() +
  labs(x = "Total Steroidal Alkaloids (µg/100 g fresh weight)",
       y = "Alpha-Tomatine (µg/100 g fresh weight)",
       title = "Relationship between Alpha-Tomatine and Total Steroidal Alkaloids \nAcross Different Germplasm in the Red Tomato Clade")
```

All of the tomatoes at the two extremes of this plot are from the `Class` Wild Cherry. What would this look like if we removed these fruits? Note that I adjusted the color scale to remove the hex code associated with Wild Cherry while keeping the other colors the same.

```
alkaloids %>%
  filter(Class != "Wild Cherry") %>%
  ggplot(aes(x = Total, y = Tomatine, color = Class)) +
  geom_point() +
  scale_x_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_y_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_color_manual(values = c("#4477AA", "#EE6677", "#228833", "#CCBB44")) +
  theme_minimal() +
  labs(x = "Total Steroidal Alkaloids (µg/100 g fresh weight)",
       y = "Alpha-Tomatine (µg/100 g fresh weight)",
       title = "Relationship between Alpha-Tomatine and Total Steroidal Alkaloids \nAcross Different Germplasm in the Red Tomato Clade")
```

## Adding `geom_smooth()`

```
alkaloids %>%
  ggplot(aes(x = Total, y = Tomatine, color = Class)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "lm") +
  scale_x_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_y_continuous(labels = comma_format(big.mark = ",")) + # requires the package scales
  scale_color_manual(values = c("#4477AA", "#EE6677", "#228833", "#CCBB44", "#66CCEE")) +
  theme_minimal() +
  labs(x = "Total Steroidal Alkaloids (µg/100 g fresh weight)",
       y = "Alpha-Tomatine (µg/100 g fresh weight)",
       title = "Relationship between Alpha-Tomatine and Total Steroidal Alkaloids \nAcross Different Germplasm in the Red Tomato Clade")
```

```
`geom_smooth()` using formula = 'y ~ x'
```
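Because `color = Class` sits in the global `aes()`, `geom_smooth(method = "lm")` fits a separate straight line (`y ~ x`, as the message says) for each class. The sketch below reproduces that idea in base R with `lm()`. It uses the built-in `mtcars` data as a stand-in (grouping by `cyl` instead of `Class`), since the alkaloid spreadsheet is not bundled here, so the variables are illustrative only.

```r
# Stand-in data: built-in mtcars, grouped by cyl (playing the role of Class)
groups <- split(mtcars, mtcars$cyl)

# One linear model y ~ x per group, like geom_smooth(method = "lm") per colour
fits <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# Slope of each group's fitted line
slopes <- sapply(fits, function(f) coef(f)[["wt"]])
print(slopes)
```

Different slopes per group are what give the fitted lines their different steepness in the plot.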

### Faceted scatterplots

We may be able to see trends by tomato class more easily if we facet our scatterplots. I am also demonstrating here how you can transform the values you map inside `aes()`: here I am converting data reported in µg/100 g to mg/100 g by dividing by 1000, and changing the axis labels accordingly.

```
alkaloids %>%
  ggplot(aes(x = Total/1000, y = Tomatine/1000, color = Class)) +
  geom_point(alpha = 0.8) +
  scale_color_manual(values = c("#4477AA", "#EE6677", "#228833", "#CCBB44", "#66CCEE")) +
  facet_wrap(vars(Class), scales = "free") +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "Total Steroidal Alkaloids (mg/100 g fresh weight)",
       y = "Alpha-Tomatine (mg/100 g fresh weight)",
       title = "Relationship between Alpha-Tomatine and Total Steroidal Alkaloids \nAcross Different Germplasm in the Red Tomato Clade")
```
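Dividing both variables by 1000 changes the axis units but not the relationship: Pearson's correlation coefficient is unchanged when a variable is multiplied by a positive constant. A quick check, using the `Total` and `Tomatine` values from the six rows printed by `head(alkaloids)` above:

```r
# Total and Tomatine in micrograms per 100 g, from the six rows shown above
total    <- c(731.5675, 360.8969, 474.3143, 538.8955, 437.8283, 614.7233)
tomatine <- c(172.66244, 55.47329, 123.85835, 239.01264, 64.31783, 68.44913)

r_ug <- cor(total, tomatine)                # micrograms per 100 g
r_mg <- cor(total / 1000, tomatine / 1000)  # milligrams per 100 g

all.equal(r_ug, r_mg)  # TRUE: rescaling does not change r
```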

## Correlation matrix with `cor()`

`cor()` is a function from base R that will allow you to create a correlation matrix.

Before we use `cor()`, we will clean up our dataset to include only the variables we want to correlate.

```
colnames(alkaloids)
```

```
[1] "ID"
[2] "Year"
[3] "Environment"
[4] "Block"
[5] "Genotype"
[6] "Plot_Source"
[7] "Class"
[8] "Origin"
[9] "Provence"
[10] "Blanca_Cluster1"
[11] "Blanca_Cluster2"
[12] "Passport_Species"
[13] "Passport_Classification"
[14] "Sim_Grouping"
[15] "Latitude"
[16] "Longitude"
[17] "Dehydrotomatidine"
[18] "Tomatidine"
[19] "Dehydrotomatine1"
[20] "Dehydrotomatine2"
[21] "TotalDehydrotomatine"
[22] "Tomatine"
[23] "Hydroxytomatine1"
[24] "Hydroxytomatine2"
[25] "Hydroxytomatine3"
[26] "Hydroxytomatine4"
[27] "TotalHydroxytomatine"
[28] "Acetoxytomatine1"
[29] "Acetoxytomatine2"
[30] "Acetoxytomatine3"
[31] "TotalAcetoxytomatine"
[32] "DehydrolycoperosideFGdehydroesculeosideA"
[33] "LycoperosideFGEsculeosideA1"
[34] "LycoperosideFGEsculeosideA2"
[35] "TotalLycoperosideFGEsculeosideA"
[36] "EsculeosideB1"
[37] "EsculeosideB2"
[38] "EsculeosideB3"
[39] "TotalEsculeosideB"
[40] "Total"
```

From looking at the column names and reading the supplemental information, we can see that some columns are composites of others. For example, `TotalAcetoxytomatine` = `Acetoxytomatine1` + `Acetoxytomatine2` + `Acetoxytomatine3`. So we want to pull only the columns that represent the total for any given alkaloid. There should be 10 columns.
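Before dropping the component columns, you can confirm that a composite really is the sum of its parts. This sketch uses the acetoxytomatine values from the six rows printed by `head(alkaloids)` above; on the full data you would run the same comparison on `alkaloids` itself.

```r
# Acetoxytomatine columns from the six rows shown by head(alkaloids)
acetoxy <- data.frame(
  Acetoxytomatine1     = c(18.902399, 3.557348, 3.931461, 3.341048, 0.000000, 4.657902),
  Acetoxytomatine2     = c(56.307182, 4.107289, 4.123222, 16.415426, 3.131685, 4.259007),
  Acetoxytomatine3     = c(1.890053, 0.000000, 0.623340, 1.057100, 0.000000, 0.605729),
  TotalAcetoxytomatine = c(77.099634, 7.664637, 8.678023, 20.813574, 3.131685, 9.522638)
)

# Sum the three components row by row and compare with the reported total
parts_sum <- rowSums(acetoxy[, c("Acetoxytomatine1", "Acetoxytomatine2", "Acetoxytomatine3")])
max_diff  <- max(abs(parts_sum - acetoxy$TotalAcetoxytomatine))
max_diff  # essentially zero, so the total column is redundant with its parts
```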

```
# create a vector of the names we want to keep
alkaloid_total_names <- c("Dehydrotomatidine",
                          "Tomatidine",
                          "TotalDehydrotomatine",
                          "Tomatine",
                          "TotalHydroxytomatine",
                          "TotalAcetoxytomatine",
                          "DehydrolycoperosideFGdehydroesculeosideA",
                          "TotalLycoperosideFGEsculeosideA",
                          "TotalEsculeosideB",
                          "Total")
# make a new df including some metadata and the alkaloid_total_names
alkaloids_totals <- alkaloids %>%
  select(ID, Year, Environment, Block, Genotype, Class, all_of(alkaloid_total_names))
# did it work? look at colnames()
colnames(alkaloids_totals)
```

```
[1] "ID"
[2] "Year"
[3] "Environment"
[4] "Block"
[5] "Genotype"
[6] "Class"
[7] "Dehydrotomatidine"
[8] "Tomatidine"
[9] "TotalDehydrotomatine"
[10] "Tomatine"
[11] "TotalHydroxytomatine"
[12] "TotalAcetoxytomatine"
[13] "DehydrolycoperosideFGdehydroesculeosideA"
[14] "TotalLycoperosideFGEsculeosideA"
[15] "TotalEsculeosideB"
[16] "Total"
```

Now we can create a correlation matrix to see how the concentration of each of our 10 alkaloid variables is correlated with the concentration of every other one (including the composite metric `Total`, which sums all the alkaloids). By default, `cor()` uses Pearson's correlation coefficient, but you can use Spearman (`method = "spearman"`) or Kendall (`method = "kendall"`) if you prefer. Check the documentation for `cor()` for more information.
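To see how the methods differ, here is a small made-up example (not from the alkaloid data) in which `y` rises with `x` in a perfectly monotonic but curved way. Pearson's coefficient measures linear association, so it falls below 1, while the rank-based Spearman and Kendall coefficients both equal 1.

```r
x <- 1:10
y <- x^3  # monotonic, but not a straight line

r_pearson  <- cor(x, y)                       # default: linear association
r_spearman <- cor(x, y, method = "spearman")  # Pearson's r computed on the ranks
r_kendall  <- cor(x, y, method = "kendall")   # based on concordant vs discordant pairs

round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 3)
```

Rank-based methods can be a reasonable choice when relationships look monotonic but curved, or when a few extreme values (like the Wild Cherry fruits above) dominate a Pearson correlation.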

```
alkaloids_cor <- alkaloids_totals %>%
  select(all_of(alkaloid_total_names)) %>%
  cor()
# look at our correlation matrix
knitr::kable(alkaloids_cor) # kable makes a nicely formatted table
```

| | Dehydrotomatidine | Tomatidine | TotalDehydrotomatine | Tomatine | TotalHydroxytomatine | TotalAcetoxytomatine | DehydrolycoperosideFGdehydroesculeosideA | TotalLycoperosideFGEsculeosideA | TotalEsculeosideB | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Dehydrotomatidine | 1.0000000 | 0.2974462 | 0.0324918 | 0.0238230 | 0.0099126 | 0.0322029 | 0.0305049 | 0.0761907 | 0.0282219 | 0.0708252 |
| Tomatidine | 0.2974462 | 1.0000000 | 0.3744672 | 0.3736949 | 0.1003558 | 0.0382981 | -0.0059964 | 0.0373649 | 0.0126724 | 0.2044979 |
| TotalDehydrotomatine | 0.0324918 | 0.3744672 | 1.0000000 | 0.9214859 | 0.2290192 | 0.4011257 | -0.0820469 | -0.1149682 | -0.1217560 | 0.5636969 |
| Tomatine | 0.0238230 | 0.3736949 | 0.9214859 | 1.0000000 | 0.0995212 | 0.1220596 | -0.1140360 | -0.1357819 | -0.1260377 | 0.3756155 |
| TotalHydroxytomatine | 0.0099126 | 0.1003558 | 0.2290192 | 0.0995212 | 1.0000000 | 0.3563506 | 0.0330078 | 0.0284887 | 0.0134806 | 0.4774036 |
| TotalAcetoxytomatine | 0.0322029 | 0.0382981 | 0.4011257 | 0.1220596 | 0.3563506 | 1.0000000 | -0.0865506 | -0.1106212 | -0.0947254 | 0.6782337 |
| DehydrolycoperosideFGdehydroesculeosideA | 0.0305049 | -0.0059964 | -0.0820469 | -0.1140360 | 0.0330078 | -0.0865506 | 1.0000000 | 0.8862982 | 0.7401116 | 0.4792108 |
| TotalLycoperosideFGEsculeosideA | 0.0761907 | 0.0373649 | -0.1149682 | -0.1357819 | 0.0284887 | -0.1106212 | 0.8862982 | 1.0000000 | 0.7779405 | 0.5222751 |
| TotalEsculeosideB | 0.0282219 | 0.0126724 | -0.1217560 | -0.1260377 | 0.0134806 | -0.0947254 | 0.7401116 | 0.7779405 | 1.0000000 | 0.4059824 |
| Total | 0.0708252 | 0.2044979 | 0.5636969 | 0.3756155 | 0.4774036 | 0.6782337 | 0.4792108 | 0.5222751 | 0.4059824 | 1.0000000 |

Note that the diagonal is composed entirely of 1s. This makes sense because the correlation of each alkaloid with itself is 1.
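A correlation matrix is also symmetric, since the correlation of A with B equals the correlation of B with A. Both properties are easy to check in base R. This sketch uses a few columns of the built-in `mtcars` data as a stand-in for `alkaloids_cor`.

```r
# Stand-in correlation matrix from built-in data
m <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

diag(m)         # each variable correlated with itself: all 1
isSymmetric(m)  # m[i, j] equals m[j, i]
```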

## Using `ggcorrplot()` from `ggcorrplot`

Use the function `ggcorrplot()` without any arguments besides the correlation matrix `alkaloids_cor`. In general, I think that if you want to make a bunch of correlation plots quickly and don't intend to publish them, `ggcorrplot()` works well, but the visuals of the plot are quite difficult to customize.

```
library(ggcorrplot)
ggcorrplot(alkaloids_cor)
```

This is not a perfect plot, but it's a good starting point. Correlation matrices are inherently symmetric, meaning that if we display only the top or bottom triangle, we do not lose any information. We will work on editing this plot in different ways to show more information and make it more beautiful.
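To see that one triangle carries all of the information, the sketch below blanks the upper triangle of a correlation matrix and then rebuilds the full matrix from what remains. As before, `mtcars` stands in for the alkaloid data.

```r
m <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

# keep only the diagonal and the lower triangle
lower <- m
lower[upper.tri(lower)] <- NA

# refill the upper triangle by mirroring the lower triangle (transpose)
rebuilt <- lower
rebuilt[upper.tri(rebuilt)] <- t(lower)[upper.tri(lower)]

all.equal(rebuilt, m)  # TRUE: nothing was lost
```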

We can show only the lower triangle (`type = "lower"`) and, at the same time, draw circles instead of squares (`method = "circle"`).

```
ggcorrplot(alkaloids_cor,
method = "circle",
type = "lower")
```


## Using `corrplot()` from `corrplot`

Similarly, you can use `corrplot()`, from a package built on base R plotting, to make correlation plots. The customization syntax here is quite different from what we've been working with in ggplot, but I wanted you to feel familiar with some base R tools.

```
library(corrplot)
```

```
corrplot 0.92 loaded
```

```
corrplot(alkaloids_cor, type = "lower")
```

I have used `corrplot()` in publications before and felt like I couldn't customize the plots as much as I wanted. While putting together this content, I learned some new ways to customize these plots that are actually very nice. Here are some of the parameters you can modify. You can also order your variables by hierarchical clustering.

First we will start (as we always do) by wrangling.

```
# create matrix for correlation
alkaloids_to_cor <- alkaloids_totals %>%
  select(all_of(alkaloid_total_names)) %>%
  as.matrix() # rcorr() needs a matrix
library(Hmisc) # does cor() but also computes significance levels
```

```
Attaching package: 'Hmisc'
```

```
The following objects are masked from 'package:dplyr':
src, summarize
```

```
The following objects are masked from 'package:base':
format.pval, units
```

```
# rcorr() returns a list: $r holds the correlations and $P the p-values
alkaloids_rcorr <- rcorr(alkaloids_to_cor, type = "pearson")
# create a vector of the alkaloid names for labeling
alkaloid_labels <- c("Dehydrotomatidine",
                     "Tomatidine",
                     "Dehydrotomatine",
                     "Alpha-Tomatine",
                     "Hydroxytomatine",
                     "Acetoxytomatine",
                     "Dehydrlycoperoside F, G, \nor Dehydroescueloside A",
                     "Lycoperoside F, G, \nor Escueloside A",
                     "Escueloside B",
                     "Total Steroidal Alkaloids")
# change row and column names of the correlation matrix
# so they are how we want them to be plotted
colnames(alkaloids_rcorr$r) <- alkaloid_labels
rownames(alkaloids_rcorr$r) <- alkaloid_labels
# change row and column names of the pvalue matrix
# so they are how we want them to be plotted
colnames(alkaloids_rcorr$P) <- alkaloid_labels
rownames(alkaloids_rcorr$P) <- alkaloid_labels
```

Now we are ready to plot.

```
corrplot(alkaloids_rcorr$r, # the correlation matrix
type = "lower", # lower triangle
tl.col = "black", # axis labels are black
p.mat = alkaloids_rcorr$P, # pvalue matrix
sig.level = 0.05, # how sig does a cor need to be to be included
insig = "blank", # do not display insignificant correlations
addCoef.col = "black", # display correlations in black
diag = FALSE, # don't show the diagonal (because this is all 1)
number.cex = 0.6) # size of correlation font
```
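`corrplot()` hides non-significant cells for us through `p.mat`, `sig.level`, and `insig = "blank"`. If you want the same filtering for a ggplot-based figure, one option is to build the p-value matrix yourself. The base R sketch below is a swapped-in alternative to `rcorr()`, not what the lesson uses: it calls `cor.test()` on each pair of columns, with `mtcars` standing in for the alkaloid data, and then blanks correlations whose p-value is above 0.05.

```r
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
r <- cor(vars)

# p-value for every pair of columns; the diagonal is left as NA
p <- matrix(NA_real_, nrow = ncol(vars), ncol = ncol(vars), dimnames = dimnames(r))
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      p[i, j] <- cor.test(vars[[i]], vars[[j]], method = "pearson")$p.value
    }
  }
}

# blank correlations that are not significant at the 0.05 level
r_sig <- r
r_sig[!is.na(p) & p > 0.05] <- NA
round(r_sig, 2)
```

Keep in mind that with many pairs, some correlations will pass a 0.05 cutoff by chance; this sketch applies no multiple-testing correction.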

## Using `ggcorr()` from `GGally`

Another ggplot extension package, `GGally`, has the function `ggcorr()`, which also creates correlation plots. `ggcorr()` plots are moderately customizable. They may work for some of you, so I'm sharing how to make them.

Note that, by default, `GGally::ggcorr()` does not take a correlation matrix; instead it takes the data you want to make a correlation matrix for. You can specify the correlation method in the arguments. The default is Pearson's correlation.

```
library(GGally)
```

```
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
```

```
to_corr <- alkaloids_totals %>%
  select(all_of(alkaloid_total_names))
ggcorr(to_corr)
```

There is only one labeled axis because these plots have no diagonal of self-correlations like the one we saw with `ggcorrplot()` and `corrplot()`.

We can now spend some time improving the aesthetics of our plot.

```
ggcorr(to_corr, # data for correlation
low = "#f1a340", # -1 correlation color
mid = "#f7f7f7", # 0 correlation color
high = "#998ec3") # 1 correlation color
```

```
ggcorr(to_corr,
low = "#f1a340", mid = "#f7f7f7", high = "#998ec3",
geom = "circle",
label = TRUE,
label_size = 2,
label_round = 2,
layout.exp = 3)
```

For this example, we have very long label names which are really difficult to wrap, but if your labels are more reasonable this may work well for you.

### `ggpairs()`

We can also use the function `GGally::ggpairs()` to make a matrix of correlation-related plots.

```
alkaloids_totals %>%
  ggpairs(columns = c("Tomatine", "TotalLycoperosideFGEsculeosideA", "Total"), # pick variables
          aes(color = Class))
```

Let's customize it a bit.

```
# remove zeroes since they can't be log transformed
# make log transformed columns
alkaloids_totals_log <- alkaloids_totals %>%
  filter(Tomatine != 0,
         Total != 0,
         TotalLycoperosideFGEsculeosideA != 0) %>%
  mutate(log10_tomatine = log10(Tomatine),
         log10_FGA = log10(TotalLycoperosideFGEsculeosideA),
         log10_total = log10(Total))
alkaloids_totals_log %>%
  ggpairs(columns = c("log10_tomatine", "log10_FGA", "log10_total"),
          aes(color = Class, alpha = 0.5), # note: alpha is set inside aes() here, which is unusual
          columnLabels = c("Alpha-Tomatine", "Lycoperoside F/G\nEsculeoside A", "Total Alkaloids"))
```
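Zeros are removed first because the base-10 logarithm of zero is negative infinity, which cannot be placed on a log axis and breaks a correlation calculation. A minimal sketch using the `TotalDehydrotomatine` values from the six rows printed by `head(alkaloids)` above, two of which are zero:

```r
# TotalDehydrotomatine from the six rows shown by head(alkaloids)
total_dehydrotomatine <- c(6.076341, 0, 5.073552, 19.419087, 0, 3.737579)

log10(total_dehydrotomatine)  # the two zeros become -Inf

# drop zeros first, as filter(... != 0) does above
nonzero <- total_dehydrotomatine[total_dehydrotomatine != 0]
log10_nonzero <- log10(nonzero)
```

The trade-off is that filtered rows drop out of every panel, so the log-scale plot describes only fruits with nonzero values for all three variables.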

## Manually making correlation plots with `reshape2::melt()` and `ggplot`

Because some of the correlation-specific packages are hard to customize, I am going to show you how to make your own plots by reshaping your data with `reshape2::melt()` and some base R functions, and then plotting with standard ggplot syntax.

```
library(reshape2) # contains melt()
```

```
Attaching package: 'reshape2'
```

```
The following object is masked from 'package:tidyr':
smiths
```

```
# take cor matrix and convert to df with 3 columns: Var1, Var2, and value
melted_alkaloids_cor <- melt(alkaloids_cor)
# what does it look like?
head(melted_alkaloids_cor)
```

```
Var1 Var2 value
1 Dehydrotomatidine Dehydrotomatidine 1.000000000
2 Tomatidine Dehydrotomatidine 0.297446153
3 TotalDehydrotomatine Dehydrotomatidine 0.032491778
4 Tomatine Dehydrotomatidine 0.023823011
5 TotalHydroxytomatine Dehydrotomatidine 0.009912624
6 TotalAcetoxytomatine Dehydrotomatidine 0.032202892
```
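If you'd rather not load another package, base R can produce the same long format: `as.table()` followed by `as.data.frame()` turns a matrix with row and column names into one row per cell, with the first column varying fastest, as in the `melt()` output above. This is a swapped-in alternative rather than what the lesson uses, and `mtcars` stands in for `alkaloids_cor`.

```r
m <- cor(mtcars[, c("mpg", "disp", "hp")])

# one row per matrix cell: Var1 (row name), Var2 (column name), value
long <- as.data.frame(as.table(m), responseName = "value")
head(long)
```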

A first-pass, minimalist plot:

```
melted_alkaloids_cor %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile()
```

Lots to fix! What if we want only the upper or lower triangle, since, again, this plot is symmetric?

### Upper triangle

Keep only the upper triangle.

```
# "save as"
alkaloids_upper <- alkaloids_cor
# use function lower.tri() and set the lower triangle all to NA
# then we can keep only the upper triangle
alkaloids_upper[lower.tri(alkaloids_upper)] <- NA
# melt to go back to long format
melted_alkaloids_upper <- melt(alkaloids_upper, na.rm = TRUE)
# did it work?
head(melted_alkaloids_upper) # yup
```

```
Var1 Var2 value
1 Dehydrotomatidine Dehydrotomatidine 1.00000000
11 Dehydrotomatidine Tomatidine 0.29744615
12 Tomatidine Tomatidine 1.00000000
21 Dehydrotomatidine TotalDehydrotomatine 0.03249178
22 Tomatidine TotalDehydrotomatine 0.37446722
23 TotalDehydrotomatine TotalDehydrotomatine 1.00000000
```

### Lower triangle

Create a lower triangle object to plot.

```
# "save as"
alkaloids_lower <- alkaloids_cor
# use function upper.tri() and set the upper triangle all to NA
# then we can keep only the lower triangle
alkaloids_lower[upper.tri(alkaloids_lower)] <- NA
# melt to go back to long format
melted_alkaloids_lower <- melt(alkaloids_lower, na.rm = TRUE)
# did it work?
head(melted_alkaloids_lower) # yup
```

```
Var1 Var2 value
1 Dehydrotomatidine Dehydrotomatidine 1.000000000
2 Tomatidine Dehydrotomatidine 0.297446153
3 TotalDehydrotomatine Dehydrotomatidine 0.032491778
4 Tomatine Dehydrotomatidine 0.023823011
5 TotalHydroxytomatine Dehydrotomatidine 0.009912624
6 TotalAcetoxytomatine Dehydrotomatidine 0.032202892
```
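Keeping one triangle plus the diagonal leaves n(n + 1)/2 of the n² cells, so for our 10 alkaloid variables the plot goes from 100 tiles to 55. A base R sketch of the same lower-triangle filtering, with `mtcars` as a stand-in (4 variables, so 4 × 5 / 2 = 10 rows):

```r
m <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])
m[upper.tri(m)] <- NA  # keep the diagonal and the lower triangle

long_lower <- as.data.frame(as.table(m), responseName = "value")
long_lower <- long_lower[!is.na(long_lower$value), ]  # like melt(..., na.rm = TRUE)
nrow(long_lower)  # 10
```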

Plot

```
# remember we made alkaloid_labels
print(alkaloid_labels)
```

```
[1] "Dehydrotomatidine"
[2] "Tomatidine"
[3] "Dehydrotomatine"
[4] "Alpha-Tomatine"
[5] "Hydroxytomatine"
[6] "Acetoxytomatine"
[7] "Dehydrlycoperoside F, G, \nor Dehydroescueloside A"
[8] "Lycoperoside F, G, \nor Escueloside A"
[9] "Escueloside B"
[10] "Total Steroidal Alkaloids"
```

```
melted_alkaloids_lower %>%
  ggplot(aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black") +
  scale_fill_gradient2(low = "#f1a340",
                       mid = "#f7f7f7",
                       high = "#998ec3",
                       limits = c(-1, 1)) +
  scale_x_discrete(labels = alkaloid_labels) +
  scale_y_discrete(labels = alkaloid_labels) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.justification = c(1, 0),
        legend.position = c(0.5, 0.7),
        legend.direction = "horizontal") +
  labs(fill = "Correlation \ncoefficient",
       x = "",
       y = "",
       title = "Correlation between steroidal alkaloids using \nPearson's correlation coefficient")
```