1 - Principles of Data Visualization

Week 1
8/22/2023 💻 🧰 📊 🥳

Jessica Cooperstone, Ph.D.

Introductions 👋

  • Name
  • Program
  • Why you decided to take this class
  • One thing you hope to learn

Course logistics 🗺️

Teaching Team

Instructor: Jessica Cooperstone

✉️ cooperstone.1@osu.edu

TA: Daniel Quiroz Moreno

✉️ quirozmoreno.1@osu.edu

Office hours: go.osu.edu/dataviz-times

Website

If you have found these slides, you’ve made it to the website! (Good job.)

All course materials will be posted to, or linked from, www.rdataviz.com


Syllabus

  • A full version of the syllabus can be found on Carmen

  • A trimmed version of the syllabus can be found on our course site

Attendance

  • Class will be taught in a hybrid, synchronous manner, meaning I expect you to attend class during class time. You can attend in person or virtually via Zoom. I have found that students who attend in person are more engaged and tend to master material more quickly, but it is up to you how you want to attend.

  • I will record class time for those who 1) want to revisit material or 2) can’t attend (this should be uncommon). These recordings are not a replacement for coming to class.

What will class be like?

  • A combination of lecture, code run-throughs, live coding, and hands-on exercises.

  • Bring a laptop (not tablet) to class with R and RStudio downloaded

  • Come with your questions!

  • Engage as much as you can!

Previous programming experience

Assignments

  • Module assignments: After each module, there will be an assignment to provide practice for the techniques learned in class.

  • Class reflections: After 10 of the 15 weeks, you will write a one-paragraph reflection on the material that was presented in class. This can include your thoughts on how you will use these lessons in your own research and data visualizations, ways in which you have investigated this topic (or expect to) on your own, or what else you’d like to learn in this area. The purpose of this assignment is not to be burdensome, but to keep you engaged in the course material and to give me feedback on what parts you’ve found useful, what you’ve struggled with, and what you’d like to see more of in the future.

  • Capstone assignment: At the end of the semester, you will complete a capstone assignment where you create a series of visualizations based on your research data, data coming from your lab, or other data that is publicly available. I expect this assignment to be completed in R Markdown, annotated, and knitted into an easy-to-read .html file. I also expect your code to be fully commented such that I can understand what you are doing with each step, and why.

Late assignments

  • I expect you will turn assignments in on time. Late assignments are not accepted. If there are extenuating circumstances that prevent you from turning in an assignment on time, please connect with me as soon as possible after such a situation arises for discussion about a possible deadline extension.

Academic integrity 🏫

  • It is fine for you to work with your classmates/labmates/whoever, but I expect you to turn in your own independent assignments representing your work.

  • All assignments are open book; googling/investigating is required!

🗓 Schedule

This is our tentative class schedule, subject to change depending on our pacing and your interests!

🗓️ Schedule (part 1)

Date | Module | Topic
2023-08-22 | 1: Principles | Principles of data visualization
2023-08-29 | 1: Principles | Good and bad visualizations

🗓️ Schedule (part 2)

Date | Module | Topic
2023-09-05 | 2: Coding fundamentals | R Markdown for reproducible research
2023-09-12 | 2: Coding fundamentals | Wrangling, the basics
2023-09-19 | 2: Coding fundamentals | ggplot 101
2023-09-26 | 2: Coding fundamentals | Themes, labels, facets (ggplot 102)

🗓️ Schedule (part 3)

Date | Module | Topic
2023-10-03 | 3: Data exploration | Data distributions
2023-10-10 | 3: Data exploration | Correlations
2023-10-17 | Open session, capstone prep | Open session, capstone prep
2023-10-24 | 3: Data exploration | Annotating statistics

🗓️ Schedule (part 4)

Date | Module | Topic
2023-10-31 | 4: Putting it together | Principal components analysis
2023-11-07 | 4: Putting it together | Manhattan plots and making lots of plots at once
2023-11-14 | 4: Putting it together | Interactive plots
2023-11-21 | No class, Thanksgiving | Relaxing and eating
2023-11-28 | 4: Putting it together | ggplot extension packages and ComplexHeatmap
2023-12-05 | 4: Putting it together | Capstone assignment open session

Why do we visualize our data? 🗣️

There may be a data dinosaur 🦖

A gif of 13 different datasets (including one whose points make the shape of a dinosaur) that all have the same mean and standard deviation, but have very different distributions

Figure by Alberto Cairo

To understand distribution

Anscombe’s quartet 🎻
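
To see this for yourself, here is a minimal sketch assuming base R only. The quartet ships with R as the built-in `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`); the name `quartet_summary` is just illustrative. The summary statistics come out nearly identical across the four datasets, while the plots look nothing alike.

```r
# Anscombe's quartet is included with base R, so no packages are needed.
data(anscombe)

# Compute the same summary statistics for each of the four x/y pairs.
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    dataset = i,
    mean_x  = mean(x),
    mean_y  = mean(y),
    sd_x    = sd(x),
    sd_y    = sd(y),
    cor_xy  = cor(x, y)
  )
}))

print(round(quartet_summary, 2))

# Now plot the four pairs side by side (base graphics, 2 x 2 grid).
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Dataset", i))
}
par(old_par)  # restore the previous plotting layout
```

Compare the printed table with the four panels: the numbers alone would never tell you how different these datasets are.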

To discover data secrets

A 3 panel figure showing raw data, boxplots, and violin plots demonstrating how boxplots don't fully explain the distribution of data

Figures from Justin Matejka and George Fitzmaurice

To convey our message

Figure from Bilbrey et al., 2021 New Phytologist showing the locations on the apple genetic map (17 chromosomes) where there are significant associations between metabolomic features and genomic markers.

Bilbrey et al., New Phytologist, 2021

The data visualization process

A figure with three circles, and arrows between the first and second, and second and third. The first circle says data, the middle circle says analyst, and the third, right-most circle says learner. In between data and analyst is explore, analyze and learn, and in between analyst and learner is explain, explore and persuade.

Figure adapted from one by Rick Scavetta

Small changes can make a big difference (and some examples)

Simple changes improve interpretability

Encoding data with easy-to-process visual cues

Length is easier to see than angles or areas.

Color scales should be intuitive and accessible

Figure with two maps of Georgia, depicting COVID cases per 100K people on July 2, 2020 and July 17, 2020. The color scale goes from white, to light blue, to dark blue, then to red, and the bins for the number of cases are not the same across the two plots.

These are not.

Show your data if you can

#barbarplots

Cut your axes with care

Figure showing the average height of women (y-axis) from different countries (x-axis). But the y-axis only goes from 5 feet to 5 feet 7 inches, making women from India look tiny and women from Latvia seem enormous.

Avoid figure spaghetti 🍝

Be consistent among figures

  • Use the same color schemes/shapes across figures

  • If you’re ordering/grouping, do so in the same manner

Make sure your plot has a clear message

Figure showing the average height of women (y-axis) from different countries (x-axis). But the y-axis only goes from 5 foot to 5 foot 7 inches, making women from India look tiny and women from Latvia seem enormous.

Marie Kondo your plots

Declutter, and keep only parts that are informative (and spark joy) 😻

A very ugly 3D plot showing life expectancy across the 5 continents where the 3D makes it hard to read, it has duplicative legends, and meaningless colors.

From https://socviz.co/lookatdata.html

Oral presentation and publication figures might not be the same

Some take home messages

What should you think about when making visualizations?

  1. Who are you talking to? 📢

  2. What are you trying to convey?

  3. How can you fairly represent your data? 🚯