
Exploratory Data Analysis

Introduction

Here’s where you’ll explain your data: where it comes from and a little of its background. Then explain the columns (variables) in the dataset:

column1: description
column2: description
column3: description
etc.
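One way to start the column list is to have R enumerate the columns for you. The sketch below is a hypothetical helper (`describe_columns()` is not part of any package) that returns one row per column with its R class, the number of literal `"NULL"` strings (checked only in character columns, since that is how this dataset appears to mark missing values), and the number of true `NA`s. It is demonstrated on made-up toy data so it runs on its own.

```r
library(dplyr)

# Hypothetical helper: one row per column with its R class, the number of
# literal "NULL" strings (character columns only), and the number of NAs
describe_columns <- function(df) {
  tibble(
    column = names(df),
    type   = unname(vapply(df, function(x) class(x)[1], character(1))),
    n_null = unname(vapply(df, function(x) {
      if (is.character(x)) sum(x == "NULL", na.rm = TRUE) else 0L
    }, integer(1))),
    n_na   = unname(vapply(df, function(x) sum(is.na(x)), integer(1)))
  )
}

# Hypothetical toy data mimicking three kinds of columns in the raw file
toy_raw <- tibble(
  imdb       = c("8.1", "NULL", "7.3"),
  run_time   = c(21, NA, 22),
  date_aired = as.Date(c("2000-01-01", "2000-01-08", NA))
)

column_summary <- describe_columns(toy_raw)
```

Calling `describe_columns(scoobydoo)` after the data is read in would give the full column list, and a nonzero `n_null` flags columns that likely need cleaning.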

Data Preparation

Describe what you did to clean the data.

### Use this chunk to read in the data and clean it
library(tidyverse)  # readr, dplyr, ggplot2 and the %>% pipe
scoobydoo <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   index = col_double(),
##   date_aired = col_date(format = ""),
##   run_time = col_double(),
##   monster_amount = col_double(),
##   unmask_other = col_logical(),
##   caught_other = col_logical(),
##   caught_not = col_logical(),
##   suspects_amount = col_double(),
##   culprit_amount = col_double(),
##   door_gag = col_logical(),
##   batman = col_logical(),
##   scooby_dum = col_logical(),
##   scrappy_doo = col_logical(),
##   hex_girls = col_logical(),
##   blue_falcon = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.

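scoobydoo_tidy <- scoobydoo %>%
  # imdb and engagement were read as text because missing values are stored as 'NULL';
  # replace 'NULL' with NA and convert each column to numeric
  mutate(imdb = as.numeric(ifelse(imdb == 'NULL', NA, imdb)),
         engagement = as.numeric(ifelse(engagement == 'NULL', NA, engagement)))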
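The same 'NULL'-to-NA cleaning can be applied to several columns at once with `dplyr::across()` and `dplyr::na_if()`, which avoids copying the column name into each `ifelse()` call. This is a minimal sketch on hypothetical toy rows rather than the real file; on the real data you would list `imdb`, `engagement`, or any other affected columns inside `c()`.

```r
library(dplyr)

# Hypothetical toy rows standing in for the raw data: the literal string
# "NULL" marks a missing value, so the numbers were read in as text
raw_episodes <- tibble(
  imdb       = c("8.1", "NULL", "7.3"),
  engagement = c("556", "NULL", "489")
)

# na_if() turns each "NULL" into NA, then as.numeric() converts the rest;
# across() applies the identical step to every listed column
tidy_episodes <- raw_episodes %>%
  mutate(across(c(imdb, engagement), ~ as.numeric(na_if(.x, "NULL"))))
```

Another option is to handle this at read time with `readr::read_csv(..., na = c("", "NA", "NULL"))`. Be aware that this changes how readr guesses other column types (columns holding only "TRUE"/"FALSE"/"NULL" may then be read as logical), so check the column specification afterwards.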

Do some series have a higher IMDb rating? Or more IMDb engagement?
Is there more engagement for older shows than for newer Scooby-Doo series?
Are any of the monsters real?
Could the captured columns be combined? The unmask columns? The snack columns?
Do shows that use `if_it_wasnt_for` have higher engagement?
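Combining the per-character columns could go two ways: collapse them into one count per episode, or reshape them into long format with one row per episode and character. The sketch below shows both on hypothetical toy rows. It assumes the dataset's `captured_*` columns (for example `captured_fred`) hold the text values "TRUE", "FALSE", or "NULL"; the column specification above lists no `captured_*` column explicitly, so readr would have read them with the character default. The same pattern should work for the `unmask_*` and `snack_*` columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows: captured_* columns read as text ("TRUE", "FALSE", "NULL")
episodes <- tibble(
  index           = 1:3,
  captured_fred   = c("TRUE", "FALSE", "NULL"),
  captured_velma  = c("FALSE", "FALSE", "NULL"),
  captured_scooby = c("TRUE", "TRUE", "NULL")
)

# Option 1: one column counting how many characters were captured per episode
# ("NULL" entries count as not captured here)
episodes_summary <- episodes %>%
  mutate(n_captured = rowSums(across(starts_with("captured_"), ~ .x == "TRUE")))

# Option 2: long format, one row per episode and character
captured_long <- episodes %>%
  pivot_longer(starts_with("captured_"),
               names_to = "character", names_prefix = "captured_",
               values_to = "captured")
```

The count is convenient for a single plot or test, while the long format keeps the character identity for questions such as "who gets captured most often?"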

Questions

For each question, make a plot illustrating the question, use a statistical test to answer it, and describe your conclusions.

Question #1: Do shows that use `if_it_wasnt_for` have a higher IMDb rating or more engagement?

DESCRIBE YOUR RESULTS HERE

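### imdb rating
# Flag whether each episode has an if_it_wasnt_for line ('NULL' means it does not),
# then compare the distribution of IMDb ratings between the two groups
scoobydoo_tidy %>% 
  mutate(if_it_wasnt_for2 = ifelse(if_it_wasnt_for == 'NULL', 'no', 'yes')) %>%
  ggplot(aes(x = imdb)) +
  geom_density(aes(color = if_it_wasnt_for2)) +
  theme_classic()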

## Warning: Removed 15 rows containing non-finite values (stat_density).

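### use this chunk to conduct a statistical test to answer your question
library(magrittr)  # %$% exposition pipe
library(broom)     # tidy() turns the test result into a tibble
# chisq.test() treats every distinct imdb value as its own category, which is
# likely why it warns that the approximation may be incorrect (sparse cells)
scoobydoo_tidy %>% 
  mutate(if_it_wasnt_for2 = ifelse(if_it_wasnt_for == 'NULL', 'no', 'yes')) %$%
  chisq.test(imdb, if_it_wasnt_for2) %>% 
  tidy()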

## Warning in chisq.test(imdb, if_it_wasnt_for2): Chi-squared approximation may be
## incorrect

## # A tibble: 1 x 4
##   statistic p.value parameter method                    
##       <dbl>   <dbl>     <int> <chr>                     
## 1      83.5 0.00113        48 Pearson's Chi-squared test
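The chi-squared test above tests whether rating and group are associated, not whether one group rates higher, and its 48 degrees of freedom suggest it split the ratings into 49 separate categories ((49 − 1) × (2 − 1) = 48). A Wilcoxon rank-sum test treats the rating as a measurement instead. The sketch below wraps that in a hypothetical helper, `compare_by_catchphrase()`, whose `outcome` argument lets the same code cover `imdb` or `engagement`; it is demonstrated on made-up toy data, and it is one reasonable choice rather than the required test.

```r
library(dplyr)
library(broom)

# Hypothetical helper: compare one numeric outcome ("imdb" or "engagement")
# between episodes with and without an if_it_wasnt_for line, using a
# Wilcoxon rank-sum test on the outcome's values rather than categories
compare_by_catchphrase <- function(df, outcome = "imdb") {
  df %>%
    mutate(if_it_wasnt_for2 = ifelse(if_it_wasnt_for == "NULL", "no", "yes")) %>%
    filter(!is.na(.data[[outcome]])) %>%
    wilcox.test(reformulate("if_it_wasnt_for2", response = outcome), data = .) %>%
    tidy()
}

# Hypothetical toy data: three episodes with the line, three without, one unrated
toy_episodes <- tibble(
  if_it_wasnt_for = c("meddling kids", "meddling kids", "meddling kids",
                      "NULL", "NULL", "NULL", "NULL"),
  imdb = c(8.1, 7.9, 7.6, 7.0, 6.8, 6.5, NA)
)

toy_result <- compare_by_catchphrase(toy_episodes, "imdb")
```

On the real data this would be `compare_by_catchphrase(scoobydoo_tidy, "imdb")` or `compare_by_catchphrase(scoobydoo_tidy, "engagement")`. If you prefer comparing means, `t.test()` accepts the same formula and `data` arguments.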

Rinse and repeat for another 9 questions

References