Exploratory Data Analysis

Introduction

Here’s where you’ll explain your data: where it comes from and a little of its background. Then explain the columns (variables) in the dataset:



Data Preparation

Describe here what you did to clean the data.

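### Use this chunk to read in the data and clean it
library(tidyverse)  # dplyr (mutate, %>%), ggplot2, readr
library(magrittr)   # provides the %$% exposition pipe used in the test chunk
library(broom)      # provides tidy() for test results
scoobydoo <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv')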
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   index = col_double(),
##   date_aired = col_date(format = ""),
##   run_time = col_double(),
##   monster_amount = col_double(),
##   unmask_other = col_logical(),
##   caught_other = col_logical(),
##   caught_not = col_logical(),
##   suspects_amount = col_double(),
##   culprit_amount = col_double(),
##   door_gag = col_logical(),
##   batman = col_logical(),
##   scooby_dum = col_logical(),
##   scrappy_doo = col_logical(),
##   hex_girls = col_logical(),
##   blue_falcon = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.
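# Missing values are stored as the string 'NULL'; convert them to NA and make both columns numeric
scoobydoo %>%
  mutate(imdb = as.numeric(ifelse(imdb == 'NULL', NA, imdb)),
         engagement = as.numeric(ifelse(engagement == 'NULL',
                                        NA, engagement))) -> scoobydoo_tidy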
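
The 'NULL'-to-NA conversion can also be written once as a small helper and reused for each column. The sketch below is only an illustration in base R: the helper name `null_to_numeric` and the `toy` data frame are made up for this example and are not part of the Scooby-Doo dataset.

```r
# Toy data (made-up values) that mimics how the dataset stores missing numbers
toy <- data.frame(
  imdb       = c("8.1", "NULL", "7.3"),
  engagement = c("556", "NULL", "NULL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the string 'NULL' into NA, then convert to numeric
null_to_numeric <- function(x) {
  x[x %in% "NULL"] <- NA   # %in% never returns NA, so existing NAs are left alone
  as.numeric(x)
}

# Apply the helper to each column that needs cleaning
toy_tidy <- transform(
  toy,
  imdb       = null_to_numeric(imdb),
  engagement = null_to_numeric(engagement)
)

str(toy_tidy)
```

Converting each column from its own values is the important part: a copy-and-paste slip that fills one column from another is easy to make when the conversion is repeated inline.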



Questions

For each question, make a plot illustrating the question, use a statistical test to answer the question, and describe your conclusions.

Question #1: Do shows that use if_it_wasnt_for have higher IMDb ratings?

DESCRIBE YOUR RESULTS HERE

### imdb rating
scoobydoo_tidy %>%
  mutate(if_it_wasnt_for2 = ifelse(if_it_wasnt_for == 'NULL', 'no', 'yes')) %>%
  ggplot(aes(x = imdb)) +
  geom_density(aes(color = if_it_wasnt_for2)) +
  theme_classic()
## Warning: Removed 15 rows containing non-finite values (stat_density).

### use this chunk to conduct a statistical test to answer your question
scoobydoo_tidy %>% 
  mutate(if_it_wasnt_for2 = ifelse(if_it_wasnt_for == 'NULL', 'no', 'yes')) %$%
  chisq.test(imdb, if_it_wasnt_for2) %>% 
  tidy()
## Warning in chisq.test(imdb, if_it_wasnt_for2): Chi-squared approximation may be
## incorrect
## # A tibble: 1 x 4
##   statistic p.value parameter method                    
##       <dbl>   <dbl>     <int> <chr>                     
## 1      83.5 0.00113        48 Pearson's Chi-squared test
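
A chi-squared test treats both variables as categorical, so every distinct rating becomes its own category; with many sparse cells, that is a likely source of the approximation warning above. When the outcome is numeric and the grouping variable has two levels, a two-sample test is a common alternative. The sketch below shows a Welch t-test and a Wilcoxon rank-sum test in base R. The `toy` data frame holds made-up ratings purely for illustration, so its results say nothing about the real dataset.

```r
# Toy data (made-up ratings): a numeric outcome and a two-level grouping variable
toy <- data.frame(
  imdb = c(7.1, 7.4, 6.9, 7.8, 7.2, 8.0, 7.6, 8.1, 7.9, 7.5),
  if_it_wasnt_for2 = rep(c("no", "yes"), each = 5)
)

# Welch two-sample t-test (does not assume equal variances; this is t.test's default)
welch <- t.test(imdb ~ if_it_wasnt_for2, data = toy)

# Wilcoxon rank-sum test, a rank-based alternative when normality is doubtful
ranksum <- wilcox.test(imdb ~ if_it_wasnt_for2, data = toy, exact = FALSE)

print(welch)
print(ranksum)
```

Both calls are two-sided by default. Because the groups are ordered alphabetically ("no", then "yes"), passing `alternative = "less"` would test the directional claim that the "no" group is lower than the "yes" group.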

Rinse and repeat for another 9 questions

References

  1. Cite your dataset(s) here. If you looked up additional information that you included in the introduction, cite it here as well.