Coriell Bioinformatics Research Experience 2020
The Bioinformatics Research Experience is a four-week research training program for undergraduate students interested in learning scientific biological data analysis. The following year’s material can be seen here: 2021
Getting Started
Programs to Install
- Zoom https://zoom.us/download
- Slack https://slack.com/. If you’re familiar with Slack, our workspace name is coriellbioinf-se37156. Otherwise, follow the attached directions in slack_intructions.pdf.
- Download and install TeamViewer https://www.teamviewer.com/en-us/. This will allow the research experience team to screen share with your computer to assist with technical problems.
- R. If you don’t already have it installed, go to R Cloud https://cloud.r-project.org/ to download and install R.
- RStudio. Go to RStudio’s website https://rstudio.com/products/rstudio/download/ and download the FREE version.
- Cisco AnyConnect VPN. See attached directions in cisco_vpn.pdf
- GitHub https://github.com/. If you don’t already have a GitHub account, you need to sign up for one; it’s free.
Operating System Specific Programs
Mac
- You will need to use the terminal application to connect to remote servers. This comes with the Mac, so no additional software is needed, but it’s not listed in the Applications folder. Use Spotlight to search for Terminal, then pin it to the Dock so you’re prepared to use it when we get to it.
- Download and install XQuartz https://www.xquartz.org/. This will let you view images from the server.
- Download and install Cyberduck. https://cyberduck.io/
- Install Git https://git-scm.com/download/mac
PC
- Download and install PuTTY https://www.chiark.greenend.org.uk/~sgtatham/putty/. This will let you connect to the server.
- Download and install Xming https://sourceforge.net/projects/xming/. This will let you view images from the server.
- Download and install WinSCP https://winscp.net/eng/index.php. This will let you transfer files to and from your computer and the server.
- Install Git https://git-scm.com/download/win
Schedule
Daily Schedule
- 9AM-10AM Daily Lecture
- 12PM Presentations: Speaker Presentations Tuesdays through Thursdays, Participant Presentations Fridays
- 1PM - 3PM Daily Office Hours
Event Schedule
| Date | Event | Time | Topic |
|---|---|---|---|
| 7/07 | Daily Lecture | 9AM | Program introduction; Introduction to Git |
| | Coriell Journal Club | 12PM | Jaroslav Jelinek, MD, PhD |
| 7/08 | Daily Lecture | 9AM | R: Introduction to the tidyverse, Rmarkdown, and dplyr |
| | BRE Talk | 12PM | Jaroslav Jelinek, MD, PhD: Epigenetics and DNA methylation |
| 7/09 | Daily Lecture | 9AM | R: Plotting with ggplot2 |
| | BRE Talk | 12PM | Jozef Madzo, PhD |
| 7/10 | Daily Lecture | 9AM | How to read a scientific paper |
| | BRE Talk | 12PM | TBD |
| 7/14 | Daily Lecture | 9AM | R: Importing and tidying data with readr and tidyr |
| | No talk, Coriell Seminar cancelled | | |
| 7/15 | Daily Lecture | 9AM | R: Clustering |
| | BRE Talk | 12PM | Himani Vaidya, MS |
| 7/16 | Daily Lecture | 9AM | R: Statistics |
| | BRE Talk | 12PM | Dara Kusic, PhD |
| 7/17 | Daily Lecture | 9AM | Catch up day |
| | Participant Presentations | 12PM | First 3 review papers presented, 20 min per person |
| 7/21 | Daily Lecture | 9AM | How to use a Linux server |
| | Coriell Journal Club | 12PM | Nahid Turan, PhD |
| 7/22 | Daily Lecture | 9AM | Processing RNA-seq data |
| | BRE Talk | 12PM | Laura Scheinfeldt, PhD |
| 7/23 | Daily Lecture | 9AM | Analyzing RNA-seq data |
| | BRE Talk | 12PM | Gennaro Calendo, MS |
| 7/24 | Daily Lecture | 9AM | Introduction to final project |
| | Participant Presentations | 12PM | First 3 review papers presented, 20 min per person |
| 7/28 | Final project work day | | |
| | Coriell Seminar | 12PM | Dara Kusic, PhD |
| 7/29 | Final project work day | | |
| | BRE Talk | 12PM | Matt Mitchell, PhD |
| 7/30 | Final project work day | | |
| | BRE Talk | 12PM | Shoghag Panjarian, PhD |
| 7/31 | Final project presentations | 12PM | All participants: 10-minute presentation on the RNA-seq data analyzed |
BRE Material
July 07: Introduction to Git
- Introduction to Coriell’s Bioinformatics Research Experience
- Introduction to all things Git: Git, GitHub, GitHub Desktop and GitHub Classroom
- slides
Today’s Assignment: GitHub Practice https://classroom.github.com/a/kNgsFDtr
July 08: Introduction to R and the Tidyverse
Today’s Assignment: dplyr https://classroom.github.com/a/6j30_kft
July 09: Plotting with ggplot2
Today’s Assignment: dplyr https://classroom.github.com/a/WCFxfVF_
July 10: How to Troubleshoot Code and How to Read a Scientific Paper
- How to troubleshoot code
- How to read a scientific paper
- slides
Journal Club Assignment
- Assignment description is on the last slide in the slides above.
- The Google Doc to sign up for papers and times is here https://docs.google.com/spreadsheets/d/1TVXcyu1PKn4d_Wk2hEjOWLUxLkwE4U_xLLDlVM2k2CU/edit?usp=sharing
- example slides
- Example video https://www.youtube.com/watch?v=xXKL9uLFHy4
- Papers
- Cani 2018 Microbiome
- Greenberg 2019 Methylation
- Kazazian 2017 Repetitive/Transposable Elements
- Klemm 2019 Chromatin Accessibility
- Stark 2019 RNA-seq
- Tang 2019 Single Cell Sequencing
July 14: Reading Data with readr and Tidying Data with tidyr
Today’s Assignment: readr and tidyr https://classroom.github.com/a/rLa6syth
July 15: Clustering
Today’s Assignment: Clustering https://classroom.github.com/a/UDYxMOkf
July 16: Statistics
Today’s Assignment: Statistics https://classroom.github.com/a/4NWo-P_3
July 17: Grab Bag and Exploratory Data Analysis
Today’s Assignment: Exploratory Data Analysis https://classroom.github.com/a/sUlgFd-I
Because this is a more involved assignment, it’s due Friday 7/24.
July 21: Command-Line Server Navigation
- Getting Started with the Server pdf
- slides
- Rough list (from the shell’s history command) of all commands shown today: server_navigation_demo.txt
Today’s Assignment: Practice Server Commands Playing Terminus https://classroom.github.com/a/sIUEXjYO
July 22: More Command-Line
- slides
- tmux cheatsheet https://gist.github.com/MohamedAlaa/2961058
July 23: Process RNA-seq Data Day 1
- RNA-seq processing full code, code covered in class
- slides
July 24: Process RNA-seq Data Day 2
- RNA-seq processing full code
- slides
- Instructions for downloading and uploading files via the command line, in md and in pdf
Assignment: Process RNA-seq Data
- RNA-seq files are on the server at /mnt/data/encode_tissue_data/
- Tissue Sign Up Google Doc https://docs.google.com/spreadsheets/d/1qea2Zgy1OuA-nUgMrdDKemS1QTN2n2m_fEk9mNH2_JA/edit?usp=sharing
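A minimal sketch of checking what is in that directory once you are logged in to the server. The path is the one given in the assignment; the DATA_DIR variable and the list_data_files function are names made up for this illustration and are not part of the course materials.

```shell
#!/usr/bin/env bash
# Directory named in the assignment; set DATA_DIR beforehand to look elsewhere.
# (DATA_DIR and list_data_files are hypothetical names for this sketch.)
DATA_DIR="${DATA_DIR:-/mnt/data/encode_tissue_data/}"

# Print the entries of a directory, one per line.
# Returns non-zero with a message on stderr if the directory is missing.
list_data_files() {
    local dir="$1"
    if [ -d "$dir" ]; then
        ls -1 "$dir"
    else
        echo "Directory not found: $dir" >&2
        return 1
    fi
}

# On your own computer the directory will not exist, so print a hint instead.
list_data_files "$DATA_DIR" || echo "Are you logged in to the course server?"
```

Run on your own computer rather than the server, this should only print the hint, which is a quick way to confirm which machine your terminal is connected to.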
July 28: Analyze RNA-seq Day 1
- Analyze RNA-seq demo code
- slides
- DESeq2 https://bioconductor.org/packages/release/bioc/html/DESeq2.html
July 29: Analyze RNA-seq Day 2
- Analyze RNA-seq demo code with in-lecture changes from the past 2 days
Assignment: Analyze RNA-seq
- Assignment in markdown and as a PDF
- We never got to this assignment, but if you want to try to complete the RNA-seq analysis on your own, this is what you should do.
July 30: Analyze RNA-seq Day 3: Pathway Analysis
- Pathway analysis demo code, and the same code with in-class changes
Example RNA-seq Analysis
Here’s an example RNA-seq analysis in Rmd, html, and pdf.
The count files used are posted on the course server at /mnt/data/encode_tissue_data/ and are also posted here on GitHub at RNA-seq/rnaseq_example/count_tables/