Jun 13-14, 2016
9:00am - 5:00pm
Instructors: Erika Mudrak, Emily Davenport, Lynn Johnson
Helpers: Francoise Vermeylen, Stephen Parry, Kevin Packard, David Kent
Data Carpentry workshops teach basic concepts, skills and tools for working more effectively with data.
We will cover data organization in spreadsheets and data cleaning with OpenRefine; R day 1: managing and analyzing data with dplyr and visualizing data with ggplot2; SQL for data management; and R day 2: an introduction to programming and automatic reports with R Markdown. Participants should bring their laptops and plan to participate actively. By the end of the workshop, learners should be able to manage and analyze data more effectively and apply the tools and approaches directly to their ongoing research.
Who: The course is aimed at faculty, research staff, postdocs, graduate students, advanced undergraduates, and other researchers in any field. Priority will be given to people from Cornell departments that support CSCU. See this page for a list of such departments.
Where: Albert R. Mann Library Room B30A, 237 Mann Drive, Cornell University. Get directions with OpenStreetMap or Google Maps.
Requirements: Participants must bring a laptop running Mac, Linux, or Windows (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.
Prerequisites: We especially encourage registration for those who may be less familiar with the above topics. To allow for coverage of more advanced R topics, we require that participants be familiar enough with R and RStudio to:
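- import data into a data.frame via read.csv() or read.table()
- refer to columns of a data.frame with $
- use the assignment operators (<- or =) and comments (#)
- explore a data.frame via head(), str(), summary(), nrow(), ncol(), names(), rownames(), table(), levels(), mean(), length(), max(), min()
- convert data types and work with factors using as.factor(), relevel(), as.numeric()
- create vectors with c() or seq() and index them with []
- subset data frames with bracket notation [,]
- subset data using bracket notation and logical vectors (==, !=, <, >, %in%) for conditions
- make basic plots with plot(), barplot(), hist()
- get help with ? or by searching the Help tab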
If you have never used R or want a refresher, please prepare for the Data Carpentry Workshop by attending CSCU's free workshops:
Learn the above in Introductory Statistics Using R on June 9.
Practice the above in Intermediate Statistics Using R on June 10.
Fee: We charge a $40 fee to help defray costs.
Contact: Please email mudrak@cornell.edu for more information.
Please be sure to complete these surveys before and after the workshop.
We will use the Portal Project Teaching Database.
Get it here: https://github.com/datacarpentry/ecology-workshop/blob/master/data.md
Data for the OpenRefine lesson: https://www.dropbox.com/s/kbb4k00eanm19lg/Portalrodents19772002_scinameUUIDs.csv?dl=0
Data for the ggplot2 lesson: svy_complete.csv (right-click on the link to save)
These are the lessons that were used during the workshop
Introductory powerpoint (Monday morning)
Excel lessons (Monday morning)
OpenRefine lessons (Monday morning)
R lessons (Both days)
SQL lessons (Tuesday morning)
Right-click on the links to save:
This is the script that Lynn generated on her computer during the lessons for dplyr and ggplot2 (Monday afternoon)
This script contains the code Erika used to query SQL databases from R (Tuesday afternoon)
This is the README markdown file Emily generated (Tuesday afternoon)
This is the Rmd document Emily generated during the Rmarkdown lesson (Tuesday afternoon)
This is the Rmd document Emily generated during the if/else, loops, and functions lesson (Tuesday afternoon)
| Day | Time | Topic |
|---|---|---|
| Monday, June 13 | Morning | Data organization in spreadsheets and data cleaning with OpenRefine |
| Monday, June 13 | Afternoon | R day 1: managing and analyzing data with dplyr, visualizing data with ggplot2 |
| Tuesday, June 14 | Morning | SQL for data management |
| Tuesday, June 14 | Afternoon | R day 2: intro to programming and automatic reports with R Markdown |
Etherpad: https://public.etherpad-mozilla.org/p/2016-06-13-cornell.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.
If that link goes dead, an export of the etherpad is here.
To participate in a Data Carpentry workshop, you will need working copies of the software described below. Please make sure to install everything (or at least download the installers) before the start of the workshop. Participants should bring and use their own laptops so that their tools are set up properly for an efficient workflow after they leave the workshop.
Please follow these Setup Instructions.
We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors but may also be useful to participants.