BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Springshare//LibCal//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-TIMEZONE:America/New_York
X-PUBLISHED-TTL:PT15M
BEGIN:VEVENT
DTSTART:20260303T140000Z
DTEND:20260303T163000Z
DTSTAMP:20260303T000000Z
SUMMARY:Working with Big Data in R
DESCRIPTION:The workshop emphasizes best practices for data quality 
 assurance and statistical considerations when working with large datasets\, 
 addressing common challenges such as computational efficiency\, memory 
 management\, and maintaining data integrity across complex processing 
 pipelines. This intermediate-level workshop provides social science 
 researchers with essential skills for analyzing datasets that exceed 
 typical computer memory limitations. Participants will learn to distinguish 
 between datasets and databases\, implement efficient data storage solutions 
 using Apache Arrow and Parquet files\, and build robust 
 Extract-Transform-Load (ETL) pipelines for large-scale data processing. The 
 workshop covers partitioning strategies for optimal performance\, writing 
 custom functions using both the dplyr API for Acero and SQL syntax\, and 
 creating local analytical databases with DuckDB. Through hands-on exercises 
 using real voter file data\, researchers will develop practical skills in 
 out-of-core processing\, database management\, and scalable data analysis 
 workflows.\n\nPrerequisites: The frameworks and packages used in this 
 workshop are designed to be written in tidy syntax. We will be using 
 chained operations and high-level control structures\, and writing custom 
 functions. Working proficiency in both R and tidy code is strongly 
 recommended.\n\nPlease note that registrants for the "Working with Big 
 Data" workshop will need access to the L2 Political dataset. To access this 
 data\, please complete the required form at least one week prior to the 
 workshop date.
LOCATION:RKZ Library Classroom 01\, Science Hill
ORGANIZER;CN="Ted Ellsworth":MAILTO:ted.ellsworth@yale.edu
CATEGORIES:Marx Science and Social Science Library,StatLab
CONTACT;CN="Ted Ellsworth":MAILTO:ted.ellsworth@yale.edu
STATUS:CONFIRMED
UID:LibCal-16254142
URL:https://schedule.yale.edu/event/16254142
X-MICROSOFT-CDO-BUSYSTATUS:BUSY
BEGIN:VALARM
TRIGGER:-PT15M
ACTION:DISPLAY
DESCRIPTION:Reminder
END:VALARM
END:VEVENT
END:VCALENDAR