🏆 Cleaning and Summarizing Data with pandas (Python 2)
In-Person
Move beyond the basics and start analyzing real-world research data.
This 2-hour, hands-on workshop is designed for researchers, students, and faculty who have a basic grasp of Python and are ready to tackle the "messy" side of data science. To move from basic scripts and notebooks to a professional local development workflow, we will use Positron, the new data science IDE from the creators of RStudio, to manage a complete analysis pipeline.
Using the OASIS-1 neuroscience dataset, we will work through the practical steps of transforming raw MRI demographics into research insights. By the end of the session, we will answer a specific clinical question: Does normalized brain volume differ by dementia rating?
Learning Objectives:
By the end of this session, learners will be able to:
- Establish a reproducible research workflow by setting up a local environment with Positron and virtual environments.
- Load tabular data into a pandas DataFrame and inspect its structure using info(), head(), describe(), and related methods
- Select specific rows, columns, and subsets of data using bracket notation and .loc[]
- Clean a real-world dataset by handling missing values (dropna(), fillna()), renaming columns, dropping unnecessary columns, and converting data types
- Summarize data by grouping and aggregating with groupby() to answer a specific research question
- Visualize data distributions and relationships using seaborn (if time permits)
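As a rough preview of the pandas steps listed above, here is a minimal sketch of the load, inspect, select, clean, and summarize sequence. It is not the workshop's actual material: the rows are made up for illustration, and the column names (ID, M/F, Age, CDR, nWBV, Delay) are assumptions modeled on the public OASIS-1 spreadsheet, where CDR is the dementia rating and nWBV is normalized whole brain volume.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are NOT real OASIS-1 records.
# Column names are assumptions modeled on the public OASIS-1 spreadsheet.
RAW_CSV = """ID,M/F,Age,CDR,nWBV,Delay
S01,F,74,0,0.74,
S02,M,55,0,0.81,
S03,F,73,0.5,0.71,
S04,M,28,,0.85,
S05,F,81,1,0.68,
S06,M,78,0.5,0.70,
"""

# Load: read the tabular data into a DataFrame.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect: column types, first rows, and summary statistics.
df.info()
print(df.head())
print(df.describe())

# Select: rows for participants aged 60+ and a subset of columns, using .loc[].
older = df.loc[df["Age"] >= 60, ["ID", "Age", "CDR", "nWBV"]]

# Clean: drop an unneeded column, rename columns, drop rows with no
# dementia rating, and convert a text column to a categorical type.
clean = (
    df.drop(columns=["Delay"])
    .rename(columns={"M/F": "sex", "Age": "age", "CDR": "cdr", "nWBV": "nwbv"})
    .dropna(subset=["cdr"])
    .astype({"sex": "category"})
)

# Summarize: group by dementia rating and aggregate brain volume.
summary = clean.groupby("cdr")["nwbv"].agg(["count", "mean"])
print(summary)
```

With a real file, the `io.StringIO` wrapper would be replaced by a path to the CSV; whether to drop or fill missing ratings (dropna() versus fillna()) depends on the research question.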
- Date: Thursday, April 23, 2026
- Time: 10:00am - 12:00pm
- Time Zone: Eastern Time - US & Canada
- Location: SHM L 111, Cushing/Whitney Medical Library, 333 Cedar Street
- Campus: Medical School
- Categories: Coding Data Programming
Workshop Incentive Program: Any Yale affiliate who attends at least three library workshops this semester will be eligible to receive a FREE Yale Library tote bag.
