Websites can be full of useful data that are not always downloadable or easily accessible. Rather than copying and pasting from a site by hand, you can use Python to access the raw HTML behind a webpage and automate the process of retrieving, structuring, and outputting data from pages across a domain. This workshop will cover identifying good candidates for scraping, discovering what data can be scraped, and using Python to automate the process. Attendees are encouraged to bring examples of sites they want to scrape, as there may be some time to discuss individual projects. This class assumes a working knowledge of Python (running code, installing libraries, etc.) and familiarity with HTML structure.
This workshop will use the Linux command line to run Python code. While lab computers have Python IDLE installed, attendees can use personal laptops with any Python environment to run code.
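As a rough preview of the retrieve, structure, and output workflow described above, here is a minimal sketch using only the Python standard library. It is not the workshop's own material: the sample HTML, the `TableParser` class, and the `scrape_table` and `rows_to_csv` helpers are all hypothetical names invented for illustration, and the page is an inline string rather than a live site so the example runs without a network connection.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for this sketch). In practice the
# HTML would be fetched from a site, for example with urllib.request.urlopen.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>First Report</td><td>2019</td></tr>
  <tr><td>Second Report</td><td>2021</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Structure step: turn raw HTML into a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Output step: serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(scrape_table(SAMPLE_HTML)), end="")
```

Saved to a file, the sketch can be run from the command line with `python3` followed by the file name. Third-party libraries are often used for this kind of parsing, but the standard library is enough to show the three steps.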