jake kara, software engineer

scrapetools.py - for pulling down all linked files from a page

January 27, 2017

I wrote some Python functions to help download every file linked to on a web page. Here's the repo.

It's called scrape_tools.py, and it has a modest four functions:

• makedir(directory) - creates a directory if it doesn't already exist. Useful for setting up your output directory structure programmatically.
• get(url) - wraps a requests.get() call and raises an exception when the response status is not 200. Otherwise, it returns the response content.
• download_bin(url, output_file) - downloads a file, again using requests, and saves it to output_file (a path string, not a file handle).
• get_files(html, base_url=lambda x: x, match_term=".csv", fname=lambda x: x) - gets every file linked to in html whose link contains the term match_term, which defaults to ".csv". base_url is a function that takes a URL and generates a base URL, and fname is a function that takes a URL and generates the filename to save the file as on the local machine. Both default to lambda x: x, which returns its argument unchanged.

Example: seec.py

The example seec.py demonstrates how to use these functions to download all of the CSV files linked to on the Connecticut State Elections Enforcement Commission's disbursement and receipt data page.
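If you want to see the general shape of these helpers without opening the repo, here is a minimal sketch of the four functions. It is not the code from scrape_tools.py. It uses the standard library's urllib.request in place of requests so it runs without third-party packages. It also assumes that links live in the href attribute of <a> tags and that base_url maps each matched href to the URL that gets fetched. The names matching_links and _LinkCollector are hypothetical helpers introduced for illustration.

```python
import os
import urllib.request
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # skip <a> tags with no/empty href
                    self.hrefs.append(value)


def makedir(directory):
    """Create directory (and any missing parents) if it doesn't exist yet."""
    os.makedirs(directory, exist_ok=True)


def get(url):
    """Fetch url and return the body as bytes; raise if status is not 200.

    Sketch uses urllib instead of requests. urlopen already raises
    HTTPError for most error statuses; the check catches any other non-200.
    """
    with urllib.request.urlopen(url) as resp:
        if resp.status != 200:
            raise RuntimeError(f"GET {url} returned status {resp.status}")
        return resp.read()


def download_bin(url, output_file):
    """Download url and write the raw bytes to the path string output_file."""
    with open(output_file, "wb") as f:
        f.write(get(url))


def matching_links(html, match_term=".csv"):
    """Hypothetical helper: return every <a href> in html containing match_term."""
    parser = _LinkCollector()
    parser.feed(html)
    return [href for href in parser.hrefs if match_term in href]


def get_files(html, base_url=lambda x: x, match_term=".csv", fname=lambda x: x):
    """Download each matching link; return a list of (url, local_path) pairs."""
    saved = []
    for href in matching_links(html, match_term):
        url = base_url(href)  # assumption: base_url turns an href into a fetchable URL
        path = fname(url)     # local filename to save this download as
        download_bin(url, path)
        saved.append((url, path))
    return saved
```

As a usage sketch with a hypothetical site, get_files(page_html, base_url=lambda href: "https://example.com" + href, fname=lambda url: os.path.join("data", url.rsplit("/", 1)[-1])) would save each linked CSV under data/, assuming makedir("data") was called first. Splitting the link matching out of get_files keeps the parsing step testable without touching the network.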