Hitchhikers-guide - The Hitchhiker's Guide to Data Science for Social Good

Overview

Welcome to the Hitchhiker's Guide to Data Science for Social Good.

What is the Data Science for Social Good Fellowship?

The Data Science for Social Good Fellowship (DSSG) is a hands-on and project-based summer program that launched in 2013 at the University of Chicago and has now expanded to multiple locations globally. It brings data science fellows, typically graduate students, from across the world to work on machine learning, artificial intelligence, and data science projects that have a social impact. From a pool of typically over 800 applicants, 20-40 fellows are selected from diverse computational and quantitative disciplines including computer science, statistics, math, engineering, psychology, sociology, economics, and public policy.

The fellows work in small, cross-disciplinary teams on social good projects spanning education, health, energy, transportation, criminal justice, social services, economic development and international development in collaboration with global government agencies and non-profits. This work is done under close and hands-on mentorship from full-time, dedicated data science mentors as well as dedicated project managers, with industry experience. The result is highly trained fellows, improved data science capacity of the social good organization, and a high quality data science project that is ready for field trial and implementation at the end of the program.

In addition to hands-on project-based training, the summer program also consists of workshops, tutorials, and ethics discussion groups based on our data science for social good curriculum designed to train the fellows in doing practical data science and artificial intelligence for social impact.

Who is this guide for?

The primary audience for this guide is the set of fellows coming to DSSG but we want everything we create to be open and accessible to larger world. We hope this is useful to people beyond the summer fellows coming to DSSG.

If you are applying to the program or have been accepted as a fellow, check out the manual to see how you can prepare before arriving, what orientation and training will cover, and what to expect from the summer.

If you are interested in learning at home, check out the tutorials and teach-outs developed by our staff and fellows throughout the summer, and to suggest or contribute additional resources.

*Another one of our goals is to encourage collaborations. Anyone interested in doing this type of work, or starting a DSSG program, to build on what we've learned by using and contributing to these resources.

What is in this guide?

Our number one priority at DSSG is to train fellows to do data science for social good work. This curriculum includes many things you'd find in a data science course or bootcamp, but with an emphasis on solving problems with social impact, integrating data science with the social sciences, discussing ethical implications of the work, as well as privacy, and confidentiality issues.

We have spent many (sort of) early mornings waxing existential over Dunkin' Donuts while trying to define what makes a "data scientist for social good," that enigmatic breed combining one part data scientist, one part consultant, one part educator, and one part bleeding heart idealist. We've come to a rough working definition in the form of the skills and knowledge one would need, which we categorize as follows:

  • Programming, because you'll need to tell your computer what to do, usually by writing code.
  • Computer science, because you'll need to understand how your data is - and should be - structured, as well as the algorithms you use to analyze it.
  • Math and stats, because everything else in life is just applied math, and numerical results are meaningless without some measure of uncertainty.
  • Machine learning, because you'll want to build predictive or descriptive models that can learn, evolve, and improve over time.
  • Social science, because you'll need to know how to design experiments to validate your models in the field, and to understand when correlation can plausibly suggest causation, and sometimes even do causal inference.
  • Problem and Project Scoping, because you'll need to be able to go from a vague and fuzzy project description to a problem you can solve, understand the goals of the project, the interventions you are informing, the data you have and need, and the analysis that needs to be done.
  • Project management, to make progress as a team, to work effectively with your project partner, and work with a team to make that useful solution actually happen.
  • Privacy and security, because data is people and needs to be kept secure and confidential.
  • Ethics, fairness, bias, and transparency, because your work has the potential to be misused or have a negative impact on people's lives, so you have to consider the biases in your data and analyses, the ethical and fairness implications, and how to make your work interpretable and transparent to the users and to the people impacted by it.
  • Communications, because you'll need to be able to tell the story of why what you're doing matters and the methods you're using to a broad audience.
  • Social issues, because you're doing this work to help people, and you don't live or work in a vacuum, so you need to understand the context and history surrounding the people, places and issues you want to impact.

All material is licensed under CC-BY 4.0 License: CC BY 4.0

Table of Contents

The links below will help you find things quickly.

DSSG Manual

Summer Overview

This sections covers general information on projects, working with partners, presentations, orientation information, and the following schedules:

Conduct, Culture, and Communications

This section details the DSSG anti-harassment policy, goals of the fellowship, what we hope fellows get out of the experience, the expectations of the fellows, and the DSSG environment. A slideshow version of this can also be found here.

Curriculum

This section details the various topics we will be covering throughout the summer. This includes:

Wiki

In the wiki, you will find a bunch of helpful information and instructions that people have found helpful along the way. It covers topics like:

  • Accessing S3 from the command line
  • Creating an alias to make Python3 your default (rather than python2)
  • Installing RStudio on your EC2
  • Killing your query
  • Creating a custom jupyter setup
  • Mounting box from ubuntu
  • Pretty Print psql and less output
  • Remotely editing text files in your favorite text editor
  • SQL Server to Postgres
  • Using rpy2
  • VNC Viewer
Owner
Data Science for Social Good
Data Science for Social Good
Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Alexandra 2 May 18, 2022
Ingest openldap data into bloodhound

Bloodhound for Linux Ingest a dumped OpenLDAP ldif into neo4j to be visualized in Bloodhound. Usage: ./ldif_to_neo4j.py ./sample.ldif | cypher-shell -

Guillaume Quéré 71 Nov 09, 2022
AHP Calculator - A method for organizing and evaluating complicated decisions, using Maths and Psychology

AHP Calculator - A method for organizing and evaluating complicated decisions, using Maths and Psychology

16 Aug 08, 2022
Code for Crowd counting via unsupervised cross-domain feature adaptation.

CDFA-pytorch Code for Unsupervised crowd counting via cross-domain feature adaptation. Pre-trained models Google Drive Baidu Cloud : t4qc Environment

Guanchen Ding 6 Dec 11, 2022
A python script for combining multiple native SU2 format meshes into one mesh file for multi-zone simulations.

A python script for combining multiple native SU2 format meshes into one mesh file for multi-zone simulations.

MKursatUzuner 1 Jan 20, 2022
Minitel 5 somewhat reverse-engineered

Minitel 5 The Minitel was a french dumb terminal with an embedded modem which had its Golden Age before the rise of Internet. Typically cubic, with an

cLx 10 Dec 28, 2022
A toolkit for developing and deploying serverless Python code in AWS Lambda.

Python-lambda is a toolset for developing and deploying serverless Python code in AWS Lambda. A call for contributors With python-lambda and pytube bo

Nick Ficano 1.4k Jan 03, 2023
Use Fofa、shodan、zoomeye、360quake to collect information(e.g:domain,IP,CMS,OS)同时调用Fofa、shodan、zoomeye、360quake四个网络空间测绘API完成红队信息收集

Cyberspace Map API English/中文 Development fofaAPI Completed zoomeyeAPI shodanAPI regular 360 quakeAPI Completed Difficulty APIs uses different inputs

Xc1Ym 61 Oct 08, 2022
A code to clean and extract a bib file based on keywords.

These are two scripts I use to generate clean bib files. clean_bibfile.py: Removes superfluous fields (which are not included in fields_to_keep.json)

Antoine Allard 4 May 16, 2022
Find virtual hosts (vhosts) from IP addresses and hostnames

Features Enumerate vhosts from a list of IP addresses and domain names. Virtual Hosts are enumerated using the following process: Supplied domains are

3 Jul 09, 2022
Fabric mod where anyone can PR anything, concerning or not. I'll merge everything as soon as it works.

Guess What Will Happen In This Fabric mod where anyone can PR anything, concerning or not (Unless it's too concerning). I'll merge everything as soon

anatom 65 Dec 25, 2022
Convert ldapdomaindump to Bloodhound

ldd2bh Usage usage: ldd2bh.py [-h] [-i INPUT_FOLDER] [-o OUTPUT_FOLDER] [-a] [-u] [-c] [-g] [-d] Convert ldapdomaindump to Bloodhoun

64 Oct 30, 2022
Model synchronization from dbt to Metabase.

dbt-metabase Model synchronization from dbt to Metabase. If dbt is your source of truth for database schemas and you use Metabase as your analytics to

Mike Gouline 270 Jan 08, 2023
🌲 Um simples criador de arvore de items feito em Python para o Prompt 🐍

Esse projeto foi feito em Python com, intuito de fortificar meu aprendizado de programação. Sobre • Tecnologias • Pré Requisitos • Licença • Autor 📄

Kawan Henrique 1 Aug 02, 2021
Assignment for python course, BUPT 2021.

pyFuujinrokuDestiny Assignment for python course, BUPT 2021. Notice username and password must be ASCII encoding. If username exists in database, syst

Ellias Kiri Stuart 3 Jun 18, 2021
A domonic-like wrapper around selectolax

A domonic-like wrapper around selectolax

byteface 3 Jun 23, 2022
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

ThinkDSP LaTeX source and Python code for Think DSP: Digital Signal Processing in Python, by Allen B. Downey. The premise of this book (and the other

Allen Downey 3.2k Jan 08, 2023
A simple watcher for the XTZ/kUSD pool on Quipuswap

Kolibri Quipuswap Watcher This repo holds the source code for the QuipuBot bot deployed to the #quipuswap-updates channel in the Kolibri Discord Setup

Hover Labs 1 Nov 18, 2021
AIO solution for SSIS students

ssis.bit AIO solution for SSIS students Hardware CircuitPython supports more than 200 different boards. Locally available is the TTGO T8 ESP32-S2 ST77

3 Jun 05, 2022
Add your recently blog and douban states in your GitHub Profile

Add your recently blog and douban states in your GitHub Profile

Bingjie Yan 4 Dec 12, 2022