Harnessing Data Science for Social Impact in 2024

By: Alana Walker
October 25, 2023

Data can do a whole lot of good. Across the globe, nonprofits, NGOs, and data and investment firms are working together to bring about data-driven social change. However, as we’ll see in this article, there is a divide between large companies that generate large profits from data and the smaller organizations that need access to standardized data to pursue their goals.

What is data science for social impact?

According to Sopact, data science for social impact is a process whereby “data-driven methods are used to address social and environmental problems.” Combining data analysis, machine learning, natural language processing, and statistical modelling, data science for social impact aims to help organizations and community actors extract meaningful insights from their data and make decisions that lead to positive change.

Data science for social impact helps organizations harness the power of data to better understand their audience’s needs and behaviours. This is especially important for nonprofits, public services, and NGOs, which often lag behind in leveraging their data to accomplish their missions.

What is the main purpose of data science?

The main goal of data science is to identify patterns in large data sets. Using statistical analysis, data scientists transform large volumes of unsorted data into actionable insights that organizations can use to bring about positive change.

How can data science be used to help people?

Data science for social good is currently used to drive projects with a positive impact on everything from policing equity and healthcare to making government benefits easier to access.

In Switzerland, data scientists from the Immigration Policy Lab at Stanford University and ETH Zurich worked from historical data on where immigrants had found work.
Using this information, they created an algorithm that optimizes where incoming refugees are placed, with the aim of improving their chances of finding work.

The company MedAware is helping keep hospital patients safe by reducing prescription errors. It has built machine-learning algorithms that analyze data from millions of electronic health records to find outliers in prescribing patterns, alerting healthcare professionals to potential prescription errors.

The Center for Policing Equity gathers and analyzes behavioural data from public safety systems and uses it to help communities implement safer policing practices. Driven by the goal of making policing less racist, less deadly, and less omnipresent, the organization identifies communities where racial justice is lacking and equips them with the resources to redesign public safety and achieve racial equity.

Challenges and ethical considerations

Before acting on any dataset, those using it must be aware of its biases. Biases can be particularly damaging when data is used for social good, because the primary stakeholders are usually NGOs or nonprofits with little financial cushion if something goes wrong.

Historical Bias

Historical bias occurs when socio-cultural prejudices and beliefs are reflected in systematic processes and, in turn, in the data those processes produce. This becomes especially problematic when such datasets are used to train machine learning algorithms. Because many of these organizations have racial or gender equality as part of their mandates, historically biased data can produce inaccurate readings of racial or gender-based information and lead researchers to the wrong conclusions about how to help these groups.

Survivorship Bias

Survivorship bias occurs when an analysis looks only at the people or cases that made it through some selection process. It can lead data scientists to focus on the winners of history instead of the big picture. This is especially harmful for social actors, because minority communities have often been sidelined by the systemic processes that generate much of the available data.
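To make survivorship bias concrete, here is a minimal sketch using invented numbers for a hypothetical job-training program. The records, field names, and scores are illustrative assumptions, not data from any real program. Averaging outcomes only over participants who completed the program (the “survivors”) paints a rosier picture than averaging over everyone who enrolled.

```python
from statistics import mean

# Hypothetical job-training records: an employment outcome score (0-100)
# and whether the participant finished the program. Numbers are invented.
participants = [
    {"id": 1, "completed": True, "outcome": 80},
    {"id": 2, "completed": True, "outcome": 75},
    {"id": 3, "completed": False, "outcome": 30},
    {"id": 4, "completed": True, "outcome": 85},
    {"id": 5, "completed": False, "outcome": 20},
    {"id": 6, "completed": False, "outcome": 35},
]


def average_outcome(records):
    """Mean outcome score across the given records."""
    return mean(r["outcome"] for r in records)


# Survivorship-biased view: only people who made it to the end are counted.
survivors = [r for r in participants if r["completed"]]
biased_estimate = average_outcome(survivors)

# Full view: everyone who enrolled, including those who dropped out.
full_estimate = average_outcome(participants)

print(f"Completers only: {biased_estimate:.1f}")
print(f"All enrollees:   {full_estimate:.1f}")
```

The gap between the two averages is the survivorship effect: the dropouts, who may be exactly the people an organization most needs to reach, vanish from the completers-only number.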
Outlier Bias

Outlier bias occurs when the overall average of a dataset hides the data points that show undesirable outcomes. MedAware turns this to its advantage: the outliers are exactly what point it to potentially mistaken prescriptions that doctors would otherwise hand out.

The “Data Divide”

The biggest challenges, however, stem from a lack of funding for the organizations involved and from systems that are frustrating to navigate. This gap is known as the data divide: large companies generate massive profits from artificial intelligence and data, while nonprofits and NGOs lag far behind. For example, US corporations raked in a record $2.77 trillion in profits. Meanwhile, the United Nations Sustainable Development Goals have seen less than 10 percent growth in over 20 years.

But once community organizations harness the power of data, they can flourish. A 2017 study by IBM found that 78 percent of nonprofits using advanced analytics reported greater effectiveness in accomplishing their stated missions.

The Stanford Social Innovation Review has identified four main reasons for this data divide:

1. There are few dominant or widely used data platforms in the social sector. Unlike companies in the private sector, nonprofits often use a variety of homemade or specialty data tools, which fragments data and makes it hard to transfer between organizations.
2. Social impact data is not structured. There is no federal standard for how information about the impact of social programs should be organized.
3. Advanced use cases are underdeveloped. Because small-scale organizations’ data is scattered and unstructured, investment in new technologies in the sector is limited.
4. Funding is not closely linked to impact. Impact data (evidence of positive results) doesn’t generate the immediate financial benefits an organization needs to grow.

What the industry needs is large-scale, shareable, accessible, standardized data platforms.
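As an illustration of the fragmentation problem, the sketch below assumes two hypothetical organizations that export the same kind of outcome data from different homemade tools. The `ImpactRecord` schema, the organizations, and every field name are invented for this example; no existing sector standard is implied. Mapping both exports into one shared record type is what lets their data be combined and shared.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImpactRecord:
    """Hypothetical shared schema for one reported program outcome."""
    organization: str
    program: str
    metric: str
    value: float
    period: str  # year-month, e.g. "2023-09"


def from_org_a(row: dict) -> ImpactRecord:
    """Map Organization A's (hypothetical) export, which stores year and month separately."""
    return ImpactRecord(
        organization="Org A",
        program=row["prog_name"],
        metric=row["indicator"],
        value=float(row["count"]),
        period=f"{row['year']:04d}-{row['month']:02d}",
    )


def from_org_b(row: dict) -> ImpactRecord:
    """Map Organization B's (hypothetical) export, which uses other keys and an MM/YYYY date."""
    month, year = row["reporting_period"].split("/")
    return ImpactRecord(
        organization="Org B",
        program=row["Program"],
        metric=row["Outcome"],
        value=float(row["Total"]),  # B stores numbers as text
        period=f"{year}-{month}",
    )


# Invented sample rows, one per organization.
org_a_rows = [{"prog_name": "Job Readiness", "indicator": "people_placed",
               "count": 12, "year": 2023, "month": 9}]
org_b_rows = [{"Program": "Employment Support", "Outcome": "people_placed",
               "Total": "8", "reporting_period": "09/2023"}]

records = [from_org_a(r) for r in org_a_rows] + [from_org_b(r) for r in org_b_rows]

# With a shared schema, the two sources can be combined in one query...
total_placed = sum(
    r.value for r in records
    if r.metric == "people_placed" and r.period == "2023-09"
)
print(total_placed)  # 20.0

# ...and exported in a common, shareable format.
print(json.dumps([asdict(r) for r in records], indent=2))
```

Without the shared schema, every pair of organizations would need its own custom conversion, which is the transferability cost the first two reasons above describe.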
Some progress has been made in this area. Intermediaries like TechSoup offer free or reduced-cost access to hundreds of products, and organizations like DataKind and Data.org make data science talent more accessible to nonprofits. On a larger scale, however, data remains piecemeal, and both the tools and access to information still need to be standardized and made more accessible.

The role of organizations in impactful social change

Once organizations realize the power of data, it takes several actors collaborating at the local level to bring about change. NGOs often need government funding and a partnership with a tech or social data company, while the larger companies need the on-the-ground knowledge that NGOs can provide about their communities’ needs.

One example is Built for Zero, an initiative to bring homelessness in the United States to “functional zero.” Functional zero is a milestone defined by the initiative: it indicates that a local community has “solved” homelessness for its population, meaning that homelessness is rare and brief.

The methodology is as follows. Real-time data accounts for everyone experiencing homelessness in the community, by name and specific needs. Using information collected and shared with each individual’s consent, each person’s needs and housing situation are recorded. With this by-name list, updated at least monthly, the community can more easily match each individual with the right housing. This allows the community to allocate resources effectively and see whether its efforts are working. Communities also use real-time data to secure housing resources and decide where to place them for the greatest success. For a full overview of how Built for Zero works, you can check out this video explaining its methodology.

How do companies measure social impact?

Each company has different goals and ways to measure success.
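As one illustration of measuring progress, a by-name list like the one Built for Zero describes can be summarized with simple month-end snapshots. The minimal sketch below assumes hypothetical field names and invented records, and tracks two measures chosen only for illustration: how many people are actively homeless, and the median number of days they have been homeless so far. These are not Built for Zero’s actual data model or its criteria for functional zero.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Person:
    """One hypothetical by-name list entry, recorded with the person's consent."""
    name: str
    needs: list[str]
    became_homeless: date
    housed_on: Optional[date] = None  # None while still experiencing homelessness


def actively_homeless(by_name_list: list[Person], as_of: date) -> list[Person]:
    """Return people who had become homeless by `as_of` and were not yet housed."""
    return [
        p for p in by_name_list
        if p.became_homeless <= as_of and (p.housed_on is None or p.housed_on > as_of)
    ]


def median_days_homeless(by_name_list: list[Person], as_of: date) -> float:
    """Median days homeless so far among people active on `as_of` (0.0 if nobody is)."""
    active = actively_homeless(by_name_list, as_of)
    if not active:
        return 0.0
    return float(median((as_of - p.became_homeless).days for p in active))


# Invented entries for illustration only.
by_name_list = [
    Person("A. Example", ["accessible unit"], date(2023, 6, 1), housed_on=date(2023, 8, 15)),
    Person("B. Example", ["family housing"], date(2023, 7, 10)),
    Person("C. Example", ["mental health support"], date(2023, 8, 20)),
]

# Compare month-end snapshots to see whether efforts are working:
# "rare" shows up as a falling count, "brief" as a falling median duration.
for month_end in [date(2023, 7, 31), date(2023, 8, 31), date(2023, 9, 30)]:
    active = actively_homeless(by_name_list, month_end)
    print(month_end, len(active), median_days_homeless(by_name_list, month_end))
```

Keeping one record per person, rather than anonymous aggregate counts, is what makes snapshots like these possible: the same list supports both matching individuals to housing and tracking community-wide trends.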
In the Built for Zero example above, for instance, the end goal is for homelessness to be rare and brief. Whatever the goals, companies typically follow a framework for achieving them:

1. Identify the problem.
2. Build a strategy.
3. Measure the output data.
4. Adjust as needed.

Tricks and tools of the trade

Both data science and AI tools are used to measure impact. Common machine learning libraries like scikit-learn and TensorFlow help in building models that track information, and data analytics tools like SQL, Python, and Microsoft Excel are used alongside them to meet goals.

Where to next?

One area being explored is the use of synthetic (artificially generated) data to mimic real-world data for modelling in healthcare. Data scientists evaluate patterns of disease in the real world and use them to generate a fictitious population of patients that imitates the real data, explains Dean Eurich of the School of Public Health at the University of Alberta.

Of course, innovation goes beyond the technical. More companies are integrating social impact practices into their business processes. According to Pearl Consulting, a Singapore-based firm, future trends include companies placing greater emphasis on metrics, frameworks, and disclosure to shareholders. Investors increasingly want to see companies act on social issues such as climate change. A 2022 KPMG survey of more than 1,300 global CEOs found that 69% now face pressure from stakeholders to improve reporting transparency on social issues.

While larger companies pivoting to a more socially conscious way of operating can be seen as a positive, it’s important to remember that many smaller organizations with mission-driven social goals have been fighting for recognition and funding for a long time. To truly drive change, the industry needs three things:

Innovation - attract entrepreneurs, researchers, and technologists who can develop new ways to make data more refined and more accessible. Example: the What Works Clearinghouse at the Institute of Education Sciences reviews and codes evaluation studies on the effectiveness of education interventions.

Incentives - increase demand for funding socially driven projects. Example: in 2004, the Ansari XPRIZE awarded $10 million (at the time, the largest prize in history) to the first team to build a reusable crewed spacecraft and fly it to space twice within two weeks. This inspired other prize competitions that tie cash directly to outcomes.

Investment - governments need to rethink how they invest.

The social impact sector needs more people working in data science at socially driven organizations. Beyond the rewarding career that working in data offers, you could also have the satisfaction of knowing that your career contributes to social good. To start your social impact journey, sign up for the Data Science Program today.