Using artificial intelligence to derive data on the social determinants of health from electronic medical records

Data to Enable a Learning Health System
In Progress
Artificial Intelligence, Machine Learning, Social Determinants of Health


Our health is determined in large part by factors outside the health care system. Known as the social determinants of health (SDoH), these are the conditions in which we live, grow, work and play, as well as the processes in society that shape our access to crucial resources, including the health care system itself. The Public Health Agency of Canada identifies a number of key social determinants, including gender identity, socioeconomic status, and housing status.

While the social determinants of health are well known to health providers and policymakers, health organizations in Canada and in many other countries do not routinely collect this information in a structured and consistent way. Where it is collected, the process is time-intensive and requires each patient to complete a standardized survey. We will explore whether data on the social determinants of health can be derived from electronic medical record (EMR) data generated during routine clinical care.

We build on our recent work, which used unique, structured, self-reported SDoH data from 17,139 patients, linked to their EMRs. First, we will evaluate approaches that use natural language processing and machine learning to derive SDoH data, and examine whether they outperform simple regular-expression searches. Second, we will evaluate whether structured, self-reported SDoH data collected continuously in clinics across six provinces can be used to regularly train and improve our algorithms over time, and whether performance varies by context. We will also examine whether our algorithms are biased, for example by performing less accurately for some patient groups than for others.
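
Comparing natural language processing and machine learning against simple regular-expression searches requires a baseline and a reference standard. The Python sketch below shows one way this might look, assuming a single hypothetical domain (housing instability), invented search patterns, and binary self-reported survey answers as the reference standard. The function names (regex_flag, evaluate, evaluate_by_group), the patterns, and the subgroup labels are all hypothetical and are not taken from the project. The sketch scores a method by sensitivity, specificity, and positive predictive value, both overall and within patient subgroups.

```python
import re
from dataclasses import dataclass

# Illustrative patterns for one hypothetical SDoH domain (housing instability).
# These are assumptions for demonstration, not the project's search terms.
HOUSING_PATTERNS = [
    r"\bhomeless(ness)?\b",
    r"\bno fixed address\b",
    r"\b(lives|living|staying) in (a )?shelter\b",
    r"\bcouch[- ]surfing\b",
]
HOUSING_REGEX = re.compile("|".join(HOUSING_PATTERNS), re.IGNORECASE)


def regex_flag(note: str) -> bool:
    """Flag a free-text note if any housing pattern appears anywhere in it."""
    return HOUSING_REGEX.search(note) is not None


@dataclass
class Metrics:
    sensitivity: float  # share of self-reported positives the method flags
    specificity: float  # share of self-reported negatives the method does not flag
    ppv: float          # share of flagged patients who self-reported the condition


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than dividing by zero when a subgroup has no cases.
    return num / den if den else float("nan")


def evaluate(predicted: list[bool], self_reported: list[bool]) -> Metrics:
    """Score predictions against self-reported survey answers as the reference standard."""
    if len(predicted) != len(self_reported):
        raise ValueError("predicted and self_reported must have the same length")
    pairs = list(zip(predicted, self_reported))
    tp = sum(1 for p, s in pairs if p and s)
    fp = sum(1 for p, s in pairs if p and not s)
    fn = sum(1 for p, s in pairs if not p and s)
    tn = sum(1 for p, s in pairs if not p and not s)
    return Metrics(_ratio(tp, tp + fn), _ratio(tn, tn + fp), _ratio(tp, tp + fp))


def evaluate_by_group(
    predicted: list[bool], self_reported: list[bool], groups: list[str]
) -> dict[str, Metrics]:
    """Compute the same metrics separately within each patient subgroup."""
    by_group: dict[str, list[int]] = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    return {
        g: evaluate([predicted[i] for i in idx], [self_reported[i] for i in idx])
        for g, idx in sorted(by_group.items())
    }


if __name__ == "__main__":
    # Invented example notes and answers, for illustration only.
    notes = [
        "Pt currently homeless, staying in shelter downtown.",
        "Lives with spouse in own apartment.",
        "Denies homelessness; stable housing.",
        "Has been couch surfing since losing job.",
    ]
    self_reported = [True, False, False, True]
    groups = ["A", "A", "B", "B"]  # hypothetical subgroup labels

    predicted = [regex_flag(n) for n in notes]
    print(predicted)  # [True, False, True, True]
    print(evaluate(predicted, self_reported))
    # Metrics(sensitivity=1.0, specificity=0.5, ppv=0.6666666666666666)
    for g, m in evaluate_by_group(predicted, self_reported, groups).items():
        print(g, m.specificity)  # prints "A 1.0", then "B 0.0"
```

The third invented note illustrates a known limitation of plain pattern matching: "Denies homelessness" matches the pattern even though it describes stable housing, which is the kind of error a negation-aware NLP method or a trained classifier might avoid. Comparing the same metrics within subgroups is one simple way to surface the performance differences that a bias assessment would examine.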

Our study team includes experts in primary care, EMR data, SDoH, computer science, statistics, and ethics; local, provincial, and national knowledge users; and patient partners.


If data on the social determinants of health can be successfully derived from EMR data, this would save a great deal of time and effort and would provide a significant new data source for identifying and reducing health inequities.
