Developing and evaluating de-identification approaches for primary care electronic medical record data (De-ID) study

Data to Enable a Learning Health System
In Progress
Electronic Medical Records
Oct 2023 - June 2025


In Canada, electronic medical records (EMRs) are a vital part of primary health care and a rich data source for research and quality improvement. These records contain structured data, which are easy to analyze, and unstructured data such as physicians' progress notes, which are not. Unstructured notes pose privacy risks because they can contain sensitive identifying information. To protect privacy, Ontario's Personal Health Information Protection Act (PHIPA) requires patient data to be de-identified, a task commonly performed with automated tools. However, most of these tools were developed with U.S. health care regulations in mind and may not suit the Canadian context, particularly primary care EMR notes. As a result, Canada has no standard method for de-identifying EMR progress notes, which makes it difficult to use these data for research while maintaining patient confidentiality.

The objective of this study is to test and evaluate several available de-identification approaches for free-text notes in Canadian primary care EMRs. If validation shows that existing tools need adaptation or greater accuracy, we also aim to develop and evaluate a new tool. The overall goal is to establish best-practice recommendations.


This work is expected to have several benefits. First, robust de-identification technologies will improve the security and privacy of patient data, increasing public trust in, and compliance with, EMR systems. Second, the de-identified datasets can support future research, advancing health care by providing rich data for study without compromising patient anonymity. Overall, the project will strengthen the use of EMR data for research and quality improvement while upholding the highest standards of patient privacy and data security.
