Anonymization has long been a popular topic in the privacy world. But it is getting even more attention in Canada these days as federal Bill C-27 gets analyzed at the Standing Committee on Industry and Technology (INDU) of the House of Commons, and in response to the proposed Regulation respecting the anonymization of personal information in connection with Law 25 in Quebec.
Privacy laws are designed to protect “personal” information – meaning, information that can be connected to a specific person. Therefore, a threshold issue for the application of privacy laws is identifiability. If information can identify a person, privacy laws will apply; if a person cannot be identified from the information, it is generally considered outside the scope of such laws. This is why terminology like “de-identified” or “anonymized” or “aggregate” is so important. It is often assumed that once information has been transformed into these types of states, it is no longer identifiable and, therefore, no longer subject to privacy laws. But is this actually true? Well, it depends.
The pre-condition of identifiability is not an easy one to apply. As is often said, identifiability is viewed on a spectrum. At one end of the spectrum is fully identifiable information (information that can, on its own, directly identify an individual). At the other end is information for which there is no chance an individual will be identified. Most information falls between these two states, yet there is no consensus across regimes on where or how to draw the line between identifiable and non-identifiable information. The inconsistent use of the relevant terminology compounds the confusion. For example, de-identification, anonymization, and aggregation are often treated as synonyms, but they can mean different things.
Consider, for example, Ontario’s approach under its health privacy legislation, the Personal Health Information Protection Act, 2004 (PHIPA). PHIPA applies to “identifying information” of a certain type (health information). It uses the term “de-identify” to mean “… to remove any information that identifies the individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify the individual….”[1] De-identified information is not considered identifiable under PHIPA and is therefore, with minor exceptions, outside the scope of the Act.
Under PHIPA, information will be considered de-identified if the risk of re-identification is deemed low enough to be acceptable for the circumstances. The standard of “reasonably foreseeable in the circumstances” from the definition triggers a contextual risk assessment (e.g., consideration of the audience/recipient, their potential uses of the information, what other information the recipient may have available to them), meaning the circumstances will dictate the threshold to be applied. Therefore, if information has been de-identified pursuant to the standard in PHIPA, this does not mean an individual cannot be identified (or, put another way, it does not mean that there is no risk of re-identification); rather it means that based on the risk assessment of the circumstances, the risk that an individual may be identified is low enough to be acceptable to remove the information from PHIPA protections.
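To make the idea of a context-dependent threshold concrete, here is a minimal sketch of one common way to estimate re-identification risk: group records by their indirect identifiers and take the reciprocal of the smallest group size. PHIPA does not prescribe this metric or any numeric threshold; the field names, the contexts, and the threshold values below are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical acceptable-risk thresholds; a real assessment would set these
# based on the recipient, the intended uses, and other available data.
ACCEPTABLE_RISK = {"public_release": 0.05, "trusted_researcher": 0.2}


def max_reidentification_risk(records, indirect_identifiers):
    """Return 1 / (size of the smallest group sharing the same indirect identifiers)."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[field] for field in indirect_identifiers) for r in records)
    return 1 / min(groups.values())


def risk_is_acceptable(records, indirect_identifiers, context):
    """Compare the estimated risk to the (hypothetical) threshold for this context."""
    return max_reidentification_risk(records, indirect_identifiers) <= ACCEPTABLE_RISK[context]


# Illustrative data: two groups of five records each.
records = (
    [{"age_band": "30-39", "postal_prefix": "M5V", "diagnosis": "asthma"} for _ in range(5)]
    + [{"age_band": "40-49", "postal_prefix": "K1A", "diagnosis": "diabetes"} for _ in range(5)]
)
INDIRECT = ["age_band", "postal_prefix"]

risk = max_reidentification_risk(records, INDIRECT)
print(risk_is_acceptable(records, INDIRECT, "trusted_researcher"))
print(risk_is_acceptable(records, INDIRECT, "public_release"))
```

The same data set passes under one set of circumstances and fails under another, which is the practical effect of a “reasonably foreseeable in the circumstances” standard.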
Contrast that approach with the one set out in Bill C-27,[2] a bill currently making its way through the legislative process at the federal level. Bill C-27 is an effort to, among other things, repeal and amend parts of the Personal Information Protection and Electronic Documents Act and enact the Consumer Privacy Protection Act (CPPA). The definition of de-identified information in the CPPA is different from the one used in PHIPA – in this context, the meaning is tied to direct identifiers (a data element that can directly and uniquely identify a single individual, either alone or with other information).[3] If an entity removes direct identifiers, the information is de-identified. However, unlike in PHIPA, this de-identified information is still subject to the Act, but a less stringent set of rules is applied to it, in recognition of the fact that the identifiability risk is reduced. To make clear the type of information that will be outside the scope of the Act,[4] the concept of “anonymized data”, which requires a higher level of transformation, was incorporated into the CPPA. De-identified information is subject to the CPPA but anonymized information is not.
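The difference between removing direct identifiers and anonymizing can be sketched in a few lines. The example below is illustrative only: the field names are hypothetical, the list of direct identifiers borrows from the examples in footnote 3, and it does not describe any method set out in the CPPA.

```python
# Hypothetical field names for direct identifiers (see the examples in footnote 3).
DIRECT_IDENTIFIERS = {
    "name",
    "home_address",
    "email",
    "telephone",
    "health_card_number",
    "ip_address",
}


def remove_direct_identifiers(record):
    """Return a copy of the record without any direct-identifier fields."""
    return {field: value for field, value in record.items() if field not in DIRECT_IDENTIFIERS}


patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "health_card_number": "0000-000-000",
    "birth_year": 1984,
    "postal_prefix": "M5V",
    "diagnosis": "asthma",
}

deidentified = remove_direct_identifiers(patient)
print(deidentified)
```

The output still contains birth year, postal prefix, and diagnosis. Those indirect identifiers could, in combination or when linked with other data, point to one person, which is consistent with the CPPA keeping de-identified information within its scope.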
And, to further complicate the picture, the definition of “anonymize” as set out in the CPPA creates a very high bar. It means to “irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means”.[5] The use of the term “irreversibly” suggests that the information must have zero risk of re-identification in order to qualify.[6] This is a very high (arguably, impossibly high) standard to meet and, as such, it becomes very difficult to take information outside the scope of the Act. That may be by design, as there are some arguments for broadening the application of privacy laws to include even non-identifiable (human-derived) data, but there is no doubt it represents a shift in approach. This issue has been the subject of discussion at the INDU hearings on Bill C-27, so it will be interesting to see if this proposed language gets amended.
In summary, “de-identified” under PHIPA means something different than “de-identified” as used in the CPPA. Information that meets the PHIPA standard of de-identified can fall somewhere between the standards of “de-identified” and “anonymized” as set out in the CPPA. These different approaches to defining and regulating personal information across jurisdictions, even within Canada, create ambiguity, confusion, and operational complexity.
Given these challenges, here are some suggestions to bear in mind:
- It is common in privacy policies to describe how an organization will create, use and/or disclose “de-identified” or “anonymized” information. Before making such statements, ensure you understand which privacy laws apply to your operations, as this will establish the appropriate terminology to use, and the standard to be applied to those commitments. To state the obvious, do not make commitments that your organization is not set up to meet.
- Ensure that those in the organization who are handling de-identified or anonymized information understand what those terms mean. It is common for lay people to use these terms without appreciating that there are specific requirements attached to them.
- Although the law around these concepts is evolving, there are many secondary sources that provide useful guidance on best practices for addressing identifiability. For example, in the health space, you can look to the “Safe Harbor” method for de-identification of PHI under the United States’ Health Insurance Portability and Accountability Act, Health Canada’s guidance document on Public Release of Clinical Information, and the De-identification Guidelines for Structured Data from the Information and Privacy Commissioner of Ontario. There is also ISO 27559 (Information security, cybersecurity and privacy protection – Privacy enhancing data de-identification framework). Two decisions from Canadian privacy regulators are also useful: PHIPA Decision 175 from the Information and Privacy Commissioner of Ontario, and the decision issued by the Privacy Commissioner of Canada in the investigation into Public Health Agency of Canada and Health Canada regarding use of mobility data (some of which was acquired through Telus’ “Data for Good” program).
[1] Personal Health Information Protection Act, 2004, S.O. 2004, c. 3, Sched. A, s. 2.
[2] Canada, Bill C-27, An Act to enact the Consumer Privacy Protection Act, the Personal Information and Data Protection Tribunal Act and the Artificial Intelligence and Data Act and to make consequential and related amendments to other Acts, 1st Sess., 44th Parl., 2022 (“Bill C-27”).
[3] Examples of direct identifiers: name, home address, email address, telephone number, license plate number, vehicle identification number, social insurance number, health card number, medical record number, internet protocol (IP) address number: De-Identification Guidelines for Structured Data, Information and Privacy Commissioner of Ontario. One can contrast those data elements with “indirect” identifiers.
[4] Section 6(5) of the CPPA.
[5] Bill C-27 at cl. 2(1).
[6] Khaled El Emam and Mike Hintze, “10 Recommendations for Regulating Non-identifiable Data”, Replica Analytics (September 2021) at 7, online: https://replica-analytics.com/blog/ten-recommendations-for-regulating-non-identifiable-data/