Pseudonymisation

What is it?

Pseudonymisation is a data management and de-identification procedure in which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, known as pseudonyms, from which the identities of individuals cannot be inferred.

Examples of this process are replacing an NHS number with another random number, replacing a name with a code or replacing an address with a location code.

Pseudonyms themselves should not contain any information that could identify the individual to whom they relate (for example, they should not be made up of characters from the date of birth).

The correct application of this process will produce the same pseudonym for a patient across different data sets and over time, so that patient data can still be linked.
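The consistency requirement above can be illustrated with a keyed hash. The following is a minimal sketch only, assuming an HMAC-SHA256 construction; the source does not specify which algorithm is used, and the function name `pseudonymise`, the key value and the 16-character truncation are hypothetical choices made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only. In practice the key would be
# held solely by the body that performs the pseudonymisation.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymise(nhs_number: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic pseudonym for an NHS number (illustrative sketch)."""
    # Strip spaces and other separators so that differently formatted copies
    # of the same number produce the same pseudonym.
    normalised = "".join(ch for ch in nhs_number if ch.isdigit())
    digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()
    # Truncated for readability; the truncation length is an assumption.
    return digest[:16]


# The same (made-up) number appearing in two data sets gets the same pseudonym,
# so the records can still be linked.
in_dataset_a = pseudonymise("999 000 0001")
in_dataset_b = pseudonymise("9990000001")
print(in_dataset_a == in_dataset_b)
```

Because the output depends on the key, anyone without the key cannot recompute the pseudonym from a known NHS number, while the key holder will always obtain the same pseudonym for the same patient.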

A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing.

Pseudonymised data can be restored to its original state with the addition of information that allows individuals to be re-identified. In contrast, anonymisation is intended to prevent re-identification of individuals within the data set.
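One way to picture the "additional information" needed for re-identification is a lookup table that is stored separately from the pseudonymised data. The sketch below is an assumption made for illustration, not a description of any particular system; the class `PseudonymTable` and its method names are hypothetical.

```python
import secrets


class PseudonymTable:
    """Hypothetical lookup table kept apart from the pseudonymised data."""

    def __init__(self):
        self._forward = {}  # original value -> pseudonym
        self._reverse = {}  # pseudonym -> original value

    def pseudonymise(self, original: str) -> str:
        # Reuse an existing pseudonym so the same input always maps to the
        # same output.
        if original in self._forward:
            return self._forward[original]
        pseudonym = secrets.token_hex(8)
        # Regenerate in the very unlikely event of a clash with an existing
        # pseudonym.
        while pseudonym in self._reverse:
            pseudonym = secrets.token_hex(8)
        self._forward[original] = pseudonym
        self._reverse[pseudonym] = original
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Only possible for whoever holds the table.
        return self._reverse[pseudonym]


table = PseudonymTable()
pseudonym = table.pseudonymise("9990000001")  # made-up example value
print(table.reidentify(pseudonym) == "9990000001")
```

A recipient who sees only the pseudonyms, and never the table, has no means of reversing them. Random pseudonyms of this kind also carry no information derived from the original value.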

Why is it important?

Pseudonymisation by the DSCRO before data is sent to the ICB effectively anonymises the patient-level information. Only the DSCRO holds the key with which the pseudonym can be reversed, so the ICB and its partner organisations cannot re-identify patients or obtain the NHS number.

The correct application of this process will produce the same pseudonym for a patient across different data sets and over time, so that patient data can still be linked. Patient journeys across different services and providers can then be followed, allowing the NHS to better understand how those services affect each other.

How will it be done?

To pseudonymise data effectively, the following actions must be taken:

  • an algorithm must be applied to the agreed data field within the patient record (for example, the NHS number) to generate a pseudonymised identification number to be used on reports for secondary use purposes.
  • each field of PCD must have a unique pseudonym.
  • pseudonyms used in place of NHS numbers and other fields that are to be used by staff must be of the same length as the fields they replace and formatted on output to ensure readability.

For example, when replacing NHS numbers in existing report formats, the output pseudonym should generally be of the same field length but should not use the same character set, e.g. “5L7 TWX 619Z”, which incorporates letters within the pseudonym for an NHS number to avoid confusion with original numbers.
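The formatting rule above could be sketched as follows. This is an illustrative assumption rather than the algorithm actually in use: it derives ten characters from an HMAC-SHA256 digest, maps them onto a hypothetical alphabet of digits and capital letters, forces at least one letter, and prints the result in the 3-3-4 grouping commonly used for NHS numbers. The names `formatted_pseudonym`, `ALPHABET` and `LETTERS` are invented for this example.

```python
import hashlib
import hmac

# Hypothetical output alphabet: digits plus capital letters, leaving out
# I and O because they are easily misread as 1 and 0.
LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"
ALPHABET = "0123456789" + LETTERS


def formatted_pseudonym(nhs_number: str, key: bytes) -> str:
    """Return a 10-character pseudonym laid out like an NHS number (sketch)."""
    normalised = "".join(ch for ch in nhs_number if ch.isdigit())
    digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256).digest()
    # Map the first ten digest bytes onto the alphabet. The modulo introduces
    # a slight bias, which is acceptable for an illustration.
    chars = [ALPHABET[b % len(ALPHABET)] for b in digest[:10]]
    # Guarantee at least one letter so the result cannot be mistaken for a
    # genuine, all-digit NHS number.
    if all(c.isdigit() for c in chars):
        chars[0] = LETTERS[digest[10] % len(LETTERS)]
    body = "".join(chars)
    # Same field length and grouping as the original: 3-3-4.
    return f"{body[:3]} {body[3:6]} {body[6:]}"


# Made-up number and key, for illustration only.
print(formatted_pseudonym("999 000 0001", b"example-key-not-for-real-use"))
```

Shortening a digest to ten characters makes collisions possible in principle, so a real scheme would need to confirm that each pseudonym is unique before use.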