Data services provided by accredited data service providers under the DATA Scheme
Guidance note 2024:3
This guidance note describes the various types of data services performed by Accredited Data Service Providers (ADSPs) under the DATA Scheme. It includes discussion of the definitions of the data services in the Data Availability and Transparency Act 2022 (the Act) and how these relate to common activities that are undertaken to support data analysis and other uses through data sharing.
Related guidance
For guidance on the role of ADSPs in the DATA Scheme and considerations for determining when an ADSP is required, see when to use an ADSP in the DATA Scheme. For guidance on fees that ADSPs may charge for their services, see charging of fees by ADSPs. For guidance to assist in preparing ADSP accreditation applications, and to assist other Scheme entities in choosing ADSPs, see expected characteristics for accreditation as an ADSP.
The three data services
The Act describes three data services that only ADSPs may provide to other entities in the DATA Scheme:
- de-identification data services [1]
- secure access data services [2]
- complex data integration services [3].
An ADSP may be accredited to provide any or all of these services. The definitions of the services in the Act are broad and do not describe the activities or processes that may be involved in providing the services.
The data services defined in the Act cover services that are commonly undertaken when sharing public sector data, such as data linkage and the management of access to secure research environments. Although many common approaches are adopted in practice, there is no agreed standard classification that can be applied to the array of activities that could be covered by the Act’s definitions, and data service providers often use different terminology for similar concepts. Rather than listing the specific activities that comprise each service, the Act in some cases defines a service by the nature of the data being shared and the privacy protections and other safeguards that apply. This approach aims to accommodate changes in technology and flexibility in the application of practices while upholding safe sharing under the DATA Scheme.
De-identification
The Act provides a broad definition of de-identification services, being ‘a service to treat data that includes personal information so that the data is de‑identified, using techniques that restrict the data being used in a way that would have the result that the data ceases to be de‑identified.’
In the context of the DATA Scheme, de-identification services aim to protect the privacy of individuals. This aligns with requirements in the Act on privacy protections and the treatment of personal data in DATA Scheme projects.
In practice de-identification services can also be provided for projects involving non‑personal data (e.g. suppressing the identity of a business). However, these services are not considered to be de-identification services under the Act. Providing services to de‑identify non-personal data may require the expertise of an ADSP and be covered in the provision of complex data integration services (see below).
The Office of the Australian Information Commissioner (OAIC) provides a guide to understanding de-identification of personal information – see De-identification and the Privacy Act (oaic.gov.au). The OAIC guide describes de-identification as involving two steps. The first is the removal of direct identifiers. The second is taking one or both of the following additional steps:
- the removal or alteration of other information that could potentially be used to re-identify an individual, and/or
- the use of controls and safeguards in the data access environment to prevent re-identification.
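As a rough illustration of the data treatment side of these two steps, the Python sketch below first drops direct identifiers and then coarsens remaining fields that could help re-identify someone. The field names, the ten-year age bands and the postcode truncation are assumptions made for illustration only. They are not drawn from the Act or the OAIC guide, and real treatments would depend on an assessment of re-identification risk for the data in question.

```python
# Hypothetical field names chosen for illustration; real data sets will differ.
DIRECT_IDENTIFIERS = {"name", "address", "medicare_number"}


def remove_direct_identifiers(record):
    """Step 1: drop fields that identify a person on their own."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def generalise_quasi_identifiers(record):
    """Step 2 (one option): coarsen fields that could aid re-identification."""
    treated = dict(record)
    if "age" in treated:
        lower = (treated.pop("age") // 10) * 10
        treated["age_band"] = f"{lower}-{lower + 9}"
    if "postcode" in treated:
        # Keep only the first two digits of the postcode (illustrative choice).
        treated["postcode"] = str(treated["postcode"])[:2] + "XX"
    return treated


def de_identify(record):
    """Apply both steps in order."""
    return generalise_quasi_identifiers(remove_direct_identifiers(record))


# Made-up example record.
example = {
    "name": "Alex Citizen",
    "address": "1 Example St",
    "medicare_number": "0000 00000 0",
    "age": 37,
    "postcode": "2600",
    "occupation": "nurse",
}
treated_example = de_identify(example)
```

The sketch covers only treatment of the data itself. Controls in the data access environment, the other option listed in the second step, are not represented.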
Under the DATA Scheme, de-identification services relate to the treatment of the data itself. Controls put in place to manage and secure the data to prevent re-identification are covered under secure access data services (see below).
Common current treatments that would meet the DATA Scheme definition include applying the ‘separation’ principle (splitting identifiers from content) and confidentialising unit record data or aggregated outputs.
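The ‘separation’ principle can be sketched in a few lines of Python. In this minimal example, each record is split into an identifier file and a content file that share only a random linkage key. The field names and the `link_key` name are hypothetical, and the sketch does not address how the two files would be stored or who would be permitted to hold each.

```python
import secrets

# Hypothetical identifier fields; everything else is treated as content.
IDENTIFIER_FIELDS = ("name", "date_of_birth")


def separate(records):
    """Split records into an identifier file and a content file."""
    identifier_file, content_file = [], []
    for record in records:
        # Random key that carries no information about the person.
        key = secrets.token_hex(8)
        identifiers = {f: record[f] for f in IDENTIFIER_FIELDS}
        content = {f: v for f, v in record.items() if f not in IDENTIFIER_FIELDS}
        identifier_file.append({"link_key": key, **identifiers})
        content_file.append({"link_key": key, **content})
    return identifier_file, content_file


# Made-up example record.
identifier_file, content_file = separate(
    [{"name": "Alex Citizen", "date_of_birth": "1987-01-01", "diagnosis": "asthma"}]
)
```

An analyst given only `content_file` sees the content and the random key, while the identifiers remain in a separate file.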
De-identification activities can be performed on individual data sets or data sources as a standalone data service. The treatments mentioned above could also be undertaken as part of complex data integration, which typically involves treating data to remove personal and identifying information and to reduce the risk of re-identifying individuals.
Secure access data services
The Act defines a secure access data service as:
- the service of providing ‘ADSP‑controlled access’, or
- any other service that enables an entity to access data under the control of another entity and that includes controls to prevent or minimise the risk of the data being misused.
ADSP-controlled access relates to preventing or minimising the risk of data being used to identify individuals. The data is shared with the accredited user by means of the ADSP providing access to identified designated individuals through systems controlled by the ADSP. Those individuals will have appropriate experience, qualifications or training.
In providing secure access data services, both for ADSP-controlled access and any other services an ADSP may provide, the ADSP will implement controls to prevent or minimise the risk of misuse of the shared data (e.g. a researcher extracting data that may be used to re-identify individuals, or exporting unauthorised copies of the data). Such arrangements could involve the ADSP storing the data in a secure system it manages and establishing controls such as user activity monitoring, output vetting, and restricting access to subsets of data or to particular data manipulation and extraction functionality based on individuals’ credentials.
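Output vetting is one of the controls mentioned above that lends itself to a short example. The Python sketch below suppresses small cells in a frequency table before it leaves a secure environment. The threshold of 10 and the function name are assumptions for illustration. The Act does not set a threshold, and an ADSP's actual vetting rules may be more extensive (for example, secondary suppression is not shown).

```python
from collections import Counter

# Illustrative threshold only; not prescribed by the Act or this guidance.
MIN_CELL_COUNT = 10


def vet_frequency_table(values, min_count=MIN_CELL_COUNT):
    """Count each category and suppress counts below the threshold (None)."""
    counts = Counter(values)
    return {
        category: (n if n >= min_count else None)
        for category, n in counts.items()
    }


# Made-up data: 12 records in one category and 3 in another.
vetted = vet_frequency_table(["metro"] * 12 + ["remote"] * 3)
```

Here the count for `"remote"` is replaced with `None` because it falls below the assumed threshold.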
Complex data integration
The Act describes the characteristics of the data to be integrated through a complex data integration service, rather than the underlying activities that constitute ‘data integration’.
Under this approach, a service to integrate data is a complex data integration service if:
- two or more entities control the data being integrated,
- the data is at the unit or micro level, and
- any of the following applies to any of the data to be integrated, or to the integrated data:
  - the data includes personal information
  - the data includes commercially sensitive information (including trade secrets) about the business, commercial, or financial affairs of an organisation
  - the data includes information that is not publicly available about an industry or sector that forms part of the Australian economy
  - the data includes information about one or more persons or things the data custodian of the data considers to be vulnerable or sensitive
  - the data is to be used for more than one project
  - the data meets conditions prescribed by the rules, or
  - the data to be integrated, or the integrated data, has any of the characteristics prescribed by the rules [4].
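The test set out in the list above can be read as a simple decision rule. The Python sketch below encodes one reading of it: the first two conditions must both hold, together with at least one of the listed characteristics. The class and field names are hypothetical, and the sketch is an aid to reading the list rather than a statement of how the Act applies in any particular case.

```python
from dataclasses import dataclass


@dataclass
class IntegrationRequest:
    """Hypothetical description of a proposed data integration service."""
    controlling_entities: int
    unit_or_micro_level: bool
    includes_personal_information: bool = False
    includes_commercially_sensitive_information: bool = False
    includes_non_public_industry_information: bool = False
    custodian_considers_vulnerable_or_sensitive: bool = False
    used_for_more_than_one_project: bool = False
    # The guidance notes that nothing is currently prescribed by the rules.
    prescribed_by_rules: bool = False


def is_complex_data_integration(request):
    """Return True if the request meets this reading of the listed criteria."""
    listed_characteristics = (
        request.includes_personal_information,
        request.includes_commercially_sensitive_information,
        request.includes_non_public_industry_information,
        request.custodian_considers_vulnerable_or_sensitive,
        request.used_for_more_than_one_project,
        request.prescribed_by_rules,
    )
    return (
        request.controlling_entities >= 2
        and request.unit_or_micro_level
        and any(listed_characteristics)
    )
```

For example, unit-level data controlled by two entities that includes personal information would meet this reading of the test, while the same data controlled by a single entity would not.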
In practice, data can be integrated in a variety of ways. Typically, data is integrated by linking records from different data sets using identifiers that are common to each data set (names, addresses, customer numbers, etc.). Data can also be integrated using a combination of characteristics that have a high probability of correctly matching records across data sets (e.g. for personal records – age, location, occupation, etc.; for business records – industry, size, location, etc.).
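Matching on a combination of characteristics can be illustrated with a deliberately simplified Python sketch. It counts how many of the chosen fields agree between two records and treats pairs at or above a threshold as candidate matches. The field list follows the personal-record examples above, and the scoring rule and threshold are assumptions. Linkage methods used in practice typically weight fields and estimate match probabilities, which this sketch does not attempt.

```python
# Fields taken from the personal-record examples in the text above.
MATCH_FIELDS = ("age", "location", "occupation")


def agreement_score(record_a, record_b, fields=MATCH_FIELDS):
    """Count the fields on which both records hold the same, non-missing value."""
    return sum(
        1
        for f in fields
        if record_a.get(f) is not None and record_a.get(f) == record_b.get(f)
    )


def candidate_matches(dataset_a, dataset_b, min_score=3):
    """Return (index_a, index_b) pairs whose agreement score meets the threshold."""
    return [
        (i, j)
        for i, a in enumerate(dataset_a)
        for j, b in enumerate(dataset_b)
        if agreement_score(a, b) >= min_score
    ]


# Made-up records.
people_a = [{"age": 37, "location": "Canberra", "occupation": "nurse"}]
people_b = [
    {"age": 37, "location": "Canberra", "occupation": "nurse"},
    {"age": 37, "location": "Sydney", "occupation": "nurse"},
]
strict = candidate_matches(people_a, people_b)
loose = candidate_matches(people_a, people_b, min_score=2)
```

Lowering the threshold admits more candidate pairs, at the cost of more false matches.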
Each data integration process will have two main phases. The first is identifying records in different data sets that relate to the same person or entity, and the second is merging the data to create an integrated data set. Each of these phases may involve multiple steps, depending on the complexity of the data and the methodology used.
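The two parts of the process described above can be sketched separately in Python. In this minimal example, linkage is done on an exact match of a shared identifier, and merging combines the fields of each linked pair. The `customer_number` key and the sample records are hypothetical, and the sketch assumes the identifier is unique within the second data set.

```python
def link_records(dataset_a, dataset_b, key="customer_number"):
    """Linkage: find (index_a, index_b) pairs that share the same identifier."""
    # Assumes each identifier appears at most once in dataset_b.
    index_b = {record[key]: j for j, record in enumerate(dataset_b)}
    return [
        (i, index_b[record[key]])
        for i, record in enumerate(dataset_a)
        if record[key] in index_b
    ]


def merge_records(dataset_a, dataset_b, links):
    """Merging: combine the fields of each linked pair into one record."""
    # Where a field name appears in both records, the value from dataset_b wins.
    return [{**dataset_a[i], **dataset_b[j]} for i, j in links]


# Made-up records held by two different entities.
dataset_a = [
    {"customer_number": "C001", "postcode": "2600"},
    {"customer_number": "C002", "postcode": "3000"},
]
dataset_b = [
    {"customer_number": "C002", "industry": "retail"},
    {"customer_number": "C003", "industry": "mining"},
]
links = link_records(dataset_a, dataset_b)
integrated = merge_records(dataset_a, dataset_b, links)
```

Keeping the two functions separate reflects that linkage and merging can be carried out at different times, or by different parties.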
The following diagram sets out the typical stages in a data integration project where data linkage is used.
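Diagram 1: Typical stages in a data integration project where data linkage is used (diagram not reproduced here)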
It is common, particularly for large complex data integration projects, for services to be provided by more than one ADSP. For example, some entities provide only specialist data linkage services, with data merging then undertaken by another service provider.
In recent years, the development and use of privacy-preserving record linkage techniques have become more common. ADSPs in the DATA Scheme applying these techniques will need to be accredited for complex data integration services.
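One simple form of privacy-preserving record linkage replaces identifiers with keyed hashes before the data leaves each custodian, so that records can be matched without the linking party seeing names or dates of birth. The Python sketch below shows that idea using HMAC-SHA256 from the standard library. The function name, the normalisation rule and the shared secret are assumptions for illustration. This exact-match approach does not tolerate spelling variations, and techniques used in practice are often more sophisticated.

```python
import hashlib
import hmac


def encode_identifier(name, date_of_birth, secret_key):
    """Return a keyed hash of normalised identifiers (exact-match linkage only)."""
    normalised = f"{name.strip().lower()}|{date_of_birth}"
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret agreed between the data custodians, not held by the linker.
shared_key = b"example-secret-for-illustration-only"

token_from_custodian_a = encode_identifier(" Alex Citizen ", "1987-01-01", shared_key)
token_from_custodian_b = encode_identifier("alex citizen", "1987-01-01", shared_key)
token_for_other_person = encode_identifier("Sam Example", "1990-05-05", shared_key)
```

The first two tokens are identical because the inputs normalise to the same string, so the two records can be linked on the token alone.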
ADSP accreditation
ADSPs provide a critical intermediary function in the DATA Scheme which enables data custodians and accredited users to fill gaps in their capability, access technological advancements and adhere to robust data and privacy safeguards. Accreditation supports this function by verifying that ADSPs are capable of providing safe and secure data services.
Due to the interlinkages between the data services specified in the Act, entities considering applying for accreditation as an ADSP may wish to consider accreditation for services other than the one(s) they would primarily want to provide. In particular, the following will apply:
- Entities applying for accreditation to provide complex data integration services will also generally need to apply to provide de-identification services, as complex data integration services may deal with unit level personal information and require its de‑identification.
- Entities applying for accreditation to provide secure access data services that routinely merge data sets controlled by two or more entities will also be required to seek accreditation for complex data integration services.
When applying for accreditation to provide complex data integration services, entities may also consider becoming accredited for only part of the services. For example, specialist data linkage centres may seek only to be accredited for the data linkage phases of data integration, while secure access data service providers may seek to be accredited to only undertake data merging services. In these circumstances, the National Data Commissioner can impose conditions that the entity provide only particular parts of a service, or provide those services in a particular way.
Other data services an ADSP can perform under the DATA Scheme
Although certain functions under the DATA Scheme must be provided by an ADSP, ADSPs may provide other services as part of a data sharing project depending on the needs of the data custodian and the accredited user. These may be specified in the data sharing agreement for a project, or may be subject to other arrangements agreed between the parties. Additional services undertaken by an ADSP in relation to a data sharing project could include:
- data analysis
- data product development
- co-ordinating ethics applications
- carrying out privacy impact assessments
- data or methodological development
- quality assurance
- data curation services
- producing technical papers.
ADSPs that are also data custodians may hold data as an agent for another data custodian (sometimes referred to as ‘data curation services’). In this capacity ADSPs may undertake development of data sharing agreements with the accredited user. However, unless the data custodian of the source data has authorised the ADSP to act as an agent for the data custodian under the DATA Scheme, the data custodian of the source data will need to be the data custodian party to the data sharing agreement and to discharge their duties as such under the DATA Scheme.
Data custodians and accredited users should make their own assessments of an ADSP’s capabilities where these services are included as part of a DATA Scheme project.
[1] See subsection 16C(3)
[2] See subsection 16C(4)
[3] See subsection 16D(3)
[4] Note, no such matters are currently prescribed.
Last updated: 28 August 2024