AI in the energy sector guidance consultation
Appendix 4: data use and management
4.1 This appendix does not cover cyber security concerns.
4.2 In addition to the potential uncertainties associated with AI, the accuracy, availability and completeness of the data used to train AI system components are likely to have a material impact on the usefulness of the AI.
4.3 AI is also likely to exacerbate existing issues with the collection, storage and use of data. Any AI requires training data, whether the AI is purchased from a third-party supplier, tuned, or trained internally. It is essential that the relevant data management and governance practices are in place to ensure that this can be carried out appropriately and that existing data regulations are complied with. For example, during the creation of any AI, it is necessary to consider GDPR and the security classification of the data. Practical and robust methods for removing data from AI are not currently available. Because of this, it is necessary to consider data removal before the creation or purchase of any AI. Appropriate measures will need to be in place to ensure that the obligations arising from the right to erasure can be met. This could be achieved by using data not subject to GDPR erasure considerations, or by putting in place appropriate processes for safely and securely deleting the AI and the relevant entries from the base dataset, and then retraining the AI on the updated dataset. Read more about the right to erasure on the ICO website.
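The erasure process described in 4.3 (remove the affected entries from the base dataset, then train a replacement AI on what remains) could be sketched as follows. This is a minimal illustration only, not a prescribed method. The record layout, the `subject_id` field, the `demand_kwh` values and the `train_mean_model` function are all hypothetical, and the secure deletion of the previous AI is assumed to be handled by a separate process.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """Return a SHA-256 digest of the records, independent of their order."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def apply_erasure(records, erased_subject_ids, subject_key="subject_id"):
    """Drop every record belonging to a data subject who requested erasure."""
    erased = set(erased_subject_ids)
    return [r for r in records if r.get(subject_key) not in erased]


def erase_and_retrain(records, erased_subject_ids, train_fn):
    """Remove the affected entries, then train a fresh model on the remainder.

    The old model is not patched in place. It is assumed (hypothetically)
    to be securely deleted elsewhere once the replacement is validated.
    """
    remaining = apply_erasure(records, erased_subject_ids)
    model = train_fn(remaining)
    audit = {
        "records_before": len(records),
        "records_after": len(remaining),
        "dataset_version": dataset_fingerprint(remaining),
    }
    return model, remaining, audit


# Hypothetical stand-in for a real training routine: the "model" is a mean.
def train_mean_model(rows):
    values = [r["demand_kwh"] for r in rows]
    return sum(values) / len(values)


# Hypothetical example data.
records = [
    {"subject_id": "A", "demand_kwh": 10.0},
    {"subject_id": "B", "demand_kwh": 20.0},
    {"subject_id": "A", "demand_kwh": 30.0},
]
model, remaining, audit = erase_and_retrain(records, ["A"], train_mean_model)
```

Recording a fingerprint of the dataset that the replacement AI was trained on is one possible way to evidence which version of the data a given model reflects.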
4.4 Existing good practice should be used to mitigate risks associated with data use and management. This good practice relates to input data, the output of AI and data used for training purposes. Additionally, the AI model itself is a data asset in its own right. Stakeholders should ensure:
a. appropriate data governance is in place, covering the entire data life cycle from collection to disposal. Data used to train and operate the AI is fit for use for the application, accurate and complete
b. data used to train AI models is appropriately prepared. Relevant steps are taken, such as removing biases and addressing unfairness where it is captured in the data. This should include versioning datasets and recording weaknesses and assumptions
c. training and operational data, and the AI itself, are appropriately managed and secured, with appropriate consideration given to data accumulation risks, for example large or combined datasets becoming more sensitive than their components. This should include access to and use of AI models, with arrangements in place to protect the AI as both a software and a data asset
d. data collection methods and associated data generated are appropriately secured against intentional and unintentional compromise, through the relevant measures, for example physical and technical security at weather stations
e. appropriately secure methods are in place for the disposal of both AI systems and the data processed, with the relevant governance and assurance in place
f. appropriate monitoring and validation arrangements are implemented to identify shifts in behaviour, for example model drift
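As an illustration of the monitoring described in point f, one commonly used proxy for drift is to compare the distribution of a model input (or output) in operation against the distribution seen at training time. The sketch below uses the Population Stability Index (PSI) for this. The choice of PSI, the bin count and the 0.25 alert threshold are assumptions made for illustration (0.25 is a widely quoted rule of thumb, not a requirement of this guidance), and the sample data is hypothetical.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature.

    Bin edges are taken from the reference sample. Values in the current
    sample that fall outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all reference values identical; avoid division by zero

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        eps = 1e-6  # floor so that empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_alert(reference, current, threshold=0.25):
    """Return True when the PSI exceeds the (assumed) alert threshold."""
    return psi(reference, current) > threshold


# Hypothetical data: training-time values versus shifted operational values.
reference = [float(i) for i in range(100)]
shifted = [float(i + 50) for i in range(100)]

no_drift = drift_alert(reference, reference)
drift = drift_alert(reference, shifted)
```

A check of this kind only flags a change in the data. Deciding whether the AI's behaviour has actually degraded, and what action to take, would still sit with the validation arrangements described above.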