On May 22, 2025, the Cybersecurity and Infrastructure Security Agency (“CISA”), which sits within the Department of Homeland Security (“DHS”), released guidance for AI system operators on managing data security risks. The associated press release explains that the guidance provides “best practices for system operators to mitigate cyber risks through the artificial intelligence lifecycle, including consideration on securing the data supply chain and protecting data against unauthorized modification by threat actors.” CISA published the guidance in conjunction with the National Security Agency, the Federal Bureau of Investigation, and cyber agencies from Australia, the United Kingdom, and New Zealand. The guidance is intended for organizations using AI systems in their operations, including the Defense Industrial Base, National Security System owners, federal agencies, and Critical Infrastructure owners and operators. It builds on the Joint Guidance on Deploying AI Systems Securely released by CISA and several other U.S. and foreign agencies in April 2024.
The guidance’s stated goals include raising awareness of the potential data security risks of AI systems, providing best practices for securing AI, and establishing a strong foundation for data security in AI systems. The first part of the guidance outlines a set of cybersecurity best practices for AI systems, after which the guidance provides additional detail on three separate risk categories for AI systems (data supply chain risks, maliciously modified data, and data drift) and describes mitigation recommendations for each risk category.
The guidance outlines ten cybersecurity best practices that are specific to AI systems and refers to NIST SP 800-53, “Security and Privacy Controls for Information Systems and Organizations,” for additional detail on general cybersecurity best practices (though it does not specify any particular applicable baseline). Several of the best practices, such as “source reliable data and track data provenance” and “verify and maintain data integrity during storage and transport,” align with the data supply chain risks discussed in greater detail later in the guidance. Many of the other best practices build on security practices described in NIST SP 800-53 and other common security frameworks, such as classifying data, leveraging access controls and trusted infrastructure, encrypting data, and storing and deleting data securely. The guidance’s best practices also reference leveraging privacy-preserving techniques, such as data depersonalization or differential privacy, and conducting ongoing data security risk assessments.
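The guidance names differential privacy as one privacy-preserving technique but does not elaborate on it. As a purely illustrative sketch (the function name and parameters below are ours, not the guidance’s), the classic Laplace mechanism releases a statistic with noise calibrated to the query’s sensitivity and a privacy budget epsilon:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise is drawn from a Laplace distribution with scale sensitivity/epsilon:
    a smaller epsilon means stronger privacy and more noise.
    """
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privatize a count query. A count has sensitivity 1, because
# adding or removing one record changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=1234, sensitivity=1.0, epsilon=0.5)
```

In practice, operators would rely on a vetted library rather than hand-rolled noise, but the sketch shows the trade-off the guidance alludes to: the released value is useful in aggregate while individual records are obscured.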
In the section of the guidance devoted to data supply chain risks, the guidance discusses general risks and identifies three specific risks. The general risks section warns that “one cannot simply assume that [web-scale] datasets are clean, accurate, and free of malicious content.” The guidance offers several mitigation strategies, including dataset verification, using content credentials to track the provenance of data, requesting assurances of a foundation model trained by another party, requiring certification from dataset providers, and securely storing data after ingest.
In addition to these general risks, the guidance identifies “curated web-scale datasets” as the first of three specific data supply chain risks. The guidance notes that curated AI datasets are vulnerable to a technique known as “split-view poisoning,” which can arise when someone purchases an expired domain referenced in a dataset and manipulates the data hosted there. The second risk is “collected web-scale datasets,” which are vulnerable to “frontrunning poisoning techniques”; this occurs when malicious examples are injected just before crowd-sourced content is collected from a website. The third risk is “web-crawled datasets,” which the guidance describes as inherently risky because they are less curated. The guidance provides a variety of mitigation strategies, ranging from broad recommendations, such as dataset verification to detect abnormalities, to more specific ones, such as using raw data hashes with hash verification.
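The hash verification recommendation can be sketched in a few lines of Python: record a cryptographic digest of each raw file at ingest, then re-verify the digests before training so that any post-ingest tampering is detected. The manifest format and function names here are illustrative; the guidance does not prescribe an implementation.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(manifest, root):
    """Return the files whose current hash no longer matches the manifest.

    `manifest` maps relative file names to the hex digests recorded at ingest.
    An empty return value means the dataset is unmodified.
    """
    root = Path(root)
    return [name for name, expected in manifest.items()
            if sha256_of(root / name) != expected]
```

The design point is that the manifest itself must be stored and transmitted over a trusted channel (or signed); a digest list that an attacker can rewrite alongside the data provides no integrity guarantee.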
Next, the guidance identifies risks and mitigation strategies for maliciously modified data, explaining that “deliberate manipulation of data can result in inaccurate outcomes, poor decisions, and compromised security.” The risks include adversarial machine learning threats, bad data statements, statistical bias, data poisoning from inaccurate information, and data duplication. The guidance proposes various mitigation strategies to address these risks. For example, it recommends sanitizing the training data to reduce the impact of outliers and poisoned inputs. Similarly, it suggests that metadata validation may be helpful for checking the completeness and consistency of metadata before it is used for AI training.
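The guidance does not specify how sanitization should be performed. As one common (though by no means only) approach, records whose numeric features fall far outside the bulk of the distribution can be dropped using a robust z-score; the threshold and names below are ours, not the guidance’s. The sketch uses the median and median absolute deviation rather than the mean, so that the poisoned extremes cannot distort the filter itself:

```python
import numpy as np

def sanitize(values, threshold=3.5):
    """Keep only points whose modified z-score is within `threshold`.

    The modified z-score is based on the median and the median absolute
    deviation (MAD); unlike the mean and standard deviation, these robust
    statistics are barely moved by a handful of poisoned extremes.
    """
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return values  # degenerate case: no spread to measure
    modified_z = 0.6745 * (values - median) / mad
    return values[np.abs(modified_z) <= threshold]

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 500.0])  # 500.0 simulates a poisoned input
clean = sanitize(data)  # the outlier is removed; the five plausible values remain
```

Real pipelines would apply per-feature rules, deduplication, and schema checks as well, but the principle is the same: reduce the leverage any single corrupted input has over training.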
Finally, the guidance describes risks associated with data drift. The guidance explains that data drift occurs naturally over time as the statistical properties of input data diverge from those of the data originally used to train the model. The guidance suggests that data drift can be mitigated by “incorporating application-specific data management protocols,” including continuous monitoring, retraining the model on new data, and data cleansing. Many of the mitigation strategies proposed in the earlier sections are good practices that can be applied here as well.
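Continuous monitoring for drift is typically implemented by comparing the distribution of live input features against the training distribution. As one illustrative approach (not one mandated by the guidance), the population stability index (PSI) bins both samples on the training data’s quantiles and flags drift when the index exceeds a rule-of-thumb threshold:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population stability index between a reference (training) sample
    and a current (production) sample of one numeric feature.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 drift.
    """
    # Bin edges come from the reference distribution's quantiles.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # A small floor avoids log(0) when a bin is empty.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)   # stand-in for training-time inputs
live = rng.normal(1.5, 1.0, 10_000)    # production inputs whose mean has drifted
drift_detected = psi(train, live) > 0.25
```

A monitoring job might compute this index per feature on a schedule and trigger the retraining and data cleansing steps the guidance recommends when the threshold is crossed.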
Overall, the guidance notes that, by identifying risks and adopting best practices, “organizations can fortify their AI systems against potential threats and safeguard sensitive, proprietary, and mission critical data used in the development and operation of their AI systems.” The guidance serves as a reminder to organizations of the importance of data security to maintaining the accuracy, reliability, and integrity of AI systems, and of the unique cybersecurity risks that apply to these types of systems.