Verifying and Validating AI in Safety-Critical Systems


As countries worldwide begin to establish AI regulations, space engineers designing AI-enabled systems must meet the specifications and standards those regulations introduce. On October 30, 2023, the White House issued an executive order on AI that highlights the importance of robust Verification and Validation (V&V) processes for AI-enabled systems.

The directive requires AI companies to report on and test specific models to help ensure that AI systems function as intended and meet specified requirements.

AI is increasingly used in system design, including in safety-critical applications in industries such as automotive and aerospace. As a result, AI regulations and V&V processes will significantly affect safety-critical systems.

Verification and Validation in AI-Enabled Systems

Verification determines whether an AI model has been designed and developed in accordance with the specified requirements, whereas validation checks whether the product meets the customer’s needs and expectations. By employing V&V techniques, space engineers can confirm that the AI model’s outputs meet specifications, which enables early bug detection and the mitigation of data bias.

One advantage of using AI in safety-critical systems is that AI models can approximate physical systems, and these approximations can help validate the design. Engineers simulate entire AI-enabled systems and use the resulting data to test the system across different scenarios, including rare outlier events. Performing V&V across safety-critical scenarios helps ensure that an AI-enabled safety-critical system maintains its performance level under a range of circumstances.

Most industries that develop AI-enhanced products require those products to comply with standards before they go to market. The associated certification processes confirm that specific elements are built into the products. Engineers perform V&V to test the functionality of these elements, which makes certification easier to obtain.

In the automotive industry, ISO/CD PAS 8800 is a standard under development that addresses the safety-related properties and risk factors of AI in road vehicles. In aerospace and defence, where certification is mandatory, existing standards such as DO-178C (Software Considerations in Airborne Systems and Equipment Certification) cannot always directly address the unique challenges posed by AI. For this reason, the new ARP6983 process standard is being created to provide guidelines for developing and certifying aeronautical safety-related products that implement AI.

Tools such as Deep Learning Toolbox™ Verification Library and MATLAB® Test™ can help engineers in aviation and automotive stay at the forefront of V&V by streamlining the verification and testing of AI models within larger systems and by supporting the development of software that adheres to industry standards.

AI V&V Processes in Safety-Critical Systems

When performing V&V, the space engineer’s goal is to ensure that the AI component meets the specified requirements, is reliable under all specified operating conditions, and is therefore safe and ready for deployment. The V&V process for AI involves software assurance activities that combine static and dynamic analyses, testing, formal methods, and real-world operational monitoring.

V&V processes may vary slightly across industries, but the overarching steps in the V&V process are:

  • Analysing the decision-making process to solve the Black Box problem
  • Testing the model against representative datasets
  • Conducting AI system simulations
  • Ensuring the model operates within acceptable bounds

The steps in the V&V process described below are iterative, allowing for continuous refinement and improvement of the AI system as engineers collect new data, gain new insights, and integrate operational feedback.

Analysing the Decision-Making Process to Solve the Black Box Problem

When engineers use an AI model to add automation to a system, one issue that arises is the Black Box problem: the internal logic of complex models, such as deep neural networks, is difficult for humans to inspect or interpret. Understanding how AI-based systems make decisions provides transparency, which enables engineers and scientists to build trust in model predictions and to comprehend the reasoning behind them.

Feature Importance Analysis

Feature Importance Analysis is a technique that helps engineers identify which input variables most significantly affect a model’s predictions. Although the analysis works differently for different types of models, such as tree-based and linear models, the general procedure assigns an importance score to each input variable. A higher score signifies that the feature has a greater impact on the model’s decision. For a safety-critical system in the automotive industry, the variables may include environmental factors, such as precipitation, as well as the presence and behaviour of other vehicles.
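
As a rough illustration, the sketch below computes permutation importance, one model-agnostic way to produce such scores: shuffle one input column at a time and measure how much the model’s error grows. The linear "risk" model, its weights, the three feature names, and the synthetic data are all hypothetical stand-ins for a trained model and real driving data.

```python
import random

# Hypothetical toy "model": a fixed linear risk score for a driving scenario.
# Feature order: [precipitation (mm/h), gap to lead vehicle (m), cabin temperature (C)]
WEIGHTS = [0.8, -0.05, 0.0]  # cabin temperature deliberately has no effect


def model(x):
    """Return the risk score for one input vector."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def mean_squared_error(X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=10, seed=0):
    """Score feature j by the average rise in error when column j is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(X_perm, y) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Synthetic evaluation set; targets equal the model's own outputs, so baseline error is 0.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 20), data_rng.uniform(5, 100), data_rng.uniform(15, 30)]
     for _ in range(200)]
y = [model(x) for x in X]

names = ["precipitation", "gap_to_lead_vehicle", "cabin_temperature"]
scores = permutation_importance(X, y)
for name, score in zip(names, scores):
    print(f"{name:>20}: {score:.3f}")
```

A feature whose shuffling barely changes the error, like the cabin temperature here, has little influence on the model’s decisions. Tree-based and linear models also offer their own built-in importance measures, which is why the analysis differs between model types.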


Explainability techniques offer insights into the model’s behaviour. They are especially useful when the black-box nature of the model rules out approaches that rely on inspecting its internal logic. For image inputs, these techniques identify the regions of an image that contribute most to the final prediction, enabling engineers to see what the model focuses on when making a prediction.
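
One common explainability technique for images is occlusion sensitivity: mask part of the image and measure how much the prediction drops. The minimal sketch below assumes a hypothetical `pedestrian_score` function that looks only at the centre of a tiny 6×6 grayscale image, standing in for a real network; the regions whose occlusion causes the largest drop are the ones the model relies on most.

```python
# Hypothetical stand-in for an image classifier: the "pedestrian" score is the
# mean brightness of the central 2x2 region of a 6x6 grayscale image.
def pedestrian_score(image):
    centre = [image[r][c] for r in (2, 3) for c in (2, 3)]
    return sum(centre) / len(centre)


def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Slide a patch x patch mask over the image and record the score drop at each position."""
    rows, cols = len(image), len(image[0])
    base = score_fn(image)
    heat = [[0.0] * (cols - patch + 1) for _ in range(rows - patch + 1)]
    for r in range(rows - patch + 1):
        for c in range(cols - patch + 1):
            occluded = [row[:] for row in image]  # copy so the original is untouched
            for dr in range(patch):
                for dc in range(patch):
                    occluded[r + dr][c + dc] = fill
            heat[r][c] = base - score_fn(occluded)
    return heat


image = [[1.0] * 6 for _ in range(6)]  # uniform bright test image
heat = occlusion_map(image, pedestrian_score)
for row in heat:
    print(" ".join(f"{v:.2f}" for v in row))
```

In this toy case the heat map peaks where the patch covers the whole central region, which matches what the stand-in model actually uses. With a real network, a peak on the background rather than on the pedestrian would suggest that the model has learned a spurious cue.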

Testing the Model Against Representative Datasets

Engineers typically evaluate an AI model’s performance in the real-world scenarios in which the safety-critical system is expected to operate. The goal is to identify the model’s limitations and improve its accuracy and reliability. Engineers gather a wide range of representative real-world data and clean it to make it suitable for testing. They then design test cases to evaluate various aspects of the model, such as its accuracy and reproducibility. Finally, they apply the model to the datasets, record the results, and compare them with the expected outputs. The outcome of this testing guides improvements to the model design.
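
A minimal sketch of such a test harness follows. `ScenarioCase`, the height-based `classify` stand-in, the four cases, and the 90% accuracy requirement are all hypothetical; the point is the structure: each case pairs representative input data with an expected output, the model is run twice to check reproducibility, and failures are collected so they can feed back into the design.

```python
from dataclasses import dataclass


@dataclass
class ScenarioCase:
    name: str
    inputs: list
    expected: str


def classify(features):
    """Hypothetical stand-in for a trained classifier: labels an obstacle by its height in metres."""
    height = features[0]
    return "pedestrian" if 1.0 <= height <= 2.2 else "other"


def run_suite(model, cases, min_accuracy=0.9):
    """Run every case twice, compare with the expected label, and summarise the results."""
    results = []
    for case in cases:
        first = model(case.inputs)
        second = model(case.inputs)  # repeat run to check reproducibility
        results.append({
            "name": case.name,
            "passed": first == case.expected,
            "reproducible": first == second,
        })
    accuracy = sum(r["passed"] for r in results) / len(results)
    return {
        "accuracy": accuracy,
        "meets_requirement": accuracy >= min_accuracy,
        "all_reproducible": all(r["reproducible"] for r in results),
        "failures": [r["name"] for r in results if not r["passed"]],
    }


cases = [
    ScenarioCase("adult_crossing", [1.75], "pedestrian"),
    ScenarioCase("tall_adult", [2.0], "pedestrian"),
    ScenarioCase("traffic_cone", [0.7], "other"),
    ScenarioCase("child_crossing", [0.9], "pedestrian"),  # representative edge case
]
report = run_suite(classify, cases)
print(report)
```

Here the harness reports 75% accuracy and flags `child_crossing`, a limitation of the stand-in model that only shows up because the test data includes that scenario.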

Conducting AI System Simulations

Simulating an AI-enabled system enables engineers to evaluate the system’s performance in a controlled environment. A simulation creates a virtual environment that mimics the real-world system under a variety of conditions. Engineers first define the inputs and parameters of the simulated system, such as initial conditions and environmental factors. They then run the simulation in software such as Simulink®, which outputs the system’s responses to the proposed scenario. As in data testing, the simulation results are compared with expected or known outcomes, and the model is improved iteratively.
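
The scenario sweep below is a minimal plain-Python sketch of this define-run-compare loop, not a Simulink model. It assumes a hypothetical AI emergency-braking decision (`should_brake`) that estimates stopping distance as if the road were always dry, a point-mass vehicle, and illustrative friction coefficients for dry, wet, and icy roads. The expected outcome for every scenario is that the vehicle stops before a stationary obstacle.

```python
G = 9.81  # gravitational acceleration, m/s^2
ASSUMED_FRICTION = 0.8  # hypothetical controller assumption: the road is dry


def should_brake(speed, distance):
    """Hypothetical stand-in for an AI emergency-braking decision."""
    estimated_stop = speed ** 2 / (2 * ASSUMED_FRICTION * G)
    return distance <= 1.8 * estimated_stop + 2.0


def simulate(initial_speed, initial_distance, friction, dt=0.01, t_max=60.0):
    """Approach a stationary obstacle; return the final gap in metres (<= 0 means collision)."""
    speed, distance, braking, t = initial_speed, initial_distance, False, 0.0
    while speed > 0.0 and t < t_max:
        braking = braking or should_brake(speed, distance)  # latched once triggered
        if braking:
            speed = max(0.0, speed - friction * G * dt)
        distance -= speed * dt
        if distance <= 0.0:
            return distance
        t += dt
    return distance


# Scenario sweep: same initial conditions, different road (environmental) conditions.
scenarios = {"dry": 0.8, "wet": 0.5, "icy": 0.15}
results = {}
for name, friction in scenarios.items():
    gap = simulate(initial_speed=25.0, initial_distance=150.0, friction=friction)
    results[name] = gap > 0.0  # expected outcome: the vehicle stops before the obstacle
    print(f"{name:>4}: final gap {gap:6.1f} m -> {'PASS' if results[name] else 'FAIL'}")
```

Running it shows the dry and wet scenarios passing and the icy one failing: the controller’s dry-road assumption is exactly the kind of limitation that sweeping environmental conditions in simulation is meant to expose before deployment.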

Ensuring the Model Operates within Acceptable Bounds

For AI models to operate safely and reliably, engineers must establish limits and monitor the model’s behaviour to ensure that it stays within them. One of the most common boundary issues occurs when a model trained on a limited dataset encounters out-of-distribution data at runtime, that is, inputs that differ significantly from anything it saw during training. A related issue is insufficient robustness: if the model is not robust enough, small changes to its inputs can lead to unpredictable behaviour.
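
A runtime monitor is one way to catch out-of-distribution inputs. The sketch below, which assumes purely numeric features and hypothetical training data, records each feature’s mean and standard deviation during training and, at runtime, hands control to a fallback whenever any feature lies more than four standard deviations from its training mean. The threshold, the placeholder model, and the `FALLBACK` behaviour are illustrative assumptions.

```python
import random
import statistics


class RangeMonitor:
    """Flags runtime inputs far from the training distribution using per-feature z-scores."""

    def __init__(self, training_data, z_threshold=4.0):
        columns = list(zip(*training_data))
        self.means = [statistics.fmean(c) for c in columns]
        self.stdevs = [statistics.stdev(c) for c in columns]  # assumes no constant feature
        self.z_threshold = z_threshold

    def max_z(self, x):
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stdevs))

    def is_out_of_distribution(self, x):
        return self.max_z(x) > self.z_threshold


def guarded_predict(model, monitor, x):
    """Only trust the model inside its training envelope; otherwise hand over to a fallback."""
    if monitor.is_out_of_distribution(x):
        return "FALLBACK"  # hypothetical: switch to a conservative, non-AI safety function
    return model(x)


def placeholder_model(x):
    return "OK"  # stands in for the real model's prediction


# Hypothetical training data: [ambient temperature (C), vehicle speed (km/h)]
rng = random.Random(0)
training = [[rng.gauss(20, 5), rng.gauss(60, 10)] for _ in range(500)]
monitor = RangeMonitor(training)

print(guarded_predict(placeholder_model, monitor, [22.0, 65.0]))   # typical input
print(guarded_predict(placeholder_model, monitor, [-30.0, 60.0]))  # far colder than training data
```

A per-feature check like this is cheap and easy to trace to a requirement, but it misses inputs that are unusual only in combination; practical detectors often work on a model’s internal features instead.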

Engineers employ bias mitigation and robustification techniques to ensure AI models operate within acceptable bounds.

Data Augmentation and Balancing

One way to mitigate data bias is to increase the variability of the data used to train the AI model, which reduces the model’s dependence on repeated patterns that limit what it learns. The Data Augmentation technique creates modified or additional samples so that different classes and demographics are represented more fairly. In the case of a self-driving car, data augmentation may involve using pictures of pedestrians taken from various angles, helping the model detect a pedestrian regardless of their position.

The Data Balancing technique is often paired with Data Augmentation and ensures that the dataset contains a similar number of samples from each data class. Using the pedestrian example, balancing the data means ensuring that the dataset contains a proportionate number of images for each variation of pedestrian scenario, such as different body shapes, clothing styles, lighting conditions, and backgrounds. Together, these techniques reduce bias and improve the model’s ability to generalise across diverse real-world situations.
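
Both techniques can be sketched in a few lines. The example assumes tiny 4×4 grayscale images stored as nested lists and uses hypothetical scenario labels (daylight, night, rain) as the classes to balance. Horizontal flipping and brightness scaling are two common augmentations, and random oversampling is only one way to balance classes; undersampling the majority class or collecting more data are alternatives.

```python
import random
from collections import Counter


def hflip(image):
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale pixel intensities, clamped to the valid [0, 1] range."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment(dataset):
    """Return each sample plus a mirrored copy and a darker copy (3x the data)."""
    out = []
    for image, label in dataset:
        out.append((image, label))
        out.append((hflip(image), label))
        out.append((adjust_brightness(image, 0.6), label))  # e.g. mimic dusk lighting
    return out


def balance_by_oversampling(dataset, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for image, label in dataset:
        by_label.setdefault(label, []).append((image, label))
    target = max(len(samples) for samples in by_label.values())
    balanced = []
    for samples in by_label.values():
        balanced.extend(samples)
        balanced.extend(rng.choice(samples) for _ in range(target - len(samples)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical raw dataset: tiny 4x4 grayscale images labelled by scenario, heavily skewed.
data_rng = random.Random(1)


def fake_image():
    return [[data_rng.random() for _ in range(4)] for _ in range(4)]


raw = ([(fake_image(), "daylight") for _ in range(8)]
       + [(fake_image(), "night") for _ in range(2)]
       + [(fake_image(), "rain") for _ in range(3)])
train = balance_by_oversampling(augment(raw))

print(Counter(label for _, label in raw))    # class counts before augmentation
print(Counter(label for _, label in train))  # class counts after augmentation and balancing
```

After augmentation the scenarios number 24, 6, and 9 samples; oversampling brings each up to 24. Balancing is usually applied only to the training split, so duplicated samples do not leak into the test set.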


Robustness is a primary concern when deploying neural networks in safety-critical situations. Neural networks can misclassify inputs that have been altered by small, often imperceptible perturbations. Such disturbances can cause a neural network to output incorrect or dangerous results, which is a serious risk in systems where errors can be catastrophic. One solution is to integrate formal methods into the development and validation process. Formal methods use rigorous mathematical models to specify and prove properties of neural networks, for example that the predicted class does not change for any input within a bounded distance of a given input. By applying these methods, engineers can verify, and then improve, the network’s resilience to specific types of disturbances, supporting higher robustness and reliability in safety-critical applications.
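
Interval bound propagation is one simple, sound formal-verification technique; it is not necessarily what any particular toolbox uses. The sketch pushes a box of inputs (every point within an L-infinity distance `eps` of a nominal input) through a tiny ReLU network with hypothetical fixed weights, and checks whether the predicted class could change anywhere inside that box.

```python
# Hypothetical tiny ReLU network (fixed weights): 2 inputs -> 2 hidden units -> 2 class logits.
W1 = [[1.0, -1.0], [0.5, 1.0]]
b1 = [0.0, -0.2]
W2 = [[1.0, 0.5], [-1.0, 1.0]]
b2 = [0.1, 0.0]


def forward(x):
    """Ordinary point evaluation of the network; returns the two logits."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def affine_bounds(W, b, lo, hi):
    """Per-output interval of W @ x + b when each x_j lies in [lo_j, hi_j] (ignoring rounding)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo_j if w >= 0 else hi_j)
                                 for w, lo_j, hi_j in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (hi_j if w >= 0 else lo_j)
                                 for w, lo_j, hi_j in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotonic, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(x, eps, label):
    """True if every input within L-infinity distance eps of x provably keeps class `label`.

    Interval bounds are sound but loose: False means "not proven", not "counterexample found".
    """
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    lo, hi = relu_bounds(*affine_bounds(W1, b1, lo, hi))
    lo, hi = affine_bounds(W2, b2, lo, hi)
    return all(lo[label] > hi[k] for k in range(len(lo)) if k != label)


x = [1.0, 0.0]
print(forward(x))  # class 0 has the larger logit
print(certify(x, 0.1, label=0))
print(certify(x, 1.0, label=0))
```

For `eps = 0.1` the bounds prove that class 0 holds throughout the box. For `eps = 1.0` the proof fails, and in this case a genuine counterexample exists: the input `[0.0, 1.0]`, which lies inside that box, is classified as class 1.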


In the era of AI-enabled safety-critical systems, V&V procedures are becoming crucial to obtaining industry certifications and complying with legal requirements. Building and maintaining trustworthy systems requires engineers to employ verification techniques that provide explainability and transparency for the AI models that run those systems. As engineers use AI to aid their V&V processes, it is essential to explore a variety of testing approaches that address the increasingly complex challenges of AI technologies. In safety-critical systems, these efforts help ensure that AI is used responsibly and transparently.

To learn more about Verifying and Validating AI in Safety Critical Systems across space systems, visit

Receive the latest developments and updates on Australia’s space industry direct to your inbox. Subscribe today to Space Connect here.
