Explainable AI (XAI) makes AI systems transparent, interpretable, and credible. It helps users understand why a model arrived at a specific decision, making the system easier to verify, debug, and trust. Let's look at what XAI entails, followed by a real-world example of how it works.
Agentic AI represents the next evolution of intelligent systems: digital entities that not only follow instructions but also set goals, make decisions, and learn independently. These agents will soon touch every aspect of human life, from finance and energy to robotics and personalised education. But with this exciting potential comes a critical challenge: understanding what these agents are 'thinking'. Agentic AI does not simply execute predefined rules; it engages in self-directed behaviour shaped by constantly changing goals, interaction with the environment, and acquired knowledge, giving rise to decision processes that are complex and often opaque. For example, a fully autonomous vehicle may suddenly make an unexpected manoeuvre, or a healthcare agent may recommend a treatment that deviates from standard protocol. Knowing what an AI agent did is not enough; we need to know why it did it.
But why is this understanding so vital?
First, we are more likely to trust technologies we understand. If agentic systems are black boxes making decisions without clear rationale, public trust will erode, hindering the adoption of such systems and the benefits they bring. Would you trust an entity whose reasoning is opaque to you with your health or finances?
Second, accountability demands it. As agents take on new roles, accountability becomes paramount: if an autonomous agent causes harm or makes a serious error, who is answerable when no one can see into the system's reasoning?
It’s also important to know why an agent made a certain decision so that its logic may be debugged and risks mitigated to ensure it operates safely and reliably.
Finally, we need this understanding for further growth and improvement. Knowledge of the strengths and weaknesses of an agent’s reasoning enables algorithm refinement and helps build better systems. We learn from their successes and failures.
As intelligent agents take on greater roles in society, understanding their decision-making is no longer optional—it’s essential for trust, accountability, and safety. To fully harness their potential, we must ensure their actions are transparent and explainable. This is the critical role of explainable AI (XAI), which enables us to interpret and trust these increasingly autonomous systems.
AI, especially deep learning, often works as a 'black box', producing results without revealing how it arrived at them. Explainable AI is designed to make AI decisions evident and understandable to humans: it helps us see why an AI system came to a particular conclusion.
The methodologies explainable AI uses are briefly outlined below.
Intrinsic explainability (white-box models)
This approach focuses on using inherently interpretable model architectures. These ‘white-box’ models are designed from the ground up to be transparent in their decision-making process. Examples include:
- Decision trees: These models represent decisions as a series of hierarchical rules, making the path to a prediction easily traceable.
- Linear regression: The relationship between input features and the output is clearly defined by coefficients, indicating the direction and magnitude of each feature’s influence.
- Rule-based systems: Decisions are made based on a set of explicit ‘if-then’ rules that are directly understandable by humans.
While these models offer high interpretability, they often come with limitations in terms of the complexity of the patterns they can learn and their predictive power compared to black-box models.
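To make this concrete, here is a minimal sketch of intrinsic explainability (it is not part of the sepsis example that follows; it uses scikit-learn's bundled breast cancer dataset purely for illustration). A shallow decision tree is trained and its learned if-then rules are printed, so the model's own structure serves as the explanation:

# A minimal sketch of intrinsic explainability: the printed rules ARE the explanation.
# Uses scikit-learn's bundled breast cancer dataset purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # kept shallow so the rules stay readable
tree.fit(data.data, data.target)

# Print the hierarchical if-then rules the tree has learned
print(export_text(tree, feature_names=list(data.feature_names)))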
Post-hoc explainability (black-box models)
These methods illuminate ‘the black box’ without modifying its internal workings. Some important post-hoc methods include:
- Feature importance: These methods identify which input features had the most significant influence on a model's prediction. Techniques like SHAP (SHapley Additive exPlanations) and permutation importance fall into this category; a minimal sketch appears after this list.
- Saliency maps: Primarily used in computer vision, these techniques highlight the regions in an input image that were most important for the model’s classification decision.
- Local Interpretable Model-agnostic Explanations (LIME): LIME approximates the decision boundary of a complex model locally around a specific instance by training a simpler, interpretable model on perturbed versions of that instance.
- Counterfactual explanations: These explanations identify the minimal changes to an input that would lead to a different prediction, helping users understand ‘what if’ scenarios.
- Attention mechanisms: In models like transformers used in natural language processing, attention weights can provide insights into which parts of the input sequence the model focused on when making a prediction.
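As a quick illustration of the feature-importance item above, here is a minimal sketch of permutation importance. It assumes a fitted classifier `model` and a held-out DataFrame split `X_test`, `y_test` (such as the ones created in the sepsis example later in this article); each feature is shuffled in turn, and the resulting drop in score shows how much the model relied on it:

# A minimal sketch of post-hoc feature importance via permutation.
# Assumes a fitted classifier `model` and a held-out DataFrame split `X_test`, `y_test`.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

# Sort features by the mean drop in score when they are shuffled
for name, drop in sorted(zip(X_test.columns, result.importances_mean),
                         key=lambda pair: -pair[1]):
    print(f"{name:12s} mean score drop = {drop:.3f}")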
Selecting the appropriate explainability technique depends on the specific use case and data modality. Table 1 maps common use cases to recommended XAI methods along with their rationale.
Table 1: Mapping common use cases to recommended XAI methods
| Use case | Recommended technique | Why? |
|---|---|---|
| Global model explanation | SHAP (SHapley Additive exPlanations) | Offers additive, consistent global explanations |
| Local (per-prediction) explanation | LIME (Local Interpretable Model-agnostic Explanations) | Explains individual predictions using local linearity |
| Image feature attribution | Saliency maps | Highlights the pixels that most influenced the decision |
| Tabular data with correlated features | SHAP | Handles feature interactions better than LIME |
Table 2: Comparison of intrinsic vs post-hoc explainability
| Aspect | Intrinsic | Post-hoc |
|---|---|---|
| Definition | Built into the model architecture itself | Interpretation applied after training |
| Model type | Transparent (e.g., decision trees, linear models) | Black-box (e.g., neural networks, ensemble models) |
| Interpretability level | High, as the logic is human-understandable | Varies, depending on the technique used |
| Performance trade-off | May sacrifice performance for interpretability | Typically maintains model performance |
| Example | Logistic regression showing feature coefficients | SHAP explaining a CNN's image classification |
XAI promotes ethical use and responsibility in AI applications. As agentic AI grows, XAI will become even more important. Since these systems make decisions based on complex learning and interactions, understanding their reasoning is crucial. The key objectives of XAI are shown in Figure 1.


The challenge of explaining agent autonomy
The shift from traditional AI to agentic AI adds complexity to explainability.
Autonomous agents are computer programs that can make decisions and act on their own without direct human intervention. As agents become more sophisticated and pervasive, in areas such as self-driving vehicles, trading algorithms, and medical diagnosis, the need to explain and understand their actions becomes critical. The key challenges are:
- Opaque decision logic: Most autonomous systems are powered by complex models like deep neural networks, which are ‘black boxes’. It is often extremely difficult to explain why a specific decision was made.
- Dynamic environments: Autonomous agents routinely interact with dynamic and unstable environments, so explanations must be more context-dependent and less generalizable.
- Real-time constraints: Decisions must be made in milliseconds. Generating explanations that are both helpful and delivered in real time adds immense complexity.
- Multi-agent interactions: In systems with large numbers of interacting agents (e.g., swarm robotics or AI for games), an agent’s action may depend on others’ intentions as well.
- User trust and accountability: Without explanations, users cannot trust the system, and developers cannot verify its behaviour or meet regulatory requirements.
Methods such as SHAP, LIME, counterfactual reasoning, and rule extraction assist in solving the puzzle, but complete transparency in agentic AI is still an active research frontier.
The importance of explainable AI for agentic systems becomes even clearer when we examine real-world applications and potential case studies. The case studies in Figure 2 illustrate the application of XAI in a wide range of real-world scenarios. These examples illustrate how XAI techniques can provide crucial insights, build trust, and address ethical concerns in various domains where autonomous agents are being deployed or are on the horizon.
Understanding why an agent behaves as it does is not just a theoretical benefit but a practical necessity for realising the full potential of agentic AI while reducing its risks.
XAI in code: A developer’s hands-on approach
Let’s now look at how XAI can be applied in early sepsis detection using Python libraries.
Sepsis is a life-threatening medical condition affecting over 1.7 million adults in the US annually, causing more than 270,000 deaths. Early detection is vital but challenging due to its subtle symptoms. AI and machine learning can identify sepsis risk early, but clinical adoption depends on trust and understanding of model predictions. Explainable AI (XAI), particularly using SHAP (SHapley Additive exPlanations), enables doctors to interpret these predictions.
Explainability is crucial in sepsis prediction because even though AI models trained on electronic health records can outperform traditional alerts, clinicians may not trust or act on alerts they don’t understand. To gain their confidence, it’s essential to clarify which features contributed to a high sepsis risk, ensure the insights are clinically relevant, and show how reliable the predictions are based on a patient’s current and historical vitals.
There are several open source libraries for interpreting and explaining machine learning models, including LIME, ELI5, Anchor Explanations, InterpretML, DALEX, AIX360, and PyCaret, to name a few. Each offers something distinct, from local model interpretability to high-level visualisation and debugging capabilities. Covering all of these tools is beyond the scope of this article; for simplicity, we will demonstrate a straightforward example using SHAP.
SHAP is an open source Python library for model interpretability, based on the concept of Shapley values from cooperative game theory. It explains the output of any machine learning model by computing the contribution of each feature to a specific prediction. SHAP was developed by Scott Lundberg, a researcher at Microsoft Research, and the methodology was introduced in a paper he wrote in 2017. Here’s the Python program:
!pip install shap xgboost pandas matplotlib scikit-learn --quiet

import shap
import xgboost
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Optional visual theme (use 'seaborn-whitegrid' on Matplotlib versions older than 3.6)
plt.style.use('seaborn-v0_8-whitegrid')

# ------------------------
# Simulated dataset
# ------------------------
np.random.seed(42)
n = 1000
X = pd.DataFrame({
    'heart_rate': np.random.normal(90, 15, n),
    'resp_rate': np.random.normal(22, 5, n),
    'temp': np.random.normal(98.6, 1, n),
    'wbc': np.random.normal(11, 4, n),
    'lactate': np.random.normal(1.5, 0.8, n),
    'age': np.random.normal(65, 20, n),
})

# Target label logic: count how many vitals are abnormal
X['sepsis_risk'] = (
    (X['heart_rate'] > 100).astype(int) +
    (X['resp_rate'] > 24).astype(int) +
    (X['lactate'] > 2.5).astype(int) +
    (X['wbc'] > 12).astype(int)
)
y = (X['sepsis_risk'] >= 2).astype(int)
X.drop(columns='sepsis_risk', inplace=True)

# ------------------------
# Train-test split and model
# ------------------------
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = xgboost.XGBClassifier(eval_metric='logloss')   # use_label_encoder is no longer needed in recent XGBoost
model.fit(X_train, y_train)

# Accuracy
preds = model.predict(X_test)
print(f"Test Accuracy: {accuracy_score(y_test, preds):.2f}")

# Show dataset samples
print("\nPreview of Dataset Features:")
print(X.head())
print("\nFirst Test Sample (used for SHAP waterfall/force):")
print(X_test.iloc[0])

# SHAP explainability
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)

# 1. Waterfall plot (local)
plt.figure(figsize=(10, 5))
print("\n[1] Waterfall Plot: Local SHAP Explanation for First Instance")
shap.plots.waterfall(shap_values[0], show=False)
plt.title("Waterfall Plot: SHAP Value Breakdown for First Prediction", fontsize=14)
plt.tight_layout()
plt.show()

# 2. Beeswarm plot (global)
plt.figure(figsize=(10, 6))
print("[2] Beeswarm Plot: Global SHAP Feature Importance")
shap.plots.beeswarm(shap_values, show=False)
plt.title("Beeswarm Plot: Global Feature Impact", fontsize=14)
plt.xlabel("SHAP Value (Effect on Prediction Output)", fontsize=12)
plt.ylabel("Features", fontsize=12)
plt.tight_layout()
plt.show()

# 3. Force plot (local)
print("[3] Force Plot: Push and Pull Effect for First Instance")
shap.plots.force(shap_values[0], matplotlib=True)
Figure 3 shows the program output.


Explainable AI with waterfall, beeswarm, and force plots
An XGBoost model is trained to predict whether a patient is at high risk of sepsis based on clinical vitals (heart rate, respiration, WBC count, etc). Since decisions in healthcare have high stakes, explainability is not optional — it is critical. SHAP (SHapley Additive exPlanations) provides transparent and mathematically grounded explanations that help:
- Understand individual predictions (local interpretability)
- Understand global model behaviour (global interpretability)
Waterfall plot: The SHAP waterfall plot gives a local explanation for the prediction for one individual patient. It visually breaks down how each feature pushed the prediction up or down from the model’s baseline.
Let’s dive into this plot in the context of explainable AI in healthcare (sepsis prediction).
- The model starts with a baseline prediction (E[f(x)]) of –3.734, which is the average prediction across all patients.
- Then, each feature’s SHAP value adds or subtracts from this base to reach the final prediction f(x) = –4.633.
- Arrows show positive (red) or negative (blue) influence on the prediction.
The strong negative influences of WBC and heart rate outweigh the positive influence of respiratory rate, leading to an overall low predicted risk of sepsis (–4.633).
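A quick way to confirm that the waterfall really does add up is to check SHAP's additivity property in code. The short sketch below reuses the `shap_values`, `model` and `X_test` variables from the listing above; the base value plus the per-feature SHAP values should reconstruct the model's raw log-odds output for the first test sample:

# Additivity check: base value + sum of per-feature SHAP values = model's raw (log-odds) output.
# Reuses `shap_values`, `model` and `X_test` from the listing above.
sv = shap_values[0]
reconstructed = sv.base_values + sv.values.sum()
raw_log_odds = model.predict(X_test.iloc[[0]], output_margin=True)[0]
print(f"base + sum of SHAP values : {reconstructed:.3f}")
print(f"model raw log-odds output : {raw_log_odds:.3f}")   # the two should match closely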
Table 3: XAI significance of waterfall plot
| XAI aspect | Explanation with waterfall plot |
|---|---|
| Transparency | We see exactly which features drove the decision and in which direction. |
| Locality | Tailored to this specific patient, not a general trend. |
| Consistency | Matches domain knowledge (e.g., very low WBC decreases risk). |
| Trust | Clinicians can understand the rationale, increasing confidence in model output. |
| Auditability | Useful in regulated environments to defend decisions on individual predictions. |

Beeswarm plot: Now let us see what the beeswarm plot is showing.
- Each dot = one patient in the dataset.
- Y-axis = features used by the model.
- X-axis = SHAP value (how much a feature pushed prediction up or down).
This plot provides a global view of feature importance across all patients in the test set. The table below summarises what it shows for each feature, and a short numeric sketch follows the table.
| Feature | Role | Insight |
|---|---|---|
| resp_rate | Most important feature overall | High values consistently push predictions strongly up → higher sepsis risk |
| wbc | Strongly bi-directional | High WBC pushes risk up (infection likely); low WBC pushes risk down |
| heart_rate | Important signal | High heart rate → elevated sepsis risk |
| lactate | Moderate influence | Non-linear behaviour: some high values affect the prediction strongly, others do not |
| temp | Weak impact | Most dots cluster around a SHAP value of 0 |
| age | Minimal contribution | Likely not a key predictor of sepsis in this model |
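For a numeric counterpart to the beeswarm view, the sketch below (again reusing `shap_values` and `X_test` from the main listing) ranks features by mean absolute SHAP value, which is the same ordering the beeswarm plot uses:

# Global feature ranking by mean |SHAP| value: a numeric view of the beeswarm ordering.
# Reuses `shap_values` and `X_test` from the listing above.
import numpy as np
import pandas as pd

global_importance = pd.Series(
    np.abs(shap_values.values).mean(axis=0),   # average contribution magnitude per feature
    index=X_test.columns,
).sort_values(ascending=False)
print(global_importance)

# shap.plots.bar(shap_values) draws the same ranking as a bar chart.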

Force plot: The force plot visualises how different features (input variables) ‘push’ a model’s prediction away from the base value (the average model prediction across the training data).
- Base value (grey line): The average model output across all patients; here, around –3.73.
- f(x): The final model prediction for this individual, –4.63, which means the model predicts a low sepsis risk.
- Red sections: Features that push the prediction higher (toward higher sepsis risk). Here, resp_rate = 24.91 contributes a SHAP value of +4.2, pushing strongly toward higher risk.
- Blue sections: Features that push the prediction lower (toward lower risk). Here, wbc = 1.03 contributes –3.3 and heart_rate = 98.15 contributes –1.56.
What this means clinically:
- The patient had a high respiration rate, which is typically a red flag for sepsis and pushes the predicted risk up.
- However, a very low white blood cell (WBC) count and a moderate heart rate provided strong counter-evidence, pushing the risk down.
- Net effect: despite a major risk signal (resp_rate), the stronger negative signals led the model to predict a low risk of sepsis.
The force plot gives an interactive summary of how individual features push the prediction away from the base value. Here, features like wbc and heart_rate pulled the model away from predicting sepsis, despite resp_rate pushing in the other direction.
So the final impression is:
- Only one feature (resp_rate) increased the predicted risk.
- Strong negative evidence (very low WBC) led the model to conclude a low likelihood of sepsis.
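In practice, the interactive (JavaScript) version of the force plot is often the most convenient artefact to share with clinicians. The small sketch below reuses `shap_values` from the main listing and saves the plot as a standalone HTML file; the file name is purely illustrative:

# Save the interactive force plot to a standalone HTML file for review outside the notebook.
# Reuses `shap_values` from the listing above; the file name is illustrative.
force_viz = shap.plots.force(shap_values[0])   # interactive version (no matplotlib=True)
shap.save_html("sepsis_force_plot_patient0.html", force_viz)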
The emergence of agentic AI, capable of pursuing goals, interacting with environments, and learning through experience, is a significant revolution in artificial intelligence. Yet the sophistication of such systems raises problems of trust, accountability, safety, and ethics. Explainable AI (XAI) helps overcome these challenges by making agents' decision-making processes transparent, verifying correct behaviour, pinpointing errors, and promoting trust and cooperation.
Table 4: How XAI helped explain the result
| XAI criteria | What SHAP output shows |
|---|---|
| Transparency | The doctor can see exactly what raised or lowered the risk |
| Actionability | WBC is very low — probably not sepsis |
| Trust | Explains the decision logic, making clinicians more likely to trust the model |
| Auditability | Can show regulators why a patient was (or wasn't) flagged |
| Bias detection | Beeswarm shows whether age, etc, is unfairly weighted |
The particular characteristics of agentic systems, such as dynamic behaviour, goal-orientation, and continual learning, require XAI techniques tailored to them. Traditional XAI methods are evolving to accommodate these intricacies, with enhanced explanation of temporal actions, goal-oriented behaviour, and visualisation of internal states. The increasing adoption of XAI in real-world applications, ranging from autonomous cars to healthcare and industry, underscores its significance. Yet more research is needed to improve XAI methods, particularly in handling complex behaviours and adapting explanations to different users.
Ultimately, whether agentic AI will succeed or not rests with our capacity to make these systems explainable. By making explainability a priority, we can make these agents trustworthy, ethical, and aligned with human values, setting the stage for their use in society.
Disclaimer: The insights and perspectives shared in this article represent the authors' independent views and interpretations, and not those of their organisation or employer.