Artificial intelligence (AI) is increasingly becoming part of our day-to-day lives, woven seamlessly into the technology we use. And its presence is only growing with time: one day we’re asking ChatGPT to help us write an email, and the next we’re asking a chatbot for a refund on a damaged skirt.
Trust is a big part of our budding relationship with AI and the algorithms that power it under the hood. We want to know that AI systems are giving us consistent, coherent, and accurate information, especially when they’re charged with making big decisions. That confidence requires transparency into what exactly these models and algorithms are doing.
That’s where interpretability and explainability come in. This article will touch on those two concepts, how they differ, and how each can be used to create a better relationship between humans and AI.
Interpretability is the ability to translate an AI model’s inner workings into simple explanations that a human can understand. Models may have very elaborate and complex pieces behind the scenes, and those pieces may rely on intricate logic or mathematical operations to process data. Interpretability means being able to “decode” each component’s logic into a high-level, human-friendly explanation. The interpretability of a model depends on how translatable each of those components is: the more interpretable the model, the more easily each of its pieces can be explained.
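To make this concrete, here is a minimal sketch of an inherently interpretable model: a shallow decision tree whose internal splits read as plain if/then rules. (scikit-learn and the iris dataset are illustrative choices here, not part of any particular product.)

```python
# A minimal sketch of an inherently interpretable model: a shallow decision
# tree whose internal logic can be printed as plain if/then rules.
# (scikit-learn and the iris dataset are illustrative choices.)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Every internal component (each split) translates directly into a
# human-readable rule -- this is what high interpretability looks like.
print(export_text(tree, feature_names=iris.feature_names))
```

Each line of the printed output is a simple threshold test on one feature, so the entire decision process can be narrated to a person without losing fidelity.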
An example of this could be a neural network powering a reverse image search app. At a high level, reverse image search takes a given image as input and analyzes features of the image to find results across the Internet that either match or have very similar traits. With such a large data set to work with, we can imagine that the neural network powering this is quite large and has many different layers and nodes to process aspects of a photo and find similar ones.
If we keep uploading photos of green objects to the app, we may notice a particular node in the network that consistently activates while the model processes each image. Although we may not be able to explain the exact matrix and dot-product calculations that happen in those layers, we can offer an interpretable explanation: this node appears to recognize the color green, since that is the one commonality across all of the provided inputs, however different the objects may be otherwise.
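As a rough illustration of how an engineer might probe for that kind of node, here is a minimal sketch using a PyTorch forward hook. The tiny toy network and the synthetic “green” image batch are hypothetical stand-ins for the much larger reverse-image-search model described above.

```python
# A minimal sketch of probing one unit's behavior with a forward hook.
# The tiny CNN and the "green objects" batch are hypothetical stand-ins
# for the much larger reverse-image-search network described above.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # the layer whose units we probe
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)

activations = []

def record(module, inputs, output):
    # Average each unit's activation map so we get one number per unit per image.
    activations.append(output.mean(dim=(2, 3)))

hook = model[0].register_forward_hook(record)

green_images = torch.rand(16, 3, 32, 32)
green_images[:, 1] += 1.0  # exaggerate the green channel for illustration

with torch.no_grad():
    model(green_images)
hook.remove()

# A unit that responds strongly across all of these inputs is a candidate
# "green detector" -- an interpretable, human-level story about one component.
mean_per_unit = torch.cat(activations).mean(dim=0)
print("Most responsive unit:", int(mean_per_unit.argmax()))
```

In a real investigation you would confirm the story by checking that the same unit stays quiet on images with little or no green in them.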
When an algorithm is highly interpretable, many or all of its parts can be translated into human terms, which is a huge step toward trust and transparency.
The benefit of interpretability lies mainly with two groups: the engineers and data scientists who build, debug, and refine the models, and the reviewers who evaluate an algorithm’s compliance with ethical principles.
Explainability means being able to explain a machine learning model’s behavior in human terms (particularly for end users) without focusing on its inner workings. It provides higher-level insight into the model’s decision-making so that its decisions can be easily communicated to humans.
Explainability examines only the inputs and outputs of a model to understand how it behaves. It focuses on creating clear, intuitive explanations catered to the users of AI products and to non-technical audiences. No justification from model parameters or any internal workings is needed to exercise explainability.
An example of explainability is a model that examines medical imaging for cancer detection. The model is given images of healthy tissue as well as images containing tumors, and it outputs the likelihood that a tumor is present in a given image. To apply explainability to this model, we can identify patterns between the inputs and outputs to get a glimpse into its decision-making process. The model may respond to certain features of the input images, such as crisper tumor margins or particular shapes that map to common tumors rather than to other medical conditions that present similarly. Being able to explain what the model does, and roughly what it seems to be looking for, in human terms allows the specialist reviewing the images to provide a more informed justification for a diagnosis, as well as a clearer treatment plan for the patient. Neither the patient nor the specialist may know the intricacies of the algorithm that powers the model, but they can still trust how it came to its decision.
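One common way to build that kind of input-to-output explanation without opening the model up is occlusion sensitivity: hide one region of the image at a time and watch how the model’s score changes. Below is a minimal sketch of the idea; `predict_tumor_probability` is a hypothetical placeholder for whatever black-box model is actually in use.

```python
# A minimal sketch of a model-agnostic explanation: occlusion sensitivity.
# The model is treated as a black box -- only inputs and outputs are used.
# `predict_tumor_probability` is a hypothetical stand-in for a real
# cancer-detection model; any callable mapping an image to a score works.
import numpy as np

def predict_tumor_probability(image: np.ndarray) -> float:
    # Placeholder black box: pretend bright central pixels drive the score.
    h, w = image.shape
    return float(image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4].mean())

def occlusion_map(image: np.ndarray, patch: int = 8) -> np.ndarray:
    """Score drop when each patch is hidden: a larger drop means that region mattered more."""
    baseline = predict_tumor_probability(image)
    heat = np.zeros_like(image, dtype=float)
    for y in range(0, image.shape[0], patch):
        for x in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[y : y + patch, x : x + patch] = 0.0  # hide this region
            heat[y : y + patch, x : x + patch] = baseline - predict_tumor_probability(occluded)
    return heat

scan = np.random.rand(64, 64)  # stand-in for a grayscale medical image
heatmap = occlusion_map(scan)
print("Most influential region peaks at:", np.unravel_index(heatmap.argmax(), heatmap.shape))
```

The resulting heatmap can be overlaid on the scan so a specialist can see, in plain visual terms, which regions pushed the model toward its prediction.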
That means that the main benefit of explainability lies with the end users of the algorithm. Explainability works to deepen their understanding of the products they’re using, which builds trust.
Interpretability and explainability are often used interchangeably but are important to differentiate, as they serve two distinct purposes.
Interpretability is focused on full transparency into the model itself, specifically its inner workings and how each component contributes to the output. It is therefore used more by those working directly with models, such as engineers and those who evaluate an algorithm’s compliance with ethical principles.
Explainability is focused more on the general, high-level reasoning behind how a model took an input and came up with its corresponding output. Unlike interpretability, no context into the model’s inner workings is needed. It relies heavily on external observations to create an explanation of the behavior for end users.
Explainability is important for any model, but gaining interpretability might not be worth the sacrifice in performance. Because the way we humans carve the world into concepts is usually not the most computationally efficient representation, the more a model relies on those human concepts internally (and therefore becomes interpretable to us), the more likely it is to be slower or less accurate.
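As a rough way to see that trade-off, here is a minimal sketch comparing a depth-two decision tree (readable end to end) with a gradient-boosted ensemble (many trees, much harder to narrate) on the same data. scikit-learn and its breast-cancer dataset are illustrative choices; the point is the comparison itself, not any specific numbers.

```python
# A minimal sketch of the interpretability/performance trade-off:
# a tiny, readable tree versus a larger ensemble whose internals are
# far harder to narrate. The dataset choice is purely illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

interpretable_model = DecisionTreeClassifier(max_depth=2, random_state=0)
complex_model = GradientBoostingClassifier(random_state=0)

print("shallow tree :", cross_val_score(interpretable_model, X, y, cv=5).mean())
print("boosted trees:", cross_val_score(complex_model, X, y, cv=5).mean())
```

Whatever the exact scores turn out to be, the shallow tree’s reasoning can be written out in a few sentences while the ensemble’s cannot, and that is the essence of the trade-off.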
It’s important to note that these two properties don’t always come together: it’s possible to have one without the other. For example, a very complicated neural network might be hard to interpret because its internal workings resist translation, while the high-level relationship between its inputs and outputs remains easy to explain to the end user.
As mentioned above, it’s easy to confuse the two, as they provide similar benefits. Both explainability and interpretability address common concerns between humans and AI, such as accountability, reduction of bias, ethics, fairness, and responsibility. They help make AI feel more approachable.
Interpretability and explainability are valuable concepts that help humans improve their understanding of, and trust in, AI models. That trust is the foundation of a lasting, positive, and mutually beneficial relationship between humanity and the AI and machine learning systems we use to streamline and improve our lives.
As we continue to bring more and more AI into our daily lives, we should also aim for interpretability and/or explainability, to ensure that AI remains a constructive tool and to help drive progress toward even more impactful and responsible AI in the future.
Ashley Huynh
Freelance Writer at Authors Collective