Counterfactuals provide an easy-to-understand local explanation of a model's prediction: a counterexample that gives a what-if intuition for that prediction.
Counterfactuals offer an alternative way to look at the model's decision-making process. Instead of explaining which features contributed most to a specific prediction, counterfactuals show the minimal changes needed to reverse it.
If all the changes listed in a counterfactual are applied, the resulting sample yields a different classification. Any feature not mentioned in the list of changes keeps the same value as in the original sample.
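As a minimal sketch of how a list of changes turns an original sample into a counterfactual, assume samples and changes are plain feature-to-value dictionaries. The function name `apply_counterfactual` and the feature names are hypothetical and chosen only for illustration; they are not part of any particular tool's API.

```python
def apply_counterfactual(sample, changes):
    """Return a copy of `sample` with the listed feature changes applied."""
    unknown = set(changes) - set(sample)
    if unknown:
        raise KeyError(f"changes mention unknown features: {sorted(unknown)}")
    # Features absent from `changes` keep their original values.
    return {**sample, **changes}


# Hypothetical sample and counterfactual, for illustration only.
original = {"age": 42, "income": 30000, "has_mortgage": True}
changes = {"income": 45000}

counterfactual = apply_counterfactual(original, changes)
print(counterfactual)
```

The original sample is left unmodified, so it can still be compared against the counterfactual afterwards.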
The Sparsity of a counterfactual example is defined as the number of features that must be changed from the original sample, divided by the total number of features. Under this definition, a lower value means that fewer features were changed.
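The Similarity of a counterfactual is defined as max(1 - distance, 0), where distance is measured between the counterfactual and the original example. The max with 0 keeps the score from going negative when the distance exceeds 1.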
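The two definitions above can be sketched as follows. The text does not say which distance is used, so the mean absolute difference below is an assumption made purely for illustration (it presumes numeric features scaled to a comparable range); the function names are hypothetical as well.

```python
def sparsity(original, counterfactual):
    """Fraction of features whose value differs between the two samples."""
    if len(original) != len(counterfactual):
        raise ValueError("samples must have the same number of features")
    changed = sum(1 for a, b in zip(original, counterfactual) if a != b)
    return changed / len(original)


def mean_abs_distance(a, b):
    # ASSUMPTION: the source does not name the distance metric.
    # Mean absolute difference is used here only as a stand-in.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(original, counterfactual, distance=mean_abs_distance):
    """max(1 - distance, 0), so the score never goes below 0."""
    return max(1.0 - distance(original, counterfactual), 0.0)


# Hypothetical feature vectors, for illustration only.
original = [0.2, 0.5, 0.9, 0.1]
counterfactual = [0.2, 0.75, 0.9, 0.1]

print(sparsity(original, counterfactual))
print(similarity(original, counterfactual))
```

Passing the distance as a parameter keeps the similarity formula separate from the choice of metric, which can then be swapped for whatever distance the tool actually uses.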
To learn more about counterfactuals, please refer to Wachter et al. (2017): Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR.