
Feature Leakage

The feature leakage ROC plot is a graphical way to evaluate potential leakage between each feature and the label.

Data leakage - "...The use of information in the model training process which would not be expected to be available at prediction time..." (source).

One way leakage can occur is through features that are proxies for the label, that partially give the label away, or that are the label itself.

To identify leakage between a feature and the label, each feature is used on its own to train a model and produce predictions. The predictions and the labels are then used to plot a receiver operating characteristic (ROC) curve. If the ROC curve is "too good", meaning that this feature alone predicts the label almost perfectly, we can suspect leakage.
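The per-feature check described above can be sketched in plain Python. This is a minimal illustration and not the plot's actual implementation. It makes several assumptions: the raw feature value stands in for the prediction of a single-feature model (any model that is monotonic in one feature, such as a one-variable logistic regression, produces the same ROC curve up to direction), the labels are binary 0/1, and the 0.95 AUC cutoff for "too good" is an arbitrary choice. The function names, feature names, and toy data are all hypothetical.

```python
def roc_curve_points(scores, labels):
    """Return ROC points (fpr, tpr) by sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("labels must contain both classes")
    # Sort by descending score so lowering the threshold admits samples in order.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def feature_leakage_report(features, labels, suspicious_auc=0.95):
    """Score each feature on its own; flag those whose AUC looks 'too good'.

    suspicious_auc is an assumed cutoff, not a value taken from the source.
    """
    report = {}
    for name, values in features.items():
        area = auc(roc_curve_points(values, labels))
        # A feature that ranks the label in reverse is just as informative.
        area = max(area, 1.0 - area)
        report[name] = (area, area >= suspicious_auc)
    return report


# Hypothetical toy data: "refund_issued" is a copy of the label (a leak),
# while "age" carries little information about it.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
features = {
    "age": [23, 45, 31, 38, 52, 29, 41, 35],
    "refund_issued": [0, 0, 0, 1, 1, 1, 0, 1],
}

report = feature_leakage_report(features, labels)
for name, (area, flagged) in report.items():
    print(f"{name}: AUC={area:.3f} suspected_leak={flagged}")
```

In this toy data the leaked column separates the two classes perfectly, so it is flagged, while "age" stays close to the chance level of 0.5. In practice the predictions should come from held-out data, and a flagged feature is a reason to investigate rather than proof of leakage.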

To learn more about ROC plots press here.

