Description
When training a machine learning model, the quality of the training data is of utmost
importance. Poor training data can lead to poor generalization of the trained model. A
particular issue with training data is the presence of shortcuts: features that are highly
predictive of the training target but semantically unconnected to the problem. When a model
is trained on such data, it often learns to rely solely on the shortcut, as there is no incentive
to learn further features. Because test data is typically collected in the same way as the
training data – and thus contains the same shortcuts – the model's reliance on shortcuts
often remains undiscovered.
This thesis presents an approach to detect such shortcuts in training data and automatically
neutralize them: a small image-to-image network, called the “lens”, is prepended to the
classifier network. The lens is trained adversarially to remove the features that the classifier
relies on. The limited capacity of the lens and an additional reproduction loss ensure that
only simple, local features can be removed. The output of the lens also provides visual
feedback on which features have been removed.
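
To illustrate the idea, the following is a minimal sketch of one such adversarial training
step in PyTorch. The architectures, loss weight, optimizers, and dummy data are illustrative
assumptions and do not reproduce the thesis implementation:

    import torch
    import torch.nn as nn

    # Assumed low-capacity image-to-image "lens" prepended to an assumed classifier.
    lens = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, kernel_size=3, padding=1),
    )
    classifier = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
    )

    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    opt_clf = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    opt_lens = torch.optim.Adam(lens.parameters(), lr=1e-4)

    images = torch.rand(8, 3, 64, 64)        # dummy batch for illustration
    labels = torch.randint(0, 2, (8,))

    # Classifier step: learn the task on the lens output (lens held fixed).
    loss_clf = ce(classifier(lens(images).detach()), labels)
    opt_clf.zero_grad(); loss_clf.backward(); opt_clf.step()

    # Lens step: adversarially remove features the classifier relies on, while a
    # reproduction loss (assumed weight 10.0) keeps the output close to the input.
    cleaned = lens(images)
    loss_lens = -ce(classifier(cleaned), labels) + 10.0 * mse(cleaned, images)
    opt_lens.zero_grad(); loss_lens.backward(); opt_lens.step()

In this sketch, the reproduction term is what limits the lens to small, local edits; without
it, the adversarial term alone would let the lens destroy the image entirely.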
The approach is evaluated on data with synthetically added shortcuts as well as on a
real-world chest X-ray dataset. We find that the lens reliably detects, in-paints, and
neutralizes shortcuts. At the same time, classifier performance is not impacted if the data
contains no shortcuts.