EEOB publication - Berger-Wolf, Carstens

A simple interpretable transformer for fine-grained image classification and analysis
Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Samuel Stevens, Kaiya L Provost, Anuj Karpatne, Charles Stewart, Tanya Berger-Wolf, Feng-Ju Chang, David Edward Carlyn, Bryan Carstens, Daniel Rubenstein, Yu Su, Wei-Lun Chao. 2023. Link to article
Abstract
We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn “class-specific” queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via “multi-head” cross-attention, INTR could identify different “attributes” of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained model are publicly accessible at https://github.com/Imageomics/INTR.
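To make the core idea concrete, below is a minimal sketch of the class-specific-query mechanism the abstract describes. It is not the authors' implementation (see the linked GitHub repository for that); the class name `INTRSketch`, the use of PyTorch's `nn.TransformerDecoder` as a stand-in for the DETR-style decoder, the shared one-logit-per-query head, and all dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class INTRSketch(nn.Module):
    """Minimal sketch of the INTR idea: one learned query per class,
    a transformer decoder cross-attending to image features, and a
    shared head turning each class query's output into a logit.
    (Illustrative only; not the authors' implementation.)"""

    def __init__(self, num_classes: int, d_model: int = 256,
                 nhead: int = 8, num_layers: int = 6):
        super().__init__()
        # One learnable "class-specific" query per class, so each class
        # can "search for itself" in the image via cross-attention.
        self.class_queries = nn.Parameter(torch.randn(num_classes, d_model))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        # Assumed head: a shared linear layer mapping each class query's
        # decoder output to a single logit for that class.
        self.head = nn.Linear(d_model, 1)

    def forward(self, encoder_features: torch.Tensor) -> torch.Tensor:
        # encoder_features: (batch, num_patches, d_model), e.g. from a
        # backbone plus transformer encoder.
        b = encoder_features.size(0)
        queries = self.class_queries.unsqueeze(0).expand(b, -1, -1)
        # Each class query cross-attends to the image features; the
        # cross-attention weights are what INTR reads as interpretation.
        decoded = self.decoder(queries, encoder_features)
        return self.head(decoded).squeeze(-1)  # (batch, num_classes)

# Hypothetical usage with dummy features:
model = INTRSketch(num_classes=200)
feats = torch.randn(2, 196, 256)  # e.g. 14x14 patches at d_model=256
print(model(feats).shape)         # torch.Size([2, 200])
```

In this reading, the prediction for a class depends only on how well that class's own query finds supporting evidence in the image, which is why the per-class cross-attention maps can be inspected as an explanation of the decision.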