Are Convolutional Neural Networks or Transformers more like human vision?

This repository contains the code and fine-tuned models for popular Convolutional Neural Networks (CNNs) and the recently proposed Vision Transformer (ViT), fine-tuned on the augmented ImageNet dataset, along with the shape/texture bias tests run on the Stylized ImageNet dataset.

This work compares CNNs and the ViT against humans in terms of error consistency, going beyond traditional metrics. These tests show that the recently proposed self-attention-based Transformer models make more human-like errors than traditional CNNs.
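As a rough illustration of what "error consistency" measures, the sketch below computes Cohen's kappa over per-trial correctness for two observers (for example, a model and a human) who classified the same images. It assumes the standard kappa formulation; the function name `error_consistency` and its inputs are hypothetical and are not taken from this repository's code.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa over trial-by-trial correctness of two observers.

    Hypothetical helper for illustration only (not this repository's API).
    Each argument is a sequence of booleans, one per image, marking
    whether that observer classified the image correctly.
    """
    a = [bool(x) for x in correct_a]
    b = [bool(x) for x in correct_b]
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(a)
    # Observed consistency: both correct or both wrong on the same trial.
    c_obs = sum(x == y for x, y in zip(a, b)) / n

    # Consistency expected by chance from the two accuracies alone.
    p_a = sum(a) / n
    p_b = sum(b) / n
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    # Kappa is undefined when chance agreement is already perfect.
    if math.isclose(c_exp, 1.0):
        return math.nan
    return (c_obs - c_exp) / (1 - c_exp)
```

A value near 0 means the two observers agree no more often than their accuracies alone would predict, while values approaching 1 mean they tend to make errors on the same images.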

Colab

You can run tests on the results directly in Google Colaboratory, without installing anything on your local machine. Use the "Open in Colab" badge in the repository to launch the notebook.

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

If you use our experimental results or fine-tuned models, please cite:

@article{tuli2021cogsci,
      title={Are Convolutional Neural Networks or Transformers more like human vision?}, 
      author={Shikhar Tuli and Ishita Dasgupta and Erin Grant and Thomas L. Griffiths},
      year={2021},
      eprint={2105.07197},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Study of human inductive biases in CNNs and Transformers.
