Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Last update: Aug 09, 2022

Overview

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Paper

Original paper can be found here

Datasets

I'm not sure how the paper defines division, so I use integer division for both division datasets:

div: $$x \circ y = (x // y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
div_odd: $$x \circ y = (x // y) \bmod p$$ if $$y$$ is odd, else $$(x - y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
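
As a rough illustration of the two definitions above, here is a minimal Python sketch. The function names and the `build_table` helper are hypothetical and not taken from train.py. It assumes operands are residues 0..p-1 and skips y = 0 for plain division, since the text above does not say how a zero divisor is handled.

```python
from typing import Callable, Dict, Tuple


def div(x: int, y: int, p: int) -> int:
    """Integer (floor) division of x by y, reduced mod p."""
    return (x // y) % p


def div_odd(x: int, y: int, p: int) -> int:
    """Integer division mod p when y is odd, subtraction mod p otherwise."""
    return (x // y) % p if y % 2 == 1 else (x - y) % p


def build_table(
    op: Callable[[int, int, int], int],
    p: int,
    skip_zero_divisor: bool = True,
) -> Dict[Tuple[int, int], int]:
    """Enumerate x, y in 0..p-1 and record op(x, y, p) for each pair."""
    table = {}
    for x in range(p):
        for y in range(p):
            if skip_zero_divisor and y == 0:
                continue  # assumption: drop pairs that would divide by zero
            table[(x, y)] = op(x, y, p)
    return table


div_table = build_table(div, p=97)
div_odd_table = build_table(div_odd, p=97, skip_zero_divisor=False)
```

In `div_odd`, y = 0 counts as even and therefore takes the subtraction branch, so that table can include the zero column without dividing by zero.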

Hyperparameters

The default hyperparameters are taken from the paper, but each can be overridden on the command line when running train.py.

Running experiments

To run with default settings, simply run python train.py. The first time you train on any dataset you have to specify --force_data.

Arguments:

optimizer args

"--lr", type=float, default=1e-3
"--weight_decay", type=float, default=1
"--beta1", type=float, default=0.9
"--beta2", type=float, default=0.98

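The four optimizer arguments above match the usual knobs of an Adam-style optimizer with weight decay. Which optimizer class train.py actually builds is not stated here, so the following is only a sketch of a single AdamW-style (decoupled weight decay) update on one scalar parameter, in plain Python. The function name `adamw_step` and the `eps` value are assumptions.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, weight_decay=1.0,
               beta1=0.9, beta2=0.98, eps=1e-8):
    """One AdamW-style update on a scalar; returns (param, m, v).

    t is the 1-based step count used for bias correction.
    """
    # Decoupled weight decay: shrink the parameter directly,
    # independently of the gradient-based update.
    param = param * (1.0 - lr * weight_decay)

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias-corrected moment estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# With a zero gradient, only the weight decay term moves the parameter.
p, m, v = adamw_step(param=1.0, grad=0.0, m=0.0, v=0.0, t=1)
```

With the defaults listed above (lr 1e-3, weight decay 1), a step with zero gradient multiplies the parameter by 0.999.
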
model args

"--num_heads", type=int, default=4
"--layers", type=int, default=2
"--width", type=int, default=128

data args

"--data_name", type=str, default="perm", choices=[
    "perm_xy",    # permutation composition x * y
    "perm_xyx1",  # permutation composition x * y * x^-1
    "perm_xyx",   # permutation composition x * y * x
    "plus",       # x + y
    "minus",      # x - y
    "div",        # x / y
    "div_odd",    # x / y if y is odd else x - y
    "x2y2",       # x^2 + y^2
    "x2xyy2",     # x^2 + y^2 + xy
    "x2xyy2x",    # x^2 + y^2 + xy + x
    "x3xy",       # x^3 + y
    "x3xy2y",     # x^3 + xy^2 + y
]
"--num_elements", type=int, default=5 (choose 5 for permutation data, 97 for arithmetic data)
"--data_dir", type=str, default="./data"
"--force_data", action="store_true", help="Whether to force dataset creation."

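To make the dataset names above concrete, here is a minimal sketch that maps some of them to Python functions. It is not the code in train.py. The dictionaries `ARITHMETIC_OPS` and `PERM_OPS` and the helpers `compose` and `inverse` are hypothetical names. The arithmetic entries follow the comments in the list and reduce mod p. For permutations the sketch assumes that x * y means "apply y first, then x", which the list does not specify.

```python
from itertools import permutations

# Arithmetic operations on residues mod p, following the comments in the list.
ARITHMETIC_OPS = {
    "plus": lambda x, y, p: (x + y) % p,
    "minus": lambda x, y, p: (x - y) % p,
    "x2y2": lambda x, y, p: (x * x + y * y) % p,
    "x2xyy2": lambda x, y, p: (x * x + y * y + x * y) % p,
    "x2xyy2x": lambda x, y, p: (x * x + y * y + x * y + x) % p,
    "x3xy": lambda x, y, p: (x ** 3 + y) % p,
    "x3xy2y": lambda x, y, p: (x ** 3 + x * y * y + y) % p,
}


def compose(x, y):
    """Return the permutation i -> x[y[i]] (assumption: y is applied first)."""
    return tuple(x[i] for i in y)


def inverse(x):
    """Return the inverse permutation of x."""
    inv = [0] * len(x)
    for i, xi in enumerate(x):
        inv[xi] = i
    return tuple(inv)


# Permutations are tuples such as (1, 2, 0), meaning 0 -> 1, 1 -> 2, 2 -> 0.
PERM_OPS = {
    "perm_xy": lambda x, y: compose(x, y),
    "perm_xyx1": lambda x, y: compose(compose(x, y), inverse(x)),
    "perm_xyx": lambda x, y: compose(compose(x, y), x),
}

# All elements for --num_elements 5: the 120 permutations of 5 items.
elements = list(permutations(range(5)))
```

With 5 elements there are 120 permutations and therefore 120 * 120 input pairs. With 97 elements the arithmetic datasets have 97 * 97 pairs, before any exclusions such as a zero divisor.
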
training args

"--batch_size", type=int, default=512
"--steps", type=int, default=10**5
"--train_ratio", type=float, default=0.5
"--seed", type=int, default=42
"--verbose", action="store_true"
"--log_freq", type=int, default=10
"--num_workers", type=int, default=4

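The --train_ratio and --seed arguments suggest a seeded random split of all input pairs into a training and a validation part. How train.py performs the split is not described here, so the following is only a sketch under that assumption. The function name `split_pairs` is hypothetical.

```python
import random
from itertools import product


def split_pairs(num_elements, train_ratio=0.5, seed=42):
    """Shuffle all (x, y) index pairs with a fixed seed and split them.

    Returns (train_pairs, val_pairs); the first int(train_ratio * N)
    shuffled pairs go to training, the rest to validation.
    """
    pairs = list(product(range(num_elements), repeat=2))
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    rng.shuffle(pairs)
    n_train = int(train_ratio * len(pairs))
    return pairs[:n_train], pairs[n_train:]


train_pairs, val_pairs = split_pairs(num_elements=97)
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible for a given seed without affecting other random number use in the program.
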
Owner

Tom Lieberum
