Ego4D Episodic Memory Benchmark

Ego4D is the world's largest egocentric (first-person) video ML dataset and benchmark suite.

For more information on Ego4D or to download the dataset, see the Start Here page.

The Episodic Memory Benchmark aims to make past video queryable and requires localizing where the answer can be seen within the user's past video. This repository contains the code needed to reproduce the results in the paper Ego4D: Around the World in 3,000 Hours of Egocentric Video.

There are four related tasks within the benchmark. Please see the README for each task for details on setting up its codebase.

VQ2D: Visual Queries with 2D Localization

This task asks: “When did I last see [this]?” Given an egocentric video clip and an image crop depicting the query object, the goal is to return the last occurrence of the object in the input video as a tracked bounding box (2D + temporal localization). The task is novel in that it extends traditional object instance recognition to video, and in particular to egocentric video with challenging viewpoint changes.
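To make the expected output concrete, the sketch below shows one simple way to turn per-frame matches into a "last occurrence" response track. It assumes a hypothetical upstream detector has already produced, for each frame, the best-scoring box and its similarity to the query crop; the `Detection` class, `last_occurrence_track` function, and the 0.5 threshold are illustrative inventions, not the baseline shipped in this repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    frame: int                               # frame index within the clip
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
    score: float                             # similarity to the visual query crop


def last_occurrence_track(dets: List[Optional[Detection]], threshold: float = 0.5) -> List[Detection]:
    """Return the last contiguous run of frames whose best detection matches the query.

    dets[i] is the best-scoring detection in frame i, or None if nothing was found.
    (Hypothetical helper for illustration only.)
    """
    end = None
    # Walk backwards to find the most recent frame where the query object is seen.
    for i in range(len(dets) - 1, -1, -1):
        d = dets[i]
        if d is not None and d.score >= threshold:
            end = i
            break
    if end is None:
        return []
    start = end
    # Extend the track backwards while the preceding frames still match.
    while start > 0 and dets[start - 1] is not None and dets[start - 1].score >= threshold:
        start -= 1
    return [dets[i] for i in range(start, end + 1)]


dets = [
    Detection(0, (0, 0, 10, 10), 0.9),
    None,
    Detection(2, (1, 1, 9, 9), 0.7),
    Detection(3, (2, 2, 8, 8), 0.8),
    Detection(4, (0, 0, 5, 5), 0.1),
]
print([d.frame for d in last_occurrence_track(dets)])  # [2, 3]
```

The earlier match at frame 0 is ignored because the task asks for the most recent sighting; real systems would also need tracking to keep boxes consistent across frames.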

VQ3D: Visual Queries with 3D Localization

This task asks, “Where did I last see [this]?” Given an egocentric video clip and an image crop depicting the query object, the goal is to localize the last time it was seen in the video and return a 3D displacement vector from the camera center of the query frame to the center of the object in 3D. Hence, this task builds on the 2D localization above, expanding it to require localization in the 3D environment. The task is novel in how it requires both video object instance recognition and 3D reasoning.
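The geometry behind the 3D displacement vector can be sketched as follows. This assumes a hypothetical pose convention where the query frame's world-to-camera transform satisfies x_cam = R @ x_world + t, and that the object's 3D center is already known in world coordinates; which frame the benchmark expects the vector to be expressed in is an assumption here, so the function returns it in both world and query-camera axes.

```python
import numpy as np


def vq3d_displacement(R_wc, t_wc, obj_center_world):
    """Displacement from the query-frame camera center to the object center.

    Assumed (hypothetical) convention: x_cam = R_wc @ x_world + t_wc.
    Returns the vector in world coordinates and in query-camera coordinates.
    """
    R = np.asarray(R_wc, dtype=float)
    t = np.asarray(t_wc, dtype=float)
    X = np.asarray(obj_center_world, dtype=float)
    # Camera center in world coordinates: the point C with R @ C + t = 0.
    C = -R.T @ t
    disp_world = X - C
    # Rotate the same direction vector into the query camera's axes.
    disp_cam = R @ disp_world
    return disp_world, disp_cam


# Example: camera rotated 90 degrees about z, located at the world origin.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
d_world, d_cam = vq3d_displacement(Rz, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(d_world, d_cam)
```

Note that rotating the world-frame displacement by R gives exactly the object's coordinates in the camera frame, since R @ (X - C) = R @ X + t.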

NLQ: Natural Language Queries

This task asks, “What/when/where...?”: general natural language questions about the past video. Given a video clip and a query expressed in natural language, the goal is to localize the temporal window within the video history where the answer to the question is evident. The task is novel because it requires searching through video to answer flexible linguistic queries. The input videos are each ~8 min long.
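Temporal localization of this kind is commonly scored with temporal IoU (tIoU) between a predicted window and the ground-truth window, summarized as recall@k at a tIoU threshold. The sketch below assumes windows are (start, end) pairs in seconds; the function names and the 0.3 threshold are illustrative, and the official evaluation script may differ in details.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]  # (start_sec, end_sec), start <= end


def temporal_iou(pred: Window, gt: Window) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds: Sequence[List[Window]], gts: Sequence[Window],
                k: int = 1, iou_threshold: float = 0.3) -> float:
    """Fraction of queries whose top-k predictions contain a window with tIoU >= threshold."""
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        if any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k]):
            hits += 1
    return hits / len(gts) if gts else 0.0


print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 5 / 15
print(recall_at_k([[(0.0, 10.0)], [(100.0, 110.0)]], [(5.0, 15.0), (0.0, 5.0)]))  # 0.5
```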

MQ: Moments Queries

This task asks, “When did I do X?” Given an egocentric video and an activity name (i.e., a “moment”), the goal is to localize all instances of that activity in the past video. The task is activity detection, but specifically for the egocentric activity of the camera wearer, who is largely out of view.
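Because every instance of the activity must be returned, a detector usually emits many overlapping candidate segments, and a standard clean-up step in temporal activity detection is temporal non-maximum suppression (NMS). The sketch below is a generic NMS over (start, end, score) triples; it is not necessarily what the MQ baseline in this repository uses, and `temporal_nms` and its 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, confidence score)


def _tiou(a: Segment, b: Segment) -> float:
    """Temporal IoU between two segments (scores are ignored)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedy NMS: keep segments in descending score order, dropping heavy overlaps."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Keep a candidate only if it does not overlap too much with a higher-scoring one.
        if all(_tiou(seg, k) < iou_threshold for k in kept):
            kept.append(seg)
    return kept


candidates = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
print(temporal_nms(candidates))  # [(0.0, 10.0, 0.9), (20.0, 30.0, 0.7)]
```

Here the second candidate overlaps the first with tIoU 9/11, so it is treated as a duplicate of the same activity instance, while the distant third segment survives as a separate instance.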

License

Ego4D is released under the MIT License.
