Minimal PyTorch implementation of YOLOv3

Overview

PyTorch-YOLOv3

A minimal PyTorch implementation of YOLOv3, with support for training, inference and evaluation.

Ubuntu CI PyPI pyversions PyPI license

Installation

Installing from source

For normal training and evaluation we recommend installing the package from source using a poetry virtual enviroment.

git clone https://github.com/eriklindernoren/PyTorch-YOLOv3
cd PyTorch-YOLOv3/
pip3 install poetry --user
poetry install

You need to join the virtual enviroment by runing poetry shell in this directory before running any of the following commands without the poetry run prefix. Also have a look at the other installing method, if you want to use the commands everywhere without opening a poetry-shell.

Download pretrained weights

./weights/download_weights.sh

Download COCO

./data/get_coco_dataset.sh

Install via pip

This installation method is recommended, if you want to use this package as a dependency in another python project. This method only includes the code, is less isolated and may conflict with other packages. Weights and the COCO dataset need to be downloaded as stated above. See API for further information regarding the packages API. It also enables the CLI tools yolo-detect, yolo-train, and yolo-test everywhere without any additional commands.

pip3 install pytorchyolo --user

Test

Evaluates the model on COCO test dataset. To download this dataset as well as weights, see above.

poetry run yolo-test --weights weights/yolov3.weights
Model mAP (min. 50 IoU)
YOLOv3 608 (paper) 57.9
YOLOv3 608 (this impl.) 57.3
YOLOv3 416 (paper) 55.3
YOLOv3 416 (this impl.) 55.5

Inference

Uses pretrained weights to make predictions on images. Below table displays the inference times when using as inputs images scaled to 256x256. The ResNet backbone measurements are taken from the YOLOv3 paper. The Darknet-53 measurement marked shows the inference time of this implementation on my 1080ti card.

Backbone GPU FPS
ResNet-101 Titan X 53
ResNet-152 Titan X 37
Darknet-53 (paper) Titan X 76
Darknet-53 (this impl.) 1080ti 74
poetry run yolo-detect --images data/samples/

Train

For argument descriptions have a lock at poetry run yolo-train --help

Example (COCO)

To train on COCO using a Darknet-53 backend pretrained on ImageNet run:

poetry run yolo-train --data config/coco.data  --pretrained_weights weights/darknet53.conv.74

Tensorboard

Track training progress in Tensorboard:

poetry run tensorboard --logdir='logs' --port=6006

Storing the logs on a slow drive possibly leads to a significant training speed decrease.

You can adjust the log directory using --logdir when running tensorboard and yolo-train.

Train on Custom Dataset

Custom model

Run the commands below to create a custom model definition, replacing with the number of classes in your dataset.

./config/create_custom_model.sh <num-classes>  # Will create custom model 'yolov3-custom.cfg'

Classes

Add class names to data/custom/classes.names. This file should have one row per class name.

Image Folder

Move the images of your dataset to data/custom/images/.

Annotation Folder

Move your annotations to data/custom/labels/. The dataloader expects that the annotation file corresponding to the image data/custom/images/train.jpg has the path data/custom/labels/train.txt. Each row in the annotation file should define one bounding box, using the syntax label_idx x_center y_center width height. The coordinates should be scaled [0, 1], and the label_idx should be zero-indexed and correspond to the row number of the class name in data/custom/classes.names.

Define Train and Validation Sets

In data/custom/train.txt and data/custom/valid.txt, add paths to images that will be used as train and validation data respectively.

Train

To train on the custom dataset run:

poetry run yolo-train --model config/yolov3-custom.cfg --data config/custom.data

Add --pretrained_weights weights/darknet53.conv.74 to train using a backend pretrained on ImageNet.

API

You are able to import the modules of this repo in your own project if you install the pip package pytorchyolo.

An example prediction call from a simple OpenCV python script would look like this:

import cv2
from pytorchyolo import detect, models

# Load the YOLO model
model = models.load_model(
  "/yolov3.cfg", 
  "/yolov3.weights")

# Load the image as an numpy array
img = cv2.imread("")

# Runs the YOLO model on the image 
boxes = detect.detect_image(model, img)

print(boxes)

For more advanced usage look at the method's doc strings.

Credit

YOLOv3: An Incremental Improvement

Joseph Redmon, Ali Farhadi

Abstract
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/.

[Paper] [Project Webpage] [Authors' Implementation]

@article{yolov3,
  title={YOLOv3: An Incremental Improvement},
  author={Redmon, Joseph and Farhadi, Ali},
  journal = {arXiv},
  year={2018}
}
Owner
Erik Linder-Norén
ML engineer at Apple. Excited about machine learning, basketball and building things.
Erik Linder-Norén
[NeurIPS 2021] Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data (NeurIPS 2021) This repository will provide the official PyTorch implementa

Liming Jiang 238 Nov 25, 2022
Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

Shunted Transformer This is the offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation by Sucheng Ren, Daquan Zhou, Shengf

156 Dec 27, 2022
[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"

FiGNN for CTR prediction The code and data for our paper in CIKM2019: Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Predicti

Big Data and Multi-modal Computing Group, CRIPAC 75 Dec 30, 2022
PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick." [Project page] [Paper

Gyungin Shin 59 Sep 25, 2022
Simple (but Strong) Baselines for POMDPs

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs Welcome to the POMDP world! This repo provides some simple baselines for POMDPs, specific

Tianwei V. Ni 172 Dec 29, 2022
Exploration-Exploitation Dilemma Solving Methods

Exploration-Exploitation Dilemma Solving Methods Medium article for this repo - HERE In ths repo I implemented two techniques for tackling mentioned t

Aman Mishra 6 Jan 25, 2022
A model which classifies reviews as positive or negative.

SentiMent Analysis In this project I built a model to classify movie reviews fromn the IMDB dataset of 50K reviews. WordtoVec : Neural networks only w

Rishabh Bali 2 Feb 09, 2022
A repository that finds a person who looks like you by using face recognition technology.

Find Your Twin Hello everyone, I've always wondered how casting agencies do the casting for a scene where a certain actor is young or old for a movie

Cengizhan Yurdakul 3 Jan 29, 2022
Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021.

Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021. Figure 1: In the process of motion capture (mocap), some joints or even the whole human

Shinny cui 3 Oct 31, 2022
This repo will contain code to reproduce and build upon understanding transfer learning

What is being transferred in transfer learning? This repo contains the code for the following paper: Behnam Neyshabur*, Hanie Sedghi*, Chiyuan Zhang*.

4 Jun 16, 2021
This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.

Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition This is the research repository for Vid2

Future Interfaces Group (CMU) 26 Dec 24, 2022
AI assistant built in python.the features are it can display time,say weather,open-google,youtube,instagram.

AI assistant built in python.the features are it can display time,say weather,open-google,youtube,instagram.

AK-Shanmugananthan 1 Nov 29, 2021
An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础,融合了混

Houde Qian 5 May 15, 2022
The VeriNet toolkit for verification of neural networks

VeriNet The VeriNet toolkit is a state-of-the-art sound and complete symbolic interval propagation based toolkit for verification of neural networks.

9 Dec 21, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)

Ilya Kostrikov 3k Dec 31, 2022
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Using pytorch to implement unet network for liver image segmentation.

Using pytorch to implement unet network for liver image segmentation.

zxq 1 Dec 17, 2021
Code for Paper "Evidential Softmax for Sparse MultimodalDistributions in Deep Generative Models"

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
Applying PVT to Semantic Segmentation

Applying PVT to Semantic Segmentation Here, we take MMSegmentation v0.13.0 as an example, applying PVTv2 to SemanticFPN. For details see Pyramid Visio

35 Nov 30, 2022
S-attack library. Official implementation of two papers "Are socially-aware trajectory prediction models really socially-aware?" and "Vehicle trajectory prediction works, but not everywhere".

S-attack library: A library for evaluating trajectory prediction models This library contains two research projects to assess the trajectory predictio

VITA lab at EPFL 71 Jan 04, 2023