某学校选课系统GIF验证码数据集 + Baseline模型 + 上下游相关工具

Overview

elective-dataset-2021spring

某学校2021春季选课系统GIF验证码数据集(29338张) + 准确率98.4%的Baseline模型 + 上下游相关工具。

数据集采用 知识共享署名-非商业性使用 4.0 国际许可协议 进行许可。

Baseline模型和上下游相关工具采用 MIT License 进行许可。

数据集

dataset/ 目录包含了收集到的所有带标签验证码数据,共29338张。

  • dataset/manual: 人工标注的带标签验证码GIF数据集,标签经过了elective验证因此都是正确的。共5471张。
  • dataset/auto-corrdataset/auto-fail-tagged: 模型自动标注的带标签验证码GIF数据集,其中 auto-corr 是识别正确(通过了elective验证)的部分,auto-fail-tagged 是识别错误然后手工重新标注的部分(此部分不保证正确性)。共22931(正确)+936(错误)张。

使用时请注意,由于 GitHub 的限制

  • auto-fail-tagged 在仓库中存储为7-zip压缩包;
  • manual 在仓库中存储为7个不超过48MB的7-zip分卷;
  • auto-corr 没有存储在仓库中,而是压缩为14个不超过95MB的7-zip分卷放在了 Release页面

Baseline 模型

baseline/ 目录包含一个简易的验证码识别模型。

此模型进行了提取关键帧、基于OpenCV的图像增强以及基于CNN的分类器等一系列工作以完成识别。

将训练集和测试集图片分别放入 set-trainset-test 后运行 train.py 进行训练,用一块TITAN RTX训练需要几分钟的时间。

用大约一万张图片训练好的 checkpoints/model_29.pth 能达到 98.4% 的整体精确度。

predict_bootstrap.py 在elective系统上测试当前模型,将检验正确的带标签图片放入 bootstrap_img_succ 目录,错误的图片放入 bootstrap_img_fail 目录。

上下游相关工具

  • crawl/: 验证码众包标注平台,可以从elective爬取验证码、辅助多名用户同时标注、检验正确性后将正确的数据放入 img_correct 目录。检验错误的验证码将被抛弃,这是初期的一个设计失误,这样将使得数据集的分布与真实分布有偏差。
  • retag/: 手工标注模型识别错误数据的工具。从 bootstrap_img_fail 读取标注错误图片,人工输入正确标注后移动到 bootstrap_img_fail_tagged
  • serve/: 提供在线验证码识别服务的 HTTP RPC 服务器。POST /fire 并传入base64编码的验证码GIF来进行识别。

数据处理过程

首先,我们设立了众包标注平台,多名志愿者累计标注了超过五千张验证码。

有了这些数据后,我们利用OpenCV进行了简单的图片增强、二值化、分字、裁切,然后随手糊了一个简单的CNN网络来识别。在随意调参之后,模型的整体(四个字)准确率接近95%。

然后,我们利用此模型来对数据集进行自举:爬取验证码后调用模型识别然后检验正确性,其中识别错误的部分手工标注。这样我们可以轻易地扩大数据集的规模,从而提升模型效果。

经过了更多的随意调参,模型的整体准确率可以达到98.4%。因为继续提升准确率意义不大,就没有继续优化。考虑到 PyTorch 安装比较麻烦,模型不易于部署到用户的设备上,我们实现了一个 HTTP API 可以用于云端识别。

相关工作

by Elector Quartet (按字典序的倒序 @xmcp, @SpiritedAwayCN, @Rabbit, @gzz)

You might also like...
Owner
xmcp
叶氏筛法第 NaN 代传人
xmcp
TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations Requirements python 3.6 torch 1.9 numpy 1.19 Quick Start The experimen

DMIRLAB 4 Oct 16, 2022
Dist2Dec: A Simplicial Neural Network for Homology Localization

Dist2Dec: A Simplicial Neural Network for Homology Localization

Alexandros Keros 6 Jun 12, 2022
The codes and models in 'Gaze Estimation using Transformer'.

GazeTR We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer". We recommend you to use data processing codes provided in GazeHub.

65 Dec 27, 2022
[CVPR'21] DeepSurfels: Learning Online Appearance Fusion

DeepSurfels: Learning Online Appearance Fusion Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission DeepSurfel

Online Reconstruction 52 Nov 14, 2022
CARL provides highly configurable contextual extensions to several well-known RL environments.

CARL (context adaptive RL) provides highly configurable contextual extensions to several well-known RL environments.

AutoML-Freiburg-Hannover 51 Dec 28, 2022
Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Preference-Planning-Deep-IRL Introduction Check my portfolio post Dependencies Gym stable-baselines3 PyTorch Usage Take Demonstration python3 record.

Tianyu Li 9 Oct 26, 2022
Matplotlib Image labeller for classifying images

mpl-image-labeller Use Matplotlib to label images for classification. Works anywhere Matplotlib does - from the notebook to a standalone gui! For more

Ian Hunt-Isaak 5 Sep 24, 2022
Pytorch implementation of YOLOX、PPYOLO、PPYOLOv2、FCOS an so on.

简体中文 | English miemiedetection 概述 miemiedetection是女装大佬咩酱基于YOLOX进行二次开发的个人检测库(使用的深度学习框架为pytorch),支持Windows、Linux系统,以女装大佬咩酱的名字命名。miemiedetection是一个不需要安装的

248 Jan 02, 2023
nanodet_plus,yolov5_v6.0

OAK_Detection OAK设备上适配nanodet_plus,yolov5_v6.0 Environment pytorch = 1.7.0

炼丹去了 1 Feb 18, 2022
PyTorch implementation of adversarial patch

adversarial-patch PyTorch implementation of adversarial patch This is an implementation of the Adversarial Patch paper. Not official and likely to hav

Jamie Hayes 172 Nov 29, 2022
PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

samplernn-pytorch A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. It's based on the reference implem

DeepSound 261 Dec 14, 2022
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.

Poisson Image Editing - A Parallel Implementation Jiayi Weng (jiayiwen), Zixu Chen (zixuc) Poisson Image Editing is a technique that can fuse two imag

Jiayi Weng 110 Dec 27, 2022
A bunch of random PyTorch models using PyTorch's C++ frontend

PyTorch Deep Learning Models using the C++ frontend Gettting started Clone the repo 1. https://github.com/mrdvince/pytorchcpp 2. cd fashionmnist or

Vince 0 Jul 13, 2021
ESP32 python application to read data from a Tilt™ Hydrometer for homebrewing

TitlESP32 ESP32 MicroPython application to read and log data from a Tilt™ Hydrometer. Requirements A board with an ESP32 chip USB cable - USB A / micr

IoBeer 5 Dec 01, 2022
Code for our paper Aspect Sentiment Quad Prediction as Paraphrase Generation in EMNLP 2021.

Aspect Sentiment Quad Prediction (ASQP) This repo contains the annotated data and code for our paper Aspect Sentiment Quad Prediction as Paraphrase Ge

Isaac 39 Dec 11, 2022
Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

DisUnknown: Distilling Unknown Factors for Disentanglement Learning See introduction on our project page Requirements PyTorch = 1.8.0 torch.linalg.ei

Sitao Xiang 24 May 16, 2022
Code accompanying paper: Meta-Learning to Improve Pre-Training

Meta-Learning to Improve Pre-Training This folder contains code to run experiments in the paper Meta-Learning to Improve Pre-Training, NeurIPS 2021. P

28 Dec 31, 2022
Code for weakly supervised segmentation of a single class

SingleClassRL Implementation of weak single object segmentation from paper "Regularized Loss for Weakly Supervised Single Class Semantic Segmentation"

16 Nov 14, 2022
Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022) https://arxiv.org/abs/2203.08483 Unpaired image-to-image (I2I

Xueqi Hu 50 Dec 16, 2022
code for generating data set ES-ImageNet with corresponding training code

es-imagenet-master code for generating data set ES-ImageNet with corresponding training code dataset generator some codes of ODG algorithm The variabl

Ordinarabbit 18 Dec 25, 2022