An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Overview

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

CLIP4Clip is a video-text retrieval model based on CLIP (ViT-B/32). We investigate three similarity calculation approaches: parameter-free type, sequential type, and tight type, in this work. The model achieve SOTA results on MSR-VTT, MSVC, and LSMDC.

CLIP4Clip

Requirement

# From CLIP 
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas

Data Preparing

For MSRVTT

The official data and video links can be found in link.

For the convenience, you can also download the splits and captions by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

For MSVD

Raw videos can be download from link.

The splits and raw_captions can be found in the wonderful job collaborative-experts. For the convenience, you can also download them by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip

For LSMDC

You must obtain permission from MPII to download and use the data. The download link is here. The 1000 test clips data is link. Read our paper and the dataloader for more information.

How to Run

--features_path is the video root path

--linear_patch can be set with 2d or 3d

--sim_header can be set with meanP, seqLSTM, seqTransf, or tightTransf

read our paper for more details on --linear_patch and --sim_header. Test more hyperparameters for better performance.

Download CLIP (ViT-B/32) weight,

 wget -P ./modules https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

Then, run

MSRVTT

DATA_PATH=[Your MSRVTT data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=0 \
--epochs=5 --batch_size=128 --n_display=50 \
--train_csv ${DATA_PATH}/MSRVTT_train.9k.csv \
--val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv \
--data_path ${DATA_PATH}/MSRVTT_data.json \
--features_path ${DATA_PATH}/MSRVTT_Videos \
--output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msrvtt --expand_msrvtt_sentences  \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

MSVD

DATA_PATH=[Your MSVD data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/MSVD_Videos \
--output_dir ckpts/ckpt_msvd_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msvd \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

LSMDC

DATA_PATH=[Your LSMDC data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/LSMDC_Videos \
--output_dir ckpts/ckpt_lsmdc_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype lsmdc --feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

Citation

If you find CLIP4Clip useful in your work, you can cite the following paper:

@Article{Luo2021CLIP4Clip,
  author  = {Huaishao Luo and Lei Ji and Ming Zhong and Yang Chen and Wen Lei and Nan Duan and Tianrui Li},
  title   = {CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval},
  journal = {arXiv preprint arXiv:2104.08860},
  year    = {2021},
}

Acknowledgments

Our code is based on CLIP (ViT-B/32) and UniVL.

Comments
  • Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    @ArrowLuo Hi, I directly train the CLIP4clip(meanP) on ActivityNet and get [email protected]=37.9 which is much worse than 40.5 reported in Table 4.

    I extracted images from the original videos with FPS=1, and trained the CLIP4clip(meanP) on 8 RTX3090. Due to the GPU memory constrain, I set the gradient_accumulation_steps=2.

    The caption is downloaded from https://cs.stanford.edu/people/ranjaykrishna/densevid/.

    opened by jianghaojun 14
  • How to download

    How to download "cross_pytorch_model.bin" as pretrained weights used in the project? & problem in DDP

    1. 我按照readme提供的参数,无法收敛,观察到日志中:
    2. 05/20/2022 00:18:31 - INFO - Weight doesn't exsits. xxx/modules/cross-base/cross_pytorch_model.bin
    3. 05/20/2022 00:18:42 - INFO - Weights from pretrained model not used in : clip.input_resolution clip.context_length clip.vocab_size 以上,请问如何下载modules/cross-base/cross_pytorch_model.bin文件?谢谢

    同时,我在2机8卡上进行分布式训练,按照正常的理解是每台机器8个进程,共16个(和GPU数一致),但跑这个工程的时候,每台机器先是启动了8个进程,接着又各派生了8个子进程,每台机器上存在16个进程,并且其中8个进程休眠,这个问题一直没有解决,想交流一下其他人是否遇到了这个问题?

    opened by AAUfoa 10
  • Some questions about the results of the MARVTT with `sim_header seqTransf`.

    Some questions about the results of the MARVTT with `sim_header seqTransf`.

    When I use the following configuration to train the model on MSRVTT Training-9K, the best result I got is 07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000 07/27/2021 13:11:01 - INFO - Length-T: 1000, Length-V:1000 07/27/2021 13:11:01 - INFO - Text-to-Video: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.2 - [email protected]: 71.0 - [email protected]: 79.4 - Median R: 2.0 - Mean R: 15.4 07/27/2021 13:11:01 - INFO - Video-to-Text: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.1 - V2T$R@5: 71.2 - [email protected]: 80.7 - V2T$Median R: 2.0 - V2T$Mean R: 11.9. It's worse than the results [email protected]: 44.5 listed in the paper. Did i miss some details? Here is the configuration. CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_addr=127.0.0.2 --master_port 29552 main_ta sk_retrieval.py --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 --train_csv /home/hadoop-vacv/cephfs/data/caoshuqia ng/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_train.9k.csv --val_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msr vtt_data/MSRVTT_JSFUSION_test.csv --data_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_data .json --features_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/MSRVTT_Videos --output_dir /home/hadoop-vacv/cephfs /data/caoshuqiang/code/vicab/newexp/hope/clip_raw --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 12 --datatype msrvtt -- expand_msrvtt_sentences --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 --loose_type --linear_patch 2d --sim_header seqTransf --do_train.

    opened by sqiangcao99 10
  • subprocess.CalledProcessError

    subprocess.CalledProcessError

    Hi, I wanna train CLIP4Clip on MSRVTT. But I got this issue. Can you Help me?

    subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    I will appreciate your help with this situation. Thank you in advance.

    opened by alvinlin1271320 9
  • The reproduction results of `meanP` and `seqTransf` on  `DiDeMo` dataset are much worse than those in the paper

    The reproduction results of `meanP` and `seqTransf` on `DiDeMo` dataset are much worse than those in the paper

    I trained clip4clip on didemo dataset, and the [email protected] of text-to-video is much worse than that shown in paper.

    The metric reported in paper is 43.4 on DiDeMo when similarity calculator is meanP and is 42.8 when the head is seqTransf.

    But according to my reproduction result, based on meanP, max t2v [email protected] is only up to 40.5, and it only up to 40.2 based on seqTransf.

    All settings remain the same as in the paper.

    image
    opened by HanielF 8
  • Query search

    Query search

    As I understood correctly, after training and evaluation, videos are set on the basis of ranks and similarity scores. Is there any script available to make a search query as text and get the video with some freedom of search based on confidence or similarity index with some closest ranked video ?

    opened by Tortoise17 7
  • What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    Can you explain the number of thread reader in the training configuration? I can adjust this value to decrease my training time? (Why --num_thread_reader=0 in MSR-VTT while --num_thread_reader=2 in other dataset.) Thank you so much!

    opened by thinh276 6
  • loss becomes nan

    loss becomes nan

    Hi, I ran the code on MSRVTT dataset with 2 A100s, and its loss becomes nan after some iterations, like this issue. However, I found that the RawVideoExtractorCV2 function succeeded in reading the video when testing only one video input (Directly test the video_to_tensor function in RawVideoExtractorCV2 ), but failed to read the video with multiple num_workers when running the given scripts (the print log is printed in line 63 by myself, but no thing will be printed in line 211 ). Is there something wrong with the multiple threads setting?

    opened by fake-warrior8 5
  • Data split issue

    Data split issue

    Hi @ArrowLuo, I have a question about the data split during the fine-tuning stage. The paper claims that you refer to the data split of the ECCV'20 paper "Multi-modal transformer for video retrieval". In the ECCV'20 paper's code, they sample a subset from the training data to cross-validate and select model. And in your implementation, you directly validate on the 1k testing set. Is it ok to use the testing set to select the best model?

    opened by zhixinma 5
  • torch.distributed.init_process_group(backend=

    torch.distributed.init_process_group(backend="nccl") error and some other errors

    Dear author,

    Thank you for helping me about 4 months before.

    I'd tried to leave you recomment to your helping words, but my issue was already canceled. I really really feel sorry for that.

    Now, I have another issues on my running process with MSVD datasets.

    I successed to run your framework several time, but I got another problem on my environment these days.

    If you don't mind, Can I burrow your hand one more time?

    the following is my error log, and I can manage the GPU sever of two.

    each separation means that the error log raised on each server.

    If you need another request required for, leave me your comments.

    Sincerely,

    Server 16

    (CLIP4Clip) [email protected]:~/CLIP4Clip$ main_task_retrieval.py DATA_PATH=/home/key2317/CLIP4Clip/msvd_data/ VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4
    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32main_task_retrieval.py: command not found (CLIP4Clip) [email protected]:~/CLIP4Clip$ CUDA_VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32 Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 185, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 111, in _get_module_details import(pkg_name) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 189, in _load_global_deps() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 142, in _load_global_deps ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/ctypes/init.py", line 373, in init self._handle = _dlopen(self._name, mode) OSError: /home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11 (CLIP4Clip) [email protected]:~/CLIP4Clip$

    Server 19

    (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32

    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

    Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 436, in init_process_group store, rank, world_size = next(rendezvous_iterator) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout) RuntimeError: Address already in use Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in main() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main raise subprocess.CalledProcessError(returncode=process.returncode, subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=2', '--epochs=5', '--batch_size=128', '--n_display=50', '--data_path', '--features_path', '/MSVD_Videos', '--output_dir', 'ckpts/ckpt_msvd_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msvd', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1. (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

    opened by celestialxevermore 5
  • Some errors : subprocess.CalledProcessError

    Some errors : subprocess.CalledProcessError

    Dear Author, really thanks you all for opening this open source.

    I'm a novice and do not have much techniques in dealing with and running all kinds of framework, so today, I've been suffered from error :

    subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_train.9k.csv', '--val_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_JSFUSION_test.csv', '--data_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_data.json', '--features_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    From entering my starting command, It seemed to run well for about 5 seconds showing inner state like vison_layers: 12, vision_width : 768, blah blah blah. But unfortunately, It was all ended up with aforementioned messages.

    One of my colleague guessed that It would be problem in our unmatching issue in GPU environments, in short, GPU problems, but I'm not sure how can I diagnose my problem.

    I am really interested in your papers, also codes, but, because of this initial step, I cannot go for next. Would you please help me?

    Thanks.

    opened by celestialxevermore 5
  • About mean_pooling on text sequence

    About mean_pooling on text sequence

    Dear author, I hope you enjoy your new year.

    I have a quetion about, the reason why you do not use the function, _mean_pooling_for_similarity_sequence.

    Is there any special reason for that?

    I also looked out your previous model, UniVL, but I cannot find any reason about that.

    I hope you reply soon.

    Thx.

    Sincerly,

    opened by celestialxevermore 1
  • run simple inference

    run simple inference

    1.Is there a simple example to run an inference Eg: python infer.py --model --input 'red car', and return the video(s) or url(s) or something like that. 2. Is it possible to get the embeddings?

    opened by jdso1988 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • train on DiDeMo

    train on DiDeMo

    Thanks for sharing your wonderful work!

    When I train on DiDeMo dataset, the following information appears on the terminal more than once. 139ae0410327e9b89ba27ca5d1d6cfe

    But the model training was not interrupted. The model could be trained and tested normally. I ran according to the training parameters on DiDeMo dataset you provided, but changed the batch size. I want to know whether you have encountered the same situation, and whether it will affect the model results? I searched the Internet for solutions, but I could hardly find them.

    opened by qjyyyy 1
  • MSVD Weights

    MSVD Weights

    Hello, thanks for providing the code!

    In regards to the models used for generating the results in the paper, is this uploaded anywhere that can be shared? I would be interested in the inference without running the training from scratch.

    opened by ntseng450 1
Owner
ArrowLuo
Each place has water. The problem is that the depth we dig is not enough. So be more patient, be more try.
ArrowLuo
基于Transformer的单模型、多尺度的VAE模型

UniVAE 基于Transformer的单模型、多尺度的VAE模型 介绍 https://kexue.fm/archives/8475 依赖 需要大于0.10.6版本的bert4keras(当前还没有推到pypi上,可以直接从GitHub上clone最新版)。 引用 @misc{univae,

苏剑林(Jianlin Su) 49 Aug 24, 2022
A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

Reddit text to speech generator A basic reddit tts video generator Current functionality Generate videos for subs based on comments,(askreddit) so rea

Aadvik 17 Dec 19, 2022
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 9.7k Jan 09, 2023
Finally, some decent sample sentences

tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se

hecko 19 Dec 13, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Fake Shakespearean Text Generator

Fake Shakespearean Text Generator This project contains an impelementation of stateful Char-RNN model to generate fake shakespearean texts. Files and

Recep YILDIRIM 1 Feb 15, 2022
The simple project to separate mixed voice (2 clean voices) to 2 separate voices.

Speech Separation The simple project to separate mixed voice (2 clean voices) to 2 separate voices. Result Example (Clisk to hear the voices): mix ||

vuthede 31 Oct 30, 2022
An implementation of the Pay Attention when Required transformer

Pay Attention when Required (PAR) Transformer-XL An implementation of the Pay Attention when Required transformer from the paper: https://arxiv.org/pd

7 Aug 11, 2022
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

Jacopo Tagliabue 90 Dec 19, 2022
Material for GW4SHM workshop, 16/03/2022.

GW4SHM Workshop Wednesday, 16th March 2022 (13:00 – 15:15 GMT): Presented by: Dr. Rhodri Nelson, Imperial College London Project website: https://www.

Devito Codes 1 Mar 16, 2022
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Bethge Lab 61 Dec 21, 2022
NLP Text Classification

多标签文本分类任务 近年来随着深度学习的发展,模型参数的数量飞速增长。为了训练这些参数,需要更大的数据集来避免过拟合。然而,对于大部分NLP任务来说,构建大规模的标注数据集非常困难(成本过高),特别是对于句法和语义相关的任务。相比之下,大规模的未标注语料库的构建则相对容易。为了利用这些数据,我们可以

Jason 1 Nov 11, 2021
Text editor on python to convert english text to malayalam(Romanization/Transiteration).

Manglish Text Editor This is a simple transiteration (romanization ) program which is used to convert manglish to malayalam (converts njaan to ഞാൻ ).

Merin Rose Tom 1 May 11, 2022
ElasticBERT: A pre-trained model with multi-exit transformer architecture.

This repository contains finetuning code and checkpoints for ElasticBERT. Towards Efficient NLP: A Standard Evaluation and A Strong Baseli

fastNLP 48 Dec 14, 2022
Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models

Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model.

Prithivida 681 Jan 01, 2023
Conversational text Analysis using various NLP techniques

Conversational text Analysis using various NLP techniques

Rita Anjana 159 Jan 06, 2023
nlp基础任务

NLP算法 说明 此算法仓库包括文本分类、序列标注、关系抽取、文本匹配、文本相似度匹配这五个主流NLP任务,涉及到22个相关的模型算法。 框架结构 文件结构 all_models ├── Base_line │   ├── __init__.py │   ├── base_data_process.

zuxinqi 23 Sep 22, 2022
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

CrossNER is a fully-labeled collected of named entity recognition (NER) data spanning over five diverse domains (Politics, Natural Science, Music, Literature, and Artificial Intelligence) with specia

Zihan Liu 89 Nov 10, 2022
Py65 65816 - Add support for the 65C816 to py65

Add support for the 65C816 to py65 Py65 (https://github.com/mnaberez/py65) is a

4 Jan 04, 2023
Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

Serbian AI Society 3 Nov 22, 2022