🤗 Push your spaCy pipelines to the Hugging Face Hub

Last update: Oct 09, 2022

Overview

spacy-huggingface-hub: Push your spaCy pipelines to the Hugging Face Hub

This package provides a CLI command for uploading any trained spaCy pipeline packaged with `spacy package` to the Hugging Face Hub. It auto-generates all meta information for you, uploads a pretty README (requires spaCy v3.1+) and handles version control under the hood.

🤗 About the Hugging Face Hub

The Hugging Face Hub hosts Git-based repositories, which are storage spaces that can contain all your files. These repositories have multiple advantages: versioning (commit history and diffs), branches, useful metadata about their tasks, languages, metrics and more, browser-based visualizers for exploring the models interactively, and an API to use the models in production.

🚀 Quickstart

You can install spacy-huggingface-hub from pip:

pip install spacy-huggingface-hub

To check if the command has been registered successfully:

python -m spacy huggingface-hub --help

Hugging Face uses Git Large File Storage (LFS) to handle files larger than 10 MB. Installation instructions are available on the Git LFS website.

You can then upload any pipeline packaged with `spacy package`. Make sure to set `--build wheel` to output a binary `.whl` file. The uploader will read all metadata from the pipeline package, including the auto-generated pretty `README.md` and the model details available in the `meta.json`.

huggingface-cli login
python -m spacy package ./en_ner_fashion ./output --build wheel
cd ./output/en_ner_fashion-0.0.0/dist
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl

The command will output two things:

Where to find your repo in the Hub! For example, https://huggingface.co/spacy/en_core_web_sm
And how to install the pipeline directly from the Hub!

pip install https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl
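The install URL in this example follows a visible pattern: the repository URL, then `/resolve/main/`, then the wheel file name. Below is a minimal sketch of that pattern. `hub_wheel_url` is a hypothetical helper that is not part of this package, and the `any` version segment is assumed from the example file name rather than from documented behaviour.

```python
def hub_wheel_url(repo_id: str, package_name: str) -> str:
    """Build a pip-installable wheel URL for a pipeline hosted on the Hub.

    Hypothetical helper: the URL pattern is inferred from the example above.
    """
    # Assumption: wheels on the Hub use "any" in place of the version.
    wheel_name = f"{package_name}-any-py3-none-any.whl"
    return f"https://huggingface.co/{repo_id}/resolve/main/{wheel_name}"


print("pip install " + hub_wheel_url("spacy/en_core_web_sm", "en_core_web_sm"))
```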

Now you can share your pipelines very quickly with others. You can also test your pipeline directly in the browser!

⚙️ Usage and API

If spaCy is already installed in the same environment, this package automatically adds the spacy huggingface-hub commands to the CLI. If you don't have spaCy installed, you can also execute the CLI directly via the package.

`push`

python -m spacy huggingface-hub push [whl_path] [--org] [--msg] [--local-repo] [--verbose]

python -m spacy_huggingface_hub push [whl_path] [--org] [--msg] [--local-repo] [--verbose]

| Argument | Type | Description |
| --- | --- | --- |
| `whl_path` | str / `Path` | The path to the `.whl` file packaged with `spacy package`. |
| `--org`, `-o` | str | Optional name of organization to which the pipeline should be uploaded. |
| `--msg`, `-m` | str | Commit message to use for update. Defaults to `"Update spaCy pipeline"`. |
| `--local-repo`, `-l` | str / `Path` | Local path to the model repository (will be created if it doesn't exist). Defaults to `hub` in the current working directory. |
| `--verbose`, `-V` | bool | Output additional info for debugging, e.g. the full generated hub metadata. |
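If you want to script the upload, the arguments listed above can be assembled into a command list and passed to `subprocess.run`. This is only a sketch: `build_push_command` is a hypothetical helper, `my-org` is a placeholder organization name, and only flags from the table above are used.

```python
import sys
from typing import List, Optional


def build_push_command(
    whl_path: str,
    org: Optional[str] = None,
    msg: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble the argument list for `spacy huggingface-hub push`.

    Hypothetical helper: it only uses flags listed in the argument table.
    """
    cmd = [sys.executable, "-m", "spacy", "huggingface-hub", "push", whl_path]
    if org:
        cmd += ["--org", org]
    if msg:
        cmd += ["--msg", msg]
    if verbose:
        cmd.append("--verbose")
    return cmd


# "my-org" is a placeholder organization name.
cmd = build_push_command("en_ner_fashion-0.0.0-py3-none-any.whl", org="my-org", verbose=True)
print(" ".join(cmd[1:]))
# To actually run the upload: subprocess.run(cmd, check=True)
```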

Usage from Python

Instead of using the CLI, you can also call the push function from Python. It returns a dictionary containing the "url" of the published model and the "whl_url" of the wheel file, which you can install with pip install.

from spacy_huggingface_hub import push

result = push("./en_ner_fashion-0.0.0-py3-none-any.whl")
print(result["url"])
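The returned dictionary can be turned into an install command for collaborators. The sketch below does not call `push` itself; it uses a stand-in dictionary with the two keys described above, filled with the URLs from the quickstart example. `install_command` is a hypothetical helper.

```python
from typing import Dict


def install_command(result: Dict[str, str]) -> str:
    """Turn a dictionary shaped like the `push` result into a `pip install` command."""
    return f"pip install {result['whl_url']}"


# Stand-in for a real `push(...)` result; values are taken from the quickstart example.
example_result = {
    "url": "https://huggingface.co/spacy/en_core_web_sm",
    "whl_url": "https://huggingface.co/spacy/en_core_web_sm/resolve/main/en_core_web_sm-any-py3-none-any.whl",
}
print(install_command(example_result))
```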

Comments

HTTP Error 400 when pushing model to HuggingFace hub

Hello,

I'm not quite sure if this issue is related to #5.

When I try to push a model to a Hugging Face Hub organisation with the spaCy CLI:

python -m spacy huggingface-hub push fr_core_ner4archives_v3_default-0.0.0-py3-none-any.whl -o ner4archives -V

This raises an HTTP 400 error:

Pushing repository to the hub...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy/cli/_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
    push(whl_path, organization, commit_msg, verbose=verbose)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 130, in push
    url = upload_folder(
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 2109, in upload_folder
    pr_url = self.create_commit(
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1844, in create_commit
    _raise_for_status(commit_resp)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 84, in _raise_for_status
    _raise_with_request_id(request)
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 95, in _raise_with_request_id
    raise e
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 90, in _raise_with_request_id
    request.raise_for_status()
  File "/home/lterriel/Documents/dev/almanach-projects/N4A_project/Training_pipelines/venv/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/lterriel/fr_core_ner4archives_v3_default/commit/main (Request ID: QTOTcNAiXjAsn45fwTKLO)

However, the new model repository is created in the Hugging Face Hub organisation, but without the model files and the model card.

packages installed:

spaCy==3.3.1
spacy-huggingface-hub==0.0.7
huggingface-hub==0.8.1

opened by Lucaterre 24

Error when pushing to huggingface hub

Hello,

I have a problem using this library. When I try to run the command to upload the whl file:

python -m spacy huggingface-hub push en_acnl_electra_pipeline-0.0.1-py3-none-any.whl

The following error occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py", line 365, in git_pull
    cwd=self.local_dir,
  File "/usr/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'pull', '--rebase']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/usr/local/lib/python3.7/dist-packages/spacy/cli/_util.py", line 69, in setup_cli
    command(prog_name=COMMAND)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
    push(whl_path, organization, commit_msg, local_repo_path, verbose=verbose)
  File "/usr/local/lib/python3.7/dist-packages/spacy_huggingface_hub/push.py", line 91, in push
    repo.git_pull(rebase=True)
  File "/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py", line 368, in git_pull
    raise EnvironmentError(exc.stderr)
OSError: error: cannot pull with rebase: Your index contains uncommitted changes.
error: please commit or stash them.

I'm using:

============================== Info about spaCy ==============================

spaCy version 3.1.3
Location /usr/local/lib/python3.7/dist-packages/spacy
Platform Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python version 3.7.12
Pipelines

opened by danielvasic 2

Use new non-git based API

With this approach:

Users don't need to configure git or use git lfs.

A local copy of the repo is no longer needed, so there are no local artifacts or issues related to git conflicts.

As a result, the --local-repo flag is removed.

Some light cleanup of the codebase. In particular, there was an edge case where the version is not specified in the whl name (which is the case for the whl files on the HF Hub), and it was not handled appropriately. Now the version is retrieved from the metadata file if it is not specified in the name.
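The version fallback described in this PR can be sketched as follows. This is not the package's code: `wheel_version` is a hypothetical helper, the check for `any` is assumed from the Hub wheel name shown in the quickstart, and because the PR does not say which metadata file is read, the sketch assumes the wheel's standard `*.dist-info/METADATA`.

```python
import zipfile
from email.parser import Parser
from pathlib import Path


def wheel_version(whl_path: str) -> str:
    """Return the version from the wheel file name, or from METADATA as a fallback."""
    # Wheel names look like {name}-{version}-{python}-{abi}-{platform}.whl
    version = Path(whl_path).name.split("-")[1]
    if version != "any":
        return version
    # Assumption: "any" marks an unversioned wheel, so read the wheel metadata.
    with zipfile.ZipFile(whl_path) as whl:
        for member in whl.namelist():
            if member.endswith(".dist-info/METADATA"):
                headers = Parser().parsestr(whl.read(member).decode("utf8"))
                return headers["Version"]
    raise ValueError(f"No METADATA file found in {whl_path}")


# The version is in the file name, so the file itself is never opened here.
print(wheel_version("en_ner_fashion-0.0.0-py3-none-any.whl"))
```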

Result: https://huggingface.co/osanseviero/en_core_web_sm
opened by osanseviero 1

License problem when pushing model

Hi, I am trying to push my spaCy model to the Huggingface Hub and I get the following error:

Traceback (most recent call last):
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 412, in git_push
    result = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'push']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy/cli/_util.py", line 69, in setup_cli
    command(prog_name=COMMAND)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 53, in huggingface_hub_push_cli
    push(whl_path, organization, commit_msg, local_repo_path, verbose=verbose)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/spacy_huggingface_hub/push.py", line 131, in push
    url = repo.push_to_hub(commit_message=commit_msg)
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 434, in push_to_hub
    return self.git_push()
  File "/home/elena/Workspace/Spacy_project/.env3/lib/python3.8/site-packages/huggingface_hub/repository.py", line 422, in git_push
    raise EnvironmentError(exc.stderr)
OSError: remote: ----------------------------------------------------------        
remote: Sorry, your push was rejected:        
remote: - error : yaml metadata schema issue on key "license"        
remote: help: key: /license must be equal to one of the allowed values (error rule: enum), please find the license identifier that fits you at https://huggingface.co/docs/hub/model-repos#list-of-license-identifiers        
remote: ----------------------------------------------------------        
remote: Please find the documentation at:        
remote: https://huggingface.co/docs/hub/model-repos        
remote: ----------------------------------------------------------        
To https://huggingface.co/nucci/sv_pipeline
 ! [remote rejected] main -> main (pre-receive hook declined)
error: failed to push some refs to 'https://huggingface.co/nucci/sv_pipeline'

It seems to be a license problem, so I put the identifier corresponding to the license that I want to give my model in meta.json, but it doesn't seem to solve the problem. Any thoughts?

opened by Nuccy90 1

Fix license format

Hugging Face started to validate the license from the metadata by ensuring it's in a list of allowed ones. The expectation is that the license is in lower case, but spaCy uses MIT by default (and others are also in uppercase).
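As an illustration of the kind of normalization involved, here is a minimal sketch. `normalize_license` is a hypothetical helper, and replacing spaces with hyphens is an assumption about the shape of Hub identifiers, not something stated in this PR; the result still has to be checked against the Hub's list of allowed license identifiers.

```python
def normalize_license(license_name: str) -> str:
    """Lower-case a license name and replace spaces with hyphens.

    Assumption: this approximates the shape of Hub identifiers such as "mit".
    It does not validate the result against the Hub's allowed list.
    """
    return license_name.strip().replace(" ", "-").lower()


print(normalize_license("MIT"))
```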

opened by osanseviero 1

Fix model-index structure and misc minor fixes

This PR has multiple changes, including:

Allow pushing a whl with an unversioned name (recall that repos on the HF Hub name their whl files with "any" in place of the version, and the extraction happens differently for those).

Add the task to the metric name.

Change the model-index in the metadata to match the latest specification.

This will enable showing an "Evaluation Results" section in the model card.

FYI @julien-c
opened by osanseviero 1

bugfix/fix path for windows + add model to lfs

zip_ref.namelist() produces a different path format than Path(), which is why on Windows the base_layer and file_name couldn't be compared. Because of that, the if file_name.startswith(str(base_name)): condition was never true. The fix was to convert file_name to the same format as base_name before calling .startswith().

model was added to repo.lfs_track() because its size exceeded 10 MB, at least on Windows.
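The path mismatch described above can be reproduced on any operating system with `PureWindowsPath`. This is a sketch of the idea only, not the code from the PR, and the file names are illustrative.

```python
from pathlib import PureWindowsPath

# Names inside a zip archive use forward slashes (illustrative file name).
file_name = "en_ner_fashion-0.0.0/en_ner_fashion/meta.json"
# On Windows, str(Path(...)) uses backslashes; PureWindowsPath mimics that anywhere.
base_name = PureWindowsPath("en_ner_fashion-0.0.0") / "en_ner_fashion"

# Before the fix: "/" in file_name never matches "\" in base_name.
broken = file_name.startswith(str(base_name))
# After the fix: convert file_name to the same format as base_name first.
fixed = str(PureWindowsPath(file_name)).startswith(str(base_name))

print(broken, fixed)
```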
opened by thomashacker 0

argument for adding a model card

It would be great to be able to pre-generate, (edit), and attach a model card.

I have created a model, pushed it, edited the card, then had to push it again, and quite logically it just over-wrote the edited model card. Even without full pull functionality, it would be great to be able to attach edited cards to avoid the need to re-do this job every time manually.

opened by DSLituiev 1

Releases(v0.0.8)

v0.0.8(Dec 8, 2022)
Extend wasabi support to v1.1.

v0.0.7(Jul 20, 2022)
Update to use new huggingface_hub v0.8 API without local git repos

v0.0.6(Dec 10, 2021)
Add pos, lemma, and morph accuracies

Update metric names and types

Fix typo in reported score for LAS (LAS not UAS)

v0.0.5(Oct 12, 2021)
Adjust license format.

v0.0.4(Aug 11, 2021)
Update model-index format to match new metrics spec.

Fix upload for wheels downloaded from the hub.

v0.0.3(Jul 9, 2021)
Fix .whl file extraction on Windows.

Make sure to track binary model weights with LFS.

v0.0.2(Jul 7, 2021)
Improve error handling.

Fix model-index format.

v0.0.1(Jul 6, 2021)


Owner

Explosion

A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
