Topic Inference with Zeroshot models

Overview

zeroshot_topics

Table of Contents

Installation

zeroshot_topics is distributed on PyPI as a universal wheel and is available on Linux/macOS and Windows and supports Python 3.7+ and PyPy.

$ pip install zeroshot_topics

Usage

from zeroshot_topics import ZeroShotTopicFinder
zsmodel = ZeroShotTopicFinder()
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""
zsmodel.find_topic(text)

License

zeroshot_topics is distributed under the terms of

You might also like...
This repo stores the codes for topic modeling on palliative care journals.

This repo stores the codes for topic modeling on palliative care journals. Data Preparation You first need to download the journal papers. bash 1_down

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

Biterm Topic Model (BTM): modeling topics in short texts
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transformer, etc. It is therefore best useful for Machine Translation, Text Generation, Dialog, Language Modelling, and other related tasks using these models.

Spert NLP Relation Extraction API deployed with torchserve for inference

SpERT torchserve Spert_torchserve is the Relation Extraction model (SpERT)Span-based Entity and Relation Transformer API deployed with pytorch/serve.

A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Comments
  • Error when I run the sample code

    Error when I run the sample code

    I get this when I try to run the sample code:

    Traceback (most recent call last): File "zerotopics.py", line 1, in from zeroshot_topics import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/init.py", line 3, in from .zeroshot_tm import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/zeroshot_tm.py", line 3, in from .utils import load_zeroshot_model File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/utils.py", line 6, in def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"): File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/functools.py", line 490, in lru_cache raise TypeError('Expected maxsize to be an integer or None') TypeError: Expected maxsize to be an integer or None

    Specifics: Python version 3.7.9

    pip freeze gives (yeh this virtualenv is getting big :):

    absl-py==1.0.0 aiohttp==3.8.1 aiosignal==1.2.0 alabaster==0.7.12 aniso8601==9.0.1 antlr4-python3-runtime==4.8 appnope @ file:///opt/concourse/worker/volumes/live/4f734db2-9ca8-4d8b-5b29-6ca15b4b4772/volume/appnope_1606859466979/work async-timeout==4.0.2 asynctest==0.13.0 attrs==20.3.0 Babel==2.9.1 backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work bertopic==0.6.0 blis @ file:///opt/concourse/worker/volumes/live/cd6a6bea-d063-4b62-4c10-fcc89b17d0ac/volume/cython-blis_1594246851083/work boto3==1.17.86 botocore==1.20.86 brotlipy==0.7.0 cachetools==4.2.1 catalogue==2.0.6 certifi==2020.12.5 cffi @ file:///opt/concourse/worker/volumes/live/2aa8abfe-8b8d-4889-78d9-837b74c3cd64/volume/cffi_1606255119410/work chardet @ file:///opt/concourse/worker/volumes/live/9efbf151-b45b-463d-6340-a5c399bf00b7/volume/chardet_1607706825988/work charset-normalizer==2.0.9 click==7.1.2 colorama==0.4.4 coloredlogs==15.0.1 commonmark==0.9.1 cryptography @ file:///opt/concourse/worker/volumes/live/41c3d62a-f1f8-46ce-414a-9adaf4ea7d96/volume/cryptography_1607636752064/work cycler==0.10.0 cymem @ file:///opt/concourse/worker/volumes/live/3e8d7428-f57d-4000-44e7-34ac8a744f13/volume/cymem_1605062299053/work Cython==0.29.23 dataclasses==0.6 datasets==1.17.0 decorator @ file:///home/ktietz/src/ci/decorator_1611930055503/work dill==0.3.4 docformatter==1.4 docutils==0.15.2 emoji==1.6.1 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.2.0/en_core_web_lg-3.2.0-py3-none-any.whl en-core-web-md @ https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.2.0/en_core_web_md-3.2.0-py3-none-any.whl en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl en-core-web-trf @ https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.2.0/en_core_web_trf-3.2.0-py3-none-any.whl et-xmlfile==1.1.0 fairscale==0.4.4 Faker==8.16.0 fasttext @ file:///Users/scharlesworth/fastText-0.9.2 filelock==3.0.12 flake8==4.0.1 flake8-bugbear==21.11.29 Flask==2.0.2 Flask-Cors==3.0.10 Flask-RESTful==0.3.9 frozenlist==1.2.0 fsspec==2021.11.1 future==0.18.2 gitdb==4.0.9 gitdb2==4.0.2 GitPython==3.1.24 google-api-core==1.26.2 google-api-python-client==2.0.2 google-auth==1.28.0 google-auth-httplib2==0.1.0 google-auth-oauthlib==0.4.6 googleapis-common-protos==1.53.0 grpcio==1.43.0 hdbscan==0.8.27 httplib2==0.19.0 huggingface-hub==0.2.1 humanfriendly==10.0 hydra-core==1.1.1 idna @ file:///tmp/build/80754af9/idna_1593446292537/work imagesize==1.3.0 importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work importlib-resources==5.4.0 iniconfig==1.1.1 iopath==0.1.9 ipykernel @ file:///opt/concourse/worker/volumes/live/73e8766c-12c3-4f76-62a6-3dea9a7da5b7/volume/ipykernel_1596206701501/work/dist/ipykernel-5.3.4-py3-none-any.whl ipython @ file:///opt/concourse/worker/volumes/live/ac685347-76d6-4904-4b88-886c6a434f22/volume/ipython_1614616430264/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work itsdangerous==2.0.1 jedi @ file:///opt/concourse/worker/volumes/live/5006b7b5-a924-4788-6cfe-ae05d8be8830/volume/jedi_1606932947370/work Jinja2==3.0.1 jmespath==0.10.0 joblib==1.0.1 jsonlines==3.0.0 jsonschema==3.0.2 jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work jupyter-core @ file:///opt/concourse/worker/volumes/live/a699b83f-e941-4170-5136-bf87e3f37756/volume/jupyter_core_1612213304212/work keybert==0.5.0 kiwisolver==1.3.1 langcodes==3.3.0 llvmlite==0.36.0 loguru==0.5.3 Markdown==3.3.4 markdown-it-py==0.5.8 MarkupSafe==2.0.1 matplotlib==3.4.0 mccabe==0.6.1 mkl-fft==1.2.0 mkl-random==1.1.1 mkl-service==2.3.0 mock==4.0.3 multidict==5.2.0 multiprocess==0.70.12.2 murmurhash @ file:///opt/concourse/worker/volumes/live/9a0582f9-9097-4dab-6d7a-fcf62b4968ae/volume/murmurhash_1607456116622/work myst-parser==0.12.10 nltk==3.6.5 numba==0.53.1 numpy==1.20.2 oauthlib==3.1.1 omegaconf==2.1.1 openai==0.6.3 openpyxl==3.0.9 packaging==20.9 pandas==1.2.1 parlai==1.5.1 parquet==1.3.1 parso==0.7.0 pathy==0.6.1 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==8.2.0 plac @ file:///opt/concourse/worker/volumes/live/a94b6881-2d18-4055-5a3c-f24036f05ef6/volume/plac_1594259982880/work pluggy==1.0.0 ply==3.11 portalocker==2.3.2 praw==7.1.0 prawcore==1.5.0 preshed @ file:///opt/concourse/worker/volumes/live/952fa955-acc7-4aa0-6766-86f802ea8ef1/volume/preshed_1608233410312/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1616415428029/work protobuf==3.15.6 ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py==1.11.0 py-gfm==1.0.2 py-rouge==1.1 py4j==0.10.7 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.6.1 pycodestyle==2.8.0 pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work pydantic==1.8.2 pyee==8.2.2 pyflakes==2.4.0 Pygments @ file:///tmp/build/80754af9/pygments_1615143339740/work PyJWT==2.3.0 pynndescent==0.5.2 pyodbc==4.0.32 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work pyparsing==2.4.7 pyrsistent @ file:///opt/concourse/worker/volumes/live/656e0c1b-ef87-4251-4a51-1290b2351993/volume/pyrsistent_1600141745371/work PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work pytest==6.2.5 pytest-datadir==1.3.1 pytest-regressions==2.2.0 python-dateutil @ file:///home/ktietz/src/ci/python-dateutil_1611928101742/work python-slugify==5.0.2 pytorch-transformers==1.2.0 pytz==2020.5 PyYAML==6.0 pyzmq==20.0.0 regex==2021.11.10 requests @ file:///tmp/build/80754af9/requests_1608241421344/work requests-mock==1.9.3 requests-oauthlib==1.3.0 requests-toolbelt==0.9.1 rich==10.16.2 rsa==4.7.2 s3transfer==0.4.2 sacremoses==0.0.44 scikit-learn==0.24.1 scipy==1.6.2 seaborn==0.11.1 sentence-transformers==1.0.4 sentencepiece==0.1.91 seqeval==0.0.5 sh==1.14.2 six @ file:///opt/concourse/worker/volumes/live/f983ba11-c9fe-4dff-7ce7-d89b95b09771/volume/six_1605205318156/work sklearn==0.0 slack-bolt==1.11.1 slack-sdk==3.13.0 slackclient==2.9.3 slackeventsapi==3.0.1 smart-open==5.2.1 smmap==5.0.0 snowballstemmer==2.2.0 spacy==3.2.0 spacy-alignments==0.8.4 spacy-legacy==3.0.8 spacy-loggers==1.0.1 spacy-sentence-bert==0.1.2 spacy-transformers==1.1.2 spark-nlp==3.0.2 Sphinx==2.2.2 sphinx-autodoc-typehints==1.10.3 sphinx-rtd-theme==1.0.0 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 srsly==2.4.2 subword-nmt==0.3.8 tensorboard==2.7.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.0 tensorboardX==2.4.1 text-unidecode==1.3 thinc==8.0.13 threadpoolctl==2.1.0 thriftpy2==0.4.14 tokenizers==0.10.2 toml==0.10.2 torch==1.10.1 torchtext==0.11.1 tornado @ file:///opt/concourse/worker/volumes/live/d531d395-893c-4ca1-6a5f-717b318eb08c/volume/tornado_1606942307627/work tqdm==4.62.3 traitlets @ file:///home/ktietz/src/ci/traitlets_1611929699868/work transformers==4.11.0 typer==0.4.0 typing-extensions==3.7.4.3 umap-learn==0.5.1 Unidecode==1.3.2 untokenize==0.1.1 update-checker==0.18.0 uritemplate==3.0.1 urllib3==1.26.7 wasabi==0.8.2 wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work webexteamsbot==0.1.4.2 webexteamssdk==1.6 websocket-client==0.57.0 websocket-server==0.6.4 Werkzeug==2.0.1 xlrd==2.0.1 xxhash==2.0.2 yarl==1.7.2 zeroshot-topics==0.1.0 zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work

    opened by sdcharle 1
  • Add size to lru_cache

    Add size to lru_cache

    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/__init__.py in <module>()
          1 __version__ = '0.1.0'
          2 
    ----> 3 from .zeroshot_tm import ZeroShotTopicFinder
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/zeroshot_tm.py in <module>()
          1 import attr
          2 from keybert import KeyBERT
    ----> 3 from .utils import load_zeroshot_model
          4 from nltk.corpus import wordnet as wn
          5 
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/utils.py in <module>()
          4 
          5 @lru_cache
    ----> 6 def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
          7     classifier = pipeline("zero-shot-classification", model=model_name)
          8     return classifier
    
    /usr/lib/python3.7/functools.py in lru_cache(maxsize, typed)
        488             maxsize = 0
        489     elif maxsize is not None:
    --> 490         raise TypeError('Expected maxsize to be an integer or None')
        491 
        492     def decorating_function(user_function):
    
    TypeError: Expected maxsize to be an integer or None
    

    I assume that you have to provide, maxsize parameter to lru_cache. Worked for me, when I provided the parameter.

    opened by gsasikiran 6
Releases(v.0.0.1)
Owner
Rita Anjana
ML engineer
Rita Anjana
This repository contains the codes for LipGAN. LipGAN was published as a part of the paper titled "Towards Automatic Face-to-Face Translation".

LipGAN Generate realistic talking faces for any human speech and face identity. [Paper] | [Project Page] | [Demonstration Video] Important Update: A n

Rudrabha Mukhopadhyay 438 Dec 31, 2022
Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

TestRank in Pytorch Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks by Yu Li, Min Li, Qiuxia Lai, Ya

3 May 19, 2022
多语言降噪预训练模型MBart的中文生成任务

mbart-chinese 基于mbart-large-cc25 的中文生成任务 Input source input: text + /s + lang_code target input: lang_code + text + /s Usage token_ids_mapping.jso

11 Sep 19, 2022
Black for Python docstrings and reStructuredText (rst).

Style-Doc Style-Doc is Black for Python docstrings and reStructuredText (rst). It can be used to format docstrings (Google docstring format) in Python

Telekom Open Source Software 13 Oct 24, 2022
Help you discover excellent English projects and get rid of disturbing by other spoken language

GitHub English Top Charts 「Help you discover excellent English projects and get

GrowingGit 544 Jan 09, 2023
Repo for Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization

ESACL: Enhanced Seq2Seq Autoencoder via Contrastive Learning for AbstractiveText Summarization This repo is for our paper "Enhanced Seq2Seq Autoencode

Rachel Zheng 14 Nov 01, 2022
🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)

Pretrained BigBird Model for Korean What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation 한국어 | English What is BigBird? Bi

Jangwon Park 183 Dec 14, 2022
Official Stanford NLP Python Library for Many Human Languages

Official Stanford NLP Python Library for Many Human Languages

Stanford NLP 6.4k Jan 02, 2023
Twitter-Sentiment-Analysis - Analysis of twitter posts' positive and negative score.

Twitter-Sentiment-Analysis The hands-on project is in Python 3 Programming class offered by University of Michigan via Coursera. The task is to build

Eszter Pai 1 Jan 03, 2022
Train 🤗-transformers model with Poutyne.

poutyne-transformers Train 🤗 -transformers models with Poutyne. Installation pip install poutyne-transformers Example import torch from transformers

Lennart Keller 2 Dec 18, 2022
NLP Core Library and Model Zoo based on PaddlePaddle 2.0

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力,旨在为飞桨开发者提升文本建模效率,并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

6.9k Jan 01, 2023
A framework for cleaning Chinese dialog data

A framework for cleaning Chinese dialog data

Yida 136 Dec 20, 2022
TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech

TFPNER TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech Named entity recognition (NER), which aims at identifyin

1 Feb 07, 2022
IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Digital Phonetics at the University of Stuttgart 247 Jan 05, 2023
Conversational text Analysis using various NLP techniques

Conversational text Analysis using various NLP techniques

Rita Anjana 159 Jan 06, 2023
jiant is an NLP toolkit

🚨 Update 🚨 : As of 2021/10/17, the jiant project is no longer being actively maintained. This means there will be no plans to add new models, tasks,

ML² AT CILVR 1.5k Dec 28, 2022
A python gui program to generate reddit text to speech videos from the id of any post.

Reddit text to speech generator A python gui program to generate reddit text to speech videos from the id of any post. Current functionality Generate

Aadvik 17 Dec 19, 2022
Utilizing RBERT model for KLUE Relation Extraction task

RBERT for Relation Extraction task for KLUE Project Description Relation Extraction task is one of the task of Korean Language Understanding Evaluatio

snoop2head 14 Nov 15, 2022
Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

RARE Technologies 13.8k Jan 02, 2023
Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

Sreekanth M 1 Oct 18, 2022