The following links briefly explain the idea of semantic search and how search systems work using the retrieve and rerank approach.

Last update: Jan 28, 2022

Related tags

Text Data & NLP information_retrieval

Overview

Main Idea

The following links briefly explain the idea of semantic search and how search systems work using the retrieve and rerank approach.

Setup

Download trained models

Two models have been trained for Spanish: a bi-encoder and a cross-encoder. Together they form the retrieval system, following the retrieve and rerank idea:

make setup
pip install -r requirements.txt

Basic usage

Set up the Elasticsearch index with semantic vectors. This step assumes that a set of JSON files is in a folder. Each JSON file can contain several optional fields but must contain the id and text fields.

from information_retrieval import SemanticEmbedder, CrossEncoder, Prepare, Search

data_folder = 'data/'
text_field = "texto_parrafo"
id_field = "id_parrafo"
elastic_index_name = "sentencias_2.0"

# Read the files, compute embeddings and upload them to Elasticsearch
P = Prepare(data_folder, text_field, id_field, elastic_index_name)
P.prepare()
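
Before running the preparation step, it can help to check that every file in the folder has the two required fields. The following is a minimal sketch of such a check, not part of the library: `load_paragraphs` is a hypothetical helper, and it assumes one JSON object per file, which the description above does not confirm.

```python
import json
from pathlib import Path


def load_paragraphs(data_folder, text_field, id_field):
    """Read every *.json file in data_folder and return (id, text) pairs.

    Hypothetical helper. Assumes one JSON object per file; a file missing
    either required field raises a ValueError naming the missing fields.
    """
    pairs = []
    for path in sorted(Path(data_folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Optional fields are ignored; only the two required ones are checked.
        missing = [f for f in (id_field, text_field) if f not in record]
        if missing:
            raise ValueError(f"{path.name} is missing fields: {missing}")
        pairs.append((record[id_field], record[text_field]))
    return pairs
```

Calling `load_paragraphs(data_folder, text_field, id_field)` with the values defined above would return the pairs in file-name order, or fail early on the first malformed file.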

Make queries to retrieve documents:

from information_retrieval import SearchEngine

query = "la vida es bella"
S = SearchEngine(elastic_index_name)
S.retrieve(query) # Only semantic search

S.rerank(query) # Retrieve and rerank
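
The difference between the two calls above can be illustrated with a toy, dependency-free sketch of retrieve and rerank. Everything here is an assumption made for illustration: `embed` is a bag-of-words stand-in for the bi-encoder, `pair_score` is a bigram-overlap stand-in for the cross-encoder, and the in-memory list stands in for the Elasticsearch index. It does not reflect how `SearchEngine` is implemented.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'bi-encoder': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pair_score(query, doc):
    """Toy 'cross-encoder': scores the query and document together.

    Returns the fraction of query bigrams that also occur in the document.
    """
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams = list(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    if not q_bigrams:
        return 0.0
    return sum(b in d_bigrams for b in q_bigrams) / len(q_bigrams)


def retrieve(query, docs, doc_vectors, top_k):
    """Stage 1: rank all documents using vectors computed ahead of time."""
    q_vec = embed(query)
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vectors[i]), reverse=True
    )
    return ranked[:top_k]


def rerank(query, docs, doc_vectors, top_k):
    """Stage 2: rescore only the retrieved candidates with the pairwise scorer."""
    candidates = retrieve(query, docs, doc_vectors, top_k)
    return sorted(candidates, key=lambda i: pair_score(query, docs[i]), reverse=True)


docs = [
    "bella es la vida",
    "la vida es bella y corta",
    "el perro come",
]
doc_vectors = [embed(d) for d in docs]  # computed once, at indexing time
```

With these three toy documents and the query "la vida es bella", the first stage ranks the word-order-insensitive match first, while the rerank stage promotes the document that preserves the query's word order. The design point is that the cheap vector comparison runs over the whole collection, and the more expensive pairwise scorer only sees the top candidates.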

Owner

Sergio Arnaud Gomez
