Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Overview

FeatureHasher

Note: this requires Jina>=2.2.4.
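To make the idea concrete, here is a minimal sketch of the hashing trick in plain Python. It is not taken from the FeatureHasher source: the function name `hash_features`, the whitespace tokenization, and the use of MD5 as the hash are all assumptions made for illustration. Each token is hashed to a bucket index in a vector of fixed size, and a second hash bit decides whether the bucket is incremented or decremented, so the output dimension never depends on the vocabulary size.

```python
import hashlib
from typing import Iterable, List


def hash_features(tokens: Iterable[str], n_dim: int = 128) -> List[float]:
    """Map a bag of string tokens to a fixed-length vector (hypothetical helper)."""
    vec = [0.0] * n_dim
    for token in tokens:
        # Use a stable hash; the built-in hash() is salted per process for str.
        digest = hashlib.md5(token.encode('utf-8')).digest()
        # The first 8 bytes select the bucket the token lands in.
        index = int.from_bytes(digest[:8], 'big') % n_dim
        # A separate byte selects the sign, which reduces bias from collisions.
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign
    return vec


# Usage: any text, however long, becomes a 128-dim vector.
sentence = 'it is a truth universally acknowledged'
embedding = hash_features(sentence.split(), n_dim=128)
print(len(embedding))
```

Two different tokens can land in the same bucket; a larger `n_dim` makes such collisions less likely at the cost of a wider vector.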

Example

Here I use FeatureHasher to hash each sentence (more precisely, each non-empty line) of Pride and Prejudice into a 128-dim vector, and then use .match to find the top-K most similar sentences.

from jina import Document, DocumentArray, Flow

# load the book text from Project Gutenberg
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').convert_uri_to_text()

# split into non-empty lines and store them in a DocumentArray
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# use FeatureHasher in a Flow
f = Flow().add(uses='jinahub://FeatureHasher')

embed_da = DocumentArray()
with f:
    f.post('/', da, on_done=lambda req: embed_da.extend(req.docs), show_progress=True)

print('self-matching...')
embed_da.match(embed_da, exclude_self=True, limit=5, normalization=(1, 0))
print('total sentences: ', len(embed_da))
for d in embed_da:
    print(d.text)
    for m in d.matches:
        print(m.scores['cosine'], m.text)
    input()
           [email protected][I]:🎉 Flow is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:52628
	🔒 Private network:	192.168.178.31:52628
	🌐 Public address:	217.70.138.123:52628
⠹       DONE ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01 100% ETA: 0 seconds 40 steps done in 1 second
total sentences:  12153
The Project Gutenberg eBook of Pride and Prejudice, by Jane Austen
*** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***
*** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***
production, promotion and distribution of Project Gutenberg-tm
Pride and Prejudice
By Jane Austen

This eBook is for the use of anyone anywhere in the United States and
This eBook is for the use of anyone anywhere in the United States and
by the awkwardness of the application, and at length wholly
Elizabeth passed the chief of the night in her sister’s room, and
the happiest memories in the world. Nothing of the past was
charities and charitable donations in all 50 states of the United

In practice, you can implement matching and storing via an indexer inside the Flow. This example is for demonstration only, so all operations unrelated to feature hashing are implemented outside the Flow to avoid distraction.
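For intuition about what the self-matching step computes, the following is a rough brute-force sketch in plain Python, assuming the comparison is by cosine distance. The helper names `cosine_distance` and `top_k_matches` are hypothetical, the handling of all-zero vectors is an assumption, and the score rescaling that the `normalization` argument applies in the example is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k_matches(vectors: List[Sequence[float]], limit: int = 5) -> List[List[Tuple[float, int]]]:
    """For every vector, return up to `limit` (distance, index) pairs of its nearest neighbours, excluding itself."""
    results = []
    for i, query in enumerate(vectors):
        scored = [
            (cosine_distance(query, other), j)
            for j, other in enumerate(vectors)
            if j != i  # skip the query itself, as exclude_self=True does
        ]
        scored.sort()  # smallest distance first
        results.append(scored[:limit])
    return results


# Usage on three toy 2-dim embeddings.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for idx, matches in enumerate(top_k_matches(toy, limit=2)):
    print(idx, [j for _, j in matches])
```

This compares every pair of vectors, which is fine for a demo of about twelve thousand lines but is the part an indexer would take over in a real deployment.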

Owner
Jina AI
A Neural Search Company. We provide the cloud-native neural search solution powered by state-of-the-art AI technology.