Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Overview

FeatureHasher

Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Note, this requires Jina>=2.2.4.

Example

Here I use FeatureHasher to hash each sentence of Pride and Prejudice into a 128-dim vector, and then use .match to find top-K similar sentences.

from jina import Document, DocumentArray, Flow

# load 
   
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').convert_uri_to_text()

# cut into non-empty sentences store in a DA
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# use FeatureHasher in a Flow
f = Flow().add(uses='jinahub://FeatureHasher')

embed_da = DocumentArray()
with f:
    f.post('/', da, on_done=lambda req: embed_da.extend(req.docs), show_progress=True)

print('self-matching...')
embed_da.match(embed_da, exclude_self=True, limit=5, normalization=(1, 0))
print('total sentences: ', len(embed_da))
for d in embed_da:
    print(d.text)
    for m in d.matches:
        print(m.scores['cosine'], m.text)
    input()
           [email protected][I]:🎉 Flow is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:52628
	🔒 Private network:	192.168.178.31:52628
	🌐 Public address:	217.70.138.123:52628
⠹       DONE ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01 100% ETA: 0 seconds 40 steps done in 1 second
total sentences:  12153
The Project Gutenberg eBook of Pride and Prejudice, by Jane Austen

   
     *** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

    
      *** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

     
       production, promotion and distribution of Project Gutenberg-tm

      
        Pride and Prejudice

       
         By Jane Austen This eBook is for the use of anyone anywhere in the United States and 
        
          This eBook is for the use of anyone anywhere in the United States and 
         
           by the awkwardness of the application, and at length wholly 
          
            Elizabeth passed the chief of the night in her sister’s room, and 
           
             the happiest memories in the world. Nothing of the past was 
            
              charities and charitable donations in all 50 states of the United 
            
           
          
         
        
       
      
     
    
   

In practice, you can implement matching and storing via an indexer inside Flow. This example is only for demo purpose so any non-feature hashing related ops are implemented outside the Flow to avoid distraction.

Owner
Jina AI
A Neural Search Company. We provide the cloud-native neural search solution powered by state-of-the-art AI technology.
Jina AI
Vulnerability Scanner & Auto Exploiter You can use this tool to check the security by finding the vulnerability in your website or you can use this tool to Get Shells

About create a target list or select one target, scans then exploits, done! Vulnnr is a Vulnerability Scanner & Auto Exploiter You can use this tool t

Nano 108 Dec 04, 2021
阿里云accesskey利用工具

aliyun-accesskey-Tools 此工具用于查询ALIYUN_ACCESSKEY的主机,并且远程执行命令。 对于ALIYUN_ACCESSKEY利用方式可参考文章:记一次阿里云主机泄露Access Key到Getshell 工具截图 安装模块 pip install -r require

一灯老和尚 826 Jan 01, 2023
Fast python tool to test apache path traversal CVE-2021-41773 in a List of url

CVE-2021-41773 Fast python tool to test apache path traversal CVE-2021-41773 in a List of url Usage :- create a live urls file and use the flag "-l" p

Zahir Tariq 12 Nov 09, 2022
Scan your logs for CVE-2021-44228 related activity and report the attackers

jndiRep - CVE-2021-44228 Basically a bad grep on even worse drugs. search for malicious strings decode payloads print results to stdout or file report

js-on 2 Nov 24, 2022
A collection of write-ups and solutions for Cyber FastTrack Spring 2021.

IMPORTANT: Please contact us before you use any styling or content shown here! Cyber FastTrack Spring 2021 / National Cyber Scholarship Competition -

Alice 48 Aug 28, 2022
A simple password generator using Python Tkinter.

Password-Generator-using-Python A simple password generator that generates password for you. User can Copy the password to Clipboard. Project made usi

Prashant Agheda 1 Nov 02, 2022
GitLab CI security tools runner

Common Security Pipeline Описание проекта: Данный проект является вариантом реализации DevSecOps практик, на базе: GitLab DefectDojo OpenSouce tools g

Сити-Мобил 14 Dec 23, 2022
xkeysnail is yet another keyboard remapping tool for X environment written in Python

xkeysnail is yet another keyboard remapping tool for X environment written in Python. It's like xmodmap but allows more flexible remappings.

Masafumi Oyamada 809 Dec 26, 2022
🏃 Python Solutions of All Problems in FHC 2021 (In Progress)

FacebookHackerCup-2021 Python solutions of Facebook Hacker Cup 2021. Solution begins with * means it will get TLE in the largest data set (total compu

kamyu 14 Oct 15, 2022
Infoga is a tool gathering email accounts informations (ip,hostname,country,...) from different public source

Infoga - Email OSINT Infoga is a tool gathering email accounts informations (ip,hostname,country,...) from different public source (search engines, pg

m4ll0k (mallok) 1.8k Jan 04, 2023
macOS Initial Access Payload Generator

Mystikal macOS Initial Access Payload Generator Related Blog Post: https://posts.specterops.io/introducing-mystikal-4fbd2f7ae520 Usage: Install Xcode

Leo Pitt 206 Dec 31, 2022
Log4Shell Proof of Concept (CVE-2021-44228)

CVE-2021-44228 Log4Shell Proof of Concept (CVE-2021-44228) Make sure to use Java 8 JDK. Java 8 Download Images Credits Casey Dunham - Java Reverse She

Kr0ff 3 Jul 23, 2022
adb - A tool that allows you to search for vulnerable android devices across the world and exploit them.

adb - An exploitation tool for android devices. A tool that allows you to search for vulnerable android devices across the world and exploit them. Fea

136 Jan 02, 2023
An advanced multi-threaded, multi-client python reverse shell for hacking linux systems

PwnLnX An advanced multi-threaded, multi-client python reverse shell for hacking linux systems. There's still more work to do so feel free to help out

0xTRAW 212 Dec 24, 2022
A small Minecraft server to help players detect vulnerability to the Log4Shell exploit 🐚

log4check A small Minecraft server to help players detect vulnerability to the Log4Shell exploit 🐚 Tested to work between Minecraft versions 1.12.2 a

Evan J. Markowitz 4 Dec 23, 2021
Facebook account cloning/hacking advanced tool + dictionary attack added | Facebook automation tool

loggef Facebook automation tool, Facebook account hacking and cloning advanced tool + dictionary attack added Warning Use this tool for educational pu

Md Josif Khan 149 Aug 10, 2022
POC using subprocess lib in Python 🐍

POC subprocess ☞ POC using the subprocess library with Python. References: https://github.com/GuillaumeFalourd/poc-subprocess https://geekflare.com/le

Guillaume Falourd 2 Nov 28, 2022
recover Firefox and more browsers logins

Browser Creds this script will recover saved browsers logins into txt files. It currently only support windows 10. currently support : Chrome Opera Fi

HugoLB 41 Nov 09, 2022
Log4j rce test environment and poc

log4jpwn log4j rce test environment See: https://www.lunasec.io/docs/blog/log4j-zero-day/ Experiments to trigger in various software products mentione

Leon Jacobs 307 Dec 24, 2022
A script based on sqlmap that uses sql injection vulnerabilities to traverse the existence of a file

A script based on sqlmap that uses sql injection vulnerabilities to traverse the existence o

2 Nov 09, 2022