Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Overview

FeatureHasher

Convert a collection of features to a fixed-dimensional matrix using the hashing trick.

Note, this requires Jina>=2.2.4.

Example

Here I use FeatureHasher to hash each sentence of Pride and Prejudice into a 128-dim vector, and then use .match to find top-K similar sentences.

from jina import Document, DocumentArray, Flow

# load 
   
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').convert_uri_to_text()

# cut into non-empty sentences store in a DA
da = DocumentArray(Document(text=s.strip()) for s in d.text.split('\n') if s.strip())

# use FeatureHasher in a Flow
f = Flow().add(uses='jinahub://FeatureHasher')

embed_da = DocumentArray()
with f:
    f.post('/', da, on_done=lambda req: embed_da.extend(req.docs), show_progress=True)

print('self-matching...')
embed_da.match(embed_da, exclude_self=True, limit=5, normalization=(1, 0))
print('total sentences: ', len(embed_da))
for d in embed_da:
    print(d.text)
    for m in d.matches:
        print(m.scores['cosine'], m.text)
    input()
           [email protected][I]:🎉 Flow is ready to use!
	🔗 Protocol: 		GRPC
	🏠 Local access:	0.0.0.0:52628
	🔒 Private network:	192.168.178.31:52628
	🌐 Public address:	217.70.138.123:52628
⠹       DONE ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:01 100% ETA: 0 seconds 40 steps done in 1 second
total sentences:  12153
The Project Gutenberg eBook of Pride and Prejudice, by Jane Austen

   
     *** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

    
      *** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

     
       production, promotion and distribution of Project Gutenberg-tm

      
        Pride and Prejudice

       
         By Jane Austen This eBook is for the use of anyone anywhere in the United States and 
        
          This eBook is for the use of anyone anywhere in the United States and 
         
           by the awkwardness of the application, and at length wholly 
          
            Elizabeth passed the chief of the night in her sister’s room, and 
           
             the happiest memories in the world. Nothing of the past was 
            
              charities and charitable donations in all 50 states of the United 
            
           
          
         
        
       
      
     
    
   

In practice, you can implement matching and storing via an indexer inside Flow. This example is only for demo purpose so any non-feature hashing related ops are implemented outside the Flow to avoid distraction.

Owner
Jina AI
A Neural Search Company. We provide the cloud-native neural search solution powered by state-of-the-art AI technology.
Jina AI
ADExplorerSnapshot.py is an AD Explorer snapshot ingestor for BloodHound.

ADExplorerSnapshot.py ADExplorerSnapshot.py is an AD Explorer snapshot ingestor for BloodHound. AD Explorer allows you to connect to a DC and browse L

576 Dec 23, 2022
Bandit is a tool designed to find common security issues in Python code.

A security linter from PyCQA Free software: Apache license Documentation: https://bandit.readthedocs.io/en/latest/ Source: https://github.com/PyCQA/ba

Python Code Quality Authority 4.8k Dec 31, 2022
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra

21 Sep 24, 2022
Simple tool to create passwords.

PasswordGenerator Simple password generator: -Simplisitc Window Application -Allows Numbers, Symbols & letters upper and lowercase -Restricts rows of

DM 1 Jan 10, 2022
自动化爆破子域名,并遍历所有端口寻找http服务,并使用crawlergo、dirsearch、xray等工具扫描并集成报告;支持动态添加扫描到的域名至任务;

AutoScanner AutoScanner是什么 AutoScanner是一款自动化扫描器,其功能主要是遍历所有子域名、及遍历主机所有端口寻找出所有http服务,并使用集成的工具进行扫描,最后集成扫描报告; 工具目前有:oneforall、masscan、nmap、crawlergo、dirse

633 Dec 30, 2022
Exploiting CVE-2021-44228 in vCenter for remote code execution and more

Log4jCenter Exploiting CVE-2021-44228 in vCenter for remote code execution and more. Blog post detailing exploitation linked below: COMING SOON Why? P

81 Dec 20, 2022
Find exposed API keys based on RegEx and get exploitation methods for some of keys that are found

dora Features Blazing fast as we are using ripgrep in backend Exploit/PoC steps for many of the API key, allowing to write a good report for bug bount

Siddharth Dushantha 243 Dec 27, 2022
SCodeScanner stands for Source Code scanner where the user can scans the source code for finding the Critical Vulnerabilities.

The SCodeScanner stands for Source Code Scanner, where you can scan your source code files like PHP and get identify the vulnerabilities inside it. The tool can use by Pentester, Developer to quickly

136 Dec 13, 2022
POC for CVE-2022-1388

CVE-2022-1388 POC for CVE-2022-1388 affecting multiple F5 products. Follow the Horizon3.ai Attack Team on Twitter for the latest security research: Ho

Horizon 3 AI Inc 231 Dec 07, 2022
Remote control your Greenbone Vulnerability Manager (GVM)

Greenbone Vulnerability Management Tools The Greenbone Vulnerability Management Tools gvm-tools are a collection of tools that help with remote contro

Greenbone 130 Dec 17, 2022
Lnkbomb - Malicious shortcut generator for collecting NTLM hashes from insecure file shares

Lnkbomb Lnkbomb is used for uploading malicious shortcut files to insecure file

Joe Helle 216 Jan 08, 2023
Simulating Log4j Remote Code Execution (RCE) vulnerability in a flask web server using python's logging library with custom formatter that simulates lookup substitution by executing remote exploit code.

py4jshell Simulating Log4j Remote Code Execution (RCE) CVE-2021-44228 vulnerability in a flask web server using python's logging library with custom f

Narasimha Prasanna HN 86 Aug 21, 2022
D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation time by modifying IDA Pro microcode.

Introduction fork from https://gitlab.com/eshard/d810 What is D-810 D-810 is an IDA Pro plugin which can be used to deobfuscate code at decompilation

Banny 30 Dec 06, 2022
Internal network honeypot for detecting if an attacker or insider threat scans your network for log4j CVE-2021-44228

log4j-honeypot-flask Internal network honeypot for detecting if an attacker or insider threat scans your network for log4j CVE-2021-44228 This can be

Binary Defense 144 Nov 19, 2022
Dark-Fb No Login 100% safe

Dark-Fb No Login 100% safe TERMUX • pkg install python2 && git -y • pip2 install requests mechanize tqdm • git clone https://github.com/BOT-033/Sensei

Bukan Hamkel 1 Dec 04, 2021
Directory Traversal in Afterlogic webmail aurora and pro

CVE-2021-26294 Exploit Directory Traversal in Afterlogic webmail aurora and pro . Description: AfterLogic Aurora and WebMail Pro products with 7.7.9 a

Ashish Kunwar 8 Nov 09, 2022
Dapunta Multi Brute Force Facebook - Crack Facebook With Login - Free

✭ DMBF CRACK Dibuat Dengan ❤️ Oleh Dapunta Author: - Dapunta Khurayra X ⇨ Fitur Login [✯] Login Token ⇨ Fitur Crack [✯] Crack Dari Teman, Public,

Dapunta ID 10 Oct 19, 2022
LittleBrother is a simple parental control application monitoring specific processes on Linux hosts to monitor and limit the play time of children.

Parental Control Application LittleBrother Overview LittleBrother is a simple parental control application monitoring specific processes (read "games"

40 Dec 21, 2022
Mad Spammer is a python webhook spammer which is very easy and safe to use.

Mad Spammer 👿 Pre-Setup: Open your terminal/console and type: pip install module colorama python MadSpammer.py Setup: After doing that, you should be

1 Nov 26, 2021
hackinsta: a program to hack instagram

hackinsta a program to hack instagram Yokoback_(instahack) is the file to open, you need libraries write on import. You run that file in the same fold

1 Dec 04, 2021