Exploring dimension-reduced embeddings

Last update: Nov 29, 2022


Overview

sleepwalk

Exploring dimension-reduced embeddings

This is the code repository. For the Sleepwalk web page, see https://anders-biostat.github.io/sleepwalk/.

License and disclaimer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Comments

Error running sleepwalk: cannot open the connection
Dear sleepwalk developers, Thanks a lot for providing such a nice method. I could install the package, but I get the following error when I try to run it:

> sleepwalk([email protected][email protected], [email protected][email protected])
Estimating 'maxdist' for feature matrix 1
Server has been stopped.
Server has been stopped.
Error in app$openPage(useViewer, browser) : Timeout waiting for websocket.
In addition: Warning messages:
1: In file(con, "r") : cannot open file 'sleepwalk_canvas.html': No such file or directory
2: In func(req) : File '/favicon.ico' is not found

I know this is probably not a sleepwalk-specific error, but I couldn't find a solution for it. Any hints on how to fix this issue?

Also, I have a question about the output. Besides using the interactive mode to manually inspect cells that might be "misplaced" in the reduced-dimension space, I would like to systematically find the cells that don't quite fit the clusters they were originally assigned to. In other words, how would you suggest using sleepwalk to refine my clustering, since I suspect that many of my cells were wrongly assigned to their clusters? I am using the Seurat package for dimension reduction and clustering.

Thank you very much, Gustavo
opened by gufranca 2
Error: 'browser' must be a non-empty character string
Hello,

After calling the sleepwalk function on a Seurat object, I got this error:

> sleepwalk( as.matrix([email protected][email protected]), as.matrix([email protected][email protected]) )
Estimating 'maxdist' for feature matrix 1
Error in browseURL(str_c("http://localhost:", port, "/", pageobj$startPage), : 'browser' must be a non-empty character string

I have loaded the stringr library (which contains the function str_c()), but I cannot find where this error originates. Has anyone run into this problem before?

Thank you
opened by PedroRaposo 2
slw_on_selection error when sleepwalk is not attached

Running sleepwalk without attaching the package (i.e., NOT specifying library(sleepwalk)) like this works fine:

sleepwalk::sleepwalk(se[email protected][email protected], t([email protected][[email protected],]))

But the moment you select cells with your mouse, it crashes (the browser tab closes) and R gives this error:

Error in slw_on_selection(selPoints, 1) : could not find function "slw_on_selection"

Loading the package using library(sleepwalk) solves the issue, but it'd be nice if it weren't necessary.

opened by FelixTheStudent 0
doc for comparison

The example on the web page for comparing two embeddings still uses the old version, where both distances are used concurrently. We also need to change the explanation below it to say that the same cell always has the same colour in all embeddings.

opened by simon-anders 0
Suggestion: Link embeddings from transposed table

Let's say I have a matrix with individuals (e.g. cells) as rows and features as columns, and I run a UMAP on both the original matrix and the transposed one. It would then be natural to look at the UMAP of individuals with the default usage (the distances to other individuals), but it would also be interesting to see the features for that individual (and vice versa).

Is it clear what I mean?

opened by StaffanBetner 2

Releases(v0.3.2)

v0.3.2(Sep 17, 2021)
jrc (as of v0.5.0) now uses the setLimits function for all security restrictions. This update fixes the dependency problem caused by that change.

v0.3.1(Sep 30, 2020)
Fixed a broken path to the start page that was caused by a jrc update.

v.0.3.0(Feb 27, 2020)
The new argument metric allows using angular distance (metric = "cosine") as an alternative to the default Euclidean distance (metric = "euclid").

If compare = "distances", it is no longer required to provide several embeddings. If only one embedding is given, it will be used for all the distances.
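
As a rough illustration of the two metrics named above, the following sketch computes Euclidean and angular distances from one point to all others in a small random feature matrix, using base R only. The helper names euclid_dist and angular_dist are hypothetical, and this is not Sleepwalk's internal code; the exact formula Sleepwalk applies for metric = "cosine" may differ. The final call assumes the package is installed and only runs in an interactive session.

```r
set.seed(1)
# Toy data: 50 points (rows) with 10 features (columns); purely illustrative.
features  <- matrix(rnorm(50 * 10), nrow = 50)
embedding <- features[, 1:2]  # stand-in for a real 2D embedding (t-SNE, UMAP, ...)

# Hypothetical helper: Euclidean distance from row i to every row of x.
euclid_dist <- function(x, i) {
  sqrt(rowSums(sweep(x, 2, x[i, ])^2))
}

# Hypothetical helper: angle (in radians) between row i and every row of x.
angular_dist <- function(x, i) {
  norms   <- sqrt(rowSums(x^2))
  cos_sim <- as.vector(x %*% x[i, ]) / (norms * norms[i])
  acos(pmin(pmax(cos_sim, -1), 1))  # clamp to guard against rounding error
}

d_euclid  <- euclid_dist(features, 1)
d_angular <- angular_dist(features, 1)

# Only attempt to open the app when a user is present and the package exists.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)
  sleepwalk(embedding, features, metric = "cosine")
}
```

Angular distance ignores the length of the feature vectors and compares only their direction, which can be preferable when overall magnitude (for example, sequencing depth) is not of interest.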

v0.2.1(Oct 2, 2019)
Changes due to an update of the jrc package.

Indices of selected points are no longer stored in a variable and can be accessed only via the callback function. Thus, no changes are made to the global environment unless the user makes them explicitly.
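
A minimal sketch of such a callback is given here, assuming the on_selection argument of sleepwalk() and a callback that receives the indices of the selected points and the index of the embedding they were selected in. The names selection_store and remember_selection are hypothetical. The selection is kept in a dedicated environment, so the global environment is only touched by the two definitions themselves.

```r
# Hypothetical container for the most recent selection.
selection_store <- new.env()

# Hypothetical callback: assumed to be called with the indices of the selected
# points and the index of the embedding in which they were selected.
remember_selection <- function(points, embedding_index) {
  selection_store$points    <- points
  selection_store$embedding <- embedding_index
  invisible(NULL)
}

# Simulate what the app would do when three points are selected in embedding 1.
remember_selection(c(3, 7, 42), 1)

# Only attempt to open the app when a user is present and the package exists.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)
  set.seed(1)
  features <- matrix(rnorm(50 * 10), nrow = 50)  # toy data, purely illustrative
  sleepwalk(features[, 1:2], features, on_selection = remember_selection)
}
```

After selecting points in the browser, selection_store$points would hold their indices for further use in the R session.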

Added the possibility to pass arguments to jrc::openPage (such as the port number or the browser in which to open the app).
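
The sketch below shows one way these pass-through arguments might be used. It first checks whether R's "browser" option looks usable; on Unix-like systems an empty value there is the usual cause of the "'browser' must be a non-empty character string" error raised by browseURL(). The helper has_usable_browser is hypothetical, and the port number and the "firefox" value are placeholders rather than recommendations.

```r
# Hypothetical helper: TRUE if the value looks usable as a browser for browseURL().
has_usable_browser <- function(opt = getOption("browser")) {
  is.function(opt) ||
    (is.character(opt) && length(opt) == 1 && !is.na(opt) && nzchar(opt))
}

browser_ok <- has_usable_browser()

# Only attempt to open the app when a user is present and the package exists.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)
  set.seed(1)
  features <- matrix(rnorm(50 * 10), nrow = 50)  # toy data, purely illustrative
  if (browser_ok) {
    # Extra arguments are forwarded to jrc::openPage; the port is arbitrary.
    sleepwalk(features[, 1:2], features, port = 12345)
  } else {
    # "firefox" is an assumption; use a browser installed on your system.
    sleepwalk(features[, 1:2], features, browser = "firefox")
  }
}
```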

v0.2.0(Sep 27, 2019)
HTML Canvas is now used to plot the embedding. This makes Sleepwalk faster and allows more points to be displayed simultaneously.

A new parameter, mode = c("canvas", "svg"), allows the user to go back to the old SVG-based version of the Sleepwalk app.

A bug in slw_snapshot is fixed. The function no longer returns a list of identical plots when used with several different embeddings.


Owner

S. Anders's research group at ZMBH

GitHub Repository https://anders-biostat.github.io/sleepwalk/

