This repository is home to the Optimus data transformation plugins for various data processing needs.

Overview

Transformers

test workflow build workflow

Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus.

To install plugins via homebrew

brew tap odpf/taps
brew install optimus-plugins-odpf

To install plugins via shell

curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
Comments
  • Fix: fix ignoreupstream helper for big query view

    Fix: fix ignoreupstream helper for big query view

    Hello, Currently, for any query, we try to find the dependancy and ignoredependancy with FindDependenciesWithRegex and then we again pull the Refereced table with big query dry run.

    If query contains view which is marked with /* @ignoreupstream */ helper, then ignoredependancy will contain the view name but not the table referenced by view.

    The change here is to revise ignoredependancy list with table referenced by view.

    I kept the loop execution in sequential manner, please let me know if should add concurrency here

    enhancement 
    opened by SumitAgrawal03071989 2
  • @ignoreupstream ineffective on big query view

    @ignoreupstream ineffective on big query view

    We have a query referencing to table as well as view. select * from proj.dataset.table t1 left join proj.dataset.view v1 on t1.date = v1.date and t1.id = v1.id

    • now if we apply @ignoreupstream helper on table proj.dataset.table then it correctly ignores to create upstream dependancy for this table.
    • But if we apply @ignoreupstream helper on view proj.dataset.view ( note the view query refers to 2 more tables ) then it does not ignore view or table referenced by view.
    opened by SumitAgrawal03071989 2
  • feat : migrate plugins for the inti-container changes in optimus

    feat : migrate plugins for the inti-container changes in optimus

    As per Optimus PR, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes odpf/optimus#405

    opened by smarch-int 1
  • monthly job didn't run for the last day of month

    monthly job didn't run for the last day of month

    Hi team,

    I have a bq2bq job with window configuration

      window:
        size: 720h
        offset: -48h
        truncate_to: M
    

    I expect to have transformation for date 01 to last day of the month, e.g on April, I expect got transformation from date 01 - 30. but currently only got transformation from date 01 - 29

    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-26 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-27 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-28 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-29 00:00:00+00:00
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: start transformation job
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: sql transformation query:
    

    after checking, I suspect the logic may related to this line, where the last day generated by windows class not included as the transformation partition.

    https://github.com/odpf/transformers/blob/ea1de4f0de3d17d9be7ccefb1e2f3beab1a685f1/task/bq2bq/executor/bumblebee/transformation.py#L393

    please kindly check it, and release the fix. thank you

    opened by novanxyz 1
  • feat: add support for secret env vars

    feat: add support for secret env vars

    With this we are adding support for using secrets in macros, we do not want to print the env vars in the logs, so exporting them as a separate file from optimus.

    Plugins can export this extra file to get env vars.

    opened by sbchaos 1
  • feat : remove wrapper image and use bq2bq executor image in plugin

    feat : remove wrapper image and use bq2bq executor image in plugin

    As per https://github.com/odpf/optimus/pull/425, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes https://github.com/odpf/optimus/issues/405

    opened by smarch-int 0
  • Generate Dependencies is using the dry run apis which is bound to fail with macros

    Generate Dependencies is using the dry run apis which is bound to fail with macros

    The most intuitive way is to parse the query and hit the metadata apis instead of going through the dry run which should be definitly costly then the metadata fetch apis.

    enhancement performance 
    opened by sravankorumilli 0
  • BQ2BQ Replace load dispostion doesn't handle aggregations

    BQ2BQ Replace load dispostion doesn't handle aggregations

    Options

    1. Add an extra option in Replace load dispostition to take input from users to replace a specific or range of partitions using literals / all , dstart, dend. Default is all : all represents splitting of query to multiple partitions from dstart to dend.
    2. Use a new Load Disposition, to replace to a single destination partition which is window start
    bug 
    opened by sravankorumilli 0
Releases(v0.2.1)
Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
Open Data Platform
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

Amazon Web Services - Labs 83 Jan 09, 2023
Pattern Matching in Python

Pattern Matching finalmente chega no Python 3.10. E daรญ? "Pattern matching", ou "correspondรชncia de padrรตes" como รฉ conhecido no Brasil. Algumas pesso

Fabricio Werneck 6 Feb 16, 2022
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

ReekyStive 3 Nov 11, 2022
Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

msg systems ag 169 Dec 21, 2022
This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

Pattern-Exploiting Training (PET) This repository contains the code for Exploiting Cloze Questions for Few-Shot Text Classification and Natural Langua

Timo Schick 1.4k Dec 30, 2022
NLP library designed for reproducible experimentation management

Welcome to the Transfer NLP library, a framework built on top of PyTorch to promote reproducible experimentation and Transfer Learning in NLP You can

Feedly 290 Dec 20, 2022
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022
Repository for the paper: VoiceMe: Personalized voice generation in TTS

๐Ÿ—ฃ VoiceMe: Personalized voice generation in TTS Abstract Novel text-to-speech systems can generate entirely new voices that were not seen during trai

Pol van Rijn 80 Dec 29, 2022
This is a modification of the OpenAI-CLIP repository of moein-shariatnia

This is a modification of the OpenAI-CLIP repository of moein-shariatnia

Sangwon Beak 2 Mar 04, 2022
Tools and data for measuring the popularity & growth of various programming languages.

growth-data Tools and data for measuring the popularity & growth of various programming languages. Install the dependencies $ pip install -r requireme

3 Jan 06, 2022
State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

John Snow Labs 3k Jan 05, 2023
Global Rhythm Style Transfer Without Text Transcriptions

Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo

Kaizhi Qian 193 Dec 30, 2022
NLP techniques such as named entity recognition, sentiment analysis, topic modeling, text classification with Python to predict sentiment and rating of drug from user reviews.

This file contains the following documents sumbited for Baruch CIS9665 group 9 fall 2021. 1. Dataset: drug_reviews.csv 2. python codes for text classi

Aarif Munwar Jahan 2 Jan 04, 2023
๋‰ด์Šค ๋„๋ฉ”์ธ ์งˆ์˜์‘๋‹ต ์‹œ์Šคํ…œ (21-1ํ•™๊ธฐ ์กธ์—… ํ”„๋กœ์ ํŠธ)

๋‰ด์Šค ๋„๋ฉ”์ธ ์งˆ์˜์‘๋‹ต ์‹œ์Šคํ…œ ๋ณธ ํ”„๋กœ์ ํŠธ๋Š” ๋‰ด์Šค๊ธฐ์‚ฌ์— ๋Œ€ํ•œ ์งˆ์˜์‘๋‹ต ์„œ๋น„์Šค ๋ฅผ ์ œ๊ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ง„ํ–‰ํ•œ ํ”„๋กœ์ ํŠธ์ž…๋‹ˆ๋‹ค. ์•ฝ 3๊ฐœ์›”๊ฐ„ ( 21. 03 ~ 21. 05 ) ์ง„ํ–‰ํ•˜์˜€์œผ๋ฉฐ Transformer ์•„ํ‚คํ…์ณ ๊ธฐ๋ฐ˜์˜ Encoder๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•œ๊ตญ์–ด ์งˆ์˜์‘๋‹ต ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ

TaegyeongEo 4 Jul 08, 2022
A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.

RunMany Intro | Installation | VSCode Extension | Usage | Syntax | Settings | About A tool to run many programs written in many languages from one fil

6 May 22, 2022
Code for text augmentation method leveraging large-scale language models

HyperMix Code for our paper GPT3Mix and conducting classification experiments using GPT-3 prompt-based data augmentation. Getting Started Installing P

NAVER AI 47 Dec 20, 2022
Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module.

Import Subtitles for Blender VSE Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module. Supported formats by py

4 Feb 27, 2022
String Gen + Word Checker

Creates random strings and checks if any of them are a real words. Mostly a waste of time ngl but it is cool to see it work and the fact that it can generate a real random word within10sec

1 Jan 06, 2022
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022
Ecco is a python library for exploring and explaining Natural Language Processing models using interactive visualizations.

Visualize, analyze, and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BER

Jay Alammar 1.6k Dec 25, 2022