:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

Overview

pre-commit-dbt


BETA NOTICE: This tool is still in beta and may have some bugs, so please be forgiving!

Goal

Quickly ensure the quality of your dbt projects.

dbt is awesome, but as the number of models, sources, and macros grows, it becomes challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or add tests. Besides, as the number of objects grows, dbt slows down, users stop running models and tests (because they want to deploy the feature quickly), and the demands on reviews increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

💡 Click on hook name to view the details.

Model checks:

Script checks:

Source checks:

Modifiers:

dbt commands:


โ— If you have an idea for a new hook or you found a bug, let us know โ—

Install

For detailed installation and usage instructions, see the pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add the list of hooks you want to run before every commit, e.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v0.1.1
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  3. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also run pre-commit run manually after staging the files you want to check, or pre-commit run --all-files to run the hooks against all files (not only staged ones).

Run as a GitHub Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud. But you can run the checks after you push changes to GitHub.

To do that, make a file .github/workflows/pre-commit.yml.

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - uses: pre-commit/[email protected]

To run only changed files:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}

To run modifiers, you must use a private repository and change your .github/workflows/pre-commit.yml to:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
      with:
        fetch-depth: 0
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}
        token: ${{ secrets.GITHUB_TOKEN }}

For more information about pre-commit/action, visit https://github.com/pre-commit/action.

Comments
  • add argument support to dbt commands (dbt clean & dbt deps)

    This enhancement allows the dbt-clean and dbt-deps hooks to take arguments for global and command flags. It addresses issue 39 by allowing the project-dir flag to be passed to the dbt clean and dbt deps commands. Example:

    - id: dbt-clean
      args: ["--cmd-flags", "++project-dir", "./transform/dbt"]
    opened by Aostojic 8
  • Docker image pull down image action fails

    Describe the bug: When hooking to the Docker file, I get the error "invalid reference format" when the build runs the step that pulls the action image.

    Pull down action image 'offbi:pre-commit-dbt:v1.0.0'
    /usr/bin/docker pull offbi:pre-commit-dbt:v1.0.0
    invalid reference format
    Warning: Docker pull failed with exit code 1, back off 3.275 seconds before retry.

    To Reproduce Steps to reproduce the behavior:

    1. Add a new file .github/workflows/pre-commit-dbt.yml:
    name: pre-commit
    
    on:
      pull_request:
        branches: [master]
    
    jobs:
      pre-commit:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/[email protected]
        - uses: actions/[email protected]
        - id: file_changes
          uses: trilom/[email protected]
          with:
            output: ' '
        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    
    2. Add the following code to an existing (or new) .pre-commit-config.yaml:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-script-has-no-table-name
        files: ^dbt
    
    3. Commit the changes and create a pull request to trigger a test build.

    Expected behavior Build test completes successfully.

    Version: v1.0.0 (dbt version 0.21.0)

    Additional context: I think it's odd that the yml file clearly shows offbi/pre-commit-dbt@v1.0.0 for the step action, but when the docker pull command is issued the syntax is changed to offbi:pre-commit-dbt:v1.0.0 (notice the colons in place of the slash and @). Nowhere does the documentation say how to avoid this. Also note: the "test" completes successfully if the following block is removed (all other steps run correctly):

        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    

    So it seems that it's this specific docker image causing the issue.

    bug 
    opened by grace-cityblock 6
  • add prj_root argument to dbt commands

    This enhancement enables users to benefit from the dbt command hooks when their dbt project is not at the root level of the repo. It works by providing an additional argument to the dbt commands. Example:

    hooks:
    - id: dbt-docs-generate
      files: ^projects/
      args: ["--prj-root", "./common/dbt"]
    - id: dbt-deps
      files: ^projects/
      args: ["--prj-root", "./common/dbt"]

    In this example, the root of the dbt project is located inside the folder './common/dbt'.

    Addresses issue https://github.com/offbi/pre-commit-dbt/issues/39

    opened by joaobernardopa 5
  • Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Describe the bug: Any hook that tries to read target/manifest.json fails with Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json') when the dbt project directory (which contains target/) is not the git project root.

    To Reproduce Steps to reproduce the behavior:

    1. Create the following project structure:
    my_project/
    ├── .git/
    └── dbt_project/
         ├── dbt_project.yml
         ├── target/
         ├── models/
         └── ...
    
    2. Run any hook that requires the manifest.

    Expected behavior I guess the current behavior is expected but there could be an option to specify the dbt project directory.

    Workaround: cp -r dbt_project/target target

    Version: v1.0.0

    bug 
    opened by stumelius 5
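
The option requested in the manifest report above could take roughly the following shape. This is a minimal Python sketch, not the hooks' actual code: `load_manifest` and its `project_dir` parameter are hypothetical names, and the only detail taken from the report is that the manifest lives at target/manifest.json inside the dbt project directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_manifest(project_dir: str = ".") -> Dict[str, Any]:
    """Load target/manifest.json relative to the dbt project directory."""
    # Hypothetical helper: resolve the manifest from the dbt project directory
    # instead of assuming it sits in the git repository root.
    path = Path(project_dir) / "target" / "manifest.json"
    if not path.is_file():
        raise FileNotFoundError(f"Unable to load manifest file: {path}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo: recreate the layout from the report in a temporary directory.
with tempfile.TemporaryDirectory() as repo_root:
    target = Path(repo_root) / "dbt_project" / "target"
    target.mkdir(parents=True)
    (target / "manifest.json").write_text(json.dumps({"nodes": {}}), encoding="utf-8")
    demo_manifest = load_manifest(str(Path(repo_root) / "dbt_project"))

print(demo_manifest)
```

With a helper like this, the cp -r workaround is unnecessary because the caller states where the dbt project lives.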
  • Cannot use the action in a workflow

    Describe the bug Hi there, first of all thanks for the efforts in developing this library. We are trying to include this step in our workflow, but we are facing an issue related to the docker image. It cannot pull it:

    [Screenshot: 2021-04-23 at 10 31 04]

    Might the issue be related to the actual image defined in action.yaml ?

    To Reproduce Steps to reproduce the behavior:

    1. follow either the readme example or the snippet as reported in the marketplace

    Example configuration:

          - id: dbt_checks
            uses: offbi/pre-commit-dbt@v1.0.0
            env:
              [ ... ]
            with:
              args: run --files ${{ steps.file_changes.outputs.files }}
    

    Version: v1.0.0

    bug 
    opened by Sh1n 5
  • check-column-desc-are-same fails with a Python error

    Describe the bug When trying to use the check-column-desc-are-same hook, I'm getting the following Python error. Other hooks I've tried so far are working:

    Check column descriptions are same.......................................Failed
    - hook id: check-column-desc-are-same
    - exit code: 1
    
    Traceback (most recent call last):
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/bin/check-column-desc-are-same", line 8, in <module>
        sys.exit(main())
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 80, in main
        return check_column_desc(paths=args.filenames, ignore=args.ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 55, in check_column_desc
        grouped = get_grouped(paths, ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 48, in get_grouped
        sorted(columns, key=lambda x: x.column_name), lambda x: x.column_name
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 29, in get_all_columns
        for item in schemas:
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/utils.py", line 134, in get_model_schemas
        model_name = model.get("name")
    AttributeError: 'str' object has no attribute 'get'
    

    Version: v0.1.1 Python 3.7.9

    bug 
    opened by MartinGuindon 5
  • check-script-has-no-table-name is failing when using lateral flatten

    Describe the bug See updated description of the bug.

    ~~The check-script-has-no-table-name pre-commit hook is confusing CTEs with tables, and fails with code like this:~~

    with source as (
    
        select * from {{ source('stripe', 'payments') }}
    
    ),
    
    renamed as (
    
        select
            id as payment_id,
            order_id,
            payment_method,
    
            --`amount` is currently stored in cents, so we convert it to dollars
            amount / 100 as amount
    
        from source
    
    )
    
    select * from renamed
    

    ~~It reports that "source" and "renamed" are tables even though they are not, even though it looks the same from a code perspective.~~

    ~~I think this hook should perhaps fail only at the presence of schema.table or database.schema.table references, unless we can make this hook smarter by being aware of the CTEs defined in the model.~~

    Version: v0.1.1

    bug 
    opened by MartinGuindon 5
  • Fix Github Action docker reference

    According to Github Actions documentation, we should reference the docker image directly from the project: https://docs.github.com/pt/actions/creating-actions/creating-a-docker-container-action#creating-an-action-metadata-file

    This fixes: https://github.com/offbi/pre-commit-dbt/issues/26

    opened by tlfbrito 3
  • Add check-model-name-contract hook

    There's now a check-model-name-contract hook (similar to check-column-name-contract) that is used to ensure that models follow a naming convention.

    To-do before merge

    • [x] Write unit tests for the hook
    opened by stumelius 3
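
A name-contract check like the one described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the hook's implementation: it assumes the model name is the SQL file's stem and that the pattern is applied with `re.match`; the helper name `violates_name_contract` is hypothetical. The pattern `(rep__).*` is the one used in a configuration reported further down.

```python
import re
from pathlib import Path


def violates_name_contract(path: str, pattern: str) -> bool:
    """Return True when the model name does not match the naming pattern."""
    # Assumption: the model name is the file name without its extension.
    model_name = Path(path).stem
    return re.match(pattern, model_name) is None


for model_path in ("models/reporting/rep__orders.sql", "models/reporting/orders.sql"):
    status = "violates" if violates_name_contract(model_path, "(rep__).*") else "follows"
    print(f"{model_path}: {status} the contract")
```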
  • check-model-has-properties-file fails on macro with a valid properties yml

    Describe the bug When running the test check-model-has-properties-file with a macro, the test fails with the following error.

    Check the model has properties file......................................Failed
    - hook id: check-model-has-properties-file
    - exit code: 1
    
    macros/grant_select_on_schemas.sql: does not have model properties defined in any .yml file.
    

    The .pre-commit-config.yaml includes the rule:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 607cb07a1918442f5963662a9aa19da8984931e6
      hooks:
      - id: check-model-has-properties-file
    

    And the macro has the following .yml file (the filename is the same as the macro name and is stored within the macros folder):

    
    macros:
      - name: grant_select_on_schemas
        description: "Grants privileges to groups after dbt run"
        docs:
          show: false
    

    Hope we can get this fixed soon as this is a really useful test to include

    bug 
    opened by andrewlee-trouva 3
  • wip: Added hook: check_model_has_tests_by_group

    This PR adds a hook that checks if a model has a sufficient number of tests pulled out of a group of acceptable tests, e.g. this model has 1 of unique, unique_where, or unique_threshold.

    opened by jtalmi 3
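
The idea behind the tests-by-group hook described above can be illustrated as follows. This is a hedged Python sketch rather than the PR's code: `has_enough_tests`, its parameters, and the plain list of test names are all assumptions made for the example.

```python
from typing import Iterable


def has_enough_tests(model_tests: Iterable[str], group: Iterable[str], min_count: int = 1) -> bool:
    """Return True when at least `min_count` of the model's tests belong to `group`."""
    accepted = set(group)
    # Count every test on the model whose name is in the accepted group.
    return sum(1 for test in model_tests if test in accepted) >= min_count


uniqueness_group = ["unique", "unique_where", "unique_threshold"]
print(has_enough_tests(["not_null", "unique_where"], uniqueness_group))
print(has_enough_tests(["not_null"], uniqueness_group))
```

The first call passes because the model has one test from the group; the second fails because it has none.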
  • `check_macro_arguments_have_desc` hook fails to parse arguments

    Describe the bug

    check_macro_arguments_have_desc hook raises the following error even though the content of the files is ok. Other hooks are working correctly, including check_macro_has_description.

    Traceback (most recent call last):
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/bin/check-macro-arguments-have-desc", line 8, in <module>
        sys.exit(main())
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 90, in main
        status_code, _ = check_argument_desc(paths=args.filenames, manifest=manifest)
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 52, in check_argument_desc
        for key, value in item.macro.get("arguments", {}).items()
    AttributeError: 'list' object has no attribute 'items'
    

    To Reproduce

    Steps to reproduce the behavior using getdbt examples:

    1. macros/schema.yml
    version: 2
    
    macros:
      - name: cents_to_dollars
        description: A macro to convert cents to dollars
        arguments:
          - name: column_name
            type: string
            description: The name of the column you want to convert
          - name: precision
            type: integer
            description: Number of decimal places. Defaults to 2.
    
    2. macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=3) %}
        COALESCE (TRUNC(CAST({{ column_name }}/100 AS numeric), {{ precision }}), 0)
    {% endmacro %}
    
    3. Execute the following command after dbt deps, dbt compile and dbt docs generate: pre-commit run check-macro-arguments-have-desc --files macros/cents_to_dollars.sql

    Expected behavior The hook should pass successfully.

    Version: commit 34a2341234675d7a6b61766b2c33bdd5c33d090b (current latest commit)

    Additional context: It looks like the problem is this line:

        for key, value in item.macro.get("arguments", {}).items()

    The hook assumes that the arguments key contains a dictionary, but it is a list of dictionaries. https://github.com/offbi/pre-commit-dbt/blob/main/pre_commit_dbt/check_macro_arguments_have_desc.py#L52

    bug 
    opened by hacherix 0
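
The diagnosis in the report above suggests iterating over the argument list instead of calling `.items()` on it. The following Python sketch shows that shape; it is not the project's fix, and the helper name `arguments_missing_description` is hypothetical. The sample data mirrors the schema.yml from the report, with one description blanked out for the demonstration.

```python
from typing import Any, Dict, List


def arguments_missing_description(macro: Dict[str, Any]) -> List[str]:
    """Return the names of macro arguments that lack a non-empty description."""
    # `arguments` is a list of dicts in schema.yml, so iterate over the list
    # rather than treating it as a mapping.
    arguments = macro.get("arguments") or []
    return [
        arg.get("name", "<unnamed>")
        for arg in arguments
        if not (arg.get("description") or "").strip()
    ]


macro = {
    "name": "cents_to_dollars",
    "arguments": [
        {
            "name": "column_name",
            "type": "string",
            "description": "The name of the column you want to convert",
        },
        # Description blanked out on purpose to trigger the check.
        {"name": "precision", "type": "integer", "description": ""},
    ],
}
print(arguments_missing_description(macro))
```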
  • check-model-parents-and-childs for zero child check never runs

    I am using the check-model-parents-and-childs hook to ensure that our data consumption layer models do not have any children, but the hook does not fail when models do have children.

    Steps to reproduce the behavior:

    1. dbt project with a parent and child models
    2. Add check-model-parents-and-childs with --max-child-cnt of zero
      - id: check-model-parents-and-childs
        name: Check for child models in data consumption layers
        # manifest.json required
        args: ["--manifest","./pipelines/target/manifest.json","--max-child-cnt","0","--"]
        files: models/self_service/
    

    Expected outcome:

    Model that has a child is raised as failure

    Actual outcome:

    Hook passes

    Version:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 34a2341234675d7a6b61766b2c33bdd5c33d090b
    

    Additional info:

    The offending code appears to test --max-child-cnt for truthiness to detect the default (None), which also evaluates to false for a value of zero, i.e.

     if req_cnt and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    

    It may need to check for None explicitly:

     if req_cnt is not None and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    
    bug 
    opened by PeteCorbettWS 0
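
The difference between the two conditions in the report above is easy to reproduce in isolation. The sketch below reuses the names `req_cnt`, `req_operator`, and `real_value` from the quoted snippet; the wrapper function `exceeds_limit` and the choice of `operator.gt` as the comparison are assumptions made for the example.

```python
import operator
from typing import Callable, Optional


def exceeds_limit(
    real_value: int,
    req_cnt: Optional[int],
    req_operator: Callable[[int, int], bool] = operator.gt,
) -> bool:
    """Return True when a limit is configured and the real value violates it."""
    # `if req_cnt and ...` skips the check for req_cnt == 0 because 0 is falsy;
    # comparing against None keeps zero as a valid limit.
    return req_cnt is not None and req_operator(real_value, req_cnt)


# Truthiness-based version from the report, kept for comparison.
buggy = bool(0 and operator.gt(1, 0))

print(buggy)                   # limit of zero is silently ignored
print(exceeds_limit(1, 0))     # one child with a limit of zero is a violation
print(exceeds_limit(5, None))  # no limit configured, so nothing to report
```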
  • `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    Describe the bug: When a model uses the EXTRACT date function, the hook treats the column reference as a table reference.

    Check the script has not table name......................................Failed
    - hook id: check-script-has-no-table-name
    - exit code: 1
    
    models/example.sql: does not use source() or ref() macros for tables:
    - order_date
    

    To Reproduce Based on the jaffle shop dbt example, create a model with the following content:

    SELECT
        *,
        EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
    FROM source('jaffle_shop', 'orders')
    

    Expected behavior Using an EXTRACT date function should not make the check-script-has-no-table-name fail.

    Version: v1.0.0

    Additional context Link to EXTRACT function documentation for different warehouses:

    bug 
    opened by nasseredine 0
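
One way to avoid the false positive described above is to drop `EXTRACT(<part> FROM <expr>)` calls before looking for table names after FROM or JOIN. The Python sketch below is only an illustration of that idea: both regular expressions and the function `bare_table_names` are assumptions, they are not how the hook parses SQL, and the EXTRACT pattern does not handle nested parentheses in its argument.

```python
import re
from typing import List

# Matches simple EXTRACT(<part> FROM <expr>) calls without nested parentheses.
EXTRACT_CALL = re.compile(r"\bextract\s*\(\s*\w+\s+from\s+[^()]*\)", re.IGNORECASE)
# Matches an identifier after FROM/JOIN that is not followed by "(",
# so function-style references such as source(...) are skipped.
FROM_TABLE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)\b(?!\s*\()", re.IGNORECASE)


def bare_table_names(sql: str) -> List[str]:
    """Return identifiers used directly after FROM/JOIN, ignoring EXTRACT calls."""
    cleaned = EXTRACT_CALL.sub("", sql)
    return FROM_TABLE.findall(cleaned)


reported_sql = """
SELECT
    *,
    EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
FROM source('jaffle_shop', 'orders')
"""
print(bare_table_names(reported_sql))
print(bare_table_names("select * from raw.orders"))
```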
  • Feature Request: `check-model-has-column`

    Describe the feature you'd like Add a hook which asserts that a given column with a given type exists in a model.

    Additional context We generally require that every model has an audit timestamp column called _updated_at and it would be nice to be able to enforce that.

    enhancement 
    opened by huptonbirdsall 0
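
The requested check could be built on the manifest, in the same way as the other manifest-based hooks. The Python sketch below is hypothetical: `models_missing_column` is not an existing hook, and it assumes that each model node in manifest.json carries a `columns` mapping keyed by column name with an optional `data_type` field.

```python
from typing import Any, Dict, List, Optional


def models_missing_column(
    manifest: Dict[str, Any], column: str, data_type: Optional[str] = None
) -> List[str]:
    """Return names of models that lack `column` (or have it with another type)."""
    missing = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        found = node.get("columns", {}).get(column)
        # Fail when the column is absent, or when a type is required and differs.
        if found is None or (data_type is not None and found.get("data_type") != data_type):
            missing.append(node.get("name", ""))
    return sorted(missing)


# Illustrative manifest fragment; the project and model names are made up.
manifest = {
    "nodes": {
        "model.demo.orders": {
            "resource_type": "model",
            "name": "orders",
            "columns": {"_updated_at": {"name": "_updated_at", "data_type": "timestamp"}},
        },
        "model.demo.customers": {
            "resource_type": "model",
            "name": "customers",
            "columns": {"id": {"name": "id", "data_type": "integer"}},
        },
    }
}
print(models_missing_column(manifest, "_updated_at", "timestamp"))
```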
  • `Check model name contract` hook problem with version

    When using the following pre-commit config file:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-model-name-contract
        args: [--pattern, "(rep__).*"]
        files: models/reporting
    

    I get the following response:

    [ERROR] check-model-name-contract is not present in repository https://github.com/offbi/pre-commit-dbt. Typo? Perhaps it is introduced in a newer version? Often pre-commit autoupdate fixes this.

    Even if I run the autoupdate command the error message is the same. Am I missing something?

    bug 
    opened by papost 2
Releases(v1.0.0)
Owner
Offbi
Data engineering with ❤️