Visual disk-usage analyser for docker images

Overview

whaler

PyPI versions PyPI versions

What? A command-line tool for visually investigating the disk usage of docker images

Why? Large images are slow to move and expensive to store. They cost developer productivity by lengthening devops tasks and often contain unnecessary data

Who is this for? Primarily for engineers working with images containing Python packages.

User Stories

This tool should allow you to answer questions such as:

  1. Which file types are occupying the most disk space?
  2. Which are my largest Python packages?
  3. What are my unknown causes of high disk usage?

Quick start

pip install whaler

Run against a local directory

➜ whaler .venv
Running bash -c cd .venv && du -a -k
Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

Run against a docker image

The tool will pull the image first if it is not present.

whaler --image='hl:latest' /
Running docker run --rm --entrypoint=du --workdir=/ hl:latest -a -k
Ignoring what seems to be non-fatal error(s):
du: cannot access './proc/1/task/1/fd/4': No such file or directory
du: cannot access './proc/1/task/1/fdinfo/4': No such file or directory
du: cannot access './proc/1/fd/3': No such file or directory
du: cannot access './proc/1/fdinfo/3': No such file or directory


Done. Serving output at http://localhost:8000 (ctrl+c to exit)
Running python3 -m http.server 8000 --directory=_whaler/html

HTML Report

Play with a hosted demo

Limitations

  1. Platform: whaler uses du to gather disk usage data. It must be present in your docker image
  2. Scale: I have tested the web UI with up to 500,000 file system nodes with du output of up to ~100MB.

Alternatives/Complements to this tool:

  1. Whaler can tell you what is taking up space in the final layer of your Docker image, but you may have intermediate layers which are contributing to the image size. For diving through the layers, use dive
    • Related: read up on multi-stage builds to understand how to mitigate the problem of intermediate layers bloating your image.
  2. For investigating disk usage in non-docker directories, Disk Inventory X is a great tool on OS X which I have based whaler on.
  3. GoogleContainerTools/distroless for base images not containing standard linux distro contents

See also

  1. a good blog post on making small python images

Developing

See .github/workflows/test.yml for the development platform and setup.

For UI, see whaler-ui

You might also like...
Lima is an alternative to using Docker Desktop on your Mac.
Lima is an alternative to using Docker Desktop on your Mac.

lima-xbar-plugin Table of Contents Description Installation Dependencies Lima is an alternative to using Docker Desktop on your Mac. Description This

Build Netbox as a Docker container

netbox-docker The Github repository houses the components needed to build Netbox as a Docker container. Images are built using this code and are relea

Some automation scripts to setup a deployable development database server (with docker).

Postgres-Docker Database Initializer This is a simple automation script that will create a Docker Postgres database with a custom username, password,

A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

A repository containing a short tutorial for Docker (with Python).
A repository containing a short tutorial for Docker (with Python).

Docker Tutorial for IFT 6758 Lab In this repository, we examine the advtanges of virtualization, what Docker is and how we can deploy simple programs

Bitnami Docker Image for Python using snapshots for the system packages repositories

Python Snapshot packaged by Bitnami What is Python Snapshot? Python is a programming language that lets you work quickly and integrate systems more ef

Hatch plugin for Docker containers

hatch-containers CI/CD Package Meta This provides a plugin for Hatch that allows

Create pinned requirements.txt inside a Docker image using pip-tools
Create pinned requirements.txt inside a Docker image using pip-tools

Pin your Python dependencies! pin-requirements.py is a script that lets you pin your Python dependencies inside a Docker container. Pinning your depen

This repository contains useful docker-swarm-tools.

docker-swarm-tools This repository contains useful docker-swarm-tools. swarm-guardian This Docker image is intended to be used in a multihost docker e

Comments
  • Bounding box on selected node barely visible

    Bounding box on selected node barely visible

    This is due to the canvas not being fully re-drawn upon selection.

    The solution is probably to ensure the canvas can be re-drawn quickly and creating a fresh render every time

    bug 
    opened by alex-treebeard 0
Releases(0.1.2)
Owner
Treebeard Technologies
Helping with Data Infra
Treebeard Technologies
Quick & dirty controller to schedule Kubernetes Jobs later (once)

K8s Jobber Operator Quickly implemented Kubernetes controller to enable scheduling of Jobs at a later time. Usage: To schedule a Job later, Set .spec.

Jukka Väisänen 2 Feb 11, 2022
Build Netbox as a Docker container

netbox-docker The Github repository houses the components needed to build Netbox as a Docker container. Images are built using this code and are relea

Farshad Nick 1 Dec 18, 2021
A lobby boy will create a VPS server when you need one, and destroy it after using it.

Lobbyboy What is a lobby boy? A lobby boy is completely invisible, yet always in sight. A lobby boy remembers what people hate. A lobby boy anticipate

226 Dec 29, 2022
A little script and trick to make your heroku app run forever without being concerned about dyno hours.

A little script and trick to make your heroku app run forever without being concerned about dyno hours.

Tiararose Biezetta 152 Dec 25, 2022
Pulumi - Developer-First Infrastructure as Code. Your Cloud, Your Language, Your Way 🚀

Pulumi's Infrastructure as Code SDK is the easiest way to create and deploy cloud software that use containers, serverless functions, hosted services,

Pulumi 14.7k Jan 08, 2023
A tool to clone efficiently all the repos in an organization

cloner A tool to clone efficiently all the repos in an organization Installation MacOS (not yet tested) python3 -m venv .venv pip3 install virtualenv

Ramon 6 Apr 15, 2022
Jenkins-AWS-CICD - Implement Jenkins CI/CD with AWS CodeBuild and AWS CodeDeploy, build a python flask web application.

Jenkins-AWS-CICD - Implement Jenkins CI/CD with AWS CodeBuild and AWS CodeDeploy, build a python flask web application.

Ning 1 Jan 01, 2022
Copy a Kubernetes pod and run commands in its environment

copypod Utility for copying a running Kubernetes pod so you can run commands in a copy of its environment, without worrying about it the pod potential

Memrise 4 Apr 08, 2022
More than 130 check plugins for Icinga and other Nagios-compatible monitoring applications. Each plugin is a standalone command line tool (written in Python) that provides a specific type of check.

Python-based Monitoring Check Plugins Collection This Enterprise Class Check Plugin Collection offers a package of more than 130 Python-based, Nagios-

Linuxfabrik 119 Dec 27, 2022
Let's learn how to build, release and operate your containerized applications to Amazon ECS and AWS Fargate using AWS Copilot.

🚀 Welcome to AWS Copilot Workshop In this workshop, you'll learn how to build, release and operate your containerised applications to Amazon ECS and

Donnie Prakoso 15 Jul 14, 2022
Rancher Kubernetes API compatible with RKE, RKE2 and maybe others?

kctl Rancher Kubernetes API compatible with RKE, RKE2 and maybe others? Documentation is WIP. Quickstart pip install --upgrade kctl Usage from lazycls

1 Dec 02, 2021
a CLI that provides a generic automation layer for assessing the security of ML models

Counterfit About | Getting Started | Learn More | Acknowledgments | Contributing | Trademarks | Contact Us -------------------------------------------

Microsoft Azure 575 Jan 02, 2023
A cron monitoring tool written in Python & Django

Healthchecks Healthchecks is a cron job monitoring service. It listens for HTTP requests and email messages ("pings") from your cron jobs and schedule

Healthchecks 5.8k Jan 02, 2023
A job launching library for docker, EC2, GCP, etc.

doodad A library for packaging dependencies and launching scripts (with a focus on python) on different platforms using Docker. Currently supported pl

Justin Fu 55 Aug 27, 2022
Honcho: a python clone of Foreman. For managing Procfile-based applications.

___ ___ ___ ___ ___ ___ /\__\ /\ \ /\__\ /\ \ /\__\ /\

Nick Stenning 1.5k Jan 03, 2023
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

Arie Bregman 35.1k Jan 02, 2023
Caboto, the Kubernetes semantic analysis tool

Caboto Caboto, the Kubernetes semantic analysis toolkit. It contains a lightweight Python library for semantic analysis of plain Kubernetes manifests

Michael Schilonka 8 Nov 26, 2022
DC/OS - The Datacenter Operating System

DC/OS - The Datacenter Operating System The easiest way to run microservices, big data, and containers in production. What is DC/OS? Like traditional

DC/OS 2.3k Jan 06, 2023
Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:

Latest Salt Documentation Open an issue (bug report, feature request, etc.) Salt is the world’s fastest, most intelligent and scalable automation engi

SaltStack 12.9k Jan 04, 2023