A lightweight tool to get an AI Infrastructure Stack up in minutes not days.

Overview


Welcome to K3ai Project

K3ai is a lightweight tool to get an AI Infrastructure Stack up in minutes not days.



NOTE on the K3ai origins

The original K3ai project was developed in two weeks at the end of October 2020.

K3ai v1.0 was rewritten from scratch by Alessandro Festa in October 2021 to offer a better user experience.

Thanks to the amazing people and projects that have been instrumental in creating the K3ai project repositories, website, and more.

⚑️ Quick start

Let's discover K3ai in a few simple steps.

🌘 Getting Started

Get started by downloading k3ai from the release page.

Or try the K3ai companion script with this command:

curl -sfL https://get.k3ai.in | sh -

🌗 Load K3ai configuration

Let's start loading the configuration:

k3ai up

The first time k3ai runs, it asks for a GitHub PAT (Personal Access Token), which it uses to avoid API rate limits. See the GitHub documentation to learn how to create one. Your PAT only needs read permission on repositories.


🌖 Configure the base infrastructure

Choose your favourite Kubernetes flavor and run it.

To see which K8s flavors are available:

k3ai cluster list --all

It should print something like:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE                                                                                          │
├───────┬─────────────────────────────────────────────────────┬───────┬────────┬─────────┬────────────────┤
│ TYPE  │ DESCRIPTION                                         │ KIND  │ TAG    │ VERSION │ STATUS         │
├───────┼─────────────────────────────────────────────────────┼───────┼────────┼─────────┼────────────────┤
│ CIVO  │ The First Cloud Native Service Provider Power...    │ infra │ cloud  │ latest  │ Available      │
├───────┼─────────────────────────────────────────────────────┼───────┼────────┼─────────┼────────────────┤
│ EKS-A │ Amazon Eks Anywhere Is A New Deployment Option...   │ infra │ hybrid │ v0.5.0  │ Available      │
│       │ ate And Operate Kubernetes Clusters On Custome...   │       │        │         │                │
├───────┼─────────────────────────────────────────────────────┼───────┼────────┼─────────┼────────────────┤
│ K3S   │ K3s Is A Highly Available, Certified Kubernetes...  │ infra │ local  │ latest  │ Available      │
│       │ oads In Unattended, Resource-Constrained...         │       │        │         │                │
├───────┼─────────────────────────────────────────────────────┼───────┼────────┼─────────┼────────────────┤
│ KIND  │ Kind Is A Tool For Running Local Kubernetes...      │ infra │ local  │ v0.11.2 │ Available      │
│       │ as Primarily Designed For Testing Kubernetes...     │       │        │         │                │
│       │  Or Ci.                                             │       │        │         │                │
├───────┼─────────────────────────────────────────────────────┼───────┼────────┼─────────┼────────────────┤
│ TANZU │ Tanzu Community Edition Is A Fully-Featured...      │ infra │ hybrid │ latest  │ In Development │
│       │ ers And Users. It Is A Freely Available...          │       │        │         │                │
│       │  Of Vmware Tanzu.                                   │       │        │         │                │
└───────┴─────────────────────────────────────────────────────┴───────┴────────┴─────────┴────────────────┘

Now let's start with something fast and simple:

k3ai cluster deploy --type k3s -n mycluster

🌝 Install a plugin to do your AI experimentations

Now that the server is up and running let's type:

k3ai plugin deploy -n mlflow -t mycluster

K3ai will print the URL where you can access the MLFLow tracking server at the end of the installation. That's all! Now just start having fun with K3ai!

🌈 Push a piece of code to the AI tools and focus on your goals

Let's push some code to the AI tool (e.g. MLFlow):

k3ai run --source https://github.com/k3ai/quickstart --target mycluster --backend mlflow

Wait for the run to complete, then log in to the backend AI tool (e.g. the MLFlow UI at http://<clusterIP>:30500).
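
The quick start steps above can be chained in a single script. The sketch below is only an illustration: the `run` and `quickstart` helpers and the `DRY_RUN` variable are hypothetical names, not part of k3ai, and the k3ai commands and flags are the ones shown on this page. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the quick start; stops at the first failing step.
# $1 is the cluster name (defaults to "mycluster").
quickstart() {
    cluster="${1:-mycluster}"
    run k3ai up &&
    run k3ai cluster deploy --type k3s -n "$cluster" &&
    run k3ai plugin deploy -n mlflow -t "$cluster" &&
    run k3ai run --source https://github.com/k3ai/quickstart --target "$cluster" --backend mlflow
}

quickstart mycluster
```

Chaining with `&&` means a failed cluster deployment does not go on to install the plugin or push code.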

Current Implementation support

Operating Systems

Operating System    K3ai v1.0.0
Linux               Yes
Windows             In Progress
macOS               In Progress
Arm                 In Progress

Clusters

K8s Clusters                  K3ai v1.0.0
Rancher K3s                   Yes
VMware Tanzu Community Ed.    Yes
Amazon EKS Anywhere           Yes
KinD                          Yes

Plugins

Plugins                K3ai v1.0.0
Kubeflow Components    Yes
MLFlow                 Yes
Apache Airflow         Yes
Argo Workflows         Yes

⭐️ Project assistance

If you want to say thank you and/or support active development of the K3ai project:

Together, we can make this project better every day! 😘

⚠️ License

K3ai is free and open-source software licensed under the BSD 3-Clause license. The official logo was created by Alessandro Festa.

Comments
  • [Core] - Initial work for v3 version

    This PR is the initial work to re-write k3ai into a more flexible tool. This PR implements:

    • [x] #3
    • [x] #4
    • [x] #5
    • [ ] #6
    • [x] #7
    • [x] #8
    • [ ] #9

    This PR also include the Issues in the Plugin repo:

    • [x] https://github.com/k3ai/plugins/issues/1
    • [x] https://github.com/k3ai/plugins/issues/2
    • [x] https://github.com/k3ai/plugins/issues/3
    done 
    opened by alefesta 11
  • [BUG] k3ai up yields version `GLIBC_2.28' not found error

    Describe the bug After following the installation instructions, the following error is reported:

    k3ai: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by k3ai)

    To Reproduce Steps to reproduce the behavior:

    1. curl -LO https://get.k3ai.in | sh -
    2. k3ai up

    Expected behavior It should have spun up the cluster!

    OS: Ubuntu 18.04
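
The error message indicates that the released binary needs glibc 2.28 or newer, while Ubuntu 18.04 ships glibc 2.27. A minimal sketch of a pre-flight check, assuming a glibc-based Linux host with GNU `sort -V` available; the `version_ge` and `glibc_version` helpers are hypothetical names used only for illustration:

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# glibc_version: print the glibc version of this host (e.g. "2.27"),
# or nothing when the host does not use glibc.
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

required="2.28"   # taken from the error message above
current="$(glibc_version)"

if [ -z "$current" ]; then
    echo "could not detect glibc on this host"
elif version_ge "$current" "$required"; then
    echo "glibc $current satisfies the $required requirement"
else
    echo "glibc $current is older than $required; the prebuilt k3ai binary will not start"
fi
```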

    done bug 
    opened by htahir1 7
  • [Feature] - Running Kubeflow and MLFLow code through "One-Click" approach

    This PR addresses:

    • #10
    • #14

    It also introduces -x as Extra and -e as Entrypoint.

    Examples:

    k3ai run -s https://github.com/alefesta/sample/mlflow -b mlflow -t <clustername>
    k3ai run -s https://github.com/alefesta/sample/kfp -b kfp -e condition.py -t <clustername>
    

    For MLFlow, we still need to handle boto3, which must be listed in the conda.yaml file to run on K8s. This needs to be addressed before merging this PR. We may:

    • Try to inject boto3 into the conda.yaml file at runtime
    • Force the user to fork the example first, change the conda file, and run k3ai

    The first option seems more in line with k3ai's goal of making the user's life easier.
    done 
    opened by alefesta 6
  • [BUG] - Kubeflow Pipelines Quickstart Repository Missing

    Describe the bug I was trying to follow the kubeflow pipelines tutorial as described in the k3ai website. It seems the final step of running the pipeline fails because the quickstart repository for kubeflow pipelines does not exist.

    To Reproduce Steps to reproduce the behavior:

    1. k3ai up
    2. k3ai cluster deploy -t k3s -n myk3scluster
    3. k3ai plugin deploy -n kf-pa -t myk3scluster
    4. k3ai run -s https://github.com/k3ai/quickstart/kfp -b kfp -e condition.py -t mycluster

    Expected behavior Pipeline to run successfully.

    Actual behavior Pipeline run fails.

    done 
    opened by harshitmahapatra 5
  • [Feature] - Add support for k3d

    🚀 Is your feature request related to a problem? Please describe. Currently, k3ai doesn't have support for k3d.

    k3s is known to have issues with WSL2 deployment (systemd requirement, etc.), so it would be better to have k3d support.

    💡 Describe the solution you'd like We can add k3d support to k3ai in a subsequent release (this would require some work on pkg/io/execution).

    epic done 
    opened by burntcarrot 5
  • Fix lint issues

    Fixed 100+ issues related to ineffectual assignments and added error checks. Added golangci-lint workflow to check linting issues while pushing code.

    The log level used while error checking is Fatal (log.Fatal(err)).

    opened by burntcarrot 4
  • [BUG] - MLFlow endpoint doesn't work in WSL2

    Describe the bug

    While running the MLFlow plugin, the endpoint URI displayed by k3ai is not accessible.

    The following endpoints are not accessible:

    • http://172.29.170.187:30500/ (displayed by k3ai)
    • http://172.29.170.187:5000/
    • http://10.96.150.194:30500/
    • http://10.244.0.7:30500/

    The IP address for the WSL2 machine is (through wsl hostname -I): 172.29.170.187

    WSL2 uses dynamic IP allocation.

    To Reproduce Steps to reproduce the behavior:

    k3ai run -s https://github.com/k3ai/quickstart -b mlflow
    

    Expected behavior The MLFlow endpoint exposed through k3ai should have worked.
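
To narrow down which address actually serves the UI, the candidate addresses can be probed one by one. A minimal sketch, assuming `curl` is installed; `mlflow_url`, `probe`, and the `NODE_IP` variable are hypothetical names, and 30500 is the NodePort shown in the report above:

```shell
#!/bin/sh
# mlflow_url HOST [PORT]: build the UI address; PORT defaults to NodePort 30500.
mlflow_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-30500}"
}

# probe URL: report whether URL responds without an HTTP error within 5 seconds.
probe() {
    if curl -sf -o /dev/null --max-time 5 "$1"; then
        echo "reachable:   $1"
    else
        echo "unreachable: $1"
    fi
}

# Only probe when NODE_IP is provided by the caller,
# e.g. NODE_IP=172.29.170.187 sh probe.sh
if [ -n "${NODE_IP:-}" ]; then
    probe "$(mlflow_url "$NODE_IP")"
    probe "$(mlflow_url "$NODE_IP" 5000)"
fi
```

Running the same probe from inside WSL2 and from the Windows host helps separate a plugin problem from a WSL2 networking problem.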

    bug 
    opened by burntcarrot 4
  • [Feature] - Implement a system domain to automatically bind the plugins

    K3ai should implement an automatic system domain (e.g. sslip.io or nip.io) so that any installed plugin can be exposed with the standard pattern <plugin-name>.<clusterIP>.nip.io. This way we can use the same IP in cases like:

    • WSL
    • Laptops
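
The naming scheme proposed above can be sketched in a few lines. This only illustrates the `<plugin-name>.<clusterIP>.nip.io` pattern and is not k3ai's implementation; `plugin_host` is a hypothetical helper name. nip.io resolves a hostname that embeds an IPv4 address to that address, so no DNS setup is needed.

```shell
#!/bin/sh
# plugin_host NAME IP [DOMAIN]: print NAME.IP.DOMAIN (DOMAIN defaults to nip.io).
plugin_host() {
    printf '%s.%s.%s\n' "$1" "$2" "${3:-nip.io}"
}

plugin_host mlflow 172.29.170.187            # prints mlflow.172.29.170.187.nip.io
plugin_host mlflow 172.29.170.187 sslip.io   # prints mlflow.172.29.170.187.sslip.io
```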
    epic 
    opened by alefesta 4
  • [BUG] - runtime error with index out of range when running quickstart

    Describe the bug A clear and concise description of what the bug is.

    To Reproduce Steps to reproduce the behavior:

    1. Follow quickstart steps:
    k3ai up
    k3ai cluster deploy -t k3s -n mycluster
    k3ai plugin deploy -n mlflow -t mycluster
    
    2. Try running quickstart: $ k3ai run -s https://github.com/k3ai/quickstart -b mlflow
    3. Receive error:
    🧪	Initializing code...
    panic: runtime error: index out of range [0] with length 0
    
    goroutine 1 [running]:
    github.com/k3ai/pkg/runner.Loader({0x7fff7fe7bb4e, 0x22}, {0x0, 0x0}, {0x7fff7fe7bb74, 0x6}, {0x0, 0x0}, {0x0, 0x0})
    	/home/joshec/git/k3ai/pkg/runner/run.go:78 +0x10f6
    github.com/k3ai/cmd.runCommand.func1(0xc000403680, {0xc0003d57c0, 0x0, 0x4})
    	/home/joshec/git/k3ai/cmd/run.go:71 +0x58b
    github.com/spf13/cobra.(*Command).execute(0xc000403680, {0xc0003d5780, 0x4, 0x4})
    	/home/joshec/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
    github.com/spf13/cobra.(*Command).ExecuteC(0x2254960)
    	/home/joshec/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
    github.com/spf13/cobra.(*Command).Execute(...)
    	/home/joshec/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
    github.com/k3ai/cmd.Execute(...)
    	/home/joshec/git/k3ai/cmd/root.go:34
    main.main()
    	/home/joshec/git/k3ai/main.go:10 +0x25
    

    Expected behavior A clear and concise description of what you expected to happen.

    A successful run with proper artifact storage and tracking URI settings

    Screenshots If applicable, add screenshots to help explain your problem.

    in progress bug 
    opened by jeinstei 3
  • [CI/CD] - Add Lint support

    On running golangci-lint on my local machine, I was able to find 40+ linting issues.

    10 of them were deadcode issues, so it can be ignored as they're a part of adding code for future releases.

    The rest are ineffectual assignments and skipped error checks. We can log the error message for the skipped error checks; it would help us more in debugging.

    I know this sounds like a minor issue, but with more code coming in the subsequent releases, addressing this earlier can help us save a lot of time maintaining good quality code.

    Suggested Fix: Add golangci-lint action as workflow to check linting issues. We can add a rule for excluding deadcode issues for now.

    done 
    opened by burntcarrot 3
  • 'invalid argument'

    Hello - I am trying out the Mlflow deployment as in the tutorials and I get a stream of logs that say "invalid argument" and after a while I get "We tried to publish MLFLow at:http://172.17.0.2:30500" .. but when I go to this page there is no Mlflow server.

    Would appreciate the help. Thanks.

    Great work btw! this library is amazing!

    opened by jsnanavati 2
  • [BUG] - Kubeflow Pipelines not starting

    Describe the bug I am trying to run the kubeflow plugin on a single node 8vcpu / 16gb ram.

    To Reproduce

    curl -sfL https://get.k3ai.in | sh -
    k3ai up
    k3ai cluster deploy --type k3s -n mycluster
    k3ai plugin deploy -n kf-pa -t mycluster

    Issue Installation never ends, seems the pods are not being started correctly

    ubuntu:~$ k3s kubectl get pods -n kubeflow
    NAME                                              READY   STATUS                   RESTARTS        AGE
    workflow-controller-b7f95d6c6-q2wkf               1/1     Running                  0               4m22s
    ml-pipeline-scheduledworkflow-5c549bc5f5-drkmn    1/1     Running                  0               4m23s
    ml-pipeline-viewer-crd-7555c4d55f-fpd2m           1/1     Running                  0               4m23s
    metadata-envoy-deployment-7654b98955-rkt2g        1/1     Running                  0               4m24s
    ml-pipeline-ui-656466fdc9-qg9xv                   1/1     Running                  0               4m23s
    mysql-55778745b6-g4vbd                            1/1     Running                  0               4m22s
    minio-6d6d45469f-xgmz2                            1/1     Running                  0               4m24s
    cache-deployer-deployment-6f8ff5b986-tvwn4        1/1     Running                  0               4m24s
    metadata-grpc-deployment-5c8599b99c-b45jf         1/1     Running                  1 (3m17s ago)   4m24s
    ml-pipeline-8995b746f-dhznz                       1/1     Running                  1 (2m31s ago)   4m23s
    cache-server-74494cbf5-k956w                      0/1     Pending                  0               2m20s
    cache-server-74494cbf5-6v5lj                      0/1     ContainerStatusUnknown   0               4m24s
    ml-pipeline-persistenceagent-59689585f6-s8dhd     1/1     Running                  1 (2m5s ago)    4m23s
    ml-pipeline-visualizationserver-6b8fb8c44-mmrk8   0/1     ContainerStatusUnknown   0               4m22s
    ml-pipeline-visualizationserver-6b8fb8c44-svm25   0/1     Pending                  0               113s
    metadata-writer-fd965db48-9lw22                   0/1     Error                    0               4m24s
    metadata-writer-fd965db48-rqt7d                   0/1     Pending                  0               82s
    

    Pod metadata-writer-fd965db48-9lw22 error : message: 'The node was low on resource: ephemeral-storage. Container main was using 392Ki, which exceeds its request of 0. '

    Any ideas? Thanks!

    needs-triage bug 
    opened by tonxxd 1
  • [BUG] - postgress crashes when deploying mlflow on k3s / intel

    Describe the bug In the postgres pod: Bus error (core dumped)

    Running on:

    (base) [email protected]:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.4 LTS
    Release: 20.04
    Codename: focal

    To Reproduce

    k3ai cluster deploy --type k3s --name arrakis
    k3ai plugin deploy -n mlflow -t arrakis
    k3s kubectl logs postgres-0

    Expected behavior successful mlflow startup

    Screenshots

    (base) [email protected]:~$ kubectl get all -A
    NAMESPACE     NAME                                          READY   STATUS             RESTARTS      AGE
    kube-system   pod/local-path-provisioner-6c79684f77-996dg   1/1     Running            0             7m28s
    kube-system   pod/coredns-d76bd69b-8zqzz                    1/1     Running            0             7m28s
    kube-system   pod/metrics-server-7cd5fcb6b7-6zwwl           1/1     Running            0             7m28s
    default       pod/minio-0                                   1/1     Running            0             6m40s
    default       pod/mlflow-7c6768c4c-m6j6d                    1/1     Running            0             6m23s
    default       pod/postgres-0                                0/1     CrashLoopBackOff   6 (20s ago)   6m31s

    NAMESPACE     NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
    default       service/kubernetes         ClusterIP   10.43.0.1                     443/TCP                  7m43s
    kube-system   service/kube-dns           ClusterIP   10.43.0.10                    53/UDP,53/TCP,9153/TCP   7m40s
    kube-system   service/metrics-server     ClusterIP   10.43.39.68                   443/TCP                  7m39s
    default       service/minio-service      ClusterIP   10.43.144.140                 9000/TCP                 6m40s
    default       service/postgres-service   ClusterIP   10.43.236.158                 5432/TCP                 6m23s
    default       service/mlflow-service     NodePort    10.43.192.251                 5000:30500/TCP           6m8s

    NAMESPACE     NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    kube-system   deployment.apps/local-path-provisioner   1/1     1            1           7m40s
    kube-system   deployment.apps/coredns                  1/1     1            1           7m40s
    kube-system   deployment.apps/metrics-server           1/1     1            1           7m39s
    default       deployment.apps/mlflow                   1/1     1            1           6m23s

    NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
    kube-system   replicaset.apps/local-path-provisioner-6c79684f77   1         1         1       7m29s
    kube-system   replicaset.apps/coredns-d76bd69b                    1         1         1       7m29s
    kube-system   replicaset.apps/metrics-server-7cd5fcb6b7           1         1         1       7m29s
    default       replicaset.apps/mlflow-7c6768c4c                    1         1         1       6m23s

    NAMESPACE   NAME                        READY   AGE
    default     statefulset.apps/minio      1/1     6m40s
    default     statefulset.apps/postgres   0/1     6m31s

    (base) [email protected]:~$ kubectl logs postgres-0
    The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.

    The database cluster will be initialized with locale "en_US.utf8".
    The default database encoding has accordingly been set to "UTF8".
    The default text search configuration will be set to "english".

    Data page checksums are disabled.

    fixing permissions on existing directory /var/lib/postgresql/mlflow/data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 20
    selecting default shared_buffers ... 400kB
    selecting default timezone ... Etc/UTC
    selecting dynamic shared memory implementation ... posix
    creating configuration files ... ok

    bug todo :spiral_notepad: 
    opened by paxinos 1
  • [BUG] - Incompatible k3s version for kubeflow

    Describe the bug Kubeflow currently does not support k8s 1.22. Kubeflow Pipelines seems to work with k3s, but other components (e.g. kf-dashboard) fail, which might be related to unsupported k8s API versions.

    ...
     ⏳     Working...
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-crds/base": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
    ...
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-install/base": no matches for kind "EnvoyFilter" in version "networking.istio.io/v1alpha3"
    ...
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-install/base": no matches for kind "Gateway" in version "networking.istio.io/v1alpha3"
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-install/base": no matches for kind "AuthorizationPolicy" in version "security.istio.io/v1beta1"
    ...
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-install/base": no matches for kind "MutatingWebhookConfiguration" in version "admissionregistration.k8s.io/v1beta1"
     🚀 unable to recognize "/home/user/.k3ai/git/common/istio-1-9/istio-install/base": no matches for kind "ValidatingWebhookConfiguration" in version "admissionregistration.k8s.io/v1beta1"
    ...
    

    To Reproduce

    k3ai plugin deploy -n kf-dashboard -t myk3scluster

    Expected behavior

    Successful deployment of all kubeflow components on k3s.
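
The `no matches for kind` errors above name API versions such as `apiextensions.k8s.io/v1beta1` and `admissionregistration.k8s.io/v1beta1`, which Kubernetes removed in 1.22. A minimal sketch of a check to run before deploying, assuming `kubectl` is configured for the target cluster (on k3s, `k3s kubectl` as used elsewhere on this page); `served` is a hypothetical helper name:

```shell
#!/bin/sh
# served LIST API: succeed when API appears as a whole line in LIST
# (LIST is the newline-separated output of `kubectl api-versions`).
served() {
    printf '%s\n' "$1" | grep -qxF -- "$2"
}

# API versions named in the errors above.
needed="apiextensions.k8s.io/v1beta1
admissionregistration.k8s.io/v1beta1"

# Skip the check entirely when kubectl is not installed.
if command -v kubectl >/dev/null 2>&1; then
    versions="$(kubectl api-versions --request-timeout=5s 2>/dev/null)"
    for api in $needed; do
        if served "$versions" "$api"; then
            echo "served:     $api"
        else
            echo "not served: $api"
        fi
    done
fi
```

If either version is reported as not served, manifests that still use it will fail with the errors shown in this report.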

    in progress docs 
    opened by Adiqq 1
  • [Feature] - Clean up CLI error messages

    🚀 Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

    See Issue https://github.com/k3ai/k3ai/issues/53 regarding k3ai run not returning a useful error when a required argument was missing

    💡 Describe the solution you'd like A clear and concise description of what you want to happen.

    a better CLI handler with more descriptive errors, or at least a fix for this bug on this command's handling

    🤩 Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    None yet

    help-wanted epic todo :spiral_notepad: docs 
    opened by jeinstei 0
  • Implement Plugin remove

    Hey, great work with K3ai. It works pretty smoothly for most operations.

    After experimenting a bit with Kubeflow I wanted to remove a plugin, but it seems that the command is not implemented:

    ➜ k3ai  plugin remove --name kf-pa
    Remove a given plugin based on NAME
    
    Usage:
      k3ai[options] plugin remove [-n NAME] [other flags]
    
    Flags:
      -n, --name string     NAME of plugin to be created/deleted
      -t, --target string   Target from where to remove plugin.
      -q, --quiet           Suppress output messages. Useful when k3ai is used within scripts.
      -c, --config string   Configure K3ai using a custom config file.[-c /path/tofile] [-c https://urlToFile]
    

    See here: https://github.com/k3ai/k3ai/blob/main/cmd/plugin.go#L138

    I am not sure, whether I am missing something but couldn't find anything related in the issues or Roadmap.


    • Your operating system name and version: Ubuntu 18.04
    • Detailed steps to reproduce the bug: Follow exact steps from documentation or README to deploy a plugin
    help-wanted epic todo :spiral_notepad: 
    opened by daniel-vera-g 2
  • [Feature] - Use Github Actions to create issues for exported reports

    🚀 Is your feature request related to a problem? Please describe. Related:

    • #37

    Using the metrics report exported through the executor, we can use Github Actions workflows to create automated issues containing the reports.

    💡 Describe the solution you'd like Create issues with the exported report as the content using Github Actions.

    epic todo :spiral_notepad: 
    opened by burntcarrot 1
Releases(1.0.1)
  • 1.0.1(Dec 7, 2021)

    Full Changelog: https://github.com/k3ai/k3ai/compare/1.0...1.0.1

    What's Changed

    K3ai Features:

    • ARM support #13 @alefesta
    • Kubeflow one-click pipeline #14 @alefesta
    • Implementing GH actions (GHA) as a method to run K3ai from within the repo #19 Minimal documentation to run K3ai as GH @burntcarrot
    • Implementing a config file to mimic an e2e workflow #18 @burntcarrot
    • Add support for k3d #25 @burntcarrot

    K3ai Plugins:

    • https://github.com/k3ai/plugins/issues/6 @alefesta
    • https://github.com/k3ai/plugins/issues/7 @alefesta

    Bugs

    • [BUG] - certain plugins fail to install by @alefesta in https://github.com/k3ai/k3ai/pull/23
    • [BUG] - Fixes right tools download for Architecture by @alefesta in https://github.com/k3ai/k3ai/pull/43
    • [BUG] - minor fixes on download tools by @alefesta in https://github.com/k3ai/k3ai/pull/44
    • [BUG] - Fixes on Civo CLI for ARM by @alefesta in https://github.com/k3ai/k3ai/pull/45

    New Contributors

    • @burntcarrot made their first contribution in https://github.com/k3ai/k3ai/pull/27
    Source code(tar.gz)
    Source code(zip)
    k3ai(43.51 MB)
    k3ai.arm64(40.75 MB)
    k3ai.darwin.amd64(32.96 MB)
  • 1.0(Nov 1, 2021)

    Full Changelog: https://github.com/k3ai/k3ai/commits/1.0

    What's Changed

    • [Core] - Initial work for v1.0.0 version by @alefesta in https://github.com/k3ai/k3ai/pull/2

    New Contributors

    • @alefesta made their first contribution in https://github.com/k3ai/k3ai/pull/2

    Full Changelog:

    • Introducing K3ai DB to manage clusters and plugins dynamically
    • Introducing new CLI logic : K3ai [COMMAND] [ACTION] [OPTIONS]
    • Introducing the One Click experience to run training over deployed plugins.

    Current Operating Systems supported

    • Linux x64
    • macOS (Not Tested)

    Have fun with K3ai
    Source code(tar.gz)
    Source code(zip)
    k3ai(30.78 MB)