Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models

Overview

KalidoKit - Face, Pose, and Hand Tracking Kinematics

Kalidokit Template

Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models, compatible with Facemesh, Blazepose, Handpose, and Holistic. It takes predicted 3D landmarks and calculates simple euler rotations and blendshape face values.

As the core to Vtuber web apps, Kalidoface and Kalidoface 3D, KalidoKit is designed specifically for rigging 3D VRM models and Live2D avatars!

Kalidokit Template

ko-fi

Install

Via NPM

npm install kalidokit
import * as Kalidokit from "kalidokit";

// or only import the class you need

import { Face, Pose, Hand } from "kalidokit";

Via CDN

">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/kalidokit.umd.js"></script>

Methods

Kalidokit is composed of 3 classes for Face, Pose, and Hand calculations. They accept landmark outputs from models like Facemesh, Blazepose, Handpose, and Holistic.

// Accepts an array(468 or 478 with iris tracking) of vectors
Kalidokit.Face.solve(facelandmarkArray, {
    runtime: "tfjs", // `mediapipe` or `tfjs`
    video: HTMLVideoElement,
    imageSize: { height: 0, width: 0 },
    smoothBlink: false, // smooth left and right eye blink delays
    blinkSettings: [0.25, 0.75], // adjust upper and lower bound blink sensitivity
});

// Accepts arrays(33) of Pose keypoints and 3D Pose keypoints
Kalidokit.Pose.solve(poseWorld3DArray, poseLandmarkArray, {
    runtime: "tfjs", // `mediapipe` or `tfjs`
    video: HTMLVideoElement,
    imageSize: { height: 0, width: 0 },
    enableLegs: true,
});

// Accepts array(21) of hand landmark vectors; specify 'Right' or 'Left' side
Kalidokit.Hand.solve(handLandmarkArray, "Right");

// Using exported classes directly
Face.solve(facelandmarkArray);
Pose.solve(poseWorld3DArray, poseLandmarkArray);
Hand.solve(handLandmarkArray, "Right");

Additional Utils

// Stabilizes left/right blink delays + wink by providing blenshapes and head rotation
Kalidokit.Face.stabilizeBlink(
    { r: 0, l: 1 }, // left and right eye blendshape values
    headRotationY, // head rotation in radians
    {
        noWink = false, // disables winking
        maxRot = 0.5 // max head rotation in radians before interpolating obscured eyes
    });

// The internal vector math class
Kalidokit.Vector();

Remixable VRM Template with KalidoKit

Quick-start your Vtuber app with this simple remixable example on Glitch. Face, full-body, and hand tracking in under 350 lines of javascript. This demo uses Mediapipe Holistic for body tracking, Three.js + Three-VRM for rendering models, and KalidoKit for the kinematic calculations. This demo uses a minimal amount of easing to smooth animations, but feel free to make it your own!

Remix on Glitch

Basic Usage

Kalidokit Template

The implementation may vary depending on what pose and face detection model you choose to use, but the principle is still the same. This example uses Mediapipe Holistic which concisely combines them together.

{ await holistic.send({image: HTMLVideoElement}); }, width: 640, height: 480 }); camera.start(); ">
import * as Kalidokit from 'kalidokit'
import '@mediapipe/holistic/holistic';
import '@mediapipe/camera_utils/camera_utils';

let holistic = new Holistic({locateFile: (file) => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/[email protected]/${file}`;
}});

holistic.onResults(results=>{
    // do something with prediction results
    // landmark names may change depending on TFJS/Mediapipe model version
    let facelm = results.faceLandmarks;
    let poselm = results.poseLandmarks;
    let poselm3D = results.ea;
    let rightHandlm = results.rightHandLandmarks;
    let leftHandlm = results.leftHandLandmarks;

    let faceRig = Kalidokit.Face.solve(facelm,{runtime:'mediapipe',video:HTMLVideoElement})
    let poseRig = Kalidokit.Pose.solve(poselm3d,poselm,{runtime:'mediapipe',video:HTMLVideoElement})
    let rightHandRig = Kalidokit.Hand.solve(rightHandlm,"Right")
    let leftHandRig = Kalidokit.Hand.solve(leftHandlm,"Left")

    };
});

// use Mediapipe's webcam utils to send video to holistic every frame
const camera = new Camera(HTMLVideoElement, {
  onFrame: async () => {
    await holistic.send({image: HTMLVideoElement});
  },
  width: 640,
  height: 480
});
camera.start();

Slight differences with Mediapipe and Tensorflow.js

Due to slight differences in the results from Mediapipe and Tensorflow.js, it is recommended to specify which runtime version you are using as well as the video input/image size as a reference.

Kalidokit.Pose.solve(poselm3D,poselm,{
    runtime:'tfjs', // default is 'mediapipe'
    video: HTMLVideoElement,// specify an html video or manually set image size
    imageSize:{
        width: 640,
        height: 480,
    };
})

Kalidokit.Face.solve(facelm,{
    runtime:'mediapipe', // default is 'tfjs'
    video: HTMLVideoElement,// specify an html video or manually set image size
    imageSize:{
        width: 640,
        height: 480,
    };
})

Outputs

Below are the expected results from KalidoKit solvers.

// Kalidokit.Face.solve()
// Head rotations in radians
// Degrees and normalized rotations also available
{
    eye: {l: 1,r: 1},
    mouth: {
        x: 0,
        y: 0,
        shape: {A:0, E:0, I:0, O:0, U:0}
    },
    head: {
        x: 0,
        y: 0,
        z: 0,
        width: 0.3,
        height: 0.6,
        position: {x: 0.5, y: 0.5, z: 0}
    },
    brow: 0,
    pupil: {x: 0, y: 0}
}
// Kalidokit.Pose.solve()
// Joint rotations in radians, leg calculators are a WIP
{
    RightUpperArm: {x: 0, y: 0, z: -1.25},
    LeftUpperArm: {x: 0, y: 0, z: 1.25},
    RightLowerArm: {x: 0, y: 0, z: 0},
    LeftLowerArm: {x: 0, y: 0, z: 0},
    LeftUpperLeg: {x: 0, y: 0, z: 0},
    RightUpperLeg: {x: 0, y: 0, z: 0},
    RightLowerLeg: {x: 0, y: 0, z: 0},
    LeftLowerLeg: {x: 0, y: 0, z: 0},
    LeftHand: {x: 0, y: 0, z: 0},
    RightHand: {x: 0, y: 0, z: 0},
    Spine: {x: 0, y: 0, z: 0},
    Hips: {
        worldPosition: {x: 0, y: 0, z: 0},
        position: {x: 0, y: 0, z: 0},
        rotation: {x: 0, y: 0, z: 0},
    }
}
// Kalidokit.Hand.solve()
// Joint rotations in radians
// only wrist and thumb have 3 degrees of freedom
// all other finger joints move in the Z axis only
{
    RightWrist: {x: -0.13, y: -0.07, z: -1.04},
    RightRingProximal: {x: 0, y: 0, z: -0.13},
    RightRingIntermediate: {x: 0, y: 0, z: -0.4},
    RightRingDistal: {x: 0, y: 0, z: -0.04},
    RightIndexProximal: {x: 0, y: 0, z: -0.24},
    RightIndexIntermediate: {x: 0, y: 0, z: -0.25},
    RightIndexDistal: {x: 0, y: 0, z: -0.06},
    RightMiddleProximal: {x: 0, y: 0, z: -0.09},
    RightMiddleIntermediate: {x: 0, y: 0, z: -0.44},
    RightMiddleDistal: {x: 0, y: 0, z: -0.06},
    RightThumbProximal: {x: -0.23, y: -0.33, z: -0.12},
    RightThumbIntermediate: {x: -0.2, y: -0.19, z: -0.01},
    RightThumbDistal: {x: -0.2, y: 0.002, z: 0.15},
    RightLittleProximal: {x: 0, y: 0, z: -0.09},
    RightLittleIntermediate: {x: 0, y: 0, z: -0.22},
    RightLittleDistal: {x: 0, y: 0, z: -0.1}
}

Community Showcase

If you'd like to share a creative use of KalidoKit, we would love to hear about it! Feel free to also use our Twitter hashtag, #kalidokit.

Kalidoface virtual webcam Kalidoface Pose Demo

Open to Contributions

The current library is a work in progress and contributions to improve it are very welcome. Our goal is to make character face and pose animation even more accessible to creatives regardless of skill level!

Owner
Rich
Making Vtuber apps with Mediapipe and Tensorflow.js
Rich
Cross-media Structured Common Space for Multimedia Event Extraction (ACL2020)

Cross-media Structured Common Space for Multimedia Event Extraction Table of Contents Overview Requirements Data Quickstart Citation Overview The code

Manling Li 49 Nov 21, 2022
A machine learning malware analysis framework for Android apps.

🕵️ A machine learning malware analysis framework for Android apps. ☢️ DroidDetective is a Python tool for analysing Android applications (APKs) for p

James Stevenson 77 Dec 27, 2022
PyTorch implementation of "Learn to Dance with AIST++: Music Conditioned 3D Dance Generation."

Learn to Dance with AIST++: Music Conditioned 3D Dance Generation. Installation pip install -r requirements.txt Prepare Dataset bash data/scripts/pre

Zj Li 8 Sep 07, 2021
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee

NVIDIA Research Projects 130 Jan 06, 2023
Like Dirt-Samples, but cleaned up

Clean-Samples Like Dirt-Samples, but cleaned up, with clear provenance and license info (generally a permissive creative commons licence but check the

TidalCycles 39 Nov 30, 2022
Revisiting Self-Training for Few-Shot Learning of Language Model.

SFLM This is the implementation of the paper Revisiting Self-Training for Few-Shot Learning of Language Model. SFLM is short for self-training for few

15 Nov 19, 2022
General Vision Benchmark, a project from OpenGVLab

Introduction We build GV-B(General Vision Benchmark) on Classification, Detection, Segmentation and Depth Estimation including 26 datasets for model e

174 Dec 27, 2022
DLFlow is a deep learning framework.

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

DiDi 152 Oct 27, 2022
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)

GraspNet Baseline Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020). [paper] [dataset] [API] [do

GraspNet 209 Dec 29, 2022
Simulate genealogical trees and genomic sequence data using population genetic models

msprime msprime is a population genetics simulator based on tskit. Msprime can simulate random ancestral histories for a sample of individuals (consis

Tskit developers 150 Dec 14, 2022
Omniscient Video Super-Resolution

Omniscient Video Super-Resolution This is the official code of OVSR (Omniscient Video Super-Resolution, ICCV 2021). This work is based on PFNL. Datase

36 Oct 27, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 05, 2022
Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

Path-Generator-QA This is a Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Common

Peifeng Wang 33 Dec 05, 2022
Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation Bowen Cheng, Alexander G. Schwing, Alexander Kirillov [arXiv] [Proj

Facebook Research 1k Jan 08, 2023
TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform

TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform

2.6k Jan 04, 2023
The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face

GUESS WHO Main Links: [Github] [App] Related Links: [CLIP] [Celeba] The aim of the game, as in the original one, is to find a specific image from a gr

Arnau - DIMAI 3 Jan 04, 2022
Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)

This repository contains tools to simulate the ground filtering process of a registered point cloud. The repository contains two filtering methods. The first method uses a normal vector, and fit to p

5 Aug 25, 2022
AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

paddle-wechaty-Zodiac AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人 12星座若穿越科幻剧,会拥有什么超能力呢?快来迎接你的专属超能力吧! 现在很多年轻人都喜欢看科幻剧,像是复仇者系列,里面有很多英雄、超

105 Dec 22, 2022
Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

The official code for the NeurIPS 2021 paper Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

13 Dec 22, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022