🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Overview

thinc-apple-ops

Make spaCy and Thinc up to 8× faster on macOS by calling into Apple's native libraries.

Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops
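After installing, it can be useful to check which ops backend Thinc actually picks up. The following is a minimal sketch, not an official verification procedure: `describe_thinc_backend` is a hypothetical helper name, and whether the Apple ops are selected automatically depends on the installed Thinc version. The only Thinc call used is `get_current_ops` from `thinc.api`.

```python
import importlib.util


def describe_thinc_backend() -> str:
    """Return a short description of the Thinc ops backend currently in use.

    Hypothetical helper for illustration; it degrades gracefully when
    Thinc itself is not installed.
    """
    if importlib.util.find_spec("thinc") is None:
        return "thinc is not installed"

    from thinc.api import get_current_ops

    ops = get_current_ops()
    # Only checks that the package is importable, not that it is active.
    has_apple_ops = importlib.util.find_spec("thinc_apple_ops") is not None
    return (
        f"ops class: {type(ops).__name__}, "
        f"thinc-apple-ops installed: {has_apple_ops}"
    )


print(describe_thinc_backend())
```

If the reported ops class is still the default NumPy-based one even though the package is installed, upgrading Thinc is a reasonable first thing to try.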

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind the scenes is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallelism and can take factors such as cache sizes and instruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more separate matrix co-processors called AMX. Since AMX is not well documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit, and there is empirical evidence that both performance clusters of the M1 Pro/Max have an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement for the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).
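To make the role of gemm concrete, here is a minimal pure-Python sketch of what a gemm call with optional transposition computes. It is a slow reference for the semantics only, not how BLIS or Accelerate implement it. The names `reference_gemm`, `transpose` and `Matrix` are hypothetical; the `trans1` and `trans2` keywords mirror the ones visible in the `ops.gemm` call in the traceback further down.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def reference_gemm(
    a: Matrix, b: Matrix, trans1: bool = False, trans2: bool = False
) -> Matrix:
    """Compute op(a) @ op(b), where op optionally transposes its argument.

    Illustration only: an optimized kernel computes the same result
    with tuned, architecture-specific code.
    """
    if trans1:
        a = transpose(a)
    if trans2:
        b = transpose(b)
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    n_cols = len(b[0]) if b else 0
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
        for row in a
    ]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[5.0, 6.0], [7.0, 8.0]]
# Same call shape as a dense layer: inputs times transposed weights.
print(reference_gemm(x, w, trans2=True))  # [[17.0, 23.0], [39.0, 53.0]]
```

The sketch also returns an empty result for an input with zero rows, which is the kind of edge case an optimized wrapper has to handle explicitly.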

Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchmark compares the prediction speed of the de_core_news_lg spaCy model on the M1 with and without thinc-apple-ops. Results for an Intel MacBook Air and an AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops makes prediction on the M1 about 4.3 times faster.

| CPU | BLIS | thinc-apple-ops | Package power (Watt) |
| --- | --- | --- | --- |
| Mac Mini (M1) | 6492 | 27676 | 5 |
| MacBook Air Core i5 2020 | 9790 | 10983 | 9 |
| AMD Ryzen 5900X | 22568 | N/A | 52 |

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops improves training speed on the M1 by a factor of 3.0.

| CPU | BLIS | thinc-apple-ops | Package power (Watt) |
| --- | --- | --- | --- |
| Mac Mini M1 2020 | 3.34 | 10.07 | 5 |
| MacBook Air Core i5 2020 | 3.10 | 3.27 | 10 |
| AMD Ryzen 5900X | 6.53 | N/A | 53 |
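The speedup factors quoted for the two benchmarks can be recomputed from the table figures. The short sketch below does that, and also derives throughput per watt. That second figure is not part of the original results: it is a rough derived metric, and package power is only one part of total system power. The names `speedup` and `report` are hypothetical helpers.

```python
# Figures copied from the two benchmark tables above.
# Each entry: (BLIS, thinc-apple-ops or None if N/A, package power in watts)
PREDICTION_WPS = {
    "Mac Mini (M1)": (6492, 27676, 5),
    "MacBook Air Core i5 2020": (9790, 10983, 9),
    "AMD Ryzen 5900X": (22568, None, 52),
}
TRAINING_ITS = {
    "Mac Mini M1 2020": (3.34, 10.07, 5),
    "MacBook Air Core i5 2020": (3.10, 3.27, 10),
    "AMD Ryzen 5900X": (6.53, None, 53),
}


def speedup(baseline: float, accelerated: float) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    return accelerated / baseline


def report(title: str, table: dict, unit: str) -> None:
    """Print the best throughput, throughput per watt, and speedup per CPU."""
    print(title)
    for cpu, (blis, apple, watts) in table.items():
        best = blis if apple is None else apple
        line = f"  {cpu}: {best} {unit}, {best / watts:.1f} {unit} per watt"
        if apple is not None:
            line += f", speedup {speedup(blis, apple):.1f}x"
        print(line)


report("Prediction", PREDICTION_WPS, "words/s")
report("Training", TRAINING_ITS, "iterations/s")
```

For the M1 rows this gives speedups that round to 4.3 for prediction and 3.0 for training, matching the figures stated in the text.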
Comments
  • Pass through Accelerate sgemm/saxpy in Ops.cblas

    This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.

    I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

    opened by danieldk 5
  • IndexError: Out of bounds on buffer access (axis 1)

    Hi I tried to use this awesome package and I am getting this error. Not sure what it means, maybe you guys could help me?

    I should mention that my data is quite big and I am also using some SWAP space. Could this be the reason of this error?

    [2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
    [2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
    [2021-09-28 21:09:01,505] [INFO] Created vocabulary
    [2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
    Traceback (most recent call last):
      File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
        sys.exit(setup_cli())
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
        return callback(**use_params)  # type: ignore
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
        nlp = init_nlp(config, use_gpu=use_gpu)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
        nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
        proc.initialize(get_examples, nlp=self, **p_settings)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
        self.model.initialize(X=doc_sample)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
        layer.initialize(X=curr_input, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
        curr_input = layer.predict(curr_input)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
        return self._func(self, X, is_train=False)[0]
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
        return self._func(self, X, is_train=is_train)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
        vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
        C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
      File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
      File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
    IndexError: Out of bounds on buffer access (axis 1)
    

    Info about spaCy

    • spaCy version: 3.1.3
    • Platform: macOS-11.6-arm64-arm-64bit
    • Python version: 3.9.7
    • Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)
    opened by Joozty 2
  • Can't compile thinc on Macbook Air M1

    Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

    I am on MacOS 12.1, Kernel Version 21.2.0, and have installed the latest Python (3.10.2)

    Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate Libraries, especially Accelerate.h Header ...):

    ERROR: Command errored out with exit status 1:
    command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh
    cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340
    Complete output (34 lines):
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.9-universal2-3.10
    creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
    copying thinc_apple_ops/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
    copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
    creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
    copying thinc_apple_ops/tests/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
    copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
    running egg_info
    warning: no files found matching '.pxd' under directory 'thinc_apple_ops'
    warning: no files found matching '.txt' under directory 'thinc_apple_ops'
    writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt'
    copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
    copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
    running build_ext
    creating build/temp.macosx-10.9-universal2-3.10
    creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops
    clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o
    In file included from thinc_apple_ops/blas.c:706:
    In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5:
    In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
    In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960:
    /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
    #warning "Using deprecated NumPy API, disable it with "
     ^
    thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found
    #include "Accelerate/Accelerate.h"
             ^~~~~~~~~~~~~~~~~~~~~~~~~
    thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks')
    1 warning and 1 error generated.
    error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

    ERROR: Failed building wheel for thinc-apple-ops
    Failed to build thinc-apple-ops
    ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

    ------------------------------------------ END---------------------------------------------------------------------------

    Any help would be greatly appreciated, thanks!

    duplicate 
    opened by amal1us 1
  • AppleOps.gemm: write in-place when `output` is given

    NumpyOps.gemm (with BLIS) writes the result of matrix multiplication in-place when the output argument is given. This changes AppleOps.gemm to do the same, avoiding allocation of a temporary.

    enhancement 
    opened by danieldk 0
  • Change thinc upper bound to <8.1.0

    thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

    Also bump the version to v0.0.7 to prepare for the release.

    opened by danieldk 0
  • Fix 0-size arrays

    Our bit of Cython code uses memory buffers, which apparently have a bounds-check when the size is 0 when acquiring the pointer. In contrast, in other bits of code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to IndexError being raised when zero shapes were passed through.

    opened by honnibal 0
  • Require thinc with ops registry

    Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, then it's better to require the version of thinc to be upgraded so that it's detected and used.

    opened by adrianeboyd 0
Releases(v0.1.3)
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing