In my last post, I looked at how to install TensorFlow optimized for Apple Silicon. This time around, I’ll explore Apple Silicon support in PyTorch, another wildly popular library for machine learning.
Setting up Callisto for PyTorch is easy! The suggested pip command is

pip install torch torchvision torchaudio

and we can run it directly in the Callisto package manager. Remember, you can install multiple packages at a time with a space-separated list, so paste torch torchvision torchaudio into the install field and away we go!
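Once the install finishes, a quick sanity check in a notebook cell confirms that this build of PyTorch includes the MPS backend and that the Metal device is available:

```python
import torch

# Was the MPS (Metal) backend compiled into this build of PyTorch?
print("MPS built:    ", torch.backends.mps.is_built())

# Is the Metal device actually available at runtime?
print("MPS available:", torch.backends.mps.is_available())
```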
I was looking for a little example to run so I could compare the performance of PyTorch on the Apple Silicon CPU with its performance on the GPU. To be quite honest, it was difficult to find a straightforward example. Fortunately, I ran across this notebook by Daniel Bourke. Daniel works through an example that trains a model on both the CPU device and the MPS device. MPS is the Metal Performance Shaders backend, which uses Apple’s Metal framework to harness the power of the M1’s graphics hardware. In his example, he creates a Convolutional Neural Network (CNN) for image classification and compares the performance of the CPU and MPS backends.
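The device-switching pattern he uses is the same for any PyTorch model: pick a device string, then move the model and the data to it. Here’s a tiny stand-in (not his actual CNN, just an illustrative sketch) that runs a forward pass on whichever device is available:

```python
import torch
from torch import nn

# Pick the Apple GPU via Metal if it's available, otherwise fall back to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# A toy convolutional model standing in for the CNN in Daniel's notebook.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
).to(device)

# A fake batch of eight 32x32 RGB images, created directly on the chosen device.
images = torch.randn(8, 3, 32, 32, device=device)
logits = model(images)
print(logits.shape, "computed on", device)
```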
The bottom line? MPS is at least 10x faster than using the CPU. In Daniel’s posted notebook, he saw a speedup of around 10.6x. On my machine, I saw a performance increase of about 11.1x. The best part is that taking advantage of the GPU doesn’t require any extra work: on Mac, the MPS backend ships with the standard PyTorch install, so all you have to do is select the mps device and everyone benefits from the performance boost.
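Daniel’s notebook is worth running end to end, but if you just want a quick feel for the difference, here’s a minimal sketch of my own (a toy matrix-multiplication benchmark, not his CNN training loop) that times the same work on the cpu and mps devices. The matrix size and iteration count are arbitrary:

```python
import time
import torch

def benchmark(device: str, size: int = 4096, iters: int = 20) -> float:
    """Time a batch of matrix multiplications on the given device."""
    x = torch.randn(size, size, device=device)
    y = torch.randn(size, size, device=device)

    # Warm-up so one-time setup costs don't skew the measurement.
    torch.mm(x, y)
    if device == "mps":
        torch.mps.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        torch.mm(x, y)
    if device == "mps":
        # MPS work is queued asynchronously; wait for it to finish before stopping the clock.
        torch.mps.synchronize()
    return time.perf_counter() - start

cpu_time = benchmark("cpu")
if torch.backends.mps.is_available():
    mps_time = benchmark("mps")
    print(f"CPU: {cpu_time:.2f}s  MPS: {mps_time:.2f}s  speedup: {cpu_time / mps_time:.1f}x")
else:
    print(f"CPU: {cpu_time:.2f}s (MPS not available)")
```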
In addition to TensorFlow and PyTorch, I checked some other popular Python ML libraries to see how they take advantage of Apple Silicon. While some libraries have chosen not to pursue Apple Silicon-specific optimization, all of them run correctly in CPU mode.
- Keras
  - Built on TensorFlow, Keras should show significant performance improvements when you use an optimized version of TensorFlow.
- fastai
  - Built on PyTorch, fastai should show significant performance improvements when you use an optimized version of PyTorch.
- Scikit-learn
  - To avoid the management overhead and complexity, scikit-learn doesn’t support GPU acceleration.
- NumPy
  - It may be possible to improve performance in numpy by compiling it against an optimized BLAS library that uses Apple’s Accelerate framework. The Accelerate framework provides high-performance, vector-optimized mathematical functions tuned for Apple Silicon. This is a bit involved and will require more research to see what impact it can have. (The snippet after this list shows a quick way to check which BLAS your current numpy build uses.)
- XGBoost
  - XGBoost seems to be focused on GPUs that support CUDA for hardware acceleration and currently has no plans to support Apple Silicon.
- Numba
  - Numba also seems to focus only on CUDA-based GPU acceleration.
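On the numpy point above: you can check which BLAS library your current numpy build is linked against right from a notebook cell, and a build that uses Accelerate will show up in the output:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build was compiled against.
np.show_config()
```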