Apple Silicon Machines
With their Apple Silicon chips, Apple machines have moved to an arm64 architecture (from the x86-64 of the Intel ones).
While the new chips are a great upgrade over their Intel counterparts, the new architecture involves some caveats and requires most pieces of software to be ported before they can run natively.
Thankfully, all programs built for the retired Intel x86-64 chips can be run through Rosetta 2, a translation layer provided by Apple.
The following link is a useful resource to see which apps and pieces of software are optimized for Apple Silicon devices: https://isapplesiliconready.com/.
Rosetta 2
Using the Rosetta 2 translation layer is straightforward: when running a program requiring translation for the first time, the user is prompted to install Rosetta 2, and no further action is ever required.
Alternatively, it is possible to install it manually from the command line with:
softwareupdate --install-rosetta
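To check whether the current process is being translated, one can query the `sysctl.proc_translated` key that Apple exposes for this purpose. A minimal sketch, written so it also runs harmlessly on non-macOS systems (where the key does not exist):

```python
import platform
import subprocess

def running_under_rosetta() -> bool:
    """Return True when the current process is translated by Rosetta 2."""
    if platform.system() != "Darwin":
        return False  # Rosetta 2 only exists on macOS
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        )
        # "1" means translated, "0" means native
        return result.stdout.strip() == "1"
    except subprocess.CalledProcessError:
        return False  # the key does not exist on Intel Macs

print(running_under_rosetta())
```

Running this from an x86-64 Python interpreter on an Apple Silicon machine should print `True`, while a native arm64 interpreter prints `False`.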
Python Setup
The conda-forge people have put a lot of work into making cross-compilation of packages to various target architectures (as) automatic (as possible).
The easiest way to handle your Python virtual environments is to use miniforge, which is essentially a miniconda-like distribution pre-configured to use the conda-forge channel, with support for various CPU architectures.
Python Virtual Environments
Don't know what a virtual environment is or what purpose it serves? Here is a good (but lengthy) primer on virtual environments by RealPython.
The GitHub page of Miniforge contains download links to their installers for various configurations (conda/mamba, CPython/PyPy, amd64/arm64/macos-arm64, etc.).
I personally recommend the mambaforge distributions, as the mamba tool is an amazing alternative to conda that one would otherwise be missing out on.
Intel-Type Python Environments
In order to install Python packages that do not run natively under Apple Silicon, one might need to create a virtual environment with an x86-64 architecture.
To do so, set the CONDA_SUBDIR environment variable to osx-64 before running the environment creation command.
For instance, to create a new Python 3.9 virtual environment named example with the x86-64 architecture, do:
CONDA_SUBDIR=osx-64 conda create -n example python=3.9
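One can confirm which architecture a given interpreter reports (and hence whether it will run through Rosetta 2) from within Python itself:

```python
import platform

# In an osx-64 environment running through Rosetta 2 this reports "x86_64";
# in a native Apple Silicon environment it reports "arm64".
print(platform.machine())
```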
As the whole environment runs through Rosetta 2, do not be surprised if its first startup (call to Python) is a little slow. Subsequent startups will be much faster, as the translations are cached.
MAD-X & cpymad
MAD-X does not provide native builds for Apple Silicon; however, one can perfectly well install the madx-macosx64-gnu version and have it run through Rosetta 2.
The performance difference is negligible in practice.
As MAD-X does not run natively on Apple Silicon chips, neither does cpymad.
As a consequence, in order to install cpymad on Apple Silicon, one needs to create an osx-64-type architecture environment in which to install the package.
For instructions on how to do so, see the Intel-Type Python Environments tooltip in the Python Setup section.
TensorFlow on Silicon GPU
In version 2.5, TensorFlow introduced the PluggableDevice plugin API, which Apple has used to provide a plugin making tensorflow aware of the GPU available on Apple Silicon chips.
If one has a miniforge or mambaforge setup as instructed in the Python Setup section above, creating a Python 3.9 environment to make use of tensorflow natively, running on the Apple Silicon GPU, is as simple as:
conda create -n tensorflow python=3.9 --yes
conda activate tensorflow
conda install -c apple tensorflow-deps tensorflow --yes
python -m pip install --upgrade tensorflow-metal # PluggableDevice plugin
That's it! There is nothing more to do: TensorFlow will automatically detect the GPU and use it for computations.
Verifying the Install
One can test that the installation has made tensorflow aware of the Apple Silicon GPU by running the following script:
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Build a small convolutional network
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
model.summary()

# Load the MNIST data and normalize pixel values to [0, 1]
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Train for a few epochs, then evaluate on the held-out test set
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=5, batch_size=64)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)
When running the script, one should see the following line logged, confirming that tensorflow is indeed using the Apple Silicon GPU through the Metal backend (the device is named METAL in the log):
tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Note that the device is detected with 0 MB memory, which is normal: the Apple Silicon GPU shares unified memory with the CPU and does not have dedicated device memory.
Additionally, one can open the Activity Monitor during model training and confirm that the Python process shows a high percentage value in the GPU column.
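Alternatively, one can query the devices tensorflow sees directly. The snippet below guards the import so it degrades gracefully on a machine where tensorflow (or the Metal plugin) is not installed:

```python
# List the physical GPU devices tensorflow can see. On an Apple Silicon
# machine with tensorflow-metal installed, this should contain one entry
# (a PluggableDevice backed by Metal).
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    gpus = []  # tensorflow is not installed in this interpreter
print(gpus)
```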
PyTorch on Silicon GPU
Since version 1.12, PyTorch includes native support for Apple Silicon GPUs through Apple's Metal Performance Shaders (MPS) framework.
If one has a miniforge or mambaforge setup as instructed in the Python Setup section above, creating a Python 3.9 environment to make use of pytorch natively, running on the Apple Silicon GPU, is as simple as:
conda create -n pytorch python=3.9 --yes
conda activate pytorch
python -m pip install torch torchvision torchaudio
Making PyTorch use the GPU
The PyTorch integration with Metal is not as seamless as the TensorFlow one: just like with any other accelerator, PyTorch requires you to explicitly set the device for calculations.
To use the Apple Silicon GPU, one has to specify the device as mps (Metal Performance Shaders), either in the torch.device constructor or when creating tensors:
import torch
gpu = torch.device("mps")
x = torch.ones(5, device=gpu)
# or alternatively: x = torch.ones(5, device="mps")
All following calculations will be done on the GPU:
y = x ** 2
y # >> will print tensor([1., 1., 1., 1., 1.], device='mps:0')
One can easily load pre-existing models and transfer them to the GPU for inference:
model = YourFavoriteNet()
model.to(gpu) # the torch.device("mps") defined above
# Now every call runs on the GPU
predictions = model(inputs)
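Since the mps device only exists on Apple Silicon builds of PyTorch, a portable script can select it conditionally. A minimal sketch, guarding the import so the snippet also runs where torch is absent:

```python
# Select the Apple Silicon GPU when available, falling back to the CPU,
# so the same script runs unchanged on any machine.
try:
    import torch
    use_mps = torch.backends.mps.is_available()
    device = torch.device("mps" if use_mps else "cpu")
except ImportError:
    device = "cpu"  # torch is not installed in this interpreter
print(device)
```

Tensors and models created with `device=device` will then transparently land on the GPU on Apple Silicon and on the CPU everywhere else.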
Docker
Starting with Docker Desktop 4.3.0, the application can run natively on Apple Silicon chips.
Image Platform Warning
One might encounter the following warning when running images:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
To circumvent this, make sure to provide the --platform flag when running the image, with the appropriate value linux/amd64.
For instance, running the quickstart example from the Docker documentation would then be:
docker run --platform linux/amd64 -dp 80:80 docker/getting-started
Click here to get to the BE-ABP Docker image made by Guido.
Another good starting point is to have a look at the AccPy images.
Here is a guide provided by the AccPy maintainers.
AFS
The Auristor file system works on Apple Silicon and enables one to access AFS.
A step-by-step guide is available at the following link.
Here Be Demons!
The installation procedure for Auristor requires you to lower the security protections of macOS.
While this is in practice an acceptable thing to do, please DO NOT DO SO unless you know your system well and know exactly what you are doing.