I was helping a few computer science students and enthusiasts understand “how” modern processors got to be “so fast” outside of clock speed increases.
Here is the main excerpt:
SIMD: Single Instruction, Multiple Data
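To make that concrete before the excerpt, here is a minimal sketch of my own (not part of the linked piece): the same element-wise add written as a plain scalar loop and again with SSE intrinsics, where a single instruction works on four packed floats at once. AVX and AVX-512 widen the same idea to eight and sixteen floats per instruction.

```c
/* Minimal SIMD illustration: scalar loop vs. SSE intrinsics. */
#include <immintrin.h>
#include <stdio.h>

/* One addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Four additions per loop iteration: each __m128 holds 4 packed floats,
 * and _mm_add_ps adds all four lanes with a single instruction.
 * (n is assumed to be a multiple of 4 for brevity.) */
void add_sse(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float out[8];

    add_scalar(a, b, out, 8);   /* 8 loop iterations */
    add_sse(a, b, out, 8);      /* 2 loop iterations, same result */

    for (int i = 0; i < 8; i++)
        printf("%.0f ", out[i]); /* prints: 9 9 9 9 9 9 9 9 */
    printf("\n");
    return 0;
}
```

SSE is part of the x86_64 baseline, so `cc -O2 simd_demo.c` builds this as-is; a modern compiler will often auto-vectorize the scalar loop too, which is exactly the transformation the intrinsics spell out by hand.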
Original M1 Piece here: https://gist.github.com/FCLC/6e0f0e79e9d4f5740573f09d7579eb72
No system exists in a vacuum, so as a follow-up to the M1 cluster, I thought I'd look at a similar cluster based on another integrated ARM device.
Oracle typically builds a Raspberry Pi cluster every few years. Their most recent unit, built from 1,060 Raspberry Pi 3B+ boards, is an interesting piece of tech. Another is the 750-Pi cluster built by LANL. But Pi clusters seem like the domain of Jeff Geerling and co., so let's look at something else. The most popular developer board is the Nvidia Jetson series, and the most powerful unit is the latest Orin AGX 64GB.
During the January 13th, 2023 HPC Huddle (now hosted by hpc.social), the topic of #HPC development and workloads on Apple silicon came up briefly.
Thinking on it, once #Asahi Linux has GPU compute support squared away, I can see a world where devices like the Mac Studio with M1 Ultra are augmented by Thunderbolt 4 networking cards. Even if it is for PR, vendors like Oracle, amongst others, have demonstrated a willingness to build weird and wonderful clusters as a "because we can." It is far from ideal, but we have done worse to get less. Beyond Oracle and the Pi cluster, the US DOD/Air Force ran a PS3 cluster for years: https://phys.org/news/2010-12-air-playstation-3s-supercomputer.html
A few baselines before I go on:
Ignore this post and read the new one instead: https://gist.github.com/FCLC/6e0f0e79e9d4f5740573f09d7579eb72
Originally this was a borderline copy/paste of a Mastodon exchange. It was fairly crap, so I rewrote the whole thing; the updated version is available via the link above. I prefer not to hide this sort of thing, so the archive will remain public.
# Warnings and alarm bells
"What Cursed thing are you talking about now?"
    Felix$ spack install hipsycl%"[email protected]"
    ==> Installing boost-1.69.0-z4oleaz6rrkaqoibusrp4e4bol3wpbv7
    ==> No binary for boost-1.69.0-z4oleaz6rrkaqoibusrp4e4bol3wpbv7 found: installing from source
    ==> Using cached archive: /usr/local/Cellar/spack/0.19.0/var/spack/cache/_source-cache/archive/8f/8f32d4617390d1c2d16f26a27ab60d97807b35440d45891fa340fc2648b04406.tar.bz2
    ==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/darwin_clang_version.patch
    ==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/system-non-virtual-dtor-include.patch
    ==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/system-non-virtual-dtor-test.patch
    ==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/pthread-stack-min-fix.patch
    ==> Ran patch() for boost
    ==> boost: Executing phase: 'install'
I'm assuming general knowledge of x86_64 hardware extensions, and some insight into the workings of large hardware vendors.
Understanding why AVX-512 is useful not only in HPC, but also for gamers (e.g. in emulation) and for more efficient use of execution ports, is a bonus.
You don't need to have published 2 dozen papers on optimizing compute architecture.
Currently out of date as of September 2022; needs a fresh update.
TL;DR: edit the amdgpu-install script to add pop as a supported Debian distribution, comment out the check for linux-modules-extra-[versions] since those modules are provided by linux-modules (per Jeremy Soller of System76), then run with --no-dkms.
Three (EDIT: four) things need to be done.
Problem 1) Pop!_OS is not a valid install target: "Unsupported OS: /etc/os-release ID 'pop'". The issue, as of now, is that amdgpu-install doesn't recognize pop as a valid installation candidate. The solution is to add pop as a valid target in the amdgpu-install script, as sketched below.
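The following is a rough sketch of what that edit looks like, not a copy of the real script: the installer's path, the exact shape of its distro check, and the case labels shown here are assumptions that vary between amdgpu-install releases, so treat the comments as a guide and adapt them to your copy.

```bash
# Keep a backup before touching the packaged script (path is an assumption;
# confirm it with `which amdgpu-install`).
sudo cp /usr/bin/amdgpu-install /usr/bin/amdgpu-install.bak
sudoedit /usr/bin/amdgpu-install

# Inside the script, the distro detection reads /etc/os-release and branches
# on $ID. Extend the Debian/Ubuntu branch so "pop" is accepted, roughly:
#
#   case "$ID" in
#   ubuntu|debian|pop)        # <- add "pop" to the existing labels
#       ...
#
# Then comment out the block that insists on linux-modules-extra-$(uname -r);
# Pop!_OS provides those modules through its main linux-modules package.

# Finally, run the patched installer without DKMS:
amdgpu-install --no-dkms
```

With --no-dkms the installer skips building the out-of-tree DKMS kernel module and relies on the amdgpu driver already shipped with the kernel.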