
Purpose

Build an Azure stack to operate NP-series VMs on Azure with Dragen's pay-as-you-go (PAYG) license.

Prerequisites

  • Sign up for an Azure subscription at this link if you don't already have one.
  • Follow these instructions to register resource providers Microsoft.Network, Microsoft.Storage, and Microsoft.Compute.
  • Visit this page, log in if needed, and ensure that Status is set to Enable for the Azure subscription you intend to use. This allows programmatic deployment of the VMs that we will use later.
  • Follow these instructions to generate an SSH key using the Ed25519 algorithm and store it as ~/.ssh/id_ed25519. We'll use this to SSH into VMs.
  • Follow these instructions to install Azure CLI, and then run az login and follow the instructions to link your subscription.
  • Accept the Terms of Use for the Dragen PAYG image we will use.
    az vm image terms accept --urn illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest
  • NP-series VMs are restricted to specific regions per the FAQs here. However, additional restrictions have been in place since early 2025 for reasons unknown; for example, the only US region with NP-series VMs available is South Central US. Use the command below to find regions with no restrictions.
    az vm list-skus --all true --size Standard_NP10s --query "[].{location: locations[0], restriction: restrictions[0].reasonCode}" --output tsv
  • Visit Quotas in Azure Portal, log in if needed, and increase Standard NPS Family vCPUs in your preferred region to 10 vCPUs. This allows you to create a single Standard_NP10s VM in that region. Based on demand for these SKUs, you may also need to submit a service request and justify your use case to a person before that quota gets approved. Also note that a quota of 40 vCPUs lets you run 4 NP10 VMs at a time. NP20 and NP40 VMs have 2 and 4 FPGA cards respectively, but Dragen can only use one at a time. So, we'll stick with NP10. See the snippet after this list for checking usage against your quota.
  • All the commands in this repo were tested on Ubuntu 24.04 in WSL2 with these dotfiles, but you should be fine with Bash in any Linux environment or Zsh on macOS.
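
Once your quota request is approved, you can sanity-check current usage against limits without opening the portal. This is a minimal sketch of our own, assuming you've already run az login and that southcentralus is your chosen region.

# List current vCPU usage vs. quota in a region, filtered to NP-series families
az vm list-usage --location southcentralus --output table | grep -i "NPS"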

Config

Define shell variables for the Azure region and resource groups we'll create for networking, storage accounts, and VMs. Then set names of the virtual network, subnet, and network security group.

AZ_REGION="southcentralus"
RG_NET="dgn-net-rg"
RG_ST="dgn-st-rg"
RG_VMS="dgn-vms-rg"
NW_VNET="dgn-vnet"
NW_SUBNET="dgn-sub1"
NW_NSG="dgn-nsg"

Set names for the virtual machines we'll create in this guide. In production, you'll likely want a cheap persistent VM that orchestrates the creation and deletion of many Dragen VMs, which can run in parallel to process many samples. Set the username, auth key, and other SSH options for the VMs. Through trial and error, we've settled on the options below to keep SSH connections to Azure VMs alive.

VM_DEV="dev1"
VM_DEV_SIZE="Standard_D4as_v6"
VM_DGN="dgn1"
VM_DGN_SIZE="Standard_NP10s"
SSH_USERNAME="azureuser"
SSH_AUTH_KEY="~/.ssh/id_ed25519"
SSH_OPTIONS="-q -i ${SSH_AUTH_KEY} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=QUIET -o ServerAliveInterval=120 -o ServerAliveCountMax=30"

Set the private IPs or IP ranges to use for various resources.

NW_VNET_IPS="10.10.10.0/24"
NW_SUBNET_IPS="10.10.10.0/25"
ST_BLOB_IP="10.10.10.64"
ST_DFS_IP="10.10.10.65"

Important: Do not blindly copy-paste the shell variable below. This is the name of a storage account you'll create later for input FASTQs, reference files, and Dragen outputs. It needs to be globally unique, so replace the random number suffix in the name below before you proceed.

ST_ACCOUNT="dgndata13465"
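
Before proceeding, you can confirm that your chosen name is still available. This check is our own suggestion, not a required step.

# Returns true if no one else has claimed this storage account name
az storage account check-name --name ${ST_ACCOUNT} --query nameAvailable --output tsv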

Build

Make resource groups for networking, storage accounts, and VMs.

az group create --name ${RG_NET} --location ${AZ_REGION}
az group create --name ${RG_ST} --location ${AZ_REGION}
az group create --name ${RG_VMS} --location ${AZ_REGION}

Create a VNet with a subnet, and permit SSH connections only from our current IP address. You'll need to create more NSG rules to permit SSH from other IP addresses; an example follows the commands below.

az network vnet create --resource-group ${RG_NET} --name ${NW_VNET} --address-prefixes ${NW_VNET_IPS}
az network nsg create --resource-group ${RG_NET} --name ${NW_NSG}
az network vnet subnet create --resource-group ${RG_NET} --vnet-name ${NW_VNET} --network-security-group ${NW_NSG} --name ${NW_SUBNET} --address-prefixes ${NW_SUBNET_IPS} --default-outbound-access false --service-endpoints Microsoft.Storage
az network nsg rule create --resource-group ${RG_NET} --nsg-name ${NW_NSG} --name AllowSSHInBound --priority 200 --protocol TCP --access Allow --direction Inbound --source-address-prefixes $(curl -s https://icanhazip.com) --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22
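
If teammates need SSH access from other networks, add one NSG rule per source address at a distinct priority. A hedged example, where the rule name AllowSSHInBound2 and the address 203.0.113.25 are placeholders for your teammate's details:

az network nsg rule create --resource-group ${RG_NET} --nsg-name ${NW_NSG} --name AllowSSHInBound2 --priority 201 --protocol TCP --access Allow --direction Inbound --source-address-prefixes 203.0.113.25 --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22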

Create a storage account with hierarchical namespace enabled (ADLS Gen2), disable public network access, and create two private endpoints (blob and dfs) with static IPs.

az storage account create --resource-group ${RG_ST} --name ${ST_ACCOUNT} --kind StorageV2 --access-tier Hot --sku Standard_LRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --public-network-access disabled --default-action deny --allow-shared-key-access false --publish-internet-endpoints false --publish-microsoft-endpoints false --routing-choice MicrosoftRouting
ST_ID=$(az storage account show --resource-group ${RG_ST} --name ${ST_ACCOUNT} --query id --output tsv)
az network private-endpoint create --resource-group ${RG_NET} --connection-name ${ST_ACCOUNT}-blob-connection --name ${ST_ACCOUNT}-blob-pe --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --private-connection-resource-id ${ST_ID} --group-id blob --nic-name ${ST_ACCOUNT}-blob-nic --ip-config group-id=blob member-name=blob name=${ST_ACCOUNT}-blob-nic-ipconfig private-ip-address=${ST_BLOB_IP}
az network private-endpoint create --resource-group ${RG_NET} --connection-name ${ST_ACCOUNT}-dfs-connection --name ${ST_ACCOUNT}-dfs-pe --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --private-connection-resource-id ${ST_ID} --group-id dfs --nic-name ${ST_ACCOUNT}-dfs-nic --ip-config group-id=dfs member-name=dfs name=${ST_ACCOUNT}-dfs-nic-ipconfig private-ip-address=${ST_DFS_IP}
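
Optionally, verify that both private endpoints were provisioned and their connections approved. This check is our own addition; the JMESPath query assumes the standard private endpoint schema.

az network private-endpoint list --resource-group ${RG_NET} --query "[].{name: name, state: privateLinkServiceConnections[0].privateLinkServiceConnectionState.status}" --output table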

Create DNS entries that VMs in the local VNet can use to resolve the private endpoints to their static IPs.

az network private-dns zone create --resource-group ${RG_NET} --name privatelink.blob.core.windows.net
az network private-dns zone create --resource-group ${RG_NET} --name privatelink.dfs.core.windows.net
az network private-dns link vnet create --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --name ${NW_VNET}-blob-link --virtual-network ${NW_VNET} --registration-enabled false
az network private-dns link vnet create --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --name ${NW_VNET}-dfs-link --virtual-network ${NW_VNET} --registration-enabled false
az network private-dns record-set a add-record --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --record-set-name ${ST_ACCOUNT} --ipv4-address ${ST_BLOB_IP}
az network private-dns record-set a add-record --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --record-set-name ${ST_ACCOUNT} --ipv4-address ${ST_DFS_IP}
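
You can verify that the A records landed in the right zones as shown below. Actual name resolution can only be tested later, from a VM inside the VNet (e.g. with nslookup ${ST_ACCOUNT}.blob.core.windows.net).

az network private-dns record-set a list --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --output table
az network private-dns record-set a list --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --output table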

Create a managed identity with a contributor role on the storage account, which we can assign to the VMs we'll create later. This is generally more secure than passing around storage account keys or SAS tokens.

SP_ID=$(az identity create --resource-group ${RG_ST} --name "dgndata-manager" --query principalId --output tsv)
ST_ID=$(az storage account show --resource-group ${RG_ST} --name ${ST_ACCOUNT} --query id --output tsv)
az role assignment create --assignee ${SP_ID} --role "Storage Blob Data Contributor" --scope ${ST_ID}
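
To double-check that the assignment took effect (role propagation can take a few minutes), list role assignments at the storage account scope. This verification step is our own suggestion.

az role assignment list --assignee ${SP_ID} --scope ${ST_ID} --output table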

Data

Create a general-purpose Ubuntu VM with a public IP to add input FASTQs and reference files to the storage account. This will later help reduce the runtime costs of PAYG Dragen VMs by giving them fast access to a co-located storage account for inputs/outputs.

VM_IP=$(az network public-ip create --resource-group ${RG_NET} --name ${VM_DEV}-pip --allocation-method static --sku standard --query publicIp.ipAddress --output tsv)
VM_NIC=$(az network nic create --resource-group ${RG_NET} --network-security-group ${NW_NSG} --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --public-ip-address ${VM_DEV}-pip --name ${VM_DEV}-nic --query NewNIC.id --output tsv)
AZ_ID=$(az identity show --resource-group ${RG_ST} --name "dgndata-manager" --query id --output tsv)
az vm create --resource-group ${RG_VMS} --nics ${VM_NIC} --name ${VM_DEV} --size ${VM_DEV_SIZE} --disk-controller-type NVMe --os-disk-name ${VM_DEV}-os-disk --os-disk-size-gb 256 --os-disk-delete-option delete --nic-delete-option delete --storage-sku Premium_LRS --security-type TrustedLaunch --image Ubuntu2404 --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --assign-identity ${AZ_ID} --only-show-errors

Set shell variables in the VM that help access the storage account. Then SSH into the VM, and install Azure CLI and azcopy.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} "echo -e '\nexport AZURE_STORAGE_ACCOUNT=\"${ST_ACCOUNT}\"\nexport AZCOPY_AUTO_LOGIN_TYPE=AZCLI' >> ~/.bashrc"
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP}
curl -sSL https://aka.ms/InstallAzureCLIDeb | sudo bash
curl -sSL -O https://packages.microsoft.com/config/ubuntu/24.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb && rm packages-microsoft-prod.deb
sudo apt update && sudo apt install -y azcopy

Authenticate Azure CLI for access to the storage account, and create blob containers (aka file systems) for input FASTQs, reference files, and Dragen output.

az login --identity
az storage container create --auth-mode login --name fqs --public-access off
az storage container create --auth-mode login --name ref --public-access off
az storage container create --auth-mode login --name dgn --public-access off
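
As a quick sanity check, list the containers you just created. This assumes the AZURE_STORAGE_ACCOUNT variable added to ~/.bashrc earlier is set in your session.

az storage container list --auth-mode login --query [].name --output tsv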

Download a pair of FASTQs from GIAB sample HG002 that was sequenced using PCR-free prep on a NovaSeq 6000 to 30x mean depth. Then rename them to conform to the FASTQ file naming standard that has been around since 2011.

mkdir -p giab/HG002
wget -P giab/HG002 https://storage.googleapis.com/deepvariant/benchmarking/fastq/wgs_pcr_free/30x/HG002.novaseq.pcr-free.30x.{R1,R2}.fastq.gz
mv giab/HG002/HG002.novaseq.pcr-free.30x.R1.fastq.gz giab/HG002/HG002.novaseq.pcr-free.30x_R1_001.fastq.gz
mv giab/HG002/HG002.novaseq.pcr-free.30x.R2.fastq.gz giab/HG002/HG002.novaseq.pcr-free.30x_R2_001.fastq.gz
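
The process_fastqs.sh script we'll use later parses the flowcell and lane from each FASTQ header to build read group IDs. If you're curious, you can peek at a header yourself; fields 3 and 4 of the colon-delimited header hold the flowcell and lane.

gzip -dc giab/HG002/HG002.novaseq.pcr-free.30x_R1_001.fastq.gz | head -n1 | cut -f3,4 -d: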

Download the hg38 v5 Graph Reference Genome compatible with Dragen 4.4. We'll also untar it in advance, so that we don't waste time on the Dragen VM later.

mkdir -p ref/hg38_v5_pangenome
wget -P ref https://webdata.illumina.com/downloads/software/dragen/references/genome-files/hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.tar.gz
tar -zxf ref/hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.tar.gz -C ref/hg38_v5_pangenome

Download Illumina's population SNP VCF for use by the Dragen ASCN caller to measure B-allele counts. Note that at UPenn, we use a different VCF with SNPs that have a high frequency of heterozygosity within each major gnomAD subpopulation. This results in a more uniform sensitivity for loss-of-heterozygosity events across subpopulations.

curl -Lo ref/hg38_v5_pangenome/hg38_1000G_phase1.ascn.snps.vcf.gz https://webdata.illumina.com/downloads/software/dragen/resource-files/misc/hg38_1000G_phase1.snps.high_confidence.vcf.gz

Upload the FASTQs into the fqs container and the reference files into the ref container, then exit the VM. AzCopy authenticates automatically using the Azure CLI credentials, per the AZCOPY_AUTO_LOGIN_TYPE variable we set in ~/.bashrc earlier.

azcopy cp giab "https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/fqs/" --recursive --content-type="text/fastq" --content-encoding="gzip"
azcopy cp "ref/hg38_v5_pangenome/*" "https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/ref/hg38/" --recursive
exit
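
Before deleting the VM, you can optionally verify the uploads from your local machine over SSH, since the storage account is only reachable from inside the VNet.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/fqs/giab/HG002"
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/ref/hg38"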

Delete the VM and its public IP to save money.

az vm delete --resource-group ${RG_VMS} --name ${VM_DEV} --yes
az network public-ip delete --resource-group ${RG_NET} --name ${VM_DEV}-pip

Test

Create a Dragen VM with a 1 TB OS disk, and assign it the managed identity that has access to the storage account.

VM_IP=$(az network public-ip create --resource-group ${RG_NET} --name ${VM_DGN}-pip --allocation-method static --sku standard --query publicIp.ipAddress --output tsv)
VM_NIC=$(az network nic create --resource-group ${RG_NET} --network-security-group ${NW_NSG} --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --public-ip-address ${VM_DGN}-pip --name ${VM_DGN}-nic --query NewNIC.id --output tsv)
AZ_ID=$(az identity show --resource-group ${RG_ST} --name "dgndata-manager" --query id --output tsv)
az vm create --resource-group ${RG_VMS} --nics ${VM_NIC} --name ${VM_DGN} --size ${VM_DGN_SIZE} --os-disk-name ${VM_DGN}-os-disk --os-disk-size-gb 1024 --os-disk-delete-option delete --nic-delete-option delete --image illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --assign-identity ${AZ_ID} --only-show-errors

Copy over the bash script that runs alignment and variant calling (process_fastqs.sh, listed in full at the end of this guide), and run it on the sample downloaded earlier. Use nohup to run it as a detached process.

scp ${SSH_OPTIONS} process_fastqs.sh ${SSH_USERNAME}@${VM_IP}:~/
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZURE_STORAGE_ACCOUNT=${ST_ACCOUNT} 'nohup ./process_fastqs.sh fqs/giab/HG002 ref/hg38 dgn/giab/HG002 &> /tmp/process_fastqs_HG002.log &'
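
You can confirm the detached process is running before you walk away. The pgrep check below is just a convenience of ours, not part of the workflow.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'pgrep -af process_fastqs.sh'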

You can safely monitor the script's output by running tail -f on the remote log file as follows. Press Ctrl+C to stop.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} tail -f /tmp/process_fastqs_HG002.log

When the script is done, make sure the Dragen output was uploaded into the dgn container in the storage account, and then copy over the runtime log for your records.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/dgn/giab/HG002"
scp ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP}:/tmp/process_fastqs_HG002.log ./

Delete the VM and its public IP to save money.

az vm delete --resource-group ${RG_VMS} --name ${VM_DGN} --yes
az network public-ip delete --resource-group ${RG_NET} --name ${VM_DGN}-pip

Now you are ready to orchestrate the creation of VMs and have them analyze multiple samples in parallel and/or in series using the process_fastqs.sh script. When used in series to process samples one after another on the same VM, the script cleans up after the prior run but reuses the previously downloaded reference files. It also creates sentinel files that your orchestration script can poll for, to detect successful completion (/tmp/.success) or failure (/tmp/.failure). For example, using SSH as shown below.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.success'
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.failure'
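
Putting those together, here is a minimal polling sketch for an orchestrator, assuming the SSH variables from earlier. The interval and any timeout handling are left to your own script.

# Poll the sentinel files every 5 minutes until the run succeeds or fails
while true; do
    if ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.success'; then
        echo "HG002 finished successfully"; break
    elif ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.failure'; then
        echo "HG002 failed; check /tmp/process_fastqs_HG002.log" >&2; break
    fi
    sleep 300
done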

process_fastqs.sh

#!/bin/bash
set -uo pipefail

# Clean up sentinel files from prior run if any
rm -f /tmp/.failure /tmp/.success

error() {
    echo "Error: $1" >&2
    touch /tmp/.failure # Signals failure to the orchestrator
    exit 1
}

if (( $# != 3 )); then
    cat >&2 << EOM_USAGE
Usage: ./process_fastqs.sh [FASTQ_BLOB_DIR] [REF_BLOB_DIR] [OUTPUT_BLOB_DIR]
Purpose: Process a single sample whose FASTQs are stored in a blob storage folder, and upload results back to blob storage
Command-line arguments:
FASTQ_BLOB_DIR - e.g. "fqs/giab/HG002" where "fqs" is the container and "giab/HG002" is the subfolder containing FASTQs to process
REF_BLOB_DIR - e.g. "ref/hg38" where "ref" is the container and "hg38" is the subfolder containing Dragen reference data as a tar file
OUTPUT_BLOB_DIR - e.g. "dgn/giab/HG002" where "dgn" is the container, "HG002" is the sample name, and "giab/HG002" is the subfolder for outputs
Environment variables:
AZURE_STORAGE_ACCOUNT - the name of the ADLS Gen2 storage account we will use
EOM_USAGE
    touch /tmp/.failure
    exit 1
fi

if [[ -z "${AZURE_STORAGE_ACCOUNT:-}" ]]; then
    error "AZURE_STORAGE_ACCOUNT environment variable is not set"
fi

FASTQ_BLOB_DIR=$1
REF_BLOB_DIR=$2
OUTPUT_BLOB_DIR=$3
echo "Processing ${FASTQ_BLOB_DIR} on VM $(hostname)"

# Piece together the full blob storage URLs we will need
STORAGE_ACCT_ENDPOINT=https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net
FASTQ_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${FASTQ_BLOB_DIR}"
REF_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${REF_BLOB_DIR}"
OUTPUT_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${OUTPUT_BLOB_DIR}"

# Format the local disk with xfs for better handling of large files and to clean up data from a prior run of this script
PARTITION=$(findmnt -no SOURCE /mnt)
sudo umount /mnt
sudo mkfs.xfs -qf $PARTITION
sudo mount $PARTITION /mnt
sudo chown -R $USER:$GROUPS /mnt || error "Failed to change ownership of /mnt"

# Create some directories we will need if they don't already exist
mkdir -p /mnt/{fqs,dgn} || error "Failed to create directories in /mnt"
mkdir -p /tmp/{ref,dgn} || error "Failed to create directories in /tmp"

# Use Azure CLI to login with the managed identity, which stores creds under ~/.azure that AzCopy can use
echo -n "Logging into managed identity in Azure subscription " && az login --identity --query [].name --output tsv
export AZCOPY_AUTO_LOGIN_TYPE=AZCLI

# If needed, upgrade AzCopy to the latest version to support auto-login with Azure CLI creds
AZCOPY_VERSION=$(azcopy --version | awk '{print $3}')
OLDER_VERSION=$(echo -e $AZCOPY_VERSION"\n10.31.0" | sort -V | head -n1)
if [[ "$OLDER_VERSION" != "10.31.0" ]]; then
    echo "Upgrading AzCopy to support auto-login with Azure CLI creds..."
    curl -sSL -O https://packages.microsoft.com/config/alma/8/packages-microsoft-prod.rpm
    sudo rpm -i packages-microsoft-prod.rpm
    rm packages-microsoft-prod.rpm
    sudo yum install --quiet --assumeyes azcopy
fi

# Download reference data into /tmp if it wasn't already downloaded by a previous run of this script
REF_DIR="/tmp/${REF_BLOB_DIR}"
if [[ ! -d "${REF_DIR}" ]]; then
    echo "Downloading reference data into ${REF_DIR}..."
    azcopy cp "${REF_BLOB_URL}/*" ${REF_DIR} --recursive --output-level=quiet || error "Failed to download reference data"
else
    echo "Reusing existing reference data under ${REF_DIR}..."
fi

# Download FASTQs and parse headers to make a FASTQ list for use with Dragen
FQS_DIR="/mnt/${FASTQ_BLOB_DIR}"
SAMPLE=$(basename $OUTPUT_BLOB_DIR)
echo "Downloading FASTQs and creating a FASTQ list for Dragen under ${FQS_DIR}..."
azcopy cp "${FASTQ_BLOB_URL}/*" ${FQS_DIR} --output-level=quiet || error "Failed to download FASTQs"
FASTQ_LIST="${FQS_DIR}/fastq_list.csv"
echo "RGID,RGSM,RGLB,Lane,Read1File,Read2File" > ${FASTQ_LIST}
for fq1 in ${FQS_DIR}/*_R1*.fastq.gz; do
    HEADER=$(gzip -dc "$fq1" | head -n1)
    FLOWCELL=$(echo "$HEADER" | cut -f3 -d:)
    LANE=$(echo "$HEADER" | cut -f4 -d:)
    RGID="${FLOWCELL}.${LANE}.${SAMPLE}"
    fq2=$(echo "$fq1" | sed 's/_R1/_R2/')
    if [[ -f "${fq2}" ]]; then
        echo "$RGID,$SAMPLE,UnknownLibrary,$LANE,$fq1,$fq2" >> ${FASTQ_LIST}
    else
        error "Could not find R2 FASTQ for $fq1"
    fi
done

# Locate the population SNP VCF for use by the ASCN caller to measure B-allele counts
ASCN_SNP_VCF=$(find ${REF_DIR} -name "*.ascn.snps.vcf.gz")

echo "Running Dragen on sample ${SAMPLE} using ${FASTQ_LIST}..."
OUTPUT_DIR="/mnt/${OUTPUT_BLOB_DIR}"
mkdir -p $OUTPUT_DIR
dragen --intermediate-results-dir /tmp/dgn --ref-dir "${REF_DIR}" \
    --enable-map-align true --enable-map-align-output true --output-format CRAM --cram-version 3.1 \
    --enable-duplicate-marking true --generate-sa-tags true --enable-sort true \
    --qc-coverage-ignore-overlaps true --qc-detect-contamination true \
    --enable-variant-caller true --vc-emit-ref-confidence GVCF --vc-compact-gvcf true \
    --vc-enable-vcf-output true --vc-combine-phased-variants-distance 6 \
    --enable-targeted true --targeted-merge-vc true \
    --enable-sv true --enable-cnv true --cnv-population-b-allele-vcf "${ASCN_SNP_VCF}" \
    --cnv-enable-cyto-output true --cnv-enable-mosaic-calling true --cnv-interval-width 10000 \
    --cnv-enable-self-normalization true --cnv-enable-gcbias-correction true --cnv-counts-method start \
    --cnv-enable-segdups-extension true --cnv-enable-tracks false \
    --enable-hla true --enable-star-allele true --enable-pgx true --repeat-genotype-enable true \
    --enable-mrjd true --mrjd-enable-high-sensitivity-mode true \
    --fastq-list "${FASTQ_LIST}" --fastq-list-sample-id "${SAMPLE}" \
    --output-directory "${OUTPUT_DIR}" --output-file-prefix "${SAMPLE}" || error "Dragen run failed."
echo "Dragen run successful. Uploading outputs to ${OUTPUT_BLOB_URL}..."
azcopy cp "${OUTPUT_DIR}/*" "${OUTPUT_BLOB_URL}" --output-level=quiet || error "Failed to upload outputs"

# Cleanup and signal completion to the orchestrator
rm -rf /tmp/dgn || error "Failed to delete /tmp/dgn"
touch /tmp/.success
echo "Finished with sample ${SAMPLE}."