
Purpose

Build an Azure stack to operate NP-series VMs on Azure with Dragen's pay-as-you-go (PAYG) license.

Prerequisites

  • Sign up for an Azure subscription at this link if you don't already have one.
  • Follow these instructions to register resource providers Microsoft.Network, Microsoft.Storage, and Microsoft.Compute.
  • Visit this page, log in if needed, and ensure that Status is set to Enable for the Azure subscription you intend to use. This allows programmatic deployment of the VMs that we will use later.
  • Follow these instructions to generate an SSH key using the Ed25519 algorithm and store it as ~/.ssh/id_ed25519. We'll use this to SSH into VMs.
  • Follow these instructions to install Azure CLI, and then run az login and follow the instructions to link your subscription.
  • Accept the Terms of Use for the Dragen PAYG image we will use.
    az vm image terms accept --urn illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest
  • NP-series VMs are restricted to specific regions per the FAQs here. However, additional restrictions have been in place since early 2025 for reasons unknown; for example, the only US region with NP-series VMs available is South Central US. Use the command below to find regions with no restrictions.
    az vm list-skus --all true --size Standard_NP10s --query "[].{location: locations[0], restriction: restrictions[0].reasonCode}" --output tsv
  • Visit Quotas in Azure Portal, log in if needed, and increase Standard NPS Family vCPUs in your preferred region to 10 vCPUs. This allows you to create a single Standard_NP10s VM in that region. Based on demand for these SKUs, you may also need to submit a service request and justify your use case to a person before that quota gets approved. Also note that a quota of 40 vCPUs lets you run 4 NP10 VMs at a time. NP20 and NP40 VMs have 2 and 4 FPGA cards respectively, but Dragen can only use one at a time. So, we'll stick with NP10. See the snippet after this list for checking usage against your quota.
  • All the commands in this repo were tested on Ubuntu 24.04 in WSL2 with these dotfiles, but you should be fine with Bash in any Linux environment or Zsh on macOS.
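
Once your quota request is approved, you can sanity-check current usage against limits without opening the portal. This is a minimal sketch of our own, assuming you've already run az login and that southcentralus is your chosen region.

# List current vCPU usage vs. quota in a region, filtered to NP-series families
az vm list-usage --location southcentralus --output table | grep -i "NPS"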

Config

Define shell variables for the Azure region and resource groups we'll create for networking, storage accounts, and VMs. Then set names of the virtual network, subnet, and network security group.

AZ_REGION="southcentralus"
RG_NET="dgn-net-rg"
RG_ST="dgn-st-rg"
RG_VMS="dgn-vms-rg"
NW_VNET="dgn-vnet"
NW_SUBNET="dgn-sub1"
NW_NSG="dgn-nsg"

Set names for the virtual machines we'll create in this guide. In production, you'll likely want a cheap persistent VM that orchestrates the creation and deletion of many Dragen VMs, which can run in parallel to process many samples. Set the username, auth key, and other SSH options for the VMs. Through trial and error, we've settled on the options below to keep SSH connections to Azure VMs alive.

VM_DEV="dev1"
VM_DEV_SIZE="Standard_D4as_v6"
VM_DGN="dgn1"
VM_DGN_SIZE="Standard_NP10s"
SSH_USERNAME="azureuser"
SSH_AUTH_KEY="~/.ssh/id_ed25519"
SSH_OPTIONS="-q -i ${SSH_AUTH_KEY} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=QUIET -o ServerAliveInterval=120 -o ServerAliveCountMax=30"

Set the private IPs or IP ranges to use for various resources.

NW_VNET_IPS="10.10.10.0/24"
NW_SUBNET_IPS="10.10.10.0/25"
ST_BLOB_IP="10.10.10.64"
ST_DFS_IP="10.10.10.65"

Important: Do not blindly copy-paste the shell variable below. This is the name of a storage account you'll create later for input FASTQs, reference files, and Dragen outputs. It needs to be globally unique, so replace the random number suffix in the name below before you proceed.

ST_ACCOUNT="dgndata13465"
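
Before proceeding, you can confirm that your chosen name is still available. This check is our own suggestion, not a required step.

# Returns true if no one else has claimed this storage account name
az storage account check-name --name ${ST_ACCOUNT} --query nameAvailable --output tsv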

Build

Make resource groups for networking, storage accounts, and VMs.

az group create --name ${RG_NET} --location ${AZ_REGION}
az group create --name ${RG_ST} --location ${AZ_REGION}
az group create --name ${RG_VMS} --location ${AZ_REGION}

Create a VNet with a subnet, and permit SSH connections only from our current IP address. You'll need to create more NSG rules to permit SSH from other IP addresses; an example follows the commands below.

az network vnet create --resource-group ${RG_NET} --name ${NW_VNET} --address-prefixes ${NW_VNET_IPS}
az network nsg create --resource-group ${RG_NET} --name ${NW_NSG}
az network vnet subnet create --resource-group ${RG_NET} --vnet-name ${NW_VNET} --network-security-group ${NW_NSG} --name ${NW_SUBNET} --address-prefixes ${NW_SUBNET_IPS} --default-outbound-access false --service-endpoints Microsoft.Storage
az network nsg rule create --resource-group ${RG_NET} --nsg-name ${NW_NSG} --name AllowSSHInBound --priority 200 --protocol TCP --access Allow --direction Inbound --source-address-prefixes $(curl -s https://icanhazip.com) --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22
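
If teammates need SSH access from other networks, add one NSG rule per source address at a distinct priority. A hedged example, where the rule name AllowSSHInBound2 and the address 203.0.113.25 are placeholders for your teammate's details:

az network nsg rule create --resource-group ${RG_NET} --nsg-name ${NW_NSG} --name AllowSSHInBound2 --priority 201 --protocol TCP --access Allow --direction Inbound --source-address-prefixes 203.0.113.25 --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22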

Create a storage account with hierarchical namespace enabled (ADLS Gen2), disable public network access, and create two private endpoints (blob and dfs) with static IPs.

az storage account create --resource-group ${RG_ST} --name ${ST_ACCOUNT} --kind StorageV2 --access-tier Hot --sku Standard_LRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --public-network-access disabled --default-action deny --allow-shared-key-access false --publish-internet-endpoints false --publish-microsoft-endpoints false --routing-choice MicrosoftRouting
ST_ID=$(az storage account show --resource-group ${RG_ST} --name ${ST_ACCOUNT} --query id --output tsv)
az network private-endpoint create --resource-group ${RG_NET} --connection-name ${ST_ACCOUNT}-blob-connection --name ${ST_ACCOUNT}-blob-pe --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --private-connection-resource-id ${ST_ID} --group-id blob --nic-name ${ST_ACCOUNT}-blob-nic --ip-config group-id=blob member-name=blob name=${ST_ACCOUNT}-blob-nic-ipconfig private-ip-address=${ST_BLOB_IP}
az network private-endpoint create --resource-group ${RG_NET} --connection-name ${ST_ACCOUNT}-dfs-connection --name ${ST_ACCOUNT}-dfs-pe --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --private-connection-resource-id ${ST_ID} --group-id dfs --nic-name ${ST_ACCOUNT}-dfs-nic --ip-config group-id=dfs member-name=dfs name=${ST_ACCOUNT}-dfs-nic-ipconfig private-ip-address=${ST_DFS_IP}
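
Optionally, verify that both private endpoints were provisioned and their connections approved. This check is our own addition; the JMESPath query assumes the standard private endpoint schema.

az network private-endpoint list --resource-group ${RG_NET} --query "[].{name: name, state: privateLinkServiceConnections[0].privateLinkServiceConnectionState.status}" --output table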

Create DNS entries that VMs in the local VNet can use to resolve the private endpoints to their static IPs.

az network private-dns zone create --resource-group ${RG_NET} --name privatelink.blob.core.windows.net
az network private-dns zone create --resource-group ${RG_NET} --name privatelink.dfs.core.windows.net
az network private-dns link vnet create --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --name ${NW_VNET}-blob-link --virtual-network ${NW_VNET} --registration-enabled false
az network private-dns link vnet create --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --name ${NW_VNET}-dfs-link --virtual-network ${NW_VNET} --registration-enabled false
az network private-dns record-set a add-record --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --record-set-name ${ST_ACCOUNT} --ipv4-address ${ST_BLOB_IP}
az network private-dns record-set a add-record --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --record-set-name ${ST_ACCOUNT} --ipv4-address ${ST_DFS_IP}
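
You can verify that the A records landed in the right zones as shown below. Actual name resolution can only be tested later, from a VM inside the VNet (e.g. with nslookup ${ST_ACCOUNT}.blob.core.windows.net).

az network private-dns record-set a list --resource-group ${RG_NET} --zone-name privatelink.blob.core.windows.net --output table
az network private-dns record-set a list --resource-group ${RG_NET} --zone-name privatelink.dfs.core.windows.net --output table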

Create a managed identity with a contributor role on the storage account, which we can assign to the VMs we'll create later. This is generally more secure than passing around storage account keys or SAS tokens.

SP_ID=$(az identity create --resource-group ${RG_ST} --name "dgndata-manager" --query principalId --output tsv)
ST_ID=$(az storage account show --resource-group ${RG_ST} --name ${ST_ACCOUNT} --query id --output tsv)
az role assignment create --assignee ${SP_ID} --role "Storage Blob Data Contributor" --scope ${ST_ID}
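
To double-check that the assignment took effect (role propagation can take a few minutes), list role assignments at the storage account scope. This verification step is our own suggestion.

az role assignment list --assignee ${SP_ID} --scope ${ST_ID} --output table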

Data

Create a general-purpose Ubuntu VM with a public IP to add input FASTQs and reference files to the storage account. This will later help reduce the runtime costs of PAYG Dragen VMs by giving them fast access to a co-located storage account for inputs/outputs.

VM_IP=$(az network public-ip create --resource-group ${RG_NET} --name ${VM_DEV}-pip --allocation-method static --sku standard --query publicIp.ipAddress --output tsv)
VM_NIC=$(az network nic create --resource-group ${RG_NET} --network-security-group ${NW_NSG} --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --public-ip-address ${VM_DEV}-pip --name ${VM_DEV}-nic --query NewNIC.id --output tsv)
AZ_ID=$(az identity show --resource-group ${RG_ST} --name "dgndata-manager" --query id --output tsv)
az vm create --resource-group ${RG_VMS} --nics ${VM_NIC} --name ${VM_DEV} --size ${VM_DEV_SIZE} --disk-controller-type NVMe --os-disk-name ${VM_DEV}-os-disk --os-disk-size-gb 256 --os-disk-delete-option delete --nic-delete-option delete --storage-sku Premium_LRS --security-type TrustedLaunch --image Ubuntu2404 --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --assign-identity ${AZ_ID} --only-show-errors

Set shell variables in the VM that help access the storage account. Then SSH into the VM, and install Azure CLI and azcopy.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} "echo -e '\nexport AZURE_STORAGE_ACCOUNT=\"${ST_ACCOUNT}\"\nexport AZCOPY_AUTO_LOGIN_TYPE=AZCLI' >> ~/.bashrc"
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP}
curl -sSL https://aka.ms/InstallAzureCLIDeb | sudo bash
curl -sSL -O https://packages.microsoft.com/config/ubuntu/24.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb && rm packages-microsoft-prod.deb
sudo apt update && sudo apt install -y azcopy

Authenticate Azure CLI for access to the storage account, and create blob containers (aka file systems) for input FASTQs, reference files, and Dragen output.

az login --identity
az storage container create --auth-mode login --name fqs --public-access off
az storage container create --auth-mode login --name ref --public-access off
az storage container create --auth-mode login --name dgn --public-access off
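
As a quick sanity check, list the containers you just created. This assumes the AZURE_STORAGE_ACCOUNT variable added to ~/.bashrc earlier is set in your session.

az storage container list --auth-mode login --query [].name --output tsv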

Download a pair of FASTQs from GIAB sample HG002 that was sequenced using PCR-free prep on a NovaSeq 6000 to 30x mean depth. Then rename them to conform to the FASTQ file naming standard that has been around since 2011.

mkdir -p giab/HG002
wget -P giab/HG002 https://storage.googleapis.com/deepvariant/benchmarking/fastq/wgs_pcr_free/30x/HG002.novaseq.pcr-free.30x.{R1,R2}.fastq.gz
mv giab/HG002/HG002.novaseq.pcr-free.30x.R1.fastq.gz giab/HG002/HG002.novaseq.pcr-free.30x_R1_001.fastq.gz
mv giab/HG002/HG002.novaseq.pcr-free.30x.R2.fastq.gz giab/HG002/HG002.novaseq.pcr-free.30x_R2_001.fastq.gz
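
The process_fastqs.sh script we'll use later parses the flowcell and lane from each FASTQ header to build read group IDs. If you're curious, you can peek at a header yourself; fields 3 and 4 of the colon-delimited header hold the flowcell and lane.

gzip -dc giab/HG002/HG002.novaseq.pcr-free.30x_R1_001.fastq.gz | head -n1 | cut -f3,4 -d: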

Download the hg38 v5 Graph Reference Genome compatible with Dragen 4.4. We'll also untar it in advance, so that we don't waste time on the Dragen VM later.

mkdir -p ref/hg38_v5_pangenome
wget -P ref https://webdata.illumina.com/downloads/software/dragen/references/genome-files/hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.tar.gz
tar -zxf ref/hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.tar.gz -C ref/hg38_v5_pangenome

Download Illumina's population SNP VCF for use by the Dragen ASCN caller to measure B-allele counts. Note that at UPenn, we use a different VCF with SNPs that have a high frequency of heterozygosity within each major gnomAD subpopulation. This results in a more uniform sensitivity for loss-of-heterozygosity events across subpopulations.

curl -Lo ref/hg38_v5_pangenome/hg38_1000G_phase1.ascn.snps.vcf.gz https://webdata.illumina.com/downloads/software/dragen/resource-files/misc/hg38_1000G_phase1.snps.high_confidence.vcf.gz

Upload the FASTQs into the fqs container and the reference files into the ref container, then exit the VM. AzCopy authenticates automatically using the Azure CLI credentials, per the AZCOPY_AUTO_LOGIN_TYPE variable we set in ~/.bashrc earlier.

azcopy cp giab "https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/fqs/" --recursive --content-type="text/fastq" --content-encoding="gzip"
azcopy cp "ref/hg38_v5_pangenome/*" "https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net/ref/hg38/" --recursive
exit
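
Before deleting the VM, you can optionally verify the uploads from your local machine over SSH, since the storage account is only reachable from inside the VNet.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/fqs/giab/HG002"
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/ref/hg38"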

Delete the VM and its public IP to save money.

az vm delete --resource-group ${RG_VMS} --name ${VM_DEV} --yes
az network public-ip delete --resource-group ${RG_NET} --name ${VM_DEV}-pip

Test

Create a Dragen VM with a 1 TB OS disk, and assign it the managed identity that has access to the storage account.

VM_IP=$(az network public-ip create --resource-group ${RG_NET} --name ${VM_DGN}-pip --allocation-method static --sku standard --query publicIp.ipAddress --output tsv)
VM_NIC=$(az network nic create --resource-group ${RG_NET} --network-security-group ${NW_NSG} --vnet-name ${NW_VNET} --subnet ${NW_SUBNET} --public-ip-address ${VM_DGN}-pip --name ${VM_DGN}-nic --query NewNIC.id --output tsv)
AZ_ID=$(az identity show --resource-group ${RG_ST} --name "dgndata-manager" --query id --output tsv)
az vm create --resource-group ${RG_VMS} --nics ${VM_NIC} --name ${VM_DGN} --size ${VM_DGN_SIZE} --os-disk-name ${VM_DGN}-os-disk --os-disk-size-gb 1024 --os-disk-delete-option delete --nic-delete-option delete --image illuminainc1586452220102:dragen-vm-payg:dragen-4-4-4-payg:latest --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --assign-identity ${AZ_ID} --only-show-errors

Copy over the bash script that runs alignment and variant calling (process_fastqs.sh, listed in full at the end of this guide), and run it on the sample downloaded earlier. Use nohup to run it as a detached process.

scp ${SSH_OPTIONS} process_fastqs.sh ${SSH_USERNAME}@${VM_IP}:~/
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZURE_STORAGE_ACCOUNT=${ST_ACCOUNT} 'nohup ./process_fastqs.sh fqs/giab/HG002 ref/hg38 dgn/giab/HG002 &> /tmp/process_fastqs_HG002.log &'
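
You can confirm the detached process is running before you walk away. The pgrep check below is just a convenience of ours, not part of the workflow.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'pgrep -af process_fastqs.sh'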

You can safely monitor the script's output by running tail -f on the remote log file as follows. Press Ctrl+C to stop.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} tail -f /tmp/process_fastqs_HG002.log

When the script is done, make sure the Dragen output was uploaded into the dgn container in the storage account, and then copy over the runtime log for your records.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} AZCOPY_AUTO_LOGIN_TYPE=AZCLI azcopy ls "https://${ST_ACCOUNT}.blob.core.windows.net/dgn/giab/HG002"
scp ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP}:/tmp/process_fastqs_HG002.log ./

Delete the VM and its public IP to save money.

az vm delete --resource-group ${RG_VMS} --name ${VM_DGN} --yes
az network public-ip delete --resource-group ${RG_NET} --name ${VM_DGN}-pip

Now you are ready to orchestrate the creation of VMs and have them analyze multiple samples in parallel and/or in series using the process_fastqs.sh script. When used in series to process samples one after another on the same VM, the script cleans up after the prior run but reuses the previously downloaded reference files. It also creates sentinel files that your orchestration script can poll for, to detect successful completion (/tmp/.success) or failure (/tmp/.failure). For example, using SSH as shown below.

ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.success'
ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.failure'
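
Putting those together, here is a minimal polling sketch for an orchestrator, assuming the SSH variables from earlier. The interval and any timeout handling are left to your own script.

# Poll the sentinel files every 5 minutes until the run succeeds or fails
while true; do
    if ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.success'; then
        echo "HG002 finished successfully"; break
    elif ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VM_IP} 'test -f /tmp/.failure'; then
        echo "HG002 failed; check /tmp/process_fastqs_HG002.log" >&2; break
    fi
    sleep 300
done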

process_fastqs.sh

#!/bin/bash
set -uo pipefail

# Clean up sentinel files from prior run if any
rm -f /tmp/.failure /tmp/.success

error() {
    echo "Error: $1" >&2
    touch /tmp/.failure # Signals failure to the orchestrator
    exit 1
}

if (( $# != 3 )); then
    cat >&2 << EOM_USAGE
Usage: ./process_fastqs.sh [FASTQ_BLOB_DIR] [REF_BLOB_DIR] [OUTPUT_BLOB_DIR]
Purpose: Process a single sample whose FASTQs are stored in a blob storage folder, and upload results back to blob storage
Command-line arguments:
FASTQ_BLOB_DIR - e.g. "fqs/giab/HG002" where "fqs" is the container and "giab/HG002" is the subfolder containing FASTQs to process
REF_BLOB_DIR - e.g. "ref/hg38" where "ref" is the container and "hg38" is the subfolder containing Dragen reference data as a tar file
OUTPUT_BLOB_DIR - e.g. "dgn/giab/HG002" where "dgn" is the container, "HG002" is the sample name, and "giab/HG002" is the subfolder for outputs
Environment variables:
AZURE_STORAGE_ACCOUNT - the name of the ADLS Gen2 storage account we will use
EOM_USAGE
    touch /tmp/.failure
    exit 1
fi

if [[ -z "${AZURE_STORAGE_ACCOUNT:-}" ]]; then
    error "AZURE_STORAGE_ACCOUNT environment variable is not set"
fi

FASTQ_BLOB_DIR=$1
REF_BLOB_DIR=$2
OUTPUT_BLOB_DIR=$3
echo "Processing ${FASTQ_BLOB_DIR} on VM $(hostname)"

# Piece together the full blob storage URLs we will need
STORAGE_ACCT_ENDPOINT=https://${AZURE_STORAGE_ACCOUNT}.blob.core.windows.net
FASTQ_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${FASTQ_BLOB_DIR}"
REF_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${REF_BLOB_DIR}"
OUTPUT_BLOB_URL="${STORAGE_ACCT_ENDPOINT}/${OUTPUT_BLOB_DIR}"

# Format the local disk with xfs for better handling of large files and to clean up data from a prior run of this script
PARTITION=$(findmnt -no SOURCE /mnt)
sudo umount /mnt
sudo mkfs.xfs -qf $PARTITION
sudo mount $PARTITION /mnt
sudo chown -R $USER:$GROUPS /mnt || error "Failed to change ownership of /mnt"

# Create some directories we will need if they don't already exist
mkdir -p /mnt/{fqs,dgn} || error "Failed to create directories in /mnt"
mkdir -p /tmp/{ref,dgn} || error "Failed to create directories in /tmp"

# Use Azure CLI to login with the managed identity, which stores creds under ~/.azure that AzCopy can use
echo -n "Logging into managed identity in Azure subscription " && az login --identity --query [].name --output tsv
export AZCOPY_AUTO_LOGIN_TYPE=AZCLI

# If needed, upgrade AzCopy to the latest version to support auto-login with Azure CLI creds
AZCOPY_VERSION=$(azcopy --version | awk '{print $3}')
OLDER_VERSION=$(echo -e $AZCOPY_VERSION"\n10.31.0" | sort -V | head -n1)
if [[ "$OLDER_VERSION" != "10.31.0" ]]; then
    echo "Upgrading AzCopy to support auto-login with Azure CLI creds..."
    curl -sSL -O https://packages.microsoft.com/config/alma/8/packages-microsoft-prod.rpm
    sudo rpm -i packages-microsoft-prod.rpm
    rm packages-microsoft-prod.rpm
    sudo yum install --quiet --assumeyes azcopy
fi

# Download reference data into /tmp if it wasn't already downloaded by a previous run of this script
REF_DIR="/tmp/${REF_BLOB_DIR}"
if [[ ! -d "${REF_DIR}" ]]; then
    echo "Downloading reference data into ${REF_DIR}..."
    azcopy cp "${REF_BLOB_URL}/*" ${REF_DIR} --recursive --output-level=quiet || error "Failed to download reference data"
else
    echo "Reusing existing reference data under ${REF_DIR}..."
fi

# Download FASTQs and parse headers to make a FASTQ list for use with Dragen
FQS_DIR="/mnt/${FASTQ_BLOB_DIR}"
SAMPLE=$(basename $OUTPUT_BLOB_DIR)
echo "Downloading FASTQs and creating a FASTQ list for Dragen under ${FQS_DIR}..."
azcopy cp "${FASTQ_BLOB_URL}/*" ${FQS_DIR} --output-level=quiet || error "Failed to download FASTQs"
FASTQ_LIST="${FQS_DIR}/fastq_list.csv"
echo "RGID,RGSM,RGLB,Lane,Read1File,Read2File" > ${FASTQ_LIST}
for fq1 in ${FQS_DIR}/*_R1*.fastq.gz; do
    HEADER=$(gzip -dc "$fq1" | head -n1)
    FLOWCELL=$(echo "$HEADER" | cut -f3 -d:)
    LANE=$(echo "$HEADER" | cut -f4 -d:)
    RGID="${FLOWCELL}.${LANE}.${SAMPLE}"
    fq2=$(echo "$fq1" | sed 's/_R1/_R2/')
    if [[ -f "${fq2}" ]]; then
        echo "$RGID,$SAMPLE,UnknownLibrary,$LANE,$fq1,$fq2" >> ${FASTQ_LIST}
    else
        error "Could not find R2 FASTQ for $fq1"
    fi
done

# Locate the population SNP VCF for use by the ASCN caller to measure B-allele counts
ASCN_SNP_VCF=$(find ${REF_DIR} -name "*.ascn.snps.vcf.gz")

echo "Running Dragen on sample ${SAMPLE} using ${FASTQ_LIST}..."
OUTPUT_DIR="/mnt/${OUTPUT_BLOB_DIR}"
mkdir -p $OUTPUT_DIR
dragen --intermediate-results-dir /tmp/dgn --ref-dir "${REF_DIR}" \
    --enable-map-align true --enable-map-align-output true --output-format CRAM --cram-version 3.1 \
    --enable-duplicate-marking true --generate-sa-tags true --enable-sort true \
    --qc-coverage-ignore-overlaps true --qc-detect-contamination true \
    --enable-variant-caller true --vc-emit-ref-confidence GVCF --vc-compact-gvcf true \
    --vc-enable-vcf-output true --vc-combine-phased-variants-distance 6 \
    --enable-targeted true --targeted-merge-vc true \
    --enable-sv true --enable-cnv true --cnv-population-b-allele-vcf "${ASCN_SNP_VCF}" \
    --cnv-enable-cyto-output true --cnv-enable-mosaic-calling true --cnv-interval-width 10000 \
    --cnv-enable-self-normalization true --cnv-enable-gcbias-correction true --cnv-counts-method start \
    --cnv-enable-segdups-extension true --cnv-enable-tracks false \
    --enable-hla true --enable-star-allele true --enable-pgx true --repeat-genotype-enable true \
    --enable-mrjd true --mrjd-enable-high-sensitivity-mode true \
    --fastq-list "${FASTQ_LIST}" --fastq-list-sample-id "${SAMPLE}" \
    --output-directory "${OUTPUT_DIR}" --output-file-prefix "${SAMPLE}" || error "Dragen run failed."
echo "Dragen run successful. Uploading outputs to ${OUTPUT_BLOB_URL}..."
azcopy cp "${OUTPUT_DIR}/*" "${OUTPUT_BLOB_URL}" --output-level=quiet || error "Failed to upload outputs"

# Cleanup and signal completion to the orchestrator
rm -rf /tmp/dgn || error "Failed to delete /tmp/dgn"
touch /tmp/.success
echo "Finished with sample ${SAMPLE}."