Build an Azure stack to operate NP-series VMs on Azure with Dragen's pay-as-you-go (PAYG) license
- Sign up for an Azure subscription if you don't already have one. NP-series VMs are not available on the Free Trial.
- Follow these instructions to register resource providers Microsoft.Network, Microsoft.Storage, and Microsoft.Compute.
- Visit Quotas in Azure Portal, log in if needed, and increase Standard NPS Family vCPUs in your preferred region. These VMs are only available in some regions per the FAQs here. Depending on demand for these SKUs in your region, you may also need to submit a service request and justify your use case to a person before that quota gets approved. Also note that a quota of 40 vCPUs lets you run 4 NP10 VMs at a time, or 2 NP20 VMs, or 1 NP40 VM.
- Visit this page, log in if needed, and ensure that Status is set to Enable for the Azure subscription you intend to use. This allows programmatic deployment of the Dragen 4.3.6 Pay-As-You-Go (PAYG) VMs that we intend to use.
- All the commands in this repo were written for Ubuntu 24.04 on WSL2 with these dotfiles, but you should be fine with Bash in any Linux environment.
- Generate an SSH key using the Ed25519 algorithm and store it as ~/.ssh/id_ed25519. We'll use this to SSH into VMs.
- Install Azure CLI, then run az login --use-device-code and follow the instructions to link your subscription.
- Accept the Terms of Use for the Dragen PAYG image we will use.
az vm image terms accept --urn illuminainc1586452220102:dragen-vm-payg:dragen-4-3-6-payg:latest
- Install AzCopy to upload/download data.
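The quota arithmetic from the list above can be checked with a small helper before requesting an increase. This is only a sketch: the function names `vcpus_for_size` and `max_vms` are made up for illustration, and the vCPU counts are the ones implied by the NP10/NP20/NP40 figures quoted above.

```shell
# vCPUs per NP-series VM size, as implied by the quota example above.
vcpus_for_size() {
  case "$1" in
    Standard_NP10s) echo 10 ;;
    Standard_NP20s) echo 20 ;;
    Standard_NP40s) echo 40 ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# Print how many VMs of the given size fit within a vCPU quota.
# Both function names are hypothetical helpers, not part of Azure CLI.
max_vms() {
  local quota="$1" size="$2" per_vm
  per_vm=$(vcpus_for_size "$size") || return 1
  echo $(( quota / per_vm ))
}

# How many NP10 VMs fit in a quota of 40 vCPUs?
max_vms 40 Standard_NP10s
```

Integer division is deliberate here, since a partially covered VM cannot be started.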
Make separate resource groups for networking, storage accounts, and VMs.
az group create --name dgn-net-rg --location southcentralus
az group create --name dgn-st-rg --location southcentralus
az group create --name dgn-vms-rg --location southcentralus
Create a VNet with a subnet that denies all outbound connections to the internet, and permits SSH connections from our current IP address.
az network nsg create --resource-group dgn-net-rg --name dgn-nsg
az network vnet create --resource-group dgn-net-rg --network-security-group dgn-nsg --name dgn-vnet --address-prefixes "10.145.188.0/24" --private-endpoint-vnet-policies basic
az network vnet subnet create --resource-group dgn-net-rg --network-security-group dgn-nsg --vnet-name dgn-vnet --name dgn-sub1 --address-prefixes "10.145.188.0/25" --service-endpoints Microsoft.Storage.Global --private-endpoint-network-policies enabled
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name DenyInternetOutBound --priority 200 --direction Outbound --access Deny --destination-address-prefixes Internet --destination-port-ranges "*" --protocol "*"
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowSSHInBound --priority 200 --protocol TCP --access Allow --direction Inbound --source-address-prefixes $(curl -s https://icanhazip.com) --source-port-ranges "*" --destination-address-prefixes "*" --destination-port-ranges 22
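The SSH rule above trusts whatever icanhazip.com returns. On a dual-stack network that may be an IPv6 address, which would not match the IPv4 address space used here. A minimal sketch of a sanity check follows; `is_ipv4` is a hypothetical helper, and the addresses in the demo loop are documentation examples, not real hosts.

```shell
# Return 0 if the argument looks like a dotted-quad IPv4 address.
# is_ipv4 is a hypothetical helper name, not an existing tool.
is_ipv4() {
  local ip="$1" octet old_ifs
  # Reject empty input, characters other than digits and dots, and empty octets.
  case "$ip" in
    ""|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into positional parameters.
  old_ifs="$IFS"; IFS=.
  set -- $ip
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] || return 1
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Demo with documentation-only addresses.
for candidate in 203.0.113.7 2001:db8::1; do
  if is_ipv4 "$candidate"; then
    echo "$candidate: looks like IPv4"
  else
    echo "$candidate: not IPv4"
  fi
done
```

If the check fails for your own address, running curl with the `-4` option forces the lookup over IPv4.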
Permit VMs in this subnet to connect to the three IP addresses that license.edicogenome.com resolves to.
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowLicenseServer1 --priority 101 --direction Outbound --access Allow --destination-address-prefixes 52.20.23.25 --destination-port-ranges 443 --protocol TCP
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowLicenseServer2 --priority 102 --direction Outbound --access Allow --destination-address-prefixes 52.55.106.178 --destination-port-ranges 443 --protocol TCP
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowLicenseServer3 --priority 103 --direction Outbound --access Allow --destination-address-prefixes 54.81.202.239 --destination-port-ranges 443 --protocol TCP
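The three rules above differ only in name, priority, and destination IP, so they could be generated in a loop if the license server's addresses ever change. The sketch below only prints the commands for review instead of running them; `print_license_rules` is a hypothetical helper, and the starting priority of 101 simply mirrors the rules above.

```shell
# Print (but do not run) one NSG rule command per license-server IP.
# Rule names and priorities follow the AllowLicenseServerN / 101, 102, ... pattern above.
print_license_rules() {
  local priority=101 n=1 ip
  for ip in "$@"; do
    echo "az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg" \
         "--name AllowLicenseServer${n} --priority ${priority} --direction Outbound" \
         "--access Allow --destination-address-prefixes ${ip}" \
         "--destination-port-ranges 443 --protocol TCP"
    priority=$(( priority + 1 ))
    n=$(( n + 1 ))
  done
}

# The three addresses used above.
print_license_rules 52.20.23.25 52.55.106.178 54.81.202.239
```

Keep the generated priorities below 200 so the rules are evaluated before DenyInternetOutBound.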
Create a storage account dgntestdata with hierarchical namespace enabled (ADLS Gen2) and prevent access to it from anywhere other than our current IP address or the subnet created earlier.
az storage account create --resource-group dgn-st-rg --name dgntestdata --kind StorageV2 --access-tier Hot --sku Standard_LRS --enable-hierarchical-namespace true --min-tls-version TLS1_2 --public-network-access enabled --default-action deny --publish-internet-endpoints false --publish-microsoft-endpoints false --routing-choice MicrosoftRouting
SUBNET=$(az network vnet subnet show --resource-group dgn-net-rg --vnet-name dgn-vnet --name dgn-sub1 --query id --output tsv)
az storage account network-rule add --resource-group dgn-st-rg --account-name dgntestdata --subnet $SUBNET
az storage account network-rule add --resource-group dgn-st-rg --account-name dgntestdata --ip-address $(curl -s https://icanhazip.com)
Also permit access to the storage account in the NSG that blocks all outbound internet access.
STIP=$(getent hosts dgntestdata.blob.core.windows.net | awk '{ print $1 }')
az network nsg rule create --resource-group dgn-net-rg --nsg-name dgn-nsg --name AllowStorageAccount --priority 100 --direction Outbound --access Allow --destination-address-prefixes $STIP --destination-port-ranges 443 --protocol TCP
Create blob containers (aka file systems) for input FASTQs, reference data, and output data from Dragen.
az storage container create --account-name dgntestdata --name fqs --auth-mode login --public-access off
az storage container create --account-name dgntestdata --name ref --auth-mode login --public-access off
az storage container create --account-name dgntestdata --name dgn --auth-mode login --public-access off
Set environment variables that allow Azure CLI to access our storage account.
export AZURE_STORAGE_ACCOUNT=dgntestdata
export AZURE_STORAGE_KEY=$(az storage account keys list --account-name dgntestdata --query "[0].value" --output tsv)
Generate a single-use short-lived SAS token to upload data into the ref container, and use it to upload the hg38 v4 Graph Reference Genome compatible with Dragen 4.3.
curl -LO https://webdata.illumina.com/downloads/software/dragen/resource-files/hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1.tar.gz
gzip -d hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1.tar.gz
END=$(date -u -d "2 hours" '+%Y-%m-%dT%H:%MZ')
SAS=$(az storage container generate-sas --name ref --permissions cw --expiry $END --output tsv)
azcopy cp hg38-alt_masked.cnv.graph.hla.rna-10-r4.0-1.tar "https://dgntestdata.blob.core.windows.net/ref/hg38/?${SAS}" --content-type="application/x-tar"
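The expiry timestamp is built the same way for every SAS token in this guide, so it can be wrapped in a small function. This is a sketch that assumes GNU date (the `-d` option behaves differently on BSD/macOS); `sas_expiry` is a hypothetical helper name.

```shell
# Print a UTC timestamp the given offset from now, in the
# YYYY-MM-DDTHH:MMZ format passed to --expiry above.
# Assumes GNU date; sas_expiry is a hypothetical helper name.
sas_expiry() {
  date -u -d "$1" '+%Y-%m-%dT%H:%MZ'
}

END=$(sas_expiry "2 hours")
echo "SAS token would expire at ${END}"
```

Pick an offset that comfortably covers the transfer, since an upload that is still running when the token expires will start failing.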
Similarly, download test FASTQs from the dataset created here, and upload them into the fqs container.
curl -LO https://data.cyri.ac/test_trio_wgs.tar
tar -xf test_trio_wgs.tar --wildcards '*_'{L001,L002}_{R1,R2}'_001.fastq.gz'
END=$(date -u -d "20 mins" '+%Y-%m-%dT%H:%MZ')
SAS=$(az storage container generate-sas --name fqs --permissions cw --expiry $END --output tsv)
azcopy cp dad "https://dgntestdata.blob.core.windows.net/fqs/ajtrio/?${SAS}" --recursive --content-type="text/fastq" --content-encoding="gzip"
azcopy cp mom "https://dgntestdata.blob.core.windows.net/fqs/ajtrio/?${SAS}" --recursive --content-type="text/fastq" --content-encoding="gzip"
azcopy cp son "https://dgntestdata.blob.core.windows.net/fqs/ajtrio/?${SAS}" --recursive --content-type="text/fastq" --content-encoding="gzip"
Set up an SSH configuration that we found, through trial and error, reliably gets us into Azure VMs to run long-running scripts.
SSH_USERNAME="azureuser"
SSH_AUTH_KEY="${HOME}/.ssh/id_ed25519"
SSH_OPTIONS="-q -i ${SSH_AUTH_KEY} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=QUIET -o ServerAliveInterval=120 -o ServerAliveCountMax=30"
Start a Dragen 4.3.6 VM on the subnet and NSG created earlier.
SUBNET=$(az network vnet subnet show --resource-group dgn-net-rg --vnet-name dgn-vnet --name dgn-sub1 --query id --output tsv)
NSG=$(az network nsg show --resource-group dgn-net-rg --name dgn-nsg --query id --output tsv)
VMIP=$(az vm create --resource-group dgn-vms-rg --subnet ${SUBNET} --nsg ${NSG} --name dgn1 --size Standard_NP10s --ephemeral-os-disk true --ephemeral-os-disk-placement ResourceDisk --nic-delete-option delete --image illuminainc1586452220102:dragen-vm-payg:dragen-4-3-6-payg:latest --admin-username ${SSH_USERNAME} --ssh-key-values ${SSH_AUTH_KEY}.pub --query publicIpAddress --output tsv)
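A freshly created VM may not accept SSH connections right away, so it can help to retry the first connection a few times before copying files over. The sketch below is one possible approach, not something the Dragen image requires; `retry` is a hypothetical helper, and the attempt count and delay in the commented example are arbitrary.

```shell
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, or 1 if every attempt fails.
# retry is a hypothetical helper name.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$(( i + 1 ))
  done
  return 1
}

# Trivial demo that succeeds on the first attempt.
retry 3 0 true && echo "command succeeded"

# Example against the VM (commented out because it needs a running VM):
# retry 10 15 ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VMIP} true
```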
Copy over a script that does alignment and variant calling, then run it for one of the test samples.
NOTE: We send LANG='en_US.UTF-8' via SSH because of a bug in the Dragen PAYG CentOS 7 image that breaks something in the FPGA driver when LANG='C.UTF-8', the default LANG in Ubuntu 24.04 Server.
curl -LO https://gist.githubusercontent.com/ckandoth/4006866209475ae558ead88a53e6b59f/raw/align_fastqs.sh
chmod +x align_fastqs.sh
scp ${SSH_OPTIONS} align_fastqs.sh ${SSH_USERNAME}@${VMIP}:~/
LANG='en_US.UTF-8' ssh ${SSH_OPTIONS} ${SSH_USERNAME}@${VMIP} "AZURE_STORAGE_ACCOUNT=${AZURE_STORAGE_ACCOUNT} AZURE_STORAGE_KEY=${AZURE_STORAGE_KEY} ./align_fastqs.sh fqs/ajtrio/mom ref/hg38 dgn/ajtrio-mom"
Make sure the Dragen output was uploaded into the dgn container.
az storage blob list --container-name dgn --prefix ajtrio-mom --query "[].name" --output tsv
Delete the VM to save money.
az vm delete --yes --resource-group dgn-vms-rg --name dgn1
Now we are ready to orchestrate the creation of VMs and have them analyze multiple samples in parallel and/or in series.
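As a starting point for that orchestration, the per-sample arguments can be derived from the naming pattern used for mom above. This sketch only prints a plan and runs nothing in Azure; `plan_samples` and the `dgn-<sample>` VM naming are hypothetical choices for illustration.

```shell
# Print one line per sample: a hypothetical VM name followed by the three
# align_fastqs.sh arguments (FASTQ prefix, reference prefix, output prefix),
# following the fqs/ajtrio/mom ref/hg38 dgn/ajtrio-mom pattern used above.
plan_samples() {
  local cohort="$1" sample
  shift
  for sample in "$@"; do
    echo "dgn-${sample} fqs/${cohort}/${sample} ref/hg38 dgn/${cohort}-${sample}"
  done
}

plan_samples ajtrio dad mom son
```

Running all three samples in parallel on NP10 VMs would need 30 vCPUs of the quota requested earlier. Running them in series on one VM needs only 10.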