Skip to content

Instantly share code, notes, and snippets.

@roolebo
Last active April 21, 2025 12:21
Show Gist options
  • Save roolebo/32ffdbdede0f3c5ada949973ec195a15 to your computer and use it in GitHub Desktop.
Save roolebo/32ffdbdede0f3c5ada949973ec195a15 to your computer and use it in GitHub Desktop.
#!/bin/bash
# SPDX-License-Identifier: MIT
# Copyright (C) 2024 Roman Bolshakov.
# All rights reserved.
#
# The script was tested on Samsung PM173x with FW EPK9CB5Q and EPK9GB5Q.
#
# It's recommended to use at least VQ=3 VI=3 with PM173x for decent
# single-threaded I/O performance.
#
# NVMe namespaces can be created and attached to secondary controllers at any
# moment, regardless if seconary controllers are offline or online.
#
# Example invocation to create three secondary NVMe controllers and expose them
# to the host:
# sudo env DRIVER=nvme VQ=3 VI=3 NUM_VFS=3 ./nvme-vf SERIAL|nvmeX
#
# NB1! It's recommended to pass NVMe serial as an argument rather than device
# name to the script because Linux NVMe controller number is dynamic and may
# change across reboots.
#
# NB2! A number of Primary Flexible Resources should be freed once before using
# the script. E.g. to give all Flexible Resources for secondary controllers:
#
# nvme virt-mgmt /dev/nvmeX -c 0x41 -r 0 -n 0 -a 1
# nvme virt-mgmt /dev/nvmeX -c 0x41 -r 1 -n 0 -a 1
#
# Then you need to make a Controller Level Reset. NVMe spec defines multiple
# ways to do it but none worked on my machine. I have found a simple reboot as
# the most predictable way to apply the changes.
#
# Alternative ways to perform Controller Level Reset that may or may not work:
#
# 1. Controller reset:
# nvme reset /dev/nvmeX
# echo 1 > /sys/bus/pci/rescan
#
# 2. Susbystem reset (frezes my machine):
# nvme subsystem-reset /dev/nvmeX
#
# After the reboot/reset you should see the zero-valued vqrfap and virfap in
# nvme Primary Controller Capabilities structure:
#
# nvme primary-ctrl-caps /dev/nvmeX
# NVME Identify Primary Controller Capabilities:
# cntlid : 0x41
# portid : 0
# crt : 0x3
# vqfrt : 226
# vqrfa : 0
# vqrfap : 0
# vqprt : 64
# vqfrsm : 9
# vqgran : 1
# vifrt : 226
# virfa : 0
# virfap : 0
# viprt : 64
# vifrsm : 9
# vigran : 1
# vigran : 1
#
# Now you're good to go to create NVMe VFs with the script.
#
set -e
: ${VQ:=2}
: ${VI:=1}
: ${NUM_VFS:=1}
: ${DRIVER:=vfio-pci}
die()
{
echo >&2 "$@"
exit 1
}
quiet_nvme()
{
out=$(nvme "$@" 2>&1) || {
local rc=$?
echo >&2 "ERROR: $out"
return $rc
}
}
[ -z "$1" ] && die "ERROR: NVMe device or NVMe device serial is not provided"
[ "$UID" == 0 ] || die "ERROR: Need superuser permissions"
if [ -e "/sys/class/nvme/$1" ]; then
NVME=$1
else
# Test if it's a serial
for nvme in /sys/class/nvme/*; do
[ "$1" == $(cat $nvme/serial) ] && {
NVME=$(basename $nvme)
break
}
done
fi
[ -z "$NVME" ] && die "ERROR: Invalid NVMe device: $1"
PF=$(basename $(readlink /sys/class/nvme/$NVME/device))
SECONDARY_CTRL_LIST_JSON=$(nvme list-secondary /dev/$NVME -o json)
MAX_VFS=$(cat /sys/class/nvme/$NVME/device/sriov_totalvfs)
[ "$NUM_VFS" -gt "$MAX_VFS" ] && \
die "ERROR: Maximum number of VFs on $NVME is $MAX_VFS."
PRIMARY_CTRL_CAPS_JSON=$(nvme primary-ctrl-caps /dev/$NVME -o json)
VQFRT=$(jq .vqfrt <<<"$PRIMARY_CTRL_CAPS_JSON")
VQRFAP=$(jq .vqrfap <<<"$PRIMARY_CTRL_CAPS_JSON")
VQFRSM=$(jq .vqfrsm <<<"$PRIMARY_CTRL_CAPS_JSON")
VIFRT=$(jq .vifrt <<<"$PRIMARY_CTRL_CAPS_JSON")
VIRFAP=$(jq .virfap <<<"$PRIMARY_CTRL_CAPS_JSON")
VIFRSM=$(jq .vifrsm <<<"$PRIMARY_CTRL_CAPS_JSON")
# "If a secondary controller has no VQ Resources assigned to it, then it
# remains in the Offline state. A secondary controller cannot transition to the
# Online state until it has VQ Resources for an Admin Queue and one or more I/O
# Queues assigned to it (i.e., the minimum number of VQ Resources that may be
# assigned is two)."
[ "$VQ" -ge 2 ] || die "ERROR: Minimum VQ count is 2"
[ "$VQ" -gt "$VQFRSM" ] && die "ERROR: Maximum VQ count is $VQFRSM"
VQ_AVAIL=$(( $VQFRT - $VQRFAP ))
VQ_NEED=$(( $VQ * $NUM_VFS ))
[ "$VQ_NEED" -gt "$VQ_AVAIL" ] && \
die "ERROR: Not enough VQ resources: need $VQ_NEED, available $VQ_AVAIL"
# If a secondary controller that supports VI Resources has no VI Resources
# assigned to it, then it remains in the Offline state. A secondary controller
# cannot transition to the Online state until it has a VI Resource for
# interrupt vector 0 assigned to it. For a secondary controller that supports
# VI Resources with MSI-X vectors, if no VI Resources are assigned to it, then
# MSIXCAP.MXC.TS is reserved.
[ "$VI" -ge 1 ] || die "ERROR: Minimum VI count is 1"
[ "$VI" -gt "$VIFRSM" ] && die "ERROR: Maximum VI count is $VIFRSM"
VI_AVAIL=$(( $VIFRT - $VIRFAP ))
VI_NEED=$(( $VI * $NUM_VFS ))
[ "$VI_NEED" -gt "$VI_AVAIL" ] && \
die "ERROR: Not enough VI resources: need $VI_NEED, available $VI_AVAIL"
# Pre-load the desired driver
if [ -n "$DRIVER" ]; then
modprobe $DRIVER
fi
# Avoid autoprobe hang when controller is not yet online
echo 0 > /sys/bus/pci/devices/$PF/sriov_drivers_autoprobe
CUR_VFS=$(cat /sys/bus/pci/devices/$PF/sriov_numvfs)
if [ "$CUR_VFS" != "$NUM_VFS" ]; then
echo 0 > /sys/bus/pci/devices/$PF/sriov_numvfs
[ "$NUM_VFS" != 0 ] && echo $NUM_VFS > /sys/bus/pci/devices/$PF/sriov_numvfs
fi
VFS=()
for i in $(seq 1 $NUM_VFS); do
VFN=$(jq ".[\"secondary-controllers\"][$(( $i - 1 ))][\"virtual-function-number\"]" <<<"$SECONDARY_CTRL_LIST_JSON")
PF_BUS=$(grep -oP '[[:alnum:]]{4}:[[:alnum:]]{2}' <<< "$PF")
PF_DEV=$(grep -oP '[[:alnum:]]{4}:[[:alnum:]]{2}:\K[[:alnum:]]{2}' <<< "$PF")
VF=$(printf "$PF_BUS:%02x.%x" $(( 0x$PF_DEV + $VFN / 8 )) $(( $VFN % 8 )))
[ -e "/sys/bus/pci/devices/$VF/driver" ] && \
echo "$VF" > "/sys/bus/pci/devices/$VF/driver/unbind"
# Per NVMe Specification, revision 1.3, 8.5.3, "To ensure that the host
# accurately detects capabilities of the secondary controller, the host
# should complete the following procedure to bring a secondary controller
# Online:
#
# 1. Use the Virtualization Management command to set the secondary
# controller to the Offline state.
quiet_nvme virt-mgmt /dev/$NVME -c $i -a 7
# 2. Use the Virtualization Management command to assign VQ resources and VI
# resources.
quiet_nvme virt-mgmt /dev/$NVME -c $i -r 0 -n $VQ -a 8
quiet_nvme virt-mgmt /dev/$NVME -c $i -r 1 -n $VI -a 8
# 3. Perform a Controller Level Reset. If the secondary controller is a VF,
# then this should be a VF Function Level Reset.
echo 1 > /sys/bus/pci/devices/$VF/reset
# 4. Use the Virtualization Management command to set the secondary
# controller to the Online state."
quiet_nvme virt-mgmt /dev/$NVME -c $i -a 9
if [ -n "$DRIVER" ]; then
echo "$DRIVER" > /sys/bus/pci/devices/$VF/driver_override
fi
echo "VF $VF online"
VFS+=($VF)
done
# Enable autoprobe to allow explicit binding
echo 1 > /sys/bus/pci/devices/$PF/sriov_drivers_autoprobe
for VF in "${VFS[@]}"; do
echo "$VF" > "/sys/bus/pci/drivers/$DRIVER/bind"
echo "VF $VF bound to $DRIVER"
done
@rivkab721
Copy link

my ssd samsung FW is EPK9AB5Q

@rivkab721
Copy link

i did run like this; sudo env DRIVER=nvme VQ=3 VI=3 NUM_VFS=3 ./nvme-vf nvme0
and i received :

VF 0000:5e:00.1 online
VF 0000:5e:00.2 online
VF 0000:5e:00.3 online

Message from syslogd@ladj1814 at Mar 12 14:36:13 ...
kernel:[ 349.806664] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/0:1:10]

Message from syslogd@ladj1814 at Mar 12 14:36:45 ...
kernel:[ 376.056913] watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kworker/0:1:10]

how i cam solve this ?

@rivkab721
Copy link

hi, i extracted the steps manually from your script and there are below:

Preparation before:
sudo nvme virt-mgmt /dev/nvme0n1 -c 0x41 -r 0 -n 0 -a 1
sudo nvme virt-mgmt /dev/nvme0n1 -c 0x41 -r 1 -n 0 -a 1

sudo nvme reset /dev/nvme0n1
echo 1 | sudo tee /sys/bus/pci/rescan

the beginning of the steps:
sudo modprobe nvme

echo 0 | sudo tee /sys/bus/pci/devices/0000:5e:00.0/sriov_drivers_autoprobe

echo 0 | sudo tee /sys/bus/pci/devices/0000:5e:00.0/sriov_numvfs
echo 1 | sudo tee /sys/bus/pci/devices/0000:5e:00.0/sriov_numvfs

echo "0000:5e:00.1" | sudo tee "/sys/bus/pci/devices/0000:5e:00.0/driver/unbind"

sudo nvme virt-mgmt /dev/nvme0n1 -c 1 -a 7
sudo nvme virt-mgmt /dev/nvme0n1 -c 1 -r 0 -n 2 -a 8
sudo nvme virt-mgmt /dev/nvme0n1 -c 1 -r 1 -n 1 -a 8
echo 1 | sudo tee /sys/bus/pci/devices/0000:5e:00.1/reset
sudo nvme virt-mgmt /dev/nvme0n1 -c 1 -a 9

echo "nvme" | sudo tee /sys/bus/pci/devices/0000:5e:00.1/driver_override
sudo echo "VF 0000:5e:00.1 online"
echo 1 | sudo tee /sys/bus/pci/devices/0000:5e:00.0/sriov_drivers_autoprobe
echo "0000:5e:00.1" | sudo tee /sys/bus/pci/drivers/nvme/bind
echo "VF 0000:5e:00.1 bound to nvme"

ok and during the step before the last one , the host is stuck , disconnected, how i can fix it ??

@roolebo
Copy link
Author

roolebo commented Mar 14, 2025

i did run like this; sudo env DRIVER=nvme VQ=3 VI=3 NUM_VFS=3 ./nvme-vf nvme0
and i received :

VF 0000:5e:00.1 online
VF 0000:5e:00.2 online
VF 0000:5e:00.3 online

This means virtual functions are created.

Message from syslogd@ladj1814 at Mar 12 14:36:13 ...
kernel:[ 349.806664] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/0:1:10]

You need to check your dmesg when this happens.

my ssd samsung FW is EPK9AB5Q

You have an early revision of firmware, you likely need to update it EPK9CB5Q or EPK9GB5Q.

Please see the post to update your firmware, then the script should work without the issue you have hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment