@sunrise2575
Last active September 3, 2020 01:59
How to set up multiple Samsung PM1725a NVMe PCIe SSD 6.4TB on Power9 (AC922; 8335-GTH)

Environment

  • Machine: IBM AC922
  • SSD: Samsung PM1725a NVMe PCIe SSD (6.4TB), Firmware version: MN12MN12
  • OS: Ubuntu 18.04 ppc64le

Note

  • Use Ubuntu "18.04" with its stock kernel (kernel version 4.x).
  • Do not use "20.04" or "18.04 HWE" (kernel version >= 5.x).
    • With a 5.x kernel, all but one of the NVMe devices are rejected at boot and only one comes up, like this:

      dmesg | grep nvme
      [    3.051850] nvme nvme0: pci function 0000:01:00.0
      [    3.051877] nvme 0000:01:00.0: enabling device (0140 -> 0142)
      [    3.051984] nvme nvme1: pci function 0003:01:00.0
      [    3.052071] nvme 0003:01:00.0: enabling device (0140 -> 0142)
      [    3.052224] nvme nvme2: pci function 0030:01:00.0
      [    3.052407] nvme 0030:01:00.0: enabling device (0140 -> 0142)
      [    5.477694] nvme nvme0: Shutdown timeout set to 10 seconds
      [    5.478143] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
      [    5.478165] nvme nvme1: Removing after probe failure status: -22
      [    5.479038] nvme nvme2: Duplicate cntlid 33 with nvme0, rejecting
      [    5.479051] nvme nvme2: Removing after probe failure status: -22
      [    5.539906] nvme nvme0: 128/0/0 default/read/poll queues
      [    5.550419] nvme0n1: detected capacity change from 0 to 6348800000000
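
To check whether a machine is affected before changing anything, a small sketch like the following may help. The function names `kernel_major` and `count_rejected_nvme` are made up for illustration; the matched string "Duplicate cntlid" is taken from the dmesg output above.

```shell
# Hypothetical helpers; the names are illustrative and not part of nvme-cli.

# Print the major version of a kernel release string such as "4.15.0-112-generic".
# Without an argument, the running kernel (uname -r) is used.
kernel_major() {
    local release="${1:-$(uname -r)}"
    echo "${release%%.*}"
}

# Count "Duplicate cntlid" rejections in kernel log text read from stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the exit status clean.
count_rejected_nvme() {
    grep -c 'Duplicate cntlid' || true
}

# Warn when running a 5.x (or newer) kernel.
if [ "$(kernel_major)" -ge 5 ]; then
    echo "kernel $(uname -r): multiple PM1725a devices may be rejected at boot"
fi
```

On a live system, `dmesg | count_rejected_nvme` would print the number of rejected controllers (2 for the log shown above).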
      

Solution

  • Create namespaces with 4096-byte (4k) sectors. (1562805846 is the maximum 4k block count, which I checked manually.)

    nvme create-ns /dev/nvme0 -s 1562805846 -c 1562805846 -f 0 -d 0
    nvme create-ns /dev/nvme1 -s 1562805846 -c 1562805846 -f 0 -d 0
    nvme create-ns /dev/nvme2 -s 1562805846 -c 1562805846 -f 0 -d 0
    
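    • Note that -f 0 selects LBA format 0, which is the 4k block format on this drive. I think the format index differs between manufacturers.
  • Attach the namespaces to the NVMe controller. The controller ID is 0x21 (33 in decimal, matching the "cntlid 33" in the dmesg output above).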

    nvme attach-ns /dev/nvme0 -n 1 -c 0x21
    nvme attach-ns /dev/nvme1 -n 1 -c 0x21
    nvme attach-ns /dev/nvme2 -n 1 -c 0x21
    
  • Reset the NVMe devices so that the change takes effect.

    nvme reset /dev/nvme0
    nvme reset /dev/nvme1
    nvme reset /dev/nvme2
    
    • Note: Setting GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off" in /etc/default/grub does not make the kernel notice the NVMe device state change immediately. You must reset the NVMe devices.
  • After completing all the steps, you can see this beautiful output:

    root@power9:~# nvme list
    Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
    ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     S3RXNA0K300125       PCIe3 6.4TB NVMe Flash Adapter II x8     1           6.40  TB /   6.40  TB      4 KiB +  0 B   MN12MN12
    /dev/nvme0n2     S3RXNA0K300086       PCIe3 6.4TB NVMe Flash Adapter II x8     1           6.40  TB /   6.40  TB      4 KiB +  0 B   MN12MN12
    /dev/nvme0n3     S3RXNA0K300084       PCIe3 6.4TB NVMe Flash Adapter II x8     1           6.40  TB /   6.40  TB      4 KiB +  0 B   MN12MN12
    
  • (Optional) If it still does not work for some reason, reboot the system.

    reboot
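
The steps above only say that the block count was checked manually. One plausible way to arrive at such a number, shown here as a sketch, is to divide a capacity in bytes by the 4096-byte block size. The helper name `max_blocks` is made up, and 6401252745216 is simply 1562805846 × 4096, not a value read from the drive.

```shell
# Hypothetical helper: number of whole blocks of a given size that fit in a byte count.
# The block size defaults to 4096 bytes.
max_blocks() {
    local bytes="$1" block_size="${2:-4096}"
    echo $(( bytes / block_size ))
}

# 1562805846 blocks of 4096 bytes correspond to 6401252745216 bytes.
max_blocks 6401252745216 4096    # prints 1562805846

# Assumption: on a live system the byte count could be read from the
# controller, for example the tnvmcap field of `nvme id-ctrl /dev/nvme0`.
```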
    
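With more than a few drives, typing the create, attach, and reset commands by hand gets error-prone. The sketch below only prints the commands, in the same order as the steps above, so they can be reviewed first. The function name `nvme_setup_cmds` and the variables `BLOCKS` and `CNTLID` are illustrative; the values are the ones used in this guide and may differ on other drives.

```shell
# Values taken from the steps above; adjust them for other drives.
BLOCKS=1562805846   # 4k block count
CNTLID=0x21         # controller ID

# Print (not execute) the setup commands for every controller passed as an argument.
# All namespaces are created first, then attached, then the controllers are reset.
nvme_setup_cmds() {
    local dev
    for dev in "$@"; do
        echo "nvme create-ns $dev -s $BLOCKS -c $BLOCKS -f 0 -d 0"
    done
    for dev in "$@"; do
        echo "nvme attach-ns $dev -n 1 -c $CNTLID"
    done
    for dev in "$@"; do
        echo "nvme reset $dev"
    done
}

nvme_setup_cmds /dev/nvme0 /dev/nvme1 /dev/nvme2
```

After reviewing the printed commands, they can be run as root, for example by piping the output to `sh`.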

Discussion

  • Note this "malfunction": all three namespaces appear under nvme0, although each one belongs to a different controller:

    /dev/nvme0n1 -> /dev/nvme0 namespace
    /dev/nvme0n2 -> /dev/nvme1 namespace
    /dev/nvme0n3 -> /dev/nvme2 namespace
    

    I tried the same thing on an x86_64 Supermicro server, and the result is different (note: the SN and Model name differ, but I ordered the same hardware, PM1725a):

    [root@x86_64 ~]# nvme list
    Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
    ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     S4C6NY0M901232       SAMSUNG MZPLL6T4HMLA-00005               1           6.40  TB /   6.40  TB    512   B +  0 B   GPJA2B3Q
    /dev/nvme1n1     S4C6NY0M901426       SAMSUNG MZPLL6T4HMLA-00005               1           6.40  TB /   6.40  TB    512   B +  0 B   GPJA2B3Q
    

    which is:

    /dev/nvme0n1 -> /dev/nvme0 namespace
    /dev/nvme1n1 -> /dev/nvme1 namespace
    
  • I guess the whole situation has one of two causes:

    1. The firmware version is at fault (ppc64le = MN12MN12, x86_64 = GPJA2B3Q; the old firmware MN12MN12 has a bug)
    2. The Power9 system is at fault
  • I think the first one is the actual reason.

    • Could someone please guide me on how to update the firmware for the PM1725a?
    • I can only find a Dell firmware release for Red Hat Linux, and it requires Dell's custom program.
    • Samsung product, Dell firmware... what's happening?
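
To spot the naming pattern from the `nvme list` outputs in this section without reading the table by eye, a sketch like the following counts how many distinct serial numbers show up under each controller prefix. The function name `serials_per_controller` is made up, and the parsing assumes the column layout shown in the outputs above (node in the first column, SN in the second).

```shell
# Read `nvme list` output on stdin and print "<controller> <number of distinct SNs>".
# A controller prefix with more than one serial number indicates the odd naming.
serials_per_controller() {
    awk '$1 ~ /^\/dev\/nvme[0-9]+n[0-9]+$/ {
             ctrl = $1
             sub(/n[0-9]+$/, "", ctrl)        # /dev/nvme0n2 -> /dev/nvme0
             if (!seen[ctrl, $2]++) count[ctrl]++
         }
         END { for (c in count) print c, count[c] }' | sort
}
```

Running `nvme list | serials_per_controller` on the Power9 output above would give `/dev/nvme0 3`, while the x86_64 output would give one line each for `/dev/nvme0` and `/dev/nvme1` with a count of 1.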