https://forum.level1techs.com/t/vega-10-and-12-reset-application/145666
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44c4ae1abd00..27840129e4b0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3433,6 +3433,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
+/*
+ * Radeon RX Vega and Navi devices break on bus reset. Oi...
+ * This is *not a real workaround* - disabling bus reset
+ * for your GPU may have unintended consequences.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x687f, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0xaaf8, quirk_no_bus_reset);
+
static void quirk_no_pm_reset(struct pci_dev *dev)
{
/*
Please don't upstream my terrible patch. :-(
Unless there's a really good reason to work around broken hardware (i.e. it's non-fixable even in the VBIOS), this problem is AMD's to solve.
So the alternative is to wait for something that might never happen?
Including this would reward bad behaviour on AMD's part. It is also not a fix: if your guest VM crashes or fails to shut down, if the guest AMDGPU driver crashes (which happens often), or if your physical BIOS posts the AMD GPU before you can post it inside your VM, this patch does nothing.
Renamed the patch file to hopefully be more clear that this is not a real workaround. Sure, it works for some people, hopefully people get mileage out of it, but bus reset itself being broken is a bad problem that needs to be fixed.
Incredible work, thank you. I'll test this out on my Vega 56 on the weekend!
FYI: I'm new to kernel patching so this might just be my inexperience, but the current patch fails when running the patch command on Ubuntu 18.04 using Kernel 5.2.7.
I ended up grabbing the code from your initial commit which worked.
peter@ElephantBox:~/Downloads/linux-5.2.7$ patch -p1 < ~/Downloads/patch_for_vega/fix-vega-reset.patch
patching file drivers/pci/quirks.c
patch: **** malformed patch at line 18: {
peter@ElephantBox:~/Downloads/linux-5.2.7$ nano ~/Downloads/patch_for_vega/fix-vega-reset.patch
peter@ElephantBox:~/Downloads/linux-5.2.7$ patch -p1 < ~/Downloads/patch_for_vega/fix-vega-reset.patch
patching file drivers/pci/quirks.c
Hunk #1 succeeded at 3433 with fuzz 1 (offset 60 lines).
peter@ElephantBox:~/Downloads/linux-5.2.7$
Is it correct to just add
/*
* Radeon RX Vega and Navi devices break on bus reset. Oi...
* This is *not a real workaround* - disabling bus reset
* for your GPU may have unintended consequences.
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x687f, quirk_no_bus_reset);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0xaaf8, quirk_no_bus_reset);
after
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
?
(I've got kernel 5.2.13, so the patch won't apply in its current state.)
yes
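For anyone doing the manual edit described above, here is a minimal sketch of the same insertion done with awk instead of an editor. `add_vega_quirk` is a made-up helper name, and it assumes the Cavium line appears exactly as quoted in the question. It prints the result to stdout and leaves the input file untouched, and it does not check whether the lines were already added.

```shell
# add_vega_quirk FILE: print FILE with the comment and the two quirk lines
# from the patch inserted directly after the Cavium entry.
# Hypothetical helper for illustration; it does not modify FILE.
add_vega_quirk() {
    awk '
        { print }
        index($0, "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);") {
            print "/*"
            print " * Radeon RX Vega and Navi devices break on bus reset. Oi..."
            print " * This is *not a real workaround* - disabling bus reset"
            print " * for your GPU may have unintended consequences."
            print " */"
            print "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x687f, quirk_no_bus_reset);"
            print "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0xaaf8, quirk_no_bus_reset);"
        }
    ' "$1"
}

# Example (run from the top of the kernel source tree):
#   add_vega_quirk drivers/pci/quirks.c > quirks.c.new
```

Review the output before replacing the original file with it.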
@numinit This patch has invalid syntax since you added two comment lines without updating the line count (12->14 on the @@ line). Here is a fixed version (diffed from 5.3.0): https://gist.github.com/aiberia/dee39e883defbcb430994c2abc7d9fff
@aiberia Thank you, fixed it.
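The line-count problem described above can be checked mechanically. Below is a rough sketch, not a replacement for what `patch` itself does: `check_hunks` is a hypothetical helper that compares the counts declared in each `@@ -a,b +c,d @@` header with the lines actually found in the hunk body. It assumes a `diff --git` style patch where each new file starts with a `diff ` line, and it counts an empty line inside a hunk as context.

```shell
# check_hunks FILE: report hunks whose header counts disagree with the body.
# Exit status is non-zero if any mismatch is found.
check_hunks() {
    awk '
        # Compare the finished hunk against its header, then leave hunk mode.
        function finish() {
            if (in_hunk && (old != want_old || new != want_new)) {
                printf "hunk %d: header -%d +%d, body -%d +%d\n",
                       hunk, want_old, want_new, old, new
                bad = 1
            }
            in_hunk = 0
        }
        # "-3433,6" -> 6; a range with no comma means a count of 1.
        function count(field,    parts, n) {
            n = split(field, parts, ",")
            return (n > 1) ? parts[2] + 0 : 1
        }
        /^@@ /  { finish(); hunk++; in_hunk = 1; old = 0; new = 0
                  want_old = count($2); want_new = count($3); next }
        /^diff / { finish(); next }
        in_hunk && /^\\/  { next }            # "\ No newline at end of file"
        in_hunk && /^-/   { old++; next }     # removed line
        in_hunk && /^[+]/ { new++; next }     # added line
        in_hunk           { old++; new++ }    # context line
        END { finish(); exit bad + 0 }
    ' "$1"
}

# Example:
#   check_hunks fix-vega-reset.patch && echo "hunk counts look consistent"
```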
Awesome, but please do not upstream this patch. I am working with AMD to produce a proper reset of the device also as a PCI quirk.
This is unfixed a year or more later. Who should we be chasing at AMD? Are they working with you on a proper fix?
Hi,
Is this patch still relevant? I compiled kernel 5.4.3 with this patch, and my system no longer even detects the graphics card.
On the standard kernel 5.3.0-3 (without the patch), Windows boots, but I frequently get the error "pci header type '127' for device" when I try to reboot my VM.
I have to unplug my PC's power cable before rebooting.
This isn't working for me; it keeps asking for a file to patch:
[imre@localhost Desktop]$ patch -p0 < fix-vega-reset.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
|index 44c4ae1abd00..27840129e4b0 100644
|--- a/drivers/pci/quirks.c
|+++ b/drivers/pci/quirks.c
--------------------------
File to patch:
@ImreBrassai patch -p0 != patch -p1
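To make the difference concrete, here is a small sketch of what the strip count does to the file name in the diff header. `strip_components` is a made-up helper for illustration and only mimics the documented behaviour of `-pN` for simple paths; it is not how `patch` is implemented.

```shell
# strip_components PATH N: print PATH with its first N leading components
# removed, the way patch -pN treats the file names in a diff header.
strip_components() (
    path=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # drop everything up to the first slash
            *)   break ;;             # nothing left to strip
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
)

strip_components a/drivers/pci/quirks.c 0   # a/drivers/pci/quirks.c
strip_components a/drivers/pci/quirks.c 1   # drivers/pci/quirks.c
```

With `-p0` the name keeps its `a/` prefix, which does not exist in a kernel tree. With `-p1` the name becomes `drivers/pci/quirks.c`, which only resolves if the command is run from the top of the kernel source.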
what
[imre@localhost ~]$ patch -p1 < vega.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
|index 44c4ae1abd00..27840129e4b0 100644
|--- a/drivers/pci/quirks.c
|+++ b/drivers/pci/quirks.c
--------------------------
File to patch:
You need to patch the kernel source and recompile; you don't just run the command provided...
Please stop bolding your text too; just use the insert code button.
OK, sorry about that. It bolds automatically, I don't know why.
So I did what you said: I downloaded the kernel, patched it, and installed it. When I run your script to test whether it worked, it gives me this:
[imre@localhost linux-5.4.1]$ ~/Downloads/reset-test 0000:0a:00.0
============================================================================
AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <[email protected]>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
This tool is intended as an interim workaround while I port this into the
kernel driver. If you like my work and want to support it you can contribute
using the following methods:
* Ko-Fi - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13
============================================================================
Unsupported device 1002:731f
That's a Navi GPU, not Vega. See:
https://forum.level1techs.com/t/navi-reset-kernel-patch/147547/47
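A quick way to see which device you actually have before patching is to read its IDs from sysfs. This is a sketch under a few assumptions: `pci_id` and `patch_covers` are hypothetical helper names, the optional second argument exists only so a different sysfs root can be supplied, and the ID list is just the two entries from the patch in this gist (vendor 0x1002 is `PCI_VENDOR_ID_ATI`). It says nothing about which devices the reset application supports.

```shell
# pci_id BDF [SYSFS_ROOT]: print "vendor:device" for a PCI address such as
# 0000:0a:00.0, read from the vendor and device files in sysfs.
pci_id() (
    dev=${2:-/sys/bus/pci/devices}/$1
    vendor=$(cat "$dev/vendor") || exit 1
    device=$(cat "$dev/device") || exit 1
    # sysfs stores the values as 0x1002, 0x731f, ...; drop the 0x prefix.
    printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
)

# patch_covers ID: succeed if ID is one of the two devices the patch lists.
patch_covers() {
    case $1 in
        1002:687f|1002:aaf8) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Example:
#   id=$(pci_id 0000:0a:00.0)
#   patch_covers "$id" || echo "$id is not listed in the patch"
```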
OH! i see now
@gnif Just wanted to thank you for your hard work; you saved my brand new Threadripper build. I really can't thank you enough. New Patreon incoming.
has there been any progress made on an upstream fix for this?
Thanks mate.
any progress made on an upstream fix for this
Not yet. Things slowed down over the holiday break, and my contacts have gone quiet for now ;)
In the interim work is progressing on Looking Glass :)
Which file should you patch? How do I apply the patch?
Which file should you patch? How do I apply the patch?
First, download the Linux kernel source code. The patch is applied from the root directory of the kernel source, before compiling. Please google how to apply patches to the Linux kernel on your distro of choice. This thread should be for information pertinent to the patch, not generic questions about the Linux kernel itself.
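As a rough sketch of the step described above, the helper below checks that a directory looks like the top of a kernel source tree before anything is patched. `in_kernel_tree` is a made-up name, and the paths in the usage comment are the ones that appear earlier in this thread, used here only as placeholders. Building and installing the kernel afterwards is distro-specific and not shown.

```shell
# in_kernel_tree [DIR]: succeed if DIR (default: current directory) contains
# the file this patch modifies, i.e. it looks like the kernel source root.
in_kernel_tree() {
    [ -f "${1:-.}/drivers/pci/quirks.c" ]
}

# Possible usage, with placeholder paths:
#   cd ~/Downloads/linux-5.4.1
#   in_kernel_tree . || echo "run this from the kernel source root"
#   patch -p1 --dry-run < ~/Downloads/patch_for_vega/fix-vega-reset.patch
#   patch -p1 < ~/Downloads/patch_for_vega/fix-vega-reset.patch
```

The `--dry-run` pass reports whether the hunk would apply without changing any files, which is a cheap check before the real run.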
Could this be applied with kpatch/live patching?
Because this issue exists in all modern Radeon cards and if we just implement a quirk AMD will just keep releasing broken cards.