@shinji257
Created February 23, 2024 23:26
AMD GPU Crash - Frigate
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:5 pasid:32773, for process ffmpeg pid 27260 thread ffmpeg:cs0 pid 28003)
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: in page starting at address 0x00008001072fb000 from client 0x12 (VMC)
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00503830
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: Faulty UTCL2 client ID: VCN (0x1c)
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 23 18:24:08 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: RW: 0x0
Feb 23 18:24:18 Apollo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=9107, emitted seq=9109
Feb 23 18:24:18 Apollo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ffmpeg pid 27260 thread ffmpeg:cs0 pid 28003
Feb 23 18:24:18 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
Feb 23 18:24:19 Apollo kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
Feb 23 18:24:19 Apollo kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000360 != 0x00000300
Feb 23 18:24:19 Apollo kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: free PSP TMR buffer
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: MODE2 reset
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb 23 18:24:19 Apollo kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Feb 23 18:24:19 Apollo kernel: [drm] PSP is resuming...
Feb 23 18:24:19 Apollo kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resuming...
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: smu driver if version = 0x00000004, smu fw if version = 0x00000005, smu fw program = 0, smu fw version = 0x00544fdf (84.79.223)
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: SMU driver if version not matched
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resumed successfully!
Feb 23 18:24:19 Apollo kernel: [drm] DMUB hardware initialized: version=0x05000F00
Feb 23 18:24:19 Apollo kernel: [drm] kiq ring mec 2 pipe 1 q 0
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
Feb 23 18:24:19 Apollo kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v3_0> failed -110
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset(1) failed
Feb 23 18:24:19 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset end with ret = -110
Feb 23 18:24:19 Apollo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Feb 23 18:24:20 Apollo kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
Feb 23 18:24:21 Apollo kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000010 != 0x00000000
Feb 23 18:24:21 Apollo kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
Feb 23 18:24:25 Apollo kernel: ------------[ cut here ]------------
Feb 23 18:24:25 Apollo kernel: WARNING: CPU: 1 PID: 0 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:655 amdgpu_irq_put+0x4e/0x90 [amdgpu]
Feb 23 18:24:25 Apollo kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap ipvlan xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls mlx4_en mlx4_core r8169 realtek amdgpu edac_mce_amd intel_rapl_msr edac_core intel_rapl_common gpu_sched iosf_mbi dm_thin_pool drm_buddy i2c_algo_bit dm_persistent_data drm_ttm_helper ttm dm_bio_prison dm_bufio drm_display_helper kvm_amd drm_kms_helper dm_mod kvm
Feb 23 18:24:25 Apollo kernel: drm crct10dif_pclmul crc32_pclmul agpgart crc32c_intel input_leds syscopyarea ghash_clmulni_intel i2c_piix4 sha512_ssse3 sysfillrect sha256_ssse3 sha1_ssse3 aesni_intel wmi_bmof crypto_simd cryptd rapl i2c_core hid_apple k10temp sysimgblt led_class ccp nvme fb_sys_fops ahci nvme_core libahci video tpm_crb tpm_tis wmi tpm_tis_core backlight tpm acpi_cpufreq button unix [last unloaded: mlx4_core]
Feb 23 18:24:25 Apollo kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 6.1.74-Unraid #1
Feb 23 18:24:25 Apollo kernel: Hardware name: ASUS System Product Name/PRIME B650M-A AX II, BIOS 1807 09/26/2023
Feb 23 18:24:25 Apollo kernel: RIP: 0010:amdgpu_irq_put+0x4e/0x90 [amdgpu]
Feb 23 18:24:25 Apollo kernel: Code: 10 48 83 38 00 74 25 89 54 24 14 48 89 74 24 08 48 89 3c 24 e8 af fd ff ff 48 8b 3c 24 84 c0 48 8b 74 24 08 8b 54 24 14 75 09 <0f> 0b b8 ea ff ff ff eb 24 89 d0 f0 41 ff 4c 85 00 b8 00 00 00 00
Feb 23 18:24:25 Apollo kernel: RSP: 0018:ffffc9000007ce28 EFLAGS: 00010046
Feb 23 18:24:25 Apollo kernel: RAX: 0000000000000000 RBX: ffff8881066a8000 RCX: ffff888139038700
Feb 23 18:24:25 Apollo kernel: RDX: 0000000000000000 RSI: ffff8881064c6548 RDI: ffff8881064c0000
Feb 23 18:24:25 Apollo kernel: RBP: ffff8881064c0010 R08: ffffffffa0e8a431 R09: 0000000000000000
Feb 23 18:24:25 Apollo kernel: R10: ffffc9000007cd38 R11: ffffc9000007cd3c R12: 0000000000000000
Feb 23 18:24:25 Apollo kernel: R13: ffff888139038700 R14: ffff888285963e00 R15: ffffffffa0567775
Feb 23 18:24:25 Apollo kernel: FS: 0000000000000000(0000) GS:ffff889fde240000(0000) knlGS:0000000000000000
Feb 23 18:24:25 Apollo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 18:24:25 Apollo kernel: CR2: 000014b078000010 CR3: 000000000420a000 CR4: 0000000000750ee0
Feb 23 18:24:25 Apollo kernel: PKRU: 55555554
Feb 23 18:24:25 Apollo kernel: Call Trace:
Feb 23 18:24:25 Apollo kernel: <IRQ>
Feb 23 18:24:25 Apollo kernel: ? __warn+0xab/0x122
Feb 23 18:24:25 Apollo kernel: ? report_bug+0x109/0x17e
Feb 23 18:24:25 Apollo kernel: ? amdgpu_irq_put+0x4e/0x90 [amdgpu]
Feb 23 18:24:25 Apollo kernel: ? handle_bug+0x41/0x6f
Feb 23 18:24:25 Apollo kernel: ? exc_invalid_op+0x13/0x60
Feb 23 18:24:25 Apollo kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 23 18:24:25 Apollo kernel: ? drm_vblank_disable_and_save+0xe3/0xe3 [drm]
Feb 23 18:24:25 Apollo kernel: ? amdgpu_irq_put+0x4e/0x90 [amdgpu]
Feb 23 18:24:25 Apollo kernel: ? amdgpu_irq_put+0x3d/0x90 [amdgpu]
Feb 23 18:24:25 Apollo kernel: dm_set_vblank+0xdb/0x185 [amdgpu]
Feb 23 18:24:25 Apollo kernel: drm_vblank_disable_and_save+0x97/0xe3 [drm]
Feb 23 18:24:25 Apollo kernel: vblank_disable_fn+0x63/0x76 [drm]
Feb 23 18:24:25 Apollo kernel: ? drm_vblank_disable_and_save+0xe3/0xe3 [drm]
Feb 23 18:24:25 Apollo kernel: call_timer_fn+0x6f/0x10d
Feb 23 18:24:25 Apollo kernel: __run_timers+0x144/0x184
Feb 23 18:24:25 Apollo kernel: ? tick_init_jiffy_update+0x7c/0x7c
Feb 23 18:24:25 Apollo kernel: ? update_process_times+0x7a/0x81
Feb 23 18:24:25 Apollo kernel: ? tick_sched_timer+0x43/0x71
Feb 23 18:24:25 Apollo kernel: ? __hrtimer_next_event_base+0x27/0x81
Feb 23 18:24:25 Apollo kernel: run_timer_softirq+0x2b/0x43
Feb 23 18:24:25 Apollo kernel: __do_softirq+0x129/0x288
Feb 23 18:24:25 Apollo kernel: __irq_exit_rcu+0x5e/0xb8
Feb 23 18:24:25 Apollo kernel: sysvec_apic_timer_interrupt+0x85/0xa6
Feb 23 18:24:25 Apollo kernel: </IRQ>
Feb 23 18:24:25 Apollo kernel: <TASK>
Feb 23 18:24:25 Apollo kernel: asm_sysvec_apic_timer_interrupt+0x16/0x20
Feb 23 18:24:25 Apollo kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
Feb 23 18:24:25 Apollo kernel: Code: 91 f4 9f ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 84 b0 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
Feb 23 18:24:25 Apollo kernel: RSP: 0018:ffffc9000018fe98 EFLAGS: 00000246
Feb 23 18:24:25 Apollo kernel: RAX: ffff889fde240000 RBX: ffff888108001400 RCX: 0000000000000000
Feb 23 18:24:25 Apollo kernel: RDX: 000049fc5e21c490 RSI: ffffffff820d8766 RDI: ffffffff820d8c6f
Feb 23 18:24:25 Apollo kernel: RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000002
Feb 23 18:24:25 Apollo kernel: R10: 0000000000000010 R11: 0000000000000696 R12: 0000000000000003
Feb 23 18:24:25 Apollo kernel: R13: ffffffff823237a0 R14: 000049fc5e21c490 R15: 0000000000000000
Feb 23 18:24:25 Apollo kernel: ? cpuidle_enter_state+0xf7/0x202
Feb 23 18:24:25 Apollo kernel: cpuidle_enter+0x2a/0x38
Feb 23 18:24:25 Apollo kernel: do_idle+0x18d/0x1fb
Feb 23 18:24:25 Apollo kernel: cpu_startup_entry+0x2a/0x2c
Feb 23 18:24:25 Apollo kernel: start_secondary+0x101/0x101
Feb 23 18:24:25 Apollo kernel: secondary_startup_64_no_verify+0xce/0xdb
Feb 23 18:24:25 Apollo kernel: </TASK>
Feb 23 18:24:25 Apollo kernel: ---[ end trace 0000000000000000 ]---
Feb 23 18:24:30 Apollo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=9109, emitted seq=9109
Feb 23 18:24:30 Apollo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ffmpeg pid 27260 thread ffmpeg:cs0 pid 28003
Feb 23 18:24:30 Apollo kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!