- CPU - AMD Ryzen 9 5900X
- GPU - ASUS TUF Gaming AMD Radeon RX 6800XT
- Motherboard - ASUS TUF Gaming X570 Plus Wi-Fi
- Random reboots first occurred while Control (2019) was downloading from the Epic Games Store, which naturally led to associating the issue with the Epic Games Launcher.
- After a few random crashes and reboots while installing Control (2019) from the Epic Games Launcher, transferring files over an Ethernet cable on a 10/100 LAN connection, and installing Bright Memory Infinite (2021), it was decided to perform these operations one at a time to find out which one caused the issue.
- Bright Memory Infinite (2021) installed successfully without any crashes, which ruled it out as a cause.
- Next, a file transfer from a laptop running Windows 10 21H2 to the computer over the Ethernet cable on the 10/100 LAN connection was attempted, and it failed because of a random crash and reboot.
- This contradicted the earlier assumption that the Epic Games Launcher was causing the issue, as it was not running at the time, and so widened the range of possible causes.
- A web search was made for other users facing random crashes and reboots on newly built/assembled computers, with more emphasis placed on posts in some subreddits.
- A variety of posts[1] suggested that the most probable cause was faulty RAM or a PSU unable to provide ample power to the internal hardware, thus causing crashes and reboots.
- Some posts[2] advised rechecking the connections on the PSU and the motherboard, as a loose connection on either of them can make the power supply spotty or inadequate.
- Furmark[3] was run at `1600x900` resolution, with `Disabled` anti-aliasing and the `1440p (QHD)` preset, to check whether the GPU was the faulty part, but the test ran fine and was aborted after 3 minutes and 30 seconds of runtime.
- Other posts advised checking the `Event Viewer` (by running `eventvwr` from the `Run` window) to see exactly which events were logged in the lead-up to each random crash and reboot.
- Following the article[4] to navigate the interface, the logs were filtered for the event IDs `6005` and `6006`, of which no logs for the event ID `6006` could be found.
  - Event ID `6005` is labeled “The event log service was started.” This is synonymous with system startup.
  - Event ID `6006` is labeled “The event log service was stopped.” This is synonymous with system shutdown.
- The absence of logs for the event ID `6006` indicated that the event logger was not being terminated safely, so the random crashes and reboots had to be fatal/critical.
- While investigating the logs, multiple logs of `Warning` level from the source `WHEA-Logger` with event ID `19` were noticed, with the message `A corrected hardware error has occurred.`
- A variant of the log was noticed of `Error` level from the source `WHEA-Logger` with event ID `18`, with the message `A fatal hardware error has occurred.`
- Another variant of the log was noticed of `Critical` level from the source `Kernel-Power` with event ID `41`, with the message `The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed or lost power unexpectedly.`
- The timestamps of the logs for event ID `41` and event ID `18` coincided with those of the random crashes and reboots, which suggested that the WHEA error was the factor causing this issue.
- Then, the computer was rebooted to the UEFI BIOS utility and
- A web search on this turned up a Reddit post[5] stating that WHEA errors can occur when a CPU overclock is unsafe/unstable, due to either insufficient voltage or overheating.
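The correlation made by eye in the Event Viewer can also be sketched in code. The following is a minimal Python sketch, assuming the events have already been exported into `(timestamp, source, event_id)` tuples. The sample records, the function names, the two-minute window and the `EventLog` source name for IDs `6005`/`6006` are all assumptions made for illustration and are not taken from the actual logs.

```python
from datetime import datetime, timedelta

# Hypothetical sample records as (timestamp, source, event_id) tuples.
# The timestamps are made up for illustration and are not from the real logs.
SAMPLE_EVENTS = [
    (datetime(2022, 2, 1, 20, 15, 2), "WHEA-Logger", 18),
    (datetime(2022, 2, 1, 20, 15, 3), "Kernel-Power", 41),
    (datetime(2022, 2, 1, 20, 15, 9), "EventLog", 6005),
    (datetime(2022, 2, 1, 21, 40, 0), "WHEA-Logger", 19),
]


def timestamps(events, source, event_id):
    """Return the timestamps of all events matching a source and an event ID."""
    return [ts for ts, src, eid in events if src == source and eid == event_id]


def reboots_near_fatal_whea(events, window=timedelta(minutes=2)):
    """Return Kernel-Power 41 timestamps with a WHEA-Logger 18 event within `window`."""
    fatal = timestamps(events, "WHEA-Logger", 18)
    reboots = timestamps(events, "Kernel-Power", 41)
    return [r for r in reboots if any(abs(r - f) <= window for f in fatal)]


def has_unclean_shutdown(events):
    """Rough heuristic: more startups (6005) than clean shutdowns (6006)."""
    starts = timestamps(events, "EventLog", 6005)
    stops = timestamps(events, "EventLog", 6006)
    return len(starts) > len(stops)


matched = reboots_near_fatal_whea(SAMPLE_EVENTS)
print(matched)
print(has_unclean_shutdown(SAMPLE_EVENTS))
```

With real data, every unclean reboot that shows up in `matched` supports the WHEA explanation, while a reboot that does not match would point towards a different cause.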
- OCCT was downloaded[6] and installed, and a CPU test with the `Large` data set, `Extreme` mode, `Variable` load, `1` start cycle, `Auto` instruction set and `Auto` threads was run for 7 minutes and 30 seconds.
  - `00:00:00 - Info - Test schedule started at 2022-02-02 09:33:36`
  - `00:00:00 - Info - CPU - Started (Duration : 00:15:00)`
  - `00:00:28 - Warning - WHEA error detected`
  - `00:01:28 - Warning - WHEA error detected`
  - `00:02:28 - Warning - WHEA error detected`
  - `00:03:28 - Warning - WHEA error detected`
  - `00:04:28 - Warning - WHEA error detected`
  - `00:05:28 - Warning - WHEA error detected`
  - `00:06:28 - Warning - WHEA error detected`
  - `00:07:28 - Warning - WHEA error detected`
  - `00:07:31 - Info - CPU - Stopped`
  - `00:07:31 - Info - Test schedule stopped by user request`
- The frequent occurrence of WHEA errors, one every minute, was alarming and needed to be addressed immediately.
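The one-minute cadence of the warnings can be checked mechanically. The following is a minimal Python sketch that parses the OCCT log quoted above and computes the gap between consecutive WHEA warnings. The regular expression and the function name `whea_offsets` are assumptions made for this illustration and are not part of OCCT.

```python
import re
from datetime import timedelta

# The OCCT log from the first CPU test, copied from the entry above.
occt_log = """\
00:00:00 - Info - Test schedule started at 2022-02-02 09:33:36
00:00:00 - Info - CPU - Started (Duration : 00:15:00)
00:00:28 - Warning - WHEA error detected
00:01:28 - Warning - WHEA error detected
00:02:28 - Warning - WHEA error detected
00:03:28 - Warning - WHEA error detected
00:04:28 - Warning - WHEA error detected
00:05:28 - Warning - WHEA error detected
00:06:28 - Warning - WHEA error detected
00:07:28 - Warning - WHEA error detected
00:07:31 - Info - CPU - Stopped
00:07:31 - Info - Test schedule stopped by user request
"""

# Assumed line layout: "HH:MM:SS - Level - Message".
LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}) - (\w+) - (.*)$")


def whea_offsets(log_text):
    """Return the elapsed-time offsets of every WHEA warning in the log."""
    offsets = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(4) == "Warning" and "WHEA error detected" in match.group(5):
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            offsets.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))
    return offsets


offsets = whea_offsets(occt_log)
# Gap in seconds between each warning and the next one.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(offsets, offsets[1:])]
print(len(offsets), gaps)
# → 8 [60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0]
```

Eight warnings spaced exactly 60 seconds apart match the observation that a WHEA error was reported every minute of the run.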
- To address the factors that led to the WHEA errors in the first place, the CPU profile was changed from `ASUS Optimal` to `Normal` mode in the EZ Mode of the UEFI BIOS utility.
- The UEFI BIOS was updated from version 4005 to version 4022 to pick up the most recent additions and bug fixes.
- A CPU test with the `Large` data set, `Extreme` mode, `Variable` load, `1` start cycle, `Auto` instruction set and `Auto` threads was run again on OCCT for 8 minutes.
  - `00:00:00 - Info - Test schedule started at 2022-02-02 12:09:46`
  - `00:00:00 - Info - CPU - Started (Duration : 00:08:00)`
  - `00:08:00 - Info - CPU - Stopped`
  - `00:08:00 - Info - Test schedule completed`
- From the CPU test results, it was concluded that the CPU was now running in a stable/safe state and, hence, that the problem was most likely fixed.
- The logs in the `Event Viewer` were checked again, and no new logs of the aforementioned kinds had turned up, which confirmed that the WHEA errors were no longer occurring.
- Switching the CPU profile from `ASUS Optimal` to `Normal` in the EZ Mode of the UEFI BIOS utility might be considered playing it too safe to fix the issue.
- `ASUS Optimal` is supposed to do some safe overclocking within the limits, but the voltage parameters need to be set manually for best results.
- One could try setting the CPU profile to `ASUS Optimal` and increasing the VDDCR CPU voltage slightly to see if some performance can be squeezed out safely without any additional thermal overhead.
- For now, the `Normal` mode strikes a good balance with a great affinity towards higher clock speeds, so it should suffice for as long as the CPU is not the cause of a performance bottleneck.
- https://www.reddit.com/r/techsupport/comments/1seblu/comment/cdwsddz/?utm_source=share&utm_medium=web2x&context=3
- https://www.reddit.com/r/techsupport/comments/1seblu/comment/cdx1785/?utm_source=share&utm_medium=web2x&context=3
- https://geeks3d.com/furmark/downloads/
- https://www.maketecheasier.com/see-pc-startup-and-shutdown-history-in-windows/
- https://www.reddit.com/r/Amd/comments/69q3ia/comment/dh8ja5q/?utm_source=share&utm_medium=web2x&context=3
- https://www.ocbase.com/