- Linux kernel 2.7+ (with headers)
(Your symptoms might be different with the same root problem, but these were mine.)
- lsblk shows the drive, but DOES NOT SHOW the partitions, like:

      $ lsblk
      sda    3G
      sda1   1G
      sda2   2G
      sdb    1.8T
      $ # notice that sdb doesn't have any partitions
      $ # even though it does

- Any program trying to open the drive (whether it be mount, ddrescue, or even grep) will freeze for a while and then say: Failed to open /dev/<device>: No such device or address
- NOT to be confused with: Failed to open /dev/<device>: No such file or directory
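
The two messages are the standard texts for two different error codes (ENXIO and ENOENT), so they are easy to tell apart mechanically. Here is a minimal sketch for sorting log lines into the two cases. The function name `classify_open_error` is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# classify_open_error is a hypothetical helper, not part of any existing tool.
# It takes one error message and reports which of the two cases it matches.
classify_open_error() {
    case "$1" in
        *"No such device or address"*)
            # ENXIO: the device node exists, but the drive behind it is gone
            echo "ENXIO: node exists, device went away" ;;
        *"No such file or directory"*)
            # ENOENT: there is no such device node at all
            echo "ENOENT: device node is missing" ;;
        *)
            echo "unknown" ;;
    esac
}

classify_open_error "Failed to open /dev/sdb: No such device or address"
```

Only the first case is the symptom described here. The second usually just means the path is wrong.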
I used to think this was caused by the drive crashing or something. Instead, it's Linux.
Modern drives, when a read fails or takes a while, will try over and over (or wait it out). The timeout for this in modern drives is very high, and often not configurable. (In this case, you couldn't configure it even if it were configurable, as smartctl wouldn't be able to open the drive.)
If a read takes too long, Linux will decide that the drive is not okay and ask the drive to reset. But the drive will be so focused on processing the already-made request, it will ignore the reset request. After a while, Linux will reset the ATA connection. This is why you see No such device or address rather than No such file or directory: the file was found, but in the process of being opened, it disappeared.
Linux's timeout is fully configurable, and is given in seconds. While 690 (11 minutes and 30 seconds) is very overboard, I found it better to play it safe than to risk having to spin the drive down and up (powering it off and on) so it would be recognised again.
$ echo 690 | sudo tee /sys/block/sdb/device/timeout
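
To check the value before and after changing it, something like the following sketch may help. Only the sysfs path comes from the command above. The function names and the `SYSFS_ROOT` variable are assumptions made for illustration (the variable just makes it possible to try the functions against a scratch directory instead of the real /sys):

```shell
#!/bin/sh
# Hypothetical helpers; only the sysfs path is taken from the command above.
# SYSFS_ROOT is an assumption so the functions can be pointed at a test directory.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current command timeout (in seconds) for a drive such as "sdb".
get_timeout() {
    cat "$SYSFS_ROOT/block/$1/device/timeout"
}

# Write a new timeout, refusing anything that is not a positive whole number.
set_timeout() {
    dev=$1
    secs=$2
    case "$secs" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1 ;;
    esac
    if [ "$secs" -le 0 ]; then
        echo "timeout must be greater than zero" >&2
        return 1
    fi
    # Needs root when SYSFS_ROOT is the real /sys
    echo "$secs" > "$SYSFS_ROOT/block/$dev/device/timeout"
}
```

After sourcing this in a root shell, `set_timeout sdb 690` followed by `get_timeout sdb` should print 690 if the write went through.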
Unfortunately, this only takes effect for a read if the read is started after the timeout is set. This means the original timeout will take effect while Linux is probing for partitions. To solve this, prevent it from probing automatically with this livepatch. Then, add the timeout. Finally, if you need the partitions loaded so you can use partitions directly (like /dev/sda1), run sudo partprobe /dev/<device>.
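
The last two steps could be put together roughly like this. It is only a sketch: it assumes automatic partition probing has already been disabled, and that the affected drive is sdb. The `DEV`, `TIMEOUT` and `DRY_RUN` variables and both function names are made up for illustration. By default it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch only. Assumes automatic partition probing is already disabled.
# DEV, TIMEOUT and DRY_RUN are hypothetical knobs, not part of any tool.
DEV="${DEV:-sdb}"
TIMEOUT="${TIMEOUT:-690}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = just print the commands, 0 = really run them

# Either print a command or execute it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_partitions() {
    # 1. Raise the timeout first, so later reads are covered by it
    printf '%s\n' "$TIMEOUT" | run sudo tee "/sys/block/$DEV/device/timeout"
    # 2. Ask the kernel to read the partition table now
    run sudo partprobe "/dev/$DEV"
    # 3. List the drive to see whether the partitions showed up
    run lsblk "/dev/$DEV"
}
```

Call `recover_partitions` to see the commands, then set `DRY_RUN=0` and call it again once they look right for your drive.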
If a timeout of 690 isn't enough, the drive probably crashed.
If you have any questions, contact me at TheTechRobo#7420 on Discord or TheTechRobo on hackint IRC.