My Samsung 870 EVO 2TB SSD is dying after 13 months of basic workstation operation. It looks like a problem with a large batch, because I found many other users complaining on forums. I am going for an RMA. Fortunately, I was able to restore from my backup.

Lesson learned: SMART needs to be monitored on my home servers. This is not the first time, and I was lucky enough to see the errors in the system journal in advance.

How to do that? There are multiple options. A shell script ships with the smartmontools package, but I could not get it working, so I ended up with a simple solution:

# dnf install smartmontools ssmtp

A dirty shell script will do. Note that the smartctl utility returns a bit mask as its exit status, so finding out whether a drive is healthy is a bit tricky. Luckily, the man page contains an example:

# cat /etc/cron.weekly/smart
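#!/bin/bash
RECIPIENT=root  # assumption: the recipient is missing above, use your own address
for DRIVE in sda sdb sdc sdd; do
  smartctl -H "/dev/$DRIVE" &>/dev/null
  dying=$(($? & 8))  # bit 3 of the exit status: SMART reports the disk as failing
  if [[ $dying -ne 0 ]]; then
    echo "Subject: SMART problem $DRIVE on $(hostname)" | sendmail -v "$RECIPIENT"
  fi
done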

Super simple: just an email with a subject and no body. If, like me, you don’t run an MTA on your server, configure ssmtp to relay through an external one:

# cat /etc/ssmtp/ssmtp.conf

The From address must match in order to pass the MTA's anti-spam filters:

# cat /etc/ssmtp/revaliases
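
The contents of the two files are not reproduced here. As a rough sketch only, assuming a hypothetical relay `smtp.example.com` on port 587 with STARTTLS and a hypothetical mailbox `alerts@example.com` (every value below is a placeholder, not the original configuration), they might look something like this:

```
# /etc/ssmtp/ssmtp.conf (all values are placeholders)
root=alerts@example.com
mailhub=smtp.example.com:587
UseSTARTTLS=YES
AuthUser=alerts@example.com
AuthPass=change-me

# /etc/ssmtp/revaliases (format: local_account:outgoing_address:mailhub)
root:alerts@example.com:smtp.example.com:587
```

The `revaliases` entry is what sets the From address for mail sent by the local `root` account, which is the account that jobs in /etc/cron.weekly run as.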

It’s dirty, but it should work.
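
Because the `$? & 8` test only fires when a drive is actually failing, it is hard to verify on healthy hardware. One way to gain some confidence is to run the same bit test against made-up exit statuses. This is a minimal sketch: `smart_is_failing` is a hypothetical helper name, and the statuses in the loop are invented values, not real smartctl output.

```shell
# Hypothetical helper: returns success when bit 3 (value 8) is set in the
# given smartctl exit status, i.e. the SMART health check said "failing".
smart_is_failing() {
  [ $(( $1 & 8 )) -ne 0 ]
}

# Exercise the check with invented exit statuses instead of a real drive.
# Of these, 8, 12 and 200 have bit 3 set; 0, 4 and 64 do not.
for status in 0 4 8 12 64 200; do
  if smart_is_failing "$status"; then
    echo "$status: FAILING"
  else
    echo "$status: ok"
  fi
done
```

Other bits in the status (for example an error log that contains entries) do not trigger the alert, which matches the script's choice to mail only on a failed health check.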

Update: I had a typo in the script. Special thanks to François Le Nalio, who spotted it and reported back. I actually had the typo in my original script, so this could have been another disaster. As if I had not had enough SSD failures in the last three years :-)