This text really belongs in our internal knowledge base, but because we spent a while searching for the source of the problem, we decided to publish it in a somewhat raw form. The goal is to help anyone who runs into a similar problem.
We got our hands on NVMe drives that had originally been in a QNAP NAS. Since they were no longer needed there, we decided to reuse them in our servers. The drives reported 100% remaining life, so there was no reason to throw them away.
The drives are Samsung PM983 1.92TB M.2 NVMe, model MZ1LB1T9HALS-00007.
Output of smartctl:
root@debian:~# smartctl --all /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZ1LB1T9HALS-00007
Serial Number:                      S436NA0N724295
Firmware Version:                   EDA7602Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,920,383,410,176 [1.92 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,920,383,410,176 [1.92 TB]
Namespace 1 Utilization:            680,938,631,168 [680 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Nov 11 18:33:54 2025 UTC
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x000f):   Security Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     86 Celsius
Critical Comp. Temp. Threshold:     87 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    9,015,848 [4.61 TB]
Data Units Written:                 5,709,056 [2.92 TB]
Host Read Commands:                 32,043,597
Host Write Commands:                20,889,861
Controller Busy Time:               103
Power Cycles:                       17
Power On Hours:                     6,999
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,028,483
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               33 Celsius
Temperature Sensor 2:               48 Celsius
Temperature Sensor 3:               58 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0    1028483     6  0x5282  0x450c  0x028            0     1     -
  1    1028482     6  0x4282  0x450c  0x028            0     1     -
  2    1028481     6  0x3282  0x450c  0x028            0     1     -
  3    1028480     6  0x2282  0x450c  0x028            0     1     -
  4    1028479     6  0x1282  0x450c  0x028            0     1     -
  5    1028478     6  0x0282  0x450c  0x028            0     1     -
  6    1028477    14  0xe381  0x450c  0x028            0     1     -
  7    1028476    15  0x5040  0x450c  0x028            0     1     -
  8    1028475    15  0x4040  0x450c  0x028            0     1     -
  9    1028474    15  0x3040  0x450c  0x028            0     1     -
 10    1028473    15  0x2040  0x450c  0x028            0     1     -
 11    1028472    15  0x1040  0x450c  0x028            0     1     -
 12    1028471    15  0x0040  0x450c  0x028            0     1     -
 13    1028470     8  0x0080  0x450c  0x028            0     1     -
 14    1028469     6  0x5281  0x450c  0x028            0     1     -
 15    1028468     6  0x4281  0x450c  0x028            0     1     -
... (48 entries not read)
root@debian:~# mkfs.xfs /dev/nvme0n1
meta-data=/dev/nvme0n1 isize=512 agcount=4, agsize=117210902 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=468843606, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=228927, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mkfs.xfs: libxfs_device_zero write failed: No data available
At this point we thought: that's odd, could these drives be dead after all? The dmesg output did nothing to dispel that impression:
[Tue Nov 11 18:35:35 2025] nvme0n1: I/O Cmd(0x2) @ LBA 1048577920, 8 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:35 2025] critical medium error, dev nvme0n1, sector 1048577920 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[Tue Nov 11 18:35:35 2025] nvme0n1: I/O Cmd(0x2) @ LBA 1048577920, 8 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:35 2025] critical medium error, dev nvme0n1, sector 1048577920 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[Tue Nov 11 18:35:35 2025] Buffer I/O error on dev nvme0n1p1, logical block 131071984, async page read
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x9) @ LBA 0, 4194304 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 0 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875374480, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875374480 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875378576, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875378576 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875382672, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875382672 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875386768, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875386768 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875390864, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875390864 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875394960, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875394960 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:37 2025] nvme0n1: I/O Cmd(0x8) @ LBA 1875399056, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
[Tue Nov 11 18:35:37 2025] critical medium error, dev nvme0n1, sector 1875399056 op 0x9:(WRITE_ZEROES) flags 0x8000000 phys_seg 0 prio class 2
[Tue Nov 11 18:35:40 2025] nvme_log_error: 476 callbacks suppressed
A fairly typical output for failing drives.
What caught our attention in the S.M.A.R.T. output was the 1,028,483 error log entries alongside 100% health. So we looked at the contents of those errors:
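With hundreds of suppressed messages, it can help to group the kernel lines by command and status instead of reading them one by one. The following is only a sketch: the regular expression is written against the exact message format shown above, the opcode names are the standard NVM command set names, and `summarize` is a hypothetical helper, not part of any tool.

```python
import re
from collections import Counter

# NVM command set opcodes that appear in the log above.
OPCODES = {0x02: "Read", 0x08: "Write Zeroes", 0x09: "Dataset Management"}

# Matches lines such as:
#   nvme0n1: I/O Cmd(0x2) @ LBA 1048577920, 8 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
PATTERN = re.compile(
    r"I/O Cmd\((0x[0-9a-f]+)\) @ LBA (\d+), (\d+) blocks, "
    r"I/O Error \(sct (0x[0-9a-f]+) / sc (0x[0-9a-f]+)\)"
)


def summarize(log: str) -> Counter:
    """Count failed commands per (command name, sct, sc) triple."""
    counts = Counter()
    for opcode, _lba, _blocks, sct, sc in PATTERN.findall(log):
        name = OPCODES.get(int(opcode, 16), opcode)
        counts[(name, int(sct, 16), int(sc, 16))] += 1
    return counts


sample = """\
nvme0n1: I/O Cmd(0x2) @ LBA 1048577920, 8 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
nvme0n1: I/O Cmd(0x9) @ LBA 0, 4194304 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
nvme0n1: I/O Cmd(0x8) @ LBA 1875374480, 4096 blocks, I/O Error (sct 0x2 / sc 0x86) MORE
"""

for (name, sct, sc), n in summarize(sample).items():
    print(f"{name}: {n} failure(s), sct={sct:#x} sc={sc:#x}")
```

In the log above, reads, discards and zeroing writes at very different LBAs all fail with the same sct/sc pair, which is worth noticing before blaming the flash itself.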
root@debian:~# nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:64
.................
error_count     : 1047288
sqid            : 9
cmdid           : 0x93d9
status_field    : 0x2286(Access Denied: Access to the namespace and/or LBA range is denied due to lack of access rights)
phase_tag       : 0
parm_err_loc    : 0x28
lba             : 0
nsid            : 0x1
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
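The three tools appear to report the same status in different encodings: smartctl shows 0x450c, nvme-cli shows 0x2286, and dmesg shows sct 0x2 / sc 0x86 with MORE. A minimal sketch of how these relate, assuming the smartctl Status column is the raw 16-bit value with the phase tag in bit 0 (an assumption, but one the numbers here are consistent with) and using the status field layout from the NVMe specification; `decode_raw_status` is a hypothetical helper name:

```python
def decode_raw_status(raw: int) -> dict:
    """Split a 16-bit NVMe status word (phase tag in bit 0) into its fields."""
    status = raw >> 1  # drop the phase tag; this is the 15-bit status field
    return {
        "phase_tag": raw & 0x1,
        "status_field": status,
        "sc": status & 0xFF,                 # Status Code, bits 7:0
        "sct": (status >> 8) & 0x7,          # Status Code Type, bits 10:8
        "more": bool((status >> 13) & 0x1),  # More information in the error log
        "dnr": bool((status >> 14) & 0x1),   # Do Not Retry
    }


fields = decode_raw_status(0x450C)
print(hex(fields["status_field"]), hex(fields["sct"]), hex(fields["sc"]), fields["more"])
# prints: 0x2286 0x2 0x86 True
```

Status code type 0x2 is the "Media and Data Integrity Errors" group, and within it status code 0x86 is Access Denied. That grouping is presumably why the kernel log words the failure as a medium error even though no flash has failed.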
Access Denied is a restriction at the software level, not a hardware fault. We searched for a while until it occurred to us that some drive series can encrypt themselves.
A quick check:
root@debian:~# ./sedutil-cli --query /dev/nvme0n1
/dev/nvme0n1 NVMe SAMSUNG MZ1LB1T9HALS-00007 EDA7602Q S436NA0N724295
TPer function (0x0001)
ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MediaEncrypt = Y
Geometry function (0x0003)
Align = Y, Alignment Granularity = 8 (4096), Logical Block size = 512, Lowest Aligned LBA = 0
DataStore function (0x0202)
Max Tables = 9, Max Size Tables = 10485760, Table size alignment = 1
OPAL 2.0 function (0x0203)
Base comID = 0x1004, Initial PIN = 0x00, Reverted PIN = 0x00, comIDs = 1
Locking Admins = 4, Locking Users = 9, Range Crossing = N
**** 2 **** Unknown function codes IGNORED
TPer Properties:
MaxComPacketSize = 66048 MaxResponseComPacketSize = 66048
MaxPacketSize = 66028 MaxIndTokenSize = 65540 MaxPackets = 1
MaxSubpackets = 1 MaxMethods = 1 MaxAuthentications = 5
MaxSessions = 1 MaxTransactionLimit = 1 DefSessionTimeout = 0
Host Properties:
MaxComPacketSize = 2048 MaxResponseComPacketSize = 2048
MaxPacketSize = 2028 MaxIndTokenSize = 1992 MaxPackets = 1
MaxSubpackets = 1 MaxMethods = 1
The important lines say that the drive is locked and encrypted:
Locking function (0x0002)
Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MediaEncrypt = Y
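When checking more than one drive, the flags on this line can be parsed rather than read by eye. A small sketch, assuming the line format shown above; `parse_flags` and `needs_unlock` are hypothetical names introduced for illustration and are not part of sedutil:

```python
import re


def parse_flags(line: str) -> dict:
    """Turn 'Locked = Y, LockingEnabled = Y, ...' into {'Locked': True, ...}."""
    pairs = re.findall(r"(\w+)\s*=\s*([YN])\b", line)
    return {name: value == "Y" for name, value in pairs}


# The Locking function line exactly as printed by the query above.
line = (
    "Locked = Y, LockingEnabled = Y, LockingSupported = Y, "
    "MBRDone = N, MBREnabled = N, MediaEncrypt = Y"
)

flags = parse_flags(line)
# The drive refuses I/O when locking is enabled and the range is currently locked.
needs_unlock = flags["LockingEnabled"] and flags["Locked"]
print(needs_unlock)  # prints: True
```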
We did not know the drive's password, and unfortunately we do not know how QNAP enables and uses this feature on the drives. So we looked for a way to turn it off.
To turn it off you need the PSID code, which is printed on the drive itself.
It is then used as follows (WARNING: this is a destructive command that erases all data on the drive):
root@debian:~# ./sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID MBYKBV5************************* /dev/nvme0n1
revertTper completed successfully
root@debian:~# ./sedutil-cli --query /dev/nvme0n1
/dev/nvme0n1 NVMe SAMSUNG MZ1LB1T9HALS-00007 EDA7602Q S436NA0N724295
TPer function (0x0001)
ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
Locked = N, LockingEnabled = N, LockingSupported = Y, MBRDone = N, MBREnabled = N, MediaEncrypt = Y
...
The drive now reports that it is not locked, and we can work with it.
root@debian:~# mkfs.xfs /dev/nvme0n1 -f
meta-data=/dev/nvme0n1 isize=512 agcount=4, agsize=117210902 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=468843606, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=228927, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.
root@debian:~# mount /dev/nvme0n1 /mnt/
root@debian:~# df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1    1.8T   13G  1.8T   1% /mnt
This was our first, very quick encounter with drives that can encrypt and lock themselves. It is definitely a technology that has made us eager to explore further, and we see potential here for improving the security of customer data. Next we will look at the technology in more depth and test performance in various scenarios. If everything goes well, we will start deploying these drives across customer servers.