[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:00:11,
By blietaer
Hello,
Solving this problem calls on your experience with:
Software: Linux / RAID-1 / mdadm / dmesg / (s)fdisk
Hardware: SATA / Dell PowerEdge 300
Our little do-everything server is a rack-mount Dell with two 750 GB SATA disks set up as RAID-1 (mirror).
sdb started acting up and got kicked out of the RAID.
I rebooted (!), re-fdisked/reformatted/re-added it to the RAID, and off went the looOOoong crunch. It never made it to the end.
So I swapped the drive.
Then another.
Then another.
4 times.
Already.
(OK, the first two weren't new, but in 2011 a disk can survive 2000 flight hours, right?)
Anyway, I'm starting to suspect not the drive anymore... but the sdb interface (and that really hurts, I'm an electronics engineer)
Selected excerpt (dmesg):
[796082.196337] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[796082.196340] sd 1:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[796082.196343] Descriptor sense data with sense descriptors (in hex):
[796082.196345] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[796082.196353] 00 00 00 00
[796082.196356] sd 1:0:0:0: [sdb] Add. Sense: No additional sense information
[796082.196360] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[796082.196367] end_request: I/O error, dev sdb, sector 0
[796082.196394] Buffer I/O error on device sdb, logical block 0
[796082.196426] ata2: EH complete
[796082.196580] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.196614] ata2.00: BMDMA stat 0x25
[796082.196637] ata2.00: failed command: READ DMA
[796082.196664] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.196665] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.202924] ata2.00: status: { DRDY DF ERR }
[796082.202953] ata2.00: error: { ABRT }
[796082.232333] ata2.00: configured for UDMA/133
[796082.232345] ata2: EH complete
[796082.232489] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.232525] ata2.00: BMDMA stat 0x25
[796082.232549] ata2.00: failed command: READ DMA
[796082.232577] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.232578] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.232669] ata2.00: status: { DRDY DF ERR }
[796082.232693] ata2.00: error: { ABRT }
[796082.252337] ata2.00: configured for UDMA/133
[796082.252343] ata2: EH complete
[796082.252474] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.252508] ata2.00: BMDMA stat 0x25
[796082.252533] ata2.00: failed command: READ DMA
[796082.252561] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.252562] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.252649] ata2.00: status: { DRDY DF ERR }
[796082.252673] ata2.00: error: { ABRT }
[796082.269835] ata2.00: configured for UDMA/133
[796082.269843] ata2: EH complete
[796082.269983] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.270016] ata2.00: BMDMA stat 0x25
[796082.270041] ata2.00: failed command: READ DMA
[796082.270070] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.270071] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.270164] ata2.00: status: { DRDY DF ERR }
[796082.270188] ata2.00: error: { ABRT }
[796082.284330] ata2.00: configured for UDMA/133
[796082.284342] ata2: EH complete
[796082.284475] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.284504] ata2.00: BMDMA stat 0x25
[796082.284528] ata2.00: failed command: READ DMA
[796082.284556] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.284558] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.284659] ata2.00: status: { DRDY DF ERR }
[796082.284683] ata2.00: error: { ABRT }
[796082.301338] ata2.00: configured for UDMA/133
[796082.301353] ata2: EH complete
[796082.301486] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.301515] ata2.00: BMDMA stat 0x25
[796082.301539] ata2.00: failed command: READ DMA
[796082.301567] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.301568] res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.301654] ata2.00: status: { DRDY DF ERR }
[796082.301678] ata2.00: error: { ABRT }
[796082.324328] ata2.00: configured for UDMA/133
[796082.324338] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[796082.324341] sd 1:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[796082.324344] Descriptor sense data with sense descriptors (in hex):
[796082.324345] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[796082.324351] 00 00 00 00
[796082.324354] sd 1:0:0:0: [sdb] Add. Sense: No additional sense information
[796082.324357] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[796082.324363] end_request: I/O error, dev sdb, sector 0
[796082.324390] Buffer I/O error on device sdb, logical block 0
[796082.324427] ata2: EH complete
[796082.324600] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.324631] ata2.00: BMDMA stat 0x25
[796082.324655] ata2.00: failed command: READ DMA
[796082.324682] ata2.00: cmd c8/00:08:00:10:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.324683] res 61/04:08:00:10:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.324773] ata2.00: status: { DRDY DF ERR }
[796082.324798] ata2.00: error: { ABRT }
[796082.340336] ata2.00: configured for UDMA/133
[796082.340342] ata2: EH complete
[796082.340473] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.340506] ata2.00: BMDMA stat 0x25
[796082.340530] ata2.00: failed command: READ DMA
[796082.340558] ata2.00: cmd c8/00:08:00:10:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.340559] res 61/04:08:00:10:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.340645] ata2.00: status: { DRDY DF ERR }
[796082.340669] ata2.00: error: { ABRT }
[796082.357830] ata2.00: configured for UDMA/133
[796082.357835] ata2: EH complete
[796082.357971] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[796082.357999] ata2.00: BMDMA stat 0x25
[796082.358022] ata2.00: failed command: READ DMA
[796082.358050] ata2.00: cmd c8/00:08:00:10:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[796082.358051] res 61/04:08:00:10:00/04:00:57:00:00/e0 Emask 0x1 (device error)
[796082.358137] ata2.00: status: { DRDY DF ERR }
[796082.358161] ata2.00: error: { ABRT }
[796082.380337] ata2.00: configured for UDMA/133
[796082.380353] ata2: EH complete
[796082.380489] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
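For anyone reading the excerpt above, the `res 61/04:...` field and the `CDB: Read(10)` bytes can be decoded by hand. The sketch below is only an illustration: the function names are made up for this example, the bit tables list just the flags that libata prints in its `status:` and `error:` lines, and the 512-byte block size is an assumption (it is consistent with the `dma 4096` shown in the log).

```python
import re

# ATA status register bits, limited to the flags libata prints.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# ATA error register bits, limited to the flags libata prints.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


def decode_res(line):
    """Decode the status/error bytes of a libata 'res ss/ee:...' line."""
    match = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", line)
    if match is None:
        raise ValueError("no 'res ss/ee:' field found")
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


def decode_read10(cdb_hex):
    """Return (lba, block_count) from a SCSI READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


res_line = "res 61/04:08:00:00:00/04:00:57:00:00/e0 Emask 0x1 (device error)"
print(decode_res(res_line))  # (['DRDY', 'DF', 'ERR'], ['ABRT'])

lba, blocks = decode_read10("28 00 00 00 00 00 00 00 08 00")
print(lba, blocks, blocks * 512)  # 0 8 4096, assuming 512-byte blocks
```

The decoded flags match what the kernel prints right after each `res` line: the device answers every read of the first sectors with a device fault and an aborted command, while `SErr 0x0` shows no link-level error being recorded for those commands.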
I'm running a bit short of tests/ideas/new disks....
The data isn't too big a worry, there's an hourly rsnapshot.
Downtime would be more of a nuisance, and above all I get the feeling that re-crunching the RAID-1 is long and hard on sda, which might also tell me to get lost one day...
Are you thinking of anything?
And stop if need be.
[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:07:50,
By Jean-Christophe
"Are you thinking of anything?"
Yes, tonight's pasta carbonara, but that doesn't help you, sorry
[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:09:00,
By Jean-Christophe
Otherwise, more seriously: I don't know how software RAID works under Linux, but couldn't you physically swap sda and sdb to see whether the problem follows the drive?
Last edited: 14/02/2011 @ 11:09:20
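The reasoning behind the swap test can be written down as a small sketch. Everything here is illustrative: the `diagnose` function, the port names `ata1`/`ata2` and the drive labels are assumptions made for the example, not values read from the server (only `ata2` for sdb appears in the dmesg excerpt).

```python
from collections import defaultdict


def diagnose(observations):
    """Split suspects into ports and drives.

    observations: iterable of (port, drive, failed) tuples.
    A port is suspect when more than one drive was tried on it and all failed.
    A drive is suspect when it was tried on more than one port and failed on all.
    """
    by_port = defaultdict(dict)   # port -> {drive: failed}
    by_drive = defaultdict(dict)  # drive -> {port: failed}
    for port, drive, failed in observations:
        by_port[port][drive] = failed
        by_drive[drive][port] = failed
    suspect_ports = {port for port, results in by_port.items()
                     if len(results) > 1 and all(results.values())}
    suspect_drives = {drive for drive, results in by_drive.items()
                      if len(results) > 1 and all(results.values())}
    return suspect_ports, suspect_drives


# Hypothetical summary of the thread so far: four different drives failed
# on the second port, while the original sda keeps working on the first.
history = [
    ("ata2", "drive-1", True),
    ("ata2", "drive-2", True),
    ("ata2", "drive-3", True),
    ("ata2", "drive-4", True),
    ("ata1", "drive-0", False),
]
print(diagnose(history))  # ({'ata2'}, set())
```

With only this history no drive has been tried on two ports, so the drives cannot be judged individually; that is the information the physical swap would add.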
[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:09:50,
By blietaer
I was thinking that was quick.
Edit your post and at least type out your secret recipe in a spoiler, so we can think about it with you..
But please don't derail the thread into a Marmiton recipe page..
And stop if need be.
[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:13:22,
By blietaer
"but couldn't you physically swap sda and sdb to see whether the problem follows the drive?"
Heeeeeeeeeeeeeeeeeeeeeeey!
As far as GRUB goes, that will work... as for the automatic assembly/detection of md0 and md1.. it might.. except that I have failed/removed the 'sdbX' partitions from both arrays.
So I would first have to add a new sdb....
It's an option...
That said, if it really is controller no. 2 (SATA-1) that is screwed up, the test could be destructive
And stop if need be.
[SATA] SW RAID, HW problem?
Posted on 14/02/2011 @ 11:21:15,
By Jean-Christophe
Otherwise, get a SATA card and bypass the onboard controller.
You'll never get the answer to the question, but you'll be rid of the problem.