I'm running Fedora 12 (FC12) on a home server with two MD RAID arrays. The hardware and arrays were solid under CentOS/RHEL 5, but I moved to Fedora because of some software incompatibilities. The installation went fine, but there are intermittent problems with the MD arrays.
Code:
# uname -a
Linux sudio 2.6.32.10-90.fc12.x86_64 #1 SMP Tue Mar 23 09:47:08 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
The box is built on an ASUS M2N-E motherboard with one RocketRAID SATA II controller (in plain SATA mode) and a Silicon Image SATA II card. The five md0 disks (1x Hitachi HDT7210105, 1x Seagate ST3100340AS and 3x WDC WD10EADS) are on the motherboard SATA controller. md1 consists of five WDC WD10EADS disks: four on the RocketRAID card and one on the SiI card. There is also a 160 GB system drive (sda).
The problem is that every 6 days or so, one disk in each array is reported as an "unknown device" and the entire array rebuilds. This is usually followed by another disk being reported as unknown and another rebuild, which is undesirable for obvious reasons. The server is usually idle when this happens, with the disks in standby (spun down).
Here are some diagnostics:
Code:
# mdadm --monitor /dev/md0 /dev/md1
Apr 15 02:09:59: Rebuild80 on /dev/md0 unknown device
Apr 15 03:00:00: Rebuild80 on /dev/md1 unknown device
Apr 15 07:54:09: RebuildFinished on /dev/md0 unknown device
Apr 15 08:45:20: RebuildFinished on /dev/md1 unknown device
Apr 21 03:36:02: RebuildStarted on /dev/md1 unknown device
Apr 21 03:36:02: RebuildStarted on /dev/md0 unknown device
Apr 21 09:26:04: Rebuild20 on /dev/md1 unknown device
Apr 21 09:26:04: Rebuild20 on /dev/md0 unknown device
Apr 21 14:59:26: Rebuild40 on /dev/md0 unknown device
Apr 21 15:16:06: Rebuild40 on /dev/md1 unknown device
Apr 21 20:32:48: Rebuild60 on /dev/md0 unknown device
Apr 21 21:06:08: Rebuild60 on /dev/md1 unknown device
Apr 22 02:06:10: Rebuild80 on /dev/md0 unknown device
Apr 22 02:56:10: Rebuild80 on /dev/md1 unknown device
Apr 22 07:41:19: RebuildFinished on /dev/md0 unknown device
Apr 22 08:37:31: RebuildFinished on /dev/md1 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md1 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md0 unknown device
Apr 28 06:21:42: Rebuild20 on /dev/md0 unknown device
Apr 28 06:38:22: Rebuild21 on /dev/md1 unknown device
Apr 28 10:31:44: Rebuild40 on /dev/md0 unknown device
Apr 28 11:05:04: Rebuild40 on /dev/md1 unknown device
Apr 28 16:05:06: Rebuild60 on /dev/md0 unknown device
Apr 28 16:38:26: Rebuild60 on /dev/md1 unknown device
Apr 28 20:31:47: Rebuild81 on /dev/md0 unknown device
Apr 28 21:05:07: Rebuild81 on /dev/md1 unknown device
Apr 28 23:53:00: RebuildFinished on /dev/md0 unknown device
Apr 29 00:39:55: RebuildFinished on /dev/md1 unknown device
May 5 03:58:01: RebuildStarted on /dev/md0 unknown device
May 5 03:58:01: RebuildStarted on /dev/md1 unknown device
May 5 13:24:45: Rebuild20 on /dev/md1 unknown device
May 5 13:24:45: Rebuild20 on /dev/md0 unknown device
May 5 22:34:48: Rebuild40 on /dev/md0 unknown device
May 5 22:51:28: Rebuild40 on /dev/md1 unknown device
May 6 08:01:32: Rebuild60 on /dev/md1 unknown device
May 6 08:01:32: Rebuild60 on /dev/md0 unknown device
May 6 17:11:36: Rebuild80 on /dev/md0 unknown device
May 6 17:28:16: Rebuild80 on /dev/md1 unknown device
May 7 01:25:14: RebuildFinished on /dev/md0 unknown device
May 7 01:32:07: RebuildFinished on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md0 unknown device
May 12 08:49:23: Rebuild20 on /dev/md1 unknown device
May 12 08:49:23: Rebuild20 on /dev/md0 unknown device
May 12 13:32:44: Rebuild40 on /dev/md0 unknown device
May 12 13:49:25: Rebuild40 on /dev/md1 unknown device
May 12 18:32:46: Rebuild61 on /dev/md0 unknown device
May 12 18:49:27: Rebuild60 on /dev/md1 unknown device
May 12 23:16:08: Rebuild80 on /dev/md0 unknown device
May 12 23:49:28: Rebuild80 on /dev/md1 unknown device
May 13 05:54:10: RebuildFinished on /dev/md0 unknown device
May 13 06:19:18: RebuildFinished on /dev/md1 unknown device
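To make the timing pattern in this log easier to see, here is a minimal Python sketch that pulls the RebuildStarted events out of the monitor output and computes the gap between consecutive starts. It assumes the lines look exactly like the ones above, an English/C locale for month names, and the year 2010 (the log lines carry no year; the `mdadm -D` output shows 2010). `rebuild_starts` and `gaps_in_days` are hypothetical helpers, not part of mdadm.

```python
import re
from datetime import datetime

# Matches lines such as "Apr 21 03:36:02: RebuildStarted on /dev/md1 unknown device".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}): (?P<event>\w+) on (?P<dev>\S+)"
)

# RebuildStarted lines copied from the monitor output in this post.
SAMPLE = """\
Apr 21 03:36:02: RebuildStarted on /dev/md1 unknown device
Apr 21 03:36:02: RebuildStarted on /dev/md0 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md1 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md0 unknown device
May 5 03:58:01: RebuildStarted on /dev/md0 unknown device
May 5 03:58:01: RebuildStarted on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md0 unknown device
May 19 03:41:01: RebuildStarted on /dev/md1 unknown device
May 19 03:41:01: RebuildStarted on /dev/md0 unknown device
"""


def rebuild_starts(log_text, year=2010):
    """Return {md device: [datetime, ...]} for every RebuildStarted event.

    The log omits the year, so one is assumed (2010 by default).
    """
    starts = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group("event") != "RebuildStarted":
            continue
        when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        starts.setdefault(m.group("dev"), []).append(when)
    return starts


def gaps_in_days(stamps):
    """Days between consecutive events, rounded to one decimal place."""
    return [round((b - a).total_seconds() / 86400, 1) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for dev, stamps in sorted(rebuild_starts(SAMPLE).items()):
        weekdays = sorted({s.strftime("%A") for s in stamps})
        print(dev, gaps_in_days(stamps), weekdays)
```

Run against the start events above, every gap comes out at about 7.0 days, and every start falls on a Wednesday between roughly 03:35 and 04:06.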
Code:
# cat /etc/mdadm.conf
DEVICE /dev/sd[b-k]1
ARRAY /dev/md0 metadata=0.90 UUID=123c1cf4:280460ae:d359bbcb:ebdf7b4e
ARRAY /dev/md1 metadata=0.90 UUID=2256c45d:94372543:9a2a7338:bdd17f12
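When assembling from this file, mdadm scans the devices listed on the DEVICE line, so one quick sanity check is to match every array member against that pattern. The sketch below is a simplified parser that only understands DEVICE and ARRAY lines in the form shown above (no line continuations or other keywords), and it uses Python's `fnmatch` as a stand-in for the shell-style glob mdadm expands. `parse_mdadm_conf` and `uncovered` are hypothetical helper names.

```python
import fnmatch

# Copy of the /etc/mdadm.conf shown above.
CONF = """\
DEVICE /dev/sd[b-k]1
ARRAY /dev/md0 metadata=0.90 UUID=123c1cf4:280460ae:d359bbcb:ebdf7b4e
ARRAY /dev/md1 metadata=0.90 UUID=2256c45d:94372543:9a2a7338:bdd17f12
"""


def parse_mdadm_conf(text):
    """Return (device_patterns, {array: {key: value}}); simplified, see lead-in."""
    patterns, arrays = [], {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        keyword, *rest = line.split()
        if keyword == "DEVICE":
            patterns.extend(rest)
        elif keyword == "ARRAY" and rest:
            arrays[rest[0]] = dict(w.split("=", 1) for w in rest[1:] if "=" in w)
    return patterns, arrays


def uncovered(members, patterns):
    """Members that no DEVICE pattern matches (fnmatch approximates glob)."""
    return [m for m in members if not any(fnmatch.fnmatchcase(m, p) for p in patterns)]


if __name__ == "__main__":
    patterns, arrays = parse_mdadm_conf(CONF)
    # Member partitions as listed in the mdadm -D output in this post.
    members = ["/dev/sdb1", "/dev/sdh1", "/dev/sdi1", "/dev/sdj1", "/dev/sdk1",
               "/dev/sdf1", "/dev/sdg1", "/dev/sde1", "/dev/sdd1", "/dev/sdc1"]
    print(sorted(arrays), uncovered(members, patterns))
```

With the members from the `mdadm -D` listings, `uncovered()` returns an empty list, while a partition on the system disk such as `/dev/sda1` would be reported as not covered.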
Code:
# mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Wed Dec 2 15:29:38 2009
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon May 17 23:07:31 2010
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 123c1cf4:280460ae:d359bbcb:ebdf7b4e
Events : 0.897
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 113 1 active sync /dev/sdh1
2 8 129 2 active sync /dev/sdi1
3 8 145 3 active sync /dev/sdj1
4 8 161 4 active sync /dev/sdk1
Code:
# mdadm -D /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Tue Nov 24 13:35:36 2009
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon May 17 23:03:56 2010
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 2256c45d:94372543:9a2a7338:bdd17f12
Events : 0.1948970
Number Major Minor RaidDevice State
0 8 81 0 active sync /dev/sdf1
1 8 97 1 active sync /dev/sdg1
2 8 65 2 active sync /dev/sde1
3 8 49 3 active sync /dev/sdd1
4 8 33 4 active sync /dev/sdc
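To compare snapshots like these over time (for example, from a cron job that saves `mdadm -D` output), the text can be split into fields and member rows. This is a rough sketch that assumes the `Key : Value` layout and the `Number Major Minor RaidDevice State` table shown above; rows that do not start with four integers are simply skipped. `parse_mdadm_detail` and `state_flags` are hypothetical names.

```python
# Trimmed copy of the md0 output above (indentation lost, as posted).
DETAIL = """\
/dev/md0:
Version : 0.90
Creation Time : Wed Dec 2 15:29:38 2009
Raid Level : raid5
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 113 1 active sync /dev/sdh1
2 8 129 2 active sync /dev/sdi1
3 8 145 3 active sync /dev/sdj1
4 8 161 4 active sync /dev/sdk1
"""


def parse_mdadm_detail(text):
    """Split `mdadm -D` text into a {field: value} dict and a list of member rows."""
    fields, members = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if " : " in line:
            key, value = line.split(" : ", 1)
            fields[key.strip()] = value.strip()
            continue
        parts = line.split()
        # Member rows start with four integers: Number Major Minor RaidDevice.
        if len(parts) >= 6 and all(p.isdigit() for p in parts[:4]):
            members.append({
                "slot": int(parts[3]),
                "major_minor": (int(parts[1]), int(parts[2])),
                "state": " ".join(parts[4:-1]),
                "device": parts[-1],
            })
    return fields, members


def state_flags(fields):
    """Split a State value such as "active, recovering" into a set of words."""
    return {w.strip() for w in fields.get("State", "").split(",") if w.strip()}


if __name__ == "__main__":
    fields, members = parse_mdadm_detail(DETAIL)
    print(sorted(state_flags(fields)), "failed:", fields["Failed Devices"])
    for m in members:
        print(m["slot"], m["device"], m["major_minor"], m["state"])
```

Saving the parsed result from each run makes it easy to spot when the State field gains a flag such as "recovering" or when a slot's device name changes.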
This is where things stand today:
Code:
# mdadm --monitor /dev/md0 /dev/md1
May 19 03:41:01: RebuildStarted on /dev/md1 unknown device
May 19 03:41:01: RebuildStarted on /dev/md0 unknown device
May 19 07:34:23: Rebuild20 on /dev/md0 unknown device
May 19 07:51:03: Rebuild20 on /dev/md1 unknown device
May 19 12:17:44: Rebuild40 on /dev/md0 unknown device
May 19 12:34:25: Rebuild41 on /dev/md1 unknown device
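Because the monitor reports progress in roughly 20% steps, a ballpark finish time can be extrapolated from the start event and the latest RebuildNN event. The sketch below assumes a constant rebuild rate (actual speed varies with I/O load, so treat the result as an estimate only) and the year 2010; `estimate_finish` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# Today's events from the monitor output above; the year is assumed to be 2010.
TODAY = """\
May 19 03:41:01: RebuildStarted on /dev/md1 unknown device
May 19 03:41:01: RebuildStarted on /dev/md0 unknown device
May 19 07:34:23: Rebuild20 on /dev/md0 unknown device
May 19 07:51:03: Rebuild20 on /dev/md1 unknown device
May 19 12:17:44: Rebuild40 on /dev/md0 unknown device
May 19 12:34:25: Rebuild41 on /dev/md1 unknown device
"""

# Captures the timestamp, "Started" or a percentage, and the md device.
EVENT_RE = re.compile(r"^(\w{3} +\d{1,2} [\d:]{8}): Rebuild(Started|\d+) on (\S+)")


def estimate_finish(log_text, year=2010):
    """Linear ETA per array: start + elapsed * 100 / percent_done."""
    start, latest = {}, {}
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # e.g. RebuildFinished lines
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        kind, dev = m.group(2), m.group(3)
        if kind == "Started":
            start[dev] = when
        else:
            latest[dev] = (when, int(kind))
    etas = {}
    for dev, (when, pct) in latest.items():
        if dev in start and pct > 0:
            elapsed = (when - start[dev]).total_seconds()
            etas[dev] = start[dev] + timedelta(seconds=elapsed * 100 / pct)
    return etas


if __name__ == "__main__":
    for dev, eta in sorted(estimate_finish(TODAY).items()):
        print(dev, eta.strftime("%b %d %H:%M"))
```

For the events above it projects both arrays finishing shortly after 01:00 on May 20 (about 01:13 for md0 and 01:22 for md1).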
Code:
# mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Wed Dec 2 15:29:38 2009
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed May 19 13:57:03 2010
State : active, recovering
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 48% complete
UUID : 123c1cf4:280460ae:d359bbcb:ebdf7b4e
Events : 0.996
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 113 1 active sync /dev/sdh1
2 8 129 2 active sync /dev/sdi1
3 8 145 3 active sync /dev/sdj1
4 8 161 4 active sync /dev/sdk1
Code:
# mdadm -D /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Tue Nov 24 13:35:36 2009
Raid Level : raid5
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed May 19 14:22:16 2010
State : active, recovering
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 48% complete
UUID : 2256c45d:94372543:9a2a7338:bdd17f12
Events : 0.1949000
Number Major Minor RaidDevice State
0 8 65 0 active sync /dev/sde1
1 8 81 1 active sync /dev/sdf1
2 8 49 2 active sync /dev/sdd1
3 8 33 3 active sync /dev/sdc1
4 8 97 4 active sync /dev/sdg1
My question is: how can I find out what is causing these "unknown device" messages and rebuilds, and how can I stop them? Or is this normal?
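One place to look while a pass is running is the md driver's sysfs interface: each array exposes attributes under `/sys/block/mdX/md/`, and `sync_action` reports which kind of pass is active (for example `idle`, `check`, `resync`, `recover` or `repair`). Below is a minimal Python sketch that reads `sync_action`, `mismatch_cnt` and `degraded` for each array; the `sysfs_root` parameter exists only so the function can be exercised without real arrays, and `md_sync_state` is a hypothetical name. Attributes that are missing on a given kernel are simply skipped.

```python
from pathlib import Path


def md_sync_state(md, sysfs_root="/sys/block"):
    """Read a few md sysfs attributes for one array, e.g. md_sync_state("md0").

    sync_action distinguishes a consistency check ("check") from a
    recovery onto a member ("recover"); missing attributes are skipped.
    """
    base = Path(sysfs_root) / md / "md"
    state = {}
    for attr in ("sync_action", "mismatch_cnt", "degraded"):
        path = base / attr
        if path.exists():
            state[attr] = path.read_text().strip()
    return state


if __name__ == "__main__":
    for md in ("md0", "md1"):
        print(md, md_sync_state(md))
```

Running this during one of the weekly passes, and again when the arrays are idle, would show which kind of operation mdadm is reporting as a rebuild.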