Fedora Linux Support Community & Resources Center
  #1  
19th May 2010, 07:21 PM
ScaryBob
Registered User
Join Date: May 2010
Posts: 2
Rebuild on /dev/md0 unknown device - Errors?

I'm running FC12 on a home server with 2 MD arrays. The hardware and arrays were solid with CentOS/RHEL5, but I decided to upgrade to Fedora due to some software incompatibility problems. The setup went fine, but there are intermittent problems with the MD RAID arrays.

Code:
# uname -a
Linux sudio 2.6.32.10-90.fc12.x86_64 #1 SMP Tue Mar 23 09:47:08 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
The box consists of an ASUS M2N-E motherboard with one RocketRaid SATA2 controller (in plain SATA mode) and a Silicon Image SATA II card. The 5 MD0 disks are on the motherboard SATA controller. MD1 is on 4 ports of the RocketRaid card and 1 port of the SIL card. MD0 consists of 5 disks: 1xHitachi HDT7210105, 1xSeagate ST3100340AS and 3xWDC WD10EADS. MD1 consists of 5xWDC WD10EADS disks. There is also a 160GB system drive (sda).

The problem is that one of the disks on each array is reported as unknown every 6 days or so and the entire array rebuilds. This is usually followed by another disk being reported as unknown and another rebuild. (This is undesirable for obvious reasons.) The server is usually idle when the error occurs, with the disks on standby (spun down).

Here are some diagnostics:
Code:
# mdadm --monitor /dev/md0 /dev/md1
Apr 15 02:09:59: Rebuild80 on /dev/md0 unknown device
Apr 15 03:00:00: Rebuild80 on /dev/md1 unknown device
Apr 15 07:54:09: RebuildFinished on /dev/md0 unknown device
Apr 15 08:45:20: RebuildFinished on /dev/md1 unknown device
Apr 21 03:36:02: RebuildStarted on /dev/md1 unknown device
Apr 21 03:36:02: RebuildStarted on /dev/md0 unknown device
Apr 21 09:26:04: Rebuild20 on /dev/md1 unknown device
Apr 21 09:26:04: Rebuild20 on /dev/md0 unknown device
Apr 21 14:59:26: Rebuild40 on /dev/md0 unknown device
Apr 21 15:16:06: Rebuild40 on /dev/md1 unknown device
Apr 21 20:32:48: Rebuild60 on /dev/md0 unknown device
Apr 21 21:06:08: Rebuild60 on /dev/md1 unknown device
Apr 22 02:06:10: Rebuild80 on /dev/md0 unknown device
Apr 22 02:56:10: Rebuild80 on /dev/md1 unknown device
Apr 22 07:41:19: RebuildFinished on /dev/md0 unknown device
Apr 22 08:37:31: RebuildFinished on /dev/md1 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md1 unknown device
Apr 28 03:35:01: RebuildStarted on /dev/md0 unknown device
Apr 28 06:21:42: Rebuild20 on /dev/md0 unknown device
Apr 28 06:38:22: Rebuild21 on /dev/md1 unknown device
Apr 28 10:31:44: Rebuild40 on /dev/md0 unknown device
Apr 28 11:05:04: Rebuild40 on /dev/md1 unknown device
Apr 28 16:05:06: Rebuild60 on /dev/md0 unknown device
Apr 28 16:38:26: Rebuild60 on /dev/md1 unknown device
Apr 28 20:31:47: Rebuild81 on /dev/md0 unknown device
Apr 28 21:05:07: Rebuild81 on /dev/md1 unknown device
Apr 28 23:53:00: RebuildFinished on /dev/md0 unknown device
Apr 29 00:39:55: RebuildFinished on /dev/md1 unknown device
May  5 03:58:01: RebuildStarted on /dev/md0 unknown device
May  5 03:58:01: RebuildStarted on /dev/md1 unknown device
May  5 13:24:45: Rebuild20 on /dev/md1 unknown device
May  5 13:24:45: Rebuild20 on /dev/md0 unknown device
May  5 22:34:48: Rebuild40 on /dev/md0 unknown device
May  5 22:51:28: Rebuild40 on /dev/md1 unknown device
May  6 08:01:32: Rebuild60 on /dev/md1 unknown device
May  6 08:01:32: Rebuild60 on /dev/md0 unknown device
May  6 17:11:36: Rebuild80 on /dev/md0 unknown device
May  6 17:28:16: Rebuild80 on /dev/md1 unknown device
May  7 01:25:14: RebuildFinished on /dev/md0 unknown device
May  7 01:32:07: RebuildFinished on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md1 unknown device
May 12 04:06:01: RebuildStarted on /dev/md0 unknown device
May 12 08:49:23: Rebuild20 on /dev/md1 unknown device
May 12 08:49:23: Rebuild20 on /dev/md0 unknown device
May 12 13:32:44: Rebuild40 on /dev/md0 unknown device
May 12 13:49:25: Rebuild40 on /dev/md1 unknown device
May 12 18:32:46: Rebuild61 on /dev/md0 unknown device
May 12 18:49:27: Rebuild60 on /dev/md1 unknown device
May 12 23:16:08: Rebuild80 on /dev/md0 unknown device
May 12 23:49:28: Rebuild80 on /dev/md1 unknown device
May 13 05:54:10: RebuildFinished on /dev/md0 unknown device
May 13 06:19:18: RebuildFinished on /dev/md1 unknown device
Code:
# cat /etc/mdadm.conf 
DEVICE /dev/sd[b-k]1
ARRAY /dev/md0 metadata=0.90 UUID=123c1cf4:280460ae:d359bbcb:ebdf7b4e
ARRAY /dev/md1 metadata=0.90 UUID=2256c45d:94372543:9a2a7338:bdd17f12
Code:
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Dec  2 15:29:38 2009
     Raid Level : raid5
     Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon May 17 23:07:31 2010
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 123c1cf4:280460ae:d359bbcb:ebdf7b4e
         Events : 0.897

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8      113        1      active sync   /dev/sdh1
       2       8      129        2      active sync   /dev/sdi1
       3       8      145        3      active sync   /dev/sdj1
       4       8      161        4      active sync   /dev/sdk1
Code:
# mdadm -D /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Nov 24 13:35:36 2009
     Raid Level : raid5
     Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon May 17 23:03:56 2010
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2256c45d:94372543:9a2a7338:bdd17f12
         Events : 0.1948970

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8       65        2      active sync   /dev/sde1
       3       8       49        3      active sync   /dev/sdd1
       4       8       33        4      active sync   /dev/sdc

This is where I am at today:
Code:
# mdadm --monitor /dev/md0 /dev/md1
May 19 03:41:01: RebuildStarted on /dev/md1 unknown device
May 19 03:41:01: RebuildStarted on /dev/md0 unknown device
May 19 07:34:23: Rebuild20 on /dev/md0 unknown device
May 19 07:51:03: Rebuild20 on /dev/md1 unknown device
May 19 12:17:44: Rebuild40 on /dev/md0 unknown device
May 19 12:34:25: Rebuild41 on /dev/md1 unknown device
Code:
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Dec  2 15:29:38 2009
     Raid Level : raid5
     Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 19 13:57:03 2010
          State : active, recovering
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 48% complete

           UUID : 123c1cf4:280460ae:d359bbcb:ebdf7b4e
         Events : 0.996

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8      113        1      active sync   /dev/sdh1
       2       8      129        2      active sync   /dev/sdi1
       3       8      145        3      active sync   /dev/sdj1
       4       8      161        4      active sync   /dev/sdk1
Code:
# mdadm -D /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Nov 24 13:35:36 2009
     Raid Level : raid5
     Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 19 14:22:16 2010
          State : active, recovering
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 48% complete

           UUID : 2256c45d:94372543:9a2a7338:bdd17f12
         Events : 0.1949000

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       81        1      active sync   /dev/sdf1
       2       8       49        2      active sync   /dev/sdd1
       3       8       33        3      active sync   /dev/sdc1
       4       8       97        4      active sync   /dev/sdg1
The question is, how can I find what is causing these unknown device messages/rebuilds and how can I stop it? Or is this normal?

Last edited by ScaryBob; 19th May 2010 at 08:05 PM.
  #2  
22nd May 2010, 03:31 PM
F-GT
Registered User
Join Date: May 2004
Posts: 81
Re: Rebuild on /dev/md0 unknown device - Errors

A RAID check takes place every week.
Check /etc/cron.weekly.

Seems odd that it's restarting a rebuild after a few hours. You're not rebooting, are you?
Any errors in /var/log/messages?
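
One way to tell a scheduled check apart from a real recovery is to read the array's `sync_action` file in sysfs while the monitor is reporting a rebuild. The sketch below is only illustrative: the function name `md_sync_action` and its optional second argument (an alternate sysfs root, handy for trying it out without a real array) are assumptions, not anything from this thread. If the cron job is what starts the activity, the value should be `check` rather than `recover` or `resync`.

```shell
# Print the current sync action of an md array (idle, check, repair, resync, recover).
# $1 = array name, e.g. md0
# $2 = sysfs root, defaults to /sys (hypothetical parameter, only for dry runs)
md_sync_action() {
    array="$1"
    sysfs="${2:-/sys}"
    file="$sysfs/block/$array/md/sync_action"
    if [ -r "$file" ]; then
        cat "$file"
    else
        # No such array, or the kernel does not expose the file
        echo "unknown"
        return 1
    fi
}
```

Running `md_sync_action md0` and `md_sync_action md1` during one of the weekly runs would show which kind of pass is in progress.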
  #3  
22nd May 2010, 04:05 PM
jpollard
Registered User
Join Date: Aug 2009
Location: Waldorf, Maryland
Posts: 6,818
Re: Rebuild on /dev/md0 unknown device - Errors

Isn't there a timer on the disks when they spin up? Is it possible that the timer is
just a little bit too short?

As for it doing this every couple of hours, that may be a cron job starting and
that needs the disks to be active.
  #4  
22nd May 2010, 08:22 PM
ScaryBob
Registered User
Join Date: May 2010
Posts: 2
Re: Rebuild on /dev/md0 unknown device - Errors

Thanks for the replies. The messages are a result of cron.weekly running. The 'unknown device' message had me concerned that maybe something was wrong. I'm still not sure if these particular messages are a bug, a feature or something else. I haven't previously seen such lengthy rebuilds on these devices with other distros. But that is probably a feature.
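
The weekly cadence can be pulled straight out of the monitor output posted earlier. This is a rough sketch, assuming the line layout shown in this thread (`Mon DD HH:MM:SS: Event on /dev/mdN ...`); the function name `rebuild_starts` is made up for illustration.

```shell
# Read `mdadm --monitor` output on stdin and print one line per
# RebuildStarted event: month, day, time (trailing colon removed), array.
rebuild_starts() {
    awk '/RebuildStarted/ { sub(/:$/, "", $3); print $1, $2, $3, $6 }'
}
```

Fed the logs above, it should list starts on Apr 21, Apr 28, May 5, May 12 and May 19, all between roughly 03:35 and 04:06, which lines up with a weekly cron job rather than a disk dropping out at random.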

There are no md related errors in /var/log/messages. I am not rebooting during the rebuild either.

I've seen errors due to slow disk spin-up before. That resulted in the disk being marked failed and being taken offline. I don't think that is the issue here.

It looks like the RAID check is reading/checking each device on each array sequentially. That seems a little odd, since each array should only require reading once unless an error is found. The 'unknown device' at the end of the rebuild message still has me puzzled and may be why the check is repeated. 15 years ago I would have started checking the code, but my mind is a little slow for that these days.

Edit: The only thing that stands out in /var/log/messages is this:
Code:
May 17 23:24:38 localhost kernel: md0: unknown partition table
May 17 23:24:38 localhost kernel: md1: unknown partition table
Not sure if that is significant but I did not partition the MD devices when setting them up. The physical disks are partitioned.

Last edited by ScaryBob; 22nd May 2010 at 08:35 PM.
  #5  
23rd May 2010, 04:04 AM
F-GT
Registered User
Join Date: May 2004
Posts: 81
Re: Rebuild on /dev/md0 unknown device - Errors

Quote:
Originally Posted by ScaryBob
Edit: The only thing that stands out in /var/log/messages is this:
Code:
May 17 23:24:38 localhost kernel: md0: unknown partition table
May 17 23:24:38 localhost kernel: md1: unknown partition table
Not sure if that is significant but I did not partition the MD devices when setting them up. The physical disks are partitioned.
Edit: Hmmm, just noticed your 2nd md1 entry has sdc1, not sdc. Was that an initial copy/paste error (missed the 1 on sdc1)?
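
The Major/Minor columns in the `mdadm -D` table can help settle this. For SCSI/SATA disks on major 8, each disk gets a block of 16 minor numbers: the first is the whole disk and the following ones are its partitions. A minimal sketch of that arithmetic (the helper name `sd_name_from_minor` is invented here, and it only covers major 8, i.e. sda through sdp):

```shell
# Map a minor number on block major 8 to its sd device name.
# Valid for minors 0-255 only (sda..sdp).
sd_name_from_minor() {
    minor="$1"
    disk=$((minor / 16))   # 0 = sda, 1 = sdb, 2 = sdc, ...
    part=$((minor % 16))   # 0 = whole disk, 1..15 = partition number
    letter=$(printf 'abcdefghijklmnop' | cut -c $((disk + 1)))
    if [ "$part" -eq 0 ]; then
        echo "sd${letter}"
    else
        echo "sd${letter}${part}"
    fi
}
```

The row in question shows major 8, minor 33, and `sd_name_from_minor 33` gives sdc1 (the whole disk sdc would be minor 32), so a dropped "1" in the paste looks likely.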

Last edited by F-GT; 23rd May 2010 at 04:19 AM.