Dell OpenManage Server Administrator Storage Management User's Guide
Common Troubleshooting Procedures
Specific Problem Situations and Solutions
This section contains troubleshooting procedures for common situations as well as for specific problems.
This section describes commands and procedures that can be used in troubleshooting. Topics covered include:
Verify that the power-supply cord and adapter cables are attached correctly. If the system is having trouble with read and write operations to a particular virtual disk or non-RAID physical disk (if the system hangs, for example), then make sure that the cables attached to the corresponding enclosure or backplane are secure. If the connection is secure but the problem persists, you may need to replace a cable. See also "Isolate Hardware Problems".
On SAS controllers, you should verify that the cable configuration is valid. Refer to the SAS hardware documentation for valid cable configurations. If the cable configuration is invalid, you may receive alerts "2182" or "2356."
Make sure that the system meets all system requirements. In particular, verify that the correct levels of firmware and drivers are installed on the system. For more information on drivers and firmware, see "Drivers and Firmware".
Storage Management is tested with the supported controller firmware and drivers. To function properly, the controller must have at least the minimum required versions of the firmware and drivers installed. The most current versions are available from the Dell Support website at support.dell.com.
NOTE: You can verify which firmware and drivers are installed by selecting the Storage object in the tree view and clicking the Information/Configuration tab. You can also check the Alert Log for alerts relating to unsupported firmware and driver versions.
You should also periodically obtain and apply the latest Dell PowerEdge server system BIOS to benefit from the most recent improvements. See the Dell PowerEdge system documentation for more information.
If you receive a "timeout" alert related to a hardware device or if you otherwise suspect that a device attached to the system is experiencing a failure, then do the following to confirm the problem:
On SCSI controllers, use the Rescan controller task to update information for the controller and attached devices. This operation may take a few minutes if many devices are attached to the controller.
If the Rescan does not properly update the disk information, you may need to reboot your system.
You may need to replace a failed disk in the following situations:
If the failed disk is part of a redundant virtual disk, then the disk failure should not result in data loss. You should replace the failed disk immediately, however, as additional disk failures can cause data loss.
If the redundant virtual disk has a hot spare assigned to it, then the data from the failed disk is rebuilt onto the hot spare. After the rebuild, the former hot spare functions as a regular physical disk and the virtual disk is left without a hot spare. In this case, you should replace the failed disk and make the replacement disk a hot spare.
NOTE: If the redundant virtual disk does not have a hot spare assigned to it, then replace the failed disk using the procedure described in "Replacing a Physical Disk Receiving SMART Alerts."
A rebuild is automatically initiated because the virtual disk is redundant.
If a hot spare was already assigned to the virtual disk, then data from the failed disk may already be rebuilt onto the hot spare. In this case, you need to assign a new hot spare. See "Assign and Unassign Dedicated Hot Spare" and "Assign and Unassign Global Hot Spare" for more information.
If the failed physical disk is part of a nonredundant virtual disk (such as RAID 0), the failure of a single physical disk causes the entire virtual disk to fail. Before proceeding, determine when your last backup was made and whether any new data has been written to the virtual disk since then.
If you have backed up recently and there is no new data on the disks that would be missed, you can restore from backup.
NOTE: If the failed disk is attached to a PERC 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, or CERC ATA100/4ch controller, you can attempt to recover data from the disk by using the procedure described in "Using the Physical Disk Online Command on Select Controllers" before continuing with the following procedure.
Do the following:
If you do not have a suitable backup available, and if the failed disk is part of a virtual disk on a controller that supports the Online physical disk task, then you can attempt to retrieve data by selecting Online from the failed disk's drop-down task menu.
The Online command attempts to force the failed disk back into an Online state. If the disk can be forced into an Online state, you may be able to recover individual files. How much data you can recover depends on the extent of the disk damage; file recovery is possible only if a limited portion of the disk is damaged.
There is no guarantee you will be able to recover any data using this method. A forced Online does not fix a failed disk. You should not attempt to write new data to the virtual disk.
After retrieving any viable data from the disk, replace the failed disk as described previously in "Replacing a Failed Disk that is Part of a Redundant Virtual Disk" or "Replacing a Failed Physical Disk that is Part of a Nonredundant Virtual Disk".
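The replacement decisions described above for a failed physical disk can be summarized as a small decision function. This is a minimal Python sketch for illustration only: the function `failed_disk_action` and its parameters (`redundant`, `hot_spare_assigned`, `backup_is_current`, `supports_online_task`) are hypothetical and are not part of Storage Management. It encodes only the cases described in this section.

```python
def failed_disk_action(
    redundant: bool,
    hot_spare_assigned: bool,
    backup_is_current: bool,
    supports_online_task: bool,
) -> str:
    """Return the follow-up this guide describes for a failed physical disk.

    Hypothetical helper: Storage Management exposes no such function.
    """
    if redundant:
        if hot_spare_assigned:
            # Data was rebuilt onto the hot spare, so the virtual disk has no spare left.
            return "replace failed disk; make the replacement a hot spare"
        # A rebuild starts automatically because the virtual disk is redundant.
        return "replace failed disk; rebuild"
    # Nonredundant (for example, RAID 0): the entire virtual disk has failed.
    if backup_is_current:
        return "replace failed disk; restore from backup"
    if supports_online_task:
        # A forced Online may allow individual files to be copied off.
        # Do not write new data to the virtual disk.
        return "try Online task to copy off files; then replace failed disk"
    return "replace failed disk; data not recoverable without backup"
```

Only the controller models listed in the preceding NOTE support the Online task, so `supports_online_task` would be set from that list.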
On a CERC SATA1.5/2s controller, a rebuild may not start automatically when you replace a failed physical disk that is part of a RAID 1 virtual disk. In this circumstance, use the following procedure to replace the failed physical disk and rebuild the redundant data.
If the physical disk that you mistakenly removed is part of a redundant virtual disk that also has a hot spare, then the virtual disk rebuilds automatically either immediately or when a write request is made. After the rebuild has completed, the virtual disk will no longer have a hot spare since data has been rebuilt onto the disk previously assigned as a hot spare. In this case, you should assign a new hot spare.
If the physical disk that you removed is part of a redundant virtual disk that does not have a hot spare, then replace the physical disk and perform a rebuild.
See the following sections for information on rebuilding physical disks and assigning hot spares:
You can avoid removing the wrong physical disk by blinking the LED display on the physical disk that you intend to remove. See "Blink and Unblink" for information on blinking the LED display.
If you upgrade the Microsoft Windows operating system on a server, you may find that Storage Management no longer functions after the upgrade. The installation process installs files and makes registry entries on the server that are specific to the operating system. For this reason, changing the operating system can disable Storage Management.
To avoid this problem, you should uninstall Storage Management before upgrading. If you have already upgraded without uninstalling Storage Management, however, you should uninstall Storage Management after the upgrade.
After you have uninstalled Storage Management and completed the upgrade, reinstall Storage Management using the Storage Management install media. You can also download Storage Management from the Dell Support website at support.dell.com.
The following sections describe troubleshooting procedures for virtual disks.
A rebuild will not work in the following situations:
In some situations, a rebuild may complete successfully while also reporting errors. This may occur when a portion of the disk containing redundant (parity) information is damaged. The rebuild process can restore data from the healthy portions of the disk but not from the damaged portion.
When a rebuild is able to restore all data except data from damaged portions of the disk, it will indicate successful completion while also generating alert "2163". The rebuild may also report sense key errors. In this situation, take the following actions to restore the maximum data possible:
You might be attempting a RAID configuration that is not supported by the controller. Check the following:
If you create a virtual disk using the minimum allowable size in Storage Management, the virtual disk may not be visible to Windows Disk Management even after initialization. This occurs because Windows Disk Management recognizes extremely small virtual disks only if they are dynamic. To avoid this problem, create virtual disks larger than the minimum allowable size when using Storage Management.
On some versions of the Linux operating system, the virtual disk size is limited to 1TB. If you create a virtual disk that exceeds the 1TB limitation, your system may experience the following behavior:
If you have created a virtual disk that exceeds the 1TB limitation, you should do the following:
Whether or not your Linux operating system limits virtual disk size to 1TB depends on the version of the operating system and any updates or modifications that you have implemented. See your operating system documentation for more information.
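Before creating or using a large virtual disk on an affected Linux system, you can compare its size against the limit. The following minimal Python sketch assumes that "1TB" means 2**40 bytes; pass `limit_bytes=10**12` if your operating system documentation defines the limit in decimal units. The helper names are hypothetical. The sysfs file `/sys/block/<dev>/size` reports a block device's size in 512-byte units on Linux; the device name you pass (for example `sdb`) is a placeholder for your own virtual disk.

```python
from pathlib import Path

# Assumption: "1TB" is interpreted as 2**40 bytes.
ONE_TB_BYTES = 2 ** 40
# /sys/block/<dev>/size is reported in 512-byte units.
SECTOR_BYTES = 512


def exceeds_1tb_limit(size_bytes: int, limit_bytes: int = ONE_TB_BYTES) -> bool:
    """Return True if a virtual disk of size_bytes is larger than the limit."""
    return size_bytes > limit_bytes


def block_device_bytes(dev_name: str) -> int:
    """Read a block device's size from sysfs on Linux (dev_name such as "sdb")."""
    sectors = int(Path("/sys/block", dev_name, "size").read_text().strip())
    return sectors * SECTOR_BYTES


if __name__ == "__main__":
    # A hypothetical 1.5 TB (decimal) virtual disk exceeds the limit
    # under either interpretation of "1TB".
    print(exceeds_1tb_limit(1_500 * 10**9))  # prints True
```

On a Linux host, `exceeds_1tb_limit(block_device_bytes("sdb"))` would check an existing device, with `sdb` replaced by the actual device name.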
When creating virtual disks, you should avoid using the same physical disks for both redundant and nonredundant virtual disks. This recommendation applies to all controllers. Using the same physical disks for both redundant and nonredundant virtual disks can result in unexpected behavior including data loss.
NOTE: SAS controllers do not allow you to create redundant and nonredundant virtual disks on the same set of physical disks.
This section describes behavior that may occur on the PERC 3/Si, 3/Di, CERC SATA1.5/6ch, and CERC SATA1.5/2s controllers if you use the same physical disks for both redundant and nonredundant virtual disks. In this type of configuration, the failure or removal of a physical disk can cause the following behavior:
Resolution: This behavior is expected because the virtual disk is nonredundant. In this case, the failure or removal of a single physical disk causes the entire virtual disk to fail with no possibility of recovering the data unless a backup is available.
Resolution: This behavior is also expected. Data can be recovered if a hot spare is available to rebuild the failed or removed disk.
Resolution: Perform a "Rescan Controller". When the rescan is complete, select each physical disk that is Offline and perform a "Remove Dead Segments" task. You must remove the dead segments before the physical disk can be brought back online. The dead segments are caused by the failure or removal of the shared physical disk.
NOTE: It is recommended that you avoid using the same physical disks for both redundant and nonredundant virtual disks.
This section contains additional troubleshooting problem areas. Topics include:
A physical disk may display an error status if it has been damaged, taken offline, or was a member of a virtual disk that has been deleted or initialized. The following actions may resolve the error condition:
When a system in a cluster attempts to rebuild a failed disk and the rebuild fails, another system takes over the rebuild. In this situation, the rebuilt disk may continue to be marked as failed on both systems even after the second system has completed the rebuild successfully. To resolve this problem, perform a rescan on both systems after the rebuild completes.
When you run the Prepare to Remove command on a physical disk attached to a PERC 4/Di controller, the disk may not display in the Storage Management tree view even after a rescan or a reboot.
In this case, do the following to redisplay the disk in the Storage Management tree view:
The following alerts or events are generated when a portion of a physical disk is damaged:
This damage is discovered when the controller performs an operation that requires scanning the disk. Examples of operations that may result in these alerts are as follows:
If you receive alerts 2146 through 2150 as the result of a rebuild or while the virtual disk is in a degraded state, data cannot be recovered from the damaged disk without restoring from backup. If you receive alerts 2146 through 2150 under other circumstances, data recovery may be possible. The following sections describe each of these situations.
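The distinction between these two situations can be expressed as a short triage function, for example in a script that post-processes exported alerts. This minimal Python sketch is hypothetical: `damage_alert_response` and its parameters are illustrative names, not part of Storage Management, and the returned strings simply restate the guidance in this section.

```python
# Alerts 2146 through 2150 report a damaged portion of a physical disk.
DAMAGE_ALERTS = range(2146, 2151)


def damage_alert_response(alert_id: int, during_rebuild: bool, degraded: bool) -> str:
    """Map a damage alert and its context to the action this guide recommends."""
    if alert_id not in DAMAGE_ALERTS:
        raise ValueError(f"alert {alert_id} is not in the 2146-2150 range")
    if during_rebuild or degraded:
        # Data on the damaged portion cannot be recovered without a backup.
        return "restore from backup"
    # Data recovery may still be possible; replace the disk now to avoid data loss.
    return "replace damaged disk immediately"
```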
Do the following if you receive alerts 2146 through 2150 during a rebuild or while the virtual disk is in a degraded state:
If you receive alerts 2146 through 2150 while performing an operation other than a rebuild, you should replace the damaged disk immediately to avoid data loss.
Do the following:
If the system is hanging, timing out, or experiencing other problems with read and write operations, then there may be a problem with the controller cables or a device. For more information, see "Cables Attached Correctly" and "Isolate Hardware Problems".
If you have implemented channel redundancy on a PERC 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, or 4/Di controller, a failure of one channel causes I/O to stop on the other channels included in the channel-redundant configuration. For the resolution to this problem, see "Channel Redundancy on PERC 3/DCL, 3/DC, 3/QC, 4/DC, 4e/DC, 4/Di, and 4e/Di Controllers".
You may notice that the task menus do not always display the same task options. This is because Storage Management only displays those tasks that are valid at the time the menu is displayed. Some tasks are only valid for certain types of objects or at certain times. For example, a Check Consistency task can only be performed on a redundant virtual disk. Similarly, if a disk is already offline, the Offline task option is not displayed.
There may be other reasons why a task cannot be run at a certain time. For example, there may already be a task running on the object that must complete before additional tasks can be run.
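The example rules above can be modeled in a few lines to show why the same menu can list different tasks at different times. This is a minimal Python sketch, not Storage Management's actual logic: the `DiskObject` class and `task_is_offered` function are hypothetical, and only the three examples from this section are encoded (Check Consistency requires a redundant virtual disk, Offline is hidden for a disk that is already offline, and a running task blocks further tasks).

```python
from dataclasses import dataclass


@dataclass
class DiskObject:
    """Hypothetical stand-in for an object in the Storage Management tree view."""
    kind: str                # "virtual disk" or "physical disk"
    redundant: bool = False  # meaningful for virtual disks only
    state: str = "Online"
    task_in_progress: bool = False


def task_is_offered(task: str, obj: DiskObject) -> bool:
    """Apply only the example rules from this section; the real rule set is larger."""
    if obj.task_in_progress:
        # A task already running on the object must complete first.
        return False
    if task == "Check Consistency":
        return obj.kind == "virtual disk" and obj.redundant
    if task == "Offline":
        return obj.state != "Offline"
    # Tasks not covered by these examples are treated as valid in this sketch.
    return True
```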
Allow autocheck to run and disregard the message. The reboot completes after autocheck finishes, which may take about ten minutes depending on the size of your system.
Activating the Windows hibernation feature may cause Storage Management to display erroneous status information and error messages. This problem resolves itself when the Windows operating system recovers from hibernation.
Access can be denied if you do not enter a user name and password that match an administrator account on the remote computer, or if you mistype the login information. Access can also fail if the remote system is not powered on or if there are network problems.
When connecting to a remote Windows Server 2003 system, you must log into the remote system using an account that has administrator privileges. By default, Windows Server 2003 does not allow anonymous (null) connections to access the SAM user accounts. Therefore, if you are attempting to connect using an account that has a blank or null password, the connection may fail.
When reconfiguring a virtual disk using the Mozilla browser, the following error message may display:
Although this page is encrypted, the information you have entered is to be sent over an unencrypted connection and could easily be read by a third party.
You can disable this error message by changing a Mozilla browser setting. To disable this error message:
Storage Management polls the status of physical disks at frequent intervals. When the physical disk is located in an enclosure, Storage Management uses the data reported by the SCSI Enclosure Processor (SEP) to ascertain the status of the physical disk. If the SEP is not functioning, Storage Management can still poll the status of the physical disk but cannot identify the physical disk as being located in the enclosure. In this case, Storage Management displays the physical disk directly below the Connector object in the tree view and not under the enclosure object.
You can resolve this problem by restarting the Server Administrator service or by rebooting the system. For more information on restarting the Server Administrator service, see the Dell OpenManage Server Administrator User's Guide.