Status Code 84: Media write error
Status 84 will occur when the system’s device driver returns an I/O error while
NetBackup is writing to either removable media or a disk file.
Table of Contents
1 Tape errors
1.1 I/O errors
1.1.1 OS configuration
1.1.2 SCSI Path
1.1.3 Device
1.2 Position errors
1.2.1 Reserve/Release
1.2.2 3rd-Party applications
1.2.3 SAN firmware or configuration
2 Disk errors
3 Links
1 Tape errors
Tape errors fall into two categories. Either
there was an I/O error while writing, or the tape was not at the correct
position after the write.
1.1 I/O errors
As an application, NetBackup has no direct
access to a device, instead relying on the operating system (OS) to handle any
communication with the device. This means that during a write operation
NetBackup asks the OS to write to the device and report back the success or
failure of that operation. If there is a failure, NetBackup will merely report
that a failure occurred, and any troubleshooting should start at the OS level.
If the OS is unable to perform the write, there are three likely causes: OS
configuration, a problem on the SCSI path, or a problem with the device.
It is also possible that NetBackup has been
configured to attempt a write operation that would exceed the capabilities of
the OS or device. Remove the following file and retry the operation:
UNIX: /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
Windows: <install_path>\NetBackup\db\config\SIZE_DATA_BUFFERS
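As a rough sketch of that step, the following Python helper moves the file aside instead of deleting it, so the previous setting can be restored later. The function name `disable_size_data_buffers`, the `.disabled` suffix, and the sample value are hypothetical; only the file name comes from the paths above. The demonstration runs against a temporary directory, not a real NetBackup installation.

```python
import tempfile
from pathlib import Path


def disable_size_data_buffers(config_dir):
    """Rename SIZE_DATA_BUFFERS out of the way and return its old contents.

    Returns None when the file is absent. Renaming instead of deleting is an
    assumption of this sketch, made so the old value can be restored.
    """
    target = Path(config_dir) / "SIZE_DATA_BUFFERS"
    if not target.is_file():
        return None
    old_value = target.read_text().strip()
    target.rename(target.with_name("SIZE_DATA_BUFFERS.disabled"))
    return old_value


# Demonstration in a throwaway directory; 262144 is an illustrative value.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "SIZE_DATA_BUFFERS").write_text("262144\n")
    first = disable_size_data_buffers(demo_dir)   # file present
    second = disable_size_data_buffers(demo_dir)  # file already moved
    still_present = (Path(demo_dir) / "SIZE_DATA_BUFFERS").exists()

print(first, second, still_present)
```

On a real server, `config_dir` would be the `db/config` directory shown above, and the operation should be retried after the file has been moved.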
1.1.1 OS configuration
There are two likely candidates for problems
at the OS level: tape drivers and HBA drivers.
· Tape drivers and configuration
Contact the OS or hardware vendor to ensure
that an up-to-date driver is installed for the tape device. Contact the
hardware vendor to ensure that the driver is configured correctly for the
device.
· Host Bus Adapter (HBA) driver/configuration/firmware
Make sure the HBA has an updated driver and
firmware and that the configuration is correct.
1.1.2 SCSI Path
Make sure that every device in the SCSI path
to the target device, typically SAN hardware (switches, bridges, etc.), is
operating correctly. Check for faulty cables or improper SCSI termination.
· SAN firmware/configuration
Make sure that any switches or bridges are
operating correctly, and that they have the latest software updates installed.
Example of SAN/SCSI communication errors from a UNIX system (syslog, messages):
May 14 16:25:51 server unix: WARNING: /pci@6,4000/scsi@2/st@1,0 (st85):
May 14 16:25:51 server unix: Error for Command: write Error Level: Fatal
May 14 16:25:51 server unix: Requested Block: 1531 Error Block: 1531
May 14 16:25:51 server unix: Vendor: QUANTUM Serial Number: XXXXX
May 14 16:25:51 server unix: Sense Key: Aborted Command
May 14 16:25:51 server unix: ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x0
Jun 20 02:20:09 server scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/JNI,FCR@2/st@14,0 (st15):
Jun 20 02:20:09 server SCSI transport failed: reason 'tran_err': giving up
Jun 20 02:26:09 server bptm[29663]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jun 20 02:26:42 server jnic146x: [ID 362195 kern.notice] jnic146x1: Link not operational. Performing reset.
Jun 20 02:26:42 server jnic146x: [ID 133166 kern.notice] jnic146x1: Link Down
Dec 18 21:11:59 server vmunix: 0/1/0/0: Unable to access previously accessed device at nport ID 0x11900.
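When a syslog file is large, it can help to pull out just the sense data. The following is a minimal sketch that pairs each "Sense Key" line with the "ASC" line that follows it, assuming the Solaris-style layout shown in the first excerpt above. The function name `extract_sense_events` is hypothetical, and other operating systems format these messages differently.

```python
import re

SENSE_RE = re.compile(r"Sense Key:\s*(?P<key>.+?)\s*$")
# "ASC:" does not match "ASCQ:", so only the additional sense code is captured.
ASC_RE = re.compile(r"ASC:\s*(?P<asc>0x[0-9a-fA-F]+)\s*\((?P<text>[^)]*)\)")


def extract_sense_events(lines):
    """Return (sense key, ASC, ASC description) for each key/ASC pair found."""
    events = []
    pending_key = None
    for line in lines:
        key_match = SENSE_RE.search(line)
        if key_match:
            pending_key = key_match.group("key")
            continue
        asc_match = ASC_RE.search(line)
        if asc_match and pending_key is not None:
            events.append(
                (pending_key, asc_match.group("asc"), asc_match.group("text"))
            )
            pending_key = None
    return events


# Lines copied from the first excerpt above.
sample = [
    "May 14 16:25:51 server unix: Error for Command: write Error Level: Fatal",
    "May 14 16:25:51 server unix: Sense Key: Aborted Command",
    "May 14 16:25:51 server unix: ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x0",
]
events = extract_sense_events(sample)
print(events)
```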
Example of SCSI communication errors from a Windows Event Viewer System log:
20040830 10:36:30 aic78xx E9 NA The device, \Device\Scsi\aic78xx1, did not respond within the timeout period.
20040830 10:46:33 aic78xx E9 NA The device, \Device\Scsi\aic78xx1, did not respond within the timeout period.
1.1.3 Device
Here are some things to check for on the
target device.
· Firmware
Check the Device Compatibility list to see
what firmware version VERITAS engineering has tested with the device. Contact
the hardware vendor to obtain any updates.
· Bad device
Have the hardware vendor check the device to
verify that it is operating correctly.
· Bad media
Check to see if the same piece of media has
been causing problems.
· Dirty Drive
Check to verify that the drives are being
cleaned.
· Environmental
Check power to the device, proper cooling,
dust, etc.
Example from the UNIX /usr/openv/netbackup/logs/bptm/log.<date> file:
<16> write_data: cannot write image to media id XXXXXX, drive index 2, I/O error
<16> io_ioctl: ioctl (MTWEOF) failed on media id XXXXXX, drive index 0, The request could not be performed because of an I/O device error. (../bptm.c.12720)
<2> io_ioctl: MTWEOF failed during error recovery, I/O error
<16> write_tar_image: cannot write image to /tmp/sync_XXXXXX, Error 0
<16> bpbackupdb: NB database backup to media id XXXXXX FAILED
bpbackupdb: EXIT status = 84
<16> write_tar_image: ndmp write of bpbackupdb image to F:\VERITAS\NetBackup\temp\sync_XXXXXX failed, error code 13 (NDMP_EOM_ERR)
bpbackupdb: EXIT status = 84
<16> io_ioctl: ioctl (MTWEOF) failed on media id XXXXXX, drive index 0, Data error (cyclic redundancy check). (..\bptm.c.16728)
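To check whether the same piece of media keeps failing, the error lines in the bptm log can be tallied per media ID. The sketch below assumes that `<16>` marks error-severity lines, as in the excerpt above, and that the ID follows the words "media id". The function name `count_media_errors` and the media IDs A00001 and A00002 are hypothetical stand-ins for the masked XXXXXX values.

```python
import re
from collections import Counter

# Matches error-severity lines that name a media ID.
ERROR_LINE_RE = re.compile(r"<16>.*\bmedia id (?P<media>[^\s,]+)")


def count_media_errors(lines):
    """Count <16> log lines per media ID."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE_RE.search(line)
        if match:
            counts[match.group("media")] += 1
    return counts


# Sample lines modeled on the excerpt above, with made-up media IDs.
sample = [
    "<16> write_data: cannot write image to media id A00001, drive index 2, I/O error",
    "<16> io_ioctl: ioctl (MTWEOF) failed on media id A00001, drive index 0, Data error (cyclic redundancy check).",
    "<2> io_ioctl: MTWEOF failed during error recovery, I/O error",
    "<16> write_data: cannot write image to media id A00002, drive index 1, I/O error",
]
counts = count_media_errors(sample)
print(counts.most_common(1))
```

A media ID that leads the count across several drives points at the tape; errors spread over many tapes on one drive index point at the drive.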
Example of tape drive and media errors from a UNIX system (syslog, messages):
May 15 16:41:40 server unix: Error for Command: write Error Level: Fatal
May 15 16:41:40 server unix: Requested Block: 2181 Error Block: 2181
May 15 16:41:40 server unix: Vendor: QUANTUM Serial Number: XXXXX
May 15 16:41:40 server unix: Sense Key: Media Error
May 15 16:41:40 server unix: ASC: 0xc (write error), ASCQ: 0x0, FRU: 0x0
Nov 5 21:08:47 server avrd[21163]: Tape drive QUANTUMDLT70001 (device 1, /dev/rmt/c9t0d0BESTnb) needs cleaning. Attempting to auto-clean...
Example of tape drive and media errors from a Windows Event Viewer System log:
20040804 22:47:53 dlttape-VRTS E7 NA The device, \Device\Tape2, has a bad block.
20040806 17:52:32 dlttape-VRTS E11 NA The driver detected a controller error on \Device\Tape0.
20040806 17:52:47 dlttape-VRTS E11 NA The driver detected a controller error on \Device\Tape0.
1.2 Position errors
NetBackup keeps track of how much data it is
sending to the operating system to write to the device. As an integrity check
after the end of each write, NetBackup will ask the tape device for its
position. If this position does not match what NetBackup has calculated the
position should be, then the job will fail with a media write error.
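The idea behind this check can be illustrated with a small sketch. This is not NetBackup's implementation: the class name `PositionTracker` and the choice to count whole blocks are assumptions made only for illustration.

```python
class PositionTracker:
    """Tracks the block position a tape should have after each write."""

    def __init__(self, start_block=0):
        self.expected = start_block

    def record_write(self, blocks_written):
        """Add the blocks handed to the OS to the running expected position."""
        self.expected += blocks_written

    def verify(self, actual_position):
        """Return None when positions agree, else a description of the mismatch."""
        if actual_position == self.expected:
            return None
        return (
            f"block position check: actual {actual_position}, "
            f"expected {self.expected}"
        )


tracker = PositionTracker()
tracker.record_write(5)          # five blocks handed to the OS
ok_result = tracker.verify(5)    # device reports block 5: positions agree
bad_result = tracker.verify(4)   # device reports block 4: mismatch
print(ok_result)
print(bad_result)
```

Any mismatch, in either direction, is treated as a failure, because the application can no longer be sure where its data landed on the tape.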
There are three things to check to ensure
that this position check works correctly.
1.2.1 Reserve/Release
In an environment where multiple hosts have
access to the same devices it is necessary to ensure that only one host has
exclusive access to the device at any one time. NetBackup uses SCSI
reserve/release for this. Refer to the hardware vendor’s documentation to
verify that the device supports SCSI reserve/release.
NetBackup sends the reserve command through
the SCSI pass-thru path for the device, so this needs to be configured
correctly. Refer to the Device Configuration Guide for information on setting
up the SCSI pass-thru path for your OS.
1.2.2 3rd-Party applications
SCSI reserve/release sets a reservation on a
device for a host, not for an application. Other applications running on the
same host can therefore still send commands to that device, with potentially
destructive consequences. For example, if someone uses the mt command on a UNIX
host to rewind a device that NetBackup is using for a backup, all data on that
tape would be lost.
Monitoring applications can have the
same effect and need to be disabled.
Example of errors from a UNIX system (syslog, messages):
Mar 1 09:27:43 server EMS [3275]: ------ EMS Event Notification ------
Value: "CRITICAL (5)" for Resource: "/storage/events/tapes/SCSI_tape/8_0_1_0.1.19.239.1.2.0" (Threshold: >= " 3")
Execute the following command to obtain event details:
/opt/resmon/bin/resdata -R 214630414 -r /storage/events/tapes/SCSI_tape/8_0_1_0.1.19.239.1.2.0 -n 214630401 -a
Note: Messages indicating similar issues may appear on Windows, in the Event Viewer System log.
1.2.3 SAN firmware or configuration
The command that NetBackup uses to read the
position of the tape travels through the SCSI pass-thru path and through every
device on the SCSI path to the drive. Every device along the path must preserve
the information being passed, or the position that NetBackup reads back could be
wrong. Ensure that the software that handles each device is up to date and
compatible with the other devices on the path.
Example from the UNIX /usr/openv/netbackup/logs/bptm/log.<date> file:
<2> write_data: block position check: actual 62504, expected 31254
<16> write_data: FREEZING media id XXXXXX, too many data blocks written, check tape/driver block size configuration
<2> io_terminate_tape: block position check: actual 4, expected 5
<16> write_data: FREEZING media id XXXXXX, External event caused rewind during write, all data on media is lost
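Position-check failures can be collected from a bptm log with a short script. The sketch below assumes the "block position check: actual N, expected M" wording shown in the excerpt above; the function name `find_position_mismatches` and the wording of the notes are hypothetical.

```python
import re

POSITION_RE = re.compile(r"block position check: actual (\d+), expected (\d+)")


def find_position_mismatches(lines):
    """Return (actual, expected, note) for every failed position check."""
    results = []
    for line in lines:
        match = POSITION_RE.search(line)
        if not match:
            continue
        actual, expected = int(match.group(1)), int(match.group(2))
        if actual > expected:
            note = "tape is further along than expected"
        elif actual < expected:
            note = "tape is behind the expected position"
        else:
            continue  # positions agree, nothing to report
        results.append((actual, expected, note))
    return results


# Lines copied from the excerpt above.
sample = [
    "<2> write_data: block position check: actual 62504, expected 31254",
    "<2> io_terminate_tape: block position check: actual 4, expected 5",
]
mismatches = find_position_mismatches(sample)
for item in mismatches:
    print(item)
```

In the excerpt, the first kind of mismatch is followed by a message about block size configuration and the second by a message about an external rewind, so the direction of the difference is a useful first hint about where to look.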
2 Disk errors
This error could occur if the disk that
NetBackup is writing to fills up. Ensure that the destination disk has enough
space for the backup. It could also occur if NetBackup is attempting to write a
file larger than two gigabytes, and the file system or OS does not support
files larger than two gigabytes.
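Both conditions can be checked before a backup runs. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function name `check_disk_target` is hypothetical, the caller must supply whether the file system supports large files (detecting that is platform-specific), and "two gigabytes" is taken here to mean 2 GiB, which is an assumption.

```python
import shutil
import tempfile

TWO_GIB = 2 * 1024**3  # assumed meaning of "two gigabytes"


def check_disk_target(path, image_bytes, large_files_supported=True):
    """Return a list of warnings for writing image_bytes to path."""
    warnings = []
    free = shutil.disk_usage(path).free
    if image_bytes > free:
        warnings.append("not enough free space")
    if image_bytes >= TWO_GIB and not large_files_supported:
        warnings.append("image exceeds two gigabytes")
    return warnings


# Demonstration against the system temporary directory.
target = tempfile.gettempdir()
small = check_disk_target(target, 0)
huge = check_disk_target(target, 10**30)
no_large = check_disk_target(target, 3 * 1024**3, large_files_supported=False)
print(small, huge, no_large)
```

The free-space figure is only a snapshot, so a backup can still fail if other writers fill the disk while it runs.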
Example from the UNIX /usr/openv/netbackup/logs/bpdm/log.<date> file:
<16> write_data: cannot write image to disk, attempted write of 65536 bytes, system wrote 40960
<16> write_backup: cannot write image to disk, A system call received a parameter that is not valid., file sync failed with status -1
Example of disk errors from a UNIX system (syslog, messages):
Feb 4 13:59:38 server unix: NOTICE: alloc: /usr/openv: file system full
Example of disk errors from a Windows Event Viewer System log:
20040826 19:19:27 Srv E2013 NA The C: disk is at or near capacity. You may need to delete some files.