Thursday, 13 August 2015

status 84 in netbackup

Status Code 84
Media write error

Status 84 occurs when the system’s device driver returns an I/O error while NetBackup is writing to removable media or to a disk file.


Table of Contents

1      Tape errors
1.1        I/O errors
1.1.1         OS configuration
1.1.2         SCSI Path
1.1.3         Device
1.2        Position errors
1.2.1         Reserve/Release
1.2.2         3rd-Party applications
1.2.3         SAN firmware or configuration
2      Disk errors
3      Links






1      Tape errors

Tape errors fall into two categories. Either there was an I/O error while writing, or the tape was not at the correct position after the write.

1.1    I/O errors

As an application, NetBackup has no direct access to a device; it relies on the operating system (OS) to handle all communication with the device. During a write operation, NetBackup asks the OS to write to the device and to report whether the operation succeeded. If it fails, NetBackup merely reports that a failure occurred, so troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes: OS configuration, a problem on the SCSI path, or a problem with the device.

It is also possible that NetBackup has been configured to attempt a write that exceeds the capabilities of the OS or device. To rule this out, remove the following file and retry the operation:

UNIX:  /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
Windows:  <install_path>\NetBackup\db\config\SIZE_DATA_BUFFERS
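
As a minimal sketch of this step, the following Python moves the file aside instead of deleting it, so the earlier setting can be restored by renaming it back. The helper name set_aside_size_data_buffers and the ".disabled" suffix are illustrative assumptions, not NetBackup conventions; the UNIX path is the one given above.

```python
from pathlib import Path

# UNIX path taken from the text above; on Windows substitute
# <install_path>\NetBackup\db\config\SIZE_DATA_BUFFERS.
DEFAULT_PATH = Path("/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS")


def set_aside_size_data_buffers(path=DEFAULT_PATH, suffix=".disabled"):
    """Rename the file so it no longer exists under its original name.

    Returns the previous contents (the configured buffer size) or None
    if the file was not present. The suffix is an arbitrary choice.
    """
    path = Path(path)
    if not path.is_file():
        return None
    previous = path.read_text().strip()
    # replace() overwrites an older set-aside copy, if one exists.
    path.replace(path.with_name(path.name + suffix))
    return previous
```

Run it with sufficient privileges on the media server, then retry the backup. If the error disappears, the configured buffer size was probably too large for the OS or device.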

1.1.1    OS configuration

There are two likely candidates for problems at the OS level: tape drivers and HBA drivers.

·         Tape drivers and configuration
Contact the OS or hardware vendor to ensure that an up-to-date driver is installed for the tape device. Contact the hardware vendor to ensure that the driver is configured correctly for the device.

·         Host Bus Adapter (HBA) driver/configuration/firmware
Make sure the HBA has an updated driver and firmware and that the configuration is correct.


1.1.2    SCSI Path

Make sure that every device in the SCSI path to the target device, typically SAN hardware such as switches and bridges, is operating correctly. Check for faulty cables and improper SCSI termination.

·         SAN firmware/configuration
Make sure that any switches or bridges are operating correctly, and that they have the latest software updates installed.

Example of SAN/SCSI communication errors from a UNIX system (syslog, messages):
May 14 16:25:51 server unix: WARNING: /pci@6,4000/scsi@2/st@1,0 (st85):
May 14 16:25:51 server unix: Error for Command: write Error Level: Fatal
May 14 16:25:51 server unix: Requested Block: 1531 Error Block: 1531
May 14 16:25:51 server unix: Vendor: QUANTUM Serial Number: XXXXX
May 14 16:25:51 server unix: Sense Key: Aborted Command
May 14 16:25:51 server unix: ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x0

Jun 20 02:20:09 server scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/JNI,FCR@2/st@14,0 (st15):
Jun 20 02:20:09 server SCSI transport failed: reason 'tran_err': giving up
Jun 20 02:26:09 server bptm[29663]: [ID 832037 daemon.error] scsi command failed, may be timeout, scsi_pkt.us_reason = 3
Jun 20 02:26:42 server jnic146x: [ID 362195 kern.notice] jnic146x1: Link not operational. Performing reset.
Jun 20 02:26:42 server jnic146x: [ID 133166 kern.notice] jnic146x1: Link Down
Dec 18 21:11:59 server vmunix: 0/1/0/0: Unable to access previously accessed device at nport ID 0x11900.

Example of SCSI communication errors from a Windows Event Viewer System log:
20040830 10:36:30 aic78xx E9 NA The device, \Device\Scsi\aic78xx1, did not respond within the timeout period.
20040830 10:46:33 aic78xx E9 NA The device, \Device\Scsi\aic78xx1, did not respond within the timeout period.

1.1.3    Device

Check the following on the target device.
·         Firmware
Check the Device Compatibility list to see what firmware version VERITAS engineering has tested with the device. Contact the hardware vendor to obtain any updates.
·         Bad device
Have the hardware vendor check the device to verify that it is operating correctly.
·         Bad media
Check whether the same piece of media has been causing problems repeatedly.
·         Dirty Drive
Verify that the drives are being cleaned.
·         Environmental
Check power to the device, proper cooling, dust, etc.

Examples from the bptm log (on UNIX, /usr/openv/netbackup/logs/bptm/log.<date>); some of the entries below come from Windows hosts:
<16> write_data: cannot write image to media id XXXXXX, drive index 2, I/O error

<16> io_ioctl: ioctl (MTWEOF) failed on media id XXXXXX, drive index 0, The request could not be performed because of an I/O device error. (../bptm.c.12720)

<2> io_ioctl: MTWEOF failed during error recovery, I/O error

<16> write_tar_image: cannot write image to /tmp/sync_XXXXXX, Error 0
<16> bpbackupdb: NB database backup to media id XXXXXX FAILED
bpbackupdb: EXIT status = 84

<16> write_tar_image: ndmp write of bpbackupdb image to F:\VERITAS\NetBackup\temp\sync_XXXXXX failed, error code 13 (NDMP_EOM_ERR)
bpbackupdb: EXIT status = 84

<16> io_ioctl: ioctl (MTWEOF) failed on media id XXXXXX, drive index 0, Data error (cyclic redundancy check). (..\bptm.c.16728)

Example of tape drive and media errors from a UNIX system (syslog, messages):
May 15 16:41:40 server unix: Error for Command: write Error Level: Fatal
May 15 16:41:40 server unix: Requested Block: 2181 Error Block: 2181
May 15 16:41:40 server unix: Vendor: QUANTUM Serial Number: XXXXX
May 15 16:41:40 server unix: Sense Key: Media Error
May 15 16:41:40 server unix: ASC: 0xc (write error), ASCQ: 0x0, FRU: 0x0

Nov 5 21:08:47 server avrd[21163]: Tape drive QUANTUMDLT70001 (device 1, /dev/rmt/c9t0d0BESTnb) needs cleaning. Attempting to auto-clean...

Example of tape drive and media errors from a Windows Event Viewer System log:
20040804 22:47:53 dlttape-VRTS E7 NA The device, \Device\Tape2, has a bad block.
20040806 17:52:32 dlttape-VRTS E11 NA The driver detected a controller error on \Device\Tape0.
20040806 17:52:47 dlttape-VRTS E11 NA The driver detected a controller error on \Device\Tape0.
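
The UNIX syslog excerpts in sections 1.1.2 and 1.1.3 differ mainly in their sense key: Aborted Command for the path problem and Media Error for the tape problem. As a rough sketch, assuming syslog lines shaped like those excerpts, the following Python extracts the sense key and the ASC/ASCQ values and attaches a hint about where to look first. The function name triage_sense_data and the hint table are illustrative; the table is derived only from the two excerpts in this article and is not a complete SCSI reference.

```python
import re

# Hints inferred only from the two syslog excerpts in this article;
# any other sense key is reported as "unclassified".
SENSE_KEY_HINTS = {
    "aborted command": "check the SCSI path (section 1.1.2)",
    "media error": "check the drive and media (section 1.1.3)",
}

SENSE_RE = re.compile(r"Sense Key:\s*(?P<key>.+?)\s*$")
ASC_RE = re.compile(
    r"ASC:\s*(?P<asc>0x[0-9a-fA-F]+)\s*\((?P<text>[^)]*)\),"
    r"\s*ASCQ:\s*(?P<ascq>0x[0-9a-fA-F]+)"
)


def triage_sense_data(lines):
    """Return one dict per 'Sense Key' line, enriched by the ASC line after it."""
    findings = []
    current = None
    for line in lines:
        sense = SENSE_RE.search(line)
        if sense:
            key = sense.group("key")
            current = {
                "sense_key": key,
                "hint": SENSE_KEY_HINTS.get(key.lower(), "unclassified"),
            }
            findings.append(current)
            continue
        asc = ASC_RE.search(line)
        if asc and current is not None:
            current["asc"] = int(asc.group("asc"), 16)
            current["ascq"] = int(asc.group("ascq"), 16)
            current["asc_text"] = asc.group("text")
            current = None  # the ASC line completes this record
    return findings
```

Feeding it the lines of a messages file gives a quick list of which failures point at the path and which at the drive or media. The hints are only a starting point for the checks described above.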

1.2    Position errors

NetBackup keeps track of how much data it is sending to the operating system to write to the device. As an integrity check after the end of each write, NetBackup will ask the tape device for its position. If this position does not match what NetBackup has calculated the position should be, then the job will fail with a media write error.
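
The comparison described above can be sketched in a few lines of Python. This is a toy model, not NetBackup's implementation: the function names are hypothetical, and the only grounded detail is the "block position check: actual N, expected M" wording that bptm writes to its log, as quoted in section 1.2.3.

```python
import re

# Shape of the bptm position-check lines quoted in section 1.2.3.
POSITION_RE = re.compile(r"block position check: actual (\d+), expected (\d+)")


def compare_position(actual, expected):
    """Compare the position reported by the drive with the calculated one."""
    if actual == expected:
        return "match"
    return "ahead of expected" if actual > expected else "behind expected"


def check_log_line(line):
    """Return (actual, expected, verdict) for a position-check line, else None."""
    match = POSITION_RE.search(line)
    if match is None:
        return None
    actual, expected = int(match.group(1)), int(match.group(2))
    return actual, expected, compare_position(actual, expected)
```

In the log excerpts in section 1.2.3, a position ahead of the expected one accompanies the "too many data blocks written" message, and a position behind it accompanies the "External event caused rewind" message. Other causes may produce the same mismatch, so treat the verdict as a pointer rather than a diagnosis.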

There are three things to check to ensure that this position check works correctly.

1.2.1    Reserve/Release

In an environment where multiple hosts have access to the same devices it is necessary to ensure that only one host has exclusive access to the device at any one time. NetBackup uses SCSI reserve/release for this. Refer to the hardware vendor’s documentation to verify that the device supports SCSI reserve/release.

NetBackup sends the reserve command through the SCSI pass-thru path for the device, so this needs to be configured correctly. Refer to the Device Configuration Guide for information on setting up the SCSI pass-thru path for your OS.

1.2.2    3rd-Party applications

SCSI reserve/release sets a reservation on a device for the whole host, so other applications running on that host can still send commands to the device, with potentially destructive consequences. For example, if someone uses the mt command on a UNIX host to rewind a device that NetBackup is using for a backup, all data on that tape would be lost.

Monitoring applications can have the same effect and need to be disabled.

Example of errors from a UNIX system (syslog, messages):
Mar 1 09:27:43 server EMS [3275]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/tapes/SCSI_tape/8_0_1_0.1.19.239.1.2.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 214630414 -r /storage/events/tapes/SCSI_tape/8_0_1_0.1.19.239.1.2.0 -n 214630401 -a

Note:  Similar messages may appear on Windows in the Event Viewer System log.

1.2.3    SAN firmware or configuration

The command that NetBackup uses to read the tape position travels through the SCSI pass-thru path and through every device on the SCSI path to the tape drive. Each of those devices must preserve the information being passed, or the position comparison could be wrong. Ensure that the software that handles each device is up to date and compatible with the other devices on the path.

Example from the UNIX /usr/openv/netbackup/logs/bptm/log.<date> file:
<2> write_data: block position check: actual 62504, expected 31254
<16> write_data: FREEZING media id XXXXXX, too many data blocks written, check tape/driver block size configuration

<2> io_terminate_tape: block position check: actual 4, expected 5
<16> write_data: FREEZING media id XXXXXX, External event caused rewind during write, all data on media is lost

2      Disk errors

This error can occur if the disk that NetBackup is writing to fills up. Ensure that the destination disk has enough space for the backup. It can also occur if NetBackup attempts to write a file larger than two gigabytes and the file system or OS does not support files of that size.

Example from the UNIX /usr/openv/netbackup/logs/bpdm/log.<date> file:
<16> write_data: cannot write image to disk, attempted write of 65536 bytes, system wrote 40960

<16> write_backup: cannot write image to disk, A system call received a parameter that is not valid., file sync failed with status -1

Example of disk errors from a UNIX system (syslog, messages):
Feb 4 13:59:38 server unix: NOTICE: alloc: /usr/openv: file system full

Example of disk errors from a Windows Event Viewer System log:
20040826 19:19:27 Srv E2013 NA The C: disk is at or near capacity.  You may need to delete some files.
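
Both disk conditions described in this section can be checked before a backup runs. The following Python is a minimal sketch, assuming the expected image size is known in advance. The function name check_destination is hypothetical, and the second check only flags images at or above the two-gigabyte boundary; it cannot tell whether a given file system actually supports large files.

```python
import shutil

# Files of 2 GiB or more exceed the classic limit of 2**31 - 1 bytes.
TWO_GIB = 2 * 1024 ** 3


def check_destination(path, image_bytes):
    """Return a list of warnings for writing image_bytes under path."""
    warnings = []
    free = shutil.disk_usage(path).free
    if image_bytes > free:
        warnings.append(f"only {free} bytes free, {image_bytes} needed")
    if image_bytes >= TWO_GIB:
        warnings.append("image is 2 GiB or larger; confirm large-file support")
    return warnings
```

An empty list means neither condition was detected at the time of the call. Free space can still run out during the backup if other data is written to the same disk.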
