Status Code: 54
Timed out connecting to client
A Status Code 54 occurs when the server could not complete the connection to the client: the accept system call or Winsock call timed out after 60 seconds. This problem can occur when a master or media server tries to connect to bpcd on the client machine and the client fails to respond before the software times out after 60 seconds (the default timer setting). The processes involved in this function are bpcd, bprd, bpbrm, and vnetd (if vnetd is configured for firewall operation).
Table of Contents
1 General Status 54 troubleshooting
1.1 Verify NetBackup Server processes are running:
1.2 Verify the NetBackup Client Daemon (bpcd) is listening
1.3 Use the telnet command to test the NetBackup daemons
1.4 Check logging on the affected Media Server and Client(s):
1.5 Is a firewall present in the configuration?
1.6 Are there any networking issues?
1.6.1 Name resolution issues:
1.6.2 Network performance issues
1.7 Is there a machine resource issue?
2 Troubleshooting for NetBackup Database Agents
3 Links
1 General Status 54 troubleshooting
The goal is to isolate the issue to a machine or pair of machines. Because the issue involves socket-to-socket communication, it could be occurring between the master server and the client or between the media server and the client. The key to success is to isolate which socket is not being established. Once this is done, further troubleshooting can narrow the failure down to a specific function, network configuration, performance issue, machine resource, and so on.
1.1 Verify NetBackup Server processes are running:
On the Windows NT/2000 or UNIX master server, verify that the NetBackup Request Manager (bprd), NetBackup Job Daemon (bpjobd), and NetBackup Database Manager (bpdbm) services are running. These daemons must be running on the master server.
Open a command prompt in Windows, or open a shell in UNIX as root, and run the following command:
For Windows run:
% <install_path>\VERITAS\NetBackup\bin\bpps
* 1TIME 4/25/05 14:02:30.203
COMMAND   PID    LOAD     TIME    MEM    START
bpdbm     1912   0.000%   0.031   4.4M   4/25/05 13:07:45.890
bprd      2080   0.000%   0.125   4.7M   4/25/05 13:07:47.531
bpjobd    3248   0.000%   0.171   3.4M   4/25/05 13:07:48.703
<Media manager processes would be displayed here>
For UNIX run:
# /usr/openv/netbackup/bin/bpps -a
NB Processes
------------
root 18633     1 0 Apr 22 ? 0:01 /usr/openv/netbackup/bin/bpdbm
root 18620     1 0 Apr 22 ? 0:02 /usr/openv/netbackup/bin/bprd
root 18641 18633 0 Apr 22 ? 0:05 /usr/openv/netbackup/bin/bpjobd
<Media manager processes would be displayed here>
If these services are not running on the Windows master, start them.
On the Windows Desktop:
1. Right-click My Computer on the desktop or in the Start Menu and choose "Manage".
2. Expand Services and Applications and highlight Services.
3. Locate the NetBackup services (NetBackup Request Manager, NetBackup Database Manager, NetBackup Client Service, NetBackup Volume Manager, NetBackup Device Manager) and verify they are started.
4. If any services are not started, right-click each service and choose Start.
If these services are not running on the UNIX master, start them.
# /usr/openv/netbackup/bin/goodies/netbackup start
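The process check in this section can be scripted when many master servers need to be verified. The following is a minimal sketch, not part of NetBackup: it assumes the bpps output has already been captured as text, and the helper name missing_daemons is hypothetical. It only looks for the three daemon names listed above (bprd, bpjobd, bpdbm) anywhere in the output.

```python
REQUIRED_DAEMONS = ("bprd", "bpjobd", "bpdbm")  # daemons named in section 1.1


def missing_daemons(bpps_output, required=REQUIRED_DAEMONS):
    """Return the required daemon names that do not appear in bpps output.

    Hypothetical helper: it treats every whitespace-separated token as a
    possible command name and strips any leading directory path, so it
    accepts both the UNIX (full path) and Windows (bare name) layouts.
    """
    seen = set()
    for line in bpps_output.splitlines():
        for token in line.split():
            name = token.replace("\\", "/").rsplit("/", 1)[-1]
            seen.add(name)
    return [daemon for daemon in required if daemon not in seen]
```

An empty list means all three daemons were found; any names returned are the ones to start as described above.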
1.2 Verify the NetBackup Client Daemon (bpcd) is listening.
The client daemons, such as the NetBackup Client Service (bpcd), are started from bpinetd.exe on Windows or from inetd/xinetd on UNIX/Linux and won’t appear in the bpps output. Instead, the netstat command can be used to verify that these daemons are in LISTEN status. On the master and the affected client(s), run the command below to verify whether bpcd is listening.
Windows: netstat -a > c:\netstat.txt
UNIX: netstat -a > /tmp/netstat.txt
The netstat.txt file that gets created should list the listening processes that are running (bpcd, vnetd, vopied, bpjava-msvc). Search this file to determine whether bpcd is in LISTEN status. The vnetd process should also be in LISTEN status if vnetd is being used for firewalls.
Windows: TCP hostname:bpcd hostname.domain.com:0 LISTENING
UNIX: *.bpcd *.* 0 0 49152 LISTEN
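Searching netstat.txt by hand is easy to get wrong on a busy machine. The sketch below shows one way to automate the search, assuming the file contents have been read into a string. The function name is_listening is hypothetical, and the matching rule (a field ending in the service name or port number on a line whose last field starts with LISTEN) is a simplification based on the two sample lines above. Other netstat layouts may need adjustment.

```python
def is_listening(netstat_output, service="bpcd", port=13782):
    """Return True if some netstat -a line shows the service listening.

    A line qualifies when its last field starts with LISTEN (covers both
    LISTEN and LISTENING) and another field ends with ':bpcd', '.bpcd',
    ':13782' or '.13782' (or the equivalents for the given service/port).
    """
    suffixes = tuple(
        sep + name for sep in (":", ".") for name in (service, str(port))
    )
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or not fields[-1].upper().startswith("LISTEN"):
            continue
        if any(field.endswith(suffixes) for field in fields[:-1]):
            return True
    return False
```

For vnetd, call it as is_listening(text, service="vnetd", port=13724), using the port number given in section 1.5.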
1.3 Use the telnet command to test the NetBackup daemons
Another test, once the problem systems have been identified, is to telnet to the NetBackup well-known ports from machine to machine. For example, from the master server, a telnet session could be run to the media server or client and vice versa:
From the master command line:
# telnet <machine name or machine IP address> bpcd
This will connect to the target machine and display a message similar to the ones below:
For UNIX:
If telnet is successful you will get a message similar to:
# telnet nbclient bpcd
Trying x.x.x.x...
Connected to nbclient.domain.com.
Escape character is '^]'.
< If successful no additional messages will be returned >
Press enter to end telnet session.
If telnet is unsuccessful you will get a message similar to:
# telnet nbclient bpcd
Trying x.x.x.x...
telnet: Unable to connect to remote host: Connection refused
The telnet session will end automatically and return to the prompt.
For Windows:
If telnet is successful you will get a message similar to:
% telnet nbclient bpcd
< If successful no displayed messages will be returned >
Press enter to end telnet session.
If telnet is unsuccessful you will get a message similar to:
% telnet nbclient bpcd
Connecting to nbclient. . .Could not open a connection to host on port 13782 : Connect failed
The telnet session will end automatically and return to the prompt.
This is also a very good test for firewall issues, since it shows whether a path is open through the firewall.
This test can be repeated for connection testing to bprd, bpdbm, and vnetd.
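Where telnet is not installed, the same connection test can be approximated with a few lines of Python. This is a rough sketch rather than a NetBackup tool: the function name probe_port and the 10-second default timeout are assumptions, and the port must be given as a number (13782 for bpcd and 13724 for vnetd, per section 1.5) unless the service names are defined in the local services file.

```python
import socket


def probe_port(host, port, timeout=10.0):
    """Try a TCP connection, as 'telnet host port' would.

    Returns 'open' if the connection is established, 'refused' if the
    target actively rejects it (nothing listening), 'timeout' if no reply
    arrives in time (often a firewall dropping packets), or 'error' for
    anything else, such as a name lookup failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

A result of 'refused' usually points at the daemon not listening on the target, while 'timeout' is more typical of a blocked path, which helps decide whether to continue with section 1.2 or section 1.5.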
1.4 Check logging on the affected Media Server and Client(s):
Examine the All Log Entries report for the time of the failure to determine where the failure occurred. Also view the logging information detailed in the previous flow chart for error and failure information. This log information is the best way to isolate where the problem is occurring and which machines are involved, and it will enable you to narrow your focus and concentrate your troubleshooting efforts.
The media server bpbrm log and the client bpcd log will contain identical logconnections lines:
<2> logconnections: BPCD ACCEPT FROM x.x.x.x.<port> TO y.y.y.y.13782
The x.x.x.x will be the source IP address for the connection. Verify that this is the expected network interface. The client will need to have forward and reverse name lookup information for this IP address.
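When many connections are logged, the source and destination addresses can be extracted from the logconnections lines with a short script. The sketch below assumes the line layout shown above (IPv4 address followed by a dot and the port number). The names LOGCONN_RE and parse_logconnections are hypothetical, and the addresses in any sample input are purely illustrative.

```python
import re

# Matches the layout shown above: a.b.c.d.<port> TO e.f.g.h.<port>
LOGCONN_RE = re.compile(
    r"logconnections: BPCD ACCEPT FROM "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) TO "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+)"
)


def parse_logconnections(line):
    """Return (src_ip, src_port, dst_ip, dst_port), or None if no match."""
    match = LOGCONN_RE.search(line)
    if match is None:
        return None
    return (
        match["src_ip"],
        int(match["src_port"]),
        match["dst_ip"],
        int(match["dst_port"]),
    )
```

The first element of the result is the source IP address to compare against the expected network interface and to use for the reverse name lookup test on the client.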
Example from a UNIX media server /usr/openv/netbackup/logs/bpbrm/log.<date> file:
<2> bpcr_connect: bpcr_connect timeout during select after 60 seconds on port <port>
<16> bpbrm start_bpcd: timed out trying to connect to <hostname>
This indicates the client did not reply to the server before the 60-second socket timeout. In this case, check the client’s bpcd log for additional troubleshooting information.
Example from a UNIX client /usr/openv/netbackup/logs/bpcd/log.<date> file:
<8> bpcd peer_hostname: gethostbyaddr failed: HOST_NOT_FOUND (1)
<16> bpcd peer_hostname: gethostbyaddr failed to return peer host, herrno = 1
<16> bpcd main: Couldn't get peer hostname
Example from a UNIX client /usr/openv/netbackup/logs/bpcd/log.<date> file:
<2> hosts_equal: gethostbyname failed for <hostname>: No such host is known. (0)
This would indicate a failure with the forward or reverse name lookup of the master or media server. NetBackup does a reverse name lookup of the IP address to get the name to authenticate against the SERVER entry in the Windows Registry or the UNIX /usr/openv/netbackup/bp.conf.
After reviewing the log files, you should have a better idea of which machines are involved in the failure.
For name lookup errors, add an entry to /etc/hosts on UNIX or C:\WINDOWS\system32\drivers\etc\hosts on Windows and try the operation again.
x.x.x.x master master.domain.com
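After editing the hosts file, it can help to confirm that the forward and reverse lookups now agree. The following sketch uses the operating system resolver through Python's socket module. It is an illustration only and not a replacement for bpclntcmd: the function name check_name_resolution is hypothetical, and comparing only the short (unqualified) host name is a simplifying assumption.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to names.

    Returns (ip, names, consistent). 'consistent' is True when one of the
    names returned by the reverse lookup has the same short name as the
    hostname that was asked for. The resolver functions are parameters so
    the logic can be exercised without a live name service.
    """
    try:
        ip = forward(hostname)
    except OSError:
        return (None, [], False)  # forward lookup failed
    try:
        primary, aliases, _addresses = reverse(ip)
    except OSError:
        return (ip, [], False)  # reverse lookup failed
    names = [primary] + list(aliases)
    short = hostname.split(".")[0].lower()
    consistent = any(n.split(".")[0].lower() == short for n in names)
    return (ip, names, consistent)
```

Run it on the client against the master and media server names, and on the servers against the client name. A False in the last position suggests the kind of lookup failure shown in the bpcd log examples above.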
1.5 Is a firewall present in the configuration?
If so, are all of the required ports open? Check the NetBackup System Administrator Guide (for UNIX or Windows) for firewall and port information. At a minimum, ports 13782 (bpcd) and 13724 (vnetd) need to be opened in the firewall for a client backup. This also requires configuration changes for the client on the master before it will work. Additional ports are required for restores or if the client is also a media server.
Example from a UNIX client /usr/openv/netbackup/logs/bpcd/log.<date> file:
<2> bpcd peer_hostname: Connection from host <hostname> (x.x.x.x) port <reserved port>
<2> bpcd main: Peer hostname is <hostname>
<2> nb_bind_on_port_addr: bound to port <reserved port>
<2> bpcd main: Got socket for output 5, lport = <reserved port>
This would indicate the client is using the default of reserved ports for the callback. The nb_bind_on_port_addr: call will display the reserved port number being used for the callback. A firewall will most likely be blocking reserved ports, which will cause the backup to abort on the media server with a status 54.
Example from a UNIX client /usr/openv/netbackup/logs/bpcd/log.<date> file:
<4> bpcd valid_server: hostname comparison succeeded
<2> bpcd main: output socket port number = 13782
Note: For NetBackup 5.x there will be a dozen “<2> vnet vnetd_<function>” log entries between these lines.
<2> get_vnetd_socket: connected to vnetd socket 5
This would indicate the client is using the vnetd port for callbacks. The nb_bind_on_port_addr: call will not appear in the logs when vnetd is used for callbacks.
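The two log patterns above can be told apart mechanically. The sketch below scans a bpcd log excerpt for the two marker strings quoted in this section and reports which callback method the client appears to be using. The function name callback_method is hypothetical, and the excerpt is assumed to cover a single connection, since the first marker found decides the result.

```python
def callback_method(bpcd_log_text):
    """Classify the callback method seen in a bpcd log excerpt.

    Returns 'vnetd' if a get_vnetd_socket connection line is present,
    'reserved-port' if an nb_bind_on_port_addr bind line is present,
    and 'unknown' if neither marker appears.
    """
    for line in bpcd_log_text.splitlines():
        if "get_vnetd_socket: connected to vnetd socket" in line:
            return "vnetd"
        if "nb_bind_on_port_addr: bound to port" in line:
            return "reserved-port"
    return "unknown"
```

A 'reserved-port' result on a client that sits behind a firewall is a strong hint that the callback is being blocked, as described above.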
1.6 Are there any networking issues?
1.6.1 Name resolution issues:
Use the bpclntcmd command to test name lookups in both directions. It should be run against both the hostname and the IP address of each machine involved in order to test both forward and reverse name lookups.
Review the following Technote http://support.veritas.com/docs/261393 for details on using the bpclntcmd command.
1.6.2 Network performance issues
· Duplex issues
Commands to run: “netstat -ian” to check for Ierrs or Oerrs.
· Routing issues
Commands to run: “netstat -rn”, “traceroute” or “ifconfig -a” to check for routing or subnet mask errors.
· Network bottlenecks
Commands to run: “ftp” or “ttcp” to test underlying network performance.
1.7 Is there a machine resource issue?
Verify that the VERITAS suggested minimum kernel parameters are in place for UNIX machines. Review the following Technote: http://seer.support.veritas.com/docs/238063.htm
2 Troubleshooting for NetBackup Database Agents
Script-based NetBackup database clients such as DB2, Informix, Oracle, SAP, Sybase, SQL-Server, and Teradata require additional troubleshooting to resolve status 54 errors. These clients use comm files in the /usr/openv/netbackup/logs/user_ops directory tree that must be updated by the master and the media server and then read by the client prior to establishing the Name and Data sockets.
First, three connections to the client occur from the master and then the media server. These connections use bpcd on the client, including the server connect-back, to update the comm file with job progress information and eventually the hostname and additional port numbers that the client should use to establish the Name and Data sockets. Troubleshooting a status 54 during this portion of the backup or restore is identical to the steps for a standard backup described in this document.
A second cause for a status 54 on a database client backup or restore occurs when the client fails to receive an expected update from either the master or the media server before the CLIENT_READ_TIMEOUT or other timeout expires on the client. Upon timeout, the database client will exit in error. Eventually the job will become active and bpbrm will bind to ports for the Name and Data sockets, write the port numbers into the comm file, and wait for the database client to connect back. If the connect-back does not occur within 60 seconds of the comm file update, bpbrm will fail the job with a status 54. The bpbrm log on the media server will show the additional ports for the sockets along with the media server hostname that the client should use to complete the connect-back.
<2> bpbrm listen_for_client: HOT_ORACLE_DB_BACKUP
<2> bpbrm listen_for_client: bpbrm.c.19241: listen(2)ing on port: 3826 3826 0x00000ef2
<2> bpbrm listen_for_client: bpbrm.c.19243: listen(2)ing on port: 4941 4941 0x0000134d
…
<2> bpcr_get_peername_rqst: Server peername length = 8
<2> bpbrm write_msg_to_progress_file: INF - Data socket = sv2n2adm.3826
<2> bpbrm write_msg_to_progress_file: INF - Name socket = sv2n2adm.4941
Please note that the hostname provided in the comm file may differ from the expected hostname for the media server. Such a mismatch is a third potential cause for a status 54 on a database client backup or restore. If the client cannot resolve the provided hostname and complete the socket through the network, then bpbrm will time out after 60 seconds and fail the job with a status 54.
Hence, it is vitally important to check the database client log to determine whether the database client has already exited, is denied a socket by the network, is unable to bind to a local port, or is otherwise unable to read the comm file. To determine the exact cause, enable logging for the database client per the Troubleshooting instructions in the VERITAS NetBackup ™ for <Database agent> System Administrators Guide.
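To check whether the database client can reach the hostname and ports that bpbrm advertised, the Name and Data socket entries can first be pulled out of the bpbrm log. The sketch below assumes the "INF - Data socket = <host>.<port>" layout shown in the log excerpt in this section. The names SOCKET_RE and parse_comm_sockets are hypothetical.

```python
import re

# Matches e.g. "INF - Data socket = sv2n2adm.3826" at the end of a log line.
SOCKET_RE = re.compile(
    r"INF - (?P<kind>Data|Name) socket = (?P<host>\S+)\.(?P<port>\d+)\s*$"
)


def parse_comm_sockets(bpbrm_log_text):
    """Return {'Data': (host, port), 'Name': (host, port)} as found.

    If a socket type is logged more than once, the last entry wins.
    Socket types that never appear are simply absent from the result.
    """
    sockets = {}
    for line in bpbrm_log_text.splitlines():
        match = SOCKET_RE.search(line)
        if match:
            sockets[match["kind"]] = (match["host"], int(match["port"]))
    return sockets
```

The hostname in the result is the one the client must be able to resolve and connect to. If it differs from the expected media server name, that mismatch is the third cause described above.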