Tuesday, March 13, 2012

ONS Failed to Start: ONS.exe Keeps Crashing, Oracle ONS Not Starting on Windows

Oracle node error when trying to add a node to Oracle RAC:

CRS-0215 / ONS Failed to Start. Pingwait Exited With Exit Status 2 

Extract from Oracle KB Article


Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0.7 - Release: 10.1 to 11.1
Information in this document applies to any platform.


Symptoms


The problem can occur during a CRS installation or while adding a new node:

* ONS fails to start on one or both of the nodes during a new install.
* When adding nodes, ONS on the new node fails to start.



srvctl start nodeapps -n node2
CRS-0215: Could not start resource 'ora.node2.ons'.


$RDBMS_HOME/opmn/logs/ons.log does not have any updates.


Cause

The remote port registered for ONS is already in use or otherwise unavailable.
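You can confirm this cause before changing anything in the OCR. The author used netstat for this check (see the Solution below). An alternative is to try binding a TCP socket to the candidate port and treat a failed bind as "port taken". The Python sketch below does that. It is an illustration, not an Oracle tool. The function names tcp_port_is_free and first_free_port are hypothetical, and the check covers IPv4 on the local machine only, so run it on the node whose ONS fails.

```python
import socket


def tcp_port_is_free(port, host=""):
    """Return True if an IPv4 TCP socket can bind to (host, port) right now.

    host="" binds on all local interfaces. A failed bind usually means another
    process already holds the port (or it is reserved), which is the condition
    that stops ONS from starting. This is a point-in-time check only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates):
    """Return the first port in `candidates` that can be bound, else None."""
    for port in candidates:
        if tcp_port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    # 6251 is the port registered in the OCR in this example;
    # 2200 is the alternative used later in this post.
    for port in (6251, 2200):
        print(port, "free" if tcp_port_is_free(port) else "in use")
```

A successful bind only shows that the port was free at that moment. netstat remains the better tool for finding out which process holds a busy port.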

Solution

Set up debugging:


srvctl stop nodeapps -n node2
crsctl debug log res 'ora.node2.ons:5'
srvctl start nodeapps -n node2


- Check the resource log: $ORA_CRS_HOME/log//racg/ora.node2.ons

Oracle Database 11g CRS Release 11.1.0.6.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2008-12-05 13:13:11.296: [RACG][3184] [3144][3184][ora.node2.ons]: ons failed to start. pingwait exited with exit status 2

Test:

ons.config:

localport=6150
useocr=on
allowgroup=true
usesharedinstall=true


onsctl ping on node2:
onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = node1, port = 6251}
Adding remote host node1:6251
onscfg[1]
{node = node2, port = 6251}
Adding remote host node2:6251
ons is NOT running . . .


onsctl start on node2:
onsctl start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = node1, port = 6251}
Adding remote host node1:6251
onscfg[1]
{node = node2, port = 6251}
Adding remote host node2:6251
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = dbserver01, port = 6251}
Adding remote host node1:6251
onscfg[1]
{node = dbserver02, port = 6251}
Adding remote host node2:6251
ons failed to start. pingwait exited with exit status 2

OCRDUMP


[DATABASE.ONS_HOSTS.node1]
ORATEXT : node1
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : administrator, GROUP_NAME : }

[DATABASE.ONS_HOSTS.node1.PORT]
ORATEXT : 6251
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : administrator, GROUP_NAME : }

[DATABASE.ONS_HOSTS.node2]
ORATEXT : node2
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : administrator, GROUP_NAME : }

[DATABASE.ONS_HOSTS.node2.PORT]
ORATEXT : 6251
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ,
OTHER_PERMISSION : PROCR_READ, USER_NAME : administrator, GROUP_NAME : }

Note: The remote port is registered in the OCR, as the ocrdump output shows, so you do not need a remoteport entry in the ons.config file.
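Because the port lives in the OCR, you can read it back from ocrdump's text output. The minimal Python sketch below assumes the key layout shown above: a [DATABASE.ONS_HOSTS.<node>.PORT] header followed by an "ORATEXT : <port>" line. ons_ports_from_ocrdump is a hypothetical helper, not part of Oracle's tooling.

```python
import re

# Assumed layout, taken from the ocrdump excerpt above:
#   [DATABASE.ONS_HOSTS.<node>.PORT]
#   ORATEXT : <port>
PORT_KEY = re.compile(r"^\[DATABASE\.ONS_HOSTS\.(.+)\.PORT\]$")
PORT_VALUE = re.compile(r"^ORATEXT\s*:\s*(\d+)$")


def ons_ports_from_ocrdump(text):
    """Return {node_name: port} for every ONS_HOSTS.<node>.PORT key in `text`."""
    ports = {}
    node = None  # node whose PORT key we are currently inside, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        key = PORT_KEY.match(line)
        if key:
            node = key.group(1)
            continue
        if node is None:
            continue
        value = PORT_VALUE.match(line)
        if value:
            ports[node] = int(value.group(1))
            node = None
        elif line.startswith("["):
            # Another key began before a numeric ORATEXT value appeared.
            node = None
    return ports


if __name__ == "__main__":
    sample = """[DATABASE.ONS_HOSTS.node1]
ORATEXT : node1
[DATABASE.ONS_HOSTS.node1.PORT]
ORATEXT : 6251
[DATABASE.ONS_HOSTS.node2.PORT]
ORATEXT : 6251
"""
    print(ons_ports_from_ocrdump(sample))  # {'node1': 6251, 'node2': 6251}
```

The SECURITY lines are skipped because they never follow a PORT key directly. Node names are returned exactly as they appear in the OCR, so a case difference (see the Windows section) stays visible.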

Solution:

The key issue is that the remote port shown in the ocrdump output is already in use or unavailable.

We ran netstat and found that port 6251 was not free on the node, so we reconfigured ONS to use a different free port, in this case 2200.

srvctl stop nodeapps -n node2
racgons remove_config node2:6251
racgons add_config node2:2200

The two nodes can be configured with different ports. However, in this case we made the same change on both nodes.

The ocrdump output now shows the new port.

- srvctl start nodeapps -n node2 then completed successfully.
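When several nodes need the same change, you can generate the command sequence instead of typing it by hand. In the minimal Python sketch below, reregister_ons_commands is a hypothetical helper. It only builds strings from the srvctl and racgons commands shown above and runs nothing. It assumes every node moves from the same old port to the same new port, as in this example.

```python
def reregister_ons_commands(nodes, old_port, new_port):
    """Build (but do not run) the stop / remove / add / start sequence.

    `nodes` should hold node names exactly as the OS reports them; the same
    old and new port are used for every node, as in the example above.
    """
    # Stop nodeapps on every node before touching the OCR registration.
    commands = [f"srvctl stop nodeapps -n {node}" for node in nodes]
    for node in nodes:
        commands.append(f"racgons remove_config {node}:{old_port}")
        commands.append(f"racgons add_config {node}:{new_port}")
    # Restart nodeapps only after every node has been re-registered.
    commands += [f"srvctl start nodeapps -n {node}" for node in nodes]
    return commands


if __name__ == "__main__":
    for command in reregister_ons_commands(["node2"], 6251, 2200):
        print(command)
```

For ["node2"] this reproduces the four commands used above. Review the output before pasting it into a shell.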

Variation of this problem:

The problem can also appear when adding a node on Windows, and possibly on other platforms.

From our documentation:

Oracle® Database Release Notes
11g Release 1 (11.1) for Microsoft Windows
Part Number B32005-06

4 Installation, Configuration, and Upgrade Issues

4.7 Incorrect Port Number Registered for the New Node
When you run the crssetup.add.bat batch file to add another node, an incorrect
port number is registered for the new node.

Workaround: Complete the following procedure to resolve this issue:

After running the crssetup.add.bat batch file, ignore the error messages
similar to the following error message:

Starting ONS application resource on (*) nodes1:CRS-0215: Could not start
resource 'ora.*.ons'
Use the following command to stop the nodeapps service on all the newly added
nodes:

srvctl stop nodeapps -n node
Use the following command to delete the existing ONS port number registration:

racgons remove_config node:4948
Use the following command to add an ONS port number:

racgons add_config node:remote_port
Use the following command to start the nodeapps service on all the newly added
nodes:

srvctl start nodeapps -n node



********************* SOLUTION FOR WINDOWS *********************

In my case on Windows, the problem was not a port already in use. The hostname registered with the ONS port was in the wrong case.

To check, run the hostname command in a Windows Command Prompt. In my case, the node where we were trying to get ONS to run was registered as node003, and ONS kept failing to start.

The hostname command returned NODE003 instead. Re-registering the ONS port with the hostname in the correct case, as below, resolved the issue.


D:\oracle\product\11.1.0\crs\BIN>srvctl stop nodeapps -n node003

D:\oracle\product\11.1.0\crs\BIN>racgons remove_config node003:2300
racgons: Existing key value on node003 = 2300.
racgons: node003:2300 removed from OCR.

D:\oracle\product\11.1.0\crs\BIN>racgons add_config NODE003:6251

D:\oracle\product\11.1.0\crs\BIN>srvctl start nodeapps -n NODE003

D:\oracle\product\11.1.0\crs\BIN>srvctl status nodeapps -n node003
VIP is running on node: node003
GSD is running on node: node003
Listener is running on node: node003
ONS daemon is running on node: node003
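A hostname that differs only by case is easy to miss by eye. The minimal Python sketch below assumes Python is available on the node and that socket.gethostname() returns the same name as the Windows hostname command, which is typical but not guaranteed. case_corrected_hostname is a hypothetical helper.

```python
import socket


def case_corrected_hostname(registered, actual=None):
    """Return the OS hostname if it differs from `registered` only by case.

    Returns None when the names already match exactly or differ by more than
    case. `actual` defaults to socket.gethostname(), assumed here to match
    what the Windows `hostname` command prints.
    """
    if actual is None:
        actual = socket.gethostname()
    if registered != actual and registered.casefold() == actual.casefold():
        return actual
    return None


if __name__ == "__main__":
    # "node003" is the name ONS was registered under in this post.
    fixed = case_corrected_hostname("node003")
    if fixed:
        print(f"ONS is registered as node003 but this host reports {fixed}; "
              f"re-register the ONS port using {fixed}")
```

In this post, case_corrected_hostname("node003", "NODE003") returns "NODE003", which is the name that worked with racgons add_config.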