Storage Cell MS service failed to start after reboot - RS-7445 [Serv MS is absent] [It will not be restarted]
RS-7445 [Serv MS is absent] [It will not be restarted] [] [] [] [] [] [] [] [] [] []
2021-11-18T09:19:32.879857+01:00
RS version=19.3.13.0.0,label=OSS_19.3.13.0.0_LINUX.X64_201022,Thu_Oct_22_23:43:28_PDT_2020
[RS] Started Service RS_MAIN with pid 58411
[RS] Kill previous monitoring processes for RS_BACKUP, MS and CELLSRV
2021-11-18T09:19:32.991137+01:00
[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsbmt with pid 58424
2021-11-18T09:19:33.057173+01:00
RSBK version=19.3.13.0.0,label=OSS_19.3.13.0.0_LINUX.X64_201022,Thu_Oct_22_23:43:28_PDT_2020
[RS] Started Service RS_BACKUP with pid 58425
[RS] Kill previous monitoring process for core RS
2021-11-18T09:19:33.159572+01:00
[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrssmt with pid 58429
2021-11-18T09:19:42.312339+01:00
[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsmmt with pid 58623
2021-11-18T09:21:53.198369+01:00
[RS] Start service MS failed with error: -74.
2021-11-18T09:21:53.238382+01:00
[RS] Monitoring process /opt/oracle/cell/cellsrv/bin/cellrsmmt (pid: 58623, srvc_pid: 58675) returned with error: 162
2021-11-18T09:21:53.238610+01:00
[RS] Service MS with pid 58675 is no longer present
Errors in file /opt/oracle/cell/log/diag/asm/cell/StorageCell0013-man/trace/rstrc_58411_mmt.trc (incident=25):
RS-7445 [Serv MS is absent] [It will not be restarted] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/oracle/cell/log/diag/asm/cell/StorageCell0013-man/incident/incdir_25/rstrc_58411_mmt_i25.trc
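The failure lines are easy to miss in a long RS alert log. The filter below is a minimal sketch that assumes the log uses the same message wording as the excerpt above. The function name `rs_failures` is hypothetical and not part of the Exadata tooling.

```shell
#!/bin/sh
# rs_failures: hypothetical helper, not an Oracle-supplied command.
# Reads an RS alert log on stdin and prints only the lines that report
# a failed service start, a monitoring-process error, a vanished service
# pid, or an RS-7445 incident (wording taken from the excerpt above).
rs_failures() {
    grep -E 'failed with error|returned with error|is no longer present|RS-7445'
}
```

It can be used as `rs_failures < alert.log`, run from the cell's trace directory. The file name here is an assumption, so adjust it to the actual alert log location.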
---
[root@StorageCell0013-man ~]# imageinfo
Kernel version: 4.14.35-1902.306.2.1.el7uek.x86_64 #2 SMP Wed Oct 21 20:57:15 PDT 2020 x86_64
Cell version: OSS_19.3.13.0.0_LINUX.X64_201022
Cell rpm version: cell-19.3.13.0.0_LINUX.X64_201022-1.x86_64
Active image version: 19.3.13.0.0.201022
Active image kernel version: 4.14.35-1902.306.2.1.el7uek
Active image activated: 2021-03-31 09:39:20 +0200
Active image status: success
Active node type: STORAGE
Active system partition on device: /dev/md24p6
Active software partition on device: /dev/md24p8
Cell boot usb partition: not found
mount: special device /dev/md6 does not exist
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
---
1. Please provide the below details for analysis.
++ Date/Time of crash and a summary of events leading up to the crash.
last reboot
uname -a
Linux StorageCell0013-man.dbaas.ing.net 4.14.35-1902.306.2.1.el7uek.x86_64 #2 SMP Wed Oct 21 20:57:15 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
uptime
10:59:11 up 4 days, 4:10, 1 user, load average: 0.08, 0.09, 0.09
++ List of Crashed nodes
++ Exadata Machine Type X8-2
++ How many compute nodes/cell nodes? 6 (X6 CN)
++ (Full / Half / Quarter Rack / One eighth): Full / upgraded from X6
++ Storage server image version (# imageinfo):
Kernel version: 4.14.35-1902.306.2.1.el7uek.x86_64 #2 SMP Wed Oct 21 20:57:15 PDT 2020 x86_64
Cell version: OSS_19.3.13.0.0_LINUX.X64_201022
Cell rpm version: cell-19.3.13.0.0_LINUX.X64_201022-1.x86_64
Active image version: 19.3.13.0.0.201022
Active image kernel version: 4.14.35-1902.306.2.1.el7uek
Active image activated: 2021-03-31 09:39:20 +0200
Active image status: success
Active node type: STORAGE
Active system partition on device: /dev/md24p6
Active software partition on device: /dev/md24p8
Cell boot usb partition: not found
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
++ Compute node image version (# imageinfo):
Kernel version: 4.14.35-1902.306.2.1.el7uek.x86_64 #2 SMP Wed Oct 21 20:57:15 PDT 2020 x86_64
Image kernel version: 4.14.35-1902.306.2.1.el7uek
Image version: 19.3.13.0.0.201022
Image activated: 2021-07-17 21:45:47 +0200
Image status: success
Node type: COMPUTE
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
++ RDBMS version:
/u01/app/oracle/product/12.2.0.1/dbhome_200415
/u01/app/oracle/product/19.7.0.0/dbhome_2
/u01/app/oracle/product/19.7.0.0/dbhome_4
++ Grid Home version:
19.7.0.0
++ Bare metal or OVM:
Bare metal
++ On premises or Cloud OCI/OCI2:
On premises
---
Solution
Jump to table of contents
Dump continued from file: /opt/oracle/cell/log/diag/asm/cell/StorageCell0013-man/trac
[TOC00001]
RS-7445 [Serv MS is absent] [It will not be restarted] [] [] [] [] [] [] [] [] [
[TOC00001-END]
2021-11-18 09:21:53.097 :000023C6: Failed to heartbeat MS (port: 5043 timeout: 60 sec)
2021-11-18 09:21:53.097 :000023C7: socket open error: Port no: 8888. Received errorno 111. Connection refused
2021-11-18 09:21:53.197 :000023C8: mon_proc_pid oldpid: 58675
2021-11-18 09:21:53.197 :000023C9: pid 58675 has disappeared
2021-11-18 09:21:53.197 :000023CA: start service MS failed with error: -74.
2021-11-18 09:21:53.198 :000023CB: Error : start service failed
First make sure that no MS Java server processes are still running:
ps -ef | grep java
ps -ef | grep -e msServer
If none are running, redeploy MS with the script below and check the result:
/opt/oracle/cell/cellsrv/deploy/scripts/unix/setup_dynamicDeploy
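The two `ps` checks above can be folded into one test that ignores the `grep` commands themselves. This is a minimal sketch; the function name `ms_java_running` is hypothetical, and matching on the strings `java` and `msServer` is simply the same heuristic as the manual commands.

```shell
#!/bin/sh
# ms_java_running: hypothetical helper, not an Oracle-supplied command.
# Reads `ps -ef` style output on stdin and returns 0 if any line mentions
# java or msServer, ignoring lines that belong to grep itself.
ms_java_running() {
    grep -E 'java|msServer' | grep -v grep | grep -q .
}

# Usage sketch (left commented out so that nothing runs unintentionally):
# if ps -ef | ms_java_running; then
#     echo "MS Java processes are still running; stop them before redeploying"
# else
#     echo "No MS Java processes found; the redeploy can be attempted"
# fi
```

Reading from stdin rather than calling `ps` inside the function keeps the check easy to try against saved process listings.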
===================
[root@StorageCell0013-man unix]# pwd
/opt/oracle/cell/cellsrv/deploy/scripts/unix
[root@StorageCell0013-man unix]# ls
celladmin_create.sh cell_env.csh cell_limits.sh common_image_install_func exadata-capacity-on-demand freespace.sh install_util_lib.sh permissions_check.py swupdate
celld cell_env.sh cell_updown.sh common_imag_install.sh exadata-capacity-on-demand.service hwadapter migrate.sh rs
celld.service cellfixfsperms.sh collect_jmap.sh exa_config_parser freeall.sh install_model_properties.sh mscore setup_dynamicDeploy
[root@StorageCell0013-man unix]# sh setup_dynamicDeploy
unzipping wls
CLASSPATH=/usr/java/default/lib/tools.jar:/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/wls/wlserver_12.2/wlserver/modules/features/wlst.wls.classpath.jar:
PATH=/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/wls/wlserver_12.2/wlserver/server/bin:/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/wls/wlserver_12.2/wlserver/../oracle_common/modules/thirdparty/org.apache.ant/1.9.8.0.0/apache-ant-1.9.8/bin:/usr/java/default/jre/bin:/usr/java/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/bin:/sbin:/usr/sbin:/opt/MegaRAID/storcli/:/root/bin:/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/wls/wlserver_12.2/wlserver/../oracle_common/modules/org.apache.maven_3.2.5/bin
Your environment has been set.
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Exiting WebLogic Scripting Tool.
/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/wls/wlserver_12.2/wlserver/server/bin
subject= /CN=localhost/OU=Oracle Exadata/O=Oracle Corporation/L=Redwood City/ST=California/C=US
RSA key ok
Successfully verified old security identity and certificates.
Generating a 2048 bit RSA private key
.....................................................................+++
.........................+++
writing new private key to '/opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/deploy/config/security/key.original.pem'
-----
sleep until wls is ready ...
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Connecting to t3://localhost:8888 with userid weblogic ...
Successfully connected to Admin Server "msServer" that belongs to domain "msdomain".
Warning: An insecure protocol was used to connect to the server.
To ensure on-the-wire security, the SSL port or Admin port should be used instead.
Location changed to edit tree.
This is a writable tree with DomainMBean as the root.
To make changes you will need to start an edit session via startEdit().
For more help, use help('edit').
Starting an edit session ...
Started edit session, be sure to save and activate your changes once you are done.
Activating all your changes, this may take a while ...
The edit lock associated with this edit session is released once the activation is completed.
The following non-dynamic attribute(s) have been changed on MBeans
that require server re-start:
MBean Changed : Security:Name=myrealmMSUserAuthenticator
Attributes changed : ControlFlag
MBean Changed : Security:Name=myrealm
Attributes changed : AuthenticationProviders
Activation completed
weblogic.Deployer invoked with options: -verbose -name MS -source /opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/lib/MS.war -targets msServer -user weblogic -adminURL t3://localhost:8888 -deploy
<Nov 23, 2021 1:50:41 PM CET> <Info> <J2EE Deployment SPI> <BEA-260121> <Initiating deploy operation for application, MS [archive: /opt/oracle/cell19.3.13.0.0_LINUX.X64_201022/cellsrv/lib/MS.war], to msServer .>
Task 0 initiated: [Deployer:149026]deploy application MS on msServer.
Task 0 completed: [Deployer:149026]deploy application MS on msServer.
Target state: deploy completed on Server msServer
java.lang.Exception: [Deployer:149169]Requires server restart for completion.
Target Assignments:
+ MS msServer
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Connecting to t3://localhost:8888 with userid weblogic ...
Successfully connected to Admin Server "msServer" that belongs to domain "msdomain".
Warning: An insecure protocol was used to connect to the server.
To ensure on-the-wire security, the SSL port or Admin port should be used instead.
Location changed to edit tree.
This is a writable tree with DomainMBean as the root.
To make changes you will need to start an edit session via startEdit().
For more help, use help('edit').
Starting an edit session ...
Started edit session, be sure to save and activate your changes once you are done.
Saving all your changes ...
Saved all your changes successfully.
Activating all your changes, this may take a while ...
The edit lock associated with this edit session is released once the activation is completed.
Activation completed
Initializing WebLogic Scripting Tool (WLST) ...
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Connecting to t3://localhost:8888 with userid weblogic ...
Successfully connected to Admin Server "msServer" that belongs to domain "msdomain".
Warning: An insecure protocol was used to connect to the server.
To ensure on-the-wire security, the SSL port or Admin port should be used instead.
Location changed to edit tree.
This is a writable tree with DomainMBean as the root.
To make changes you will need to start an edit session via startEdit().
For more help, use help('edit').
Starting an edit session ...
Started edit session, be sure to save and activate your changes once you are done.
Saving all your changes ...
Saved all your changes successfully.
Activating all your changes, this may take a while ...
The edit lock associated with this edit session is released once the activation is completed.
Activation completed
0
[root@StorageCell0013-man unix]#
[root@StorageCell0013-man unix]# service celld status
Redirecting to /bin/systemctl status celld.service
● celld.service - celld
Loaded: loaded (/etc/systemd/system/celld.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2021-11-18 09:18:29 CET; 5 days ago
Main PID: 33770 (code=exited, status=0/SUCCESS)
Nov 18 06:53:02 StorageCell0013-man.dbaas.ing.net celld[33770]: Starting MS services...
Nov 18 06:55:15 StorageCell0013-man.dbaas.ing.net celld[33770]: The STARTUP of MS services was not successful.
Nov 18 06:55:15 StorageCell0013-man.dbaas.ing.net celld[33770]: CELL-01554: MS startup failed for unknown reasons.
Nov 18 06:55:15 StorageCell0013-man.dbaas.ing.net celld[33770]: Starting CELLSRV services...
Nov 18 06:55:34 StorageCell0013-man.dbaas.ing.net celld[33770]: The STARTUP of CELLSRV services was successful.
Nov 18 06:55:34 StorageCell0013-man.dbaas.ing.net systemd[1]: Started celld.
Nov 18 09:18:23 StorageCell0013-man.dbaas.ing.net systemd[1]: Stopping celld...
Nov 18 09:18:23 StorageCell0013-man.dbaas.ing.net celld[57157]: Stopping the RS, CELLSRV, and MS services...
Nov 18 09:18:29 StorageCell0013-man.dbaas.ing.net celld[57157]: The SHUTDOWN of services was successful.
Nov 18 09:18:29 StorageCell0013-man.dbaas.ing.net systemd[1]: Stopped celld.
[root@StorageCell0013-man unix]# cellcli -e alter cell shutdown services all
Stopping the RS, CELLSRV, and MS services...
CELL-01509: Restart Server (RS) not responding.
Getting the state of CELLSRV services... unknown
Getting the state of MS services... unknown
Getting the state of RS services... stopped
[root@StorageCell0013-man unix]# cellcli -e alter cell startup services all
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services... running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was not successful.
CELL-01554: MS startup failed for unknown reasons.
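When the startup output is captured to a file, the services that failed can be listed mechanically. The sketch below assumes the message wording shown in the transcript above. The function name `startup_failed_services` is hypothetical.

```shell
#!/bin/sh
# startup_failed_services: hypothetical helper, not an Oracle-supplied command.
# Reads the output of `cellcli -e alter cell startup services all` on stdin
# and prints the name of each service whose startup was reported as
# "not successful" (wording taken from the transcript above).
startup_failed_services() {
    sed -n 's/^The STARTUP of \([A-Z]*\) services was not successful.*/\1/p'
}
```

For the transcript above, this would print only `MS`, since the CELLSRV startup was reported as successful.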
==========
The ILOM was not reachable, so a cold restart of the storage server was performed. After that, the MS service started automatically.