Sunday, January 20, 2013

Cloud Control 12c Corrective Actions


In our buggy development environment, WebLogic managed servers fail with OOM exceptions like leaves falling from the trees in autumn. If we set aside the best fix (good code) for a moment, the second choice is to bounce the managed server and wait for the next incident. Unfortunately, in some cases you are unable to stop those servers from the console, and killing the OS process is the only option.

I do it automatically with a small process management trick and corrective actions, a really cool feature of Grid Control/Cloud Control.
I’m going to make one preliminary assumption:

  • System runs on Linux/UNIX
Let’s start with a very small OS script. It should perform two simple actions:
  1. Find the process ID by managed server name
  2. Kill it with the SIGKILL signal.
Why SIGKILL? If you kill the OS process with any other signal, Node Manager considers it a normal situation and the managed server simply shuts down; after SIGKILL, Node Manager restarts the managed server automatically. That is very convenient, because no additional actions are needed.

OS Script

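I’ve written a very simple shell script to find and kill the managed server process. It literally contains 3 commands:
#!/bin/sh
# Usage: msfire.sh <server-name-substring> <signal option, e.g. -9>
usr=`whoami`
# PID and PPID of this user's java processes whose command line matches $1,
# skipping anything whose parent is this script itself ($$)
plist=`ps -fu "$usr" | awk -v shl=$$ -v srv="$1" '$0~srv && $8~"java" {if ($3!=shl) print $2,$3}'`
kill $2 $plist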

The script lists all processes of the current user, then picks the process ID and parent process ID of every java process whose command line contains the substring passed as the first parameter. The last step kills those processes with the signal passed as the second parameter.
Save this script to a file where it will be accessible to the agent and give your group permission to execute it.
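
Before pointing the script at a real server, it can help to see which PIDs it would select. The sketch below is a dry-run variant of the same awk filter: it reads `ps -f` style output on standard input and only prints the matches. The function name list_targets and its second argument are illustrative and not part of the original script.

```shell
#!/bin/sh
# Dry-run sketch: print "PID PPID" for java processes whose command line
# contains the given substring, without killing anything.
#   $1 - managed server name (substring of the command line)
#   $2 - PID to ignore as a parent (the calling shell); may be empty
list_targets() {
    awk -v srv="$1" -v shl="$2" \
        '$0 ~ srv && $8 ~ "java" { if ($3 != shl) print $2, $3 }'
}
```

A typical call would be `ps -fu "$(whoami)" | list_targets ManagedServer1 $$`, where ManagedServer1 is a hypothetical server name. If the list contains more than you expected, make the substring more specific before handing it to kill.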

Actions Library

Now we should create the corrective action in the actions library. The library holds action templates rather than real actions, as you will see later. To create a new action, follow these steps:
  1. Open the Corrective Actions page from the main menu: Enterprise – Monitoring – Corrective Actions
  2. Select OS Command from the drop-down box in the “Create Library Corrective Action” row right above the table and press the Go button.
  3. On the General tab fill in the action Name and Description and select the Target Type. The last one is very important because it determines whether your action will be available for your targets and their metrics.
  4. Click on the Parameters tab
  5. Select Single Operation as the Command Type and type the command in the Command field. For small scripts like this one there is also a “Script” option that lets you type all the OS commands inside the action. There are pros and cons, but I prefer to keep scripts in files and call them as a single command.
  6. Cloud Control has plenty of substitution variables. They allow us to use one action for every server that we are going to instrument. The full command line should look like this (-9 is the SIGKILL signal number):
/home/oracle/bin/msfire.sh %ServerNames% -9

  7. Open the Credentials tab. You can use the preferred credentials for the host (they will be determined at runtime), define new credentials if you are sure that they will be the same on every host, or use one of the existing pairs.
  8. The last tab is Access. Here you may define the access level for other administrators or roles defined in your system.
  9. When all parameters are set, press the Save to Library button
  10. The new action will appear in the Corrective Actions list.
We are ready to apply the new action to the WebLogic servers.

Metric and Collection Settings

We have issues with only a couple of managed servers, so I’m not going to create or modify a monitoring template. We will just apply the action to the WebLogic server Status metric. To do this, follow the steps below.
  1. Locate the WebLogic server and open its target page. You may just type the server name in the global Search field and press Enter, then click on the target link
  2. From the target menu select Monitoring, then Metric and Collection Settings
  3. Find the Status metric in the Metrics table. Click the blue pencil in the Edit column
  4. In the Advanced Settings form click the Add button
  5. In the Add Corrective Action form select the “From Library …” option and click the Continue button.
  6. Select the action that we created before and click Continue
  7. You will see exactly the same form as you saw when creating the action in the library. Type a name for the new corrective action on the General tab.
  8. Press the Continue button. If your target has no preferred credentials, you may receive an error message.
  9. On the Credentials tab you may create a new credentials pair or select one of the named pairs.
  10. Click the Continue button. You will be returned to the Advanced Settings page. Check the “Allow only one corrective action …” option.
  11. Click the Continue button to save the settings for the Status metric.
  12. In the Metric and Collection Settings table you will see that the Status row now has a new value in the Corrective Actions column.
  13. Press OK to close the Settings page.

Results

Now, any time the Status metric reaches a critical value (in this case it means “Down”), Cloud Control will address the issue with the corrective action. Let’s take a look at a real incident and the actions taken.
The Cloud Control agent detects that the WebLogic server is down and notifies the Cloud Control server about it.

In accordance with the incident rules, Cloud Control notifies administrators (me, for instance) AND runs the corrective action registered for this metric; you can see additional information on the Incident Details page.

You can also check the execution steps and the command output details.



Monday, January 14, 2013

OEM Cloud Control 12c registering stand-alone Oracle HTTP Server 10g

Oracle Enterprise Manager agents are unable to discover all the software on the hosts where they are installed. This is true even for Oracle software: in practice, we cannot automatically bring a standalone Oracle HTTP Server (version 10.1.3.1 / Apache 2.0) under control.
These servers are required for the intermediate FMCNA Portal configuration, in which the applications have already been upgraded (WebLogic 11g/WebCache 11g) but the Security Services have not (Oracle SSO 10g).
To register Oracle HTTP Server 10.1.3.X in Oracle Enterprise Manager Cloud Control 12c we should complete the following tasks:

  1. Discover target host
  2. Deploy OEM CC Agent
  3. Add Oracle HTTP Server using Manual Configuration.
The first two tasks are standard procedures and are covered by the Oracle Enterprise Manager documentation.
To register a new Oracle HTTP Server 10.1.3.x, complete the following steps:
  1. From the Global Navigation Area select Setup -> Add Target -> Add Targets Manually
  2. Select the option “Add Non-Host Targets by Specifying Target Monitoring Properties”.
  3. Select the Target Type from the drop-down list
  4. Look up and select the Monitoring Agent for the corresponding component host
  5. Press the “Add Manually …” button
  6. On the next screen you will see a huge form with a lot of required fields.
  7. In our case not all of them are really necessary, so you can fill them with any reasonable value. I used NA; Oracle Support recommends putting an ‘X’ character into them. The fields that really matter are these:
    Field name        | Value                   | Comment
    Target Name       | Display name for target | Any name except special characters
    EM Target Type    | oracle_apache           | Oracle Support recommended value
    Machine name      | full.server.name.com    | Fully qualified host name
    Port Number       | 7777                    | Default Listen port for HTTP
    Version of Apache | 2.0                     | Actual Apache server version for Standalone Oracle HTTP server 10.1.3.1 – 2.0
    Oracle Home Path  | /oraclewls/admin/ohs    | Full path to Oracle HTTP Server home dir
    OPMN Port Number  | 7777                    | Listen HTTP Port
    Version Category  | StdApache               | Oracle Support recommended value

    8. Fill all the remaining required fields with placeholder values of your choice.
    9. Fill out the Global Properties to describe the target Location, Lifecycle, etc.
    10. Save the target parameters.
    11. If everything was done right, you will be able to see the target home page.

As a result, you can see only the status of OHS: no performance-related metrics, no control, no configuration changes. Thankfully, the Cloud Control developers provided the great Metric Extensions functionality.

Metric extensions in OEM 12c


For the next 6-8 months we will use a combined 11g/10g middleware infrastructure, and because of that we have to use the standalone Oracle HTTP Server 10g (10.1.3.1.0). Only this Oracle HTTP Server, built on top of Apache HTTP 2.x, is able to work with Oracle Single Sign-On 10g (mod_osso) and WebLogic servers (mod_wls) together. Unfortunately, Oracle develops its products so fast that it forgets to support its own products. Neither Oracle EM 12c nor even Oracle EM 11g can discover and collect metrics for this product. Of course there is a recommended way to register Oracle HTTP Server manually (i.e. #1274815.1 on Metalink). After that OEM is able to see the state of the product and that’s all; you cannot even stop or start it from the EM console.
The good news is that there is another, almost straightforward way to collect server metrics with OEM CC agents and the repository. In old versions of OEM you were able to define so-called User-Defined Metrics. In the newest version of OEM, Metric Extensions were introduced. They are a more powerful way to extend the monitoring abilities of OEM 12c. Let’s go through a practical task and bring the old Oracle HTTP Server 10g under monitoring. To do this we should:
  • Register Oracle HTTP Server in OEM 12c
  • Enable collection and publication of internal server statistics
  • Provide these statistics to the agent
  • Collect and publish metrics on the Management Server

Enable publication of server statistics


Oracle HTTP Server, or to be precise the underlying Apache HTTP Server, can report internal server information and load statistics. The load statistics are provided by the mod_status module (mod_info serves the separate /server-info page), and access to them is disabled in the default configuration.
All that we should do is:
1. Create a copy of your current configuration
$ cp $ORACLE_HOME/ohs/conf/httpd.conf $ORACLE_HOME/ohs/conf/httpd.conf.stat
2. Open the httpd.conf file for editing
$ gedit $ORACLE_HOME/ohs/conf/httpd.conf
3. Make sure that mod_status is loaded
4. Enable access to the /server-status location from your local domain or only from localhost.
5. We also need to enable extended statistics generation. Find the ExtendedStatus parameter and set its value to On

6. Additionally, you may enable /server-info (mod_info) as well, but remember that it is not secure.
7. Save the changes
8. Restart Oracle HTTP Server to apply the changes.
$ $ORACLE_HOME/opmn/bin/opmnctl restartproc
9. Check that you are able to see the server statistics

10. The server returns the extended statistics as a simple HTML page; with the auto parameter (/server-status?auto) it returns plain text.
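
The configuration changes in steps 3 to 5 can be sanity-checked before the restart. The sketch below greps an httpd.conf for the three mod_status settings involved: a `LoadModule status_module` line, `ExtendedStatus On`, and a `SetHandler server-status` line inside the status location. The function name check_status_conf is illustrative, and the check assumes each directive sits on a single uncommented line in the main file; a build with mod_status compiled in statically, or a configuration split across included files, would be reported as missing even though it works.

```shell
#!/bin/sh
# Sketch: report which mod_status settings are missing from an httpd.conf.
# Assumption: each directive is on one uncommented line of the given file.
#   $1 - path to httpd.conf
check_status_conf() {
    conf=$1
    rc=0
    grep -Eq '^[[:space:]]*LoadModule[[:space:]]+status_module' "$conf" \
        || { echo "missing: LoadModule status_module"; rc=1; }
    grep -Eiq '^[[:space:]]*ExtendedStatus[[:space:]]+On' "$conf" \
        || { echo "missing: ExtendedStatus On"; rc=1; }
    grep -Eq '^[[:space:]]*SetHandler[[:space:]]+server-status' "$conf" \
        || { echo "missing: SetHandler server-status"; rc=1; }
    return $rc
}
```

For example, `check_status_conf $ORACLE_HOME/ohs/conf/httpd.conf` prints nothing and returns 0 when all three settings are found.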

We are ready to automate the collection of server statistics.

Provide statistics to the agent


Even in this machine-readable mode, the server statistics that we received in the previous steps are not suitable for feeding to the OEM agent. We should prepare and format the data (i.e. remove the field names and the Scoreboard string). I have no idea how it could be done on Windows platforms without coding; fortunately I use Linux, where everything we need was created a long time ago. So we are going to:
  • Transfer the HTTP data to standard output
  • Remove unnecessary information
  • Format the output string for the OEM agent
To do this, I’ve created a small shell script that accepts two parameters (host name and port), gets the data, and transforms it
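#!/bin/sh
# Input parameters:
#   $1 - HTTP host name (default: fully qualified name of this host)
#   $2 - HTTP listen port (default: 7777)

if [ -z "$1" ]; then
hst=`hostname -f`
else
hst=$1
fi
if [ -z "$2" ]; then
prt=7777
else
prt=$2
fi
# Fetch the plain-text status, drop the field names and the Scoreboard line,
# then join the remaining values into one '|'-delimited string
echo `wget -qO- "http://$hst:$prt/server-status?auto" | awk -F ": " '$1!="Scoreboard" { print $2}' | awk '{printf "%s",$0"|"}'`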

  
I get the statistics using wget and send them to standard output.
The first awk removes the field names and the Scoreboard string.
The second awk joins the result into one string with ‘|’ as the delimiter.
Save this script; if you set appropriate default values, you should see something like this
$ bin/htpstat.sh
31046|309007|.00409926|72940|.425638|4338.13|10192.1|1|9|

We now have well-formatted data and can configure the new metric extension.
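
The transformation can also be tried offline, without a running server. In the sketch below, sample_status prints an input that is consistent with the sample output above: the values are copied from that line, the field names are assumed from the Apache 2.0 mod_status plain-text format, and the Scoreboard string is made up. format_status applies the same two awk passes as the script. Both function names are illustrative.

```shell
#!/bin/sh
# Illustrative input: field names assumed from the Apache 2.0 mod_status
# plain-text format, values copied from the sample output in this post,
# Scoreboard string invented for the example.
sample_status() {
    cat <<'EOF'
Total Accesses: 31046
Total kBytes: 309007
CPULoad: .00409926
Uptime: 72940
ReqPerSec: .425638
BytesPerSec: 4338.13
BytesPerReq: 10192.1
BusyWorkers: 1
IdleWorkers: 9
Scoreboard: W_________..........
EOF
}

# Same two passes as the script: keep only the values of every line except
# Scoreboard, then join them into a single '|'-terminated string.
format_status() {
    awk -F ": " '$1 != "Scoreboard" { print $2 }' | awk '{ printf "%s|", $0 }'
}
```

Running `sample_status | format_status` prints `31046|309007|.00409926|72940|.425638|4338.13|10192.1|1|9|`. The order of the values follows the order of the input lines, and the metric columns have to be defined in that same order.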

 Create OEM 12c Metric Extensions


Metric Extensions is a new OEM 12c feature with much more functionality than the User-Defined Metrics of previous versions. We will use only a small part of it, just enough for our needs. So, let’s create a new Metric Extension to collect the server status from Oracle HTTP Server 10g. To do it, complete the following steps:
1. Log in to the Oracle Enterprise Manager 12c console with admin privileges
2. Open the Metric Extensions page from the main menu: Enterprise -> Monitoring -> Metric Extensions
3. Press the Create button on the panel just above the extensions list
4. The wizard will guide you through all the steps of the extension creation process.
    Fill out the general extension information. Select Target Type – Oracle HTTP Server, Adapter – OS Command – Multiple Columns, and all the other information.

5. On the next screen provide the adapter configuration. You should provide the full path to the shell script and also pass two parameters: the HTTP host name and the HTTP listen port. These parameters give us the ability to deploy this metric extension to several HTTP hosts. Of course, the shell script should be copied to the same directory on each of these hosts. Press the “Next” button to define the metric extension columns.
6. On the “Columns” step you can create up to 10 metric columns and populate them with data from the adapter. On this step you may also define metric thresholds and the comparison operator. One of the columns defined here is the “Idle Workers” metric.

7. On this step you may redefine the default monitoring credentials; we are going to use the default ones, so press the “Next” button.
8. Here you can debug the metric. Add one or more targets to the Test Targets table. OEM automatically shows the targets compatible with the new extension.
9. Select the Oracle HTTP Server that you configured earlier.
10. Run the test and check the results in the table below it. If all the steps were done correctly, you will see the collected values there.
11. Press “Next” and review all the metric extension properties on one page.
12. If all the information looks fine, press “Finish” to save your new metric; otherwise you can go back and make changes
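
If the test run in the wizard returns nothing or shifts values between columns, it may help to confirm on the agent host that the script emits exactly as many values as there are columns defined. The sketch below counts the non-empty ‘|’-delimited fields of a line on standard input and checks that each one looks numeric. The function name check_metric_line is illustrative, and 9 is simply the number of values in the sample output shown earlier.

```shell
#!/bin/sh
# Sketch: exit status 0 if every input line has exactly the expected number
# of non-empty, numeric, '|'-delimited values; non-zero otherwise.
#   $1 - expected number of columns
check_metric_line() {
    awk -F '|' -v want="$1" '
        {
            n = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "") continue                  # trailing delimiter
                if ($i !~ /^[0-9]*\.?[0-9]+$/) bad = 1  # not a plain number
                n++
            }
            if (n != want) bad = 1
        }
        END { if (NR == 0) bad = 1; exit bad }'
}
```

For instance, `bin/htpstat.sh | check_metric_line 9 && echo OK` prints OK only when the script output matches the nine expected columns.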
Our new metric is created and ready. To start collecting metric values, we should deploy it to targets. To do it:
13. From the Metric Extensions list, select your metric.
14. From the Actions drop-down menu select Deploy To Targets …
15. Press the Add button to find and add existing targets to the table
16. Click the “Submit” button below the targets list.
17. In the Metric Extensions list select your new extension and in the Actions drop-down select Publish Extension.
After deployment and publication, the new metrics are available for the selected targets. To check the values:
   18. Open the target’s home page. From the target’s main menu select Monitoring -> All Metrics

You will see the metrics tree on the left and the metric values in the main area.

Another option is to use the Performance Summary screen from the Monitoring menu.
With the new metrics you can build and save charts of the metric values and compare them with other targets or with another time frame.