RCUG 7 Selected Topics

RootCause User Guide

Selected Topics


This chapter contains discussions of various RootCause topics that may be of interest to you, the RootCause user.

Advanced Capabilities: Aprobe is Underneath

RootCause is based on the OC Systems Aprobe tool. This is why RootCause is installed (by default) in a directory called "aprobe". All of the power of Aprobe can be used with RootCause. Specifically, one can write probes using Aprobe and then copy the UALs into the workspace and those probes will be automatically included as part of the next RootCause run.

Probes can do just about anything. Typical uses are:

  • Custom formatting of data
  • Data dependent logging
  • Defining alerts
  • Deploying a fix to a user

See the Aprobe user guide for more information about Aprobe. See "Writing Custom Probes" for how to include hand-written APC in your RootCause Workspace.

RootCause and Efficiency Concerns

RootCause should preferably be installed on a local file system. It will work when installed on a remotely mounted file system, but performance may suffer.

The RootCause workspace should be created on a file system that is local to the machine on which the traced process will be run. The data logged by RootCause is written to the workspace. If the workspace is remote, then the logged data will have to be transmitted across the network, increasing the overhead of logging as much as tenfold. See also, "RootCause Data Management".

RootCause adds probes to the application in memory. These probes are optimized machine code, so while they are fast, they must of course add overhead to the execution of the application. RootCause only "patches" the traced functions and methods. For Java, RootCause inserts byte code to only trace the methods of interest, not all methods.

Furthermore, RootCause tracing applies "load shedding" to automatically turn off tracing of functions that introduce high trace overhead. Such functions can then be removed from the trace specification by the user in the next run. Using this mechanism and by adjusting the load shedding level, one can quickly get to an acceptable level of overhead. See "RootCause Overhead Management".

Typically, we have seen that one can add a 5% load and still get a useful trace. In general, you will have to iterate to define a good trace that adds a reasonable load so the application can still run in the operational environment. RootCause supports this workflow by allowing one to choose (and remove) trace items from the viewer, which speeds the removal of "noise" routines (those that add little value to the trace).

Note that a program being probed by RootCause will take somewhat longer to start. Typically, a few extra seconds are required for a RootCause session on an application. This minimal overhead is incurred because RootCause does as much as possible up front, to reduce the runtime penalty later.

Solaris SETUID, and Security Concerns

This section briefly describes how RootCause / Aprobe can be used with certain "secure" applications on Solaris. These mechanisms are not yet provided for other platforms; contact OC Systems for more information.

The Solaris operating system provides a secure environment for debugging and running your applications. RootCause and Aprobe do not interfere with this mechanism but extend it to work safely in a number of environments that require it.

For the purposes of this document, a secure application is one that has the setuid bit set. We discuss how the Solaris security mechanism works with these applications and how Aprobe and RootCause provide their own extensions to the Solaris security protections to allow you to safely run probes on these applications without compromising system security.

Note that this document does not discuss applications with the setgid (group) bit set. At the time of writing, Aprobe and RootCause do not support running such applications.

Avoiding Solaris Warnings

Even if you do not wish to probe secure applications, you may want to place libapaudit.so in the secure location anyway to eliminate error messages. If you do not do this and try to run RootCause on an application that has the SETUID bit set, you will get error messages like:

ld.so.1: mail: warning: /opt/RootCause/lib/libapaudit.so: open failed: illegal insecure pathname 
ld.so.1: mail: fatal: /opt/RootCause/lib/libapaudit.so: audit initialization failure: disabled. 

Although these look like fatal errors, the application runs without error; only the loading of libapaudit.so fails.

Placing libapaudit.so in the secure location as described below will allow libapaudit.so to load for SETUID applications like /usr/bin/mail, so it can determine whether to probe the new process or not.

Note that just placing libapaudit.so in the secure location does not allow one to actually probe the SETUID application unless one is running as the effective user.

The secure path for dynamically-loaded libraries is different on each version of Solaris. This logic is encapsulated in a script, rootcause_libpath.

The simplest usage is:

  1. Log on as root so you have write access to /usr/lib and its subdirectories.
  2. Set up for using RootCause, e.g., . /opt/RootCause/setup (see "The Setup Script").
  3. Run the command:
    rootcause_libpath -c
    This will copy the appropriate library to the secure locations. These locations are under /usr/lib, so you must be super-user. The script assumes that you are set up for RootCause, so you must run the RootCause setup script first. You should see output like:
    /usr/lib/libapaudit.so correctly installed.
    /usr/lib/secure/libapaudit.so correctly installed.
    /usr/lib/64/libapaudit.so correctly installed.
    /usr/lib/secure/64/libapaudit.so correctly installed.
  4. Log off root on this machine.
  5. You will need to do this on each machine on which you use RootCause.
  6. After doing this, you will need to do rootcause_off, then rootcause_on again to pick up the new values.

Description of Solaris Security

This section briefly describes the Solaris security measures that are relevant to RootCause / Aprobe. It should be noted that each version of Solaris has its own subtle variations on this. All examples given are for Solaris 8 and over although, with the exception of Solaris 2.5.1, RootCause and Aprobe can be expected to behave identically on older versions as far as security goes. (Solaris 2.5.1 has overly tight restrictions that were corrected in later versions.)

The first concept that must be understood is that every executable run has two users associated with it at runtime. The first is the "real" user, the logged in user - the user shown when you use the command "id". The second is the "effective" user which really governs the permissions you have during runtime.

(One important point is that if the real user is root, all security mechanisms are effectively disabled because they are moot. One practical result of this is that you may use Aprobe on any application if you are logged in as root).

Normally the real and the effective user are the same. If, however, the setuid bit is set on an application, the operating system changes the effective user to match the owner of that application. Most commonly this is the root user and is done to give a regular user temporary access to a limited set of secure resources.

Let's take the "/usr/bin/at" command as an example. The output from "ls -l" might look like this:

   -rwsr-xr-x   1 root sys  37876 Jul 10  2000 /usr/bin/at

Note that instead of an 'x' where we would expect the owner's executable bit, we see an 's'. This means that the application will run with the effective user root, with all the permissions that that allows.
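As a rough illustration, the same check can be made from a shell without reading "ls -l" output by eye. This is a minimal sketch using the standard "-u" file test; the function names is_setuid and describe_app are made up for this example and are not part of RootCause or Aprobe.

```shell
# Succeeds when the file has the setuid bit set (the 's' in the owner
# execute position of "ls -l" output).
is_setuid() {
    [ -u "$1" ]
}

# Print a one-line classification of a program (hypothetical helper).
describe_app() {
    if is_setuid "$1"; then
        echo "$1: setuid (secure application)"
    else
        echo "$1: not setuid"
    fi
}
```

For example, describe_app /usr/bin/at reports whether that file is setuid on the system where it is run.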

What would happen if we were allowed to attach a debugger to this application? Suddenly we would be able to cause the application to execute arbitrary instructions as if it were root! To prevent this, the operating system will prevent the debugger interface being used in such a situation. (Again, if you are actually logged in as root, you will be allowed access).

Another aspect of security for these applications is where they load their libraries from. Obviously the application can have a set of specific libraries linked in and these can be safely loaded. But the runtime linker also provides a way to load arbitrary additional shared libraries using the LD_PRELOAD and LD_AUDIT runtime linker environment variables. Once again it would be a security risk if any library could be specified, so the operating system only allows libraries in "secure" paths to be loaded by these environment variables.

Impact of Security Measures on Aprobe

When we run the "aprobe" command on an executable, we start out life as a debugger, patching in the probes that we've specified. Once this is done, the "aprobe" executable detaches from the application and goes away. As was mentioned above, Solaris will not allow the use of the debugger interface on a secure application. Aprobe will specifically check for this so it can give a more friendly warning if you try to run it:

$ aprobe /usr/bin/at 
(E) /usr/bin/at
This file is owned by root and has the setuid bit set.  
You need to use the secure version of aprobe (saprobe) to run this
application under Aprobe. Please see the section on secure applications
in the Aprobe user's guide.

As this error describes, there is a secure version of Aprobe that allows us to run on these applications. In fact, there are three ways we could run this application:

  1. Log in as root. As was mentioned above, security restrictions are moot for the root user and so Aprobe will run fine.
  2. If you could rebuild or relink the application, you could link in the libdal.so file that allows an executable to patch itself. The use of this is outside the scope of this document but you can find more details in the Aprobe user's guide.
  3. Use the secure version of Aprobe mentioned above - saprobe. The secure version itself has the setuid bit set so that it runs as root and can attach to the application.

It doesn't take much thought to realize that option (3), if implemented blindly, could leave a big security hole in your application. But, of course, it isn't implemented blindly. When you run saprobe on an application, the application must be listed in $APROBE/lib/secure_applications. This file is created so that it is only writable by root, and Aprobe checks that this is still the case at runtime before allowing its use. Let's see what happens when we try to run an application without an entry for it:

$ saprobe /usr/bin/at
(W) /usr/bin/at
You are running a secure application but the secure_applications file
did not contain an entry for it.
(F) Aprobe will not run this application due to security restrictions. Please see the section on secure applications in the Aprobe user's guide.

The second level of checking is that the files loaded by Aprobe - the runtime libraries and the UALs - must all be owned by root and not writable by anyone else. Additionally, for all UALs except the default system_ual, an entry for them must exist in the secure_applications file under that application. If it doesn't:

$ saprobe -u trace /usr/bin/at
(W) "/app1/aprobeinst/fred/aprobe_sun_50/ual_lib/trace.ual":
This ual is not valid for your secure application. It must be listed in the secure_applications file under this application. 
(F) Aprobe will not run this application due to security restrictions. Please see the section on secure applications in the Aprobe user's guide.
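
The ownership rule above can be approximated with standard tools before running saprobe. The following is only a sketch: the function name is_root_only is hypothetical, and it checks just the two conditions stated above (owned by root, not writable by group or others). It does not reproduce the checks that Aprobe itself performs.

```shell
# Succeeds when the file is owned by UID 0 (root) and is not writable
# by group or others. Hypothetical helper for illustration only.
is_root_only() {
    # -prune keeps find from descending if a directory is given;
    # "-perm -020" matches group-writable, "-perm -002" world-writable.
    [ -n "$(find "$1" -prune -user 0 ! -perm -020 ! -perm -002 -print 2>/dev/null)" ]
}
```

A runtime library or UAL that fails this test would need its ownership or permissions corrected by root before it could satisfy the rule described above.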

The format of the secure_applications file is defined in its header. However, it is pretty trivial. For each application we allow we have an "APPLICATION" keyword followed by any number of "FILE" keywords. Another APPLICATION keyword automatically ends the list of allowed files. For instance:

APPLICATION /usr/bin/at
FILE /app1/aprobe/inst/fred/aprobe_sun_50/ual_lib/trace.ual
FILE /opt/product/probes/myprobe.ual
APPLICATION /usr/bin/another_app ...
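
To see which UALs an application is permitted to load, the file can be scanned with awk. This sketch assumes only what is described above: whitespace-separated APPLICATION and FILE keywords, each followed by a path. Any other lines (such as the file's header) are ignored, paths containing spaces are not handled, and the function name allowed_files is invented for this example.

```shell
# Print the FILE entries listed under one APPLICATION entry.
# Usage: allowed_files <secure_applications file> <application path>
allowed_files() {
    awk -v app="$2" '
        # A new APPLICATION line ends the previous list of files.
        $1 == "APPLICATION" { in_app = ($2 == app); next }
        # Print FILE paths only while inside the requested application.
        $1 == "FILE" && in_app { print $2 }
    ' "$1"
}
```

For instance, allowed_files "$APROBE/lib/secure_applications" /usr/bin/at would print one permitted path per line, or nothing if the application has no entry.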

Impact of Security Measures on RootCause

RootCause builds on top of Aprobe and so has the same protections described above. However, the RootCause intercept mechanism is based on the LD_AUDIT environment variable and must be managed appropriately.

By default, if you set LD_AUDIT to a specific path, Solaris will not load that audit library when the application is run. Later versions of Solaris report this with a misleading error message that describes the condition as fatal, which it is not.

If, however, the audit library is in a secure location and the LD_AUDIT environment variable is appropriately set, it will be loaded by the runtime linker. The path to that library varies between versions of the O/S but, on Solaris 8 and higher, is /usr/lib/secure.

So, to allow RootCause to intercept secure applications, the audit library is placed within here. In order that this does not create a security risk in itself, RootCause ensures that it will only run an application under RootCause if the workspace's script file is secure. If it isn't, you'll get an error message and the application will be run without RootCause.

By this mechanism, we safely control access to the scripts that will execute Aprobe and trigger the protections that Aprobe introduces.

Using the Secure Version of RootCause / Aprobe

The first step that must be taken is to provide appropriate ownership, permissions and location of certain RootCause files. A normal installation of RootCause does not have a secure version of Aprobe, does not locate the audit libraries in secure paths, and may not have appropriate ownership of runtime libraries and UALs.

To create a secure environment, you must log in as root and run the rootcause_libpath script. This takes a number of parameters and must be run on each machine on which you wish to use the secure version of RootCause.

There are two main parts to this:

  1. Creation of the secure Aprobe files. This must be performed once for a given installation of RootCause / Aprobe. In many networks it must be done on the machine that the installation is directly mounted on (e.g. many NFS mounted filesystems do not allow root write access from across the network). The command to update the installation is rootcause_libpath -s
    This is described in more detail in "Avoiding Solaris Warnings".
  2. Creation of the secure RootCause files. This must be performed once on each machine on which you wish to intercept secure applications. The command to do this is rootcause_libpath -c
    Note that you can combine this and the "-s" option where appropriate.

A secondary step for RootCause is to define the workspace as secure. When creating a workspace, check the "Secure Application" checkbox to mark the workspace as secure. This will create runtime scripts that invoke the secure version of Aprobe. If, at a later time, you wish to change the security property of the workspace, you can change it in the Aprobe options tab of the RootCause options dialog (accessed from the Setup menu).

Note that if you build a secure workspace for a non-secure application or vice-versa, you will get error messages at runtime.

64 bit applications

64 bit applications are not yet supported by RootCause. If you require this support, please let us know.

Source and Application Debug Information

RootCause is intended to be used in a production environment, and in a production (stripped executable) environment, neither source code nor debug info is available. Therefore, RootCause traces may be applied to an application with no symbols or debug information. However, symbolic information is required when the traces are developed using the RootCause GUI. This is discussed in detail in "Building a "Traceable" Application".

Logging Controls

One of the most fundamental features of RootCause is a robust and fast logging mechanism, both for persistent and wraparound data collection.

RootCause chooses sane defaults for logging, but you may want to change them. There are several main user-selectable options for logging application data provided in the RootCause Options Dialog.

See "RootCause Data Management" for more information.

Multiple Application Tracing

Each application puts its trace data into an application specific workspace. This mapping of application to workspace is defined in the registry.

When viewing trace data, RootCause can add trace data from other applications/workspaces, so that you can view a fully integrated process trace. The traces are automatically ordered so there is a coherent time line for all traced applications.

RootCause collects data into separate files to eliminate contention for a single logging buffer. For example, if you are tracing 10 processes and all 10 are trying to write to the same buffer, then there will be heavy contention for that buffer and performance will suffer. RootCause solves this problem by logging the data into independent application-specific workspaces and then combining the traces in the GUI viewer.
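
The idea of merging independently logged traces into one time line can be illustrated with ordinary text files. This sketch is not how the RootCause viewer is implemented; it assumes a hypothetical text format in which every line begins with a numeric timestamp and each input file is already in time order. The function name merge_traces is made up for this example.

```shell
# Merge per-process trace files, each already sorted by time, into a
# single stream ordered by the numeric timestamp in the first field.
# The "timestamp message" line format is an assumption for illustration.
merge_traces() {
    sort -m -n -k1,1 "$@"
}
```

Because each process writes only to its own file, no locking is needed while logging; the ordering cost is paid once, when the traces are combined for viewing.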

A trace is merged with an existing trace using the Add Selected Process Data operation in the Trace Display Popup Menu of the Trace Display window. You can then use Save As XML or Save As Text to save this merged trace for future examination.

This is illustrated by the Advanced demo delivered with RootCause in $APROBE/demo/RootCause/Advanced. See the README.html file in that directory for a detailed description of that application, the separate Java and C++ portions, and the merging of combined traces.

The ability to view a single time line trace of multiple processes (even on SMP computers) is a very powerful feature of RootCause.

Multiple Executions of a Single Application

It is not uncommon in production environments for a single application to have multiple processes executing simultaneously. RootCause handles this by tracing each process independently.

As mentioned previously, each application has a workspace.

In the workspace there are a number of Process Data Sets.

RootCause automatically reuses the oldest of these process data sets upon each new invocation of the registered application. The number of process data sets to keep is specified with "Keep logged data for N previous processes" in the RootCause Options Dialog.

So if you wish to trace a total of 10 simultaneous executions of your application, you will tell RootCause to create at least 10 process data sets in the workspace. Note that this mechanism can also be used to save serial executions of a process. For example, if you would like to trace the last 4 executions of the registered application, tell RootCause to keep 4 previous processes.
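
One way to picture the reuse of the oldest process data set is as a round-robin over N slots. The sketch below is a simplified model for illustration only, not a description of RootCause's internal bookkeeping; the function name slot_for_run and the numbering of runs from 0 are assumptions.

```shell
# Simplified model: with N data sets kept, run number k (counting from 0)
# lands in slot k mod N, so each new run replaces the oldest data.
# Usage: slot_for_run <run number> <number of data sets kept>
slot_for_run() {
    echo $(( $1 % $2 ))
}
```

With 4 data sets kept, runs 0 through 3 fill slots 0 through 3, and run 4 overwrites slot 0, which by then holds the oldest data.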

See "RootCause Data Management" for more information.

Dynamically Loaded Libraries

The Trace Setup Dialog shows the shared libraries that are statically linked into the program. However, it's common for programs to dynamically (programmatically) load additional libraries, which you also may want to trace.

If you know that a library is loaded dynamically and desire to write probes for that library, then use the Workspace->Add Dynamic Module menu item to add it to the Program Contents Tree.

NOTE: This will allow functions in the library to be traced, and will also force the library to be pre-loaded, before the start of execution. If other libraries or state data must be initialized prior to loading the library of interest, then such pre-loading will break the application, and you will not be able to trace that library.

Libraries with No Debug Information

The RootCause Console GUI takes advantage of Aprobe's APC translator to provide function prototype information for C object modules in a shadow header file.

A shadow header file is a legal C header file, containing C type and function prototype definitions and C preprocessor directives (such as #include). The information in this file supplements the information in a compiled object module of the same name, resulting in more useful traces and custom probes.

When you click on the name of a compiled module, say "libc.so", in the Trace Setup Dialog, that module is opened and searched for debug information provided by the compiler. Then, a shadow header file corresponding to that module (in this case, "libc.so.h") is searched for, and if found, the information found there is correlated to the symbols read from the module. This results in otherwise "unknown" functions being grouped according to the header file from which they are read, and having parameter type (and often name) information.

Shadow header files are searched for in a "shadow" subdirectory of the .rootcause Directory (e.g., ~/.rootcause/shadow/libm.so.h), and if not found there, in $APROBE/shadow.

OC Systems provides only one or two sample shadow header files on each platform. You're encouraged to add your own, and to contact OC Systems if you need help developing a header file for a particular library. Note that you don't have to provide all the prototypes in the library, only those you need. Conversely, if there are a few extras that aren't in the shadowed library that's okay, too -- they'll be ignored.

The easiest way to create such a file is simply to add #include preprocessor directives for existing C header files provided with your system or compiler. Note that these must be C header files ending in .h, not C++ header files. These are preprocessed using the same environment (include path and preprocessor definitions) as the APC files, but you can edit the files and add your own #define directives as necessary.
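
As a sketch of that approach, the following shell function writes a shadow header consisting only of #include directives. The function name make_shadow_header is hypothetical; the directory and file naming follow the ~/.rootcause/shadow/libm.so.h example given earlier, and math.h is assumed to be the system header that declares the functions of interest in libm.so.

```shell
# Write a shadow header that simply includes existing C header files.
# Usage: make_shadow_header <shadow dir> <library name> <header>...
make_shadow_header() {
    dir=$1
    lib=$2
    shift 2
    mkdir -p "$dir" || return 1
    # One #include line per header named on the command line.
    for h in "$@"; do
        printf '#include <%s>\n' "$h"
    done > "$dir/$lib.h"
}

# Example call (assumed header choice):
#   make_shadow_header "$HOME/.rootcause/shadow" libm.so math.h
```

The resulting file can then be edited by hand to add #define directives or individual prototypes as needed.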

Tracing Java and C++ In One Program

RootCause is designed to support both Java and compiled-language probes and traces in a single application. To do this, you will need a license for both RootCause for Java and RootCause for C++; contact OC Systems if you have questions about this. The RootCause GUI itself is an example of mixing Java and C in an application. It is implemented in Java, but has significant portions of its functionality implemented in C, which is dynamically loaded by Java. To see the Java/C interaction in a trace, one would:

  1. Open a Java Workspace for the Java main class of the application.
  2. Use Workspace->Add Dynamic Module to specify the dynamic C/C++ library that will be loaded.
  3. Click Setup to show the Java classes and dynamically loaded module, and define your traces as usual.

Another common scenario is when a C++ application creates another process to act as its GUI, and communicates with it by sockets. In this case, one creates separate workspaces for the compiled and Java parts of the application, and merges the results, as described in "Multiple Application Tracing".

RootCause Shipped as Part of Your Application

RootCause is designed to solve problems from a single occurrence while simultaneously reducing support costs. While you can wait until a user reports a problem and then use RootCause to debug it, it is an intended use of RootCause that you include it as part of your application, so your application is always logging trace data. Whenever a user encounters a problem, they merely send you the RootCause collect file, and the root cause analysis of the problem is performed from that file. This greatly simplifies the reporting and debugging of problems. In some cases, for particularly difficult problems, you may have to send a more focused trace to the user site to complete the analysis of the problem, but the RootCause workflow is optimized to do this.

Note that RootCause can also be used to deliver a fix to the user. This is an advanced feature in which the fix is implemented in Custom APC, and compiled into the workspace. Contact OC Systems if you want to do this.

If you plan to include RootCause as part of your shipped application, we suggest that you contact OC Systems support to enter into a discussion with one of our technical staff. It is not difficult, but we can discuss various issues with you to save time and effort.

Writing Custom Probes

The RootCause GUI automatically generates probes and traces. However, it also allows you to modify the probes that are generated using the full power of the APC language, a superset of C. In the Trace Setup Dialog, after one has selected some probe actions to perform, click the Custom button at the bottom of the dialog. This pops up the Generate Custom APC Dialog, which allows you to save the text of the probe to an external text file, rather than including it in the workspace. You can then edit this file, adding conditions, logging only specific fields of data, or anything else supported by Aprobe or C.

When you have a custom APC file to be included with your probes, you specify its path in the Custom APC Files text field of the Build Options Tab of the RootCause Options Dialog. This file will then be automatically included when you Build your probes.

You can also use Aprobe to develop separate UALs, and include them by copying them into your workspace. You may also specify the UAL in a more formal way using the Add UAL Dialog (see "Add UAL").

You can even specify a custom Java GUI to configure your UAL, and pass parameters to it.

See CHAPTER 8 - "RootCause GUI Reference", for a full description of these items and dialogs.



Copyright 2006-2017 OC Systems, Inc.
