(Top)
1 Overview
2 Getting started
3 Base concepts
4 C++ Instrumentation API
5 C++ Instrumentation configuration
6 Python Instrumentation API
7 Scripting API
7.1 Exceptions
7.2 Module initialization
7.2.1 initialize_scripting
7.2.2 uninitialize_scripting
7.2.3 set_external_strings
7.3 Process
7.3.1 process_launch
7.3.2 process_connect
7.3.3 process_is_running
7.3.4 process_get_returncode
7.3.5 process_get_stderr_lines
7.3.6 process_get_stdout_lines
7.3.7 process_stop
7.4 Program
7.4.1 program_cli
7.4.2 program_set_freeze_mode
7.4.3 program_get_frozen_threads
7.4.4 program_wait_freeze
7.4.5 program_step_continue
7.5 Data
7.5.1 data_collect_events
7.5.2 data_clear_buffered_events
7.5.3 data_configure_events
7.5.4 hash_string
7.5.5 data_get_unresolved_events
7.5.6 data_get_known_threads
7.5.7 data_get_known_event_kinds
7.6 Troubleshooting
8 More
This chapter describes the API of the Python scripting module.
Such script can remote interact with any instrumented program with Palanteer
, independently of the used language (C++ or Python at the moment).
Some typical usages are:
This API aims at being simple to use. As multi-threading would introduce a lot of complexity for few benefits, calls are synchronous and work through polling rather than blocking APIs.
When the module encounters a nonrecoverable problem, a specific exception is raised with an explanatory message.
This section presents the list of exceptions and when they are raised.
InitializationError is raised when calling an API which requires that the module is initialized and it is not:
class InitializationError(Exception):
"""The Palanteer library has not been initialized. Use the function 'initialize_scripting'."""
ConnectionError is raised when calling an API which requires a connection to the instrumented program and there is none:
class ConnectionError(Exception):
"""There is no connection to the program."""
UnknownThreadError is raised when calling program_step_continue with an unknown thread name:
class UnknownThreadError(Exception):
"""The thread with the provided name is unknown."""
This function initializes the scripting module. It shall be called once before any other call.
The declaration is:
# port : the TCP port that is used to communicate with the instrumented program
# log_func: the logging function for the Palanteer library (debug)
# no returned value
def initialize_scripting(port=59059, log_func=_default_log_func)
The default logging function prints warning or errors on stdout for debugging purposes. It can thus be overriden.
_default_log_min_level = 2
def _default_log_func(level, msg):
if level<_default_log_min_level:
return
date_str = datetime.datetime.today().strftime("%H:%M:%S.%f")[:-3] # [:-3] to remove the microseconds
level_str = "[%s]" % {0: "detail ", 1: "info ", 2: "warning", 3: "error "}.get(level, "unknown")
print("%s %-9s %s" % (date_str, level_str, msg))
An example of call is:
initialize_scripting()
This function uninitializes the scripting module.
If a launched program is still running, it is automatically stopped.
The declaration is:
# no returned value
def uninitialize_scripting()
This function sets the lookup to recover the original content of obfuscated strings.
The lookup generation is explained here.
The lookup is persistent until another one is installed.
The string resolution based on this lookup is done at two moments:
For these reasons, the string lookup shall be installed before configuring the event specification and running the program.
The declaration is:
# filename: the filename which contains the lookup content (text format)
# lkup: a Python 'dict' with integer keys and string value
def set_external_strings(filename=None, lkup={})
If the two parameters are provided, the final lookup is the union of the file content and the Python dictionary.
An example of call is:
set_external_strings("lookup.txt")
This function launches an instrumented program and connects to it.
If the connection fails or another program is already connected, a ConnectionError exception is raised.
The declaration is:
# program_path : path of the program to launch
# args : list of command line arguments
# record_filename: filename for the record. Empty string means no storage
# pass_first_freeze_point: if True, returns only after first freeze point is met and released (kind of synchro)
# capture_output : if True, reads stdout and stderr and allows the user to poll them asynchronously
# cli_to_quit : if not None, the CLI command to call to quit the application. If it fails, it falls back to other termination mechanisms
# connection_timeout_sec : timeout in second to have a connection with the program. After it expires, a ConnectionError exception is triggered
# no returned value
def process_launch(program_path, args=[], record_filename="", pass_first_freeze_point=False, capture_output=False, cli_to_quit=None, connection_timeout_sec=5.)
An example of call is:
process_launch("./build/bin/testprogram", ["collect"], capture_output=True)
This function connects to an externally launched instrumented program, which should use the option “wait for server”.
If the connection fails or another program is already connected, a ConnectionError exception is raised.
The declaration is:
# record_filename: filename for the record. Empty string means no storage
# pass_first_freeze_point: if True, returns only after first freeze point is met and released (kind of synchro)
# cli_to_quit : if not None, the CLI command to call to quit the application. If it fails, it falls back to other termination mechanisms
# connection_timeout_sec : timeout in second to have a connection with the program. After it expires, a ConnectionError exception is triggered
# no returned value
def process_connect(record_filename="", pass_first_freeze_point=False, cli_to_quit=None, connection_timeout_sec=5.)
An example of call is:
process_connect()
This function provides the state of the launched program:
The declaration is:
# state: boolean state True if running, False if not launched or exited
state = process_is_running()
An example of call is:
while process_is_running():
... # Do something
This function provides the return code of the launched program if finished, else None
The declaration is:
# returncode: process return code if finished, else None
returncode = process_get_returncode()
An example of call is:
if not process_is_running():
print("Exit status is: %d" % process_get_returncode())
This function returns the currently received stderr output in case the program has been launched with the parameter capture_output=True
.
This output is buffered in memory waiting for the next call. The buffer is cleared after the call.
The declaration is:
# lines: a list of strings (one per line)
lines = process_get_stderr_lines()
An example of call is:
lines = process_get_stderr_lines()
if lines:
print("\n".join(lines))
This function returns the currently received stdout output in case the program has been launched with the parameter capture_output=True
.
This output is buffered in memory waiting for the next call. The buffer is cleared after the call.
The declaration is:
# lines: a list of strings (one per line)
lines = process_get_stdout_lines()
An example of call is:
lines = process_get_stdout_lines()
if lines:
print("\n".join(lines))
This function “terminates” a launched program with the following mechanisms in order, until the process is effectively ended:
The declaration is:
# no returned value
def process_stop()
An example of call is:
process_stop()
Any running instrumented program is stopped when the palanteer
module is uninitialized or when the python script exits (except with SIG_KILL...).
This property prevents pollution by zombie processes in case of script crash for instance.
This set of commands is functional only if the instrumented program was not compiled with the flag PL_NOCONTROL
set to 1.
Indeed, this directive explicitly removes the remote control feature, leaving only the observation part.
This function calls a remote CLI (Command Line Interface) to be executed on the instrumented program side.
The input is a string which contains the command name and its parameters, see here for full description.
The output is a tuple (status
, text_answer
).
A null status means a successful call, else a failure. In the latter case, the text_answer shall contain some details on the error.
The declaration is:
# command_string: the command string including the command name and the parameters
# status : status code integer value. 0 means success, else failure
# text_answer : text answer from the command. In case of failure, shall contain some details on the error
status, text_answer = program_cli(command_string)
An example of call is:
status, text_answer = palanteer.program_cli("config:setRange min=300 max=500")
if status!=0:
print("Error: %s" % text_answer)
This function controls the “freeze mode” state on the program side.
If enabled, each thread of the program hitting a plFreezePoint()
call is stopped, waiting for an order from the script to continue (either disabling the freeze mode, either using program_step_continue
)
The typical usage is to control the dynamic of the program to safely change its configuration or stimulate it. This can be associated to a dedicated event filtering.
The declaration is:
# state: boolean
# no returned value
def program_set_freeze_mode(state)
An example of call is:
program_set_freeze_mode(True)
This function returns the list of names of the currently frozen threads.
The declaration is:
frozen_thread_list = program_get_frozen_threads()
An example of call is:
for thread_name in program_get_frozen_threads():
print(thread_name)
This function returns the frozen threads among the provided ones either as soon as all provided threads are frozen, either because the timeout expired.
The declaration is:
# thread_names : list of names of threads that we expect to be frozen before the timeout
# timeout_sec : maximum duration in second to wait for all provided threads to be frozen.
# frozen_thread_list: list of names of the frozen threads restricted to the list of provided ones as input (the list can be a string in case of only 1 thread)
frozen_thread_list = program_wait_freeze(thread_names, timeout_sec=1.0)
An example of call is:
if program_wait_freeze("Workers/Worker 1", 0.2):
... # Call a CLI (for instance)
This function releases a selection of “frozen” threads from their current freezing point. It has an effect only on the frozen threads.
If one of the input thread is unknown, an UnknownThreadError
exception is raised.
After sending the command to the remote program, the function waits up to the timeout that each thread changes its frozen state.
The declaration is:
# status: True if all input threads have changed their state, else False
status = program_step_continue(thread_names, timeout_sec=1.) # Note: the list can be a string in case of only 1 thread
An example of call is:
program_step_continue(["Workers/Worker 1", "Main"])
This set of commands is functional only if the instrumented program was not compiled with the flag PL_NOEVENT
set to 1.
Indeed, this directive explicitly removes the code of the events generation, leaving only the control part (CLIs).
This function returns the list of the received events from the selected ones.
This list is cleared after being returned, so only new events are provided. It is also cleared when calling events_clear_buffered
or data_configure_events
.
The call to data_collect_events()
returns only when at least one of these conditions is met (see parameters in data_configure_events
):
wanted
events have been seen at least once
unwanted
events has been seen once
frozen_threads
are frozen
max_event_qty
timeout_sec
seconds
Only the events selected with data_configure_events
are received, in chronological order of the parent's end date (if any parent).
The declaration is:
# wanted: stops the event collection as soon as all event names in this list are found (order does not matter)
# unwanted: stops the event collection as soon as one event name in this list is found
# frozen_threads: stops the event collection as soon as all provided threads are frozen
# max_event_qty: stops the event collection as soon as this quantity of event is collected
# timeout_sec : stops the event collection as soon as the function duration exceeds this timeout
# events: a list of collected Evt objects sorted by "end date"
events = data_collect_events(wanted=[], unwanted=[], frozen_threads=[], max_event_qty=None, timeout_sec=1.)
An example of call is:
for e in palanteer.data_collect_events(unwanted=["Error"], frozen_threads=["Main"]):
print(e)
An Evt object has the following fields:
Field | Description |
---|---|
thread | thread name |
kind | the type of event, among data , log , lock use , lock wait , lock ntf |
path | a list of names from the root event to the captured event name. The event name is always path[−1] |
value | the value of the event. The type depends on the event. For “scopes”, the value is the duration in nanosecond |
date_ns | the event date in nanosecond. The origin is the date of the connection to the program |
spec_id | the event spec index in the configuration list where this event comes from (see data_configure_events )) |
children | a list of nested events if the event is a “hierarchical parent” |
data_configure_events
)
This function clears the buffered event waiting in the module for a polling call to data_collect_events
.
It is “deeper” than a dummy call to data_collect_events
(i.e. and ignoring its content) because it also clears the events buffered inside the dynamic library.
Its typical use is to start a fresh scenario after a non-synchronized CLI call which affects the generated events.
The declaration is:
# no returned value
def data_clear_buffered_events()
An example of call is:
data_clear_buffered_events()
This function configures the subsequent reception of events via data_collect_events
. This configuration includes both the polling process and the content of the event list.
The declaration is:
# specs: list of EvtSpec objects defining the selection of events
def data_configure_events(specs)
Definition: An EvtSpec
is a specification of a simplified subtree of the event hierarchy of a particular thread, as defined in the instrumented program.
Its parameters are:
Example of usage:
# Specify an EvtSpec. It can be prepared in advance and reused multiple time
my_selection = palanteer.EvtSpec(thread="MyThread", events=["first event", "top event/**/child event", "value"], parent="Task/Top node")
# Apply the event selection from now on
palanteer.data_configure_events(my_selection) # Note: with 1 event spec, the list can be replaced by the lonely spec
EvtSpec
objects so that several use cases can be processed at the same time. Evt
object contains the field spec_id
which refers to the event spec index in this list.
Definition: An event path is a way to specify the location of an event in the hierarchical tree.
Its format is similar to Unix paths: an ordered list of event name separated with a slash '/'.
*
is a wildcard for 1 level. It shall be used alone (not a wildcard on a part of the name)
Tasks/*/iteration number
"
**
is a wildcard for any quantity of levels
**/event name
" is equivalent to "event name
")
Tasks/**/iteration number
"
.
forces the location to match the root (either the parent for an element with parent, or the tree root in other cases)
Tasks/./Sub Task
" is equivalent to "Tasks/Sub Task
"
/Tasks/iteration number
"
When no parent event path is specified, the corresponding return list via data_collect_events
is a flat list of “chronological” (see note below) events without children.
When a parent event path is specified, the corresponding return list is a list of “chronological” parent events whose field children
is a flat list of chronological instances of the specified events under it in the event tree.
The date of an event is either the one of the event if it owns one, either the start date of the parent (which is a scope by design so always has a start date).
The chronological ordering follows these dates with the important exception of scopes where the scope end date is used. The reason is that the scope is “complete” and usable only when its end is known, and is stored only at that moment.
Using a parent event path is an implicit and easy way to associate some event instances.
To work properly with this concept, the following points are important to note:
Definition: Resolving a path means confronting and successfully matching an event path as specified above with the time-evolving tree of the instrumented program. It is intrinsically a dynamic process.
A resolution does not always occurs, due to inadequate specified event path or due to some not met dynamic condition to generate such event.
When investigating a resolution failure during a script creation, some valuable hints are provided by the command data_get_unresolved_events
.
As all mechanisms are based on hashed strings, all event names shall be exactly as written in the code, including case, spaces and units.
Examples:
The tree above contains events with names as a letter for clarity. Note that these names are reused (on purpose) in different locations of the tree.
Based on this tree, below are some example of event specifications and their result in comment:
EvtSpec(thread="Worker", events=["A", "D", "I", "K"]) # => selects individually A, D(1), D(2), D(3), I(1), I(2) and K
EvtSpec(thread="Worker", parent="A", events=["D", "I", "K"]) # => selects A as parent, D(1), D(2), I(1) and K. I(2) has a non matching parent
EvtSpec(thread="Worker", parent="A", events=["D/D", "K"]) # => selects A as parent, D(2) and K. D(1) has a non matching event path
EvtSpec(thread="Worker", parent="A", events=["*/D", "K"]) # => selects A as parent, D(2) and K. D(1) event path is not matching because 1 level is missing below the parent
EvtSpec(thread="Worker", parent="A", events=["./D", "K"]) # => selects A as parent, D(1) and K. D(2) is not matching because of the "." constraint
EvtSpec(thread="Worker", parent="A", events=["B/**/K"]) # => selects A as parent, and K(1). K(2) has a non matching event path
EvtSpec(thread="Worker", parent="D/D", events=["K"]) # => selects A as parent, and K(2).
EvtSpec(thread="Worker", parent="A/*/D", events=["K"]) # => selects D(2) as parent, and K(2).
EvtSpec(thread="Worker", parent="D", events=["K"]) # => selects D(1) as parent, and K(2). The closest parent of the root is always selected.
EvtSpec(thread="Worker", parent="D/D", events=["K"]) # => selects D(2) as parent, and K(2).
EvtSpec(thread="Worker", parent="./D", events=["K"]) # => selects D(2) as parent, and K(3). Only L(3) has a parent matching the "." constraint
EvtSpec(thread="Worker", events=["./D/*"]) # => selects individually I(2), J and K(3). Only D(3) has a parent matching the "." constraint
This function returns the Palanteer
hash of the provided string. Its typical usage is debugging scripts with the external string feature enabled.
The declaration is:
# string_to_hash: the string to hash
# is_short_hash: 32 bits hash if set to True (default is 64 bits)
hash = hash_string(string_to_hash, is_short_hash=False):
Example of usage:
# Creating a "manual" external string lookup
set_external_strings(lkup={hash_string("Add fruit"): "Add fruit", hash_string("CRASH"): "CRASH"})
This function returns unresolved event specs configured with data_configure_events
.
It is typically used to investigate a specification issue.
It reports information up to 32 unresolved events.
The information is kept after a process is finished, but is cleared when a new event spec configuration is set.
The declaration is:
# unresolved_event_info_list: list of triplet (spec_id, event_spec, explanation message) for all unresolved event specifications
unresolved_event_info_list = data_get_unresolved_events()
Example of usage:
def debug_print_unresolved_events():
unresolved_event = data_get_unresolved_events()
print("Unresolved events (%d):" % len(unresolved_event))
for spec_id, event_spec, msg in unresolved_event:
print(" - From spec #%d, %s for event '%s'" % (spec_id, msg, event_spec))
debug_print_unresolved_events()
that can be used to investigateThe list of possible error messages is:
Error message | Explanation |
---|---|
“No events in record to match with” | no event was seen during the execution of the program, either because PL_NOEVENT==1 or the program had no time to generate events on the period |
“No matching thread” | no thread is matching the provided name |
“No matching event name” | no event has a name corresponding to the one in the event spec |
“No matching event path” | some event with the right name have been found but the constraints on the path are not matching |
“No matching parent event name” | some event with the right name and path have been found but no parent event has a name corresponding to the one in the event spec |
“No matching parent event path” | Everything is resolved except that the constraints on the parent event path are not matching |
"'.' is not matching the event's root” | Everything is resolved except that the ".” constraint is not fulfilled for the event |
"'.' is not matching the parent event's root” | Everything is resolved except that the ".” constraint is not fulfilled for the parent event |
Inconsistent parent events, it shall be the same for all events” | Several couples (event, parent event) were found but with different parents. The first parent event found becomes the reference, others are ignored and are unresolved. This can be solved by being more specific on the different paths, or by splitting the event spec into several ones. |
This function returns all currently known threads.
The declaration is:
# thread_names: list of thread names
thread_names = data_get_known_threads()
Example of usage:
def debug_print_known_threads():
thread_names = data_get_known_threads()
print("Known threads (%d):" % len(thread_names))
for t in thread_names:
print(" - %s" % t)
debug_print_known_threads()
that can be used to investigateThis function returns all currently known kinds of events.
The declaration is:
# event_kinds: list of pair (event path, kind of event as string, thread name)
event_kinds = data_get_known_event_kinds()
Example of usage:
def debug_print_known_event_kinds(output_file=sys.stdout):
"""This function displays the list of the known event kinds"""
event_kinds = data_get_known_event_kinds()
print("Known event kinds (%d):" % len(event_kinds), file=output_file)
event_kinds.sort(key=lambda x: (x[2].lower(), x[1].lower(), x[0]))
for path, kind, thread_name in event_kinds:
print(" %-11s %-24s : %s" % ("[%s]" % kind, thread_name, "/".join(path)), file=output_file)
debug_print_known_event_kinds()
that can be used to investigateMy test is not always giving the same result when run in a loop
Some actions require strict synchronization to be reliable. In particular:
data_collect_events
A CLI is failing because it is not known. But it is present in my code
The CLI may be present in the code but not declared yet on the remote program side.
Declaring the CLI before the start of the instrumentation library is a good habit and fully prevent this kind of issues.
I have hard times debugging my script
Here are some advises:
debug_print_unresolved_events()
which tells you which Event Specification was not matched in the record and why.
palanteer_scripting.default_log_min_level = 0
** I cannot retrieve log events**
Ensure that the EvtSpec refers to the category of the log, not to its message (which is the event “value”).
Is it ok to have several events with the same name?
Name collisions are ok, if the “path” in the tree is different, they will be seen as different event kinds.
However, it makes it harder for humans to visualize them, or point on a particular one in a script, as the usage of “parent” is then mandatory.