**Look into Palanteer and get an omniscient view of your program** Improve software code quality - C++ and Python

@@ Overview

![ ](images/header_perspective.png width=800px)

**Palanteer** is a three-part solution for improving software quality in C++ and Python.

1. An **instrumentation library** for programs, to rule them all.
   Provides the instrumentation API and remote communication capabilities.
   - C++:
     - ultralight single-header cross-platform library
     - [very low](#c++instrumentationperformance) overhead
     - compile-time selection of [groups of instrumentation](base_concepts.md.html#c++specific/groups)
     - support of [Fibers / userland threads](instrumentation_api_cpp.md.html#virtualthreads)
     - support for [automatic instrumentation](getting_started.md.html#quickc++automaticfunctionsinstrumentation) (GCC only)
   - Python:
     - one module for manual instrumentation
     - [unmodified programs](instrumentation_api_python.md.html#automaticinstrumentationwithoutcodemodification) can be analyzed, as with `cProfile`, through the `-m` option
     - support of [asyncio / gevent](instrumentation_api_python.md.html#virtualthreads)
1. Associated with the **interactive viewer**:
   - internal behaviors become obvious
   - all data can be graphically visualized and manipulated
1. Associated with the **remote scripting Python module**:
   - easy stimulation and deep observation
   - enables trivial testing, measuring or monitoring

Recording [up to 8 streams](#multistream) simultaneously (i.e., from different processes) is supported.
+++++
- **Simple tracing of meaningful atomic events**
  - Time scopes, variable values, locks, logs, memory allocations, context switches...
  - Global structure derived from the event hierarchy
  - Typed (string, timestamp, float, integer...)
  - C++
    - Compile-time selection of per-user [groups of events](base_concepts.md.html#c++specific/groups)
    - [Very low](#c++instrumentationperformance) overhead, typically a few nanoseconds
    - [Automatic instrumentation](getting_started.md.html#quickc++automaticfunctionsinstrumentation) possible with the `-finstrument-functions` flag (GCC only)
  - Python
    - [Automatic instrumentation](instrumentation_api_python.md.html#automaticinstrumentationwithoutcodemodification) of function entry and exit
    - Automatic tracking of interpreter memory allocations
    - Automatic tracking of all raised exceptions
    - Automatic tracking of garbage collection runs
    - Automatic tracking of coroutines
  ![Hierarchical scopes with graphable inner variables](images/scope_with_inner_variables.gif width=80%)
  ![High precision - nanosecond order](images/nanosecond_precision.gif width=80%)
- **Lock usage tracking**
  - Explicit threads' battle for locks
  - Who is blocked by whom and for how long
  ![Lock usage tracking](images/lock_contention.gif width=80%)
- **Memory usage tracking**
  - Detect allocation hot spots, big allocators, temporary allocations...
  - Leak detection (based on traced events)
  ![Memory usage tracking with hot spots - here, typical temporary allocations](images/memory_spike.gif width=80%)
- **Visualization of the data from the best vantage point**
  - Smooth and interactive experience on a standard computer
    - Even with huge records
  - Many kinds of views to cover many kinds of needs
    - Timelines for CPU, timeline for memory, flame graphs, lock contention, context switch, curves, histograms...
  - Flexible layouts
    - Drag&drop support
    - No limit on view quantity
    - Can be saved and recalled later
  ![Example of layout](images/views.gif width=80%)
- **C++: Better assertions, [enhanced with provided context](#enhancedassertions)**
  - Just add variables or expressions as extra parameters
  - Compile-time selection of user defined groups of assertions
  ![Command line: dump of a failed enhanced assertion](images/crash_shell.png width=80%)
- **C++: [Stack trace](instrumentation_configuration_cpp.md.html#pl_impl_stacktrace) dumping**
  - Displayed in the terminal
  - Recorded with all events before the crash
  - The available global context eases crash investigations
  ![Logged stack trace in the viewer](images/crash_scope.png width=80%)
- **C++: Full leveraging of [static strings](base_concepts.md.html#staticanddynamicstrings)**
  - Identified and hashed at compile time, no runtime cost
  - ["External strings"](getting_started.md.html#quickc++externalstringconfiguration): full stripping of instrumentation static strings from the binary
    - Benefit: code size reduction and instrumentation obfuscation
    - Strings are resolved with an external lookup generated from the code (tool provided)
  ![Test program without external strings](images/no_external_string_effect.png width=80%)
  ![Test program with external strings - smaller text section, no more instrumentation strings](images/external_string_effect.png width=80%)
- **Easy [scripting](getting_started.md.html#quickremotescripting) of the stimulation and observation**
  - Elaborate deep and reliable system tests
  - Stimulate with CLIs (remote commands from instrumentation), monitor via events
  - Automate the extraction of performance indicators
  - Scripting language is Python
  - Scripts are independent of the program's implementation language
  ![Small functional scripted test](images/scripting.png width=80%)
**Palanteer** is a free, lean and comprehensive solution for better and more enjoyable software development!

## C++ Features

### Easy tracing

Each call to the instrumentation library is a piece of information on the execution of your program.
The [instrumentation API](instrumentation_api_cpp.md.html) is straightforward. The most representative calls are:
- **plScope**: start a named timestamp measurement, automatically closed at the end of the C++ scope
- **plData**: log a named value (numeric or dynamic string)
- **plVar**: log a variable (name and content)

Example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// Declare the name of your thread (optional, it helps visualization, persistence of config and scripting)
plDeclareThread("Worker 1");

// Named variable value tracing
plData("Monster health", monsterHealth);

// Variable tracing, shorter equivalent to plData("monsterHealth", monsterHealth)
plVar(monsterHealth);

// Multiple variable tracing, shorter equivalent to the multiple calls with one variable
plVar(monsterHealth, monsterAttackCoef, monsterName);

// Text tracing
plText("stage", "Level one reached");

// Timestamped scope (automatically closed at C++ scope end)
plScope("superFunction");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All your data, whatever the quantity, can be smoothly visualized. And all of it can be viewed at least as a curve or a histogram.

### Easy and efficient logging

The `plLog` instrumentation family processes timestamped printf-like messages with a level and a category, on record and/or on console.
The user-provided category makes it easy to later filter the logs per topic; the available nested levels are `Debug`, `Info`, `Warn` and `Error`.
The full description is available [here](instrumentation_api_cpp.md.html#logs).

!!! Using logs is straightforward, as no complex configuration is required.
Console display is even enabled without `Palanteer` service initialization.

Example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
plLogDebug("input", "Key '%c' pressed", pressedKeyChar);
plLogInfo("output", "The resulting value of the phase %-20s is %g with the code 0x%08x", phaseStr, floatResult, errorCode);
plLogWarn("phase", "End of a computation");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The minimum level for displaying the logs on console is dynamically configurable with [`plSetLogLevelConsole(level)`](instrumentation_api_cpp.md.html#plsetloglevelconsole). Default is `Warn`.
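For instance, a program under active development could lower the console threshold to display everything. This is a minimal sketch: the level constant name used below is an assumption, the exact enumeration of levels is given in the API reference.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// Hypothetical level constant name (PL_LOG_LEVEL_DEBUG): check the C++ instrumentation API reference
plSetLogLevelConsole(PL_LOG_LEVEL_DEBUG);
plLogDebug("input", "This debug log is now displayed on the console");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~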
While printing on the console degrades runtime performance, it allows efficient debugging and tuning of programs under development.
Moreover, the cost of a log can be fully removed once the program is finished, by using the [groups of instrumentation](base_concepts.md.html#c++specific/groups). In a similar way, the minimum level for recording the logs is dynamically configurable with [`plSetLogLevelRecord(level)`](instrumentation_api_cpp.md.html#plsetloglevelrecord). The default is to record all levels.

**Comparison with some other loggers**

All tools have to make design decisions which impact the efficiency of their features, their performance, their usability or the addressed use-cases.
In order to understand the use-cases where the logging service of `Palanteer` is interesting, let's compare it with some well-established loggers:
- `spdlog`: feature-rich and popular C++ logging library
- `Nanolog (Stanford)`: claimed to be the most efficient logger (runtime and used space)

|                                        | Palanteer                                            | Nanolog (Stanford)                     | spdlog                                                                   |
|----------------------------------------|------------------------------------------------------|----------------------------------------|--------------------------------------------------------------------------|
| **Supported OS:**                      | (2) Windows, Linux                                   | (3) Linux                              | **(1) Windows, UNIX like, Mac, Android**                                 |
| **Installation:**                      | **(1) Single header file**                           | (2) Small library                      | (2) Header only (but very slow compilation) or library                   |
| **C++ min requirement:**               | **(1) C++11**                                        | (3) C++17                              | **(1) C++11**                                                            |
| **String formatting**                  | printf syntax                                        | printf syntax                          | `fmt` style                                                              |
| **Easy to start:**                     | Just 1 include and start logging                     | Just 1 include and start logging       | Just 1 include and start logging                                         |
| **Easy to master:**                    | **(1) printf-like API with level and category**      | **(1) printf-like API with level**     | (3) OOP API with `fmt` syntax and logger/sink concepts                   |
| **Extensible:**                        | (2) No, only builtin types                           | (2) No, only builtin types             | **(1) Yes, custom formatter can be added. Also supports binary arrays**  |
| **Log files**                          | (2) Only one file                                    | (2) Only one file                      | **(1) Various log targets (rotating, daily...)**                         |
| **Compile time selection of logs**     | **(1) Per user-defined groups of logs**              | (3) All or nothing                     | (2) Some can be filtered by level                                        |
| **Obfuscation of static strings**      | **(1) Yes**                                          | (2) No                                 | (2) No                                                                   |
| **Structured context**                 | **(1) Yes, context can be added through scopes**     | (2) No                                 | (2) No                                                                   |
| **Runtime cost per log [^perftest]:**  | (2) 19.2 ns                                          | **(1) 10.7 ns**                        | (3) 1250 ns (unable to reproduce the figures on spdlog github)           |
| **Log file size [^perftest]:**         | (2) 18 MB (with indexing), need a tool to visualize  | **(1) 7 MB, need a tool to visualize** | (3) 70 MB, human readable                                                |
| **Compilation time[^compiltest]:**     | **(1) 2800 logs/s**                                  | (3) 265 log/s                          | (2) 470 logs/s (with the static library mode)                            |

[^perftest] The performance test is a 1-million-iteration loop logging "Simple log message with 1 parameter %d" (adapted for the `fmt` case) in a multithread-capable configuration, on an AMD Ryzen 7 5700U. It measures in all cases only the writing into the internal queue (no formatting nor output).
`Palanteer` figures are obtained with the provided test program and the unit tests (for compilation speed).

[^compiltest] This test is based on the compilation time difference between a file and the same file with 1000 added logging calls with 1 parameter, on an AMD Ryzen 7 5700U.

**Synthesis**:
- `Nanolog`'s strength is its (rightly) claimed CPU and storage efficiency, at the expense of every other kind of feature (it is, after all, the outcome of a PhD thesis).
- `spdlog`'s strength is its complete logging feature list. Its richness and flexibility come at a price: average global performance (runtime, compilation time...) and a less easy configuration.
- `Palanteer`'s strength is its good performance (runtime, compilation time...) and its adequacy for application observation, with additional traced context and a visualization tool. The lack of choice for the log files (only one file) closes the door on permanent logging solutions. Its currently limited OS support may also be a problem.

### Automatic functions instrumentation (GCC only)

GCC provides a way to automatically instrument function entry and exit. `Palanteer` leverages this mechanism to create function scopes automatically by injecting `plBegin` and `plEnd` calls.
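As a rough sketch (the exact flag list is given in the [getting started](getting_started.md.html#quickc++automaticfunctionsinstrumentation) page; the exclusion of `palanteer.h` itself via the standard GCC option shown below is only an illustrative assumption), a compilation line could look like:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
> g++ -DUSE_PL=1 -finstrument-functions -finstrument-functions-exclude-file-list=palanteer.h \
      -I <palanteer C++ instrumentation folder> myProgram.cpp -lpthread -o myProgram
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~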
It just requires the [addition of a few flags](getting_started.md.html#quickc++automaticfunctionsinstrumentation) on the compilation line of the files to instrument, as sketched above. This feature brings a quick and easy way to:
- profile your program
  - timeline of the function executions, flame graph, plots...
  - memory consumption dynamics, hotspots and leaks (with false positives)
- visualize the behavior of your program
  - checking that the behavior and code paths are the ones expected
  - diving into a new program is much easier when you can clearly see all call paths

!!! warning Some words of caution
    Automatic scope instrumentation is by nature unsupervised, so it is not optimal regarding the quantity of generated events.
    Some types of information are unaffected by heavy instrumentation; memory usage is one of them.
    In contrast, timings are affected, depending on the quantity of nested calls. Profiling values cannot be considered "reliable" if a scope contains many nested sub-scopes. GCC provides some ways to mitigate this issue, at least partially.

### Compile-time instrumentation groups

Grouping events by user-defined categories is one of the key features.
Each group can independently be compiled in or fully ignored (zero instructions in the binary, zero run-time cost, almost zero compilation-time cost). Per-topic instrumentation is a powerful tool for code quality:
- Deep instrumentation can be kept inside the code without any impact, and re-activated only when needed
- Groups cover tracing, logging and assertions; they generalize the standard NDEBUG flag
- The performance cost versus instrumentation level is fully under user control

The "group" instrumentation API is directly derived from the general one by replacing the prefix `pl` with `plg` and inserting the group name as first parameter.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// Declare the group MYGROUP (just an example) with the prefix "PL_GROUP_" and give it the value 1 to be compiled or 0 to be fully ignored
#ifndef PL_GROUP_MYGROUP
#define PL_GROUP_MYGROUP 1
#endif

// => Use the same API but with prefix *plg* instead of *pl* and the group name as first parameter

// Named variable value tracing
plgData(MYGROUP, "Monster health", monsterHealth);

// Variable tracing, shorter equivalent to plgData(MYGROUP, "monsterHealth", monsterHealth)
plgVar(MYGROUP, monsterHealth);

// Multiple variable tracing, shorter equivalent to the multiple calls with one variable
plgVar(MYGROUP, monsterHealth, monsterAttackCoef, monsterName);

// Text tracing
plgText(MYGROUP, "stage", "Level one reached");

// Scope tracking (automatically closed at scope end)
plgScope(MYGROUP, "superFunction");

// Add a printf-like log
plgLogInfo(MYGROUP, "user input", "Key %c pressed", input_pressed_key);

// ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

!!! warning Important!
    Once used in a `Palanteer` API, the group **must** be defined, else the preprocessor rants (a lot...).

A typical instrumentation covers the structure of the program but also inputs, outputs of computations, statistics, states...
Remember that quantity is not a problem when using groups!

### External strings

Instrumenting the code, to track its execution, obviously means more accessible details inside your binary.
There are at least two cases where this might be a problem:
- **Obfuscation**: if the production code embeds some instrumentation, the corresponding text strings are present in clear in the binary
- **Binary size reduction**: in embedded software, each byte matters. On the other hand, instrumentation or any kind of observation is critical to be able to debug efficiently

Both cases above are solved by using the "external strings" feature, simply by compiling with the flag `PL_EXTERNAL_STRINGS` set to 1.
In this mode, all static strings used by `Palanteer` are stripped out at compilation time; only their hashes remain in the binary.

!!! On Linux, the standard command `strings <binary name>` shows in clear all text strings of the binary. It is an easy way to check your embedded strings.

In order to properly work with a record where the external string feature has been enabled, the additional actions below are required:
1. Generate a lookup file *hash -> strings* from all sources
   - The provided tool `./tools/stringLookupGenerator.py` parses C++ source files and generates this lookup
   - A new lookup file shall be generated each time you add or modify a string used by `Palanteer`
   - This process is fast enough to be part of a build system
2. Use this lookup to recover the plain content of the strings (the `Palanteer` viewer and Python scripting module do it)
   * As a fallback, the sad unrecognized strings are displayed as an 8-byte hexadecimal hash, in the viewer and scripting module

!!! The simple tool `./tools/extStringDecoder.py` is an example of how to recover obfuscated strings from `stdin` and output the result on `stdout`

Example of generation of the static string lookup on the full `Palanteer` viewer sources (on Linux with a zsh shell supporting `**`):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ shell
palanteer > time ./tools/stringLookupGenerator.py c++/palanteer.h server/**/*.{c,cpp,h} > palanteer_string_lkup.txt
./tools/stringLookupGenerator.py c++/palanteer.h server/**/*.{c,cpp,h} >  1.73s user 0.01s system 99% cpu 1.749 total
palanteer > wc -l -c c++/palanteer.h server/**/*.{c,cpp,h} | tail -1
 137149 6921792 total
palanteer > cat palanteer_string_lkup.txt | head -10
@@CBF29CE484222325@@
@@D756D3636AEC67B4@@ Up pressed
@@FE7BA9047D1571F8@@!(*ptr)->isEvent
@@B5A1F7BAAE9A272B@@!_receivedMsg
@@D91743DED2B2E3D1@@!_screenLayoutToApply.windows.empty()
@@678E9C501C94EC94@@!_typeFilters.empty()
@@17B27941379D89B8@@!cmDecompressor && !cmCompressor
@@C0C639794CD0DE9D@@!empty
@@ED9409B179A15E99@@!f.events.empty()
@@C191DDD7DCEB6E9F@@!GET_ISFLAT(evt.parentLIdx)
palanteer > wc -l palanteer_string_lkup.txt
1165 palanteer_string_lkup.txt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These logs show that:
- the full source (~137 Klines and ~6.9 Mcharacters, which includes dependencies like `Dear ImGui` and `Zstd`) is processed in less than 2 seconds
- the generated lookup contains 1165 unique strings from events, assertions and filenames

!!! tip The tool `./tools/stringLookupGenerator.py` is also used to generate the lookup for the function symbols of a GCC Linux auto-instrumented program.

### Enhanced assertions

Enhanced assertions ease investigations through the provided context elements. This is especially true when an issue is hard to reproduce.
The additional context is of course evaluated only in case of assertion failure.

Example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
plAssert(a<b);                                          // Standard form
plAssert(a<b, "A shall always be less than B");         // Documented form
plAssert(a<b, a, b);                                    // Extended form showing the values of 'a' and 'b' when the assertion fails
plAssert(a<b, "A shall always be less than B", a, b);   // Displays up to 9 parameters... Ought to be enough for anybody
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As for events, assertions can be conditionally compiled per user-defined categories.
It can be seen as a generalization of the `NDEBUG` compilation flag (which acts as a global switch on standard assert).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
plgAssert(MYGROUP, a<b);
plgAssert(MYGROUP, a<b, "A shall always be less than B");
plgAssert(MYGROUP, a<b, a, b);
plgAssert(MYGROUP, a<b, "A shall always be less than B", a, b);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the example above, the last assertion outputs the text below, optionally followed by the stack trace if enabled.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
[PALANTEER] Assertion failed: a<b
On function: int main(int, char**)
On file    : c++/testprogram/testProgram.cpp(358)
 - A shall always be less than B
 - int a = 314
 - float b = 123.000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

!!! warning
    Enhanced assertions are using variadic templates which may increase the code size.
    By compiling with the `PL_SIMPLE_ASSERT` flag set to 1, the assertions revert to the standard behavior (additional context parameters are ignored), while keeping the "per category compilation" feature.

### Compile time selection of the features

> "Pay only for what you use"

Most features of `Palanteer` can be selected individually at compile time.

| Feature name | Constant | Default |
|--------------|----------|---------|
| [Event tracing](#easytracing)                                                                                       | Enabled if `USE_PL`==1 and `PL_NOEVENT` is not 1                      | Disabled (`USE_PL` not defined)    |
| [Log console display](#easylogging)                                                                                 | Enabled if `USE_PL`==1                                                | Disabled (`USE_PL` not defined)    |
| [`Palanteer` assertions](#enhancedassertions)                                                                       | Enabled if `USE_PL`==1 and `PL_NOASSERT` is not 1                     | Disabled (`USE_PL` not defined)    |
| [Remote control](#remotecontrol)                                                                                    | Enabled if `USE_PL`==1 and `PL_NOCONTROL` is not 1                    | Disabled (`USE_PL` not defined)    |
| [Event tracing from group _YYY_](base_concepts.md.html#groups)                                                      | Enabled if event tracing is enabled and `PL_GROUP_YYY` is 1           | `PL_GROUP_YYY` **must** be defined |
| [Assertions from group _YYY_](base_concepts.md.html#groups)                                                         | Enabled if `Palanteer` assertions are enabled and `PL_GROUP_YYY` is 1 | `PL_GROUP_YYY` **must** be defined |
| [External strings](#externalstrings)                                                                                | Enabled if `PL_EXTERNAL_STRINGS` is set to 1                          | Disabled                           |
| [Simple (standard) `Palanteer` assertions](#enhancedassertions)                                                     | Enabled if `PL_SIMPLE_ASSERT` is set to 1                             | Enhanced assertions                |
| [Memory tracking with new/delete overload](instrumentation_configuration_cpp.md.html#pl_impl_overload_new_delete)  | Enabled if `PL_OVERLOAD_NEW_DELETE` is set to 1                       | Enabled                            |

## Python features

Python features differ from the C++ ones because of the nature of the language: interpreted, dynamic and high level. For instance:
- No concept of "compile time"
- No need for enhanced assertions
- Builtin profiling API available
- More automation possible, thanks to the looser control on the hardware
- ...

### Automatic partial instrumentation

The CPython interpreter offers dedicated profiling hooks that `Palanteer` leverages to automatically trace:
- Entering or leaving a function
- Raised exceptions, caught or not
- Memory allocations (at the interpreter level)
- Garbage collection runs
- Coroutine detection with automatic naming

This means that **unmodified code** can already provide useful insights, by using the same mechanism as the standard `cProfile` module:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
> python3 -m palanteer program.py
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case, the instrumentation module connects to a server (viewer, or remote script), if present. The full usage is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
> python3 -m palanteer
Palanteer profiler usage:
 Either:
  1) With code instrumentation: insert a call to palanteer.plInitAndStart(app_name, ...) in your main function.
     See Palanteer documentation for details.
     Manual instrumentation can provide much more information (data, locks, ...) than just the automatic function profiling.
  2) With unmodified script: 'python -m palanteer [options] <your script>'
     This syntax is similar to the cProfile usage. No script modification is required but only the function timings and exceptions are profiled.
 Options for case (2):
  -s <server IP address>  Set the server IP address (default is 127.0.0.1)
  -p <server TCP port>    Set the server port (default is 59059)
  -f <filename.pltraw>    Save the profiling data into a file to be imported in the Palanteer viewer (default=remote connection)
  -nf                     Do not automatically trace the functions (default=trace functions)
  -ne                     Do not automatically trace the exceptions (default=trace exception)
  -nm                     Do not trace the memory allocations (default=trace memory allocations)
  -ng                     Do not automatically trace the garbage collector runs (default=trace gc)
  -c                      Do automatically trace the C functions (default=only python functions)
  -w                      Wait for server connection (Palanteer viewer or scripting module). Applicable only in case of remote connection.
  -m                      Run the app as a module (similar to python's "-m" option)

 Note 1: in case of connection to the server and -w is not used and no server is reachable, profiling is simply skipped
 Note 2: on both Windows and Linux, context switch information is available only with root privileges (OS limitation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And with additional manual instrumentation, even better knowledge can be collected, as sketched below:
- content of variables (integral types or strings)
- manual scopes inside functions
- lock usage
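Here is a minimal sketch of such manual additions; the calls are the ones used in the full Python example later in this document, while the scope name and value are purely illustrative:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
from palanteer import plInitAndStart, plBegin, plEnd, plData, plStopAndUninit

plInitAndStart("my_app")       # Start the instrumentation

plBegin("Parse input")         # Manual scope inside a function
plData("line count", 12842)    # Value tracing (integral types or strings), graphable in the viewer
plEnd("Parse input")

plStopAndUninit()              # Stop and uninitialize the instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~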
## Common features

The features below are common to both the C++ and Python instrumentation.

### Offline recording

The instrumentation libraries can also save the events directly into a file, without any server side.
This option is activated by using the mode `PL_MODE_STORE_IN_FILE` in the initialization (`plInitAndStart`); the record filename has the extension `.pltraw`.

**The benefits are:**
- Recording can be performed even if no network is available
- A higher event recording rate can be reached, thanks to the removal of the processing bottleneck (server side)
  - Buffer saturation is less likely to happen due to the faster local processing (=disk write)
  - Ex: 18 Mevent/s seen with `testprogram perf -f`
- The program observation is less disturbed in case the server shares the same machine
  - No active server side means no CPU sharing effects, in particular if the available core quantity is limited on the machine
  - No viewer means no graphics card/system sharing effects, in particular if the observed program involves graphics

**The cons are:**
- No remote control
- No real-time analysis
- The processing bottleneck to index all the events will happen when importing the file in the viewer

The content of the `.pltraw` file is simply the exact payload that would be sent to the server in case of connection.
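As an illustration, the initialization for this mode could look like the minimal sketch below; the mode constant comes from this section, but the exact list and order of the `plInitAndStart` parameters shall be checked in the C++ instrumentation API reference:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// Minimal sketch of the 'file storage' mode; build with -DUSE_PL=1 (see the compile-time feature selection table above).
// The plInitAndStart signature is assumed here: check the C++ instrumentation API reference for the exact parameters.
#define PL_IMPLEMENTATION 1
#include "palanteer.h"

int main(int argc, char** argv)
{
    plInitAndStart("example", PL_MODE_STORE_IN_FILE);  // Record into a .pltraw file instead of connecting to a server
    plScope("Some work");                              // ... instrumented code as usual ...
    plStopAndUninit();                                 // Stop the service and finalize the record
    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~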
Importing such a file in the viewer is equivalent to replaying the program transmission, but from a file.
The file size is 24 bytes (12 bytes in compact mode) per event (note that memory operations take 2 events), plus the size of all unique strings and a light protocol overhead; for instance, the 1,000,000-event record below weighs ~24 MB.

Example with the test program. The resulting `example_record.pltraw` shall be imported in the viewer (menu bar `File->Import`):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
λ .\bin\testprogram.exe perf -f
Mode 'file storage'
Collection duration : 23.07 ms for 1000000 events
Collection unit cost: 23 ns
Processing duration : 53.96 ms (w/ transmission and disk file writing)
Processing rate     : 18.533 million event/s
Max buffer usage    : 29345536 bytes (48.91% of max)

λ ls -l example_record.pltraw
-rw-r--r-- 1 damien 24001446 Aug  1 13:43 example_record.pltraw
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Memory usage and context switches

Two kinds of data are automatically collected:
- Memory allocations and deallocations
- OS context switches

**Memory**
- In C++, by default, memory allocations are tracked by overloading the `new` and `delete` operators
- In Python, the allocations are tracked at the interpreter level (i.e. what is seen by the OS)
  - Tracking at the "object" level would insanely bloat records, as even an integer is an object and requires an internal (optimized) allocation.
- Note: memory event locations are identified only by the instrumented scopes.
  - Collecting stack traces is a heavy process, definitely outside the "light overhead" rules of the tool

**Operating System context switches**
- Context switches correspond to the dynamic association between a system thread and a CPU "logical core"
- This information is critical for investigating performance when the CPU is saturated
- Context switch information is accessible only with privileged rights (i.e. as root or administrator)

### Multistream

Up to **8** simultaneous streams can be recorded at the same time.
All received events are merged and recorded as if they were coming from the same system. This feature is enabled on the viewer side, by selecting the "`multistream mode`" in the menu bar. It slightly modifies the behavior of the recorder:
* Socket connections are accepted at any time during the recording, until no more connections are active.
  * Note that a terminated connection does not give back its slot for another connection.
* Multiple files can be selected in case of import from file
* The name of the record is the one provided manually in the `multistream mode` menu.
  It overrides all received application names, which are an attribute of the streams composing this record.
* All threads are prefixed by the application name of their stream
  * If a thread is in a group, the group is prefixed.

!!! tip Important
    Multistream setups have some constraints, listed below.

**Clocks**

Because they heavily depend on the observed system, the clocks of the streams shall be synchronized on the program side.
Processes from the same running OS are automatically synchronized.
However, streams from different machines shall be synchronized externally, as `Palanteer` has no way to get the information required to automate this phase. This usually implies overriding the clock function `PL_GET_CLOCK_TICK_FUNC()`.

!!! warning
    Any bias or clock drift will directly impact the consistency of the events between streams.
    The absolute date origin is extracted from the first connection: any subsequent connection with a strong negative bias would result in zero-thresholded dates.
    This constraint does not exist with imported files, as the absolute date origin is the oldest one of the set of files.

**Performance**

The recording performance can be slightly reduced, due to the additional indirections needed to merge events from different streams.
As the server is usually the bottleneck during recording, the aggregated global stream shall not exceed the monostream maximum processing capacity (see the [performance section](index.html#performance)).

**Context switches**

On the server side, context switches are supported in multistream mode.
However, for streams coming from the same OS, multiple simultaneous instances of collection of such events do not work well.
For instance:
- on Windows, using this OS API is globally exclusive
- on Linux, the configuration is global to the OS and a reconfiguration from a subsequent stream may corrupt the data of all streams.

**Configuration consistency**

Multistream requires some consistency among streams (these constraints may be removed later):
- same clock size (32 or 64 bits)
- same hash size (32 or 64 bits)
- same hash salt

A stream not compliant with the previously registered ones will be ignored.

**Locks**

All locks are considered internal to a stream; inter-process locks are not supported.
This means that lock names shall be unique among all streams.
Any lock of a stream having the same name as an existing one from a previous stream will be suffixed with `#`.

### Remote control

The instrumentation libraries let you observe the execution of programs. They also let you dynamically interact with them.
With control of both the stimulation and the observation, the range of new possibilities is wide:
* Conformance tests for a particular stimulation (matches the needs of "system tests", "integration tests", "functional tests"...)
* KPI/performance/metric measurements
* Live service monitoring or control
* ...

**Scripted dynamic control**

The `Palanteer` Python scripting module gives you remote control of the running state of a selection of threads.
The code shall be instrumented with calls to `plFreezePoint()` at well-chosen locations that will act as synchronization points.
The remote scripts activate or release the "freeze" on these freeze points on a per-thread basis (see [`program_set_freeze_mode`](scripting_api.md.html#program_set_freeze_mode) and [`program_step_continue`](scripting_api.md.html#program_step_continue)), and are notified of the freeze states.
The typical usage of such synchronizations is to be able to reliably stimulate the program at a controlled moment.

**Scripted stimulation with CLIs**

The stimulation part is done via remotely executed commands named CLIs (Command Line Interface) that are declared beforehand inside the program. A CLI's input and output are text based. The reasons for such a choice are:
* Ease of use, as declaring a CLI requires only 3 standard strings
* Inputs and outputs are generic and uniform, making their processing also uniform and light
* The target is to stimulate the program, not to build another fully-fledged RPC (Remote Procedure Call) middleware

The registration of a CLI for a given handler is done with the `plRegisterCli` function (see the [CLI](base_concepts.md.html#baseconcepts/commandlineinterface) description for details), as shown in the example below:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// CLI registration parameters
//  handler    : function to call for this CLI
//  name       : name of the command (no space allowed)
//  specParams : specification of the parameters, as a string
//  description: description of the command and its usage

// C++:
plRegisterCli(bananaHandler, "getNiceText", "banana=int temperature=float mantra=string",
              "The purpose of this function is to return some text, just because we are worth it");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
or
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ python
# Python:
plRegisterCli(bananaHandler, "getNiceText", "banana=int temperature=float mantra=string",
              "The purpose of this function is to return some text, just because we are worth it")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The action of the CLI handler is user-defined (ex: modify some state, launch internal functions...). Just be cautious with race conditions, as CLIs are called from the `Palanteer` reception thread.

On the script side, calling a CLI is even simpler:
- the input is a text string starting with the CLI name followed by its parameters (see the scripting API [`program_cli`](scripting_api.md.html#program_cli) for details)
- the output is a status and a response string (or an error explanation)

See an example below:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python
status, text_answer = palanteer.program_cli("getNiceText banana=23 temperature=37.2 mantra=Heaven")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Scripted observation**

After control and stimulation, the last piece of the scripting puzzle is the observation of the events sent by a program.
The scripting module allows you to:
- select events in a simple way while keeping the notion of hierarchy and association (see [`data_configure_events`](scripting_api.md.html#data_configure_events))
- poll and receive them per batch (see [`data_collect_events`](scripting_api.md.html#data_collect_events))

The choice of receiving events synchronously, by polling, is motivated by the fact that scripts shall stay simple.
Introducing callbacks or any kind of multi-threading at this level would not bring anything more except a whole range of problems for users. **Full example showing dynamic control, stimulation and observation** Below is a simple example of a C++ program instrumented with `Palanteer` and generating 100 000 random integers. The range can be remotely configured with a CLI. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++ // File: example.cpp // On Linux, build with: g++ -DUSE_PL=1 -I <palanteer C++ instrumentation folder> example.cpp -lpthread -o example #include // For "rand" #define PL_IMPLEMENTATION 1 // The instrumentation library shall be "implemented" once #include "palanteer.h" int globalMinValue = 0, globalMaxValue = 10; // Handler (=user implementation) of the example CLI, which sets the range void setBoundsCliHandler(plCliIo& cio) // 'cio' is a communication helper passed to each C++ CLI handler { int minValue = cio.getParamInt(0); // Get the 2 CLI parameters as integers (as declared) int maxValue = cio.getParamInt(1); if(minValue>maxValue) { // Case where the CLI execution fails. The text answer contains some information about it cio.setErrorState("Minimum value (%d) shall be lower than the maximum value (%d)", minValue, maxValue); return; } // Modify the state of the program. No care about thread-safety here, to keep the example simple globalMinValue = minValue; globalMaxValue = maxValue; // CLI execution was successful (because no call to cio.setErrorState()) } int main(int argc, char** argv) { plInitAndStart("example"); // Start the instrumentation, for the program named "example" plDeclareThread("Main"); // Declare the current thread as "Main" so that it can be identified more easily in the script plRegisterCli(setBoundsCliHandler, "config:setRange", "min=int max=int", "Sets the value bounds of the random generator"); // Declare our CLI plFreezePoint(); // Add a freeze point here to be able to configure the program at a controlled moment plBegin("Generate some random values"); for(int i=0; i<100000; ++i) { int value = globalMinValue + rand()%(globalMaxValue+1-globalMinValue); plData("random data", value); // Here are the "useful" values } plEnd(""); // Shortcut for plEnd("Generate some random values") plStopAndUninit(); // Stop and uninitialize the instrumentation return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The same, `example.py` (make it executable on Linux), written in python: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python #! /usr/bin/env python3 import sys import random from palanteer import * globalMinValue, globalMaxValue = 0, 10 # Handler (=implementation) of the example CLI, which sets the range def setBoundsCliHandler(minValue, maxValue): # 2 parameters (both integer) as declared global globalMinValue, globalMaxValue if minValue>maxValue: # Case where the CLI execution fails (non null status). 
The text answer contains some information about it return 1, "Minimum value (%d) shall be lower than the maximum value (%d)" % (minValue, maxValue) # Modify the state of the program globalMinValue, globalMaxValue = minValue, maxValue # CLI execution was successful (null status) return 0, "" def main(argv): global globalMinValue, globalMaxValue plInitAndStart("example") # Start the instrumentation plDeclareThread("Main") # Declare the current thread as "Main", so that it can be identified more easily in the script plRegisterCli(setBoundsCliHandler, "config:setRange", "min=int max=int", "Sets the value bounds of the random generator") # Declare the CLI plFreezePoint() # Add a freeze point here to be able to configure the program at a controlled moment plBegin("Generate some random values") for i in range(100000): value = int(globalMinValue + random.random()*(globalMaxValue+1-globalMinValue)) plData("random data", value) # Here are the "useful" values plEnd("") # Shortcut for plEnd("Generate some random values") plStopAndUninit() # Stop and uninitialize the instrumentation # Bootstrap if __name__ == "__main__": main(sys.argv) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ And a remote script, `remoteScript.py`, which launches one of the previous programs (C++ or Python), connects to it, then configures the range of random values and compute some statistics on them: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python #! /usr/bin/env python3 import sys import palanteer_scripting as ps def main(argv): if len(sys.argv)<2: print("Error: missing parameters (the program to launch)") sys.exit(1) # Initialize the scripting module ps.initialize_scripting() # Enable the freeze mode so that we can safely configure the program once stopped on its freeze point ps.program_set_freeze_mode(True) # Launch the program under test ps.process_launch(sys.argv[1], args=sys.argv[2:]) # From here, we are connected to the remote program # Configure the selection of events to receive my_selection = ps.EvtSpec(thread="Main", events=["random data"]) # Thread "Main", only the event "random data" ps.data_configure_events(my_selection) # Configure the program status, response = ps.program_cli("config:setRange min=300 max=500") if status!=0: print("Error when configuring: %s\nKeeping original settings." 
% response) # Disable the freeze mode so that the program resumes its execution ps.program_set_freeze_mode(False) # Collect the events as long as the program is alive or we got some events in the last round qty, sum_values, min_value, max_value, has_worked = 0, 0, 1e9, 0, True while ps.process_is_running() or has_worked: has_worked = False for e in ps.data_collect_events(timeout_sec=1.): # Loop on received events, per batch has_worked, qty, sum_values, min_value, max_value = True, qty+1, sum_values+e.value, min(min_value, e.value), max(max_value, e.value) # Display the result of the processed collection of data print("Quantity: %d\nMinimum : %d\nAverage : %d\nMaximum : %d" % (qty, min_value, sum_values/max(qty,1), max_value)) # Cleaning ps.process_stop() # Kills the launched process, if still running ps.uninitialize_scripting() # Uninitialize the scripting module # Bootstrap if __name__ == "__main__": main(sys.argv) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The execution of `remoteScript.py` (on Linux) gives the following output: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none > time ./remoteScript.py example Quantity: 100000 Minimum : 300 Average : 400 Maximum : 500 ./remoteScript.py example 0.62s user 0.02s system 24% cpu 2.587 total > time ./remoteScript.py python3 example.py Quantity: 100000 Minimum : 300 Average : 400 Maximum : 500 ./remoteScript.py python3 example.py 0.86s user 0.02s system 33% cpu 2.624 total ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this "heavy tracing in loop" scenario, the scripting part is the bottleneck, hence the small difference between the `C++` and the `Python` program execution times. ### Palanteer viewer ![_](images/views.gif) The viewer has two main roles: - **record and store the events** from the execution of an instrumented program, to be analyzed later - **display records** in a way that developers can debug, profile, optimize speed and memory, check behavior correctness, etc... **Recording** The 2 ways to create a record from an instrumented program are: - live by remote connection with the program launched in **'connected mode'** - offline by importing a .pltraw file generated with a program launched in **'file storage'** mode The viewer always listens so that launching your instrumented program in 'connected' mode is enough to connect both.
If a direct connection is neither possible nor desirable, offline recording into a file is the way to go. The event processing will then occur at import time.
Records are listed in the 'Catalog' window, per program and in chronological order.
A nickname, which corresponds to the 'build name' sent by the instrumented program, can be provided or edited to easily recall a particular record.
The current workspace can be saved as a named 'template layout' in the 'View' menu and recalled later at any time. **Navigation** If you had only one key to remember, it would be: | Action | Description | |--------|-------------------------------------------| | **H key** | Dedicated help for the window under focus | Unless not applicable or specified otherwise in the dedicated help window, the other usual actions for navigation are: | Action | Description | |----------------------------------|----------------------------------------------------| | **H key** | Dedicated help for the window under focus | | **F key** | Toggle full view screen | | **Ctrl-F key** | Text search view | | **Ctrl-P key** | Screen capture | | **Right mouse button dragging** | Move the visible part of the view | | **Left/Right key** | Move horizontally | | **Ctrl-Left/Right key** | Move horizontally faster | | **Up/Down key** | Move vertically | | **Mouse wheel** | Move vertically | | **Middle mouse button dragging** | Measure/select a time range | | **Ctrl-Up/Down key** | Time zoom | | **Ctrl-Mouse wheel** | Time zoom | | **Left mouse** | Time synchronize views of the same group | | **Double left mouse click** | Time and range synchronize views of the same group | | **Right mouse click on an item** | Open a contextual menu related to the item | | **Hover an item** | Display a tooltip with detailed information | **Views synchronization** Views can be 'associated' so that they share the same time range and react to each other. This is called 'view synchronization'.
This association is chosen in the top-right combobox of each view. By default, all views are associated with 'Group 1'. 'Group 2' provides a second shared focus.
A view can also be 'Isolated' and become independent of all others. **Exports** Several exports of the data are available: * **Screen capture**: a screenshot is performed at any moment when hitting the `Ctrl-P` key (`png` format) * **Chrome Text Format**: this global export of the events is proposed in the `File` menu of the main menu bar (`json` format) * **Text format**: a per-thread text export of the events is proposed in the contextual menu of text views * **CSV format**: a per-event-type export of the events is proposed in the contextual menu of plot views when hovering the legend ## Requirements Only very usual dependencies are required.
Other dependencies are snapshotted inside this repository (see [dependencies](#thirdpartydependencies)).

**C++ single-header instrumentation library**
- Compiler C++11 or above
- OS: Linux (32 or 64 bits) and Windows 10
  - This list of supported platforms could be extended with [additional work](more.md.html#addinganewplatformforinstrumentation)
- Optional requirement (for Linux only): stacktrace dump requires both libunwind and libdw
  - On Ubuntu/Debian, just type: `sudo apt install libunwind-dev libdw-dev`
  - On Windows, stacktrace dump works out of the box
- Tested processors: x86, x64, armv7l
- Tested compilers: gcc8+, clang7+, VS 2019, VS 2022
- MIT License (i.e. no constraint on disclosing your sources)

**Python instrumentation module**
- Same requirements as the C++ instrumentation library (wrapped)
- CMake
- Python 3.7 or above
  - As a C extension module is used, only the CPython interpreter is supported.
- `setuptools` Python package
  - Usual Python packaging tool. If not present, install it with:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none
pip install setuptools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- MIT License (i.e. no constraint on disclosing your sources)

**Viewer**
- Compiler C++14 or above
  - Mainly because C++11 does not have the convenient structure list initializer
- CMake
- OS: Linux 64 bits or Windows 10
- External dependencies: OpenGL >= 3.3, CMake
- Internal third-party libraries (snapshotted inside this repository, so just for your information), in particular:
  - `Dear ImGui` (graphical toolkit)
  - `zstd` (compression library)
- Tested compilers: gcc8+, clang7+, VS 2019, VS 2022
- AGPLv3+ License (except the lower-level "platform layer", which is under the MIT License)

**Python remote scripting module**
- CPython 3.7 or above
- `setuptools` Python package
- CMake
- Compiler C++14 or above
- OS: Linux 64 bits or Windows 10
- AGPLv3+ License

## Performance

C/C++ is a language made to squeeze performance out of the hardware. Having an idea of the impact of the instrumentation on it is a must.
For Python, this requirement is weaker because the language is interpreted.

### C++ instrumentation performance

Assessing the characteristics of the instrumentation library is not straightforward, as they depend on the compiler, the configuration, and how the API is used.
However, two kinds of performance indicators are provided. **1) A selection of measurements has been implemented inside the self tests**
The `Palanteer` internal "performance" test suite was run on a laptop with a Core i7-7600U, on both Linux (gcc) and Windows 10 (MSVC). !!! tip Results are approximate; they just give an idea of the real figures.
Code sizes vary with the quantization of the executable sizes. A precise measure should be based on the machine code, but it is harder to automate. The results for Linux (Debian 10, gcc) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none =========================================================================== Assert (simple) code size | 20.0 bytes/assert | Assert code size | 16.0 bytes/assert | Assert+2 integers code size | 80.0 bytes/assert | Compilation speed (-O0) | 1731 event/s | Compilation speed (-O2) | 451 event/s | Event code size (-O2) | 64 bytes/event | Event tracing runtime | 6.1 ns | Palanteer code size - Context switch (-O2) | 12320 bytes | Palanteer code size - Control part (-O2) | 28696 bytes | Palanteer code size - Total (-O2) | 66152 bytes | Palanteer include - USE_PL=0 | 0.006 s | Palanteer include - USE_PL=0 + | 0.237 s | Palanteer include - USE_PL=1 | 0.262 s | Palanteer include - USE_PL=1 + impl. | 0.773 s | =========================================================================== ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The results for Windows 10 (MSVC 2019) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none =========================================================================== Assert (simple) code size | 19.0 bytes/assert | Assert code size | 19.0 bytes/assert | Assert+2 integers code size | 91.0 bytes/assert | Compilation speed (-Od) | 878 event/s | Compilation speed (-O2) | 241 event/s | Event code size (-O2) | 66 bytes/event | Event tracing runtime | 4.3 ns | Palanteer code size - Context switch (-O2) | 2560 bytes | Palanteer code size - Control part (-O2) | 40448 bytes | Palanteer code size - Total (-O2) | 313344 bytes | Palanteer include - USE_PL=0 | 0.009 s | Palanteer include - USE_PL=0 + | 0.318 s | Palanteer include - USE_PL=1 | 0.328 s | Palanteer include - USE_PL=1 + impl. | 0.648 s | =========================================================================== ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Some details on the test: * `abcdefghij` and `i` are integers * The tested assertions are `plAssert(abcdefghij++)` and `plAssert(abcdefghij++, abcdefghij, i)` * The tested event tracing is `plVar(abcdefghij)` **Synthesis of the results** * **Tracing cost**: around 5 ns (very optimistic figure, probably due to the used small loop) for ~65 bytes of code size (on x64 instructions set) * The few nanosecond timing ensures that tracing performances are accurate * Note: it does not mean that the system is able to trace 200 million event/s, the bottleneck is the event processing on server side (see next indicator (2)). * **Compilation speed**: at least ~900 event/s in debug mode. This is acceptable even for an heavily instrumented file, especially compared to the usual C++ compilation speed. * the timings with an optimized build are 4 times slower but are consistent with the global compilation slow down, nothing specific to `Palanteer`. 
* **Include cost on Linux**: * Include of disabled instrumentation library is costless (few milliseconds) * Include of enabled instrumentation library is very close to the cost of the inner inclusion of `#include ` * Using the C++ standard library in a project may fully mask the build cost of including the `Palanteer` instrumentation library * **Include cost on Windows**: * Include of disabled instrumentation library is costless (few milliseconds) * As for Linux, the cost of including the enabled instrumentation library is very close to the cost of the inner inclusion of `#include `, and depending on the other includes of the C++ standard library, this cost may be masked. * The asserts and event tracing code size and duration are roughly consistent across OSes. * the core instrumentation code size is not **2) Another kind of run-time speed measurement is done via the launch of the provided C++ example program "`testprogram perf`" with connection to the viewer.**
This speed measurement is closer to a real use case as it includes the event transmission, the event processing on the server side, and its real-time display by the viewer. The results for Linux (Debian 10 in a VirtualBox VM with gcc, context switch tracing disabled) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none > ./bin/testprogram perf Mode 'connected' Collection duration : 19.22 ms for 1000000 events Collection unit cost: 19 ns Processing duration : 203.85 ms (w/ transmission and server processing) Processing rate : 4.906 million event/s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The results for Windows 10 (with MSVC 2019, context switch tracing disabled) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none λ bin\testprogram.exe perf Mode 'connected' Collection duration : 21.39 ms for 1000000 events Collection unit cost: 21 ns Processing duration : 211.63 ms (w/ transmission and server processing) Processing rate : 4.725 million event/s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!! note Synthesis - The event tracing in a "for loop" is around 25 ns in average ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++ for(int i=0; i<iterationQty; ++i) { plBegin("TestLoop"); plData("Iteration", i); plData("Still to go", iterationQty-i-1); plEnd("TestLoop"); } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - We see that the bottleneck is clearly the server side, by a factor 10 !!! warning Important In case of running out of instrumentation resources, namely free space in event collection buffer or dynamic string from the pre-allocated pool, threads busy-wait until the collection thread recycles them.
An error log "`SATURATION`" is also inserted in the `Palanteer` collection thread to indicate the degradation of the tracing quality.
This shall be fixed by increasing the available resources on the instrumentation side.
### C++ memory usage `Palanteer` has many compile-time configuration variables to dimension the services.
Some of them have minor effects, some of them can lead to a failed assertion when exceeding the maximum resource quantity.
This may seem annoying, but it is the price to pay to remove any interference with the observed program and to give full control over the memory footprint.

!!! In a desktop environment, the default values should let you have a smooth experience without tweaking them.
In an embedded or memory-constrained environment, the compile-time configuration variables will be precious.

All the allocations are done at initialization time in `plInitAndStart`, except for the lookup tracking the string hashes, which can be resized if needed.
The initial size of this lookup is configurable, so it is possible to prevent any reallocation if you have an estimation of the quantity of unique strings that the program will use.
The exact unique string quantity during/after a run is available in the `plStats` structure. All allocations are freed when `plStopAndUninit` is called.

The following table provides the effect of the configuration variables related to memory for the `Palanteer` services:

| Parameter | Description | Saturation effect | Default value | Memory consumption | Memory usage with default values |
| --------- | ----------- | ----------------- | :-----------: | :----------------: | :------------------------------: |
| PL_IMPL_COLLECTION_BUFFER_BYTE_QTY      | Buffer size for event collection       | Busy wait                         | 5000000 | 2.375 * `Value`                                | 11875 KB |
| PL_IMPL_DYN_STRING_QTY                  | Pool size for dynamic strings          | Busy wait                         | 1024    | `PL_DYN_STRING_MAX_SIZE` * `Value`             | 512 KB   |
| PL_DYN_STRING_MAX_SIZE                  | Max length of dynamic strings          | String truncation                 | 512     | (see above)                                    | -        |
| PL_IMPL_REMOTE_REQUEST_BUFFER_BYTE_QTY  | Buffer size for command requests (rx)  | Assertion failure at reception    | 8192    | 2 * `Value`                                    | 16 KB    |
| PL_IMPL_REMOTE_RESPONSE_BUFFER_BYTE_QTY | Buffer size for command responses (tx) | Bad command status                | 8192    | 3 * `Value`                                    | 24 KB    |
| PL_IMPL_STRING_BUFFER_BYTE_QTY          | Buffer size for new strings (tx)       | Sending in multiple batches       | 8192    | 1 * `Value`                                    | 8 KB     |
| PL_IMPL_MAX_EXPECTED_STRING_QTY         | Initial size for unique string lookup  | Reallocation                      | 4096    | 16 * first power of two above `Value`          | 64 KB    |
| PL_IMPL_MAX_CLI_QTY                     | Pool size for registered CLIs          | Assertion failure at registration | 128     | (64+16*`PL_IMPL_CLI_MAX_PARAM_QTY`) * `Value`  | 24 KB    |
| PL_IMPL_CLI_MAX_PARAM_QTY               | Maximum parameter qty per CLI          | Assertion failure at registration | 8       | (see above)                                    | -        |
| **TOTAL**                               |                                        |                                   |         |                                                | 12513 KB |

!!! warning This table does not take into account:
    * the `Palanteer` core code size (see the [section above](#c++instrumentationperformance) for an estimation for the x64 instruction set)
    * the `Palanteer` core context (a few KB)
    * the static strings in the program under observation. Using the external string feature fully cuts out this cost.

With the default values, the main cost is by far the event collection buffers. Their size allows a peak rate of ~100 K events per collection period (around ~5 ms), so more than 15 million events/second on average.
However, only the peak rate matters: if you have a burst of events within a few milliseconds which exceeds the buffer size, your observation will be damaged by some busy-waiting.
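These constants are plain compile-time settings; the sketch below shows how they could be overridden, assuming (as for the other `PL_*` flags in this document) that they are defined before the `PL_IMPLEMENTATION` include or passed with `-D` on the compilation line. The values are arbitrary examples:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++
// Sketch: reduce the memory footprint of the instrumentation (values are arbitrary examples)
#define PL_IMPL_COLLECTION_BUFFER_BYTE_QTY 1000000   // Default is 5000000 (see the table above)
#define PL_IMPL_DYN_STRING_QTY             256       // Default is 1024
#define PL_IMPLEMENTATION 1
#include "palanteer.h"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~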
If your event peak rate is smaller, the buffer size can be reduced.

### Python instrumentation performance

As for C++, the Python instrumentation run-time speed can be estimated by launching the provided Python example program "`./python/testprogram/testprogram.py perf`" with a connection to the viewer.
This example program is very similar to the C++ one, as their purposes are the same: be a showcase and stimulate all instrumentation APIs.
The main difference is that the (artificial) computation loops are reduced by a factor of 10, so times to completion cannot be compared.
As for the C++ experiment, this one includes the event transmission, the event processing on the server side, and its real-time display by the viewer. The results for Linux (Debian 10) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none > ./testprogram.py perf Collection duration : 723.55 ms for 1000000 events Collection unit cost: 724 ns Processing duration : 746.86 ms (w/ transmission and server processing) Processing rate : 1.339 million event/s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The results for Windows 10 (with MSVC 2019) are: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ none λ .\testprogram.py perf Mode 'connected' Collection duration : 557.69 ms for 1000000 events Collection unit cost: 558 ns Processing duration : 579.26 ms (w/ transmission and server processing) Processing rate : 1.726 million event/s ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!! note Synthesis - The event tracing in a "for loop" is around 700 ns in average ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C++ for i in range(loopQty): plBegin("TestLoop") plData("Iteration", i) plData("Still to go", loopQty-i-1) plEnd("TestLoop") } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - We see that the bottleneck is the instrumentation side, for both OS - Python is an interpreted language, and despite the fact that most of the tracing task is done in the C extension library, it has a cost - Windows results are ~30% faster than on Linux, this is probably due to the lack of CPU on Linux VM (software GL renderer taking at least 1 CPU) - The tracing rate of ~1.5 million event/s looks acceptable, but remember that it is program-bounded and the program shall do something else than tracing... ## Third party dependencies The instrumentation libraries have only very common dependencies which are installed with most OS: - a C++11 (or above) compiler - CPython 3.7+ with setuptools (Python instrumentation only) The `Palanteer` viewer and scripting module have a bit more requirements (C++14 and only Linux 64 bits or Windows 10), but still very common.
Its additional dependencies, listed below, are already snapshotted inside this repository, so no particular action is required at installation time.

| Dependency name                  | License type                | URL                                             | Used by           | Location in the project      |
|----------------------------------|-----------------------------|-------------------------------------------------|-------------------|------------------------------|
| Khronos OpenGL API and Extension | MIT                         | https://www.khronos.org/registry/OpenGL/api/GL  | Viewer            | server/external/             |
| Dear ImGui                       | MIT                         | https://github.com/ocornut/imgui                | Viewer            | server/external/imgui        |
| stb                              | Public domain               | https://github.com/nothings/stb                 | Viewer            | server/external/stb*.h       |
| Fonts 'Roboto-Medium.ttf'        | Apache License, Version 2.0 | https://fonts.google.com/specimen/Roboto        | Viewer            | server/viewer/vwFontData.cpp |
| ZStandard                        | BSD                         | https://facebook.github.io/zstd                 | Viewer, scripting | server/external/zstd         |
| Markdeep                         | BSD                         | https://casual-effects.com/markdeep             | Documentation     | doc/                         |

!!! NOTE No, the "top of tree" versions are not always the best to use...
Snapshotting dependencies means:
- easier installation and build, as these dependencies are already resolved without any user action
- compatibility and global functionality are verified by the project maintainer
- more stability through time, as no uncontrolled external change is injected into the project at random moments
  - Using a recent version of a dependency may indeed fix some bugs in it, but it is also likely to introduce some new and nasty ones for the project.
  - Not even mentioning changes of the APIs or their behavior...

## Licenses

`Palanteer` uses two different licenses, depending on the components and their inherent constraints:
1. Instrumentation libraries shall preserve the user's freedom to distribute their program in closed source
1. Improvements on the server tooling side shall benefit the community

!!! To remove any ambiguity, each folder contains the associated license and each file has a license header.

The broad lines are:
- `./c++`, `./python` and `./tools` are under the **MIT license**
  - These folders contain the instrumentation libraries and helper tools
  - This permissive license preserves developers' rights to distribute their software, even if delivered with instrumentation (modified or not).
- `./server/base` is also under the **MIT license**
  - As an exception on the server side, the code in this folder, if useful, can be reused in closed-source projects.
- `./server/common`, `./server/viewer` and `./python/python_scripting` are under the **AGPL v3+ license**
  - These parts shall benefit the community (i.e. sources must be shared if a derivative is distributed) while remaining free to use and modify
  - The "Affero" version of the GPL was naturally chosen to also cover the case of distribution over a network

!!! Tip To be extra clear:
    - when instrumenting a program with `Palanteer`, there is **no constraint to open the source code** of a program if distributed (the instrumentation is under the MIT license)
    - on the other hand, any modification or reuse of the server code (viewer or scripting module) is subject to the AGPL v3+ license, if distributed

@@ Getting started

@@ Base concepts

@@ C++ Instrumentation API

@@ C++ Instrumentation configuration

@@ Python Instrumentation API

@@ Scripting API

@@ More