Added the settings view to the API check script: both the view interface (column
names and types) and the values (SELECT * FROM pg_stat_monitor_settings)
are now compared between branches.
To check whether a bucket has expired, the get_next_wbucket() function
compared the time elapsed since the last bucket change using the
following line:

    while ((current_usec - current_bucket_usec) > (PGSM_BUCKET_TIME * 1000 * 1000))
The problem is that the expression compares a uint64 (current_usec)
with an int32 (PGSM_BUCKET_TIME). If a user configures a value T for the
pgsm_bucket_time GUC such that T * 1000 * 1000 overflows the int32 range
(T * 1000 * 1000 > 2**31 - 1), the result is a negative integer; when
converted to uint64 for the comparison it becomes a very large value, so
the expression always evaluates to false and the bucket number is never
updated.
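
A minimal standalone sketch of the failure mode, assuming PGSM_BUCKET_TIME
holds a user-supplied value T (the variable names and values here are
illustrative, not the actual pg_stat_monitor code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t  pgsm_bucket_time = 3000;   /* hypothetical T large enough that
                                               T * 1000 * 1000 > 2**31 - 1 */
        uint64_t current_usec = 5000ULL * 1000 * 1000;
        uint64_t current_bucket_usec = 0;

        /* Buggy: the int32 product overflows (formally undefined behavior;
         * in practice it wraps negative), and the comparison converts the
         * negative result to a huge uint64, so the condition is never true. */
        if ((current_usec - current_bucket_usec) >
            (pgsm_bucket_time * 1000 * 1000))
            printf("buggy check fires\n");              /* never printed */

        /* Fixed: widen to uint64 before multiplying. */
        if ((current_usec - current_bucket_usec) >
            ((uint64_t) pgsm_bucket_time * 1000 * 1000))
            printf("fixed check fires\n");              /* printed */

        return 0;
    }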
When querying the pg_stat_monitor view, every entry is checked for
expiration by calling IsBucketValid(bucket_number). Using the entry's
bucket number, the function computes whether the time elapsed since the
bucket started, taken from the shared global variable
pgss->bucket_start_time[bucket_id], is still within the bucket's
lifetime. Since pgss->bucket_start_time is not properly initialized in
get_next_wbucket(), IsBucketValid() always returns false, so no entries
are listed in the view.
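
A simplified stand-in for the check described above, assuming the start
time is kept as a struct timeval (is_bucket_valid and its parameters are
illustrative, not the actual pg_stat_monitor signature):

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/time.h>

    static bool
    is_bucket_valid(const struct timeval *bucket_start_time,
                    long bucket_lifetime_secs)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        /* If the start time was never set, it is still zero (the Unix
         * epoch), so the elapsed time always exceeds the bucket lifetime
         * and every entry looks expired. */
        return (now.tv_sec - bucket_start_time->tv_sec) < bucket_lifetime_secs;
    }

    int main(void)
    {
        struct timeval never_set = {0, 0};

        /* Prints 0: a zeroed (uninitialized) start time is never valid. */
        printf("valid = %d\n", is_bucket_valid(&never_set, 60));
        return 0;
    }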
Some queries were taking less than one millisecond to execute. Since we
store CPU time values in milliseconds in a float, a value of 0.1 (a tenth
of a millisecond) cast to int64 has its fractional part truncated, making
the result zero.
To fix the problem, the cast to (int64) was removed from the average CPU
time calculation when updating entry statistics in the
pgss_update_entry() function.
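
A standalone illustration of the truncation (the numbers are made up):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        double cpu_time_ms = 0.1;           /* a tenth of a millisecond */

        /* Buggy: casting to int64 drops the fractional part. */
        int64_t truncated = (int64_t) cpu_time_ms;

        /* Prints "truncated = 0, kept as float = 0.1". */
        printf("truncated = %lld, kept as float = %.1f\n",
               (long long) truncated, cpu_time_ms);
        return 0;
    }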
This test ensures that pg_stat_monitor moves only pending queries to the
next bucket when transitioning to a new one, and that finished queries
stay in the original bucket.
This commit brings the following changes:
1) Implementation of TAP-based testing using Perl for scenarios that cannot
be covered by traditional SQL-based diff testing or that require server
start/shutdown. At this point in time, TAP testing is only enabled for
Postgres 14 & 13; for the rest of the back branches it will be done at a
later time, as there is a substantial change in the number of columns and
their names.
2) Changes to the GitHub Actions workflows for Postgres 14 & 13 to accommodate
the requirements of TAP testing.
3) Similarly, minor changes to the Makefile.
4) Testing of supported GUCs using TAP tests for the different possible
configurations.
5) pg_stat_monitor_reset_errors testing using the TAP test cases.
6) Insufficient shared space and buffer overflow testing via TAP test cases.
7) Some SQL scripts under the 'scripts' folder to generate the workload
required for the TAP test cases.
8) Everything under the 't' folder is specific to the Perl-based test cases.
It houses the Perl files and folders for expected files and results.
9) 90%+ code coverage for lines of code and functions.
10) PG-339 fix by Diego, change in pgsm_errors.c.
Fixed a small typo introduced by commit
eaa9c08e24506529a317c20d419f33304c0d0918 (PG-316: Convert blocks
timings to average per buckets): the e->counters.sysinfo.utime = and
e->counters.sysinfo.stime = operations should be += to accumulate the
results and calculate the average correctly.
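
A minimal sketch of why '=' breaks the average (the struct layout is
assumed from the field names above):

    #include <stdio.h>

    static struct { double utime, stime; } sysinfo;     /* assumed layout */

    static void
    update_entry(double utime, double stime)
    {
        sysinfo.utime += utime;     /* was '=', which kept only the last sample */
        sysinfo.stime += stime;
    }

    int main(void)
    {
        update_entry(1.0, 0.5);
        update_entry(3.0, 1.5);
        /* Average over the 2 calls: utime 2.0, stime 1.0. With '=' instead
         * of '+=' the totals would wrongly be just 3.0 and 1.5. */
        printf("avg utime = %.1f, avg stime = %.1f\n",
               sysinfo.utime / 2, sysinfo.stime / 2);
        return 0;
    }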
After fixing the problem with utility statements, this whole block:
do $$
declare
n integer := 1;
begin
loop
PERFORM a,b,c,d FROM t1, t2, t3, t4
WHERE t1.a = t2.b AND t3.c = t4.d ORDER BY a;
exit when n = 1000;
n := n + 1;
end loop;
end $$;
is only processed once, as those are nested statements. In order to
match the 1000 statements, the GUC pg_stat_monitor.track must be set to
'all' and then back to the default of 'top' when done testing.
There was a missing increment/decrement of exec_nested_level in the
pgss_ProcessUtility hook. Due to this, some utility statements could
end up being processed more than once, as PostgreSQL may recurse into
this hook for sub-statements or when processing a query string
containing multiple semicolon-separated statements.
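
A standalone sketch of the nesting guard (the real hook is PostgreSQL's
ProcessUtility hook; process_utility here is a simplified stand-in):

    #include <stdio.h>

    static int exec_nested_level = 0;

    static void
    process_utility(const char *stmt, int sub_statements)
    {
        exec_nested_level++;        /* the missing increment */

        /* Only record top-level statements; recursive invocations for
         * sub-statements see exec_nested_level > 1 and are skipped. */
        if (exec_nested_level == 1)
            printf("tracked: %s\n", stmt);

        while (sub_statements-- > 0)
            process_utility("sub-statement", 0);

        exec_nested_level--;        /* the missing decrement */
    }

    int main(void)
    {
        process_utility("CREATE TABLE t (a int)", 2);   /* tracked once */
        return 0;
    }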
modified: docs/USER_GUIDE.md
new file: docs/_images/PPG_links.png
new file: docs/_images/percona-favicon.ico
new file: docs/_images/percona-logo.svg
new file: docs/_images/percona_favicon.ico
new file: docs/css/percona.css
new file: docs/css/toctree.css
new file: docs/index.md
new file: docs/js/version-select.js
new file: docs/setup.md
new file: mkdocs.yml
Updated the forum URL to point to the dedicated
pg_stat_monitor forum in `README.md` and
`CONTRIBUTING.md`, and removed the link to the
Discord channel to keep conversations focused
in one location. Updated the table of contents in
the README.
Signed-off-by: Lenz Grimmer <lenz.grimmer@percona.com>
This commit adds the following three SQL-based test cases:
1) Test the unique application name set by the user.
2) Test that the histogram function works as desired.
3) Test that an error on insert is shown with a proper message.
After the split into multiple pg_stat_monitor--1.0.XX.sql.in files,
where XX is the PostgreSQL version, the errors view was not added to the
relevant files; this commit fixes that.
Added a link to the pg_stat_monitor view reference in the Overview.
Added PG 14 to supported versions.
Updated links in the Documentation section.
modified: README.md
If a query exceeds pg_stat_monitor.pgsm_query_max_len, it is truncated
before we save it into the query buffer (SaveQueryText).
When reading the query back, in pg_stat_monitor_internal, we allocated a
buffer for the query with length = pg_stat_monitor.pgsm_query_max_len.
The problem is that the read_query function adds a '\0' to the end of
the buffer when reading a query, so if a query has been truncated, for
example, to 1024 bytes, read_query stores the '\0' at position 1025, out
of the array bounds.
Then, when we call pfree to release the buffer, PostgreSQL notices the
buffer overrun and triggers an assertion error; the assertion calls our
error hook, which attempts to acquire the shared pgss->lock before
pg_stat_monitor_internal has released it, leading to a deadlock.
To avoid the problem, we allocate 1 more byte for the extra '\0' in the
palloc calls for query_text and parent_query_text.
Also, we now release the lock before calling pfree, so that if
PostgreSQL finds a problem in pfree we won't deadlock again and the
error gets reported correctly.
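
A standalone sketch of the off-by-one and the fix (this read_query is a
simplified stand-in for the real function; only the allocation size is
the point):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Copies up to max_len bytes and then appends a '\0'; when the source
     * is truncated, the '\0' lands at index max_len, so the destination
     * must be max_len + 1 bytes long. */
    static void
    read_query(char *dst, const char *src, size_t max_len)
    {
        size_t n = strlen(src);

        if (n > max_len)
            n = max_len;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }

    int main(void)
    {
        const size_t max_len = 1024;
        char         longq[2048];
        char        *query_text = malloc(max_len + 1);  /* the fix: +1 byte */

        memset(longq, 'x', sizeof(longq) - 1);
        longq[sizeof(longq) - 1] = '\0';

        read_query(query_text, longq, max_len);
        printf("stored %zu bytes\n", strlen(query_text));
        free(query_text);
        return 0;
    }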
If a backend changed the application name during execution,
pg_stat_monitor would fail to read the updated value, as it caches
the result in order to avoid calling the expensive functions
pgstat_fetch_stat_numbackends() and pgstat_fetch_stat_local_beentry().
A workaround was found: we can simply read a GUC exported by the
PostgreSQL backend itself, namely application_name from utils/guc.h,
saving us from having to call those expensive functions.
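
A sketch in extension context, assuming only that utils/guc.h exports
application_name as the commit text states (pgsm_current_app_name is an
illustrative helper, not the actual pg_stat_monitor function):

    #include "postgres.h"
    #include "utils/guc.h"      /* declares: extern char *application_name */

    /* Reads the backend's current application_name directly from the
     * exported GUC, with no call to pgstat_fetch_stat_numbackends() or
     * pgstat_fetch_stat_local_beentry(). */
    static const char *
    pgsm_current_app_name(void)
    {
        return application_name ? application_name : "";
    }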
We are only concerned with finished queries in pg_stat_monitor, so the
state and state_code columns were removed from the pg_stat_monitor view,
as only finished queries are listed in it.
The state regression test was removed, as it is no longer necessary.
This also allowed us to remove the call to pgss_store in ExecutorStart,
which merely updated the query state to EXEC, saving some CPU.