Commit Graph

4153 Commits (a75ba3f896ba0b36340bcea9cda4caa77ddd0477)

Author SHA1 Message Date
Jelte Fennema 92689a8362
Make GPIDs work with pg_dist_poolinfo (#6588)
The original implementation of GPIDs didn't work correctly when using
`pg_dist_poolinfo` together with PgBouncer. The reason is that it
assumed that once a connection was made to a worker, the originating
GPID should stay the same for ever. But when pg_dist_poolinfo is used
this isn't the case, because the same connection on the worker might be
used by different backends of the coordinator.

This fixes that issue by updating the GPID whenever a new application
name is set on a connection. This is the only thing that's needed,
because PgBouncer already sets the application name correctly on the
server connection whenever a client is updated.
2023-01-13 14:39:19 +00:00
Marco Slot ad3407b5ff
Revert "Make the metadata syncing less resource invasive [Phase-1]" (#6618)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2023-01-13 13:56:55 +01:00
Ahmet Gedemenli ed6dd8b086
Mark 0/0 lsn results as null (#6617)
Marks `source_lsn` and `target_lsn` fields as null if the result is 0/0
2023-01-13 15:33:43 +03:00
Emel Şimşek 28ed013a91
Enable ALTER TABLE ... ADD CHECK (#6606)
DESCRIPTION: Enable adding CHECK constraints on distributed tables
without the client having to provide a constraint name.

This PR enables the following command syntax for adding check
constraints to distributed tables.
 ALTER TABLE ... ADD CHECK ... 

by creating a default constraint name and transforming the command into
the below syntax before sending it to workers.

ALTER TABLE ... ADD CONSTRAINT \<conname> CHECK ...
2023-01-12 23:31:06 +03:00
Ahmet Gedemenli 9b9d8e7abd Use shard transfer UDFs with node ids for rebalancing 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e5fef40c06 Introduce citus_move_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Ahmet Gedemenli e19c545fbf Introduce citus_copy_shard_placement UDF with nodeid 2023-01-12 16:57:51 +03:00
Emel Şimşek d322f9e382
Handle DEFERRABLE option for the relevant constraints at deparser. (#6613)
Table Constraints UNIQUE, PRIMARY KEY and EXCLUDE may have option
DEFERRABLE in their command syntax. This PR handles the option when
deparsing the relevant constraint statements.

NOT DEFERRABLE 
and
INITIALLY IMMEDIATE (if DEFERRABLE}

are the default values for the option so we only append the non-default
values to the alter table statement.
2023-01-12 12:32:38 +03:00
Jelte Fennema 34df853bda
Fix bug introduced by #6412 (#6590)
In #6412 I made a change to not re-assign the global PID if it was
already set. This inadvertently introduced a regression where `userId`
and `databaseId` would not be set on the backend data when the global
PID was assigned in the authentication hook.

This fixes it by doing two things:
1. Removing `userId` from `BackendData`, since it's not used anywhere
   anyway.
2. Move assignment of `databaseId` to dedicated
   `SetBackendDataDatabaseId` function, that isn't a no-op when global
   pid is already set.

Since #6412 is not released yet this does not need a description.
2023-01-10 16:21:57 +01:00
Jelte Fennema c2b4087ff0
Quote all identifiers that we use for logical replication (#6604)
In #6598 it was noticed that Citus could generate syntactically invalid
statements during logical replication. With #6603 we resolved the direct
issue, by only generating valid subscription names. But there was also
the underlying problem that we did not escape certain identifier
strings. While in theory this should be okay since we should only
generate names that are valid, this issue reiterated that we should not
take this for granted. As an extra line of defense this quotes all
identifiers we use during logical replication setup.
2023-01-06 14:12:03 +00:00
Jelte Fennema 44e09128f0
Fix failures in mx_base_schedule (#6601)
Apparently no-one actually ran the mx_base_schedule, because the tests
in schedule itself were already failing. This updates it to be in line
with multi_mx_schedule again to make the tests pass again. Notably it
doesn't contain multi_mx_node_metadata and multi_extension. Because
those tests take long to run and the were not necessary to make
multi_mx_create_table pass again.
2023-01-06 14:48:18 +01:00
Ahmet Gedemenli 26b170e1a8
Use %u instead of %i for naming subscriptions & roles (#6603)
DESCRIPTION: Fix the modifier for subscription and role creation

fixes: #6598 
Reported by @ivyazmitinov
2023-01-06 14:38:32 +01:00
Ahmet Gedemenli bc3383170e
Fix crash when trying to replicate a ref table that is actually dropped (#6595)
DESCRIPTION: Fix crash when trying to replicate a ref table that is actually dropped

see #6592 
We should have a real solution for it.
2023-01-06 14:52:08 +03:00
Emel Şimşek db7a70ef3e
Enable ALTER TABLE ... ADD UNIQUE and ADD EXCLUDE. (#6582)
DESCRIPTION: Adds support for creating table constraints UNIQUE and
EXCLUDE via ALTER TABLE command without client having to specify a name.

ALTER TABLE ... ADD CONSTRAINT <conname> UNIQUE ...
ALTER TABLE ... ADD CONSTRAINT <conname> EXCLUDE ...

commands require the client to provide an explicit constraint name.
However, in postgres it is possible for clients not to provide a name
and let the postgres generate it using the following commands

ALTER TABLE ... ADD UNIQUE ...
ALTER TABLE ... ADD EXCLUDE ...

This PR enables the same functionality for citus tables.
2023-01-05 18:12:32 +03:00
Emel Şimşek 135c519c62
Fix flakyness for multi_name_lengths test (#6599)
Fix flakyness for multi_name_lengths test.
2023-01-05 17:27:16 +03:00
Önder Kalacı eb75decbeb
Undo planner extended statistics override (#6492) 2023-01-04 13:25:57 +01:00
Önder Kalacı a1aa96b32c
Make the metadata syncing less resource invasive [Phase-1] (#6537) 2023-01-04 11:36:45 +01:00
Ahmet Gedemenli 235047670d
Drop SHARD_STATE_TO_DELETE (#6494)
DESCRIPTION: Drop `SHARD_STATE_TO_DELETE` and use the cleanup records
instead

Drops the shard state that is used to mark shards as orphaned. Now we
insert cleanup records into `pg_dist_cleanup` so "orphaned" shards will
be dropped either by maintenance daemon or internal cleanup calls. With
this PR, we make the "cleanup orphaned shards" functions to be no-op, as
they would not be needed anymore.

This PR includes some naming changes about placement functions. We don't
need functions that filter orphaned shards, as there will be no orphaned
shards anymore.

We will also be introducing a small script with this PR, for users with
orphaned shards. We'll basically delete the orphaned shard entries from
`pg_dist_placement` and insert cleanup records into `pg_dist_cleanup`
for each one of them, during Citus upgrade.

We also have a lot of flakiness fixes in this PR.

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2023-01-03 14:38:16 +03:00
Jelte Fennema f56904fe04
Fix flakyness in isolation_insert_vs_vacuum (#6589)
Sometimes our `isolation_insert_vs_vacuum` test would fail like this.

```diff
 step s2-vacuum-analyze:
     VACUUM ANALYZE test_insert_vacuum;
-
+ <waiting ...>
 step s1-commit:
     COMMIT;

+step s2-vacuum-analyze: <... completed>

```

The reason seems to be that VACUUM ANALYZE tries to take some locks that
conflict with the other transaction, but these locks somehow get
released or VACUUM ANALYZE stops waiting for them. This is somewhat
expected since VACUUM has some special locking logic.

To solve the flakyness we now trigger VACUUM ANALYZE to always report as
blocking and after that we wait explicitly wait for it to complete. This
is done
like is suggested by the flaky test tips from postgres: 

c68a183990/src/test/isolation/README (L152)

I've confirmed that this fixes the issue suing our flaky-test-debugging
CI workflow.
2023-01-02 16:51:58 +01:00
Ahmet Gedemenli 0c74e4cc0f
Fix some flaky tests (#6587)
Fix for some simple flakiness'.
All `DROP USER' and cleanup function calls.
2022-12-29 10:19:09 +03:00
Ahmet Gedemenli f824c996b3
Remove duplicate split entry in run_test.py (#6586)
Nothing important. Removing the duplicate `"split" in test_schedule`
check
2022-12-28 18:18:24 +03:00
Ahmet Gedemenli 1b1e737e51
Drop cleanup on failure (#6584)
DESCRIPTION: Defers cleanup after a failure in shard move or split

We don't need to do a cleanup in case of failure on a shard transfer or
split anymore. Because,
* Maintenance daemon will clean them up anyway.
* We trigger a cleanup at the beginning of shard transfers/splits. 
* The cleanup on failure logic also can fail sometimes and instead of
the original error, we throw the error that is raised by the cleanup
procedure, and it causes confusion.
2022-12-28 15:48:44 +03:00
Ahmet Gedemenli cfc17385e9
Some minor improvements on flakiness detection (#6585)
* Skip some exceptional test files in the flaky workflow, like
multi_extension
* Run some tests without a schedule, like single_node_enterprise
* Use minimal schedule for the tests in split and operations schedules
2022-12-28 15:08:39 +03:00
Ahmet Gedemenli eba9abeee2
Fix leftover shard copy on the target node when tx with move is aborted (#6583)
DESCRIPTION: Cleanup the shard on the target node in case of a
failed/aborted shard move

Inserts a cleanup record for the moved shard placement on the target
node. If the move operation succeeds, the record will be deleted. If
not, it will remain there to be cleaned up later.

fixes: #6580
2022-12-27 22:42:46 +03:00
Naisila Puka e937935935
Clean up normalize file (#6578) 2022-12-26 12:08:27 +03:00
Naisila Puka bc49616426
Fix link of path to flaky tests docs (#6579) 2022-12-23 12:09:22 +03:00
Ahmet Gedemenli 96bf648d1a Unify dependent test files into one 2022-12-22 13:06:26 +03:00
Ahmet Gedemenli acf3539a90 Fix split schedule 2022-12-22 13:06:26 +03:00
Hanefi Onaldi 303db172f8
Use Citus version comparison in upgrade tests, not equality (#6568)
We have several version checks in our Citus upgrade tests. However, as
we drop support for PG versions, we need to update the Citus versions
used in our CI images. Therefore we must compare Citus versions in our
tests instead of using equality checks so that the queries are ran in
all the associated Citus versions.

For example, we have many conditionals where we early exit if the Citus
version is not equal to 9.0. However, as of today we never use version
9.0 and thus we always early exit in those tests.
2022-12-21 14:01:57 +03:00
Teja Mupparti 9a9989fc15 Support MERGE Phase – I
All the tables (target, source or any CTE present) in the SQL statement are local i.e. a merge-sql with a combination of Citus local and
Non-Citus tables (regular Postgres tables) should work and give the same result as Postgres MERGE on regular tables. Catch and throw an
exception (not-yet-supported) for all other scenarios during Citus-planning phase.
2022-12-18 20:32:15 -08:00
Emel Şimşek 5268d0a6cb
Enable PRIMARY KEY generation via ALTER TABLE even if the constraint name is not provided (#6520)
DESCRIPTION: Support ALTER TABLE .. ADD PRIMARY KEY ... command

Before processing
	> **ALTER TABLE ... ADD PRIMARY KEY ...**
command

1. 	Create a primary key name to use as the constraint name.
2. Change the **ALTER TABLE ... ADD PRIMARY KEY ...** command to into
**ALTER TABLE ... ADD CONSTRAINT \<constraint name> PRIMARY KEY ...**
form.
This is the only form we can specify a name for a primary key. If we run
ALTER TABLE .. ADD PRIMARY KEY, postgres
would create a constraint name internally in its own scheme. But the
problem is that we need to create constraint names
for shards in our own scheme which is \<constraint name>_\<shardid>.
Hence we need to create a name and send it to workers so that the
workers can append the shardid.
4. Run the changed command on the coordinator to make sure we are using
the same constraint name across the board.
5. Send the changed command to workers such that it is executed for the
main table as well as for the shards.

Fixes #6515.
2022-12-16 20:34:00 +03:00
aykut-bozkurt 9c0073ba57
remove unused boundary type (#6563)
Removes unused job boundary tag `SUBQUERY_MAP_MERGE_JOB`.

Only usage is at `BuildMapMergeJob`, which is only called when the
boundary = `JOIN_MAP_MERGE_JOB`. Hence, it should be safe to remove.
2022-12-16 18:19:22 +03:00
Onder Kalaci feb5534c65 Do not create additional WaitEventSet for RemoteSocketClosed checks
Before this commit, we created an additional WaitEventSet for
checking whether the remote socket is closed per connection -
only once at the start of the execution.

However, for certain workloads, such as pgbench select-only
workloads, the creation/deletion of the additional WaitEventSet
adds ~7% CPU overhead, which is also reflected on the benchmark
results.

With this commit, we use the same WaitEventSet for the purposes
of checking the remote socket at the start of the execution.

We use "rebuildWaitEventSet" flag so that the executor can re-use
the existing WaitEventSet.

As a result, we see the following improvements on PG 15:

main			 : 120051 tps, 0.532 ms latency avg.
avoid_wes_rebuild: 127119 tps, 0.503 ms latency avg.

And, on PG 14, as expected, there is no difference

main			 : 129191 tps, 0.495 ms latency avg.
avoid_wes_rebuild: 129480 tps, 0.494 ms latency avg.

But, note that PG 15 is slightly (~1.5%) slower than PG 14.
That is probably the overhead of checking the remote socket.
2022-12-14 22:42:55 +01:00
Onder Kalaci d52da55ac0 Move WaitEvent to DistributedExecution
Prep. for caching WaitEventsSet/WaitEvents
2022-12-14 21:59:19 +01:00
Nils Dijk b5b73d78c3
add prepare and finish pg upgrade functions to 11.2-1 (#6560)
Fixes a missed include in #6315.

While adding the cluster clock we have added some extra steps to
`citus_prepare_pg_upgrade` and `citus_finish_pg_upgrade`. These changes
were not added to the citus upgrade and downgrade scripts, this allowed
for a syntax error to slip in.

This PR adds the new versions of both UDF's to the upgrade script while
adding the old version to the downgrade script. This exposed the syntax
error which is also solved.
2022-12-14 12:34:22 +01:00
Gokhan Gulbiz 556161be32
Fix make recipe mapping in test runner (#6561) 2022-12-14 12:57:13 +03:00
aykut-bozkurt 8be4ce546e
fix vanilla test status on CI (#6555)
- Because of the make command used for vanilla tests, test status is
always shown as success on CI. As a fix, I added `&& false` at the end
of the copying diff file to make the command fail when check-vanilla
fails.
```make
check-vanilla: all
	$(pg_regress_multi_check) --vanillatest || (cp $(vanilla_diffs_file) $(citus_abs_srcdir)/regression.diffs && false)
```

- I also fixed some vanilla tests that fails due to recently added clock
related operators shown up at some queries.
2022-12-13 11:15:47 +03:00
Gürkan İndibay 3f091e3493
Give nicer error message when using alter_table_set_access_method on a view (#6553)
DESCRIPTION: Fixes alter_table_set_access_method error for views.

Fixes #6001
2022-12-12 23:56:22 +03:00
aykut-bozkurt 1ad1a0a336
add citus_task_wait udf to wait on desired task status (#6475)
We already have citus_job_wait to wait until the job reaches the desired
state. That PR adds waiting on task state to allow more granular
waiting. It can be used for Citus operations. Moreover, it is also
useful for testing purposes. (wait until a task reaches specified state)

Related to #6459.
2022-12-12 22:41:03 +03:00
aykut-bozkurt 3da6e3e743
bgworkers with backend connection should handle SIGTERM properly (#6552)
Fixes task executor SIGTERM handling.

Problem:
When task executors are sent SIGTERM, their default handler
`bgworker_die`, which is set at worker startup, logs FATAL error. But
they do not release locks there before logging the error, which
sometimes causes hanging of the monitor. e.g. Monitor waits for the lock
forever at pg_stat flush after calling proc_exit.

Solution:
Because executors have connection to backend, they should handle SIGTERM
similar to normal backends. Normal backends uses `die` handler, in which
they set ProcDiePending flag and the next CHECK_FOR_INTERRUPTS call
handles it gracefully by releasing any lock before termination.
2022-12-12 16:44:36 +03:00
dependabot[bot] f6b8990fc7
Bump certifi from 2022.9.14 to 2022.12.7 in /src/test/regress (#6554)
Bumps [certifi](https://github.com/certifi/python-certifi) from
2022.9.14 to 2022.12.7.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-12 14:13:08 +01:00
Gokhan Gulbiz e2a73ad8a8
Flaky Test Detection CI Workflow (#6495)
This PR adds a new CI workflow named ```flaky-test``` to run flaky test
detection on newly introduced regression tests.

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
2022-12-12 14:36:23 +03:00
Ahmet Gedemenli 190307e8d8
Wait for cleanup function (#6549)
Adding a testing function `wait_for_resource_cleanup` which waits until
all records in `pg_dist_cleanup` are cleaned up. The motivation is to
prevent flakiness in our tests, since the `NOTICE: cleaned up X orphaned
resources` message is not consistent in many cases. This PR replaces
`citus_cleanup_orphaned_resources` calls with
`wait_for_resource_cleanup` calls.
2022-12-08 13:19:25 +03:00
Teja Mupparti cbb33167f9 Fix the flaky test in clock.sql 2022-12-07 09:47:35 -08:00
Ahmet Gedemenli 989a3b54c9
Disable maintenance daemon for cleanup test (#6551)
Disabling the cleanup in maintenance daemon, to prevent a flaky test.
2022-12-07 20:28:55 +03:00
Onur Tirtir b177975371 Add new regression tests 2022-12-07 18:27:50 +03:00
Onur Tirtir 2803470b58 Add lateral join checks for outer joins and drop the useless ones for semi joins 2022-12-07 18:27:50 +03:00
Onur Tirtir e7e4881289 Phase - III: recursively plan non-recurring sub join trees too 2022-12-07 18:27:50 +03:00
Onur Tirtir f52381387e Phase - II: recursively plan non-recurring subqueries too 2022-12-07 18:27:50 +03:00
Onur Tirtir f339450a9d Phase - I: recursively plan non-recurring relations 2022-12-07 18:27:50 +03:00
Ahmet Gedemenli 3cc5d9842a
Remove IF EXISTS from cleanup on failure test for subscription object (#6547)
Nothing critical. Just improving a DROP SUBSCRIPTION test for a cleanup
after failure scenario.
2022-12-07 17:51:36 +03:00
Ahmet Gedemenli cb02d62369
Unique names for replication artifacts (#6529)
DESCRIPTION: Create replication artifacts with unique names

We're creating replication objects with generic names. This disallows us
to enable parallel shard moves, as two operations might use the same
objects. With this PR, we'll create below objects with operation
specific names, by appending OparationId to the names.

* Subscriptions
* Publications
* Replication Slots
* Users created for subscriptions
2022-12-06 15:48:16 +03:00
Teja Mupparti e14dc5d45d Address the issues/comments from the original PR# 6315
1) Regular users fail to use clock UDF with permission issue.
2) Clock functions were declared as STABLE, whereas by definition they are VOLATILE. By design, any clock/time
   functions will return different results for each call even within a single SQL statement.

Note: UDF citus_get_transaction_clock() is a misnomer as it internally calls the clock tick which always returns
      different results for every invocation in the same transaction.
2022-12-05 11:06:21 -08:00
aykut-bozkurt 65f256eec4
* add SIGTERM handler to gracefully terminate task executors, \ (#6473)
Adds signal handlers for graceful termination, cancellation of
task executors and detecting config updates. Related to PR #6459.

#### How to handle termination signal?
Monitor need to gracefully terminate all running task executors before
terminating. Hence, we have sigterm handler for the monitor.

#### How to handle cancellation signal?
Monitor need to gracefully cancel all running task executors before
terminating. Hence, we have sigint handler for the monitor.

#### How to detect configuration changes?
Monitor has SIGHUP handler to reflect configuration changes while
executing tasks.
2022-12-02 18:15:31 +03:00
songjinzhou ad6450b793
fix the problem #5763 (#6519)
Co-authored-by: TsinghuaLucky912 <tsinghualucky912@foxmail.com>
Fixes https://github.com/citusdata/citus/issues/5763
2022-12-02 13:49:32 +01:00
Ahmet Gedemenli 3b24c47470
Fix flaky cleanup tests (#6530)
We are having some flakiness in our test schedule because of the objects
leftover from shard moves/splits. With this commit we prevent logging
cleanup object counts.

fixes: #6534
2022-12-02 12:39:36 +03:00
Hanefi Onaldi d4394b2e2d
Fix spacing in multiline strings (#6533)
When using multiline strings, we occasionally forget to add a single
space at the end of the first line. When this line is concatenated with
the next one, the resulting string has a missing space.
2022-12-01 23:42:47 +03:00
Fabrízio de Royes Mello 37f3dff1ca
Simplify columnar perf example (#6526)
Rewrite the plpython function to generate random words in SQL to
simplify the usage and run the example.
2022-12-01 20:05:40 +01:00
songjinzhou 29f0196fdf
Add support for SET ACCESS METHOD in altering a distributed table (#6525)
Co-authored-by: TsinghuaLucky912 <postgres@localhost.localdomain>
2022-12-01 17:45:32 +01:00
Gürkan İndibay c2193608c9
Add jobs to test builds on different distros (#6499)
With this PR, citus code will be tested in all packaging environments. 
Sometimes, there can be compile errors which blocks packaging and in
this case unplanned delays may occur.
By testing the code in packaging environments, I'm aiming to detect any
compilation errors before packaging.

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
Co-authored-by: Hanefi Onaldi <Hanefi.Onaldi@microsoft.com>
2022-12-01 19:11:41 +03:00
Hanefi Onaldi 1f29c16262
Fix misleading GUC description (#6532)
citus.skip_advisory_lock_permission_checks skips checks when it is set
to 'on', not 'off'
2022-12-01 15:43:02 +03:00
Ahmet Gedemenli 0e92244bfe
Cleanup for shard moves (#6472)
DESCRIPTION: Extend cleanup process for replication artifacts

This PR adds new cleanup record types for:
* Subscriptions
* Replication slots
* Publications
* Users created for subscriptions

We add records for these object types, to `pg_dist_cleanup` during
creation phase. Once the operation is done, in case of success or
failure, we iterate those records and drop the objects. With this PR we
will not be dropping any of these objects during the operation. In
short, we will always be deferring the drop.

One thing that's worth mentioning is that we sort cleanup records before
processing (dropping) them, because of dependency relations among those
objects, e.g a subscription might depend on a publication. Therefore, we
always drop subscriptions before publications.

We have some renames in this PR:
* `TryDropOrphanedShards` -> `TryDropOrphanedResources`
* `DropOrphanedShardsForCleanup` -> `DropOrphanedResourcesForCleanup`
* `run_try_drop_marked_shards` -> `run_try_drop_marked_resources`
as these functions now process replication artifacts as well.

This PR drops function `DropAllLogicalReplicationLeftovers` and its all
usages, since now we rely on the deferring drop mechanism.
2022-11-30 15:38:05 +03:00
aykut-bozkurt 1f8675da43
nonblocking concurrent task execution via background workers (#6459)
Improvement on our background task monitoring API (PR #6296) to support
concurrent and nonblocking task execution.

Mainly we have a queue monitor background process which forks task
executors for `Runnable` tasks and then monitors their status by
fetching messages from shared memory queue in nonblocking way.
2022-11-30 14:29:46 +03:00
aykut-bozkurt 83ef600f27
fix false full join pushdown error check (#6523)
**Problem**: Currently, we error out if we detect recurring tuples in
one side without checking the other side of the join.

**Solution**: When one side of the full join consists recurring tuples
and the other side consists nonrecurring tuples, we should not pushdown
to prevent duplicate results. Otherwise, safe to pushdown.
2022-11-30 14:17:56 +03:00
Gokhan Gulbiz bc118ee551
Change GUC propagation flag's default value to off (#6516)
This PR changes
```citus.propagate_session_settings_for_loopback_connection``` default
value to off not to expose this feature publicly at this point. See
#6488 for details.
2022-11-29 13:25:53 +03:00
Jelte Fennema e12d97def2
Fix flakyness in multi_metadata_access (#6524)
Sometimes multi_metadata_access failed like this in CI:
```diff
     AND ext.extname = 'citus'
     AND nsp.nspname = 'pg_catalog'
     AND NOT has_table_privilege(pg_class.oid, 'select');
             oid
 ---------------------------
- pg_dist_authinfo
  pg_dist_clock_logical_seq
+ pg_dist_authinfo
 (2 rows)
```

Source:
https://app.circleci.com/pipelines/github/citusdata/citus/28784/workflows/e462f118-eb64-4a3f-941a-e04115334f9b/jobs/883443

This fixes that by ordering the column.
2022-11-29 10:00:06 +01:00
Philip Dubé cf69fc3652 Grammar: it's to its
Includes an error message

& one case of its to it's

Also fix "to the to" typos
2022-11-28 20:43:44 +00:00
Jelte Fennema 68de2ce601
Include gpid in all internal application names (#6431)
When debugging issues it's quite useful to see the originating gpid in
the application_name of a query on a worker. This already happens for
most queries, but not for queries created by the rebalancer or by
run_command_on_worker. This adds a gpid to those two application_names
too.

Note, that if the GPID of the new application_names is different than
the current GPID of the backend the backend will continue to keep 
the old gpid as its actual GPID. This PR is just meant to make sure 
that the application_name is as useful as it can be for users to 
look at. Updating of gpids will be done in a follow-up PR, and 
adding gpids to all internal connections will make this easier.
2022-11-25 11:16:33 +01:00
Teja Mupparti edaf88e0ff Fix the dangling pointer bug in get_merged_argument_list() 2022-11-22 09:41:10 -08:00
Onur Tirtir 80faf47ab5
Fix dangling pointer warning in AnyTableReplicated (#6504)
DESCRIPTION: Fixes a potential dangling pointer issue

Need to backport to 11.0 & 11.1 since we might want to release packages
for debian/bookworm based on those branches in future.
2022-11-21 16:42:00 +03:00
Jelte Fennema a477ffdf4b
Correctly fix OpenSSL 3.0 warnings (#6502)
In #6038 I tried to fix OpenSSL 3.0 warnings with PG13, but I had made a
mistake when doing that. This actually fixes these warnings.
2022-11-18 14:35:41 +01:00
Emel Şimşek 8e5ba45b74
Fixes a bug that causes crash when using auto_explain extension with ALTER TABLE...ADD FOREIGN KEY... queries. (#6470)
Fixes a bug that causes crash when using auto_explain extension with
ALTER TABLE...ADD FOREIGN KEY... queries.
Those queries trigger a SELECT query on the citus tables as part of the
foreign key constraint validation check. At the explain hook, workers
try to explain this SELECT query as a distributed query causing memory
corruption in the connection data structures. Hence, we will not explain
ALTER TABLE...ADD FOREIGN KEY... and the triggered queries on the
workers.

Fixes #6424.
2022-11-15 17:53:39 +03:00
Teja Mupparti 7358b826ef Remove the explicit-transaction requirement for the UDF citus_get_transaction_clock() as implicit transactions too use this UDF. 2022-11-10 10:54:36 -08:00
Marco Slot 77fbcfaf14
Propagate BEGIN properties to worker nodes (#6483)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-10 18:08:43 +01:00
Onur Tirtir e0363470bc Add missing targets to check-full 2022-11-08 09:59:55 +03:00
Onur Tirtir 7b3e55f903 Add missing dependencies to test targets 2022-11-08 09:59:55 +03:00
Marco Slot fcaabfdcf3
Remove remaining master_create_distributed_table usages (#6477)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-04 16:30:06 +01:00
Marco Slot 666696c01c
Deprecate citus.replicate_reference_tables_on_activate, make it always off (#6474)
Co-authored-by: Marco Slot <marco.slot@gmail.com>
2022-11-04 16:21:10 +01:00
Naisila Puka b8c7a9844c
Add docs on handling alternative test outputs (#6469)
I recently cleaned up our test suite from redundant test outputs: #6111
#6140 #6214 #6140 #6434

I had to check many files manually, as they didn't have any
documentation on why the alternative test output existed in the first
place.

Adding a section in our test docs to remind developers to add
alternative test outputs with enough information/keywords.
2022-11-03 10:55:50 +03:00
Onur Tirtir 1af28b3f27
Use CommitContext for subxact mgmt and reduce memory usage in CommitContext (#6099)
(Hopefully) Fixes #5000.

If memory allocation done for `SubXactContext *state` in `PushSubXact()`
fails, then `PopSubXact()` might segfault, for example, when grabbing
the
topmost `SubXactContext` from `activeSubXactContexts` if this is the
first
ever subxact within the current xact, with the following stack trace:
```c
citus.so!list_nth_cell(const List * list, int n) (\opt\pgenv\pgsql-14.3\include\server\nodes\pg_list.h:260)
citus.so!PopSubXact(SubTransactionId subId) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:761)
citus.so!CoordinatedSubTransactionCallback(SubXactEvent event, SubTransactionId subId, SubTransactionId parentSubid, void * arg) (\home\onurctirtir\citus\src\backend\distributed\transaction\transaction_management.c:673)
CallSubXactCallbacks(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid) (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3644)
AbortSubTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:5058)
AbortCurrentTransaction() (\opt\pgenv\src\postgresql-14.3\src\backend\access\transam\xact.c:3366)
PostgresMain(int argc, char ** argv, const char * dbname, const char * username) (\opt\pgenv\src\postgresql-14.3\src\backend\tcop\postgres.c:4250)
BackendRun(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4530)
BackendStartup(Port * port) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:4252)
ServerLoop() (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1745)
PostmasterMain(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\postmaster\postmaster.c:1417)
main(int argc, char ** argv) (\opt\pgenv\src\postgresql-14.3\src\backend\main\main.c:209)
```

For this reason, to be more defensive against memory-allocation errors
that could happen at `PushSubXact()`, now we use our pre-allocated
memory
context for the objects created in `PushSubXact()`.

This commit also attempts reducing the memory allocations done under
CommitContext to reduce the chances of consuming all the memory
available
to CommitContext.

Note that it's problematic to encounter with such a memory-allocation
error for other objects created in `PushSubXact()` as well, so above is
an **example** scenario that might result in a segfault.

DESCRIPTION: Fixes a bug that might cause segfaults when handling deeply
nested subtransactions
2022-11-03 00:57:32 +03:00
Onur Tirtir a5f7f001b0
Make sure to disallow triggers that depend on extensions (#6399)
DESCRIPTION: Makes sure to disallow triggers that depend on extensions

We were already doing so for `ALTER trigger DEPENDS ON EXTENSION`
commands. However, we also need to disallow creating Citus tables
having such triggers already, so this PR fixes that.
2022-11-02 16:27:31 +03:00
Alexander Kukushkin deeacfee04
Improve a query that terminates compeling backends from citus_update_node() (#6468)
DESCRIPTION: Improve a query that terminates compeling backends from citus_update_node()

1. Use pg_blocking_pids() function instead of self join on pg_locks. It exists since 9.6 and more accurate than pg_locks.
2. Prefix all function calls with pg_catalog schema to prevent privilege escalation by creating functions with similar names in a public schema.
3. Change logs and update comments to reflect the fact that the pg_terminate_backend() function only sends SIGTERM but not wating for the actual backend termination.
2022-11-02 12:32:00 +01:00
Alexander Kukushkin 402a30a2b7
Allow citus_update_node() to work with nodes from different clusters (#6466)
DESCRIPTION: Allow citus_update_node() to work with nodes from different clusters

citus_update_node(), citus_nodename_for_nodeid(), and citus_nodeport_for_nodeid() functions only checked for nodes in their own clusters and hence last two returned NULLs and the first one showed an error is the nodeId was from a different cluster.

Fixes https://github.com/citusdata/citus/issues/6433
2022-11-02 10:07:01 +01:00
oohira 3f66f3d9dd
Add missing space to citus.shard_count description (#6464)
DESCRIPTION: Add missing space to citus.shard_count description
2022-10-31 10:37:14 +01:00
Teja Mupparti 69f75af62d Remove unused macros 2022-10-28 10:38:07 -07:00
Teja Mupparti 01103ce05d This implements a new UDF citus_get_cluster_clock() that returns a monotonically
increasing logical clock. Clock guarantees to never go back in value after restarts,
and makes best attempt to keep the value close to unix epoch time in milliseconds.

Also, introduces a new GUC "citus.enable_cluster_clock", when true, every
distributed transaction is stamped with logical causal clock and persisted
in a catalog pg_dist_commit_transaction.
2022-10-28 10:15:08 -07:00
Ahmet Gedemenli c379ff8614
Drop defer drop gucs (#6447)
DESCRIPTION: Drops GUC defer_drop_after_shard_split
DESCRIPTION: Drops GUC defer_drop_after_shard_move

Drop GUCs and related parts from the code.
Delete tests that specifically added for the GUCs.
Keep tests that can be used without the GUCs.
Update test output changes.

The motivation for this PR is to have an "always deferring" mechanism.
These two GUCs provide an option to not deferring dropping objects
during a shard move/split, and dropping them immediately. With this PR,
we will be always deferring dropping orphaned shards and other types of
objects.

We will have a separate PR to extend the deferred cleanup operation, so
that we would create records for deferred drop, for Subscriptions,
Publications, Replication Slots etc. This will make us be able to keep
track of created objects that needs to be dropped, during a shard
move/split. We will have objects created specifically for the current
operation; and those objects will be dropped at the end.

We have an issue (a draft roadmap) for enabling parallel shard moves.
For details please see: https://github.com/citusdata/citus/issues/6437
2022-10-25 16:48:34 +03:00
Hanefi Onaldi 915d1b3b38
Repartition tests for numeric types with neg scale (#6358)
This PR adds some test cases where repartition join correctly prunes
shards on two tables that have numeric columns with negative scale.
2022-10-24 20:59:05 +03:00
Jelte Fennema 20a4d742aa
Fix flakyness in failure_split_cleanup (#6450)
Sometimes in CI our failure_split_cleanup test would fail like this:
```diff
     CALL pg_catalog.citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 79 orphaned resources
+NOTICE:  cleaned up 82 orphaned resources
     SELECT operation_id, object_type, object_name, node_group_id, policy_type
```

Source:
https://app.circleci.com/pipelines/github/citusdata/citus/28107/workflows/4ec712c9-98b5-4e90-9806-e02a37d71679/jobs/846107

The reason was that previous tests in the schedule would also create
some orphaned resources. Sometimes some of those would already be
cleaned up by the maintenance daemon, resulting in a different number of
cleaned up resources than expected. This cleans up any previously
created resources at the start of the test without logging how many
exactly were cleaned up. As a bonus this now also allows running this
test using check-failure-base.
2022-10-24 17:35:31 +02:00
Onur Tirtir 2d14dd85e9
Not hardcode "false" in UpdateAutoConvertedForConnectedRelations (#6452)
This didn't cause any bugs since today we're always calling
UpdateAutoConvertedForConnectedRelations with autoconverted=false, so we
don't need to backport this to anywhere.
2022-10-21 18:14:20 +03:00
Onur Tirtir dbe2749bbf
Drop unreachable code from query_pushdown_planning.c (#6451)
Given that we cannot continue after a `RaiseDeferredErrorInternal(..,
ERROR)` call.
2022-10-21 18:04:31 +03:00
Jelte Fennema 7f05ad033a
Add a section on PR descriptions to flaky test docs (#6446)
Good PR descriptions for flaky tests are quite helpful when reviewing.
Although obviously no PR description is the same, there's a few common
pieces of information that are useful for all PRs that fix flaky tests.
2022-10-21 16:52:31 +02:00
aykut-bozkurt 162c8a5160
Drop worker_fetch_foreign_file/worker_repartition_cleanup only if they exist when upgrading Citus (#6441)
We should not introduce breaking sql changes to upgrade files after they
are released. We did that for worker_fetch_foreign_file in v9.0.0 and
worker_repartition_cleanup in v9.2.0. Later when we try to drop those
udfs, they were missing for some clients unexpectedly due to breaking
change in an old upgrade script. For that case, the fix is to add DROP
IF EXISTS for those 2 udfs in 11.0-4--11.1-1.
2022-10-21 14:32:42 +03:00
Emel Şimşek 02fd1e6c03
Fix the crash that happens when using auto_explain extension with recursive queries (#6406)
This crash happens with recursively planned queries. For such queries,
subplans are explained via the ExplainOnePlan function of postgresql.
This function reconstructs the query description from the plan therefore
it expects the ActiveSnaphot for the query be available. This fix makes
sure that the snapshot is in the stack before calling ExplainOnePlan.

Fixes #2920.
2022-10-19 18:04:45 +03:00
Jelte Fennema 737e2bb1bb
Don't leak search_path to workers on DDL (#6444)
DESCRIPTION: Don't leak search_path to workers on DDL

For DDL we have to set the `search_path` on workers to the same as on
the coordinator for some DDL to work. Previously this search_path would
leak outside of the transaction that was used for the DDL. This fixes
that by using `SET LOCAL` instead of `SET`. The only place where we
still use plain `SET` is for DDL commands that are not allowed within
transactions, such as `CREATE INDEX CONCURRENLTY`.

This fixes this flaky test:
```diff
 CONTEXT:  SQL statement "SELECT change_id                           FROM distributed_triggers.data_changes
       WHERE shard_key_value = NEW.shard_key_value AND object_id = NEW.object_id
       ORDER BY change_id DESC LIMIT 1"
-PL/pgSQL function record_change() line XX at SQL statement
+PL/pgSQL function distributed_triggers.record_change() line 17 at SQL statement
 while executing command on localhost:57638
 DELETE FROM data_ref_table where shard_key_value = 'hello';
```
Source:
https://app.circleci.com/pipelines/github/citusdata/citus/27849/workflows/75ae5f1a-100b-4b7a-b991-7de069f39ee1/jobs/831429

I had tried to fix this flaky test in #5894 and then I tried
implementing a better fix in #5896, where @marcocitus suggested this
better fix. This change reverts the fix from #5894 and implements the
fix suggested by Marco.


Our multi_mx_alter_distributed_table test actually depended on the old
buggy search_path leaking behavior. After fixing the bug that test would
fail like this:
```diff
 CALL proc_0(1.0);
 DEBUG:  pushing down the procedure
-NOTICE:  Res: 3
-DETAIL:  from localhost:xxxxx
+ERROR:  relation "test_proc_colocation_0" does not exist
+CONTEXT:  PL/pgSQL function mx_alter_distributed_table.proc_0(double precision) line 5 at SQL statement
+while executing command on localhost:57637
 RESET client_min_messages;
```

I fixed this test by fully qualifying the table names used in the
procedure. I think it's quite unlikely that actual users depend 
on this behavior though. Since it would require first doing 
DDL before calling a procedure in a session where the
search_path was changed after connecting.
2022-10-19 16:47:35 +02:00
Ahmet Gedemenli cdbda9ea6b
Add failure test for shard move (#6325)
DESCRIPTION: Adds failure test for shard move
DESCRIPTION: Remove function `WaitForAllSubscriptionsToBecomeReady` and
related tests

Adding some failure tests for shard moves.
Dropping the not-needed-anymore function
`WaitForAllSubscriptionsToBecomeReady`, as the subscriptions now start
as ready from the beginning because we don't use logical replication
table sync workers anymore.

fixes: #6260
2022-10-19 14:25:26 +02:00
Gokhan Gulbiz 56da3cf6aa
Increase node_connection_timeout to prevent flakiness in shard_rebalancer regression tests (#6445)
In CI shard_rebalancer sometimes fails with this error:

```diff
SET citus.node_connection_timeout to 60;
 BEGIN;
     SET LOCAL citus.shard_replication_factor TO 2;
     SET citus.log_remote_commands TO ON;
     SET SESSION citus.max_adaptive_executor_pool_size TO 5;
     SELECT replicate_table_shards('dist_table_test_2',  max_shard_copies := 4,  shard_transfer_mode:='block_writes');
+WARNING:  could not establish connection after 60 ms
```
Source
https://app.circleci.com/pipelines/github/citusdata/citus/28128/workflows/38eeacc4-4191-4366-87ed-9a628414965a/jobs/847458?invite=true#step-107-21

This PR avoids this issue by increasing
```citus.node_connection_timeout``` to 35s.
2022-10-19 13:03:14 +03:00
Onur Tirtir 5aec88d084
Not try locking relations referencing to views (#6430)
Since there can't be such a foreign key already.

This mainly fixes the error that Citus throws
when trying to truncate a distributed view.

Fixes #5990.
2022-10-19 11:24:22 +03:00
Jelte Fennema f756db39c4
Add docs on how to fix flaky tests (#6438)
I fixed a lot of flaky tests recently and I found some patterns in the
type of issues and type of fixes. This adds a document that lists 
these types of issues and explains how to fix them.
2022-10-18 15:52:01 +02:00
Gokhan Gulbiz e87eda6496
Introduce a new GUC to propagate local settings to new connections in rebalancer (#6396)
DESCRIPTION: Introduce
```citus.propagate_session_settings_for_loopback_connection``` GUC to
propagate local settings to new connections.

Fixes: #5289
2022-10-18 12:50:30 +03:00