This PR introduces infrastructure and validation to detect breaking
changes during Citus minor version upgrades, designed to run in release
branches only.
**Breaking change detection:**
- [GUCs] Detects removed GUCs and changes to default values
- [UDFs] Detects removed functions and function signature changes
  - Supports backward-compatible function overloading (new optional parameters are allowed)
- [types] Detects removed data types
- [tables/views] Detects removed tables/views and removed/changed
columns
- New make targets for minor version upgrade tests
- Follow-up PRs will add test schedules with different upgrade scenarios
The test will be enabled in release branches (e.g., release-13) via the
new test-citus-minor-upgrade job shown below. It will not run on the
main branch.
**Testing**
Verified locally with sample breaking changes:
```bash
make check-citus-minor-upgrade-local citus-old-version=v13.2.0
```
**Test case 1:** Backward-compatible signature change (allowed)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer DEFAULT NULL)
```
No breaking change detected (new parameter has DEFAULT)
**Test case 2:** Incompatible signature change (breaking)
```
-- Old: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer)
-- New: CREATE FUNCTION pg_catalog.citus_blocking_pids(pBlockedPid integer, pBlockedByPid integer)
```
Breaking change detected:
```
UDF signature removed: pg_catalog.citus_blocking_pids(pblockedpid integer) RETURNS integer[]
```
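The overloading rule applied in the two cases above can be sketched roughly as follows. This is a hypothetical illustration, not the actual implementation: parameter lists are simplified to `(name, type, has_default)` tuples, and the function name is made up.

```python
def is_backward_compatible(old_params, new_params):
    """Return True if new_params only extends old_params with
    DEFAULT-valued parameters, so callers using the old arity still work."""
    # Every old parameter must be preserved, in order, with the same type.
    if new_params[:len(old_params)] != old_params:
        return False
    # Any extra trailing parameter must carry a DEFAULT.
    return all(has_default for _, _, has_default in new_params[len(old_params):])

old = [("pBlockedPid", "integer", False)]
compatible = old + [("pBlockedByPid", "integer", True)]   # DEFAULT NULL
breaking = old + [("pBlockedByPid", "integer", False)]    # no DEFAULT

print(is_backward_compatible(old, compatible))  # True  (test case 1)
print(is_backward_compatible(old, breaking))    # False (test case 2)
```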
**Test case 3:** GUC changes (breaking)
- Removed `citus.max_worker_nodes_tracked`
- Changed default value of `citus.max_shared_pool_size` from 0 to 4
Breaking change detected:
```
The default value of GUC citus.max_shared_pool_size was changed from 0 to 4
GUC citus.max_worker_nodes_tracked was removed
```
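Conceptually, the GUC check boils down to diffing two name-to-default snapshots taken from the old and new versions. The sketch below is a simplified, hypothetical version of that comparison (the real test would read defaults from the server, and the `2048` value here is only illustrative):

```python
def diff_gucs(old, new):
    """Compare {guc_name: default_value} snapshots and report breaking changes."""
    messages = []
    for name, old_default in sorted(old.items()):
        if name not in new:
            messages.append(f"GUC {name} was removed")
        elif new[name] != old_default:
            messages.append(
                f"The default value of GUC {name} was changed "
                f"from {old_default} to {new[name]}")
    return messages

old = {"citus.max_shared_pool_size": "0",
       "citus.max_worker_nodes_tracked": "2048"}   # value illustrative
new = {"citus.max_shared_pool_size": "4"}

for msg in diff_gucs(old, new):
    print(msg)
```

This reproduces the two messages shown in test case 3.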
**Test case 4:** Table/view changes
- Dropped `pg_catalog.pg_dist_rebalance_strategy` and removed a column
from `pg_catalog.citus_lock_waits`
```
- Column blocking_nodeid in table/view pg_catalog.citus_lock_waits was removed
- Table/view pg_catalog.pg_dist_rebalance_strategy was removed
```
**Test case 5:** Remove a custom type
- Dropped `cluster_clock` and the objects that depend on it. In addition to the dependent objects, the test reports:
```
- Type pg_catalog.cluster_clock was removed
```
Sample new job for build and test workflow (for release branches):
```yaml
test-citus-minor-upgrade:
  name: PG17 - check-citus-minor-upgrade
  runs-on: ubuntu-latest
  container:
    image: "${{ needs.params.outputs.citusupgrade_image_name }}:${{ fromJson(needs.params.outputs.pg17_version).full }}${{ needs.params.outputs.image_suffix }}"
    options: --user root
  needs:
    - params
    - build
  env:
    citus_version: 13.2
  steps:
    - uses: actions/checkout@v4
    - uses: "./.github/actions/setup_extension"
      with:
        skip_installation: true
    - name: Install and test citus minor version upgrade
      run: |-
        gosu circleci \
          make -C src/test/regress \
            check-citus-minor-upgrade \
            bindir=/usr/lib/postgresql/${PG_MAJOR}/bin \
            citus-pre-tar=/install-pg${PG_MAJOR}-citus${citus_version}.tar \
            citus-post-tar=${GITHUB_WORKSPACE}/install-$PG_MAJOR.tar;
    - uses: "./.github/actions/save_logs_and_results"
      if: always()
      with:
        folder: ${{ env.PG_MAJOR }}_citus_minor_upgrade
    - uses: "./.github/actions/upload_coverage"
      if: always()
      with:
        flags: ${{ env.PG_MAJOR }}_citus_minor_upgrade
        codecov_token: ${{ secrets.CODECOV_TOKEN }}
```
DESCRIPTION: Fixes a deadlock with transaction recovery that is possible during Citus upgrades.
Fixes #7875.
This commit addresses two interrelated deadlock issues uncovered during Citus
upgrades:
1. Local Deadlock:
- **Problem:**
In `RecoverWorkerTransactions()`, a new connection is created for each worker
node to perform transaction recovery by locking the
`pg_dist_transaction` catalog table until the end of the transaction. When
`RecoverTwoPhaseCommits()` calls this function for each worker node, the order
of acquiring locks on `pg_dist_authinfo` and `pg_dist_transaction` can alternate.
This reversal can lead to a deadlock if any concurrent process requires locks on
these tables.
- **Fix:**
Pre-establish all worker node connections upfront so that
`RecoverWorkerTransactions()` operates with a single, consistent connection.
This ensures that locks on `pg_dist_authinfo` and `pg_dist_transaction` are always
acquired in the correct order, thereby preventing the local deadlock.
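The remedy above is the classic one: make every code path acquire the two locks in the same global order. As a language-agnostic illustration only (Python threading, not Citus code; the lock names merely mirror the catalog tables involved):

```python
import threading

# Stand-ins for the catalog-table locks; ordering is what matters.
pg_dist_authinfo = threading.Lock()
pg_dist_transaction = threading.Lock()

def recover_transactions():
    # Always pg_dist_authinfo first, then pg_dist_transaction.
    with pg_dist_authinfo:
        with pg_dist_transaction:
            pass  # ... transaction recovery work ...

def concurrent_catalog_access():
    # Same global order here; a reversed order in one path is exactly
    # the alternation that can deadlock under contention.
    with pg_dist_authinfo:
        with pg_dist_transaction:
            pass  # ... concurrent catalog work ...

threads = [threading.Thread(target=f)
           for f in (recover_transactions, concurrent_catalog_access)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("no deadlock")
```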
2. Distributed Deadlock:
- **Problem:**
After resolving the local deadlock, a distributed deadlock issue emerges. The
maintenance daemon calls `RecoverWorkerTransactions()` on each worker node—
including the local node—which leads to a complex locking sequence:
- A RowExclusiveLock is taken on the `pg_dist_transaction` table in
`RecoverWorkerTransactions()`.
- An extension update (e.g., `ALTER EXTENSION citus UPDATE`) then tries to acquire an AccessExclusiveLock on the same
table and is blocked by the RowExclusiveLock.
- A subsequent query (e.g., a SELECT on `pg_prepared_xacts`) issued using a
separate connection on the local node gets blocked due to locks held during a
call to `BuildCitusTableCacheEntry()`.
- The maintenance daemon waits for this query, resulting in a circular wait and
stalling the entire cluster.
- **Fix:**
Avoid cache lookups for internal PostgreSQL tables by implementing an early bailout
for relation IDs below `FirstNormalObjectId` (system objects). This eliminates
unnecessary calls to `BuildCitusTableCache`, reducing lock contention and mitigating
the distributed deadlock.
Furthermore, this optimization improves performance in fast
connect→query_catalog→disconnect cycles by eliminating redundant
cache creation and lookups.
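The bailout itself is conceptually simple. Below is a Python paraphrase of the C-side check, not the real code; `FirstNormalObjectId` is 16384 in PostgreSQL, the first OID assigned to user-created objects, so anything below it is a built-in system object:

```python
FIRST_NORMAL_OBJECT_ID = 16384  # PostgreSQL's FirstNormalObjectId

def lookup_citus_table_cache_entry(relation_id, build_entry):
    """Skip cache construction entirely for built-in catalog relations."""
    if relation_id < FIRST_NORMAL_OBJECT_ID:
        # System objects can never be Citus tables: bail out before
        # taking any locks, avoiding the contention described above.
        return None
    return build_entry(relation_id)

# Catalog lookups (e.g. against pg_prepared_xacts-style relations)
# short-circuit; user relations still build a cache entry.
print(lookup_citus_table_cache_entry(1262, lambda oid: f"entry:{oid}"))   # None
print(lookup_citus_table_cache_entry(20000, lambda oid: f"entry:{oid}"))  # entry:20000
```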
3. Also reverts the commit that disabled the relevant test cases.
When testing rolling Citus upgrades, the coordinator should not be upgraded
until all of the workers have been upgraded.
---------
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Citus upgrade tests require some additional logic to run, because we
have a before and after schedule and we need to swap the Citus
version in-between. This adds that logic to `run_test.py`.
In passing this makes running upgrade tests locally multiple times
faster by caching tarballs.
This PR fixes two separate issues related to running citus upgrade tests locally.
d3e7c825ab fixes an issue caused by moving/renaming some existing folders for our new testing infrastructure. This broke local runs of citus upgrade tests, since some paths were sensitive to such changes. This commit makes the paths more generic so that this issue is less likely to happen in the future, while also fixing the current issue.
93de6b60c3 fixes an issue caused by a new environment variable that was added for citus upgrade tests and is defined in CI: 0cb51f8c37/.circleci/config.yml (L294)
This environment variable wasn't set in our local runs, so they would fail. Instead of defining this environment variable for the local run, we change the citus_upgrade run command to use an existing environment variable, which is now also set in CI.
To run tests in parallel use:
```bash
make check-arbitrary-configs parallel=4
```
To run tests sequentially use:
```bash
make check-arbitrary-configs parallel=1
```
To run only some configs:
```bash
make check-arbitrary-base CONFIGS=CitusSingleNodeClusterConfig,CitusSmallSharedPoolSizeConfig
```
To run only some test files with some config:
```bash
make check-arbitrary-base CONFIGS=CitusSingleNodeClusterConfig EXTRA_TESTS=dropped_columns_1
```
To get a deterministic run, you can pass the random seed:
```bash
make check-arbitrary-configs parallel=4 seed=12312
```
The `seed` will be in the output of the run.
In our regular regression tests we can see all the details of planning and execution, but that means we need to run the same query under different configs/cluster setups again and again, which is not really maintainable.
When we only care about correctness rather than the internals of planning/execution, especially across different configs, this infrastructure can be used instead.
With `check-arbitrary-configs` target, the following happens:
- a bunch of configs are loaded, which are defined in `config.py`. These configs have different settings such as different shard count, different citus settings, postgres settings, worker amount, or different metadata.
- For each config, a separate data directory is created for tests in `tmp_citus_test` with the config's name.
- For each config, `create_schedule` is run on the coordinator to set up the necessary tables.
- For each config, `sql_schedule` is run. For a non-MX cluster it runs on the coordinator; for an MX cluster it runs on either the coordinator or a random worker.
- Test results are checked against the expected output.
When test results don't match, you can see the regression diffs in a config's datadir, such as `tmp_citus_tests/dataCitusSingleNodeClusterConfig`.
We also have a PostgresConfig which runs all the test suite with Postgres.
By default configs use regular user, but we have a config to run as a superuser as well.
So the infrastructure tests:
- Postgres vs Citus
- Mx vs Non-Mx
- Superuser vs regular user
- Arbitrary Citus configs
When you want to add a new test, you can add the create statements to `create_schedule` and add the sql queries to `sql_schedule`.
If you are adding Citus UDFs that should be a NO-OP for Postgres, make sure to override the UDFs in `postgres.sql`.
You can add your new config to `config.py`. Make sure to extend either `CitusDefaultClusterConfig` or `CitusMXBaseClusterConfig`.
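A new config typically amounts to a small subclass. The sketch below is hypothetical: the base class here is a stand-in so the example runs on its own, and the attribute names are illustrative rather than the exact ones in `config.py` (only `citus.shard_count` is a real Citus setting):

```python
class CitusDefaultClusterConfig:
    """Stand-in for the real base class in config.py."""
    def __init__(self):
        self.worker_amount = 2   # illustrative attribute name
        self.new_settings = {}   # extra GUCs applied for this config

class CitusFewShardsConfig(CitusDefaultClusterConfig):
    """Example: run the suite with one worker and a small shard count."""
    def __init__(self):
        super().__init__()
        self.worker_amount = 1
        self.new_settings = {"citus.shard_count": "4"}

cfg = CitusFewShardsConfig()
print(cfg.worker_amount, cfg.new_settings)  # 1 {'citus.shard_count': '4'}
```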
On CI, upon a failure, all logfiles are uploaded as artifacts, so you can check the artifacts tab.
All the regressions are shown as part of the job on CI.
Locally, you can check the regression diffs in each config's datadir, e.g. `tmp_citus_tests/dataCitusSingleNodeClusterConfig`.