citus

Commit Graph

Author	SHA1	Message	Date
Marco Slot	f2538a456f	Support co-located/recurring sublinks in the target list	2020-12-13 15:45:24 +01:00
Marco Slot	8e8adcd92a	Harden citus_tables against node failure	2020-12-13 15:10:40 +01:00
Hadi Moshayedi	4dd22cc4e4	Columnar: Fix ANALYZE for large number of rows.	2020-12-10 09:52:33 -08:00
Hadi Moshayedi	b3dac5e9d1	Columnar: set default compression as zstd if available	2020-12-09 14:32:08 -08:00
Hadi Moshayedi	4668fe51a6	Columnar: Make compression level configurable	2020-12-09 08:48:50 -08:00
Hadi Moshayedi	f5a4a4bc74	Columnar: Support zstd compression	2020-12-09 08:30:55 -08:00
Hadi Moshayedi	3f81ee26fd	Columnar: Support LZ4 compression	2020-12-09 08:29:07 -08:00
jeff-davis	260a02180b	Add tests for unsupported columnar storage features (#4397 ) Add negative tests: * Deletes * Sample scan * Special columns * Tuple locks * Indexes	2020-12-09 00:08:45 -08:00
Jeff Davis	c91e5b052b	more test fixups	2020-12-07 13:43:27 -08:00
Jeff Davis	7169ba21c4	more test fixes	2020-12-07 13:36:46 -08:00
Jeff Davis	e26fdeb706	fixup tests some more	2020-12-07 13:22:16 -08:00
Jeff Davis	5b3c32eb38	fixup tests	2020-12-07 13:18:22 -08:00
Jeff Davis	068af7f38e	fixup upgrade tests	2020-12-07 13:11:51 -08:00
Jeff Davis	3758e83850	Rename cstore->columnar in SQL objects and errors.	2020-12-07 13:01:53 -08:00
Jeff Davis	ad919ff220	Tests for UPDATE and error message improvement. UPDATEs on partitioned tables that affect only row partitions should succeed, the rest should fail. Also rename CStoreScan to ColumnarScan to make the error message more relevant.	2020-12-07 11:25:30 -08:00
Ahmet Gedemenli	7577821920	Fix transaction name length calculation	2020-12-07 12:34:15 +03:00
Ahmet Gedemenli	936775e8e3	Delete transactions when removing node With this commit, we delete entries in pg_dist_transaction for the primary nodes that are removed by `master_remove_node`.	2020-12-07 11:35:20 +03:00
Hadi Moshayedi	01da2a1c73	Columnar: track decompressed length in metadata	2020-12-04 09:09:39 -08:00
Onder Kalaci	bd9827aed9	Add regression tests with different data types We typically do not test Citus with these uncommon data types. Now, we already have the tests for ADF integration, add it to regression tests as well.	2020-12-04 10:25:00 +03:00
Hadi Moshayedi	4a9aebaa7b	Columnar: rename block to chunk	2020-12-03 08:50:19 -08:00
Hadi Moshayedi	24bfd368a9	Columnar: Fix VACUUM for empty tables	2020-12-03 08:46:09 -08:00
Marco Slot	c9b658daea	Add a public.citus_tables view	2020-12-03 17:31:40 +01:00
Marco Slot	4098d33acb	Allow citus size functions on replicated tables	2020-12-03 16:33:24 +01:00
SaitTalhaNisanci	f164575524	Add a utility to process each table index (#4382 ) A utility function is added so that each caller can implement a handler for each index on a given table. This means that the caller doesn't need to worry about how to access each index, the only thing that it needs to do each to implement a function to which each index on the table is passed iteratively.	2020-12-03 16:33:13 +03:00
Marco Slot	c69ea2512a	Fix flappy failure test	2020-12-03 13:54:02 +01:00
Onder Kalaci	c546ec5e78	Local node connection management When Citus needs to parallelize queries on the local node (e.g., the node executing the distributed query and the shards are the same), we need to be mindful about the connection management. The reason is that the client backends that are running distributed queries are competing with the client backends that Citus initiates to parallelize the queries in order to get a slot on the max_connections. In that regard, we implemented a "failover" mechanism where if the distributed queries cannot get a connection, the execution failovers the tasks to the local execution. The failover logic is follows: - As the connection manager if it is OK to get a connection - If yes, we are good. - If no, we fail the workerPool and the failure triggers the failover of the tasks to local execution queue The decision of getting a connection is follows: /* * For local nodes, solely relying on citus.max_shared_pool_size or * max_connections might not be sufficient. The former gives us * a preview of the future (e.g., we let the new connections to establish, * but they are not established yet). The latter gives us the close to * precise view of the past (e.g., the active number of client backends). * * Overall, we want to limit both of the metrics. The former limit typically * kics in under regular loads, where the load of the database increases in * a reasonable pace. The latter limit typically kicks in when the database * is issued lots of concurrent sessions at the same time, such as benchmarks. */	2020-12-03 14:16:13 +03:00
Hadi Moshayedi	c2f60b6422	Columnar: pg_upgrade support (#4354 )	2020-12-02 08:46:59 -08:00
Ahmet Gedemenli	5242dcfe99	Add tests for propagating alter schema rename	2020-12-02 15:18:26 +03:00
Ahmet Gedemenli	514c6a76ac	Propagate alter schema rename	2020-12-02 15:18:26 +03:00
Nils Dijk	6f9c040f76	DESCRIPTION: Propagate columnar table settings for distributed tables When distributing a columnar table, as well as changing options on a distributed columnar table, this patch will forward the settings from the coordinator to the workers. For propagating options changes on an already distributed table this change is pretty straight forward. Before applying the change in options locally we will create a `DDLJob` that contains a call to `alter_columnar_table_set(...)` for every shard placement with all settings of the current table. This goes both for setting an option as well as resetting. This will reset the values to the defaults configured on the coordinator. Having the effect that the coordinator is authoritative on the settings and makes sure the shards have the same settings set as the table on the coordinator. When a columnar table is distributed it is using the `TableDDLCommand` infra structure to create a new kind of `TableDDLCommand`. This new type, called a `TableDDLCommandFunction` contains a context and 2 function pointers to execute. One function returns the command as applied on the table, the second function will return the sql command to apply to a shard with a given shard id. The schema name is ignored as it will use the fully qualified name of the shard in the same schema as the base table.	2020-12-02 13:02:42 +01:00
Halil Ozan Akgül	ef0914a7f8	Adds ORDER BY to flaky test (#4305 ) Co-authored-by: Önder Kalacı <onder@citusdata.com>	2020-12-02 14:24:05 +03:00
Onder Kalaci	f7e1aa3f22	Multi-row INSERTs use local execution when placements are local Multi-row execution already uses sequential execution. When shards are local, using local execution is profitable as it avoids an extra connection establishment to the local node.	2020-12-01 21:37:59 +03:00
Marco Slot	04cffdd925	Run master_copy_shard_placement separately	2020-11-30 20:34:03 +01:00
Marco Slot	48caca4084	Improve regression test settings	2020-11-30 20:34:03 +01:00
Ahmet Gedemenli	8e5f0487eb	Add order by for flaky test	2020-12-01 10:54:52 +03:00
Ahmet Gedemenli	67761897ab	Add test for citus table size func in transaction with modification Add test for citus_relation_size	2020-12-01 10:38:15 +03:00
Hadi Moshayedi	feecb7b423	Columnar: few fixes (#4371 ) * Columnar: fix a memory issue * Columnar: no need for deferred triggers * Columnar: relax memory growth constraints	2020-11-30 18:09:43 -08:00
Hadi Moshayedi	a94e8c9cda	Associate column store metadata with storage id (#4347 )	2020-11-30 18:01:43 -08:00
Marco Slot	ecbc1ab008	Run subquery_prepared_statements by itself	2020-11-30 08:53:06 +01:00
Sait Talha Nisanci	8b0aed521f	Isolate join test Join test gets too many clients error too frequently hence we should not run anything concurrently with that. Hopefully this will fix the flakiness of test.	2020-12-01 00:00:17 +03:00
SaitTalhaNisanci	c31a8df380	Call 6 times not 7 in subquery_prepared_statements (#4357 )	2020-11-30 21:20:51 +03:00
Onur Tirtir	03bcccdee0	Fix hostname length check in StartNodeUserDatabaseConnection (#4363 ) Copying string before hostname length check makes the check useless	2020-11-30 20:00:35 +03:00
Onur Tirtir	7f3d1182ed	Handle invalid connection hash entries (#4362 ) If MemoryContextAlloc errors out -e.g. during an OOM-, ConnectionHashEntry->connections stays as NULL. With this commit, we add isValid flag to ConnectionHashEntry that should be set to true right after we allocate & initialize ConnectionHashEntry->connections list properly, and we check it before accesing to ConnectionHashEntry->connections.	2020-11-30 19:44:03 +03:00
SaitTalhaNisanci	8c3dd6338e	Run pg12 and pg13 separately (#4352 ) It seems that sometimes we get `too many clients errors` with this set of parallel tests, hence two of them are separated.	2020-11-30 19:32:49 +03:00
Hadi Moshayedi	7f43804dae	Normalize VACUUM VERBOSE output (#4353 ) This is to avoid flaky changes like the following in test outputs: -CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. +CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s.	2020-11-27 12:07:25 -08:00
Nils Dijk	383e334023	refactor options to their own table linked to the regclass (#4346 ) Columnar options were by accident linked to the relfilenode instead of the regclass/relation oid. This PR moves everything related to columnar options to their own catalog table.	2020-11-27 11:22:08 -08:00
SaitTalhaNisanci	af02ac6cf5	Refactor MultiRouterPlannableQuery (#4350 ) The name of the function is different than the implemantation. Because the function is designed to only consider SELECT queries. Also this changes the assert with an error.	2020-11-27 18:44:38 +03:00
Nils Dijk	326e6afa53	refactor table ddl events scoped for shards (#4342 ) Refactor internals on how Citus creates the SQL commands it sends to recreate shards. Before Citus collected solely ddl commands as `char `'s to recreate a table. If they were used to create a shard they were wrapped with `worker_apply_shard_ddl_command` and send to the workers. On the workers the UDF wrapping the ddl command would rewrite the parsetree to replace tables names with their shard name equivalent. This worked well, but poses an issue when adding columnar. Due to limitations in Postgres on creating custom options on table access methods we need to fall back on a UDF to set columnar specific options. Now, to recreate the table, we can not longer rely on having solely DDL statements to recreate a table. A prototype was made to run this UDF wrapped in `worker_apply_shard_ddl_command`. This became pretty messy, hard to understand and subsequently hard to maintain. This PR proposes a refactor of the internal representation of table ddl commands into a `TableDDLCommand` structure. The current implementation only supports a `char ` as its contents. Based on the use of the DDL statement (eg. creating the table -mx- or creating a shard) one of two different functions can be called to get the statement to send to the worker: - `GetTableDDLCommand(TableDDLCommand command)`: This function returns that ddl command to create the table. In this implementation it will just return the `char `. This has the same functionality as getting the old list and not wrapping it. - `GetShardedTableDDLCommand(TableDDLCommand command, uint64 shardId, char schemaName)`: This function returns the ddl command wrapped in `worker_apply_shard_ddl_command` with the `shardId` as an argument. Due to backwards compatibility it also accepts a. `schemaName`. The exact purpose is not directly clear. Ideally new implementations would work with fully qualified statements and ignore the `schemaName`. A future implementation could accept 2.function pointers and a `void *` for context to let the two pointers work on. This gives greater flexibility in controlling what commands get send in which situations. Also, in a future, we could implement the intermediate step of creating the `parsetree` datastructure of statements based on the contents in the catalog with a corresponding deparser. For sharded queries a mutator could be ran over the parsetree to rewrite the tablenames to the names with the shard identifier. This will completely omit the requirement for `worker_apply_shard_ddl_command`.	2020-11-26 13:31:59 +01:00
SaitTalhaNisanci	83020f444e	Initialize fast planner restriction context (#4349 ) We initialize fast planner restriction context so that code paths that rely on this being not NULL will operate without a problem.	2020-11-26 13:45:27 +03:00
Onder Kalaci	629ecc3dee	Add the infrastructure to count the number of client backends Considering the adaptive connection management improvements that we plan to roll soon, it makes it very helpful to know the number of active client backends. We are doing this addition to simplify yhe adaptive connection management for single node Citus. In single node Citus, both the client backends and Citus parallel queries would compete to get slots on Postgres' `max_connections` on the same Citus database. With adaptive connection management, we have the counters for Citus parallel queries. That helps us to adaptively decide on the remote executions pool size (e.g., throttle connections if necessary). However, we do not have any counters for the total number of client backends on the database. For single node Citus, we should consider all the client backends, not only the remote connections that Citus does. Of course Postgres internally knows how many client backends are active. However, to get that number Postgres iterates over all the backends. For examaple, see [pg_stat_get_db_numbackends](`8e90ec5580/src/backend/utils/adt/pgstatfuncs.c (L1240)`) where Postgres iterates over all the backends. For our purpuses, we need this information on every connection establishment. That's why we cannot affort to do this kind of iterattion.	2020-11-25 19:19:24 +01:00

1 2 3 4 5 ...

2649 Commits (cc04fce10f87dea9b4d0ce4f00463daa1f17e1e1)