citus

Commit Graph

Author	SHA1	Message	Date
SaitTalhaNisanci	98f95e2a5e	add TaskQueryStringForPlacement TaskQueryStringForPlacement simplifies how the executor gets the query string for a given placement. Task will use the necessary fields to return the correct query placement string. Executor doesn't need to know the details for this. rename TaskQueryString as TaskQueryStringAllPlacements TaskQueryString returns the query string that will be the same for all the placements. In INSERT..SELECT the query string can be different for each placement. Adaptive executor uses TaskQueryStringForPlacement, which returns the query string for a placement. It makes sense to rename TaskQueryString as TaskQueryStringAllPlacements as it is returning the query string for all placements. rename SetTaskQuery as SetTaskQueryIfShouldLazyDeparse SetTaskQuery does not always sets the task query. It can set the query string as well. So it is more clear to name it SetTaskQueryIfShouldLazyDeparse, since it will set the query not query string only when we should deparse the query in a lazy way.	2020-03-31 15:47:55 +03:00
SaitTalhaNisanci	982b5fbabf	add SetTaskPerPlacementStrings It is possible that a task will have different query string for each placement. This is the case in INSERT..SELECT via repartitioning. When we are setting task->perPlacementQueryString, we should set queryStringLazy to NULL. Therefore a method for that purpose is created.	2020-03-31 15:47:55 +03:00
Marco Slot	331b45348c	Fix error when using LEFT JOIN with GROUP BY on primary key	2020-03-30 16:42:22 +02:00
SaitTalhaNisanci	e1802c5c00	extract local plan cache related methods into a file (#3667 )	2020-03-31 11:11:34 +03:00
SaitTalhaNisanci	8dfc2cb122	not append ; if end of the list in StringJoin (#3672 )	2020-03-31 10:01:28 +03:00
Philip Dubé	67d2ad4e37	Fixes flaky test in multi_reference_table: ORDER BY (#3676 ) Fixes app.circleci.com/pipelines/github/citusdata/citus/7744/workflows/0848f36c-af9e-46b7-9dda-a421df54ba56/jobs/109503	2020-03-30 23:31:10 +02:00
Philip Dubé	4eb2c33f38	multi_copy.c: remove tableMetadata	2020-03-30 19:26:44 +00:00
Jelte Fennema	3be665269f	Reintroduce ForceSearchShardPlacementInList (#3664 ) This was added to silence static analysis errors. It was removed accidentally in #3591. This reintroduces it again.	2020-03-27 14:28:50 +01:00
Hanefi Onaldi	0e8103b101	Propagate ALTER ROLE .. SET statements In PostgreSQL, user defaults for config parameters can be changed by ALTER ROLE .. SET statements. We wish to propagate those defaults accross the Citus cluster so that the behaviour will be similar in different workers. The defaults can either be set in a specific database, or the whole cluster, similarly they can be set for a single role or all roles. We propagate the ALTER ROLE .. SET if all the conditions below are met: - The query affects the current database, or all databases - The user is already created in worker nodes	2020-03-27 13:02:48 +03:00
Marco Slot	a65ffee266	Fixes a bug that causes some DML queries containing aggregates to fail	2020-03-26 16:08:34 +00:00
SaitTalhaNisanci	d3fdade2e8	add missing perPlacementQueryStrings to copy and out funcs (#3657 )	2020-03-26 17:16:29 +03:00
Marco Slot	b89e9dc158	Fix a bug which caused queries with SRFs and function evalution to fail	2020-03-25 06:55:53 +01:00
SaitTalhaNisanci	dd1a456407	store query command list in task (#3649 ) Sometimes we have concatenated query strings for a task. However, when we want to find each query string, it is not a trivial task. Therefore, it makes sense to store this in task so that when we need each query string we can easily get it.	2020-03-26 12:04:08 +03:00
Philip Dubé	917cb6ae93	Don't segfault on queries using GROUPING GROUPING will always return 0 outside of GROUPING SETS, CUBE, or ROLLUP Since we don't support those, it makes sense to reject GROUPING in queries	2020-03-25 15:46:43 +00:00
Philip Dubé	720525cfda	Add support for window functions on coordinator Some refactoring: Consolidate expression which decides whether GROUP BY/HAVING are pushed down Rename early pullUpIntermediateRows to hasNonDistributableAggregates Create WorkerColumnName to handle formatting WORKER_COLUMN_FORMAT Ignore NULL StringInfo pointers to SafeToPushdownWindowFunction Fix bug where SubqueryPushdownMultiNodeTree mutates supplied Query, SafeToPushdownWindowFunction requires the original query as it relies on rtable	2020-03-25 15:31:20 +00:00
Nils Dijk	4e611cfc25	Refactor dependency resolution and resolve from pg_shdepend (#3633 ) DESCRIPTION: Refactor dependency resolution and resolve from pg_shdepend This PR refactors how dependencies are resolved by not assuming solely a `pg_depend` record describing the dependency. Instead we keep a definition of the dependency around which records how the dependency is resolved. This can be one of the following ways - `pg_depend`, data will contain a copy of the `pg_depend` record - `pg_shdepend`, data will contain a copy of the `pg_shdepend` record - `ObjectAddress`, data will contain only an `ObjectAddress` describing a dependency Irregardless of way the dependency was found it will always be able to get to the address of the dependency as that is the most important property. For some checks we can inspect the source where the dependency was found and perform a deep inspection to decide if we want to follow the dependency. This is important to not distribute dependencies coming from extensions for example.	2020-03-25 13:38:25 +01:00
Onur Tirtir	52fd58d51f	move MakeNameListFromRangeVar function to a more appropriate file	2020-03-25 11:01:50 +03:00
Onur Tirtir	2396b66ac5	remove an outdated comment in local executor	2020-03-25 11:01:40 +03:00
Onur Tirtir	8ebb8ef31d	use PG_USED_FOR_ASSERTS_ONLY	2020-03-25 11:01:33 +03:00
Onur Tirtir	81d48d3466	fix some typos	2020-03-25 11:01:26 +03:00
Jelte Fennema	149f0b2122	Use Microsoft approved cipher string (#3639 ) This cipher string is approved by the Microsoft security team and only enables TLSv1.2 ciphers.	2020-03-24 15:51:44 +01:00
Jelte Fennema	2aabe3e2ef	Mark all connections for shutdown when citus.node_conninfo chan… (#3642 ) We cache connections between nodes in our connection management code. This is good for speed. For security this can be a problem though. If the user changes settings related to TLS encryption they want those to be applied to future queries. This is especially important when they did not have TLS enabled before and now they want to enable it. This can normally be achieved by changing citus.node_conninfo. However, because connections are not reopened there will still be old connections that might not be encrypted at all. This commit changes that by marking all connections to be shutdown at the end of their current transaction. This way running transactions will succeed, even if placement requires connections to be reused for this transaction. But after this transaction completes any future statements will use a connection created with the new connection options. If a connection is requested and a connection is found that is marked for shutdown, then we don't return this connection. Instead a new one is created. This is needed to make sure that if there are no running transactions, then the next statement will not use an old cached connection, since connections are only actually shutdown at the end of a transaction.	2020-03-24 15:31:41 +01:00
Hadi Moshayedi	b46b9a68ae	Tests for master_copy_shard_placement	2020-03-23 08:33:55 -07:00
Marco Slot	ede176d849	Implement shard placement copying	2020-03-23 08:33:08 -07:00
Philip Dubé	dd2bd53e5b	PartiallyEvaluateExpression: Avoid unrecognized paramkind: 2	2020-03-23 14:14:01 +00:00
SaitTalhaNisanci	3b7959a763	not run local shard copy test in parallel (#3640 ) It seems that when logging is enabled we should not run local shard copy in parallel with other tests. The reason is that it adds coordinator for reference tables and if the parallel test creates a schema before this test is run, the schema will be logged. So it is not deterministic.	2020-03-23 14:38:18 +03:00
SaitTalhaNisanci	c5c446f84f	not run local_shard_copy in parallel (#3635 )	2020-03-23 13:56:25 +03:00
SaitTalhaNisanci	3df578010e	add a UDF to update colocation (#3623 ) If two tables have the same distribution column type, we implicitly colocate them. This is useful since colocation has a big performance impact in most applications. When a table is rebalanced, all of the colocated tables are also rebalanced. If table A and table B are colocated and we want to rebalance table A, table B will also be rebalanced. We need replica identity so that logical replication can replicate updates and deletes during rebalancing. If table B does not have a replica identity we error out. A solution to this is to introduce a UDF so that colocation can be updated. The remaining tables in the colocation group will stay colocated. For example if table A, B and C are colocated and after updating table B's colocations, table A and table C stay colocated. The "updating colocation" step does not move any data around, it only updated pg_dist_partition and pg_dist_colocation tables. Specifically it creates a new colocation group for the table and updates the entry in pg_dist_partition while invalidating any cache.	2020-03-23 13:22:24 +03:00
Onder Kalaci	7b4eb9611b	Properly terminate connections at the end session Citus coordinator (or MX nodes) caches `citus.max_cached_conns_per_worker` connections per node. This means that, those connections are not terminated after each statement. Instead, cached to avoid the cost of re-establishment. This is crucial for OLTP performance. The problem with that approach is that, we never properly handle the termnation of those cached connections. For instance, when a session on the coordinator disconnects, you'd see the following logs on the workers: ``` 2020-03-20 09:13:39.454 CET [64028] LOG: could not receive data from client: Connection reset by peer ``` With this patch, we're terminating the cached connections properly at the end of the connection.	2020-03-20 17:34:34 +01:00
Jelte Fennema	56863e8f0b	Really ignore -Wgnu-variable-sized-type-not-at-end (#3627 )	2020-03-20 11:53:28 +01:00
Jelte Fennema	ed0376bb41	Unparallelize tests (#3629 ) We're getting a lot of random failures on CI regarding connection errors. This works around that by not running that create lots of connections in parallel.	2020-03-20 10:31:34 +01:00
Jelte Fennema	6db7d87618	Compile safestringlib using regular configure This is needed to automatically generate .bc (bitcode) files when postgres is compiled with llvmjit support. It also has the advantage that cmake is not required for the build anymore.	2020-03-19 11:52:20 +01:00
Nils Dijk	6ff79c5ea9	Revert: Semmle: Protect against theoretical race in recursive d… (#3619 ) As discussed with @JelteF; #3559 caused consistent errors on BSD (OSX). Given a group of people use this environment to develop on it is an undesirable change. This reverts commit `ca8f7119fe`.	2020-03-18 13:48:05 +01:00
SaitTalhaNisanci	2eaf7bba69	not use local copy if we are copying into intermediate results file We have special logic to copy into intermediate results and we use a custom format for that, "result" copy format. Postgres internally does not know this format and if we use this locally it will error saying that it does not know this format. Files are visible to all transactions, which means that we can use any connection to access files. In order to use the existing logic, it makes sense that in case we have intermediate results, which means we will write the results to a file, we preserve the same behavior, which is opening connections to localhost. Therefore if we have intermediate results we return false in ShouldExecuteCopyLocally.	2020-03-18 09:35:20 +03:00
SaitTalhaNisanci	9d2f3c392a	enable local execution in INSERT..SELECT and add more tests We can use local copy in INSERT..SELECT, so the check that disables local execution is removed. Also a test for local copy where the data size > LOCAL_COPY_FLUSH_THRESHOLD is added. use local execution with insert..select	2020-03-18 09:34:39 +03:00
SaitTalhaNisanci	42cfc4c0e9	apply review items log shard id in local copy and add more comments	2020-03-18 09:33:55 +03:00
SaitTalhaNisanci	c22068e75a	use the right partition for partitioned tables	2020-03-18 09:28:59 +03:00
SaitTalhaNisanci	1df9601e13	not use local copy if current transaction is connected to local group If current transaction is connected to local group we should not use local copy, because we might not see some of the changes that are made over the connection to the local group.	2020-03-18 09:28:59 +03:00
SaitTalhaNisanci	39bbec0f30	add tests for local copy execution	2020-03-18 09:28:59 +03:00
SaitTalhaNisanci	f9c4431885	add the support to execute copy locally A copy will be executed locally if - Local execution is enabled and current transaction accessed a local placement - Local execution is enabled and we are inside a transaction block. So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally. This will not run locally: ``` COPY distributed_table FROM STDIN; .... ``` This will run locally: ``` SET citus.enable_local_execution to 'on'; BEGIN; COPY distributed_table FROM STDIN; COMMIT; .... ``` . There are 3 ways to do a copy in postgres programmatically: - from a file - from a program - from a callback function I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table. For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when: - copy buffer for the given shard id reaches to a threshold, which is currently 512KB - we reach to the end of the copy The buffer size is debatable(512KB). At a given time, we might allocate (local placement * buffer size) memory at most. The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally. There was also the option to use ExecSimpleRelationInsert to insert slots one by one, which would avoid the extra serialization/deserialization but doing some benchmarks it seems that using buffers are significantly better in terms of the performance. You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054	2020-03-18 09:28:59 +03:00
Jelte Fennema	99c5b0add7	Make building safestringlib on some distros easier (#3616 ) On some distros (e.g. Redhat 7) there is cmake version 2 and cmake version 3, safestringlib requires cmake version 3. On those distros the binary is called cmake3, so try to use that one before falling back to regular cmake binary.	2020-03-16 11:34:30 +01:00
Philip Dubé	7b382e43bc	multi_logical_optimizer: replace ListCopyDeep with copyObject, stack allocate WorkerAggregateWalkerContext	2020-03-13 15:46:01 +00:00
Nils Dijk	e5237b9e20	Fix left join shard pruning (#3569 ) DESCRIPTION: Fix left join shard pruning in pushdown planner Due to #2481 which moves outer join planning through the pushdown planner we caused a regression on the shard pruning behaviour for outer joins. In the pushdown planner we make a union of the placement groups for all shards accessed by a query based on the filters we see during planning. Unfortunately implicit filters for left joins are not available during this part. This causes the inner part of an outer join to not prune any shards away. When we take the union of the placement groups it shows the behaviour of not having any shards pruned. Since the inner part of an outer query will not return any rows if the outer part does not contain any rows we have observed we do not have to add the shard intervals of the inner part of an outer query to the list of shard intervals to query. Fixes: #3512	2020-03-13 15:20:45 +01:00
Onur Tirtir	a14739f808	Local execution of ddl/drop/truncate commands (#3514 ) * reimplement ExecuteUtilityTaskListWithoutResults for local utility command execution * introduce new functions for local execution of utility commands * change ErrorIfTransactionAccessedPlacementsLocally logic for local utility command execution * enable local execution for TRUNCATE command on distributed & reference tables * update existing tests for local utility command execution * enable local execution for DDL commands on distributed & reference tables * enable local execution for DROP command on distributed & reference tables * add normalization rules for cascaded commands * add new tests for local utility command execution	2020-03-13 15:39:32 +03:00
Jelte Fennema	ca8f7119fe	Semmle: Protect against theoretical race in recursive directory… (#3559 ) In between stat at the start of the loop and unlink/rmdir at the end the item that the filename references might have changed. In some cases this can be a security bug, but since we only delete the file/directory it should not be for us as far as I can tell. It could in theory still cause errors though if the a file is changed into a directory by some other process. This commit makes the code robust against that, by not using stat and only rely on error codes and retries.	2020-03-13 10:37:13 +01:00
SaitTalhaNisanci	77f96a1f87	retry vanilla tests if they fail once more (#3611 )	2020-03-12 12:50:06 +03:00
Jelte Fennema	c7aa6eddf3	Fix some bugs in string to int functions (#3602 ) This fixes 3 bugs: 1. `strtoul` never underflows, so that branch was useless 2. `strtoul` has ULONG_MAX instead of LONG_MAX when it overflows 3. `long` and `unsigned long` are not necessarily 64bit, they can be either more or less. So now `strtoll` and `strtoull` are used and 64 bit bounds are checked.	2020-03-11 23:03:02 +01:00
Jelte Fennema	c4cc26ed37	Semmle: Ensure stack memory is not leaked through uninitialized… (#3561 ) New stack memory can contain anything including passwords/private keys. In these functions we return structs that can have their padding bytes uninitialized. By first zeroing out the struct fully, we try to ensure that any data that is in these padding bytes is at least overwritten once. It might not be zero anymore after setting the fields, but at least it shouldn't be private data anymore.	2020-03-11 20:05:36 +01:00
Philip Dubé	11b968bc30	Add runtime type checking to AGGREGATE_CUSTOM_COMBINE helper functions	2020-03-11 17:20:30 +00:00
Jelte Fennema	e0bbe1ca38	Semmle: Actively check one possible NULL deref case (#3560 ) Calling ErrorIfUnsupportedConstraint was still giving errors on Semmle. This makes sure that we check for NULL at runtime. This way we can safely ignore all errors created by this function.	2020-03-11 18:11:56 +01:00

1 2 3 4 5 ...

2166 Commits (98f95e2a5e47befe37bc357d4022d4d67aeee7aa)