Commit Graph

1474 Commits (1553e12ee4e5d58120b99192a91c37e031bbbe0c)

Author SHA1 Message Date
Jason Petersen 6a0dc7756e Formatting fixes
Noticed a lot of weird lines wrapped at 80; our standard is 90.
2019-03-22 20:32:19 -06:00
Jason Petersen 6acf52660c Always coerce RHS of pruning op to part. key type
Our assumption that strip_implicit_coercions would leave us with a bi-
nary-compatible type to that of the partition key was wrong. Instead,
we should ensure the RHS of the comparison we perform is proactively
coerced into a compatible type (at least binary compatible).
2019-03-22 20:32:19 -06:00
Jason Petersen 5baa257c91 Add second assert to guard against future changes
This isn't entirely necessary but I feel safer with it here.
2019-03-22 20:32:19 -06:00
Jason Petersen 69adb627c3 Add Assert that will crash before coercion fix is in 2019-03-22 20:32:19 -06:00
Hadi Moshayedi ff1d4f697a
Ignore test_times.log (#2638) 2019-03-22 10:29:01 -07:00
Nils Dijk feaac69769
Implementation for asycn FinishConnectionListEstablishment (#2584) 2019-03-22 17:30:42 +01:00
Marco Slot e3b7e74f43 Allow rescan in DECLARE .. WITH HOLD 2019-03-22 11:25:55 +01:00
Jason Petersen a2c6f596f9 Address code review comments 2019-03-21 11:59:52 -06:00
Jason Petersen 04aa34da68 Invalidate ConnParamsHash at config reload
At configuration reload, we free all "global" (i.e. GUC-set) connection
parameters, but these may still have live references in the connection
parameters hash. By marking the entries as invalid, we can ensure they
will not be used after free.
2019-03-21 00:03:35 -06:00
Jason Petersen 00d836e5a3 alloc non-global conn. params in provided context
Having DATA-segment string literals made blindly freeing the keywords/
values difficult, so I've switched to allocating all in the provided
context; because of this (and with the knowledge of the end point of
the global parameters), we can safely pfree non-global parameters when
we come across an invalid connection parameter entry.
2019-03-21 00:03:35 -06:00
Marco Slot e8152d9b6d Only look in top-level rtable in ExtractFirstDistributedTableId 2019-03-20 12:14:46 +03:00
Marco Slot ee6a0b6943 Speed up RTE walkers
Do it in two ways (a) re-use the rte list as much as possible instead of
re-calculating over and over again (b) Limit the recursion to the relevant
parts of the query tree
2019-03-20 12:14:46 +03:00
Marco Slot 5ff1821411 Cache the current database name
Purely for performance reasons.
2019-03-20 12:14:46 +03:00
Marco Slot 0ea4e52df5 Add nodeId to shardPlacements and use it for shard placement comparisons
Before this commit, shardPlacements were identified with shardId, nodeName
and nodeport. Instead of using nodeName and nodePort, we now use nodeId
since it apparently has performance benefits in several places in the
code.
2019-03-20 12:14:46 +03:00
Onder Kalaci 41d8c4030a Add some more regression tests for outer join pushdown 2019-03-19 11:49:38 +03:00
Onder Kalaci ad5ff1d01a Some queries lead to infinite recursion with recurisve planning
The rule for infinite recursion is the following:

    - If the query contains a subquery which is recursively planned, and
      no other subqueries can be recursively planned due to correlation
      (e.g., LATERAL joins), the planner keeps recursing again and again.

One interesting thing here is that even if a subquery contains only intermediate
result(s), we re-recursively plan that. In the end, the logic in the code does the following:

  - Try recursive planning any of the subqueries in the query tree
     - If any subquery is recursively planned, call the planner again
        where the subquery is replaced with the intermediate result.
        - Try recursively planning any of the queries
          - If any subquery is recursively planned, call the planner again
            where the subquery (in this case it is already intermediate result)
            is replaced with the intermediate result.
              - Try recursively planning any of the queries
                - If any subquery is recursively planned, call the planner again
                  where the subquery (in this case it is already intermediate result)
                  is replaced with the intermediate result.
                  - Try recursively planning any of the queries
                    - If any subquery is recursively planned, call the planner again
                      where the subquery (in this case it is already intermediate result)
                      is replaced with the intermediate result.
                      ......
2019-03-18 10:35:00 +03:00
Marco Slot f2abf2b8e5 Functions are treated as transaction blocks 2019-03-15 16:34:08 -06:00
Marco Slot 4b9bd54ae0 Remove create_insert_proxy_for_table 2019-03-15 14:13:03 -06:00
exialin 84b853e1b5 Fix some typos (#2620) 2019-03-14 16:48:31 -07:00
Hadi Moshayedi a9e6d06a98 Skip execution of ALTER TABLE constraint checks on the coordinator 2019-03-14 15:40:56 -07:00
Hadi Moshayedi cdd3b15ac8 Fix distributed deadlock for ALTER TABLE ... ATTACH PARTITION.
Following scenario resulted in distributed deadlock before this commit:

CREATE TABLE partitioning_test(id int, time date) PARTITION BY RANGE (time);
CREATE TABLE partitioning_test_2009 (LIKE partitioning_test);
CREATE TABLE partitioning_test_reference(id int PRIMARY KEY, subid int);

SELECT create_distributed_table('partitioning_test_2009', 'id'),
       create_distributed_table('partitioning_test', 'id'),
       create_reference_table('partitioning_test_reference');

ALTER TABLE partitioning_test ADD CONSTRAINT partitioning_reference_fkey FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE;
ALTER TABLE partitioning_test_2009 ADD CONSTRAINT partitioning_reference_fkey_2009 FOREIGN KEY (id) REFERENCES partitioning_test_reference(id) ON DELETE CASCADE;

ALTER TABLE partitioning_test ATTACH PARTITION partitioning_test_2009 FOR VALUES FROM ('2009-01-01') TO ('2010-01-01');
2019-03-14 15:28:37 -07:00
Hadi Moshayedi f19feb742c
Remove never assigned colocatedRelation from CreateDistributedTable (#2479) 2019-03-12 14:50:18 -07:00
Hanefi Onaldi 419f52884f
Merge branch 'master' into improve-mitmproxy-documentation 2019-03-12 07:16:01 -07:00
Murat Tuncer 2681231c98 Create column aliases for shard tables in worker queries when requested 2019-03-07 12:54:42 +03:00
Hadi Moshayedi f4d3b94e22
Fix some of the casts for groupId (#2609)
A small change which partially addresses #2608.
2019-03-05 12:06:44 -08:00
velioglu faf50849d7 Enhance pushdown planning logic to handle full outer joins with using clause
Since flattening query may flatten outer joins' columns into coalesce expr that is
in the USING part, and that was not expected before this commit, these queries were
erroring out. It is fixed by this commit with considering coalesce expression as well.
2019-03-05 11:49:30 +03:00
Onder Kalaci 26f569abd8 Make sure to clear PGresult on few places
This leads to a memory leak otherwise.
2019-02-28 13:44:34 +03:00
Jason Petersen 3df2f51881
Turn on style-checking, fix lingering violations
We'd been ignoring updating uncrustify for some time now because I'd
thought these were misclassifications that would require an update in
our rules to address. Turns out they're legit, so I'm checking them in.
2019-02-26 23:01:40 -07:00
Jason Petersen 5817bc3cce
Add test-timing script
Through some clever stream redirections and options, we can get decent
timing data for each of our tests.
2019-02-26 23:01:40 -07:00
Jason Petersen 5db45bac45
Enable CircleCI
The configuration for the build is in the YAML file; the changes to the
regression runner are backward-compatible with Travis and just add the
logic to detect whether our custom (isolation- and vanilla-enabled) pkg
is present.
2019-02-26 22:17:26 -07:00
Onder Kalaci f706772b2f Round-robin task assignment policy relies on local transaction id
Before this commit, round-robin task assignment policy was relying
on the taskId. Thus, even inside a transaction, the tasks were
assigned to different nodes. This was especially problematic
while reading from reference tables within transaction blocks.
Because, we had to expand the distributed transaction to many
nodes that are not necessarily already in the distributed transaction.
2019-02-22 19:26:38 +03:00
Onder Kalaci e521e7e39c Apply feedback 2019-02-22 18:14:30 +03:00
Onder Kalaci 407d0e30f5 Fix selectForUpdate bug 2019-02-21 18:21:41 +03:00
Onder Kalaci f144bb4911 Introduce fast path router planning
In this context, we define "Fast Path Planning for SELECT" as trivial
queries where Citus can skip relying on the standard_planner() and
handle all the planning.

For router planner, standard_planner() is mostly important to generate
the necessary restriction information. Later, the restriction information
generated by the standard_planner is used to decide whether all the shards
that a distributed query touches reside on a single worker node. However,
standard_planner() does a lot of extra things such as cost estimation and
execution path generations which are completely unnecessary in the context
of distributed planning.

There are certain types of queries where Citus could skip relying on
standard_planner() to generate the restriction information. For queries
in the following format, Citus does not need any information that the
standard_planner() generates:

  SELECT ... FROM single_table WHERE distribution_key = X;  or
  DELETE FROM single_table WHERE distribution_key = X; or
  UPDATE single_table SET value_1 = value_2 + 1 WHERE distribution_key = X;

Note that the queries might not be as simple as the above such that
GROUP BY, WINDOW FUNCIONS, ORDER BY or HAVING etc. are all acceptable. The
only rule is that the query is on a single distributed (or reference) table
and there is a "distribution_key = X;" in the WHERE clause. With that, we
could use to decide the shard that a distributed query touches reside on
a worker node.
2019-02-21 13:27:01 +03:00
Nils Dijk 1623c44fc7 Simplify make file for citus sql files 2019-02-19 21:29:20 -05:00
Hanefi Onaldi 148dcad0bb
More documentation and stale comments rewritten 2019-02-04 20:21:51 +03:00
Hanefi Onaldi 825666f912
Query samples in docs and better errors 2019-02-04 19:20:02 +03:00
Hanefi Onaldi 574b071113
Add wrapper function introduced in PG11 for compatibility 2019-02-04 19:20:02 +03:00
Hanefi Onaldi 1106e14385
Wrap functions in subqueries
remove debug logs to fix travis tests

Support RowType functions in joins

Regression tests for a custom type function in join
2019-02-04 19:19:29 +03:00
Hanefi Onaldi c5c3d6d0a3
Update the mitmscripts readme
and migrate readme to markdown
and create contribution guidelines
2019-02-04 11:30:05 +03:00
Hanefi Onaldi 4dd1f5784b
Failure&cancellation tests for mx metadata sync
Failure&Cancellation tests for initial start_metadata_sync() calls
to worker and DDL queries that send metadata syncing messages to an MX node

Also adds message type definitions for messages that are exchanged
during metadata syncing
-
2019-02-01 11:50:25 +03:00
Murat Tuncer b36b59dd4f Relax reference table restrictions in subquery union pushdowns
We used to error out if there is a reference table
in the query participating a union. This has caused
pushdownable queries to be evaluated in coordinator.

Now we let reference tables inside union queries as long
as there is a distributed table in from clause.

Existing join checks (reference table on the outer part)
sufficient enought that we do not need check the join relation
of reference tables.
2019-01-31 15:34:29 +03:00
Onder Kalaci ec67381ba2 Queries with only intermediate results do not rely on task assignment policy
Previously we allowed task assignment policy to have affect on router queries
with only intermediate results. However, that is erroneous since the code-path
that assigns placements relies on shardIds and placements, which doesn't exists
for intermediate results.

With this commit, we do not apply task assignment policies when a router query
hits only intermediate results.
2019-01-28 17:59:17 +03:00
Murat Tuncer cd5213abee Set sequential mode execution GUC for alter partitioned table
PG recently started propagating foreign key constraints
to partition tables. This came with a select query
to validate the the constaint.

We are already setting sequential mode execution for this
command. In order for validation select query to respect
this setting we need to explicitly set the GUC.

This commit also handles detach partition part.
2019-01-25 15:28:07 +03:00
velioglu 1bb0ec316a Reset planner restriction context instead of popping with recursive planning 2019-01-17 14:35:16 +03:00
Jason Petersen 339e6e661e
Remove 9.6 (#2554)
Removes support and code for PostgreSQL 9.6

cr: @velioglu
2019-01-16 13:11:24 -07:00
Nils Dijk 3f2bac18df
Add make target to run regression tests in isolation with vagrant
Also allow `multi_alter_table_add_constraints` to run in isolation
2019-01-16 11:41:09 +01:00
Marco Slot 1656b519c4 Plan outer joins through pushdown planning 2019-01-05 20:55:27 +01:00
Murat Tuncer b389bebda1 Move repeated code to a function 2019-01-03 17:19:01 +03:00
Murat Tuncer a72d959735 Fix multi_view tests 2019-01-03 17:07:26 +03:00
Murat Tuncer 2ed7d24591 Fix having clause bug for complex joins
We update column attributes of various clauses for a query
inluding target columns, select clauses when we introduce
new range table entries in the query.

It seems having clause column attributes were not updated.

This fix resolves the issue
2019-01-03 17:07:26 +03:00
Murat Tuncer ec36030fae Move functions calls that can fail to outside of spinlock
We had recently fixed a spinlock issue due to functions
failing, but spinlock is not being released.

This is the continuation of that work to eliminate possible
regression of the issue. Function calls that are moved out of
spinlock scope are macros and plain type casting. However,
depending on the configuration they have an alternate implementation
in PG source that performs memory allocation.

This commit moves last bit of codes to out of spinlock for completion purposes.
2019-01-03 15:59:56 +03:00
Murat Tuncer 3b95a03c3e
Merge branch 'master' into fix_spinlock_use 2018-12-25 14:41:21 +03:00
Hadi Moshayedi 38579d52d0
Speed-up run_command_on_shards(). (#2564)
We were establishing connections synchronously. Establishing
connections asynchronously results in some parallelization, saving
hundreds of milliseconds.

In a test I did, this decreased the query time from 150ms to 40ms.
2018-12-24 08:47:01 -05:00
Murat Tuncer 9671bc3cbb Make sure spinlock is not left unreleased when an exception is thrown
A spinlock is not released when an exception is thrown after
spinlock is acquired. This has caused infinite wait and eventual
crash in maintenance daemon.

This work moves the code than can fail to the outside of spinlock
scope so that in the case of failure spinlock is not left locked
since it was not locked in the first place.
2018-12-24 15:47:21 +03:00
Hanefi Onaldi fb497ddad1
Bump 8.2devel on master (#2567) 2018-12-24 13:49:50 +03:00
Onder Kalaci 9fff7d28a7
Revert 4925521 2018-12-21 15:36:40 -07:00
Marco Slot 2e4029973c
Remove sequential create index concurrently test 2018-12-21 14:03:00 -07:00
Marco Slot 1b1c6374f7
Execute CREATE INDEX CONCURRENTLY concurrently 2018-12-21 14:02:59 -07:00
Marco Slot 3ff2b47366 Restrict visibility of get_*_active_transactions functions to pg_monitor 2018-12-19 18:32:42 +01:00
Dimitri Fontaine 6a1a2b8458 Move an assert-only array-bound check to run-time.
When the bound-check fails at run-time, better abort with an error message
rather than trying to user memory we did not allocate.
2018-12-19 06:12:05 +01:00
Marco Slot 13f4a0ac9f Stabilize failure test shard IDs 2018-12-19 04:26:46 +01:00
Marco Slot 5b9376a7f8 Check ownership before taking locks in distributed table creation 2018-12-18 15:32:07 +01:00
Nils Dijk 694992e946
upgrade default ssl_ciphers to more restrictive on extension creation
Show ssl_ciphers in ssl_by_default_test
2018-12-12 15:33:15 +01:00
Jason Petersen 92893e9601
Fix control file version 2018-12-11 18:50:20 -07:00
Jason Petersen bd0d1f05e7
Bump SQL version
Should have been done when the release-8.0 branch was created…
2018-12-11 10:40:15 -07:00
velioglu 90704d9a52 Fix getting function oid to get hll_add_agg id 2018-12-10 14:16:19 +03:00
velioglu 3e0cff94a6 Add FunctionOidExtended function 2018-12-10 11:59:41 +03:00
Nils Dijk 4af40eee76 Enable SSL by default during installation of citus 2018-12-07 11:23:19 -07:00
velioglu 8764a19464 Adds support for disabling hash agg with hll functions on coordinator query 2018-12-07 18:49:25 +03:00
Marco Slot 9cf91c438b Only allow transmit from pgsql_job_cache directory 2018-12-05 10:18:27 +01:00
Marco Slot 70fb9c851b Remove odd memcpy usag in BuildCachedShardList 2018-12-04 14:09:10 +01:00
Marco Slot 0388324fbe Expand planner readme 2018-12-04 09:55:19 +01:00
Dimitri Fontaine d1b182de7d Replace calls to unsafe functions like memcpy and sscanf
In answer to a security audit, we double check buffer sizes and avoid
known-dangerous operations such as sscanf.
2018-12-04 08:54:43 +01:00
Onder Kalaci 621ccf3946 Ensure to use initialized MaxBackends
Postgresql loads shared libraries before calculating MaxBackends.
However, Citus relies on MaxBackends being set. Thus, with this
commit we use the same steps to calculate MaxBackends while
Citus is being loaded (e.g., PG_Init is called).

Note that this is safe since all the elements that are used to
calculate MaxBackends are PGC_POSTMASTER gucs and a constant
value.
2018-12-03 13:25:51 +03:00
Onder Kalaci b6ebd791a6 Sort task list for multi-task explain outputs
This is purely for ensuring that regression tests do not randomly fail.
2018-11-30 11:19:37 -07:00
Onder Kalaci 18c9badff5 Make sure the explain output for partition wise join is stable
We disable bunch of planning options on the workers. This might be
risky if any concurrent test relies on EXPLAIN OUTPUT as well. Still,
we want to keep this test, so we should try to not parallelize this
test with such test.
2018-11-30 16:44:57 +03:00
Marco Slot 8893cc141d Support INSERT...SELECT with ON CONFLICT or RETURNING via coordinator
Before this commit, Citus supported INSERT...SELECT queries with
ON CONFLICT or RETURNING clauses only for pushdownable ones, since
queries supported via coordinator were utilizing COPY infrastructure
of PG to send selected tuples to the target worker nodes.

After this PR, INSERT...SELECT queries with ON CONFLICT or RETURNING
clauses will be performed in two phases via coordinator. In the first
phase selected tuples will be saved to the intermediate table which
is colocated with target table of the INSERT...SELECT query. Note that,
a utility function to save results to the colocated intermediate result
also implemented as a part of this commit. In the second phase, INSERT..
SELECT query is directly run on the worker node using the intermediate
table as the source table.
2018-11-30 15:29:12 +03:00
Hanefi Onaldi 088a2ef66a throw an error when a subquery has grouping set clause 2018-11-30 13:11:32 +03:00
Onder Kalaci a15f168ce4 Ensure that citus_dist_activity test outputs do not change
Since there is no lock ordering among the query that is executed
and the select from the view, we prefer to add a timeout before
priting the activity.
2018-11-30 11:46:17 +03:00
Nils Dijk 9309e63156
create_distributed_table as user, change table ownership during create 2018-11-29 14:20:42 +01:00
Nils Dijk 6aa191f72c
remove table_ddl_command_array and test master_get_table_ddl_events 2018-11-29 14:20:42 +01:00
Murat Tuncer fd868ec268 Fix citus_stat_statements view
Join between pg_stat_statements and citus_query_stats should
include queryid, dbid, userid instead of just queryid.
2018-11-29 14:49:16 +03:00
Dimitri Fontaine 5ae2d03881 Refrain from having a strong opinion on maxGroupId.
When initializing a Citus formation automatically from an external piece of
software such as Citus-HA, the following process process may be used:

  - decide on the groupId in the external software
  - SELECT * FROM master_add_inactive_node('localhost', 9701, groupid => X)

When Citus checks for maxGroupId, it forbids other software to pick their
own group Ids to ues with the master_add_inactive_node() API.

This patch removes the extra testing around maxGroupId.
2018-11-28 04:29:15 +01:00
Marco Slot 0393910c65 Shard IDs in isolation_citus_dist_stat_activity output changed 2018-11-28 02:59:50 +01:00
Marco Slot aff37cf1bc Control multi-shard modify locks with enable_deadlock_prevention 2018-11-28 02:59:50 +01:00
Marco Slot 1ec5b6c890 Remove old worker_hash_partition_table API 2018-11-26 14:40:37 +01:00
Marco Slot 5a63deab2e Clean up UDFs and remove unnecessary permissions 2018-11-26 14:40:37 +01:00
Hanefi Onaldi 448b241ab4 validate query isolation tests 2018-11-26 14:04:51 +03:00
Hanefi Onaldi 4edb193f25 make the tests parallelizeable
helper view table_fkeys_in_workers now allows filtering by schema so that a test case can print out foreign keys in its schema only
2018-11-26 14:04:51 +03:00
Hanefi Onaldi b3d897039a constraint validation regression tests 2018-11-26 14:04:51 +03:00
Hanefi Onaldi 7db6991dc0 propagate validate queries to workers 2018-11-26 14:04:51 +03:00
Marco Slot e8e956aa9f Require superuser when using non-existent job schema in worker_merge_files_into_table 2018-11-24 02:57:16 +01:00
Marco Slot c4ad899dd8 Check schema ownership in worker_merge_* functions 2018-11-23 11:05:09 +01:00
Marco Slot e9a7295ead Add multi-user tests for task-tracker protocol functions 2018-11-23 11:05:09 +01:00
Marco Slot 8e93fe5870 Check schema owner in task_tracker_assign_task 2018-11-23 11:05:09 +01:00
Marco Slot ec957a833a Check permission in task_tracker_task_status 2018-11-23 11:04:58 +01:00
Marco Slot 4245032849 Add user ID suffixes to filenames in check-worker tests 2018-11-23 08:36:12 +01:00
Marco Slot 6aa5592e52 Add user ID suffix to intermediate files in re-partition jobs 2018-11-23 08:36:11 +01:00
Marco Slot a59bf31c76 Use worker_execute_sql_task UDF in task-tracker executor 2018-11-22 18:15:33 +01:00