Commit Graph

40 Commits (484cb12cd038e5a3f3580df1efbe92826d78b837)

Author SHA1 Message Date
Burak Yucesoy 484cb12cd0 Add LoadShardPlacement UDF
This UDF returns a shard placement from cache given shard id and placement id. At the
moment it iterates over all shard placements of given shard by ShardPlacementList and
searches given placement id in that list, which is not a good solution performance-wise.
However, currently, this function will be used only when there is a failed transaction.
If a need arises we can optimize this function in the future.
2017-01-23 21:04:57 +03:00
Andres Freund 6972186652 Add ShardPlacement fields required for colocated placement connection mapping. 2017-01-16 13:42:54 -08:00
Andres Freund b813b39241 Cache ShardPlacements in metadata cache.
So far we've reloaded them frequently. Besides avoiding that cost -
noticeable for some workloads with large shard counts - it makes it
easier to add information to ShardPlacements that help us make
placement_connection.c colocation aware.
2017-01-10 18:14:18 -08:00
Andres Freund 8cb47195ba Make LoadShardInterval() backed by the metadata cache.
Doing so requires adding a mapping from shardId to the cache
entries. For that metadata_cache.c now maintains an additional
hashtable. That hashtable only references shard intervals in the
dist table cache.
2017-01-10 17:00:19 -08:00
Andres Freund f6e8647337 Split DistTableCacheEntry() into separate functions.
Previously the function was getting too large. Thus this splits the
function into separate parts for looking up the cache entry and
building the cache contents.
2017-01-10 15:23:18 -08:00
Burak Yucesoy 9c9f479e4b Replicate reference tables when new node is added
With this change, we start to replicate all reference tables to the new node when new node
is added to the cluster with master_add_node command. We also update replication factor
of reference table's colocation group.
2017-01-05 14:30:41 +03:00
Eren Basak 71d73ec5ff Propagate DDL commands to metadata workers for MX tables 2016-12-23 15:43:32 +03:00
Onder Kalaci 9f0bd4cb36 Reference Table Support - Phase 1
With this commit, we implemented some basic features of reference tables.

To start with, a reference table is
  * a distributed table whithout a distribution column defined on it
  * the distributed table is single sharded
  * and the shard is replicated to all nodes

Reference tables follows the same code-path with a single sharded
tables. Thus, broadcast JOINs are applicable to reference tables.
But, since the table is replicated to all nodes, table fetching is
not required any more.

Reference tables support the uniqueness constraints for any column.

Reference tables can be used in INSERT INTO .. SELECT queries with
the following rules:
  * If a reference table is in the SELECT part of the query, it is
    safe join with another reference table and/or hash partitioned
    tables.
  * If a reference table is in the INSERT part of the query, all
    other participating tables should be reference tables.

Reference tables follow the regular co-location structure. Since
all reference tables are single sharded and replicated to all nodes,
they are always co-located with each other.

Queries involving only reference tables always follows router planner
and executor.

Reference tables can have composite typed columns and there is no need
to create/define the necessary support functions.

All modification queries, master_* UDFs, EXPLAIN, DDLs, TRUNCATE,
sequences, transactions, COPY, schema support works on reference
tables as expected. Plus, all the pre-requisites associated with
distribution columns are dismissed.
2016-12-20 14:09:35 +02:00
Metin Doslu 86cca54857 Add colocate_with option to create_distributed_table()
With this commit, we support three versions of colocate_with: i.default, ii.none
and iii. a specific table name.
2016-12-16 14:53:35 +02:00
Murat Tuncer b5c1ecb684 Fix failures during pg_upgrade
- fix error in CitusHasBeenLoaded()
- allow creation of pg_catalog tables during upgrade
2016-11-11 17:22:45 -08:00
Andres Freund fcd150c7c8 Invalidate relcache after pg_dist_shard_placement changes.
This forces prepared statements to be re-planned after changes of the
placement metadata. There's some locking issues remaining, but that's a
a separate task.

Also add regression tests verifying that invalidations take effect on
prepared statements.
2016-10-26 03:36:35 -07:00
Brian Cloutier 2e96f6ab27 Fix crash when upgrading to Citus 6
Between restart (running the new code) and ALTER EXTENSION citus
UPGRADE there was an inconsistency where we assumed that
pg_dist_partition had the repmodel column set. Now we give it a default
value if the column doesn't exist yet.
2016-10-24 15:18:29 +03:00
Metin Doslu 161093908e Convert colocationid to uint32 2016-10-20 10:59:31 +03:00
Metin Doslu 40bdafa8d1 Add create_distributed_table()
create_distributed_table() creates a hash distributed table with default values
of shard count and shard replication factor.
2016-10-20 10:58:25 +03:00
Andres Freund ac14b2edbc
Support PostgreSQL 9.6
Adds support for PostgreSQL 9.6 by copying in the requisite ruleutils
file and refactoring the out/readfuncs code to flexibly support the
old-style copy/pasted out/readfuncs (prior to 9.6) or use extensible
node APIs (in 9.6 and higher).

Most version-specific code within this change is only needed to set new
fields in the AggRef nodes we build for aggregations. Version-specific
test output files were added in certain cases, though in most they were
not necessary. Each such file begins by e.g. printing the major version
in order to clarify its purpose.

The comment atop citus_nodes.h details how to add support for new nodes
for when that becomes necessary.
2016-10-18 16:23:55 -06:00
Eren Basak cee7b54e7c Add worker transaction and transaction recovery infrastructure 2016-10-18 14:18:14 +03:00
Eren Basak f3ede37c9f Add hasmetadata column to pg_dist_node 2016-10-17 11:52:18 +03:00
Eren Basak c7bf2021fa Add metadata infrastructure for pg_dist_local_group table 2016-10-17 11:52:18 +03:00
Eren Basak ed3af403fd Add Metadata Snapshot Infrastructure
This change adds the required infrastructure about metadata snapshot from MX
codebase into Citus, mainly metadata_sync.c file and master_metadata_snapshot UDF.
2016-10-13 10:40:14 +03:00
Andres Freund 982ad66753 Introduce placement IDs.
So far placements were assigned an Oid, but that was just used to track
insertion order. It also did so incompletely, as it was not preserved
across changes of the shard state. The behaviour around oid wraparound
was also not entirely as intended.

The newly introduced, explicitly assigned, IDs are preserved across
shard-state changes.

The prime goal of this change is not to improve ordering of task
assignment policies, but to make it easier to reference shards.  The
newly introduced UpdateShardPlacementState() makes use of that, and so
will the in-progress connection and transaction management changes.
2016-10-07 11:59:20 -07:00
Brian Cloutier 9d6699b07c Switch from pg_worker_list.conf file to pg_dist_node metadata table.
Related to #786

This change adds the `pg_dist_node` table that contains the information
about the workers in the cluster, replacing the previously used
`pg_worker_list.conf` file (or the one specified with `citus.worker_list_file`).

Upon update, `pg_worker_list.conf` file is read and `pg_dist_node` table is
populated with the file's content. After that, `pg_worker_list.conf` file
is renamed to `pg_worker_list.conf.obsolete`

For adding and removing nodes, the change also includes two new UDFs:
`master_add_node` and `master_remove_node`, which require superuser
permissions.

'citus.worker_list_file' guc is kept for update purposes but not used after the
update is finished.
2016-10-05 13:01:35 +03:00
Marco Slot 32b2bd4ed8 Add replication model column to pg_dist_partition 2016-10-05 01:14:28 +02:00
Burak Yucesoy 1ee39eb098 Internal co-location API
With this commit we introduce internal API for co-location related operations.
2016-09-29 11:56:53 +03:00
Andres Freund 63fb8311cb Don't access pg_dist_partition->partkey directly, use heap_getattr().
Text datums can't be directly accessed via the struct equivalence trick
used to access catalogs. That's because, as an optimization, they're
sometimes aligned to 1 byte ("text"'s alignment), and sometimes to 4
bytes. That depends on it being a short
varlena (cf. VARATT_NOT_PAD_BYTE) or not.

In the case at hand here, partkey became longer than 127 characters -
the boundary for short varlenas (cf. VARATT_CAN_MAKE_SHORT()). Thus it
became 4 byte/int aligned. Which lead to the direct struct access
accessing the wrong data.

The fix is simply to never access partkey that way - to enforce that,
hide partkey ehind the usual ifdef.

Fixes: #674
2016-07-29 10:02:36 -07:00
Murat Tuncer c20080992d Remove PostgreSQL 9.4 support 2016-07-26 20:16:09 +03:00
Andres Freund 25615ee9d7 Add CitusExtensionOwner(), to execute some priviledged operations under.
There exist some operations we have to execute with elevated
privileges. The most expedient user for that is the user owning the
citusdb extension.
2016-04-27 10:26:08 -07:00
Andres Freund 12a246de37 Perform permission checks in functions manipulating distributed tables.
Previously several commands, amongst them commands like
master_create_distributed_table(), were allowed for everyone. That's not
good: Even though citus currently requires superuser permissions, we
shouldn't allow non-superusers to perform actions as sensitive as making
a table distributed.

There's no checks on the worker_* functions, as these usually just punt
the action to underlying postgres functionality, which then perform the
necessary checks.
2016-04-27 10:22:20 -07:00
Andres Freund 42d232c0e8 Use the current session's username when connecting to worker nodes.
So far we've always used libpq defaults when connecting to workers; bar
special environment variables being set that'll always be the user that
started the server.  That's not desirable because it prevents using
users with fewer privileges.

Thus change the various APIs creating connections to workers to always
use usernames. That means:
1) MultiClientConnect() needs to, optionally, accept a username
2) GetOrEstablishConnection(), including the underlying cache, need to
   use the current user as part of the connection cache key. That way
   connections for separate users are distinct, and we always use one
   with the correct authorization.
3) The task tracker needs to keep track of the username associated with
   a task, so it can use it when establishing connections outside the
   originating session.
2016-04-27 10:00:08 -07:00
Onder Kalaci 108114ab99 Apply final code review feedback
- Fix o(n^2) loop to o(n)
- Collapse two if statements into a single one
- Some coding conventions feedback
2016-04-27 10:36:03 +03:00
Onder Kalaci c4b783b70b Fix Merge Conflict
This commit fixes merge conflicts.
2016-04-26 11:18:47 +03:00
Onder Kalaci 6c7abc2ba5 Add fast shard pruning path for INSERTs on hash partitioned tables
This commit adds a fast shard pruning path for INSERTs on
hash-partitioned tables. The rationale behind this change is
that if there exists a sorted shard interval array, a single
index lookup on the array allows us to find the corresponding
shard interval. As mentioned above, we need a sorted
(wrt shardminvalue) shard interval array. Thus, this commit
updates shardIntervalArray to sortedShardIntervalArray in the
metadata cache. Then uses the low-level API that is defined in
multi_copy to handle the fast shard pruning.

The performance impact of this change is more apparent as more
shards exist for a distributed table. Previous implementation
was relying on linear search through the shard intervals. However,
this commit relies on constant lookup time on shard interval
array. Thus, the shard pruning becomes less dependent on the
shard count.
2016-04-26 11:16:00 +03:00
Brian Cloutier 7a6d689259 Clear metadata_cache upon DROP EXTENSION
When we notice that pg_dist_partition is being invalidated we assume
that the citus extension is being dropped and drop state such as
extensionLoaded and the cached oids of all the metadata tables.

This frees the user from needing to reconnect after running DROP
EXTENSION, so we also no longer send a warning message.
2016-04-22 07:25:49 -07:00
Jason Petersen 423e6c8ea0
Update copyright dates
Fixed configure variable and updated all end dates to 2016.
2016-03-23 17:14:37 -06:00
Marco Slot 75a141a7c6 Merge remote-tracking branch 'origin/master' into feature/drop_shards_on_drop_table 2016-02-17 22:52:58 +01:00
Murat Tuncer 3528d7ce85 Merge from master branch into feature/citusdb-to-citus 2016-02-17 14:49:01 +02:00
Marco Slot 37f580f9c7 Trim comment about invalidating dropped relations 2016-02-16 14:04:12 +01:00
Marco Slot 2af6797c04 Perform relcache invalidation in CitusInvalidateRelcacheByRelid 2016-02-16 12:59:38 +01:00
Jason Petersen fdb37682b2
First formatting attempt
Skipped csql, ruleutils, readfuncs, and functions obviously copied from
PostgreSQL. Seeing how this looks, then continuing.
2016-02-15 23:29:32 -07:00
Murat Tuncer 55c44b48dd Changed product name to citus
All citusdb references in
- extension, binary names
- file headers
- all configuration name prefixes
- error/warning messages
- some functions names
- regression tests

are changed to be citus.
2016-02-15 16:04:31 +02:00
Onder Kalaci 136306a1fe Initial commit of Citus 5.0 2016-02-11 04:05:32 +02:00