Commit Graph

24 Commits (f8d3e50f255fe0acb6cc9fd13cc4630e6cb74631)

Author SHA1 Message Date
Halil Ozan Akgul f8d3e50f25 Introduces STATUS_WAITING_COMPAT macro
The STATUS_WAITING define is removed and an enum with PROC_WAIT_STATUS_WAITING is added instead
This macro uses appropriate one

Relevant PG commit:
a513f1dfbf2c29a51b0f7cbd5913ce2d2ee452c5
2021-09-03 15:27:24 +03:00
jeff-davis 7b9aecff21 Columnnar: metapage changes. (#4907)
* Columnar: introduce columnar storage API.

This new API is responsible for the low-level storage details of
columnar; translating large reads and writes into individual block
reads and writes that respect the page headers and emit WAL. It's also
responsible for the columnar metapage, resource reservations (stripe
IDs, row numbers, and data), and truncation.

This new API is not used yet, but will be used in subsequent
forthcoming commits.

* Columnar: add columnar_storage_info() for debugging purposes.

* Columnar: expose ColumnarMetadataNewStorageId().

* Columnar: always initialize metapage at creation time.

This avoids the complexity of dealing with tables where the metapage
has not yet been initialized.

* Columnar: columnar storage upgrade/downgrade UDFs.

Necessary upgrade/downgrade step so that new code doesn't see an old
metapage.

* Columnar: improve metadata.c comment.

* Columnar: make ColumnarMetapage internal to the storage API.

Callers should not have or need direct access to the metapage.

* Columnar: perform resource reservation using storage API.

* Columnar: implement truncate using storage API.

* Columnar: implement read/write paths with storage API.

* Columnar: add storage tests.

* Revert "Columnar: don't include stripe reservation locks in lock graph."

This reverts commit c3dcd6b9f8.

No longer needed because the columnar storage API takes care of
concurrency for resource reservation.

* Columnar: remove unnecessary lock when reserving.

No longer necessary because the columnar storage API takes care of
concurrent resource reservation.

* Add simple upgrade tests for storage/ branch

* fix multi_extension.out

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
2021-05-10 20:16:46 +03:00
Hadi Moshayedi c3dcd6b9f8 Columnar: don't include stripe reservation locks in lock graph. 2021-02-10 10:20:20 -08:00
Onur Tirtir 88bfd2e4b7 refactor around local group id checks
Mostyl optimizes the calls made to GetLocalGroupId and refactors
its usages
2020-03-05 20:20:41 +03:00
Philip Dubé 20abc4d2b5
Replace foreach with foreach_ptr/foreach_oid (#3544) 2020-02-27 16:54:49 +01:00
Philip Dubé 3a906b8210 Fix typos noticed while reading through code trying to understand HAVING 2020-02-11 19:55:10 +00:00
Jelte Fennema 1d8dde232f
Automatically convert useless declarations using regex replace (#3181)
* Add declaration removal to CI

* Convert declarations
2019-11-21 13:47:29 +01:00
SaitTalhaNisanci 94a7e6475c
Remove copyright years (#2918)
* Update year as 2012-2019

* Remove copyright years
2019-10-15 17:44:30 +03:00
Jelte Fennema cbecf97c84
Move tuplestore setup to a helper function (#2898)
* Add tuplestore helpers

* More detailed error messages in tuplestore

* Add CreateTupleDescCopy to SetupTuplestore

* Use new SetupTuplestore helper function

* Remove unnecessary copy

* Remove comment about undefined behaviour
2019-08-27 09:11:08 +02:00
Onder Kalaci 621ccf3946 Ensure to use initialized MaxBackends
Postgresql loads shared libraries before calculating MaxBackends.
However, Citus relies on MaxBackends being set. Thus, with this
commit we use the same steps to calculate MaxBackends while
Citus is being loaded (e.g., PG_Init is called).

Note that this is safe since all the elements that are used to
calculate MaxBackends are PGC_POSTMASTER gucs and a constant
value.
2018-12-03 13:25:51 +03:00
Onder Kalaci a94184fff8 Prevent overflow of memory accesses during deadlock detection
In the distributed deadlock detection design, we concluded that prepared transactions
cannot be part of a distributed deadlock. The idea is that (a) when the transaction
is prepared it already acquires all the locks, so cannot be part of a deadlock
(b) even if some other processes blocked on the prepared transaction,  prepared transactions
 would eventually be committed (or rollbacked) and the system will continue operating.

With the above in mind, we probably had a mistake in terms of memory allocations. For each
backend initialized, we keep a `BackendData` struct. The bug we've introduced is that, we
assumed there would only be `MaxBackend` number of backends. However, `MaxBackends` doesn't
include the prepared transactions and axuliary processes. When you check Postgres' InitProcGlobal`
you'd see that `TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;`

This commit aligns with total procs processed with that.
2018-09-17 16:23:29 +03:00
Onder Kalaci d657759c97 Views to Provide some insight about the distributed transactions on Citus MX
With this commit, we implement two views that are very similar
to pg_stat_activity, but showing queries that are involved in
distributed queries:

    - citus_dist_stat_activity: Shows all the distributed queries
    - citus_worker_stat_activity: Shows all the queries on the shards
                                  that are initiated by distributed queries.

Both views have the same columns in the outputs. In very basic terms, both of the views
are meant to provide some useful insights about the distributed
transactions within the cluster. As the names reveal, both views are similar to pg_stat_activity.
Also note that these views can be pretty useful on Citus MX clusters.

Note that when the views are queried from the worker nodes, they'd not show the distributed
transactions that are initiated from the coordinator node. The reason is that the worker
nodes do not know the host/port of the coordinator. Thus, it is advisable to query the
views from the coordinator.

If we bucket the columns that the views returns, we'd end up with the following:

- Hostnames and ports:
   - query_hostname, query_hostport: The node that the query is running
   - master_query_host_name, master_query_host_port: The node in the cluster
                                                   initiated the query.
    Note that for citus_dist_stat_activity view, the query_hostname-query_hostport
    is always the same with master_query_host_name-master_query_host_port. The
    distinction is mostly relevant for citus_worker_stat_activity. For example,
    on Citus MX, a users starts a transaction on Node-A, which starts worker
    transactions on Node-B and Node-C. In that case, the query hostnames would be
    Node-B and Node-C whereas the master_query_host_name would Node-A.

- Distributed transaction related things:
    This is mostly the process_id, distributed transactionId and distributed transaction
    number.

- pg_stat_activity columns:
    These two views get all the columns from pg_stat_activity. We're basically joining
    pg_stat_activity with get_all_active_transactions on process_id.
2018-09-10 21:33:27 +03:00
Onder Kalaci 5bea95009b Skip autovacuum processes for distributed deadlock detection
Autovacuum process cancels itself if any modification starts
on the table in order to avoid blocking your regular Postgres
sessions. That's normal and expected. Thus, any locks held by
autovacuum process cannot involve in a distributed deadlock
since it'll be released if needed.
2017-11-15 14:32:16 +02:00
Onder Kalaci c65c153a46 Skip speculative locks for distributed deadlock detection
These locks are held for a very short duration time and cannot
contribute to a deadlock. Speculative locks are used by Postgres
for internal notification mechanism among transactions.
2017-11-15 12:43:45 +02:00
Onder Kalaci 94921a2be1 Skip page-level locks on distributed deadlock detection
Short-term share/exclusive page-level locks are used for
read/write access. Locks are released immediately after
each index row is fetched or inserted.

Since those locks may not lead to any deadlocks, it's safe
to ignore them in the distributed deadlock detection.
2017-11-09 10:37:23 +02:00
Onder Kalaci 68ca8cb7f0 Skip relation extension locks
We should skip if the process blocked on the relation
extension since those locks are hold for a short duration
while the relation is actually extended on the disk and
released as soon as the extension is done. Thus, recording
such waits on our lock graphs could yield detecting wrong
distributed deadlocks.
2017-09-28 10:09:09 +03:00
Marco Slot dbf18df995 Don't error out if BuildGlobalWaitGraph fails to connect 2017-08-23 19:08:03 +02:00
Marco Slot 641420d79f Remove source node argument from dump_local_wait_edges 2017-08-23 13:14:00 +02:00
Marco Slot bd6bf29983 Don't add procs multiple times in BuildWaitGraphForSourceNode 2017-08-21 16:48:30 +02:00
Onder Kalaci a333c9f16c Add infrastructure for distributed deadlock detection
This commit adds all the necessary pieces to do the distributed
deadlock detection.

Each distributed transaction is already assigned with distributed
transaction ids introduced with
3369f3486f. The dependency among the
distributed transactions are gathered with
80ea233ec1.

With this commit, we implement a DFS (depth first seach) on the
dependency graph and search for cycles. Finding a cycle reveals
a distributed deadlock.

Once we find the deadlock, we examine the path that the cycle exists
and cancel the youngest distributed transaction.

Note that, we're not yet enabling the deadlock detection by default
with this commit.
2017-08-12 13:28:37 +03:00
Brian Cloutier 9d93fb5551 Create citus.use_secondary_nodes GUC
This GUC has two settings, 'always' and 'never'. When it's set to
'never' all behavior stays exactly as it was prior to this commit. When
it's set to 'always' only SELECT queries are allowed to run, and only
secondary nodes are used when processing those queries.

Add some helper functions:
- WorkerNodeIsSecondary(), checks the noderole of the worker node
- WorkerNodeIsReadable(), returns whether we're currently allowed to
  read from this node
- ActiveReadableNodeList(), some functions (namely, the ones on the
  SELECT path) don't require working with Primary Nodes. They should call
  this function instead of ActivePrimaryNodeList(), because the latter
  will error out in contexts where we're not allowed to write to nodes.
- ActiveReadableNodeCount(), like the above, replaces
  ActivePrimaryNodeCount().
- EnsureModificationsCanRun(), error out if we're not currently allowed
  to run queries which modify data. (Either we're in read-only mode or
  use_secondary_nodes is set)

Some parts of the code were switched over to use readable nodes instead
of primary nodes:
- Deadlock detection
- DistributedTableSize,
- the router, real-time, and task tracker executors
- ShardPlacement resolution
2017-08-10 17:37:17 +03:00
Onder Kalaci b5ea3ab6a3 Improve locking semantics for backend management
We use the backend shared memory lock for preventing
new backends to be part of a new distributed transaction
or an existing backend to leave a distributed transaction
while we're reading the all backends' data.

The primary goal is to provide consistent view of the
current distributed transactions while doing the
deadlock detection.
2017-08-09 17:17:12 +03:00
Marco Slot 80ea233ec1 Add function for dumping global wait edges 2017-07-25 16:52:32 +02:00
Marco Slot 81198a1d02 Add function for dumping local wait edges 2017-07-25 16:52:32 +02:00