If it is certain that we will not use any `parallel_worker`s for a columnar table,
then stripe entries inserted by aborted transactions become visible to
`SnapshotAny` and that causes `REINDEX` to fail by throwing a duplicate key
error.
To fix that:
* consider three states for a stripe write operation:
"flushed", "aborted", or "in-progress",
* make sure to have a clear separation between them, and
* act according to those three states when reading from a columnar table
make_simple_restrictinfo and pull_varnos functions now have a new parameter
These new macros give us the ability to use this new parameter for PG14 and they don't give the parameter for previous versions
Relevant PG commit:
55dc86eca70b1dc18a79c141b3567efed910329d
index_insert function now has a new parameter, indexUnchanged
This new macro give us the ability to use these new parameter for PG14 and they don't give the parameters for previous versions
Existing parameter is set to false
Relevant PG commit:
9dc718bdf2b1a574481a45624d42b674332e2903
es_result_relation_info is removed from Estate. In this commit we make some changes to handle that.
resultRelationInfo filed is added to ModifyState to support the removed field.
Relevant PG commits:
1375422c7826a2bf387be29895e961614f69de4b
a04daa97a4339c38e304cd6164d37da540d665a8
GetOldestXmin function is removed so we use GetOldestNonRemovableTransactionId functions instead
GetOldestNonRemovableTransactionId_compat picks the appropriate one
Relevant PG commit:
dc7420c2c9274a283779ec19718d2d16323640c0
get_partition_parent and RelationGetPartitionDesc functions now have new parameters to also include detached partitions
Thess new macros give us the ability to use these new parameter for PG14 and they don't give the parameters for previous versions
Existing parameters are set to not accept detached partitions
Relevant PG commit:
71f4c8c6f74ba021e55d35b1128d22fb8c6e1629
In two commits vacuumFlags in PGXACT is moved and then renamed to status flags
This macro uses the appropriate version of the flag
Relevant PG commits:
5788e258bb26495fab65ff3aa486268d1c50b123
cd9c1b3e197a9b53b840dcc87eb41b04d601a5f9
SetTuplestoreDestReceiverParams function now has two new parameters
This new macro give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions
Existing parameters are set to NULL to keep previous behavior
Relevant PG commit:
2f48ede080f42b97b594fb14102c82ca1001b80c
Some Copy related functions copied from Postgres had support for both old and new protocols
Postgres removed support for old version so we remove it too
Relevant PG commit:
3174d69fb96a66173224e60ec7053b988d5ed4d9
New macros: standard_ProcessUtility_compat, ProcessUtility_compat, ColumnarProcessUtility_compat, PrevProcessUtilityHook_compat
The functions now have a new bool parameter: readOnlyTree
These new macros give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions
In multi_ProcessUtility and ColumnarProcessUtility, before doing anything else, we check if readOnlyTree parameter is true and create a copy of pstmt
Existing readOnlyTree parameters are set to false since we already handle the read only case at multi_ProcessUtility and ColumnarProcessUtility
Relevant PG commit:
7c337b6b527b7052e6a751f966d5734c56f668b5
BeginCopyFrom function now has a new whereClause parameter.
In the function this parameter is assigned to the whereClause field of the CopyFromState returned
Currently in Postgres there is only one place where this argument isn't NULL, and in previous PG version the whereClause argument of copy state is set right after the function call
Since we don't have such example all current whereClause parameters are set to NULL
Relevant PG commit:
c532d15dddff14b01fe9ef1d465013cb8ef186df
CopyState struct is divided into parts and one of them is CopyFromState
This macro uses the appropriate one for PG versions
Relevant PG commit:
c532d15dddff14b01fe9ef1d465013cb8ef186df
In ReindexStmt concurrent field is moved to options and then options are converted to params list.
This macro uses previous fields for previous versions and the new params list with a new function named IsReindexWithParam for PG14
Relevant PG commits:
844c05abc3f1c1703bf17cf44ab66351ed9711d2
b5913f6120792465f4394b93c15c2e2ac0c08376
VacOptTernaryValue enum is renamed to VacOptValue.
In the enum there were three values, VACOPT_TERNARY_DEFAULT, VACOPT_TERNARY_DISABLED, and VACOPT_TERNARY_ENABLED
Now there are four values VACOPTVALUE_UNSPECIFIED, VACOPTVALUE_AUTO, VACOPTVALUE_DISABLED, and VACOPTVALUE_ENABLED
New macros are VacOptValue_compat, VACOPTVALUE_UNSPECIFIED_COMPAT, VACOPTVALUE_DISABLED_COMPAT, and VACOPTVALUE_ENABLED_COMPAT
The VACOPTVALUE_UNSPECIFIED_COMPAT matches VACOPT_TERNARY_DEFAULT and VACOPTVALUE_UNSPECIFIED. And there are no macro for VACOPTVALUE_AUTO.
Relevant PG commit:
3499df0dee8c4ea51d264a674df5b5e31991319a
New macros: FuncnameGetCandidates_compat and expand_function_arguments_compat
The functions (the ones without _compat) now have a new bool include_out_arguments parameter
These new macros give us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions
Existing include_out_arguments parameters are set to 'false' to keep current behavior
Relevant PG commit:
e56bce5d43789cce95d099554ae9593ada92b3b7
stats function now have a new bool print_to_stderr parameter
This new macro gives us the ability to use this new parameter for PG14 and it doesn't give the parameter for previous versions
Existing print_to_stderr parameter is set to true to keep current behavior
Relevant PG commit:
43620e328617c1f41a2a54c8cee01723064e3ffa
getObjectTypeDescription and getObjectIdentity functions now have a new bool missing_ok parameter
These new macros give us the ability to use this new parameter for PG14 and they don't give the parameter for previous versions
Currently all missing_ok parameters are set to false to keep current behavior
Relevant PG commit:
2a10fdc4307a667883f7a3369cb93a721ade9680
The STATUS_WAITING define is removed and an enum with PROC_WAIT_STATUS_WAITING is added instead
This macro uses appropriate one
Relevant PG commit:
a513f1dfbf2c29a51b0f7cbd5913ce2d2ee452c5
AlterTableStmt's relkind field is changed into objtype
New AlterTableStmtObjType macro uses the appropriate one
Relevant PG commit:
cc35d8933a211d9965eb1c1d2749a903d5735db2
Allow ColumnarScans to push down join quals by generating
parameterized paths. This significantly expands the utility of chunk
group filtering, making a ColumnarScan behave similar to an index when
on the inner of a nested loop join.
Also, evaluate all parameters on beginscan/rescan, which also works
for external parameters.
Fixes#4488.
Previously, we were doing `first_row_number` reservation for the first
row written to current `WriteState` but were doing `stripe_id`
reservation when flushing the `WriteState` and were inserting the
related record to `columnar.stripe` at that time as well.
However, inserting `columnar.stripe` record at flush-time is
problematic. This is because, as told in #5160, if relation has
any index-based constraints and if there are two concurrent
writes that are inserting conflicting key values for that constraint,
then postgres relies on `tableAM->fetch_index_tuple`
(=`columnar_fetch_index_tuple`) callback to return `true` when
indexAM is checking against possible constraint violations.
However, pending writes of other backends are not visible to concurrent
sessions in columnar since we were not inserting the stripe metadata
record until flushing the stripe.
With this commit, we split stripe reservation into two phases:
i) Reserve `stripe_id` and insert a "dummy" record to `columnar.stripe`
at the very same time we reserve `first_row_number`, i.e. when writing
the first row to the current `WriteState`.
ii) At flush time, do the storage level allocation and complete the
missing fields of the dummy record inserted into `columnar.stripe`
during i).
That way, any concurrent writes would be able to check against possible
constraint violations by using `SnapshotDirty` when scanning
`columnar.stripe`.
Note that `columnar_fetch_index_tuple` still wouldn't be able to fill
the output tupleslot for the requested tid but it would at least return
`true` for such index look-up's and we believe this should be sufficient
for the caller indexAM callback to make the concurrent writer block on
prior one.
That is how we fix#5160.
Only downside of reserving `stripe_id` at the same time we reserve
`first_row_number` is that now any aborted writes would also waste
some amount of `stripe_id` as in the case of `first_row_number` but
we are just wasting them one-by-one.
Considering the fact that we waste `first_row_number` by the amount
stripe row limit (=150k by default) in such cases, this shouldn't be
important at all.
Before starting to scan a columnar table, we always flush the pending
writes to disk.
However, we increment command counter after modifying metadata tables.
On the other hand, now that we _don't always use_ xact snapshot to scan
a columnar table, writes that we just flushed might not be visible to
the query that just flushed pending writes to disk since curcid of
provided snapshot would become smaller than the command id being used
when modifying metadata tables.
To give an example, before this change, below was a possible scenario
due to the changes that we made to use the correct snapshot.
```sql
CREATE TABLE t(a int, b int) USING columnar;
BEGIN;
INSERT INTO t VALUES (5, 10);
SELECT * FROM t;
┌───┬───┐
│ a │ b │
├───┼───┤
└───┴───┘
(0 rows)
SELECT * FROM t;
┌───┬────┐
│ a │ b │
├───┼────┤
│ 5 │ 10 │
└───┴────┘
(1 row)
```
* Synchronize hasmetadata flag on mx workers
* Switch to sequential execution
* Add test
* Use SetWorkerColumn
* Add test for stop_sync
* Remove usage of UpdateHasmetadataOnWorkersWithMetadata
* Remove MarkNodeMetadataSynced
* Fix test for metadatasynced
* Remove MarkNodeMetadataSynced
* Style
* Remove MarkNodeHasMetadata
* Remove UpdateDistNodeBoolAttr
* Refactor SetWorkerColumn
* Use SetWorkerColumnLocalOnly when setting up dependencies
* Use SetWorkerColumnLocalOnly in TriggerSyncMetadataToPrimaryNodes
* Style
* Make update command generator functions static
* Set metadatasynced before syncing
* Call SetWorkerColumn only if the sync is successful
* Try to sync all nodes
* Fix indexno
* Update metadatasynced locally first
* Break if a node fails to sync metadata
* Send worker commands optional
* Style & Rebase
* Add raiseOnError param to SetWorkerColumn
* Style
* Set metadatasynced for all metadata nodes
* Style
* Introduce SetWorkerColumnOptional
* Polish
* Style
* Dont send set command to not synced metadata nodes
* Style
* Polish
* Add test for stop_sync
* Add test for shouldhaveshards
* Add test for isactive flag
* Sort by placementid in the function verify_metadata
* Cover edge cases for failing nodes
* Add comments
* Add nodeport to isactive test
* Add warning if metadata out of sync
* Update warning message
As we use the current user to sync the metadata to the nodes
with #5105 (and many other PRs), there is no reason that
prevents us to use the coordinated transaction for metadata syncing.
This commit also renames few functions to reflect their actual
implementation.
Before this commit, creating a partition after a DROP column
on the parent (position before dist. key) was leading to
partition to have the wrong distribution column.
update_distributed_table_colocation can be called by the relation
owner, and internally it updates pg_dist_partition. With this
commit, update_distributed_table_colocation uses an internal
UDF to access pg_dist_partition.
As a result, this operation can now be done by regular users
on MX.
* Fix UNION not being pushdown
Postgres optimizes column fields that are not needed in the output. We
were relying on these fields to understand if it is safe to push down a
union query.
This fix looks at the parse query, which has the original column fields
to detect if it is safe to push down a union query.
* Add more tests
* Simplify code and make it more robust
* Process varlevelsup > 0 in FindReferencedTableColumn
* Only look for outers vars in union path
* Add more comments
* Remove UNION ALL specific logic for pulling up childvars
These two options were not included when creating the sequences on the
workers as part of metadata syncing.
The missing `data_type` part of the definition made finding the cause
of #5126 harder than necessary, because of confusing errors.
Before this commit, we always synced the metadata with superuser.
However, that creates various edge cases such as visibility errors
or self distributed deadlocks or complicates user access checks.
Instead, with this commit, we use the current user to sync the metadata.
Note that, `start_metadata_sync_to_node` still requires super user
because accessing certain metadata (like pg_dist_node) always require
superuser (e.g., the current user should be a superuser).
However, metadata syncing operations regarding the distributed
tables can now be done with regular users, as long as the user
is the owner of the table. A table owner can still insert non-sense
metadata, however it'd only affect its own table. So, we cannot do
anything about that.
Ignore orphaned shards in more places
Only use active shard placements in RouterInsertTaskList
Use IncludingOrphanedPlacements in some more places
Fix comment
Add tests
* Alter seq type when we first use the seq in a dist table
* Don't allow type changes when seq is used in dist table
* ALTER SEQUENCE propagation
* Tests for ALTER SEQUENCE propagation
* Relocate AlterSequenceType and ensure dependencies for sequence
* Support for citus local tables, and other fixes
* Final formatting