Commit Graph

2 Commits (e5a2bbb2bd86d32d24af301fd33941bddb4187de)

Author SHA1 Message Date
SaitTalhaNisanci f9c4431885 add the support to execute copy locally
A copy will be executed locally if
- Local execution is enabled and current transaction accessed a local placement
- Local execution is enabled and we are inside a transaction block.

So even if local execution is enabled but we are not in a transaction block, the copy will not be run locally.

This will not run locally:
```
COPY distributed_table FROM STDIN;
....
```

This will run locally:
```
SET citus.enable_local_execution to 'on';
BEGIN;
COPY distributed_table FROM STDIN;
COMMIT;
....
```
.
There are 3 ways to do a copy in postgres programmatically:
- from a file
- from a program
- from a callback function

I have chosen to implement it with a callback function, which means that we write the rows of copy from a callback function to the output buffer, which is used to insert tuples into the actual table.

For each shard id, we have a buffer that keeps the current rows to be written, we perform the actual copy operation either when:
- copy buffer for the given shard id reaches to a threshold, which is currently 512KB
- we reach to the end of the copy

The buffer size is debatable(512KB). At a given time, we might allocate (local placement * buffer size) memory at most.

The local copy uses the same copy format as remote copy, which means that we serialize the data in the same format as remote copy and send it locally.

There was also the option to use ExecSimpleRelationInsert to insert
slots one by one, which would avoid the extra
serialization/deserialization but doing some benchmarks it seems that
using buffers are significantly better in terms of the performance.

You can see this comment for more details: https://github.com/citusdata/citus/pull/3557#discussion_r389499054
2020-03-18 09:28:59 +03:00
SaitTalhaNisanci 321d0152c1
add a utility to get shard oid from relation oid and shard id (#3596) 2020-03-09 15:50:29 +03:00