Distributed Data Management
# Distributed Data Management
# Distributed Transactions
Transaction Management for distributed systems. All shards should either commit/abort the same transaction.
# Atomic Commit
The de-facto protocol for atomic commit is two-phase commit (2PC)
Approach: use 1 process as the coordinator (leader). Given a proposed transaction T, commit if all followers agree to commit. Abort if at least one follower aborts/fails. Problem: if the process was to fail after the decision was made by the coordinator it will be unable to apply the changes locally in the shard.
- Build a more reliable system by building a replicated state machine within the shard. Replicated log will allow fault tolerance Problem: replicated cluster failure will cause loss of entire log
- Perform replication across different clusters. With a replica of each shard in each data centre Problem: coordinator was a single point of failure
- Replicate coordinator in the same way for fault tolerance. Second phase of 2PC is no longer needed as each shard can access the local log for decision (abort/commit) without additional broadcast.
# Distributed Snapshotting
Capturing the global state of a distributed system.
# Consistent Cuts
Properties:
- Termination: eventually every process records its state
- Validity: all recorded states correspond to a consistent cut
# Chandy Lamport Algorithm
Approach: disseminate a special marker to mark events during the cut.
- Channels and processes record state (add to snapshot) until the marker has been received. E.g. channel incoming to p2 continuously records messages until a marker is passed through it
- Snapshot is complete once everyone has seen the marker.
# Epoch-based Snapshotting
For continuous data stream processing, it is difficult to log individual task executions.
Approach: divide computations into epochs, such as stages, and treat them as 1 transaction. The Chandy Lamport algorithm is not enough, as it will capture a lot of in-flight messages. We want to capture just the states which would in itself reflect the effect of these messages. This is done by epoch alignment:
- Allow all messages to go through until an epoch change marker is introduced
- On receiving the marker, log the state
- When a process receiving the marker has multiple channels, prioritise inputs from channels which have not seen the marker until they all see the marker.
- Terminate once all processes seen the marker.