Consensus

Last updated Feb 16, 2023 Edit Source

# Consensus

Processes propose values and they all have to agree on one of these values.

Single Value Consensus
- Validity: decided values are those which are proposed
- Agreement: no 2 correct processes decide differently
- Termination: every correct process eventually decides
- Integrity: a process decides at most once
Single Value Uniform Consensus
- Uniform Agreement: no 2 processes decide different values Consensus is not solvable in the asynchronous system model if any node is allowed to fail
Unable to detect failure
Cannot wait for the correct majority of processes
Termination not satisfied: cannot decide

# Paxos Algorithm

An Eventual Leader Election (weakest leader elector we can use) can be used to eventually elect 1 single proposer (providing termination).

Proposers: attempt to impose their proposal to acceptors
Acceptors: may accept values issued by proposers (cannot talk to each other)
Learners: decide depending on what is accepted Contention problem: several processes might initially be proposers

# Abortable Consensus

Algorithm aborts if there is contention of multiple proposers.

# Version 1 (Centralised)

Proposer sends value to a central acceptor. Acceptor decides on the first value which it gets. Problem 1: if this acceptor fails, we will never know of the decision

Proposers talk to a set of acceptors, use a majority quorum to choose a value and enable fault tolerance. Problem 2: acceptor accepts the first proposal, if messages arrive out of order, possible to have no majority

# Version 3 (Enable restarts)

If no majority value, we need to restart until there is one. Since proposers can propose again, we need a way to differentiate between them.

Use a ballot number: sequence number in the form $i, n+i, 2n+i$ for a process $i$ and $n$ processes Problem 3: restarts lead to different majority accepted values across time, learners cannot make a single decision

# Version 4 (Prepare and Accept)

We need a way to ensure that every higher number proposal results in the same chosen value

Satisfied by ensuring acceptors only accept this value
Satisfied by ensuring proposers only propose this value
Proposers need to learn this value from the highest sequence number of those accepted. Proposers need to ensure that this “highest value” does not change. Proposers query acceptors so that if a value is accepted, every higher proposal issued has the same value previously accepted

Proposer $prepare(n)$:
- Gets a promise from acceptors not to accept a proposal with lower ballot number n
- Acceptor also responds with the value corresponding to the highest ballot number proposal
Proposer $accept(n,v)$:
- Pick the value from the maximum proposal number returned. If none of the processes return a value, proposer can pick freely.
- Acceptor $accept(n,v)$ if not accepted any $prepare(m)$ such that $m>n$; else $reject$
Proposer $decide(v)$ if majority acks; else $abort$

# Optimisations

Reject prepare(n) if accepted prepare(m); m > n: Reject a lower prepare
Reject accept(n,v) if answered accept(m,u); m > n: Reject a lower accept
Reject prepare(n) if answered accept(m,u); m > n : Reject a lower accept
Ignore messages if majority obtained

# Multi Paxos

The motivation: replicated state machines need to agree on a sequence of commands to execute.

Approach: organise the algorithm into rounds. In each round, each server starts a new instance of Paxos. They propose (2 RTT), accept (2 RTT) and decide on 1 command, add that to the log and restart.

Initial states

$ProCmds = \emptyset$: stores the list of commands proposed
Log = <>: a log of decided commands A process which wants to execute a command C triggers $rb-broadcast<C, Pid>$. On delivery, the command pair is added to ProCmds unless it is already in Log. Problem: the same command across multiple processes might be decided in different slots in time.

# Sequence Consensus

Rather than agreeing on a single command and storing that in a Log, we can directly try to agree on the sequence of commands.

Validity: if process p decides on a value, the value is a sequence of commands
Uniform Agreement: if process p decides u and another decides v, then one is a prefix of the other
Integrity: process can later decide another value, but the previous value is a strict prefix of the newly decided value
Termination After adopting a value with highest proposal number, the proposer is allowed to extend the sequence with the new command. Problem: in the prepare phase, processes send a lot of redundant state as the full log is transferred between the proposer and acceptor leading to high IO. No pipelining as well since each round must begin with the prepare phase.

# Log Synchronisation

Modify the prepare phase and shared states such that we can work on a single synchronised log $v_a$. To do this, let 1 process act as the sole leader (proposer) until it is aborted by an election of higher ballot number

# Prepare Phase

The leader sends Prepare:

current round: $n$
accepted round: $n_a$
log length: $|v_a|$
decided index $l_d$, where the decided sequence is $prefix(v_a,l_d)$ The followers reply with Promise:
their accepted round
the log entries which the leader is missing and the leader appends those to the log. AcceptSync is used to synchronise the new log. Promised followers and leader now have the same common log prefix

# Accept Phase

The leader sends Accept command with highest $n$ to all promised followers The followers reply with Accepted When majority accepted, Decide is sent. Any late Promise is replied with an AcceptSync

# Partial Connectivity (enabling quorum connectedness)

Chained scenario: When one server loses connectivity to the leader, it will try to elect itself as a leader. Livelock situation as servers compete to become the leader. Can be solved if A becomes the leader but can’t because it is already connected to a leader. Quorum Loss: When the leader loses quorum connectivity, deadlock situation as a majority cannot be obtained to make progress. B, D, E cannot elect a leader without a quorum. A is quorum connected but cannot elect a new leader since it is still connected to the alive leader C. Constrained Election: Leader is fully disconnected. A can become the new leader but will not be elected as it does not have the most updated log (log length).

# Failure recovery

Recover state from persistent storage
Send a PrepareReq to all peers
- If elected as leader, synchronise through a Prepare phase
- Prepare phase from another leader will synchronise

# Reconfiguration

Supporting a way to add/replace any process part of the replicated state machine. A configuration $c_i$ is defined by a set of process ids ${p1, p2, p3}$ and the new configuration can be any new set of processes e.g. ${p1,p2,p4}$

# Stop Sign

To safely stop the current configuration, we must prevent new decisions in the old configuration (“split-brain” problem) using a stop sign: The stop sign contains information about the new configuration to help processes reconfigure:

the new set of processes in $c_{i+1}$
the new configuration id number
the identifiers for each replica in the new configuration Each process on viewing the stop sign, can safely shut down and restart in the new configuration A new process not previously part of $c_i$ must perform log migration to catch up with the new instance. Log migration can be done with snapshots of the latest state.

Brendan Ang