2203 Distributed Systems
# 2203 Distributed Systems
#moc
- Distributed Abstractions
- Failure Detectors
- Broadcast Abstractions
- Distributed Shared Memory
- Consensus
- Time Abstractions
- Distributed Data Management
# What are distributed systems
A set of nodes, connected by a network, which appear to its users as a single coherent system.
# Core problems
# Agreement
# Two generals problem
“Two generals need to coordinate an attack”
- Must agree on time to attack
- They’ll win only if they attack simultaneously
- Communicate through messengers
- Messengers may be killed on their way Generals are unable to come to an agreement within a specified time bound using unreliable communication channels.
# Consensus Problem
All nodes/processes propose a value
Some nodes (non correct nodes) might crash & stop responding
The algorithm must ensure a set of properties (specification):
- All correct nodes eventually decide
- Every node decides the same
- Only decide on proposed values This problem models the core issue in distributed databases known as atomic commits, where we choose to commit if every node agrees to commit and abort if at least one node aborts. It is a consensus with 2 values {commit, abort}.
# Broadcast Problem
Atomic Broadcast
- A node broadcasts a message
- If sender correct, all correct nodes deliver message
- All correct nodes deliver the same messages (consensus)
- Messages delivered in the same order
Atomic broadcast can be used to solve consensus in the following way:
- Decide on the first received proposal
- Since all messages are in the same order, all nodes will decide the same
Consensus can be solved by Atomic broadcast
Atomic broadcast is equivalent to Consensus
# Modelling Distributed Systems
# Timing assumptions
- Processes: bounds on time to make a computation step
- Network: bounds on time to transmit a message
- Clocks: lower and upper bounds on clock drift rate
# Failure assumptions
- Processes: what kind of failure?
- Network: can network drop messages, temporarily disconnect?
# Asynchronous System Model
- No bound on time to deliver a message
- No bound on time to compute
- Clocks are not synchronized
# Synchronous system
“My server always serves requests within 1 week”
- Known bound on time to deliver a message (latency)
- Known bound on time to compute
- Known lower and upper bounds in physical clock drift rate
Examples: - Embedded systems (shared clock)
- Multicore computers
# Partial Synchrony
“My server processes requests within one week when it is running, and it will eventually be running for at least a week, I just don’t know when that will be.”
- A system that is asynchronous but eventually exhibits some period of synchrony.
# Measuring Performance
# Message complexity
The number of messages required to terminate an operation of an abstraction
# Time complexity (Rounds)
One time unit in an Execution E is the longest message delay in E. We assume all communication steps takes one time unit. We also call this a round or step.
Time Complexity is Maximum time taken by any execution of the algorithm under the assumptions
- A process can execute any finite number of actions (events) in zero time
- The time between send(m)i,j and deliver(m)i,j is at most one time unit