2203 Distributed Systems

Last updated Jan 21, 2023 Edit Source

# 2203 Distributed Systems

#moc

Distributed Abstractions
Failure Detectors
Broadcast Abstractions
Distributed Shared Memory
Consensus
Time Abstractions
Distributed Data Management

# What are distributed systems

A set of nodes, connected by a network, which appear to its users as a single coherent system.

# Core problems

# Agreement

# Two generals problem

“Two generals need to coordinate an attack”

Must agree on time to attack
They’ll win only if they attack simultaneously
Communicate through messengers
Messengers may be killed on their way Generals are unable to come to an agreement within a specified time bound using unreliable communication channels.

# Consensus Problem

All nodes/processes propose a value
Some nodes (non correct nodes) might crash & stop responding
The algorithm must ensure a set of properties (specification):

All correct nodes eventually decide
Every node decides the same
Only decide on proposed values This problem models the core issue in distributed databases known as atomic commits, where we choose to commit if every node agrees to commit and abort if at least one node aborts. It is a consensus with 2 values {commit, abort}.

# Broadcast Problem

Atomic Broadcast

A node broadcasts a message
If sender correct, all correct nodes deliver message
All correct nodes deliver the same messages (consensus)
Messages delivered in the same order

Atomic broadcast can be used to solve consensus in the following way:
Decide on the first received proposal
Since all messages are in the same order, all nodes will decide the same
Consensus can be solved by Atomic broadcast
Atomic broadcast is equivalent to Consensus

# Modelling Distributed Systems

# Timing assumptions

Processes: bounds on time to make a computation step
Network: bounds on time to transmit a message
Clocks: lower and upper bounds on clock drift rate

# Failure assumptions

Processes: what kind of failure?
Network: can network drop messages, temporarily disconnect?

# Asynchronous System Model

No bound on time to deliver a message
No bound on time to compute
Clocks are not synchronized

# Synchronous system

“My server always serves requests within 1 week”

Known bound on time to deliver a message (latency)
Known bound on time to compute
Known lower and upper bounds in physical clock drift rate
Examples:
Embedded systems (shared clock)
Multicore computers

# Partial Synchrony

“My server processes requests within one week when it is running, and it will eventually be running for at least a week, I just don’t know when that will be.”

A system that is asynchronous but eventually exhibits some period of synchrony.

# Measuring Performance

# Message complexity

The number of messages required to terminate an operation of an abstraction

# Time complexity (Rounds)

One time unit in an Execution E is the longest message delay in E. We assume all communication steps takes one time unit. We also call this a round or step.

Time Complexity is Maximum time taken by any execution of the algorithm under the assumptions

A process can execute any finite number of actions (events) in zero time
The time between send(m)i,j and deliver(m)i,j is at most one time unit

Brendan Ang