One Pass Algorithms

Last updated Nov 8, 2022 Edit Source

# One Pass Algorithms

Algorithms where the data is read only once from the disk.

# Duplicate Elimination

I/O Cost: B(R)
Space: $M-1 \ge B(distinct(R))$

# Natural Join

I/O Cost: B(R) + B(S)
Space: $M-1 \ge min(B(R),B(S))$

# Nested Loop Join

Can be considered “one and a half pass algorithm”: One argument is read only once while another is read repeatedly

# Simple Nested Loop Join

Sacrifice some I/O time in order to save on memory:

I/O Cost: $B(R) + B(S)\times B(R)$
Space: $M \ge2$

# Block based nested loop join

We can improve the I/O cost by utilising all the buffers available:

I/O Cost: $B(R) + B(S)\times (B(R)/(M-1))$
Space: $M \ge2$ If M is large -> devolves into the one pass algorithm If M is 2 -> devolves into the simples nested loop join algorithm

a. One pass algorithm means that at least one of the relations must be fully loaded into the input buffer. M must be at least 100 + 1 = 101 I/O cost: 100 + 150 = 250 b. S in the outer loop: same as in (a) R in the outer loop: $150+100\times150/100=300$ c. When the smaller relation is put in the outer loop and fits within the available M-1 buffers, the block based algorithm becomes a one-pass algorithm and they share the cost. We can load 999 blocks of R and 1 block of S at a time. $Cost = (20000/999)\times5000+20000\approx120101$

One Pass Algorithms 2022-10-23 14.55.37.excalidraw

Brendan Ang

One Pass Algorithms

# One Pass Algorithms

# Example using Select

# Duplicate Elimination

# Natural Join

# Nested Loop Join

# Simple Nested Loop Join

# Block based nested loop join

# Practice Problems

Backlinks

Interactive Graph