Query Processing

Last updated Nov 8, 2022 Edit Source

# Query Processing

Given a query, we need to devise an algorithm to obtain the desired result

Select B, D From R, S Where R.A = “c” AND S.E = 2 AND R.C=S.C Some possible solutions:

Cartesian product -> Select tuple -> Project
Select tuple -> Natural join ->Project
Index to select R tuples -> Use S.C index to select found R.C values -> Remove S.E != 2 -> Join matching

Notations:

R: a relation/table
T(R): number of tuples in R
B(R): number of blocks needed to hold tuples of R
V(R, a): number of distinct values of attribute a in R Assumptions for analysis:

Main memory is organized as input buffers (for holding data for computations) and output buffers (for storing results) each being the size of 1 block. Let M represent the number of input buffers available.
Relation R is clustered if its tuples are packed into as few blocks as possible. If T(R) = 1,000 and each block can hold 1000 tuples, B(R) = 100
- We assume R is clustered. If not clustered, it may take T(R) disk I/Os rather than B(R) to read all the tuples
Ignore the final I/O cost for writing result in output buffer back to disk. Irrelevant to our calculations.