
Ts' = PurgeN(op, Ts1, Ps1, …, Tsn, Psn), where:
N is the position of the operator's input that we are purging state for
| |
|   | op is the operator along with any arguments |
|   | Tsi is the set of tuples that have arrived from the i-th input |
|   | Psi is the set of punctuation that has arrived from the i-th input |
Ts' is the set of tuples that corresponds to state maintained for that input that can be discarded |
Certain query operators already purge some state. The purge invariants we discuss are in addition to purging that normally occurs. For operators that do not accumulate state, the purge invariant is trivial, returning the empty set. We explore two operators: Sort and Symmetric Hash Join.
Purge(SortA, Ts, Ps) = Pass(SortA, Ts, Ps)
Symmetric hash join (Wilschut et al.) is an enhancement to the traditional hash join that outputs results as it reads from both inputs. It maintains a hash table for each input. Tuples from each input are cached in the corresponding hash table, and then matched with tuples in the other hash table. For each match, the joined tuple is passed on. This algorithm has the desirable property that it returns results while it is reading data. However, it accumulates a lot of state in its hash tables.
We can enhance this operator using punctuations to discard some of its state. The operator can use punctuation from one input to match tuples in the other hash table to find tuples that will not be joined anymore, which can be discarded.
Formally, the purge invariants for the symmetric hash join are:
Purge1(JoinA, Ts1, Ps1, Ts2, Ps2) =
{t1ÎTs1 | "t2,
joinable(A,t1,t2) Þ SetMatch(t2,Ps2) } | |
Purge2(JoinA, Ts1, Ps1, Ts2, Ps2) =
{t2ÎTs2 | "t1,
joinable(A,t1,t2) Þ SetMatch(t1,Ps1) } |
A, and considers
tuples from its respective input. Focusing on Purge1(), a tuple t1 can be discarded
from the first input if any other tuple t2 that could be joined with t1 on A
matches punctuation from the second input.
Last modified by Pete Tucker on 26 August 2005.