Punctuated Data Streams - Propagation Invariants

Propagation invariants define if and when an operator can pass punctuation along as part of its result. The operator must make sure that any punctuation it passes will hold true for its output. There may be another operator waiting for its output that can also take advantage of punctuations, so it is desirable to propagate punctuations whenever possible. Propagation invariants for an operator can assume that tuples indicated by the pass invariant have already been output.

Formally, propagation invariants take the form Ps = Propagation(op, Ts1, Ps1, …, Tsn, Psn), where:

 op is the operator with any attributes
 Tsi is the set of tuples that have arrived from the i-th input
 Psi is the set of punctuation that has arrived from the i-th input
 Ps is the set of punctuations that can be output from the operator
Note that there can be many correct propagation invariants for a given operator as omitting or combining punctuation still yields a grammatical stream. The trivial propagation invariant for any operator just returns the empty set, but is not very useful. We explore two operators: Sort and Union.

Sort

There are at least two interesting propagation invariants for sort. One is to output punctuation that matches tuples returned according to the pass invariant for sort. In this case, the invariant is simply:
 Propagation(SortA, Ts, Ps) = Initial(SortA, Ps)
  Initial(SortA, Ps) = maximal contiguous subset Ps' of Ps |
    ¬$ t1, t2 where (SetMatch(t1, Ps') Ù ¬SetMatch(t2, Ps') Ù (t2 <A t1)) }
The second propagation invariant requires more overhead, but outputs the most descriptive punctuations. In addition to initial punctuation, the operator iterates through the set of punctuations that it has read from the input. For each punctuation, if there are no tuples that match it, then that punctuation can be output. Formally, this invariant is defined as:
 Propagation(SortA, Ts, Ps) = Initial(SortA, Ps) È { pÎPs | ¬$ tÎ(Ts – Pass(SortA, Ts, Ps)) where match(t, p) }
This second invariant can emit punctuation "out of order” relative to the tuple order. For example, if we are sorting authors by name, and we find there are no authors starting with 'G', we can emit punctuation for 'G' even though we may only have passed along tuples for authors up to 'D'.

Union

Like all operators that have more than one input, the union operator must combine punctuations from all inputs into one punctuation for the output. The propagation invariant for Union uses the CombinePunct function discussed earlier. Formally, the propagation invariant for binary union is defined as:
 Propagation(Union, Ts1, Ps1, Ts2, Ps2) = {p | p1ÎPs1, p2ÎPs2, p=CombinePunct(p1,p2) }
For example, consider a union on two inputs; each input containing attributes . If the first input informs the union operator that all tuples with attribute A in the range [5,15] have been read, it would be incorrect for the union operator to simply pass that punctuation through. It is possible for the second input to still have values for A in that same range.

Instead, the union operator must store the punctuation. If the next punctuation from the second input is for the range [10,20], then the operator can intersect the two ranges, and pass on punctuation for the range [10,15] for attribute A. If the next punctuation from the second input is for the range [20,25], then the union operator can intersect this range with the range from its first input. In this case, [5,15] Ç [20,25] is the empty set, so no punctuation is passed.


Last modified by Pete Tucker on 26 August 2005.