As most current query processing architectures are already pipelined, it seems logical to try to extend them from stored files to data streams. However, two classes of query operators are impractical for processing long or infinite data streams. Unbounded stateful operators (such as join) maintain state with no upper bound, and therefore may run out of memory. Blocking operators (such as sort) read the entire input before emitting an output, and therefore might never produce a result. We believe that a priori knowledge of a data stream can permit the use of such operators. This knowledge can be expressed in the form of punctuations.
Punctuations in a stream mark the end of substreams, allowing us to view an infinite stream as a combination of finite streams. We introduce three kinds of invariants to specify the proper behavior of query operators in the presence of punctuation. Pass invariants unblock blocking operators by defining when a such an operator can pass results on. Keep Invariants decrease the size of state by defining when a stateful operator can discard unneeded state. Propagation invariants define when and how an operator can pass punctuation on, so later operators can take advantage of them.
The semantics of a punctuated stream are that there are no tuples in the stream that match the punctuation. This is so the stream processor can be certain it can output results based on a punctuation. Note in the stream above that there are no data items following a punctuation that are of the same color.
|Functions||- Functions commonly used when processing punctuations.|
|Punctuation Format||- The format we use to designate punctuations in XML.|
|Pass Invariants||- define when a blocking operator can correctly pass a subset of its result before all data has been received.|
|Propagation Invariants||- define when an oerator can correctly pass punctuation.|
|Keep Invariants||- define what data items an unbounded stateful operator must keep in state.|
|Real World Example||using a warehouse containing temperature sensitive merchandise|
|Resources||used for this research so far.|
|My Ph.D. Thesis (PDF)||on Punctuated Streams|
|Operators||punctuated stream processing model|
|Testing||testing harness for the model, including code to simulate the real world example|
Last modified by Pete Tucker on 26 August 2005.