Punctuated Data Streams - Schemes and Functions

Punctuation Format

We have three goals for our punctuation format. First, the punctuation itself should be small, similar in size to a tuple in the stream. Second, the punctuation should not affect the results of query operators that do not understand punctuations. Third, it should be easy for a query operator to determine which tuples a punctuation describes.

The Namespaces in XML recommendation gives us a way to address these goals. We can define a namespace that mirrors the structure of the existing tuples. Since the punctuations belong to a different namespace, query operators will not confuse them with actual tuples. For example, if we have the following tuple format:

<QUOTE> 
 <TICKER> CSCO </TICKER>
 <PRICE> 16.51 </PRICE>
 <DATE> 04/17/2001 </DATE>
 <TIME> 13:00 </TIME>
</QUOTE> 

then a punctuation for that tuple format will have the following general structure, where punct is the prefix of a new namespace that we introduce:

<punct:QUOTE> 
 <TICKER> </TICKER>
 <PRICE> </PRICE>
 <DATE> </DATE>
 <TIME> </TIME>
</punct:QUOTE> 
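
As a minimal sketch of how an operator might tell punctuations from tuples by namespace, the Python below parses a small stream fragment with the standard library. The namespace URI, the wrapping STREAM element, and the function names are assumptions made for illustration; the text above does not specify them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the format description does not name one.
PUNCT_NS = "http://example.org/punct"

# Hypothetical wrapper element so the fragment is a single well-formed document.
STREAM = f"""
<STREAM xmlns:punct="{PUNCT_NS}">
  <QUOTE><TICKER> CSCO </TICKER><PRICE> 16.51 </PRICE>
         <DATE> 04/17/2001 </DATE><TIME> 13:00 </TIME></QUOTE>
  <punct:QUOTE><TICKER> CSCO </TICKER><PRICE> * </PRICE>
               <DATE> * </DATE><TIME> * </TIME></punct:QUOTE>
</STREAM>
""".strip()


def is_punctuation(elem):
    # ElementTree expands a prefixed name to "{namespace-uri}local-name".
    return elem.tag.startswith("{" + PUNCT_NS + "}")


def split_stream(xml_text):
    """Separate ordinary tuples from punctuations in a stream fragment."""
    root = ET.fromstring(xml_text)
    tuples = [e for e in root if not is_punctuation(e)]
    puncts = [e for e in root if is_punctuation(e)]
    return tuples, puncts


tuples, puncts = split_stream(STREAM)
```

An operator that only looks for elements named QUOTE with no namespace never sees the punctuation, which is the behavior the second goal asks for.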

Punctuation Scheme

Each punctuation in our scheme has pattern elements that correspond to the elements of the tuples it describes. A pattern element can take the form of a wildcard, a constant, a range, or a list:
<!-- Constant: All quotes for CSCO have been read -->
<punct:QUOTE>
 <TICKER> CSCO </TICKER>
 <PRICE> * </PRICE>
 <DATE> * </DATE>
 <TIME> * </TIME>
</punct:QUOTE>
 Figure 1 Punctuation with a single constant value and wildcards
 
<!-- Range: All quotes from 10:00 to 12:00 on May 18th have been read. -->
<punct:QUOTE>
 <TICKER> * </TICKER>
 <PRICE> * </PRICE>
 <DATE> 05/18/2001 </DATE>
 <TIME> [10:00,12:00] </TIME>
</punct:QUOTE>
 Figure 2 Punctuation describing a range between 10:00 and 12:00 on May 18th
 
<!-- Range with no lower bound: All quotes before 11:00 have been read for May 18th. -->
<punct:QUOTE>
 <TICKER> * </TICKER>
 <PRICE> * </PRICE>
 <DATE> 05/18/2001 </DATE>
 <TIME> [,11:00) </TIME>
</punct:QUOTE>
 Figure 3 Punctuation describing a range from the minimum TIME value, ending before 11:00, for May 18th
 
<!-- List: all quotes for CSCO, MSFT, and LU have been read from the stream -->
<punct:QUOTE>
 <TICKER> {CSCO,MSFT,LU} </TICKER>
 <PRICE> * </PRICE>
 <DATE> * </DATE>
 <TIME> * </TIME>
</punct:QUOTE>
 Figure 4 Punctuation that lists specific ticker symbols that have been read
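
The four forms shown in Figures 1 through 4 can be read as predicates over a field value. The following is a rough sketch, not the project's implementation: parse_pattern and matches are hypothetical names, tuples and punctuations are assumed to be plain dictionaries from element name to text, and values are compared as strings, which is only adequate for zero-padded values such as HH:MM times.

```python
def parse_pattern(text):
    """Turn the text of one pattern element into a predicate on a value."""
    text = text.strip()
    if text == "*":                                   # wildcard
        return lambda value: True
    if len(text) >= 2 and text[0] == "{" and text[-1] == "}":   # list
        members = {m.strip() for m in text[1:-1].split(",")}
        return lambda value: value in members
    if len(text) >= 2 and text[0] in "[(" and text[-1] in "])":  # range
        lo, hi = (part.strip() for part in text[1:-1].split(","))
        lo_closed, hi_closed = text[0] == "[", text[-1] == "]"

        def in_range(value):
            # An empty bound means "unbounded" on that side.
            if lo and (value < lo or (value == lo and not lo_closed)):
                return False
            if hi and (value > hi or (value == hi and not hi_closed)):
                return False
            return True

        return in_range
    return lambda value: value == text                # constant


def matches(punctuation, tup):
    """True if every pattern element accepts the tuple's matching field."""
    return all(
        parse_pattern(pattern)(tup[field].strip())
        for field, pattern in punctuation.items()
    )


# The punctuation from Figure 3 and two tuples on either side of its bound.
before_eleven = {"TICKER": "*", "PRICE": "*",
                 "DATE": "05/18/2001", "TIME": "[,11:00)"}
early = {"TICKER": "CSCO", "PRICE": "16.51",
         "DATE": "05/18/2001", "TIME": "10:59"}
late = dict(early, TIME="11:00")
```

Under these assumptions, matches(before_eleven, early) holds and matches(before_eleven, late) does not, because the upper bound 11:00 is exclusive.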
We use these four forms in our implementation, but other forms are certainly possible: any predicate is a candidate for a pattern element. The main requirement is that the set of pattern elements be closed under intersection, so that CombinePunct can be implemented.
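
To illustrate closure under intersection, the sketch below intersects two pattern elements and always returns another pattern element of one of the same forms. It is an assumption-laden illustration and not the CombinePunct function itself, which this section does not define. A constant is modeled as a one-member list, an empty list stands for "matches nothing", and the names Range, intersect, and intersect_punct are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WILDCARD = "*"


@dataclass(frozen=True)
class Range:
    lo: Optional[str]          # None means unbounded below
    hi: Optional[str]          # None means unbounded above
    lo_closed: bool = True
    hi_closed: bool = True

    def contains(self, v):
        if self.lo is not None and (v < self.lo or (v == self.lo and not self.lo_closed)):
            return False
        if self.hi is not None and (v > self.hi or (v == self.hi and not self.hi_closed)):
            return False
        return True


def _intersect_ranges(a, b):
    # Lower bound: the larger one wins; on a tie, closed only if both are.
    lo, lo_closed = a.lo, a.lo_closed
    if b.lo is not None and (lo is None or b.lo > lo):
        lo, lo_closed = b.lo, b.lo_closed
    elif b.lo is not None and b.lo == lo:
        lo_closed = a.lo_closed and b.lo_closed
    # Upper bound: the smaller one wins; on a tie, closed only if both are.
    hi, hi_closed = a.hi, a.hi_closed
    if b.hi is not None and (hi is None or b.hi < hi):
        hi, hi_closed = b.hi, b.hi_closed
    elif b.hi is not None and b.hi == hi:
        hi_closed = a.hi_closed and b.hi_closed
    if lo is not None and hi is not None:
        if lo > hi or (lo == hi and not (lo_closed and hi_closed)):
            return frozenset()             # empty intersection
    return Range(lo, hi, lo_closed, hi_closed)


def intersect(p, q):
    """Intersect two pattern elements: WILDCARD, frozenset (list), or Range."""
    if p == WILDCARD:
        return q
    if q == WILDCARD:
        return p
    if isinstance(p, frozenset) and isinstance(q, frozenset):
        return p & q
    if isinstance(p, frozenset):
        return frozenset(v for v in p if q.contains(v))
    if isinstance(q, frozenset):
        return frozenset(v for v in q if p.contains(v))
    return _intersect_ranges(p, q)


def intersect_punct(a, b):
    """Element-wise intersection of two punctuations over the same fields."""
    return {field: intersect(a[field], b[field]) for field in a}
```

For example, intersecting the TIME ranges of Figures 2 and 3 with intersect(Range("10:00", "12:00"), Range(None, "11:00", True, False)) gives a range from 10:00 inclusive to 11:00 exclusive, which is again a range pattern.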

Last modified by Pete Tucker on 26 August 2005.