Wednesday, 20 July 2011

Muthukrishnan

I read (excluding the chapter on New Directions) the book by Muthukrishnan. Having briefly looked at his website it turns out I read what he describes as "Prelim". In looking at the now published book this preliminary paper seems to form the core foundation of the book, and so I don't think much has been lost, bar one or two extra examples.

This paper introduces both intuition and some formal notation and foundations on which study of data streaming can be exploited.

Particular of note is the idea of the stream describing a signal; that is a one dimensional function. The best way I have found to think of this is as the signal being some untouchable array. The stream the describes either just a 'printf' (called Time Series Model) of the array in sequential order; else updates to this array (Cash Register/Turnstile). Either way, the stream describes this array without you actually considering the array as a whole.
A good example from the paper is of a subway system, each element in the array is a count of the number of people arrived-departed from that turnstile, and the data stream is someone arriving/leaving; updating the array. Another example is each item in the data stream is a new IP packet; each element of the array (signal) is the packet.

It also talks about the essential goal as computing functions on the signal. E.g. "the number of X seen so far".

Pretty good set of notes; although I feel it is time to start getting full on into the papers on pattern matching in a stream.

No comments:

Post a Comment