Sunday, 14 August 2011

Porat and Porat

The paper by Porat and Porat gives a different yet very similar algorithm for pattern matching in a stream. This one however does have a small chance of false negatives; that is it may miss an actual match.

As is seemingly usual you take 1..logm lengths of the pattern and tries to match the longer length from the previous one (i from i-1). To do this, if there is a match, the algorithm jumps by period(i-1) characters until you have no 'overlap'; that is the period(i) is longer that the remaining length of the substring of the you are on.

The details of how this 'process' above stitches together involves creating and then aborting processes 'just in case' you need the result from that process. If you later find it useless (this will be before it will finish) you just kill it and start a new one.

In comparison with the previous streaming method, which extends the pattern using fingerprints, rather than here using fingerprints generally and extend the pattern by relying on periodicity and jumps and 'checkpoints'.

The paper goes onto 1 and k mismatch pattern matching which uses the pattern matching algorithm as a black box. Once a method is implemented for general pattern matching, this could be a natural extension to the project here.

Next stop is a paper which claims solves the pattern matching in a stream problem using real time and log space. This algorithm could be replaced as the blackbox above, or for the periodicity calculating algorithm from the previous paper.

No comments:

Post a Comment