knowledge-database (beta)

Current group: comp.compression.

Finding maximal repeating substring using suffix tree

Finding maximal repeating substring using suffix tree  
dayzman at hotmail.com
 Re: Finding maximal repeating substring using suffix tree  
Popai
 Re: Finding maximal repeating substring using suffix tree  
Alf P. Steinbach
From:dayzman at hotmail.com
Subject:Finding maximal repeating substring using suffix tree
Date:11 Jan 2005 23:31:12 -0800
Hi,

Does anybody know how I can find the maximal repeating substring using
suffix tree? What I mean by maximal repeating substring is:
Given a string S, a substring s that repeats k times in S is a MRS iff:

k>= 2 and |s| * k >= |s| / 2; and
|s| * k is the maximum among all substrings that satisfy the above
condition; and
k is the maximum among all substrings that satisfy the above two
conditions.

I know it is possible to find the longest repeating substring using
suffix tree, which is to find the deepest fork
(http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/), but LRS
only considers the length of the substring, rather than the occurence.
Please help.

Cheers,
Michael
From:Popai
Subject:Re: Finding maximal repeating substring using suffix tree
Date:12 Jan 2005 01:14:01 -0800
Try Chrochemore algorithm.

M. CROCHEMORE, An Optimal Algorithm for Computing the Repetitions in a
Word. Information Processing Letters 12 (1981), 244--250.
(I wasn't able to find the article over the internet but you can fins a
lot of other papers which present the basic idea of the algorithm).

The basic idea is the following:
Consider a string S = s1 s2 ... sn over an alphabet A = {a1,a2,...ak}.

Start with the sets C_a1 = all positions where a1 is in S
C_a2 = all positions where a2 is in S
.... and so on ...
C_ak = all positions where ak is in S

Next compute the sets
C_a1a1 = all position p from C_a1 where a1 is in S
at position p+1
C_a1a2 = all position p from C_a1 where a2 is in S
at position p+1
.... and so on with all combination of length 2, then
all combinations of lenght 3 and so on.

This way you find all factors of S, positions and how many times the
repeat.

At the first look the algorithm sims to be very slow because it takes
all the posibilites but in practice it is quite fast because after a
few steps many of the sets C_xx...xx are empty and an entire brench of
all posibilities tree is cutted. Moreover the sum of cardinality of all
sets C_xxx...xxx (with xxx...xxx of the same size) is not bigger than n
(the lenght of S).

I hope this helps,
Best regars,
Ionut Popa



P.S: Or try read the paper
Finding Maximal Repetitions in a Word in Linear Time (1999) -
Roman Kolpakov, Gregory Kucherov,
Proceedings of the40th IEEE Annual Symposium on Foundations of Computer
Science(http://citeseer.ist.psu.edu/cache/papers/cs/14999/http:zSzzSzwww.loria.frzSz~kucherovzSzPAPERSzSzKolpakovKucherovFOCS99.pdf/kolpakov99finding.pdf)





dayzman@hotmail.com wrote:
> Hi,
>
> Does anybody know how I can find the maximal repeating substring
using
> suffix tree? What I mean by maximal repeating substring is:
> Given a string S, a substring s that repeats k times in S is a MRS
iff:
>
> k>= 2 and |s| * k >= |s| / 2; and
> |s| * k is the maximum among all substrings that satisfy the above
> condition; and
> k is the maximum among all substrings that satisfy the above two
> conditions.
>
> I know it is possible to find the longest repeating substring using
> suffix tree, which is to find the deepest fork
> (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/), but
LRS
> only considers the length of the substring, rather than the
occurence.
> Please help.
>
> Cheers,
> Michael
From:Alf P. Steinbach
Subject:Re: Finding maximal repeating substring using suffix tree
Date:Wed, 12 Jan 2005 07:39:10 GMT
* dayzman@hotmail.com:
>
> k>= 2 and |s| * k >= |s| / 2; and

That isn't meaningful.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
   

Copyright © 2006 knowledge-database   -   All rights reserved