|
|
 | | From: | dayzman at hotmail.com | | Subject: | Finding maximal repeating substring using suffix tree | | Date: | 11 Jan 2005 23:31:12 -0800 |
|
|
 | Hi,
Does anybody know how I can find the maximal repeating substring using suffix tree? What I mean by maximal repeating substring is: Given a string S, a substring s that repeats k times in S is a MRS iff:
k>= 2 and |s| * k >= |s| / 2; and |s| * k is the maximum among all substrings that satisfy the above condition; and k is the maximum among all substrings that satisfy the above two conditions.
I know it is possible to find the longest repeating substring using suffix tree, which is to find the deepest fork (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/), but LRS only considers the length of the substring, rather than the occurence. Please help.
Cheers, Michael
|
|
 | | From: | Popai | | Subject: | Re: Finding maximal repeating substring using suffix tree | | Date: | 12 Jan 2005 01:14:01 -0800 |
|
|
 | Try Chrochemore algorithm.
M. CROCHEMORE, An Optimal Algorithm for Computing the Repetitions in a Word. Information Processing Letters 12 (1981), 244--250. (I wasn't able to find the article over the internet but you can fins a lot of other papers which present the basic idea of the algorithm).
The basic idea is the following: Consider a string S = s1 s2 ... sn over an alphabet A = {a1,a2,...ak}.
Start with the sets C_a1 = all positions where a1 is in S C_a2 = all positions where a2 is in S .... and so on ... C_ak = all positions where ak is in S
Next compute the sets C_a1a1 = all position p from C_a1 where a1 is in S at position p+1 C_a1a2 = all position p from C_a1 where a2 is in S at position p+1 .... and so on with all combination of length 2, then all combinations of lenght 3 and so on.
This way you find all factors of S, positions and how many times the repeat.
At the first look the algorithm sims to be very slow because it takes all the posibilites but in practice it is quite fast because after a few steps many of the sets C_xx...xx are empty and an entire brench of all posibilities tree is cutted. Moreover the sum of cardinality of all sets C_xxx...xxx (with xxx...xxx of the same size) is not bigger than n (the lenght of S).
I hope this helps, Best regars, Ionut Popa
P.S: Or try read the paper Finding Maximal Repetitions in a Word in Linear Time (1999) - Roman Kolpakov, Gregory Kucherov, Proceedings of the40th IEEE Annual Symposium on Foundations of Computer Science(http://citeseer.ist.psu.edu/cache/papers/cs/14999/http:zSzzSzwww.loria.frzSz~kucherovzSzPAPERSzSzKolpakovKucherovFOCS99.pdf/kolpakov99finding.pdf)
dayzman@hotmail.com wrote: > Hi, > > Does anybody know how I can find the maximal repeating substring using > suffix tree? What I mean by maximal repeating substring is: > Given a string S, a substring s that repeats k times in S is a MRS iff: > > k>= 2 and |s| * k >= |s| / 2; and > |s| * k is the maximum among all substrings that satisfy the above > condition; and > k is the maximum among all substrings that satisfy the above two > conditions. > > I know it is possible to find the longest repeating substring using > suffix tree, which is to find the deepest fork > (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/), but LRS > only considers the length of the substring, rather than the occurence. > Please help. > > Cheers, > Michael
|
|
 | | From: | Alf P. Steinbach | | Subject: | Re: Finding maximal repeating substring using suffix tree | | Date: | Wed, 12 Jan 2005 07:39:10 GMT |
|
|
 | * dayzman@hotmail.com: > > k>= 2 and |s| * k >= |s| / 2; and
That isn't meaningful.
-- A: Because it messes up the order in which people normally read text. Q: Why is it such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail?
|
|
|