### Abstract

Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

Original language | English (US) |
---|---|

Title of host publication | Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994 |

Publisher | Association for Computing Machinery |

Pages | 770-779 |

Number of pages | 10 |

ISBN (Electronic) | 0897916638 |

DOIs | |

State | Published - May 23 1994 |

Event | 26th Annual ACM Symposium on Theory of Computing, STOC 1994 - Montreal, Canada Duration: May 23 1994 → May 25 1994 |

### Publication series

Name | Proceedings of the Annual ACM Symposium on Theory of Computing |
---|---|

Volume | Part F129502 |

### Other

Other | 26th Annual ACM Symposium on Theory of Computing, STOC 1994 |
---|---|

Country | Canada |

City | Montreal |

Period | 5/23/94 → 5/25/94 |

### Fingerprint

### All Science Journal Classification (ASJC) codes

- Software

### Cite this

*Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994*(pp. 770-779). (Proceedings of the Annual ACM Symposium on Theory of Computing; Vol. Part F129502). Association for Computing Machinery. https://doi.org/10.1145/195058.195457

}

*Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994.*Proceedings of the Annual ACM Symposium on Theory of Computing, vol. Part F129502, Association for Computing Machinery, pp. 770-779, 26th Annual ACM Symposium on Theory of Computing, STOC 1994, Montreal, Canada, 5/23/94. https://doi.org/10.1145/195058.195457

**Non-standard stringology : Algorithms and complexity.** / Muthukrishnan, Shan; Palem, K.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Non-standard stringology

T2 - Algorithms and complexity

AU - Muthukrishnan, Shan

AU - Palem, K.

PY - 1994/5/23

Y1 - 1994/5/23

N2 - Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

AB - Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

UR - http://www.scopus.com/inward/record.url?scp=0028017002&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028017002&partnerID=8YFLogxK

U2 - https://doi.org/10.1145/195058.195457

DO - https://doi.org/10.1145/195058.195457

M3 - Conference contribution

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 770

EP - 779

BT - Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994

PB - Association for Computing Machinery

ER -