Non-standard stringology

Algorithms and complexity

Shan Muthukrishnan, K. Palem

Research output: Chapter in Book/Report/Conference proceedingConference contribution

31 Citations (Scopus)

Abstract

Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

Original languageEnglish (US)
Title of host publicationProceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994
PublisherAssociation for Computing Machinery
Pages770-779
Number of pages10
ISBN (Electronic)0897916638
DOIs
StatePublished - May 23 1994
Event26th Annual ACM Symposium on Theory of Computing, STOC 1994 - Montreal, Canada
Duration: May 23 1994May 25 1994

Publication series

NameProceedings of the Annual ACM Symposium on Theory of Computing
VolumePart F129502

Other

Other26th Annual ACM Symposium on Theory of Computing, STOC 1994
CountryCanada
CityMontreal
Period5/23/945/25/94

Fingerprint

String Matching
Convolution
Matching Problem
Thing
Clique
Strings
Fast Algorithm
Complement
Graph in graph theory
Edge Cover
Lower bound
Increasing Functions
Deterministic Algorithm
Randomized Algorithms
Pi
Natural number
Counting
Partition
Model
Integer

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Muthukrishnan, S., & Palem, K. (1994). Non-standard stringology: Algorithms and complexity. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994 (pp. 770-779). (Proceedings of the Annual ACM Symposium on Theory of Computing; Vol. Part F129502). Association for Computing Machinery. https://doi.org/10.1145/195058.195457
Muthukrishnan, Shan ; Palem, K. / Non-standard stringology : Algorithms and complexity. Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994. Association for Computing Machinery, 1994. pp. 770-779 (Proceedings of the Annual ACM Symposium on Theory of Computing).
@inproceedings{04419322a4874185a9ec11531a31b047,
title = "Non-standard stringology: Algorithms and complexity",
abstract = "Non-standard stringology concerns string matching problems, wherein a position in the {"}text{"} (of size n) matches one in the {"}pattern)) (of size m), based on very general relationships between the corresponding {"}symbols{"}. For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the {"}match graph{"} defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the {"}dominating{"} cliques in the match graph, and the {"}clique edge covers/ partitions{"} in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.",
author = "Shan Muthukrishnan and K. Palem",
year = "1994",
month = "5",
day = "23",
doi = "https://doi.org/10.1145/195058.195457",
language = "English (US)",
series = "Proceedings of the Annual ACM Symposium on Theory of Computing",
publisher = "Association for Computing Machinery",
pages = "770--779",
booktitle = "Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994",

}

Muthukrishnan, S & Palem, K 1994, Non-standard stringology: Algorithms and complexity. in Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994. Proceedings of the Annual ACM Symposium on Theory of Computing, vol. Part F129502, Association for Computing Machinery, pp. 770-779, 26th Annual ACM Symposium on Theory of Computing, STOC 1994, Montreal, Canada, 5/23/94. https://doi.org/10.1145/195058.195457

Non-standard stringology : Algorithms and complexity. / Muthukrishnan, Shan; Palem, K.

Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994. Association for Computing Machinery, 1994. p. 770-779 (Proceedings of the Annual ACM Symposium on Theory of Computing; Vol. Part F129502).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Non-standard stringology

T2 - Algorithms and complexity

AU - Muthukrishnan, Shan

AU - Palem, K.

PY - 1994/5/23

Y1 - 1994/5/23

N2 - Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

AB - Non-standard stringology concerns string matching problems, wherein a position in the "text" (of size n) matches one in the "pattern)) (of size m), based on very general relationships between the corresponding "symbols". For example, string matching with don't cares is a simple non-standard string matching prob.' lem, wherein text andjor pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ these wildcards match ever-y symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string mat thing problems, characterized in terms of algebraic convolutions. Non-standard Basic String Matching: l For three problems from this family - including string mat thing with don't cares and its generalizations - we prove a lower bound of Ω(f(Σ 1)) convolutions, where the (increasing) function depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the bestknown algorithms to this model. l In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithms for the latter problem derived by extending the scheme from [K089] uses 0(min{Pi;,√ M}) convolutions. We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: nonstandard counting string matching (eg., variant of classical string mat thing, that involves counting number of mismatches at each text position), and nonstandard threshold string matching (in which the kmismatches problem is a basic example). Interestingly, all of the above results are derived using the structure of the "match graph" defined by the mat thing relation of the given inst ante of the nonstandard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs - in particular, the "dominating" cliques in the match graph, and the "clique edge covers/ partitions" in its complement. We also provide improved deterministic and randomized algorithms, as well are those with better expected running times for some non-standard string matching problems.

UR - http://www.scopus.com/inward/record.url?scp=0028017002&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028017002&partnerID=8YFLogxK

U2 - https://doi.org/10.1145/195058.195457

DO - https://doi.org/10.1145/195058.195457

M3 - Conference contribution

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 770

EP - 779

BT - Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994

PB - Association for Computing Machinery

ER -

Muthukrishnan S, Palem K. Non-standard stringology: Algorithms and complexity. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC 1994. Association for Computing Machinery. 1994. p. 770-779. (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/195058.195457