Fast kernel methods for SVM sequence classifiers

Pavel Kuksa, Vladimir Pavlovic

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Scopus citations

Abstract

In this work we study string kernel methods for sequence analysis and focus on the problem of species-level identification based on short DNA fragments known as barcodes. We introduce efficient sorting-based algorithms for exact string k-mer kernels and then describe a divide-and-conquer technique for kernels with mismatches. Our algorithms for mismatch kernel matrix computations improve currently known time bounds for these computations. We then consider the mismatch kernel problem with feature selection, and present efficient algorithms for it. Our experimental results show that, for string kernels with mismatches, kernel matrices can be computed 100-200 times faster than traditional approaches. Kernel vector evaluations on new sequences show similar computational improvements. On several DNA barcode datasets, k-mer string kernels considerably improve identification accuracy compared to prior results. String kernels with feature selection demonstrate competitive performance with substantially fewer computations.

Original languageEnglish (US)
Title of host publicationAlgorithms in Bioinformatics - 7th International Workshop, WABI 2007, Proceedings
Pages228-239
Number of pages12
StatePublished - Dec 24 2007
Event7th International Workshop on Algorithms in Bioinformatics, WABI 2007 - PhiIadelphia, PA, United States
Duration: Sep 8 2007Sep 9 2007

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume4645 LNBI

Other

Other7th International Workshop on Algorithms in Bioinformatics, WABI 2007
CountryUnited States
CityPhiIadelphia, PA
Period9/8/079/9/07

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Fast kernel methods for SVM sequence classifiers'. Together they form a unique fingerprint.

  • Cite this

    Kuksa, P., & Pavlovic, V. (2007). Fast kernel methods for SVM sequence classifiers. In Algorithms in Bioinformatics - 7th International Workshop, WABI 2007, Proceedings (pp. 228-239). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4645 LNBI).