Semi-Supervised and Incremental Sequence Analysis for Taxonomic Classification

Adriana Fasino, Emrecan Ozdogan, Bahrad A. Sokhansanj, Gail Rosen, Robi Polikar

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


    Metagenomic analysis is vital in determining what organisms are present in a microbial sample and why they are present. In this study, we explore the utility of MMseqs2, a bioinformatics pipeline, for taxonomic classification in metagenomics, focusing on 16S rRNA gene sequences. We evaluate the algorithm's performance on the full dataset as well as in batch-by-batch incremental processing and, more importantly, we add semi-supervised classification to this otherwise clustering-only algorithm. Incremental updating is important because it allows new data to be integrated and processed seamlessly, whereas semi-supervised classification allows taxonomic identification of previously unknown organisms. We also evaluate the different clustering modes offered by MMseqs2 and compare MMseqs2 to our previously developed semi-supervised incremental algorithm, SSI-VSEARCH. We show that MMseqs2's built-in clusterupdate function works well and that our semi-supervised classification capability adds new functionality to this bioinformatics processing pipeline.
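
    The abstract does not say how labels are attached to clusters. One common way to layer semi-supervised classification on a clustering-only tool is majority-vote label propagation within each cluster, and the sketch below illustrates that idea only; it is not presented as the authors' method. The function name `propagate_labels`, the (representative, member) pair layout, and the sequence IDs and genus label are all hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(cluster_pairs, known_labels):
    """Give each unlabeled sequence the majority label of its cluster.

    cluster_pairs: iterable of (representative_id, member_id) tuples
                   (an assumed layout for cluster membership).
    known_labels:  dict mapping sequence ID -> taxonomic label for the
                   labeled subset of sequences.
    Returns a dict of predicted labels for unlabeled members only.
    """
    # Group members under their cluster representative.
    clusters = defaultdict(list)
    for representative, member in cluster_pairs:
        clusters[representative].append(member)

    predicted = {}
    for members in clusters.values():
        # Count labels among the labeled members of this cluster.
        votes = Counter(known_labels[m] for m in members if m in known_labels)
        if not votes:
            continue  # no labeled member: the cluster stays unclassified
        majority_label, _ = votes.most_common(1)[0]
        for m in members:
            if m not in known_labels:
                predicted[m] = majority_label
    return predicted


# Hypothetical toy data: two clusters, only the first contains labeled reads.
pairs = [("s1", "s1"), ("s1", "s2"), ("s1", "s3"), ("s4", "s4"), ("s4", "s5")]
labels = {"s1": "Bacteroides", "s2": "Bacteroides"}
result = propagate_labels(pairs, labels)
```

    In this toy run, the unlabeled read in the first cluster inherits the cluster's label, while the second cluster has no labeled member and is left unclassified, which is where a novel-organism decision would have to be made by some other rule.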

    Original language: American English
    Title of host publication: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Number of pages: 7
    ISBN (Electronic): 9781665430654
    State: Published - 2023
    Event: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023 - Mexico City, Mexico
    Duration: Dec 5 2023 → Dec 8 2023

    Publication series

    Name: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023


    Conference: 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
    City: Mexico City

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Human-Computer Interaction
    • Decision Sciences (miscellaneous)
    • Safety, Risk, Reliability and Quality
