Using positional sequence patterns to estimate the selectivity of SQL LIKE queries

Mehmet Aytimur, Ali Cakmak*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review


Sequence patterns are frequently employed in many expert system applications in a wide range of domains from bioinformatics to smart homes and stock market analysis. Regular sequence patterns fail to express whether two consecutive items in a pattern are occurring right after each other in all pattern occurrences in an item database or not. Such a differentiation may be important for many intelligent system applications, for instance, to better address business questions like “should two frequently-bought-together items be located right next to each other on a retail store shelf, or is it ok to place them at some distance as long as they are in the same aisle?”. In this paper, we propose a novel type of sequence pattern, called “positional sequence patterns”, and illustrate its application on a special expert system, i.e., the query planner/optimizer of a database management system. Positional sequence patterns allow to accommodate extra information regarding whether a frequent ordered item pair always occurs next to each other without any gap in between in all pattern occurrences. Since positional sequence patterns are not considered by the existing sequence pattern mining algorithms, we also propose an algorithm to mine them. Next, we integrate the positional sequence patterns into the selectivity estimation component of the query optimizer as an expert system application. More specifically, in the knowledgebase of the query optimizer, a histogram-like structure of positional sequence patterns are created and stored. Then, during query optimization time, these histograms are utilized to infer the selectivity of flexible text queries that are enabled by the SQL LIKE operator. In particular, the proposed selectivity estimation method employs redundant pattern elimination based on pattern information content during histogram construction, and a partitioning-based matching scheme. The experimental results on a real dataset from DBLP show that the proposed approach outperforms the state of the art by around 20% improvement in error rates.

Original languageEnglish
Article number113762
JournalExpert Systems with Applications
Publication statusPublished - 1 Mar 2021

Bibliographical note

Publisher Copyright:
© 2020 Elsevier Ltd


This research is in part funded by TUBITAK ( Scientific and Technological Research Council of Turkey ) under grant number 117E086 .

FundersFunder number
Türkiye Bilimsel ve Teknolojik Araştirma Kurumu117E086


    • Data management
    • Histograms
    • Information content
    • Selectivity estimation
    • Sequence pattern mining


    Dive into the research topics of 'Using positional sequence patterns to estimate the selectivity of SQL LIKE queries'. Together they form a unique fingerprint.

    Cite this