An Overview of String Processing Applications to Data Analytics

Abstract

Data analytics may conveniently be divided into four stages: preparation, preprocessing, analysis, and post-processing. Especially in the second and third of these, where the data is cleaned, filtered and analyzed, string processing algorithms are fundamental. Applicable string methodology especially includes pattern matching (dozens of competing algorithms) and algorithms that compute repetitions and other forms of regularity. These are supported by powerful data structures (suffix array, prefix table, Burrows-Wheeler Transform, Lyndon array, and many others), developed and refined over the last 50 years. In this paper we provide an overview of three central methodological areas: pattern matching; repetitions (of both adjacent and non-adjacent repeating substrings); and string covering and compression. Each of these methodologies deals with both exact and approximate matches in the data provided. We outline several current applications to data analytics, in particular bioinformatics, information security and image analysis, all of them therefore positioned for future extension as string methodologies continue their rapid development.
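As an illustration of one of the data structures named in the abstract, the following is a minimal sketch of a prefix table and its use for exact pattern matching. It is not taken from the paper. The function names `prefix_table` and `find_occurrences`, the separator character, and the choice of Python are assumptions made for this example only.

```python
def prefix_table(s):
    """Return pi, where pi[i] is the length of the longest substring
    starting at position i that is also a prefix of s (pi[0] = len(s))."""
    n = len(s)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    # [left, right) is the prefix-matching window with the largest right end seen so far.
    left = right = 0
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the window.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by direct character comparison.
        while i + pi[i] < n and s[pi[i]] == s[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def find_occurrences(pattern, text, sep="\x00"):
    """Return the start positions of all exact occurrences of pattern in text.

    Assumption: sep occurs in neither pattern nor text.
    """
    if not pattern:
        return []
    m = len(pattern)
    pi = prefix_table(pattern + sep + text)
    # A value of at least m at position i means the pattern starts there in text.
    return [i - m - 1 for i in range(m + 1, len(pi)) if pi[i] >= m]
```

For example, `find_occurrences("aba", "abababa")` reports the overlapping occurrences at positions 0, 2 and 4. The table is built in time linear in the length of the input, because the window end `right` never moves backwards.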

Authors

Koponen H; Mhaskar N; Smyth WF

Volume

00

Pagination

pp. 1-8

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

May 19, 2021

DOI

10.1109/rdaaps48126.2021.9452004

Name of conference

2021 Reconciling Data Analytics, Automation, Privacy, and Security: A Big Data Challenge (RDAAPS)