Journal article

User-generated short-text classification using cograph editing-based network clustering with an application in invoice categorization

Abstract

Rapid adaptation of online business platforms in every sector creates an enormous amount of user-generated textual data related to providing product or service descriptions, reviewing, marketing, invoicing and bookkeeping. These data are often short, noisy (e.g., misspellings, abbreviations), and lack accurate classifying labels (line-item categories). Classifying these user-generated short-text data with appropriate line-item categories is crucial for the corresponding platforms to understand users’ needs. This paper proposed a framework for user-generated short-text classification based on identified line-item categories. In the line-item identification phase, we used cograph editing (CoE)-based clustering on a keyword network, which can be formulated from user-generated short texts. We also proposed integer linear programming (ILP) formulations for CoE on weighted networks and designed a heuristic algorithm to identify clusters in large-scale networks. Finally, we outlined an application of this framework to categorize invoices in an empirical setting. Our framework showed promising results in identifying invoice line-item categories for large-scale data.

Authors

Wahid DF; Hassini E

Journal

Data & Knowledge Engineering, Vol. 148

Publisher

Elsevier

Publication Date

November 1, 2023

DOI

10.1016/j.datak.2023.102238

ISSN

0169-023X
