Conference

A Content-Based Indexing Scheme for Large-Scale Unstructured Data

Abstract

The sheer volume of multimedia content generated by today's Internet services is stored in the cloud. The traditional indexing method, which associates user-generated metadata with the content, is vulnerable to inaccuracy caused by low-quality metadata. Content-based indexing, by contrast, does not depend on error-prone metadata. However, state-of-the-art research focuses on developing descriptive features and misses the system-oriented considerations that arise when incorporating these features into practical cloud computing systems. We propose an update-efficient and parallel-friendly content-based multimedia indexing system called Partitioned Hash Forest (PHF). The PHF system incorporates state-of-the-art content-based indexing models and multiple system-oriented optimizations. PHF contains an approximate content-based index and leverages the hierarchical memory system to support a high volume of updates. Additionally, content-aware data partitioning and a lock-free concurrency management module enable parallel processing of concurrent user requests. We evaluate PHF in terms of indexing accuracy and system efficiency by comparing it with the state-of-the-art content-based indexing algorithm and its variants. PHF achieves significantly better accuracy with less resource consumption, around 37% faster update processing, and up to 2.5X throughput speedup on a multicore platform compared to other parallel-friendly designs.

Authors

Zhu N; Lu Y; He W; Yu H

Pagination

pp. 205-212

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

April 1, 2017

DOI

10.1109/bigmm.2017.51

Name of conference

2017 IEEE Third International Conference on Multimedia Big Data (BigMM)