Chapter

Primary Key Free Watermarking for Numerical Tabular Datasets in Machine Learning

Abstract

High-quality tabular datasets are often traded by their owners as valuable digital assets because they are scarce and useful for training machine learning models. A pivotal concern when trading such datasets is ownership, which is seriously threatened by piracy because illegal copies are easy to resell. This creates an urgent need for an effective watermarking method to demonstrate ownership of a dataset. Existing database watermarking methods rely on either a primary key or a virtual primary key to watermark a tabular dataset. These methods do not work well in the context of machine learning: a primary key can easily be modified without affecting the machine learning utility of a tabular dataset, and a virtual primary key is often not robust against watermark-removing attacks. How to watermark a tabular dataset without using a primary key or a virtual primary key is a challenging task that has not been systematically studied before. In this paper, we tackle this task with a novel primary-key-free method that embeds a sinusoidal signal as the watermark into a discrete-time signal constructed from the tabular dataset. We conduct an in-depth theoretical analysis of the exceptional robustness of our watermark against five challenging attacks, and further validate this robustness through comprehensive experiments on two real-world datasets.
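The abstract does not say how the discrete-time signal is built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' construction. It assumes, hypothetically, that the signal is one numeric column's values taken in rank order. Because sorting depends only on the values themselves, this signal needs neither a primary key nor a fixed row order. A sinusoid of frequency `freq` (cycles per sample) and amplitude `amp` is added to the rank-ordered values. A blind detector then re-sorts a suspect column and looks for a spectral peak at `freq`. The function names, the parameters `freq`, `amp`, and `ratio`, and the peak-versus-median detection rule are all illustrative choices. The sketch also assumes `amp` is below half the smallest gap between sorted values, so that embedding does not change the rank order.

```python
import numpy as np


def embed_watermark(values: np.ndarray, freq: float, amp: float) -> np.ndarray:
    """Hypothetical embedding: add amp*sin(2*pi*freq*n) to the value of rank n.

    Assumes amp < (smallest gap between sorted values) / 2, so ranks are preserved.
    """
    order = np.argsort(values, kind="stable")  # row index of the n-th smallest value
    n = np.arange(len(values))
    marked = values.astype(float).copy()
    marked[order] += amp * np.sin(2 * np.pi * freq * n)  # write back to original rows
    return marked


def spectral_magnitude(signal: np.ndarray, freq: float) -> float:
    """Magnitude of the single-frequency DFT of `signal` at `freq` (cycles/sample)."""
    n = np.arange(len(signal))
    return float(np.abs(np.sum(signal * np.exp(-2j * np.pi * freq * n))))


def detect_watermark(values: np.ndarray, freq: float, ratio: float = 5.0) -> bool:
    """Hypothetical blind detector: rebuild the rank-ordered signal, test for a peak at freq."""
    diffs = np.diff(np.sort(values))  # first differences remove the increasing trend
    diffs = diffs - diffs.mean()      # drop the DC component
    ref_freqs = np.linspace(0.05, 0.45, 81)
    ref_freqs = ref_freqs[np.abs(ref_freqs - freq) > 0.01]  # skip the watermark band
    baseline = np.median([spectral_magnitude(diffs, f) for f in ref_freqs])
    return spectral_magnitude(diffs, freq) > ratio * baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    column = rng.permutation(np.arange(1000) + rng.uniform(0.0, 0.5, 1000))
    marked = embed_watermark(column, freq=0.23, amp=0.2)
    shuffled = rng.permutation(marked)  # reorder rows, as if keys were reassigned
    print(detect_watermark(shuffled, freq=0.23), detect_watermark(column, freq=0.23))
```

Detection only re-sorts the values, so shuffling rows or rewriting a key column leaves the detector's input unchanged. This is the property the abstract attributes to a primary-key-free design. The sketch does not attempt to model the five attacks analyzed in the paper.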

Authors

Che X; Akbari M; Li S; Yue D; Zhang Y; Chu L

Book title

Pattern Recognition

Series

Lecture Notes in Computer Science

Volume

15331

Pagination

pp. 254-270

Publisher

Springer Nature

Publication Date

January 1, 2025

DOI

10.1007/978-3-031-78119-3_18
