Synthesizing Scenario-based Dataset for User Behavior Pattern Mining (Conference Paper)

  • Overview


  • User behavior pattern mining has drawn great attention in business and security areas. Realistic and accurate datasets are required for evaluating user behavior pattern mining approaches, their implementations, and their optimization results. Synthetic datasets are crucial because of restricted access to production datasets, security and privacy issues, the need to meet specific consumer requirements, and the high cost of real datasets. This paper presents a synthetic dataset generator that assists data scientists and analysts in designing scenario-driven datasets with embedded user behavior patterns and in visually analyzing the quality of the generated datasets. We developed an interactive data exploration environment to support this design-generate-visualize-analyze-optimize process. An abstract representation of real-world user behavior patterns is proposed, which allows data analysts to create datasets with both intended and random patterns injected. Dataset generation is controlled by both data statistics (e.g., data size and attribute distribution) and scenario-based user behavior patterns (e.g., association patterns, sequential patterns, and time constraints). A prototype toolkit has been developed to synthesize and analyze datasets in different application domains. Keywords: behavior pattern; synthetic dataset generation; data mining; visualization; sequential pattern mining; clustering
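
To make the idea of generation controlled by data statistics and scenario-based patterns more concrete, the following is a minimal sketch and not the paper's generator. It assumes a simple event-log layout of (user, timestamp, event) rows; the function names `generate_dataset` and `has_subsequence`, the parameters, and the event names are all hypothetical. Background events are drawn from a weighted attribute distribution, and a sequential pattern with a maximum time gap is injected for a chosen fraction of users.

```python
import random


def generate_dataset(n_users, events_per_user, event_weights, pattern,
                     pattern_fraction, max_gap, seed=0):
    """Return a list of (user_id, timestamp, event) rows.

    All names here are illustrative assumptions, not the paper's API.
    event_weights: dict of event name -> relative frequency (attribute distribution).
    pattern: ordered list of events injected as a sequential pattern.
    pattern_fraction: share of users who receive the injected pattern.
    max_gap: maximum seconds between consecutive pattern events (time constraint).
    """
    rng = random.Random(seed)
    names = list(event_weights)
    weights = [event_weights[name] for name in names]
    n_injected = round(n_users * pattern_fraction)
    rows = []
    for user_id in range(n_users):
        # Random background events at random times within one day.
        events = [(rng.uniform(0, 86_400), rng.choices(names, weights)[0])
                  for _ in range(events_per_user)]
        if user_id < n_injected:
            # Inject the intended pattern with strictly increasing timestamps,
            # each step at most max_gap seconds after the previous one.
            t = rng.uniform(0, 86_400)
            for event in pattern:
                events.append((t, event))
                t += rng.uniform(1, max_gap)
        for t, event in sorted(events):
            rows.append((user_id, t, event))
    return rows


def has_subsequence(events, pattern):
    """Check whether pattern occurs in events as an ordered subsequence."""
    it = iter(events)
    return all(any(e == p for e in it) for p in pattern)


if __name__ == "__main__":
    rows = generate_dataset(
        n_users=10, events_per_user=5,
        event_weights={"login": 3, "browse": 5, "logout": 2},
        pattern=["login", "add_to_cart", "purchase"],
        pattern_fraction=0.5, max_gap=60)
    user0 = [event for uid, _, event in rows if uid == 0]
    print(has_subsequence(user0, ["login", "add_to_cart", "purchase"]))
```

Fixing the seed keeps a generated dataset reproducible, which makes it easier to compare mining approaches on the same data. The subsequence check ignores timestamps; verifying the time constraint would require comparing the gaps between matched events as well.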

publication date

  • 2015