Conference

Polyglot and Distributed Software Repository Mining with Crossflow

Abstract

Mining software repositories at a large scale typically requires substantial computational and storage resources. This creates an increasing need for repository mining programs to be executed in a distributed manner, so that remote collaborators can contribute local computational and storage resources. In this paper, we present Crossflow, a novel framework for building polyglot distributed repository mining programs. We demonstrate how Crossflow can delegate mining jobs to remote workers and cache their results; how such workers can implement advanced behavior such as load balancing and rejecting jobs they either cannot perform or would execute sub-optimally; and how workers of the same analysis program can be written in different programming languages, such as Java and Python, with each worker executing only the relevant parts of the program written in its language.
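The abstract describes the delegation model only at a high level. The following Python sketch illustrates the general pattern it names: a master offers jobs to workers, workers reject jobs they cannot perform (wrong language) or would run poorly (at capacity), load is balanced across the workers that accept, and the master caches results. It is not Crossflow's API. The names `Job`, `Worker` and `Master`, the `capacity` rule, and the "fewest jobs so far" balancing policy are hypothetical assumptions chosen for illustration. In the paper's setting workers are remote and may be written in Java or Python; here each worker is a local object tagged with a language string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical mining job: analyse one repository with a task in one language."""
    repo: str
    language: str  # e.g. "java" or "python"


@dataclass
class Worker:
    """Hypothetical worker that runs only tasks written in its own language."""
    name: str
    language: str
    capacity: int                # maximum number of jobs this worker will take
    task: Callable[[Job], str]   # the part of the program this worker implements
    done: List[Job] = field(default_factory=list)

    def accepts(self, job: Job) -> bool:
        # Reject jobs the worker cannot perform (different language) or would
        # perform poorly (already at its self-declared capacity).
        return job.language == self.language and len(self.done) < self.capacity

    def execute(self, job: Job) -> str:
        self.done.append(job)
        return self.task(job)


@dataclass
class Master:
    workers: List[Worker]
    cache: Dict[Job, str] = field(default_factory=dict)

    def submit(self, job: Job) -> Optional[str]:
        # A cached result is returned without delegating the job again.
        if job in self.cache:
            return self.cache[job]
        candidates = [w for w in self.workers if w.accepts(job)]
        if not candidates:
            return None  # every worker rejected the job
        # Simple load balancing: pick the accepting worker with the fewest jobs.
        worker = min(candidates, key=lambda w: len(w.done))
        result = worker.execute(job)
        self.cache[job] = result
        return result


# Usage: two Java workers and one Python worker.
java1 = Worker("w1", "java", 2, lambda j: f"w1:{j.repo}")
java2 = Worker("w2", "java", 2, lambda j: f"w2:{j.repo}")
py = Worker("w3", "python", 1, lambda j: f"w3:{j.repo}")
master = Master([java1, java2, py])
print(master.submit(Job("repo-a", "java")))    # w1:repo-a
print(master.submit(Job("repo-b", "java")))    # w2:repo-b (w1 already has a job)
print(master.submit(Job("repo-a", "java")))    # w1:repo-a (served from the cache)
print(master.submit(Job("repo-c", "python")))  # w3:repo-c
print(master.submit(Job("repo-d", "python")))  # None (w3 is at capacity)
```

Keeping the rejection decision inside `Worker.accepts` mirrors the abstract's point that workers themselves decide which jobs to refuse, while the master only chooses among the workers that accept.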

Authors

Barmpis K; Neubauer P; Co J; Kolovos D; Matragkas N; Paige RF

Pagination

pp. 374-384

Publisher

Association for Computing Machinery (ACM)

Publication Date

June 29, 2020

DOI

10.1145/3379597.3387481

Name of conference

Proceedings of the 17th International Conference on Mining Software Repositories
