Given a database network where each vertex is associated with a transaction
database, we are interested in finding theme communities. Here, a theme
community is a cohesive subgraph such that a common pattern is frequent in all
transaction databases associated with the vertices in the subgraph. Finding all
theme communities from a database network enjoys many novel applications.
However, it is challenging since even counting the number of all theme
communities in a database network is #P-hard. Inspired by the observation that
a theme community shrinks when the length of the pattern increases, we
investigate several properties of theme communities and develop TCFI, a
scalable algorithm that uses these properties to effectively prune the patterns
that cannot form any theme community. We also design TC-Tree, a scalable
algorithm that decomposes and indexes theme communities efficiently. Retrieving
a ranked list of theme communities from a TC-Tree of hundreds of millions of
theme communities takes less than 1 second. Extensive experiments and a case
study demonstrate the effectiveness and scalability of TCFI and TC-Tree in
discovering and querying meaningful theme communities from large database
networks.