The development of fault-tolerant distributed systems that can tolerate
Byzantine behavior has traditionally been focused on consensus protocols, which
support fully-replicated designs. For the development of more sophisticated
high-performance Byzantine distributed systems, more specialized fault-tolerant
communication primitives are necessary, however.
In this paper, we identify an essential communication primitive and study it
in depth. In specifics, we formalize the cluster-sending problem, the problem
of sending a message from one Byzantine cluster to another Byzantine cluster in
a reliable manner. We not only formalize this fundamental problem, but also
establish lower bounds on the complexity of this problem under crash failures
and Byzantine failures. Furthermore, we develop practical cluster-sending
protocols that meet these lower bounds and, hence, have optimal complexity. As
such, our work provides a strong foundation for the further exploration of
novel designs that address challenges encountered in fault-tolerant distributed
systems.