Context-based adaptive entropy coding is a key component of many data compression algorithms. In designing these coders one has to balance between the benefits of using a large number of conditioning classes, i.e., high-order context, and the penalties of context dilution. Context quantization is a technique to solve this problem. The basic idea is to merge instances of the context that have similar statistics. For binary sources polynomial-time algorithms exist to design context quantizer of minimum empirical conditional entropy with respect to a training set. But a daunting operational difficulty remains as how to describe the partition of the context space in which the conditional entropy coding is conducted. In this paper, we propose a technique to code the context description of the optimal context quantizer guided by the principle of minimum description length. The results show that our approach outperforms the JBIG2 standard.