A novel architecture and set of learning rules for cortical self-organization is proposed. The model is based on the idea that multiple information channels can modulate one another's plasticity. Features learned from bottom-up information sources can thus be influenced by those learned from contextual pathways, and vice versa. A maximum likelihood cost function allows this scheme to be implemented in a biologically feasible, hierarchical neural circuit. In simulations of the model, we first demonstrate the utility of temporal context in modulating plasticity. The model learns a representation that categorizes people's faces according to identity, independent of viewpoint, by taking advantage of the temporal continuity in image sequences. In a second set of simulations, we add plasticity to the contextual stream and explore variations in the architecture. In this case, the model learns a two-tiered representation, starting with a coarse view-based clustering and proceeding to a finer clustering of more specific stimulus features. This model provides a tenable account of how people may perform 3D object recognition in a hierarchical, bottom-up fashion.