Optimal Simplicity: Minimizing Description Length in Neural Networks