To improve generalisation in supervised learning, it is common to encourage invariance in the solution, i.e. keeping the output relatively constant under irrelevant transformations of the input. Many techniques can be seen as introducing invariance, such as data augmentation, convolutional structure, or more general group structure. But how do we learn which invariances are appropriate for a given dataset? In this talk, we will discuss why the usual training loss is not the right objective function for learning invariances. We instead use the marginal likelihood, as suggested by Bayesian inference, and develop a procedure that learns a useful invariance through gradient-based optimisation. Our model learns to be invariant to perturbations that are commonly hand-crafted in data augmentation, and learns very different perturbations depending on the dataset. We finish by speculating on how procedures like these can help automate the creation of network architectures.
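The idea can be illustrated with a minimal sketch: in a Bayesian linear model, the marginal likelihood is available in closed form, so we can score how well an invariance assumption suits the data. Everything below (the sign-flip group, the polynomial basis, the interpolation parameter `alpha`, and the grid search standing in for the gradient-based optimisation mentioned above) is a hypothetical toy construction, not the method from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data whose target is invariant to sign flips of the input:
# y depends only on |x| (here y = x**2 + noise).
N = 60
x = rng.uniform(-3, 3, size=N)
y = x**2 + 0.5 * rng.normal(size=N)

def features(x):
    # Simple polynomial basis (an arbitrary choice for illustration).
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

def averaged_features(x, alpha):
    # alpha in [0, 1] interpolates between the raw basis (alpha=0) and
    # the basis averaged over the sign-flip group {x, -x} (alpha=1);
    # the averaged basis is exactly invariant to the flip.
    raw = features(x)
    avg = 0.5 * (features(x) + features(-x))
    return (1 - alpha) * raw + alpha * avg

def log_marginal_likelihood(alpha, sf2=1.0, sn2=0.25):
    # Bayesian linear model with Gaussian prior on the weights:
    # y ~ N(0, sf2 * Phi Phi^T + sn2 * I), evaluated in closed form.
    Phi = averaged_features(x, alpha)
    K = sf2 * Phi @ Phi.T + sn2 * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return -0.5 * (quad + logdet + N * np.log(2 * np.pi))

# The talk's procedure optimises such parameters by gradient ascent on
# the marginal likelihood; a coarse grid search suffices for this toy.
alphas = np.linspace(0, 1, 11)
lmls = [log_marginal_likelihood(a) for a in alphas]
best_alpha = alphas[int(np.argmax(lmls))]
print(f"best alpha = {best_alpha:.1f}")
```

Because the marginal likelihood trades data fit against model complexity, averaging out the useless odd-order features tends to be rewarded on this sign-symmetric dataset, which is the Occam's-razor effect the training loss alone cannot provide.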