What is an $\mathcal{S}$-VAE?

An $\mathcal{S}$-VAE is a variational auto-encoder with a hyperspherical latent space. In our paper we propose to use the von Mises-Fisher (vMF) distribution to achieve this, under which the uniform distribution on the hypersphere is recovered as a special case. Hence this approach allows for a truly uninformative prior, with a clear advantage for data with a hyperspherical latent representation.
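
Concretely, the vMF density over a unit vector $z \in \mathcal{S}^{m-1}$ is parameterized by a mean direction $\mu \in \mathcal{S}^{m-1}$ and a concentration $\kappa \geq 0$:

$$
q(z \mid \mu, \kappa) = \mathcal{C}_m(\kappa)\, e^{\kappa \mu^{\top} z},
\qquad
\mathcal{C}_m(\kappa) = \frac{\kappa^{m/2-1}}{(2\pi)^{m/2}\, I_{m/2-1}(\kappa)},
$$

where $I_v$ denotes the modified Bessel function of the first kind at order $v$. Setting $\kappa = 0$ makes the density constant in $z$, which is exactly the uniform distribution on the hypersphere.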

Why use an $\mathcal{S}$-VAE instead of an $\mathcal{N}$-VAE?

VAEs using the Gaussian variational prior/posterior ($\mathcal{N}$-VAE) are the default choice. But although the Gaussian is mathematically convenient, it exhibits problematic properties in both low and high dimensions. In low dimensions ‘origin gravity’ takes place, where points are drawn towards the origin. In high dimensions the so-called ‘soap bubble effect’ is observed, where the vast majority of the probability mass is concentrated in a thin hyperspherical shell far from the origin.
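
The soap bubble effect is easy to check numerically. The snippet below (an illustration of the effect, not code from the paper) samples a standard Gaussian in increasing dimension $d$ and shows the sample norms concentrating around $\sqrt{d}$, away from the mode at the origin:

```python
import numpy as np

# For z ~ N(0, I_d) the norm ||z|| concentrates around sqrt(d) as d grows,
# so almost no probability mass lies near the origin, despite it being the mode.
rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    z = rng.standard_normal((10_000, d))
    norms = np.linalg.norm(z, axis=1)
    print(f"d={d:4d}  mean ||z||={norms.mean():7.2f}  "
          f"std={norms.std():5.2f}  sqrt(d)={np.sqrt(d):7.2f}")
```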

Additionally, on a more fundamental level there is the problem of manifold mapping: if the data to be modeled lies on a latent manifold that is not homeomorphic to $\mathbb{R}^M$, forcing it into a Euclidean latent space loses information. We show an example of this phenomenon next.

Example: Recovering a Hyperspherical Latent Structure

Data: We first generate samples from a mixture of three vMFs on the circle, $\mathcal{S}^1$, which are subsequently mapped into the higher-dimensional $\mathbb{R}^{100}$ by applying a noisy, non-linear transformation.
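
A minimal sketch of such a data-generating process is given below; the component means, the concentration $\kappa = 20$, the random tanh lift, and the noise scale are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Angles drawn from a mixture of three von Mises components (the vMF on S^1).
n, kappa = 1_000, 20.0
means = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
theta = rng.vonmises(means[rng.integers(0, 3, size=n)], kappa)
z = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # samples on S^1

# Noisy, non-linear map into R^100: a random tanh layer plus Gaussian noise.
W = rng.standard_normal((2, 100))
x = np.tanh(z @ W) + 0.05 * rng.standard_normal((n, 100))
```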

Models: We then train, in turn, an auto-encoder, an $\mathcal{N}$-VAE, and an $\mathcal{S}$-VAE. To further investigate the behavior of the $\mathcal{N}$-VAE, we also train a model with a scaled-down KL divergence.
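
The "scaled-down KL divergence" amounts to weighting the KL term of the negative ELBO by a factor $\beta < 1$. Below is a minimal sketch of such an objective for the $\mathcal{N}$-VAE, assuming a squared-error reconstruction term and a diagonal Gaussian posterior (the function name and $\beta$ interface are ours, not the paper's):

```python
import torch

def gaussian_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO for a diagonal Gaussian posterior N(mu, diag(exp(log_var))).

    beta=1.0 is the standard N-VAE objective; beta < 1 scales down the KL term.
    """
    # Squared-error reconstruction term, summed over data dimensions.
    recon = ((x - x_recon) ** 2).sum(dim=-1)
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    kl = 0.5 * (log_var.exp() + mu ** 2 - 1.0 - log_var).sum(dim=-1)
    return (recon + beta * kl).mean()
```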

From left to right: the learned latent space representations of (a) the auto-encoder, (b) the $\mathcal{N}$-VAE, (c) the $\mathcal{N}$-VAE with a scaled-down KL divergence, and (d) the $\mathcal{S}$-VAE.

As expected, the auto-encoder mostly recovers the original circular latent space, since it imposes no distributional restrictions. For the $\mathcal{N}$-VAE we clearly observe points collapsing around the origin due to the KL term, an effect that is much less pronounced when the KL contribution is scaled down. Lastly, the $\mathcal{S}$-VAE almost perfectly recovers the original circular latent space. The observed behavior confirms our intuition.

Citation

Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. (2018). Hyperspherical Variational Auto-Encoders. 34th Conference on Uncertainty in Artificial Intelligence (UAI-18).

BibTeX format:

@inproceedings{s-vae18,
  title={Hyperspherical Variational Auto-Encoders},
  author={Davidson, Tim R. and
          Falorsi, Luca and
          De Cao, Nicola and
          Kipf, Thomas and
          Tomczak, Jakub M.},
  booktitle={34th Conference on Uncertainty in Artificial Intelligence (UAI-18)},
  year={2018}
}