In the following samples, we convert a simple MIDI clip to the Beethoven domain and to the Bach domain.
We perform an ablation study on the dimensionality of the latent encoding. As can be heard, a model with a latent dimensionality of 64 tends to reconstruct the input (unwanted memorization). A model with a latent dimensionality of 8 performs well. A model with a latent dimensionality of 4 is more creative and less closely related to the input MIDI, but it also suffers from reduced quality.
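The memorization effect at large latent sizes can be illustrated with a toy linear bottleneck. The sketch below is not the model used for these samples: it assumes a PCA-style linear encoder and decoder, and the `frames` array is a hypothetical stand-in for input features. It only shows that a wider bottleneck leaves less reconstruction error, down to essentially zero once the latent size matches the input dimensionality.

```python
import numpy as np


def reconstruction_error(data, latent_dim):
    """Mean squared error after a linear bottleneck of size `latent_dim`."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:latent_dim]
    codes = centered @ basis.T  # encode: (n_samples, latent_dim)
    recon = codes @ basis       # decode back to the input space
    return float(np.mean((centered - recon) ** 2))


rng = np.random.default_rng(0)
# Hypothetical data: 256 frames with 64 features each (an assumption).
frames = rng.normal(size=(256, 64))

# Same latent sizes as in the ablation: 4, 8 and 64.
errors = {k: reconstruction_error(frames, k) for k in (4, 8, 64)}
for k, err in errors.items():
    print(f"latent size {k:>2}: reconstruction MSE = {err:.4f}")
```

In this toy setting the error shrinks as the latent size grows, and a latent size of 64 reproduces the 64-dimensional input almost exactly, which is the linear analogue of the memorization heard in the samples.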
Audio Source |
---|
Latent Size | Beethoven | Bach |
---|---|---|
4 | ||
8 | ||
64 | ||