In the following samples, we convert a simple MIDI clip to the Beethoven and Bach domains.
We perform an ablation study on the dimensionality of the latent encoding. As can be heard, a model with a latent dimensionality of 64 tends to reconstruct the input (unwanted memorization). A model with a latent dimensionality of 8 performs well. A model with a latent dimensionality of 4 is more creative and less closely related to the input MIDI, but its output quality is also reduced.
| Audio Source |
|---|

| Latent Size | Beethoven | Bach |
|---|---|---|
| 4 | | |
| 8 | | |
| 64 | | |