We compare our results with two published sets of results in the field of concatenative synthesis. Samples [1] and [2] come from a work comparing several flavors of musical mosaicing algorithms:
dp – Dynamic programming path search, supporting continuity but not mixtures.
mix – A mixture method resembling non-negative matching pursuit, done on a frame-by-frame basis. Supports mixtures but not continuity.
mp – Matching pursuit decomposition in the audio time domain, using a dictionary of analytic signal atoms extracted from the sources, as implemented by MPTK [Krstulovic and Gribonval, 2006].
near – Classic nearest-neighbor matching done on a frame-by-frame basis.
tracks – Hybrid tracks algorithm, a heuristic algorithm allowing for both mixtures and continuity (near and mix are both implemented by this method with certain features turned off).
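The difference between frame-by-frame selection (near) and a path search that rewards continuity (dp) can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a feature vector, uses squared Euclidean distance as the match cost, and charges a fixed `jump_cost` whenever the chosen source frames are not consecutive. These choices and all function names are hypothetical and are not taken from the implementations compared in [1] and [2].

```python
# Hypothetical sketch: unit selection over per-frame feature vectors.
# Features, distance and the continuity penalty are illustrative assumptions.
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(target: List[Vector], corpus: List[Vector]) -> List[int]:
    """'near'-style selection: each target frame independently picks its
    closest corpus frame, with no continuity between successive choices."""
    return [min(range(len(corpus)), key=lambda j: sq_dist(t, corpus[j]))
            for t in target]


def dp_path(target: List[Vector], corpus: List[Vector],
            jump_cost: float = 1.0) -> List[int]:
    """'dp'-style selection (Viterbi search): minimizes the total match cost
    plus a penalty whenever corpus frame k is not followed by frame k + 1,
    which favors continuous runs of source material."""
    n, m = len(target), len(corpus)
    cost = [[0.0] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = sq_dist(target[0], corpus[j])
    for i in range(1, n):
        for j in range(m):
            best_k, best_c = 0, float("inf")
            for k in range(m):
                # No penalty when j directly continues the previous choice k.
                trans = 0.0 if j == k + 1 else jump_cost
                c = cost[i - 1][k] + trans
                if c < best_c:
                    best_k, best_c = k, c
            cost[i][j] = best_c + sq_dist(target[i], corpus[j])
            back[i][j] = best_k
    # Trace back from the cheapest final state.
    j = min(range(m), key=lambda s: cost[n - 1][s])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


if __name__ == "__main__":
    corpus = [(0.0,), (1.0,), (2.0,), (1.1,)]
    target = [(0.0,), (1.1,), (2.0,)]
    print(nearest_neighbor(target, corpus))  # [0, 3, 2]: best match per frame
    print(dp_path(target, corpus))           # [0, 1, 2]: one continuous run
```

In the usage example, near picks the exact match for every frame and jumps around the corpus, while dp accepts a slightly worse match for the middle frame in exchange for playing corpus frames 0, 1, 2 in order. Neither sketch supports mixtures; in the list above that is the role of mix and tracks.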
These methods take two inputs: in addition to the target audio being converted, they also take source audio from which the target is reconstructed.
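To make the two-input setup concrete, here is a minimal sketch of frame-by-frame mosaicing on raw sample lists. It assumes non-overlapping frames of a fixed `size` and uses RMS energy as the only matching feature; the function names and the feature choice are hypothetical. Real mosaicing systems typically use richer spectral descriptors and windowed overlap-add rather than the plain concatenation shown here, which would produce audible discontinuities.

```python
# Hypothetical sketch of the two-input interface: a target signal to be
# converted and a source signal whose frames are reused to rebuild it.
import math
from typing import List


def frames(signal: List[float], size: int) -> List[List[float]]:
    """Split a signal into non-overlapping frames, dropping any remainder."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def rms(frame: List[float]) -> float:
    """Root-mean-square energy, used here as a one-dimensional feature."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def mosaic(target: List[float], source: List[float], size: int) -> List[float]:
    """Rebuild `target` from `source` by replacing each target frame with the
    source frame of closest RMS (frame-by-frame nearest neighbor)."""
    src = frames(source, size)
    src_feats = [rms(f) for f in src]
    out: List[float] = []
    for f in frames(target, size):
        feat = rms(f)
        best = min(range(len(src)), key=lambda j: abs(src_feats[j] - feat))
        out.extend(src[best])  # plain concatenation, no crossfade
    return out


if __name__ == "__main__":
    source = [0.0, 0.0, 0.5, -0.5, 1.0, -1.0]
    target = [0.9, -0.9, 0.1, 0.0]
    print(mosaic(target, source, size=2))  # [1.0, -1.0, 0.0, 0.0]
```

The output contains only samples from the source signal: the loud first target frame is replaced by the loudest source frame, and the quiet second frame by the silent one.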
For comparison, we converted the target audio to three domains using the network described in the paper. The following table shows our results as well as a sample from the target domain.
For [3], the source audio from which the target is reconstructed is a collection of four string quartets by Schoenberg.
Input - Mahler, Ritenuto
Concatenative Method Results
Result
Autoencoder-based Music Translation
For comparison, we converted the target audio using our string quartet decoder, trained on string quartets by Beethoven. We also provide the result of conversion to a wind quintet. The following table shows our results as well as a sample from the target domain.
Ours - String Quartet
Ours - Wind Quintet
[1] Samples taken from http://www.dtic.upf.edu/~gcoleman/dsounds/, sample A.
[2] Samples taken from http://www.dtic.upf.edu/~gcoleman/dsounds/, sample G.
[3] Samples taken from http://spectrum.mat.ucsb.edu/~b.sturm/CMJ2006/MATConcat.html, Mahler's crescendi.