Samples for Semantic Blending Experiment in WAV Space

Introduction

(Updated on 15th November, 2018)

We selected two random 5-second segments, $A$ and $B$, from each domain. We then combine the segments, in their WAV form, as follows:
starting with 3.5 seconds from $A$, we combine the next 1.5 seconds of $A$ with the first 1.5 seconds of $B$ using a linear weighting with weights $1 - t/1.5$ and $t/1.5$ respectively, where $t \in [0,1.5]$; the remaining 3.5 seconds of $B$ follow.
We then use the decoder to generate audio. The results sound natural and the transition is seamless, as can be heard in the samples below.
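
The WAV-space blending step can be sketched as follows. This is a minimal illustration, not the code used for the experiment: the function name `blend_wav`, the plain-list representation of the waveform, and the toy sample rate are all assumptions made for the example, and the decoder step is not shown.

```python
def blend_wav(a, b, sample_rate, lead_s=3.5, fade_s=1.5):
    """Crossfade waveform `a` into waveform `b` with linear weights.

    `a` and `b` are sequences of samples (hypothetical representation);
    `lead_s` is the unblended part of A, `fade_s` the overlap length.
    """
    lead = int(round(lead_s * sample_rate))  # samples taken from A alone
    fade = int(round(fade_s * sample_rate))  # samples in the overlap
    if len(a) < lead + fade or len(b) < fade:
        raise ValueError("segments are too short for the requested blend")

    out = list(a[:lead])
    for n in range(fade):
        t = n / sample_rate          # time into the overlap, in seconds
        w_b = t / fade_s             # weight of B: t / 1.5
        w_a = 1.0 - w_b              # weight of A: 1 - t / 1.5
        out.append(w_a * a[lead + n] + w_b * b[n])
    out.extend(b[fade:])             # remainder of B, unblended
    return out


# Toy check with constant signals, so the weights are directly visible.
sr = 8                               # assumed toy sample rate, not from the source
a = [1.0] * (5 * sr)                 # 5-second segment A
b = [0.0] * (5 * sr)                 # 5-second segment B
mix = blend_wav(a, b, sr)
```

With two 5-second inputs and a 1.5-second overlap, the blended signal is 3.5 + 1.5 + 3.5 = 8.5 seconds long. In practice the same arithmetic would typically be applied to arrays of real audio samples.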

Music Blending

Sample No. 1 - Solo Cello

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 2 - Solo Piano

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 3 - Wind Quintet

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 4 - Accompanied Violin

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 5 - String Quartet

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Voice Blending

To emphasize the difference between WAV-domain and latent-domain blending, we added blending experiments on the voice conversion task.
We apply our method to three publicly available datasets: “Nancy” from Blizzard 2011 [1], Blizzard 2013 [2], and the LJ Speech dataset [3].

The blended voice samples show a clear difference between the two blending methods. Samples blended in WAV space exhibit a “cross-fading” effect, i.e., a dominant speaker and a quiet speaker are heard simultaneously.
In contrast, blending in latent space creates the effect of natural-sounding mumbling of a single speaker.

Sample No. 6 - Blizzard 2013

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 7 - Blizzard 2011 (“Nancy”)

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

Sample No. 8 - LJ

Inputs

A	B

Blending

	WAV Blending		Latent Blending
	A	B	A	B
A
B

[1] Simon King and Vasilis Karaiskos, “The Blizzard challenge 2011,” in Blizzard Challenge workshop, 2011.

[2] Simon King and Vasilis Karaiskos, “The Blizzard challenge 2013,” in Blizzard Challenge workshop, 2013.

[3] Keith Ito, “The LJ speech dataset,” 2017.