Audio textures
Supplementary audio
Example textures
Textures synthesized with a large weight on the autocorrelation loss and a relatively low weight on the diversity loss.
Texture | Original | Synthesized |
---|---|---|
Tapping 1-2-3 | [audio] | [audio] |
Wind chimes | [audio] | [audio] |
Person speaking English | [audio] | [audio] |
Frogs and insects | [audio] | [audio] |
Evolution of the audio during optimization
This sequence of audio clips shows how the wind chimes texture sounds at various points during the optimization. In this case, the optimization halted after 1768 steps rather than running for the full 2000 steps.
Steps | Audio |
---|---|
3 | [audio] |
10 | [audio] |
30 | [audio] |
100 | [audio] |
300 | [audio] |
1000 | [audio] |
1768 | [audio] |
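An optimization that saves audio at a few chosen steps and may halt before its step budget can be sketched as below. This is a toy illustration, not the procedure used for these examples: the stopping rule (change between steps falling below `tol`), the scalar parameter, and the names `optimize_with_snapshots` and `step_fn` are all assumptions. Only the 2000-step budget and the snapshot steps come from the table above.

```python
def optimize_with_snapshots(step_fn, x0, max_steps=2000,
                            snapshot_steps=(3, 10, 30, 100, 300, 1000),
                            tol=1e-8):
    """Run step_fn repeatedly, keeping snapshots at selected steps.

    Halts early when an update changes x by less than tol (an assumed
    convergence test). The final state is always stored as a snapshot.
    Returns (final_x, steps_taken, snapshots).
    """
    x = x0
    snapshots = {}
    step = 0
    for step in range(1, max_steps + 1):
        x_new = step_fn(x)
        converged = abs(x_new - x) < tol
        x = x_new
        if step in snapshot_steps or converged or step == max_steps:
            snapshots[step] = x
        if converged:
            break
    return x, step, snapshots


# Toy problem: gradient descent on (x - 3)^2 with learning rate 0.1.
final_x, steps_taken, snapshots = optimize_with_snapshots(
    lambda x: x - 0.1 * 2.0 * (x - 3.0), x0=0.0)
```

On this toy problem the loop stops well before the budget, and the last snapshot is keyed by the step at which it halted, in the same way that the final row of the table is labeled 1768 rather than 2000.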
Effect of the weight on the autocorrelation term
Rhythmic textures synthesized with different weights on the autocorrelation term in the loss.
Autocorrelation weight | Tapping 1-2 | Tapping 1-2-3 |
---|---|---|
0 | [audio] | [audio] |
1 | [audio] | [audio] |
1000 | [audio] | [audio] |
100000 | [audio] | [audio] |
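To make the role of this weight concrete, here is a minimal sketch of how an autocorrelation term could enter a weighted loss. It assumes a circular autocorrelation computed by FFT on a 1-D signal and a normalized squared-error comparison. The names `circular_autocorrelation`, `autocorrelation_loss`, and `total_loss` are hypothetical, and the loss behind these examples may be computed on network features and differ in detail.

```python
import numpy as np


def circular_autocorrelation(x):
    """Circular autocorrelation of a real 1-D signal (Wiener-Khinchin)."""
    spectrum = np.fft.rfft(x)
    power = spectrum * np.conj(spectrum)  # power spectrum, imaginary part is zero
    return np.fft.irfft(power, n=len(x))


def autocorrelation_loss(original, synthesized):
    """Squared error between autocorrelations, normalized by the original's."""
    a = circular_autocorrelation(original)
    b = circular_autocorrelation(synthesized)
    return float(np.sum((a - b) ** 2) / np.sum(a ** 2))


def total_loss(texture_loss, autocorr_loss, autocorr_weight):
    """Weighted sum: a weight of 0 removes the autocorrelation term."""
    return texture_loss + autocorr_weight * autocorr_loss


# A toy "tapping" signal: one impulse every 8 samples.
taps = np.zeros(64)
taps[::8] = 1.0
ac = circular_autocorrelation(taps)  # peaks at lags 0, 8, 16, ...
```

Because the autocorrelation of a periodic signal peaks at multiples of its period, a large weight on this kind of term penalizes synthesized audio that loses the rhythm, while a time-shifted copy of the same rhythm incurs essentially no penalty.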
Effect of the weight on the diversity term
Complex sounds synthesized with different weights on the diversity loss.
Diversity weight | Wind chimes | Person speaking French |
---|---|---|
1e-5 | [audio] | [audio] |
1e-3 | [audio] | [audio] |
Effect of the receptive field size
As the receptive field widens, the textures can reproduce longer-term structure.
Convolutional kernel size | Wind chimes | Brushing teeth |
---|---|---|
Original | [audio] | [audio] |
4 | [audio] | [audio] |
16 | [audio] | [audio] |
64 | [audio] | [audio] |
256 | [audio] | [audio] |
Effect of the number of filters
As the number of filters increases, the quality of the textures improves.
Number of filters | Wind chimes | Frogs and insects |
---|---|---|
2 | [audio] | [audio] |
8 | [audio] | [audio] |
32 | [audio] | [audio] |
128 | [audio] | [audio] |
512 | [audio] | [audio] |
Effect of stacking
Separate one-layer convolutional networks with different receptive field sizes work better than stacking several convolutional layers.
Architecture | Wind chimes | Frogs and insects |
---|---|---|
Original | [audio] | [audio] |
Stacked | [audio] | [audio] |
Separate | [audio] | [audio] |
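The "separate" arrangement can be sketched as follows. This is an illustration under assumptions, not the configuration used for these examples: it assumes random untrained filters, a ReLU nonlinearity, stride 1, and Gram-matrix statistics computed per branch, and every function name is hypothetical. For comparison it also includes the standard receptive-field arithmetic for stacked stride-1 layers without dilation.

```python
import numpy as np


def random_conv_features(signal, kernel_size, num_filters, rng):
    """One convolutional layer with random filters (assumed), then ReLU."""
    filters = rng.standard_normal((num_filters, kernel_size))
    # Every length-kernel_size window of the signal: (positions, kernel_size).
    windows = np.lib.stride_tricks.sliding_window_view(signal, kernel_size)
    return np.maximum(windows @ filters.T, 0.0)  # (positions, num_filters)


def gram_matrix(features):
    """Time-averaged filter co-activations: (num_filters, num_filters)."""
    return features.T @ features / features.shape[0]


def separate_branch_statistics(signal, kernel_sizes, num_filters, seed=0):
    """One independent single-layer branch per kernel size."""
    rng = np.random.default_rng(seed)
    return {k: gram_matrix(random_conv_features(signal, k, num_filters, rng))
            for k in kernel_sizes}


def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated conv layers."""
    return 1 + sum(k - 1 for k in kernel_sizes)


# Stand-in for an audio clip; the kernel sizes are illustrative.
signal = np.random.default_rng(1).standard_normal(1024)
stats = separate_branch_statistics(signal, kernel_sizes=(4, 16, 64, 256),
                                   num_filters=8)
```

In the separate design each branch sees the raw signal directly, so its receptive field equals its kernel size. In a stack, only the first layer sees the raw signal and the receptive field grows by one less than each additional kernel size.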