AI, Robotics and Education Association

Match any image with an imaginary soundscape using this AI-powered web app

By James Vincent  |  May 24, 2018


Sound can be marvelously evocative, whisking us away from offices and homes to sit beside babbling streams in shady forests or shiver on an exposed mountainside. A new project by Japanese researchers takes advantage of this imaginative potential and combines it with AI to magical effect. The resulting web app, “Imaginary Soundscape,” uses machine learning to match any picture you upload with a suitable audio pairing.

Upload a Japanese woodcut of fishing boats, for example, and the system offers waves and water sounds; load an abstract painting of nightingales, and you’re given a garden soundscape of wind chimes and birds. Often the results are exactly what you’d expect, but more interesting is when the system picks up on elements in the picture you might not immediately have thought of (like pairing Megatron with tractor sounds), or returns matches that make no sense at all (like a painting of hands matched with sounds from a live sports game). Some quick tips: you tend to get interesting results when you upload artwork, photographs of human activity, and abstract images. Uploading memes just confuses the machine.

All this is the product of a relatively simple AI mechanism. It uses object recognition to identify elements within the image, and then matches these to a dataset of more than 52,000 sound files. The researchers responsible (Yuma Kajihara, Shoya Dozono, and Nao Tokui of the University of Tokyo) have been working on this sort of “cross modal” model for years. An earlier version of Imaginary Soundscape launched in January to match audio with images from Google Street View, and Tokui has written a blog post with more of the background on that project. (He cites Brian Eno as an inspiration.)

It’s a fun little app and a fantastic example of the new types of creative expression AI enables. The system isn’t perfect: the object recognition makes errors, and the sound database is unavoidably incomplete. But that’s part of the charm. If anything, the serendipitous and unexpected soundscapes it produces are more interesting than the “correct” matches. Let us know what pairs you make in the comments below.
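
For readers curious how this kind of image-to-sound matching might look in code, here is a toy sketch in Python. It is not the researchers' implementation: the sound library, its tags, the `best_sound` function, and the label scores are all hypothetical, and it assumes an object recognizer has already turned the image into labels with confidence scores. Each sound is scored by summing the confidences of the image labels that appear among its tags, and the highest-scoring sound wins.

```python
from typing import Dict, List, Optional

# Hypothetical sound library: file name -> descriptive tags.
# The real app is reported to draw on more than 52,000 sound files.
SOUND_TAGS: Dict[str, List[str]] = {
    "waves.wav": ["sea", "water", "boat"],
    "garden.wav": ["bird", "wind chime", "garden"],
    "tractor.wav": ["engine", "machine", "vehicle"],
}


def best_sound(
    image_labels: Dict[str, float],
    sound_tags: Dict[str, List[str]] = SOUND_TAGS,
) -> Optional[str]:
    """Return the sound whose tags best overlap the recognized image labels.

    image_labels maps each label an object recognizer found in the image
    to a confidence score between 0 and 1. Returns None if nothing matches.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, tags in sound_tags.items():
        # Sum the confidence of every image label that this sound is tagged with.
        score = sum(image_labels.get(tag, 0.0) for tag in tags)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Labels a recognizer might plausibly return for a woodcut of fishing boats.
    print(best_sound({"boat": 0.9, "water": 0.7}))
```

In this sketch an image with no recognizable labels gets no sound at all, which is one simple way to picture why the article's memes "confuse the machine"; a real system would more likely return a low-confidence match instead.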

