In this edition of Ask a Developer, we got a chance to chat with Dr. Peter Meijer. Dr. Meijer is bringing science fiction to life with his app The vOICe, which allows blind people to get a picture of the world by converting camera images into sound.
Where did the idea for The vOICe application originate?
It all started back in 1983 while I was studying physics at Delft University
of Technology. I wanted to learn more about digital technology by designing
a digital device. So I sat down brainstorming for a few days, trying to think
of something novel, challenging and potentially useful to mankind, and then
this idea of converting images into sound was born: my hope was that blind
people might – over time and after extensive practice – learn to mentally
perform the reverse mapping, and “see” the images that were encoded in sound.
A rising line would sound like a rising tone, and when hearing a rising tone
they “ought” to visualize and “see” a rising line. Two lines would sound as
two simultaneous tones, and so on up to arbitrary image bitmaps. Now PCs were
way too slow back then to do it all in real-time for arbitrary images, so I
designed and built a 5-stage pipelined special purpose computer to do the job.
Much, much later, PCs (late 90s) and even smartphones (Nokia, around 2003, and
Android phones, as of 2008) came up to speed, and I could do the real-time
image to sound conversion in software running on mass market devices. This
then enabled me to skip the manufacturing hurdle altogether and make The vOICe
globally available at very low cost through distribution of software over the
Internet. The vOICe for Android, for instance, has been installed over 170,000
times from Google Play, in 189 countries.
What made you decide to implement the program on the Android platform?
I felt that the open approach with Google making Android available to any
phone manufacturer would make Android repeat the history of Microsoft Windows
on the PC, but now for mobile devices. More closed mini-computer platforms
were overrun by the tidal wave of many vendors competing on the PC platform,
with consumers buying the best among OS-compatible devices. Despite Apple
having a head start in mobile, my bet was – and is – that it will not last
unless they give up their tight control.
However, in the early days of Android, in 2008, doing real-time processing
of camera images under Android was problematic: the header-less image format
was undocumented, and I had to reverse engineer the frame encoding while I
did not even have a physical Android device at the time. Some fellow Android
developers with a shared interest in computer-vision apps handed me
a few camera bitmaps, and that is what I used to reverse engineer the format.
Also, initially it was impossible under Android to play synthesized sound
directly from memory: I had to synthesize soundscapes in memory, write them to
the SD card, and play them as files from there. These issues were finally solved in later
Android versions, and the mediocre speed of the Dalvik interpreter got resolved
through a combination of JIT (just-in-time) compilation and NDK (Native
Development Kit). These were the growing pains of Android, you might say.
What are your future plans for the project? What kind of new features are
you going to implement?
I absolutely want to run The vOICe for Android on/with Google Glasses and
other augmented reality glasses once these become available. Having a camera
inside glasses makes the experience of seeing-with-sound much more immersive,
and hands-free. However, Google does not yet provide any specifications, let
alone the APIs needed to actively develop toward this goal. So I can only wait
and hope that Google does not omit key features that are essential for running
my app on/with Google Glasses (I write on/with here because it is also unknown
whether apps will run mostly on the glasses or on a companion smartphone).
For the time being, blind users who are serious about wearing The vOICe run
the Windows version of it on a netbook PC inside a backpack, in combination
with $30 USB camera glasses that they buy on eBay.
Can you explain the science behind the translation of visual input to an
audio output? How are people meant to utilize these sounds?
For pixels in grayscale images you have 3 dimensions: horizontal position,
vertical position, and brightness. The positioning needs to be especially accurate.
In hearing sound, the two most accurate dimensions are frequency and time,
so I use those to encode position, and further map brightness to loudness.
A camera image is scanned once per second from left to right, while
associating vertical position with frequency (pitch), and brightness with
loudness. The time dimension is perceptually further supported by stereo
panning. The end result approximates the inverse of a spectrogram: The vOICe
synthesizes sounds in such a way that a spectrogram returns a recognizable image
of the original camera view. The challenge for the blind user is to learn
to do this reconstruction mentally. The resulting perceptual resolution is
low, but still better than that offered by current $100,000 retinal implants,
as was recently published in PLoS ONE (‘Visual’ Acuity of the Congenitally Blind
Using Visual-to-Auditory Sensory Substitution). Moreover, a study with
The vOICe at Harvard Medical School has shown that the visual cortex of
experienced blind users of The vOICe gets recruited and functionally involved
in processing soundscapes from The vOICe. This was featured on television in
2008 in The Nature of Things (video available online). An NSF-funded SBIR Phase I
study in 2010 by MetaModal LLC in Pasadena, California, explored various
training exercises with a small number of blind participants; videos of the study are available online.
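
As a rough illustration of the mapping described in this answer (a left-to-right scan over one second, vertical position as pitch, brightness as loudness, plus stereo panning), here is a minimal Python sketch. It is not The vOICe's actual code. The function names `row_frequencies` and `image_to_soundscape`, the 500 to 5000 Hz range, the exponential pitch spacing, the 16 kHz sample rate and the linear panning are all assumptions made for the example.

```python
import math


def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """Return one frequency per image row; row 0 (top) gets the highest pitch.

    The exponential spacing and the default range are assumptions.
    """
    if n_rows == 1:
        return [f_low]
    ratio = f_high / f_low
    return [f_low * ratio ** ((n_rows - 1 - r) / (n_rows - 1))
            for r in range(n_rows)]


def image_to_soundscape(image, duration=1.0, sample_rate=16000):
    """Turn a grayscale image into a stereo soundscape.

    image: list of rows (top row first), brightness values in 0.0..1.0.
    Returns (left, right), two lists of float samples in -1.0..1.0.
    """
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows)
    n_samples = int(duration * sample_rate)
    left, right = [], []
    for i in range(n_samples):
        t = i / sample_rate
        # Time encodes horizontal position: scan columns left to right.
        col = min(i * n_cols // n_samples, n_cols - 1)
        # Each row is a sine tone at its own frequency, weighted by brightness.
        sample = sum(image[r][col] * math.sin(2.0 * math.pi * freqs[r] * t)
                     for r in range(n_rows)) / n_rows
        # Stereo panning reinforces time: 0.0 = fully left, 1.0 = fully right.
        pan = i / (n_samples - 1) if n_samples > 1 else 0.5
        left.append(sample * (1.0 - pan))
        right.append(sample * pan)
    return left, right


# A 4x4 image with a rising diagonal line (bottom-left to top-right).
rising_line = [
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
left, right = image_to_soundscape(rising_line)
print(len(left), len(right))
```

With the diagonal test image, the tone steps upward in pitch as the scan moves from left to right, which is the "rising line sounds like a rising tone" idea in miniature. A spectrogram of the samples should show the diagonal again, though a real implementation would need far more rows and columns, and smoothing between columns, to be useful.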
What has been your favorite experience in the development process?
It is mostly the collaboration with blind people around the world as well
as with research partners who investigate the neuroscience of seeing with
sound in the context of brain plasticity, synesthesia and other research
areas. I think for The vOICe for Android my favorite experience is yet to
come, with the advent of suitable augmented reality glasses that should
allow blind users to run The vOICe for Android immersively and hands-free.
Dr. Peter Meijer’s research and project are presented through his website www.seeingwithsound.com