Even though our devices are increasingly connected, there are times when they can’t connect to the internet. In that case, you can still transfer some data from one device to another using peer-to-peer connectivity like Bluetooth, Wi-Fi Direct, or NFC. But all these solutions require specific hardware and APIs which are not always available. By contrast, every phone has, by definition, a microphone and a speaker. That’s why we decided to come up with an alternative, AudioModem, that transmits data over sound waves.

Sound: limitations and benefits
In this approach, the most significant limitation is bandwidth. Speakers and microphones cannot handle a very wide spectrum of frequencies: outside the audible range they don’t work so well. So we have to use a narrow spectrum, which restricts the bandwidth. But even if exchanging big files is off the table, it remains a very convenient way to transfer contact information, Wi-Fi passwords or short messages in general.
This solution offers a couple of benefits though. First, we do not have to rely on any infrastructure, since the hardware is already included in a lot of appliances (like TVs and laptops) and could easily be included in others (such as door locks). In other words, this channel works out of the box for smartphones, but can also be cheaply extended to a whole range of smart objects.
Another interesting point is that sound propagates in an omnidirectional fashion, making it perfect for broadcasting (reaching multiple recipients at once). This also means that there is no need for pairing: unlike with Bluetooth, the receiver does not need to make itself known to the sender before starting a communication. So a transfer just requires one action from the sender (starting to broadcast) and one action from the receiver (starting to listen to all broadcasts).
State of the art
Other people have already looked into data transfer over sound waves, and some implementations are already available, like chirp.io and Yamaha’s Infosound (in Japanese). So let’s have a look at the choices they made:
- chirp.io uses a bird tweeting sound: while this goes well with the theme of the app, using the whole audible spectrum prevents some interesting uses. For example, using ultrasonic frequencies one could embed the data in a song or in the soundtrack of a movie without affecting what a human listener perceives. That’s probably why Infosound decided to only use the ultrasonic spectrum, and that’s the choice we made. Namely, we’re only using the 18.4–20.8 kHz range.
- Both chirp.io and Infosound require some network connectivity: they do not transmit the data itself, but only a URL, which then requires an internet connection to download the original data. AudioModem takes a different approach and directly transfers the data without relying on any external connectivity (this way it’ll work in the subway or in that bar where you never get any network).
Modulation choices
Once we settled on a frequency range, we needed to find a way to encode arbitrary data in (inaudible) sound waves. The method to do that is called modulation (precisely what your modem was doing in the ’90s, but using ultrasonic frequencies). The basic idea is to encode the data bits in the sound waves by varying some of the properties of the carrier wave (amplitude, frequency, phase, or any combination of those):
- Amplitude modulation: modulation and demodulation are easier to implement (and the operations are faster), but this method is really sensitive to disturbances during the transmission (background noise, distortion, etc.).
- Frequency modulation: this method uses a larger bandwidth, which is problematic for us, since it makes it difficult to restrict ourselves to inaudible frequencies that can be emitted and received by basic speakers and microphones.
- Phase modulation: this modulation is more resilient to disturbances than the others. But most schemes require a coherent receiver, which is more complicated to implement.
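In our case, we implemented a phase modulation variant, DBPSK (differential binary phase-shift keying), which removes the need for a coherent receiver by encoding each bit as a phase change relative to the previous symbol rather than as an absolute phase. In the end, its advantages are: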
- It is more robust than amplitude modulation as it is not that vulnerable to noise.
- It uses less spectral width than frequency modulation so it’s easier to transmit in the inaudible part of the spectrum.
- It can be demodulated with an easy-to-implement receiver (known as a non-coherent receiver).
- It offers an acceptable theoretical data rate.
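To put a rough number on that last point, here is a back-of-the-envelope estimate in Python. It assumes rectangular, unfiltered BPSK symbols, whose main spectral lobe is twice as wide as the symbol rate, and it ignores the synchronization preamble and any error-correction overhead. The function name is ours and purely illustrative; the result is an order of magnitude, not a measurement of AudioModem.

```python
def estimated_bit_rate(f_low_hz: float, f_high_hz: float) -> float:
    """Rough upper bound on the BPSK bit rate that fits in a frequency band.

    Assumption: the main lobe of an unfiltered BPSK signal spans twice the
    symbol rate, and BPSK carries one bit per symbol.
    """
    bandwidth_hz = f_high_hz - f_low_hz
    return bandwidth_hz / 2.0


# The 18.4-20.8 kHz band mentioned earlier is 2.4 kHz wide.
rate = estimated_bit_rate(18_400, 20_800)
print(rate)  # 1200.0 (bits per second)
```

At that rate, the raw payload of a 32-character message (256 bits) would take on the order of a fifth of a second, which is why short messages are realistic and big files are not.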
DBPSK in a nutshell
Phase modulations, including DBPSK, encode bits of data in the sound wave (called the carrier wave) by modulating its phase. In its simplest form, the binary digits (0/1) are mapped to two different values of the phase (0°/180°). For example, here’s what the wave looks like when sending the letter ‘e’ encoded in ASCII (01100101) with plain binary PSK:
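As a rough illustration of that mapping, the following Python sketch generates the samples for a bit string. The sample rate, carrier frequency and samples-per-bit values are assumptions chosen for the example (the carrier is simply the midpoint of the 18.4–20.8 kHz band); they are not taken from the AudioModem source.

```python
import math

SAMPLE_RATE = 44_100     # Hz, assumed (a common audio sample rate)
CARRIER_HZ = 19_600      # Hz, assumed (midpoint of the 18.4-20.8 kHz band)
SAMPLES_PER_BIT = 36     # assumed; gives exactly 16 carrier cycles per bit


def bpsk_modulate(bits: str) -> list[float]:
    """Map each bit to a carrier burst: '0' -> phase 0, '1' -> phase 180 degrees."""
    samples = []
    for i, bit in enumerate(bits):
        phase = math.pi if bit == "1" else 0.0
        for n in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + n) / SAMPLE_RATE
            samples.append(math.sin(2 * math.pi * CARRIER_HZ * t + phase))
    return samples


wave = bpsk_modulate("01100101")  # the letter 'e' in ASCII
print(len(wave))  # 288 samples (8 bits x 36 samples)
```

Because a 180° shift simply inverts the sine wave, the burst for a 1 is the mirror image of the burst for a 0.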

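The difference DBPSK introduces is that instead of sending the raw data (01100101), we transform it by replacing each bit with the XOR of this bit and the previous transformed bit, starting with an arbitrarily chosen reference bit (we chose 1, which gives us 110111001). In other words, a 1 flips the phase relative to the previous symbol and a 0 leaves it unchanged, so the receiver only has to compare consecutive symbols.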
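The transformation is easy to check in a few lines of Python. The sketch below reproduces the example for the letter ‘e’, then shows why no coherent receiver is needed: even when the channel adds an unknown phase offset to every symbol, comparing each symbol with the previous one recovers the data. The function names and the complex-number representation of symbols are ours, for illustration only; the actual prototype works on audio samples.

```python
import cmath
import math


def differential_encode(bits: str, reference: str = "1") -> str:
    """Each output bit is the input bit XOR the previous output bit."""
    encoded = [int(reference)]
    for bit in bits:
        encoded.append(int(bit) ^ encoded[-1])
    return "".join(str(b) for b in encoded)


def to_symbols(encoded: str, channel_phase: float = 0.0) -> list[complex]:
    """Map 0/1 to phases 0/pi, then rotate everything by an unknown channel phase."""
    return [cmath.exp(1j * (math.pi * int(b) + channel_phase)) for b in encoded]


def differential_decode(symbols: list[complex]) -> str:
    """Recover the data by comparing each symbol with the previous one."""
    bits = []
    for previous, current in zip(symbols, symbols[1:]):
        # The product cancels the common phase offset: its real part is
        # positive when the phase is unchanged (0), negative when flipped (1).
        flipped = (current * previous.conjugate()).real < 0
        bits.append("1" if flipped else "0")
    return "".join(bits)


encoded = differential_encode("01100101")
print(encoded)  # 110111001
received = to_symbols(encoded, channel_phase=1.234)  # arbitrary offset
print(differential_decode(received))  # 01100101
```

The price of this simplicity is one extra reference symbol per transmission.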

Additionally, we need to transmit a synchronization code before transmitting our data, otherwise the receiver cannot tell where the data begins. We chose to use a Barker code for this purpose. Our transmission will then look like this:
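Barker codes work well here because they correlate strongly with themselves only when perfectly aligned. The Python sketch below illustrates this with the length-13 Barker code; the length is an assumption made for the example, since this post does not say which one the prototype uses. Symbols are represented as +1 (0°) and -1 (180°), and the helper names are ours.

```python
# Length-13 Barker code (assumed length, for illustration).
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def correlate(stream: list[int], code: list[int]) -> list[int]:
    """Slide the code over the stream and score the match at each offset."""
    width = len(code)
    return [
        sum(s * c for s, c in zip(stream[offset:offset + width], code))
        for offset in range(len(stream) - width + 1)
    ]


def find_preamble(stream: list[int], code: list[int] = BARKER_13) -> tuple[int, int]:
    """Return (offset, score) of the strongest match.

    The absolute value is used so that a stream received with inverted
    polarity (score -13) is still detected.
    """
    scores = correlate(stream, code)
    best = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return best, scores[best]


# Five symbols of unrelated signal, then the preamble, then some data.
stream = [1, -1, -1, 1, -1] + BARKER_13 + [-1, 1, 1, -1]
print(find_preamble(stream))  # (5, 13)
```

Once the peak is found, the receiver knows that the data starts right after the matched window.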

Finally, such a method usually requires some forward error correction in order to reduce the sensitivity to disturbances.
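To give an idea of what forward error correction means, here is the simplest possible scheme in Python: a repetition code that sends every bit three times and takes a majority vote on reception. This is only an illustration with hypothetical function names; it is not part of AudioModem, and a real system would likely pick a more efficient code.

```python
def fec_encode(bits: str, repeat: int = 3) -> str:
    """Repeat every bit `repeat` times."""
    return "".join(bit * repeat for bit in bits)


def fec_decode(coded: str, repeat: int = 3) -> str:
    """Majority vote over each group of `repeat` received bits."""
    decoded = []
    for i in range(0, len(coded), repeat):
        group = coded[i:i + repeat]
        decoded.append("1" if group.count("1") * 2 > len(group) else "0")
    return "".join(decoded)


sent = fec_encode("0110")
print(sent)  # 000111111000
# Two bits get corrupted on the way, at most one per group of three.
received = "010111101000"
print(fec_decode(received))  # 0110
```

The trade-off is obvious: the message survives one flipped bit per group, but the useful data rate is divided by three.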
Implementation
We decided to implement a proof-of-concept prototype, and we settled on iOS because it provides us with a few libraries for signal processing. Namely, we used vDSP (part of the Accelerate framework) which gives a set of mathematical functions, and Audio Queue Services (part of the AudioToolbox framework) to access the speaker and microphone.
The source code is available on our GitHub account.
This prototype has an input field where you can type the text you want to send. When you press the “Broadcast” button, the data is broadcast continuously, alternating with the Barker code. When another user is close enough to pick up the signal, they can press the “Receive” button to get the data on their phone. It’s as simple as that!
Final thoughts
Although it works end to end, our implementation still lacks major components like error correction and an improved UI. We just wanted to tinker with the concept of sending data over sound waves and we hope that you found this interesting. Don’t hesitate to react to this blog post on Twitter.