Social Audio Is Dead, Long Live Social Audio

If you’ve heard the term “social audio” being thrown around, you’re probably inside the media business bubble. That means you’re probably also aware that “social audio” just isn’t hot anymore.

If you’re not in the media business bubble, you might still have heard of something like Clubhouse, or a Twitter Space or Spotify Live. Social audio is the all-encompassing term for this kind of tech and practice – you might have called it emergent, before it started receding in recent months. If you’re still in the dark, think of something like a conference call, or live radio, except it’s run and hosted by friends, acquaintances, public figures, or experts. You log in, you listen to people talking live, maybe you’re even invited to ask a question by a moderator. Simple, right? 

Well, it might sound simple. From a pandemic-inflated zenith of big valuations and indiscriminate investment, “social audio” has crashed to something of a nadir. Spotify’s recent announcement that it was killing its version of a social audio feature follows in the footsteps of Reddit and Facebook, who’ve done the same. Amazon has laid off staff on its social audio app, Amp, and Clubhouse has gone from being the trendy new app of 2021 to being extremely not that.

Partly, that’s because a lot of these social audio apps were reinvented wheels – not doing anything that YouTube, Twitch, Discord and even radio haven’t been doing better for years. But let’s put aside the execution side of things and have a good look at the idea. I think there is something to that idea of “social audio”. Specifically, I think it’s a big opportunity for podcasting. Because podcasting is an inherently social medium.

What do I mean by that? Let’s start with a good working definition of a podcast – on-demand, digitally-distributed speech audio. Those are its essential characteristics, and I would argue that they point towards a medium with a strongly social nature.

Take the first part of the definition, “on-demand”. Listeners self-select any podcast they listen to from a massive range, rather than being served a much more limited range of content through broadcast media like TV or radio. This means more narrowly focused, or niche, content can thrive – and that in turn leads to more well-defined audience segments, or fanbases, who are more actively engaged with the shows they listen to. Now think of “digitally-distributed”. This means consumption is not restricted by space or time – you can live anywhere, and listen to a podcast any time after it’s published. And it also means that podcasting is still largely free and platform-agnostic.

In sum, we’re talking about a medium that fosters well-defined audiences, and then makes it easy and accessible for new people to join in. To me, that’s fertile ground to build communities – and what is a community but a web of social relations?

Now let’s zoom in on that word, “speech”, and get a little bit more philosophical. Podcasting, as a medium, is almost entirely a human voice, speaking to either another human voice, or directly to the listener. And that’s powerfully, profoundly social, in ways that tap into our very nature.

Think about it like this – the spoken word is the most primal form of complex communication we have, hardwired deep into our monkey brains from millions of years of evolution. We can point to the invention of the printing press or the emergence of the written word, but the spoken word is so fundamental that it virtually defines our shared conception of humanity.

Wordless primitive people may have been able to use grunts and pointing to co-exist or even cooperate, but it is through speaking that they were able to establish the kinds of relationships that make us the social animal. It’s that simple – speech is social. And when we listen to Ira Glass or Sarah Koenig or Joe Rogan, we’re tapping into a literally ancient social ritual. 

Paradoxically, though, this is a social ritual we perform by ourselves. Research shows that when we listen to podcasts, we mostly do it alone. In fact, disproportionately alone, compared to other media (think about how common it is to watch a movie with an audience, or even have the radio on in a public space, and how unusual it would be to do that with a podcast). There’s an unresolved tension there – or perhaps unrealized potential. 

Even though we actually listen to them alone, podcast consumption manifests socially, in some interesting spontaneous or informal ways. Think of what that term “social” really means – relationships, communities. Then think of the position podcast hosts – the speakers of those words inside our ears – have in our imagination. These are not Hollywood stars we worship, but people we view as being socially related to us. There’s research into these parasocial relationships listeners develop towards hosts, and how that might fulfill social needs. When those same hosts read out ads on podcasts, research also shows we remember and trust this information more than other forms of advertising. But it’s not just the ads – we make lifestyle changes based on what we hear in podcasts, suggesting we think of these words from strangers similarly to how we think of recommendations from friends. Speaking of recommendations from friends, they are by far the most common way people discover new podcasts, showing that even if we consume them alone, information about these podcasts travels along social connections. 

There’s a social impulse there. And just because it hasn’t been completely captured, mediated and monetised by a tech start-up doesn’t mean it’s not real – or, for that matter, that it won’t be captured, mediated and monetised in the future. 

Because, when we look at Clubhouse’s valuation dropping and say “social audio is dead”, what we’re really saying is that “no medium has yet emerged to effectively exploit the sociality of audio to the satisfaction of the market”. Instagram has mediated the latent social potential of the image, and TikTok appears to have done the same for video. It seems naive to think something of that scale can’t or won’t happen for speech audio like podcasting.

What will it be? What will it look or sound like? Will it be good or bad for the world? This is where my thinking out loud ends, because I don’t have an answer. And if I did, I’d be working feverishly on it, not writing a blog post. I just think there’s a lot of value and potential in a “social audio” that doesn’t exist just yet. One that can fully reconcile podcasting’s fundamental potential for human connection with its fragmented, solitary consumption.