This week I read [[Siddhartha]] by Herman Hesse (as translated by Hilda Rosner). There's a lot to say about the book - it's tremendous - but today I'll focus on one idea from the book, which is the necessity of mistakes.
I also want to talk about [[Audio]]. So to expect two sections this week: mistakes, and audio.
If I were to draw an image personifying spirituality, I might sketch a picture of a monk at a monastery, practicing sitting meditation and drinking tea. He has found peace - having renounced all worldly possessions. I'd draw his mouth with a subtle upward curve, to resemble the slightest of smiles.
I certainly would not draw a gambler, mired in a midlife existential crisis, experiencing a nauseating level of depression. But that is precisely what our protagonist, Siddhartha, experienced:
I had to experience despair, I had to sink to the greatest mental depths, to thoughts of suicide, in order to experience grace, to hear Om again, to sleep deeply again and to awaken refreshed again. I had to become a fool again in order to find Atman in myself. I had to sin in order to live again.
There is something extremely relatable about despair. Whereas wild success is a rare fruit tasted only by a small fragment of civilization's elite - all of us have known failure. Some to darker nadirs, but all of us know this suffering.
And what's beautiful about this - is that the book shows us not only can we "bounce back" from the lowest points in our life, but they are actually necessary. Without them, you may gain an intellectual understanding of wisdom. But to truly know what wisdom and truth are, you must take your own unique path.
In my life, I find this question recur with frequency. Is it possible to "beeline" to peace? Or must you take the winding path? Or as my mother describes "弯路" - a torturous, bending road.
The engineer in me says that optimization is possible. But all signs point elsewhere. We each have moments where we know something is wrong, something is off - but choose it anyways.
Take for example, this software engineer on a Reddit thread that I found recently. This young engineer took a job for prestige and is now miserable.
I hate everything about it. I had a choice between a company with great people, better CoL and same pay and my current company. The only thing better here is the prestige, I fell for it. I knew something felt off during the interview but I took the job anyway. People here are assholes who live to work, pretend to be nice and backstab you as soon as they can. Fuck them, fuck me. I wish I I could go back.
So that's this engineer. But we've all made that same decision before. And maybe not for prestige, but for reasons that we knew were wrong, but simply couldn't override.
There is no lesson here. These are my notes. Not an essay. There is no point I'm trying to make. No argument I'm advancing. Simply that I thought the idea of [[Mistakes | mistakes]] as not only beneficial, but necessary for spiritual growth was deeply beautiful.
Given the number of times I have felt existential pangs as a consequence of my decisions - it's possible that I've made enough mistakes to set the stage for growth.
But probably not. I suspect I have many more mistakes ahead of me.
Ok, so let's change gears from the spiritual to the technical, and talk about audio.
If you're not familiar with these two articles, bookmark them and read them after you've consumed this piece:
And I'm certainly missing other blog posts/tweets on what I'm sure will be a trend of tech writers discussing the future of audio. I'm less interested in predictions on Audio though, and would rather focus on getting the pulse on what is out there today, and what I feel like each of these technologies might unleash.
In the style of Venkat's "A Text Renaissance" (or at least the first half of that piece), I will briefly outline a series of technologies that I think have put us at the precipice of a Cambrian explosion in audio.
Feel free to skip around, these sections came out to be much longer than I anticipated.
[[WebRTC]] is here. Finally. After initial release in 2011, and so many teasers along the way, we're finally at the point where browsers have deployed stable and high quality versions of WebRTC. In fact, the stable spec was only finalized on May 4, 2018 - and security vulnerabilities have been discovered and patched since. It's been a long road, but it's finally here.
In recent years, developer productivity has seen massive benefits due to browser affordances like "flexbox", native promises (leading to async/await), and countless other small improvements.
And for most of us, WebRTC was just a thing that was always on the horizon. But it sort of crept up on all of us, and now it's there, ready for the tinkering.
Sadly, I don't think this means the fall of Zoom anytime soon. Zoom has invested heavily in their own architecture and technology and probably have a massive lead on everyone. But that's for video - which consumes significantly more bandwidth than audio and is a different kind of problem.
WebRTC and audio work very well today. For example, my friends Azlen and Jon built Cozyroom, which sits atop of WebRTC. They've since polished it after their humble hackathon beginnings, but these types of experiences are about to explode.
Myself, I'm working with Nick on Openhouse. Which I'll talk about much more in the future - but for now, know that it is a project built with the intention of learning WebRTC.
This is a trend that didn't really make a whole lot of sense to me at first. How does a stupid little accessory change the face of computing, especially since it doesn't do anything structurally different from its corded predecessors?
Well, it changed a lot. It changed our fashion sensibilities around wearables. It changed how often we plug into audio. It taught us that we could replace our own thoughts with the voice/laugh of Joe Rogen.
It has completely surprised Apple, demolishing their own sales projections. And the demand for AirPods, feels as insatiable as the demand for Teslas in a world of unlimited access to a subprime auto loans.
But it's not just AirPods. Apple's forever copycat, Android, has finally shipped "Fast Pair" earbuds that might not suck. When the proliferation of fashionable Android wireless earbuds finally hits mass adoption, get ready for a further acceleration of people escaping from their own thoughts via podcasts and other monkey mind pacifiers.
For those of you who still think Discord competes with Slack for communities... Where the hell have you been? It's been my experience that all communities that have started after COVID-19 have chosen Discord, and the only Slack communities that I still get invites from are legacy ones, or businesses.
It's silly to think Discord and Slack compete on instant messaging. Slack is the "semi-async" text communication solution for startups, which competes with Microsoft Teams in the enterprise.
Discord is a social space that embraces pseudonymity and gamers. It facilitates seamless jumping between communities (something that is an architectural impossibility given Slack's sharded MySQL strategy), but most importantly (imo), it provides persistent voice channels for communities to congregate.
These voice channels seemed stupid to me at first. Like why would you want an always open connection to friends that you can hop in and hop out of at any time? Why not just call them? Why would this change behavior at all? I mean, as far as the technology is concerned, this idea has ALWAYS been possible - so why didn't it take off before?
Well, the reduction in friction to communicating with people does change behavior. It makes it feel like walking into a common room at a youth hostel and seeing who's there. It creates audio serendipity in a way that is impossible on Slack. It feels like a physical space, made possible by... audio.
Which is a great segue into Clubhouse. See, Clubhouse and Discord share something in common. I described Discord voice channels via the metaphor of a "common room". Well, in Clubhouse, the primary interface is a list of "rooms", or shared audio spaces - they lean into that metaphor big time.
In the larger rooms, there's a small handful of speakers on stage enjoying a free-flowing (often moderated) discussion. All while an audience listens silently from their muted perch (audience members cannot speak unless selected by the moderator to join the stage). And in the smaller rooms, it's a public, participatory discussion about anything random.
Clubhouse to me - and it might morph into something very different in the coming weeks, such is the pace of mutation - feels like a book festival. If you've never heard of a book festival, then you're not alone. The only reason I know the term is I randomly found myself in Berkeley once upon a time - and it just so happened to be during the first annual Bay Area book festival.
It's a lot like a music festival, but instead of featuring musicians, there were authors and literary figures. And instead of performing music, they just talked. Different bookstores and theaters around Berkeley were all booked for the extravaganza.
When I opened Clubhouse for the first time, it felt like I was transported to downtown Berkeley for the book festival. There were a list of different talks happening right then, and list of upcoming events.
It amazes me how audio, seems to create the illusion of spaces. Definitely not physically, but it creates an imaginative space that feels "cozy" in a way that Zoom and Google Meet simply do not.
I'm particularly excited about our Openhouse project because it allows for these affordances and to create spaces that can feel cozy. The internet is mutating, and open source is the highly editable RNA that powers the rapid evolution of new tech. I want to see a world where any community can embed a Clubhouse like experience on any site or in any app.
And if that happens, then we'll chip away at the value prop of universities. Their lectures have already been digitized and commoditized. Now, with Clubhouse/Openhouse, we see talks, guest speakers, and seminars getting commoditized. We're getting there.
Audio and Text have a very peculiar relationship.
Before we had text messaging, we had phone calls. But before we had phones, we had mail. And before we wrote letters, we just talked to each other.
Audio and Text seem to come in alternating waves.
Audio Transcription has been around for a long time. I remember many years ago, TJ-Maxx would sell last-year's Dragon speech recognition software in their discount software section (well, I guess the entire store is a discount section, so apologies for that redundant adjective). But they were pretty limited back then.
Today, AI transcription is getting good. And it's available as an API via things like Temi, [[Otter]], Trint (see comparison). But it's not that good though yet.
If we can get really good AI transcription, for cheap, then audio and text will blur in ways we haven't seen previously.
We're not there yet, but watch this trend. As GPT-3 has proven, when we start throwing lots of parameters (175+ billion) and money (millions of $) at a really really massive neural net, things start getting uncanny - or possibly, beyond uncanny (in the sense of crossing the uncanny chasm... err valley).
I'm no expert in AI, I barely understand what a neural network is (thanks to Coursera, the first mover in unbundling the university), but I know to follow this trend.
The moment we can get fast, highly-accurate automated transcription for cheap, the line between audio and text will blur so much that it'll lead to innovations that are hard to predict. As is though, transcription seems to get something every sentence and that's good, but not good enough.
The phase change, when it comes, will come suddenly.
I have little to say about podcasting other than it is digitizing Radio for the AirPod generation. This has been a trend for years and is uninteresting because it's so huge already.
It might be uninteresting, but it's hugely important. According to Infinite Dial, about 55% of Americans have listened to a podcast in 2020, up from 33% in 2015. The shift in behavior has tipped to the majority.
I love how podcasting is based on RSS, an architecturally open technology - which Apple has supported for years. I love that openness. It's contribute to the rapid growth and proliferation of podcasting - the barrier to entry is so low.
Monetizing it will be a pain. The fact that it sits on top of RSS, an architecturally open technology, makes it more frustrating for podcasters to get stats on their listeners, which means less potent ad targeting.
Spotify has made a big play on podcasts, we'll see how that plays out for them. The key here is to build the world's best podcast player (classic Aggregation Theory at play) and get everyone to use that. Then, once you have the user's identity and listening history, you can drop targeted ads for days.
But Spotify doesn't have the best podcast player, and it will probably be pretty hard to get me to switch to them. So they've resorted to exclusive deals with big names. It didn't work for Mixer when they signed Ninja. Maybe it will work for Spotify though? The moment the aggregate demand though, they win. To do so, I think they'll need a Google-search level quantum leap in utility over the current podcast players.
Anyways, apologies for the digression on Spotify. What I really want to point out is you've probably heard all those Clubhassholes say stuff like "Clubhouse has completely replaced podcasts for me". I think that's actually how podcasting fits into this greater narrative of audio explosion - it is the gateway behavior to newer audio experiences.
First you podcast, then you clubhouse, then we all become Joaquin Phoenix with an ever present audio-presence in our ear canals.
Audio is exploding. You already knew that. But I hope you've gotten a better sense of "Why now?", along with a few trends that might push us into explosive adoption of audio:
The petri dish is ripe for a combinatorial expansion of random ideas in audio 🧫🎧. Get excited!