Captions in a world without sound: a CEO's story
Content Lead, Google Workspace
David Cohn has made a career out of connecting people and organizations. As founder and co-CEO of Civic Entertainment Group, a full-service marketing and communications agency, he’s helped brands like Airbnb, Meta, Shopify, Ford, CNN, McDonald’s, NBC, and Verizon build bridges between their purpose and the communities they serve. In recent years, Civic has helped Airbnb normalize and mainstream the new behavior of home sharing, and showcased Verizon’s 5G value proposition at the Super Bowl.
David is especially excited about helping brands build and activate critical accessibility strategies, including those involving new uses of technology. When Guiding Eyes for the Blind partnered with Google to develop a new technology that allows blind runners to run without a human or canine guide, Civic led the team that orchestrated NYC Marathon programming and secured coverage from major outlets like GMA, Forbes, and Reuters.
But if you ask David about the use of technology that has most impacted his own personal life, he’ll tell you that it’s the humble-but-powerful caption in a Google Meet video call. Over the years, he’s met, pitched to, and connected with hundreds of company executives without once hearing their voices. Deaf since infancy due to spinal meningitis, David moves through the world entirely without sound. Real-time captions in Meet allow him to understand everything that’s said in a meeting, regardless of whether people are sitting at the same conference table (using companion mode) or tucked into a video tile from across the globe.
“I watched in real time as the captions in Meet became exponentially better throughout the pandemic, which is when I needed them most, though now I cannot imagine work without them,” said David.
Captions first launched in Google Meet in 2019 and have been continuously improving with the latest AI technology. Rob McGrorty, Group Product Manager at Google, who focuses on Automated Speech Recognition (ASR) for products like Meet across the company, said that “speech recognition is always evolving and improving as we introduce the latest generation of Machine Learning models from our research team. In the case of Meet captions, these models take in raw audio from the video call and infer the words being spoken in order to transcribe the conversation into a text caption live on the screen before you. In May of 2020, one of these updates had a more significant and noticeable impact for Meet users, and we’re so glad to hear how much it improved David’s experience!”
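The streaming pattern Rob describes, audio arriving in small chunks while a model emits a growing transcript on screen, can be sketched in a few lines. This is an illustrative toy only: `fake_asr_model` is a hypothetical stand-in for a real speech-recognition model, and nothing here reflects Meet’s actual implementation or APIs.

```python
# Illustrative sketch of live captioning as a streaming loop.
# Assumption: audio is already segmented into chunks; a real ASR model
# would infer words from raw waveform samples, whereas fake_asr_model
# simply returns pre-attached words to show the data flow.

def fake_asr_model(chunk):
    """Hypothetical stand-in for a speech-recognition model."""
    return chunk["words"]

def stream_captions(audio_chunks, model=fake_asr_model):
    """Yield a growing caption as each audio chunk arrives."""
    caption = []
    for chunk in audio_chunks:
        caption.extend(model(chunk))   # infer words from this chunk
        yield " ".join(caption)        # partial caption shown live

# Simulated call audio, pre-chunked for illustration.
chunks = [
    {"words": ["captions", "make"]},
    {"words": ["meetings", "accessible"]},
]
partials = list(stream_captions(chunks))
print(partials[-1])  # prints "captions make meetings accessible"
```

The key idea the sketch captures is that captions are emitted incrementally, updating on screen as more audio is processed, rather than waiting for the speaker to finish.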
During this same time, Google Meet also made improvements in audio quality and background noise reduction — fueled by AI enhancements — to make audio clearer for all meeting participants.
“Sometimes I think about how incredibly useful this sort of technology would have been in college,” said David, a graduate of the University of Pennsylvania. “Instead, I had what I called ‘homework between homework and hacks between hacks,’ since not hearing professors made it difficult to prepare for coursework so I spent a lot of time mining for information, memorizing textbooks, choosing courses based on access to information rather than interests, borrowing fellow students’ notes (which wouldn't matter anymore with Google Docs!) and on and on.”
David never learned American Sign Language and has always relied on visual cues and lip reading to navigate the world. “So when the pandemic hit and people started to wear masks,” he said, “it was my worst nightmare. All my information comes from visual cues and all my communication from people’s faces, so I felt very much cut off from humanity.” He was constantly asking strangers to lower their masks so he could “hear” them. Think grocery store clerks, retail salespeople, Uber drivers, Starbucks baristas, restaurant servers, and even strangers looking in his direction (“are they speaking to me??”). He ended up having T-shirts made that said “I’m Deaf and I lipread” on the front, and “Please be kind” on the back.
Google continues to invest in the power of captions with users like David in mind. Live Transcribe for Android phones and Recorder on Pixel devices are already able to transcribe audio from the microphone onto the screen, using the same underlying Automated Speech Recognition technology as Google Meet, but running locally on the device to maximize privacy and portability. And Meet will soon be launching companion mode on its mobile app, giving David the power of captions no matter where life takes him.
“Because my disability is invisible, people don’t know what I’m experiencing,” David said. “Deaf people tend to be resourceful, but surviving the pandemic would have been much more difficult if it had happened a decade ago. Technology made it exponentially easier to operate, personally and professionally.”
See more of how captions build bridges in this YouTube video, produced in partnership with BBC Storyworks. It offers a behind-the-scenes look at two Googlers who are hard of hearing, KR Liu and Laura D’Aquila, and how they built and use some of Google’s most helpful tools, like Live Transcribe and captions, in their day-to-day lives.