In Defense of 'Um' and 'Ah': Why Imperfections Make Us Human
By Naripod Team
“I hate listening to my voice. I say ‘um’ too much.”
We hear this all the time. It is the number one reason people are afraid to hit record. We have been trained to believe that “good” speaking sounds like a news anchor reading a teleprompter: smooth, continuous, and error-free.
But have you ever tried to have a deep conversation with a news anchor? It’s weird.
Real human connection doesn’t happen in perfect sentences. It happens in the messy, stumbling, beautiful way we actually talk. And those “errors” you’re so afraid of? They are features, not bugs.
The “Broadcast Standard” Trap
For decades, radio and TV set the standard for what recorded audio “should” sound like. Professional broadcasters are trained to eliminate “disfluencies”—the ums, ahs, and pauses that naturally occur in speech.
Then podcasting came along, and editing software made it easy for anyone to strip-mine their speech. You can see the waveforms, find the silence, and cut it out. You can make yourself sound like a machine.
But why do we want to sound like machines?
Especially now, when actual machines (AI) can speak perfectly, sounding human is a competitive advantage.
Decoding the “Um”
Let’s look at what those “imperfections” actually communicate.
“Um” means “I’m thinking.” When you say “um,” you are signaling to the listener: I am searching for the right word because I want to be precise. It shows care. It shows that the thought is being formed in real time, not read from a script.
A pause means “I’m feeling.” Silence is heavy. When a storyteller pauses before a difficult sentence, the listener leans in. We feel the weight of what you’re about to say. If you edit out that breath, you edit out the emotion.
Stumbling means “I’m excited.” When we trip over our words, it’s often because our brain is moving faster than our mouth. It communicates energy, passion, and urgency.
These aren’t failures of speech. They are proof of life.
The Uncanny Valley of Perfection
There is a concept in robotics called the “Uncanny Valley.” As a robot looks more human, we like it more, up to a point. But once it looks almost human, yet not quite right (too perfect, too stiff), it becomes creepy.
Audio has an uncanny valley too.
When a story is too polished, too edited, and too perfect, our brains reject it. It feels sterile. We stop trusting the speaker because we can’t hear the humanity.
We trust people who sound like people. We trust the person who laughs at their own joke before they finish it. We trust the person who sighs before admitting they were wrong.
Don’t Edit Your Soul Out
On Naripod, we don’t have editing tools. That’s not a missing feature; it’s a philosophy.
We want you to press record and just talk. If you stumble, keep going. If you say “um,” let it be. If you need to take a long pause to compose yourself, take it.
Your listeners aren’t grading you on your diction. They are listening for a connection. They want to feel what you felt.
So next time you hesitate to share a story because you’re not a “professional speaker,” remember: The robots are perfect. You are something better.
You are interesting.