
Amazon Polly vs Deepgram comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Apr 6, 2025

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Amazon Polly
Ranking in Text-To-Speech Services: 1st
Average Rating: 7.4
Reviews Sentiment: 7.6
Number of Reviews: 5
Ranking in other categories: No ranking in other categories

Deepgram
Ranking in Text-To-Speech Services: 2nd
Average Rating: 8.4
Reviews Sentiment: 5.9
Number of Reviews: 11
Ranking in other categories: Speech-To-Text Services (1st), AI Customer Support (2nd), AI Sales & Marketing (6th), AI Scheduling & Coordination (1st)
 

Mindshare comparison

As of May 2026, Amazon Polly's mindshare in the Text-To-Speech Services category is 15.7%, down from 29.9% a year earlier. Deepgram's mindshare is 9.7%, up from 6.8% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
Text-To-Speech Services Mindshare Distribution
Product          Mindshare (%)
Amazon Polly     15.7%
Deepgram         9.7%
Other            74.6%
Text-To-Speech Services
 

Featured Reviews

AG
Senior Software Developer at a tech vendor with 10,001+ employees
Text has been converted to speech across multiple languages with customizable voice settings
The most beneficial aspect of Amazon Polly is its ability to convert text to speech in multiple languages. It allows us to change the voice configurations for both male and female voices, and enables adjustments in pronunciation and delays. These features help us effectively target our users. Additionally, the integration capabilities with AWS services like Lambda aid us in storing Polly voice messages in DynamoDB and S3. It also offers configurations in multiple languages, enhancing our service reach.
Arunkumar HG - PeerSpot reviewer
Technology Architect & Hands-On Leader | Prototyping, Automation, AI/LLM Integration | 20+ Years in at Regalix
A Powerful, Adaptable, and Constantly Evolving STT Solution for Voice Automation
Honestly, Deepgram has been exceptionally proactive in addressing the primary area that needed improvement. My main challenge was with the real-time detection of when a user has finished speaking in a live conversation, which is critical for a responsive voice bot. They directly solved this by releasing their Flux model.

Because Flux is a recent release, I haven't yet had enough time to thoroughly test it and identify new limitations. At this stage, any "improvement" would be more of a "nice-to-have" feature rather than a fix for an existing problem. The core service is already very robust and meets all of our current needs.

What additional features should be included in the next release?
----------------------------------------------------------------

Looking toward the future, here are a few features that could add even more value to an already excellent platform:

* Advanced Built-in Analytics: While I can get the raw transcript and build my own analytics pipeline, it would be powerful to have features like sentiment analysis, emotion detection, or automatic summarization offered directly through the API. This would save significant development time.
* More Granular Speaker Diarization: For calls with multiple participants, enhancing the real-time speaker diarization (labeling who is speaking) to be even more precise would be a fantastic addition for creating detailed call analyses.
* Tighter Integration with TTS: Since Deepgram is also expanding into Text-to-Speech (TTS), offering a more seamlessly integrated STT-to-TTS pipeline could simplify the development stack for creating voice agents from start to finish.
* Specialized, Pre-Trained Industry Models: While the general models are highly accurate, offering even more specialized, pre-trained models for specific industries like finance, healthcare, or legal, which are heavy on specific jargon, could push the accuracy even higher for those niche use cases.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Amazon Polly offers significant features like the ability to select different voice categories and language options, such as Spanish, Portuguese, German, and French, which is particularly useful for maintaining worldwide contact centers and enhances customer experience by allowing us to give voice responses instead of text-based responses."
"The sound generated by Amazon Polly is very natural, and I appreciate the options to select different voices, including an expensive or cheaper one, and the Structured Speech Markup Language (SSML) feature allows me to specify if I want a warmer or higher tune, which has helped make the meditations sound very natural."
"They have the neural voices, and they're so realistic, you don't even know that a person is not reading to you, making things much better."
"We can use the SSML tags in Amazon Polly to modify text-to-speech by controlling speech patterns and behaviour."
"The sound generated by Amazon Polly is very natural, and I appreciate the options to select different voices, including an expensive or cheaper one, and the Structured Speech Markup Language (SSML) feature allows me to specify if I want a warmer or higher tune, which has helped make the meditations sound very natural."
"Amazon Polly offers significant features like the ability to select different voice categories and language options, such as Spanish, Portuguese, German, and French, which is particularly useful for maintaining worldwide contact centers and enhances customer experience by allowing us to give voice responses instead of text-based responses."
"Amazon Polly is useful because it's helpful to hear the words on top of it when I can't take in information in a general way. Sometimes, it's very taxing if I'm trying to read cases. They have the neural voices, and they're so realistic. You don't even know that a person is not reading to you, making things much better. I know that they do have the ability to provide you with your own lexicon that's personal to you. I like that you can adjust the pitch and the speed of the voice because some people talk way too fast. Or if you're reading, I read slowly, so that's always helpful. One of the functions that I find helpful is that when reading material on the web, it's like it has its own browser. You go to the URL, and you don't have to read the whole thing, and you can stick the cursor on the place where you want it to start. Then if you want it to skip over something, you put it somewhere else, and that's ideal for reading case law because you skip around a lot. You don't really read it from start to finish. It helps if someone's going to read all those citations because they definitely want to be able to skip that."
"The most beneficial aspect of Amazon Polly is its ability to convert text to speech in multiple languages."
"The best thing with Deepgram is they are continually evolving and doing a lot of market research, and they take feedback seriously."
"The speed of the solution for transcribing videos is good."
"The speed of the solution for transcribing videos is good."
"The solution's Speech-to-Text conversion feature is really awesome."
"The best features of Deepgram for me are the level of transcription accuracy it provides and the amount of time it saves."
"Deepgram is able to handle large volumes of audio data without compromising accuracy."
"The solution's most valuable feature is its speed of transcription, as it is one of the fastest tools, especially if you compare it to the second fastest solution that you can get, which is 20 times faster, so it is not just a marginally faster product."
"We have tracked a reduction of around 70% in the support cost and direct human interaction for support."
 

Cons

"The price could be better; I wish it weren't so expensive to do because it's really cool."
"Amazon Polly's standard text-to-speech feature could be enhanced to deliver more natural and expressive human-like speech."
"Another point is that Amazon Polly needs better hard phone capability compared to Cisco solutions, which easily connect with hard phones."
"The price could be better. I wish it weren't so expensive to do because it's really cool. I would love to see them have lexicon packages of them like, this is for lawyers, this is for accountants, and it's going to have a lot of things in it. I also think they could do a better job at showing use cases other than telemarketing or contact center stuff like bots that are very commercial. I know that's where the money is, but it's such a huge hole that's missing for people with disabilities that are even worse than mine. Some people cannot see or hear at all, but they're not just cognitively impaired."
"When you put more tags inside Amazon Polly to define break time and instruct the speech to be conversational, sometimes it gives you an error."
"To get to the solution, there are many steps to go through, such as setting up AWS, which is a lot of hops."
"We've had issues in the past where it generates the transcript, and a lot of the text is duplicated."
"Deepgram is currently restricted to only the English variants, but it should include other languages, such as German or French."
"The area of live transcription could be improved. Sometimes, Deepgram's WebSocket is disposed of due to redundancy."
"The traditional Speech-to-Text doesn't understand when the user is done speaking in bot conversations."
"Deepgram has a vast UI and a vast range of models, but there could be a simpler version for creating AI agents rather than providing a full-fledged platform for minimal use cases."
"Regarding improvements for Deepgram, I think the quality of the transcriptions could be enhanced, as the Spanish accent poses challenges, making it harder to transcribe some words, and considering additional accents from Chilean or Argentine speakers could improve the model's performance with local words."
"Deepgram is currently restricted to only the English variants, but it should include other languages, such as German or French."
"When I had an AI interview for coding, Deepgram didn't capture the names of programming languages or well-known LLMs accurately all the time."
 

Pricing and Cost Advice

"The price could be better. Neural voices are so realistic, and I want to say that they have it so that you can try to tell where the voice is coming from or something like that. But if I have more than one, it's so expensive to have to listen to a bunch of cases on my phone and have the neural voice read to me. It really wouldn't be worth it. It'd be paying probably more than what I make in the case. Right now, I'm on the free tier, and I think the number of minutes that you get is reasonable as long as you're not doing this all the time and you're using it judiciously. I have some credits that I think I can use, but I don't know how fast they'll go through."
"The solution has a pay-as-you-go pricing model, where you must pay according to your usage."
"The pricing is moderate."
"The solution’s pricing is cheap."
"Deepgram is a cheap solution."
"When using Deepgram, one needs to pay for the hours or minutes for which the transcription is needed."
893,221 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Comms Service Provider: 9%
Educational Organization: 8%
Media Company: 7%
Financial Services Firm: 7%

Educational Organization: 10%
Financial Services Firm: 8%
University: 8%
Construction Company: 8%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
By reviewers
Company Size          Count
Small Business        9
Midsize Enterprise    1
Large Enterprise      1
 

Questions from the Community

What is your experience regarding pricing and costs for Amazon Polly?
Amazon Polly uses a pay-as-you-go pricing model. The standard voice type costs around $4 per one million characters, while the neural voice type costs approximately $10. It is free for the first tw...
What needs improvement with Amazon Polly?
Amazon Polly's standard text-to-speech feature could be enhanced to deliver more natural and expressive human-like speech. New speaking styles, emotions, more languages, and advanced features could...
What is your primary use case for Amazon Polly?
We are using Amazon Polly to convert text into speech. It is being utilized to provide speech and voice messages to disabled users and also to deliver these speec...
What is your experience regarding pricing and costs for Deepgram?
My experience with pricing, setup cost, and licensing is that pricing is seamless and customizable as needed. Currently, we use the growth plan. For enterprise, they offer a higher tier, so it is c...
What needs improvement with Deepgram?
Deepgram has a vast UI and a vast range of models, but there could be a simpler version for creating AI agents rather than providing a full-fledged platform for minimal use cases. It could be multi...
What is your primary use case for Deepgram?
My main use case for Deepgram is creating voice agents to automate the customer support part and reply to FAQs and customer queries. Deepgram has multiple models, speech to text and text to speech ...
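
The Amazon Polly pricing answer above quotes a pay-as-you-go rate of around $4 per one million characters for standard voices and approximately $10 for neural voices. A minimal sketch of the resulting arithmetic follows. The rates are the reviewer's figures, not an official AWS price list, the function name `estimate_polly_cost` is hypothetical, and the free allowance is ignored because the answer is truncated.

```python
# Rates in USD per one million characters, as quoted by the reviewer.
# These are the review's approximate figures, not official AWS pricing.
REVIEWER_RATES_USD_PER_MILLION = {"standard": 4.0, "neural": 10.0}


def estimate_polly_cost(characters, voice_type="standard"):
    """Hypothetical helper: estimate pay-as-you-go cost, ignoring any free tier."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    rate = REVIEWER_RATES_USD_PER_MILLION[voice_type]
    # Cost scales linearly with the number of characters synthesized.
    return characters / 1_000_000 * rate


if __name__ == "__main__":
    for voice in ("standard", "neural"):
        cost = estimate_polly_cost(250_000, voice)
        print(f"{voice}: ${cost:.2f} for 250,000 characters")
```

Under these assumed rates, synthesizing the same text with a neural voice costs 2.5 times as much as with a standard voice, which is consistent with reviewers describing neural voices as the expensive option.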
 

Overview

 

Sample Customers

GoAnimate, Duolingo, Bandwidth
Information Not Available
Find out what your peers are saying about Amazon Polly vs. Deepgram and other solutions. Updated: April 2026.