
AI-Powered TTS Audio Player | Grayscale Investments
Shipped an AI audio player using OpenAI's TTS API, with smart caching to minimize costs.
Overview
Our CMO at Grayscale saw a text-to-speech feature on another site and liked the idea of letting users listen to articles rather than read them. She brought the idea to us, and I took the lead on researching solutions and defining how we'd implement it.
The result is an AI-powered audio player embedded on article pages across the website. Users can click play and hear the full article read aloud, with the audio generated by OpenAI's TTS model. I worked with one of my engineers to land on the optimal approach, then defined the specs for implementation while he built it out.
How It Works
When a user clicks play, the gpt-4o-mini-tts model generates audio from the article body (which is rich text in Sanity). The audio is output as an MP3 to keep file sizes down, then cached and stored via Vercel Blob.

Any subsequent user who plays that same article hears the cached audio, so we're not generating a new file every time someone hits play. The only times a new audio file gets generated are when an article is played for the first time, or when a CMS user updates the article body. In that case, we wipe the cached file and generate fresh audio the next time someone plays it.
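The cache-first flow above can be sketched as a small server-side handler. To be clear, this is an illustrative sketch, not our actual code: the function names, the revision-based cache key, and the `deps` shape are all assumptions standing in for the real Vercel Blob and OpenAI SDK calls.

```typescript
import { createHash } from "node:crypto";

// Illustrative cache key: it ties the stored MP3 to both the article and its
// content revision, so a CMS edit naturally resolves to a new, missing key
// (the stale file can then be cleaned up, e.g. by a webhook).
export function audioCacheKey(articleId: string, revision: string): string {
  const digest = createHash("sha256")
    .update(`${articleId}:${revision}`)
    .digest("hex")
    .slice(0, 16);
  return `article-audio/${articleId}-${digest}.mp3`;
}

// Sketch of the play endpoint (assumed names; error handling omitted):
// 1. check blob storage for cached audio; 2. on a miss, generate and store it.
export async function getArticleAudio(
  articleId: string,
  revision: string,
  plainText: string,
  deps: {
    head: (key: string) => Promise<{ url: string } | null>; // e.g. Vercel Blob head()
    put: (key: string, body: ArrayBuffer) => Promise<{ url: string }>; // e.g. Vercel Blob put()
    tts: (text: string) => Promise<ArrayBuffer>; // e.g. OpenAI TTS call returning MP3 bytes
  }
): Promise<string> {
  const key = audioCacheKey(articleId, revision);
  const cached = await deps.head(key);
  if (cached) return cached.url; // every later listener gets the cached file
  const mp3 = await deps.tts(plainText); // first play (or first play after an edit)
  const stored = await deps.put(key, mp3);
  return stored.url;
}
```

Keying the blob on a content revision means an article edit never serves stale audio: the old key simply stops matching, and the next play regenerates.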

We explored other options like ElevenLabs, but landed on OpenAI for a few reasons. Our company already used OpenAI, so we were able to set up a dedicated account and get an API key from our security team without friction. OpenAI's API also integrated cleanly with our React stack. And the gpt-4o-mini-tts model was the most cost-effective option that still delivered the quality we needed.
Player Design
The audio player is embedded into the article page template, sitting below the masthead and above the article body text.
I made the call to include "Listen to this Article" text above the player to eliminate any confusion about what it does. I also worked on the design of the player itself, keeping it simple and clean while staying in line with the rest of our design language. On the functionality side, I defined the player behavior with engineering, including scrubbing so users can skip to specific parts of the article rather than listening through the whole thing each time.
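The scrubbing behavior can be sketched with a couple of small helpers. These are hypothetical names for illustration only, assuming a standard HTML5 audio element and a 0–100 range slider; they are not our production player code.

```typescript
// Map a scrubber position (0–100) to a playback time in seconds,
// clamped so a drag past either end stays within the track.
export function scrubToSeconds(percent: number, duration: number): number {
  const clamped = Math.min(100, Math.max(0, percent));
  return (clamped / 100) * duration;
}

// Format seconds as m:ss for the player's elapsed/remaining labels.
export function formatTime(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(s / 60);
  const seconds = s % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Intended browser-side wiring (sketch):
//   slider.oninput = () => {
//     audio.currentTime = scrubToSeconds(Number(slider.value), audio.duration);
//   };
```

Clamping the scrub position keeps a fast drag from seeking past the end of the file, and the m:ss labels let users judge where in the article they're jumping to.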
Compliance
Given Grayscale's regulatory environment, we needed to work with compliance to ensure the tone and delivery of the generated audio accurately represented the intent behind our research articles. In short, the narration couldn't sound too "salesy." We validated the output with compliance before going live.
Funding & Cost Management
Before moving forward, I put together a formal request to our COO outlining the expected spend for this feature. Once approved, we set up spending guardrails in OpenAI to avoid any overages. This let us ship the feature confidently without worrying about runaway costs.
Additional Thoughts
I'm proud of how this project came together! I was able to research and land on a solution that was cost-effective, accomplished the ask from stakeholders, and integrated natively with our current tech stack. It felt like I hit all of the marks a PM should strive for on this one.