Creating a Voicemail Greeting with OpenAI Text to Speech - OpenAI Blog

Yesterday, I was setting up a new VoIP phone number and I decided that I wanted a professional-sounding voicemail greeting to prompt callers to leave a message. In the past, I’ve used services that let you pay a voice actor to read your voicemail greeting. But hiring a human to read some text seems perverse in this new era of artificial intelligence.

In this blog post, I show how you can use the OpenAI Text to Speech endpoint to generate a voicemail greeting. Here’s what the final recording sounds like:

This whole project should take you less than 15 minutes.

Setting Everything Up

Ensure that you have your OpenAI project set up correctly by following the instructions in my previous blog post:

Create Your First OpenAI and NodeJS App in 15 Minutes

Calling the OpenAI Text to Speech API and Generating an MP3 File

Here’s the code that I used to generate the voicemail greeting:

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

// create path where sound file will be outputted
const speechFile = path.resolve("./greeting.mp3");

async function main() {
  // call the OpenAI speech endpoint
  const mp3 = await openai.audio.speech.create({
    model: "tts-1-hd",
    voice: "nova",
    input: `Hello and thank you for calling. I can’t answer your call at the moment, however if you 
    leave your name and number then I’ll get back to you as soon as possible.`,
  });

  // read the full audio response into memory, then write it to the file on disk
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}

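// run main() and report any API or file system error
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});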

All of the work happens in the main() function. This function does two things. First, it calls the openai.audio.speech.create() method to convert the text into an audio file:

// call the OpenAI speech endpoint
const mp3 = await openai.audio.speech.create({
  model: "tts-1-hd",
  voice: "nova",
  input: `Hello and thank you for calling. I can’t answer your call at the moment, however if you 
    leave your name and number then I’ll get back to you as soon as possible.`,
});

In the code above, the following three arguments are passed to this method:

model – The name of an OpenAI Text to Speech model. You can use either the tts-1 or the tts-1-hd model. The tts-1-hd model produces higher-quality audio, but it is slower.
voice – The name of the voice to use. The options are alloy, echo, fable, onyx, nova, and shimmer.
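input – The text to convert to audio. The text cannot be longer than 4,096 characters. In the code above, backticks (`) delimit a template literal, which lets the text span multiple lines. Note that the line break and the indentation on the second line become part of the string.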
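
If you would rather not send the line break and indentation to the API, or you want to catch over-long text before making a request, a small helper can tidy the input first. The following is only a sketch: prepareInput and MAX_INPUT_CHARS are hypothetical names of my own and are not part of the OpenAI library. It also assumes that JavaScript's string length is a close enough measure of the 4,096 character limit, which may not hold exactly for text containing emoji or other unusual characters.

```javascript
// Hypothetical helper (not part of the OpenAI SDK): tidy up text written as a
// multi-line template literal and check it against the 4,096-character limit.
const MAX_INPUT_CHARS = 4096;

function prepareInput(text) {
  // Collapse every run of whitespace (line breaks, indentation) into one space.
  const cleaned = text.replace(/\s+/g, " ").trim();
  if (cleaned.length === 0) {
    throw new Error("Input text is empty.");
  }
  // Assumption: .length (UTF-16 code units) approximates the API's character count.
  if (cleaned.length > MAX_INPUT_CHARS) {
    throw new RangeError(
      `Input is ${cleaned.length} characters; the limit is ${MAX_INPUT_CHARS}.`
    );
  }
  return cleaned;
}

const greeting = prepareInput(`Hello and thank you for calling. I can’t answer your call at the moment, however if you 
    leave your name and number then I’ll get back to you as soon as possible.`);

// The cleaned text is a single line with single spaces between words.
console.log(greeting);
```

You could then pass greeting as the input argument instead of the raw template literal.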

Because the openai.audio.speech.create() method calls an API behind the scenes, the method is asynchronous and it returns a promise instead of the actual audio file. Awaiting that promise yields a response object. The following code reads the audio data from that response into a buffer and then writes the buffer to a file on your hard drive:

  // read the full audio response into memory, then write it to the file on disk
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
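
To see what the Buffer.from() step does on its own, here is a minimal sketch that runs without calling the API. The helper name toNonEmptyBuffer is hypothetical, and the three bytes are stand-in data rather than real audio. The empty-data check is my own addition, intended to avoid writing a zero-byte greeting.mp3 file.

```javascript
// Hypothetical helper: wrap an ArrayBuffer in a Node.js Buffer, refusing empty data.
function toNonEmptyBuffer(arrayBuffer) {
  if (arrayBuffer.byteLength === 0) {
    throw new Error("No audio data received.");
  }
  // Buffer.from(arrayBuffer) creates a view over the same memory; it does not copy it.
  return Buffer.from(arrayBuffer);
}

// Stand-in bytes for illustration only (the ASCII characters "ID3"), not real audio.
const fakeAudio = new Uint8Array([0x49, 0x44, 0x33]).buffer;
const buffer = toNonEmptyBuffer(fakeAudio);

console.log(buffer.length); // 3
```

In the real script, the ArrayBuffer would come from await mp3.arrayBuffer(), and the resulting buffer would be passed to fs.promises.writeFile() as shown above.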

Summary

The goal of this blog post was to illustrate how you can easily convert text to speech by using the OpenAI Speech endpoint. This was a really simple application of this amazing technology but I hope that this introduction was enough to get you started on more advanced applications.