
OpenClaw Voice Agent Skills: How to Give Your AI Agent a Phone Number

Chris DiYanni · Founder & AI/ML Engineer

Your OpenClaw agent already handles Telegram messages, Slack threads, and WhatsApp chats around the clock. The vapi-voice-agent skill takes it one step further: a real phone number your agent can answer and call from, with natural conversational voice powered by Vapi's real-time STT/TTS infrastructure.

AI agents that live in chat windows are useful. AI agents that can pick up the phone are a different category of tool entirely. Giving your agent a phone number opens up use cases that text channels simply cannot cover: appointment confirmation calls that actually get answered, outbound lead follow-ups that feel personal, inbound customer support lines that never ring through to voicemail.

This guide covers how the vapi-voice-agent skill works, how to set it up on ClawTrust, and the practical use cases where voice genuinely outperforms text. We will also get specific about pricing so you can model the economics before you commit.

OpenClaw Voice: Give Your AI Agent a Real Phone Number

The vapi-voice-agent skill is pre-installed on every ClawTrust agent. Unlike most integrations that require you to hunt down a skill, install it, and debug configuration errors, voice capability is already there. You just need to connect it to a Vapi account with a phone number.

Once connected, your agent operates as a full phone participant:

  • Inbound calls: Someone calls your agent's number. Vapi answers, transcribes the caller's speech in real time, routes it to your OpenClaw agent, and speaks the agent's response back to the caller. The caller hears a natural voice, not a robotic IVR tree.
  • Outbound calls: Your agent can initiate calls proactively. Schedule an outbound call for 30 minutes before an appointment, trigger a follow-up call after a form submission, or run a batch of delivery confirmation calls overnight.
  • Call transcripts: Every call generates a full transcript that your agent can reference. Transcripts can be forwarded automatically to Telegram or Slack so your team sees what happened on every call.

The key insight is that your agent's intelligence does not change. The agent that handles your Slack DMs is the same agent answering the phone. It knows your business context, your customer data, and your processes. The only difference is that the input and output are speech instead of text.

For a broader look at the skills ecosystem, see our guide to OpenClaw built-in skills.

What Vapi Does: The Technology Behind OpenClaw Voice

Vapi is a voice AI infrastructure platform. Its job is to handle all the telephony complexity that would otherwise require a telecom engineering team to manage. When you use the vapi-voice-agent skill, you are not building a phone system from scratch. You are plugging into Vapi's production-grade infrastructure through a clean API.

Here is what happens on a typical inbound call:

  1. Caller dials your agent's Vapi phone number
  2. Vapi answers the call and opens a real-time audio stream
  3. A speech-to-text (STT) model converts the caller's speech to text, typically within 200-400 ms
  4. The transcribed text is sent to your OpenClaw agent as a message
  5. Your agent generates a response using its configured LLM (for example Claude or GPT-4o)
  6. A text-to-speech (TTS) model converts the response to audio
  7. Vapi streams the audio back to the caller

The entire loop from caller speech to agent response typically runs in under one second for short turns, which makes the conversation feel natural rather than robotic. Vapi's infrastructure is optimized for this latency target; a pipeline assembled from generic cloud services usually adds enough delay for callers to notice.
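The seven steps above can be sketched as a single conversational turn with per-stage timing. This is only an illustration of the loop's shape, not Vapi's implementation: `run_turn` and the three `fake_*` stages are hypothetical stand-ins, so the sketch runs without any external service.

```python
import time
from typing import Callable


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    agent: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> tuple[bytes, dict[str, float]]:
    """Run one caller turn and record how long each stage took, in seconds."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    text = stt(audio)                 # step 3: caller speech -> text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = agent(text)               # steps 4-5: agent produces a reply
    timings["agent"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = tts(reply)               # step 6: reply text -> audio
    timings["tts"] = time.perf_counter() - start

    timings["total"] = timings["stt"] + timings["agent"] + timings["tts"]
    return speech, timings


# Hypothetical stand-in stages; a real deployment would call actual providers.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


speech, timings = run_turn(b"what are your hours", fake_stt, fake_agent, fake_tts)
```

Timing each stage separately makes it easier to see which layer is consuming the latency budget when a call feels sluggish.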

Vapi supports multiple voice and speech providers at each layer:

| Layer | Supported Providers | Best For |
| --- | --- | --- |
| Speech-to-Text | Deepgram, OpenAI Whisper, Google | Deepgram recommended for lowest latency |
| LLM | Your OpenClaw agent's configured model | Use the same model as your other channels |
| Text-to-Speech | ElevenLabs, OpenAI TTS, Azure, Deepgram Aura | ElevenLabs for highest quality, OpenAI for balance of speed and quality |
| Telephony | Vapi (Twilio/Vonage backend), BYOC (Bring Your Own Carrier) | Vapi numbers are simplest to start with |

Vapi also manages call routing, SIP trunking, phone number provisioning, call recording (optional), voicemail detection, and post-call webhooks. The vapi-voice-agent skill exposes all of this to your OpenClaw agent through a clean set of tools the agent can call during or after conversations.

Inbound Calls: Your Agent Answers the Phone

Inbound call handling is the most immediately useful capability for most businesses. Your agent gets a dedicated phone number. When that number rings, the agent picks up.

There is no queue. There is no hold music. There is no "your call is important to us." The agent answers immediately, every time, regardless of how many calls are coming in simultaneously. A hundred customers calling at once is handled the same as one.

What your agent can do on an inbound call:

  • Answer common questions using its knowledge base and context
  • Look up account or order status by asking for identifying information
  • Book appointments directly using the cal-com-scheduling skill
  • Qualify leads and route to appropriate follow-up sequences
  • Take messages and route them to a human via Telegram or Slack
  • Escalate to a human by transferring the call when the situation requires it

Call transcripts are generated automatically. You can configure the agent to forward transcript summaries to your Slack or Telegram channel after each call, so you always know what your callers needed even when the agent handled it without your involvement.

For businesses that currently direct customers to a contact form or a support email, an inbound voice line eliminates the friction that causes customers to give up and call a competitor instead. Phone calls have significantly higher completion rates than form submissions for urgent issues.

Outbound Calls: Your Agent Makes the Calls

Outbound calling is where voice agents generate the most measurable business impact. Your agent can proactively reach out to customers, leads, or contacts based on events, schedules, or triggers.

The mechanism is straightforward: your agent calls the vapi-voice-agent skill's outbound call tool with a phone number, a script or context, and optionally a scheduled time. Vapi handles the dialing, connects the call when answered, and runs the conversation.
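As a rough sketch of what such a request might carry, the three inputs named above (phone number, context, optional scheduled time) can be modeled and validated before anything is dialed. The `OutboundCallRequest` class and its field names are hypothetical and are not the skill's or Vapi's actual schema; the E.164 check is a common convention for phone numbers, assumed here.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# E.164 format: "+" followed by up to 15 digits, first digit non-zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


@dataclass(frozen=True)
class OutboundCallRequest:
    """Hypothetical request shape; field names are illustrative only."""
    phone_number: str
    context: str
    scheduled_for: Optional[datetime] = None

    def __post_init__(self) -> None:
        if not E164.match(self.phone_number):
            raise ValueError(f"not an E.164 number: {self.phone_number!r}")
        if not self.context.strip():
            raise ValueError("context must not be empty")
        if self.scheduled_for is not None and self.scheduled_for.tzinfo is None:
            raise ValueError("scheduled_for must be timezone-aware")

    def to_payload(self) -> dict:
        """Serialize to a plain dict, omitting the schedule for immediate calls."""
        payload = {"phone_number": self.phone_number, "context": self.context}
        if self.scheduled_for is not None:
            payload["scheduled_for"] = (
                self.scheduled_for.astimezone(timezone.utc).isoformat()
            )
        return payload


request = OutboundCallRequest("+15555550123", "Confirm tomorrow's appointment")
```

Rejecting naive datetimes up front avoids a call scheduled for "2:00 PM" landing in the wrong time zone.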

Common outbound use cases include:

  • Appointment confirmations: "Call the patient at 2:00 PM the day before their appointment to confirm and handle rescheduling if needed"
  • Lead follow-ups: "Call leads who submitted the contact form but have not responded to the follow-up email after 48 hours"
  • Order and delivery updates: "Call customers when their order ships and let them know the expected delivery window"
  • Subscription renewals: "Call customers 3 days before their subscription expires if they have not renewed online"
  • Post-service feedback: "Call customers 24 hours after a service call to ask if everything was resolved"

Voicemail handling is built in. If the call goes to voicemail, the agent detects it and can leave a pre-configured message rather than hanging up silently. You can set different messages based on the context of the outbound call.
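A minimal sketch of context-dependent voicemail messages follows. It assumes the detection result arrives as a simple string (`"voicemail"` or anything else); that value, the context keys, and the message wording are all hypothetical, chosen only to show the lookup-with-fallback pattern.

```python
from typing import Optional

# Hypothetical per-context messages; the keys and wording are illustrative.
VOICEMAIL_MESSAGES = {
    "appointment_confirmation": (
        "Hi, this is Alex from Acme calling to confirm your appointment. "
        "Please call us back to confirm or reschedule."
    ),
    "lead_follow_up": (
        "Hi, this is Alex from Acme following up on your inquiry. "
        "Call us back whenever it is convenient."
    ),
}
DEFAULT_VOICEMAIL = "Hi, this is Alex from Acme. Sorry we missed you."


def next_action(answered_by: str, call_context: str) -> tuple[str, Optional[str]]:
    """Decide what to do once the call connects.

    Returns ("leave_message", text) for voicemail, ("converse", None) otherwise.
    """
    if answered_by == "voicemail":
        message = VOICEMAIL_MESSAGES.get(call_context, DEFAULT_VOICEMAIL)
        return "leave_message", message
    return "converse", None
```

The default message means an unrecognized context still leaves a voicemail instead of hanging up silently.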

Outbound voice calls have measurably higher engagement rates than email or SMS for time-sensitive communications. Appointment reminder calls reduce no-show rates by 30-50% compared to email reminders alone, which directly impacts revenue for service businesses.

Voice Persona and Configuration

Your agent's phone presence can be configured independently of its text channel behavior. This matters because a good phone agent has different requirements than a good chat agent. Phone conversations need a natural greeting, a clear introduction, and handling for awkward silences and interruptions.

Configuration options through the vapi-voice-agent skill include:

  • Agent name: What the agent introduces itself as on calls. This can differ from the agent's name in your Slack workspace.
  • Greeting script: The opening line the agent speaks when a call connects. "Thanks for calling Acme Support, this is Alex. How can I help you today?" Keep it short and natural.
  • Voice selection: Choose from ElevenLabs custom voices (highest quality, supports custom voice cloning), OpenAI TTS voices (fast and natural, good default choice), Azure Cognitive Services voices, and Deepgram Aura voices.
  • Maximum call duration: Set a hard cap on call length to control costs and prevent edge cases where a call runs indefinitely.
  • Escalation paths: Define when and how the agent should transfer to a human. "If the caller asks to speak to a manager, transfer to [number]" or "if the issue requires account access I do not have, take a message and escalate to Slack."
  • Fallback behavior: What the agent does when it does not understand the caller or cannot resolve the issue.
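The options above can be pictured as one small configuration object with a sanity check. This is a sketch only: `VoicePersona`, its field names, its defaults, and the 25-word greeting threshold are assumptions made for illustration, not the skill's real configuration format.

```python
from dataclasses import dataclass
from typing import Optional

# Provider names taken from the list above; the identifiers are illustrative.
SUPPORTED_TTS = {"elevenlabs", "openai", "azure", "deepgram"}


@dataclass
class VoicePersona:
    """Hypothetical container for the phone-specific settings."""
    agent_name: str
    greeting: str
    tts_provider: str = "openai"
    voice_id: str = "default"
    max_call_minutes: int = 10
    escalation_number: Optional[str] = None
    fallback: str = "take_message"

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the config looks sane."""
        issues = []
        if not self.agent_name.strip():
            issues.append("agent_name is empty")
        if self.tts_provider not in SUPPORTED_TTS:
            issues.append(f"unsupported tts_provider: {self.tts_provider}")
        if self.max_call_minutes <= 0:
            issues.append("max_call_minutes must be positive")
        # Assumed threshold: long greetings feel scripted on the phone.
        if len(self.greeting.split()) > 25:
            issues.append("greeting is longer than 25 words")
        return issues


persona = VoicePersona(
    agent_name="Alex",
    greeting="Thanks for calling Acme Support, this is Alex. How can I help you today?",
)
```

Returning a list of problems rather than raising on the first one lets you fix every configuration mistake in a single pass.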

For customer-facing deployments, ElevenLabs custom voice cloning lets you create a voice persona that is consistent with your brand. You provide 30-60 seconds of sample audio and ElevenLabs generates a unique voice model. This is the highest-quality option and produces the most natural-sounding calls.

Real-World Use Cases for OpenClaw Voice Agents

Appointment and Booking Confirmation Calls

Service businesses lose significant revenue to no-shows: a patient who forgets a dental appointment, a client who double-booked a meeting, a customer who forgot about an installation window. Reminder calls are the most effective intervention, but most businesses cannot afford to staff them.

An OpenClaw voice agent solves this directly. Connect it to your cal-com-scheduling skill and configure a trigger: 24 hours before every appointment, the agent calls the customer, confirms the appointment, and handles rescheduling requests in real time. The customer speaks naturally, the agent updates the calendar, and your no-show rate drops without adding headcount.
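The 24-hour trigger reduces to simple date arithmetic. The sketch below assumes appointments are available as a mapping from an ID to a timezone-aware start time; `reminder_time` and `due_reminders` are hypothetical helper names, not part of the cal-com-scheduling skill.

```python
from datetime import datetime, timedelta, timezone


def reminder_time(appointment: datetime,
                  lead: timedelta = timedelta(hours=24)) -> datetime:
    """When the reminder call should go out for a given appointment."""
    if appointment.tzinfo is None:
        raise ValueError("appointment must be timezone-aware")
    return appointment - lead


def due_reminders(appointments: dict[str, datetime], now: datetime) -> list[str]:
    """IDs of upcoming appointments whose reminder time has already arrived."""
    return sorted(
        appt_id
        for appt_id, when in appointments.items()
        if reminder_time(when) <= now < when
    )


now = datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc)
appointments = {
    "a": now + timedelta(hours=23),   # inside the 24-hour window: due
    "b": now + timedelta(hours=25),   # reminder time not reached yet
    "c": now - timedelta(hours=1),    # already in the past: skipped
}
due = due_reminders(appointments, now)
```

A real trigger would also need to remember which reminders were already placed so the same customer is not called twice.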

Inbound Customer Support Line

For small businesses, a professional customer support phone line has historically required either hiring staff or paying for a call center service. Neither is economical below a certain volume threshold. An OpenClaw voice agent changes the math entirely.

Configure your agent with your product documentation, common support scenarios, and escalation rules. It handles the routine inquiries (order status, hours of operation, basic troubleshooting) instantly and routes genuinely complex issues to a human via Slack notification with the full call transcript attached.

Lead Qualification and Follow-up

Marketing teams spend significant effort generating leads that then sit in a CRM going cold because sales follow-up is slow. An OpenClaw voice agent can call new leads within minutes of form submission, qualify them with a short conversation, and either book a demo directly (via the scheduling skill) or flag high-intent leads for immediate human follow-up.

The speed matters. Lead conversion rates drop dramatically after 5 minutes of delay. An agent that calls within 2 minutes of form submission is working in a different conversion window than a human sales rep who returns calls the next morning.

Delivery and Order Status Updates

E-commerce and logistics businesses field enormous volumes of "where is my order?" inquiries. An outbound voice agent can proactively call customers at key order milestones, reducing inbound inquiry volume while improving the customer experience. A 30-second call confirming delivery is more satisfying than a tracking email that requires the customer to click through to a carrier site.

Survey and Feedback Collection

Voice surveys have substantially higher completion rates than email surveys. An OpenClaw voice agent can conduct post-purchase or post-service interviews, ask structured questions, capture open-ended responses, and log results automatically. Because the agent conducts a real conversation rather than reading through a scripted list, customers provide richer and more honest feedback than they would in a form.

Setting Up Vapi with OpenClaw

Setup takes about 20 minutes from start to first call. Here is the full process:

  1. Create a Vapi account. Go to vapi.ai and sign up. Vapi offers a free tier with usage credits to test with before you commit to a paid plan.
  2. Purchase a phone number through Vapi. In the Vapi dashboard, navigate to Phone Numbers and provision a number. US numbers cost approximately $2/month. You can choose a local area code or a toll-free number depending on your use case.
  3. Generate a Vapi API key. In the Vapi dashboard, go to API Keys and create a new key. Copy it immediately since you will not see it again in full after creation.
  4. Add the API key to your ClawTrust credentials vault. In your ClawTrust dashboard, open the Credentials section for your agent and add the Vapi API key. It is stored encrypted in the credentials vault. Your agent accesses it through the clawtrust-credentials skill proxy so the raw key is never exposed in your agent's environment.
  5. Configure the vapi-voice-agent skill. The skill is already installed. In your agent's skill configuration, set the assistant name, greeting script, voice provider and voice ID, maximum call duration, and escalation rules. Start with OpenAI TTS for simplicity, then switch to ElevenLabs once you want to customize the voice further.
  6. Test with an inbound call. Call your new Vapi number from your own phone. You should hear your configured greeting and be able to have a natural conversation with your agent. Speak naturally and verify the agent responds appropriately and the transcript appears in your dashboard.


Voice Agent Pricing: What It Costs to Run

Vapi's pricing model is usage-based. You pay per minute of active call time. The per-minute rate depends on which providers you use at each layer, but a realistic all-in estimate is $0.05 to $0.20 per minute covering STT, LLM inference, TTS, and telephony.

Here is how that maps to real usage scenarios:

| Scenario | Call Volume | Avg Duration | Cost at $0.10/min | Cost at $0.20/min |
| --- | --- | --- | --- | --- |
| Appointment reminders | 100/mo | 2 min | $20/mo | $40/mo |
| Inbound support line | 200/mo | 5 min | $100/mo | $200/mo |
| Lead follow-up calls | 50/mo | 4 min | $20/mo | $40/mo |
| Single support call | 1 | 5 min | $0.50 | $1.00 |
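Each figure in the table is calls × average minutes × per-minute rate. A short sketch makes it easy to plug in your own volumes; `monthly_cost` is a hypothetical helper, and the two rates are the article's illustrative $0.10 and $0.20 rather than quoted Vapi prices.

```python
def monthly_cost(calls: int, avg_minutes: float, rate_per_minute: float) -> float:
    """Usage cost in dollars: calls x average minutes x per-minute rate."""
    return round(calls * avg_minutes * rate_per_minute, 2)


# (calls per month, average call length in minutes), taken from the table.
SCENARIOS = {
    "Appointment reminders": (100, 2),
    "Inbound support line": (200, 5),
    "Lead follow-up calls": (50, 4),
    "Single support call": (1, 5),
}

for name, (calls, minutes) in SCENARIOS.items():
    low = monthly_cost(calls, minutes, 0.10)
    high = monthly_cost(calls, minutes, 0.20)
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

The phone number rental is not included in this calculation; it is a flat monthly charge on top of usage.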

Compare those numbers to the human alternative. A human support agent costs $15-25 per hour, which works out to $0.25-0.42 per minute of actual talk time (not counting time spent on tasks between calls). At $0.10-0.20 per minute, AI voice calls are 1.25x to 4x cheaper per minute of conversation, and the AI agent handles unlimited concurrent calls without any additional cost.
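The comparison can be checked with a few lines of arithmetic. The sketch assumes, as the paragraph does, that every paid minute of a human agent's time is talk time, which is the most favorable case for the human.

```python
def per_minute(hourly_wage: float) -> float:
    """Convert an hourly wage to a per-minute cost."""
    return hourly_wage / 60


human_low, human_high = per_minute(15), per_minute(25)   # $0.25 and about $0.42
ai_low, ai_high = 0.10, 0.20                             # article's per-minute range

# Narrowest gap: cheapest human against the priciest AI rate.
smallest_ratio = human_low / ai_high
# Widest gap: priciest human against the cheapest AI rate.
largest_ratio = human_high / ai_low

print(f"Human: ${human_low:.2f} to ${human_high:.2f} per minute")
print(f"AI is {smallest_ratio:.2f}x to {largest_ratio:.2f}x cheaper per minute")
```

Because human agents are also paid for idle time between calls, the real gap per minute of conversation is likely wider than these ratios suggest.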

The phone number itself is a separate cost: approximately $2 per month per number through Vapi. This is essentially a rounding error in the overall cost model.

For businesses currently paying for a call center service or a dedicated receptionist, the economics are not marginal. They are transformative. A receptionist handling inbound calls 40 hours per week at $20/hour costs about $3,200 per month (40 × $20 × 4 weeks). An OpenClaw voice agent covering a comparable inbound line typically costs $100-400 per month, depending on call volume and duration, plus the ClawTrust subscription.

OpenClaw Voice vs Text Channels: When to Use Each

Voice is not universally better than text. The right channel depends on the situation, the customer, and what you are trying to accomplish. Here is a practical guide to when each channel wins:

| Scenario | Best Channel | Why |
| --- | --- | --- |
| Quick question or single task | Text (Telegram/Slack) | Faster, asynchronous, no interruption to either party |
| Complex multi-step workflow | Text | Easier to track, copy-pasteable results, reviewable history |
| Appointment confirmations | Voice (outbound) | Higher answer and response rate than email or SMS reminders |
| Inbound customer support | Voice or text (customer preference) | Depends on customer demographics; offer both when possible |
| Real-time urgent updates | Voice (outbound) | Phone calls get attention; emails and texts get ignored in urgent situations |
| International customers | Text | No international calling complexity, works in any time zone without intrusion |
| Sharing documents or data | Text | Files, links, and structured data do not translate to voice |
| Sales qualification with warm leads | Voice (outbound) | Conversational voice calls convert leads faster than back-and-forth email |
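If you want the agent to apply this table mechanically, it reduces to a lookup with one special case for customer preference. The scenario keys, channel labels, and the choice of text as the default for unlisted scenarios are all assumptions made for this sketch.

```python
# Hypothetical scenario keys mirroring the rows of the table above.
BEST_CHANNEL = {
    "quick_question": "text",
    "multi_step_workflow": "text",
    "appointment_confirmation": "voice_outbound",
    "inbound_support": "customer_preference",
    "urgent_update": "voice_outbound",
    "international_customer": "text",
    "document_sharing": "text",
    "warm_lead_qualification": "voice_outbound",
}


def pick_channel(scenario: str, customer_prefers: str = "text") -> str:
    """Return the channel for a scenario, deferring to the customer where the table says so."""
    choice = BEST_CHANNEL.get(scenario, "text")  # assumed default: text is least intrusive
    if choice == "customer_preference":
        return customer_prefers
    return choice
```

For example, `pick_channel("inbound_support", customer_prefers="voice")` returns `"voice"`, while an unlisted scenario falls back to `"text"`.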

The best-performing agent deployments use voice and text together. Text channels handle ongoing task management, asynchronous requests, and document-heavy workflows. Voice handles time-sensitive outbound communications and inbound callers who prefer not to navigate a chat interface.

Your OpenClaw agent on ClawTrust handles both from a single configuration. The same agent that reads your Telegram messages also books calendar appointments and calls customers to confirm them. That coordination is what makes a voice-enabled agent substantially more capable than a standalone phone bot.

For more on skills that work well alongside voice, see our guides on OpenClaw built-in skills and OpenClaw scheduling skills. Combining the vapi-voice-agent skill with cal-com-scheduling is the most common high-value configuration for service businesses.

Frequently Asked Questions

Can OpenClaw make and receive phone calls?

Yes. The vapi-voice-agent skill integrates OpenClaw with Vapi, a voice AI infrastructure platform. Your agent gets a real phone number, can handle inbound calls (customer support, appointment booking, FAQs), and make outbound calls (appointment reminders, lead follow-ups, order confirmations). Call transcripts are automatically saved and can be forwarded to your Telegram or Slack channel.

What is Vapi and how does it work with OpenClaw?

Vapi is a voice AI infrastructure platform that handles telephony, speech-to-text, and text-to-speech for AI agents. When someone calls your agent's number, Vapi converts their speech to text, sends it to your OpenClaw agent, gets a response, and converts that response back to speech in real time. The vapi-voice-agent skill connects OpenClaw to Vapi's API.

How much does it cost to run voice calls with OpenClaw?

Vapi charges per minute of active call time. Total cost typically ranges from $0.05 to $0.20 per minute all-inclusive (speech-to-text, LLM inference, text-to-speech, and telephony). A 5-minute support call costs roughly $0.25-$1.00. Compared to human agents at $15-25/hr, AI voice calls are significantly cheaper at scale.

What voice options are available for OpenClaw agents?

The vapi-voice-agent skill supports multiple text-to-speech providers: ElevenLabs (highest quality, custom voice cloning), OpenAI TTS (fast, natural), Azure Cognitive Services, and Deepgram Aura. You can configure a unique voice persona for your agent's phone presence, independent of its text channel persona.

Can I use OpenClaw for customer support phone calls?

Yes. OpenClaw agents with the vapi-voice-agent skill can handle inbound customer support calls 24/7. The agent answers calls, handles common questions, takes messages, books appointments, and escalates to human agents when needed.
