YouTube Shorts With ElevenLabs And InfiniteTalk, A Beginner’s Workflow For Solo Founders
Turn short scripts into avatar-style Shorts in under an hour, with captions, simple visuals, and one clear CTA.

What you will make
A 60-second YouTube Short where a talking avatar explains one useful idea. The voice is generated with ElevenLabs. The lip-sync video is created with InfiniteTalk from a single image. You will add big, readable captions, a simple background or b-roll, and one call to action that points viewers to your link.
Time: about 60 minutes
Budget: free or low cost to start
Tools you need
- ElevenLabs for realistic voiceovers
- InfiniteTalk for talking-avatar video: upload an image and your voice file, get a lip-synced clip
- Editor, desktop: Descript for text-based editing and captions
- Editor, mobile: CapCut for fast mobile editing
- Optional: Canva for thumbnails
- Optional: Audacity for quick audio joins
Step 1) Write a spoken-language script, 100–150 words
Keep it simple. One idea, one outcome, under 60 seconds.
Rules for spoken flow
- Use short sentences and contractions. Write how you talk.
- Use you and I so it feels personal.
- Read it out loud. If you stumble, rewrite.
- Add commas to signal pauses.
- End with a one-line CTA that tells people what to do.
Template, copy-paste
Hook: One sentence that names the pain or promise.
Value: Two or three sentences that teach one tip or show a simple step.
Proof or example: One short line that shows it working.
CTA: One action, “Link in description or pinned comment.”
Example, cold email timing
Most cold emails fail because they hit at the wrong moment.
Here's a 10-minute fix.
Find companies that just raised funding, reference their round, then offer one clear win you can deliver in 14 days.
I book 2–3 calls a week like this.
Want the query and message template? Link in description.
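Before you generate audio, you can sanity-check a draft against the numbers in this step. The sketch below is not an official tool. It assumes a speaking pace of 150 words per minute, which you should adjust for the voice you pick, and it reuses the under-14-words sentence rule from the prompt style later in this guide. The function and constant names are made up for illustration.

```python
import re

WORDS_PER_MINUTE = 150    # assumed speaking pace, adjust for your chosen voice
SENTENCE_WORD_LIMIT = 14  # sentences should stay under this many words


def check_script(script: str) -> dict:
    """Return word count, estimated seconds, and any sentences that run long."""
    words = script.split()
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    long_sentences = [s for s in sentences if len(s.split()) >= SENTENCE_WORD_LIMIT]
    return {
        "words": len(words),
        "estimated_seconds": round(len(words) / WORDS_PER_MINUTE * 60, 1),
        "in_word_range": 100 <= len(words) <= 150,
        "long_sentences": long_sentences,
    }


if __name__ == "__main__":
    draft = "Most cold emails fail because they hit at the wrong moment. Here's a 10-minute fix."
    print(check_script(draft))
```

If long_sentences is not empty, split those lines before you paste the script into ElevenLabs. Treat the time estimate as a rough guide, the real length depends on the voice and your punctuation.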
Step 2) Generate your voice with ElevenLabs
- Paste your script into ElevenLabs
- Pick a friendly preset voice, preview, and listen for flow
- Tweak punctuation for natural pauses, regenerate if needed
- Download as MP3 or WAV. Name it clearly, for example short_narration_v1.mp3
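If you later want to skip the copy-paste, ElevenLabs also offers an HTTP API. The sketch below is an optional extra, not part of the beginner workflow above. It assumes the v1 text-to-speech endpoint, the xi-api-key header, and the eleven_multilingual_v2 model ID, so check the current ElevenLabs API reference before relying on any of them. The voice ID and API key are placeholders you supply, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint shape; confirm against the current ElevenLabs API reference.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    body = json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_narration(request: urllib.request.Request,
                   out_path: str = "short_narration_v1.mp3") -> str:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

Building the request separately from sending it lets you inspect the URL and body first, which is handy when a call fails.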
Optional, two-voice format
If you want a quick Q&A, write one listener question as a single sentence. Generate a second voice for that line as a separate file.
Pronunciation tip
If the AI says a name oddly, spell it phonetically in the script, for example “gal-uh-LAY-oh” for Galileo.
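If you reuse the same tricky names across many Shorts, a small lookup saves retyping. This is a minimal sketch. The respelling shown is only an example to tune by ear, and the function name is made up for illustration.

```python
import re

# Example respellings only; tune each one by ear in the voice preview.
RESPELLINGS = {
    "Galileo": "gal-uh-LAY-oh",
}


def apply_respellings(script: str, respellings: dict = RESPELLINGS) -> str:
    """Swap whole-word matches for their phonetic spelling."""
    for word, spoken in respellings.items():
        script = re.sub(rf"\b{re.escape(word)}\b", spoken, script)
    return script


if __name__ == "__main__":
    print(apply_respellings("Galileo looked up, and so should you."))
```

Use the respelled text only for the voice. Keep the original script for your captions so viewers see the correct spelling.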
Reference: ElevenLabs Text-to-Speech guides, the product guide and the TTS overview (links under References)
Step 3) Create a talking-avatar clip with InfiniteTalk
Turn a static image into a lip-synced talking head that matches your ElevenLabs audio.
- Pick a clean headshot or a simple avatar image, front-facing works best
- Upload your image to InfiniteTalk
- Upload your voice file from Step 2
- Choose language and style if prompted, then generate
- Download the resulting video clip. Keep it under 60 seconds
Q&A variation
If you made a separate listener question, create a second avatar or a different crop of the same avatar for the question line. Generate two short clips, then alternate them in the edit for a mini dialogue.
Reference: InfiniteTalk overview and generator, the homepage and the create page (links under References)
Step 4) Assemble the Short in Descript or CapCut
Descript path, desktop
- Create a new Video project in Descript
- Set canvas to 1080×1920 vertical
- Import your InfiniteTalk clip. If you have two clips, place them in order on the timeline
- Turn on Fancy Captions from the transcript. Make the font large and readable
- Add one visual layer if you want, a subtle gradient background or a looping b-roll behind the avatar
- Place your logo in a corner
- Keep total length under 60 seconds. Export as MP4
Reference: Descript feature pages for captions and the vertical clip maker (links under References)
CapCut path, mobile
- New project, 9:16 format in CapCut
- Import your InfiniteTalk clip
- Use Auto Captions, then edit phrasing for clarity
- Add simple backgrounds, stickers, or b-roll if you like
- Place your logo and a “link in description” arrow near the end
- Export as 1080×1920 MP4
Step 5) Captions, visuals, and branding that work on a phone
- Captions: Big, high contrast, 1–2 lines at a time. Highlight one keyword per line
- Safe zones: Keep text away from the very top and bottom. The YouTube UI can cover it
- Visuals: Simple background or a single loop of b-roll. Clean beats clutter almost every time
- Branding: Small logo in a corner, consistent accent color, no distraction
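To preview how your script breaks into 1–2 line captions before you open an editor, you can wrap it in code. This is a rough sketch. The 22-character line width is an assumption for large text on a 1080-pixel-wide frame, not a Descript or CapCut setting, and the function name is made up for illustration.

```python
import textwrap

MAX_CHARS_PER_LINE = 22    # assumed width for big text on a 1080 px frame
MAX_LINES_PER_CAPTION = 2  # 1-2 lines on screen at a time


def caption_cards(script: str) -> list[str]:
    """Break a script into caption cards of at most two short lines each."""
    lines = textwrap.wrap(script, width=MAX_CHARS_PER_LINE)
    return [
        "\n".join(lines[i:i + MAX_LINES_PER_CAPTION])
        for i in range(0, len(lines), MAX_LINES_PER_CAPTION)
    ]


if __name__ == "__main__":
    for card in caption_cards("Most cold emails fail because they hit at the wrong moment."):
        print(card)
        print("---")
```

If a card splits a phrase awkwardly, that is a hint to shorten the sentence in the script itself.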
Step 6) Publish as a YouTube Short
Title: Lead with the hook. “Cold emails fail for this reason, fix it in 10 minutes.”
Description: First line repeats the promise, then add your link
Hashtags: Add #shorts and 2–3 relevant tags
Pinned comment: Repeat the CTA with the link, for example “Get the template here, [your link]”
Important note: viewers cannot tap a link inside the video frame. Put your link in the description or a pinned comment for visibility, then point to it with on-screen text like “link below.” See YouTube Help, Sharing links with your audiences.
Thumbnail: Bold text, big contrast, your face or avatar if it fits your brand. Make it quickly in Canva
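If you publish often, you can assemble the description the same way every time. The sketch below follows the layout above: promise first, then the link, then #shorts plus 2–3 tags. The function names are made up for illustration, and the 100-character title cap is YouTube's limit at the time of writing.

```python
TITLE_LIMIT = 100  # YouTube caps video titles at 100 characters


def build_description(promise: str, link: str, tags: list[str]) -> str:
    """First line repeats the promise, then the link, then the hashtags."""
    if not 2 <= len(tags) <= 3:
        raise ValueError("use 2-3 topic tags besides #shorts")
    hashtags = ["#shorts"] + ["#" + t.lstrip("#").replace(" ", "") for t in tags]
    return f"{promise}\n{link}\n\n{' '.join(hashtags)}"


def title_fits(title: str) -> bool:
    """Check the title against the character limit."""
    return len(title) <= TITLE_LIMIT


if __name__ == "__main__":
    print(build_description(
        "Fix cold email timing in 10 minutes.",
        "https://example.com/template",  # placeholder link
        ["coldemail", "sales"],
    ))
```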
Step 7) Monetize with one clear CTA
Pick one. Do not stack five asks in one Short.
CTA recipes
- Lead magnet: “Grab the free template, link in description”
- Affiliate tool: “Try the exact tool I used, link below” (add a small “affiliate link” note in the description)
- Service inquiry: “Want this set up for you? Book a call, link below”
On-screen helpers
- Add “link below” text in the last 3–5 seconds
- Consider a small arrow pointing down
- Say the CTA once in the voice track, keep it short
Troubleshooting
- Voice sounds flat: Shorten sentences, add commas, try a different preset
- Over 60 seconds: Cut one sentence. Keep the core promise and the CTA
- Captions drift: Regenerate captions or split the clip into two scenes and re-align
- Avatar lip-sync looks off: Use a clearer headshot, good lighting, straight-on face
- Audio mismatch between clips: Normalize or reduce gain on the louder section by 2–4 dB
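If your editor asks for a volume multiplier instead of decibels, the conversion is a single formula: multiplier = 10 ^ (dB / 20). A minimal sketch, with a function name made up for illustration:

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    for cut in (-2, -3, -4):
        print(f"{cut} dB -> multiply volume by {db_to_gain(cut):.2f}")
```

So a 2–4 dB cut means turning the louder clip down to roughly 79 to 63 percent of its original amplitude.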
Copy-paste prompts and templates
ElevenLabs narration prompt style
Instruction to yourself: Keep sentences under 14 words. Use contractions.
Add commas for pauses. One idea only. End with a one-line CTA.
Script:
[Paste your 100–150 words here]
Listener question generator
Act as a beginner who just watched a 60-second tip about: [topic].
Write one short, curious question in a friendly tone. One sentence only.
Two-voice script format
Narrator: [your line]
Listener: [one-sentence question]
Narrator: [short answer plus CTA]
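Because each speaker needs a separate audio file, it helps to split a two-voice script into turns first. The sketch below assumes every line starts with a speaker label followed by a colon, as in the format above. The function names and the file naming pattern are made up for illustration.

```python
def split_by_speaker(script: str) -> list[tuple[str, str]]:
    """Return (speaker, line) pairs in the order they appear."""
    turns = []
    for raw in script.splitlines():
        if ":" not in raw:
            continue  # skip blank or unlabeled lines
        speaker, text = raw.split(":", 1)  # split on the first colon only
        turns.append((speaker.strip(), text.strip()))
    return turns


def clip_names(turns: list[tuple[str, str]]) -> list[str]:
    """Numbered file names keep the clips in order on the timeline."""
    return [f"{i:02d}_{speaker.lower()}.mp3" for i, (speaker, _) in enumerate(turns, 1)]


if __name__ == "__main__":
    script = "Narrator: Timing beats copy.\nListener: How do I find funded companies?"
    turns = split_by_speaker(script)
    print(turns)
    print(clip_names(turns))
```

Generate one audio file per turn with the matching voice, then feed each file to InfiniteTalk in the same order.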
CTA lines for description and pinned comment
Get the free template: [your link]
Try the tool I used: [your link] (affiliate link)
Book a setup call: [your link]
Publishing checklist
- Vertical 1080×1920
- Under 60 seconds
- Big captions, safe zones respected
- One CTA in voice and on-screen
- Link in description and pinned comment
- Thumbnail uploaded
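The first two checklist items can be verified automatically. This sketch assumes you have ffprobe installed (it ships with FFmpeg) and reads the exported file's size and length. It is an optional extra, not something Descript or CapCut require, and the function names are made up for illustration.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Read width, height, and duration with ffprobe (must be installed)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def check_export(info: dict) -> list[str]:
    """Return a list of problems; an empty list means the export passes."""
    stream = info["streams"][0]
    problems = []
    if (stream["width"], stream["height"]) != (1080, 1920):
        problems.append(f"size is {stream['width']}x{stream['height']}, expected 1080x1920")
    if float(info["format"]["duration"]) >= 60:
        problems.append("video is 60 seconds or longer")
    return problems


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(check_export(probe(sys.argv[1])) or "Looks good")
```

Run it with the path to your exported MP4 as the only argument.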
References
- YouTube Help, Sharing links with your audiences, https://support.google.com/youtube/answer/13748639?hl=en
- ElevenLabs, Text-to-Speech guides, https://elevenlabs.io/docs/product-guides/playground/text-to-speech and https://elevenlabs.io/text-to-speech
- InfiniteTalk, overview and generator, https://www.infinitetalk.net/ and https://www.infinitetalk.net/infinitetalk
- Descript, captions and vertical exports, https://www.descript.com/captions and https://www.descript.com/tools/vertical-clip-maker