Agentic Podcast Pipeline (Local TTS)
Note: the generated output content is in Chinese.
I’ve been building a small workflow that takes a question like “Who’s Obama”, turns it into a short podcast script (with a human approval loop), and then generates a voice track locally with VibeVoice.
TL;DR
Console in → (search → script → review loop) → podcast.txt → VibeVoice → .wav
- Input: type a topic/question in the console (e.g., “Who’s Obama?”).
- Agents: the workflow does web-grounded search → podcast-style script draft → human approval loop (regenerate until you say Yes).
- Output: save the approved script to podcast.txt, then run VibeVoice locally to generate podcast_generated.wav.
Stack
- Microsoft Agent Framework (workflow orchestration + multi-agent edges)
- VibeVoice (local) + vibevoice/VibeVoice-7B
- Reference code: EdgeAI Agentic Workshop Code
Workflow (what happens)
- User input (console): “Who’s Obama”
- Search agent: calls a web search endpoint (Ollama web search API)
- Script agent: drafts a podcast-style dialogue
- Review gate: you type Yes/yes to accept, anything else to regenerate (loop)
- Separate step: run VibeVoice locally to turn the approved script into a .wav
Step 1 — Console app: capture a topic and stream workflow events
The console app asks for a topic, starts the workflow, and streams events back (agent updates + human review requests + final output). When the workflow requests human input, the app pauses, collects your response, then resumes the run.
This ends up feeling like “chat,” but it’s actually a workflow run moving through executors and edges.
Step 2 — Workflow: search → generate → review (with a regeneration loop)
I built the workflow as three executors:
- search_executor (find background info)
- gen_script_executor (write a podcast script)
- review_executor (human approval gate)
Edges are:
- search_executor → gen_script_executor
- gen_script_executor → review_executor
- review_executor → gen_script_executor (only if you don’t approve)
Human approval gate (short version)
The review step is simple:
- prints the generated script and asks: “Do you accept this script?”
- if you type Yes, it yields the final output
- otherwise it sends your feedback back to the script agent and loops for a rewrite
That loop is the whole point: agents run fast, but you decide what ships.
Step 3 — Web Grounding: wiring it to Ollama web search
The search executor calls the Ollama web search API and passes the results downstream, which gives the script agent something grounded to write from (instead of just riffing from memory).
Step 4 — Generate audio with VibeVoice (separate step)
Once the workflow finishes, I save the approved script to podcast.txt.
Then I run VibeVoice locally as a separate step to generate the audio file:
- input: podcast.txt
- output: assets/ai-podcast/podcast_generated.wav
I kept voice generation out of the workflow on purpose: it’s slower, and I only want to run it after the script is approved.
Pseudocode for each part
The code below doesn’t run as-is; it only illustrates what happens in each step.
Part 1 — Console app (capture topic + stream events)
Code
# Pseudocode: `client` stands in for the workflow runtime.
topic = input("Topic: ").strip()

run_id = client.start_workflow(
    workflow="podcast_workflow",
    input={"topic": topic},
)

# Stream executor updates; pause for human approval when requested.
for event in client.stream_events(run_id):
    print(f"[{event.type}] {event.data}")
    if event.type == "HUMAN_INPUT_REQUIRED":
        answer = input("Do you accept this script? (Yes to accept): ").strip()
        client.resume(run_id, input={"approval": answer})
What it does
- Prompts for a topic.
- Starts a workflow run.
- Streams executor updates to the console.
- Pauses when the workflow asks for human approval, collects your response, then resumes.
Output
- Console logs of progress (search results summary, generated script, review prompt).
- A final “approved script” payload returned by the workflow (which you later save to
podcast.txt).
Part 2 — Workflow graph (search → generate → review loop)
Code
workflow.add_edge("search_executor", "gen_script_executor")
workflow.add_edge("gen_script_executor", "review_executor")
# Loop only when not approved
workflow.add_conditional_edge(
"review_executor",
when=lambda r: r["approved"] is False,
to="gen_script_executor"
)
What it does
- Defines the “happy path”: search → write → review.
- Adds a conditional loop to regenerate the script when the reviewer rejects it.
Output
- Deterministic orchestration:
- approved → workflow ends
- rejected → script agent runs again with feedback
Part 3 — Search executor (web grounding)
Code
url = "https://ollama.com/api/web_search"
resp = requests.post(
url,
headers={"Authorization": f"Bearer {api_key}"},
json={"query": query},
timeout=30
)
results = resp.json()
return {"sources": results}
What it does
- Calls your web search endpoint and returns structured results to downstream steps.
- Gives the script agent “something to cite / summarize” instead of relying on memory.
Output
- sources: titles/snippets/URLs (whatever your endpoint returns)
- Optional: a short “key facts” summary you extract for prompting the script agent (sketched below)
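If you want that optional key-facts pass, here is a minimal sketch; the results/title/url/content field names are assumptions about the search response shape, and llm stands in for whatever model call the script agent already uses:

def summarize_sources(search_results: dict) -> str:
    # Flatten the raw search payload into a compact digest the script agent can cite.
    lines = []
    for item in search_results.get("results", []):   # field names are assumptions
        title = item.get("title", "")
        url = item.get("url", "")
        snippet = (item.get("content") or "")[:300]
        lines.append(f"- {title} ({url}): {snippet}")
    digest = "\n".join(lines)
    # Ask the model for key facts only, keeping URLs attached for later citation.
    prompt = f"Extract 5-8 key facts (each with its source URL) from:\n{digest}"
    return llm.generate(prompt)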
Part 4 — Script executor (draft dialogue)
Code
prompt = f"""
Write a short podcast dialogue in Chinese.
Use ONLY the facts from these sources: {sources}
If a fact is missing, say you cannot confirm it.
"""
script = llm.generate(prompt)
return {"script": script}
What it does
- Produces a Chinese podcast-style dialogue.
- (Recommended) Constrains generation to searched sources to reduce hallucinations.
Output
- script: the draft text shown to the reviewer
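On a rejected round, I feed the reviewer’s comment back into the same prompt so the rewrite stays grounded and actually addresses the feedback. A minimal sketch; the previous_script/feedback parameters mirror the fields review_executor returns, and llm is the same placeholder as above:

def regenerate_script(sources, previous_script: str, feedback: str) -> dict:
    # Rewrite the draft: keep the grounding constraint, fold in reviewer feedback.
    prompt = f"""
    Rewrite this podcast dialogue in Chinese.
    Use ONLY the facts from these sources: {sources}

    Previous draft:
    {previous_script}

    Reviewer feedback (must be addressed):
    {feedback}
    """
    return {"script": llm.generate(prompt)}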
Part 5 — Review executor (approval gate)
Code
print(script)
approval = await request_human_input("Do you accept this script? Type Yes to accept.")
approved = approval.strip().lower() == "yes"
if not approved:
    return {"approved": False, "feedback": approval}
return {"approved": True, "final_script": script}
What it does
- Shows the script.
- Accepts Yes/yes to finalize.
- Any other input becomes feedback and triggers the rewrite loop.
Output
- Approved: final_script
- Not approved: feedback → goes back into the script prompt
Part 6 — VibeVoice (separate offline audio generation)
Code
python inference.py \
--model vibevoice/VibeVoice-7B \
--input podcast.txt \
--output podcast_generated.wav
What it does
- Converts the approved script into audio after the workflow finishes.
Output
- podcast_generated.wav
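To glue the two steps together I save the approved script and shell out to the command above. A minimal sketch using the same placeholder flags as that command; adjust paths to your local VibeVoice checkout:

import subprocess
from pathlib import Path

def synthesize(final_script: str) -> None:
    # Persist the approved script, then run VibeVoice as a separate offline step.
    Path("podcast.txt").write_text(final_script, encoding="utf-8")
    subprocess.run(
        [
            "python", "inference.py",
            "--model", "vibevoice/VibeVoice-7B",
            "--input", "podcast.txt",
            "--output", "podcast_generated.wav",
        ],
        check=True,  # fail loudly if generation errors out
    )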
Results
- The workflow is simple, but it’s already useful: researched answer → narrated script → voice track.
- The approval gate is what makes it practical (and not just “LLM output spam”).
- Separating VibeVoice made iteration faster: I can regenerate scripts without regenerating audio.
Problems Encountered & Solutions
1. Empty Response on First Try (Microsoft Agent Framework)
During demo2, the Microsoft Agent Framework returned an empty response on the first invocation. The workflow would run through search and script generation, but the review_executor showed no content—even though the same query worked fine on retry.
Workaround: Update the Microsoft Agent Framework to a newer version, or simply type no at the review prompt to trigger a second round—the content usually appears correctly on retry.
2. Lost Context in Back-and-Forth Conversations
When iterating on the script through multiple feedback rounds, the agent sometimes “forgot” earlier context—leading to inconsistent rewrites or repeated mistakes.
Potential solutions:
- Increase context window size (use a model with larger context)
- Set the OLLAMA_CONTEXT_LENGTH environment variable to expand the context buffer (see the sketch after this list)
- Use a more capable model that handles long conversations better
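A minimal sketch of the two Ollama-side options, assuming the agents talk to a local Ollama server over /api/chat; the model name is just an example:

import requests

# Option 1: raise the server-wide default before starting Ollama:
#   export OLLAMA_CONTEXT_LENGTH=8192   # then restart `ollama serve`

# Option 2: ask for a larger context per request via the num_ctx option.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b",                        # example model
        "messages": [{"role": "user", "content": "..."}],
        "options": {"num_ctx": 8192},                 # expand the context window
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])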
3. Background Noise in VibeVoice Output
The generated audio has noticeable background noise/artifacts. This is a limitation of the current VibeVoice model.
Workaround: Use a denoise voice processing pipeline as a post-processing step, or wait for improved future models. The audio is usable but not production-quality.
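A minimal denoise sketch using the third-party noisereduce package (not part of VibeVoice); ffmpeg’s afftdn filter is another option:

import noisereduce as nr   # pip install noisereduce soundfile
import soundfile as sf

# Load the VibeVoice output (assumes a mono wav), then estimate and subtract background noise.
data, rate = sf.read("assets/ai-podcast/podcast_generated.wav")
cleaned = nr.reduce_noise(y=data, sr=rate)
sf.write("assets/ai-podcast/podcast_denoised.wav", cleaned, rate)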
4. Future Improvement: Grounding for Controversial Topics
The current workflow does a single web search for grounding. For controversial or nuanced topics, this might not be enough to produce balanced content.
Proposed improvement: Add an additional web search node specifically for fact-checking or gathering alternative perspectives before finalizing the script. This would help prevent one-sided or potentially misleading content.
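In the same pseudocode style as Part 2, that would mean a fact_check_executor between script generation and review (the executor name, edges, and query are hypothetical; requests and llm are the same placeholders as earlier):

# Replace the direct gen_script_executor → review_executor edge with a fact-check hop.
workflow.add_edge("gen_script_executor", "fact_check_executor")
workflow.add_edge("fact_check_executor", "review_executor")

def fact_check_executor(script: str, api_key: str) -> dict:
    # Search specifically for criticism / alternative perspectives on the draft's claims.
    claims = llm.generate(f"List the 3 most contestable claims in:\n{script}")
    counter = requests.post(
        "https://ollama.com/api/web_search",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": f"criticism and alternative views: {claims}"},
        timeout=30,
    ).json()
    return {"script": script, "counter_sources": counter}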
Generated script (podcast.txt)
Speaker 1: 欢迎收听本期播客!我是主持人Lucy。今天我们邀请到了历史专家Ken,一起聊聊美国前总统奥巴马的故事。Ken,您能先和听众朋友们简单介绍一下奥巴马的生平吗?
Speaker 2: 当然可以,Lucy。奥巴马于1961年8月4日出生在夏威夷火奴鲁鲁,是美国第44任总统,也是首位非裔总统。他的家庭背景非常多元,父亲是肯尼亚经济学家,母亲是美国白人,两人在夏威夷大学相遇并结婚。奥巴马的童年经历非常丰富,他曾随母亲在印尼生活,接触伊斯兰和天主教文化,这段经历对他后来的身份认同产生了深远影响。
Speaker 1: 听起来他的成长环境非常国际化。那他如何从一个普通家庭的孩子成长为美国政坛的领袖呢?
Speaker 2: 这个问题很值得探讨。奥巴马在夏威夷长大后,进入了一所精英预科学校,后来在加州欧文学院学习政治学,最终获得哈佛大学法学博士学位。他在哈佛期间担任《哈佛法律评论》主编,成为首位非裔主编。毕业后,他回到芝加哥从事社区组织工作,这段经历让他更贴近底层民众,也为他后来的政治生涯奠定了基础。
Speaker 1: 是的,他的社区组织经历确实很关键。那他如何进入政坛的呢?
Speaker 2: 2004年,奥巴马在伊利诺伊州的国会听证会上发表的演讲让他一举成名。他后来当选为州议员,并在2004年参选联邦参议员,以“希望与改变”为口号赢得选民支持。2008年,他作为民主党候选人参选总统,最终击败希拉里·克林顿,成为首位非裔总统。
Speaker 1: 那他的总统任期有哪些重要成就呢?
Speaker 2: 他的任期以“希望与改变”为标志,推动了《平价医疗法案》(奥巴马医改),扩大了医保覆盖范围;在气候问题上,他主导了《巴黎气候协定》,推动全球减排合作;外交上,他通过“巧实力”策略加强与盟友关系,同时在中东问题上采取更灵活的外交手段。此外,他还获得2009年诺贝尔和平奖,以表彰他对国际和平的贡献。
Speaker 1: 这些成就确实令人印象深刻。但他的任期也面临不少挑战,比如经济危机和伊拉克战争的处理。
Speaker 2: 没错。2008年金融危机期间,奥巴马推动了经济刺激计划,但批评者认为其政策效果有限。在伊拉克战争问题上,他主张逐步撤军,但曾因“谎言门”事件(关于伊拉克拥有大规模杀伤性武器的虚假情报)引发争议。不过,他后来通过外交手段推动了美军撤出伊拉克,展现了务实的领导风格。
Speaker 1: 那他的个人生活对他的政治生涯有什么影响呢?
Speaker 2: 奥巴马的个人生活与政治紧密相连。他的妻子米歇尔不仅是第一夫人,更是一位社会活动家,推动教育和健康议题。他们的两个女儿也常出现在公众视野中,成为他家庭形象的一部分。此外,奥巴马的演讲能力极强,他善于用故事打动人心,这让他在竞选中脱颖而出。
Speaker 1: 是的,他的演讲确实充满感染力。那您认为他的历史地位如何?
Speaker 2: 奥巴马的任期标志着美国政治的一个转折点。他打破了种族壁垒,推动了社会进步议题,但也面临传统政治势力的阻力。他的政策影响深远,比如医改和气候协定,至今仍在发挥作用。不过,他的任期也暴露了美国政治的复杂性,比如党派斗争和经济挑战。
Speaker 1: 感谢Ken的精彩分享!最后,您有什么想对听众说的吗?
Speaker 2: 我想说,奥巴马的故事不仅是个人奋斗的传奇,更是美国多元文化融合的缩影。他的经历提醒我们,改变需要勇气和坚持,而历史的车轮永远由无数个体推动。
Speaker 1: 非常感谢Ken!听众朋友们,如果喜欢本期内容,别忘了订阅我们的播客。我们下期再见!