Building QuantumSketch: AI + Manim for STEM Video

QuantumSketch turns a plain-English prompt — "explain Fourier transforms visually" — into a narrated, animated STEM video in under three minutes. The core insight: LLMs are excellent scriptwriters and Manim code generators, but terrible at execution. Keep them separate.

Pipeline overview

```text
User prompt
  → LLM (Claude Sonnet) → script + Manim code
  → Validator (AST parse + safe-exec check)
  → Manim render (Docker, isolated)
  → TTS narration (ElevenLabs)
  → ffmpeg merge (audio + video)
  → CDN upload → signed URL returned
```

Every stage is a separate microservice. Temporal.io orchestrates the workflow, so any stage can fail and retry without restarting the whole job. A single generation job can spend 90 seconds in Manim rendering alone, and Temporal makes that long-running step durable.
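The property that matters here is that a stage which already succeeded is never re-run, and a failed stage retries on its own. Temporal gets this from its durable workflow history; the sketch below is not Temporal's API, just a stdlib illustration of that behavior under stated assumptions. `run_pipeline`, the in-memory checkpoint dict (standing in for durable storage), and the retry/backoff parameters are all hypothetical.

```python
import time
from typing import Callable

# A stage is (name, function from previous payload to next payload).
Stage = tuple[str, Callable[[object], object]]


def run_pipeline(stages: list[Stage], payload: object,
                 checkpoints: dict[str, object],
                 max_attempts: int = 3, backoff_s: float = 0.0) -> object:
    """Run stages in order, retrying each independently and skipping finished ones.

    Illustration only: Temporal persists this state durably; here it is a dict.
    """
    for name, fn in stages:
        if name in checkpoints:              # completed in an earlier run: reuse result
            payload = checkpoints[name]
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                payload = fn(payload)
                checkpoints[name] = payload  # record success before moving on
                break
            except Exception:
                if attempt == max_attempts:
                    raise                    # give up on this stage only
                time.sleep(backoff_s * attempt)
    return payload
```

Calling `run_pipeline` again with the same `checkpoints` after a crash resumes at the first unfinished stage, which is the "retry without restarting the whole job" behavior described above.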

LLM prompting strategy

The single biggest quality lever is the system prompt for Manim code generation. I constrain it hard:

SYSTEM = """
You are a Manim CE expert. Output ONLY valid Python code.
Rules:
1. Use Manim CE 0.18 API only (no deprecated calls)
2. Every scene must subclass Scene or ThreeDScene
3. No file I/O, no network calls, no subprocess
4. Max scene duration: 60s
5. Return JSON: {"script": "...", "manim_code": "..."}
"""

The JSON wrapper is critical: it lets me validate the schema before even attempting a render. Bad JSON → re-prompt once, then fail gracefully.
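A minimal sketch of that check, assuming the model client is a plain function that takes a prompt string and returns the raw reply text. `call_llm`, `parse_generation`, `generate_with_retry`, and `GenerationError` are hypothetical names, and feeding the validation error back into the retry prompt is an assumption about how the re-prompt is worded.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("script", "manim_code")


class GenerationError(Exception):
    """The model did not return usable JSON, even after one re-prompt."""


def parse_generation(raw: str) -> dict[str, str]:
    """Validate the {"script": ..., "manim_code": ...} envelope before any render."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON must be an object")
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    return {key: data[key] for key in REQUIRED_KEYS}


def generate_with_retry(call_llm: Callable[[str], str], prompt: str) -> dict[str, str]:
    """First attempt, one re-prompt that quotes the error, then give up cleanly."""
    try:
        return parse_generation(call_llm(prompt))
    except ValueError as first:
        retry = f"{prompt}\n\nYour last reply was invalid ({first}). Return only the JSON object."
        try:
            return parse_generation(call_llm(retry))
        except ValueError as second:
            raise GenerationError(str(second)) from second
```

The caller catches `GenerationError` and reports a clean failure to the user instead of passing half-formed output to the renderer.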

The validation layer

Raw LLM-generated Manim code can contain imports that crash the container, infinite loops, or deprecated API calls. Before rendering, each output goes through four checks:

- AST parse: verify it's valid Python
- Import whitelist: only manim, numpy, and math are allowed
- Static call check: flag subprocess, os.system, and open()
- Timeout probe: dry-run with a 5s CPU limit and check that it terminates
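The four checks can be sketched with the standard library, under a few assumptions. `static_check`, `timeout_probe`, and the deny-list beyond subprocess, os.system, and open() are hypothetical; the probe is POSIX-only (it uses `resource.RLIMIT_CPU`) and simply executes the source, so a real dry-run would also need to drive the scene's `construct()`, which requires Manim and is omitted here. Two caveats: static checks are easy to evade (for example via `getattr`), so the Docker sandbox remains the real security boundary, and the probe runs untrusted code, so it belongs inside that sandbox rather than on the host.

```python
import ast
import resource
import subprocess
import sys
from typing import Optional

ALLOWED_IMPORTS = {"manim", "numpy", "math"}
# Assumed deny-list; the post names subprocess, os.system and open().
BANNED_CALLS = {"open", "exec", "eval", "compile", "__import__"}
BANNED_PREFIXES = ("os.", "subprocess.", "socket.")


def _dotted_name(node: ast.AST) -> Optional[str]:
    """Turn the callee of `a.b.c(...)` into 'a.b.c'; None for computed callees."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def static_check(source: str) -> list[str]:
    """Checks 1-3: parse, import whitelist, banned calls. Empty list means pass."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]  # "" for relative imports
        else:
            roots = []
        problems += [f"import not allowed: {r or '(relative)'}"
                     for r in roots if r not in ALLOWED_IMPORTS]
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func) or ""
            if name in BANNED_CALLS or name.startswith(BANNED_PREFIXES):
                problems.append(f"banned call: {name}")
    return problems


def timeout_probe(source: str, cpu_seconds: int = 5) -> bool:
    """Check 4: run the code in a child interpreter under a CPU-time cap (POSIX).

    True only if it exits cleanly; the kernel terminates it at the CPU cap.
    """
    def cap_cpu() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    try:
        proc = subprocess.run([sys.executable, "-c", source], preexec_fn=cap_cpu,
                              capture_output=True, timeout=cpu_seconds * 2)
    except subprocess.TimeoutExpired:  # wall-clock backstop, e.g. for sleep()
        return False
    return proc.returncode == 0
```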

Roughly 15% of first-pass LLM outputs fail validation and need a retry. After retry, failure drops to ~2%.

Manim render isolation

Each render runs in a Docker container:

```shell
docker run --rm \
  --network none \
  --memory 2g \
  --cpus 2.0 \
  --pids-limit 128 \
  -v /tmp/job-{id}:/workspace \
  quantumsketch-manim:latest \
  python -m manim render /workspace/scene.py MainScene \
    --format mp4 --quality m
```

The container has no network access, and output is written to the mounted volume. Rendering a 60s animation at medium quality takes 15–45 seconds, depending on scene complexity.
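If the orchestrator launches this container from Python, building the argv as a list rather than a shell string keeps the job ID away from shell interpretation. A sketch under assumptions: `build_render_cmd`, `run_render`, the job-ID format check, and the 120-second timeout are hypothetical; the docker and manim flags are the ones from the command above.

```python
import re
import subprocess

IMAGE = "quantumsketch-manim:latest"
JOB_ID = re.compile(r"[A-Za-z0-9-]+")  # hypothetical ID format: letters, digits, hyphens


def build_render_cmd(job_id: str, scene: str = "MainScene") -> list[str]:
    """Return the docker argv for one render job; no shell is involved."""
    if not JOB_ID.fullmatch(job_id):
        raise ValueError(f"unexpected job id: {job_id!r}")  # blocks '../' path tricks
    return [
        "docker", "run", "--rm",
        "--network", "none",          # generated code gets no network
        "--memory", "2g",
        "--cpus", "2.0",
        "--pids-limit", "128",        # caps process count (fork bombs)
        "-v", f"/tmp/job-{job_id}:/workspace",
        IMAGE,
        "python", "-m", "manim", "render", "/workspace/scene.py", scene,
        "--format", "mp4", "--quality", "m",
    ]


def run_render(job_id: str, timeout_s: float = 120.0) -> None:
    """Run the render; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(build_render_cmd(job_id), check=True, timeout=timeout_s)
```

One caveat on the timeout: it kills the docker client process, which may leave the container itself running, so a production version would likely also stop the container explicitly.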

TTS + ffmpeg merge

I use ElevenLabs for narration (voice consistency matters for a learning product). TTS audio is generated in parallel with the Manim render to save wall-clock time:

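```python
import asyncio

async def generate_video(job: Job):
    # Render and narration don't depend on each other, so run them concurrently.
    manim_task = asyncio.create_task(render_manim(job.code))
    audio_task = asyncio.create_task(generate_tts(job.script))
    video_path, audio_path = await asyncio.gather(manim_task, audio_task)
    # Mux only once both artifacts exist.
    return await merge_av(video_path, audio_path)
```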
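To see what the concurrency buys, here is a runnable toy with stub coroutines and made-up delays (0.3 s "render", 0.2 s "TTS"). The stubs reuse the names from the snippet above but are not the real implementations, which the post doesn't show. Wall-clock time should track the slower task rather than the sum of both.

```python
import asyncio
import time


# Stubs standing in for the real services; the delays are made up.
async def render_manim(code: str) -> str:
    await asyncio.sleep(0.3)
    return "/tmp/video.mp4"


async def generate_tts(script: str) -> str:
    await asyncio.sleep(0.2)
    return "/tmp/audio.mp3"


async def timed_parallel() -> tuple[float, list[str]]:
    """Run both stubs concurrently; return elapsed seconds and their results."""
    start = time.perf_counter()
    results = await asyncio.gather(render_manim("code"), generate_tts("script"))
    return time.perf_counter() - start, list(results)


elapsed, paths = asyncio.run(timed_parallel())
print(f"{elapsed:.2f}s", paths)
```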

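ffmpeg merges the two with the -shortest flag, which ends the output when the shorter stream ends. Slight length drift between the render and the TTS track then trims the longer one instead of leaving trailing silence or a frozen last frame.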
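A sketch of the merge invocation as an argv list. Only `-shortest` comes from the post; the explicit stream mapping, `-c:v copy` (no video re-encode), and `-c:a aac` are assumptions about how the merge might be configured, and `build_merge_cmd` is a hypothetical helper.

```python
def build_merge_cmd(video: str, audio: str, out: str) -> list[str]:
    """ffmpeg argv that muxes the Manim render with the TTS track."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video,             # input 0: Manim render
        "-i", audio,             # input 1: narration
        "-map", "0:v:0",         # take video only from input 0
        "-map", "1:a:0",         # take audio only from input 1
        "-c:v", "copy",          # keep the rendered frames as-is
        "-c:a", "aac",           # AAC audio for the MP4 container
        "-shortest",             # end at the shorter of the two streams
        out,
    ]
```

Passing the result to `subprocess.run(..., check=True)` runs the merge. With `-c:v copy` the cut may land on a packet boundary rather than an exact frame; re-encoding the video would give a tighter trim at the cost of extra CPU time.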

Cost breakdown (per video)

| Stage | Provider | Cost |
|-------|----------|------|
| LLM (script + code) | Claude API | ~$0.008 |
| TTS (avg 300 words) | ElevenLabs | ~$0.015 |
| Compute (Docker render) | EC2 t3.medium | ~$0.004 |
| Storage + CDN | S3 + CloudFront | ~$0.001 |
| Total | | ~$0.028/video |

At 1,000 videos/day that's ~$28/day in costs. Comfortably profitable at the current subscription price.
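The arithmetic behind the table, using the per-stage estimates above (the dict keys and helper names are just illustrative labels):

```python
# Per-video cost estimates in USD, from the cost table.
COST_PER_VIDEO_USD = {
    "llm": 0.008,          # Claude API, script + code
    "tts": 0.015,          # ElevenLabs, ~300 words
    "compute": 0.004,      # EC2 t3.medium render
    "storage_cdn": 0.001,  # S3 + CloudFront
}


def per_video_cost() -> float:
    return round(sum(COST_PER_VIDEO_USD.values()), 3)


def daily_cost(videos_per_day: int) -> float:
    return round(per_video_cost() * videos_per_day, 2)


def share(stage: str) -> float:
    """Fraction of the per-video cost attributable to one stage."""
    return COST_PER_VIDEO_USD[stage] / sum(COST_PER_VIDEO_USD.values())
```

On these numbers TTS is the largest line item, a little over half of the per-video cost, so it is the first place to look if margins tighten.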

What I'd change

Manim code generation quality drops sharply for 3D scenes: the ThreeDScene API is more complex and underrepresented in training data. I'm experimenting with a few-shot library of 3D templates that the LLM can adapt rather than generating scenes from scratch.
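One way such a template library could work, sketched under assumptions since the post doesn't describe its selection method: each template carries hand-written keywords, selection is plain keyword overlap with the user's prompt (embedding similarity would be a natural upgrade), and the chosen examples are prepended to the generation prompt. `Template`, `pick_templates`, and `build_few_shot_prompt` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    keywords: frozenset[str]
    code: str  # a known-good ThreeDScene example


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_templates(prompt: str, library: list[Template], k: int = 2) -> list[Template]:
    """Rank templates by keyword overlap with the prompt; keep the top k with any match."""
    words = _tokens(prompt)
    scored = [(len(t.keywords & words), t.name, t) for t in library]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # most overlap first, then by name
    return [t for _, _, t in scored[:k]]


def build_few_shot_prompt(prompt: str, examples: list[Template]) -> str:
    """Prepend the chosen examples so the model adapts rather than invents."""
    parts = [f"# Example: {t.name}\n{t.code}" for t in examples]
    parts.append(f"# Task\nAdapt the closest example to: {prompt}")
    return "\n\n".join(parts)
```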

FAQ

What is QuantumSketch? QuantumSketch is an AI-powered STEM video generator that converts plain-English prompts into animated, narrated educational videos using Manim and LLMs.

How does QuantumSketch generate animations? An LLM (Claude) writes both the narration script and the Manim Python code. The code is validated and rendered in an isolated Docker container, TTS audio is generated, and ffmpeg merges the two.

What is Manim? Manim is an open-source Python library for creating mathematical animations, originally created by 3Blue1Brown. QuantumSketch uses Manim Community Edition.

How long does video generation take? Typically 60–120 seconds end-to-end depending on scene complexity and Manim render time.

Can I use QuantumSketch for my own content? Yes — try it at quantumsketch.app. It supports physics, math, chemistry, and CS topics.

Built by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: How I Use Temporal.io for Long-Running GenAI Workflows.