Breaking Veo's 8-Second Wall: a tiny tool for long AI videos

Trent Tompkins June 3, 2026

Google's Veo makes genuinely beautiful video from a still image and a sentence of direction. There's just one catch that trips everyone up: a single Veo clip is capped at 8 seconds. Ask for a 30-second hero shot and the API politely says no.

So I wrote a small tool, veo_long, that gets around the limit with an old film-editing trick rather than waiting for a bigger model. The idea is dead simple, and once you see it you'll wonder why it isn't built in.

The trick: hand each clip off to the next

Video is just a sequence of still frames. The last frame of one clip and the first frame of the next are both just images. So:

Generate clip 1 (8 seconds) from your starting image.
Pull clip 1's final frame out as a PNG.
Feed that PNG to Veo as the first frame of clip 2, with a new prompt.
Repeat as many times as you want.
Glue all the clips together into one file.

Because clip 2 literally starts on the exact frame clip 1 ended on, there's no jump cut — the motion just keeps going. Four 8-second clips become one smooth 32-second video. Want a minute? Use eight clips.
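For lengths that aren't a clean multiple of 8, the arithmetic can be sketched as below. This is only an illustration: plan_clips is a hypothetical helper, not part of veo_long, and it assumes the only per-clip lengths available are the 4, 6, and 8 seconds the tool accepts.

```python
ALLOWED = (4, 6, 8)  # per-clip lengths veo_long accepts


def plan_clips(total_seconds: int) -> list[int]:
    """Split total_seconds into clip lengths drawn from 4, 6, and 8.

    Hypothetical helper, not part of veo_long. It prefers 8-second clips and
    raises ValueError when no exact split exists (odd totals, or under 4).
    """
    if total_seconds < 4 or total_seconds % 2:
        raise ValueError(f"cannot build {total_seconds}s from 4/6/8-second clips")
    eights, rest = divmod(total_seconds, 8)
    if rest == 0:
        return [8] * eights
    if rest in ALLOWED:  # a remainder of 4 or 6 is a valid clip on its own
        return [8] * eights + [rest]
    # The remainder is 2: trade one 8-second clip for a 6 and a 4 (8 + 2 = 6 + 4).
    return [8] * (eights - 1) + [6, 4]
```

With this sketch, plan_clips(30) gives three 8-second clips and one 6-second clip, and plan_clips(60) gives seven 8-second clips plus a 4-second one, which is still eight clips.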

Here's a 32-second example the tool stitched together for me in a single command. It opens by morphing one image into another, then three more clips each pick up where the last left off.

[Embedded video: 32-second example]

Wait — what's "ffmpeg," and do I need it?

Yes, and don't be scared of it. ffmpeg is a free, open-source command-line tool that can do basically anything to audio and video — convert it, trim it, pull out a single frame, join files end to end. It's the invisible engine behind a huge slice of the internet's video. You won't have to learn it; veo_long just calls it for you to grab the last frame of each clip and to concatenate the pieces at the end.

You only need to install it once. Open a terminal and copy-paste whichever line matches your computer:

# Windows (pick any one)
winget install Gyan.FFmpeg
choco install ffmpeg
scoop install ffmpeg

# macOS
brew install ffmpeg

# Linux (Debian / Ubuntu)
sudo apt update && sudo apt install -y ffmpeg

Then open a fresh terminal and type ffmpeg -version. If you see a wall of version text instead of "command not found," you're done.

The other ingredients

Besides ffmpeg you need Python and two libraries:

pip install google-genai pillow

You also need access to Veo through Google Cloud: a project with the Vertex AI API enabled and billing turned on. The nice part: there's no API key to paste. You install Google's gcloud tool, run gcloud auth login once, and the script mints a short-lived token from that login (by calling gcloud auth print-access-token) each time it runs. Nothing secret ever lives in the file. You only set your project id at the top of the script:

PROJECT  = "your-gcp-project-id"   # set this to your Google Cloud project
LOCATION = "us-central1"           # the region where Veo lives
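Before the first run it can help to confirm the pieces are in place. The sketch below is a hypothetical preflight check, not something veo_long ships: the function name preflight and its messages are made up for illustration, and it only looks for the two command-line tools on PATH and for a project id that is still the placeholder.

```python
import shutil

PLACEHOLDER_PROJECT = "your-gcp-project-id"  # the default value shown above


def preflight(project: str, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    problems = []
    for tool in ("ffmpeg", "gcloud"):
        if which(tool) is None:
            problems.append(f"{tool} is not on PATH")
    if not project or project == PLACEHOLDER_PROJECT:
        problems.append("PROJECT is still the placeholder; set your GCP project id")
    return problems
```

Calling preflight(PROJECT) and printing whatever comes back is enough; it does not verify that the gcloud login is still valid or that Veo is enabled for the project.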

Using it

Each clip is a little block — {seconds=8 frame1=... frame2=... prompt=...}. You can wrap blocks in { }, [ ], or ( ), whatever reads nicest to you.

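seconds: 4, 6, or 8 (Veo's per-clip limit); defaults to 8 if you leave it out
frame1: your start image, or the word last to continue from the previous clip's final frame, or leave it out for a pure text-to-video opener
frame2: optional end image; if you give it, Veo interpolates frame1 → frame2 for that clip. The script switches to veo-2.0 for these clips because that model accepts an end frame; clips without frame2 run on veo-3.0
prompt: describe the motion. It must come last, because everything after prompt= is taken as the prompt. No quotes are needed inside the block, but keep braces, brackets, and parentheses out of the prompt, since the script treats all three as block delimiters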
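If you would rather build blocks in code than type them, the syntax above is easy to generate. The sketch below is an assumption-laden illustration: make_block and write_specfile are hypothetical helpers, not part of veo_long. They produce one block per line, which is the layout the script's --specfile option reads, and they reject prompts and paths the script's parser would mis-split (bracket characters anywhere, whitespace in frame paths).

```python
from pathlib import Path

DELIMITERS = set("{}[]()")  # veo_long treats all of these as block boundaries


def make_block(prompt: str, seconds: int = 8, frame1: str = "", frame2: str = "") -> str:
    """Format one clip block in the syntax veo_long parses (hypothetical helper)."""
    if seconds not in (4, 6, 8):
        raise ValueError("seconds must be 4, 6, or 8")
    if not prompt.strip():
        raise ValueError("prompt is required")
    if DELIMITERS & set(prompt):
        raise ValueError("prompt must not contain braces, brackets, or parentheses")
    for path in (frame1, frame2):
        if any(ch.isspace() for ch in path) or DELIMITERS & set(path):
            raise ValueError(f"frame path must not contain spaces or brackets: {path!r}")
    fields = [f"seconds={seconds}"]
    if frame1:
        fields.append(f"frame1={frame1}")
    if frame2:
        fields.append(f"frame2={frame2}")
    fields.append(f"prompt={prompt.strip()}")  # prompt goes last: it runs to the end
    return "{" + " ".join(fields) + "}"


def write_specfile(path: str, blocks: list[str]) -> None:
    """Write one block per line, the layout --specfile expects."""
    Path(path).write_text("\n".join(blocks) + "\n", encoding="utf-8")
```

After write_specfile("clips.txt", blocks), the run becomes python veo_long.py out.mp4 --specfile clips.txt.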

A full 32-second video in one command:

python veo_long.py out.mp4 \
  "{seconds=8 frame1=start.png frame2=end.png prompt=the title card dissolves into the live scene}" \
  "{seconds=8 frame1=last prompt=camera glides slowly across the room}" \
  "{seconds=8 frame1=last prompt=a figure walks into frame and turns to face us}" \
  "{seconds=8 frame1=last prompt=pull back for a wide hero shot, triumphant}"

It's resumable. If clip 3 of 8 fails (a hiccup, a timeout), just run the exact same command again — the clips that already finished are reused and it picks up where it stopped. No re-generating, no wasted credits.
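Resuming works because each finished clip is saved as a numbered file in a work folder beside the output, and a rerun skips any number that already exists. The sketch below mirrors the naming used in the script (a folder called the output's stem plus _parts, holding part_00.mp4, part_01.mp4, and so on); the function resume_plan itself is hypothetical and only reports what a rerun would reuse.

```python
from pathlib import Path


def resume_plan(out_path: str, n_clips: int) -> tuple[list[str], list[str]]:
    """Return (reused, to_generate) part filenames for a rerun.

    Hypothetical helper. It checks '<output stem>_parts' next to the output
    file for part_00.mp4, part_01.mp4, ... and sorts them into the two lists.
    """
    out = Path(out_path).resolve()
    work = out.parent / (out.stem + "_parts")
    reused, to_generate = [], []
    for n in range(n_clips):
        part = work / f"part_{n:02d}.mp4"
        (reused if part.exists() else to_generate).append(part.name)
    return reused, to_generate
```

Note that the script deletes the work folder after a fully successful run unless you pass --keep, so this reuse applies to runs that stopped partway.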

The whole thing

That's it — one self-contained Python file. Here's the complete source, MIT-licensed, yours to use and butcher:

⬇ Download veo_long.zip (the script + README + license)

#!/usr/bin/env python3
"""veo_long.py — chain/interpolate Veo clips into one long, continuous video.

Veo caps a single clip at 8s. To go longer you stitch: generate clip 1, take its
LAST frame, feed it as clip 2's FIRST frame, repeat, then ffmpeg-concat. Each
hand-off is seamless because clip N+1 literally starts on clip N's final frame.

Each clip takes up to FOUR named params, written as a brace block:

    {seconds=8 frame1=psp.png frame2=psp_last.png prompt=the letter dissolves into the game}

  seconds : 4, 6, or 8   (Veo per-clip limit)
  frame1  : start frame — a PNG/JPG path, or 'last' to chain from the PREVIOUS
            clip's final frame, or omit for pure text-to-video (clip 1 only)
  frame2  : OPTIONAL end frame (a path). If given, Veo INTERPOLATES frame1->frame2
            for this clip (uses veo-2.0, which supports last_frame). If omitted,
            it's a normal first-frame image-to-video on veo-3.0 (full quality).
  prompt  : the motion prompt (everything after 'prompt=' to the end of the block;
            no quotes needed inside the braces)

EXAMPLE (32s = 4x8s):

    python veo_long.py out.mp4 \
      "{seconds=8 frame1=psp.png frame2=psp_last.png prompt=acceptance letter dissolves into the live game}" \
      "{seconds=8 frame1=last prompt=camera glides across the glowing spellbook}" \
      "{seconds=8 frame1=last prompt=a ghost drifts up from the Graves card}" \
      "{seconds=8 frame1=last prompt=pull back to reveal the full game, triumphant}"

You can also put the blocks in a file: --specfile clips.txt (one {block} per line).

Flags: --aspect 16:9|9:16 (default 16:9), --audio (default OFF for clean joins),
--keep (keep the per-clip work dir). Auth: gcloud USER token (Veo 403s the
service account). Resumable: existing part_NN.mp4 are reused. Requires ffmpeg.
"""
from __future__ import annotations
import os, sys, re, time, subprocess, shutil
os.environ.pop("GOOGLE_APPLICATION_CREDENTIALS", None)
from pathlib import Path
from PIL import Image
from google.oauth2.credentials import Credentials
from google import genai
from google.genai import types
from google.genai import errors as genai_errors
try: sys.stdout.reconfigure(encoding="utf-8")
except Exception: pass

# ─────────────────────────────────────────────────────────────────────────────
#  CONFIGURE ME  ──  set these two before first run
# ─────────────────────────────────────────────────────────────────────────────
#  1. Make a Google Cloud project with the Vertex AI API enabled + billing on.
#  2. Install the gcloud CLI, then run ONCE:  gcloud auth login
#     (This tool mints a short-lived token from that login at runtime — there is
#      NO API key to paste and nothing secret stored in this file.)
#  3. Put your project id below. The region "us-central1" is where Veo lives.
PROJECT  = "your-gcp-project-id"   # <── CHANGE THIS to your GCP project id
LOCATION = "us-central1"           #     Veo region (leave as-is unless you know otherwise)
# ─────────────────────────────────────────────────────────────────────────────

MODEL_V3, MODEL_V2 = "veo-3.0-generate-001", "veo-2.0-generate-001"
CHAIN = {"last", "-", "^", "prev"}
INITIAL_WAIT, RAMP, MAX_WAIT = 90, 30, 300

AUTHOR = "Opus (Anthropic Claude), for Trent Tompkins"
LICENSE = """\
MIT License — Copyright (c) 2026 Trent Tompkins

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE."""


def print_help() -> None:
    print(__doc__)
    print("Clip blocks may be wrapped in { }, [ ], or ( ).\n")
    print(f"Author: {AUTHOR}\n")
    print(LICENSE)


def _ffmpeg() -> str:
    exe = shutil.which("ffmpeg")
    if not exe:
        sys.exit("[veo_long] ERROR: ffmpeg not on PATH.")
    return exe


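def _token() -> str:
    # Mint a short-lived access token from the user's gcloud login; fail loudly
    # here rather than passing an empty token on to the API.
    r = subprocess.run("gcloud auth print-access-token", shell=True,
                       capture_output=True, text=True, timeout=60)
    tok = r.stdout.strip()
    if r.returncode != 0 or not tok:
        sys.exit("[veo_long] ERROR: no gcloud token (run 'gcloud auth login'): "
                 + r.stderr.strip()[:280])
    return tok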


def _client():
    return genai.Client(vertexai=True, project=PROJECT, location=LOCATION,
                        credentials=Credentials(token=_token()))


def parse_block(block: str) -> dict:
    """Parse '{seconds=8 frame1=x frame2=y prompt=...}' -> dict. prompt= grabs the
    rest of the block so it can contain spaces and '=' freely."""
    s = block.strip()
    if s and s[0] in "{[(": s = s[1:]
    if s and s[-1] in "}])": s = s[:-1]
    s = s.strip()
    d = {}
    m = re.search(r"\bprompt\s*=", s)
    head = s
    if m:
        d["prompt"] = s[m.end():].strip().strip('"').strip("'")
        head = s[:m.start()]
    for kv in head.split():
        if "=" in kv:
            k, v = kv.split("=", 1)
            d[k.strip().lower()] = v.strip().strip('"').strip("'")
    secs = d.get("seconds", "8")
    if secs not in ("4", "6", "8"):
        sys.exit(f"[veo_long] ERROR: seconds must be 4/6/8 in: {block}")
    d["seconds"] = int(secs)
    if not d.get("prompt"):
        sys.exit(f"[veo_long] ERROR: missing prompt in: {block}")
    return d


def extract_blocks(tokens: list[str]) -> list[str]:
    """Find {…} blocks whether each is one quoted arg or split across argv."""
    joined = " ".join(tokens)
    blocks = re.findall(r"[\{\[\(][^{}\[\]()]*[\}\]\)]", joined)
    if not blocks:
        sys.exit("[veo_long] ERROR: no clip blocks found (wrap each clip in { }, [ ], or ( )).")
    return blocks


def _resize_to(src: Path, like: Path, out: Path) -> Path:
    w, h = Image.open(like).size
    Image.open(src).convert("RGB").resize((w, h), Image.LANCZOS).save(out)
    return out


def generate(client, clip, first_png, second_png, out_mp4, aspect, audio):
    if out_mp4.exists():
        print(f"[clip] SKIP exists {out_mp4.name} ({out_mp4.stat().st_size//1024}KB)", flush=True)
        return True
    secs = clip["seconds"]
    img = types.Image.from_file(location=str(first_png)) if first_png else None
    interp = second_png is not None
    model = MODEL_V2 if interp else MODEL_V3
    kw = dict(aspect_ratio=aspect, number_of_videos=1, duration_seconds=secs)
    if not interp:
        kw["resolution"] = "1080p"; kw["generate_audio"] = audio
    if interp:
        kw["last_frame"] = types.Image.from_file(location=str(second_png))
    wait, attempt, op = INITIAL_WAIT, 0, None
    while True:
        attempt += 1
        kind = "interp frame1->frame2" if interp else ("frame1" if img else "text-only")
        print(f"[{time.strftime('%H:%M:%S')}] [clip] {out_mp4.name} submit #{attempt} "
              f"({secs}s, {model.split('-generate')[0]}, {kind})", flush=True)
        try:
            op = client.models.generate_videos(model=model, prompt=clip["prompt"], image=img,
                                               config=types.GenerateVideosConfig(**kw)); break
        except genai_errors.ClientError as e:
            if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                print(f"  429 — sleep {wait}s", flush=True); time.sleep(wait); wait = min(wait+RAMP, MAX_WAIT); continue
            print(f"  ClientError: {str(e)[:280]}", flush=True); return False
        except Exception as e:
            print(f"  {e.__class__.__name__}: {str(e)[:280]}", flush=True); return False
    t0 = time.time()
    while not op.done:
        time.sleep(10); op = client.operations.get(op); print(f"  polling {int(time.time()-t0)}s", flush=True)
    if getattr(op, "error", None):
        print(f"  op error: {op.error}", flush=True); return False
    resp = getattr(op, "response", None)
    vids = getattr(resp, "generated_videos", None) if resp else None
    if not vids:
        print(f"  NO VIDEO (rai={getattr(resp,'rai_media_filtered_reasons',None)})", flush=True); return False
    v = vids[0]
    if getattr(v.video, "video_bytes", None):
        out_mp4.write_bytes(v.video.video_bytes)
    else:
        client.files.download(file=v.video); v.video.save(str(out_mp4))
    print(f"[clip] SAVED {out_mp4.name} ({out_mp4.stat().st_size//1024}KB, {int(time.time()-t0)}s)", flush=True)
    return True


def extract_last_frame(ff, mp4, png):
    subprocess.run([ff, "-y", "-sseof", "-0.05", "-i", str(mp4), "-frames:v", "1",
                    "-q:v", "1", str(png)], capture_output=True)
    if not png.exists():
        subprocess.run([ff, "-y", "-i", str(mp4), "-vf", "reverse", "-frames:v", "1",
                        str(png)], capture_output=True)
    return png.exists()


def concat(ff, parts, out):
    lst = out.parent / "_concat_list.txt"
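    # Escape single quotes so a path containing an apostrophe survives the list's quoting.
    lines = ["file '" + p.as_posix().replace("'", "'\\''") + "'\n" for p in parts]
    lst.write_text("".join(lines), encoding="utf-8")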
    r = subprocess.run([ff, "-y", "-f", "concat", "-safe", "0", "-i", str(lst),
                        "-c", "copy", str(out)], capture_output=True, text=True)
    if r.returncode != 0 or not out.exists():
        print("[concat] -c copy failed, re-encoding…", flush=True)
        subprocess.run([ff, "-y", "-f", "concat", "-safe", "0", "-i", str(lst),
                        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-r", "24", str(out)],
                       capture_output=True)
    lst.unlink(missing_ok=True)
    return out.exists()


def main():
    argv = sys.argv[1:]
    if not argv or argv[0] in ("-h", "--help", "help", "man", "/?"):
        print_help(); return
    aspect, audio, keep, specfile = "16:9", False, False, None
    pos = []
    i = 0
    while i < len(argv):
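        a = argv[i]
        if a in ("--aspect", "--specfile") and i + 1 >= len(argv):
            sys.exit(f"[veo_long] ERROR: {a} needs a value")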
        if a == "--aspect": aspect = argv[i+1]; i += 2; continue
        if a == "--audio": audio = True; i += 1; continue
        if a == "--keep": keep = True; i += 1; continue
        if a == "--specfile": specfile = argv[i+1]; i += 2; continue
        pos.append(a); i += 1
    if not pos:
        sys.exit(__doc__)
    out = Path(pos[0]).resolve()
    if specfile:
        blocks = extract_blocks(Path(specfile).read_text(encoding="utf-8").splitlines())
    else:
        blocks = extract_blocks(pos[1:])
    clips = [parse_block(b) for b in blocks]
    ff = _ffmpeg()
    work = out.parent / (out.stem + "_parts")
    work.mkdir(parents=True, exist_ok=True)
    print(f"[veo_long] {len(clips)} clips -> {out} (aspect={aspect}, audio={audio})", flush=True)

    client = None
    parts, prev_last = [], None
    for n, clip in enumerate(clips):
        f1 = clip.get("frame1", "")
        if f1 in CHAIN:
            first = prev_last
            if first is None: sys.exit(f"[veo_long] ERROR: clip {n} frame1=last but no previous frame")
        elif f1 in ("", "none", "text", "_"):
            first = None
        else:
            first = Path(f1).resolve()
            if not first.exists(): sys.exit(f"[veo_long] ERROR: clip {n} frame1 not found: {first}")
        second = None
        f2 = clip.get("frame2", "")
        if f2 and f2 not in ("none", "_"):
            second = Path(f2).resolve()
            if not second.exists(): sys.exit(f"[veo_long] ERROR: clip {n} frame2 not found: {second}")
            if first is not None:  # match end-frame dims to start-frame for clean interpolation
                second = _resize_to(second, first, work / f"f2_{n:02d}.png")
        part = work / f"part_{n:02d}.mp4"
        if not part.exists():
            if client is None: client = _client()
            if not generate(client, clip, first, second, part, aspect, audio):
                sys.exit(f"[veo_long] clip {n} failed — fix + re-run (done parts resume)")
        parts.append(part)
        prev_last = work / f"last_{n:02d}.png"
        if not extract_last_frame(ff, part, prev_last):
            print(f"[veo_long] WARN: couldn't extract last frame of {part.name}", flush=True); prev_last = None

    print(f"[veo_long] concatenating {len(parts)} parts…", flush=True)
    if not concat(ff, parts, out):
        sys.exit("[veo_long] ERROR: concat failed")
    total = sum(c["seconds"] for c in clips)
    print(f"[veo_long] DONE -> {out} ({out.stat().st_size//1024}KB, ~{total}s)", flush=True)
    if not keep:
        shutil.rmtree(work, ignore_errors=True)


if __name__ == "__main__":
    main()

Why this matters

The lesson isn't really about Veo. It's that a hard limit in a tool is often just an invitation to wrap it. Eight seconds was a wall; a single file of glue code turned it into a building block. Most of the "AI can't do X" complaints I hear are one small script away from "actually, it can."