Docs Audio Workflow (Docs TTS)

Documentation categories such as docs/LookAround/ support TTS narration audio. Unlike blog audio, the docs audio player is automatically injected into the document layout, so individual MDX files do not need to import a player component.

Current docs category with audio enabled:

Category	Directory	Article Count	Chinese Audio	English Audio
LookAround	`docs/LookAround/`	8	Generated	Generated

Architecture Overview

Docs audio reuses the blog audio player while keeping a separate docs manifest and separate OSS path. This preserves the same playback experience without mixing blog and docs manifest entries.

Component Relationship

BlogAudioPlayer (core playback logic)
  ├── BlogPostPage auto-injects: <BlogAudioPlayer slug={audioSlug} />
  │   ├── slug derived from permalink via getBlogAudioSlug(metadata)
  │   └── Reads blogAudioManifest.json by default
  │
  └── DocsAudioPlayer (thin wrapper)
      └── <DocsAudioPlayer slug="xxx" />
          └── Passes docsAudioManifest.json + keyPrefix="docs/"

Key Files

File	Purpose
`src/components/BlogAudioPlayer/index.js`	Core player component with `manifest` and `keyPrefix` props
`src/components/DocsAudioPlayer/index.js`	Docs audio wrapper that passes the docs manifest and prefix
`src/theme/BlogPostPage/index.js`	Blog page layout swizzle that auto-injects the blog player
`src/utils/blogAudio.js`	Derives blog audio slug from permalink
`src/utils/lookAroundDocs.js`	LookAround document detection helpers
`src/theme/DocItem/Layout/index.js`	Document layout that auto-injects the docs player
`src/data/blogAudioManifest.json`	Blog audio manifest imported by the site
`src/data/docsAudioManifest.json`	Docs audio manifest imported by the site
`static/audio/blog/manifest.json`	Backup blog manifest access path
`static/audio/docs/manifest.json`	Backup docs manifest access path

Manifest Structure

docsAudioManifest.json uses these keys:

Chinese: docs/{slug}, for example docs/omega-horizontal-vertical-analysis
English: en/docs/{slug}, for example en/docs/omega-horizontal-vertical-analysis

Each entry contains:

{
  "docs/omega-horizontal-vertical-analysis": {
    "urls": [
      "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_001.mp3",
      "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_002.mp3"
    ],
    "voice": "茉莉",
    "generatedAt": "2026-05-02T06:10:34.705519+00:00"
  }
}
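The key convention above can be sketched as a small lookup. This is only an illustration in Python of how a reader of the manifest might resolve an article; `manifest_key` and `audio_urls` are hypothetical helper names, not functions from the site or the generator.

```python
def manifest_key(slug: str, lang: str = "zh") -> str:
    """Build the manifest key: docs/{slug} for Chinese, en/docs/{slug} for English."""
    if lang == "zh":
        return f"docs/{slug}"
    if lang == "en":
        return f"en/docs/{slug}"
    raise ValueError(f"unsupported language: {lang}")


def audio_urls(manifest: dict, slug: str, lang: str = "zh") -> list[str]:
    """Return the chunk URLs in playback order, or [] when the article has no entry."""
    entry = manifest.get(manifest_key(slug, lang))
    return list(entry["urls"]) if entry else []


# Sample data copied from the entry shown above.
sample_manifest = {
    "docs/omega-horizontal-vertical-analysis": {
        "urls": [
            "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_001.mp3",
            "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_002.mp3",
        ],
        "voice": "茉莉",
        "generatedAt": "2026-05-02T06:10:34.705519+00:00",
    }
}

zh_urls = audio_urls(sample_manifest, "omega-horizontal-vertical-analysis", "zh")
en_urls = audio_urls(sample_manifest, "omega-horizontal-vertical-analysis", "en")
```

In this sample only the Chinese entry exists, so the English lookup returns an empty list rather than raising.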

OSS Paths

Content	OSS Path
Chinese docs audio	`Audio/docs/{slug}.mp3` or `Audio/docs/{slug}_001.mp3`
English docs audio	`Audio/docs/en/en_{slug}.mp3` or `Audio/docs/en/en_{slug}_001.mp3`
Docs manifest	`Audio/docs/manifest.json`
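As a rough sketch of the naming scheme in the table, the object key for one audio file could be built as follows. `docs_oss_key` is a hypothetical name, and the rule that an unnumbered file means a single-chunk article is an assumption read off the two path variants, not confirmed behavior of generate.py.

```python
from typing import Optional


def docs_oss_key(slug: str, lang: str = "zh", chunk_index: Optional[int] = None) -> str:
    """Build the OSS object key for a docs audio file.

    chunk_index=None yields the unnumbered form ({slug}.mp3);
    an integer yields the zero-padded chunk form ({slug}_001.mp3).
    """
    suffix = "" if chunk_index is None else f"_{chunk_index:03d}"
    if lang == "zh":
        return f"Audio/docs/{slug}{suffix}.mp3"
    if lang == "en":
        # English files live in their own directory and carry an en_ filename prefix.
        return f"Audio/docs/en/en_{slug}{suffix}.mp3"
    raise ValueError(f"unsupported language: {lang}")


zh_single = docs_oss_key("omega-horizontal-vertical-analysis")
en_first_chunk = docs_oss_key("omega-horizontal-vertical-analysis", "en", 1)
```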

Public URLs in docs audio manifests must use https://oss.nevergpdzy.com/; do not reintroduce the retired picture domain into src/data/docsAudioManifest.json or static/audio/docs/manifest.json.

These paths are independent from blog audio under Audio/blog/.
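The domain rule above lends itself to a mechanical check before a manifest is committed. The following is a minimal sketch, assuming the manifest has the entry shape shown under Manifest Structure; `offending_urls` is a hypothetical helper and `retired.example.com` is a placeholder host used only for the demonstration.

```python
from urllib.parse import urlparse

REQUIRED_HOST = "oss.nevergpdzy.com"


def offending_urls(manifest: dict) -> list[tuple[str, str]]:
    """Return (manifest key, url) pairs that are not https URLs on the required host."""
    bad = []
    for key, entry in manifest.items():
        for url in entry.get("urls", []):
            parsed = urlparse(url)
            if parsed.scheme != "https" or parsed.hostname != REQUIRED_HOST:
                bad.append((key, url))
    return bad


demo = {
    "docs/good": {"urls": ["https://oss.nevergpdzy.com/Audio/docs/good.mp3"]},
    # Placeholder host standing in for any non-OSS domain.
    "docs/bad": {"urls": ["https://retired.example.com/Audio/docs/bad.mp3"]},
}
violations = offending_urls(demo)
```

An empty result means every URL in the manifest points at the expected host.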

Generation Strategy

Docs articles are usually longer than blog posts and often contain tables, references, and image resource sections. The generator's docs mode handles this by:

Extracting readable text while removing frontmatter, code blocks, JSX, images, raw URLs, source/reference sections, and image-resource appendices.
Converting Markdown tables before normal Markdown cleanup so rows become readable sentences instead of pipe-delimited text.
Splitting at paragraph boundaries, with a default --chunk-char-limit of 2400; reduce this to a value such as 1200 for problematic long Chinese articles.
Applying concurrency only at the article level. --article-jobs can process multiple articles in parallel, but chunks inside one article always run serially so audio order strictly follows text order.
Validating each generated chunk with ffprobe; suspiciously short or truncated audio is deleted and retried.
Refusing to publish a manifest when any article in the run fails, which prevents partially missing audio from becoming the live manifest.
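The paragraph-boundary splitting in the list above can be illustrated with a greedy packer. This is a simplified sketch, not the generator's actual chunk_text implementation: `chunk_paragraphs` is a hypothetical name, paragraphs are assumed to be separated by blank lines, and a single paragraph longer than the limit is kept whole here, which the real tool may handle differently.

```python
def chunk_paragraphs(text: str, limit: int = 2400) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most `limit` characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit or not current:
            # Fits, or it is the first paragraph of a chunk (kept even if oversized).
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk, start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


demo_text = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 10
two_chunks = chunk_paragraphs(demo_text, limit=22)
three_chunks = chunk_paragraphs(demo_text, limit=21)
```

Lowering the limit produces more, shorter chunks while never cutting inside a paragraph, which is the effect relied on when a long article is regenerated with a smaller --chunk-char-limit.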

Chinese audio uses the 茉莉 voice with a slow, gentle, soothing prompt and natural pauses. English audio uses Chloe with the original natural, fluent narration prompt. Do not use atempo or other MP3 post-processing for speed changes; pacing should come from the TTS prompt and stable chunking.

Generate Audio

Generate All Chinese LookAround Audio

cd ../tts-blog-generator
python generate.py --type docs --lang zh --force --article-jobs 2

By default, docs mode scans ../Dev-Knowledge-Base/docs/LookAround.

Generate All English LookAround Audio

cd ../tts-blog-generator
python generate.py --type docs --lang en \
  --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
  --force --article-jobs 2

English audio uses the Chloe voice, en_ filenames, and the OSS directory Audio/docs/en/.

Regenerate One Article

cd ../tts-blog-generator

# Chinese
python generate.py --type docs --lang zh \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1

# English
python generate.py --type docs --lang en \
  --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1

If a long Chinese article produces garbled-sounding or suspicious audio around a specific timestamp, inspect the extracted text first, then regenerate with a smaller chunk size:

python generate.py --type docs --lang zh \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1 --chunk-char-limit 1200

The G-Class Chinese audio was regenerated this way: 15 smaller chunks replaced the original 7 long chunks.

Common Arguments

Argument	Meaning
`--type docs`	Generate docs audio; the source directory defaults to `docs/LookAround/`
`--lang zh` / `--lang en`	Select language
`--blog-dir <path>`	Override source directory; required for English docs mirror
`--include <slug>`	Process only matching slug or filename
`--force`	Regenerate and remove stale manifest entries for target articles
`--article-jobs <n>`	Article-level concurrency; chunks within one article remain serial
`--chunk-char-limit <n>`	Maximum characters per TTS chunk
`--dry-run`	Preview chunk mapping without API calls
`--skip-upload`	Skip OSS upload
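The --article-jobs semantics (parallel across articles, serial within one article) and the rule that a failed article blocks publishing could be structured roughly as below. This is a sketch under assumptions, not the code in generate.py: `process_article`, `run_articles`, `publishable`, and the `synthesize` callback are all hypothetical names, and a thread pool is only one possible way to get article-level concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def process_article(slug, chunks, synthesize):
    """Synthesize one article's chunks strictly in order, so audio follows text order."""
    return [synthesize(slug, index, chunk) for index, chunk in enumerate(chunks, 1)]


def run_articles(articles, synthesize, article_jobs=2):
    """Run up to `article_jobs` articles in parallel; return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=article_jobs) as pool:
        futures = {
            slug: pool.submit(process_article, slug, chunks, synthesize)
            for slug, chunks in articles.items()
        }
        for slug, future in futures.items():
            try:
                results[slug] = future.result()
            except Exception as exc:  # one failed article must not hide the others
                failures[slug] = exc
    return results, failures


def publishable(failures) -> bool:
    """A manifest may be published only when no article in the run failed."""
    return not failures


def fake_synthesize(slug, index, chunk):
    """Stand-in for the TTS call: returns a filename, fails for the slug 'broken'."""
    if slug == "broken":
        raise RuntimeError("simulated TTS failure")
    return f"{slug}_{index:03d}.mp3"


results, failures = run_articles(
    {"article-a": ["first", "second"], "broken": ["only"]}, fake_synthesize, article_jobs=2
)
```

Because each article is a single task, its chunks never run concurrently, and any entry in `failures` is enough to withhold the manifest for the whole run.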

Text Inspection

Before regenerating a problematic article, inspect the exact text that will be sent to TTS:

cd ../tts-blog-generator

python - <<'PY'
from pathlib import Path
import generate

slug = "mercedes-benz-g-class-history-category-industry-position"
path = Path("../Dev-Knowledge-Base/docs/LookAround") / f"{slug}.md"
text = generate.extract_text(path)
chunks = generate.chunk_text(text, limit=1200)

Path("output").mkdir(exist_ok=True)
Path("output/check_zh_gclass.txt").write_text(
    "\n\n".join(
        f"=== chunk {i:03d} / {len(chunks)} ({len(chunk)} chars) ===\n{chunk}"
        for i, chunk in enumerate(chunks, 1)
    ),
    encoding="utf-8",
)
print(len(text), [len(chunk) for chunk in chunks])
PY

Confirm that:

The text contains no mojibake markers such as �, 鈥, 盲, or 鏂.
No raw URLs, Markdown images, table separators, or reference appendices remain.
Chunk numbers, filenames, and text order line up.
Tables have been converted into readable sentences.

Windows PowerShell may display Chinese files as mojibake when using Get-Content; trust what Python reads with encoding="utf-8" instead.
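Part of this review can be automated. The sketch below scans extracted text for the markers and leftovers listed above; `find_problems` and the regular expressions are illustrative assumptions rather than part of the generator, and a hit still needs a human look because a character such as 盲 also occurs in legitimate Chinese text.

```python
import re

# U+FFFD is the Unicode replacement character; the others are typical mis-decoded fragments.
MOJIBAKE_MARKERS = ("\ufffd", "鈥", "盲", "鏂")
URL_PATTERN = re.compile(r"https?://\S+")
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# Matches Markdown table separator rows such as "| --- | --- |" or "---|---".
TABLE_SEPARATOR_PATTERN = re.compile(r"^\s*\|?(\s*:?-{3,}:?\s*\|)+", re.MULTILINE)


def find_problems(text: str) -> list[str]:
    """Return a list of human-readable findings; an empty list means nothing was flagged."""
    problems = []
    for marker in MOJIBAKE_MARKERS:
        if marker in text:
            problems.append(f"mojibake marker {marker!r}")
    if URL_PATTERN.search(text):
        problems.append("raw URL")
    if IMAGE_PATTERN.search(text):
        problems.append("Markdown image")
    if TABLE_SEPARATOR_PATTERN.search(text):
        problems.append("table separator")
    return problems


clean = find_problems("这是一段正常的文本。")
dirty = find_problems("See https://example.com/x\n| --- | --- |\n![alt](a.png)")
```

To apply it to the inspection file, read output/check_zh_gclass.txt with encoding="utf-8" and pass the contents to `find_problems`.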

Sync the Manifest

The generator writes:

output/manifest.json: combined blog and docs manifest.
output/docs-manifest.json: docs-only manifest for this site.
output/blog-manifest.json: blog-only manifest.

After docs generation, copy the docs-only manifest:

cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json

Do not copy output/manifest.json into docsAudioManifest.json; it may include blog entries.
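A small guard can catch the wrong file before it is copied. This sketch assumes that every docs key starts with docs/ or en/docs/, as described under Manifest Structure, and treats anything else as foreign; `non_docs_keys` is a hypothetical helper, and the `blog/...` key in the demo is an invented example of a non-docs key, not the real blog key format.

```python
DOCS_KEY_PREFIXES = ("docs/", "en/docs/")


def non_docs_keys(manifest: dict) -> list[str]:
    """Return the manifest keys that do not follow the docs key convention."""
    return sorted(key for key in manifest if not key.startswith(DOCS_KEY_PREFIXES))


docs_only = {
    "docs/omega-horizontal-vertical-analysis": {"urls": []},
    "en/docs/omega-horizontal-vertical-analysis": {"urls": []},
}
# Invented example of a combined manifest that also carries a non-docs key.
combined = {**docs_only, "blog/some-post": {"urls": []}}

docs_only_result = non_docs_keys(docs_only)
combined_result = non_docs_keys(combined)
```

Running the check on the file about to be copied and aborting on a non-empty result keeps blog entries out of docsAudioManifest.json.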

Player Integration

Automatic Injection

The docs player is injected in src/theme/DocItem/Layout/index.js.

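// Excerpt from src/theme/DocItem/Layout/index.js; the rest of the layout is omitted.
import {isLookAroundDocMetadata, getLookAroundDocSlug} from '@site/src/utils/lookAroundDocs';
import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';

// Inside the layout component: detect LookAround docs and derive the audio slug.
const isLookAround = isLookAroundDocMetadata(metadata);
const lookAroundSlug = isLookAround ? getLookAroundDocSlug(metadata) : null;

{/* Inside the returned JSX: the player sits between the version badge and the mobile TOC. */}
<DocBreadcrumbs />
<DocVersionBadge />
{isLookAround && lookAroundSlug && <DocsAudioPlayer slug={lookAroundSlug} />}
{docTOC.mobile}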

isLookAroundDocMetadata() checks metadata.sourceDirName === 'LookAround'.

Manual Use

For a specific MDX file, the player can also be placed manually:

import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';

<DocsAudioPlayer slug="your-doc-slug" />

The slug is the filename without extension.

Add Audio Support for Another Docs Category

Add a category detection helper under src/utils/, following lookAroundDocs.js.
Import it in src/theme/DocItem/Layout/index.js and add conditional rendering.
Generate Chinese audio with python generate.py --type docs --blog-dir "../Dev-Knowledge-Base/docs/YourCategory".
Generate English audio with --lang en and --blog-dir pointing at the mirrored directory, i18n/en/docusaurus-plugin-content-docs/current/YourCategory.
Merge or copy output/docs-manifest.json into both docs manifest files.

If the new category needs a different OSS prefix, update the docs OSS key logic in generate.py and keep the player keyPrefix convention aligned.

Troubleshooting and Verification

Local Duration Check

cd ../tts-blog-generator
ffprobe -v error -show_entries format=duration,size -of json output/your-file.mp3

If a long Chinese chunk is only a few dozen seconds or much shorter than similarly sized chunks, treat it as a truncated TTS result. Delete it and retry, usually with a smaller --chunk-char-limit.
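The same check can be scripted around the ffprobe command above. This is a minimal sketch: `parse_probe`, `probe`, and `looks_truncated` are hypothetical helpers, and the 0.1 seconds-per-character floor is an assumed threshold chosen for illustration, not a value taken from the generator.

```python
import json
import subprocess


def parse_probe(probe_json: str) -> tuple[float, int]:
    """Extract (duration in seconds, size in bytes) from ffprobe's JSON output."""
    fmt = json.loads(probe_json)["format"]
    return float(fmt["duration"]), int(fmt["size"])


def probe(path: str) -> tuple[float, int]:
    """Run the ffprobe command shown above on one file and parse its output."""
    completed = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration,size",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    )
    return parse_probe(completed.stdout)


def looks_truncated(duration: float, char_count: int,
                    min_seconds_per_char: float = 0.1) -> bool:
    """Flag audio that is implausibly short for the amount of text it should cover."""
    return duration < char_count * min_seconds_per_char


sample_output = '{"format": {"duration": "312.450000", "size": "5000000"}}'
duration, size = parse_probe(sample_output)
```

With this threshold, a 2400-character chunk that yields only 40 seconds of audio is flagged, while one that runs for several minutes is not.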

OSS URL Check

After adding or regenerating audio, probe the target article's manifest URLs and confirm they are reachable and return audio/mpeg.
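One way to run that probe is a HEAD request per URL. The sketch below uses only the Python standard library; `head_check` and `check_entry` are hypothetical names, and it assumes the OSS endpoint answers HEAD requests with a Content-Type header. The `fetch` parameter exists so the logic can be exercised without network access.

```python
import urllib.request


def head_check(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (HTTP status, content type without parameters)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get_content_type()


def check_entry(entry: dict, fetch=head_check) -> list[str]:
    """Return one message per URL that is unreachable or not served as audio/mpeg."""
    failures = []
    for url in entry["urls"]:
        try:
            status, content_type = fetch(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failures.append(f"{url}: {exc}")
            continue
        if status != 200 or content_type != "audio/mpeg":
            failures.append(f"{url}: status={status}, content-type={content_type}")
    return failures


# Offline demonstration with stand-in fetch functions.
entry = {"urls": ["https://oss.nevergpdzy.com/Audio/docs/example_001.mp3"]}
ok = check_entry(entry, fetch=lambda url: (200, "audio/mpeg"))
wrong_type = check_entry(entry, fetch=lambda url: (200, "text/html"))
```

Calling `check_entry(manifest["docs/your-doc-slug"])` with the default `fetch` performs the real requests; an empty list means every chunk is reachable and typed as audio/mpeg.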

Build Verification

cd ../Dev-Knowledge-Base
npm run build

The build must pass for both zh-Hans and en.
