Docs Audio Workflow (Docs TTS)

Documentation categories such as docs/LookAround/ support TTS narration audio. Unlike blog audio, the docs audio player is automatically injected into the document layout, so individual MDX files do not need to import a player component.

Current docs category with audio enabled:

| Category | Directory | Article Count | Chinese Audio | English Audio |
| --- | --- | --- | --- | --- |
| LookAround | docs/LookAround/ | 8 | Generated | Generated |

Architecture Overview

Docs audio reuses the blog audio player while keeping a separate docs manifest and separate OSS path. This preserves the same playback experience without mixing blog and docs manifest entries.

Component Relationship

BlogAudioPlayer (core playback logic)
├── BlogPostPage auto-injects: <BlogAudioPlayer slug={audioSlug} />
│   ├── slug derived from permalink via getBlogAudioSlug(metadata)
│   └── Reads blogAudioManifest.json by default
│
└── DocsAudioPlayer (thin wrapper)
    └── <DocsAudioPlayer slug="xxx" />
        └── Passes docsAudioManifest.json + keyPrefix="docs/"

Key Files

| File | Purpose |
| --- | --- |
| src/components/BlogAudioPlayer/index.js | Core player component with manifest and keyPrefix props |
| src/components/DocsAudioPlayer/index.js | Docs audio wrapper that passes the docs manifest and prefix |
| src/theme/BlogPostPage/index.js | Blog page layout swizzle that auto-injects the blog player |
| src/utils/blogAudio.js | Derives blog audio slug from permalink |
| src/utils/lookAroundDocs.js | LookAround document detection helpers |
| src/theme/DocItem/Layout/index.js | Document layout that auto-injects the docs player |
| src/data/blogAudioManifest.json | Blog audio manifest imported by the site |
| src/data/docsAudioManifest.json | Docs audio manifest imported by the site |
| static/audio/blog/manifest.json | Backup blog manifest access path |
| static/audio/docs/manifest.json | Backup docs manifest access path |

Manifest Structure

docsAudioManifest.json uses these keys:

  • Chinese: docs/{slug}, for example docs/omega-horizontal-vertical-analysis
  • English: en/docs/{slug}, for example en/docs/omega-horizontal-vertical-analysis
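
The key convention above can be sketched as a small helper. This is only an illustration: the function names docs_manifest_key and slug_from_path are hypothetical and are not taken from generate.py. The slug rule (filename without extension) is the one this page states for manual player use.

```python
from pathlib import Path


def docs_manifest_key(slug: str, lang: str = "zh") -> str:
    """Build the docsAudioManifest.json key for a slug (hypothetical helper).

    Chinese entries use docs/{slug}; English entries use en/docs/{slug}.
    """
    if lang == "zh":
        return f"docs/{slug}"
    if lang == "en":
        return f"en/docs/{slug}"
    raise ValueError(f"unsupported language: {lang!r}")


def slug_from_path(path: str) -> str:
    """Derive the slug as the filename without its extension."""
    return Path(path).stem
```

For example, docs_manifest_key("omega-horizontal-vertical-analysis", "en") yields the English key shown in the list above.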

Each entry contains:

{
  "docs/omega-horizontal-vertical-analysis": {
    "urls": [
      "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_001.mp3",
      "https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_002.mp3"
    ],
    "voice": "茉莉",
    "generatedAt": "2026-05-02T06:10:34.705519+00:00"
  }
}

OSS Paths

| Content | OSS Path |
| --- | --- |
| Chinese docs audio | Audio/docs/{slug}.mp3 or Audio/docs/{slug}_001.mp3 |
| English docs audio | Audio/docs/en/en_{slug}.mp3 or Audio/docs/en/en_{slug}_001.mp3 |
| Docs manifest | Audio/docs/manifest.json |
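
A minimal sketch of how these object keys could be composed, assuming a zero-padded three-digit chunk suffix as in the _001 examples. The helper name docs_oss_key is hypothetical; the actual key logic lives in generate.py and may differ.

```python
from typing import Optional


def docs_oss_key(slug: str, lang: str = "zh", chunk: Optional[int] = None) -> str:
    """Return the OSS object key for one docs audio file (hypothetical helper).

    chunk=None gives the single-file form; chunk=1 gives the _001 form.
    """
    suffix = "" if chunk is None else f"_{chunk:03d}"
    if lang == "zh":
        return f"Audio/docs/{slug}{suffix}.mp3"
    if lang == "en":
        # English files carry both the en/ directory and the en_ filename prefix.
        return f"Audio/docs/en/en_{slug}{suffix}.mp3"
    raise ValueError(f"unsupported language: {lang!r}")
```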

Public URLs in docs audio manifests must use https://oss.nevergpdzy.com/; do not reintroduce the retired picture domain into src/data/docsAudioManifest.json or static/audio/docs/manifest.json.
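
The domain rule can be checked mechanically before a manifest is committed. The sketch below assumes the manifest shape shown under Manifest Structure (each entry has a urls list); find_bad_urls is a hypothetical name, not part of the generator.

```python
REQUIRED_PREFIX = "https://oss.nevergpdzy.com/"


def find_bad_urls(manifest: dict) -> list:
    """Return (key, url) pairs whose URL does not use the required domain."""
    bad = []
    for key, entry in manifest.items():
        for url in entry.get("urls", []):
            if not url.startswith(REQUIRED_PREFIX):
                bad.append((key, url))
    return bad
```

An empty result means every URL in the manifest uses the expected domain. To check a real file, load it with json.loads(Path(...).read_text(encoding="utf-8")) and pass the resulting dict.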

These paths are independent from blog audio under Audio/blog/.

Generation Strategy

Docs articles are usually longer than blog posts and often contain tables, references, and image resource sections. The generator's docs mode handles this by:

  1. Extracting readable text while removing frontmatter, code blocks, JSX, images, raw URLs, source/reference sections, and image-resource appendices.
  2. Converting Markdown tables before normal Markdown cleanup so rows become readable sentences instead of pipe-delimited text.
  3. Splitting at paragraph boundaries, with a default --chunk-char-limit 2400; reduce this to values such as 1200 for problematic long Chinese articles.
  4. Applying concurrency only at the article level. --article-jobs can process multiple articles in parallel, but chunks inside one article always run serially so audio order strictly follows text order.
  5. Validating each generated chunk with ffprobe; suspiciously short or truncated audio is deleted and retried.
  6. Refusing to publish a manifest when any article in the run fails, which prevents partially missing audio from becoming the live manifest.
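
Step 3 above (splitting at paragraph boundaries under a character limit) can be illustrated with a greedy packer. This is a sketch under assumptions, not the generator's chunk_text: it treats a blank line as the paragraph boundary and leaves an over-long single paragraph whole, which the real implementation may handle differently.

```python
def chunk_paragraphs(text: str, limit: int = 2400) -> list:
    """Greedily pack paragraphs into chunks of at most `limit` characters.

    A single paragraph longer than `limit` is kept whole in this sketch.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > limit:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because chunks are only ever closed between paragraphs, joining them back with blank lines reproduces the cleaned text in its original order, which is what keeps audio order aligned with text order.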

Chinese audio uses the 茉莉 voice with a slow, gentle, soothing prompt and natural pauses. English audio uses Chloe with the original natural, fluent narration prompt. Do not use atempo or other MP3 post-processing for speed changes; pacing should come from the TTS prompt and stable chunking.

Generate Audio

Generate All Chinese LookAround Audio

cd ../tts-blog-generator
python generate.py --type docs --lang zh --force --article-jobs 2

By default, docs mode scans ../Dev-Knowledge-Base/docs/LookAround.

Generate All English LookAround Audio

cd ../tts-blog-generator
python generate.py --type docs --lang en \
  --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
  --force --article-jobs 2

English audio uses the Chloe voice, en_ filenames, and the OSS directory Audio/docs/en/.

Regenerate One Article

cd ../tts-blog-generator

# Chinese
python generate.py --type docs --lang zh \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1

# English
python generate.py --type docs --lang en \
  --blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1

If a long Chinese article produces garbled-sounding or suspicious audio around a specific timestamp, inspect the extracted text first, then regenerate with a smaller chunk size:

python generate.py --type docs --lang zh \
  --include mercedes-benz-g-class-history-category-industry-position \
  --force --article-jobs 1 --chunk-char-limit 1200

The G-Class Chinese audio was regenerated this way: 15 smaller chunks replaced the original 7 long chunks.

Common Arguments

| Argument | Meaning |
| --- | --- |
| --type docs | Generate docs audio; defaults to docs/LookAround/ |
| --lang zh / --lang en | Select language |
| --blog-dir <path> | Override source directory; required for English docs mirror |
| --include <slug> | Process only matching slug or filename |
| --force | Regenerate and remove stale manifest entries for target articles |
| --article-jobs <n> | Article-level concurrency; chunks within one article remain serial |
| --chunk-char-limit <n> | Maximum characters per TTS chunk |
| --dry-run | Preview chunk mapping without API calls |
| --skip-upload | Skip OSS upload |
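
The --article-jobs behaviour (parallel across articles, serial within one article) can be pictured with a thread pool whose unit of work is a whole article. This is an illustrative sketch only: run_articles, synthesize_article, and the synthesize_chunk callback are hypothetical names, and the generator's real scheduling may be implemented differently.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_article(slug, chunks, synthesize_chunk):
    """Run the chunks of one article strictly in order."""
    return [
        synthesize_chunk(slug, index, text)
        for index, text in enumerate(chunks, 1)
    ]


def run_articles(articles, synthesize_chunk, article_jobs=2):
    """Process articles in parallel; chunks inside each article stay serial.

    `articles` maps slug -> list of chunk texts.
    """
    with ThreadPoolExecutor(max_workers=article_jobs) as pool:
        futures = {
            slug: pool.submit(synthesize_article, slug, chunks, synthesize_chunk)
            for slug, chunks in articles.items()
        }
        # result() re-raises any exception from the worker thread.
        return {slug: future.result() for slug, future in futures.items()}
```

Because each worker handles one article from start to finish, raising article_jobs never reorders chunks inside an article.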

Text Inspection

Before regenerating a problematic article, inspect the exact text that will be sent to TTS:

cd ../tts-blog-generator

python - <<'PY'
from pathlib import Path
import generate

slug = "mercedes-benz-g-class-history-category-industry-position"
path = Path("../Dev-Knowledge-Base/docs/LookAround") / f"{slug}.md"
text = generate.extract_text(path)
chunks = generate.chunk_text(text, limit=1200)

Path("output").mkdir(exist_ok=True)
Path("output/check_zh_gclass.txt").write_text(
    "\n\n".join(
        f"=== chunk {i:03d} / {len(chunks)} ({len(chunk)} chars) ===\n{chunk}"
        for i, chunk in enumerate(chunks, 1)
    ),
    encoding="utf-8",
)
print(len(text), [len(chunk) for chunk in chunks])
PY

Check for:

  • No mojibake markers (garbled or replacement characters where readable text should be).
  • No raw URLs, Markdown images, table separators, or reference appendices.
  • Chunk numbers, filenames, and text order line up.
  • Tables have been converted into readable sentences.
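
Part of this checklist can be automated. The sketch below is a rough heuristic, not the generator's own validation: find_narration_issues is a hypothetical helper, the regular expressions are approximations, and the only mojibake signal it looks for is the Unicode replacement character U+FFFD, which is an assumption about what garbled text looks like in practice. Chunk order and table readability still need a human read.

```python
import re

URL_RE = re.compile(r"https?://\S+")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# A Markdown table separator row such as |---|---| or | :--- | ---: |
TABLE_SEPARATOR_RE = re.compile(r"^[ \t]*\|[ \t:|-]*-{3,}[ \t:|-]*$", re.MULTILINE)


def find_narration_issues(text: str) -> list:
    """Return a list of labels for non-narrative residue found in `text`."""
    issues = []
    if "\ufffd" in text:
        issues.append("replacement character (U+FFFD)")
    if URL_RE.search(text):
        issues.append("raw URL")
    if IMAGE_RE.search(text):
        issues.append("Markdown image")
    if TABLE_SEPARATOR_RE.search(text):
        issues.append("table separator")
    return issues
```

An empty list only means none of these patterns matched; it does not prove the text is clean.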

Windows PowerShell may display Chinese files as mojibake when they are read with Get-Content; trust what Python reads with encoding="utf-8" instead.

Sync the Manifest

The generator writes:

  • output/manifest.json: combined blog and docs manifest.
  • output/docs-manifest.json: docs-only manifest for this site.
  • output/blog-manifest.json: blog-only manifest.

After docs generation, copy the docs-only manifest:

cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json

Do not copy output/manifest.json into docsAudioManifest.json; it may include blog entries.
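
The copy step can also be guarded so that a combined manifest is rejected before it reaches the site. The sketch below assumes that every docs key starts with docs/ or en/docs/ (as described under Manifest Structure) and that blog keys do not; the helper names are hypothetical.

```python
import json
import shutil
from pathlib import Path

DOCS_KEY_PREFIXES = ("docs/", "en/docs/")


def non_docs_keys(manifest: dict) -> list:
    """Return manifest keys that do not look like docs entries."""
    return sorted(k for k in manifest if not k.startswith(DOCS_KEY_PREFIXES))


def sync_docs_manifest(source: Path, targets: list) -> None:
    """Copy `source` to every target, refusing if it holds non-docs keys."""
    manifest = json.loads(source.read_text(encoding="utf-8"))
    unexpected = non_docs_keys(manifest)
    if unexpected:
        raise ValueError(f"non-docs keys in {source}: {unexpected}")
    for target in targets:
        shutil.copyfile(source, target)
```

Called with output/docs-manifest.json as the source and the two site paths as targets, this performs the same two copies as the cp commands above, and the byte-for-byte copy keeps both site manifests identical.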

Player Integration

Automatic Injection

The docs player is injected in src/theme/DocItem/Layout/index.js.

import {isLookAroundDocMetadata, getLookAroundDocSlug} from '@site/src/utils/lookAroundDocs';
import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';

const isLookAround = isLookAroundDocMetadata(metadata);
const lookAroundSlug = isLookAround ? getLookAroundDocSlug(metadata) : null;

<DocBreadcrumbs />
<DocVersionBadge />
{isLookAround && lookAroundSlug && <DocsAudioPlayer slug={lookAroundSlug} />}
{docTOC.mobile}

isLookAroundDocMetadata() checks metadata.sourceDirName === 'LookAround'.

Manual Use

For a specific MDX file, the player can also be placed manually:

import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';

<DocsAudioPlayer slug="your-doc-slug" />

The slug is the filename without extension.

Add Audio Support for Another Docs Category

  1. Add a category detection helper under src/utils/, following lookAroundDocs.js.
  2. Import it in src/theme/DocItem/Layout/index.js and add conditional rendering.
  3. Generate Chinese audio with python generate.py --type docs --blog-dir "../Dev-Knowledge-Base/docs/YourCategory".
  4. Generate English audio from the mirrored i18n/en/ directory.
  5. Merge or copy output/docs-manifest.json into both docs manifest files.

If the new category needs a different OSS prefix, update the docs OSS key logic in generate.py and keep the player keyPrefix convention aligned.

Troubleshooting and Verification

Local Duration Check

cd ../tts-blog-generator
ffprobe -v error -show_entries format=duration,size -of json output/your-file.mp3

If a long Chinese chunk is only a few dozen seconds or much shorter than similarly sized chunks, treat it as a truncated TTS result. Delete it and retry, usually with a smaller --chunk-char-limit.
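
The same ffprobe call can be scripted to compare duration against text length. This is a sketch under assumptions: ffprobe must be on PATH, and the 0.1 seconds-per-character floor is an illustrative threshold that would need tuning against known-good chunks; it is not a value taken from the generator.

```python
import json
import subprocess


def parse_ffprobe_format(output: str) -> tuple:
    """Extract (duration_seconds, size_bytes) from ffprobe's JSON output."""
    fmt = json.loads(output)["format"]
    return float(fmt["duration"]), int(fmt["size"])


def probe_audio(path: str) -> tuple:
    """Run the ffprobe command from this section and parse its output."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration,size",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_ffprobe_format(result.stdout)


def looks_truncated(chars: int, duration: float,
                    min_seconds_per_char: float = 0.1) -> bool:
    """Flag audio that is implausibly short for its text length (heuristic)."""
    return duration < chars * min_seconds_per_char
```

With this floor, a 1200-character chunk whose audio lasts only 30 seconds would be flagged, while one lasting several minutes would pass.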

OSS URL Check

After adding or regenerating audio, probe the target article's manifest URLs and confirm they are reachable and return audio/mpeg.
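
One possible way to script this probe uses only the Python standard library. It is a sketch, not an existing project tool: probe_url and is_audio_response are hypothetical names, a HEAD request is assumed to be accepted by the OSS endpoint, and the optional Referer header is included only because anti-hotlinking rules may reject requests without one.

```python
import urllib.request


def is_audio_response(status: int, content_type: str) -> bool:
    """True when the response is 200 and its media type is audio/mpeg."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "audio/mpeg"


def probe_url(url: str, referer: str = "", timeout: float = 10.0) -> bool:
    """Send a HEAD request and check status and Content-Type.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    headers = {"Referer": referer} if referer else {}
    request = urllib.request.Request(url, method="HEAD", headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return is_audio_response(response.status, content_type)
```

Looping probe_url over the urls list of the target article's manifest entry covers every chunk of that article.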

Build Verification

cd ../Dev-Knowledge-Base
npm run build

The build must pass for both zh-Hans and en.

Test Checklist

  • Extracted text was inspected with UTF-8 and contains no mojibake or non-narrative material.
  • Chinese audio was generated with python generate.py --type docs --lang zh ....
  • English audio was generated with python generate.py --type docs --lang en --blog-dir "...".
  • Long Chinese articles use a smaller --chunk-char-limit when needed.
  • output/docs-manifest.json was synced to both src/data/docsAudioManifest.json and static/audio/docs/manifest.json.
  • src/data/docsAudioManifest.json and static/audio/docs/manifest.json are identical.
  • All target article OSS audio URLs are reachable.
  • npm run build passes.
  • The player appears below breadcrumbs on LookAround articles.
  • Playback starts correctly.
  • Multi-chunk audio advances in _001, _002 order.
  • Progress bar total duration is reasonable.
  • Playback-rate switching works.
  • Keyboard shortcuts work: Space to play/pause, ←/→ to seek ±5s, Shift+←/→ to seek ±30s.
  • Keyboard shortcuts do not interfere when typing in search, comments, or other input fields.
  • Non-LookAround docs do not show the player.
  • Locale switching loads the matching language audio.
  • OSS anti-hotlinking allows production and localhost domains.