Docs Audio Workflow (Docs TTS)
Documentation categories such as docs/LookAround/ support TTS narration audio. Unlike blog audio, the docs audio player is automatically injected into the document layout, so individual MDX files do not need to import a player component.
Current docs category with audio enabled:
| Category | Directory | Article Count | Chinese Audio | English Audio |
|---|---|---|---|---|
| LookAround | docs/LookAround/ | 8 | Generated | Generated |
Architecture Overview
Docs audio reuses the blog audio player while keeping a separate docs manifest and separate OSS path. This preserves the same playback experience without mixing blog and docs manifest entries.
Component Relationship
BlogAudioPlayer (core playback logic)
├── BlogPostPage auto-injects: <BlogAudioPlayer slug={audioSlug} />
│   ├── slug derived from permalink via getBlogAudioSlug(metadata)
│   └── Reads blogAudioManifest.json by default
│
└── DocsAudioPlayer (thin wrapper)
    └── <DocsAudioPlayer slug="xxx" />
        └── Passes docsAudioManifest.json + keyPrefix="docs/"
Key Files
| File | Purpose |
|---|---|
| src/components/BlogAudioPlayer/index.js | Core player component with manifest and keyPrefix props |
| src/components/DocsAudioPlayer/index.js | Docs audio wrapper that passes the docs manifest and prefix |
| src/theme/BlogPostPage/index.js | Blog page layout swizzle that auto-injects the blog player |
| src/utils/blogAudio.js | Derives blog audio slug from permalink |
| src/utils/lookAroundDocs.js | LookAround document detection helpers |
| src/theme/DocItem/Layout/index.js | Document layout that auto-injects the docs player |
| src/data/blogAudioManifest.json | Blog audio manifest imported by the site |
| src/data/docsAudioManifest.json | Docs audio manifest imported by the site |
| static/audio/blog/manifest.json | Backup blog manifest access path |
| static/audio/docs/manifest.json | Backup docs manifest access path |
Manifest Structure
docsAudioManifest.json uses these keys:
- Chinese: docs/{slug}, for example docs/omega-horizontal-vertical-analysis
- English: en/docs/{slug}, for example en/docs/omega-horizontal-vertical-analysis
Each entry contains:
{
"docs/omega-horizontal-vertical-analysis": {
"urls": [
"https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_001.mp3",
"https://oss.nevergpdzy.com/Audio/docs/omega-horizontal-vertical-analysis_002.mp3"
],
"voice": "茉莉",
"generatedAt": "2026-05-02T06:10:34.705519+00:00"
}
}
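The key convention and entry shape above can be checked mechanically. The following is a minimal sketch, not part of the generator: manifest_key and validate_entry are hypothetical helper names, and the rule that every URL ends in .mp3 is an assumption drawn from the example entry.

```python
from datetime import datetime


def manifest_key(slug: str, lang: str) -> str:
    """Return the docs manifest key: docs/{slug} (zh) or en/docs/{slug} (en)."""
    if lang == "zh":
        return f"docs/{slug}"
    if lang == "en":
        return f"en/docs/{slug}"
    raise ValueError(f"unsupported language: {lang!r}")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one manifest entry (empty if none)."""
    problems = []
    urls = entry.get("urls")
    if not isinstance(urls, list) or not urls:
        problems.append("urls must be a non-empty list")
    elif not all(isinstance(u, str) and u.endswith(".mp3") for u in urls):
        # Assumption: every chunk is published as an .mp3 file.
        problems.append("every url must be a string ending in .mp3")
    if not isinstance(entry.get("voice"), str):
        problems.append("voice must be a string")
    try:
        datetime.fromisoformat(entry.get("generatedAt", ""))
    except (TypeError, ValueError):
        problems.append("generatedAt must be an ISO 8601 timestamp")
    return problems
```

Running validate_entry over every value in docsAudioManifest.json before committing it is one way to catch a hand-edited or truncated manifest early.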
OSS Paths
| Content | OSS Path |
|---|---|
| Chinese docs audio | Audio/docs/{slug}.mp3 or Audio/docs/{slug}_001.mp3 |
| English docs audio | Audio/docs/en/en_{slug}.mp3 or Audio/docs/en/en_{slug}_001.mp3 |
| Docs manifest | Audio/docs/manifest.json |
Public URLs in docs audio manifests must use https://oss.nevergpdzy.com/; do not reintroduce the retired picture domain into src/data/docsAudioManifest.json or static/audio/docs/manifest.json.
These paths are independent from blog audio under Audio/blog/.
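As an illustration of the path table and the domain rule, the sketch below derives an OSS key from a slug and flags manifest URLs that sit outside the required domain. The helper names (oss_key, public_url, off_domain_urls) are hypothetical, and treating the public URL as the base domain plus the OSS key is inferred from the example manifest entry rather than from the generator source.

```python
from typing import Optional

PUBLIC_BASE = "https://oss.nevergpdzy.com/"


def oss_key(slug: str, lang: str, chunk: Optional[int] = None) -> str:
    """Build the OSS object key for a docs audio file.

    chunk=None gives the single-file form; chunk=1 gives the _001 form.
    """
    if lang == "zh":
        stem = f"Audio/docs/{slug}"
    elif lang == "en":
        stem = f"Audio/docs/en/en_{slug}"
    else:
        raise ValueError(f"unsupported language: {lang!r}")
    suffix = "" if chunk is None else f"_{chunk:03d}"
    return f"{stem}{suffix}.mp3"


def public_url(key: str) -> str:
    """Assumed mapping: public URL = required domain + OSS key."""
    return PUBLIC_BASE + key


def off_domain_urls(manifest: dict) -> list[str]:
    """Return every manifest URL that does not start with the required domain."""
    return [
        url
        for entry in manifest.values()
        for url in entry.get("urls", [])
        if not url.startswith(PUBLIC_BASE)
    ]
```

An empty result from off_domain_urls for both docs manifest files suggests that no retired-domain URL has slipped back in.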
Generation Strategy
Docs articles are usually longer than blog posts and often contain tables, references, and image resource sections. The generator's docs mode handles this by:
- Extracting readable text while removing frontmatter, code blocks, JSX, images, raw URLs, source/reference sections, and image-resource appendices.
- Converting Markdown tables before normal Markdown cleanup so rows become readable sentences instead of pipe-delimited text.
- Splitting at paragraph boundaries, with a default --chunk-char-limit of 2400; reduce this to values such as 1200 for problematic long Chinese articles.
- Applying concurrency only at the article level. --article-jobs can process multiple articles in parallel, but chunks inside one article always run serially so audio order strictly follows text order.
- Validating each generated chunk with ffprobe; suspiciously short or truncated audio is deleted and retried.
- Refusing to publish a manifest when any article in the run fails, which prevents partially missing audio from becoming the live manifest.
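To make the paragraph-boundary splitting concrete, here is a minimal sketch of one way it could work. It is not the generator's own chunk_text: chunk_paragraphs is a hypothetical name, paragraphs are assumed to be separated by blank lines, and a single paragraph longer than the limit is simply kept as its own oversized chunk.

```python
def chunk_paragraphs(text: str, limit: int = 2400) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most `limit` characters.

    Paragraph order is preserved, so chunk N always precedes chunk N+1
    in the original text.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Assumption: an oversized paragraph is not split further here.
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Lowering the limit produces more, shorter chunks without reordering any text, which is why a smaller --chunk-char-limit is a safe retry strategy for long articles.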
Chinese audio uses the 茉莉 voice with a slow, gentle, soothing prompt and natural pauses. English audio uses Chloe with the original natural, fluent narration prompt. Do not use atempo or other MP3 post-processing for speed changes; pacing should come from the TTS prompt and stable chunking.
Generate Audio
Generate All Chinese LookAround Audio
cd ../tts-blog-generator
python generate.py --type docs --lang zh --force --article-jobs 2
By default, docs mode scans ../Dev-Knowledge-Base/docs/LookAround.
Generate All English LookAround Audio
cd ../tts-blog-generator
python generate.py --type docs --lang en \
--blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
--force --article-jobs 2
English audio uses the Chloe voice, en_ filenames, and the OSS directory Audio/docs/en/.
Regenerate One Article
cd ../tts-blog-generator
# Chinese
python generate.py --type docs --lang zh \
--include mercedes-benz-g-class-history-category-industry-position \
--force --article-jobs 1
# English
python generate.py --type docs --lang en \
--blog-dir "../Dev-Knowledge-Base/i18n/en/docusaurus-plugin-content-docs/current/LookAround" \
--include mercedes-benz-g-class-history-category-industry-position \
--force --article-jobs 1
If a long Chinese article produces garbled-sounding or suspicious audio around a specific timestamp, inspect the extracted text first, then regenerate with a smaller chunk size:
python generate.py --type docs --lang zh \
--include mercedes-benz-g-class-history-category-industry-position \
--force --article-jobs 1 --chunk-char-limit 1200
The G-Class Chinese audio was regenerated this way: 15 smaller chunks replaced the original 7 long chunks.
Common Arguments
| Argument | Meaning |
|---|---|
| --type docs | Generate docs audio; defaults to docs/LookAround/ |
| --lang zh / --lang en | Select language |
| --blog-dir <path> | Override source directory; required for English docs mirror |
| --include <slug> | Process only matching slug or filename |
| --force | Regenerate and remove stale manifest entries for target articles |
| --article-jobs <n> | Article-level concurrency; chunks within one article remain serial |
| --chunk-char-limit <n> | Maximum characters per TTS chunk |
| --dry-run | Preview chunk mapping without API calls |
| --skip-upload | Skip OSS upload |
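When the generator is driven from a script, the documented flags can be assembled programmatically. The sketch below is a hypothetical wrapper (build_command is not part of generate.py) that only emits flags listed in the table above; it builds the argument list and leaves execution to the caller.

```python
import shlex
from typing import Optional


def build_command(
    lang: str,
    *,
    blog_dir: Optional[str] = None,
    include: Optional[str] = None,
    force: bool = False,
    article_jobs: Optional[int] = None,
    chunk_char_limit: Optional[int] = None,
    dry_run: bool = False,
    skip_upload: bool = False,
) -> list[str]:
    """Assemble a docs-mode generate.py invocation from the documented flags."""
    cmd = ["python", "generate.py", "--type", "docs", "--lang", lang]
    if blog_dir:
        cmd += ["--blog-dir", blog_dir]
    if include:
        cmd += ["--include", include]
    if force:
        cmd.append("--force")
    if article_jobs is not None:
        cmd += ["--article-jobs", str(article_jobs)]
    if chunk_char_limit is not None:
        cmd += ["--chunk-char-limit", str(chunk_char_limit)]
    if dry_run:
        cmd.append("--dry-run")
    if skip_upload:
        cmd.append("--skip-upload")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command("zh", force=True, article_jobs=2, dry_run=True)))
```

Combining --dry-run with a smaller --chunk-char-limit is a cheap way to preview the chunk mapping before spending API calls.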
Text Inspection
Before regenerating a problematic article, inspect the exact text that will be sent to TTS:
cd ../tts-blog-generator
python - <<'PY'
from pathlib import Path
import generate
slug = "mercedes-benz-g-class-history-category-industry-position"
path = Path("../Dev-Knowledge-Base/docs/LookAround") / f"{slug}.md"
text = generate.extract_text(path)
chunks = generate.chunk_text(text, limit=1200)
Path("output").mkdir(exist_ok=True)
Path("output/check_zh_gclass.txt").write_text(
    "\n\n".join(
        f"=== chunk {i:03d} / {len(chunks)} ({len(chunk)} chars) ===\n{chunk}"
        for i, chunk in enumerate(chunks, 1)
    ),
    encoding="utf-8",
)
print(len(text), [len(chunk) for chunk in chunks])
PY
Check for:
- No mojibake markers such as �, 鈥, 盲, or 鏂.
- No raw URLs, Markdown images, table separators, or reference appendices.
- Chunk numbers, filenames, and text order line up.
- Tables have been converted into readable sentences.
Windows PowerShell may display Chinese files as mojibake when using Get-Content; trust Python reads with encoding="utf-8".
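Part of this checklist can be automated. The sketch below scans one extracted chunk for the mojibake markers listed above and for leftover raw URLs, Markdown images, and table separator rows. inspect_chunk and its regular expressions are illustrative assumptions, not part of the generator, and they will not catch every kind of residue (reference appendices, for instance, still need a manual read).

```python
import re

# Markers taken from the checklist; U+FFFD is the Unicode replacement character.
MOJIBAKE_MARKERS = ("\ufffd", "鈥", "盲", "鏂")
URL_RE = re.compile(r"https?://\S+")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# A Markdown table separator row such as |---|---| or --- | :---:
TABLE_SEP_RE = re.compile(
    r"^\s*\|?(\s*:?-{3,}:?\s*\|)+\s*(:?-{3,}:?)?\s*$", re.MULTILINE
)


def inspect_chunk(chunk: str) -> list[str]:
    """Return human-readable findings for one chunk (empty list if clean)."""
    findings = []
    for marker in MOJIBAKE_MARKERS:
        if marker in chunk:
            findings.append(f"mojibake marker {marker!r}")
    if URL_RE.search(chunk):
        findings.append("raw URL")
    if IMAGE_RE.search(chunk):
        findings.append("Markdown image")
    if TABLE_SEP_RE.search(chunk):
        findings.append("table separator")
    return findings
```

Applied to each item returned by generate.chunk_text in the inspection script, this gives a quick pass/fail signal before the text is sent to TTS.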
Sync the Manifest
The generator writes:
- output/manifest.json: combined blog and docs manifest.
- output/docs-manifest.json: docs-only manifest for this site.
- output/blog-manifest.json: blog-only manifest.
After docs generation, copy the docs-only manifest:
cp output/docs-manifest.json ../Dev-Knowledge-Base/src/data/docsAudioManifest.json
cp output/docs-manifest.json ../Dev-Knowledge-Base/static/audio/docs/manifest.json
Do not copy output/manifest.json into docsAudioManifest.json; it may include blog entries.
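The copy step can be guarded so that a combined manifest is never synced by mistake. This is a minimal sketch under one stated assumption: docs keys always start with docs/ or en/docs/ (as described under Manifest Structure), so any other key is treated as a non-docs entry. sync_docs_manifest is a hypothetical helper, not an existing script.

```python
import json
import shutil
from pathlib import Path

DOCS_KEY_PREFIXES = ("docs/", "en/docs/")


def sync_docs_manifest(source: Path, targets: list[Path]) -> None:
    """Copy a docs-only manifest to every target and verify the copies match."""
    manifest = json.loads(source.read_text(encoding="utf-8"))
    # Assumption: any key without a docs prefix belongs to another manifest.
    foreign = [key for key in manifest if not key.startswith(DOCS_KEY_PREFIXES)]
    if foreign:
        raise ValueError(f"non-docs keys found, refusing to sync: {foreign[:5]}")
    expected = source.read_bytes()
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        if target.read_bytes() != expected:
            raise RuntimeError(f"copy verification failed for {target}")


if __name__ == "__main__":
    site = Path("../Dev-Knowledge-Base")
    sync_docs_manifest(
        Path("output/docs-manifest.json"),
        [
            site / "src/data/docsAudioManifest.json",
            site / "static/audio/docs/manifest.json",
        ],
    )
```

Because both targets are written from the same bytes and verified, the two manifest files end up identical, which the test checklist also requires.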
Player Integration
Automatic Injection
The docs player is injected in src/theme/DocItem/Layout/index.js.
import {isLookAroundDocMetadata, getLookAroundDocSlug} from '@site/src/utils/lookAroundDocs';
import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';
const isLookAround = isLookAroundDocMetadata(metadata);
const lookAroundSlug = isLookAround ? getLookAroundDocSlug(metadata) : null;
<DocBreadcrumbs />
<DocVersionBadge />
{isLookAround && lookAroundSlug && <DocsAudioPlayer slug={lookAroundSlug} />}
{docTOC.mobile}
isLookAroundDocMetadata() checks metadata.sourceDirName === 'LookAround'.
Manual Use
For a specific MDX file, the player can also be placed manually:
import DocsAudioPlayer from '@site/src/components/DocsAudioPlayer';
<DocsAudioPlayer slug="your-doc-slug" />
The slug is the filename without extension.
Add Audio Support for Another Docs Category
- Add a category detection helper under src/utils/, following lookAroundDocs.js.
- Import it in src/theme/DocItem/Layout/index.js and add conditional rendering.
- Generate Chinese audio with python generate.py --type docs --blog-dir "../Dev-Knowledge-Base/docs/YourCategory".
- Generate English audio from the mirrored i18n/en/ directory.
- Merge or copy output/docs-manifest.json into both docs manifest files.
If the new category needs a different OSS prefix, update the docs OSS key logic in generate.py and keep the player keyPrefix convention aligned.
Troubleshooting and Verification
Local Duration Check
cd ../tts-blog-generator
ffprobe -v error -show_entries format=duration,size -of json output/your-file.mp3
If a long Chinese chunk is only a few dozen seconds or much shorter than similarly sized chunks, treat it as a truncated TTS result. Delete it and retry, usually with a smaller --chunk-char-limit.
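The comparison against similarly sized chunks can be scripted. The sketch below wraps the same ffprobe command and flags chunks whose seconds-per-character rate falls well below the median of the article. The function names and the 0.5 ratio are assumptions chosen for illustration; the generator's own validation thresholds may differ.

```python
import json
import statistics
import subprocess


def parse_probe(stdout: str) -> dict:
    """Parse ffprobe JSON output; ffprobe reports duration and size as strings."""
    fmt = json.loads(stdout)["format"]
    return {"duration": float(fmt["duration"]), "size": int(fmt["size"])}


def probe(path: str) -> dict:
    """Run ffprobe on one file and return its duration (s) and size (bytes)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration,size",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_probe(result.stdout)


def suspicious_chunks(chunks: list[tuple[str, int, float]], ratio: float = 0.5) -> list[str]:
    """Flag likely truncated chunks.

    chunks: (name, character_count, duration_seconds) per chunk.
    A chunk is flagged when its seconds-per-character rate is below
    `ratio` times the median rate (the ratio is an assumed threshold).
    """
    rates = {name: duration / chars for name, chars, duration in chunks if chars > 0}
    if not rates:
        return []
    threshold = ratio * statistics.median(rates.values())
    return [name for name, rate in rates.items() if rate < threshold]
```

Character counts come from the chunk lengths printed by the text inspection script, so the two checks can share one pass over the article.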
OSS URL Check
After adding or regenerating audio, probe the target article's manifest URLs and confirm they are reachable and return audio/mpeg.
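One way to run this probe is a small script that sends a HEAD request to each URL in a manifest entry. The sketch below uses only the standard library; head_content_type and check_entry are hypothetical names, and it assumes the storage endpoint answers HEAD requests. If anti-hotlinking rejects requests without a Referer header, the fetcher may need to set one.

```python
from typing import Callable
from urllib.request import Request, urlopen


def head_content_type(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Send a HEAD request and return (HTTP status, Content-Type without parameters)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get_content_type()


def check_entry(
    entry: dict,
    fetch: Callable[[str], tuple[int, str]] = head_content_type,
) -> list[tuple[str, str]]:
    """Return (url, reason) for every URL that is unreachable or not audio/mpeg."""
    failures = []
    for url in entry["urls"]:
        try:
            status, content_type = fetch(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failures.append((url, str(exc)))
            continue
        if status != 200 or content_type != "audio/mpeg":
            failures.append((url, f"status={status} content-type={content_type}"))
    return failures
```

The fetch parameter exists so the check can be exercised offline with a stub; in normal use, calling check_entry(manifest["docs/your-doc-slug"]) and getting an empty list means every chunk URL responded as expected.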
Build Verification
cd ../Dev-Knowledge-Base
npm run build
The build must pass for both zh-Hans and en.
Test Checklist
- Extracted text was inspected with UTF-8 and contains no mojibake or non-narrative material.
- Chinese audio was generated with python generate.py --type docs --lang zh ....
- English audio was generated with python generate.py --type docs --lang en --blog-dir "...".
- Long Chinese articles use a smaller --chunk-char-limit when needed.
- docsAudioManifest.json was synced from output/docs-manifest.json into both src/data/ and static/audio/docs/.
- src/data/docsAudioManifest.json and static/audio/docs/manifest.json are identical.
- All target article OSS audio URLs are reachable.
- npm run build passes.
- The player appears below breadcrumbs on LookAround articles.
- Playback starts correctly.
- Multi-chunk audio advances in _001, _002 order.
- Progress bar total duration is reasonable.
- Playback-rate switching works.
- Keyboard shortcuts work: Space to play/pause, ←/→ to seek ±5s, Shift+←/→ to seek ±30s.
- Keyboard shortcuts do not interfere when typing in search, comments, or other input fields.
- Non-LookAround docs do not show the player.
- Locale switching loads the matching language audio.
- OSS anti-hotlinking allows production and localhost domains.