// Gist by @johnlindquist, created October 31, 2025
import "@johnlindquist/kit"
import { GoogleGenAI, createUserContent, createPartFromUri } from '@google/genai';
import { createHash } from 'node:crypto';
import { readFile, writeFile, stat } from 'node:fs/promises';
// ─── Configuration ───────────────────────────────────────────────────────────
console.log("πŸ”§ Initializing configuration...");
const GEMINI_API_KEY = await env("GEMINI_API_KEY");
// Use Flash for detailed, high-quality extraction
const MODEL_EXTRACT = "gemini-2.5-flash";
// Use Flash-Lite for cost-effective context compression
const MODEL_COMPRESS = "gemini-2.5-flash-lite";
const CHUNK_DURATION_SECONDS = 600; // 10 minutes
const OUT_DIR = kenvPath("workshop-harvest-sequential");
const CACHE_DIR = path.join(OUT_DIR, ".cache");
const MAX_EXTRACTION_ATTEMPTS = 3;
const EXTRACTION_MAX_OUTPUT_TOKENS = [4096, 3072, 2048];
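// The token budget shrinks on each retry: attempt 1 gets 4096, attempt 2 gets 3072,
// attempt 3 gets 2048, nudging the model toward a complete, parseable response
// (see <RETRY_CONTEXT> in buildExtractionPrompt).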
// Silence Detection Configuration
const MINIMUM_SILENCE_DURATION = 120; // 2 minutes
// Noise level threshold. -50dB is a common starting point for relatively quiet recordings.
// If breaks are not detected, try lowering this (e.g., -60dB). If normal speech is detected as silence, try raising it (e.g., -40dB).
const SILENCE_THRESHOLD_DB = "-50dB";
console.log(`πŸ“‹ Configuration loaded:`);
console.log(` - Extraction Model: ${MODEL_EXTRACT}`);
console.log(` - Compression Model: ${MODEL_COMPRESS}`);
console.log(` - Chunk Duration: ${CHUNK_DURATION_SECONDS}s (${CHUNK_DURATION_SECONDS / 60} minutes)`);
console.log(` - Silence Detection: Skip silences > ${MINIMUM_SILENCE_DURATION}s at ${SILENCE_THRESHOLD_DB}`);
console.log(` - Output Directory: ${OUT_DIR}`);
// ──────────────────────────────────────────────────────────────────────────────
if (!GEMINI_API_KEY) {
console.error("❌ ERROR: GEMINI_API_KEY environment variable not set");
await div(md(`❌ ERROR: GEMINI_API_KEY environment variable not set.`));
exit();
}
console.log("βœ… GEMINI_API_KEY found");
console.log("πŸ€– Initializing Gemini AI client...");
const ai = new GoogleGenAI({ apiKey: GEMINI_API_KEY });
console.log("βœ… Gemini AI client initialized");
// Ensure ffmpeg and ffprobe are installed (ffprobe ships with ffmpeg and is needed for duration probing)
async function ensureFfmpeg() {
console.log("πŸ” Checking for ffmpeg/ffprobe installation...");
try {
const ffmpegVersion = await execa("ffmpeg", ["-version"]);
console.log("βœ… ffmpeg found");
console.log(` Version info: ${ffmpegVersion.stdout.split('\n')[0]}`);
await execa("ffprobe", ["-version"]);
console.log("βœ… ffprobe found");
}
catch {
console.error("❌ ERROR: Missing dependency: ffmpeg/ffprobe");
await div(md(`❌ ERROR: Missing dependency: ffmpeg (includes ffprobe). Please install it (e.g., brew install ffmpeg)`));
exit();
}
}
await ensureFfmpeg();
await ensureDir(CACHE_DIR);
// Helper function to get video duration using ffprobe
async function getVideoDuration(videoPath: string): Promise<number> {
console.log("πŸ“ Getting video duration...");
try {
const { stdout } = await execa("ffprobe", [
"-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
videoPath
]);
const duration = parseFloat(stdout.trim());
console.log(`βœ… Video duration: ${duration.toFixed(2)}s (${(duration / 60).toFixed(2)} minutes)`);
return duration;
} catch (error) {
console.error(`❌ ERROR: Failed to get video duration`);
console.error(` Error: ${error.message}`);
throw new Error("Failed to get video duration.");
}
}
// ─── Silence Detection Helpers ───────────────────────────────────────────────
/**
* Uses FFmpeg's silencedetect filter to identify long periods of silence.
* This analyzes the audio track without re-encoding the video.
*/
async function detectSignificantSilences(videoPath, minDuration, threshold) {
console.log(`\n🀫 [Silence Detection] Analyzing video for silences longer than ${minDuration}s at ${threshold}...`);
console.log(` This analyzes the audio track and is relatively fast compared to video processing.`);
const silences = [];
// Regex to capture the output from FFmpeg's silencedetect filter
const silenceStartRegex = /silence_start: ([\d.]+)/;
const silenceEndRegex = /silence_end: ([\d.]+) \| silence_duration: ([\d.]+)/;
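// Example stderr lines emitted by silencedetect (the pointer value varies per run):
//   [silencedetect @ 0x7f...] silence_start: 1000.00
//   [silencedetect @ 0x7f...] silence_end: 1130.00 | silence_duration: 130.00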
let currentSilence = null;
try {
const detectionStartTime = Date.now();
// Run FFmpeg with silencedetect filter. We discard the output (-f null -).
// The detection results are printed to stderr.
const proc = execa("ffmpeg", [
"-hide_banner",
"-i", videoPath,
"-af", `silencedetect=n=${threshold}:d=${minDuration}`,
"-f", "null",
"-"
]);
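// For manual testing, the equivalent shell command is (hypothetical input path):
//   ffmpeg -hide_banner -i workshop.mp4 -af silencedetect=n=-50dB:d=120 -f null -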
// We read the stderr stream in real-time to capture the detection output robustly
proc.stderr.on('data', (data) => {
const lines = data.toString().split('\n');
for (const line of lines) {
// Filter lines relevant to silence detection
if (line.includes('[silencedetect')) {
const startMatch = line.match(silenceStartRegex);
const endMatch = line.match(silenceEndRegex);
if (startMatch) {
if (currentSilence) {
console.warn(` ⚠️ [Silence Detection] Found new silence start before previous end. Dropping previous.`);
}
currentSilence = { start: parseFloat(startMatch[1]) };
} else if (endMatch && currentSilence) {
currentSilence.end = parseFloat(endMatch[1]);
currentSilence.duration = parseFloat(endMatch[2]);
silences.push(currentSilence);
console.log(` 🀫 Found silence: Start=${currentSilence.start.toFixed(2)}s, End=${currentSilence.end.toFixed(2)}s, Duration=${currentSilence.duration.toFixed(2)}s`);
currentSilence = null;
}
}
}
});
await proc; // Wait for the FFmpeg process to finish
const detectionDuration = ((Date.now() - detectionStartTime) / 1000).toFixed(2);
console.log(`βœ… [Silence Detection] Analysis complete in ${detectionDuration}s.`);
console.log(` Found ${silences.length} significant periods of silence.`);
await hide(); // Hide the Kit interface
return silences;
} catch (error) {
// FFmpeg often returns a non-zero exit code when using -f null, even if the detection worked.
// We check if the process completed (even with an error code).
if (error.killed === false) {
console.log(`⚠️ [Silence Detection] FFmpeg completed with exit code ${error.exitCode} (This is often expected when using -f null).`);
console.log(`βœ… [Silence Detection] Analysis finished.`);
console.log(` Found ${silences.length} significant periods of silence.`);
await hide();
return silences;
}
console.error(`❌ [Silence Detection] Failed to analyze audio for silence.`);
console.error(` Error: ${error.message}`);
if (error.stderr) {
console.error(` FFmpeg stderr (last 20 lines): \n${error.stderr.split('\n').slice(-20).join('\n')}`);
}
await hide();
throw new Error("Silence detection failed.");
}
}
/**
* Calculates the inverse of the silence periods to determine the active segments.
*/
function calculateActiveSegments(videoDuration, silences) {
console.log(`\nπŸ“Š [Segmentation] Calculating active (non-silent) segments...`);
const activeSegments = [];
let currentPosition = 0;
// Ensure silences are sorted by start time
silences.sort((a, b) => a.start - b.start);
// Iterate through detected silences and define the segments in between them
for (const silence of silences) {
// Check if there is a gap between the current position and the start of the silence
// Use a small tolerance (0.1s) to avoid creating tiny segments due to floating point inaccuracies
if (silence.start > currentPosition + 0.1) {
// There is an active segment before this silence starts
activeSegments.push({
start: currentPosition,
end: silence.start,
duration: silence.start - currentPosition
});
console.log(` πŸƒ Active Segment: Start=${currentPosition.toFixed(2)}s, End=${silence.start.toFixed(2)}s`);
}
// Move the position to the end of the silence
currentPosition = Math.max(currentPosition, silence.end);
}
// Check for an active segment after the last silence until the end of the video
if (videoDuration > currentPosition + 0.1) {
activeSegments.push({
start: currentPosition,
end: videoDuration,
duration: videoDuration - currentPosition
});
console.log(` πŸƒ Active Segment: Start=${currentPosition.toFixed(2)}s, End=${videoDuration.toFixed(2)}s (End of Video)`);
}
const totalActiveDuration = activeSegments.reduce((sum, segment) => sum + segment.duration, 0);
const totalSilentDuration = videoDuration - totalActiveDuration;
console.log(`βœ… [Segmentation] Calculation complete.`);
console.log(` Total Video Duration: ${(videoDuration / 60).toFixed(2)} minutes`);
console.log(` Total Active Duration: ${(totalActiveDuration / 60).toFixed(2)} minutes (to be processed)`);
console.log(` Total Silence Skipped: ${(totalSilentDuration / 60).toFixed(2)} minutes`);
return activeSegments;
}
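// Worked example: for a 3600s video with one detected silence from 1000s to 1130s,
// this yields two active segments: [0, 1000] and [1130, 3600].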
function logCachedActiveSegments(activeSegments, videoDuration) {
console.log(`\nπŸ“Š [Segmentation] Using cached active (non-silent) segments...`);
let totalActiveDuration = 0;
activeSegments.forEach((segment, index) => {
totalActiveDuration += segment.duration;
const isEndOfVideo = Math.abs(segment.end - videoDuration) < 0.1;
const endLabel = isEndOfVideo ? " (End of Video)" : "";
console.log(` πŸƒ Active Segment: Start=${segment.start.toFixed(2)}s, End=${segment.end.toFixed(2)}s${endLabel}`);
});
const totalSilentDuration = videoDuration - totalActiveDuration;
console.log(`βœ… [Segmentation] Cached calculation previously completed.`);
console.log(` Total Video Duration: ${(videoDuration / 60).toFixed(2)} minutes`);
console.log(` Total Active Duration: ${(totalActiveDuration / 60).toFixed(2)} minutes (to be processed)`);
console.log(` Total Silence Skipped: ${(totalSilentDuration / 60).toFixed(2)} minutes`);
}
function buildProcessingChunks(activeSegments) {
console.log("πŸ“Š Calculating processing chunks based on active segments...");
const chunks = [];
let chunkIndex = 1;
for (let i = 0; i < activeSegments.length; i++) {
const segment = activeSegments[i];
console.log(` Processing Active Segment ${i + 1}/${activeSegments.length} (Duration: ${segment.duration.toFixed(2)}s)`);
let segmentPosition = 0;
while (segmentPosition < segment.duration) {
const duration = Math.min(CHUNK_DURATION_SECONDS, segment.duration - segmentPosition);
if (duration < 1) {
segmentPosition += duration;
continue;
}
const startTimeInVideo = segment.start + segmentPosition;
const chunk = {
index: chunkIndex,
activeSegmentIndex: i + 1,
startTime: startTimeInVideo,
duration: duration,
name: `chunk_${String(chunkIndex).padStart(4, '0')}`
};
chunks.push(chunk);
segmentPosition += duration;
chunkIndex++;
}
}
console.log(`βœ… Calculated ${chunks.length} total chunks for processing.`);
return chunks;
}
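// Worked example: with CHUNK_DURATION_SECONDS = 600, the [0, 1000] active segment from
// the example above becomes two chunks, starting at 0s (600s) and 600s (400s); chunk
// start times always refer to the original video's timeline, not the segment's.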
// ──────────────────────────────────────────────────────────────────────────────
// 1. Select Video
async function selectVideo() {
console.log("πŸ“Ή Prompting for video file selection...");
const videoPath = await path({
hint: "Select your workshop .mp4",
})
if (!videoPath || !videoPath.endsWith(".mp4")) {
console.error(`❌ Invalid file selected: ${videoPath}`);
await div(md(`❌ Please select a valid .mp4 file.`));
exit();
}
console.log(`βœ… Video selected: ${videoPath}`);
return videoPath;
}
const inputPath = await selectVideo();
// Get video duration
const videoDuration = await getVideoDuration(inputPath);
// Setup Job Directory
console.log("πŸ“ Setting up job directory...");
const jobName = path.parse(inputPath).name + "-spike-" + Date.now();
const jobDir = path.join(OUT_DIR, jobName);
const chunksDir = path.join(jobDir, "chunks");
console.log(` Job Name: ${jobName}`);
console.log(` Job Directory: ${jobDir}`);
console.log(` Chunks Directory: ${chunksDir}`);
await ensureDir(chunksDir);
console.log("βœ… Directories created");
// 2. Determine Chunks
// 2a. Detect Silence and Calculate Active Segments
const videoStats = await stat(inputPath);
const preprocessingCacheKey = computePreprocessingCacheKey(inputPath, videoStats);
const preprocessingCachePath = path.join(CACHE_DIR, `${preprocessingCacheKey}.json`);
let silences;
let activeSegments;
let chunksToProcess;
let usingCachedPreprocessing = false;
if (await pathExists(preprocessingCachePath)) {
try {
const cachedRaw = await readFile(preprocessingCachePath, "utf-8");
const cached = JSON.parse(cachedRaw);
if (Array.isArray(cached?.silences) && Array.isArray(cached?.activeSegments) && Array.isArray(cached?.chunks)) {
silences = cached.silences;
activeSegments = cached.activeSegments;
chunksToProcess = cached.chunks;
usingCachedPreprocessing = true;
console.log(`♻️ [Cache] Loaded cached preprocessing results from ${preprocessingCachePath}`);
} else {
console.warn(`⚠️ [Cache] Cached preprocessing missing required fields. Recomputing...`);
}
} catch (error) {
console.warn(`⚠️ [Cache] Failed to read preprocessing cache (${error.message}). Recomputing...`);
}
}
if (!usingCachedPreprocessing) {
silences = await detectSignificantSilences(inputPath, MINIMUM_SILENCE_DURATION, SILENCE_THRESHOLD_DB);
activeSegments = calculateActiveSegments(videoDuration, silences);
if (!activeSegments || activeSegments.length === 0) {
console.error("❌ [Segmentation] No active segments found. The video might be entirely silent or the threshold is too strict.");
await div(md(`❌ Processing stopped. No active audio segments found. Check silence detection parameters (e.g., adjust SILENCE_THRESHOLD_DB) or input file.`));
exit();
}
chunksToProcess = buildProcessingChunks(activeSegments);
const cachePayload = {
version: 1,
createdAt: new Date().toISOString(),
video: {
path: inputPath,
size: videoStats.size,
mtimeMs: videoStats.mtimeMs,
},
config: {
chunkDurationSeconds: CHUNK_DURATION_SECONDS,
minimumSilenceDuration: MINIMUM_SILENCE_DURATION,
silenceThresholdDb: SILENCE_THRESHOLD_DB,
},
silences,
activeSegments,
chunks: chunksToProcess,
};
try {
await writeFile(preprocessingCachePath, JSON.stringify(cachePayload, null, 2));
console.log(`πŸ’Ύ [Cache] Saved preprocessing metadata to ${preprocessingCachePath}`);
} catch (error) {
console.warn(`⚠️ [Cache] Unable to persist preprocessing cache (${error.message}).`);
}
} else {
if (!activeSegments || activeSegments.length === 0) {
console.error("❌ [Segmentation] Cached active segments were empty. Delete the cache file and rerun.");
await div(md(`❌ Processing stopped. Cached preprocessing is empty. Delete \`${preprocessingCachePath}\` and rerun.`));
exit();
}
logCachedActiveSegments(activeSegments, videoDuration);
if (!chunksToProcess || chunksToProcess.length === 0) {
console.warn(`⚠️ [Cache] Cached chunk plan empty. Regenerating from cached segments...`);
chunksToProcess = buildProcessingChunks(activeSegments);
} else {
console.log(`♻️ [Cache] Reusing cached processing plan with ${chunksToProcess.length} chunks.`);
}
}
// 3. Sequential Processing Loop
let rollingHistory = "";
let immediatePreviousContext = "This is the very beginning of the workshop. The instructor is likely setting the stage and providing an introduction.";
const uploadedFiles = [];
try {
console.log(`\nπŸš€ Starting processing loop for ${chunksToProcess.length} chunks...\n`);
for (const chunk of chunksToProcess) {
const chunkStartTime = Date.now();
console.log(`\n${'='.repeat(80)}`);
console.log(`πŸ“¦ Processing Chunk ${chunk.index}/${chunksToProcess.length} (Active Segment ${chunk.activeSegmentIndex}): ${chunk.name}`);
console.log(` Start Time (in original video): ${chunk.startTime.toFixed(2)}s`);
console.log(` Duration: ${chunk.duration.toFixed(2)}s`);
console.log(`${'='.repeat(80)}\n`);
await hide(); // Hide the Kit interface during processing
// Context management across breaks
// Check if this chunk is the start of a new active segment (meaning a long silence occurred before it)
if (chunk.index > 1) { // Not the very first chunk
const previousChunk = chunksToProcess[chunk.index - 2]; // index is 1-based, array is 0-based
if (chunk.activeSegmentIndex !== previousChunk.activeSegmentIndex) {
console.log(`πŸ›‘ [Context] Transitioning across a break (silence). Updating context to signal discontinuity.`);
// Override the immediatePreviousContext. The detailed summary from the last processed chunk
// might not flow logically into this one after a long break (e.g. lunch).
// We provide a snippet of the previous context but prioritize the fact that a break occurred.
const previousContextSnapshot = immediatePreviousContext.length > 500 ? immediatePreviousContext.slice(-500) : immediatePreviousContext;
immediatePreviousContext = `
---
[CONTEXTUAL NOTE]: A long break or period of silence (e.g., lunch, 15-minute break, or exercise time) occurred immediately before this segment. The instructor is likely resuming the session or starting a new major topic.
[Conclusion of the topic just before the break]:
${previousContextSnapshot}
---`;
console.log(`βœ… [Context] Immediate context updated for discontinuity (${immediatePreviousContext.length} chars)`);
}
}
// 3a. Extract Chunk Locally (On-the-fly)
const chunkPath = path.join(chunksDir, `${chunk.name}.mp4`);
console.log(`πŸ“Ή [FFmpeg] Extracting segment locally...`);
console.log(` Input: ${inputPath}`);
console.log(` Output: ${chunkPath}`);
console.log(` Start: ${chunk.startTime}s, Duration: ${chunk.duration}s`);
const ffmpegStartTime = Date.now();
try {
// Use ffmpeg to cut the segment without re-encoding (-c copy).
// Placing -ss before -i enables fast input seeking (it snaps to the nearest keyframe,
// which is acceptable for chunking) instead of reading through the whole file.
// '-avoid_negative_ts make_zero' ensures the chunk's timestamps start near 0, improving compatibility.
console.log(` Running ffmpeg command...`);
await execa("ffmpeg", [
"-ss", String(chunk.startTime),
"-i", inputPath,
"-t", String(chunk.duration),
"-c", "copy",
"-avoid_negative_ts", "make_zero",
"-y", // Overwrite if exists
chunkPath
]);
const ffmpegDuration = ((Date.now() - ffmpegStartTime) / 1000).toFixed(2);
console.log(`βœ… [FFmpeg] Segment extracted successfully in ${ffmpegDuration}s`);
// Check file size
const stats = await stat(chunkPath);
const fileSizeMB = (stats.size / (1024 * 1024)).toFixed(2);
console.log(` File size: ${fileSizeMB} MB`);
} catch (error) {
console.error(`❌ [FFmpeg] Extraction failed for ${chunk.name}`);
console.error(` Error: ${error.message}`);
if (error.stdout) console.error(` Stdout: ${error.stdout}`);
if (error.stderr) console.error(` Stderr: ${error.stderr}`);
throw new Error("FFmpeg extraction failed.");
}
// 3b. Upload Chunk and Poll
console.log(`\n☁️ [Gemini] Starting file upload...`);
const uploadStartTime = Date.now();
const file = await uploadAndPoll(ai, chunkPath);
const uploadDuration = ((Date.now() - uploadStartTime) / 1000).toFixed(2);
console.log(`βœ… [Gemini] File uploaded and processed in ${uploadDuration}s`);
console.log(` File URI: ${file.uri}`);
console.log(` File Name: ${file.name}`);
uploadedFiles.push(file);
// 3c. Construct Prompt
let artifacts = null;
let newDetailedSummary = null;
let analysis = "";
let extractionAttempt = 0;
let lastRawOutputPath = "";
while (extractionAttempt < MAX_EXTRACTION_ATTEMPTS && (!artifacts || !newDetailedSummary)) {
extractionAttempt++;
if (extractionAttempt === 1) {
console.log(`\nπŸ“ [Prompt] Building extraction prompt...`);
} else {
console.log(`\nπŸ” [Retry] Regenerating extraction output (attempt ${extractionAttempt}/${MAX_EXTRACTION_ATTEMPTS})...`);
}
console.log(` Segment Index: ${chunk.index}`);
console.log(` Rolling History Length: ${rollingHistory.length} chars`);
console.log(` Immediate Context Length: ${immediatePreviousContext.length} chars`);
// Pass the chunk object so the prompt knows the precise duration
const prompt = buildExtractionPrompt(chunk, rollingHistory, immediatePreviousContext, extractionAttempt);
console.log(`βœ… [Prompt] Prompt built (${prompt.length} chars)`);
// 3d. Generate Content (Extraction)
console.log(`\n🧠 [Gemini] Starting content generation with ${MODEL_EXTRACT} (attempt ${extractionAttempt})...`);
console.log(` Model: ${MODEL_EXTRACT}`);
console.log(` Video URI: ${file.uri}`);
const generationStartTime = Date.now();
const maxOutputTokens = EXTRACTION_MAX_OUTPUT_TOKENS[Math.min(extractionAttempt - 1, EXTRACTION_MAX_OUTPUT_TOKENS.length - 1)];
console.log(` Max Output Tokens: ${maxOutputTokens}`);
const response = await ai.models.generateContent({
model: MODEL_EXTRACT,
// @google/genai reads generation settings from `config` (the older SDK used `generationConfig`)
config: {
maxOutputTokens,
temperature: 0.6,
topP: 0.9,
},
contents: createUserContent([
createPartFromUri(file.uri, file.mimeType),
prompt
])
});
const generationDuration = ((Date.now() - generationStartTime) / 1000).toFixed(2);
console.log(`βœ… [Gemini] Content generation completed in ${generationDuration}s`);
console.log(`\nπŸ“„ [Processing] Extracting response text...`);
analysis = response.text;
console.log(`βœ… [Processing] Response extracted (${analysis.length} chars)`);
// 3e. Parse Response (Using XML tags for robustness)
console.log(`\nπŸ” [Parsing] Extracting XML tags from response...`);
artifacts = extractTagContent(analysis, "ARTIFACTS");
newDetailedSummary = extractTagContent(analysis, "DETAILED_SUMMARY");
if (!artifacts || !newDetailedSummary) {
lastRawOutputPath = path.join(jobDir, `${chunk.name}-raw-error-attempt-${extractionAttempt}.txt`);
console.error(`❌ [Parsing] Failed to parse required tags from model output for ${chunk.name} (attempt ${extractionAttempt})`);
console.error(` Saving raw output for debugging: ${lastRawOutputPath}`);
await writeFile(lastRawOutputPath, analysis);
if (extractionAttempt < MAX_EXTRACTION_ATTEMPTS) {
console.log(` Preparing to retry with stricter constraints...`);
}
} else if (extractionAttempt > 1) {
console.log(`βœ… [Parsing] Required tags recovered on attempt ${extractionAttempt}`);
}
}
if (!artifacts || !newDetailedSummary) {
const failureMessage = lastRawOutputPath
? `Model output parsing failed after ${MAX_EXTRACTION_ATTEMPTS} attempts. See ${lastRawOutputPath}`
: "Model output parsing failed.";
throw new Error(failureMessage);
}
// 3f. Update Immediate Context
console.log(`\nπŸ’Ύ [Context] Updating immediate previous context...`);
immediatePreviousContext = newDetailedSummary;
console.log(`βœ… [Context] Immediate context updated (${immediatePreviousContext.length} chars)`);
// 3g. Compress Context (Crucial for Scalability)
console.log(`\nπŸ—œοΈ [Compression] Compressing rolling history with ${MODEL_COMPRESS}...`);
console.log(` Existing history length: ${rollingHistory.length} chars`);
console.log(` New summary length: ${newDetailedSummary.length} chars`);
const compressionStartTime = Date.now();
// Capture the length before compression for ratio calculation
const previousHistoryLength = rollingHistory.length;
rollingHistory = await compressContext(ai, rollingHistory, newDetailedSummary);
const compressionDuration = ((Date.now() - compressionStartTime) / 1000).toFixed(2);
console.log(`βœ… [Compression] History compressed in ${compressionDuration}s`);
console.log(` New history length: ${rollingHistory.length} chars`);
const inputLength = previousHistoryLength + newDetailedSummary.length;
const compressionRatio = inputLength > 0
? (100 - (rollingHistory.length / inputLength) * 100).toFixed(1)
: '0';
console.log(` Compression achieved: ${compressionRatio}% reduction`);
// 3h. Save Results
console.log(`\nπŸ’Ύ [File I/O] Saving analysis results...`);
const outputPath = path.join(jobDir, `${chunk.name}-analysis.md`);
// Format the artifacts for final output
let finalOutput = artifacts;
// Add video path and start time (using the original video's timestamp) below the title line
finalOutput = finalOutput.replace(/^title:.*$/m, (match) => `${match}\nvideo: ${inputPath}\nstart_time_seconds: ${chunk.startTime.toFixed(2)}`);
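// Example (hypothetical title/path): a frontmatter line "title: Intro to Signals" becomes:
//   title: Intro to Signals
//   video: /path/to/workshop.mp4
//   start_time_seconds: 1130.00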
await writeFile(outputPath, finalOutput);
console.log(`βœ… [File I/O] Analysis saved to: ${outputPath}`);
// Save context state for review
const contextStatePath = path.join(jobDir, `${chunk.name}-context-state.md`);
await writeFile(contextStatePath, `## Rolling History (Compressed)\n${rollingHistory}\n\n## Immediate Context (Detailed Summary of this segment)\n${immediatePreviousContext}`);
console.log(`βœ… [File I/O] Context state saved to: ${contextStatePath}`);
const chunkDuration = ((Date.now() - chunkStartTime) / 1000).toFixed(2);
console.log(`\nβœ… [Complete] Finished processing ${chunk.name} in ${chunkDuration}s`);
console.log(` Progress: ${chunk.index}/${chunksToProcess.length} chunks completed\n`);
}
} catch (error) {
console.error(`\n❌ [ERROR] Processing failed!`);
console.error(` Error: ${error.message}`);
console.error(` Stack: ${error.stack}`);
console.error(` Output directory: ${jobDir}`);
await div(md(`# ❌ Processing Failed
An error occurred during processing:
\`\`\`
${error.message}
\`\`\`
Check the logs and the output directory: \`${jobDir}\``));
} finally {
// 3i. Cleanup (optional): delete the local chunk files and the uploaded Gemini files
// (e.g., via ai.files.delete) once processing succeeds, to reclaim disk and quota.
}
console.log(`\n${'='.repeat(80)}`);
console.log(`βœ… PROCESSING COMPLETE`);
console.log(`${'='.repeat(80)}`);
console.log(`πŸ“ Output Directory: ${jobDir}`);
console.log(`πŸ“Š Processed: ${chunksToProcess.length} chunks`);
console.log(`πŸ“ˆ Total uploaded files: ${uploadedFiles.length}`);
console.log(`\nHow to evaluate the results:`);
console.log(`1. Review the ${jobDir}/*-analysis.md files. Check that 'start_time_seconds' correctly reflects the time in the original video and skips breaks.`);
console.log(`2. Review the ${jobDir}/*-context-state.md files to verify continuity, compression, and handling of transitions across breaks.`);
await div(md(`# βœ… Processing Complete
Output Directory: \`${jobDir}\`
**How to evaluate the results:**
1. Review the \`${jobDir}/*-analysis.md\` files. Check that \`start_time_seconds\` correctly reflects the time in the original video and skips breaks.
2. Review the \`${jobDir}/*-context-state.md\` files to verify that continuity is being maintained, the compression is effective, and transitions across breaks are handled correctly.`));
// ─── Helpers ─────────────────────────────────────────────────────────────────
/**
* Uploads a file to the Gemini Files API and polls until it is ACTIVE.
*/
async function uploadAndPoll(ai, filePath) {
const fileName = path.basename(filePath);
console.log(` ☁️ [Upload] Starting upload for: ${fileName}`);
// Check file size before upload
const stats = await stat(filePath);
const fileSizeMB = (stats.size / (1024 * 1024)).toFixed(2);
console.log(` πŸ“Š [Upload] File size: ${fileSizeMB} MB`);
const uploadStartTime = Date.now();
let file;
try {
file = await ai.files.upload({
file: filePath,
config: { mimeType: 'video/mp4' }
});
} catch (e) {
console.error(` ❌ [Upload] File upload failed: ${e.message}`);
throw new Error(`File upload failed for ${fileName}: ${e.message}`);
}
const uploadTime = ((Date.now() - uploadStartTime) / 1000).toFixed(2);
console.log(` βœ… [Upload] Upload initiated in ${uploadTime}s`);
console.log(` πŸ“‹ [Upload] File Name: ${file.name}`);
console.log(` πŸ“‹ [Upload] Initial State: ${file.state}`);
let attempts = 0;
const maxAttempts = 120; // 120 * 3s = 6-minute timeout
const pollStartTime = Date.now();
console.log(` πŸ”„ [Polling] Waiting for file to become ACTIVE (max ${maxAttempts * 3}s)...`);
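// Per the Gemini Files API, uploads report state PROCESSING while being ingested,
// then transition to ACTIVE (ready for use in prompts) or FAILED.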
while (file.state !== 'ACTIVE' && attempts < maxAttempts) {
await new Promise(resolve => setTimeout(resolve, 3000)); // Wait 3 seconds
try {
file = await ai.files.get({ name: file.name });
} catch (e) {
console.warn(` ⚠️ [Polling] Failed to get file status, retrying... (${e.message})`);
// Continue polling even if the status check fails intermittently
}
attempts++;
if (attempts % 5 === 0) {
const elapsed = ((Date.now() - pollStartTime) / 1000).toFixed(0);
console.log(` πŸ”„ [Polling] Status check ${attempts}/${maxAttempts} (${elapsed}s elapsed): ${file.state}`);
}
if (file.state === 'FAILED') {
console.error(` ❌ [Polling] File processing failed: ${file.error?.message || 'Unknown error'}`);
throw new Error(`File processing failed: ${file.error?.message || 'Unknown error'}`);
}
}
if (file.state !== 'ACTIVE') {
const totalTime = ((Date.now() - pollStartTime) / 1000).toFixed(0);
console.error(` ❌ [Polling] File did not become active after ${totalTime}s`);
throw new Error(`File did not become active for ${fileName}`);
}
const totalPollTime = ((Date.now() - pollStartTime) / 1000).toFixed(2);
console.log(` βœ… [Polling] File is ACTIVE after ${totalPollTime}s`);
console.log(` πŸ“‹ [Polling] Final State: ${file.state}`);
return file;
}
/**
* Extracts content between specified XML tags.
*/
function extractTagContent(text, tagName) {
console.log(` πŸ” [Parse] Searching for <${tagName}> tag...`);
const regex = new RegExp(`<${tagName}>([\\s\\S]*?)<\/${tagName}>`, 'i');
const match = text.match(regex);
if (match) {
const content = match[1].trim();
console.log(` βœ… [Parse] Found <${tagName}> tag (${content.length} chars)`);
return content;
} else {
console.log(` ❌ [Parse] <${tagName}> tag not found in response`);
return null;
}
}
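// Example: extractTagContent("x <ARTIFACTS>y</ARTIFACTS>", "ARTIFACTS") returns "y";
// a null return feeds the retry loop in the main processing section above.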
/**
* Uses the compression model to update the rolling history.
*/
async function compressContext(ai, existingHistory, newSummary) {
console.log(` πŸ“ [Compression] Building compression prompt...`);
const prompt = buildCompressionPrompt(existingHistory, newSummary);
console.log(` πŸ“ [Compression] Prompt length: ${prompt.length} chars`);
console.log(` πŸ€– [Compression] Calling ${MODEL_COMPRESS}...`);
try {
const compressionStartTime = Date.now();
const response = await ai.models.generateContent({
model: MODEL_COMPRESS,
contents: createUserContent([prompt])
});
const compressionTime = ((Date.now() - compressionStartTime) / 1000).toFixed(2);
console.log(` βœ… [Compression] Model response received in ${compressionTime}s`);
const compressed = response.text.trim();
console.log(` βœ… [Compression] Compressed text length: ${compressed.length} chars`);
return compressed;
} catch (error) {
console.error(` ❌ [Compression] Context compression failed!`);
console.error(` ❌ [Compression] Error: ${error.message}`);
console.error(` ⚠️ [Compression] Falling back to truncation method...`);
// Fallback: Concatenate and truncate if compression fails (safety measure)
const fallback = (`${existingHistory}\n\n---\n\n${newSummary}`).slice(-50000); // Keep the most recent 50k chars
console.log(` ⚠️ [Compression] Fallback result length: ${fallback.length} chars`);
return fallback;
}
}
// ─── Prompts ─────────────────────────────────────────────────────────────────
/**
* Prompt for the extraction model (MODEL_EXTRACT).
* Takes the chunk object to provide precise duration context.
*/
function buildExtractionPrompt(chunk, history, immediateContext, attempt = 1) {
const segmentIndex = chunk.index;
const approxDurationMins = (chunk.duration / 60).toFixed(1);
const retryContext = attempt === 1 ? "" : `
<RETRY_CONTEXT>
Previous output for this segment was invalid. Strictly follow <OUTPUT_RULES>, keep the response concise, and ensure <ARTIFACTS> and <DETAILED_SUMMARY> each appear exactly once with matching closing tags.
</RETRY_CONTEXT>`;
return `
You are analyzing a video segment (Segment ${segmentIndex}, approx. ${approxDurationMins} minutes) of a technical workshop. Your goal is to extract detailed lesson artifacts and provide a summary for the next segment.
<CONTEXT>
Use the following context to understand the flow, resolve references (e.g., "like I showed earlier"), and maintain continuity.
<ROLLING_HISTORY>
(Concise overview of the workshop up to Segment ${segmentIndex - 1})
${history || "N/A (First Segment)"}
</ROLLING_HISTORY>
<IMMEDIATE_PREVIOUS_CONTEXT>
(Detailed summary of the immediately preceding segment OR a notice about a discontinuity/break)
${immediateContext}
</IMMEDIATE_PREVIOUS_CONTEXT>
</CONTEXT>
<INSTRUCTIONS>
Analyze the current video segment and produce the output strictly within the specified XML tags.
1. <ARTIFACTS>
Extract the lesson content in detailed Markdown format, following the precise structure below.
Extract all code examples, commands typed, and configuration files shown or discussed. Be precise. Use markdown fencing with appropriate language identifiers.
<FORMAT_EXAMPLE>
---
title: [Generated Descriptive Title for this Segment]
description: [One sentence description of the segment]
topics: [comma, separated, topics]
---
# [Generated Descriptive Title for this Segment]
[Detailed paragraph overview of this segment.]
## Key Concepts
* [Concept 1]
* [Concept 2]
## Demos and Code Examples
### [Name of Demo/Code 1]
[Description of the code and steps taken]
${"`"}${"`"}${"`"}[language]
[Exact Code Snippet]
${"`"}${"`"}${"`"}
### [Name of Demo/Code 2]
...
## Tips and Tricks
* [Specific advice, shortcuts, or "gotchas"]
## Sidebars and Tangents
* [Interesting discussions not directly related to the main topic]
</FORMAT_EXAMPLE>
</ARTIFACTS>
2. <DETAILED_SUMMARY>
Provide a comprehensive, detailed summary of this specific segment. Focus on what was taught, the exact steps taken in demos, and the precise state of the project/code at the very end of the segment. THIS IS CRITICAL input for the next segment's analysis and for updating the rolling history.
</DETAILED_SUMMARY>
<OUTPUT_RULES>
- Keep the entire response under ~1,800 words and avoid repeating identical list entries.
- When a file, log, or dataset is long or repetitive, include only the directly referenced portions and summarize the remainder.
- Always include both <ARTIFACTS> and <DETAILED_SUMMARY> tags, each exactly once, with proper closing tags.
- Do not leave the response unfinished; confirm the final tokens close the tags cleanly.
</OUTPUT_RULES>${retryContext}
</INSTRUCTIONS>
`.trim();
}
function computePreprocessingCacheKey(videoPath, stats) {
const hashInput = JSON.stringify({
videoPath,
size: stats.size,
mtimeMs: stats.mtimeMs,
chunkDurationSeconds: CHUNK_DURATION_SECONDS,
minimumSilenceDuration: MINIMUM_SILENCE_DURATION,
silenceThresholdDb: SILENCE_THRESHOLD_DB,
});
return createHash('sha1').update(hashInput).digest('hex');
}
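// Any change to the video file (path, size, mtime) or to the chunking/silence
// settings yields a different key, so stale preprocessing results are never reused.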
/**
* Prompt for the compression model (Flash-Lite).
*/
function buildCompressionPrompt(existingHistory, newSummary) {
return `
You are tasked with updating the rolling history of a workshop. Combine the existing history with the summary of the latest segment to create a concise overview of the entire workshop up to this point.
<EXISTING_HISTORY>
${existingHistory || "N/A"}
</EXISTING_HISTORY>
<LATEST_SEGMENT_SUMMARY>
${newSummary}
</LATEST_SEGMENT_SUMMARY>
<INSTRUCTIONS>
Create a new rolling history that integrates the new information. The history should be high-level but capture the main topics covered, the progress made, key definitions, and the overall structure of the workshop so far. Focus on continuity and the big picture.
Keep the output concise (under 1500 words) while retaining essential technical details and the current state of any ongoing projects.
</INSTRUCTIONS>
`.trim();
}