
@WomB0ComB0
Last active March 5, 2025 04:46
markdown-to-google-docs.ts and related files - with AI-generated descriptions
/**
* # Markdown to Google Docs Converter
*
* This module provides functionality to convert Markdown content to Google Docs
* with proper formatting. It handles various Markdown elements like headings,
* lists, bold, italic, links, and code blocks.
*
* ## Features
* - Create Google Docs from Markdown content
* - Apply proper formatting (headings, bold, italic, links, code blocks)
* - Share documents with specified recipients
* - Batch processing of formatting requests
* - Cleanup of Markdown syntax after formatting
*
* ## Usage
* ```typescript
* // Create a new manager instance
* const manager = new GoogleDocsManager('./path-to-credentials.json');
*
* // Create and share a document
* const documentUrl = await manager.createAndShareDocument({
* title: 'My Document',
* recipientEmail: 'recipient@example.com',
* markdownContent: '# Hello World\n\nThis is **bold** and *italic*.'
* });
*
* console.log(`Document created: ${documentUrl}`);
* ```
*
* ## Command Line Usage
* ```
* bun run markdown-to-google-docs.ts <input.md> <document-title> <recipient-email> [credentials-path]
* ```
*
* @module markdown-to-google-docs
*/
import { Logger, LogLevel } from './logger';
import { convertMarkdownToPlainText } from './markdown-to-text';
import { google } from 'googleapis';
import { GoogleAuth } from 'google-auth-library';
import { marked } from 'marked';
import type { Token, Tokens } from 'marked';
import * as stringSimilarity from 'string-similarity';
/**
* Decorator that automatically instantiates a class when the module is loaded
* @param constructor - The class constructor to instantiate
* @returns The original constructor
*/
function selfExecute<T extends { new(...args: any[]): {} }>(constructor: T) {
new constructor();
return constructor;
}
/**
* Options for creating a Google Doc from Markdown
* @interface GoogleDocOptions
*/
interface GoogleDocOptions {
/** The title of the Google Doc */
title: string;
/** Email address to share the document with */
recipientEmail: string;
/** Markdown content to convert */
markdownContent: string;
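/** Path to Google service account credentials (optional; createAndShareDocument does not read it, so pass the path to the GoogleDocsManager constructor instead) */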
credentialsPath?: string;
}
/**
* Represents a paragraph position in a Google Doc
* @interface ParagraphPosition
*/
type ParagraphPosition = {
/** Start index of the paragraph */
startIndex: number,
/** End index of the paragraph */
endIndex: number,
/** Text content of the paragraph */
content: string
};
/**
* Context for the Markdown renderer
* @interface RendererContext
*/
interface RendererContext {
/** Function to find text positions in the document */
findTextPositions: (contentWithPositions: {text: string, startIndex: number, endIndex: number}[], text: string) => {startIndex: number, endIndex: number}[];
/** Array of content elements with their positions */
contentWithPositions: {text: string, startIndex: number, endIndex: number}[];
/** Array of Google Docs API requests */
requests: any[];
/** Array of paragraphs in the document */
paragraphs: ParagraphPosition[];
/** Function to find a paragraph by its text content */
findParagraphByText: (paragraphs: ParagraphPosition[], text: string) => ParagraphPosition | null;
}
// Initialize logger
const logger = Logger.getLogger('GoogleDocsManager', {
minLevel: LogLevel.INFO,
includeTimestamp: true
});
/**
* Main class for managing Google Docs operations
*
* Handles creation, sharing, and formatting of Google Docs from Markdown content.
*/
class GoogleDocsManager {
private auth: GoogleAuth;
private docsService: any;
private driveService: any;
/**
* Creates a new GoogleDocsManager instance
* @param credentialsPath - Path to the Google service account credentials JSON file
*/
constructor(credentialsPath: string = './service-account.json') {
this.auth = new GoogleAuth({
keyFile: credentialsPath,
scopes: [
'https://www.googleapis.com/auth/documents',
'https://www.googleapis.com/auth/drive'
]
});
this.docsService = google.docs({ version: 'v1', auth: this.auth });
this.driveService = google.drive({ version: 'v3', auth: this.auth });
logger.debug('GoogleDocsManager initialized', { credentialsPath });
}
/**
* Creates a new Google Doc with the specified title
* @param title - The title of the document
* @returns Promise resolving to the document ID
*/
async createDocument(title: string): Promise<string> {
try {
const response = await logger.time('Create Google Doc', async () => {
return await this.docsService.documents.create({
requestBody: {
title: title
}
});
});
logger.info(`Document created with ID: ${response.data.documentId}`, { title });
return response.data.documentId;
} catch (error) {
logger.error(`Error creating Google Doc`, error);
throw error;
}
}
/**
* Shares a Google Doc with the specified email address
* @param documentId - The ID of the document to share
* @param email - The email address to share with
*/
async shareDocument(documentId: string, email: string): Promise<void> {
try {
await logger.time('Share Google Doc', async () => {
return await this.driveService.permissions.create({
fileId: documentId,
requestBody: {
type: 'user',
role: 'writer',
emailAddress: email
}
});
});
logger.info(`Document shared with ${email}`, { documentId });
} catch (error) {
logger.error(`Error sharing Google Doc`, error, { documentId, email });
throw error;
}
}
/**
* Updates a Google Doc with plain text content
* @param documentId - The ID of the document to update
* @param content - The text content to insert
*/
async updateDocumentContent(documentId: string, content: string): Promise<void> {
try {
await logger.time('Update document content', async () => {
return await this.docsService.documents.batchUpdate({
documentId: documentId,
requestBody: {
requests: [
{
insertText: {
location: {
index: 1
},
text: content
}
}
]
}
});
});
logger.info('Document content updated successfully', {
documentId,
contentLength: content.length
});
} catch (error) {
logger.error(`Error updating Google Doc content`, error, { documentId });
throw error;
}
}
/**
* Converts Markdown content to a formatted Google Doc
* @param documentId - The ID of the document to format
* @param markdownContent - The Markdown content to convert
*/
async convertMarkdownToFormattedDoc(documentId: string, markdownContent: string): Promise<void> {
try {
logger.debug('Starting markdown conversion', { documentId });
// First convert to plain text (keeping markdown syntax)
const plainText = convertMarkdownToPlainText(markdownContent);
await this.updateDocumentContent(documentId, plainText);
// Get the document
const document = await this.docsService.documents.get({ documentId });
// Apply formatting
const requests = await this.createFormattingRequestsFromMarkdown(markdownContent, document.data);
// Apply formatting in batches
if (requests.length > 0) {
await logger.time('Apply text formatting', async () => {
const batchSize = 1000;
for (let i = 0; i < requests.length; i += batchSize) {
const batch = requests.slice(i, i + batchSize);
await this.docsService.documents.batchUpdate({
documentId: documentId,
requestBody: {
requests: batch
}
});
}
});
}
// Now clean up the markdown syntax
const cleanupRequests = this.createMarkdownSyntaxCleanupRequests(document.data);
if (cleanupRequests.length > 0) {
await this.docsService.documents.batchUpdate({
documentId: documentId,
requestBody: {
requests: cleanupRequests
}
});
}
logger.info('Document formatting applied successfully', {
documentId,
requestCount: requests.length
});
} catch (error) {
logger.error(`Error formatting Google Doc content`, error, { documentId });
throw error;
}
}
/**
* Recursively processes Markdown tokens to apply formatting
* @param tokens - Array of Markdown tokens
* @param context - Renderer context
* @private
*/
private processTokensRecursively(tokens: Token[], context: RendererContext): void {
for (const token of tokens) {
switch (token.type) {
case 'strong':
this.applyStrongFormatting(token as Tokens.Strong, context);
break;
case 'em':
this.applyEmFormatting(token as Tokens.Em, context);
break;
case 'link':
this.applyLinkFormatting(token as Tokens.Link, context);
break;
case 'codespan':
this.applyCodespanFormatting(token as Tokens.Codespan, context);
break;
}
if ('tokens' in token && Array.isArray(token.tokens)) {
this.processTokensRecursively(token.tokens, context);
}
if (token.type === 'list') {
const listToken = token as Tokens.List;
for (const item of listToken.items) {
if (item.tokens) {
this.processTokensRecursively(item.tokens, context);
}
}
}
}
}
/**
* Applies bold formatting to text
* @param token - Strong token from Markdown
* @param context - Renderer context
* @private
*/
private applyStrongFormatting(token: Tokens.Strong, context: RendererContext): void {
const cleanText = token.text.replace(/<[^>]*>/g, '');
const positions = context.findTextPositions(context.contentWithPositions, cleanText);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
bold: true
},
fields: 'bold'
}
});
}
}
/**
* Applies italic formatting to text
* @param token - Em token from Markdown
* @param context - Renderer context
* @private
*/
private applyEmFormatting(token: Tokens.Em, context: RendererContext): void {
const positions = context.findTextPositions(context.contentWithPositions, token.text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
italic: true
},
fields: 'italic'
}
});
}
}
/**
* Applies link formatting to text
* @param token - Link token from Markdown
* @param context - Renderer context
* @private
*/
private applyLinkFormatting(token: Tokens.Link, context: RendererContext): void {
const positions = context.findTextPositions(context.contentWithPositions, token.text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
link: {
url: token.href
}
},
fields: 'link'
}
});
}
}
/**
* Applies code formatting to inline code
* @param token - Codespan token from Markdown
* @param context - Renderer context
* @private
*/
private applyCodespanFormatting(token: Tokens.Codespan, context: RendererContext): void {
const positions = context.findTextPositions(context.contentWithPositions, token.text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
weightedFontFamily: {
fontFamily: 'Courier New'
}
},
fields: 'weightedFontFamily'
}
});
}
}
/**
* Creates formatting requests for Markdown content
* @param markdownContent - The Markdown content to format
* @param document - The Google Doc document object
* @returns Array of Google Docs API requests
* @private
*/
private async createFormattingRequestsFromMarkdown(markdownContent: string, document: any): Promise<any[]> {
const requests: any[] = [];
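// NOTE: the renderer overrides below are never passed to marked.parse/marked.parser, so they
// do not run; requests come from processTokensRecursively and the token loop further down.
const renderer = new marked.Renderer();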
const contentWithPositions = this.getContentWithPositions(document);
const paragraphs = document.body.content
.filter((item: any) => item.paragraph)
.map((item: any) => ({
startIndex: item.startIndex,
endIndex: item.endIndex,
content: item.paragraph.elements.map((el: any) => el.textRun?.content || '').join('')
}));
const context: RendererContext = {
findParagraphByText: this.findParagraphByText.bind(this),
findTextPositions: this.findTextPositions.bind(this),
paragraphs,
contentWithPositions,
requests
};
function isHeadingToken(token: Tokens.Generic): token is Tokens.Heading {
return token.type === 'heading' && 'depth' in token && 'text' in token;
}
function isListToken(token: Tokens.Generic): token is Tokens.List {
return token.type === 'list' && 'items' in token && 'ordered' in token;
}
type TextPosition = { startIndex: number, endIndex: number };
function stripHtml(html: string): string {
return html.replace(/<[^>]*>/g, '');
}
renderer.heading = function(token: Tokens.Heading): string {
if (!isHeadingToken(token)) {
logger.warn('Invalid heading token', token);
return '';
}
const cleanText = stripHtml(token.text);
const paragraph = context.findParagraphByText(context.paragraphs, cleanText);
if (paragraph) {
context.requests.push({
updateParagraphStyle: {
range: {
startIndex: paragraph.startIndex,
endIndex: paragraph.endIndex - 1
},
paragraphStyle: {
namedStyleType: `HEADING_${Math.min(Math.max(token.depth, 1), 6)}`
},
fields: 'namedStyleType'
}
});
}
return cleanText;
};
renderer.strong = function({ text }: Tokens.Strong): string {
const positions = context.findTextPositions(context.contentWithPositions, text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
bold: true
},
fields: 'bold'
}
});
}
return text;
};
renderer.em = function({ text }: Tokens.Em): string {
const positions = context.findTextPositions(context.contentWithPositions, text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
italic: true
},
fields: 'italic'
}
});
}
return text;
};
renderer.link = function({ href, title, text }: Tokens.Link): string {
const positions = context.findTextPositions(context.contentWithPositions, text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
link: {
url: href
}
},
fields: 'link'
}
});
}
return text;
};
renderer.code = function({ text, lang, escaped }: Tokens.Code): string {
const positions = context.findTextPositions(context.contentWithPositions, text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
weightedFontFamily: {
fontFamily: 'Courier New'
},
backgroundColor: {
color: {
rgbColor: {
red: 0.95,
green: 0.95,
blue: 0.95
}
}
}
},
fields: 'weightedFontFamily,backgroundColor'
}
});
}
return text;
};
renderer.codespan = function({ text }: Tokens.Codespan): string {
const positions = context.findTextPositions(context.contentWithPositions, text);
for (const position of positions) {
context.requests.push({
updateTextStyle: {
range: {
startIndex: position.startIndex,
endIndex: position.endIndex
},
textStyle: {
weightedFontFamily: {
fontFamily: 'Courier New'
}
},
fields: 'weightedFontFamily'
}
});
}
return text;
};
renderer.list = function(token: Tokens.List): string {
if (!isListToken(token)) {
logger.warn('Invalid list token', token);
return '';
}
for (const item of token.items) {
const paragraph = context.findParagraphByText(context.paragraphs, item.text.trim());
if (paragraph) {
context.requests.push({
createParagraphBullets: {
range: {
startIndex: paragraph.startIndex,
endIndex: paragraph.endIndex - 1
},
bulletPreset: token.ordered ? 'NUMBERED_DECIMAL_NESTED' : 'BULLET_DISC_CIRCLE_SQUARE'
}
});
}
}
return token.items.map(item => item.text).join('\n');
};
const tokens = marked.lexer(markdownContent);
this.processTokensRecursively(tokens, context);
for (const token of tokens) {
if (token.type === 'heading') {
const headingToken = token as Tokens.Heading;
const paragraph = context.findParagraphByText(context.paragraphs, headingToken.text.trim());
if (paragraph) {
let headingStyle: string;
switch (headingToken.depth) {
case 1: headingStyle = 'HEADING_1'; break;
case 2: headingStyle = 'HEADING_2'; break;
case 3: headingStyle = 'HEADING_3'; break;
case 4: headingStyle = 'HEADING_4'; break;
case 5: headingStyle = 'HEADING_5'; break;
case 6: headingStyle = 'HEADING_6'; break;
default: headingStyle = 'NORMAL_TEXT';
}
context.requests.push({
updateParagraphStyle: {
range: {
startIndex: paragraph.startIndex,
endIndex: paragraph.endIndex - 1
},
paragraphStyle: {
namedStyleType: headingStyle
},
fields: 'namedStyleType'
}
});
}
} else if (token.type === 'list') {
const listToken = token as Tokens.List;
const paragraphs = context.paragraphs;
for (const item of listToken.items) {
const paragraph = context.findParagraphByText(paragraphs, item.text.trim());
if (paragraph) {
requests.push({
createParagraphBullets: {
range: {
startIndex: paragraph.startIndex,
endIndex: paragraph.endIndex - 1
},
bulletPreset: listToken.ordered ? 'NUMBERED_DECIMAL_NESTED' : 'BULLET_DISC_CIRCLE_SQUARE'
}
});
}
}
}
}
return requests;
}
/**
* Extracts content with positions from a Google Doc
* @param document - The Google Doc document object
* @returns Array of content elements with their positions
* @public
*/
public getContentWithPositions(document: any): {text: string, startIndex: number, endIndex: number}[] {
const result: {text: string, startIndex: number, endIndex: number}[] = [];
if (document.body && document.body.content) {
for (const item of document.body.content) {
if (item.paragraph) {
for (const element of item.paragraph.elements) {
if (element.textRun && element.textRun.content) {
result.push({
text: element.textRun.content,
startIndex: element.startIndex,
endIndex: element.endIndex
});
}
}
}
}
}
return result;
}
/**
* Finds positions of text in a document
* @param contentWithPositions - Array of content elements with their positions
* @param searchText - Text to search for
* @returns Array of positions where the text was found
* @public
*/
public findTextPositions(contentWithPositions: {text: string, startIndex: number, endIndex: number}[], searchText: string): {startIndex: number, endIndex: number}[] {
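const results: {startIndex: number, endIndex: number}[] = [];
// An empty search string would make indexOf match at every offset and never terminate
if (searchText.length === 0) {
return results;
}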
for (const item of contentWithPositions) {
let index = item.text.indexOf(searchText);
while (index !== -1) {
results.push({
startIndex: item.startIndex + index,
endIndex: item.startIndex + index + searchText.length
});
index = item.text.indexOf(searchText, index + 1);
}
}
return results;
}
/**
* Finds a paragraph by its text content using string similarity
* @param paragraphs - Array of paragraphs to search
* @param text - Text to search for
* @param similarityThreshold - Minimum similarity threshold (0-1)
* @returns The matching paragraph or null if not found
* @public
*/
public findParagraphByText(
paragraphs: ParagraphPosition[],
text: string,
similarityThreshold = 0.8
): ParagraphPosition | null {
const target = text.trim().toLowerCase();
for (const paragraph of paragraphs) {
const source = paragraph.content.trim().toLowerCase();
const similarity = stringSimilarity.compareTwoStrings(source, target);
if (similarity >= similarityThreshold) {
return paragraph;
}
}
logger.warn('Paragraph not found for text:', { target, paragraphs });
return null;
}
/**
* Creates requests to clean up Markdown syntax from the document
* @param document - The Google Doc document object
* @returns Array of Google Docs API requests
* @private
*/
private createMarkdownSyntaxCleanupRequests(document: any): any[] {
const requests: any[] = [];
const contentWithPositions = this.getContentWithPositions(document);
// replaceAllText acts on the whole document, so one request per distinct match is enough
const seen = new Set<string>();
// Find and replace markdown syntax patterns (double markers are listed before single ones)
const patterns = [
/\*\*(.*?)\*\*/g, // Bold
/\*(.*?)\*/g, // Italic
/`(.*?)`/g, // Inline code
/__(.*?)__/g, // Bold (underscore syntax)
/_(.*?)_/g, // Italic (underscore syntax); note this also matches snake_case identifiers
/~~(.*?)~~/g // Strikethrough
];
for (const item of contentWithPositions) {
for (const regex of patterns) {
let match: RegExpExecArray | null;
while ((match = regex.exec(item.text)) !== null) {
const fullMatch = match[0];
if (seen.has(fullMatch)) {
continue;
}
seen.add(fullMatch);
requests.push({
replaceAllText: {
replaceText: match[1], // The text without markdown syntax
containsText: {
text: fullMatch,
matchCase: true
}
}
});
}
}
}
return requests;
}
/**
* Creates a Google Doc from Markdown content and shares it
* @param options - Options for creating and sharing the document
* @returns Promise resolving to the document URL
* @public
*/
async createAndShareDocument(options: GoogleDocOptions): Promise<string> {
try {
logger.info('Starting document creation process', {
title: options.title,
recipient: options.recipientEmail
});
const documentId = await this.createDocument(options.title);
await this.convertMarkdownToFormattedDoc(documentId, options.markdownContent);
await this.shareDocument(documentId, options.recipientEmail);
const documentUrl = `https://docs.google.com/document/d/${documentId}/edit`;
logger.success('Document created, updated, and shared successfully', { documentUrl });
return documentUrl;
} catch (error) {
logger.error(`Error in createAndShareDocument`, error, {
title: options.title,
recipient: options.recipientEmail
});
throw error;
}
}
}
/**
* Main class for command-line execution
* Automatically runs when the module is executed directly
*/
@selfExecute
class Main {
constructor() {
if (require.main === module) {
this.main();
}
}
/**
* Main entry point for command-line execution
*/
async main() {
const args = process.argv.slice(2);
if (args.length < 3) {
console.log('Usage: bun run markdown-to-google-docs.ts <input.md> <document-title> <recipient-email> [credentials-path]');
process.exit(1);
}
const [inputFile, title, email, credentialsPath] = args;
try {
const markdownContent = await Bun.file(inputFile).text();
const manager = new GoogleDocsManager(credentialsPath);
const documentUrl = await manager.createAndShareDocument({
title,
recipientEmail: email,
markdownContent
});
console.log(`Document created and shared successfully!`);
console.log(`URL: ${documentUrl}`);
} catch (error) {
console.error(`Error: ${error}`);
process.exit(1);
}
}
}
export { GoogleDocsManager, Main };
export type { GoogleDocOptions };

markdown-to-google-docs.ts Description

File Type: ts

Generated Description:

markdown-to-google-docs.ts Analysis

This TypeScript file implements a module for converting Markdown content into formatted Google Docs. It uses the Google Docs and Drive APIs (through googleapis) and the marked library for Markdown parsing. The module creates a document, inserts the Markdown's text, applies formatting derived from the Markdown syntax, and shares the result.

Summary

The module automates creating and sharing Google Docs from Markdown input. It handles headings, lists, bold, italic, links, and inline code, wraps each API call in error handling and logging, and can be used either programmatically from TypeScript or from the command line.

Key Components and Functions

  • GoogleDocsManager Class: The core class responsible for managing interactions with the Google Docs and Drive APIs. Key methods include:
    • constructor(credentialsPath: string): Initializes the manager with Google service account credentials.
    • createDocument(title: string): Creates a new Google Doc with the given title.
    • shareDocument(documentId: string, email: string): Grants the given email address writer access to the document through the Drive permissions API.
    • updateDocumentContent(documentId: string, content: string): Inserts plain text at index 1 of the document.
    • convertMarkdownToFormattedDoc(documentId: string, markdownContent: string): Inserts the plain-text version of the Markdown, builds formatting requests from the marked tokens, sends them in batches of 1000, and then strips leftover Markdown syntax with replaceAllText requests.
    • createAndShareDocument(options: GoogleDocOptions): Creates the document, converts and formats the Markdown, shares it, and returns the document's edit URL.
  • GoogleDocOptions Interface: Defines the options for creating and sharing a Google Doc (title, recipient email, Markdown content, optional credentials path).
  • ParagraphPosition Type: Represents the position and content of a paragraph within a Google Doc, used for tracking formatting operations.
  • RendererContext Interface: A context object passed to the Markdown rendering functions, providing access to helper functions for finding text positions and managing API requests.
  • selfExecute Decorator: A decorator that instantiates the Main class when the module is loaded. Main's constructor only runs the command-line entry point when require.main === module, so importing the module does not trigger it.
  • Logger: Uses a custom logger (Logger class) for outputting messages with timestamps and levels (debug, info, warn, error, success), plus a time() helper that wraps the API calls.
  • convertMarkdownToPlainText (from ./markdown-to-text): Produces the text that is first inserted into the document; formatting is applied on top of that text afterwards.
  • stringSimilarity (from string-similarity): findParagraphByText uses compareTwoStrings with a default threshold of 0.8 to match heading and list-item text to paragraphs in the document.
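findParagraphByText delegates the fuzzy match to string-similarity's compareTwoStrings, which that library describes as Dice's coefficient over character bigrams. The sketch below is a self-contained re-implementation under that assumption (whitespace stripped, bigram multiset intersection), not the library's own code, paired with the same first-match-above-threshold search the module uses. The diceCoefficient helper and the sample paragraphs are hypothetical.

```typescript
// Hypothetical helper: Dice's coefficient over character bigrams, assumed to
// approximate what string-similarity's compareTwoStrings computes.
function diceCoefficient(a: string, b: string): number {
  const first = a.replace(/\s+/g, '');
  const second = b.replace(/\s+/g, '');
  if (first === second) return 1;
  if (first.length < 2 || second.length < 2) return 0;

  // Count every bigram of the first string (a multiset).
  const bigrams = new Map<string, number>();
  for (let i = 0; i < first.length - 1; i++) {
    const bigram = first.substring(i, i + 2);
    bigrams.set(bigram, (bigrams.get(bigram) ?? 0) + 1);
  }

  // Count bigrams of the second string that can be paired with one in the first.
  let intersection = 0;
  for (let i = 0; i < second.length - 1; i++) {
    const bigram = second.substring(i, i + 2);
    const count = bigrams.get(bigram) ?? 0;
    if (count > 0) {
      bigrams.set(bigram, count - 1);
      intersection++;
    }
  }
  return (2 * intersection) / (first.length + second.length - 2);
}

type ParagraphPosition = { startIndex: number; endIndex: number; content: string };

// Same strategy as findParagraphByText: return the first paragraph whose
// trimmed, lower-cased content scores at or above the threshold.
function findParagraphByText(
  paragraphs: ParagraphPosition[],
  text: string,
  threshold = 0.8
): ParagraphPosition | null {
  const target = text.trim().toLowerCase();
  for (const paragraph of paragraphs) {
    const source = paragraph.content.trim().toLowerCase();
    if (diceCoefficient(source, target) >= threshold) return paragraph;
  }
  return null;
}

// Hypothetical paragraphs as they might look after the plain-text insert.
const paragraphs: ParagraphPosition[] = [
  { startIndex: 1, endIndex: 13, content: 'Hello World\n' },
  { startIndex: 13, endIndex: 44, content: 'This is **bold** and *italic*.\n' },
];

console.log(findParagraphByText(paragraphs, 'Hello World')?.startIndex); // 1
```

Because the search returns the first paragraph above the threshold, two similar headings (for example "Setup" and "Setup notes") can resolve to the same paragraph; a stricter threshold trades that risk for more "not found" warnings.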

Notable Patterns and Techniques

  • Configurable credentials: The GoogleDocsManager constructor takes the credentials path as an argument (defaulting to ./service-account.json), so the same code can run against different service accounts.
  • Asynchronous Operations: The use of async/await makes the code cleaner and easier to read for asynchronous operations like API calls.
  • Logging: The inclusion of a custom logger with configurable logging levels is crucial for debugging and monitoring.
  • Class decorator: selfExecute is a TypeScript class decorator that instantiates Main as soon as the class is defined, which gives the file a script-style command-line entry point.
  • Modular Design: The code is structured into well-defined classes and interfaces, promoting maintainability and reusability.
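One more pattern in the code is request batching: convertMarkdownToFormattedDoc slices its formatting requests into groups of 1000 and awaits one documents.batchUpdate call per group. A minimal sketch of that loop follows; the send callback is a hypothetical stand-in for the Docs API call so the example runs without credentials, and the chunk and sendInBatches names are illustrative.

```typescript
// Hypothetical helper mirroring the slicing loop in convertMarkdownToFormattedDoc.
function chunk<T>(items: T[], batchSize: number): T[][] {
  // A size of 0 would never advance the loop index, so reject it up front.
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Sends batches one after another (not in parallel), so each batch is
// applied before the next one starts, as in the original for-loop.
async function sendInBatches<T>(
  requests: T[],
  send: (batch: T[]) => Promise<void>, // stand-in for documents.batchUpdate
  batchSize = 1000
): Promise<number> {
  const batches = chunk(requests, batchSize);
  for (const batch of batches) {
    await send(batch);
  }
  return batches.length;
}

// Usage: 2500 fake requests are sent as batches of 1000, 1000 and 500.
const sizes: number[] = [];
sendInBatches(Array.from({ length: 2500 }, (_, i) => i), async (batch) => {
  sizes.push(batch.length);
}).then((count) => console.log(count, sizes)); // count is 3; sizes is [1000, 1000, 500]
```

Sending sequentially keeps the requests in the order they were generated, which matters when later requests assume earlier ones have already been applied.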

Potential Use Cases

  • Automated report generation: Generate Google Docs reports from Markdown templates, populated with data from other sources.
  • Content migration: Transfer Markdown content from various sources into a structured Google Doc format.
  • Content creation workflows: Streamline content creation by allowing authors to write in Markdown and automatically convert to Google Docs for collaboration and editing.
  • Integration with other systems: The module could be integrated into larger systems as a component for document generation.

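The formatting step runs on the document after the plain text has been inserted. createFormattingRequestsFromMarkdown tokenizes the Markdown with marked.lexer, walks the tokens to find bold, italic, link, and inline-code text, locates that text in the document's text runs with findTextPositions, and emits updateTextStyle requests. Headings and list items are matched to whole paragraphs with findParagraphByText and receive updateParagraphStyle or createParagraphBullets requests. The marked.Renderer overrides defined in the same method are never passed to marked, so they add no requests; in particular, fenced code blocks are only styled in the unused renderer.code override.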
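A minimal sketch of the position search and request building, assuming the document's text runs have already been extracted (as getContentWithPositions does). It follows the module's findTextPositions logic with an added empty-string guard and builds updateTextStyle requests in the same shape the module pushes for bold text. The sample runs, the TextRun/TextRange types, and the boldRequests helper are hypothetical.

```typescript
type TextRun = { text: string; startIndex: number; endIndex: number };
type TextRange = { startIndex: number; endIndex: number };

// Same search as findTextPositions, plus a guard: with an empty search string,
// indexOf would keep returning a match and the loop would never end.
function findTextPositions(runs: TextRun[], searchText: string): TextRange[] {
  const results: TextRange[] = [];
  if (searchText.length === 0) return results;
  for (const run of runs) {
    let index = run.text.indexOf(searchText);
    while (index !== -1) {
      results.push({
        startIndex: run.startIndex + index,
        endIndex: run.startIndex + index + searchText.length,
      });
      index = run.text.indexOf(searchText, index + 1);
    }
  }
  return results;
}

// One updateTextStyle request per occurrence, shaped like the module's bold requests.
function boldRequests(runs: TextRun[], text: string) {
  return findTextPositions(runs, text).map((range) => ({
    updateTextStyle: { range, textStyle: { bold: true }, fields: 'bold' },
  }));
}

// Text runs as they might look after the plain-text insert at index 1,
// with the Markdown markers still present.
const runs: TextRun[] = [
  { text: 'Hello World\n', startIndex: 1, endIndex: 13 },
  { text: 'This is **bold** and *italic*.\n', startIndex: 13, endIndex: 44 },
];

const requests = boldRequests(runs, 'bold');
console.log(JSON.stringify(requests[0].updateTextStyle.range)); // {"startIndex":23,"endIndex":27}
```

Because the search covers every text run, every occurrence of the word is styled, not only the one that carried Markdown markers; text that spans two runs is not found at all.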

Description generated on 3/4/2025, 9:08:57 PM

/**
* # Markdown to Plain Text Converter
*
* This module provides functionality to convert Markdown content to plain text
* while preserving the structure and readability of the original content.
*
* ## Features
* - Converts Markdown to plain text with customizable formatting
* - Preserves document structure (headings, lists, tables)
* - Handles special characters and HTML entities
* - Provides fallback rendering options
* - Preserves table of contents structure
*
* ## Usage
* ```typescript
* import { convertMarkdownToPlainText } from './markdown-to-text';
*
* const markdown = '# Hello World\n\nThis is **bold** and *italic*.';
* const plainText = convertMarkdownToPlainText(markdown);
*
* console.log(plainText);
* // Output: Hello World
* //
* // This is bold and italic.
* ```
*/
import { marked } from 'marked';
import type { MarkedOptions, Renderer, Tokens } from 'marked';
import { Logger, LogLevel } from './logger';
// Create a logger instance for this module
const logger = Logger.getLogger('MarkdownToText', {
minLevel: LogLevel.INFO,
includeTimestamp: true
});
/**
* Options for the plain text renderer
* @interface PlainTextRendererOptions
* @extends MarkedOptions
*/
interface PlainTextRendererOptions extends MarkedOptions {
/** Use spaces instead of newlines for whitespace delimiter */
spaces?: boolean;
}
/**
* Renderer that converts Markdown tokens to plain text
* @class PlainTextRenderer
* @implements {Renderer}
*/
class PlainTextRenderer implements Renderer {
parser: any;
options: PlainTextRendererOptions;
private whitespaceDelimiter: string;
/**
* Creates a new PlainTextRenderer instance
* @param {PlainTextRendererOptions} options - Configuration options
*/
constructor(options?: PlainTextRendererOptions) {
this.options = options || {};
this.whitespaceDelimiter = this.options.spaces ? ' ' : '\n';
this.parser = {
parse: (text: string) => text
};
logger.debug('PlainTextRenderer initialized', { options: this.options });
}
/**
* Helper method to safely convert any value to string
* @param {any} value - The value to convert to string
* @returns {string} The string representation of the value
* @private
*/
private safeToString(value: any): string {
if (value == null) {
return '';
}
if (typeof value === 'object') {
try {
return JSON.stringify(value);
} catch (e) {
logger.warn('Failed to stringify object', { error: e });
return '[Complex Object]';
}
}
return String(value);
}
/**
* Renders a space token
* @returns {string} The rendered space
*/
space(): string {
return this.whitespaceDelimiter;
}
/**
* Renders a code block token
* @param {Tokens.Code} token - The code block token
* @returns {string} The rendered code block
*/
code(token: Tokens.Code): string {
return `${this.whitespaceDelimiter}${this.whitespaceDelimiter}${this.safeToString(token.text)}${this.whitespaceDelimiter}${this.whitespaceDelimiter}`;
}
/**
* Renders a blockquote token
* @param {Tokens.Blockquote} token - The blockquote token
* @returns {string} The rendered blockquote
*/
blockquote(token: Tokens.Blockquote): string {
return `\t${this.safeToString(token.text)}${this.whitespaceDelimiter}`;
}
/**
* Renders an HTML token
* @param {Tokens.HTML | Tokens.Tag} token - The HTML token
* @returns {string} The rendered HTML
*/
html(token: Tokens.HTML | Tokens.Tag): string {
return this.safeToString(token.text);
}
/**
* Renders a heading token
* @param {Tokens.Heading} token - The heading token
* @returns {string} The rendered heading
*/
heading(token: Tokens.Heading): string {
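// Trailing delimiter keeps consecutive headings from running together into one paragraph
return `${this.safeToString(token.text)}${this.whitespaceDelimiter}`;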
}
/**
* Renders a horizontal rule token
* @returns {string} The rendered horizontal rule
*/
hr(): string {
return `${this.whitespaceDelimiter}${this.whitespaceDelimiter}`;
}
/**
* Renders a list token
* @param {Tokens.List} token - The list token
* @returns {string} The rendered list
*/
list(token: Tokens.List): string {
return this.safeToString(token.items.map(item => item.text).join(this.whitespaceDelimiter));
}
/**
* Renders a list item token
* @param {Tokens.ListItem} token - The list item token
* @returns {string} The rendered list item
*/
listitem(token: Tokens.ListItem): string {
return `\t${this.safeToString(token.text)}${this.whitespaceDelimiter}`;
}
/**
* Renders a paragraph token
* @param {Tokens.Paragraph} token - The paragraph token
* @returns {string} The rendered paragraph
*/
paragraph(token: Tokens.Paragraph): string {
return `${this.whitespaceDelimiter}${this.safeToString(token.text)}${this.whitespaceDelimiter}`;
}
/**
* Renders a table token
* @param {Tokens.Table} token - The table token
* @returns {string} The rendered table
*/
table(token: Tokens.Table): string {
const header = token.header.map(cell => cell.text).join('\t');
const rows = token.rows.map(row => row.map(cell => cell.text).join('\t')).join(this.whitespaceDelimiter);
return `${this.whitespaceDelimiter}${header}${this.whitespaceDelimiter}${rows}${this.whitespaceDelimiter}`;
}
/**
* Renders a table row token
* @param {Tokens.TableRow} token - The table row token
* @returns {string} The rendered table row
*/
tablerow(token: Tokens.TableRow): string {
return `${this.safeToString(token.text)}${this.whitespaceDelimiter}`;
}
/**
* Renders a table cell token
* @param {Tokens.TableCell} token - The table cell token
* @returns {string} The rendered table cell
*/
tablecell(token: Tokens.TableCell): string {
return `${this.safeToString(token.text)}\t`;
}
/**
* Renders a strong (bold) token
* @param {Tokens.Strong} token - The strong token
* @returns {string} The rendered strong text
*/
strong(token: Tokens.Strong): string {
return this.safeToString(token.text);
}
/**
* Renders an emphasis (italic) token
* @param {Tokens.Em} token - The emphasis token
* @returns {string} The rendered emphasis text
*/
em(token: Tokens.Em): string {
return this.safeToString(token.text);
}
/**
* Renders a code span token
* @param {Tokens.Codespan} token - The code span token
* @returns {string} The rendered code span
*/
codespan(token: Tokens.Codespan): string {
return this.safeToString(token.text);
}
/**
* Renders a line break token
* @returns {string} The rendered line break
*/
br(): string {
return `${this.whitespaceDelimiter}${this.whitespaceDelimiter}`;
}
/**
* Renders a deletion (strikethrough) token
* @param {Tokens.Del} token - The deletion token
* @returns {string} The rendered deletion text
*/
del(token: Tokens.Del): string {
return this.safeToString(token.text);
}
/**
* Renders a link token
* @param {Tokens.Link} token - The link token
* @returns {string} The rendered link text
*/
link(token: Tokens.Link): string {
return this.safeToString(token.text);
}
/**
* Renders an image token
* @param {Tokens.Image} token - The image token
* @returns {string} The rendered image text
*/
image(token: Tokens.Image): string {
return this.safeToString(token.text);
}
/**
* Renders a text token
* @param {Tokens.Text | Tokens.Escape} token - The text token
* @returns {string} The rendered text
*/
text(token: Tokens.Text | Tokens.Escape): string {
return this.safeToString(token.text);
}
/**
* Renders a checkbox token
* @param {Tokens.Checkbox} token - The checkbox token
* @returns {string} The rendered checkbox
*/
checkbox(token: Tokens.Checkbox): string {
return token.checked ? '[x]' : '[ ]';
}
}
/** Default options for the marked parser */
const defaultOptions: MarkedOptions = {};
/**
* Converts Markdown text to plain text
* @param {string} markdownText - The Markdown text to convert
* @param {MarkedOptions} markedOptions - Options for the marked parser
* @returns {string} The converted plain text
*/
function convertMarkdownToPlainText(markdownText: string, markedOptions: MarkedOptions = defaultOptions): string {
try {
const tokens = marked.lexer(markdownText);
let plainText = '';
const tocRegex = /(?:^|\n)(?:#+\s*(?:Table of Contents|Contents|TOC)\s*(?:\n+))(((?:\n*[\s]*\*.*\[.*\]\(.*\).*(?:\n|$))+))/i;
const tocMatch = markdownText.match(tocRegex);
let tableOfContents = '';
if (tocMatch && tocMatch[1]) {
// Extract the table of contents section
tableOfContents = tocMatch[1];
// Process the TOC links to make them plain text but preserve structure
tableOfContents = tableOfContents
.replace(/\*\s*\[(.*?)\]\(.*?\)/g, '• $1') // Convert markdown links to bullet points
.replace(/\s{4}\*/g, ' •') // Preserve indentation for nested items
.replace(/\s{8}\*/g, ' •'); // Preserve indentation for deeper nested items
}
/**
* Recursively extracts text from a token
* @param {any} token - The token to extract text from
* @returns {string} The extracted text
*/
const extractText = (token: any): string => {
if (typeof token === 'string') return token;
if (token.text) return token.text;
if (token.tokens) {
return token.tokens.map(extractText).join(' ');
}
if (token.items) {
return token.items.map(extractText).join('\n');
}
if (token.type === 'table') {
let tableText = '';
if (token.header) {
tableText += token.header.map((cell: any) => cell.text).join(' | ') + '\n';
}
if (token.rows) {
tableText += token.rows.map((row: any) => row.map((cell: any) => cell.text).join(' | ')).join('\n');
}
return tableText;
}
return '';
};
plainText = tokens.map(extractText).join('\n\n');
plainText = plainText
.replace(/\n{3,}/g, '\n\n')
.replace(tocRegex, tableOfContents);
return convertASCIICharsToText(plainText);
} catch (error) {
logger.error(`Error converting markdown to plain text: ${error}`);
const renderer = new PlainTextRenderer();
marked.setOptions(markedOptions);
const plainText = marked(markdownText, { renderer }).toString();
return convertASCIICharsToText(plainText);
}
}
/**
* Converts HTML entities and ASCII character codes to their corresponding characters
* @param {string} str - The string containing HTML entities to convert
* @returns {string} The string with HTML entities converted to characters
*/
function convertASCIICharsToText(str: string): string {
logger.debug('Converting ASCII characters to text', { inputLength: str.length });
let result = str;
const htmlEntities: Record<string, string> = {
"&quot;": '"',
"&amp;": "&",
"&lt;": "<",
"&gt;": ">",
"&apos;": "'",
"&nbsp;": " ",
"&ndash;": "–",
"&mdash;": "—",
"&lsquo;": "'",
"&rsquo;": "'",
"&ldquo;": '"',
"&rdquo;": '"',
"&bull;": "•",
"&hellip;": "…",
"&copy;": "©",
"&reg;": "®",
"&trade;": "™",
"&euro;": "€",
"&pound;": "£",
"&yen;": "¥",
"&cent;": "¢",
"&sect;": "§",
"&para;": "¶",
"&deg;": "°",
"&plusmn;": "±",
"&times;": "×",
"&divide;": "÷",
"&frac14;": "¼",
"&frac12;": "½",
"&frac34;": "¾",
"&ne;": "≠",
"&le;": "≤",
"&ge;": "≥",
"&micro;": "µ",
"&middot;": "·"
};
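// Decode named and numeric entities in a single pass so an escaped ampersand
// such as "&amp;lt;" becomes the literal text "&lt;" rather than being decoded twice to "<"
result = result.replace(/&(?:#(\d+)|#[xX]([A-Fa-f0-9]+)|([A-Za-z][A-Za-z0-9]*));/g, (match, dec, hex, name) => {
// Numeric entities (&#123; or &#x7B;): fromCodePoint also handles characters above U+FFFF, such as emoji
if (dec !== undefined || hex !== undefined) {
const codePoint = dec !== undefined ? Number(dec) : parseInt(hex, 16);
return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
}
// Named entities: decode known ones and leave unknown ones untouched
return htmlEntities[`&${name};`] ?? match;
});
return result;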
}
export { convertMarkdownToPlainText, convertASCIICharsToText };

markdown-to-text.ts Description

File Type: ts

Generated Description:

markdown-to-text.ts Analysis

This TypeScript file implements a Markdown-to-plain-text converter built on the marked library. Its main path tokenizes the input with marked.lexer and walks the resulting tokens to extract text. A custom PlainTextRenderer, which outputs plain text while retaining some structural information, serves as the fallback.

Summary

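The module's primary function is to transform Markdown input into plain text, offering more control over the output than simply stripping Markdown syntax. It aims to keep the text readable and to preserve basic structure such as headings, lists, blockquotes, code blocks, and tables. The code also attempts to detect a table-of-contents section and rewrite its links as bullet points. The renderer can use either spaces or newlines as its whitespace delimiter. Errors are logged, values are converted to strings defensively, and if the token-walking path throws, the function falls back to the renderer. A final pass decodes HTML entities into the characters they represent.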
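The spaces-or-newlines choice can be pictured with a small stand-in renderer. This is a minimal sketch, not the module's PlainTextRenderer. The class name MiniPlainTextRenderer is hypothetical, and the sketch assumes that the `spaces` flag of PlainTextRendererOptions selects a single space as `whitespaceDelimiter` and that anything else selects a newline; the actual derivation is not shown in this excerpt. The method bodies mirror the paragraph(), listitem() and hr() methods above, but take plain strings instead of marked tokens.

```typescript
// Hypothetical stand-in for PlainTextRendererOptions: only the `spaces` flag is modeled.
interface MiniRendererOptions {
  spaces?: boolean;
}

// Minimal sketch, NOT the module's PlainTextRenderer. Assumption: `spaces: true`
// yields a single-space delimiter and everything else yields a newline.
class MiniPlainTextRenderer {
  readonly whitespaceDelimiter: string;

  constructor(options: MiniRendererOptions = {}) {
    this.whitespaceDelimiter = options.spaces ? ' ' : '\n';
  }

  // Mirrors paragraph(): delimiter, text, delimiter.
  paragraph(text: string): string {
    return `${this.whitespaceDelimiter}${text}${this.whitespaceDelimiter}`;
  }

  // Mirrors listitem(): a leading tab, then the text and a delimiter.
  listitem(text: string): string {
    return `\t${text}${this.whitespaceDelimiter}`;
  }

  // Mirrors hr(): two delimiters and no visible rule.
  hr(): string {
    return `${this.whitespaceDelimiter}${this.whitespaceDelimiter}`;
  }
}

const newlines = new MiniPlainTextRenderer();
const spaced = new MiniPlainTextRenderer({ spaces: true });
console.log(JSON.stringify(newlines.paragraph('Hello'))); // "\nHello\n"
console.log(JSON.stringify(spaced.paragraph('Hello'))); // " Hello "
```

The same content therefore comes out either as separate lines or as a single space-separated run, depending on one flag.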

Key Components and Functions

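  • convertMarkdownToPlainText(markdownText: string, markedOptions?: MarkedOptions): string: The primary entry point. It tokenizes the input with marked.lexer, recursively extracts text from each token (table cells are joined with " | "), collapses runs of three or more newlines, and decodes HTML entities. If that path throws, it logs the error and falls back to rendering with PlainTextRenderer.

  • convertASCIICharsToText(str: string): string: Replaces named HTML entities (such as &amp;, &nbsp; and &mdash;) and decimal or hexadecimal numeric entities with the characters they represent. Both functions are exported.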

  • PlainTextRenderer class: This class implements the marked Renderer interface, overriding methods to generate plain text output for various Markdown tokens. Key methods include:

    • space(): Handles spaces.
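    • code(), blockquote(), html(), heading(), hr(), list(), listitem(), paragraph(): Handle the corresponding Markdown elements, converting them to plain text with simple formatting (e.g., a leading tab for blockquotes and list items).
    • table(), tablerow(), tablecell(): Emit table cells separated by tabs and rows separated by the whitespace delimiter.
    • strong(), em(), codespan(), del(), link(), image(), text(): Return the token's text with the inline formatting dropped (for images, the alt text); br() emits two delimiters.
    • checkbox(): Renders task-list checkboxes as [x] or [ ].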
    • safeToString(value: any): string: A helper function to safely convert any value to a string, handling potential errors during JSON stringification of complex objects.
  • PlainTextRendererOptions interface: Extends MarkedOptions from marked to include a spaces boolean flag, allowing users to control whether spaces or newlines are used as whitespace delimiters.

  • Logger (from ./logger): A logging utility used throughout the class for debugging and error reporting.
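The token-walking strategy that convertMarkdownToPlainText tries first can be illustrated without the marked dependency. This sketch simplifies under explicit assumptions: the MiniToken type is a hypothetical subset of marked's token shape (text, tokens, items, header, rows), and the hand-built array stands in for what marked.lexer would return. The order of checks follows the module's extractText: use `text` if present, otherwise recurse into child tokens, otherwise into list items, and format tables as pipe-separated rows.

```typescript
// Hypothetical subset of the fields extractText reads from marked tokens.
interface MiniCell {
  text: string;
}

interface MiniToken {
  type: string;
  text?: string;
  tokens?: MiniToken[];
  items?: MiniToken[];
  header?: MiniCell[];
  rows?: MiniCell[][];
}

// Same precedence as extractText: text, then child tokens, then list items, then tables.
function extractText(token: MiniToken): string {
  if (token.text) return token.text;
  if (token.tokens) return token.tokens.map(extractText).join(' ');
  if (token.items) return token.items.map(extractText).join('\n');
  if (token.type === 'table') {
    let tableText = '';
    if (token.header) {
      tableText += token.header.map((cell) => cell.text).join(' | ') + '\n';
    }
    if (token.rows) {
      tableText += token.rows.map((row) => row.map((cell) => cell.text).join(' | ')).join('\n');
    }
    return tableText;
  }
  return '';
}

// Hand-built tree standing in for marked.lexer output (an assumption, not real lexer output).
const tokens: MiniToken[] = [
  { type: 'heading', text: 'Title' },
  { type: 'list', items: [{ type: 'list_item', text: 'one' }, { type: 'list_item', text: 'two' }] },
  {
    type: 'table',
    header: [{ text: 'Name' }, { text: 'Qty' }],
    rows: [[{ text: 'apple' }, { text: '3' }]],
  },
];

// Blocks are joined with blank lines and long newline runs collapsed, as in the module.
const plainText = tokens.map(extractText).join('\n\n').replace(/\n{3,}/g, '\n\n');
console.log(plainText);
```

Checking `text` first keeps the walk cheap for most block tokens. The trade-off is that inline Markdown inside a block's raw `text` is returned unchanged rather than stripped.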

Notable Patterns and Techniques

  • Custom Renderer: The core functionality relies on creating a custom renderer for the marked library. This allows fine-grained control over the conversion process and ensures the output is plain text rather than HTML.

  • Interface-based Configuration: The use of the PlainTextRendererOptions interface promotes code clarity and maintainability by explicitly defining the configurable options for the renderer.

  • Error Handling: The safeToString function demonstrates robust error handling by gracefully managing potential issues when converting objects to strings (e.g., circular references in JSON). Logging is integrated for debugging and monitoring.

  • Shared module-level logger: The functions in this section call a module-level logger instance (logger.error in convertMarkdownToPlainText, logger.debug in convertASCIICharsToText) rather than receiving one as a parameter, so this is shared module state rather than dependency injection.
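safeToString itself is not part of this excerpt, so the following is only a hypothetical reconstruction of the behavior the Error Handling bullet describes. It passes strings through, maps null and undefined to an empty string, tries JSON.stringify for objects, and falls back to String(value) when serialization throws (for example, on a circular reference).

```typescript
// Hypothetical reconstruction of the described behavior; the real helper may differ.
function safeToString(value: unknown): string {
  if (typeof value === 'string') return value;
  if (value === null || value === undefined) return '';
  if (typeof value === 'object') {
    try {
      // JSON.stringify throws a TypeError on circular structures.
      return JSON.stringify(value);
    } catch {
      return String(value);
    }
  }
  return String(value);
}

const circular: Record<string, unknown> = { name: 'node' };
circular.self = circular;

console.log(safeToString('plain')); // plain
console.log(safeToString(42)); // 42
console.log(safeToString({ a: 1 })); // {"a":1}
console.log(safeToString(circular)); // [object Object]
```

Falling back to String(value) loses detail for circular objects, but it guarantees the renderer always receives a string instead of an exception.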

Potential Use Cases

  • Generating text-based reports from Markdown documents: Useful when the output needs to be easily parsed by other systems or displayed in environments that don't support HTML rendering (e.g., command-line applications, simple text-based interfaces).
  • Creating plain text versions of Markdown content for archiving or accessibility: Plain text is highly portable and accessible to a wider range of readers and systems.
  • Preprocessing Markdown before sending it to a system that only accepts plain text: This could be a step in a larger pipeline.
  • Extracting key information from Markdown: If you need to parse Markdown for specific data, converting to plain text first simplifies subsequent text processing.

Description generated on 3/4/2025, 9:08:51 PM
