Created
February 26, 2025 13:34
translate-article.py
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "llm-anthropic>=0.14.1"
# ]
# ///
import sys
import argparse
import llm

# Define translation prompt - modified to ensure no follow-up questions
TRANSLATION_PROMPT = """
# Enhanced Bidirectional Translation Prompt for German<->English Technical Content
You are tasked with translating technical content about IT and software development between German and English for the company INNOQ. Your goal is to produce translations that read naturally to native speakers of the target language while maintaining technical accuracy.
## Translation Direction
- For **German to English** translation: Produce text that reads as if written by a native US English speaker
- For **English to German** translation: Produce text that reads as if written by a native German speaker
## General Translation Guidelines
1. Do not translate word-for-word. Focus on conveying meaning and intent naturally in the target language.
2. Maintain the original structure and organization, including headings, subheadings, and paragraphs.
3. Pay special attention to technical terms and concepts, using the correct and most up-to-date terminology in the target language.
4. Adapt cultural references or idioms to ones familiar to the target audience.
5. First read through the entire source text to understand its overall content and tone.
6. After completing the initial translation, review for consistency in terminology, style, and tone.
7. Proofread for grammatical errors, awkward phrasings, or typos.
## Language-Specific Guidelines
### When translating from German to English:
1. Use American English spelling and punctuation conventions.
2. Convert German compound nouns to proper English equivalents (e.g., "Datenbankmanagementsystem" → "database management system").
3. Simplify overly complex sentence structures common in German technical writing.
4. Be mindful of false friends (e.g., "aktuell" → "current" not "actual").
### When translating from English to German:
1. Use proper German capitalization rules for nouns.
2. Apply German punctuation conventions, including comma rules for subordinate clauses.
3. Use appropriate German compound nouns when they represent standard terminology.
4. Decide appropriately when to keep English technical terms (as is common in German IT literature) vs. using German equivalents.
5. Apply correct grammatical gender and case for technical terms.
## Technical Content Guidelines
1. Research unfamiliar technical terms to ensure accurate translation.
2. If a technical term lacks a direct equivalent in the target language, provide a brief explanation in parentheses the first time it appears.
3. Preserve code snippets as-is, only translating comments.
4. In German IT content, many English terms are used untranslated - maintain this convention when appropriate for natural-sounding German text.
5. Maintain consistent terminology throughout the document.
Translate the following content without asking any follow-up questions. Output only the translated text without explanations or comments:
{content}
"""

# Define editing prompt - modified to ensure no follow-up questions
EDITING_PROMPT = """
# Technical Translation Editing Prompt: Refining Machine Translations
You are an expert editor reviewing machine-translated technical content between German and English for INNOQ. Your task is to identify weaknesses in the machine translation and refine it to sound more natural to native speakers while preserving technical accuracy.
## Your Objective
Transform a technically accurate but potentially awkward machine translation into polished content that reads as if originally written in the target language. Focus particularly on flow, idioms, and natural expression while maintaining technical precision.
## Common Machine Translation Weaknesses to Identify
1. Literal translations that feel unnatural or stilted
2. Awkward sentence structures that follow the source language pattern
3. Inconsistent terminology or inappropriate technical vocabulary
4. Missing context or cultural nuance
5. Unnatural word choices or collocations
6. Overly formal or informal tone for the content type
7. Redundancies or unnecessarily complex phrasing
## Editing Process
1. First, read through the entire translation to understand the content and identify systematic issues
2. Mark passages that sound unnatural or follow source language structures too closely
3. Check all technical terminology for accuracy and consistency
4. Evaluate whether idioms and cultural references have been appropriately adapted
5. Restructure sentences to follow natural patterns of the target language
6. Replace awkward phrasings with more natural expressions
7. Ensure consistent voice, tense, and style throughout
## Language-Specific Considerations
### When editing German→English translations:
1. Break up overly long, complex sentences common in German technical writing
2. Replace literal translations of German compound terms with natural English equivalents
3. Reduce passive voice where active voice would sound more natural in English
4. Ensure proper article usage (a common issue when translating from German)
5. Watch for directly translated German idioms that may not work in English
### When editing English→German translations:
1. Check for proper noun capitalization and compound word formation
2. Verify correct grammatical gender and case usage for technical terms
3. Ensure appropriate use of formal vs. informal address
4. Check that English terms kept in German text (common in IT) are used appropriately
5. Confirm proper German word order, especially in subordinate clauses
## Technical Content Refinement
1. Verify that specialized terminology maintains consistent translation throughout
2. Ensure proper formatting of units, measurements, dates, and numbers per target language conventions
3. Check that acronyms are properly introduced and used consistently
4. Confirm that explanations of technical concepts flow naturally in the target language
5. Validate that code comments maintain technical accuracy while reading naturally
## Final Quality Check
1. Read the edited document aloud to identify any remaining awkward phrasing
2. Compare with similar native-written content in the target language to ensure natural style
3. Verify that the document maintains the intended level of formality and technical precision
4. Check that the editing hasn't introduced any factual or technical errors
Edit the following translation without asking any follow-up questions. Output only the edited text without explanations, comments, or additional context:
{content}
"""


def translate_article(article_text, model_name="anthropic/claude-3-7-sonnet-latest"):
    """
    Translate an article using a two-step LLM process: translate, then edit.

    Args:
        article_text (str): The article text to translate
        model_name (str): The model name to use

    Returns:
        str: The translated and edited article
    """
    # Get the model
    model = llm.get_model(model_name)

    # Step 1: Translate the article
    translation_prompt = TRANSLATION_PROMPT.format(content=article_text)
    translation_response = model.prompt(translation_prompt)
    translated_text = translation_response.text()

    # Step 2: Edit the translation produced in step 1
    editing_prompt = EDITING_PROMPT.format(content=translated_text)
    editing_response = model.prompt(editing_prompt)
    edited_text = editing_response.text()

    return edited_text


def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="Translate articles using a two-step LLM process")
    parser.add_argument("--input", "-i", help="Input file path (defaults to stdin if not provided)")
    parser.add_argument("--output", "-o", help="Output file path (defaults to stdout if not provided)")
    parser.add_argument("--model", "-m", default="anthropic/claude-3-7-sonnet-latest",
                        help="Model name to use (default: anthropic/claude-3-7-sonnet-latest)")
    args = parser.parse_args()

    # Read input
    if args.input:
        with open(args.input, 'r', encoding='utf-8') as f:
            article_text = f.read()
    else:
        # Send the hint to stderr so it never mixes with the translation on stdout
        print("Reading from standard input. Press Ctrl+D (Unix) or Ctrl+Z (Windows) when finished.",
              file=sys.stderr)
        article_text = sys.stdin.read()

    # Translate and edit
    result = translate_article(article_text, args.model)

    # Write output
    if args.output:
        with open(args.output, 'w', encoding='utf-8') as f:
            f.write(result)
        print(f"Translation saved to {args.output}")
    else:
        print(result)


if __name__ == "__main__":
    main()
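
The two-step flow in `translate_article` (translate, then feed the draft into the editing prompt) can be tried out without an API key. The sketch below is only an illustration and is not part of the original script: `FakeModel`, `FakeResponse`, and `two_step` are hypothetical names, the templates are shortened placeholders, and the stub mimics nothing beyond the `model.prompt(...).text()` call shape used above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a model response; only .text() is mimicked."""
    body: str

    def text(self) -> str:
        return self.body


@dataclass
class FakeModel:
    """Hypothetical stub that records each prompt and returns canned replies in order."""
    replies: list[str]
    seen_prompts: list[str] = field(default_factory=list)

    def prompt(self, prompt_text: str) -> FakeResponse:
        self.seen_prompts.append(prompt_text)
        # The n-th call returns the n-th canned reply
        return FakeResponse(self.replies[len(self.seen_prompts) - 1])


def two_step(model, article_text: str, translate_tpl: str, edit_tpl: str) -> str:
    """Translate first, then pass the draft translation to the editing prompt."""
    draft = model.prompt(translate_tpl.format(content=article_text)).text()
    return model.prompt(edit_tpl.format(content=draft)).text()


# Placeholder templates; the real script uses TRANSLATION_PROMPT and EDITING_PROMPT
stub = FakeModel(replies=["rough draft", "polished text"])
result = two_step(stub, "Hallo Welt", "Translate: {content}", "Edit: {content}")
print(result)
```

The recorded prompts in `stub.seen_prompts` show that the second call receives the output of the first, which is the property the script relies on when it chains the translation and editing steps.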