How we handle code, entities, and formatting
Code and Untranslatable Content
Remove templating and programming code from your source content before translation. Given the vast range of templating and programming languages, it's difficult to reliably identify and preserve all code patterns.
Why Remove Code?
- Inconsistent identification across different programming languages
- Mixed translatable content within code blocks creates confusion
- Better translation quality when translators work with clean text
Recommended Approach
Use our key-value structure to separate translatable content from code:
// ✅ Good - Separate content from code
{
  "welcome_message": {
    "text": "Welcome to our platform"
  },
  "button_label": {
    "text": "Get Started"
  }
}
// ❌ Avoid - Mixing code with translatable content
{
  "template": {
    "text": "<div class='welcome'>Welcome to {{platform_name}}</div>"
  }
}
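With the key-value structure, only the "text" values need to go to translation, and everything else stays in your code. The following is a minimal Python sketch, assuming your strings follow the {"key": {"text": ...}} shape shown above. `extract_texts` and `merge_translations` are hypothetical helper names, the translation request itself is left out, and the German strings are placeholders rather than real service output:

```python
import json

def extract_texts(strings: dict) -> dict:
    """Collect the translatable "text" value of each key."""
    return {key: entry["text"] for key, entry in strings.items()}

def merge_translations(strings: dict, translated: dict) -> dict:
    """Return a copy of the structure with each "text" replaced by its translation.

    Keys missing from `translated` keep their source text.
    """
    return {
        key: {**entry, "text": translated.get(key, entry["text"])}
        for key, entry in strings.items()
    }

source = json.loads("""
{
  "welcome_message": {"text": "Welcome to our platform"},
  "button_label": {"text": "Get Started"}
}
""")

to_translate = extract_texts(source)
# ... send `to_translate` for translation; the result below is a placeholder
translated = {"welcome_message": "Willkommen auf unserer Plattform", "button_label": "Los geht's"}
result = merge_translations(source, translated)
# ensure_ascii=False keeps non-ASCII characters as UTF-8 instead of \u escapes
print(json.dumps(result, ensure_ascii=False, indent=2))
```

The markup (<div class='welcome'>) and the {{platform_name}} variable stay in your template, which inserts the translated string at render time.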
Integration Tools
Many frameworks already provide ways to keep translatable strings separate from code, for example:
- YAML files for configuration-based translations
- XLIFF files for translation exchange
- i18n libraries for runtime translation rendering
Encoding
All requests use UTF-8 encoding and are interpreted as such. Ensure your source content is properly UTF-8 encoded before submission.
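If you are unsure how a file was encoded, it is worth checking before you submit it. Here is a minimal Python sketch. `to_utf8_text` is a hypothetical helper, and the fallback encoding (cp1252 here) is an assumption you would replace with whatever your legacy sources actually use:

```python
def to_utf8_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes as UTF-8, falling back to a known legacy encoding.

    Returns a str; call .encode("utf-8") on it when building the request body.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the (hypothetical) legacy encoding instead.
        return raw.decode(fallback)

utf8_bytes = "Grüße".encode("utf-8")
legacy_bytes = "Grüße".encode("cp1252")

print(to_utf8_text(utf8_bytes))
print(to_utf8_text(legacy_bytes))
```

A fallback only helps if you know the real encoding. Decoding with the wrong one usually succeeds silently and produces garbled characters, so fixing the source data is the safer option.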
HTML Entities
Automatic Decoding
HTML entities are automatically decoded to UTF-8 to make translation easier:
- &uuml; becomes ü
- &eacute; becomes é
- &amp; becomes &
Output format: Our translations will not contain HTML entities - all special characters are returned as UTF-8.
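To preview locally what decoded plain text will look like, Python's standard-library html.unescape performs the same kind of entity-to-character conversion. This is only an approximation for checking your own content, not our service's implementation, and it does not apply the angle-bracket exception:

```python
import html

samples = ["&uuml;", "&eacute;", "&amp;", "Caf&eacute; &amp; Bar"]
for entity_text in samples:
    # html.unescape converts named and numeric character references to Unicode characters
    print(f"{entity_text!r} -> {html.unescape(entity_text)!r}")
```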
Exception: Angle Brackets
Special handling for < and >:
When text contains HTML markup, any < or > characters that are not part of HTML tags are returned as the entities &lt; and &gt;:
Input: "The formula is x < 5 and y > 10"
Output: "The formula is x &lt; 5 and y &gt; 10"
This preserves mathematical expressions and comparisons while keeping the HTML valid, since a bare < could otherwise be read as the start of a tag.
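To see what this rule does to a snippet of your own markup, here is a rough Python approximation. It is a naive regex heuristic written for illustration, not the parser we use: it treats "<" followed by a letter (or "/" and a letter) up to the next ">" as a tag, and escapes every other angle bracket:

```python
import re

# Naive heuristic: "<" + optional "/" + a letter, up to the next ">", counts as a tag.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def _escape_brackets(text: str) -> str:
    """Replace literal angle brackets in a non-tag text run with entities."""
    return text.replace("<", "&lt;").replace(">", "&gt;")

def escape_stray_brackets(markup: str) -> str:
    """Escape < and > outside of tags, leaving the tags themselves untouched."""
    parts = []
    pos = 0
    for match in TAG.finditer(markup):
        parts.append(_escape_brackets(markup[pos:match.start()]))  # text before the tag
        parts.append(match.group(0))  # the tag itself, unchanged
        pos = match.end()
    parts.append(_escape_brackets(markup[pos:]))  # text after the last tag
    return "".join(parts)

print(escape_stray_brackets("<p>The formula is x < 5 and y > 10</p>"))
```

Because it is only a heuristic, a comparison written without spaces, such as a<b and c>d, would be mistaken for a tag; a real HTML parser handles such cases more carefully.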
Newlines
Best Practice: Clean Up Newlines
Remove unnecessary newlines before submitting content for translation. Newlines should only mark:
- Sentence boundaries
- Paragraph breaks
- Intentional formatting
Why This Matters
Proper segmentation: Clean newlines help us segment sentences correctly for better translations.
Common issues:
- Copy-paste artifacts create unintended line breaks
- Faulty user input introduces random newlines
- Inconsistent formatting confuses sentence detection
Examples
✅ Good - Clean formatting
"Welcome to our platform. We're excited to help you succeed."
❌ Problematic - Unnecessary newlines
"Welcome to our
platform. We're excited
to help you succeed."
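One way to clean up text like the problematic example before submission is sketched below in Python. `clean_newlines` is a hypothetical helper, and its rule is an assumption: a blank line marks a paragraph break, and any other newline is treated as accidental and replaced by a space:

```python
import re

def clean_newlines(text: str) -> str:
    """Join lines broken mid-sentence while keeping blank-line paragraph breaks.

    Heuristic: a blank line (optionally containing whitespace) separates paragraphs;
    any other newline inside a paragraph is replaced by a single space.
    """
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [re.sub(r"\s*\n\s*", " ", para).strip() for para in paragraphs]
    return "\n\n".join(joined)

messy = "Welcome to our\nplatform. We're excited\nto help you succeed."
print(clean_newlines(messy))
```

Apply a rule like this only to free-text fields. Content where single line breaks are intentional formatting, such as addresses or lists, would be flattened by it.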
Summary
- Remove code from translatable content - use key-value separation
- Use UTF-8 encoding for all content
- Let us handle HTML entities - they'll be decoded automatically
- Clean up newlines - keep only intentional sentence/paragraph breaks
Following these guidelines ensures optimal translation quality and consistent formatting in your final output.