Special Characters, Encoding & HTML

How we handle code, entities, and formatting

Code and Untranslatable Content

Remove templating and programming code from your source content before translation. Given the vast range of templating and programming languages, it's difficult to reliably identify and preserve all code patterns.

Why Remove Code?

  • Inconsistent identification across different programming languages
  • Mixed translatable content within code blocks creates confusion
  • Better translation quality when translators work with clean text

Recommended Approach

Use our key-value structure to separate translatable content from code:

// ✅ Good - Separate content from code
{
  "welcome_message": {
    "text": "Welcome to our platform"
  },
  "button_label": {
    "text": "Get Started"
  }
}

// ❌ Avoid - Mixing code with translatable content
{
  "template": {
    "text": "<div class='welcome'>Welcome to {{platform_name}}</div>"
  }
}
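
As a rough pre-submission check, the sketch below scans a key-value payload for values that still contain code. The function name `find_code_fragments` and both regular expressions are illustrative assumptions, not part of our API; they catch only simple HTML tags and `{{ ... }}` / `{% ... %}` placeholders, so adapt them to your own templating language.

```python
import re

# Heuristic patterns (assumptions for illustration, far from exhaustive).
CODE_PATTERNS = {
    "html_tag": re.compile(r"</?[A-Za-z][^<>]*>"),
    "template_placeholder": re.compile(r"\{\{.*?\}\}|\{%.*?%\}"),
}


def find_code_fragments(payload: dict) -> dict:
    """Map each key to the names of the patterns found in its "text" value."""
    findings = {}
    for key, entry in payload.items():
        text = entry.get("text", "")
        hits = [name for name, pattern in CODE_PATTERNS.items() if pattern.search(text)]
        if hits:
            findings[key] = hits
    return findings


payload = {
    "welcome_message": {"text": "Welcome to our platform"},
    "template": {"text": "<div class='welcome'>Welcome to {{platform_name}}</div>"},
}

print(find_code_fragments(payload))
# {'template': ['html_tag', 'template_placeholder']}
```

Keys that show up in the result are candidates for splitting into plain text plus markup that stays in your own templates.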

Integration Tools

Many frameworks provide tools for managing translations:

  • YAML files for configuration-based translations
  • XLIFF files for translation exchange
  • i18n libraries for runtime translation rendering

Encoding

All requests are interpreted as UTF-8. Ensure your source content is properly UTF-8 encoded before submission.
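
One way to verify this on your side is to try decoding the raw bytes as UTF-8 and re-encode them only if that fails. This is a minimal sketch: the helper name `to_utf8` and the `cp1252` fallback are assumptions, so substitute whatever legacy encoding your content actually uses.

```python
def to_utf8(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else re-encode it from the fallback."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: content that is not valid UTF-8 is in the fallback encoding.
        return raw.decode(fallback_encoding).encode("utf-8")


already_utf8 = "Grüße".encode("utf-8")
legacy_bytes = "Grüße".encode("cp1252")

print(to_utf8(already_utf8) == already_utf8)  # True
print(to_utf8(legacy_bytes).decode("utf-8"))  # Grüße
```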

HTML Entities

Automatic Decoding

HTML entities are automatically decoded to UTF-8 to make translation easier:

  • &uuml; becomes ü
  • &eacute; becomes é
  • &amp; becomes &

Output format: Our translations will not contain HTML entities; all special characters are returned as UTF-8, with the one exception described below.
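
To preview what decoded text looks like, Python's standard `html.unescape` performs a similar entity-to-character conversion. This only illustrates the behavior described above; it is not necessarily how our service implements it.

```python
import html

# The three entities listed above, decoded with the standard library.
for entity in ("&uuml;", "&eacute;", "&amp;"):
    print(entity, "becomes", html.unescape(entity))

decoded = html.unescape("Caf&eacute; &amp; M&uuml;sli")
print(decoded)  # Café & Müsli
```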

Exception: Angle Brackets

Special handling for < and >:

When text contains HTML markup, any < or > characters that are not part of HTML tags will be returned as entities:

Input:  "The formula is x < 5 and y > 10"
Output: "The formula is x &lt; 5 and y &gt; 10"

This preserves mathematical expressions and comparisons while maintaining HTML structure.
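
The rule can be approximated with the sketch below, which leaves anything that looks like an HTML tag untouched and escapes the remaining angle brackets. The regular expression is a simplifying assumption (a tag is `<`, an optional `/`, a letter, then anything up to the next `>`), and `escape_stray_angle_brackets` is a hypothetical helper, not our actual implementation.

```python
import re

# Assumed tag shape: "<" or "</", then a letter, then any characters up to ">".
TAG_RE = re.compile(r"(</?[A-Za-z][^<>]*>)")


def escape_stray_angle_brackets(text: str) -> str:
    """Escape < and > that are not part of something that looks like an HTML tag."""
    parts = TAG_RE.split(text)
    escaped = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold the captured tags; keep them as-is.
            escaped.append(part)
        else:
            escaped.append(part.replace("<", "&lt;").replace(">", "&gt;"))
    return "".join(escaped)


print(escape_stray_angle_brackets("<p>The formula is x < 5 and y > 10</p>"))
# <p>The formula is x &lt; 5 and y &gt; 10</p>
```

A heuristic like this misreads text such as `a <b and c>` as a tag, so treat it as an illustration of the output format rather than a parser.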

Newlines

Best Practice: Clean Up Newlines

Remove unnecessary newlines before submitting content for translation. Newlines should only mark:

  • Sentence boundaries
  • Paragraph breaks
  • Intentional formatting

Why This Matters

Proper segmentation: Clean newlines help us segment sentences correctly for better translations.

Common issues:

  • Copy-paste artifacts create unintended line breaks
  • Faulty user input introduces random newlines
  • Inconsistent formatting confuses sentence detection

Examples

✅ Good - Clean formatting
"Welcome to our platform. We're excited to help you succeed."

❌ Problematic - Unnecessary newlines
"Welcome to our
platform. We're excited
to help you succeed."
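
A simple cleanup pass along these lines could look like the sketch below. It assumes blank lines mark paragraph breaks and that every single newline inside a paragraph is an artifact; if single newlines carry intentional formatting in your content, skip this step for those strings. The helper name `clean_newlines` is hypothetical.

```python
import re


def clean_newlines(text: str) -> str:
    """Join lines within a paragraph; keep blank-line paragraph breaks."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    # str.split() with no arguments also collapses repeated spaces and tabs.
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)


raw = "Welcome to our\nplatform. We're excited\nto help you succeed."
print(clean_newlines(raw))
# Welcome to our platform. We're excited to help you succeed.
```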

Summary

  1. Remove code from translatable content - use key-value separation
  2. Use UTF-8 encoding for all content
  3. Let us handle HTML entities - they'll be decoded automatically
  4. Clean up newlines - keep only intentional sentence/paragraph breaks

Following these guidelines ensures optimal translation quality and consistent formatting in your final output.