Re: i18n: should cf-cli strings be exact duplicates of en-US translations?


Koper, Dies <diesk@...>
 

Hi John,

Thank you for your interest in the CLI’s internals.

Can I ask what prompted you to look into this? Are you looking into adding support for another locale and finding the number of messages to translate too large?

I’ve asked my team to answer.

There is a caveat with Benefit (3) and Disadvantage (2):
French punctuation rules require a space before a colon, so “Hello:” becomes “Bonjour :”.
That may explain a number of the instances that look like duplicates.

Regards,
Dies Koper
Cloud Foundry Product Manager - CLI


From: John Feminella [mailto:jxf(a)pivotal.io]
Sent: Thursday, September 15, 2016 7:14 PM
To: Discussions about Cloud Foundry projects and the system overall.
Subject: [cf-dev] i18n: should cf-cli strings be exact duplicates of en-US translations?

hi,

I'm interested in gathering the community's thoughts about a proposal to improve the structure of the i18n translations.

## Issue

Currently the translation files for the CF CLI use the en-US text itself as the identifier, so each entry repeats the same string as both the id and the translation. For example, we have translations similar to:

# en-US.json
{
"id": "A required argument for this command is missing",
"translation": "A required argument for this command is missing"
}

My understanding is that most i18n-enabled projects instead treat the id as a pure identifier that conveys semantic meaning, rather than literally duplicating one specific language's translation.

For example, a localization resources file might instead contain something like:

{
"id": "cli.errors.missing_argument",
"translation": "a required argument for this command is missing"
}
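A minimal sketch of how a lookup over such a semantic-id resources file might work, assuming the JSON-array-of-{id, translation} layout shown above. The `entry` type, `loadTranslations`, and the table-taking `T` are hypothetical illustrations, not the CLI's actual i18n code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry mirrors one record in a locale file (hypothetical layout).
type entry struct {
	ID          string `json:"id"`
	Translation string `json:"translation"`
}

// loadTranslations parses a JSON array of {id, translation} records into a
// lookup table keyed by the semantic id.
func loadTranslations(data []byte) (map[string]string, error) {
	var entries []entry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	table := make(map[string]string, len(entries))
	for _, e := range entries {
		table[e.ID] = e.Translation
	}
	return table, nil
}

// T returns the translation for id, falling back to the id itself when no
// entry exists, so a missing id is visible rather than an empty string.
func T(table map[string]string, id string) string {
	if s, ok := table[id]; ok {
		return s
	}
	return id
}

func main() {
	enUS := []byte(`[
  {"id": "cli.errors.missing_argument",
   "translation": "a required argument for this command is missing"}
]`)
	table, err := loadTranslations(enUS)
	if err != nil {
		panic(err)
	}
	fmt.Println(T(table, "cli.errors.missing_argument"))
}
```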

## Benefits

The benefits of using the semantic-identifier approach are:

(1) Reduce number of change sites

Because the id is the English text, rewording a frequently used message also changes its id, so every call site in the code and every locale file has to be updated, rather than just the one entry in the resources file.
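To make the change-site argument concrete, here is a hypothetical sketch. The `lookup` helper and both tables are illustrative, and it assumes a lookup that falls back to returning the id when no entry matches. After the message is reworded, a call site that still embeds the old English text as its id no longer finds an entry, while the semantic-id call site picks up the new wording with no code change:

```go
package main

import "fmt"

// lookup mimics a T()-style function: return the translation, or the id
// itself when no entry exists (an assumed fallback behavior).
func lookup(table map[string]string, id string) string {
	if s, ok := table[id]; ok {
		return s
	}
	return id
}

// Text-as-id scheme after rewording the message: the key changed too.
var textKeyed = map[string]string{
	"missing a required argument": "missing a required argument",
}

// Semantic-id scheme after the same rewording: only the value changed.
var semanticKeyed = map[string]string{
	"cli.errors.missing_argument": "missing a required argument",
}

func main() {
	// A call site written before the rewording still uses the old text as its
	// id, so it falls back to the stale wording until someone edits it.
	fmt.Println(lookup(textKeyed, "a required argument for this command is missing"))
	// The semantic call site gets the new wording without being touched.
	fmt.Println(lookup(semanticKeyed, "cli.errors.missing_argument"))
}
```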

(2) Improve message refactoring potential

It also opens up more opportunities for refactoring messages. For example, a large number of messages include embedded newlines, such as:

{
"id": "\n\nTIP:\n",
"translation": "\n\nTIP:\n"
},

This seems undesirable, because one might want to reuse the translation of "TIP:" without the embedded newlines.
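A hypothetical sketch of separating layout from text: the stored string is just "TIP:", and the call site adds the surrounding newlines. The id `cli.labels.tip` and the `formatTip` helper are assumptions for illustration:

```go
package main

import "fmt"

// Hypothetical semantic id; the stored string carries no layout.
var enUS = map[string]string{
	"cli.labels.tip": "TIP:",
}

// formatTip adds the blank lines at the call site, so the translated "TIP:"
// label can be reused elsewhere without newlines baked into it.
func formatTip(table map[string]string, body string) string {
	return "\n\n" + table["cli.labels.tip"] + "\n" + body
}

func main() {
	fmt.Print(formatTip(enUS, "example tip text\n"))
}
```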

(3) Reduce number of strings that need to be translated

Many nearly identical translations are duplicated throughout the localization files. For example:

{
"id": "requested state",
"translation": "requested state"
},
{
"id": "requested state:",
"translation": "requested state:"
},

Under the current approach, if a string differs by even a single character, an entirely new translation is required, even when the semantic meaning is the same.

With the proposed approach these can instead be merged.
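A sketch of such a merge, assuming a hypothetical id `cli.app.requested_state`. The colon variant is derived in code, which is the `T(...) + ":"` pattern whose limits are discussed under Disadvantage (2) below:

```go
package main

import "fmt"

// One hypothetical entry replaces both "requested state" and "requested state:".
var enUS = map[string]string{
	"cli.app.requested_state": "requested state",
}

// asLabel derives the colon-suffixed variant from the same translation.
func asLabel(table map[string]string, id string) string {
	return table[id] + ":"
}

func main() {
	fmt.Println(enUS["cli.app.requested_state"])          // requested state
	fmt.Println(asLabel(enUS, "cli.app.requested_state")) // requested state:
}
```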

(4) Clearer intent

The intent of each message becomes clearer. Currently the intent is not called out explicitly in the identifier, so when a message changes it is less clear what set of purposes it serves.

## Disadvantages

(1) Must look at two places to determine correct translation

The main disadvantage of this approach is that a translator must look at both the source and destination files to determine the correct translation. For instance, someone translating from en-US to fr-FR has to have both the en-US and fr-FR files open.

However, I would think one usually wants not a literal translation ("translate the string 'missing a required argument'") but a semantic one ("write an error message indicating that the command is missing a required argument"), so this would encourage better translations overall.
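One way to soften the two-files problem might be a small helper that joins the source and target files on id, so the translator sees both texts side by side. This is a hypothetical sketch assuming the JSON-array layout shown above; `sideBySide` is not an existing CLI tool:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// entry mirrors one record in a locale file (hypothetical layout).
type entry struct {
	ID          string `json:"id"`
	Translation string `json:"translation"`
}

// sideBySide joins two locale files on id, returning rows of
// {id, source text, target text}; the target is empty when untranslated.
// Rows are sorted by id for stable output.
func sideBySide(src, dst []byte) ([][3]string, error) {
	var s, d []entry
	if err := json.Unmarshal(src, &s); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(dst, &d); err != nil {
		return nil, err
	}
	target := make(map[string]string, len(d))
	for _, e := range d {
		target[e.ID] = e.Translation
	}
	rows := make([][3]string, 0, len(s))
	for _, e := range s {
		rows = append(rows, [3]string{e.ID, e.Translation, target[e.ID]})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i][0] < rows[j][0] })
	return rows, nil
}

func main() {
	enUS := []byte(`[{"id":"cli.errors.missing_argument","translation":"a required argument for this command is missing"},{"id":"cli.labels.tip","translation":"TIP:"}]`)
	frFR := []byte(`[{"id":"cli.labels.tip","translation":"ASTUCE :"}]`)
	rows, err := sideBySide(enUS, frFR)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Printf("%s\n  en-US: %s\n  fr-FR: %q\n", r[0], r[1], r[2])
	}
}
```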

(2) Theoretically possible to have wrong translations

In a language where the translations for something like "hello" and "hello:" were different, it would not be correct to merge these and differentiate with `T("greetings.hello")` vs. `T("greetings.hello") + ":"`. I've looked at the existing translations and I don't believe that any such cases currently exist, although I could be mistaken.
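A sketch of the failure mode and one possible mitigation, using the French colon rule as the example. The extra id `cli.format.label` is an assumption, not part of the current files: it lets each locale supply its own pattern for joining a label and its colon, instead of appending ":" in code:

```go
package main

import "fmt"

// Hypothetical locale tables. "cli.format.label" is an assumed extra id that
// lets each locale decide how a label and its colon are joined.
var locales = map[string]map[string]string{
	"en-US": {
		"greetings.hello":  "Hello",
		"cli.format.label": "%s:",
	},
	"fr-FR": {
		"greetings.hello":  "Bonjour",
		"cli.format.label": "%s :",
	},
}

// naiveLabel appends ":" in code, which is wrong for French.
func naiveLabel(locale, id string) string {
	return locales[locale][id] + ":"
}

// label lets the translation decide where the colon goes.
func label(locale, id string) string {
	t := locales[locale]
	return fmt.Sprintf(t["cli.format.label"], t[id])
}

func main() {
	fmt.Println(naiveLabel("fr-FR", "greetings.hello")) // Bonjour:
	fmt.Println(label("fr-FR", "greetings.hello"))      // Bonjour :
	fmt.Println(label("en-US", "greetings.hello"))      // Hello:
}
```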

## Summary

Overall I believe this brings tangible benefits and reduces cognitive overhead. I'm interested in the community's thoughts. If we agree that the current approach is not ideal, I would be willing to make a PR to implement the proposal.

best,
~ jf
--
John Feminella
Advisory Platform Architect
✉ · jxf(a)pivotal.io
t · @jxxf
