Re: i18n: should cf-cli strings be exact duplicates of en-US translations?


John Feminella <jxf@...>
 

hi Dies,

> Can I ask the background of looking into this? Are you looking into
> adding support for another locale and finding the number of messages to
> translate too big?

No, I just thought it was an area that might be worth improving if others
agreed, and I've been involved in a number of i18n efforts on various
products. I am not personally or specifically blocked in any way by this,
though as I mentioned I think it is a beneficial suggestion.

That said, I think some areas are more worth improving than others
(assuming there is agreement to change them at all). For instance, I
think that string keys with embedded newlines, as in "\n\nTIP:\n", could be
restructured to be friendlier to translators.

> French grammar rules dictate the requirement for a space before the
> colon. So “Hello:” becomes “Bonjour :”.
>
> That could be the background for a number of such instances that look
> like duplications.

I agree, these kinds of cases are sometimes tricky.

In such cases, my understanding is that in the same way that some locales
prefer "15 July" and others prefer "July 15", so too would you use a
locale-specific modifier for the colon. So in the code one might use
something like `T("greeting") + T("locale.separators.colon")` if you wanted
to be maximally correct, where "locale.separators.colon" maps to " :" for
fr-FR and ":" for en-US.
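
To make that concrete, here is a minimal sketch of the idea in Python. It is not the CLI's actual code: the catalogs, the `labelled` helper, and the two-argument `T` are hypothetical, and only the "greeting" and "locale.separators.colon" ids come from the paragraph above.

```python
# Hypothetical per-locale catalogs; the strings are illustrative only.
CATALOGS = {
    "en-US": {"greeting": "Hello", "locale.separators.colon": ":"},
    "fr-FR": {"greeting": "Bonjour", "locale.separators.colon": " :"},
}


def T(message_id, locale):
    """Look up message_id in the catalog for the given locale."""
    return CATALOGS[locale][message_id]


def labelled(message_id, locale):
    """Return the translated message followed by the locale's colon."""
    return T(message_id, locale) + T("locale.separators.colon", locale)


print(labelled("greeting", "en-US"))  # Hello:
print(labelled("greeting", "fr-FR"))  # Bonjour :
```

A real fr-FR catalog would probably store a no-break space before the colon rather than a plain space, so the separator never wraps onto its own line.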

best,
~ jf

On Thu, Sep 15, 2016, 18:19 Koper, Dies <diesk(a)fast.au.fujitsu.com> wrote:

Hi John,



Thank you for your interest in the CLI’s internals.



Can I ask the background of looking into this? Are you looking into adding
support for another locale and finding the number of messages to translate
too big?



I’ve asked my team to answer.



There is a caveat with Benefit (3) and Disadvantage (2):

French grammar rules dictate the requirement for a space before the colon.
So “Hello:” becomes “Bonjour :”.

That could be the background for a number of such instances that look like
duplications.



Regards,

Dies Koper
Cloud Foundry Product Manager - CLI





*From:* John Feminella [mailto:jxf(a)pivotal.io]
*Sent:* Thursday, September 15, 2016 7:14 PM
*To:* Discussions about Cloud Foundry projects and the system overall.
*Subject:* [cf-dev] i18n: should cf-cli strings be exact duplicates of
en-US translations?



hi,



I'm interested in gathering the community's thoughts about a proposal to
improve the structure of the i18n translations.



## Issue



Currently the translation files for the CF CLI duplicate both the
identifier and the translation. For example, we have translations similar
to:



# en-US.json
{
  "id": "A required argument for this command is missing",
  "translation": "a required argument for this command is missing"
}



My understanding is that in most i18n-enabled projects the id is purely an
identifier that conveys some semantic meaning, rather than a literal copy
of one specific language's translation.



For example, a localization resources file might instead contain something
like:



{
  "id": "cli.errors.missing_argument",
  "translation": "a required argument for this command is missing"
}
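
As a rough illustration of how a lookup by semantic id might work, here is a small Python sketch. The `load_catalog` and `T` helpers are hypothetical and are not the CLI's real API; the only assumption about the file format is the list of "id"/"translation" entries shown above.

```python
import json

# Hypothetical resource file contents using the proposed semantic id.
EN_US_JSON = """
[
  {
    "id": "cli.errors.missing_argument",
    "translation": "a required argument for this command is missing"
  }
]
"""


def load_catalog(text):
    """Turn a list of {"id", "translation"} entries into an id -> text dict."""
    return {entry["id"]: entry["translation"] for entry in json.loads(text)}


def T(catalog, message_id):
    """Return the translation, or the id itself when no entry exists."""
    return catalog.get(message_id, message_id)


catalog = load_catalog(EN_US_JSON)
print(T(catalog, "cli.errors.missing_argument"))
```

Call sites then depend only on the id, so the wording can change in the resources file without touching the code.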



## Benefits



The benefits of using the semantic-identifier approach are:



(1) Reduce number of change sites



Under the current approach, when a frequently used message changes, one
must update every place that uses the text as a key (call sites and each
locale file) rather than just the resources file.



(2) Improve message refactoring potential



Semantic identifiers create more opportunities to refactor messages. For
example, a large number of messages include embedded newlines. One example:



{
  "id": "\n\nTIP:\n",
  "translation": "\n\nTIP:\n"
},



This seems undesirable, because one might want to reuse the translation
for "TIP:" in a context that does not need the embedded newlines.
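
One way to picture the refactoring, as a sketch in Python: the catalog holds only the label, and the layout lives in code. The "cli.labels.tip" id and both helper functions are hypothetical names chosen for illustration.

```python
# Hypothetical catalog entry: the label carries no layout whitespace.
CATALOG = {"cli.labels.tip": "TIP:"}


def T(message_id):
    """Look up a message by its semantic id."""
    return CATALOG[message_id]


def tip_block(body):
    """Lay out a tip as its own paragraph; the newlines live here."""
    return "\n\n" + T("cli.labels.tip") + "\n" + body


def tip_inline(body):
    """Reuse the same translation on a single line."""
    return T("cli.labels.tip") + " " + body


print(repr(tip_block("example text")))
print(repr(tip_inline("example text")))
```

Translators then see just "TIP:", and each caller decides how much whitespace surrounds it.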



(3) Reduce number of strings that need to be translated



Many translations that are virtually identical are duplicated throughout
the localization files. For example:



{
  "id": "requested state",
  "translation": "requested state"
},
{
  "id": "requested state:",
  "translation": "requested state:"
},



Under the current approach, if a string differs even by a single character
an entirely new translation is required, even if the semantic meaning is
the same.



With the proposed approach these can instead be merged.
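
To get a sense of how many entries could be merged, something like the following Python sketch could scan a resources file for ids that differ only by surrounding whitespace or a trailing colon. The `normalize` rule is an assumption about what counts as "virtually identical", and the entries are the examples from this message.

```python
from collections import defaultdict

# Entries in the current style, where the id duplicates the en-US text.
ENTRIES = [
    {"id": "requested state", "translation": "requested state"},
    {"id": "requested state:", "translation": "requested state:"},
    {"id": "\n\nTIP:\n", "translation": "\n\nTIP:\n"},
]


def normalize(text):
    """Strip surrounding whitespace and a trailing colon."""
    return text.strip().rstrip(":").rstrip()


def merge_candidates(entries):
    """Group ids that differ only by whitespace or a trailing colon."""
    groups = defaultdict(list)
    for entry in entries:
        groups[normalize(entry["id"])].append(entry["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


print(merge_candidates(ENTRIES))
```

The result is only a list of candidates; a translator would still need to confirm that each group really shares one meaning.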



(4) Clearer intent



The intent of the message is clearer because it is explicitly called out
in the identifier. Under the current approach, if a message changes, it is
less clear what set of purposes it serves.



## Disadvantages



(1) Must look at two places to determine correct translation



The main disadvantage of this approach is that a translator must look at
both the source and destination files to determine the correct translation.
For instance, someone translating from en-US to fr-FR has to have both the
en-US and fr-FR files open.



However, in general I would think one usually doesn't want a *literal* translation
("translate the string 'missing a required argument'"), but rather a
*semantic* translation ("write an error message indicating that the
command is missing a required argument"), so this would encourage better
translations overall.



(2) Theoretically possible to have wrong translations



In a language where the translations for something like "hello" and
"hello:" were different, it would not be correct to merge these and
differentiate with `T("greetings.hello")` vs. `T("greetings.hello") + ":"`.
I've looked at the existing translations and I don't believe that any such
cases currently exist, although I could be mistaken.
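
A check along these lines could be automated rather than done by eye. The Python sketch below flags ids where the translation of the suffixed form is not simply the base translation plus the suffix. The fr-FR strings are made-up placeholders, not taken from the real translation files, and `unsafe_merges` is a hypothetical helper.

```python
# Catalogs keyed by the current en-US-text ids.
EN_US = {
    "requested state": "requested state",
    "requested state:": "requested state:",
}
# Hypothetical fr-FR strings, for illustration only.
FR_FR = {
    "requested state": "état demandé",
    "requested state:": "état demandé :",
}


def unsafe_merges(catalog, suffix=":"):
    """Return ids whose suffixed translation is not base translation + suffix."""
    problems = []
    for message_id, translation in catalog.items():
        suffixed = catalog.get(message_id + suffix)
        if suffixed is not None and suffixed != translation + suffix:
            problems.append(message_id)
    return problems


print(unsafe_merges(EN_US))
print(unsafe_merges(FR_FR))
```

Any id this reports would need a locale-aware separator, or separate entries, instead of plain string concatenation.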



## Summary



Overall I believe this adds a lot of tangible benefits and reduces
cognitive overhead. I'm interested in the community's thoughts. If we agree
that the current structure is not ideal, I would be willing to make a PR to
implement the proposal.



best,

~ jf

--

John Feminella
Advisory Platform Architect
✉ · jxf(a)pivotal.io
t · @jxxf

