i18n: should cf-cli strings be exact duplicates of en-US translations?
John Feminella <jxf@...>
hi,
I'm interested in gathering the community's thoughts about a proposal to improve
the structure of the i18n translations.
## Issue
Currently the translation files for the CF CLI duplicate both the identifier and
the translation. For example, we have translations similar to:
    # en-US.json
    { "id": "A required argument for this command is missing",
      "translation": "a required argument for this command is missing" }
My understanding is that the practice most i18n-enabled projects follow is that
the id is purely an identifier that conveys some semantic meaning, rather than
a literal copy of one specific language's translation.
For example, a localization resources file might instead contain something like:
    { "id": "cli.errors.missing_argument",
      "translation": "a required argument for this command is missing" }
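
A minimal sketch of how lookup by semantic identifier might work, written in Python for brevity. The `translate` helper and the in-memory `RESOURCES` table are hypothetical stand-ins, not the CLI's actual i18n API, and the fr-FR string is only illustrative; the id `cli.errors.missing_argument` is the one from the example above.

```python
# Hypothetical in-memory resource tables, one per locale.
RESOURCES = {
    "en-US": {
        "cli.errors.missing_argument": "a required argument for this command is missing",
    },
    "fr-FR": {
        # Illustrative translation, not taken from the real CF CLI files.
        "cli.errors.missing_argument": "un argument requis pour cette commande est manquant",
    },
}


def translate(message_id, locale="en-US", fallback="en-US"):
    """Resolve a semantic id, falling back to the default locale, then to the id itself."""
    for candidate in (locale, fallback):
        table = RESOURCES.get(candidate, {})
        if message_id in table:
            return table[message_id]
    return message_id
```

With this shape, rewording the English message touches only the en-US table; callers keep using the same id.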
## Benefits
The benefits of using the semantic-identifier approach are:
(1) Reduce number of change sites
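When a message shows up frequently, the current approach means a wording change
has to be made in many more locations than is strictly necessary, because the
string also serves as the identifier wherever it is used. With semantic
identifiers, only a few places (the resources files) need to change.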
(2) Improve message refactoring potential
Semantic identifiers increase the opportunities for refactoring messages. For
example, a large number of messages include embedded newlines. One example:
    { "id": "\n\nTIP:\n", "translation": "\n\nTIP:\n" },
This seems bad, because one might want to use the translation for "TIP:" without
needing to have the newlines embedded there.
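
As a rough illustration, the layout could move to the call site while the resource holds only the label. The identifier `cli.labels.tip`, the `MESSAGES` table, and the `tip_label` helper are all invented for this sketch.

```python
# Hypothetical resource table; "cli.labels.tip" is an invented identifier.
MESSAGES = {"cli.labels.tip": "TIP:"}


def tip_label(leading_newlines=0, trailing_newlines=0):
    """Return the translated tip label with layout added by the caller."""
    label = MESSAGES["cli.labels.tip"]
    return "\n" * leading_newlines + label + "\n" * trailing_newlines
```

Calling `tip_label(2, 1)` reproduces the existing string, while `tip_label()` gives the bare label for other contexts.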
(3) Reduce number of strings that need to be translated
Many translations that are virtually identical are duplicated throughout the
localization files. For example:
    { "id": "requested state", "translation": "requested state" },
    { "id": "requested state:", "translation": "requested state:" },
Under the current approach, if a string differs even by a single character an
entirely new translation is required, even if the semantic meaning is the same.
With the proposed approach these can instead be merged.
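
One way to look for merge candidates is to group existing ids that differ only in case or in surrounding whitespace and punctuation. This is a heuristic sketch, and `merge_candidates` is a hypothetical helper; someone would still need to confirm that each group really shares one meaning.

```python
import string
from collections import defaultdict


def merge_candidates(ids):
    """Group ids that match after trimming outer whitespace/punctuation and lowercasing."""
    groups = defaultdict(list)
    for message_id in ids:
        key = message_id.strip(string.whitespace + string.punctuation).lower()
        groups[key].append(message_id)
    # Keep only groups with more than one member: those are possible duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```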
(4) Clearer intent
The intent of the message is clearer because it's now explicitly called out in
the identifier, so if a message changes it remains clear what set of purposes
it has.
## Disadvantages
(1) Must look at two places to determine correct translation
The main disadvantage of this approach is that a translator must look at both
the source and destination files to determine the correct translation. For
instance, someone translating from en-US to fr-FR has to have both the en-US and
fr-FR files open.
However, in general I would think one usually doesn't want a literal translation
("translate the string 'missing a required argument'"), but rather a semantic
translation ("write an error message indicating that the command is missing a
required argument"), so this would encourage better translations overall.
(2) Theoretically possible to have wrong translations
In a language where the translations for something like "hello" and "hello:"
were different, it would not be correct to merge these and differentiate with
`T("greetings.hello")` vs. `T("greetings.hello") + ":"`. I've looked at the
existing translations and I don't believe that any such cases currently exist,
although I could be mistaken.
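
To make the risk concrete: French typography conventionally puts a space before a colon, so appending ":" in code would give the wrong result there. One possible hedge is to keep the punctuation inside a second, parameterized message. The ids, the `{label}` placeholder syntax, the `with_colon` helper, and the translations below are all hypothetical.

```python
# Hypothetical tables; ids, placeholder syntax, and translations are illustrative.
LABELS = {
    "en-US": {"greetings.hello": "hello", "format.with_colon": "{label}:"},
    # French conventionally places a (non-breaking) space before the colon.
    "fr-FR": {"greetings.hello": "bonjour", "format.with_colon": "{label}\u00a0:"},
}


def with_colon(message_id, locale):
    """Render a label followed by a colon using the locale's own punctuation rule."""
    table = LABELS[locale]
    return table["format.with_colon"].format(label=table[message_id])
```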
## Summary
Overall I believe this adds a lot of tangible benefits and reduces cognitive
overhead. I'm interested in the community's thoughts. If we agree the current
approach is not ideal, I would be willing to make a PR to implement the proposal.
best,
~ jf
--
John Feminella
Advisory Platform Architect
✉ ·jxf(a)pivotal.io
t · @jxxf