hi,

I'm interested in gathering the community's thoughts about a proposal to improve the structure of the i18n translations.

## Issue

Currently the translation files for the CF CLI duplicate both the identifier and the translation. For example, we have translations similar to:

    # en-US.json
    {
      "id": "A required argument for this command is missing",
      "translation": "a required argument for this command is missing"
    }

My understanding is that the practice most i18n-enabled projects follow is that the id is strictly an identifier that conveys some semantic meaning, rather than a literal duplicate of one specific language's translation.

For example, a localization resources file might instead contain something like:

    {
      "id": "cli.errors.missing_argument",
      "translation": "a required argument for this command is missing"
    }

## Benefits

The benefits of using the semantic-identifier approach are:

(1) Reduce number of change sites

For cases where a translation shows up frequently, the current approach requires updating many more locations than is strictly necessary when a change occurs, instead of just a few (the resources file).

(2) Improve message refactoring potential

Semantic identifiers open up more opportunities for message refactoring. For example, a large number of messages include embedded newlines. One example:

    {
      "id": "\n\nTIP:\n",
      "translation": "\n\nTIP:\n"
    },

This seems bad, because one might want to use the translation for "TIP:" without needing to have the newlines embedded there.

(3) Reduce number of strings that need to be translated

Many translations that are virtually identical are duplicated throughout the localization files. For example:

    {
      "id": "requested state",
      "translation": "requested state"
    },
    {
      "id": "requested state:",
      "translation": "requested state:"
    },

Under the current approach, if a string differs even by a single character an entirely new translation is required, even if the semantic meaning is the same.

With the proposed approach these can instead be merged.

(4) Clearer intent

With a semantic identifier, the intent of a message is clearer. Under the current approach the intent is not explicitly called out in the identifier, so if a message changes it's less clear what set of purposes it has.

## Disadvantages

(1) Must look at two places to determine correct translation

The main disadvantage of this approach is that a translator must look at both the source and destination files to determine the correct translation. For instance, someone translating from en-US to fr-FR has to have both the en-US and fr-FR files open.

However, in general I would think one usually doesn't want a literal translation ("translate the string 'missing a required argument'"), but rather a semantic translation ("write an error message indicating that the command is missing a required argument"), so this would encourage better translations overall.

(2) Theoretically possible to have wrong translations

In a language where the translations for something like "hello" and "hello:" were different, it would not be correct to merge these and differentiate with `T("greetings.hello")` vs. `T("greetings.hello") + ":"`. I've looked at the existing translations and I don't believe that any such cases currently exist, although I could be mistaken.
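To make Disadvantage (2) concrete, here is a minimal Go sketch of the difference between appending punctuation in code and keeping a dedicated entry per locale. Everything in it is an assumption for illustration: the `catalog` map stands in for the JSON resource files, `Translate` is a hypothetical helper rather than the CLI's `T()`, the `greetings.hello_label` id is made up, and the fr-FR strings only illustrate a locale that puts a space before the colon.

```go
package main

import "fmt"

// catalog is a hypothetical in-memory stand-in for the per-locale JSON
// resource files; the ids and translations are illustrative only.
var catalog = map[string]map[string]string{
	"en-US": {
		"greetings.hello":       "hello",
		"greetings.hello_label": "hello:",
	},
	"fr-FR": {
		"greetings.hello":       "bonjour",
		"greetings.hello_label": "bonjour :",
	},
}

// Translate looks up id for the given locale and falls back to the id
// itself when no entry exists. It is a hypothetical helper, not the
// CLI's T() function.
func Translate(locale, id string) string {
	if s, ok := catalog[locale][id]; ok {
		return s
	}
	return id
}

func main() {
	for _, locale := range []string{"en-US", "fr-FR"} {
		// Appending ":" in code applies en-US punctuation to every locale.
		concatenated := Translate(locale, "greetings.hello") + ":"
		// A dedicated entry lets each locale decide its own punctuation.
		dedicated := Translate(locale, "greetings.hello_label")
		fmt.Printf("%s: concatenated=%q dedicated=%q\n", locale, concatenated, dedicated)
	}
}
```

In this sketch the two forms agree for en-US and differ for fr-FR, which is the kind of case where merging the entries would produce a wrong translation.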
## Summary

Overall I believe this adds a lot of tangible benefits and reduces cognitive overhead. I'm interested in the community's thoughts. If we agree the current approach is not ideal I would be willing to make a PR to implement the proposal.

best,
~ jf
--
John Feminella
Advisory Platform Architect
✉ · jxf(a)pivotal.io
t · @jxxf
|
|
Hi John,

Thank you for your interest in the CLI’s internals.

Can I ask the background of looking into this? Are you looking into adding support for another locale and finding the number of messages to translate too big?

I’ve asked my team to answer.

There is a caveat with Benefit (3) and Disadvantage (2):
French grammar rules dictate the requirement for a space before the colon. So “Hello:” becomes “Bonjour :”.
That could be the background for a number of such instances that look like duplications.

Regards,
Dies Koper
Cloud Foundry Product Manager - CLI

From: John Feminella [mailto:jxf(a)pivotal.io]
Sent: Thursday, September 15, 2016 7:14 PM
To: Discussions about Cloud Foundry projects and the system overall.
Subject: [cf-dev] i18n: should cf-cli strings be exact duplicates of en-US translations?
|
|
hi Dies,

> Can I ask the background of looking into this? Are you looking into adding support for another locale and finding the number of messages to translate too big?

No, I just thought it was an area that might be worth improving if others agreed, and I've been involved in a number of i18n efforts on various products. I am not personally or specifically blocked in any way by this, though as I mentioned I think it is a beneficial suggestion.

That said, I think there are some areas that are more worth improving than others (assuming there is agreement to change them at all). For instance I think that embedding newlines in string keys, as in "\n\nTIP:\n", could be modified to be more amenable for translators.

> French grammar rules dictate the requirement for a space before the colon. So “Hello:” becomes “Bonjour :”.
> That could be the background for a number of such instances that look like duplications.

I agree, these kinds of cases are sometimes tricky.

In such cases, my understanding is that in the same way that some locales prefer "15 July" and others prefer "July 15", so too would you use a locale-specific modifier for the colon. So in the code one might use something like `T("greeting") + T("locale.separators.colon")` if you wanted to be maximally correct, where "locale.separators.colon" maps to " :" for fr-FR and ":" for en-US.

best,
~ jf

On Thu, Sep 15, 2016, 18:19 Koper, Dies <diesk(a)fast.au.fujitsu.com> wrote:

> Hi John,
|
|
Disclosure: I used to be a developer on the CLI team
I really like the idea of having identifiers rather than English, for the same reasons that John mentioned.
For strings where a colon existed, it would make sense to include the translation twice, one with and one without the colon, for the reason Dies brought up. I think that's fine; in the CLI codebase the difference is usually not a colon but some number of newlines, as in the "\n\nTIP:\n" example.
I had considered taking on this work when I was on the CLI team, but it seemed like too big a change at the time. The tooling around updating translations all assumes a particular workflow, which would need to change: i18n4go pores through the source code and compares the values it finds to what exists in the English file and makes updates in the translation files as necessary, for example.
Cheers,
KH
On Thu, Sep 15, 2016 at 3:43 PM John Feminella <jxf(a)pivotal.io> wrote: hi Dies,
|
|
> The tooling around updating translations all assumes a particular workflow, which would need to change: i18n4go pores through the source code and compares the values it finds to what exists in the English file and makes updates in the translation files as necessary, for example.

Thanks for the feedback, Kris. I'm not familiar with the broader translation workflow on CF. If you have some time offline I'd love to get your thoughts and understand what you see as the challenges.

On Thu, Sep 15, 2016, 19:16 Kris Hicks <khicks(a)pivotal.io> wrote:

> Disclosure: I used to be a developer on the CLI team
|
|
The workflow is available here: https://github.com/cloudfoundry/cli/blob/master/CONTRIBUTING.md#i18n

i18n4go, goi18n, and bin/generate-language-resources are the three things that need to be executed when updating any calls to the i18n.T() function (which tends to be dot-imported, so it's just T()).

i18n4go digs through the AST to find calls to T(). It assumes you want the strings that are in your call to T() as your translation key and (English) value, and modifies the en-us.all.json accordingly. It also has the ability to detect changes and removals.

goi18n is what fills out the *.translated.json and *.untranslated.json files with those keys that are present in en-us.all.json but missing in, for example, fr-fr.all.json. It puts empty values in the .translated files and the English value in the .untranslated files.

bin/generate-language-resources rebuilds the binary representations of those JSON files into i18n_resources.go.

To use keys instead of the English values in calls to T(), i18n4go would need to be changed to just do the add/removal of keys, but leave empty values in the en-us.all.json file for new entries (rather than adding the English value as well). The English value would be added manually to that file at that point. That in itself isn't too complicated, except for the fact that i18n4go doesn't have much in the way of tests, so making modifications should be done with care.

The part that's not so great is that once you start switching to keys, you need to switch everything to a key at once, otherwise i18n4go will not be very useful at all. For example, if you switched one call to T() to be a key instead of English, i18n4go would want to remove the old key and value, and add the new key with a value that's the same as the key. You'd have to restore the English value to the new entry.
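The "digs through the AST to find calls to T()" step can be illustrated with Go's standard `go/parser` and `go/ast` packages. This is only a rough sketch of the general idea and makes no claim about how i18n4go is actually implemented: `findTKeys` and the `sample` source are hypothetical, and the sketch only recognises bare (dot-imported style) `T("...")` calls whose first argument is a string literal.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// findTKeys parses Go source text and returns the string literal passed as
// the first argument of every call to a function named T. It sketches the
// general technique only; it is not i18n4go's implementation.
func findTKeys(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var keys []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		// A dot-imported T() appears in the AST as a bare identifier.
		fn, ok := call.Fun.(*ast.Ident)
		if !ok || fn.Name != "T" {
			return true
		}
		lit, ok := call.Args[0].(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true
		}
		// Unquote turns the source literal into the string value it denotes.
		if key, err := strconv.Unquote(lit.Value); err == nil {
			keys = append(keys, key)
		}
		return true
	})
	return keys, nil
}

// sample is made-up source text. T is never defined in it, which is fine
// because the source is only parsed, not type-checked.
const sample = `package demo

func run() {
	println(T("\n\nTIP:\n"))
	println(T("cli.errors.missing_argument"))
}
`

func main() {
	keys, err := findTKeys(sample)
	if err != nil {
		panic(err)
	}
	for _, k := range keys {
		fmt.Printf("%q\n", k)
	}
}
```

A scanner like this cannot tell an English string from a semantic key, since both are just string literals; that may be part of why a mixed codebase would confuse tooling that treats whatever it finds as both key and English value.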
If you were to add a new call to T(), you'd also get limited usefulness out of i18n4go as it would modify the en-us.all.json to add the missing entry (which would have the correct key, but wrong value), but you'd still need to update the value. The workflow is already not very good; it would just be worse in this scenario.

Using proper keys also tends to imply an order/hierarchy/convention to the keys, and that requires a bit of thought to properly model both the existing strings and new ones.

I haven't been on the CLI team for a number of months now, so some of the above may have changed.

Cheers,
KH
On Thu, Sep 15, 2016 at 5:18 PM John Feminella <jxf(a)pivotal.io> wrote: The tooling around updating translations all assumes a particular workflow, which would need to change: i18n4go pores through the source code and compares the values it finds to what exists in the English file and makes updates in the translation files as necessary, for example.
Thanks for the feedback, Kris. I'm not familiar with the broader translation workflow on CF. If you have some time offline I'd love to get your thoughts and understand what you see as the challenges.
On Thu, Sep 15, 2016, 19:16 Kris Hicks <khicks(a)pivotal.io> wrote:
Disclosure: I used to be a developer on the CLI team
I really like the idea of having identifiers rather than English, for the same reasons that John mentioned.
For strings where a colon existed, it would make sense to include the translation twice, one with and one without the colon, for the reason Dies brought up. I think that's fine; usually it's not that but some number of newlines in the CLI codebase, like in the "\n\nTIP\n" example.
I had considered taking on this work when I was on the CLI team, but it seemed like too big a change at the time. The tooling around updating translations all assumes a particular workflow, which would need to change: i18n4go pores through the source code and compares the values it finds to what exists in the English file and makes updates in the translation files as necessary, for example.
Cheers,
KH
On Thu, Sep 15, 2016 at 3:43 PM John Feminella <jxf(a)pivotal.io> wrote:
hi Dies,
Can I ask the background of looking into this? Are you looking into adding support for another locale and finding the number of messages to translate too big?
No, I just thought it was an area that might be worth improving if others agreed, and I've been involved in a number of i18n efforts on various products. I am not personally or specifically blocked in any way by this, though as I mentioned I think it is a beneficial suggestion.
That said, I think there are some areas that are more worth improving than others (assuming there is agreement to change them at all). For instance I think that embedding newlines in string keys, as in "\n\nTIP:\n", could be modified to be more amenable for translators.
French grammar rules dictate the requirement for a space before the colon. So “Hello:” becomes “Bonjour :”.
That could be the background for a number of such instances that look like duplications.
I agree, these kinds of cases are sometimes tricky.
In such cases, my understanding is that in the same way that some locales prefer "15 July" and others prefer "July 15", so too would you use a locale-specific modifier for the colon. So in the code one might use something like `T("greeting") + T("locale.separators.colon")` if you wanted to be maximally correct, where "locale.separators.colon" maps to " :" for fr-FR and ":" for en-US.
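A minimal Go sketch of that idea, under some loud assumptions: the `catalogs` map stands in for the per-locale JSON files, `T` here takes an explicit locale argument (the CLI's real `T()` does not), and `Label` plus the key names other than "locale.separators.colon" are made up for illustration.

```go
package main

import "fmt"

// catalogs is a hypothetical in-memory stand-in for the per-locale JSON files.
var catalogs = map[string]map[string]string{
	"en-US": {
		"greeting":                "Hello",
		"locale.separators.colon": ":",
	},
	"fr-FR": {
		"greeting":                "Bonjour",
		"locale.separators.colon": " :",
	},
}

// T looks up id in the given locale and falls back to the id itself
// when the locale or the id is unknown. (Assumed signature, not the CLI's.)
func T(locale, id string) string {
	if msg, ok := catalogs[locale][id]; ok {
		return msg
	}
	return id
}

// Label appends the locale's own colon separator to a translated message.
func Label(locale, id string) string {
	return T(locale, id) + T(locale, "locale.separators.colon")
}

func main() {
	fmt.Println(Label("en-US", "greeting")) // Hello:
	fmt.Println(Label("fr-FR", "greeting")) // Bonjour :
}
```

The call site stays the same for every locale; only the separator entry differs per language file.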
best, ~ jf
On Thu, Sep 15, 2016, 18:19 Koper, Dies <diesk(a)fast.au.fujitsu.com> wrote:
Hi John,
Thank you for your interest in the CLI’s internals.
Can I ask the background of looking into this? Are you looking into adding support for another locale and finding the number of messages to translate too big?
I’ve asked my team to answer.
There is a caveat with Benefit (3) and Disadvantage (2):
French grammar rules dictate the requirement for a space before the colon. So “Hello:” becomes “Bonjour :”.
That could be the background for a number of such instances that look like duplications.
Regards,
Dies Koper Cloud Foundry Product Manager - CLI
*From:* John Feminella [mailto:jxf(a)pivotal.io] *Sent:* Thursday, September 15, 2016 7:14 PM *To:* Discussions about Cloud Foundry projects and the system overall. *Subject:* [cf-dev] i18n: should cf-cli strings be exact duplicates of en-US translations?
hi,
I'm interested in gathering the community's thoughts about a proposal to improve the structure of the i18n translations.
## Issue
Currently the translation files for the CF CLI use the English text as both the identifier and the translation. For example, we have translations similar to:
# en-US.json
{
  "id": "A required argument for this command is missing",
  "translation": "a required argument for this command is missing"
}
My understanding is that the practice most i18n-enabled projects follow is that the id is exactly an identifier which conveys some semantic meaning, rather than literally duplicating one specific language's translation.
For example, a localization resources file might instead contain something like:
{
  "id": "cli.errors.missing_argument",
  "translation": "a required argument for this command is missing"
}
## Benefits
The benefits of using the semantic-identifier approach are:
(1) Reduce number of change sites
When the same message appears in many places, changing its wording currently means updating every call site that uses it as well as the resource files. With semantic identifiers only the resource files need to change.
(2) Improve message refactoring potential
Increases the opportunities for potential message refactorings. For example, a large number of messages include embedded newlines. One example:
{
  "id": "\n\nTIP:\n",
  "translation": "\n\nTIP:\n"
},
This seems bad, because one might want to use the translation for "TIP:" without needing to have the newlines embedded there.
(3) Reduce number of strings that need to be translated
Many translations that are virtually identical are duplicated throughout the localization files. For example:
{
  "id": "requested state",
  "translation": "requested state"
},
{
  "id": "requested state:",
  "translation": "requested state:"
},
Under the current approach, if a string differs even by a single character an entirely new translation is required, even if the semantic meaning is the same.
With the proposed approach these can instead be merged.
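As a rough illustration of how merge candidates might be found, here is a Go sketch that groups ids differing only in surrounding whitespace or a trailing colon. `normalize` and `MergeCandidates` are hypothetical helpers, not part of any existing tool, and the "TIP:" id in `main` is invented for the example. The groups are only candidates: a locale with its own punctuation rules may still need separate entries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalize strips surrounding whitespace (including newlines) and a
// single trailing colon, so near-identical ids collapse to one form.
func normalize(id string) string {
	s := strings.TrimSpace(id)
	s = strings.TrimSuffix(s, ":")
	return strings.TrimSpace(s)
}

// MergeCandidates groups ids by their normalized form and keeps only
// the groups that contain more than one id.
func MergeCandidates(ids []string) map[string][]string {
	groups := make(map[string][]string)
	for _, id := range ids {
		key := normalize(id)
		groups[key] = append(groups[key], id)
	}
	for key, members := range groups {
		if len(members) < 2 {
			delete(groups, key) // a single id has nothing to merge with
		} else {
			sort.Strings(members) // stable order within each group
		}
	}
	return groups
}

func main() {
	// "TIP:" is a made-up id, added so the second group has two members.
	ids := []string{"requested state", "requested state:", "\n\nTIP:\n", "TIP:"}
	for key, members := range MergeCandidates(ids) {
		fmt.Printf("%q <- %q\n", key, members)
	}
}
```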
(4) Clearer intent
With a semantic identifier the intent of a message is stated explicitly. Under the current approach the intent is not called out in the identifier, so when a message changes it is less clear what set of purposes it serves.
## Disadvantages
(1) Must look at two places to determine correct translation
The main disadvantage of this approach is that a translator must look at both the source and destination files to determine the correct translation. For instance, someone translating from en-US to fr-FR has to have both the en-US and fr-FR files open.
However, in general I would think one usually doesn't want a *literal* translation ("translate the string 'missing a required argument'"), but rather a *semantic* translation ("write an error message indicating that the command is missing a required argument"), so this would encourage better translations overall.
(2) Theoretically possible to have wrong translations
In a language where the translations for something like "hello" and "hello:" were different, it would not be correct to merge these and differentiate with `T("greetings.hello")` vs. `T("greetings.hello") + ":"`. I've looked at the existing translations and I don't believe that any such cases currently exist, although I could be mistaken.
## Summary
Overall I believe this adds a lot of tangible benefits and reduces cognitive overhead. I'm interested in the community's thoughts. If we agree the current approach is not ideal, I would be willing to make a PR to implement the proposal.
best,
~ jf
--
John Feminella Advisory Platform Architect ✉ · jxf(a)pivotal.io t · @jxxf
Thanks, that was super helpful. On a similar project, I overcame some of the difficulties you described with a small shell script and use of `jq`. Essentially one got around the i18n4go problem by adding a second file which was simply an old-English-key-to-new-semantic-key mapping. The script then parsed the mapping and made the appropriate substitutions in the language JSON.
In this scheme one would:
* add the key to be replaced to the mapping
* make the replacements in the source
* run the script to make the replacements in the locale JSON files
* run i18n4go
From the i18n4go perspective, nothing will have changed; it's as if the old key was never there and things were named after the new key all along. Repeat this until each key has been replaced.
Eventually, when all the replacements have been made, you delete the mapping and resume using i18n4go as normal. (The mapping doesn't need to be committed to source control.)
It sounds like Dies and the CLI team have some more feedback, so I'll revisit this when they've had a chance to comment.
On Thu, Sep 15, 2016, 20:58 Kris Hicks <khicks(a)pivotal.io> wrote:
The workflow is available here: https://github.com/cloudfoundry/cli/blob/master/CONTRIBUTING.md#i18n
i18n4go, goi18n, bin/generate-language-resources are the three things that need to be executed when updating any calls to the i18n.T() function (which tends to be dot-imported, so it's just T()).
i18n4go digs through the AST to find calls to T(). It assumes you want the strings that are in your call to T() as your translation key and (English) value, and modifies the en-us.all.json accordingly. It also has the ability to detect changes and removals.
goi18n is what fills out the *.translated.json and *.untranslated.json files with those keys that are present in en-us.all.json but missing in, for example, fr-fr.all.json. It puts empty values in the .translated files and the English value in the .untranslated files.
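The comparison described here amounts to a set difference between two files. A small Go sketch of just that comparison, not of goi18n's actual implementation; `Entry`, `Missing` and the sample data are assumptions for illustration:

```go
package main

import "fmt"

// Entry is a simplified id/translation pair from a locale file.
type Entry struct {
	ID          string
	Translation string
}

// Missing returns the English entries whose ids have no counterpart in
// target, preserving the order of the English file.
func Missing(english, target []Entry) []Entry {
	have := make(map[string]bool, len(target))
	for _, e := range target {
		have[e.ID] = true
	}
	var missing []Entry
	for _, e := range english {
		if !have[e.ID] {
			missing = append(missing, e)
		}
	}
	return missing
}

func main() {
	english := []Entry{
		{ID: "requested state", Translation: "requested state"},
		{ID: "requested state:", Translation: "requested state:"},
	}
	// Hypothetical French file that only has the first entry so far.
	french := []Entry{
		{ID: "requested state", Translation: "état demandé"},
	}
	for _, e := range Missing(english, french) {
		fmt.Printf("needs translation: %q\n", e.ID)
	}
}
```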
bin/generate-language-resources rebuilds the binary representations of those JSON files into i18n_resources.go.
To use keys instead of the English values in calls to T(), i18n4go would need to be changed to just do the add/removal of keys, but leave empty values in the en-us.all.json file for new entries (rather than adding the English value as well). The English value would be added manually to that file at that point.
That in itself isn't too complicated, except for the fact that i18n4go doesn't have much in the way of tests, so making modifications should be done with care.
The part that's not so great is that once you start switching to keys, you need to switch everything to a key at once; otherwise i18n4go will not be very useful at all. For example, if you switched one call to T() to be a key instead of English, i18n4go would want to remove the old key and value, and add the new key with a value that's the same as the key. You'd have to restore the English value to the new entry. If you were to add a new call to T(), you'd also get limited usefulness out of i18n4go, as it would modify the en-us.all.json to add the missing entry (which would have the correct key, but the wrong value), and you'd still need to update the value. The workflow is already not very good; it would just be worse in this scenario.
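To make the mixed-state problem concrete, here is a rough Go model of the assumption described above, namely that every string passed to T() is both the key and the English value. It is not i18n4go's code; `Entry`, `SyncEnglish` and the "app.requested_state" key are invented for the example.

```go
package main

import "fmt"

// Entry is a simplified id/translation pair from en-us.all.json.
type Entry struct {
	ID          string
	Translation string
}

// SyncEnglish models the assumption that each T() argument is both key
// and English value: entries that are no longer referenced are dropped,
// and unseen arguments are added with Translation equal to ID.
func SyncEnglish(existing []Entry, callArgs []string) []Entry {
	byID := make(map[string]Entry, len(existing))
	for _, e := range existing {
		byID[e.ID] = e
	}
	seen := make(map[string]bool, len(callArgs))
	out := make([]Entry, 0, len(callArgs))
	for _, arg := range callArgs {
		if seen[arg] {
			continue // the same string may be passed to T() in many places
		}
		seen[arg] = true
		if e, ok := byID[arg]; ok {
			out = append(out, e)
		} else {
			out = append(out, Entry{ID: arg, Translation: arg})
		}
	}
	return out
}

func main() {
	existing := []Entry{{ID: "requested state", Translation: "requested state"}}
	// One call site has been switched from English to a semantic key.
	updated := SyncEnglish(existing, []string{"app.requested_state"})
	for _, e := range updated {
		fmt.Printf("id=%q translation=%q\n", e.ID, e.Translation)
	}
}
```

In this model the old entry disappears and the new one carries the key as its "English" text, which is the value someone would then have to restore by hand.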
Using proper keys also tends to imply an order/hierarchy/convention to the keys, and that requires a bit of thought to properly model both the existing strings and new ones.
I haven't been on the CLI team for a number of months now, so some of the above may have changed.
Cheers,
KH