The User Guide and the Chinese language code

humdinger · November 6, 2024, 7:15pm

I recently opened this ticket: #19216 (Change userguide language code for Chinese) – Haiku

Please head over there to read it and the comment(s) and give feedback, if you’re savvy in that area. I think we can accommodate two Chinese user guide versions or so, but not every permutation of zh-Hans, zh-Hant and CN, HK, MO, SG, TW…

yjwork · November 7, 2024, 4:52am

VLC, nano…
You need to set an environment variable:
env LC_MESSAGES=zh_CN
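
For anyone scripting this, here is a minimal Python sketch of the same idea: copy the current environment, override LC_MESSAGES, and start the program with it. The helper names are made up for illustration, and whether a given port actually honors LC_MESSAGES is an assumption based on the post above.

```python
import os
import subprocess


def env_with_messages_locale(locale, base_env=None):
    """Return a copy of the environment with LC_MESSAGES set to `locale`."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_MESSAGES"] = locale
    return env


def run_localized(command, locale="zh_CN"):
    """Run `command` (a list such as ["nano"]) with LC_MESSAGES overridden.

    Roughly what `env LC_MESSAGES=zh_CN nano` does from the shell.
    """
    return subprocess.run(command, env=env_with_messages_locale(locale))
```

The original environment is left untouched, so only the launched program sees the override.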

yjwork · November 7, 2024, 5:27am

Debian Linux

yjwork · November 7, 2024, 5:42am

The main locale codes used in Linux:
zh_CN, zh_TW

The main locale codes used in Haiku:
zh_Hans, zh_Hant

michel · November 7, 2024, 10:12am

I think @PulkoMandy detailed the political issues quite well. Much as we try to avoid politics on this forum, these things have consequences. That wouldn’t matter in the case of Portuguese, since Portugal and Brazil generally accept each other’s existence now, but both mainland China and Taiwan claim to be the real China. Get it wrong and you may find Haiku banned by one or both of the countries involved in the next round of saber-rattling. The HANS/HANT distinction at least tries to remain neutral.

humdinger · November 7, 2024, 6:06pm

@yjwork , I don’t know what I’m seeing in your screenshots…

I’m aware of “Chinese simplified = zh_Hans” and “Chinese traditional = zh_Hant”.
What I was wondering is what the Stack Exchange answer linked in the ticket means WRT the user guide. It says:

There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc)

And another person says:

The difference between HANS and HANT is much less useful than CN and TW, as the difference is more than the characters, but region-specific usage. E.g. subroutine is translated as 子程序 in mainland China, but as 子程式 in Taiwan. In this example, the characters are the same in Simplified and Traditional Chinese, but the translation should still be different.

It’s a bit more confusing than pt_PT vs. pt_BR…

KENZ · November 7, 2024, 11:26pm

As a Japanese speaker, I may be more familiar with the languages used in neighboring regions than you are, though I personally can’t write or speak them.

Simplified Chinese characters, traditional Chinese characters and of course Japanese Kanji share the same roots. (I emphasize that “traditional” doesn’t mean the country’s or region’s culture is more traditional; the former simply changed its character glyphs for some reason and the latter didn’t.)
But those character sets haven’t been used interchangeably in either China or Taiwan for decades, AFAIK.
And the vocabularies newly introduced in those countries or regions have diverged a lot. So distinguishing between writing characters (glyphs/codepoints) and languages (meaning grammar and vocabulary) used in those areas is not useful for translation.

I think this is why people in those areas prefer zh_CN and zh_TW over zh_HANS and zh_HANT.

MichaelPeppers · November 8, 2024, 1:44am

I’ll put in my 2 cents as somebody who studied Japanese in college for years. While I’m not that good at it, I might help clarify what you may perceive as confusing statements, because frankly they probably are if you are not familiar with how “sinograms” (Chinese characters) work.

I’ll keep it short or else I might go off on a tangent: each country that adopted the original Chinese characters in some form throughout history has molded them with use.
In this scenario we’re talking about the 2 modern Chinese variants, simplified (with generally less complex characters) and traditional (which generally stuck closer to the classical Han-era characters). Each of them has been adopted by its respective government in an attempt to standardize writing, simple as.
('cause if you don’t do that, the writing situation can become extremely chaotic very fast, as people would be able to add their own flourishes to the characters etc.)

While I cannot claim to know Chinese, so take my opinion with a good amount of salt, I am not aware of any variant of sinograms for Chinese that is as widespread as either traditional or simplified. Every Chinese speaker should in theory be able to read one or the other. Any other possible set (for Chinese) should be some slight variation on those two at most, but probably it’s more like different names for the same thing(s).

Personally, if I know the Kanji equivalent, I can read (as in more or less understand the meaning of) and recognize its Chinese variant most of the time, be it traditional or simplified. So imo, even in the hypothetical edge case where a Chinese speaker is familiar only with some obscure variant, they should be able to navigate the interface well enough using one of the two standards, because they can all be traced back to the same root character system.

Or as yjwork put it:

These two alone should be enough to cover 99% of Chinese users’ expectations.

As for the confusing part, I’ll blame that subroutine example, as the characters used there are actually not the same, both in writing and meaning afaik.

Probably worth keeping in mind that, as KENZ noted, Chinese users should generally be more accustomed to zh_CN and zh_TW, though I get that zh_HANS and zh_HANT don’t refer to actual countries.

Ilovehotdog · November 8, 2024, 4:58am

Simplified Chinese characters are a shorter (or easier) form of traditional Chinese characters.

Any Chinese person can read both kinds of characters. (Most people have forgotten how to write traditional Chinese characters.) (And we have many accents for pronouncing these characters.)

(Most Chinese people learn simplified Chinese characters,
because you know enough of them by about 10 years old.
If you learn traditional Chinese characters, it takes until about 15 years old.
Much harder.)

All of them are the same once you get the root, because they share one and the same root.

roiredxsoto · November 8, 2024, 11:39am

Sorry to disagree. Most people from the PRC won’t understand Traditional Chinese. Even in the PRC there are lots of people who can neither read nor write a lot of Simplified Chinese characters.

And I agree that zh_CN and zh_TW are what everyone expects in the PRC and Taiwan, as well as any other place around the world, as the standard Chinese variants (without entering the debate about Cantonese, Hongkongese…)

zh_hans reminds me of Hans Zimmerman… if you know who that is

Regards,
RR

MichaelPeppers · November 8, 2024, 12:02pm

Forgot to add in my last post what’s probably the most confusing thing about these characters for people who don’t use them:

Sinograms are not sound-based symbols (like the Latin/Cyrillic alphabets), they are meaning-based: the character only strictly tells you what it means, and each language/dialect/variant speaker will read the character and slap their own native sound on it. That’s why any distinction other than simplified/traditional is basically irrelevant in written Chinese; the variants come into play after the character is “parsed” from writing into speech etc.

Example: 漢字 vs 汉字 (meaning “Han character(s)”)

漢字 is used in traditional Chinese, Japanese, (old?) Korean and some other variants in the area, while 汉字 is from simplified Chinese. Regardless of how you write it, in Japanese it’s pronounced “Kanji”, in standard Chinese “Hanzi” and in Korean “Hanja”. Pronunciation comes from interpreting the symbols.

yjwork · November 8, 2024, 12:37pm

In Linux systems, it is mostly zh_CN and zh_TW.
You can see from the screenshot above that there are only 2 files in the directory of zh_hans.

Most of the translation files of Haiku’s own programs are located in zh-Hans. VLC and nano, however, use zh_CN, so environment variables need to be set for them to appear in Chinese.

Most Chinese can understand traditional Chinese, but the terms in Taiwan may be different.
In our opinion, zh_TW is the same as zh_HK. We believe that zh_TW is Taiwan, China.

So what needs to be considered is not political issues, but how to maintain consistency.
If we keep zh_hans and zh_hant, then when other software is compiled and packaged, we need to change zh_CN to zh_hans (and zh_TW to zh_hant).
My English is not good, I hope you can understand.

Ilovehotdog · November 8, 2024, 3:01pm

Very wrong.

Every Chinese person has a smartphone.

Nobody can use a smartphone without reading skills.

humdinger · November 9, 2024, 1:24pm

So, everyone agrees:

Chinese simplified	= zh_CN	= zh_Hans
Chinese traditional	= zh_TW	= zh_Hant

To illustrate, locale --all | grep zh returns:

zh.UTF-8
zh_Hans.UTF-8
zh_Hans_CN.UTF-8
zh_Hans_HK.UTF-8
zh_Hans_MO.UTF-8
zh_Hans_SG.UTF-8
zh_Hant.UTF-8
zh_Hant_HK.UTF-8
zh_Hant_MO.UTF-8
zh_Hant_TW.UTF-8
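
Names like these follow a language_Script_REGION.encoding pattern. As a rough sketch of how they break down, here is a small Python parser. It is not how ICU or Haiku parse locale names, and the length-based rule for telling scripts from regions is an assumption that happens to fit the list above.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    encoding: Optional[str]


def parse_locale(name):
    """Split e.g. 'zh_Hans_CN.UTF-8' into language, script, region, encoding."""
    base, _, encoding = name.partition(".")
    fields = base.split("_")
    language = fields[0]
    script = region = None
    for field in fields[1:]:
        # Assumption: four-letter fields are scripts (Hans, Hant),
        # two-letter fields are regions (CN, TW, HK, ...).
        if len(field) == 4 and field.isalpha():
            script = field
        elif len(field) == 2 and field.isalpha():
            region = field
    return LocaleParts(language, script, region, encoding or None)
```

With this, parse_locale("zh_Hans_CN.UTF-8") yields language "zh", script "Hans", region "CN", while a Linux-style "zh_CN" has a region but no script, which is exactly the mismatch being discussed.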

I point out that this is simply an organizational question. The end-user sees a label “Chinese simplified” or “Chinese traditional” (or rather the Chinese symbols for that) and picks one or the other. I assume…

IMO, if someone ports an app, it’s part of the porting effort to take care of using the needed environment variables etc.

I saw that the userguide uses zh_CN as its code (which would be “Chinese simplified”) and wondered: if someone asked to translate to “Chinese traditional”, would we add the code zh_TW? And why does the userguide code differ from the GUI translation and ICU codes?
As said, it feels more consistent to go with identical codes, and since we cannot realistically change ICU, shouldn’t we go with zh_Hans/Hant for the userguide as well?

MichaelPeppers · November 9, 2024, 1:44pm

IMO, if someone ports an app, it’s part of the porting effort to take care of using the needed environment variables etc.

In theory I would agree with that. In practice I would be surprised if most porters were even aware that this is needed for a specific set of locales, so the issue would inevitably pop up again and again for a limited subset of users, which would also make it hard to notice. That’s not sustainable imo.

Personally I would suggest implementing a system that links zh_CN to zh_Hans and zh_TW to zh_Hant in some way, then Haiku gets to use the preferred internal name, people get consistent locales and everybody wins.

I cannot speak for Chinese users but if the user won’t notice the difference I guess that would be alright?
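
A minimal sketch of what such a link could look like, written in Python purely for illustration. The table and function name are hypothetical and not an existing Haiku or ICU API. Only the two pairs agreed on in this thread are listed, since codes like zh_HK would need a decision of their own.

```python
# Hypothetical alias table: region-style codes used by many ports on the
# left, script-style codes used by Haiku on the right.
LOCALE_ALIASES = {
    "zh_CN": "zh_Hans",
    "zh_TW": "zh_Hant",
}


def canonical_locale(code):
    """Map a region-style code to its script-style alias.

    Any encoding suffix (".UTF-8") is kept, hyphens are treated like
    underscores, and unknown codes are returned unchanged.
    """
    base, dot, encoding = code.partition(".")
    base = base.replace("-", "_")
    return LOCALE_ALIASES.get(base, base) + dot + encoding
```

Here canonical_locale("zh_CN") gives "zh_Hans" and canonical_locale("pt_BR") stays "pt_BR". The reverse direction (handing zh_CN to a ported app when the system locale is zh_Hans) would need the inverse table.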

yjwork · December 7, 2024, 2:36pm

Hope to achieve it soon, thank you

humdinger · December 8, 2024, 7:12am

Very good!

Once someone starts on a Traditional Chinese userguide translation, I propose to do that under “zh_Hant” and see if we can rename the existing Simplified Chinese translation from the current “zh_CN” to “zh_Hans”.

apl · December 8, 2024, 7:40am

The record for “Chinese (simplified)” language in HaikuDepotServer is zh_Hans.

humdinger · December 8, 2024, 7:48am

Yeah, it seems as if it’s just the userguide that doesn’t fit in with the rest…

nephele · December 8, 2024, 8:05am

The userguide language selector should be reworked anyway, a combobox that doesn’t look like a combobox is a bit icky :3