The GNU C Library

Translation started by A-Chi (ㄚ琪) on 2010/03/11.

Message Translation

The Uniforum approach to Message Translation

The gettext family of functions

How to determine which catalog to be used

The functions that retrieve the translation for a given message have a remarkably simple interface. However, the user of the program must still be able to select exactly the translation he or she wants, and the programmer must be able to influence where the message catalog files are searched for. A fairly complicated underlying mechanism controls all of this. The code is complicated; the use is easy.

Basically, there are two different tasks to perform, both of which can also be performed by the catgets functions:

  1. Locate the set of message catalogs. There are a number of files for different languages, all of which belong to the package. Usually they are all stored in the filesystem below a certain directory.

    There can be arbitrarily many packages installed, and they can follow different guidelines for the placement of their files.

  2. Relative to the location specified by the package, the actual translation files must be searched for based on the wishes of the user. That is, for each language the user selects, the program should be able to locate the appropriate file.

This is the functionality required by the specifications for gettext, and it is also what the catgets functions are able to do. But some problems remain unresolved:

  • The language to be used can be specified in several different ways. There is no generally accepted standard for this, and the user always expects the program to understand what he or she means. E.g., to select the German translation one could write de, german, or deutsch, and the program should always react the same way.
  • Sometimes the specification of the user is too detailed. If the user specifies, e.g., de_DE.ISO-8859-1, which means German, spoken in Germany, coded using the ISO 8859-1 character set, a message catalog matching this exactly may not be available. But there could be a catalog matching de, and if the character set used on the machine is always ISO 8859-1 there is no reason why this latter message catalog should not be used. (We call this message inheritance.)
  • If a catalog for a wanted language is not available, falling back on the language of the developer and simply not translating any message is not always the second-best choice. A user might be better able to read the messages in another language, so the user of the program should be able to define a precedence order of languages.

We can divide the configuration actions into two parts: one is performed by the programmer, the other by the user. We will start with the functions the programmer can use, since the user configuration is based on them.

As the functions described in the previous sections already mention, separate sets of messages can be selected by a domain name. This is a simple string which should be unique for each program part that uses a separate domain. One program can use arbitrarily many domains at the same time. E.g., the GNU C Library itself uses a domain named libc, while the program using the C Library could use a domain named foo. The important point is that at any time exactly one domain is active. This is controlled with the following function.

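— Function: char * textdomain (const char *domainname)

The textdomain function sets the default domain, which is used in all future gettext calls, to domainname. Please note that dgettext and dcgettext calls are not influenced if the domainname parameter of these functions is not the null pointer.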

Before the first call to textdomain the default domain is messages. This is the name given in the specification of the gettext API. It is as good as any other name, but no program should ever really use a domain with this name, since this can only lead to problems.

The function returns the value which is from now on taken as the default domain. If the system runs out of memory, the returned value is NULL and the global variable errno is set to ENOMEM. Although the return type is char *, the returned string must not be changed. It is allocated internally by the textdomain function.

If the domainname parameter is the null pointer, no new default domain is set. Instead the currently selected default domain is returned.

If the domainname parameter is the empty string, the default domain is reset to its initial value, the domain with the name messages. Using this possibility is questionable, since the domain messages really never should be used.

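— Function: char * bindtextdomain (const char *domainname, const char *dirname)

The bindtextdomain function can be used to specify the directory which contains the message catalogs for domain domainname for the different languages. To be precise, this is the directory where the hierarchy of directories is expected. Details are explained below.

For the programmer it is important to note that the translations which come with the program have to be placed in a directory hierarchy starting at, say, /foo/bar. The program should then make a bindtextdomain call to bind the domain for the current program to this directory. This ensures that the catalogs are found. A correctly running program does not depend on the user setting an environment variable.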

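The bindtextdomain function can be used several times. If the domainname argument is different, the previously bound domains are not overwritten.

If the program that uses bindtextdomain also calls the chdir function at some point to change the current working directory, it is important that the dirname string be an absolute pathname. Otherwise the addressed directory might vary over time.

If the dirname parameter is the null pointer, bindtextdomain returns the currently selected directory for the domain with the name domainname.

The bindtextdomain function returns a pointer to a string containing the name of the selected directory. The string is allocated internally in the function and must not be changed by the user. If the system runs out of memory during the execution of bindtextdomain, the return value is NULL and the global variable errno is set accordingly.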

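How to specify the output character set gettext uses

gettext not only looks up a translation in a message catalog. It also converts the translation on the fly to the desired output character set. This is useful if the user is working in a different character set than the translator who created the message catalog, because it avoids distributing variants of message catalogs which differ only in the character set.

The output character set is, by default, the value of nl_langinfo (CODESET), which depends on the LC_CTYPE part of the current locale. But programs which store strings in a locale-independent way (e.g. UTF-8) can request that gettext and related functions return the translations in that encoding, by using the bind_textdomain_codeset function.

Note that the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged, regardless of the current output character set. It is therefore recommended that all msgids be US-ASCII strings.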

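— Function: char * bind_textdomain_codeset (const char *domainname, const char *codeset)

The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer.

If the codeset parameter is the null pointer, bind_textdomain_codeset returns the currently selected codeset for the domain with the name domainname. It returns NULL if no codeset has yet been selected.

The bind_textdomain_codeset function can be used several times. If it is used multiple times with the same domainname argument, the later call overrides the settings made by the earlier one.

The bind_textdomain_codeset function returns a pointer to a string containing the name of the selected codeset. The string is allocated internally in the function and must not be changed by the user. If the system runs out of memory during the execution of bind_textdomain_codeset, the return value is NULL and the global variable errno is set accordingly.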