Design and Implementation of Multi Language Base Map Information System for Digital Xinjiang

Date	26.06.2016

A. Kurban, A. Ablimit, X. Chen

Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences

40-3 South Beijing Road, Urumqi, Xinjiang 830011, China

alishir@ms.xjb.ac.cn

Abstract:

Internationalization is one of the major factors currently influencing the application of geographic information systems (GIS) throughout the world. Localization of GIS provides practical working conditions for end users in a specific country or region. Xinjiang Uyghur Autonomous Region (XUAR) is the largest province of China by area, constituting 1/6 of the country, and borders Mongolia, Russia, Kazakhstan, Kyrgyzstan, Tajikistan, Afghanistan, Pakistan and India. Thirteen minorities, such as the Uyghur, Kazak and Uzbek, live in the region, and many of their languages use their own writing scripts, which originated from and were modified from Arabic characters. Today, most commercial GIS software supports the Unicode encoding standard, making it possible to build a multilanguage attribute database within a GIS.

This paper introduces the results of preliminary research in developing a Multilanguage Base Map Information System (MBMIS) for the Digital Xinjiang (DXJ) Project. The multilanguage labeling and annotation of maps for graphic display and printing have been tested with Uyghur, Chinese, Russian and English for describing regional information in the local language and sharing geographic information with the world. The research presented describes the desktop user interface, website client interface, attribute database, and the place name database. In addition, to support the correct pronunciation of a place name in the local language, voice recordings in Uyghur and Chinese linked to the database are included in the multimedia database for good oral communication. The system is still undergoing testing to make it more flexible and stable.

Introduction

Multilingualism is a widespread phenomenon: there are more than 6,800 languages across the globe. As we move into a more global economy, interactions between cultures occur daily. Businesses, governments, and nongovernmental organizations bring goods and services to every facet of society in every part of the earth. For consumers around the world, the Internet has become the most convenient way to obtain geographical information about a place of interest, a result of the development of geographical information systems based on Internet technology. Consequently, the demand for geographical information databases with multi-language support is increasing steadily. Web services have to distribute, portray and process time-variant spatial data for multilingual stakeholders, visitors, investors and even decision makers from different language communities.

As an attractive destination for travel, trade and investment, Xinjiang should be able to provide complete and detailed information to consumers as quickly as possible. To build a good public geographical information service platform for both local and global visitors and investors, preliminary research on a Multilanguage Base Map Information System (MBMIS) for the Digital Xinjiang (DXJ) Project was initiated.

Language support is usually limited only by the underlying technology, such as operating systems and relational database systems. Customization options enable GIS professionals to generate intuitive and easy-to-use interfaces. Windows XP was selected as the operating system and ArcGIS as the geographic information system software for this project. Both support most of the major world languages through the Unicode character set, offer extensive functionality, and allow full customization of a wide variety of applications. As a pilot project, Uyghur, Chinese, English and Russian versions of the user interface and database are planned for testing. This paper presents some results of this research work.

Method and Base Map Database

ArcGIS Desktop supports 136 locales and the following 17 language groups on Windows XP: Arabic, Greek, Simplified Chinese, Traditional Chinese, Armenian, Hebrew, Baltic, Indic, Turkish, Central Europe, Japanese, Vietnamese, Cyrillic, Korean, Western Europe, Georgian, Thai, United States English. All of these languages are supported through the Unicode standard. The Uyghur writing script originated from and was modified from Arabic characters, while the Uyghur language is a member of the Turkic language group. Uyghur characters are supported in Windows XP, but not by default, so a special input method and Uyghur Unicode fonts were needed. (Windows Vista includes a Uyghur input method and a Uyghur font called Microsoft Uighur, so it does not need another input method, but we installed other Uyghur Unicode fonts in our research.)

Unicode is a group of character encoding formats that support most of the world's major languages. Several Unicode formats are available today, including UTF-8, UTF-16 and UTF-32. It is becoming a popular encoding format as more data contains characters of multiple languages.[3] It provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.[3] These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.[1] Of course, we could not imagine building a multilingual geographical database on such encodings without any transformation.
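The conflict between legacy encodings described above can be illustrated with a minimal Python sketch. The two code pages (cp1252 and cp1251) and the sample characters are illustrative assumptions chosen for the demonstration; they are not part of the MBMIS implementation.

```python
# Illustrative sketch only: the code pages and sample characters are
# assumptions chosen for demonstration, not taken from the MBMIS system.

raw = bytes([0xE0])  # a single byte, i.e. the same number in every code page

# Two legacy code pages assign different characters to the same number.
as_western = raw.decode("cp1252")   # Western European code page
as_cyrillic = raw.decode("cp1251")  # Cyrillic code page
print(as_western, as_cyrillic)

# Unicode gives every character its own code point, whatever the platform.
for ch in (as_western, as_cyrillic, "ئ", "乌"):
    print(ch, f"U+{ord(ch):04X}")

# UTF-8, UTF-16 and UTF-32 are different byte layouts of the same code point.
yeh_hamza = "\u0626"  # first letter of the Uyghur spelling of Urumchi
sizes = {enc: len(yeh_hamza.encode(enc))
         for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)
```

The byte 0xE0 reads as a Latin letter under one code page and as a Cyrillic letter under the other, which is the kind of ambiguity that a single Unicode code point per character removes.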

A brief analysis of the usage of Unicode shows that it is needed most in multilingual environments that require either the use of several languages in the same document or the frequent exchange of data in different native languages. If the GIS is not ready to properly display and handle those other native languages, it will leave its users helpless.[3]

Perhaps the GIS domain is not in as desperate need of Unicode as the WWW is. However, once Unicode becomes the prevailing standard on the Internet, many data processors will have to comply with this seemingly unnecessary standard. In the opinion of the authors, adding such a standard to a software package such as ArcGIS could be cumbersome if it is not considered and prepared for at an early enough stage.[3]

ArcGIS Desktop applications, such as ArcMap, are Unicode based, so they support Unicode to a certain level. The level of Unicode support depends on the data format. [3]

Currently, the personal geodatabase is the only data format that supports Unicode by default. It is even possible to store and display characters of multiple languages in a single personal geodatabase, which makes GIS applications with Uyghur characters possible, although some problems remain with automatic selection of the proper font. If characters are not displayed correctly, it is necessary to verify that a Unicode font, such as Tahoma or Microsoft Uighur, is selected.

This project involves multilingual information in Uyghur, Chinese, English and Russian. In Unicode, these characters are distributed across the Arabic, Simplified Chinese, Latin and Cyrillic character sets respectively. In addition, Uyghur letters, unlike the other three, are written and read from right to left. As a result, it becomes more complicated to use Uyghur together with the other three character sets and Arabic numerals.
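The distribution of the four languages across scripts, and the right-to-left direction of Uyghur, can be checked programmatically. The following is a minimal sketch using Python's standard `unicodedata` module. The helper names `script_of` and `is_right_to_left` are hypothetical, the script guess based on character names is a rough heuristic rather than a standard API, and the Russian spelling used as sample data is an assumption.

```python
import unicodedata

def script_of(ch):
    """Rough script guess from the Unicode character name (heuristic only)."""
    name = unicodedata.name(ch, "")
    for prefix, script in (("ARABIC", "Arabic"), ("CJK", "CJK"),
                           ("CYRILLIC", "Cyrillic"), ("LATIN", "Latin")):
        if name.startswith(prefix):
            return script
    return "Other"

def is_right_to_left(text):
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return True
        if bidi == "L":           # left-to-right letter
            return False
    return False

# Sample names for one place; the Russian spelling is an assumption.
names = {
    "Uyghur": "ئۈرۈمچى",
    "Chinese": "乌鲁木齐",
    "English": "Urumqi",
    "Russian": "Урумчи",
}
for language, name in names.items():
    direction = "RTL" if is_right_to_left(name) else "LTR"
    print(language, script_of(name[0]), direction)
```

Only the Uyghur name is reported as right-to-left, which is why mixing it with the other three scripts and with numerals in one label needs extra care.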

“Oyghan Uyghur Unicode IME 3.0”, the Uyghur input method recommended by the Uyghur Computer Science Association on its website (http://www.ukij.org/oyghan/), was adopted in this system as the Uyghur input method for both Windows XP Professional and ArcGIS 9.0.

The Uyghur fonts used for the multilanguage desktop user interface were downloaded from http://www.ukij.org/oyghan/: UKIJ Basma, UKIJ Tuz, UKIJ Tuz Tom, UKIJ Esliye and Uyghur Tuz Unicode…, together with Microsoft Uighur in Windows Vista™.

Because the ArcGIS geodatabase supports Unicode characters, all data stored in the geodatabase data model can be multilingual. The 1:1 million scale base map data of the Xinjiang Uyghur Autonomous Region were imported into the geodatabase, and multilingual information for the place names was added.

The Multilanguage Place Name Database

One main factor in the usability of base map information is the diversity of languages, particularly in Xinjiang. Maps for mainstream use need to be fluently multilingual. Place names and map legends ought to be presented in multiple languages, though not necessarily all at once. Uyghur and Chinese were selected as the dominant languages, and English and Russian as the international languages, since Russian is more commonly spoken in Central Asia than English.

Problems with place names include:

a lack of place name information with geographical location on the Internet for this rural area;
misspelling and mispronunciation, which hinder communication of place names;
variations in place names written in English or Latin characters and in Chinese Pinyin, caused by Chinese pronunciation.

To solve these problems, after the original base map database was imported into the system geodatabase, fields for the other three languages were added and the place names were translated and checked for preliminary testing. One extra field was added to the geodatabase for Uyghur Latin Character (ULY) names. Table 1 shows the structure of the geodatabase and Table 2 shows the attributes. The system is intended to let users around the world search the Internet by place name in more ways than a single-language system allows. In addition, to support the correct pronunciation of a place name in the local language, voice recordings in Uyghur and Chinese linked to the database are included in the multimedia database, not only for good oral communication but also to present the exact place name to all non-native speakers.

Table 1 The structure of the place name geodatabase

Field	Data Type	Description
ID	Integer	ID
Geo_ID	Integer	Geo_ID
Name_Cn	Unicode String	Name in Chinese
Name_Py	Unicode String	Name in Chinese Pinyin
Name_Ui	Unicode String	Name in Uyghur (Arabic)
Name_Ul	Unicode String	Name in Uyghur (Latin)
Name_En	Unicode String	Name in English
Name_Ru	Unicode String	Name in Russian
Pron_Ui	Hyperlink	Pronunciation in Uyghur
Pron_Ch	Hyperlink	Pronunciation in Chinese
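The structure in Table 1 can be sketched as a small relational table to show how a single record answers queries in any of the stored languages. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for the personal geodatabase; it is not the authors' implementation. The table name `place_name`, the helper `find_place`, the ID and Geo_ID values, and the Russian spelling are assumptions made for illustration. The remaining values come from the Urumqi row of Table 3.

```python
import sqlite3

# In-memory SQLite used as a stand-in for the geodatabase (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE place_name (
        ID      INTEGER PRIMARY KEY,
        Geo_ID  INTEGER,
        Name_Cn TEXT,   -- name in Chinese
        Name_Py TEXT,   -- name in Chinese Pinyin
        Name_Ui TEXT,   -- name in Uyghur (Arabic script)
        Name_Ul TEXT,   -- name in Uyghur (Latin script)
        Name_En TEXT,   -- name in English
        Name_Ru TEXT,   -- name in Russian
        Pron_Ui TEXT,   -- link to Uyghur pronunciation recording
        Pron_Ch TEXT    -- link to Chinese pronunciation recording
    )
""")

# ID, Geo_ID and the Russian spelling are hypothetical sample values.
conn.execute(
    "INSERT INTO place_name VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 1, "乌鲁木齐", "Wu lu mu qi", "ئۈرۈمچى", "Urumchi", "Urumqi",
     "Урумчи", "650111.mp3", "650111_C.mp3"),
)

NAME_FIELDS = ("Name_Cn", "Name_Py", "Name_Ui", "Name_Ul", "Name_En", "Name_Ru")

def find_place(connection, text):
    """Return matching records when text equals any of the name fields."""
    sql = ("SELECT ID, Name_En, Pron_Ui FROM place_name "
           f"WHERE ? IN ({', '.join(NAME_FIELDS)})")
    return connection.execute(sql, (text,)).fetchall()

print(find_place(conn, "ئۈرۈمچى"))
print(find_place(conn, "乌鲁木齐"))
```

Because every name field holds Unicode text, the same query works whether the search string is typed in Uyghur, Chinese, Latin or Cyrillic characters.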

Table 2 Attributes of Placename geodatabase

Most place names in Xinjiang have two different pronunciations. For example, Urumqi (Urumchi) is spelled in Chinese Pinyin as Wu lu mu qi. In this example the pronunciations are similar, so it is not too difficult for local people or visitors to recognize the name, but it is a big problem when searching on the Internet. We are trying to develop a method for correcting this sort of problem in the future. (More examples are shown in Table 3.)

Table 3 Examples of pronunciation differences

Name_Cn	Name_Py	Name_Ui	Name_Ul	Name_En	Pron_Ui	Pron_Ch
Name in Chinese	Name in Chinese Pinyin	Name in Uyghur	ULY (Uyghur Latin Character)	Name in English	Pronunciation in Uyghur	Pronunciation in Chinese
乌鲁木齐	Wu lu mu qi	ئۈرۈمچى	Urumchi	Urumqi	650111.mp3	650111_C.mp3
喀什	Kashi		Qeshqer	Kashighar	653101.mp3	653101_C.mp3
沙依巴格	Shayibage		Saybagh	Saybagh	650102.mp3	650102_C.mp3
莎车	Shache		Yerken	Yarkan	653122.mp3	653122_C.mp3
若羌	Ruoqiang		Charqiliq	Qarkilik	652827.mp3	652827_C.mp3
阿勒泰	Aletai		Altay	Altai	654301.mp3	654301_C.mp3
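One simple way to make such spelling variants searchable is an alias index that maps every known variant to one record key. The sketch below is only an illustration of that idea and is not the method the authors plan to develop. The function names `normalize` and `lookup` are hypothetical, and the record keys are the codes from the audio file names in Table 3, used here as assumed identifiers.

```python
import unicodedata

def normalize(name):
    """Fold case, strip accents, and drop spaces, hyphens and apostrophes."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch) and ch not in " -'")

# Variant spellings copied from Table 3; the keys are assumed record codes.
VARIANTS = {
    "650111": ["乌鲁木齐", "Wu lu mu qi", "Urumchi", "Urumqi"],
    "653101": ["喀什", "Kashi", "Qeshqer", "Kashighar"],
    "654301": ["阿勒泰", "Aletai", "Altay", "Altai"],
}

# Map every normalized variant to its record code.
ALIAS_INDEX = {normalize(variant): code
               for code, variants in VARIANTS.items()
               for variant in variants}

def lookup(query):
    """Return the record code for a spelling variant, or None if unknown."""
    return ALIAS_INDEX.get(normalize(query))

print(lookup("wulumuqi"), lookup("URUMCHI"), lookup("Ürümqi"), lookup("Altai"))
```

An index of this kind only finds variants that were entered in advance; a spelling absent from the table returns nothing, so it complements rather than replaces a fuller solution.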

The Multilingual Graphic User Interface and Output

The multilingual GUI of ArcGIS Desktop and the web client of ArcIMS are undergoing testing to make them more stable and flexible. Tool tips and menu tips in Uyghur have not yet been tested. Data frame and layer names were successfully displayed in all four scripts in the same view; legend automation, automatic labeling and annotation also succeeded in ArcGIS Desktop, but some problems occurred when the same tests were run in ArcIMS, where layer names in non-Roman scripts were displayed as question marks. Since the geodatabase supports Unicode, it is possible to implement labeling in four languages through the Maplex extension of ArcGIS for cartographic output in digital or printed form. Testing showed that Uyghur annotation in ArcGIS Desktop does not require Unicode for display and printing, but it is better to use Unicode for multiplatform use. Figures 1 through 5 show the results.

Figure 1 Layer names and labels displayed in the same view at the same time in Uyghur, Chinese, Russian and English, in separate layers and combined in a single layer, in ArcGIS Desktop.

Figure 2 Labels and annotations are displayed correctly in the IE browser viewer, but layer names and legends written in non-Roman scripts are not displayed correctly in the same view with ArcIMS.

Figure 3 Legends are displayed correctly and automatically at the same time in ArcGIS Desktop.

Figure 4 Multilingual querying by attribute data is successful in both ArcGIS Desktop and ArcIMS.

Figure 5 Uyghur character fields can be used as a common field for creating relations and links between tables.
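Question marks of the kind seen for non-Roman layer names in ArcIMS are a typical symptom of text passing through a code page that lacks the required characters. The sketch below reproduces that symptom in Python. It is an assumption about a common mechanism and does not claim to identify the actual cause inside ArcIMS.

```python
# Illustrative only: shows one common way question marks arise. Whether
# ArcIMS fails in this way is an assumption, not a confirmed diagnosis.

layer_name = "ئۈرۈمچى"  # Uyghur layer name stored as Unicode text

# Forcing the text through a single-byte Western code page replaces every
# character the code page cannot represent with "?".
through_code_page = layer_name.encode("latin-1", errors="replace").decode("latin-1")
print(through_code_page)

# A Unicode encoding carries the same text through unchanged.
through_utf8 = layer_name.encode("utf-8").decode("utf-8")
print(through_utf8 == layer_name)
```

If every stage between the geodatabase and the browser keeps the text in a Unicode encoding, this kind of loss does not occur.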

Problems

ArcGIS still has problems displaying Uyghur characters in automatic labeling, annotation and legends. When the spline text function was used, the Uyghur text became unreadable because the letters of each word were drawn separately instead of joined. When the annotation and labeling functions were used, the Uyghur characters and the spaces between words were stretched, whereas the other three character sets did not have this problem, as shown in Figure 6.

Figure 6 a) Unreadable annotation produced by the spline text function; b) Uyghur characters rendered longer than expected in annotation and labeling.
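Letters in Arabic-derived scripts change shape depending on whether they join the letter before or after them, so a function that places each letter on its own may show every letter in its isolated form. The following minimal sketch uses Python's `unicodedata` module to list the four positional shapes of one letter, MEEM, which occurs in the Uyghur spelling of Urumchi. Treating this as the explanation for the spline text result is an assumption; the source does not state the cause.

```python
import unicodedata

MEEM = "\u0645"  # the letter as stored in a Unicode attribute field

# Unicode presentation forms: the shapes a renderer chooses by position.
FORMS = {
    "isolated": "\uFEE1",
    "final": "\uFEE2",
    "initial": "\uFEE3",
    "medial": "\uFEE4",
}

for position, glyph in FORMS.items():
    print(position, unicodedata.name(glyph), unicodedata.decomposition(glyph))

# All four shapes normalize back to the single stored letter.
all_same_letter = all(unicodedata.normalize("NFKC", glyph) == MEEM
                      for glyph in FORMS.values())
print(all_same_letter)
```

The database stores only the base letter; choosing the correct shape is the job of the text renderer, which is where the display problems described above would arise.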

Conclusion

Although problems occurred in the correct display and printing of the Arabic-derived Uyghur characters, it is obvious that Unicode is rapidly approaching maturity. Despite some disadvantages, such as data that can be double the size of the same data stored in a conventional code page, Unicode promises to be the best method as data storage becomes less of an issue. The correct display of Uyghur characters is still being examined.
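The size overhead mentioned above can be made concrete with a short Python sketch that compares a single-byte code page with two Unicode encodings. The sample strings, the chosen code pages and the helper name `byte_sizes` are assumptions for illustration.

```python
def byte_sizes(text, code_page):
    """Bytes needed for text in a legacy code page, UTF-16 and UTF-8."""
    return {
        code_page: len(text.encode(code_page)),
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }

# Sample place names; the Russian spelling is an assumption.
english = byte_sizes("Urumqi", "cp1252")
russian = byte_sizes("Урумчи", "cp1251")
print(english)
print(russian)
```

For these six-letter names UTF-16 takes twice the bytes of the single-byte code page, which matches the doubling described in the text, while UTF-8 stays at the code page size for the Latin name and doubles only for the Cyrillic one.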

References

www.ukij.org
www.oyghan.com
www.Unicode.org
ESRI, FAQ on the www.esri.com
ESRI, ArcGIS Online Help
www.vitualindia.com
Xiaoyan Ji, Discussions on the Multi-source Data Processing Technology in the Construction of GIS Databases, GSDI-9 Conference Proceedings, 6-10 November 2006, Santiago, Chile
Shinji Masumoto, Multi-Language Support and Localization of GRASS GIS
Christopher Deckert, Modeling Language Diffusion With ArcGIS, prepared for WORLDMAP.ORG, June 2004
Proceedings of the Workshop, Asian Language Resources and International Standardization, Center of Academia Activities, Academia Sinica, Taipei, Taiwan, 31 August 2002
Ljuba Veselionova and Jason Booza, Using GIS Map the Multilingual City
ISO-19115
