The Challenges of Language

SOFTWARE DEVELOPMENT 31-01-2022
As software developers targeting the global market, we have to overcome a range of challenges to communicate effectively with users.
This article explores the intricate ecosystem of languages in software development, shaped by different backgrounds, countries, and companies.

 

João Paulo Varandas, Graphical User Interface Team Leader @SISCOG  |  13 min read 
___________

 

There are multiple challenges when you want to effectively communicate with your users. You have to deal with different backgrounds, different countries, different companies, different languages and even with different alphabets.

Users want and expect a totally straightforward experience in which they understand what is happening and what they have to do, that is, they want to interact with an application that speaks the language they are familiar with. Yet, there are many nuances, cultural and linguistic, for each language or dialect.

SISCOG creates standard software products for decision support and optimisation in the field of resource planning and management. Furthermore, SISCOG products are designed and built to be used by different types of transportation companies, from rail, light rail, metro, to even bus or aviation companies; or any other companies that plan and manage the work of their resources in shifts. That is, a unique framework supports different contexts to do the same job: plan and manage company operations.

 

THE BEGINNING @ SISCOG

From its inception, SISCOG decided to develop its products in English with the aim of being present in the global market. Naturally, the first clients were Portuguese. Therefore, a mechanism was needed to translate the applications’ text from English, the language in which they were developed, into Portuguese and other languages.

At that time there was no Google Translate or any other online translation engine. So the adopted solution was a translation mechanism based on dictionary keys: sentences typically written in UK English and understandable by a human being. Each dictionary key maps to the sentence translated into each of the languages used by users. Our products ship with default translations of each dictionary key for UK English and Portuguese (written by SISCOG staff). The keys are placed in the code and are resolved at run time to the translated sentence (in the chosen language or dialect) displayed in every interface used by our end-users. Users can change the target language at any time while the application is running.
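
As a rough illustration of the key-based mechanism described above, here is a minimal Python sketch. It is not SISCOG's Lisp implementation: the `Translator` class, the language codes and the dictionary entries are all assumptions made for the example.

```python
# Hypothetical dictionary: each key is a readable UK English sentence,
# mapped to its translation per language code (illustrative entries only).
DICTIONARY = {
    "Do you want to continue?": {
        "en-GB": "Do you want to continue?",
        "pt-PT": "Deseja continuar?",
    },
}


class Translator:
    """Resolves dictionary keys at display time for the current language."""

    def __init__(self, dictionary, language="en-GB"):
        self.dictionary = dictionary
        self.language = language

    def set_language(self, language):
        # Nothing is cached, so the switch applies to the next lookup.
        self.language = language

    def translate(self, key):
        entry = self.dictionary.get(key, {})
        # Fall back to the key itself, which is readable English by design.
        return entry.get(self.language, key)


translator = Translator(DICTIONARY)
english = translator.translate("Do you want to continue?")
translator.set_language("pt-PT")
portuguese = translator.translate("Do you want to continue?")
```

Because the key is itself a readable sentence, a missing translation degrades to English text rather than to an opaque identifier.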

This mechanism to support multilingual communication is one of the oldest functionalities implemented by SISCOG for its products. Incredibly, the Lisp packages still keep their names in Portuguese – TRADUCAO (translation) and CHAVE (key). By the way, this mechanism has had very few changes to this day.

All components - desktop or web-based - use this centralized translation mechanism.

 

Language settings, in this example for English, Portuguese and Chinese

 

MAJOR CHALLENGES

 

The main challenge is to achieve written communication that is effective for the user, straightforward to translate, and easy to manage and maintain. The journey has not been simple. It was necessary to find solutions for each challenge separately, minimize risks and search for the best practices.

Communication with the end-user must be clear, precise, understandable and in the correct tense. So, whenever possible, we should use verbs in the present tense or the infinitive. If the past or future tense is necessary, compound forms should be avoided.

To be effective, it is necessary to write sentences correctly and use the right terminology according to the application domain. By default, SISCOG develops its products mainly for railway companies who provide transport services.

However, each client has a specific reality that may involve different terms and concepts. SISCOG products have the default translation for each key, while each client system can define its own specific translations where required.
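
The layering of client-specific translations over product defaults can be sketched as follows. This is only one possible arrangement, assuming a simple two-level lookup; the variable names are hypothetical, while the terms themselves are the Portuguese examples used in this article.

```python
from collections import ChainMap

# Product defaults (Portuguese only, for brevity).
PRODUCT_DEFAULTS = {
    "Duty": "Período de Trabalho",
    "Train": "Comboio",
}

# A client system redefines only the terms that differ in its own jargon.
METRO_CLIENT_OVERRIDES = {
    "Duty": "Turno",
}

# ChainMap searches the mappings left to right, so the client layer wins.
metro_dictionary = ChainMap(METRO_CLIENT_OVERRIDES, PRODUCT_DEFAULTS)

duty_label = metro_dictionary["Duty"]    # overridden by the client
train_label = metro_dictionary["Train"]  # inherited from the product default
```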

 


 

Context and Meaning

Usually, translators work only from the sentence itself, isolated from the context of its use in the application. The sentence must therefore provide all the information needed, and its structure has to be syntactically and semantically correct, so that it can be easily translated into other languages. This is important to reduce the risk of misunderstandings, poor translations and wasted time.

Sometimes these sentences need more context to resolve ambiguities in the translation. Some concepts share the same name in English but have different translations. For example, a “Roster” is translated into Portuguese as “Rotação” if applied to vehicles and as “Escala” if applied to staff. Therefore, dictionary keys have to indicate the context in which the term is used so that it can be translated correctly. So, for our example, we need two different dictionary keys, one for vehicle rosters (“VEHICLE_Roster”) and another for staff rosters (“STAFF_Roster”).
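
A small Python sketch of context-qualified keys, using the two key names above. The helper functions are hypothetical, and stripping the prefix to obtain the English text is an assumption made for the example, not a documented rule of the mechanism.

```python
# Portuguese entries for the two context-qualified keys from the text.
PORTUGUESE = {
    "VEHICLE_Roster": "Rotação",
    "STAFF_Roster": "Escala",
}


def display_english(key):
    """Strip the context prefix to obtain the text shown in English."""
    _context, _separator, text = key.rpartition("_")
    return text


def display_portuguese(key):
    """Look up the translation; the context prefix selects the right term."""
    return PORTUGUESE[key]
```

Both keys display as “Roster” in English, yet each resolves to a different Portuguese term.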

As another example, the word “Copy” can be used as a verb in the title of a button or as a noun referring to an object copied from another. Thus, it is necessary to distinguish the context in which the word is used.

 

Terminology
Our focus is on tools for heavy railway, light rail and metro companies, covering not only crew members but also local staff. However, the tools are also prepared to be used by bus companies and airlines.

Although it is difficult to find common ground in the terminology used by the different business domains, we try to choose terms that are as agnostic as possible and easy to understand for any user or translator. For example, we’ve chosen the term “personnel base”, commonly used in aviation, over “depot” because its Portuguese translation, “depósito”, is closer to a storehouse, which sounds odd when applied to people.

The same concept may have different names in different countries or different companies. In some companies, a “duty” is known as a “shift”, a “working period” or a “service”. Likewise, UK English uses the word “Railway” while US English uses “Railroad”, and the Portuguese “Comboio” is called “Trem” in Brazil. Thus, by default, the terminology base is that of the British railways for English translations, and that of CP (the incumbent Portuguese railway company) for Portuguese translations.

Differences due to country dialects (e.g., US English or Portuguese of Brazil) or company jargon (e.g., metro or bus) could force the redefinition of the necessary translations in the respective systems.

 

Business Dictionary
To address the challenge of reaching a common and better understanding of different terminology, a Business Dictionary was created. This type of dictionary lists terms and their definitions to ensure that the same designations are used company-wide when writing texts or simple sentences.

The scope is to cover not only products, systems and the software development process but also marketing, management, human resources, operations research and other areas relevant to SISCOG's business. The Business Dictionary is an ongoing and ever-growing project to which all SISCOG employees can contribute.

 

 


 

 

Sentences with arguments
There are sentences that are used as the title of windows or dialogs; or used as the name of menu options; or used as field labels in tables or dialogs; or as normal free communication sentences with the user.

It is not feasible to write all possible sentences of an application because some depend on the actual situation in which they are used. To deal with this, some sentences may contain variables, that is, parts of the sentence depend on the specific situation. For example, the sentence “The duration must be between ~a and ~a” (~a represents a variable) can be instantiated with any two time values, for example “The duration must be between 1:00 and 4:00”.
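
The substitution of variables can be sketched in a few lines of Python. The `instantiate` function is hypothetical and only mimics the positional behaviour of the ~a marker; it is not the Lisp code used in the products.

```python
def instantiate(template, *arguments):
    """Replace each "~a" in the template with the next argument, in order."""
    parts = template.split("~a")
    expected = len(parts) - 1
    if expected != len(arguments):
        raise ValueError(f"expected {expected} arguments, got {len(arguments)}")
    result = parts[0]
    for argument, following_text in zip(arguments, parts[1:]):
        result += str(argument) + following_text
    return result


message = instantiate("The duration must be between ~a and ~a", "1:00", "4:00")
```

Checking the argument count up front catches a common mistake: a sentence that gains or loses a variable while the calling code stays unchanged.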

The use of variables requires very careful sentence construction that doesn’t hinder the translator’s work, given that the sentence structure in other languages is generally not the same as in the original English sentence. Translators don’t know which words will replace a variable, which affects, for example, the use of auxiliary verbs, the relative position of the subject and the adjective, and the choice of articles.

The English definite article has no gender and does not distinguish singular from plural. In other languages, if an article is related to a variable, it’s not possible to know which gender or number it should take. This becomes more complicated the more genders (German, for example, has three) and the more complex plural forms a language has.

For example, “The ‘~a’ field must be a number” is translated into Portuguese as “O campo ‘~a’ tem de ser um número”. The article is related to the word “field”, not to the variable, so it doesn’t create gender problems. However, in the sentence “Cannot change start date because ~a is already added”, the participle “added” could be “Adicionado”, “Adicionada”, “Adicionados” or “Adicionadas” in Portuguese, depending on the variable’s value. Using different keys labelled masculine and feminine can be a solution but quickly becomes overwhelming. Thus, the challenge is to write a generic sentence that avoids these traps.

 

Sentence composition (verb, subject)
A sentence must not be the result of concatenating other sentences or fragments. For example, “Station” + <name> + “already exists”. The result in Portuguese is “Estação Campolide já existe”, which is not the correct way of phrasing it.

Thus, the correct way of writing the original sentence in English is to use a variable. The sentence “The station ~a already exists” allows the appropriate translation into Portuguese, “Já existe a estação ~a” (note the position of the variable in the structure of both sentences). This issue becomes even more critical when we have auxiliary verbs or adjectives in the same sentence. The concatenation may make sense in English, but the translation into other languages can be very difficult to understand, since the word order may be different.
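
The difference between the two approaches can be reproduced with a short Python sketch, using the station example above. The function names and the fragment keys are hypothetical.

```python
# Portuguese entries: one whole-sentence key and two fragment keys.
PORTUGUESE = {
    "The station ~a already exists": "Já existe a estação ~a",
    "Station": "Estação",
    "already exists": "já existe",
}


def by_concatenation(name):
    # Fragments are translated separately, so English word order is forced
    # onto the Portuguese sentence.
    return f'{PORTUGUESE["Station"]} {name} {PORTUGUESE["already exists"]}'


def by_template(name):
    # The whole sentence is one key; the translator decides where ~a goes.
    return PORTUGUESE["The station ~a already exists"].replace("~a", name, 1)


awkward = by_concatenation("Campolide")
natural = by_template("Campolide")
```

With the template, the variable moves to the end of the Portuguese sentence without any change to the calling code.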

 

Standard sentences
Several dictionary keys have been created that correspond to standard phrases or standard questions. They are defined once but can be used in different moments, in very specific and unambiguous contexts, or as a complement to different messages. For example, the question “Do you want to continue?” can be used at the end of a message requesting an action from the user.

 

Example of the Russian message "Do you want to continue?"

 

Maintenance of translations
Translation keys were created over time, as needed and according to the work context at the time. For example, over a period of time there may be a major development in the context of the vehicle scheduling and management product FLEET, or in the context of the needs of a metro company. The sentences that are created, and their translations into Portuguese, are strongly influenced by the terminology of that context. For example, a “duty” in the CREWS product is translated as “Período de Trabalho” in the context of CP or MEDWAY (the Iberian rail freight company) or as “Turno” in the context of Lisbon Metro.

Sentences were written by many different programmers over many years. Many similar sentences have been created to say basically the same thing. Some of these sentences were written in completely different ways. For example, “You must specify the start time.” or “Start time is not defined.”

When analysing translation keys globally it is possible to find this type of discrepancy. Work has been done to standardize terminology and sentence construction.

Quite often, some product functionalities are reviewed and consequently some dictionary keys may need to be modified or even removed. Furthermore, these changes can also impact the specific code of a client system.

Translations made for the original keys must be adapted to match the modified sentence. Developers usually do this for Portuguese translations, but not for other languages that we don't speak. We do not risk using Google Translate because we are unable to validate the result; our clients know better how to translate. This means additional effort during system updates or upgrades: the changes have to be identified and communicated to the respective client.
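
Identifying the changes between two versions of the dictionary can be sketched as a simple set comparison. The `diff_keys` function is hypothetical and assumes that each version's keys are available as a plain list of sentences.

```python
def diff_keys(previous_keys, current_keys):
    """Summarise which keys a client must translate and which it can drop."""
    previous, current = set(previous_keys), set(current_keys)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


previous = ["Start time is not defined.", "Do you want to continue?"]
current = ["You must specify the start time.", "Do you want to continue?"]
changes = diff_keys(previous, current)
```

Since the key is the English sentence itself, a reworded sentence shows up as one removed key plus one added key, which is exactly the pair the client needs to review.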

Keys that are no longer used in the code often linger in the dictionary files as garbage.

This work is never-ending due to the evolving nature of SISCOG applications and, as you can see from the challenges we face, it is not an easy job. From time to time, it is necessary to do some housecleaning. Right now we have around 12,400 translation keys.
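
Part of the housecleaning, finding keys that no longer appear in the code, can be approximated with a text search. The sketch below is an assumption about how such a check might look, not a description of SISCOG's tooling; the Lisp-like snippets and their function names are invented for the example.

```python
def find_unused_keys(dictionary_keys, source_texts):
    """Return keys that appear in no source text, sorted for stable output.

    Assumes keys appear verbatim in code, which is a simplification: keys
    built dynamically would be reported as unused and need manual review.
    """
    return sorted(
        key
        for key in dictionary_keys
        if not any(key in text for text in source_texts)
    )


keys = {"You must select a train.", "Start time", "Obsolete label"}
sources = [
    '(show-message "You must select a train.")',  # hypothetical call
    '(make-label "Start time")',                  # hypothetical call
]
unused = find_unused_keys(keys, sources)
```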

 

SOME RULES

 

Formal voice
SISCOG has always chosen to address the user formally, using verbs in the third person. English makes no formal/informal distinction in its verb forms, since the second person is always used, but Portuguese does. The formal option was chosen because some of the users of this type of solution still belong to an older generation. However, this is changing: more and more younger users are working with this type of software, and many commercial applications and web pages are shifting the standard towards a more informal form of address.

Also, sentences are written avoiding any slang, colloquial speech or IT jargon.

 

Positive phrasing
Users find it difficult to understand a sentence written in the negative and even more difficult to understand sentences with double negatives. The messages should express what can be done and not what can’t be done. Explaining what a user can do to correct a problem is more important than saying that the user has done something wrong. For example, “No train was selected.” only states the problem, whereas “You must select a train.” tells the user what to do. Also, by using positive phrasing as a standard, we avoid situations where different keys are used to say the same thing.

 

Length
The layout of most dialogs built in Lisp is very rigid. Position and size are fixed and are usually defined by the English sentence plus some margin. The available size does not adapt to the width of the translated sentence. Also, note that SISCOG’s first dialogs for MS Windows were designed around the year 2000, when the typical screen resolution was 1024x768. Because of the low resolution, the space available was small, so the dialogs’ widgets had to be carefully arranged and, for this reason, most dialogs are not very spacious.

Another problem that arises from having small dialogs is that some translations wouldn’t fit the available space. In many cases there was a need to extend or redesign the dialog while in other cases extra imagination was needed to find a shorter translated sentence without losing meaning.
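
Translations at risk of not fitting can be flagged ahead of time with a crude length check. The sketch below is hypothetical: character count is only a rough proxy for rendered width, and the margin is an arbitrary illustrative value rather than a figure used in the products.

```python
def find_overlong(translations, margin=0.2):
    """Return keys whose translation exceeds the English length plus a margin.

    `translations` maps the English sentence to its translated sentence.
    """
    overlong = []
    for english, translated in translations.items():
        limit = len(english) * (1 + margin)
        if len(translated) > limit:
            overlong.append(english)
    return overlong


portuguese = {
    "Duty": "Período de Trabalho",
    "Train": "Comboio",
    "Roster": "Escala",
}
too_long = find_overlong(portuguese, margin=0.5)
```

Short labels are the usual offenders: a four-letter English term can easily become a three-word expression in another language.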

SISCOG’s code base has a type of dialog, which we named “variable dialog”, that can mitigate this problem. The dialog is constructed dynamically, adapting its size to the sentences it contains. Although this can solve the majority of space problems, we risk ending up with aesthetically odd layouts. This functionality can only be used for simple layouts.

Some newer features are implemented in HTML with responsive design and do not suffer from this problem.

 

Example of Dutch and Chinese interfaces

 

Error Messages
Effective error messages inform users that a problem occurred, explain why it happened and provide a solution so users can fix the problem. Users should either perform an action or change their behaviour as a result of an error message.

Well-written, helpful error messages are crucial to a quality user experience. Poorly written error messages, on the other hand, lead to low product satisfaction, and unnecessary error messages break the users’ flow.

For example, most messages of the type “<something> is not valid” or “Invalid <something>” are unnecessary. Instead, it’s better to signal this type of error with a message that explains the correct format or the actions that prevent it.

 

Forbidden words
Some words may have multiple meanings, while others might even be culturally unacceptable. For example, the Portuguese word for “invalid” (“inválido”) should never be used because, as a noun, it also refers to a person with a physical disability.
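
A rule like this one can be enforced mechanically by scanning the translations for forbidden words. The following Python sketch is an assumption about how such a check might be written; the word list (including the feminine form) and the sample translations are illustrative.

```python
import re

# Hypothetical per-language list; "inválido" is the example from the text.
FORBIDDEN = {"pt-PT": ["inválido", "inválida"]}


def find_forbidden(translations, language):
    """Return (key, word) pairs where a translation uses a forbidden word."""
    hits = []
    for key, text in translations.items():
        for word in FORBIDDEN.get(language, []):
            # \b keeps the word from matching inside a longer word.
            if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
                hits.append((key, word))
    return hits


sample = {
    "Invalid date": "Data inválida",
    "You must select a train.": "Tem de selecionar um comboio.",
}
problems = find_forbidden(sample, "pt-PT")
```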

 

 

OTHER CHALLENGES

 

Chinese characters
Some years ago our translation mechanism and the graphical interface of our products faced a huge challenge. In the context of implementing a prototype for a Chinese Metro, it was necessary to prove that we could have our applications communicating in Chinese.

As a first step, Google Translate was used to obtain translations of several thousand dictionary keys. It was necessary to “cheat” Google Translate into translating the keys containing variables. After several days of work, we managed to obtain the dictionary files. After launching the applications, we saw that the graphical interface appeared to be doing very well, although we had no way of knowing whether the Chinese sentences were correct. When the prototype was shown to native Chinese speakers, they showed no sign of rejection; quite the opposite.
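
The text does not say how the variables were handled. One common way to get sentences with placeholders through a machine translation engine is to mask each variable with an opaque token, translate, and then restore it. The Python sketch below illustrates that idea only: the token format is an assumption, no real translation service is called, and `fake_engine` is a stand-in that returns a canned Portuguese sentence from this article.

```python
def translate_with_placeholders(sentence, machine_translate):
    """Mask each "~a" with a numbered token, translate, then restore it."""
    parts = sentence.split("~a")
    tokens = [f"XVAR{index}X" for index in range(len(parts) - 1)]

    # Rebuild the sentence with an opaque token in place of each "~a".
    masked = parts[0]
    for token, following_text in zip(tokens, parts[1:]):
        masked += token + following_text

    translated = machine_translate(masked)

    # Arguments are positional, so every token must survive, in order.
    positions = [translated.find(token) for token in tokens]
    if -1 in positions or positions != sorted(positions):
        raise ValueError("a placeholder was lost or reordered")

    for token in tokens:
        translated = translated.replace(token, "~a")
    return translated


def fake_engine(text):
    # Stand-in for a real translation service, for demonstration only.
    canned = {"The station XVAR0X already exists": "Já existe a estação XVAR0X"}
    return canned.get(text, text)


result = translate_with_placeholders("The station ~a already exists", fake_engine)
```

Rejecting output in which a token is missing or out of order keeps a silently broken sentence from reaching the dictionary files.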

It is also possible to have a graphical interface in Cyrillic, Greek or Japanese. However, Arabic and Hebrew pose an extra problem: text is read from right to left. This requires a complete redesign of the graphical interface.

 

Example of a Chinese interface

 

Emojis
Nowadays, adopting emojis in communication with users is merely a technical matter of choosing the most appropriate character set. And, as the saying goes, “a picture is worth a thousand words”, so an emoji does not need to be translated 😀.