How to improve the quality of machine translation

Improving the quality of machine translation (MT) is mainly the task of its developers. However, users can also take steps to achieve acceptable results, above all because the quality of machine translation depends directly on the quality of the source text.

Of course, the guidelines below cannot solve every problem of machine translation, but they will help you win a few points in the contest between the computer and natural language.

  1. Avoid misprints and spelling mistakes! The machine translator cannot correct mistakes or recognize misspelled words (spell-checking programs are very useful for this purpose).

  2. Mind punctuation marks! Omitted or, conversely, redundant punctuation marks may prevent the electronic translator from correctly understanding the syntactic structure of the sentence.
    Paragraph marks (¶) are automatically deleted by the program, so separate lines are merged into a single line. Therefore, be sure to put a period (.) at the end of each sentence.

  3. Use diacritics correctly!
    Note: As a rule, the electronic translator cannot recognize words containing the Russian letter “ё” or words with stress marks.

  4. Observe the case of letters! A lowercase letter in a word may easily become a capital one (for example, at the beginning of a sentence or in a heading), and MT systems are designed to take this into account. By contrast, a capital letter seldom becomes a lowercase one, and in most cases this happens when a new word is derived, for example, when a proper noun turns into a common noun (xerox, etc.). Since the word “Internet” is usually written with a capital letter, there is no point in complaining (as one user does in the www.online-translator.com guest book) that “there's no word “internet” in your dictionary”.

    Besides, there are languages where capitalization changes the part of speech of a word. A striking example is German, where nouns are capitalized both at the beginning and in the middle of a sentence. Compare these translations:
    “wie funktioniert das übersetzen mit dem “clipboard”?” - “how does this function translate with “clipboard”?”
    “Wie funktioniert das Übersetzen mit dem “clipboard”?” - “How does the translation with “clipboard” function?”

  5. Try to use simple syntactic constructions with direct word order. For example, the subject or subject group (I, you, he, my cat, my boss, my girlfriend's son) should come first.
    Then comes the predicate, expressed by a verb (want, know, like).

  6. Avoid omitting function words (even if the grammar allows it). Look at this example. Translating the English sentence “Your e-mail address is the address other people use to send e-mail messages to you” into Russian produces a barely understandable text: “Ваш адрес электронной почты - адрес другое использование людей, чтобы послать почтовые сообщения Вам.” Now let's restore the only omitted word, “that”: “Your e-mail address is the address that other people use to send e-mail messages to you” - and we get quite a correct translation: “Ваш адрес электронной почты - адрес, который другие люди используют, чтобы послать почтовые сообщения Вам.”

  7. Use only conventional abbreviations! Incorrect translation of an abbreviation is only part of the problem. The point is that even a single untranslated word may prevent the electronic translator from analyzing the syntactic structure of the sentence correctly (abbreviations take part in syntactic links just as common words do).
    The spelling of some abbreviations coincides with the spelling of frequently used words, which may have unpleasant consequences. For example, the Russian abbreviation “ПО” (software) is written in the same way as the Russian preposition “по” (on). Case does not help here, because the preposition can also be written in capital letters, for example, in a heading. Therefore, regrettably, the phrase “Я часто использую это ПО.” is translated as “I frequently use it ON”. On the other hand, if you take the trouble to write “Я часто использую это программное обеспечение”, the translation will be “I frequently use this software”.

  8. Avoid using slang! Of course, we are not speaking about criminal slang (though we may assume that criminals use MT systems, too). In informal communication even law-abiding native speakers often use words, expressions and constructions that do not belong to the literary language (“Люди, решите траблу! Не могу зарегить мыло!”, roughly “Folks, fix my problem! I can't register an e-mail account!”). Part of the problem is that such words appear in speech earlier than in dictionaries. The other part is that adding neologisms to the dictionary is not always advisable: for example, the word “мыло” (literally “soap”, used above as slang for e-mail) denotes a cleaning product for most users of MT systems.
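
Several of the guidelines above (2, 3 and 7) can be applied mechanically before the text is sent to the translator. The Python sketch below shows one possible way to do this; it is not part of any MT product. The function names, the abbreviation table and the decision to replace “ё” with “е” are assumptions made for illustration, and the sketch assumes that each input line is a complete paragraph or heading.

```python
import re
import unicodedata

# Hypothetical expansion table. The "ПО" entry is taken from guideline 7;
# extend it with the abbreviations that occur in your own texts.
ABBREVIATIONS = {"ПО": "программное обеспечение"}

# Characters that already close a line, so no period needs to be added.
TERMINAL_PUNCTUATION = (".", "!", "?", ":", ";", "…")


def strip_stress_marks(text):
    """Guideline 3: remove stress marks and replace "ё" with "е".

    Assumption: stress is written with the combining acute accent (U+0301).
    Other diacritics are left untouched.
    """
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\u0301", "")
    return text.replace("ё", "е").replace("Ё", "Е")


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Guideline 7: replace listed abbreviations with their full form.

    The match is case-sensitive and limited to whole words, so the
    lowercase preposition "по" is not affected. An all-caps heading
    would still need a manual check.
    """
    for abbreviation, full_form in table.items():
        pattern = r"(?<!\w)" + re.escape(abbreviation) + r"(?!\w)"
        text = re.sub(pattern, lambda match, full=full_form: full, text)
    return text


def terminate_lines(text):
    """Guideline 2: end every non-empty line with punctuation, then join.

    Assumption: each line is one paragraph or heading, so adding a period
    keeps sentences apart once the paragraph marks are removed.
    """
    fixed = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith(TERMINAL_PUNCTUATION):
            line += "."
        fixed.append(line)
    return " ".join(fixed)


def prepare_for_mt(text):
    """Apply the three clean-up steps in order."""
    text = strip_stress_marks(text)
    text = expand_abbreviations(text)
    return terminate_lines(text)


print(prepare_for_mt("Я часто использую это ПО"))
# prints: Я часто использую это программное обеспечение.
```

The steps are kept as separate functions so that any of them can be skipped, for example when the source text is hard-wrapped and a line does not correspond to a paragraph.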