Friday 16th of May 2008

Guidelines on preparation of the source text

Improving the quality of machine translation (MT) is mainly the task of its developers. However, users can also contribute to acceptable results, because the quality of machine translation depends directly on the quality of the source text supplied.

The guidelines below will not solve every problem of machine translation, but they can help win some points in the contest between the computer and natural language.

  1. Avoid misprints and spelling errors! A machine translator cannot correct errors or recognize misspelled words (spell-checking programs are very useful for this purpose).
  2. Mind the punctuation marks! A missing or, on the contrary, redundant punctuation mark can prevent an electronic translator from correctly understanding the syntactic structure of the sentence.
    End-of-paragraph marks are automatically deleted by the program, so two lines become one. It is therefore necessary to put a period (.) at the end of each sentence.
  3. Place diacritics correctly!
    Remark: as a rule, the electronic translator cannot recognize a word containing the Russian letter ё, nor words carrying stress marks.
  4. Observe letter case! A lowercase letter in a word can easily become a capital one (for example, at the beginning of a sentence or in a header), and MT developers take this into account. A capital letter, on the other hand, seldom becomes lowercase, and in most cases this marks the derivation of a new word, for example when a proper noun passes into the class of common nouns (xerox, etc.). Because the word Internet is usually written with a capital letter, there is no point in complaining (as one author of a message in the guest book of the server www.translate.ru does) that "there isn’t the word Internet in your dictionary".

    Besides, there are languages in which an initial capital letter changes the part of speech a word belongs to. An obvious example is German, in which nouns are written with a capital letter both at the beginning and in the middle of a sentence. Compare these translations:
    "wie funktioniert das übersetzen mit dem "clipboard"?" - "How it works translate with “clipboard”?"
    or
    "Wie funktioniert das Übersetzen mit dem "clipboard"?" - "How does the clipboard translation work?"

  5. Try to use simple syntactic constructions with direct word order.
    For example, the subject or subject group (I, you, he, my cat, my chief, son of my girlfriend) should come first in the sentence.
    The predicate expressed by a verb (want, know, like) comes second.
    Adverbial modifiers, expressed by various parts of speech, should follow.

    Many more guidelines on how to make natural-language text more "digestible" for the computer can be found at:
    http://alemeln.narod.ru/progper2.html
     

  6. Try not to omit function words (even where the grammar allows it). Here is an example. The English sentence "Your e-mail address is the address other people use to send e-mail messages to you" is translated into Russian as a barely understandable text: "Ваш адрес электронной почты — адрес другое использование людей, чтобы послать почтовые сообщения Вам." After restoring the one omitted word, that ("Your e-mail address is the address that other people use to send e-mail messages to you"), we get a quite correct version: "Ваш адрес электронной почты — адрес, который другие люди используют, чтобы послать почтовые сообщения Вам."
  7. Use only conventional abbreviations! Incorrect translation of the abbreviation itself is only part of the problem. Even one untranslated word can prevent the electronic translator from correctly analyzing the syntactic structure of the sentence, because abbreviations take part in syntactic links just as ordinary words do.
    An abbreviation that is spelled like a frequently used word can have unpleasant consequences. For example, the Russian abbreviation ПО (software) is written in the same way as the Russian preposition по (on). Letter case does not help here, because a preposition may also be written in capital letters, for example in a header. As a result, we regret to say, the phrase "Я часто использую это ПО" is consistently translated as "I frequently use it ON." If, on the other hand, you take the trouble to write "Я часто использую это программное обеспечение", the translation will be "I frequently use this software."
  8. Avoid slang expressions! Of course, we are not speaking about criminal slang (though users of MT systems might use that too). In informal communication, law-abiding native speakers also quite often use words, expressions and constructions that fall outside the literary norm, for example "Люди, решите траблу! Не могу зарегить мыло!" (literary norm: "Please help me solve the problem: I can’t register an e-mail account"). On the one hand, such words appear in speech earlier than in dictionaries. On the other hand, it is not always advisable to add neologisms to the dictionary: for most users of MT systems, the word “мыло” (soap; slang for e-mail) denotes a detergent.
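
Some of these guidelines (2, 3, 7 and 8) can be partly automated before the text is sent to the translator. The following is only a minimal sketch in Python under stated assumptions, not part of any MT product: the function names, the abbreviation table and the slang list are hypothetical and are filled with the examples from this page. Replacing ё with е is an assumption about what the translator expects.

```python
import re
import unicodedata

# Hypothetical table of abbreviations that collide with common words.
# The single entry is the example from guideline 7.
AMBIGUOUS_ABBREVIATIONS = {"ПО": "программное обеспечение"}

# Hypothetical slang list, taken from the example in guideline 8.
SLANG_WORDS = {"траблу", "зарегить", "мыло"}


def prepare_paragraph(paragraph: str) -> str:
    """Apply guidelines 2, 3 and 7 to one paragraph of Russian text."""
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(paragraph.split())
    # Guideline 3: drop combining acute accents used as stress marks,
    # then replace the letter ё with е (assumed to be the safer spelling).
    text = text.replace("\u0301", "")
    text = unicodedata.normalize("NFC", text)
    text = text.replace("ё", "е").replace("Ё", "Е")
    # Guideline 7: expand abbreviations written in capitals. The match is
    # case-sensitive, so the preposition "по" is left alone. A header typed
    # entirely in capitals would still be expanded wrongly, and the full
    # form is inserted uninflected.
    for abbreviation, full_form in AMBIGUOUS_ABBREVIATIONS.items():
        pattern = rf"\b{re.escape(abbreviation)}\b"
        text = re.sub(pattern, full_form, text)
    # Guideline 2: make sure the paragraph ends with a punctuation mark,
    # so that merged lines do not run into each other.
    if text and text[-1] not in ".!?:…":
        text += "."
    return text


def prepare_text(source: str) -> str:
    """Treat every non-empty line as a paragraph and prepare each one."""
    paragraphs = [line for line in source.splitlines() if line.strip()]
    return "\n".join(prepare_paragraph(p) for p in paragraphs)


def find_slang(text: str) -> list[str]:
    """Guideline 8: list the words that appear in the slang table."""
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word in SLANG_WORDS]


if __name__ == "__main__":
    print(prepare_text("Я часто использую это ПО"))
    print(find_slang("Люди, решите траблу! Не могу зарегить мыло!"))
```

The slang check only reports candidates. A word such as мыло also has an ordinary meaning, so the author still has to decide whether to rephrase it.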


