pctechguide.com


How neural machine translation systems based on AI work

Just a few weeks ago, Meta presented an artificial intelligence model capable of translating between 200 languages. The model, NLLB-200, comes out of the ‘No Language Left Behind’ project, which Mark Zuckerberg’s company is developing to support its bet on the metaverse.

Almost all the technology giants, with the exception of Apple and Google, are running projects to position themselves in this new virtual universe. But other, more modest companies, some of them local, began research in this field long ago. For some years now, the company Incyta and the GRIAL research group of the Arts and Humanities Department of the Universitat Oberta de Catalunya (UOC) have been collaborating on a series of research and technology transfer projects related to neural machine translation. The objective of the research is to develop neural machine translation systems that can be integrated into Incyta’s workflow.

This Barcelona-based language services company has been using machine translation systems for years, with human translators post-editing the output. This workflow of machine translation plus post-editing makes it possible to offer a more efficient and economical translation service, while maintaining the same level of quality, to its wide range of clients: the written press, publishing houses, public administration, universities, etc.

Until a few years ago, machine translation systems offered sufficient quality only for similar language pairs, such as Spanish-Catalan or Spanish-French. For slightly more distant pairs, such as Spanish-English, the quality was not sufficient, and it was more efficient to translate the document manually from scratch.

The emergence of today’s neural machine translation systems has made it possible to obtain outstanding quality even for very distant language pairs, such as Chinese-Spanish. These systems have brought about a true revolution in the world of professional translation, since they open the door to applying the machine translation plus post-editing workflow to most translation jobs.

Rule-based machine translation and corpus-based machine translation

To understand what this technological revolution is about, it is worth recalling the two main paradigms of machine translation: rule-based and corpus-based. In the rule-based paradigm, machine translation systems are developed by computer engineers and linguists who write programs, dictionaries and rules to translate a sentence in a source language into a sentence in the target language.

The development of these systems usually involves many months of work by teams of several people. Among the rule-based systems, syntactic transfer systems can be highlighted. In these systems, the sentence in the source language is syntactically parsed to automatically obtain a parse tree. This parse tree, which can be deep or shallow, is transferred to an equivalent tree in the target language using a set of rules.

Once this syntactic tree is obtained in the target language, the words are translated using bilingual dictionaries and the translated words are inflected to obtain a correct sentence in the target language. This paradigm has worked very well for similar languages that have quite similar syntactic structures. There are excellent systems using this methodology that are still in use for similar language pairs such as Spanish and Catalan.
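The transfer steps described above can be sketched in a few lines of Python. This is only a toy illustration, not any production system: the three small dictionaries, the single reordering rule (adjective before noun becomes noun before adjective) and the English-to-Spanish direction are all assumptions chosen to keep the example short, and the "parse" is a flat list of tagged words rather than a real tree.

```python
# Toy syntactic-transfer sketch (English -> Spanish).
# All dictionaries and rules below are invented for illustration.
POS = {"the": "DET", "red": "ADJ", "small": "ADJ", "house": "NOUN", "car": "NOUN"}
BILINGUAL = {"the": "el", "red": "rojo", "small": "pequeño",
             "house": "casa", "car": "coche"}
GENDER = {"casa": "f", "coche": "m"}


def analyse(sentence):
    """Shallow analysis: tag every source word with its part of speech."""
    return [(word, POS[word]) for word in sentence.lower().split()]


def transfer(tagged):
    """Structural transfer: rewrite ADJ NOUN as NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def lexical_transfer(tagged):
    """Translate each word with the bilingual dictionary."""
    return [(BILINGUAL[word], pos) for word, pos in tagged]


def inflect(tagged):
    """Make determiners and adjectives agree with the noun's gender."""
    gender = next((GENDER[w] for w, pos in tagged if pos == "NOUN"), "m")
    words = []
    for word, pos in tagged:
        if gender == "f" and pos == "DET":
            word = "la"
        elif gender == "f" and pos == "ADJ" and word.endswith("o"):
            word = word[:-1] + "a"
        words.append(word)
    return " ".join(words)


def translate(sentence):
    return inflect(lexical_transfer(transfer(analyse(sentence))))


print(translate("the red house"))  # la casa roja
print(translate("the red car"))    # el coche rojo
```

Every new word or construction needs a new dictionary entry or rule, which is why real systems of this kind take teams of people months to build.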

In the second paradigm, the corpus-based one, systems are not developed but trained. That is, they learn to translate from texts in the source language and in the target language. These systems are normally trained on parallel corpora, i.e., sets of segments or sentences in one language paired with their translation equivalents in another language.
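As a rough picture of what a parallel corpus looks like as data, the sketch below pairs up two aligned lists of lines. The `Segment` type, the `build_parallel_corpus` helper and the sample sentences are hypothetical names and data made up for this example; real corpora are far larger and usually come from files.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned pair: a source sentence and its translation."""
    source: str
    target: str


def build_parallel_corpus(source_lines, target_lines):
    """Pair line i of the source with line i of the target, skipping empty pairs."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    corpus = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            corpus.append(Segment(src, tgt))
    return corpus


# Invented Spanish-Catalan sample data.
spanish = ["La casa es roja.", "", "El coche es nuevo."]
catalan = ["La casa és vermella.", "", "El cotxe és nou."]

corpus = build_parallel_corpus(spanish, catalan)
for segment in corpus:
    print(segment.source, "->", segment.target)
```

The key property is the alignment: each source segment must sit next to its own translation, because that pairing is the only supervision a corpus-based system receives.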

Chronology of machine translation


The first corpus-based systems were statistical machine translation systems, which burst onto the market around 2005. These systems are based on the calculation of two probabilities: the probability that a given sentence in the target language is the translation of a sentence in the source language, and the probability that a given sentence in the target language is a correct sentence in that language. The first probability is estimated from statistics obtained from the parallel corpus, while the second is estimated from statistics obtained from a monolingual corpus of the target language. This monolingual corpus can be taken from the target-language side of the parallel corpus. The system then prefers the candidate translation for which the two probabilities combined are highest.
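The interplay of the two probabilities can be illustrated with a very small Python sketch. It is a deliberately crude approximation under stated assumptions, not how any real statistical system is built: the three-sentence corpus is invented, the translation probability uses raw word co-occurrence frequencies instead of properly trained alignments, the language model is a bigram model with add-one smoothing, and the candidate translations are supplied by hand instead of being searched for by a decoder.

```python
from collections import Counter

# Toy parallel corpus (Spanish, English); invented for illustration.
PARALLEL = [
    ("la casa", "the house"),
    ("la casa roja", "the red house"),
    ("el coche rojo", "the red car"),
]

# Translation statistics: how often each source word appears in a segment
# aligned with each target word.
pair_counts = Counter()
target_totals = Counter()
for src, tgt in PARALLEL:
    for t in tgt.split():
        for s in src.split():
            pair_counts[(s, t)] += 1
            target_totals[t] += 1

# Language-model statistics: bigram counts over the target side only,
# which serves as the monolingual corpus.
bigram_counts = Counter()
history_counts = Counter()
for _, tgt in PARALLEL:
    tokens = ["<s>"] + tgt.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        history_counts[prev] += 1
VOCAB_SIZE = len({w for _, tgt in PARALLEL for w in tgt.split()} | {"</s>"})


def translation_prob(source, target):
    """Crude estimate of P(source | target) from co-occurrence frequencies."""
    tgt_words = target.split()
    prob = 1.0
    for s in source.split():
        prob *= sum(pair_counts[(s, t)] / target_totals[t]
                    for t in tgt_words if target_totals[t]) / len(tgt_words)
    return prob


def language_prob(target):
    """Bigram estimate of P(target) with add-one smoothing."""
    tokens = ["<s>"] + target.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + VOCAB_SIZE)
    return prob


def best_translation(source, candidates):
    """Pick the candidate with the highest product of the two probabilities."""
    return max(candidates,
               key=lambda c: translation_prob(source, c) * language_prob(c))


candidates = ["the red house", "the house red", "the red car"]
print(best_translation("la casa roja", candidates))
```

In this sketch the translation score treats "the red house" and "the house red" essentially alike, since both contain the same words, and it is the language model that prefers the natural English word order. That division of labour is the reason for keeping two separate probabilities.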


