Vol. 1, No. 1, May 2019

Post-editing and legal translation

Post-editing e traduzione giuridica

Fiorenza Mileto, Università degli Studi Internazionali di Roma, Italia

 https://orcid.org/0000-0002-7472-4496


This study was conducted in collaboration with a group of students of UNINT: Francesco Becchetti, Eleonora Boni, Federico Borrelli, Silvia Carli, Eugenia Liguori, Federico Ottaviani, Tiziana Paloni.

Abstract

At UNINT, the courses dedicated to technologies are inspired by the principles of PBL (project-based learning) and experiential learning. Following this approach, in the courses dedicated to assisted and automatic translation the students run experiments to test specific aspects or to address problems detected by observing the translation industry, such as the compatibility of screen readers with CAT tools for blind users, the testing of Adaptive Machine Translation (AMT) systems under development, and the verification of the usefulness of Machine Translation (MT) output not only for translators but also for interpreters. This year, during the automatic translation and post-editing laboratory, thanks to the interdisciplinary nature of the courses dealing with translation technologies, a group of students carried out experiments on materials made available by the teacher of the active legal translation module. The aim was to verify how effective automatic translation integrated with assisted translation from Italian into English was on a specific type of text, using procedures such as pre-editing, the creation of ad hoc translation memories based on legacy material, and the automatic verification of terminology through the creation of specific glossaries.

Keywords: post-editing, legal translation, machine translation.

Abstract

I corsi dell'UNINT dedicati alle tecnologie sono ispirati dai principi del PBL (project-based learning) e dell'experiential learning. Seguendo questo approccio, nei corsi dedicati alla traduzione assistita e automatica gli studenti eseguono dei test per verificare alcuni aspetti o per indagare alcuni problemi rilevati durante l'osservazione dell'industria della traduzione: ad esempio, l'accessibilità dei CAT da parte degli screen reader per gli utenti non vedenti, l'analisi dei sistemi di Adaptive Machine Translation (AMT) in via di sviluppo, la verifica dell'utilizzabilità dell'output della Machine Translation (MT) non solo per i traduttori ma anche per gli interpreti. Quest'anno, nel laboratorio di traduzione automatica e post-editing, grazie alla natura interdisciplinare dei corsi che si occupano delle tecnologie per la traduzione, un gruppo di studenti ha eseguito dei test sui materiali resi disponibili dall'insegnante del modulo di traduzione giuridica attiva. L'obiettivo era quello di verificare l'efficacia della traduzione automatica integrata con la traduzione assistita nella combinazione linguistica italiano-inglese su una specifica tipologia di testi, utilizzando procedure quali il pre-editing, la creazione di memorie di traduzione ad hoc basate su materiali legacy e la verifica automatica della terminologia tramite la creazione di glossari specifici.

Keywords: post-editing, traduzione giuridica, machine translation.

1. Introduction

Universities have always been an ideal environment for experimenting with old processes and reworking them in the light of new technologies. Time is available at a reduced cost; young minds are not yet biased by the habits and prejudices acquired through on-the-job experience; and students can look at the same problem from different points of view because they are not bound by the fearsome time/cost/quality parameters imposed by the translation market. Together, these factors provide fertile ground where simple ideas can solve problems of various kinds.

At Università degli Studi Internazionali di Roma (UNINT), the courses dedicated to technologies are inspired by the principles of PBL (project-based learning) and experiential learning. Following this approach, in the courses dedicated to assisted and automatic translation the first-year students of the MA course in specialized translation run experiments to test specific aspects or to address problems detected by observing the translation industry, such as the compatibility of screen readers with Computer Assisted Translation tools (CATs) for blind users, the testing of adaptive Machine Translation (MT) systems under development, and the verification of the usefulness of MT output not only for translators but also for interpreters.

During the academic year 2016/2017, in the Machine Translation and post-editing laboratory, thanks to the interdisciplinary nature of the courses dealing with translation technologies, a group of students carried out experiments on materials made available by the teacher of the active legal translation module. The aim was to verify how effective automatic translation integrated with assisted translation from Italian into English was on a specific type of text, using procedures such as pre-editing, the creation of ad hoc translation memories based on previously translated material, and the automatic verification of terminology through the creation of specific glossaries.

This article summarizes the results obtained by the students who took part in the experiment under the supervision of the teachers of translation technologies and active legal translation.

2. An overview of MT

Research on artificial intelligence, which immediately aroused great interest because of the many benefits it promised, is progressing at a constant rate, with results that cannot always be ignored.

On the one hand, there are MT systems, that is, software capable of producing a translation without human intervention. They are one of the products of this research, and their architecture, according to the latest developments in the field, has been refined to the point of simulating the cognitive process at work during translation. Neural Machine Translation (NMT), for example, is based on algorithms modelled on a person's neural structure and represents the latest frontier in translation technologies, although it is still difficult to set aside the approach that preceded it, Statistical Machine Translation (SMT).

On the other hand, there are CAT tools, that is, programs that reduce but do not eliminate human intervention. The objective of these tools is to provide a range of features that assist and facilitate translation.

Once considered separately, these two approaches to translation have been merged into a single system by integrating MT into CAT tools. This integration rests on the idea that translation proposals, which generally come from a translation memory and/or a termbase, can be supplemented by those produced by MT.
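The integration described above can be illustrated with a minimal sketch. It assumes a translation memory stored as a plain dictionary of source/target segments and an MT engine passed in as an ordinary function; the names `propose`, `Proposal` and `fake_mt`, as well as the 0.75 fuzzy-match threshold, are hypothetical and do not reproduce the behaviour of any specific CAT tool.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, NamedTuple


class Proposal(NamedTuple):
    text: str     # proposed target segment
    origin: str   # "TM" or "MT", so the translator can see where it comes from
    score: float  # fuzzy-match ratio for TM hits, 0.0 for MT output


def propose(source: str,
            tm: Dict[str, str],
            mt: Callable[[str], str],
            threshold: float = 0.75) -> List[Proposal]:
    """Return TM matches above the threshold (best first), then the MT proposal."""
    proposals = []
    for tm_source, tm_target in tm.items():
        # Character-based similarity between the new segment and a stored one.
        ratio = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if ratio >= threshold:
            proposals.append(Proposal(tm_target, "TM", round(ratio, 2)))
    proposals.sort(key=lambda p: p.score, reverse=True)
    # The MT output is always added as a further suggestion.
    proposals.append(Proposal(mt(source), "MT", 0.0))
    return proposals


# Toy data: one TM entry and a stand-in for a real MT engine.
toy_tm = {"pena residua": "balance of sentence"}
fake_mt = lambda segment: "[MT] " + segment

for proposal in propose("pena residua", toy_tm, fake_mt):
    print(proposal.origin, proposal.score, proposal.text)
```

Labelling every proposal with its origin lets the translator decide whether to trust a suggestion, which is the point of offering TM and MT proposals side by side.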

3. Legal English

Like all sectoral languages, legal language contains many pitfalls at every linguistic level. While it is highly lexical, it does not have a systematic, unambiguous and formalised terminology. It is therefore a special language in the broadest sense, in contrast with special languages in the strictest sense, such as that of mathematics, which instead require functional, economical and precise communication (Megale, 2008). From the point of view of translation practice, the language of law differs considerably from common language.

According to the classification proposed by Alcaraz and Hughes (2002), legal English is characterized at the lexical level by the presence of:

  1. purely technical terms, which tend to be monoreferential and exclusive to the field of use;
  2. semi-technical and mixed terms, that is, common language terms that are mostly polysemic because they have acquired a specialist meaning in addition to their original one;
  3. non-technical terms, or words of the common language that do not belong to a particular field of use.

3.1. Legal translation

The interpretation of a text should be based on the legal culture of the system to which each language belongs, while the translation should take into account the differences and similarities between the various legal systems, since each system reflects the social and political framework of the community it belongs to and responds to that community's needs, customs and habits. It is precisely these differences between legal systems that hinder translation: they make it difficult to find an equivalent in the target language and, at times, impossible, when it must be admitted that no equivalent exists (think of the notions of consideration or equity in Anglo-American law).

In addition to conceptual considerations, translation should also take linguistic considerations into account. Moreover, legal language not only differs from ordinary language at the lexical and morphosyntactic levels, but also gives rise to textual products that differ greatly in structure (e.g., a judgment and a law). For the sake of clarity, Italian linguistic research proposes specific names for the varieties of the language of the law, classified according to the pragmatic function of the message conveyed.

In transposing the legal meaning of each word, the style and terminology of the target language should be respected, making sure that the act of communication is successful and therefore meaningful in the light of the differences between the two systems under consideration. It should be noted that, for the purposes of this paper, it was not necessary to consider the difficulties posed by the coexistence of different legal systems with the same official language, as is the case for Switzerland, where legal Italian differs from that used in Italy precisely because the two systems, and consequently the two language varieties, reflect different legal realities.

With regard to the difficulties posed by legal translation, Fiorito (2005) drew up a model in 2004 in which he identified four different situations that the translator may encounter. In the first situation, the same legal institution exists and is regulated in the same way in the legal systems under consideration, so it is sufficient to replace the term with its translation. In the second situation, the names of the institutions are lexically the same or equivalent, but the institutions are regulated differently. In this case, literal translation is strongly discouraged, as it could lead to serious inaccuracies. It is also essential to indicate that the text is a translation and not an original, so that its external origin is revealed and the end user is prepared to interpret it according to a logic of adaptation or estrangement, where possible. In contracts in general, as in those under consideration, the aim is always to specify the regulations to be referred to when interpreting the contract and in the event of disputes. Generally, it is the applicable law clause that establishes the reference framework and puts an end to possible ambiguities regarding the different definitions of the notions contained in the contract. Likewise, the section on definitions, typical of official contracts, aims to define precisely the institutions contained in the legal act. In the third situation, both the names of the institutions and the underlying legislation are different. The literal translation of the term can be effective when it does not correspond to any familiar concept in the target legal system and when the semantic nucleus of the translated term is nevertheless easily recognizable. In the last situation, an institution does not exist in one of the legal systems concerned. In this case, Fiorito (2005) tends to advise against literal translation, considering it meaningless because there is no shared semantic nucleus to refer to. Possible solutions include providing a bracketed explanation of the term to be translated together with the proposed translation and leaving the term in its original language, thus making clear the legal system to which it refers.

3.2. MT and legal translation

In an era of veritable content explosion, MT has become an essential asset. It is estimated that MT portals across the world translate about 600 billion words a day (Vashee, 2018). Even though MT quality is still far from satisfactory, and its output thus scarcely publishable,

(…) MT is more specifically aimed at enhancing human translators’ productivity and creativity by not only releasing them from routine work, which is best for machines, but also by providing them with more translation possibilities (…) When there is real demand for translation and the suitability, strengths, and weaknesses of available MT systems are well understood, why not incorporate online MT services into one’s working environment, especially for legal translation? (Kit & Wong, 2008, pp. 320-321)

So far, only a small group of researchers have tried to integrate MT into legal translation. Given the limits of MT and the complexity of the legal field, due in particular to the lack of equivalence between legal systems, this scepticism is hard to condemn.

After 70 years of research, the translation industry offers a wide range of MT solutions: SMT and the more recent NMT and AMT. In this work, for reasons discussed below, we focus on SMT and NMT.

The uncontested protagonist of the last 20 years of MT, SMT is defined as “a machine translation system that uses algorithms to establish probabilities between segments in a source and target language document to propose translation candidates. Also known as data-driven machine translation to contrast the approach with a rule-based machine translation system” (TAUS, 2014). It is composed of three elements, namely the translation model, the language model and the decoder (Koehn, 2007). As its name suggests, it differs from MT systems based on linguistic rules in that it works on statistical data. In other words, it finds the most probable translation according to a statistical model trained on a parallel corpus, in which words and phrases are automatically aligned; according to the following description,

The general setting of statistical machine translation is to learn how to translate from a large corpus of pairs of equivalent source and target sentences. This is typically a machine learning framework: we have an input (the source sentence), an output (the target sentence), and a model trying to produce the correct output for each given input. (Goutte et al., 2009, p. 2)
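The interplay of the three SMT components can be shown with a deliberately tiny sketch. All probabilities below are invented for illustration, and the names `TRANSLATION_MODEL`, `LANGUAGE_MODEL` and `decode` are hypothetical. Real engines such as Moses combine many more features and also search over segmentations and reorderings, which this toy example does not attempt.

```python
import math

# Toy translation model: a score for each candidate rendering of a source
# phrase, as it might be estimated from an aligned parallel corpus.
# All values are invented.
TRANSLATION_MODEL = {
    "pena residua": {
        "residual penalty": 0.5,
        "balance of sentence": 0.3,
        "remaining sentence": 0.2,
    },
}

# Toy language model: how likely each target phrase is in the target-language
# (here, legal) corpus. All values are invented.
LANGUAGE_MODEL = {
    "residual penalty": 0.01,
    "balance of sentence": 0.05,
    "remaining sentence": 0.02,
}


def decode(source: str) -> str:
    """Decoder: pick the candidate with the highest combined log-probability."""
    candidates = TRANSLATION_MODEL[source]

    def score(target: str) -> float:
        # Sum of logs = product of translation-model and language-model scores.
        return math.log(candidates[target]) + math.log(LANGUAGE_MODEL.get(target, 1e-9))

    return max(candidates, key=score)


print(decode("pena residua"))  # balance of sentence
```

With these invented figures the translation model alone would prefer "residual penalty", but the language model, standing in for in-domain target data, tips the decision towards "balance of sentence". This is one way to picture why an engine trained on legal material can behave differently from a generic one.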

Like all automatic translation technologies, SMT has both advantages and disadvantages. On the one hand, thanks to its statistical approach, it can deal with lexical ambiguity, requires minimal human effort and can work with any language pair for which enough training data is available. On the other hand, it risks failing on texts that differ from the training corpora and may lack consistency. Moreover, it cannot deal with syntax explicitly and it requires a large parallel corpus: a minimum of 2 million words is usually considered necessary to specialize an engine on a given domain. Nonetheless, acceptable quality can be reached even with fewer words (Lü, Huang, & Liu, 2007). Finally, it is important to remember that output quality depends heavily on the quality of the ST corpora.

The latest trend in MT research and the main subject of MT conferences all over the world is undoubtedly NMT. The traditional architecture of a neural engine is an “encoder-decoder” structure. Put simply, the encoder reads the input sentence and returns a sequence of numerical vectors, which is “decoded” into the output sequence through a complex system of functions and weights. Unlike SMT, which is limited to short sequences of words (n-grams), NMT translates on a sentence basis. It is worth mentioning that NMT components are all jointly trained, while SMT models are trained independently of one another (Diño, 2017).

As far as research is concerned, results are still very heterogeneous and far from satisfactory. On the one hand, NMT seems to clearly outperform SMT in terms of BLEU and HTER scores, making fewer morphological and lexical errors (Bentivogli et al., 2016; Bojar et al., 2016; Tinsley, 2017; Toral & Sánchez-Cartagena, 2017). According to most researchers, NMT output is more fluent and “natural”, with better word order and inflection. On the other hand, Koehn & Knowles (2017) identify six major challenges for NMT. Among them: out-of-domain performance of NMT is worse in almost all cases, adequacy is sacrificed for the sake of fluency, and neural outputs “show weakness in translating low-frequency words belonging to highly-inflected categories (…) [and a] lower translation quality on very long sentences” (Koehn & Knowles, 2017, p. 28). For legal translation, these issues seem particularly relevant. Finally, it is still difficult to understand the choices made by neural networks; as Tinsley put it, “for now we’re essentially dealing with a vaguely transparent black box” (Tinsley, 2017, p. 32).
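Since HTER is also a natural measure of post-editing effort, a simplified sketch may help to make it concrete. HTER is commonly computed as the number of edits needed to turn the MT output into its post-edited version, divided by the number of words in the post-edited version. The function below is an approximation under a stated assumption: it counts only word insertions, deletions and substitutions, whereas TER proper also treats the shift of a block of words as a single edit. The names `word_edit_distance` and `approx_hter` are illustrative.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from the MT output
                            curr[j - 1] + 1,      # insert a word from the post-edit
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited word; case-insensitive, whitespace tokenization."""
    hyp = mt_output.lower().split()
    ref = post_edited.lower().split()
    if not ref:
        raise ValueError("post-edited segment is empty")
    return word_edit_distance(hyp, ref) / len(ref)


# Word-order example from the Google/SDL output discussed in this study.
print(approx_hter("of the penal code polish", "of the Polish penal code"))  # 0.4
```

In this example the misplaced adjective costs two edits (one deletion and one insertion) over five words. A shift-aware TER implementation would count it as a single move, so the figure above should be read as an upper bound on the true score.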

In such a scenario, integrating SMT engines into our workflow was an obvious choice. Even if results in NMT research are very promising, the training phase remains a major issue in terms of the amount of data, time and hardware resources required. Moreover, it is difficult to gain access to engines already trained on the legal domain.

For all these reasons, we chose the MT system operated by the Directorate-General for Translation (DGT), as the only available engine trained on the legal domain; to diversify our study, we also used two more engines, namely SDL Language Cloud and Google API, both integrated into SDL Trados Studio.

3.3. The DGT machine translation service

First introduced in 2010 at the Directorate-General for Translation (DGT), the MT@EC tool was released into production only in 2013, after being tested in the various language departments. It is a statistical system built on the Moses engine, which draws on about a billion segments in the 24 official languages contained in the Euramis database. The language combinations available are from or into English and the other 23 languages (EN < > 23 official languages); the total of 552 language pairs corresponds to every ordered combination of the 24 languages (24 × 23). Having such remarkably large bilingual corpora makes probabilistic analysis very reliable, as the algorithms identify possible combinations on the basis of a large data sample. Currently, the tool is used by the European institutions, by public administrations in the Member States and in collaborative projects with universities that teach translation. It generally serves to translate documents or text fragments, relevant information available in other languages, press releases and, above all, European legislation and texts that require a high degree of stylistic and conceptual accuracy (Mai, 2016). Since its first release in 2013, the quality of the output has improved considerably thanks to the increase in available data and the evaluations provided by translators, as can also be seen from the amount of post-editing carried out on the text.

The integrated use of automatic and assisted translation is also found at the DGT itself, where translators work with an integrated assisted translation system which, in the first phase, simultaneously offers the translations coming from the MT tool. Thanks to a function that flags the origin of each proposed segment, the translator can identify where a translation proposal comes from and decide whether to use it (Mai, 2016).

3.4. Pre-editing: how to modify the input to optimize the outcome

At this point, the main question about MT has shifted from whether technology will be able to replace human translation to how human intervention can enhance system performance. The aim is mainly to optimize translation time and costs and to produce texts that can be easily translated through MT by online users who do not know the language in which the texts are written.

First, for the MT engine to process a text effectively, the source language and the target language should have as many similarities as possible. At the same time, it is possible, or even advisable, for the translator to intervene on the source text to head off those problems that are likely to generate significant errors in the translation. How a text is written or rewritten thus becomes particularly important when seeking better comprehensibility and, consequently, higher translatability; in other words, human intervention focuses on normalizing the source that feeds the translation memory tool, where normalizing the source means reducing variation between sentences (Muegge, 2007). Two main methods are applied to this end: pre-editing and writing in a controlled language.

While pre-editing consists of a number of interventions on the existing source text (abbreviation, simplification…) that make revision faster and cheaper (Riediger & Galati, 2015), the concept of controlled language dates back to the 1930s and implies reduced language variation, obtained by limiting the choices available to the author in the first place. Both ways of operating on a text that will undergo MT aim at a normalized input, not only from a grammatical point of view but also with regard to style, vocabulary, syntax and even text function. As Muegge (2007) points out,

the big factor for making machine translation systems more productive is reducing ambiguity in the source text. The problem that rules-based machine translation systems (…) struggle with is the fact that in uncontrolled source texts, the (grammatical) relationship between the words in a sentence is not always clear.

Pre-editing and controlled language methods share many of the rules that help the MT system to successfully identify the part of speech of each word, although they may slightly differ from one language to another. Using as reference the works of Muegge (2007) as far as English is concerned and of Riediger & Galati (2015) together with Bernardi (2016) for Italian, it is possible to summarize the fundamental linguistic features that a text should have to comply with the rule set for improving MT efficiency. Namely, the text should:

As far as Italian is concerned, there are two other rules that can be added:

The resulting text will not only be simplified in its structure and vocabulary, but will also be shorter, hence faster and cheaper to translate. Thanks to the rules that mitigate the weaknesses of rules-based machine translation systems, the quality of the output and the translator’s productivity are likely to improve dramatically. As Muegge (2007) points out:

The more restrictive the controlled language, the more uniform and standardized the resulting source document, the higher the match rate in a translation memory system, and the lower the translation cost in a conventional translation environment. A controlled language that was designed for machine translation will significantly improve the quality of machine-generated translation proposals and dramatically reduce the time and cost associated with human translators editing those proposals for producing translations for previously un-translated material.

As a general rule, although it remains impossible to reproduce the fluency of the original text faithfully, some text types belonging to specific domains such as the legal one, whose features include technical terminology and simple syntactic structure, tend to be more suitable for pre-editing and, consequently, for processing through MT.

3.5. What makes legal texts suitable for CAT tools and MT?

Some features make legal texts suitable for translation with a combination of CAT tools and an MT engine; Table 1 describes them in detail.

Table 1. Legal texts vs CAT tools and MT

| Features of legal text | CAT tool functions | MT pros and cons |
|---|---|---|
| standardized micro and macrostructure | formatting and layout | automatic translation of easy or repetitive fragments |
| recurring style and frequent use of collocations | leverage from pre-existing materials translated without CAT Tool (alignment) | stylistic errors |
| technical terms | termbases and concordance in TM | regular and consistent only with trained SMT |
| use of passive voice and impersonal phrases | concordance in TM | pre-editing applied to legal texts has shown significant decrease in grammatical errors |

4. The case study

The aim of this case study is to evaluate whether, and to what extent, the translator may improve the translation quality of legal texts when working with an MT engine integrated into a CAT tool. Before starting the tests, the main features of MT, computer-assisted translation and legal translation were collected, in order to identify what may enable or inhibit the translation activity.

Research in this field, which involves the use of an automatic translation tool based on statistical data in order to improve the quality of the text and increase the productivity of translators, generally includes three approaches:

This case study, which aims to assess whether and to what extent it is possible to benefit from the integrated use of MT, TM and TB, is based on the considerations reached in the studies that fall within the third possible situation, since one of the most important points is to understand how to carry out post-editing and to know in which cases this activity is productive. The assessment of the post-editing effort was made by evaluating the quality of the output texts, using the standards published by LISA as a basis to define the criteria of human revision: mistranslation, concordance, terminology and style, omission, formatting, word order, inconsistency and grammar.

Some examples of the errors detected according to the parameters defined above are reported below:

FORMATTING: in SDL Language Cloud, the DGT MT system and Google, the main problems concerned capitalization, bold and italics, which changed from the source text to the target text. Even though the translation was correct, the MT engines changed the capitalized “IL MINISTRO” into the non-capitalized “the minister”. Loss of bold occurred together with loss of capitalization: the word “VISTO” was rendered as “having regard”, without bold.

OMISSION: the DGT MT system, Google and SDL Language Cloud omitted some words of the source text. In SDL Language Cloud, the omission of articles and adjectives in some cases drastically changed the meaning of the sentence or produced meaningless phrases. For example, the omission of the adjective “personal” referring to “liberty” in “measures involving deprivation of liberty”, translated from the Italian “misure privative della libertà personale”, caused a drastic change in meaning.

PUNCTUATION: in SDL Language Cloud, Google and the DGT MT system, automatic translation generally did not introduce significant changes in punctuation. The most serious mistake, made by SDL Language Cloud and the DGT MT system, was the omission of the full stop (.) after the letter Z when translating the original “Michalski Z.”; this probably occurred because the engine did not recognize Z. as an abbreviation belonging to the name “Michalski”.

As far as the slash (/) is concerned, in legal language no space should be placed either before or after it. Google MT placed a space before and after the slash between letters, although not between numbers (as shown in the table below).
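A check of this kind lends itself to automation during post-editing. The following sketch is an illustration only: the function name `normalize_slashes` and the sample string are assumptions, not part of the workflow used in the study. It removes any spaces or tabs found immediately before or after a slash.

```python
import re


def normalize_slashes(text: str) -> str:
    """Remove spaces and tabs immediately before and after each slash."""
    return re.sub(r"[ \t]*/[ \t]*", "/", text)


# Illustrative reference string with stray spaces around the slashes.
print(normalize_slashes("Framework Decision 2008 / 909 / JHA"))
# Framework Decision 2008/909/JHA
```

A rule this blunt would also rewrite slashes that are legitimately spaced in running text, so in practice it is safer to use it to flag segments for the post-editor than to apply it blindly.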

WORD ORDER: in SDL Language Cloud, Google and the DGT MT system, the automatic translation did not always produce the correct English word order; it tended to maintain the Italian order or to change the word order at random, disregarding punctuation. For “del codice penale polacco” (Google and SDL Language Cloud), the automatic translation maintained the Italian word order, giving “of the penal code polish” instead of the correct English form “of the Polish penal code”. It also failed to follow the word order dictated by punctuation, sometimes producing output with no meaning at all.

In the DGT MT system, the Italian “l’applicazione della reciprocità internazionale” was mistranslated as “the international application of reciprocity”; post-editing corrected it to “the application of international reciprocity”.

TERMINOLOGY and STYLE: in SDL Language Cloud, Google and the DGT MT system, the translation often used terminology that did not fit the context. Although translating the Italian “pena residua” as “residual penalty” was correct in itself, it was not appropriate for the analysed text; the termbase made it possible to identify the form best suited to the context: “balance of sentence”.
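The automatic verification of terminology can be pictured as follows. This is a minimal sketch under explicit assumptions: the glossary is reduced to a dictionary mapping discouraged renderings to the preferred term, with entries taken from the examples reported in this study, and the names `TERMBASE` and `check_terminology` are hypothetical. Termbases in CAT tools are usually indexed by source-language term, which this simplification ignores.

```python
import re
from typing import Dict, List, Tuple

# Simplified glossary: discouraged rendering -> preferred term.
TERMBASE: Dict[str, str] = {
    "residual penalty": "balance of sentence",
    "remaining penalty": "balance of sentence",
    "performance of a penalty": "enforcement of a sentence",
}


def check_terminology(target: str,
                      termbase: Dict[str, str] = TERMBASE) -> List[Tuple[str, str]]:
    """Return (discouraged term, preferred term) pairs found in a target segment."""
    findings = []
    for discouraged, preferred in termbase.items():
        # Whole-word, case-insensitive match of the discouraged rendering.
        pattern = r"\b" + re.escape(discouraged) + r"\b"
        if re.search(pattern, target, flags=re.IGNORECASE):
            findings.append((discouraged, preferred))
    return findings


print(check_terminology("for the execution of the residual penalty"))
# [('residual penalty', 'balance of sentence')]
```

The function only reports findings and leaves the replacement to the post-editor, since whether a rendering is appropriate depends on the context, as the “pena residua” example shows.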

GRAMMAR/MISTRANSLATION: in SDL Language Cloud, English grammar was in some cases not fully respected (there were problems with definite and indefinite articles), while in many other cases the automatic translation completely mistranslated words or sentences of the source text. The Italian “del cittadino polacco Michalski Z.” was wrongly translated as “of a Polish citizen Michalski Z.”, with an indefinite article instead of the definite one needed to refer directly to the person named immediately afterwards: “of the Polish citizen Michalski Z.”. In the DGT MT system it was necessary, in some cases, to make the verb agree with a plural subject. During the post-editing stage, the sentence (second error category in Table 2) was therefore corrected to “GIVEN that the Italian Republic and the Republic of Sudan are parties to the UN Convention”, in accordance with English grammar rules.

In Google, the automatic translation in some cases completely mistranslated words or sentences of the source text, radically changing the meaning of the sentence or producing meaningless phrases (e.g., “Nel restare in attesa di cortese riscontro” translated as “While waiting to be courteous” rather than “Looking forward to your kind response”).

The next three tables report a detailed analysis of some typical errors found in the different MT engines tested.

Table 2. Error analysis: DGT MT

| DGT MT | MT output | Post-editing | Sources |
|---|---|---|---|
| Mistranslation | GAI; of the sentence handed down to; to perform the remaining sentence | JHA; of the sentence imposed against; to enforce the balance of sentence | TM; TB/TM; TM/TB |
| Concordance | since the Italian Republic and the Republic of the Sudan is a party to the UN Convention | given that the Italian Republic and the Republic of Sudan are parties to the UN Convention | human |
| Omission | the Court of Genoa | the court of Appeal of Genoa | TM |
| Inconsistency | Omitted | Omissis | TM |
| Terminology and style | Whereas; section VI extraditions and deliveries; asks | given that; section 4th Extraditions and Surrenders; requires | TM; TM; TM |
| Word order | Public Prosecutor’s Office; The Italian Republic shall ensure that extradition on the international application of reciprocity | General Prosecuting Office; Given that the Italian Republic ensures in the matter of extradition the application of international reciprocity | TB; TM |
| Grammar | Noted that the Tunisian national F., born in Tunis on..., the subject of international research; Please note that D.I. Romanian; the person extradited | Given that the Tunisian national F., born in Tunisia on…, is wanted internationally; Please note that D.I. is Romanian national; the extradited person | TM/TB |
| Formatting | on the application of the principle of mutual recognition to judgments in criminal matters imposing custodial sentences or measures involving deprivation of liberty for the purpose of […] | on the application of the principle of mutual recognition to judgments in criminal matters imposing custodial sentences or measures involving deprivation of liberty for the purpose of […] | human |

Table 3. Error analysis: Google MT

| Error category | MT output | Post-editing | Sources |
|---|---|---|---|
| Mistranslation | Delicts; To perform the remaining sentence; To be mentioned in the reply; While waiting to be courteous | Severe crimes; For the execution of the balance sentence; Please repeat when responding; In anticipation of your reply | TB; TB; TM; TB |
| Omission | On the basis of the MAE; and the Sudan; for the execution of the residual penality | On the basis of the aforementioned EAW; and the Republic of Sudan; for the execution of the balance of sentence | TB; TM; TB |
| Inconsistency | taked / determined | Given that | TM |
| Terminology and style | Section IV Extraditions and Delegation; Snip; as provided for in Article; P.Q.M.; remaining penalty; performance of a penalty; c.c.p. | Section 4th Extraditions and Surrenders; Omissis; as prescribed by the Article; For these reasons; balance of sentence; enforcement of a sentence; code of criminal procedure | TM; TB; TM; TB; TM; TM/TB |
| Word order | General Prosecutor’s Office of Republic; of the penal code polish | Office of the Prosecutor General of the Republic; of the Polish criminal code | TB; TM |
| Grammar | forms the subject of international research; Of a Polish citizen | is internationally wanted; Of the Polish national | TB; TM |
| Formatting | Having regard to | GIVEN that | TM |

Table 4. Error analysis: SDL Language Cloud MT

| Error category | MT output | Post-editing | Sources |
|---|---|---|---|
| Mistranslation | gives the following; was passed on; Said sentence; conviction to load of D.I. | pronounced the following; became final on; The aforementioned sentence; sentence against D.I. | TM; TB; TM; TB |
| Omission | Having regard to Article 17 The 22.4.2005 n. 69 | Given Article 17 of law no. 69 of 22.4.2005 | TM |
| Terminology and style | Section IV extradition and deliveries; sentence of condemnation; P.Q.M; MAE; Features; transfer area of prisoners; The General Prosecutor; fences; international courtesy; penalty remaining for; D. L.gs. | Section 4th Extraditions and Surrenders; judgement of conviction; For these reasons; EAW; ORDERS; Transfer of Sentenced Persons Unit; the General Prosecuting office; receiving of stolen goods; international Comity; balance of sentence; Legislative Decree | TM; TM; TM; TM; TM; TM; TM; TB; TB; TB; TM |
| Word order | penal code polish; Framework Decision 909/2008/JHA | Polish criminal code; Framework Decision 2008/909/JHA | TB/human; human |
| Grammar | of a Polish citizen; It represents | of the Polish national; It is declared that | TM; human |
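
Many of the corrections attributed to the termbase (TB) in the tables above replace a rejected rendering with a preferred term. As a rough illustration only, and not the quality assurance tool used in the study, such a check could be sketched in Python as follows. The dictionary `TERMBASE` and the function `check_terminology` are hypothetical names; the term pairs are copied from the MT output and post-editing columns of Tables 3 and 4, and case-insensitive whole-term matching is an assumption made for this sketch.

```python
import re

# Hypothetical mini-termbase: rejected MT rendering -> preferred term.
# The pairs come from the MT output / post-editing columns of Tables 3 and 4.
TERMBASE = {
    "MAE": "EAW",
    "P.Q.M.": "For these reasons",
    "remaining penalty": "balance of sentence",
    "international courtesy": "international Comity",
}


def check_terminology(segment, termbase=TERMBASE):
    """Return (rejected term, preferred term) pairs found in an MT segment."""
    findings = []
    for rejected, preferred in termbase.items():
        # Match whole terms only. The lookarounds act like word boundaries
        # but also work for terms that end in punctuation, such as "P.Q.M."
        pattern = r"(?<!\w)" + re.escape(rejected) + r"(?!\w)"
        if re.search(pattern, segment, flags=re.IGNORECASE):
            findings.append((rejected, preferred))
    return findings


if __name__ == "__main__":
    # Flag a segment that still contains the Italian acronym MAE.
    for rejected, preferred in check_terminology("On the basis of the MAE"):
        print(f"{rejected} -> {preferred}")
```

A check of this kind only flags candidate terms; the decision on whether and how to replace them stays with the post-editor, as in the workflow described in this study.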

5. Conclusions

This study outlines the advantages and drawbacks of applying MT to legal language. Thanks to the alignment and concordance functions linked to the use of a translation memory, issues of style and collocation were easier to handle, while the use of custom-designed termbases (based on human-translated texts) helped to achieve a certain consistency in the translation of technical terms. The use of MT combined with TBs and TMs for the translation of legal texts actually reduces translation times without diminishing the quality of the final output. A lot of work is nevertheless required: setting up the overall translation workflow and the translation project, cleaning and creating TMs, creating glossaries, using TBs at the translation stage, and carrying out a final quality control step with a quality assurance tool and human revision.

The next question to be investigated is whether linguistic competences in the legal field can be improved through the combined and integrated use of TMs, MT engines and TBs for educational purposes, together with the definition of best practices for the creation of ad hoc TMs and TBs.

References

Alcaraz, E., & Hughes, B. (2002). Legal Translation Explained. London: Routledge. doi: https://doi.org/10.4324/9781315760346

Bentivogli, L., Bisazza, A., Cettolo, M., & Federico, M. (2016). Neural versus Phrase-Based Machine Translation Quality: a Case Study. doi: https://doi.org/10.18653/v1/d16-1025

Bernardi, L. (2016). Il pre-editing e la traduzione automatica. Fondazione Milano. url: http://www.fondazionemilano.eu/blogpress/weaver/2016/03/19/regole-per-il-pre-editing/

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., ... Zampieri, M. (2016). Findings of the 2016 Conference on Machine Translation (WMT16). In Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers (pp. 131-198). Berlin, Germany: Association for Computational Linguistics. doi: https://doi.org/10.18653/v1/w16-2301

Diño, G. (Dec 18, 2017). 3 Reasons Why Neural Machine Translation is a Breakthrough. Slator. url: https://slator.com/technology/3-reasons-why-neural-machine-translation-is-a-breakthrough/ 

Fiorito, L. (2005). La traduzione giuridica e il «Legal English» tra Common law e Civil Law. Translation Journal, 9(3). url: http://www.bokorlang.com/journal/33legal.htm

Goutte, C., Cancedda, N., Dymetman, M., & Foster, G. (2008). Learning Machine Translation. Cambridge, London: MIT Press Ltd. doi: https://doi.org/10.7551/mitpress/9780262072977.001.0001

Kit, C., & Wong, T. (2008). Comparative evaluation of online machine translation systems with legal texts. Law Library Journal, 100(2): 299-321. url: https://pdfs.semanticscholar.org/6cb7/04046816107a56f9409e66fede58d67215e0.pdf

Koehn, P., & Knowles, R. (2017). Six Challenges for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation (pp. 28-39). doi: https://doi.org/10.18653/v1/w17-3204

Koehn, P. (2009). Statistical Machine Translation. doi: https://doi.org/10.1017/cbo9780511815829 

Lü, Y., Huang, J., & Liu, Q. (2007). Improving Statistical Machine Translation Performance by Training Data Selection and Optimization. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 343–350). url: http://www.aclweb.org/anthology/D07-1036

Mai, K. (2016). Use of MT/@EC by translators in the European Commission. 2nd ELRC Conference, Brussels. url: http://www.lr-coordination.eu/sites/default/files/Brussels_conference/Mai-K_ELRC-MT%28at%29EC%20in%20DGT_26_10_2016_K.%20Mai.pdf

Megale, F. (2008). Teorie della traduzione giuridica. Fra diritto comparato e Translation Studies. Napoli, Editoriale Scientifica. doi: https://doi.org/10.7202/045694ar

Muegge, U. (2007). Controlled language: rules for machine translation. url: http://www.muegge.cc/controlled-language.htm 

Riediger, H., & Galati, G. (2015). La traduzione e il web nell’epoca della traducibilità automatica. Come usare la TA e come scrivere e riscrivere i testi. Fondazione Milano. url: http://www.fondazionemilano.eu/blogpress/weaver/?wpdmact=process&did=NS5ob3RsaW5r

TAUS (2014). Statistical Machine Translation. url: https://www.taus.net/knowledgebase/index.php/Statistical_machine_translation 

Tinsley, J. (2017). Neural MT and the legal field. MultiLingual. url: https://www.multilingual.com/article/201706-28.pdf 

Toral, A., & Sánchez-Cartagena, V. M. (2017). A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 1 (pp. 1063-1073). doi: https://doi.org/10.18653/v1/e17-1100

Vashee, K. (Jun 1, 2018). Why Machine Translation Matters in the Modern Era. CMSWire. url: https://www.cmswire.com/customer-experience/why-machine-translation-matters-in-the-modern-era

Received on 14 October 2018 and accepted for publication on 26 February 2019.


Acknowledgments


Special thanks to my colleague Ida Zadotti for her support and help in developing this research.