Applications of Automatic Translation Quality Estimation in E-Commerce Machine Translation, Boxing Chen

1. Quality Estimation Technology and Its Applications in E-Commerce Machine Translation • Boxing Chen (陈博兴) • Machine Intelligence Technology (MIT), DAMO Academy, Alibaba Group
2. Outline • Applications and Challenges of MT in E-Commerce Domain • Multiple Strategies Improve E-Commerce Machine Translation • Evaluation and Estimation of Translation Quality • Human Evaluation of Translation Quality • Automatic Translation Quality Estimation • Applications of QE Technology in E-Commerce MT • Conclusions
3. My Short Bio • Worked in 5 countries as a machine translation researcher: France, Italy, Singapore, Canada and China • 20+ No. 1 rankings in various MT-related evaluations: 12 MT evaluation tasks (IWSLT, TC-STAR, NIST, BOLT, WMT, etc.), 6 WMT MT quality estimation sub-tasks, 2 WMT MT metric tasks • Published 50+ papers, including “best paper award”, “best paper award nomination”, etc.
4. Applications of MT in E-Commerce • Website Traffic Drainage (wide coverage and accuracy of targeting): 1. Traffic drainage words 2. Landing page 3. Creative ideas • Insite Search (accurate translation of query and category): 1. Language identification 2. Spelling correction 3. Query translation 4. Category translation • Customer Conversion (native translation of commodity information): 1. Information rewriting 2. Title translation 3. Description/comment translation • Preserve & Repurchase (trust & service): 1. Overall information quality 2. Customer service 3. Localization service 4. Communication between seller & buyer • MT service supports all four stages • Business index: UV / Cost / L-D / L-D / D-O / GMV
5. Alibaba MT’s Capability • Product information: X billions/day (offline) • User calls: X billions/day • 11.11 peak: X 10 thousands/second • Translated words: X 100 billions/day • Business supported: X 10 billions USD/year
6. Business Ecology of Alibaba Translation • International business of Alibaba: AliExpress, Alibaba, Dingtalk, Lazada, Tmall HK, …… • Scenario solutions: Title, Category & Attribute, Description, Query, Comment, Message, Internet, Audio/Image • Technical capability: Machine Translation Platform, Human Translation Crowdsourcing Platform, Data Platform • Goals: high-quality, trustable; fast, low-cost • CAT to reduce cost • Data accumulation to improve MT • Data acquisition: general corpus, in-domain corpus & KB • Data production • Business data
7. Challenges of MT in E-Commerce • Translation quality: readable output of target language; accurate translation of key information; flexible intervention mechanism • Speed: fast training on large-scale corpus; high concurrency; low latency of inference • Service quality: high availability; flexible and rich interface; extendable to many language pairs; efficient deployment and update
8. Data Strategy: Quantity • Parallel data sources: Internet data crawler (bilingual pages, data analysis); purchase / exchange of corpora; human translation (outsourcing, crowdsourcing) • Quantity: 20+ language pairs • Main language pairs, such as Chinese-English: N*10E8 (hundreds of millions) • Most language pairs, such as Chinese-French: N*10E7 (tens of millions) • Low-resource language pairs, such as Chinese-Vietnamese: N*10E6 (millions)
9. Data Strategy: Quality, Relevance • Quality estimation of corpora: IBM model; RNN forced decoding • Data classification of corpora: data source information; topic-model-based data selection; language-model-based data selection; CNN-based data classification • Result: in-domain data and general-domain data
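10. MT Knowledge Base • Ali knowledge graph: tens of billions of entities (2017) • Techniques: NER, NEL, data mining • Product data slimming: synonym, hypernym, etc. (example terms: dress, lace, sleeveless) • [Figure: multi-dimensional, multilingual knowledge base. Recoverable labels: languages (Chinese, English, Russian, Spanish, …); word form; part of speech; case; semantics; synonym; hypernym; abbreviation; language rules; linguistic rules; linguistics; ontology; schema; domain; application scenario; category; size; style; material; brand; place of origin; region; person; enterprise; product item; express delivery; communication; shopping; travel; dining; spoken language; product translation; travel translation; real-time communication; extensible; applicability]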
11. MT Model Strategy: RBMT • Rule-Based Machine Translation • Direct matching • Maps input to output with basic rules • Rules are designed by humans • Scenarios: numbers, dates, addresses, product info, named entities (NE)
12. MT Model Strategy: SMT • Statistical Machine Translation [Koehn, 2003] • Scenarios: product title, query
13. MT Model Strategy: NMT • Neural Machine Translation • Models: RNN-based seq-seq model; Transformer [Vaswani et al., 2017] • Scenarios: product description, message, offer, comments
14. MT Model Strategy: Constrained Translation • Customized translation for certain words/phrases • A pre/post processing module for NMT • Using attention information for alignment • Trained using sentences with aligned tags • Shared src and tgt tag embeddings. [Kuang et al., ACL 2018]
15. Alibaba @ WMT 2018 News Task
16. Outline • Applications and Challenges of MT in E-Commerce Domain • Multiple Strategies Improve E-Commerce Machine Translation • Evaluation and Estimation of Translation Quality • Human Evaluation of Translation Quality • Automatic Translation Quality Estimation • Applications of QE Technology in E-Commerce MT • Conclusions
17. Translation Evaluation and Quality Estimation • Translation evaluation: measure the quality of the translations against golden references. • Quality: Fluency, adequacy? Distance to a correct version? System A vs System B? How many major and minor errors? • Why do we need translation evaluation? 1. To allow rapid comparisons between different systems. 2. To enable parameter tuning during system training.
18. Translation Evaluation and Quality Estimation • Quality estimation: estimate the quality of the translations without golden references. • Quality: Can we publish it as is? Is it worth post-editing? Can a reader get the gist? How much effort is needed to fix it? • Why do we need quality estimation? 1. Quality control in the translation industry. 2. Parallel data cleaning before system training.
19. Automatic Evaluation Metrics • MT metrics measure the quality of the translations against human references. • BLEU and Meteor are the two most widely used metrics.
20. Problems with Reference-Based Evaluation • Requires human references • Reference(s): only a subset of good translations • Huge variation in reference translations • Metrics completely disregard the source segment • Cannot be applied to MT systems in use • Increased scores do not necessarily indicate improved translation quality (Credit: Lucia Specia)
21. Human Evaluation Methods Credit: Lucia Specia
22. Disadvantages of Human Evaluation • Time-consuming, high cost: only a small portion of translations can be evaluated • Subjective: low inter- and intra-annotator agreement
23. Outline • Applications and Challenges of MT in E-Commerce Domain • Multiple Strategies Improve E-Commerce Machine Translation • Evaluation and Estimation of Translation Quality • Human Evaluation of Translation Quality • Automatic Translation Quality Estimation • Applications of QE Technology in E-Commerce MT • Conclusions
24. Automatic Quality Estimation • Estimate the quality of translation at run-time • Estimate the quality of translation without any reference translation • Machine Translation Outputs (MTs) → Quality Estimation System (QE system) → Good MTs / Bad MTs
25. Sentence- & Word-Level Quality Estimation • Sentence-level: sentence scoring according to post-editing (PE) effort, i.e. the percentage of edits needed to fix the translation (HTER) • Word-level, word tagging: predict OK/BAD tokens (number of tags = number of tokens) • Word-level, gap tagging: predict OK/BAD gaps, i.e. predict missing tokens (number of tags = number of tokens + 1) • Example: SRC: I have a red apple . MT: 我 有 一个 粉 苹果 。 PE: 我 有 一个 红 苹果 。 HTER: 1/6 = 0.167 (1 replacement) Word Tags: OK OK OK BAD OK OK Gap Tags: OK OK OK OK OK OK OK
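
The labels on this slide can be derived from an MT/PE pair with a word-level edit-distance alignment. The following is a minimal sketch, not the official WMT tooling: it assumes a plain Levenshtein alignment (no block shifts, which full TER also counts) and normalizes by the post-edit length. The function names `edit_table` and `qe_labels` are hypothetical.

```python
def edit_table(mt, pe):
    """Word-level Levenshtein cost table between MT tokens and PE tokens."""
    n, m = len(mt), len(pe)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (mt[i - 1] != pe[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d


def qe_labels(mt, pe):
    """Return (hter, word_tags, gap_tags) for one MT/PE token pair.

    Assumption: shifts are ignored, so this approximates HTER.
    """
    d = edit_table(mt, pe)
    i, j = len(mt), len(pe)
    word_tags = ["OK"] * len(mt)        # one tag per MT token
    gap_tags = ["OK"] * (len(mt) + 1)   # one tag per gap, including both ends
    # Walk the alignment backwards from the bottom-right cell.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (mt[i - 1] != pe[j - 1]):
            if mt[i - 1] != pe[j - 1]:
                word_tags[i - 1] = "BAD"   # token was replaced in the post-edit
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            word_tags[i - 1] = "BAD"       # extra MT token that was deleted
            i -= 1
        else:
            gap_tags[i] = "BAD"            # PE token missing from the MT here
            j -= 1
    hter = d[len(mt)][len(pe)] / max(len(pe), 1)
    return hter, word_tags, gap_tags


mt = "我 有 一个 粉 苹果 。".split()
pe = "我 有 一个 红 苹果 。".split()
hter, word_tags, gap_tags = qe_labels(mt, pe)
print(round(hter, 3), word_tags, gap_tags)
```

On the slide's example this gives one replacement over six post-edited tokens, a single BAD word tag on 粉, and seven OK gap tags.
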
26. Quality Estimation System • Feature Extractor → Quality Estimator (Credit: Lucia Specia)
27. QE System @ Alibaba: QE Brain
28. Feature Extractor: Bi-directional Transformer • Three components: (1) Self-attention encoder for the source sentence (2) Forward and backward self-attention encoders for the target sentence (3) Reconstruction of the target sentence
29. Features in QE Brain
30. Quality Estimator: Bi-directional LSTM • Concatenate the features along the depth direction to obtain a single feature vector • Sentence-level scoring can be formulated as a regression problem • Word tagging prediction is a sequence labeling problem • Gap tagging prediction is a sequence labeling problem
31. QE Model: Performance Boosting Strategy • Human-crafted features (+HF): human-crafted features are introduced as additional linear components for the predictive layer, with a sigmoid activation function, in the input of the Bi-LSTM quality predictive model. • Fine-tuning with artificial QE data (+FT): round-trip translation in the APE task provides supplementary training data, increasing the diversity of erroneous translations during training so that overfitting is reduced. • Greedy ensemble selection (Ensembling): the greedy ensemble selection algorithm, Focused Ensemble Selection (FES), reduces the size of averaging ensembles while improving their efficiency and predictive performance. • Sentence-level QE, Test 2017 en-de (single model / +HF / +FT / Ensembling): Pearson’s r 0.6837 / 0.6842 / 0.6957 / 0.7159; Spearman’s ρ 0.7091 / 0.7150 / 0.7205 / 0.7402 • Sentence-level QE, Test 2017 de-en (single model / +HF / +FT / Ensembling): Pearson’s r 0.7099 / 0.7085 / 0.7128 / 0.7338; Spearman’s ρ 0.6424 / 0.6551 / 0.6422 / 0.6700
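
To illustrate the idea of greedy ensemble selection for averaging ensembles, here is a minimal sketch. It is a plain forward-selection loop, not necessarily the FES variant named on the slide: at each step it adds the model whose inclusion most improves Pearson's r of the averaged predictions on a development set, and stops when nothing helps. The names `pearson`, `average` and `greedy_ensemble`, and the toy data, are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def average(preds):
    """Element-wise mean of several prediction lists."""
    return [sum(col) / len(col) for col in zip(*preds)]


def greedy_ensemble(model_preds, gold):
    """Greedy forward selection of models for an averaging ensemble.

    model_preds: dict mapping a model name to its dev-set predictions.
    gold: dev-set gold scores (e.g. HTER).
    """
    selected, best = [], float("-inf")
    remaining = set(model_preds)
    while remaining:
        scored = []
        for name in sorted(remaining):
            members = selected + [name]
            r = pearson(average([model_preds[m] for m in members]), gold)
            scored.append((r, name))
        r, name = max(scored)
        if r <= best:          # no candidate improves the ensemble: stop
            break
        selected.append(name)
        best = r
        remaining.remove(name)
    return selected, best


# Toy dev set: models "a" and "b" make opposite errors, "c" is unhelpful.
gold = [0, 1, 2, 3]
preds = {"a": [1, 1, 2, 3], "b": [-1, 1, 2, 3], "c": [3, 0, 3, 0]}
print(greedy_ensemble(preds, gold))
```

In the toy data the two models with complementary errors are selected and the unhelpful one is left out, which is the effect the slide attributes to ensemble selection: a smaller ensemble with better correlation.
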
32. QE Experimental Results @ WMT 2018, Sentence-level (Pearson’s r / Spearman’s ρ) • Test 2018 en-de SMT: Baseline 0.365 / 0.381; Competitor 0.700 / 0.724; Our System 0.731 / 0.747 • Test 2018 de-en SMT: Baseline 0.332 / 0.325; Competitor 0.767 / 0.726; Our System 0.763 / 0.732 • Test 2018 en-de NMT: Baseline 0.287 / 0.420; Competitor 0.513 / 0.605; Our System 0.501 / 0.605
33. QE Experimental Results @ WMT 2018, Word-level (F1-Multi) • Test 2018 en-de SMT: Baseline 0.363; Competitor 0.430; Our System 0.607 • Test 2018 en-de NMT: Baseline 0.181; Competitor 0.291; Our System 0.435 • Test 2018 de-en SMT: Baseline 0.437; Competitor 0.424; Our System 0.593
34. Alibaba @ WMT18 Quality Estimation Task
35. Outline • Applications and Challenges of MT in E-Commerce Domain • Multiple Strategies Improve E-Commerce Machine Translation • Evaluation and Estimation of Translation Quality • Human Evaluation of Translation Quality • Automatic Translation Quality Estimation • Applications of QE Technology in E-Commerce MT • Conclusions
36. QE in E-Commerce MT Loop • MT model → MT log / e-commerce data → Product Selection → Quality Estimation → Improve Translation Quality → Model Upgrade → MT model
37. Product Selection Strategy • List, Favorite, Deal → Product Selection → high-value products
38. Quality Estimation • Quality estimation according to different human translation quality measures • WMT: HTER score (previous slides) • E-Commerce: linguist LQI score • E-Commerce: scores based on users’ behavior • E-Commerce: crowdsourcing evaluation score
39. QE Based on LQI Score • [Figure: QE score vs. LQI score]
40. QE Based on Crowdsourcing Evaluation • Educated crowdsourcing evaluation: http://crowdsourcing.aliexpress.com
41. QE Based on Users’ Behavior • Pipeline of users’ behavior: Traffic Drainage → Insite Search → Purchasing → Preserve & Repurchase • Business index: L-D: list-to-description rate; D-O: description-to-order rate; RR: repurchase rate
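
The three business indexes can be computed from simple funnel counts. The sketch below is illustrative only: the slide names the indexes but does not define the exact numerators and denominators, so the ratios used here (description views over list impressions, orders over description views, repeat buyers over buyers) and the `FunnelCounts` fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounts:
    """Hypothetical per-product (or per-translation-variant) event counts."""
    list_impressions: int    # product shown in a search/list page
    description_views: int   # user opened the product description page
    orders: int              # orders placed from the description page
    buyers: int              # distinct buyers
    repeat_buyers: int       # buyers who purchased again


def safe_ratio(numerator, denominator):
    """Ratio that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def business_indexes(c):
    """Compute L-D, D-O and RR under the assumed definitions above."""
    return {
        "L-D": safe_ratio(c.description_views, c.list_impressions),
        "D-O": safe_ratio(c.orders, c.description_views),
        "RR": safe_ratio(c.repeat_buyers, c.buyers),
    }


counts = FunnelCounts(list_impressions=1000, description_views=200,
                      orders=20, buyers=18, repeat_buyers=3)
print(business_indexes(counts))
```

Comparing these rates for the same product before and after a translation change is one way such behavior signals could serve as a quality score.
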
42. Improve Translation Quality • Model fine-tuning • Crowdsourcing translation • Expert translation
43. Crowdsourcing Translation • Crowdsourcing translation: http://crowdsourcing.aliexpress.com • Flow: Start → User Translation → User Voting → Seller Confirming → Accept? (Yes / No) → if Yes: Update Translation → End
44. QE in E-Commerce MT Loop • MT model → MT log / e-commerce data → Product Selection → Quality Estimation → Improve Translation Quality → Model Upgrade → MT model
45. Applications of QE Model: Training Data Filtering • Training data quality is a key factor in NMT performance. • Collected parallel data → QE-model-based data filtering → Cleaned parallel data
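
The filtering step on this slide can be sketched as a threshold over a per-pair quality score. This is a minimal illustration, not the production pipeline: `filter_parallel` takes any scoring function that returns a lower-is-better score (such as a predicted HTER), and `toy_qe_score` is a hypothetical length-ratio stand-in, not a real QE model. The threshold of 0.3 is likewise an arbitrary assumption.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def filter_parallel(pairs: Iterable[Pair],
                    qe_score: Callable[[str, str], float],
                    max_score: float = 0.3) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (kept, dropped) by a lower-is-better QE score."""
    kept, dropped = [], []
    for src, tgt in pairs:
        if qe_score(src, tgt) <= max_score:
            kept.append((src, tgt))
        else:
            dropped.append((src, tgt))
    return kept, dropped


def toy_qe_score(src: str, tgt: str) -> float:
    """Hypothetical stand-in for a QE model: penalizes length mismatch only."""
    ls, lt = len(src.split()), len(tgt.split())
    if ls == 0 or lt == 0:
        return 1.0  # an empty side is treated as the worst quality
    return 1.0 - min(ls, lt) / max(ls, lt)


collected = [
    ("red lace dress", "robe rouge en dentelle"),
    ("free shipping", ""),
    ("new", "nouveau modèle de robe sans manches"),
]
cleaned, rejected = filter_parallel(collected, toy_qe_score)
print(len(cleaned), "kept;", len(rejected), "dropped")
```

In practice the scoring function would be a trained sentence-level QE model, and the threshold would be tuned on held-out data.
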
46. Applications of QE Model: Automatic Post-Editing • Unify quality estimation and automatic post-editing • Both words and gaps can be predicted when conditioning on the entire source and the context in the target.
47. Applications of QE Model: Human Translation Quality Control • Integrated into the human translation quality control process: Translating → QE → Editing → QE → Proofing
48. Conclusions • Machine translation is not perfect, but it can bring real value in some scenarios, such as cross-border e-commerce. • RBMT, SMT, NMT and constrained translation all have advantages in different scenarios of e-commerce translation. • Alibaba has state-of-the-art machine translation technology. • The bidirectional transformer is a strong representation encoder. • The QE system based on the bidirectional transformer achieved the best results. • Quality estimation is a crucial component of industrial machine translation; the QE-based MT improvement loop helps e-commerce translation. • We are open for collaboration!