DeepL
History
- Gereon Frahling and Leonard Fink founded Linguee at the end of 2007. Linguee scraped bilingual text samples from the Internet using web crawlers and then applied machine learning algorithms to evaluate their quality. Linguee also worked intensively with in-house language specialists and hundreds of freelance linguists to assess the quality of the translated sentence pairs.
- The team later leveraged Linguee's existing translation search engine dataset to build a new neural machine translation service: DeepL.
- Most of DeepL's technological details are kept secret.
Let's start with Gruenderszene's 2013 interview with Gereon Frahling, Linguee's co-founder.
Frahling received his university diploma in mathematics in 2002, earned his doctorate in theoretical computer science in 2006, and completed a post-doc at Google in New York in 2007 before withdrawing for a year and a half with Leonard Fink to work on Linguee. At the time of the interview, the Cologne-based company processed more than five million queries a day and employed 14 full-time staff. Linguee had also recently reported reaching break-even.
I am the founder and CEO of Linguee. I'm a mathematician by background and have spent a lot of my life working on statistical analysis of extremely large amounts of data, first at university and then at Google after my PhD.
From 2007 to 2008, I worked in Google's research department in New York. At that time, of course, I had to use online dictionaries very often. What bothered me then was that you never got translations displayed in context and had access to very little information. That's when I realized how valuable a search engine would be where all the translations in the world could be searched. Because the development of a search engine fit so well with my knowledge, I quit Google and implemented this together with a friend.
I founded Linguee at the end of 2007. If someone has a problem translating a certain group of words, he can look at Linguee to see if a translator somewhere in the world has already translated this exact phrase and use the translation as a guide.
Timeline
- 2006: Google Translate launches, using statistical machine translation (SMT).
- 2007: Linguee is founded.
- 2016: Neural machine translation (NMT) becomes state of the art, easily outperforming SMT-based models, and Google Translate switches from SMT to NMT. Meanwhile, within Linguee GmbH, a team led by Jaroslaw Kutylowski begins working on the first version of DeepL Translator, a neural network-based online translator, leveraging the existing dataset of the translation search engine Linguee. Kutylowski says they first used neural models in areas other than MT, for example to improve quality checking and monitoring for Linguee. But they soon realized the approach worked very well for machine translation, so they pulled staff from other projects to work on it.
- 2017: DeepL Translator is released to the public, offering free translations between English, German, French, Spanish, Italian, Polish, and Dutch. DeepL employs 20 full-time staff, about half of whom are engineers; the rest are product managers and quality editors who liaise with its more than 500 freelancers (mostly translators) on quality checking and monitoring.
- 2018: Russian and Portuguese are added. The company also attracts an investment from one of Silicon Valley's most high-profile venture capital firms, Benchmark Capital (early-stage investor in Dropbox, eBay, Instagram, Snapchat, Twitter, Uber, and Yelp, among others), which takes a 13.6% stake. Other investors include btov, a European venture capital firm with offices in Germany, Switzerland, and Luxembourg.
- 2020: Chinese and Japanese are added. The number of full-time staff at DeepL doubles between 2019 and 2020, from 43 to 86, according to regulatory filings, which also show EUR 142,000 in annual profit for 2020. The company's original headquarters near the Rhine River in Cologne closes its doors in March 2020; a new office opens, still in the city center, where the staff occupies two floors. An office also opens in the "Silicon Roundabout," London's startup hub, and the company is currently recruiting for Amsterdam-based roles.
- 2021: Bulgarian, Czech, Danish, Estonian, Finnish, Greek, Hungarian, Latvian, Lithuanian, Romanian, Slovak, Slovenian, and Swedish are added.
- 2022: Turkish, Indonesian, and Ukrainian are added.
Secret Sauce
DeepL remains tight-lipped about its secret sauce. The company shies away from attending conferences or releasing research papers, which is unsurprising given that keeping its technology a trade secret is key to maintaining its competitive edge against bigger competitors. In contrast, Google, Microsoft, and Facebook, while also maintaining some degree of secrecy, engage more actively with the academic community. Google, for instance, has published 107 papers on machine translation, including eight in 2021.
Early DeepL marketing materials provided some hints, stating that DeepL was built on convolutional neural networks (CNNs), a type of neural network more commonly used for analyzing images. Meanwhile, the first version of Google Translate's NMT, released in 2016, ran on recurrent neural networks (RNNs). The Transformer model (arising from Google research published in 2017) is widely recognized as producing superior results and is now the dominant paradigm.
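To make the paradigm shift concrete: the core operation that distinguishes the Transformer from CNNs and RNNs is scaled dot-product attention, as described in the public 2017 Google research. The sketch below illustrates that published mechanism only; it says nothing about DeepL's undisclosed architecture, and all names and shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation (per the 2017 "Attention Is All You Need" paper).

    Q, K: (seq_len, d_k) query/key matrices; V: (seq_len, d_v) value matrix.
    Each output position is a weighted average of the values, weighted by
    how well that position's query matches every key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # (seq_len, d_v)

# Toy self-attention: 3 tokens, 4-dimensional representations
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

Unlike an RNN, which processes tokens one after another, every position here attends to every other position in a single matrix multiplication, which is one reason the architecture trains so efficiently on parallel hardware.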
Indeed, DeepL blogged about a major change to its approach to neural networks in February 2020. While disclosing no details, it stated that its AI researchers had achieved another breakthrough in quality, resulting in more precise target texts. Here is a quote from the company blog:
It is well known that most publicly available translation systems are direct modifications of the Transformer architecture. Of course, the neural networks of DeepL also contain parts of this architecture, such as attention mechanisms. However, there are also significant differences in the topology of the networks that lead to an overall significant improvement in translation quality over the public research state of the art. We see these differences in network architecture quality clearly when we internally train and compare our architectures and the best known Transformer architectures on the same data.
Most of our direct competitors are major tech companies, which have a history of many years developing web crawlers. They therefore have a distinct advantage in the amount of training data available. We, on the other hand, place great emphasis on the targeted acquisition of special training data that helps our network to achieve higher translation quality. For this purpose, we have developed, among other things, special crawlers that automatically find translations on the internet and assess their quality.
In public research, translation networks are usually trained using the "supervised learning" method. The network is shown different examples over and over again, and repeatedly compares its own translations with the translations from the training data. If there are discrepancies, the weights of the network are adjusted accordingly. We also use techniques from other areas of machine learning when training our neural networks, which allows us to achieve further significant improvements.
Meanwhile, we (like our largest competitors) train translation networks with many billions of parameters. These networks are so large that they can only be trained in a distributed fashion on very large dedicated compute clusters. However, in our research we attach great importance to the fact that the parameters of the network are used very efficiently. This is how we have managed to achieve a similar translation quality even with our smaller and faster networks. We can therefore also offer very high translation quality to users of our free service.
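The supervised loop the blog describes, showing the network an example, comparing its output with the reference translation, and adjusting the weights when they differ, is the standard cross-entropy training recipe. The sketch below illustrates that generic recipe on a toy linear "model" over a tiny vocabulary; it is in no way DeepL's actual system, and every name and number in it is made up for illustration.

```python
import numpy as np

vocab = 5
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(vocab, vocab))    # the trainable weights

def predict(src_id):
    """Toy 'model': turn one source token into a distribution over target tokens."""
    logits = W[src_id]
    p = np.exp(logits - logits.max())
    return p / p.sum()                             # softmax

# (source token, reference token) pairs play the role of the bilingual training data
pairs = [(0, 3), (1, 2), (2, 4)]

for step in range(200):
    for src, ref in pairs:
        p = predict(src)                           # the network's own "translation"
        grad = p.copy()
        grad[ref] -= 1.0                           # d(cross-entropy)/d(logits): p - one_hot(ref)
        W[src] -= 0.5 * grad                       # adjust weights toward the reference

# After training, the model picks the reference token for each source token.
print([int(np.argmax(predict(s))) for s, _ in pairs])  # [3, 2, 4]
```

A real NMT system replaces the linear lookup with a deep sequence-to-sequence network and the token pairs with billions of sentence pairs, but the compare-and-adjust loop is the same; DeepL's point is that it layers additional, undisclosed techniques on top of this baseline.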
And here is their claim about blind test results:
We are pleased to inform you that we have launched a completely new translation system that represents another quantum leap in translation quality. This has prompted us to conduct new blind tests. We translated 119 lengthy passages from a wide variety of subjects using DeepL Translator and some competing systems. We then asked professional translators to evaluate these translations and choose the best translation, without being informed which system produced which translation. The translators selected the translations from DeepL four times more often than those from any other system (Google, Amazon, Microsoft):
Kutylowski gave away very little about the model they use to run their neural machine translation. He stressed that they read a lot of what is being published in the space and combine that information with their own ideas and insights in developing DeepL. Regardless of the model used, DeepL's access to Linguee's curated translation data is an important asset for the company, as high-quality bilingual translation data has become a sought-after commodity.
References
- Magdalena Räth (December 11, 2013). "Linguee: Wir haben uns 18 Monate vergraben" ["Linguee: We holed up for 18 months"]. Gruenderszene.
- Anna Wyndham (September 15, 2021). "Inside DeepL: The World’s Fastest-Growing, Most Secretive Machine Translation Company". Slator.
- DeepL (November 1, 2021). "How does DeepL work?"
- DeepL (February 6, 2020). "Another breakthrough in AI translation quality".
- Florian Faes (October 2017). "Why DeepL Got into Machine Translation and How It Plans to Make Money". Slator.
- DeepL. "Company Profile".