Internationalization and Localization in the age of agents

Last week, I was working on internationalizing our product. If you've never encountered these terms, this refers to the translation and localization o


Complexity

I have extensive experience with how this worked in the pre-agent era, and a general idea of the scope and complexity of the work. In very simple terms, you need to find every string that is displayed to the user and wrap it in a localization function call. For example, "Hello!" becomes _("Hello!"). When the user needs to see a greeting, the _ function looks up the corresponding translation in a special dictionary. So the user will see "Привет!" if they have Russian selected. The same applies to date formats, prices, and number separators. There are also a ton of nuances around plural forms, such as "You have one new message," "You have two new messages," or "You have N new messages." Plural rules vary between languages. And then there are languages that are written from right to left (Arabic, Hebrew).

The same word can have different meanings in different contexts and on different screens. My favorite is "OK," which looks appropriate in an English interface, but in Russian can be translated as "Accept," "Agree," "Yes," or "Confirm," depending on the context. On top of that, sometimes the interface simply doesn't have room for long text, so you have to find creative ways to fit it in. This was especially true for older games, where the interface couldn't adapt to long messages at all. You have three lines in English, but in Russian the same text is 100 characters longer.

Telegram uses an interesting approach. Their translation platform, https://translations.telegram.org/en/ios/login/, shows each localized string in the context of the interface where it appears. This is much clearer, but there is still room for mistakes: somewhere in the depths of the product there may be a message whose meaning you will never guess without understanding the actions that led to it.

Once you have a basic list of phrases and a translation tool, the hardest part begins: the painstaking work of localization. Even within the same language group, people communicate differently. How many people can be offended by the wrong "in" or "on" (Ukrainians never use 'on Ukraine', Russians never use 'in Ukraine')? And how difficult it is to translate jokes and humor.

I participated in translating Ubuntu and Facebook into Russian and Ukrainian. I wasn't paid for it, but I was curious to learn how they organized the process. They allowed people to enter their own translation for each available phrase, and then translators could vote for the most correct one. The winning translation was then included in the final selection. Commercial products, on the other hand, have to pay their translators, and you need someone to coordinate the localization work. Just imagine you have a big release coming up in a few hours, and a new interface with hundreds of lines of text and illustrations is added to the final version of the product. The Chinese translator, who charges tens of dollars an hour, has already gone to bed. And then everyone discovers that no one has marked up the strings in the interface at all!

Adding language support to a product can add tens of percent to the development cost. Phrases are only part of it: images, interface solutions, sounds, and sometimes videos matter too. Localization and translation are not the same thing. Localization can also include casting. Just imagine an ad for a cream with black actors aimed at a minority audience, such as people from the Caucasus. Viewers simply won't identify with the people on screen. Add politics, censorship, and accessibility issues, and the process becomes complex and multifaceted. The business collects requirements, the developer implements them in the product, and the localizers race to translate and update everything by the release date.

Therefore, I always recommend adding multilingual support only once it is clear that the business actually needs it and that the investment will pay off. It's also advisable to reach a clear product-market fit and go through a couple of pivots first, so that localization doesn't stall new releases.

Process

On Monday, we found ourselves at a point where we needed to add support for a new language to the product. We had already triaged and even specified this task; it was just waiting for its moment and priority.

Using agents, I completed the preliminary work and marked all the strings for translation in the product's backend. Before starting, my thinking was that although we had never set translation as an explicit goal, we had generally maintained some basic hygiene. I thought we had strings marked up almost everywhere, and I expected a simple and straightforward job. To show just how wrong my estimate was: this week's work came to 16,000 (!) lines of code. And that's not including the frontend. And that's not including the translations themselves, which haven't been added yet.

How long would it have taken me to do the same thing manually? Even compared with working with autocomplete, I think the productivity increase is about 20 times. And if you factor in human error, it's even greater. In the past we wasted so much time because someone forgot to mark up some insignificant phrase, and it popped up like a pimple on the nose right during the first demo. It has become a running joke: we show users a registration form, and the very first validation displays a message in a language the client doesn't understand. Are we protected against this situation when using agents? Let me reiterate that I have experience; I know where to look, how to mark up, and which rules decide when to use lazy translation strings and when to use regular ones (gettext_lazy versus gettext, or similar).
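The lazy-versus-regular distinction is easiest to see in a toy model. The sketch below is not Django's implementation; CATALOGS, activate, and LazyString are hypothetical stand-ins that only mimic the behavior of gettext and gettext_lazy. The point it illustrates: a string translated at import time is frozen in whatever language was active then, while a lazy string is resolved only when it is rendered.

```python
# Hypothetical per-language catalogs; the active language is module state.
CATALOGS = {
    "en": {},
    "ru": {"This field is required.": "Обязательное поле."},
}
_active_language = "en"


def activate(language: str) -> None:
    """Switch the language used by subsequent lookups."""
    global _active_language
    _active_language = language


def gettext(message: str) -> str:
    """Eager lookup: translate right now, in the currently active language."""
    return CATALOGS[_active_language].get(message, message)


class LazyString:
    """Defers the lookup until the string is actually rendered."""

    def __init__(self, message: str) -> None:
        self._message = message

    def __str__(self) -> str:
        return gettext(self._message)


def gettext_lazy(message: str) -> LazyString:
    return LazyString(message)


# Module-level constants are evaluated once, at import time, while the
# default language ("en") is still active.
EAGER_ERROR = gettext("This field is required.")
LAZY_ERROR = gettext_lazy("This field is required.")

# Later, a request from a Russian-speaking user arrives.
activate("ru")
print(EAGER_ERROR)      # still English: the lookup already happened
print(str(LAZY_ERROR))  # Russian: the lookup happens at render time
```

This is the mechanism behind the registration-form joke: a validation message defined at module level with the eager function stays in the default language no matter which locale the user picked.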

If I had to build the entire mental model of the process from scratch, just working through the tutorials and translation technologies would take time. If you want to get an idea, read the Django documentation on internationalization and localization: https://docs.djangoproject.com/en/6.0/topics/i18n/. Keep in mind that this documentation covers localization in one specific framework only and says nothing about selecting and setting up additional localization tools.

I've been talking about the interface all this time, but that is a piece of cake compared with what happens once the question of data localization arises. When we last planned the project, we calculated how many unique strings we would have to translate and came up with about a million characters, not counting spaces. This is the data that appears in the user interface in graphs, tables, and exports to Excel. I've already implemented a database translation infrastructure. It's rudimentary for now, but conceptually this is the direction we'll be moving in. There's no industry-standard solution accepted by the community, and there can't be one; every project tackles this issue in its own way.
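An estimate like this can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the translation_volume function and the sample strings are made up for illustration, and how the strings are pulled from the database is left out.

```python
from typing import Iterable, Tuple


def translation_volume(strings: Iterable[str]) -> Tuple[int, int]:
    """Return (unique string count, total characters excluding whitespace)."""
    # Deduplicate after trimming, so "Total revenue " and "Total revenue"
    # are counted (and paid for) only once.
    unique = {s.strip() for s in strings if s.strip()}
    characters = sum(len("".join(s.split())) for s in unique)
    return len(unique), characters


sample = ["Total revenue", "Total revenue ", "Net profit", ""]
count, characters = translation_volume(sample)
print(count, characters)
```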

Result

If you're interested in the technical details, new columns with a language suffix are added at the database level, and the ORM decides which field to return to the user based on the locale and on whether a translation is available. For example, if you have a table with a 'name' column, it gets a 'name_es' counterpart if it needs to be translated into Spanish, or 'name_ru' if it needs to be translated into Russian. This option may not be suitable for you, but it suits us for many reasons.
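The lookup rule can be sketched independently of any ORM. In the example below a plain dictionary stands in for a database row, and the localized function and DEFAULT_LOCALE constant are assumptions for illustration, not our actual code.

```python
from typing import Mapping, Optional

DEFAULT_LOCALE = "en"  # assumption: the unsuffixed column holds English


def localized(row: Mapping[str, Optional[str]], field: str, locale: str) -> str:
    """Return the suffixed column for the locale, else the base column."""
    if locale != DEFAULT_LOCALE:
        translated = row.get(f"{field}_{locale}")
        if translated:  # a missing column, NULL, or "" all mean "untranslated"
            return translated
    base = row[field]
    return base if base is not None else ""


row = {"name": "Revenue", "name_es": "Ingresos", "name_ru": None}
print(localized(row, "name", "es"))  # translation exists
print(localized(row, "name", "ru"))  # column exists but is empty: fallback
print(localized(row, "name", "de"))  # no such column: fallback
```

The fallback matters in practice: a half-translated table still renders, just with some values in the default language.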

Furthermore, I built a translation infrastructure using agents. There's a tool that reads a file with interface phrases and translates them into the target languages, along with data from the database. Plus, there's TranslatorBrain, a database where we store all completed translations so that we don't pay for repeated API calls when a phrase has already been translated. This matters all the more because our work requires deploying separate instances of our project for different tasks, each using different data for its calculations.
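The idea of a translation store that avoids repeated API calls can be sketched with sqlite3 from the standard library. Everything here is hypothetical: the TranslationMemory class, the table schema, and the fake_api stub are illustrative and do not describe how TranslatorBrain is actually built.

```python
import sqlite3
from typing import Callable, List, Tuple


class TranslationMemory:
    """Caches finished translations keyed by (source text, target language)."""

    def __init__(
        self, translate_api: Callable[[str, str], str], db_path: str = ":memory:"
    ) -> None:
        self._api = translate_api
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            " source TEXT NOT NULL,"
            " lang TEXT NOT NULL,"
            " target TEXT NOT NULL,"
            " PRIMARY KEY (source, lang))"
        )

    def translate(self, source: str, lang: str) -> str:
        row = self._db.execute(
            "SELECT target FROM memory WHERE source = ? AND lang = ?",
            (source, lang),
        ).fetchone()
        if row is not None:
            return row[0]  # already translated: no API call, no cost
        target = self._api(source, lang)
        self._db.execute(
            "INSERT INTO memory (source, lang, target) VALUES (?, ?, ?)",
            (source, lang, target),
        )
        self._db.commit()
        return target


# Stub standing in for a paid translation API; it records every call.
api_calls: List[Tuple[str, str]] = []


def fake_api(source: str, lang: str) -> str:
    api_calls.append((source, lang))
    return f"[{lang}] {source}"


memory = TranslationMemory(fake_api)
memory.translate("Save", "es")
memory.translate("Save", "es")  # served from the cache
memory.translate("Save", "ru")  # new language, so one more API call
print(len(api_calls))
```

Pointing db_path at a shared file instead of ":memory:" would let several runs reuse the same stored translations.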

This is objectively a very large amount of work for one working week. I can't convince you to use AI and agents. But I now have a well-defined set of skills for internationalization. And if I get a project that needs to become multilingual, I can save it tens of thousands of dollars on codebase preparation compared to the previous approach. As you can imagine, agents don't care whether they need to process a project with 50,000 lines or 1,500,000 lines. Plus, as a bonus, you can get relatively high-quality automatic interface translations. There are trade secrets for that, too. The translations might not always be as good as those from professional translation agencies, but they certainly won't ruin your budget.

Seriously, if you have a project, please contact me; I'd be happy to help. Reach out to me at https://t.me/mkashkin.