Using LLM to Automatically Translate Manga
This long weekend I decided to jump into this new Large Language Models (LLM) hype and see for myself what it's all about.
Since only the most popular and profitable manga series can get official releases, many obscure but equally entertaining manga will never get licensed in the west. One idea that's been in my backburner for the longest time was building a tool to automatically translate these manga for me.
Process
To translate manga, there's generally three steps:
- Translate the original text
- Remove the original text (also called "cleaning")
- Add the translated text back into the same place (also called "typesetting")
Cleaning
In official translations, this step can be skipped since the artist will have the original manga panels without text available for the overseas publisher.
In fan translations, this step is opening the manga panel in Photoshop and tediously using the eraser and pen-brush tool to remove the original text.
To automate this process, I decided to use the Panel Cleaner tool that I found on GitHub. To my surprise, the Docker image it provided worked off the shelf with no configuration needed (I know, Docker working as designed right?).
Using Panel Cleaner, I was able to get a mask to cover up the original texts. In addition, I used a simple flood fill algorithm to get bounding boxes of where the original texts are located.
Translating
This step is where I used LLMs. Since Google's Gemini 2.5 was the newest model released at the beginning of this weekend, I decided to use that.
Using the bounding boxes from the previous step, I created this prompt:
The first problem I encountered was that some text was not translated. After searching around, I discovered that LLMs (at least Gemini) sees images in 1000x1000 format. Once I normalized the bounding box coordinates, I was able to get translations of all the text as now Gemini can see everything.
The next problem I encountered was when characters have multiple speech bubbles in the same manga panel (e.g. long speech or thought), the resulting multi-part translation was often disjointed and incoherent. After playing around with some settings in Google AI Studio (the online playground), I was able to get it to output much more accurate translations by enabling thinking mode so that it can consider the full panel.
Another problem that I encountered was inaccuracies in short, isolated sentences and conversations. Since Japanese is a high-context language, the subject is often omitted and thus making pronouns in translations difficult to infer.
For a native human reader, the gender of pronouns and subject of conversations can be inferred from previous dialogs. To mimic this in LLMs, I switched from a one-off text generation request per page to a multi-part chat. This allowed Gemini to use context from previous pages to determine the subject of ambiguous sentences.
With these three changes, Gemini's translations were actually pretty good. However, I found that there's still a few cases where it falls short of human translation:
- Sentences where pronouns are derived from the context of the overarching story rather than just the previous pages
- Very short sentences or partial sentences due to characters stuttering in the moment
- Pop-culture references, wordplay, portmanteaus, onomatopoeias, etc. that are heavily reliant on the Japanese language itself
For now, I'll just accept these inaccuracies as it's still "better-than-nothing" as I can still get an idea of the overarching story.
Typesetting
For this step, I took the simple approach of creating a textbox where each bounding box was and just insert the translated text into each box. This worked surprisingly well for most cases.
However, manga often have text that "pop" out, rotate, or have other special effects that are hard to automatically detect.
While I think it's possible to fix this bounding box issue by fine-tuning Panel Cleaner and my flood fill algorithm, I have decided to hold off on fixing this issue because these text boxes are usually used for comedic effect and rarely used for plot-relevant text.
Conclusion
Despite the roadblocks I've encountered, I'm still satisfied with what I've built. My tool can take in a directory of Japanese manga pages and automatically clean, translate, and typeset all of them in just a few minutes.
In the past, I've been skeptical of machine translations, but I can't deny seeing some side-by-side comparisons that machine translation is getting a lot better nowadays. If my weekend project is the worst machine translations can do today, then I look forward to seeing what can be done in the next few years.