Translation Nightmares
When a neural network is trained, one expects it to at least produce the correct results on the inputs it was trained on. For inputs not seen during training, one hopes it will generalize and produce coherent output. But when shown something completely unexpected, the network may produce bizarre results.
Just the other day I was training a neural network to understand the behavior of people waiting in line in a store. The results the network produced for unseen inputs were all right, but something weird happened when the lights went out at the end of the day. There was nobody in line at that time, of course, but with the lights out, the images reaching the neural network were completely different from anything it had seen. All of a sudden the network was reporting a full line, even though no one was there.
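The failure mode behind that anecdote is easy to demonstrate with a toy stand-in for the queue detector: a classifier trained on a fixed set of classes has to pick one of them for any input, however alien. Everything below, the class names, the two features, and all the numbers, is invented for illustration; it's a minimal sketch of the idea, not the actual system.

```python
import math

# Invented class centroids in a made-up 2D feature space
# (say, average brightness and amount of motion).
CENTROIDS = {
    "empty line": (0.2, 0.1),
    "short line": (0.5, 0.4),
    "full line":  (0.8, 0.9),
}

def classify(features):
    """Pick the class whose centroid is nearest to the input.

    There is no "none of the above" option: every input gets one
    of the known labels, no matter how unlike the training data it is.
    """
    return min(
        CENTROIDS,
        key=lambda label: math.dist(features, CENTROIDS[label]),
    )

# A normal daytime frame: the answer is plausible.
print(classify((0.45, 0.35)))   # "short line"

# A pitch-black lights-out frame resembles nothing the classifier
# was built for, yet it still confidently reports one of its classes.
print(classify((0.0, 0.0)))
```

The specific wrong answer depends on where the garbage input happens to land in feature space; the point is only that the classifier has no way to say "I don't know."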
I don’t know anything about the methods employed by Google Translate, but I suspect it has similar problems. There’s a subreddit, TranslateGate, that collects examples of weird translations produced by Google’s tool. You feed it utter garbage, and it often produces coherent garbage. The tool is built to always emit some output, so garbage in still yields something out, and that output reads coherently because the network was trained to produce coherent text.
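The "it always produces some output" part is structural. A network that ends in a softmax turns whatever scores it computes into a probability distribution over outputs, and that distribution always sums to one, even when the scores are meaningless. I don't know Translate's actual architecture, so this is just a sketch of the general mechanism:

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution.

    Whatever the scores are, the result sums to 1: the model
    always "picks something" and cannot abstain.
    """
    m = max(logits)                              # for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Scores for a sensible input: one option clearly wins.
print(softmax([4.0, 1.0, 0.5]))

# Scores for garbage input: all options look equally bad, yet the
# output is still a perfectly valid-looking distribution.
garbage = softmax([-7.3, -7.1, -7.9])
print(garbage)
print(abs(sum(garbage) - 1.0) < 1e-9)            # True: it always sums to 1
```

Pick the highest-probability option at each step and keep feeding the result back in, and you get fluent-looking text regardless of whether the input made any sense.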
It’s weird anyway, though. If you type “dog” 20 times (separated by spaces) and tell it to translate from Igbo to English, it produces this output:
Doomsday Clock is three minutes at twelve We are experiencing characters and a dramatic developments in the world, which indicate that we are increasingly approaching the end times and Jesus’ return
Chilling, huh? My guess is that for some reason the network starts producing something about the apocalypse, and then it just keeps going. But you can produce other weird things too, and they’re not always eschatological. Translating sequences of “ga gu” of varying lengths from Somali to English, we get this:
| Length | Translation |
|---|---|
| 1 | ga gu |
| 2 | go to bed |
| 3 | go to bed |
| 4 | keep your car |
| 5 | Get your car safely |
| 6 | Keep your car safely |
| 7 | your caregiver |
| 8–9 | your personal safety record |
| 10–11 | your visit to your home country |
| 12 | your child's health care needs |
| 13–14 | your child's day care home |
| 15 | your visit to the United States |
| 16 | the effects of the disease in the early stages of life |
| 17–18 | the effects of the disease in the early stages of the epidemic |
| 19–20 | the impact of the floods on the road to snow |
And from Swahili to English:
| Length | Translation |
|---|---|
| 1 | no |
| 2 | do not go |
| 3 | it is not too late |
| 4 | there is no end of it |
| 5–6 | it is not too late |
| 7 | it is not too late for the day |
| 8–9 | It is the end of this world |
| 10–11 | it is the end of the age of the end of the ages |
| 12 | the end of this world is near the end of the ages |
| 13 | the end of the world is near the end of the ages of the ages |
| 14 | it is the end of the age of the end of the ages of the ages |
| 15–16 | at the end of the age of the greatness of the kingdom of heaven. |
| 17 | at the end of the age of the greatness of the greatness of the kingdom of heaven. |
| 18 | the end of the age of the greatness of the greatness of the kingdom of heaven. |
| 19 | at the end of the great tribulation of the greatness of the greatness of the kingdom of heaven. |
| 20 | at the end of the age of the end of the age of the greatness of the kingdom of heaven. |
This doesn’t work with more widely spoken languages. It could be that for Somali and Swahili the network simply didn’t have much material to train on. It could also be that the results reflect what that training material was about: immigration, health, and safety for Somali; religion for Swahili.