Why is Google Photos classifying the Japanese lady as a dog?

Moona Balghouthi
5 min read · Mar 14, 2021

I think Google Photos is a great product, and I personally admire the work behind it. But it has its limitations. If you’re using it as a cloud option for your sweet memories, you may have noticed that it offers search by someone’s face, by objects, landmarks, cities…

However, it still seems odd to me that it identifies a picture of a doll as a living human. I think the question here is similar to this thought experiment: if I held up a drawing of a tree and asked “what is that?”, would you spontaneously answer “that’s a tree” or “that’s a picture of a tree”?

At first, I assumed that the underlying technology is based on Faster R-CNN, since a Google AI blog article described a Context R-CNN model that leverages temporal context.
Region-based Convolutional Neural Networks identify potential object regions as bounding boxes (“regions of interest”, or RoIs); the original R-CNN used a selective search algorithm for this, while Faster R-CNN learns its proposals with a Region Proposal Network. Then, from each region separately, CNN features are extracted for classification.

If you’re not familiar with region-based convolutional neural networks, the blog article “A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN” by Dhruv Parthasarathy is a detailed, insightful read.
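Just to make that pipeline concrete, here is a minimal sketch of running a pre-trained Faster R-CNN from torchvision on a single photo. This is only an illustration of the technique, not Google Photos’ actual pipeline, and “photo.jpg” is a placeholder file name:

```python
# Minimal Faster R-CNN inference sketch (illustration only, not Google's pipeline).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Model pre-trained on COCO (categories include person, dog, cat, bird, ...).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("photo.jpg").convert("RGB")  # placeholder file name
with torch.no_grad():
    # The model proposes regions, then classifies each one;
    # it returns boxes, class labels, and confidence scores.
    predictions = model([to_tensor(image)])[0]

# Keep only the confident detections.
for box, label, score in zip(predictions["boxes"],
                             predictions["labels"],
                             predictions["scores"]):
    if score > 0.7:
        print(label.item(), round(score.item(), 2), box.tolist())
```

The interesting part for this post is the last step: each proposed region is classified independently, so a dark, furry, four-legged region can end up with a “dog” label regardless of what the rest of the scene suggests.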

However, it may also be “a vision model based as closely as possible on the Transformer architecture originally designed for text-based tasks,” as described in Google’s Transformers for Image Recognition at Scale article.
The Transformer is an encoder-decoder architecture proposed in the paper Attention Is All You Need, aimed at sequence-to-sequence tasks such as translation in natural language processing.
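To give a feel for how a Transformer ends up looking at pixels, here is a rough, toy-sized sketch of the Vision Transformer (ViT) idea in PyTorch. The dimensions are illustrative, and this is not how Google Photos is actually built:

```python
# Toy Vision Transformer sketch: cut the image into fixed-size patches,
# embed each patch as a "token", and feed the token sequence to a
# standard Transformer encoder. Sizes follow ViT-Base, depth is reduced.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)          # one RGB image
patch_size, embed_dim = 16, 768

# Patch embedding: a conv with stride == kernel size splits the image into
# 14 x 14 = 196 non-overlapping patches and projects each to embed_dim.
to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
tokens = to_patches(image).flatten(2).transpose(1, 2)   # (1, 196, 768)

# Prepend a learnable [CLS] token and add positional embeddings.
cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
pos_embed = nn.Parameter(torch.zeros(1, tokens.shape[1] + 1, embed_dim))
tokens = torch.cat([cls_token, tokens], dim=1) + pos_embed

# Encoder-only Transformer (the real ViT-Base uses 12 layers; 2 here to keep
# the toy light).
encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
features = encoder(tokens)

# The [CLS] token's output feeds a classification head (e.g. ImageNet classes).
logits = nn.Linear(embed_dim, 1000)(features[:, 0])
print(logits.shape)   # torch.Size([1, 1000])
```

The key point is that the image becomes a sequence of patch tokens, so the same self-attention machinery built for sentences can be reused on photos.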

In the following screenshots from my personal Google Photos account, I came across some interesting outputs for facial tags and object recognition:

For example, artistic sculptures’ faces are detected:

A metal sculpture in Honfleur, FR (left); a sand sculpture in Tottori, JP (right)

Honestly, I can’t see why the algorithm clustered the first statue’s face with the sand sculpture’s face.

Other statues are detected as well, like these:

Kampot, Cambodia
Suwon, South Korea

Should we classify these as human faces? They do have human facial traits, which makes them difficult for an algorithm to ignore. Hence the question: how do we detect “liveness”?

Another challenge is detecting faces with concealed parts. I think it already does well recognizing my face with a COVID-19 protective face mask on. But these pictures of my niece weren’t grouped with her other pics (I added the black dots; excuse my non-existent Photoshop skills).

my niece playing with my phone

Then there is the classic challenge of classifying dogs vs. cats, which I honestly thought was mastered by now.

Find me a dog?

Got a panda statue? Lion statue? Cat!

In the following results, the first lovely dog is mine (which means there are a lot of pictures of her…). And, mistakenly, another cat:

3 monkeys doing their own thing (Right)

But how about a cute rabbit instead?
Honestly, in the result below, I can’t see why my head, seen from the back, is classified as a dog, lol.

Annoying humans spying on a rabbit

It is indeed good at finding dogs (even this cool poodle sunbathing). But a deer?

Nara deer misclassified

It got another deer’s head. Should I blame that on data augmentation techniques (see the sketch after the photo below)? Wait, that’s me again? This time on all fours (the Japanese door was too small for me, in case you’re wondering where that is)…

Deer’s fur similar to a dog’s fur
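For context, data augmentation just means training on randomized variants of the same photos: flips, crops, color shifts and so on. Here is a toy sketch using torchvision transforms; it is purely illustrative, and “dog.jpg” is a placeholder file name:

```python
# Toy data augmentation sketch: the same training photo is randomly cropped,
# flipped, color-jittered, and rotated, so the model sees many variants of,
# say, the same black dog.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(degrees=15),
])

photo = Image.open("dog.jpg").convert("RGB")     # placeholder file name
variants = [augment(photo) for _ in range(8)]    # eight randomized versions
```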

I noticed the same thing in the first pic, of a girl bending down to pick up something she had dropped just as I clicked the picture to capture the flying bubbles:

Bubbles and dogs

Same in the third picture as well:

Mysterious passenger

The first pic is mislabeled as well (it’s just a Korean oriental magpie).

The last picture is just a person looking out the train window; I guess the strong contrast of the image confused the algorithm.

Finally, this picture of the Japanese lady finishing her dance ritual at the New Year’s midnight celebration at the temple:

shameful misclassification

This made me think that maybe it classifies any “creature” on four legs as a dog? Especially a black one? Is it because my actual lovely dog is black, so it has been trained on many pictures of the same black dog? Highly possible, but I’m still doubtful.

At the time of writing, I haven’t found a way to correct this classification (maybe I missed it somewhere in the UI?).
I only found out how to untag a face by removing results and justifying it with these options:

Google Photos’ UI

For those who are skeptical of AI in general and think we will end up in a dystopian era, I want to reassure you that we are still far away from that.

This blog post reflects my personal opinion, and I am not affiliated with Google or with the organizations/people behind the shared links. I just didn’t want to duplicate content that’s already detailed out there on the vast web.
I am not writing to criticize; I am just expressing my wandering thoughts.
If you think I missed a point, please feel free to comment and share your ideas.
