I think Google Photos is a great product, and I personally admire the work behind it. But it has its limitations. If you use it as cloud storage for your sweet memories, you may have noticed that it lets you search by someone’s face, by objects, landmarks, cities…
However, it still seems odd to me that it identifies a picture of a doll as a living human. I think the question here is similar to this experiment: if I held up a drawing of a tree and asked “What is that?”, would you spontaneously answer “That’s a tree” or “That’s a picture of a tree”?
At first, I assumed that the underlying technology was based on Faster R-CNN, since an article on Google’s AI blog explained a Context R-CNN model that leverages temporal context.
Region-based Convolutional Neural Networks (R-CNN) propose bounding boxes around potential objects (“regions of interest”, or “RoIs”) using the selective search algorithm. Then they extract CNN features from each region separately for classification. (Faster R-CNN later replaced selective search with a learned Region Proposal Network.)
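To make the two stages concrete, here is a toy sketch of the R-CNN flow in plain Python. It is an illustration under simplifying assumptions, not Google’s implementation: fixed-size sliding windows stand in for selective search, the hypothetical `toy_classify` function stands in for the per-region CNN and classifier, and a greedy non-maximum suppression (NMS) step keeps the best non-overlapping boxes of each label.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive
Image = List[List[int]]          # grayscale pixels, indexed image[y][x]


def propose_regions(width: int, height: int, size: int, stride: int) -> List[Box]:
    """Stand-in for selective search (assumption): fixed-size sliding windows."""
    return [
        (x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def rcnn_detect(
    image: Image,
    regions: List[Box],
    classify: Callable[[Image, Box], Tuple[str, float]],
    score_threshold: float = 0.5,
    nms_iou: float = 0.3,
) -> List[Tuple[float, str, Box]]:
    """Score every proposed region independently, then apply greedy NMS."""
    scored = []
    for box in regions:
        label, score = classify(image, box)  # one classifier call per region
        if score >= score_threshold:
            scored.append((score, label, box))
    scored.sort(reverse=True)  # highest score first
    kept: List[Tuple[float, str, Box]] = []
    for score, label, box in scored:
        # Drop a box that overlaps too much with an already-kept box of the same label.
        if all(k[1] != label or iou(box, k[2]) < nms_iou for k in kept):
            kept.append((score, label, box))
    return kept


def toy_classify(image: Image, box: Box) -> Tuple[str, float]:
    """Hypothetical classifier: fraction of bright pixels inside the box."""
    x1, y1, x2, y2 = box
    pixels = [image[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return "object", sum(p > 128 for p in pixels) / len(pixels)


# An 8x8 image with a bright 4x4 "object" in the bottom-right corner.
image = [[255 if x >= 4 and y >= 4 else 0 for x in range(8)] for y in range(8)]
detections = rcnn_detect(image, propose_regions(8, 8, size=4, stride=2), toy_classify)
print(detections)  # [(1.0, 'object', (4, 4, 8, 8))]
```

Two half-overlapping windows also pass the score threshold here, but NMS suppresses them because each overlaps the best box with an IoU of about 0.33. Classifying every region separately is also why the original R-CNN was slow, which motivated Fast and Faster R-CNN.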
If you’re not familiar with region-based CNNs, here is a detailed and insightful read: the blog article “A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN” by Dhruv Parthasarathy.
However, it may also be “a vision model based as closely as possible on the Transformer architecture originally designed for text-based tasks”, as described in their article “Transformers for Image Recognition at Scale”.
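The key idea of that Vision Transformer (ViT) is to treat an image like a sentence: the image is cut into fixed-size patches (for example 16×16 pixels), and each flattened patch becomes one input token. Below is a minimal plain-Python sketch of that patching step only, assuming a single-channel image whose sides are multiples of the patch size; the learned linear projection, position embeddings and the Transformer encoder itself are left out.

```python
from typing import List

Image = List[List[float]]  # single channel, indexed image[row][col]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles in row-major
    order and flatten each tile into one token vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


# A 4x4 "image" holding 0..15 becomes a sequence of four 2x2 patches.
image = [[4 * r + c for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens), tokens[0], tokens[3])  # 4 [0, 1, 4, 5] [10, 11, 14, 15]
```

With 16×16 patches, a 224×224 image turns into a sequence of 14 × 14 = 196 tokens, which is then processed much like a sequence of words.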
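The Transformer is an encoder-decoder architecture proposed in the paper “Attention Is All You Need” to handle sequence-to-sequence tasks such as machine translation in natural language processing. The vision model keeps only the encoder part and feeds it a sequence of image patches instead of words.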
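The core operation of that paper is scaled dot-product attention, softmax(QKᵀ/√d_k)V: every query scores all keys, and the output is the score-weighted average of the values. Here is a plain-Python sketch for a single attention head; leaving out masking, multiple heads and the learned Q/K/V projections is a simplification on my part.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    out = []
    for query in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d_k) for key in k]
        weights = softmax(scores)
        # Weighted average of the value rows, column by column.
        out.append([sum(w * row[j] for w, row in zip(weights, v)) for j in range(len(v[0]))])
    return out


# Two identical keys get equal weight, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```

In a ViT, the queries, keys and values all come from the patch tokens, so every patch can attend to every other patch of the image.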
In the following screenshots from my personal Google Photos account, I came across some interesting outputs for face tagging and object recognition:
For example, the faces of artistic sculptures are detected:
Honestly, I can’t see why the algorithm clustered the first statue’s face with the sand sculpture’s face.
Other statues are detected as well, like these ones:
Should we classify these as human faces? They do have human facial traits, which makes them difficult for an algorithm to ignore. Hence the question: how do we detect “liveness”?
Another challenge is detecting faces with concealed parts. I think it already does well at recognizing my face when I wear a COVID-19 face mask. But these pictures of my niece weren’t grouped with her other pics (I added the black dots; excuse my nonexistent Photoshop skills).
Another classic challenge is classifying dogs vs. cats, which I honestly thought had been mastered.
Find me a dog?
In the following results, the first lovely dog is mine (which means there are a lot of pictures of her…). And, mistakenly, a cat:
But how about a cute rabbit instead?
Honestly, in the result below, I can’t see why the back of my head is classified as a dog, lol.
It is indeed good at finding dogs (even this cool poodle sunbathing). But a deer?
Got another deer’s head. I guess we should blame that on data augmentation techniques? Wait, is that me again? Yes, this time on all fours (the Japanese door was too small for me, in case you’re wondering where that is)…
I noticed the same thing in the first pic, of a girl who was picking up something that had fallen just as I took the picture to capture the flying bubbles:
Same in the third picture:
The first pic is mislabeled as well (it’s just a Korean Oriental magpie).
The last picture is just a person looking at the train window; I guess the strong contrast of the image confused the algorithm.
Finally, this picture of a Japanese lady finishing her dance ritual at the New Year’s midnight celebration in the temple:
This made me think that maybe it classifies any “creature” on four legs as a dog, especially a black one? Is it because my own lovely dog is black, so it has been trained on many pictures of the same black dog? Quite possible, but I still have doubts.
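To illustrate why such an imbalance could matter (only as a toy, not how Google Photos actually works), here is a Bayes-rule sketch: the predicted label maximizes likelihood × prior, and if the prior is estimated from training data dominated by one class, an ambiguous dark four-legged shape can tip toward “dog”. The `predict` function, the class counts and the likelihoods are all made-up, hypothetical values.

```python
from typing import Dict


def predict(likelihood: Dict[str, float], class_counts: Dict[str, int]) -> str:
    """Return the label maximizing P(image | label) * P(label), where the
    prior P(label) is estimated from (hypothetical) training-set counts."""
    total = sum(class_counts.values())
    posterior = {
        label: likelihood[label] * class_counts[label] / total
        for label in likelihood
    }
    return max(posterior, key=posterior.get)


# Made-up likelihoods for an ambiguous dark, four-legged shape:
# on appearance alone, "deer" fits slightly better than "dog".
likelihood = {"dog": 0.3, "cat": 0.3, "deer": 0.4}

imbalanced = {"dog": 900, "cat": 50, "deer": 50}  # many photos of one dog
balanced = {"dog": 100, "cat": 100, "deer": 100}

print(predict(likelihood, imbalanced))  # dog
print(predict(likelihood, balanced))    # deer
```

With the imbalanced prior, “dog” scores 0.27 against 0.02 for “deer”; with a balanced prior, appearance alone decides and “deer” wins.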
At the time of writing, I haven’t found a way to remove this classification (maybe I missed it somewhere in the UI?).
I only found out how to untag a face, by removing it from the results and choosing a reason from these options:
For those who are skeptical of AI in general and think we will end up in a dystopian era, I want to reassure you that we are still far from that.
This blog post reflects my personal opinion, and I am not affiliated with Google or with the organizations or people behind the shared links. I just didn’t want to write redundant content about what’s already detailed out there on the vast web.
I am not writing to criticize; I am just expressing my wandering thoughts.
If you think I missed a point, please feel free to comment and share your ideas.