File this under the "I wasn't sure if it would work and it did" category. Recently, a friend on Facebook wondered if there was some way to take a collection of photos and figure out which were 'real' photos versus memes. I thought it could possibly be a good exercise for GenAI and decided to take a shot at it. As usual, I opened up Google's AI Studio and did a few initial tests:
I then removed that image and pasted in more to test. From what I could see, it worked well enough, so I took the source code from AI Studio and began working.
The Code
First, I grabbed eleven pictures from my collection, aiming for a mix of photos, memes, and screenshots. To make things easier, I renamed them after downloading so I could quickly tell whether the results were right. As I mentioned above, AI Studio gave me the code, but I modified it slightly so I could pass in a directory of images:
import fs from 'fs/promises';
import 'dotenv/config';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-pro-vision";
const API_KEY = process.env.GOOGLE_AI_KEY;

async function detectPhoto(path) {

	const genAI = new GoogleGenerativeAI(API_KEY);
	const model = genAI.getGenerativeModel({ model: MODEL_NAME });

	const generationConfig = {
		temperature: 0.4,
		topK: 32,
		topP: 1,
		maxOutputTokens: 4096,
	};

	const safetySettings = [
		{
			category: HarmCategory.HARM_CATEGORY_HARASSMENT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
	];

	// The prompt plus the image itself, passed as inline base64 data.
	const parts = [
		{text: "Look at the following photo and tell me if it's a photo, a screenshot, or a meme. Answer with just one word.\n"},
		{
			inlineData: {
				mimeType: "image/jpeg",
				data: Buffer.from(await fs.readFile(path)).toString("base64")
			}
		},
		{text: "\n\n"},
	];

	const result = await model.generateContent({
		contents: [{ role: "user", parts }],
		generationConfig,
		safetySettings,
	});

	const response = result.response;
	return response.text();
}

// Loop over every image in the source directory and report what the model says.
const root = './source_for_detector/';
let files = await fs.readdir(root);

for(const file of files) {
	console.log(`Check to see if ${file} is a photo, meme, or screenshot...`);
	let result = await detectPhoto(root + file);
	console.log(result);
}
It worked perfectly!
If you want a copy of the source, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/detect_meme_ss
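As an aside, if you wanted to go a step further and actually sort the images the way my friend originally asked about, here's a rough sketch of how that could look. To be clear, this isn't in the repo - it just builds on the detectPhoto() function and root variable from the script above, cleans up the model's one-word answer, and moves each file into a subfolder named for its category:

import path from 'path';

// Assumes the source folder contains nothing but image files.
const categories = ['photo', 'screenshot', 'meme'];

for(const file of await fs.readdir(root)) {
	// The prompt asks for one word, but trim, lowercase, and strip punctuation to be safe.
	const label = (await detectPhoto(path.join(root, file))).trim().toLowerCase().replace(/[^a-z]/g, '');
	// Anything unexpected goes into an 'unknown' bucket.
	const bucket = categories.includes(label) ? label : 'unknown';
	await fs.mkdir(path.join(root, bucket), { recursive: true });
	await fs.rename(path.join(root, file), path.join(root, bucket, file));
	console.log(`${file} -> ${bucket}`);
}

You'd obviously want some error handling for a real version, but it gets the idea across.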
The Photos
Ok, technically you can just head over to the GitHub repo to see these, but here are the source images. First, the 'regular' photos:
Next, the screenshots:
And finally, the memes. Enjoy.