File this under the "I wasn't sure if it would work and it did" category. Recently, a friend on Facebook wondered if there was some way to take a collection of photos and figure out which were 'real' photos versus memes. I thought it could possibly be a good exercise for GenAI and decided to take a shot at it. As usual, I opened up Google's AI Studio and did a few initial tests:
I then removed that image and pasted in more to test. From what I could see, it worked well enough, so I took the source code from AI Studio and began working.
The Code
First, I grabbed eleven pictures from my collection, aiming for a mix of photos, memes, and screenshots. To make things easier, I renamed them after downloading so I could quickly tell whether the results were right. As I mentioned above, AI Studio gave me the code, but I modified it slightly so I could pass in a directory of images:
import fs from 'fs/promises';
import 'dotenv/config';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-pro-vision";
const API_KEY = process.env.GOOGLE_AI_KEY;

async function detectPhoto(path) {

	const genAI = new GoogleGenerativeAI(API_KEY);
	const model = genAI.getGenerativeModel({ model: MODEL_NAME });

	const generationConfig = {
		temperature: 0.4,
		topK: 32,
		topP: 1,
		maxOutputTokens: 4096,
	};

	const safetySettings = [
		{
			category: HarmCategory.HARM_CATEGORY_HARASSMENT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
		{
			category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
			threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
		},
	];

	// The prompt plus the image itself, passed as inline base64 data.
	const parts = [
		{text: "Look at the following photo and tell me if it's a photo, a screenshot, or a meme. Answer with just one word.\n"},
		{
			inlineData: {
				mimeType: "image/jpeg",
				data: Buffer.from(await fs.readFile(path)).toString("base64")
			}
		},
		{text: "\n\n"},
	];

	const result = await model.generateContent({
		contents: [{ role: "user", parts }],
		generationConfig,
		safetySettings,
	});

	const response = result.response;
	return response.text();
}

// Loop over every image in the source directory and report what the model says.
const root = './source_for_detector/';
let files = await fs.readdir(root);

for(const file of files) {
	console.log(`Check to see if ${file} is a photo, meme, or screenshot...`);
	let result = await detectPhoto(root + file);
	console.log(result);
}
It worked perfectly!
If you want a copy of the source, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/detect_meme_ss
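As an aside, if you wanted to go a step further and actually sort the images the way my friend originally asked about, here's a rough sketch of how that could look. To be clear, this isn't in the repo - it just builds on the detectPhoto() function and root variable from the script above, cleans up the model's one-word answer, and moves each file into a subfolder named for its category:

import path from 'path';

// Assumes the source folder contains nothing but image files.
const categories = ['photo', 'screenshot', 'meme'];

for(const file of await fs.readdir(root)) {
	// The prompt asks for one word, but trim, lowercase, and strip punctuation to be safe.
	const label = (await detectPhoto(path.join(root, file))).trim().toLowerCase().replace(/[^a-z]/g, '');
	// Anything unexpected goes into an 'unknown' bucket.
	const bucket = categories.includes(label) ? label : 'unknown';
	await fs.mkdir(path.join(root, bucket), { recursive: true });
	await fs.rename(path.join(root, file), path.join(root, bucket, file));
	console.log(`${file} -> ${bucket}`);
}

You'd obviously want some error handling for a real version, but it gets the idea across.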
The Photos
Ok, technically you can just head over to the GitHub repo to see these, but here are the source images. First, the 'regular' photos:
Next, the screenshots:
And finally, the memes. Enjoy.