Project Setup
For this project, I am focusing on training a model on the dataset's images to construct a reasonably functional genre and era classifier. While the dataset's labels include categories like artist, style, and brushstroke, we will focus only on genre and dating.

As for the date/era of each painting, the dataset provides the exact year each painting was created, but our model will classify a span or era instead, since asking a model to predict the exact year would not be a reasonable target.
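As a minimal sketch, years could be bucketed into era classes along these lines; the 50-year span width here is an illustrative assumption, not the project's actual binning:

```python
# Hypothetical binning of exact years into era classes.
# The 50-year span width is an assumption for illustration.
def year_to_era(year: int, span: int = 50) -> str:
    start = (year // span) * span
    return f"{start}-{start + span - 1}"

print(year_to_era(1889))  # -> '1850-1899'
```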

With this in mind, I am trying to answer whether we can construct a baseline CNN model that classifies historic paintings by genre and era. I am also interested in exploring how to set up the best framework, in terms of image processing and network design, to allow not only for accuracy but also for efficient training and testing.
Dataset
Kaggle, Painter By Numbers Dataset
Contains over 40GB of painting images from the last millennium, labeled by genre, style, and era, among other categorizations.
Model Architectures
Experiment 1
I start with 2 convolutional layers, each followed by a max-pooling layer, to extract features from the images.

After flattening the 2D feature maps into a 1D vector, we have a fully connected layer with 100 units and ReLU activation, followed by a dropout layer to prevent overfitting.

The final dense layer uses a softmax activation to output the probabilities for each class.
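A minimal Keras sketch of this architecture, assuming 128×128 RGB inputs and the 39-class genre setup; the filter counts, kernel sizes, and dropout rate are illustrative assumptions, not the exact values used:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the Experiment 1 architecture. Input size, filter counts,
# and dropout rate are assumptions for illustration.
num_classes = 39  # genre labels before condensing

model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),    # first conv block
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),    # second conv block
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                # 2D feature maps -> 1D vector
    layers.Dense(100, activation="relu"),            # fully connected, 100 units
    layers.Dropout(0.5),                             # regularization against overfitting
    layers.Dense(num_classes, activation="softmax"), # class probabilities
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```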
First Model Architecture and Approach Review
After constructing and training this model, I realized that one of the main issues was that there were too many classes; most classes had very few images associated with them during training. While a few top-heavy genres and date ranges dominated the training data, most classes were not being adequately trained on. It therefore made sense for this analysis to condense the number of classes. I also realized I could make training more efficient by further reducing the input image size and increasing the batch size. The next experiment tests this with a new model architecture.
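As an illustration of the condensing step, a mapping like the following could collapse fine-grained genre labels into the four broad classes used below; the source-label names here are hypothetical, and only the four target classes come from this writeup:

```python
# Hypothetical mapping from fine-grained genre labels to the four
# condensed classes; the source-label names are illustrative.
CONDENSED = {
    "portrait": "Portrait",
    "self-portrait": "Portrait",
    "landscape": "Landscape",
    "cityscape": "Landscape",
    "genre painting": "Genre Painting",
    "abstract": "Abstract",
}

def condense_label(genre):
    """Return the condensed class for a genre, or None to drop the image."""
    return CONDENSED.get(genre.strip().lower())
```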
Experiment 2 (Reduced Classes)
Simplified model with a reduced input size.
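A sketch of the simplified setup, assuming a 64×64 input, a single convolutional block, and the four condensed classes; these specific values are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Simplified Experiment 2 model; the 64x64 input size, filter count,
# and dense-layer width are assumptions for illustration.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),          # reduced input size
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),    # 4 condensed genre classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# A larger batch size (e.g., 128) pairs well with the smaller inputs:
# model.fit(x_train, y_train, epochs=10, batch_size=128)
```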
Training
Genre Classification Training Log Prior to Reducing Classes
Genre Classification Training Log After Reducing Classes
Results and Evaluation
Genre Classification – 39 classes: 0.2750
Era Classification – 14 classes: 0.3516
Genre Classification (condensed) – 4 classes (Portrait, Landscape, Genre Painting, Abstract): 0.6036
As you can see, as the number of classes decreases, the accuracy of the predictions increases.
GPT-4 Experimentation
One final experiment I conducted was asking GPT-4 to analyze some of the images and seeing how accurately it classified the genre of each. Here are the results for 8 random samples from the dataset (a sketch of the query setup follows the list).
39986.jpg: Abstract (Correct)
23235.jpg: Landscape (Incorrect, categorized as Abstract) 
30650.jpg: Genre Painting (Correct)
30259.jpg: Portrait (Incorrect, categorized as Genre Painting)
31118.jpg: Abstract (Correct)
100390.jpg: Portrait (Correct)
36698.jpg: Genre Painting (Correct)
2051.jpg: Landscape (Correct)
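For reference, a query like this could be issued through the OpenAI Python SDK roughly as follows; the model name, prompt wording, and sampled file are assumptions:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local painting image for the vision-capable chat endpoint.
with open("39986.jpg", "rb") as f:  # one of the sampled dataset files
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model; substitute as needed
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Classify this painting's genre as one of: Portrait, "
                     "Landscape, Genre Painting, Abstract. Reply with the "
                     "label only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```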
Conclusion
GPT-4 predicted 6/8 correctly, for 75% accuracy versus our model's 60% (although it was a very small sample size, so that must be taken into account). Considering our training limitations, I'd say our model performed fairly well. The errors it did make were confusing a portrait with a genre painting and a landscape with an abstract painting. I suspect that even in our model these would be the most likely mispredictions. Many genre paintings contain people, so the model could pick up on that feature and classify the work as a portrait. Additionally, older landscape paintings often have rough textures and minimalistic details, which could lead models to interpret them as abstract works of art.
