Creating a Niche Classifier with Neural Networks

Introduction

The wedding industry is composed of myriad independent designers, artists, and planners who are responsible for every aspect of their businesses–marketing being one of the most important and time-consuming. Over time, these entrepreneurs–frequently small business owners–develop large libraries of media which form the building blocks of their marketing efforts. These photos, shared after each celebration by the event’s photographers, are unlabeled, meaning every time a wedding creative wishes to post on social media or update their website, they must first spend a great deal of time searching through swathes of unlabeled images to select just a handful to share. Combing through thousands of wedding photos, just to locate one to post on Instagram, is often a daily task taking valuable time away from important client work. Computer vision techniques using neural networks offer a solution. Using established image classification neural network models, I employed transfer learning to finetune a model to create a niche classifier to sort and label wedding images in a small dataset.

Find the repo on github

Data Background

Image files are difficult to search because elements of the image are represented by numerical pixel data. Computer vision has come of age with the development of machine learning and neural networks. Data scientists, machine learning engineers, software developers, and researchers have devoted great effort to building classification neural networks, many of the most powerful of which are general classifiers. While adept and making high-level distinctions when classifying, these neural networks need to be fine-tuned to make more detailed distinctions among similar classes. While image file formats have the potential to store metadata, useful information regarding the content is missing unless users have added the data manually. Before computer vision was developed, this kind of classification could only be done through a manual, time- and labor-intensive process of sorting and labeling images. Now, with better models and access to more computing power, it is possible for these models to be adjusted for niche applications. The data used for this project consists of a collection of images downloaded from Instagram. I was able to collect images from wedding-related hashtags and build a dataset for the model. The classification targets for the model are: “bride,” “bouquet,” “newlyweds,” “reception room,” and “wedding party.” Instagram users often place many generalized or unrelated tags in their posts. Starting with 10,000 image downloads, multiple stages of sorting and refining yielded a dataset of 2,500 images.

Target-Class-Examples

I sorted these images into groups by class. The classes then required further refinement as many images represent multiple classes. In order to train the model effectively I removed images which did not exemplify the intended class. Two classes, “bouquet” and “bride,” often appear together so it is important to have sets that only include one or the other. I also removed images with poor lighting to improve the model’s capabilities.

Data-Collection-Pyramid

Modeling and Evaluation

With a small dataset, I wanted to use top models that used a relatively small amount of trainable parameters. I started with the base models of ResNet50V2 and MobileNetV2, running each of them with only an additional customized final classification layer. The initial results for both models were impressive. To improve the models, I ran several iterations, adding layers to the networks and adjusting hyperparameters. I tracked two elements while evaluating the models: training loss–how well the model fits the data–and F1 score–a measure of prediction accuracy. Gains were achieved once I added image augmentation layers. These layers flipped and rotated training images to allow the model to identify features in a variety of positions in the image, enhancing the model’s ability to identify attributes in more locations and in unseen data. Lowering the model learning rate also contributed to higher success rates. The final model is a version of the ResNet50V2 model with these improvements.

Conclusion

The results from this project challenged my assumptions on both the potential shortfalls of state-of-the-art image classification neural networks and the quality and quantity of data it takes to improve upon them. A larger dataset could lead to a more robust and useful model, allowing for more opportunities to fine-tune the base models. The dataset can expand in two directions. First, adding more images exemplifying the classes would create cleaner training data. I would also like to add more classes to expand the model’s reach. The images I collected were from a small window in time and a small potential source. Reaching out to well-established photographers is the next source I would like to investigate. Not only would a collection of photographs from their career portfolio span a longer timeline, they would also include a variety of compositions that would make the model’s training more robust. It’s also important to acknowledge that my approach does not take into account the breadth and depth of wedding traditions and customs the world over–the photos used to train the model represented mainly elements of Western weddings. The first approach to handling this issue would be to limit the scope of the model to a particular segment of weddings that my target users work within. With more data and training, I could build out the model to include a broader range of users. While this model has room for improvement, implementing it into an application that can classify batches of images has great value for small business owners. Selecting images for marketing posts ultimately takes a human’s expertise and attention, and removing the burden of labeling and starting with an unlabeled library for every search will greatly reduce the time required to select these images. Automating this classification process will save valuable time in selecting appropriate images to post and will assist wedding creatives to focus more of their work hours on their clients. Leveraging current neural network models and processing power offers intriguing possibilities for niche projects like this one!

Find the repo on github