Image Classification Using Pre-trained Deep Learning Models
In this notebook we extend the Hugging Face tutorials to help you quickly set up your own environment for building generative AI models.
As usual, we start by importing the required packages.
from transformers import pipeline
from transformers import ViTFeatureExtractor, ViTForImageClassification
from IPython.display import Image as DisplayImage
from PIL import Image
import requests
Image Classification
Let us start by exploring pre-trained models for computer vision. We will first try out a model that predicts one of the 1,000 ImageNet classes for any image you provide. Try sourcing random images from Wikipedia or other Creative Commons sources to stretch the model to its limits. The Google Vision Transformer (ViT) model is pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224.
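As a rough sketch of how such a model could be applied to a local image, the snippet below turns the model's raw logits into the top-k labels with their probabilities. Several things here are assumptions rather than part of the original notebook: the checkpoint name `google/vit-base-patch16-224` (chosen because its description matches the one above), the PyTorch backend, and the helper names `top_k_predictions` and `classify_image`, which are hypothetical. Recent releases of `transformers` favor `ViTImageProcessor` over `ViTFeatureExtractor`, so you may need to swap the class depending on your installed version.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Convert raw class logits into the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify_image(image_path, model_name="google/vit-base-patch16-224", k=5):
    """Classify a local image file with a pre-trained ViT checkpoint (assumed name)."""
    # Heavy dependencies are imported lazily so top_k_predictions works without them.
    import torch
    from PIL import Image
    from transformers import ViTFeatureExtractor, ViTForImageClassification

    # Both calls download the checkpoint on first use, so network access is required.
    feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
    model = ViTForImageClassification.from_pretrained(model_name)

    # Force three colour channels; grayscale or RGBA inputs would otherwise fail.
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is switched off.
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    return top_k_predictions(logits, model.config.id2label, k)
```

Calling `classify_image("my_photo.jpg")`, where the file name is only a placeholder, should return a list of five `(label, probability)` pairs drawn from the 1,000 ImageNet classes. Keeping the softmax step in its own function makes it easy to inspect how confident the model is on unusual images.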