Image Classification Using Pre-trained Deep Learning Models
We will extend the Hugging Face tutorials in this notebook to help you quickly get started with your own generative AI model building setup.
As usual we will start by importing the required packages.
from transformers import pipeline
from transformers import ViTFeatureExtractor, ViTForImageClassification
from IPython.display import Image as DisplayImage
from PIL import Image
import requests
Let us start by exploring pre-trained models for computer vision. We will first try out a model that predicts one of the 1,000 ImageNet classes for any image you provide. Try sourcing random images from Wikipedia or other Creative Commons sources to stretch the model to its limits. The Google Vision Transformer (ViT) model is pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at the same resolution.
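As a minimal sketch of this workflow, the snippet below loads an image over HTTP and classifies it with the `image-classification` pipeline backed by the `google/vit-base-patch16-224` checkpoint. The image URL is just an illustrative example; substitute any publicly accessible image you like.

```python
from transformers import pipeline
from PIL import Image
import requests

# Example image (two cats on a couch), commonly used in Hugging Face docs;
# any publicly accessible image URL works here.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build an image-classification pipeline backed by the ViT checkpoint
# fine-tuned on the 1,000 ImageNet 2012 classes.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Returns the top predicted classes with confidence scores, highest first.
predictions = classifier(image)
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```

Note that the first run downloads the model weights, which can take a minute; subsequent runs use the local cache.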