Zero shot, Few shot, One shot Learning in NLP

Prachi Gopalani
6 min read · Mar 1, 2023

This article focuses on the different types of N-shot learning.


Contents of this Article

  1. Introduction
  2. What is N-Shot Learning?
  3. What is Zero-Shot Learning in the first place?
  4. What is One-Shot Learning?
  5. What is Few-Shot Learning?
  6. Conclusion

Introduction

Language models and transfer learning have emerged as key components of NLP in recent years. Language transformers such as BERT have pushed the limits of what is possible in NLP. Phenomenal results were obtained by first pretraining a model on words or even characters, and then applying that model to downstream tasks such as topic classification, text summarization, and question answering. They have also given birth to models such as RoBERTa (larger), DistilBERT (smaller), and XLM (multilingual).

Dilemma

The machine learning market is currently dominated by supervised learning algorithms, which require large labeled datasets to achieve any form of generalization. This dependency on labeled training data is a limitation for many applications of supervised learning, and getting labelled data is one of the biggest dilemmas we encounter: almost all existing text classification models require a large amount of it. When we are amazed by DeepMind's AlphaGo or GPT-3, we forget that they were trained on enormous amounts of data (GPT-3's training corpus includes all of English Wikipedia, and far more). How many real-world scenarios can boast such comprehensive datasets?

The availability of large datasets (some spanning over 1,000 classes), access to GPUs and cloud computing, and advances in deep learning have all enabled the development of highly accurate models in a variety of domains. With transfer learning, these models can then be re-used to solve new problems involving similar data.

Access to large-scale training data is the primary requirement for transfer learning. This is not always possible, and some real-world problems are hampered by a lack of data. There are hundreds of thousands of known flowering plant species, for example. One of these is the Corpse Lily, the world's largest flower. Because this is a rare plant, a computer vision classification task will have far fewer images of it than of more common flowering plants.

The issues we are facing are:

  1. scarce data availability,
  2. long training times, and
  3. high infrastructure costs.

Deep learning methods have evolved in recent years to remove this dependency on large training datasets by building knowledge from only a few training examples. These methods are collectively referred to as N-shot learning.

What is N-Shot Learning (NSL)?

N-shot learning (NSL) aims to build models from a training set of inputs and outputs. For example, after being shown a few pictures of the same person, a baby will recognize that person in a much larger number of photos. Imitating that ability has led to two major developments in the deep learning space:

Models that can learn with minimum supervision: In this group, we have techniques such as self-supervised or semi-supervised learning.

Models that can learn with small training datasets: N-shot learning techniques fall into this category.

This method has been used to solve a wide range of problems, including object recognition, image classification, and sentiment classification. For classification tasks, the standard setup is N-way-K-shot classification, in which the training (support) set contains I = K × N examples drawn from N classes, with K examples per class.
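To make the episode structure concrete, here is a minimal sketch of N-way-K-shot episode sampling in Python. The dict-of-lists `dataset` layout and the `sample_episode` helper are hypothetical, chosen only to illustrate how the I = K × N support set is assembled.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=5):
    """Sample one N-way-K-shot episode.

    `dataset` maps class label -> list of examples (toy structure).
    The support set holds I = K * N labelled examples; the query set
    is used to evaluate the learner on the same N classes.
    """
    classes = random.sample(list(dataset), n_way)            # pick N classes
    support, query = [], []
    for label in classes:
        examples = random.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]   # K per class
        query   += [(x, label) for x in examples[k_shot:]]   # held-out queries
    return support, query

# Toy usage: a 5-way-1-shot episode over a fake labelled corpus.
toy = {f"class_{i}": [f"text_{i}_{j}" for j in range(20)] for i in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=1)
print(len(support), len(query))  # 5 25
```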

NSL is classified into three types:

  1. few-shot,
  2. one-shot, and
  3. zero-shot

Few-shot is the most adaptable variant, requiring only a few data points for training, while zero-shot is the most restrictive, requiring no training examples of the target classes at all.

What is Zero-Shot Learning (ZSL)?

Zero-shot learning is a transfer learning variant in which no labelled examples of the target classes are available during training. Instead, the method uses auxiliary information to relate previously unseen classes to what the model already knows. Three variables are involved: the input variable x, the output variable y, and a random variable T that describes the task. The model is trained to learn the conditional probability distribution P(y | x, T).

Zero-shot learning essentially consists of two stages:

  1. Training: knowledge about the attributes of the classes is captured.
  2. Inference: that knowledge is used to categorize instances among a new set of classes.

For example, a child asked to identify a Yorkshire terrier may still recognize it as a type of dog, given some additional information, say a description from Wikipedia.

Put in context, Zero-Shot Learning means learning from one set of known labels and then being evaluated on a different set of labels that the classifier has never seen before.
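To see this in practice, here is a minimal sketch using Hugging Face's zero-shot-classification pipeline, which wraps a natural language inference (NLI) model; facebook/bart-large-mnli is just one common backbone choice. The candidate labels below never appear in any training data for this classifier.

```python
from transformers import pipeline

# The pipeline treats the input text as an NLI premise and turns each
# candidate label into a hypothesis ("This example is about {label}."),
# so it can score labels it was never explicitly trained on.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The team scored a last-minute goal to win the championship.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "sports"
```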

What is One-Shot Learning (OSL)?

One-shot learning allows a model to learn from a single data instance. How can a deep learning model be trained on a single record and still generalize? The answer is that OSL techniques are pretrained on large datasets, where they learn key features that make it possible to classify a new data instance seen only once before.

This enables models to exhibit human-like learning behavior. A child, for example, can easily identify another apple after observing the overall shape and color of an apple. In humans, this could be accomplished with just one or two data points. This ability is extremely useful for solving real-world problems where access to a large number of labeled data points is not always possible.

OSL training is completed in two major stages.

  1. The model is first trained on a verification task: it is fed labeled pairs of images and must determine whether each pair belongs to the 'same' or a 'different' class.
  2. Then, in the one-shot learning setting, these 'same/different' predictions are used to identify new images: the candidate class that receives the model's maximum 'same' probability is chosen.

A Siamese Neural Network is a type of neural network architecture composed of two or more identical subnetworks. "Identical" means they have the same configuration with the same parameters and weights. Each subnetwork outputs an encoding, and the difference between the two encodings measures how far apart the inputs are. The Siamese network's goal is to use this similarity score to determine whether two inputs are the same or different.
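Here is a minimal PyTorch sketch of the idea, assuming small grayscale image inputs; the architecture and layer sizes are illustrative, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    """Two identical subnetworks with shared weights: each input is
    encoded by the same network, then the pair is compared by distance."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),  # infers input size on first call
        )

    def forward(self, x1, x2):
        # Both inputs pass through the SAME encoder (shared parameters).
        e1, e2 = self.encoder(x1), self.encoder(x2)
        # Small distance -> likely 'same' class; large -> 'different'.
        return F.pairwise_distance(e1, e2)

# Usage: a batch of four pairs of 28x28 grayscale images.
net = SiameseNetwork()
a, b = torch.randn(4, 1, 28, 28), torch.randn(4, 1, 28, 28)
print(net(a, b))  # one distance score per pair
```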

What is Few-Shot Learning (FSL)?

Few-shot learning (FSL), also known as low-shot learning, learns a new task from a small set of examples of new data. This makes FSL immediately relevant to scenarios in which large labeled datasets are simply not available.

A 2019 study titled 'Meta-Transfer Learning for Few-Shot Learning' addressed the challenges of few-shot settings; since then, few-shot learning has often been framed as a meta-learning problem.

Here are some situations that are driving its increased adoption (see the sketch after this list):

  • When supervised data is scarce, machine learning models often fail to generalize reliably.
  • When working with a huge dataset, correctly labeling all the data is costly.
  • Even when samples are available, hand-crafting specific features for every new task is strenuous and difficult to implement.
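One simple way to put few-shot learning to work in NLP is a prototype-style classifier over pretrained sentence embeddings, in the spirit of prototypical networks. The sketch below assumes the sentence-transformers library and a toy support set with K = 2 examples per class; the class labels and texts are invented for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Support set: K = 2 labelled examples per class (toy data).
support = {
    "billing":  ["I was charged twice this month.",
                 "My invoice shows the wrong amount."],
    "shipping": ["My package still hasn't arrived.",
                 "The delivery was sent to the wrong address."],
}

# One prototype per class: the mean of its support embeddings.
prototypes = {label: model.encode(texts).mean(axis=0)
              for label, texts in support.items()}

def classify(text: str) -> str:
    """Assign the class whose prototype is nearest in embedding space."""
    q = model.encode(text)
    return min(prototypes, key=lambda lbl: np.linalg.norm(q - prototypes[lbl]))

print(classify("Why did my card get billed two times?"))  # -> "billing"
```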

How do zero-shot, one-shot and few-shot learning differ?

In short: zero-shot learning sees no labeled examples of the new classes and relies on auxiliary information, one-shot learning sees exactly one example per class, and few-shot learning sees a small number (K) of examples per class.

Conclusion

Transfer learning and its variants, such as one-shot and zero-shot learning, aim to address some of the fundamental challenges encountered in machine learning applications, such as data scarcity. The ability of artificial intelligence to learn intelligently from less data makes it similar to human learning and paves the way for wider adoption.

Thank you for reading!

Follow for more updates
