A Batch, in the context of AI/ML, is a subset of data used for training AI models. It allows for significant computational efficiency and improved model performance.
Imagine you’re trying to study for a huge test covering lots of different topics. Instead of cramming all topics in one go, it’s easier to break your study materials into smaller sections and review them one by one - that way, you can focus and learn better. In AI, a “Batch” is similar: it’s like a smaller set of questions your computer uses to learn and improve.
The process of training an AI model can be thought of as a task of finding the optimal set of parameters that can accurately model the underlying patterns of the training data.
In this context, a batch indicates the number of training examples used in one iteration to update the model parameters. For instance, if the size of training data is 1000 and the batch size is 100, then the AI model will take the first 100 examples, update its parameters based on these examples, move to the next 100 examples, and so on. This process will occur 10 times (1000 training examples/100 examples per batch) in one epoch (one pass through the entire data).
Batches can vary in size. When the batch size equals the total dataset size, it is called “batch learning” or “batch gradient descent”. A batch size of 1 constitutes “online learning” or “stochastic gradient descent”. In between, you have what’s termed “mini-batch learning” that combines benefits of computational efficiency from batch learning and quick updates from online learning.
The choice of batch size affects learning directly and indirectly. Directly, it determines how accurately the gradient of the cost function can be estimated - larger batches provide better estimates but consume more memory and computation. Indirectly, the number of updates for each epoch fluctuates - smaller batches enable more frequent updates, potentially leading to faster convergence.
Above considerations influence the choice of batch size - too small and the gradient estimates are noisy leading to unstable learning; too big and the learning may be slow or stuck in suboptimal points. Also, memory limitations may enforce a maximum batch size.
In addition, techniques like batch normalization, which scales the activations of a network at each layer to have zero mean and unit variance, has been commonly applied to mini-batches. It helps to stabilize the learning process and has a slight regularizing effect.
Gradient Descent, Mini-Batch, Epoch, Overfitting, Underfitting, Stochastic Gradient Descent, Batch Gradient Descent, Online Learning, Batch Normalization