Batch learning

  1. involves training the ML model on the entire dataset at once
  2. the model is trained in a single pass or in multiple passes (epochs) over the data
  3. dataset availability is crucial; the entire dataset must be present before training starts
  4. requires intensive compute and memory resources
  5. typically achieves high performance because the model is trained on all available data
  6. usually used when the dataset is small enough to fit into memory

e.g. House price prediction using linear regression
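
A minimal sketch of batch learning, assuming scikit-learn and a synthetic house-price dataset (the features and prices below are made up for illustration): the whole training set sits in memory and the model is fitted in one go.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1_000
area = rng.uniform(50, 250, size=n)          # floor area (synthetic)
bedrooms = rng.integers(1, 6, size=n)        # bedroom count (synthetic)
price = 3_000 * area + 15_000 * bedrooms + rng.normal(0, 20_000, size=n)

X = np.column_stack([area, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)                  # entire training set processed at once
print("R^2 on held-out data:", model.score(X_test, y_test))
```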

Online / Incremental / Sequential learning

  1. the model is trained by feeding data instances one at a time or in small mini-batches
  2. faster per-update training because only a small chunk of data is processed at a time
  3. highly adaptive, since the model can adjust to new data patterns over time
  4. accuracy may not be its strongest feature initially, but it can improve over time
  5. useful for real-time data or data whose distribution changes over time (concept drift)
  6. can be used to train a system on a huge dataset that cannot fit in the machine's memory (out-of-core learning); see the chunked-training sketch after the library note below
  7. garbage in, garbage out: if bad data is fed to the system, the model learns from it just the same

e.g. spam filtering model
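
A minimal sketch of incremental spam-filter training, assuming scikit-learn: HashingVectorizer is stateless, so each mini-batch of messages can be transformed and passed to SGDClassifier.partial_fit without revisiting earlier data. The example messages, labels, and mini-batch stream are hypothetical.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier()                        # linear classifier trained with SGD

# Hypothetical stream of (texts, labels) mini-batches; 1 = spam, 0 = ham.
stream = [
    (["win money now", "meeting at 10am"], [1, 0]),
    (["cheap pills online", "project update attached"], [1, 0]),
]

for texts, labels in stream:
    X_batch = vectorizer.transform(texts)    # stateless transform, no refitting needed
    clf.partial_fit(X_batch, labels, classes=[0, 1])  # classes required on the first call

print(clf.predict(vectorizer.transform(["free money offer"])))
```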

Libraries that support online / streaming learning include scikit-multiflow and Jubatus.
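
For the out-of-core case mentioned in the list above, one common pattern (sketched here with pandas and scikit-learn rather than the libraries just named) is to read an oversized file in chunks and update the model per chunk with partial_fit; the file name and column names below are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

# chunksize makes read_csv yield DataFrames one chunk at a time instead of
# loading the whole file into memory at once.
for chunk in pd.read_csv("huge_dataset.csv", chunksize=10_000):
    X = chunk[["feature_1", "feature_2"]].to_numpy()   # hypothetical column names
    y = chunk["target"].to_numpy()
    model.partial_fit(X, y)                            # incremental update per chunk
```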


| Feature | Batch Learning | Online Learning |
| --- | --- | --- |
| Data Handling | Processes the whole dataset at once (or in large passes over it). | Feeds data incrementally, one instance or one small mini-batch at a time. |
| Training Frequency | Retrained on a fixed schedule (e.g. daily, weekly, or monthly). | Trained continuously as new data arrives. |
| Initial Dataset | Requires the whole dataset to be available before training. | Starts from an initial dataset and is updated over time with new data. |
| Adaptability | Limited; the model cannot incorporate new data until it is retrained. | Very flexible; the model adjusts as soon as new data is introduced. |
| Resource Consumption | High during the training phase, since all samples are processed together. | Lower at any given moment; resource usage is spread out over time. |
| Model Performance | Tends to reach high accuracy when trained on enough data. | Converges quickly but may need tuning to match batch-level accuracy. |
| Concept Drift Handling | Struggles when the data distribution shifts between training runs. | Handles concept drift well, adapting to new distributions as they appear. |
| Update Mechanism | Must be retrained from scratch to incorporate updates. | Updated incrementally with each new data instance or mini-batch. |
| Deployment | Deployed after training and frozen until the next retraining cycle. | Effectively in deployment and training at the same time, improving continuously. |
| Use Case Suitability | Suited to settings with stable, unchanging data distributions. | Suited to fast-changing systems where the data evolves frequently. |
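
As a small illustration of the concept-drift row above, the synthetic sketch below (assuming scikit-learn) changes the underlying slope halfway through the stream; the online model's coefficient tracks the new relationship after the drift.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

for step in range(200):
    slope = 2.0 if step < 100 else -3.0               # the data distribution drifts here
    X = rng.normal(size=(32, 1))
    y = slope * X[:, 0] + rng.normal(0, 0.1, size=32)
    model.partial_fit(X, y)                           # incremental update per mini-batch
    if step in (99, 199):
        print(f"step {step}: learned slope = {model.coef_[0]:.2f}")
```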