up:: [[Machine Learning MOC]]
# Real-World Problem Spaces Exhibit Non-Stationarity, i.e. They Drift
## Data Drift
Machine learning models are stationary: the trained model is a static artifact that doesn't move. The problem space in the real world, however, does (see [[Supervised Machine Learning Data Exists in a Euclidean Space#^qbglai|this diagram]]); this mismatch is referred to as **drift**. Recall that [[Supervised Machine Learning Data Exists in a Euclidean Space|machine learning models can be thought of as existing in a multidimensional Euclidean space]].
There are two kinds of drift:
- **concept drift** - also known as *real drift*, this is a shift in the relationship between model inputs and outputs. Because p(y | x) changes, this always causes *model decay*.
- **data drift** - also known as *input drift*, *feature drift*, or *covariate shift*, this is a change in the statistical distributions of the input data. It can cause model decay by driving a change in p(y | x), but it doesn't necessarily do so; when it doesn't, it is referred to as *virtual drift* (see the detection sketch below).
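As an illustration, here is a minimal sketch of detecting data drift (covariate shift) on a single feature with a two-sample Kolmogorov–Smirnov test. It assumes NumPy and SciPy are available and uses synthetic data in place of real reference and production samples:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical single feature: values seen at training time vs. values arriving in production.
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
production = rng.normal(loc=0.4, scale=1.2, size=5_000)  # the input distribution p(x) has shifted

# Two-sample KS test: a small p-value flags a change in this feature's distribution.
result = ks_2samp(reference, production)
if result.pvalue < 0.01:
    print(f"Data drift detected: KS statistic={result.statistic:.3f}, p={result.pvalue:.2e}")
else:
    print("No significant shift detected in this feature")
```

A test like this only flags changes in p(x); deciding whether the drift is real or virtual still requires labeled outcomes to check whether p(y | x) moved as well.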
Both kinds of drift can manifest in at least four modes (simulated in the sketch after this list):
1. Abrupt
2. Incremental
3. Gradual
4. Recurring (e.g. seasonality)
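To make the four modes concrete, here is a small, purely synthetic sketch (assuming only NumPy) that generates the mean of a hypothetical feature over time under each pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
t = np.arange(T)

# Mean of a hypothetical feature over 400 time steps under each drift mode.
abrupt = np.where(t < 200, 0.0, 1.0)                          # sudden jump to the new concept
incremental = np.clip((t - 100) / 200, 0.0, 1.0)              # smooth ramp through intermediate concepts
p_new = np.clip((t - 100) / 200, 0.0, 1.0)                    # probability of sampling the new concept
gradual = np.where(rng.random(T) < p_new, 1.0, 0.0)           # old and new concepts alternate, new one wins over time
recurring = (np.sin(2 * np.pi * t / 100) > 0).astype(float)   # concept cycles back, e.g. seasonality
```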
## Addressing Data Drift
There are a number of ways to address data drift. Popular approaches include using [[Supervised Machine Learning Data Exists in a Euclidean Space#^dd9a2d|active learning]] to cover new gaps in the input space and pruning old, out-of-date data to realign the training domain with the problem space.
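A minimal sketch of the pruning idea, assuming the training data lives in a pandas DataFrame with a `timestamp` column (the column name and the 180-day window are illustrative choices, not anything prescribed by the source):

```python
import pandas as pd

def prune_stale_rows(df: pd.DataFrame, max_age_days: int = 180) -> pd.DataFrame:
    """Drop rows older than the cutoff so the training domain tracks the current problem space."""
    cutoff = df["timestamp"].max() - pd.Timedelta(days=max_age_days)
    return df[df["timestamp"] >= cutoff]

# Usage (hypothetical): retrain on the pruned frame instead of the full history.
# recent = prune_stale_rows(training_df, max_age_days=180)
```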
Training pipelines and other MLOps techniques are useful for monitoring drift and retraining the model on augmented data once the measured drift exceeds a specified threshold.
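As an illustration of what such a monitoring step might compute, here is a sketch of the population stability index (PSI) with a retrain trigger; the 0.2 threshold is a common rule of thumb, and `retrain_model` is a hypothetical hook, neither comes from the source:

```python
import numpy as np

def population_stability_index(reference: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """PSI for a single feature; larger values indicate a larger shift between the two samples."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Clip to avoid log(0) in sparsely populated bins; production values outside the
    # reference range are ignored in this simple sketch.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

# Hypothetical check inside a scheduled monitoring job:
# if population_stability_index(train_feature, live_feature) > 0.2:
#     retrain_model()  # kick off the training pipeline on refreshed data
```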
## Source
![[Sources/Videos/FourthBrain - ML Model Drift and Decay.md#^es3vfo]]
![[Sources/Videos/FourthBrain - ML Model Drift and Decay.md#^q8fzfd]]
![[Sources/Videos/FourthBrain - ML Model Drift and Decay.md#^1w22m3]]
![[Sources/Videos/FourthBrain - ML Model Drift and Decay.md#^tcwz2j]]
![[Sources/Videos/FourthBrain - ML Model Drift and Decay.md#^q52i2i]]