Manifold Learning
A non-linear dimensionality reduction technique.
The curse of dimensionality
When it comes to high-dimensional data, that is, a data set with lots of features, one of the first things that might pop into our minds is how to visualize it. However, visualization is not the only challenge such data poses.
As we increase the number of dimensions, the volume of the space grows exponentially, so a fixed amount of training data becomes increasingly sparse. With a limited training data size, the training instances are likely to end up far from each other. Moreover, any new instance might be far from the training points, resulting in a high risk of overfitting and unreliable predictions.
One simple idea is to increase the number of training instances until they fill the space and are spread uniformly across all dimensions. To see why this is not feasible, imagine a dataset with 100 features.
With just 100 features (significantly fewer than in the MNIST problem), you would need more training instances than there are atoms in the observable universe for the training instances to be within 0.1 of each other on average, assuming they were spread out uniformly across all dimensions.
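Here is a rough back-of-the-envelope sketch of that claim in Python (the 0.1 spacing and the commonly quoted ~10^80 estimate for the number of atoms are illustrative assumptions, not exact figures):

# Rough estimate: to space points about 0.1 apart along each axis of the
# unit hypercube, you need roughly 10 points per dimension, so 10**100
# points in total for 100 dimensions.
points_per_dimension = 1 / 0.1      # ~10 points along each axis
dimensions = 100
points_needed = points_per_dimension ** dimensions
atoms_in_universe = 1e80            # order-of-magnitude estimate, for comparison
print(f"Points needed: {points_needed:.0e}")                    # ~1e+100
print(f"Atoms in observable universe: {atoms_in_universe:.0e}")  # ~1e+80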
Generally, there are two main approaches to dimensionality reduction: Projection and Manifold Learning.
Most of the time, training instances lie close to a much lower-dimensional subspace of the high-dimensional space. The idea behind Projection is to find the best such subspace and project each data instance onto it. One of the most common projection techniques is PCA (Principal Component Analysis).
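As a minimal sketch, here is how a projection with scikit-learn's PCA might look on some synthetic 3-D data (the data itself is made up purely for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Toy data: 1000 points in 3-D that lie close to a 2-D plane.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(1000, 3))

# Project onto the 2-D subspace that captures the most variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (1000, 2)
print(pca.explained_variance_ratio_)  # most of the variance sits in two components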
However, Projection is not always the best approach, as in many cases the subspace twists and turns. One famous example is the Swiss roll toy data set.
A d-dimensional manifold is a part of an n-dimensional space (with d < n) that locally resembles a d-dimensional hyperplane. (In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space.) Based on this definition, a Swiss roll is a 2-dimensional plane that has been rolled up in the third dimension.
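You can generate this toy data set with scikit-learn's make_swiss_roll; here is a minimal sketch (the sample size and noise level are arbitrary choices):

from sklearn.datasets import make_swiss_roll

# Generate 1000 points lying on a 2-D manifold rolled up in 3-D space.
X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)
print(X.shape)      # (1000, 3) -- three coordinates per point
print(color.shape)  # (1000,)   -- position along the roll, handy for colouring plots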
Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data.
The general idea behind Manifold Learning is to model the manifold on which the training points reside. It takes an unsupervised approach that learns the high-dimensional structure of the data from the data itself.
In the sections below, we talk about some of the common manifold learning techniques:
Locally Linear Embedding (LLE)
Locally Linear Embedding (LLE) seeks a lower-dimensional projection of the data that preserves distances within local neighbourhoods. This makes it particularly good at unrolling twisted manifolds, especially when there is not too much noise.
Note that in the scatter plot, the original data has 3 dimensions but the new representation has only 2 dimensions.
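A minimal sketch of unrolling the Swiss roll with scikit-learn's LocallyLinearEmbedding (the number of neighbours here is an illustrative choice, not a tuned value):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# Unroll the 3-D Swiss roll into 2-D while preserving local relationships.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)
print(X_unrolled.shape)  # (1000, 2)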
Isomap
Isomap seeks a lower-dimensional embedding that maintains geodesic distances between all points.
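A similar sketch with scikit-learn's Isomap (again, the parameter values are only illustrative):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# Embed into 2-D while trying to preserve geodesic (along-the-manifold) distances.
isomap = Isomap(n_neighbors=10, n_components=2)
X_embedded = isomap.fit_transform(X)
print(X_embedded.shape)  # (1000, 2)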
Multi-dimensional Scaling (MDS)
Multidimensional Scaling is a statistical method that creates a mapping of objects in a geometric space based on their dissimilarities or distances. Therefore, the more similar the objects are, the closer they appear on the visualisation graph.
It can also find a lower-dimensional representation of the data in which the new distances respect the distances in the original high-dimensional space well.
There are two types of MDS algorithms: metric and non-metric.
- Metric MDS (also known as Principal Coordinate Analysis): the new distances are set to be as close as possible to the similarities or dissimilarities of the original data.
- Non-metric MDS: while metric MDS applies to interval-scale data only, non-metric MDS is used for ordinal data. In ordinal data, a pair of points with a dissimilarity of 2 isn't twice as dissimilar as a pair with a dissimilarity of 1. Therefore, non-metric MDS tries to preserve the order of the distances instead of the relative distances.
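A minimal sketch of both variants using scikit-learn's MDS class (metric=True vs. metric=False; the sample size is kept small only because MDS scales poorly with the number of points):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS

X, color = make_swiss_roll(n_samples=500, noise=0.05, random_state=42)

# Metric MDS: tries to preserve the original pairwise distances.
metric_mds = MDS(n_components=2, metric=True, random_state=42)
X_metric = metric_mds.fit_transform(X)

# Non-metric MDS: only tries to preserve the ordering of the distances.
nonmetric_mds = MDS(n_components=2, metric=False, random_state=42)
X_nonmetric = nonmetric_mds.fit_transform(X)
print(X_metric.shape, X_nonmetric.shape)  # (500, 2) (500, 2)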
For more information on MDS, I highly suggest reading https://towardsdatascience.com/multidimensional-scaling-d84c2a998f72.
Below is an illustration of dimensionality reduction on the swiss roll dataset with common manifold learning methods:
Note that in Manifold learning there is usually an implicit assumption that
the task at hand (e.g., classification or regression) will be simpler if expressed in the lower-dimensional space. In general, reducing the dimensionality of your training set before training a model might speed up training and result in better solutions, but this is not always the case; it all depends on the dataset. In figure 7, you can observe that in the 3D space (left), the decision boundary would be complex, but in the 2D unrolled manifold space (right), the decision boundary is a straight line.
However, in figure 8, while the Swiss roll in 3D space (left) can be easily split into two classes by a 2D hyperplane, in the 2D unrolled manifold space (right) the decision boundary is more complex.
Summary:
In this article, we talked about what Manifold Learning is, how it differs from Projection, and looked at a few manifold learning implementations that are available in scikit-learn. For more information on dimensionality reduction, I highly suggest reading the dimensionality reduction chapter of 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by O'Reilly.
References:
https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282