ABI1 : High-Dimensional Data Analysis ~ Dimension Reduction Techniques

ABI1 : High-Dimensional Data Analysis ~ Dimension Reduction Techniques

- 1 min
statistics high-dimensional feature-selection overfitting

The curse of dimensionality phenomena is a problem that arises when a high-dimnesional dataset results in sparsity, where the volume of the space represented grows quickly. Unless data size grows exponentially in relation to the variables, classification and prediction models will fail, because the current data is insuffcient to provide a useful model across so many variables.

A machine learning model is a function 𝑓(x; 𝜃) where 𝜃 is a vector of model parameters (e.g. the weights of a linear regression model or the encoded splits of a decision tree model). This function accepts an input vector x ∈ R 𝑝 and produces a real output (for regression) or probability of being in a specific class (for classification). Mathematical optimisation is the essential foundation of machine learning and AI and any optimization problem generally consists of 3 components :

1) Objective function

2) Constraints

3) Hypothesis space

It usually takes the form of min f(X); s.t g(X) <=0 ; h(X) =0

The main steps of a ml problem is to build a model hypothesis, define the objective function and solve it through optimization methods to determine the parameters of the model.

Check me out for the full-fledged listing on regression algorithms for analysis and their explanation