Advances in data collection and storage capabilities have led to an information overload in most sciences. The resulting large datasets present new challenges in data analysis. Traditional statistical methods break down partly because of the increase in the number of observations, but mostly because of the increase in the number of variables associated with each observation. The dimension of the data is the number of variables that are measured on each observation. One of the problems with high-dimensional datasets is that, in many cases, not all the measured variables are “important” for understanding the underlying phenomena of interest. In many applications it is therefore useful to reduce the dimension of the original data before modeling it.

Principal component analysis (PCA) is a way of identifying patterns in data and re-expressing the data so as to highlight their similarities and differences. Because patterns can be hard to find in high-dimensional data, PCA is a powerful tool for analyzing such data. Its other main advantage is that, once these patterns have been found, the data can be compressed by reducing the number of dimensions without much loss of information.
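
The idea of compressing data by dropping dimensions can be illustrated with a minimal sketch in plain Python. It is not a general implementation: it handles only the two-dimensional case, where the eigenvalues of the covariance matrix have a closed form. The ten data points and the function names (pca_2d, project, reconstruct) are assumptions introduced for illustration and are not taken from the text.

```python
import math

def pca_2d(points):
    """Return (mean, eigenvalues, first principal axis) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector belonging to the larger eigenvalue lam1.
    if sxy != 0:
        vx, vy = lam1 - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (lam1, lam2), (vx / norm, vy / norm)

def project(points, mean, axis):
    """Reduce each 2-D point to one score along the principal axis."""
    return [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]
            for p in points]

def reconstruct(scores, mean, axis):
    """Map the 1-D scores back to approximate 2-D points."""
    return [(mean[0] + s * axis[0], mean[1] + s * axis[1]) for s in scores]

# Illustrative data (an assumption): two strongly correlated variables.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

mean, (lam1, lam2), axis = pca_2d(points)
scores = project(points, mean, axis)        # compressed: one number per point
approx = reconstruct(scores, mean, axis)    # decompressed approximation

# Share of the total variance kept by the first principal component.
explained = lam1 / (lam1 + lam2)
# Total squared reconstruction error; equals the discarded variance times (n - 1).
residual = sum((p[0] - a[0]) ** 2 + (p[1] - a[1]) ** 2
               for p, a in zip(points, approx))
print(f"variance kept: {explained:.3f}, squared error: {residual:.3f}")
```

Here each observation is stored as a single score instead of two coordinates. The ratio `explained` measures how much of the original variance survives the reduction, which is one way to make "without much loss of information" concrete.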