Introduction to Principal Component Analysis (PCA)

Prev Tutorial: Support Vector Machines for Non-Linearly Separable Data

Goal

In this tutorial you will learn how to:

  • Use the OpenCV class cv::PCA to calculate the orientation of an object.

What is PCA?

Principal Component Analysis (PCA) is a statistical procedure that extracts the most important features of a dataset.

[Figure 1: a set of 2D points whose linear pattern is indicated by a blue line]

Consider that you have a set of 2D points, as shown in the figure above. Each dimension corresponds to a feature you are interested in. Some might argue that the points are placed at random. However, if you look more closely, you will see that there is a linear pattern (indicated by the blue line) which is hard to dismiss. A key concept in PCA is dimensionality reduction: the process of reducing the number of dimensions of a given dataset. For example, in the case above it is possible to approximate the set of points by a single line and therefore reduce the dimensionality of the given points from 2D to 1D.

Moreover, you can also see that the points vary the most along the blue line, more than they vary along either the Feature 1 or the Feature 2 axis. This means that knowing the position of a point along the blue line gives you more information about it than knowing only its position on the Feature 1 or Feature 2 axis.

Hence, PCA allows us to find the direction along which our data varies the most. In fact, the result of running PCA on the set of points in the diagram consists of two vectors called eigenvectors, which are the principal components of the data set.

[Figure 2: the two eigenvectors of the point set, drawn from the center of the points and scaled by their eigenvalues]

The size of each eigenvector is encoded in the corresponding eigenvalue, which indicates how much the data vary along that principal component. The eigenvectors begin at the center of all points in the data set. Applying PCA to an N-dimensional data set yields N N-dimensional eigenvectors, N eigenvalues, and one N-dimensional center point. Enough theory, let's see how we can put these ideas into code.
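As a quick preview before the theory, OpenCV packages the whole computation in the cv::PCA class. The following is a minimal sketch; the data values are made up purely for illustration:

    #include <opencv2/core.hpp>
    #include <iostream>

    int main()
    {
        // Toy data set: 5 observations of 2 features, one observation per row.
        cv::Mat data = (cv::Mat_<double>(5, 2) << 1.0, 2.1,
                                                  2.0, 3.9,
                                                  3.0, 6.2,
                                                  4.0, 8.1,
                                                  5.0, 9.8);
        // PCA::DATA_AS_ROW: each row is one observation;
        // passing an empty Mat as the mean lets OpenCV compute it.
        cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW);
        std::cout << "mean:\n"         << pca.mean         << std::endl;
        std::cout << "eigenvectors:\n" << pca.eigenvectors << std::endl;
        std::cout << "eigenvalues:\n"  << pca.eigenvalues  << std::endl;
        return 0;
    }

cv::PCA returns the eigenvectors one per row, sorted by decreasing eigenvalue, so the first row is the direction along which the data vary the most.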

How are the eigenvectors and eigenvalues computed?

The goal is to transform a given data set X of dimension p to an alternative data set Y of smaller dimension L. Equivalently, we are seeking to find the matrix Y, where Y is the Karhunen–Loève transform (KLT) of matrix X:

\[ \mathbf{Y} = \operatorname{KLT}\{\mathbf{X}\} \]

Organize the data set

Suppose you have data comprising a set of observations of p variables, and you want to reduce the data so that each observation can be described with only L variables, L < p. Suppose further that the data are arranged as a set of n data vectors \( x_1...x_n \), with each \( x_i \) representing a single grouped observation of the p variables.

  • Write \( x_1...x_n \) as row vectors, each of which has p columns.
  • Place the row vectors into a single matrix X of dimensions \( n\times p \).
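In OpenCV terms, this is simply an \( n\times p \) cv::Mat with one observation per row. A sketch for the 2D-point case follows; the helper name organizeDataSet is ours, not OpenCV's:

    #include <opencv2/core.hpp>
    #include <vector>

    // Arrange n 2D points as an n x p data matrix (here p = 2), one observation per row.
    cv::Mat organizeDataSet(const std::vector<cv::Point>& pts)
    {
        cv::Mat X(static_cast<int>(pts.size()), 2, CV_64F);
        for (int i = 0; i < X.rows; i++)
        {
            X.at<double>(i, 0) = pts[i].x;  // Feature 1
            X.at<double>(i, 1) = pts[i].y;  // Feature 2
        }
        return X;
    }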

Calculate the empirical mean

  • Find the empirical mean along each dimension \( j = 1, ..., p \).
  • Place the calculated mean values into an empirical mean vector u of dimensions \( p\times 1 \).

    \[ \mathbf{u[j]} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X[i,j]} \]
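With OpenCV's Mat arithmetic this step is a one-liner. A sketch, assuming the \( n\times p \) data matrix X from above (the helper name is ours; note the result is returned as a \( 1\times p \) row, i.e. the transpose of the vector u in the text):

    #include <opencv2/core.hpp>

    // Empirical mean: average each column of the n x p data matrix X.
    cv::Mat empiricalMean(const cv::Mat& X)
    {
        cv::Mat u;  // 1 x p row vector of per-dimension means
        cv::reduce(X, u, 0, cv::REDUCE_AVG, CV_64F);
        return u;
    }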

Calculate the deviations from the mean

Mean subtraction is an integral part of the solution towards finding a principal component basis that minimizes the mean square error of approximating the data. Hence, we proceed by centering the data as follows:

  • Subtract the empirical mean vector u from each row of the data matrix X.
  • Store mean-subtracted data in the \( n\times p \) matrix B.

    \[ \mathbf{B} = \mathbf{X} - \mathbf{h}\mathbf{u^{T}} \]

    where h is an \( n\times 1 \) column vector of all 1s:

    \[ h[i] = 1, i = 1, ..., n \]
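A sketch of the centering step under the same conventions (u is the \( 1\times p \) mean row from the previous step; cv::repeat plays the role of the product h u^T):

    #include <opencv2/core.hpp>

    // Center the data: B = X - h * u^T, i.e. subtract the mean row from every row of X.
    cv::Mat centerData(const cv::Mat& X, const cv::Mat& u)
    {
        return X - cv::repeat(u, X.rows, 1);
    }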

Find the covariance matrix

  • Find the \( p\times p \) empirical covariance matrix C from the outer product of matrix B with itself:

    \[ \mathbf{C} = \frac{1}{n-1} \mathbf{B^{*}} \cdot \mathbf{B} \]

    where * is the conjugate transpose operator. Note that if B consists entirely of real numbers, which is the case in many applications, the "conjugate transpose" is the same as the regular transpose.
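For real-valued data the conjugate transpose reduces to B.t() in OpenCV. A sketch (helper name ours):

    #include <opencv2/core.hpp>

    // Empirical covariance of the centered n x p matrix B (real data, so B* = B^T).
    cv::Mat covarianceMatrix(const cv::Mat& B)
    {
        return (B.t() * B) / (B.rows - 1);  // p x p matrix C
    }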

Find the eigenvectors and eigenvalues of the covariance matrix

  • Compute the matrix V of eigenvectors which diagonalizes the covariance matrix C:

    \[ \mathbf{V^{-1}} \mathbf{C} \mathbf{V} = \mathbf{D} \]

    where D is the diagonal matrix of eigenvalues of C.

  • Matrix D will take the form of a \( p \times p \) diagonal matrix:

    \[ D[k,l] = \begin{cases} \lambda_k, & k = l \\ 0, & k \neq l \end{cases} \]

    here, \( \lambda_k \) is the k-th eigenvalue of the covariance matrix C.

  • Matrix V, also of dimension \( p \times p \), contains p column vectors, each of length p, which represent the p eigenvectors of the covariance matrix C.
  • The eigenvalues and eigenvectors are ordered and paired: the j-th eigenvalue corresponds to the j-th eigenvector.
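OpenCV can carry out this decomposition directly with cv::eigen(), which requires a symmetric input such as the covariance matrix C. Consistent with the pairing described above, it returns the eigenvalues in descending order with the corresponding eigenvectors stored as rows:

    #include <opencv2/core.hpp>

    // Eigen-decomposition of the symmetric covariance matrix C.
    void eigenDecomposition(const cv::Mat& C, cv::Mat& eigenvalues, cv::Mat& eigenvectors)
    {
        cv::eigen(C, eigenvalues, eigenvectors);
    }

Chaining the helpers sketched above, eigenDecomposition(covarianceMatrix(centerData(X, empiricalMean(X))), vals, vecs) reproduces, up to storage conventions, what cv::PCA computes internally.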
Note
Sources: [1] and [2]; special thanks to Svetlin Penkov for the original tutorial.

Source Code

Note
Another example using PCA for dimensionality reduction while maintaining an amount of variance can be found at opencv_source_code/samples/cpp/pca.cpp

Explanation

  • Read image and convert it to binary

Here we apply the pre-processing needed to be able to detect the objects of interest.
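A sketch of what this step might look like (the input path is a placeholder; the fixed threshold of 50 is overridden by Otsu's method anyway):

    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/highgui.hpp>

    using namespace cv;

    int main(int argc, char** argv)
    {
        // Load the input image (path is a placeholder for your own file).
        Mat src = imread(argc > 1 ? argv[1] : "input.jpg");
        if (src.empty()) return -1;
        // Convert to grayscale, then binarize; Otsu's method picks the threshold.
        Mat gray, bw;
        cvtColor(src, gray, COLOR_BGR2GRAY);
        threshold(gray, bw, 50, 255, THRESH_BINARY | THRESH_OTSU);
        imshow("binary", bw);
        waitKey();
        return 0;
    }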

  • Extract objects of interest

Then we find the contours, filter them by size, and obtain the orientation of the remaining ones.
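A sketch of this step, assuming the binary image bw from above (the area bounds are illustrative and depend on the image; getOrientation() is sketched in the next step):

    #include <opencv2/imgproc.hpp>
    #include <vector>

    using namespace cv;

    // Find contours in the binary image and keep only reasonably-sized ones.
    // bw is taken by value because some OpenCV versions modify the input image.
    void processContours(Mat bw, Mat& drawing)
    {
        std::vector<std::vector<Point> > contours;
        findContours(bw, contours, RETR_LIST, CHAIN_APPROX_NONE);
        for (size_t i = 0; i < contours.size(); i++)
        {
            double area = contourArea(contours[i]);
            if (area < 1e2 || area > 1e5) continue;  // skip too-small/too-large blobs
            drawContours(drawing, contours, static_cast<int>(i), Scalar(0, 0, 255), 2);
            // getOrientation(contours[i], ...);  // PCA step, sketched below
        }
    }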

  • Extract orientation

The orientation is extracted by calling the getOrientation() function, which performs the whole PCA procedure.

First, the data need to be arranged in a matrix of size \( n \times 2 \), where n is the number of data points we have. Then we can perform the PCA analysis. The calculated mean (i.e. the center of mass) is stored in the cntr variable, and the eigenvectors and eigenvalues are stored in the corresponding std::vector's.
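The original function is not reproduced here, but a sketch of the PCA core it performs, under the conventions just described, might look like this (the output parameters are one possible interface):

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    using namespace cv;

    // Run PCA on a contour's points and return the orientation angle (radians)
    // of the first principal component.
    double getOrientation(const std::vector<Point>& pts, Point& cntr,
                          std::vector<Point2d>& eigen_vecs, std::vector<double>& eigen_val)
    {
        // Arrange the data in an n x 2 matrix, one point per row.
        Mat data_pts(static_cast<int>(pts.size()), 2, CV_64F);
        for (int i = 0; i < data_pts.rows; i++)
        {
            data_pts.at<double>(i, 0) = pts[i].x;
            data_pts.at<double>(i, 1) = pts[i].y;
        }
        // Perform the PCA analysis.
        PCA pca(data_pts, Mat(), PCA::DATA_AS_ROW);
        // The mean is the center of mass of the points.
        cntr = Point(static_cast<int>(pca.mean.at<double>(0, 0)),
                     static_cast<int>(pca.mean.at<double>(0, 1)));
        // Store the two eigenvectors and their eigenvalues.
        eigen_vecs.assign(2, Point2d());
        eigen_val.assign(2, 0.0);
        for (int i = 0; i < 2; i++)
        {
            eigen_vecs[i] = Point2d(pca.eigenvectors.at<double>(i, 0),
                                    pca.eigenvectors.at<double>(i, 1));
            eigen_val[i] = pca.eigenvalues.at<double>(i);
        }
        // Orientation: angle of the first principal component.
        return std::atan2(eigen_vecs[0].y, eigen_vecs[0].x);
    }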

  • Visualize result

The final result is visualized through the drawAxis() function, where the principal components are drawn as lines: each eigenvector is multiplied by its eigenvalue and translated to the mean position.
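A sketch of a drawAxis()-style helper (the scale factor and arrow-hook length are illustrative choices to keep short axes visible):

    #include <opencv2/imgproc.hpp>
    #include <cmath>

    using namespace cv;

    // Draw a principal component as a scaled arrow from p toward q.
    void drawAxis(Mat& img, Point p, Point q, Scalar colour, float scale = 0.2f)
    {
        double angle = std::atan2((double)(p.y - q.y), (double)(p.x - q.x));
        double len = std::sqrt((double)((p.y - q.y) * (p.y - q.y)
                                      + (p.x - q.x) * (p.x - q.x)));
        // Lengthen the arrow by the scale factor.
        q.x = (int)(p.x - scale * len * std::cos(angle));
        q.y = (int)(p.y - scale * len * std::sin(angle));
        line(img, p, q, colour, 1, LINE_AA);
        // Two hooks at the arrow tip.
        p.x = (int)(q.x + 9 * std::cos(angle + CV_PI / 4));
        p.y = (int)(q.y + 9 * std::sin(angle + CV_PI / 4));
        line(img, p, q, colour, 1, LINE_AA);
        p.x = (int)(q.x + 9 * std::cos(angle - CV_PI / 4));
        p.y = (int)(q.y + 9 * std::sin(angle - CV_PI / 4));
        line(img, p, q, colour, 1, LINE_AA);
    }

It might be called with the center and an endpoint offset by eigenvector times eigenvalue, for example drawAxis(img, cntr, cntr + Point(static_cast<int>(0.02 * eigen_vecs[0].x * eigen_val[0]), static_cast<int>(0.02 * eigen_vecs[0].y * eigen_val[0])), Scalar(0, 255, 0), 1).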

Results

The code opens an image, finds the orientation of the detected objects of interest, and then visualizes the result by drawing the contours of the detected objects, the center point, and the x and y axes of the extracted orientation.

[Result image 1: detected objects with their contours, center points, and orientation axes drawn]

[Result image 2: a second example of the visualized orientation]
