Principal Component Analysis

Last updated on: 23.01.2026

Definition

Principal component analysis (PCA) is a statistical-mathematical procedure for analyzing and interpreting very large genetic data sets. It is a dimension-reducing method for high-dimensional genetic, transcriptomic or epigenetic data that identifies and visualizes the main patterns of biological variation. PCA condenses the information from

  • many genes
  • many SNPs
  • many methylation sites
  • transcriptome data

into a few meaningful dimensions.
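A minimal sketch of this dimension reduction, assuming NumPy and a numeric matrix with samples as rows and features (genes, SNPs, methylation sites) as columns. It obtains the principal components from the singular value decomposition (SVD) of the centered matrix, which is one standard way to compute PCA; the function name `pca` and the random data are hypothetical and serve only as an illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X (samples x features) onto the first principal components.

    Returns (scores, components, explained_variance).
    """
    X = np.asarray(X, dtype=float)
    # Center every feature (gene, SNP, methylation site) to mean zero
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]        # one row of feature weights per PC
    scores = Xc @ components.T            # coordinates of each sample (equals U * S)
    # Variance of the samples along each PC, largest first
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Hypothetical random data: 20 samples x 1,000 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1000))
scores, components, var = pca(X, n_components=2)
print(scores.shape)  # (20, 2)
```

The 1,000 gene values per sample are reduced to two coordinates, which can then be plotted as PC1 against PC2.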

General information

Genetic data are extremely complex: a single sample may comprise around 20,000 genes, many of these genes correlate with one another, and the differences between samples are often subtle. Suitable algorithms can extract the common patterns in such data; PCA identifies the main axes of variation.

These axes are called principal components:

  • PC1 = direction of the largest variance in the data
  • PC2 = direction of the second-largest variance, uncorrelated with PC1
  • PC3 and so on
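In the standard formulation, the principal components are the eigenvectors of the covariance matrix of the centered data, ordered by the variance they capture. The notation below (X, C, v_k, λ_k, t_k) is introduced here for illustration and is not taken from the source.

```latex
% X: n x p data matrix (n samples, p genes or SNPs), each column centered to mean 0
C = \frac{1}{n-1} X^\top X
% Principal axes: eigenvectors of C, sorted by decreasing eigenvalue, mutually orthogonal
C\, v_k = \lambda_k v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0, \qquad v_j^\top v_k = \delta_{jk}
% PC1 is the unit direction of largest variance
v_1 = \arg\max_{\lVert w \rVert = 1} w^\top C\, w
% Scores of the samples on PC k and their variance
t_k = X v_k, \qquad \operatorname{Var}(t_k) = \frac{t_k^\top t_k}{n-1} = v_k^\top C\, v_k = \lambda_k
% Fraction of the total variance explained by PC k
\mathrm{EVR}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
```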

Each component is a linear combination, i.e. a weighted sum, of many genes or SNPs.
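The following minimal sketch, assuming NumPy, shows what "combination" means in practice: each PC assigns one weight (loading) to every feature, and a sample's PC score is the weighted sum of its centered feature values. The function `top_loadings`, the gene names `GENE_A` to `GENE_D` and the simulated values are hypothetical.

```python
import numpy as np

def top_loadings(X, feature_names, pc=0, n_top=3):
    """Features with the largest absolute weight on one PC, plus the sample scores."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    weights = Vt[pc]                               # one weight per feature, unit length
    order = np.argsort(-np.abs(weights))[:n_top]   # largest absolute weights first
    scores = Xc @ weights                          # weighted sum over all features
    return [(feature_names[i], float(weights[i])) for i in order], scores

# Hypothetical data: GENE_A and GENE_B follow a shared signal in opposite
# directions, GENE_C and GENE_D are independent noise
rng = np.random.default_rng(1)
signal = rng.normal(size=30)
X = np.column_stack([
    3 * signal + rng.normal(scale=0.1, size=30),
    -3 * signal + rng.normal(scale=0.1, size=30),
    rng.normal(scale=0.5, size=30),
    rng.normal(scale=0.5, size=30),
])
top, scores = top_loadings(X, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"], n_top=2)
for name, weight in top:
    print(name, round(weight, 2))
```

GENE_A and GENE_B should receive weights of similar magnitude and opposite sign, because PC1 follows the signal they share; the noise genes contribute little.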

In a typical PCA plot:

  • PC1 is shown on the x-axis
  • PC2 is shown on the y-axis

Samples with similar genetic profiles lie close together, while distinct groups are separated.

Occurrence

Applications in genetics and molecular biology

Population genetics

  • Genetic ancestry
  • Ancestry clusters (e.g. European/Asian/African)
  • Control of population stratification, a source of bias in association studies
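One common way of using PCA to control for population stratification is to include the top genotype PCs as covariates in the association model. A minimal sketch, assuming NumPy, genotypes coded as allele counts 0/1/2, a quantitative phenotype and ordinary least squares; the function `snp_effect` and the simulated populations are hypothetical and do not describe any particular GWAS software.

```python
import numpy as np

def snp_effect(G, y, snp, n_pcs=2):
    """OLS effect of one SNP on a quantitative phenotype, adjusted for the top genotype PCs.

    G: samples x SNPs matrix of allele counts (0/1/2); y: phenotype per sample.
    n_pcs=0 gives the unadjusted estimate.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each SNP; monomorphic SNPs (sd 0) are left at zero
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (G - G.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_pcs] * S[:n_pcs]            # sample scores on the top PCs (ancestry axes)
    # Design matrix: intercept, tested SNP, PC covariates
    design = np.column_stack([np.ones(len(y)), G[:, snp], pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(beta[1])                     # coefficient of the tested SNP

# Hypothetical simulation: two ancestry groups with different allele frequencies;
# the phenotype depends on ancestry only, not on any SNP
rng = np.random.default_rng(42)
pop = np.repeat([0, 1], 100)
p0 = rng.uniform(0.1, 0.5, size=300)
P = np.where(pop[:, None] == 0, p0, p0 + 0.3)   # 200 x 300 allele frequencies
G = rng.binomial(2, P)
y = 3.0 * pop + rng.normal(size=200)

naive = snp_effect(G, y, snp=0, n_pcs=0)
adjusted = snp_effect(G, y, snp=0, n_pcs=2)
print(f"unadjusted: {naive:.2f}  adjusted for 2 PCs: {adjusted:.2f}")
```

Without the PCs, SNP 0 appears associated with the phenotype only because its allele frequency differs between the groups; with the PCs as covariates, the estimate should move close to zero.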

RNA sequencing

Among other things, PCA of RNA-seq data can separate:

  • diseased vs. healthy samples
  • therapy responders vs. non-responders
  • technical batches (quality control for "batch effects")
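A minimal sketch of a typical RNA-seq PCA workflow, assuming NumPy and a raw count matrix with samples as rows. The preprocessing steps (counts-per-million normalization, log2 transform, restriction to the 500 most variable genes) are common choices but are assumptions here, not a prescribed protocol; the function `rnaseq_pca` and the simulated batch effect are hypothetical.

```python
import numpy as np

def rnaseq_pca(counts, n_top_genes=500, n_components=2):
    """PCA of an RNA-seq count matrix (samples x genes) after a simple normalization."""
    counts = np.asarray(counts, dtype=float)
    # Counts per million: corrects for different sequencing depth per sample
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    logexpr = np.log2(cpm + 1.0)                 # log transform dampens very high counts
    # Keep the most variable genes, which carry most of the structure
    top = np.argsort(-logexpr.var(axis=0))[:n_top_genes]
    Xc = logexpr[:, top] - logexpr[:, top].mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)          # fraction of variance per PC
    return scores, explained[:n_components]

# Hypothetical data: 12 samples x 2,000 genes; samples 7-12 come from a second
# processing batch in which 100 genes are technically inflated 4-fold
rng = np.random.default_rng(3)
base = rng.uniform(10, 1000, size=2000)
lam = np.tile(base, (12, 1))
lam[6:, :100] *= 4.0
counts = rng.poisson(lam)

scores, explained = rnaseq_pca(counts)
print(np.round(scores[:, 0], 1))
print(np.round(explained, 2))
```

If PC1 splits the samples by processing batch rather than by biology, as in this simulation, this points to a batch effect that should be addressed before further analysis.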

Epigenetics

  • DNA methylation profiles
  • Epigenetic ageing
  • Tumor subtypes

Example (dermatology/immunology): In inflammatory skin diseases such as psoriasis and atopic dermatitis, distinct molecular profiles can be identified: a Th1/Th17-dominated pattern (IFN signature) in psoriasis versus a Th2-associated IL4/IL13 signature in atopic dermatitis.

Principal component analysis (PCA) shows different molecular expression patterns in atopic dermatitis and psoriasis, e.g.

  • IL17A (psoriasis high/atopic dermatitis low)
  • IFNG (psoriasis medium/atopic dermatitis low)
  • IL4 (psoriasis low/atopic dermatitis high)
  • IL13 (psoriasis low/atopic dermatitis high)

PCA thus separates the samples into two clusters without using the diagnoses; these clusters can then be assigned to psoriasis or atopic dermatitis on the basis of the known clinical data. PCA is therefore not a causal analysis but an exploratory one: it reveals structures, not causes.
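A toy illustration of this point, assuming NumPy. The expression values are invented to mimic the qualitative high/medium/low pattern of IL17A, IFNG, IL4 and IL13 listed above; they are not data from the cited studies. PCA is computed on the expression matrix alone, and the diagnosis labels are consulted only afterwards to name the two clusters.

```python
import numpy as np

genes = ["IL17A", "IFNG", "IL4", "IL13"]
# Hypothetical log-expression levels mimicking the pattern listed above
# (high = 8, medium = 5, low = 2); these are NOT measured values
pattern = {
    "psoriasis": [8.0, 5.0, 2.0, 2.0],
    "atopic dermatitis": [2.0, 2.0, 8.0, 8.0],
}
labels = ["psoriasis"] * 5 + ["atopic dermatitis"] * 5
rng = np.random.default_rng(7)
X = np.array([pattern[d] for d in labels]) + rng.normal(scale=0.5, size=(10, 4))

# PCA on the expression matrix only; the labels are not used here
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = U[:, 0] * S[0]
groups = np.where(pc1 > 0, "cluster 1", "cluster 2")

# Only now compare the unsupervised clusters with the known diagnoses
for g in ("cluster 1", "cluster 2"):
    members = {lab for lab, grp in zip(labels, groups) if grp == g}
    print(g, sorted(members))
for gene, w in zip(genes, Vt[0]):
    print(f"{gene}: {w:+.2f}")
```

The PC1 loadings of IL17A and IFNG carry the opposite sign to those of IL4 and IL13, so the axis reflects the contrast between the two signatures; why the signatures differ is not something PCA can answer.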

PCA algorithms and implementations can also be applied to large single-cell RNA-seq (scRNA-seq) datasets (Tsuyuzaki K et al. 2020).
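For matrices with very many cells, an exact SVD becomes expensive, and approximate methods are often used instead. The following minimal NumPy sketch shows one such technique, randomized SVD with subspace iterations (in the style of Halko, Martinsson and Tropp); it is an illustration of the general idea, not one of the specific implementations benchmarked by Tsuyuzaki et al. The function name, its parameters and the simulated matrix are hypothetical.

```python
import numpy as np

def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the leading principal components with a randomized SVD.

    Requires n_components + n_oversamples <= min(X.shape).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    k = n_components + n_oversamples
    # Random projection: the columns of Y roughly span the dominant column space
    Y = Xc @ rng.normal(size=(Xc.shape[1], k))
    # Subspace (power) iterations sharpen the separation of the leading components
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Y = Xc @ Z
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis, n_samples x k
    # Exact SVD of the small k x n_features matrix Q^T Xc
    Ub, S, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    scores = (Q @ Ub[:, :n_components]) * S[:n_components]
    return scores, Vt[:n_components], S[:n_components]

# Hypothetical low-rank data standing in for a cells x genes matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 500)) + 0.01 * rng.normal(size=(2000, 500))
scores, components, S_approx = randomized_pca(X, n_components=5)
S_exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(np.round(S_approx / S_exact[:5], 6))
```

Only a small k x n_features matrix is decomposed exactly, which is what makes this approach attractive for large single-cell matrices; the oversampling and iteration counts trade speed against accuracy.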

Literature

  1. Ben Salem K et al. (2021) Principal Component Analysis (PCA). Tunis Med 99:383-389.
  2. Moldovan LI et al. (2021) Characterization of circular RNA transcriptomes in psoriasis and atopic dermatitis reveals disease-specific expression profiles. Exp Dermatol 30:1187-1196.
  3. Pardo LM et al. (2020) Principal component analysis of seven skin-ageing features identifies three main types of skin ageing. Br J Dermatol 182:1379-1387.
  4. Traks T et al. (2024) High-throughput proteomic analysis of chronic inflammatory skin diseases: Psoriasis and atopic dermatitis. Exp Dermatol 33:e15079.
  5. Tsuyuzaki K et al. (2020) Benchmarking principal component analysis for large-scale single-cell RNA-sequencing. Genome Biol 21:9.
