Nucleic Acids Res. 2008 Nov 25;37(1):1–13. doi: 10.1093/nar/gkn923

Table 2.

Categorization of enrichment analysis tools

Tool category	Description	Indication and limitation	Sub-type of algorithms	Methods	Example tool
Class I: singular enrichment analysis (SEA)	An enrichment P-value is calculated for each term from the pre-selected list of interesting genes, and the enriched terms are then listed in a simple linear text format. This is the most traditional strategy and is still used by most enrichment analysis tools.	Capable of analyzing any gene list, which may be selected from any high-throughput biological study or technology (e.g. microarray, ChIP-on-chip, ChIP-on-sequence, SNP array, exon array, large-scale sequencing, etc.). However, the deeper inter-relationships among the terms may not be fully captured in a linear-format report.	Global reference background; Local reference background; Neural network	Fisher's exact, hypergeometric, chi-square, binomial; Fisher's exact, hypergeometric, chi-square, binomial; Bayesian	GoStat, GoMiner, GOTM, BinGO, GOtoolBox, GFinder, etc.; DAVID, Onto-Express, GARBAN, FatiGO, etc.; BayGO
Class II: gene set enrichment analysis (GSEA)	All genes (without pre-selection) and their associated experimental values are considered in the enrichment analysis. The unique features of this strategy are: (i) no need to pre-select interesting genes, as opposed to Classes I and III; (ii) experimental values are integrated into the P-value calculation.	Suitable for pair-wise biological studies (e.g. disease versus control). Currently, it may be difficult to apply to the diverse data structures derived from complex experimental designs and some of the new technologies (e.g. SNP, exon, promoter arrays).	Based on ranked gene list; Based on continuous gene values	Kolmogorov–Smirnov-like; t-test, permutation, Z-score	GSEA, CapMap, etc.; FatiScan, ADGO, ermineJ, PAGE, iGA, GO-Mapper, GOdist, FINA, T-profiler, MetaGP, etc.
Class III: modular enrichment analysis (MEA)	This strategy inherits the key spirit of SEA, but term–term/gene–gene relationships are taken into account in the enrichment P-value calculation. The advantage is that term–term/gene–gene relationships might carry unique biological meaning that is not held by a single term or gene. Such network/modular analysis is closer to the nature of biological data structure.	Capable of analyzing any gene list, which may be selected from any high-throughput biological study or technology, like Class I. Emphasizes network relationships during analysis. ‘Orphan’ genes/terms (with few relationships to other genes/terms), which can sometimes be very interesting, may be left out of the analysis.	Composite annotations; DAG structure; Global annotation relationship	Measure enrichment on joint terms; Measure enrichment by considering parent–child relationships; Measure term–term global similarity with kappa statistics, Czekanowski-Dice, Pearson's correlation	ADGO, GeneCodis, ProfCom, etc.; topGO, Ontologizer, POSOC, etc.; DAVID, GoToolBox, etc.
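
Two of the methods named in the table can be illustrated with a short sketch: the hypergeometric test listed under Class I, and the kappa statistic listed under Class III for term–term similarity. This is a minimal illustration under stated assumptions, not the implementation used by any of the tools above. The function names (hypergeom_enrichment_p, kappa_similarity), the gene identifiers and the toy numbers are all hypothetical, and each term is assumed to be represented simply as the set of genes annotated with it.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_list, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_background: genes in the reference background
    n_term:       background genes annotated with the term
    n_list:       genes in the pre-selected list
    n_overlap:    listed genes annotated with the term
    """
    total = comb(n_background, n_list)
    upper = min(n_list, n_term)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def kappa_similarity(genes_a, genes_b, background):
    """Cohen's kappa between two terms, each given as a set of genes.

    Every background gene is scored as annotated / not annotated for
    each term, and kappa measures agreement beyond chance.
    """
    background = set(background)
    if not background:
        raise ValueError("background must not be empty")
    a_set = set(genes_a) & background
    b_set = set(genes_b) & background
    n = len(background)
    both = len(a_set & b_set)
    only_a = len(a_set - b_set)
    only_b = len(b_set - a_set)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    expected = (
        (both + only_a) * (both + only_b)
        + (only_b + neither) * (only_a + neither)
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case (both terms cover all genes, or none):
        # kappa is undefined; returning 0.0 is an assumption made here.
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 carry the term,
    # 3 genes in the list, 2 of them carry the term.
    print(hypergeom_enrichment_p(10, 4, 3, 2))  # 40/120, i.e. one third

    background = {f"g{i}" for i in range(1, 9)}
    print(kappa_similarity({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, background))
```

The first function scores one term at a time, which is why SEA output is a flat list. The second compares two terms by the genes they share, which is the kind of term–term relationship a Class III tool can use to group related terms.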
