How to Download MSigDB Database Glycolysis-Related Genes

A student from our bioinformatics beginner marathon sent in a question. What should we do? Pamper her, of course! Her question:

How can I download the glycolysis-related genes from the MSigDB database?

First, I searched for relevant information

I searched WeChat with the keywords: MSigDB database glycolysis-related genes. This turned up an article published in October 2022 in the journal Frontiers in Endocrinology: Identification of risk model based on glycolysis-related genes in the metastasis of osteosarcoma. The authors obtained 5 glycolysis-related pathway gene sets from MSigDB, namely:

  • BIOCARTA GLYCOLYSIS PATHWAY
  • GO GLYCOLYTIC PROCESS
  • HALLMARK GLYCOLYSIS
  • KEGG GLYCOLYSIS GLUCONEOGENESIS
  • REACTOME GLYCOLYSIS

I feel there should be more than 5 gene sets!

I also found another article, published in August 2023 in the journal Cancer Cell Int: Glycolysis-related biomarker TCIRG1 participates in regulation of renal cell carcinoma progression and tumor immune microenvironment by affecting aerobic glycolysis and AKT/mTOR signaling pathway. It gives specific instructions on how to search the MSigDB database for glycolysis-related genes.


Of course, we could simply download Supplementary Table 2 from this article. However, the MSigDB database underwent a major update in 2024, as discussed in the article: What are the reasons for the failure to obtain gene sets from the msigdbr database? So we still recommend checking the latest data!


The article found 21 pathways:

BIOCARTA_ETC_PATHWAY
BIOCARTA_FEEDER_PATHWAY
BIOCARTA_GLYCOLYSIS_PATHWAY
BIOCARTA_KREB_PATHWAY
CHEN_LUNG_CANCER_SURVIVAL
DCA_UP.V1_DN
DCA_UP.V1_UP
GOBP_FRUCTOSE_1_6_BISPHOSPHATE_METABOLIC_PROCESS
GOBP_LACTATE_TRANSMEMBRANE_TRANSPORT
GOMF_LACTATE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY
HALLMARK_GLYCOLYSIS
KEGG_CITRATE_CYCLE_TCA_CYCLE
KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM
KEGG_GLYCOLYSIS_GLUCONEOGENESIS
MODULE_306
REACTOME_GLYCOLYSIS
REACTOME_REGULATION_OF_GLYCOLYSIS_BY_FRUCTOSE_2_6_BISPHOSPHATE_METABOLISM
WP_AEROBIC_GLYCOLYSIS
WP_GLYCOLYSIS_AND_GLUCONEOGENESIS
WP_GLYCOLYSIS_IN_SENESCENCE
WP_HIF1A_AND_PPARG_REGULATION_OF_GLYCOLYSIS

Let’s see how to implement it with code

We will use the keyword glycolysis to search the MSigDB database: https://www.gsea-msigdb.org/gsea/msigdb/index.jsp.

First, download the entire collection as a GMT file. The file is under 30 MB:

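library(clusterProfiler)  # provides read.gmt()
library(org.Hs.eg.db)     # loaded in the original script; not needed for the steps shown here
library(GSEABase)         # loaded in the original script; not needed for the steps shown here

## === All pathways
# Read the full MSigDB collection (gene symbols) into a term/gene data frame
geneset <- read.gmt("msigdb.v2024.1.Hs.symbols.gmt")
# Number of gene sets
length(unique(geneset$term))
# Number of genes in each of the first few gene sets
head(as.data.frame(table(geneset$term)))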

There are 34,837 gene sets in total.
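For readers new to the format: a GMT file is tab-delimited text with one gene set per line, holding the set name, a description field, and then the member genes. The base R sketch below parses two made-up lines (the set names and descriptions are hypothetical, not real MSigDB records) into the same long term/gene layout that the code above indexes with geneset$term. It only illustrates the format; it is not how read.gmt() is implemented.

```r
# Two toy GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ...
# (hypothetical sets, for illustration only)
gmt_lines <- c(
  "TOY_GLYCOLYSIS_SET\tdescription_1\tHK2\tPKM\tLDHA",
  "TOY_OTHER_SET\tdescription_2\tTP53\tMYC"
)

# parse_gmt_lines() is a helper written for this sketch
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Field 1 is the set name, field 2 the description, the rest are genes
  per_set <- lapply(fields, function(x) {
    data.frame(term = x[1], gene = x[-(1:2)], stringsAsFactors = FALSE)
  })
  do.call(rbind, per_set)
}

toy <- parse_gmt_lines(gmt_lines)
print(toy)
length(unique(toy$term))  # number of gene sets
table(toy$term)           # number of genes per set
```

Each row of the result pairs one gene with the gene set it belongs to, which is why counting unique values of term gives the number of gene sets.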


Next, search the gene set names for the keyword glycolysis. This finds only 13 pathways, fewer than the 21 in the article above. Looking at that article's list, some of its gene sets do not contain the keyword glycolysis in their names:

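# Search gene set names for the keyword "glycolysis" (case-insensitive)
geneset_select <- geneset[grepl("glycolysis", geneset$term, ignore.case = TRUE), ]
str(geneset_select)
# Genes per matching gene set; as.character() ensures only the matching sets appear in the table
as.data.frame(table(as.character(geneset_select$term)))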
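The gap can be checked directly on the 21 pathway names listed earlier. This short sketch applies the same case-insensitive name match to that list and prints the names that a keyword search on glycolysis cannot find. The variable names are chosen for this example only.

```r
# The 21 pathway names reported in the Cancer Cell Int article
article_sets <- c(
  "BIOCARTA_ETC_PATHWAY",
  "BIOCARTA_FEEDER_PATHWAY",
  "BIOCARTA_GLYCOLYSIS_PATHWAY",
  "BIOCARTA_KREB_PATHWAY",
  "CHEN_LUNG_CANCER_SURVIVAL",
  "DCA_UP.V1_DN",
  "DCA_UP.V1_UP",
  "GOBP_FRUCTOSE_1_6_BISPHOSPHATE_METABOLIC_PROCESS",
  "GOBP_LACTATE_TRANSMEMBRANE_TRANSPORT",
  "GOMF_LACTATE_TRANSMEMBRANE_TRANSPORTER_ACTIVITY",
  "HALLMARK_GLYCOLYSIS",
  "KEGG_CITRATE_CYCLE_TCA_CYCLE",
  "KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM",
  "KEGG_GLYCOLYSIS_GLUCONEOGENESIS",
  "MODULE_306",
  "REACTOME_GLYCOLYSIS",
  "REACTOME_REGULATION_OF_GLYCOLYSIS_BY_FRUCTOSE_2_6_BISPHOSPHATE_METABOLISM",
  "WP_AEROBIC_GLYCOLYSIS",
  "WP_GLYCOLYSIS_AND_GLUCONEOGENESIS",
  "WP_GLYCOLYSIS_IN_SENESCENCE",
  "WP_HIF1A_AND_PPARG_REGULATION_OF_GLYCOLYSIS"
)

# TRUE where the name contains the keyword, ignoring case
has_keyword <- grepl("glycolysis", article_sets, ignore.case = TRUE)

sum(has_keyword)             # names a keyword search can recover
article_sets[!has_keyword]   # names it would miss
```

Only 9 of the 21 names contain the keyword, so a search on gene set names alone cannot reproduce the article's list.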

Web Version Search and Download

The code above can only retrieve gene sets whose names contain the keyword. Other gene sets may have glycolysis-related functions without having the keyword in their names. Searching for glycolysis on the MSigDB website and downloading the results gives the file genesets.v2024.1.Hs.gmt, which has 22 gene sets, one more than the article above.


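Read it into R to take a look:

### Web Download
# read.gmt() comes from clusterProfiler, loaded earlier
geneset <- read.gmt("genesets.v2024.1.Hs.gmt")
# Number of gene sets in the downloaded file
length(unique(geneset$term))
# Number of genes in each gene set
as.data.frame(table(geneset$term))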
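To find out which gene set accounts for the difference, the names from the downloaded file can be compared against the article's list with setdiff(). The sketch below uses short placeholder vectors (SET_A and so on are hypothetical names, not MSigDB entries). In practice, web_sets would be unique(as.character(geneset$term)) and article_sets would hold the 21 names from the article.

```r
# Placeholder names standing in for the two real lists
article_sets <- c("SET_A", "SET_B", "SET_C")
web_sets     <- c("SET_A", "SET_B", "SET_C", "SET_D")

# Gene sets returned by the web search but absent from the article
only_in_web <- setdiff(web_sets, article_sets)

# Gene sets in the article that the web search no longer returns
only_in_article <- setdiff(article_sets, web_sets)

only_in_web
only_in_article
```

Checking both directions matters because a database update can rename or retire gene sets as well as add new ones.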

This way, we have obtained all the glycolysis-related genes, and we can happily use them later!
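For later use, the term/gene table usually needs to be collapsed into a unique gene list or a named list of gene sets. A minimal base R sketch follows, using a small hypothetical data frame in place of the real geneset object and a temporary file as the output path:

```r
# Toy stand-in for the data frame returned by read.gmt()
# (set names are hypothetical)
geneset <- data.frame(
  term = c("TOY_SET_1", "TOY_SET_1", "TOY_SET_2", "TOY_SET_2"),
  gene = c("HK2", "PKM", "PKM", "LDHA"),
  stringsAsFactors = FALSE
)

# Unique genes across all gene sets (PKM appears twice above)
glycolysis_genes <- sort(unique(geneset$gene))

# Named list with one character vector of genes per gene set
geneset_list <- split(geneset$gene, as.character(geneset$term))

# Save the gene list, one symbol per line
out_file <- file.path(tempdir(), "glycolysis_genes.txt")
writeLines(glycolysis_genes, out_file)

glycolysis_genes
lengths(geneset_list)
```

The unique gene vector suits filtering an expression matrix, while the named list keeps the per-pathway grouping for tools that score each gene set separately.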
