Abstract:
Objective The full-length transcriptome of the medicinal plant Clematis florida Thunb. var. plena D. Don was sequenced to identify key genes involved in the metabolic pathways of the active compounds in the well-known She medicine “Shiershichen”.
Method PacBio single-molecule real-time (SMRT) sequencing was used to obtain the full-length transcriptome of the roots of the plant, which contain the active compounds of the She medicine. Functional annotation, gene structure analysis, and mining of the terpenoid biosynthetic pathway were then carried out on the transcript data using bioinformatics tools.
Result A total of 62.21 G polymerase read bases were generated, from which 20540 non-redundant, high-quality transcript sequences were obtained after data processing. Of these, 19909 transcripts were functionally annotated in at least one of the 7 major databases (NR, NT, Pfam, COG/KOG, SwissProt, GO, and KEGG), and 8888 were annotated in NR, NT, COG/KOG, KEGG, and GO. The GO annotation covered 14911 transcripts distributed among 53 terms in the categories of biological process, cellular component, and molecular function. The KEGG database annotated 19701 transcripts and classified them into 6 major pathways and 44 sub-pathways, with the largest number of transcripts in metabolic pathways. The COG/KOG annotation covered 13204 transcripts, with general function prediction being the largest category. In addition, 978 transcription factors, 224 lncRNAs, and 7167 SSRs were predicted, and 48 transcripts corresponding to 16 key candidate genes involved in terpenoid backbone biosynthesis were identified.
Conclusion The full-length transcriptome of C. florida roots was successfully sequenced, yielding comprehensive gene function information. The data provide a basis for further studies on the regulatory network, biological characteristics, related metabolic pathways, signaling pathways, and molecular mechanisms associated with this well-known She herbal medicine.
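
The annotation counts reported in the Result can be expressed as proportions of the 20540 non-redundant transcripts. The following is a minimal sketch of that arithmetic only. The counts are taken from the abstract, while the names `annotated` and `annotation_rate` are illustrative and not part of the original analysis pipeline.

```python
# Counts copied from the Result paragraph of the abstract.
TOTAL_TRANSCRIPTS = 20540

# Number of transcripts annotated per database (hypothetical variable name).
annotated = {
    "any database": 19909,
    "KEGG": 19701,
    "GO": 14911,
    "COG/KOG": 13204,
}


def annotation_rate(count: int, total: int = TOTAL_TRANSCRIPTS) -> float:
    """Return the percentage of transcripts annotated, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


# Percentage of the non-redundant transcripts annotated in each database.
rates = {name: annotation_rate(n) for name, n in annotated.items()}

for name, pct in rates.items():
    print(f"{name}: {pct}%")
```

Under these counts, roughly 97% of the non-redundant transcripts received at least one functional annotation.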