
Abstract: Objective: Taking Bletillae Rhizoma as an example, this study aimed to design and evaluate a health food formulation with dual gastric-protective and liver-protective functions by combining machine learning algorithms for association rule mining, decision analysis, and clustering. Methods: Information on existing health foods that protect the gastric mucosa or liver function was compiled, together with databases of traditional Chinese medicines and formulas used to treat these two conditions. The Apriori algorithm, the Analytic Hierarchy Process (AHP), Self-Organizing Map (SOM) clustering, and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) were used to mine the combination patterns of high-frequency medicinal materials. These results were integrated with modern research on functional activity and nutrition, and health food formulations centered on Bletillae were designed according to the principle of compatibility and then evaluated. Results: Frequency statistics of the medicinal materials in each database and association analysis of the corresponding high-frequency materials revealed strong associations among the high-frequency materials. A total of 64 high-frequency medicinal materials suitable for health foods were then analyzed by AHP using 17 indicators across two levels; excluding Bletillae, the five materials with the highest weights were Licorice, Chenpi, Astragalus, Poria, and Schisandra, which can be given priority in formulation. SOM clustering divided all high-frequency medicinal materials into seven categories, and the materials in the optimal category overlapped substantially with the AHP results.
A literature review of the gastric-protective and liver-protective functions of the materials, combined with the preceding data analysis, identified 15 materials to fill the monarch, minister, assistant, and messenger roles, from which 10 candidate formulations were designed. In the TOPSIS evaluation, the top two formulations scored similarly, both above 0.14, and differed markedly from the others. Conclusion: Guided by the basic theories of traditional Chinese medicine and combining several machine learning algorithms, this study established a method for designing and evaluating health food formulations with dual gastric-protective and liver-protective functions, offering new ideas and directions for future health food formulation research.
The health food functions of auxiliary protection of the gastric mucosa and auxiliary protection against chemical liver damage correspond to the traditional Chinese medicine concepts of “epigastric pain” and “hypochondriac pain,” and serve as adjunctive care during the sub-health stage of the stomach and liver. The co-occurrence and joint treatment of gastric and liver diseases have been discussed in both traditional Chinese medicine literature and modern medical research. The medical sage Zhang Zhongjing stated: “When one sees a disease of the liver, one knows that the liver will pass it to the spleen, so the spleen should be strengthened first.” The renowned Qing Dynasty physician Ye Tianshi also wrote: “The liver is the source of disease, and the stomach is where the disease is transmitted,” reflecting the multi-factorial nature and co-occurrence of stomach and liver diseases. Among the causes of stomach and liver diseases, psychological factors often play a leading role. Traditional Chinese medicine usually attributes the emotional component of these diseases to the relationships among the liver, stomach, and spleen, hence the common practice of soothing the liver and tonifying the spleen to strengthen the stomach. Alcohol has become one of the greatest threats to human health, and drinkers face long-term risks of alcohol-related diseases. Alcohol intake can lead to various adverse outcomes, including cardiovascular diseases, central nervous system disorders, gastric mucosal damage, liver damage, and other digestive system diseases, over 80% of which are chronic and difficult to cure. Globally, per capita alcohol consumption among adults rose from 5.9 L/year in 1990 to 6.5 L/year in 2017 and is projected to reach 7.6 L/year by 2030.
Given the large volume of alcohol consumed and the large number of drinkers, it is important to control the onset and progression of related diseases through health foods and through exercise that improves immunity, particularly at disease onset and in sub-health and healthy states.
Bletillae is the dried tuber of the orchid Bletilla striata (Thunb.) Reichb. f. and is known for its hemostatic and anti-inflammatory properties. Historically it was the primary ingredient in health foods such as Bletilla sugar, and it is commonly used clinically to treat various gastrointestinal ulcers, with reports of its use for chemical liver damage as well. Existing Bletillae products are mainly drugs that protect against gastrointestinal ulcers, whitening cosmetics, and biomaterials; it can also serve as a health food ingredient, with distinct advantages for improving digestive system ailments. However, few health foods are based primarily on Bletillae, which indicates significant potential for research and development.
This study uses Bletillae as the main ingredient of a health food with dual gastric-protective and liver-protective functions. It applies machine learning algorithms for association, decision analysis, and cluster analysis, namely Apriori, AHP, SOM clustering, and TOPSIS, to mine data on registered gastric-protective and liver-protective health foods and on formulas and traditional Chinese medicines. In doing so, it explores new functions of Bletillae in protecting against chemical liver damage and provides a basis for developing Bletillae-centered health foods that protect the gastric mucosa and protect against chemical liver damage. The research also aims to provide references for developing similar dual-function health foods and a methodology for screening and evaluating health food formulations.
1 Methods
1.1 Data Sources and Inclusion Criteria
The Yaozhi Database (https://db.yaozh.com/) is the largest medical information website in China. Data on formulas and traditional Chinese medicines containing Bletillae were obtained from its traditional Chinese medicine formula database and traditional Chinese medicine prescription database. Health food information was retrieved from the official website of the State Administration for Market Regulation (http://www.samr.gov.cn/tssps/) using the keywords “gastric mucosa” and “liver damage,” covering all health food approvals issued up to November 30, 2020.
When collecting and organizing formulas containing Bletillae, formulas that duplicated the composition, efficacy, or indications of another formula, as well as formulas consisting of Bletillae alone, were excluded. In total, 241 formulas were collected, 31 of which had the function of tonifying the spleen and invigorating qi, and 121 traditional Chinese medicine prescriptions containing Bletillae were collected, 44 of which were used to treat spleen qi deficiency. Only health foods containing two or more medicinal ingredients were included, and those lacking formula information were excluded, yielding 46 health food approvals with auxiliary protective effects on the gastric mucosa and 302 approvals with auxiliary protective functions against chemical liver damage.
1.2 Database Establishment and Standardization
1.2.1 Establishment of Raw Material Database The collected data were imported into Microsoft Excel 2019 to establish four databases. During processing, the medicinal material information of the formulas, traditional Chinese medicines, and health food approvals was standardized with reference to the 2020 edition of the Chinese Pharmacopoeia, the second edition of Clinical Traditional Chinese Medicine, and local standards: (1) Common names were standardized to the names recorded in the pharmacopoeia or local standards; for example, “Bai Ji” was standardized to “Bletillae” and “Huang Qi” to “Astragalus.” (2) Different processed forms of the same medicinal material were standardized to the original medicinal name; for example, “Fried Atractylodes” was standardized to “Atractylodes” and “Propolis Powder” to “Propolis.” (3) Extracts of medicinal materials were standardized to the original medicinal name; for example, “Pueraria Extract” was standardized to “Pueraria” and “Patchouli Oil” to “Patchouli.” (4) Formulas, traditional Chinese medicines, and approvals with different names but identical medicinal compositions and effects were merged; in addition, when organizing traditional Chinese medicinal health foods, approvals with the same medicinal materials but different excipients were merged. (5) Where standardizing a name would mask a significant difference in effect, the original records were retained and processed separately, as with “Charred Gardenia” versus “Gardenia” and “Raw Rehmannia” versus “Cooked Rehmannia.”
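The standardization rules above amount to a lookup-and-merge step. The following is a minimal Python sketch, assuming ingredient names are already split into lists and excipients have been removed; the synonym table holds only the examples named in rules (1) to (3), and the functions `standardize` and `merge_duplicates` are hypothetical, not part of the authors' Excel workflow. Rule (4) is simplified here to compare compositions only; confirming that effects also match would need additional data.

```python
# Hypothetical synonym table built only from the examples in rules (1)-(3);
# a real table would be compiled from the Pharmacopoeia and local standards.
SYNONYMS = {
    "Bai Ji": "Bletillae",
    "Huang Qi": "Astragalus",
    "Fried Atractylodes": "Atractylodes",
    "Propolis Powder": "Propolis",
    "Pueraria Extract": "Pueraria",
    "Patchouli Oil": "Patchouli",
}

# Rule (5): processed forms whose effects differ materially are kept as-is.
KEEP_SEPARATE = {"Charred Gardenia", "Gardenia", "Raw Rehmannia", "Cooked Rehmannia"}


def standardize(name: str) -> str:
    """Map a raw ingredient name to its standardized medicinal name."""
    name = name.strip()
    if name in KEEP_SEPARATE:
        return name
    return SYNONYMS.get(name, name)


def merge_duplicates(records):
    """Rule (4), simplified: merge records whose standardized ingredient sets match.

    `records` is a list of (record_name, [ingredient, ...]) pairs; the first
    record name seen for each ingredient set is kept.
    """
    merged = {}
    for record_name, ingredients in records:
        key = frozenset(standardize(i) for i in ingredients)
        merged.setdefault(key, record_name)
    return [(name, sorted(key)) for key, name in merged.items()]
```

For example, records named "Product A" (["Bai Ji", "Huang Qi"]) and "Product B" (["Bletillae", "Astragalus"]) collapse into a single entry, while a record containing "Raw Rehmannia" keeps that name unchanged.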
1.2.2 Establishment of Formula Database The hierarchy and indicator data for the AHP analysis were entered according to the contents and statistics of the raw material database. The medicinal materials included all high-frequency materials in health foods with auxiliary protective effects on the gastric mucosa and against chemical liver damage, as well as those in traditional Chinese medicines and formulas with spleen-tonifying and qi-invigorating functions. The first-level evaluation indicators were the medicinal material category and the existing prescriptions; the second-level indicators for existing prescriptions were four prescription types: health foods with auxiliary protective effects on the gastric mucosa, health foods with auxiliary protective functions against chemical liver damage, and traditional Chinese medicines and formulas containing Bletillae with spleen-tonifying and qi-invigorating functions. The frequency of each high-frequency material in each database and its total occurrences across the four databases were tabulated in descending order and assigned scores from 1 to 9.
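The text does not state exactly how frequencies were converted into scores from 1 to 9. One simple possibility, shown here purely as an assumption, is to rescale counts linearly so that the most frequent material receives 9 and the least frequent receives 1. The example counts are the traditional Chinese medicine frequencies reported in Section 2.3.1; the function name `frequency_to_score` is hypothetical.

```python
def frequency_to_score(freqs):
    """Linearly rescale occurrence counts onto the integer scale 1..9.

    Assumption: the highest count maps to 9, the lowest to 1, and values in
    between are rounded to the nearest integer.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    if hi == lo:
        # No spread in the data: every material receives the same score.
        return {name: 1 for name in freqs}
    return {name: 1 + round(8 * (count - lo) / (hi - lo)) for name, count in freqs.items()}


# Frequencies from Section 2.3.1 (traditional Chinese medicine database).
counts = {"Licorice": 26, "Astragalus": 17, "White Peony": 16, "Notoginseng": 11}
scores = frequency_to_score(counts)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A rank-based scheme (for example, splitting the sorted list into nine equal groups) would be an equally plausible reading of the text and may give different scores.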
Based on the overall goal of developing a Bletillae-centered health food formulation with dual gastric-protective and liver-protective functions, and on the core traditional Chinese medicine principle of treating stomach and liver diseases by tonifying the spleen and invigorating qi, the relative importance of the elements within each level was determined and pairwise comparison matrices were constructed, with the relative importance of each pair of elements scored from 1 to 9. Here, 1 indicates equal importance, 3 slightly more important, 5 significantly more important, 7 strongly more important, and 9 extremely more important; 2, 4, 6, and 8 are intermediate values between adjacent judgments.
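A pairwise comparison matrix on this 1 to 9 scale is reciprocal (if element i is rated 3 relative to j, then j is rated 1/3 relative to i), and the AHP weights are the normalized principal eigenvector of that matrix. The sketch below assumes judgments are supplied only for the upper triangle and approximates the eigenvector by power iteration; the function names and the example judgments are hypothetical, chosen to be perfectly consistent so the result is easy to check.

```python
def build_matrix(n, judgments):
    """Build an n x n reciprocal pairwise comparison matrix.

    `judgments` maps (i, j) with i < j to a Saaty-scale value: a[i][j] = value,
    a[j][i] = 1 / value, and the diagonal is 1.
    """
    a = [[1.0] * n for _ in range(n)]
    for (i, j), value in judgments.items():
        a[i][j] = float(value)
        a[j][i] = 1.0 / value
    return a


def ahp_weights(a, iterations=100):
    """Approximate the principal eigenvector of `a` by power iteration.

    The vector is renormalized to sum to 1 after every multiplication, so the
    result can be read directly as AHP weights.
    """
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


# Hypothetical judgments: element 0 is twice as important as 1 and four times as important as 2.
matrix = build_matrix(3, {(0, 1): 2, (0, 2): 4, (1, 2): 2})
weights = ahp_weights(matrix)  # approximately [4/7, 2/7, 1/7]
```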
Using the weights obtained from the AHP analysis, the high-frequency medicinal materials were analyzed further: a linear weighted score was calculated for each material, and the final scores were used for ranking. SOM clustering was then applied to the medicinal material categories and the two weighted scores obtained from the databases to group the high-frequency materials used in gastric-protective and liver-protective health foods. Clustering the evaluation results gives a more objective classification and avoids the unclear group boundaries that arise when scores differ only slightly. In the TOPSIS analysis, the data for each proposed formula were arranged in the order of monarch, minister, assistant, and messenger to evaluate the formulas and support the final selection.
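The linear weighted score is a weighted sum of a material's indicator scores. This sketch assumes the indicator weights come from AHP and, following the Discussion section, scores any indicator missing for a material as -1; the indicator names, weights, scores, and the function name `linear_weighted_scores` are all hypothetical.

```python
def linear_weighted_scores(indicator_scores, weights, missing=-1.0):
    """Score each material as the sum of weight_k * score_k over all indicators.

    `indicator_scores` maps material -> {indicator: score}; an indicator absent
    for a material is scored `missing`. Returns (material, score) pairs sorted
    from highest to lowest score.
    """
    ranked = []
    for material, scores in indicator_scores.items():
        total = sum(w * scores.get(indicator, missing) for indicator, w in weights.items())
        ranked.append((material, total))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical AHP weights for four indicators (they sum to 1).
weights = {"gastric_hf": 0.4, "liver_hf": 0.3, "formula": 0.2, "tcm": 0.1}
scores = {
    "material_A": {"gastric_hf": 9, "liver_hf": 7, "formula": 5, "tcm": 9},
    "material_B": {"gastric_hf": 5, "liver_hf": 9},  # absent from two databases
}
ranked = linear_weighted_scores(scores, weights)
```

Scoring missing items as -1 rather than 0 actively penalizes materials that never appear in a database, which widens the gap between broadly used and narrowly used materials.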
1.3 Analysis Methods
1.3.1 Raw Material Screening Frequency statistics on the efficacy and medicinal materials in the databases were computed in Microsoft Excel 2019. The Apriori association rule algorithm and network diagrams in SPSS Modeler 18.0 were used to mine the organized data, yielding secondary (two-item) correlations and high-frequency medicinal combinations in the formulas, traditional Chinese medicines, and health foods with auxiliary protective effects on the gastric mucosa and against chemical liver damage.
1.3.2 Formula Analysis AHP was used to determine the weights of the indicators in the evaluation system, and SOM clustering was used to analyze each medicinal material. Candidate formulas were then screened on the basis of traditional Chinese medicine theory and modern medical research. Finally, TOPSIS was used to evaluate the strengths and weaknesses of each formula comprehensively, yielding combinations of raw materials that can be used with Bletillae.
The analysis methods and processes of this study are illustrated in Figure 1.
2 Results
2.1 Data Analysis of Health Foods
2.1.1 Analysis of Medicinal Material Frequency The 46 approvals for traditional Chinese medicinal health foods with auxiliary protective effects on the gastric mucosa involved 70 types of medicinal materials, 28 of which appeared three or more times, accounting for 75.78% of the total frequency. Excluding Bletillae, the five most frequent materials were Cardamom, Propolis, Atractylodes, Astragalus, and Chenpi (Table 1). Grouped by category, the categories containing five or more types were tonifying deficiency medicines (24 types), regulating qi medicines (7 types), clearing heat medicines (7 types), and digestion-promoting medicines (5 types), together accounting for 61.43% of the 70 types. The prevalence of tonifying deficiency and regulating qi medicines is consistent with the traditional Chinese medicine principle that gastric mucosal damage is treated by tonifying the spleen and invigorating qi.
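The frequency statistics in this section count how many approvals contain each material and then report the share of the total frequency held by materials at or above a threshold. A minimal sketch with made-up records (not data from the study) might look like the following; the function name is hypothetical, and counting each material at most once per record is an assumption about how repeated entries were handled.

```python
from collections import Counter


def frequency_table(records, min_count):
    """Count materials across records and summarize the high-frequency ones.

    Returns (high, share): `high` lists (material, count) pairs with
    count >= min_count in descending order, and `share` is their fraction of
    the total frequency over all materials.
    """
    # dict.fromkeys removes duplicates within a record while keeping order.
    counts = Counter(m for ingredients in records for m in dict.fromkeys(ingredients))
    high = [(m, c) for m, c in counts.most_common() if c >= min_count]
    total = sum(counts.values())
    share = sum(c for _, c in high) / total if total else 0.0
    return high, share


records = [
    ["Bletillae", "Cardamom"],
    ["Bletillae", "Propolis", "Cardamom"],
    ["Bletillae", "Astragalus"],
]
high, share = frequency_table(records, min_count=2)
```

As a check on the category figures above, the four categories with five or more types sum to 24 + 7 + 7 + 5 = 43 types, and 43 / 70 ≈ 61.43%.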
The 302 approvals for traditional Chinese medicinal health foods with auxiliary protective functions against chemical liver damage involved 150 types of medicinal materials; 30 of these appeared ten or more times, accounting for 74.87% of the total frequency. The five most frequent were Pueraria, Schisandra, Goji Berries, Ganoderma, and Ziziphus (Table 2). By category, the top five were tonifying deficiency medicines (57 types), clearing heat medicines (18 types), regulating qi medicines (16 types), diuretic and dampness-draining medicines (10 types), and exterior-releasing medicines (8 types), together accounting for 72.67% of the 150 types. In traditional Chinese medicine, the sub-health state associated with chemical liver damage usually falls under “hypochondriac pain” and mainly presents as an excess heat syndrome prone to dampness and phlegm. Treatment therefore often involves dispelling dampness, regulating qi, tonifying the spleen, and resolving stasis, which favors tonifying deficiency and regulating qi materials, and the high-frequency categories in these health foods are consistent with this principle. Selecting raw materials by category alone is nevertheless imperfect: Schisandra ranks second in frequency, yet it is an astringent medicine whose properties differ from those of tonifying deficiency and regulating qi medicines, although in classic formulas such as Ginseng Schisandra Decoction it can be combined with other materials to exert tonifying effects. The selection of formula raw materials should therefore also take compatibility and other factors into account.
Comparing the high-frequency medicinal materials of the two categories, 12 materials overlap: Propolis, Astragalus, Chenpi, Poria, White Peony, Notoginseng, Pueraria, Dandelion, Licorice, Jujube, Ginseng, and Hawthorn. This substantial overlap between the materials used for auxiliary protection of the gastric mucosa and those used against chemical liver damage indicates that these 12 materials can play important roles in both conditions.
2.1.2 Analysis of Medicinal Material Correlation and Network Support is the proportion of all records in which the associated medicinal materials appear; confidence is the proportion of records containing the antecedent that also contain the consequent, and so reflects the strength of the connection between the associated materials; and lift measures the independence of the antecedent and consequent of an association rule. A lift > 1 indicates a positive correlation between antecedent and consequent, and the larger the lift, the stronger the correlation. The high-frequency materials with auxiliary protective effects on the gastric mucosa and against chemical liver damage were analyzed with the Apriori algorithm and network diagrams, using support ≥ 10% and confidence ≥ 50% as screening criteria; the resulting combinations of high-frequency materials and network diagrams are shown in Table 3 and Figure 2.
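The three measures can be computed directly for two-item rules, the level of the secondary correlation analysis. The sketch below uses rule support, P(A and B); some tools report the antecedent's support instead, so it is worth checking which definition a given table uses. It applies the Apriori pruning idea only at the pair level and is not the SPSS Modeler implementation; the function name `pair_rules` and the records are illustrative.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support=0.10, min_confidence=0.50):
    """Mine one-to-one rules A -> B and return (A, B, support, confidence, lift).

    support    = P(A and B)   (rule support)
    confidence = P(B | A)
    lift       = confidence / P(B)
    """
    n = len(records)
    baskets = [frozenset(r) for r in records]
    item_count = Counter(item for basket in baskets for item in basket)
    # Apriori pruning: a pair can only be frequent if both items are frequent.
    frequent = sorted(i for i, c in item_count.items() if c / n >= min_support)
    rules = []
    for a, b in combinations(frequent, 2):
        both = sum(1 for basket in baskets if a in basket and b in basket)
        if both / n < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = both / item_count[ante]
            if confidence >= min_confidence:
                lift = confidence / (item_count[cons] / n)
                rules.append((ante, cons, both / n, confidence, lift))
    return rules


records = [
    ["Bletillae", "Licorice"],
    ["Bletillae", "Licorice", "Astragalus"],
    ["Bletillae", "Astragalus"],
    ["Licorice", "Chenpi"],
]
rules = pair_rules(records)
```

Here Astragalus -> Bletillae has support 0.5, confidence 1.0, and lift 4/3, whereas Licorice -> Chenpi is dropped because its confidence (1/3) is below 50%.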
Table 3 shows that, among the combinations of medicinal materials with auxiliary protective effects on the gastric mucosa and against chemical liver damage, the strongest correlations were between Bai Zhi and Wu Zhu Yu and between Schisandra and both Danshen and Pueraria.
To further reveal the relationships among the high-frequency medicinal materials, network diagrams were constructed; the strength of the associations they show generally agrees with the secondary correlation analysis. Among materials for auxiliary protection of the gastric mucosa, those strongly associated with Bletillae were Notoginseng, Astragalus, Dandelion, Licorice, and Atractylodes. Among materials for auxiliary protection against chemical liver damage, those mainly associated with Bletillae were White Peony and Licorice.
2.2 Data Analysis of Formulas
2.2.1 Analysis of Medicinal Material Frequency Statistical analysis of the collected formulas revealed 114 types of medicinal materials with a cumulative frequency of 238 occurrences. Materials appearing three or more times, excluding Bletillae, are listed in Table 4. By category, the top five were tonifying deficiency medicines (21 types), clearing heat medicines (17 types), invigorating blood and resolving stasis medicines (12 types), phlegm-resolving medicines (8 types), and exterior-releasing medicines (6 types), accounting for 56.64%. Compared with the categories of health food raw materials, the formulas additionally included invigorating blood and resolving stasis medicines and phlegm-resolving medicines, which may reflect the use of blood-invigorating and stasis-resolving therapies, as well as dampness-dispelling and phlegm-resolving therapies, in treating spleen qi deficiency and related conditions.
2.2.2 Analysis of Medicinal Material Correlation Secondary correlation analysis of the high-frequency medicinal materials in Table 4 showed that the strongest correlation was between Atractylodes and Chuanxiong (support = 12.90%, confidence = 75%, lift = 7.75). The network of high-frequency materials in the formulas (Figure 3) shows strong associations among many materials, with a particularly close association between Astragalus and Licorice.
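Under the standard definition, lift is confidence divided by the consequent's support, which equals the observed co-occurrence rate divided by the rate expected under independence. Assuming the reported values follow this definition, they imply how common the consequent of the Atractylodes and Chuanxiong rule is on its own:

```latex
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}
\quad\Longrightarrow\quad
P(B) = \frac{\mathrm{conf}}{\mathrm{lift}} = \frac{0.75}{7.75} \approx 0.097
```

The consequent therefore appears in roughly 9.7% of the formulas, and the pair co-occurs about 7.75 times as often as would be expected if the two materials were chosen independently.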
2.3 Data Analysis of Traditional Chinese Medicines
2.3.1 Analysis of Medicinal Material Frequency Table 5 lists the raw material frequencies in traditional Chinese medicines with spleen-tonifying and qi-invigorating effects. After excluding raw materials not included in the health food raw material directory, the top five were Licorice (26 times), Astragalus (17 times), White Peony (16 times), Notoginseng (11 times), and Cardamom (11 times), showing substantial overlap with the high-frequency materials in gastric-protective and liver-protective health foods.
By category, the top five were tonifying deficiency medicines (23 types), clearing heat medicines (21 types), regulating qi medicines (12 types), invigorating blood and resolving stasis medicines (10 types), and exterior-releasing medicines (8 types), accounting for 53.62%. The categories of raw materials in traditional Chinese medicines were similar to those in the formulas.
2.3.2 Analysis of Medicinal Material Correlation Secondary correlation analysis showed that the strongest association was between Clove and Cardamom (support = 11.11%, confidence = 60%, lift = 9.0). The network diagram (Figure 4) shows that Licorice is well correlated with Astragalus and Chenpi, and the clear link between Astragalus and Chenpi also indicates a strong association.
2.4 Analysis of Health Food Formulations with Bletillae as the Main Ingredient for Gastric and Liver Protection
2.4.1 Modeling Process Many factors influence the selection of health food formulation ingredients, including the category of each medicinal material and whether it appears in the health food raw material directory. These factors are often treated as equally important when selecting raw materials; however, because formulations serve different functions, the weights of these factors also differ. AHP, a non-equal-weighting evaluation method, is a quantitative approach that derives weights from qualitative judgments. Based on existing research, this study established the AHP evaluation indicator system shown in Table 6.
In the AHP analysis, several scoring criteria were compared to identify the most reliable one. For the second-level indicator of medicinal material category, four scoring methods were compared, based respectively on the formula database, the traditional Chinese medicine database, the health food databases (covering auxiliary protection of the gastric mucosa and against chemical liver damage), and all four databases combined.
After AHP analysis, the weights assigned to the high-frequency raw materials still retain some subjectivity and the structure of the data is not well defined, so further exploration beyond simple ranking is needed. SOM clustering is suited to imprecise, fuzzy information and handles nonlinear problems well, so it can better reflect the relationships among the data. In this study, each medicinal material was represented as a two-dimensional data point obtained through AHP analysis, linear weighting, and normalization, and the resulting clusters were used in subsequent evaluations.
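For readers unfamiliar with SOM, the following is a minimal, dependency-free sketch of the algorithm on normalized two-dimensional points like those described here. Each sample pulls its best-matching neuron, and that neuron's grid neighbors to a lesser degree, toward itself, while the learning rate and neighborhood radius shrink over time. The 1 x 7 grid (one neuron per cluster), the Gaussian neighborhood, the linear decay schedules, and all parameter values are assumptions; the paper does not report its SOM configuration, and the function names are hypothetical.

```python
import math
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(points, rows=1, cols=7, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each neuron at a randomly chosen data point.
    weights = [list(rng.choice(points)) for _ in grid]
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighborhood radius
            bmu = min(range(len(grid)), key=lambda k: _dist2(weights[k], x))
            br, bc = grid[bmu]
            for k, (r, c) in enumerate(grid):
                # Gaussian neighborhood measured on the map grid, not in data space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights


def assign_clusters(points, weights):
    """Label each point with the index of its best-matching neuron."""
    return [min(range(len(weights)), key=lambda k: _dist2(weights[k], p)) for p in points]


# Illustrative normalized points: two loose groups near (0, 0) and (1, 1).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
weights = train_som(points, rows=1, cols=2, epochs=50)
labels = assign_clusters(points, weights)
```

Because every update moves a neuron only part of the way toward a data point, the trained weight vectors stay within the range of the input data, which makes the neurons interpretable as cluster prototypes.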
TOPSIS ranks candidate plans by their closeness to the positive and negative ideal solutions: the plan closest to the positive ideal solution and farthest from the negative ideal solution is considered best, and this forms the basis for ranking. Because it relies on the characteristics of the indicator data themselves, it is a comparatively objective comprehensive evaluation method. However, TOPSIS cannot determine the weight of each evaluation factor, which must be assigned beforehand; in this study, the TOPSIS input data therefore came from the weights and combinations of the materials in the candidate formulations obtained after AHP analysis and SOM clustering.
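The TOPSIS procedure can be sketched as follows: vector-normalize each criterion, multiply it by its pre-assigned weight, locate the positive and negative ideal solutions, and score each alternative by its relative closeness C = D- / (D+ + D-), where D+ and D- are the distances to the positive and negative ideal solutions. In this study the criteria would be the monarch, minister, assistant, messenger, and total-weight values of each candidate formulation; the function name and the example matrix below are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient C_i in [0, 1] for each alternative.

    matrix[i][j]: value of criterion j for alternative i
    weights[j]:   pre-assigned weight of criterion j (TOPSIS does not derive these)
    benefit[j]:   True if larger values are better, False if smaller are better
    """
    m, n = len(matrix), len(weights)
    # Vector normalization of each criterion column, then weighting.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] if norms[j] else 0.0 for j in range(n)]
         for i in range(m)]
    columns = list(zip(*v))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_plus = math.dist(row, best)    # distance to the positive ideal solution
        d_minus = math.dist(row, worst)  # distance to the negative ideal solution
        scores.append(d_minus / (d_plus + d_minus) if d_plus + d_minus else 0.0)
    return scores


# Three hypothetical formulations scored on two benefit criteria of equal weight.
scores = topsis([[9, 9], [5, 5], [1, 1]], [0.5, 0.5], [True, True])
```

Sorting the alternatives by these scores in descending order gives the ranking; as the text notes, the weights must come from elsewhere, such as an AHP step.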
2.4.2 AHP Analysis Results λ is the maximum eigenvalue of the pairwise comparison matrix. CI (consistency index) measures the consistency of the matrix used to calculate the weights, and CR (consistency ratio) is generally considered acceptable when CR < 0.1. The smaller these two values, the more consistent the matrix and the more appropriate the original assignments. The maximum eigenvalue, CI, and CR for the different evaluation methods under the medicinal material category are shown in Table 7.
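The consistency check follows Saaty's definitions: CI = (λmax - n) / (n - 1) and CR = CI / RI, where RI is the random index for a matrix of order n. The sketch below estimates the weights with the row geometric mean method (a common approximation to the principal eigenvector, swapped in here for brevity) and uses the commonly cited RI values for n = 1 to 9; the function names and the example matrices are illustrative, not the study's.

```python
import math

# Saaty's random consistency index RI for matrices of order n = 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def geometric_mean_weights(a):
    """Approximate AHP weights by the normalized geometric mean of each row."""
    n = len(a)
    g = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(g)
    return [x / total for x in g]


def consistency(a):
    """Return (lambda_max, CI, CR) for a reciprocal pairwise comparison matrix."""
    n = len(a)
    w = geometric_mean_weights(a)
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return lambda_max, ci, cr


# A perfectly consistent matrix and a deliberately cyclic (inconsistent) one.
consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
cyclic = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]
```

For the consistent matrix λmax equals n = 3, so CI and CR are 0. The cyclic matrix rates A over B, B over C, and C over A, giving λmax ≈ 4.33 and CR ≈ 1.15, far above the 0.1 threshold.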
The results show that, based on existing health food approvals, traditional Chinese medicines, and formula data, 20 raw materials had total weights greater than 0; apart from Bletillae, the main materials with priority potential for gastric and liver protection were Licorice, Chenpi, Astragalus, Poria, and Schisandra. The weight ranking of these materials provides a basis for designing the subsequent formulations.
2.4.3 SOM Analysis Results After weighting by AHP, the 64 high-frequency medicinal materials were clustered by SOM into seven categories (Table 9). The neuron relationship diagrams further support this classification (Figure 5). In Figure 5-A, gray marks neuron nodes, red lines mark connections between neurons, and the colored blocks along these connections indicate how close the weight vectors of neighboring neurons are, with darker colors indicating greater distance. Figures 5-C and 5-D show the weights connecting each input vector to the competitive-layer neurons, with the smallest weights in blue, zero weights in black, and the largest weights in red.
2.4.4 TOPSIS Analysis Results Based on current medical research on each raw material's protective effects on the gastric mucosa and against chemical liver damage, together with the total weight ranking from AHP and linear weighting and the SOM clustering results, 15 raw materials from different categories were selected: Bletillae, Licorice, Chenpi, Astragalus, Poria, Schisandra, Danshen, White Peony, Fritillaria, Atractylodes, Codonopsis, Ginseng, Saussurea, Yam, and Angelica. Following the traditional Chinese medicine principle of tonifying the spleen and invigorating qi and the theory of compatibility, 10 formulations were constructed from these materials and then scored and ranked by TOPSIS (Table 10).
The scores differ markedly among the 10 formulations, but the top two have similar scores, so further research is needed to make the final selection.
3 Discussion
The general approach to designing health food formulations is to organize a database, compile statistics on raw material frequencies and categories, analyze formulas based on these statistics, and determine the formulation according to actual conditions. This study extends that approach by adding multi-dimensional analysis of the statistical results and quantitative evaluation of multiple candidate formulations. Specifically, Apriori was used to analyze the associations and network diagrams of medicinal materials as a reference for later formulation. The standard AHP process was improved by comparing several objective, quantifiable, and easily reproducible scoring methods and selecting the one based on all databases combined. After the indicator weights of each high-frequency material were calculated at each level, each material was scored by linear weighting and ranked, with missing items scored -1, producing an overall ranking of high-frequency materials for dual gastric and liver protection. Cluster analysis then divided the high-frequency raw materials into seven categories, ranging from optimal to not recommended. Drawing on the traditional Chinese medicine principle of tonifying the spleen and invigorating qi and on modern research into the functions and nutritional value of the optimal to generally recommended materials, formulations that can be combined with Bletillae and that offer dual gastric and liver protection were constructed. Finally, the candidate formulations were scored by TOPSIS using five data points: the monarch, minister, assistant, and messenger components and the total weight of each formulation.
Correlation analysis of the databases with Apriori showed a high degree of repetition in medicinal categories and high-frequency materials across the health food information, the traditional Chinese medicine database, and the formula database. In all four databases, the most frequent categories were tonifying deficiency, clearing heat, and regulating qi medicines. The secondary correlation analysis of the four databases supports the treatment principle of tonifying the spleen and invigorating qi; however, these results are limited to frequencies and support values and lack quantitative analysis of individual raw materials or of their compatibility. In addition, the secondary correlation results are too dispersed to allow an overall evaluation of the materials in the databases. AHP, SOM clustering, and TOPSIS were therefore introduced to analyze the materials in the four databases at multiple levels, with the aim of uncovering raw materials that may have been underused in past applications. AHP and TOPSIS are widely used in pharmaceutical research, for example in optimizing extraction processes, identifying quality markers, building evaluation systems for health food raw materials, evaluating the quality of medicinal materials, and assessing drug utilization and activity. SOM clustering has rarely been applied in pharmacy, but there are many mature examples of its combined use with AHP and TOPSIS. AHP combined with SOM clustering is often used to select among multiple schemes or to build quality evaluation systems when qualitative and quantitative indicators coexist, whereas SOM combined with TOPSIS can substantially reduce the number of objects to be evaluated in the final selection.
A limitation of this study is the relatively small size of the resulting databases, which concentrates the analysis results. In addition, the scoring relies mainly on frequency statistics of the indicators, so the score assignments remain imperfect. Future work will incorporate clinical data, patents, research projects, and similar sources to provide more objective references for ranking raw materials. Furthermore, neither the algorithmic results nor the feasibility of the final formulations have yet been verified; this will be addressed in future studies.
Conflict of Interest All authors declare no conflicts of interest.
References (omitted)
Source: Ma Jiamu, Liu Xiaoyun, Ren Xueyang, Wang Yu, Dong Ying, Song Ruolan, Yu Axiang, Wei Jing, Fan Qiqi, Zhe Gaimai. Design and Evaluation of Bletilla-Based Health Food Formulation for Gastric and Liver Protection [J]. Chinese Traditional and Herbal Drugs, 2021, 52(18): 5676-5687.