Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

Click to follow me and set as a star to receive updates immediatelyWhich Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

The academic and commercial research utilizing decision analysis models is rapidly increasing in both quantity and technical complexity. The variety of software used to implement such models is also expanding, but most modeling work is concentrated among a few software packages. In the field of decision analysis modeling, simplicity, transparency, and reproducibility are recognized as important principles, and these must be considered when selecting software. However, these are not the only considerations. The software must be capable of constructing the required types of models and should match the analyst’s experience and technical capabilities. The technical requirements of health technology assessment (HTA) agencies effectively set a minimum capability standard for the software. However, the capabilities of the assessment models used by HTA agencies also limit the range of software choices available to decision modelers. Additionally, complex analytical requirements mean that computational efficiency is crucial.

For constructing cost-effectiveness models, Microsoft Excel, R, and TreeAge Pro are the primary software used by the National Institute for Health Research (NIHR) Technology Assessment Review (TAR) team. Other relevant software includes Arena, Microsoft Visual Basic, and SIMUL8.

Next, we will compare TreeAge Pro 2016 R1 (hereinafter referred to as TreeAge), Microsoft Excel 2016 (hereinafter referred to as Excel), and two programming language-based software—R 3.2.4 x64 (hereinafter referred to as R) and MATLAB 2016a x64 (hereinafter referred to as MATLAB). Given that Markov models and decision trees dominate the published cost-effectiveness modeling (CEM) literature, the focus will be on software used to construct such models. Arena and Simul8 are excluded from the comparison as these packages are more commonly used for discrete event simulation (DES) rather than constructing Markov models or decision trees. Aside from TreeAge, the other software packages are either open-source or have open-source alternatives.

Software Overview

Excel

Excel is often regarded as the “universal tool” in the field of cost-effectiveness modeling. In a 2008 NICE DSU survey, all 28 respondents indicated that they had used Excel when constructing technology assessment models. Like TreeAge, it is the only software that the TAR team reported being proficient in.

Excel is efficient for building simple models, but for complex models, other software is more suitable. With macros written in Visual Basic for Applications (VBA) and numerous third-party packages, Excel has strong extensibility, effectively enhancing its functionality. Additionally, Microsoft continues to develop JavaScript application programming interfaces (APIs) within the Microsoft Office product line, providing more options for advanced modeling work.

Excel is commonly used in introductory courses on cost-effectiveness modeling and related textbooks, and it is available on both Microsoft Windows and Apple MacOS operating systems. LibreOffice and OpenOffice are open-source spreadsheet programs that are largely compatible with Excel; however, some Excel macros may need to be converted for use in these software. When running in LibreOffice, due to column limitations (LibreOffice limits to 1024 columns while Excel allows 16384), it is only necessary to split one worksheet in Excel into two.

Excel is feature-rich, supporting various statistical and econometric functions, and can extend its built-in capabilities through recording macros or writing macros directly. The complexity of macros varies, from basic “for” loops to complex user-defined functions. For advanced users who extend Excel’s functionality by writing macros, the VBA development environment is comprehensive, featuring syntax highlighting, code auto-completion, and debugging and project management tools. Excel offers a wealth of formatting options, facilitating the presentation of models and their results, while also allowing for clear documentation of model structures.Excel is tightly integrated with other components of the Microsoft Office suite (such as Microsoft Word and Microsoft PowerPoint), aiding in the efficient generation of reports and presentations for disseminating results..

Microsoft has faced criticism for the quality of Excel’s statistical functions and random number generator (RNG) algorithms. Many scholars have pointed out issues with the RNG quality and accuracy of statistical functions in Excel. Due to unresolved issues with the default RNG, it is necessary to use an external RNG. Specifically, Excel’s RNG lacks documentation, and the seed used to initialize the RNG cannot be set, which poses challenges for the reproducibility and validation of models. Despite these issues, given that many functions have seen significant improvements, Microsoft Excel (2010 and 2013 versions) is a powerful Monte Carlo simulation application. As of Microsoft Excel 2016, the RNG still lacks documentation. The “Randomize” VBA function can set the RNG seed in VBA macros, but it cannot be set within the Excel program itself..

Since Microsoft Excel 2007, the software has introduced a “multi-threaded recalculation” (MTR) feature, which automatically attempts to parallelize processing as long as the recalculation process only uses thread-safe functions. MTR is enabled by default, and users do not need to set it up additionally. Furthermore, disabling unnecessary display outputs (such as progress bars and screen updates) can significantly shorten processing time.

Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

TreeAge

TreeAge is a commonly used tool in health technology assessment (HTA). Research shows that 57% of respondents have used TreeAge to submit technology assessment models. The advantage of TreeAge lies in its ease of model construction. In TreeAge, models can be defined through visual representations such as state transition diagrams, which can then be converted into decision trees or Markov models. Moreover, Markov models can also be converted into discrete event simulation (DES) models. The visual interface supports “copy” and “paste” operations for the whole model or parts of it, and also supports “cloning” of subtrees. TreeAge can interact with various software packages (including Excel) and can automatically generate various output results. The table below summarizes some of the software’s features.

Clearly Supported Models

Analysis and Output

Interoperability

Extensibility

Budget Impact Analysis (Dynamic Cohort)

Decision Trees

Discrete Event Simulation (Time to Event)

Markov Models

Bayesian Adjustment

Microsimulation (Individual State Transition Models)

Bayesian Adjustment

Cost-Effectiveness Plane / Scatter Plots

Cost-Effectiveness Acceptability Curves (CEAC)

Deterministic Analysis and Probabilistic Sensitivity Analysis (PSA)

Expected Value of Perfect Information (EVPI)

Expected Value of Partial Perfect Information (EVPPI)

Incremental Cost-Effectiveness Ratios (ICERs) and Advantage Analysis

Markov Trajectories

Survival Curves

Threshold Analysis

Tornado Diagrams

Various Charts (Net Monetary Benefit (NMB) vs Willingness to Pay (WTP) charts, Expected Value of Perfect Information (EVPI) vs Willingness to Pay (WTP) charts, etc.)

Various Distributions (Incremental Cost-Effectiveness Ratios (ICERs), Random Parameters, etc.)

Excel, Java, Active API, ODBC Data Links, Python

Distributed Computing, Multi-threading

It is worth noting that TreeAge supports two implementations of Markov models: standard Markov chains and Markov trees. The Markov tree, initially proposed by Horlenberg, allows for a more aesthetically pleasing presentation of Markov models. These two forms are not entirely equivalent; although the differences are subtle, they can have a significant impact on model results.

For more complex models, it is crucial to maximize the number of CPU threads utilized by TreeAge. Additionally, it is recommended to allocate a large amount of available random access memory (RAM) to TreeAge. TreeAge supports distributed processing.

TreeAge offers various license types based on industry and usage scenarios. The “Healthcare” version license is required for using Markov models and microsimulation models, as well as for conducting cost-effectiveness analysis, Markov queue analysis, and healthcare reporting functions. Creating state transition diagrams, conducting DES analysis, performing distributed processing, and creating participant models require not only the “Healthcare” module but also a valid “Maintenance” license. For more details, please visit the TreeAge official website.

Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

MATLAB and R

MATLAB and R are both programming languages commonly used for mathematical and statistical analysis. They have similar advantages and disadvantages, and their development cycles are also comparable. Below, we will first provide an overall analysis of both, followed by specific application details relevant to analysts. As high-level programming languages, MATLAB and R support the construction of various models. There are virtually no limits on the complexity, structure, or scope of models and analyses, and they can perform analyses of information value (VOI) and expected value of perfect parameter information (EVPPI). Both languages have rich plotting capabilities with numerous customizable options. Their functionality can be further extended through community or vendor-developed packages.

Both MATLAB and R come with built-in standard statistical functions that can be used for econometric and other statistical analyses within models. They can be used with or alongside integrated development environments (IDEs), such as R with RStudio. IDEs can significantly enhance work efficiency during model development, as they not only visually highlight syntax errors but also provide code management and debugging tools.

MATLAB and R both use the Mersenne Twister algorithm as their default random number generator (RNG), which has advantages such as a long period and high implementation efficiency.

As with all programming languages, pre-allocating matrices, arrays, or vectors used during computations helps improve computational efficiency. Since MATLAB and R widely utilize vectorized operations, which can handle multiple data elements simultaneously, it is advisable to avoid using “for” loops to prevent reducing computational efficiency. For probabilistic models, generating random variables outside of loops is more efficient. Both software packages have analysis tools that can identify parts of the code that consume significant processing time, thus pinpointing where to optimize code efficiency. Similar to Excel, it is recommended to disable progress bar displays, as they can slow down analysis speed.

MATLAB

MATLAB, short for “Matrix Laboratory,” is a widely popular high-level development environment. GNU Octave (hereinafter referred to as Octave) is an open-source alternative that is “basically compatible” with the MATLAB language. Although both are suitable for various decision modeling tasks.

MATLAB, as a high-level language combined with an IDE, is a general computing platform that allows for rapid development of numerical programs. Its rich library of standard functions provides strong support for solving various problems, and both MATLAB and Octave can be functionally extended through user-submitted packages. MATLAB’s “toolboxes” provide specialized extensions for different fields, covering areas such as bioinformatics and aerospace. For example, MathWorks’ Simulink and SimEvents products offer graphical DES capabilities; however, even without these products, using MATLAB along with the Statistics and Machine Learning Toolbox is sufficient to achieve DES functionality.

MATLAB includes a built-in profiler that can be used to evaluate program performance and identify potential optimization directions. The MATLAB editor can visually and intuitively detect syntax and programming errors, as well as inefficient parts of the code, while Octave lacks this feature. The built-in debugging tools allow programmers to inspect programs line by line to troubleshoot issues. Its accelerator feature can further optimize programs, significantly reducing run times. Additionally, MATLAB extensively employs parallel processing techniques in many standard functions.

While programs written in MATLAB are mostly compatible with Octave, there are still some component differences. For instance, the syntax for setting the RNG seed in Octave differs from that in MATLAB. The processing speed of Octave largely depends on the user’s optimization of the code, while MATLAB’s accelerator reduces the need for extensive optimization. Code optimization is particularly important when performing EVPPI and other multi-layer Monte Carlo analyses. By default, MATLAB uses double-precision floating-point arithmetic; for some models, this level of precision may be excessive and increase computational burden. If precision requirements can be met, single-precision arithmetic is more efficient computationally.

MATLAB has different pricing for different user groups, while Octave is free. The basic version of MATLAB only has the capability to generate uniformly distributed and standard normal distributed random numbers. Although it is possible to manually generate random numbers from other distributions using basic functions, it is recommended to use the Statistics and Machine Learning Toolbox for decision models.

Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

R

R is an open-source language and environment designed for developing statistical programming solutions. It is a different implementation of the S language (developed by Bell Labs) and has rapidly gained popularity since its official release in 2000. R’s application in health decision analysis is also becoming increasingly widespread.

Currently, R has over 8000 open-source packages developed by the community, available for download from the Comprehensive R Archive Network (CRAN). This includes packages specifically designed for cost-effectiveness analysis, such as BCEA (Bayesian Cost-Effectiveness Analysis package). Authors often publish articles in the Journal of Statistical Software to supplement the software packages, which often provide detailed documentation and sometimes include relevant theoretical content.

In addition to the base version of R on CRAN, there are other versions with various extensions and enhancements. Microsoft’s R Open (MRO) is a popular version that includes more efficient mathematical operation routines and advanced parallelization capabilities. For programmers, the analysis tools for checking program performance and debugging tools are very practical. From an analytical perspective, R has a wealth of free statistical and econometric analysis packages, along with a comprehensive help system and documentation, making it an attractive choice.

The standard R language itself does not support multi-threading and requires additional packages to achieve this functionality. It is important to note that using these packages may require additional technical knowledge. Unlike MATLAB, R does not have an “accelerator.” Compiling functions and files can help shorten run times, especially when using the just-in-time compilation feature supported by the “compiler” package. Therefore, users need to be more proactive in optimizing their code. By installing R versions optimized for mathematical operation routines and supporting multi-threading, processing speeds can be significantly improved.

Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

Now let’s look at the performance of these four software packages:

Transparency and Validation

Models constructed in Excel have inherent transparency if there are no restrictions on the visibility of the model structure or code (such as hidden worksheets). Parameters and model structures should be clear. In any model, except the simplest ones, to efficiently update parameter values, cell names need to be used. However, models may become opaque due to “cell tracing,” where references and names form a complex network of variables. Nevertheless, Excel can use the “Trace Precedents” and “Trace Dependents” features to highlight these relationships, which somewhat alleviates the issue. Complex models may also require macros, and validating macros requires additional technical skills.

A significant issue in Excel is the inability to set the random number seed. Any model should be able to reproduce on demand; as of Excel 2016, if a model does not use complex VBA (including code to set the random number generator seed), then there is no reproducibility issue. This is also a problem during sensitivity analysis, as consistency requires using the same sampling data whenever possible. For example, if a parameter of a distribution is changed, all other sampling data should remain unchanged for consistency.

The transparency of model structures in TreeAge is visually presented, allowing for easy display of variables, distributions (and their parameters), tables, and trackers by exporting model outputs to spreadsheets, thus making parameter settings transparent. It also provides model validation tools to check for common development errors, which is particularly useful when dealing with large models. The ability to provide models (including stored analysis results) to unauthorized users in the form of TreeAge Pro Player, along with rich debugging options such as console output and Markov queue analysis, makes rigorous validation and interpretation possible. Additionally, user-defined help files tailored to specific models can be incorporated, allowing analysts to document models for reference and validation.

Although TreeAge’s implementation is proprietary and does not allow for code validation, the software can provide sufficient output results to confirm the accuracy of calculations. In many respects, TreeAge has standardized the cost-effectiveness models it supports, which no other program can achieve. However, as of March 2017, the TreeAge user manual did not explain the difference between Markov chains and Markov cycle trees; without understanding this, the validation and transparency of TreeAge models can become meaningless and even misleading.

Models constructed with languages like R and MATLAB have high transparency, as the code itself implicitly records the model structure. With detailed comments added, the program becomes easy to understand, and assumptions regarding parameters, model structures, and analyses are easily identifiable.

Changing parameters in these models is straightforward, allowing assessment agencies to easily conduct various scenario analyses. Mathematical formulas are also clear, and better understanding can be achieved through relevant comments and references. Since statistical data can be computed internally within the software, this type of analysis also has high transparency. However, despite these programming languages providing the highest level of transparency, conducting rigorous validation can be challenging due to the potential for thousands of lines of code in a model.

Processing Speed

Clearly, complex models and their analyses can benefit immensely from code written for specific tasks. This should be kept in mind when considering training options. Learning TreeAge or Excel may be easier, which can be appealing for one-off modeling tasks. However, over time, most analysts will be involved in constructing multiple models, so the additional investment required to learn a programming language should be considered, as it may yield greater returns in the long run due to the significant reduction in time required for running model simulations.

Learning Curve

It is evident that due to TreeAge’s graphical features and the various tools it provides for converting charts into models, constructing models in TreeAge is easier. Spending time studying the TreeAge user manual and referring to the rich examples within the software is sufficient to grasp the nuances of the software. Additionally, TreeAge offers various training options.

Excel, which does not involve complex VBA programming, is the second easiest software to learn. As long as one understands spreadsheets and the relationships between cells, implementing cost-effectiveness models (CEM) in Excel is relatively straightforward. However, stochastic models require some knowledge of VBA, and more complex simulations necessitate increasing proficiency in VBA. For analysts requiring this coding expertise, given the significant speed advantages of MATLAB or R, along with their benefits in transparency and validation, investing time in learning them may be more valuable.

The learning difficulty of R and MATLAB is comparable. Building models with programming languages requires at least an understanding of control flow, data structures, random number generators, file operations, and syntax. Additional skills are needed to ensure that the implementation process is thread-safe for models to run in parallel across multiple processing units. MATLAB offers training courses and certifications, and R has similar courses available.

Functionality

TreeAge can run specific model types with different structures. In contrast, the limitations on model types and variations in other software packages depend solely on the user’s imagination and computational capabilities. Additionally, Excel, R, and MATLAB can perform statistical analyses within models, enhancing transparency, verifiability, and work efficiency. When using TreeAge, parameters must first be obtained and then imported or linked to TreeAge. If these parameters are not documented in detail elsewhere, ambiguities can arise, affecting validation and transparency. Excel’s statistical capabilities are not as robust as those of programming languages, thus ranking lower in this regard.

Conclusion

Clearly, each program has distinct advantages and disadvantages, and which one is the “best” depends on many factors, including the purpose of building models, the analyst’s existing expertise, the complexity of the analyses required, the time available to complete the analyses, and the financial resources supporting this work.

For educational users, the following perspectives may be helpful. TreeAge provides an environment where concepts discussed in class can be quickly and easily implemented. It does not require complex mathematical skills, only an understanding of cost-effectiveness models (CEM). Implementing concepts such as half-cycle correction and specifying distributions and model structures in TreeAge is straightforward. Therefore, students can easily apply theoretical concepts to practical operations.

For courses that emphasize mathematical content, Excel offers a rapid development environment where students not only need to understand relevant concepts but also apply mathematical knowledge. For example, students must clearly construct Markov chains, set parameters for distributions, sample from these distributions, debug any issues encountered, and meaningfully combine results. Doing these things in Excel (without involving complex VBA programming) is relatively easier than using programming languages.

Advanced courses benefit more from using programming languages. Students need to apply mathematical knowledge as they would in Excel, but they also gain additional skills in basic programming. Statistical methods commonly used in health technology assessment (HTA), such as survival analysis, can be easily integrated into course content, enhancing learning outcomes.

For commercial users, it is important to consider the range of analyses that may be conducted across different projects rather than focusing on a single project. In this case, the flexibility of programming languages may be necessary. Although the acquisition cost of MATLAB is high, the increased productivity may quickly offset this cost. This is especially true for users who need to perform a large number of complex calculations in their projects. If such analyses are not a consistent feature of the work, R may be a more cost-effective investment choice.

For companies submitting models to health technology assessment (HTA) agencies, any of the four software packages mentioned above can meet the requirements. That said, as HTA agencies begin to focus on early evidence for conditionally approved technologies, the nature of the evidence required by HTA agencies may change. For example, the new Cancer Drug Fund process implemented in the UK stipulates that funding is contingent on the generation of additional evidence. NICE will determine what new evidence is needed. Relatively complex analyses of information value (VOI) may be required to determine the appropriate sample size for these post-marketing studies, and some software programs may not be able to conduct these analyses. In fact, HTA agencies themselves must also consider whether they should require companies to submit more complex materials and models.

(***)

Which Software Should You Learn: TreeAge, Excel, R, or MATLAB? A Comparison of Four Classic Cost-Effectiveness Analysis (CEA) Tools

  • For help and communication, please contact:[email protected]

  • For reprints and citations, please indicate “HEOR Notes

E.N.D

Leave a Comment