1.1. How do I change the number of trials in my study?

KERUS™ users can add trial setups to the current study to evaluate the effect of different trial designs or variable distributions on the probability of success. New trials can be added through the ‘New Trial’ menu option or by right-clicking on an existing trial and selecting ‘Copy’. The ‘New Trial’ menu option generates a completely empty trial in which all trial information, variables and relations need to be defined. The ‘Copy’ function generates a complete copy of the selected trial with all trial information, variables and relations included.

If a KERUS™ user wants to remove a trial from the simulation, they can right-click on the respective trial and select ‘Remove’, after which they will be asked to confirm the action through a pop-up dialogue window.

This is the last opportunity to halt the deletion of the trial. The user should consider carefully before proceeding as the deletion of a trial cannot be undone.

1.2. What is the difference between ‘Trial Designs’?

KERUS™ offers the user three trial design structures: Single Group, Case-Control, and Parallel Group.

Single Group can accommodate only a single group of subjects and is used for comparison to previously reported or hypothetical outcomes.

Case-Control can accommodate two groups of subjects: one group with a disease or outcome and one group without it. These trials are exclusively retrospective in nature, as the outcome of each subject is known by the investigator prior to the trial.

Parallel Group can accommodate two or more groups, where each subject is randomised to a group prior to trial commencement. These trials are prospective, as the subject outcomes are determined during the course of the trial.

Case-Control and Parallel Group have similar characteristics within KERUS™. Case-Control designs require the assignment of the ‘Control’ group through the adjacent tick box. KERUS™ automatically assigns this group a rank of 1 which is used for defining the order of groups in statistical comparisons. Parallel Group does not require the assignment of a ‘Control’ group, but the Value/Rank of each group can be assigned if the user wishes.

A two-group Parallel Group design with ranks assigned is analysed in KERUS™ in the same way as a Case-Control trial.

1.3. How do I evaluate the impact of best- and worst-case values for a variable?

Parameters (mean, standard deviation, etc.) entered for a given variable within KERUS™ are most often derived from a previous study. These parameters are usually estimated with a degree of uncertainty that depends on study size, subject characteristics and distribution type. This uncertainty can be handled in two ways within KERUS™.

KERUS™ allows the user to define the range of uncertainty for most parameters simply by typing two numbers into each entry box (if more are entered then the maximum and minimum are used). KERUS™ will then randomly choose a parameter value within that range for each simulation. This allows the user to model the uncertainty in the parameter estimates into the simulation and allow for this in the overall risk assessment.
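
The range behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the uniform draw is an assumption, since the text states only that a value is chosen at random within the range.

```python
import random

def parameter_range(entries):
    """Return (low, high) for the numbers typed into one entry box.

    If more than two numbers are entered, only the minimum and maximum are used.
    """
    return min(entries), max(entries)

def draw_parameter(entries, rng):
    """Draw one parameter value for a single simulation (uniform draw assumed)."""
    low, high = parameter_range(entries)
    return rng.uniform(low, high)

rng = random.Random(1)
mean_entries = [4.2, 5.0, 4.6]  # hypothetical entries for a mean
draws = [draw_parameter(mean_entries, rng) for _ in range(5)]
print(draws)
```

Each simulated dataset would then use its own drawn value, so the spread of the parameter estimate carries through to the estimated probability of success.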

Traditional statistical tools cannot handle this variability as comprehensively, so some users may prefer to compare best-case and worst-case scenarios. To do this, KERUS™ users can model the impact of extreme parameter values in order to evaluate how sensitive a given trial design is to the best- and worst-case variable values.

In order to assess the impact of different variable parameters, a new copy of the trial must first be created. This is most easily accomplished by right-clicking on the original, complete trial and selecting “Copy”. This new trial should be assigned a new Trial Label, which clearly indicates its identity.

In the new trial, the variable to be manipulated should be selected from the Variable list within the Correlated or Independent tabs. Once selected, the “Edit Variable” button will become available, which allows the parameters of the variable to be viewed and edited. The variable parameter in question should be changed to the predicted best or worst-case value. Only single values should be entered.

KERUS™ users should note that there is no need to change the names of any variables in the copied trial, provided the variable distribution types are not changed between trials.

The variable can be used within an appropriate statistical test and analysis objective to directly assess the effect of the best/worst-case value on trial success probabilities. KERUS™ users should note that altering a variable to the best/worst-case values will affect the values simulated for all correlated variables and may alter success probabilities of additional analysis objectives.

Each statistical method and analysis objective assigned in the Analysis tab will be evaluated in all trials and therefore trial success probabilities will be calculated for the best/worst/average variable values assigned in each copy of the trial.

The trial success probabilities for each trial will be displayed as separate trial scenarios in the Output panel.

Trial scenarios 1 to 4 are generated from within Trial001 with the average values set for the variable parameter and increasing sample numbers in each consecutive scenario. Trial scenarios 5 to 8 are generated from within Trial002, which is a copy of Trial001 except that the variable parameter is set to the best-case value. Trial scenarios 9 to 12 are generated from within Trial003, which is a copy of Trial001 except that the variable parameter is set to the worst-case value.
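
The numbering of the twelve scenarios can be reproduced with a short sketch. The sample sizes below are hypothetical placeholders (the text does not state them); only the trial labels and the trial-by-trial ordering are taken from the description above.

```python
from itertools import product

trials = ["Trial001", "Trial002", "Trial003"]  # average, best case, worst case
sample_sizes = [100, 150, 200, 400]            # hypothetical, in increasing order

# Scenarios are numbered consecutively, trial by trial.
scenarios = {
    number: (trial, n)
    for number, (trial, n) in enumerate(product(trials, sample_sizes), start=1)
}

for number, (trial, n) in scenarios.items():
    print(number, trial, n)
```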

1.4. How do I best optimise my trial design?

Determining the optimum trial design scenario, in terms of subject numbers, allocation ratios and trial structure, can be a difficult task and may require the evaluation of many trial designs. Simulating a large number of datasets in multiple trial design scenarios can be computationally expensive and require more memory and time to calculate.

Performing an iterative optimisation of the trial design parameters can increase the efficiency at which optimum trial designs can be ascertained. The KERUS™ user should first reduce the number of simulations per trial design scenario to a value between 50 and 100, using the respective text box in the Study Information panel on the Setup tab, to conserve memory and reduce analysis time.

Secondly, the KERUS™ user should define a wide range of trial design parameters to create a diverse collection of trial design scenarios. For example, the user could define Parallel Group trials with sample sizes of 50, 150, 250, 350, 450 and 550, and allocation ratios of 0.2:0.8, 0.4:0.6, 0.6:0.4 and 0.8:0.2 to generate 24 scenarios.
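
To see where the 24 scenarios come from, the grid can be enumerated directly. The sketch below uses the sample sizes and allocation ratios from the example; the `group_sizes` helper and its rounding rule are assumptions made for illustration.

```python
from itertools import product

sample_sizes = [50, 150, 250, 350, 450, 550]
allocation_ratios = [(0.2, 0.8), (0.4, 0.6), (0.6, 0.4), (0.8, 0.2)]

def group_sizes(n_subjects, ratios):
    """Split n_subjects across two groups (rounding rule is an assumption)."""
    first = round(n_subjects * ratios[0])
    return first, n_subjects - first

# Every sample size is combined with every allocation ratio.
scenarios = list(product(sample_sizes, allocation_ratios))
print(len(scenarios))  # 6 sample sizes x 4 ratios

for n_subjects, ratios in scenarios[:4]:
    print(n_subjects, ratios, group_sizes(n_subjects, ratios))
```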

Performing the simulation and subsequent analysis of these datasets should proceed quickly and consume little memory. Once a single trial design, or a small range of trial designs, has been shown to meet the desired analysis objectives, the user should return to the KERUS™ Setup tab and alter the design parameters to a narrower range centred on the selected trial, or trials, from the first iteration. For example, the user may narrow the sample sizes to 150, 175, 200, 225, 250 and 300, and the allocation ratios to 0.35:0.65, 0.5:0.5 and 0.65:0.35, to generate 18 scenarios. The process of simulation, analysis and selection of the best trial designs should be performed as in the previous iteration.

When the trial design parameters have been sufficiently narrowed, the number of simulations should be increased to more than 100 (more than 1000 preferred) and the simulation and evaluation repeated to achieve a precise prediction of the trial success probabilities. This final iteration of the optimisation should be performed with a small number of trial design scenarios covering a narrow range, and is used to determine the optimum design.
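
The trade-off between run time and precision can be made concrete with the standard error of an estimated proportion. This is a general statistical approximation, not a description of how KERUS™ reports its results; the function name and the example success probability of 0.8 are chosen for illustration.

```python
import math

def success_probability_se(p, n_simulations):
    """Standard error of a success probability estimated from n simulations."""
    return math.sqrt(p * (1 - p) / n_simulations)

for n_simulations in (50, 100, 1000):
    se = success_probability_se(0.8, n_simulations)
    # Approximate 95% margin of error around an estimated probability of 0.8
    print(f"{n_simulations:>5} simulations: +/- {1.96 * se:.3f}")
```

With 50 to 100 simulations the margin is roughly 0.08 to 0.11, which is adequate for ranking widely spaced designs; at 1000 simulations it shrinks to about 0.025, which is better suited to the final comparison.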

1.5. How can I design a trial that takes into account confounding factors?

Confounding occurs when the effect of a variable of interest on an outcome is mixed with the effect of an additional variable, resulting in a misrepresentation of the true association. This can occur in a clinical trial when the distribution of a variable differs between the groups being compared. For example, if measuring the effect of treatment (‘Control’ or ‘Treated’) on subject survival and the ‘Treated’ group has a greater proportion of subjects suffering from a less aggressive disease stage, a significant association between treatment and survival could be reported where none exists. These confounding variables can reduce the trial’s power to establish a causal link between treatment and outcome, and require the application of appropriate statistical methods to adjust the observed associations.

KERUS™ can currently simulate multiple interrelated factors and so is ideally suited to the simulation of confounding variables, allowing the user to generate different test populations that exhibit different magnitudes of confounding influence. This allows the user to readily estimate the impact of poor control over confounding factors in the trial design, so that selection protocols can be optimised to minimise confounding. The user can see which possible confounders are most likely to artificially inflate (or deflate) the probability of achieving the analysis objectives.

However, explicit options for directly analysing the influence of confounding variables are limited to group stratification. Applying statistical tests to subgroups of subjects defined by the confounding variable can provide insights into the true association of treatment and outcome.
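
The survival example above can be worked through numerically to show why stratification helps. All proportions below are hypothetical: survival is made to depend on disease stage only, so the true treatment effect is zero, and the two groups differ only in their stage mix.

```python
# Hypothetical survival proportions: identical for both groups within each stage.
survival = {
    ("Control", "early"): 0.8, ("Control", "late"): 0.4,
    ("Treated", "early"): 0.8, ("Treated", "late"): 0.4,
}
# Hypothetical stage mix: the Treated group has more early-stage subjects.
stage_mix = {
    "Control": {"early": 0.3, "late": 0.7},
    "Treated": {"early": 0.7, "late": 0.3},
}

def crude_survival(group):
    """Expected survival proportion in a group when stage is ignored."""
    return sum(share * survival[(group, stage)]
               for stage, share in stage_mix[group].items())

# Unstratified comparison: an apparent benefit caused purely by the stage mix.
crude_difference = crude_survival("Treated") - crude_survival("Control")

# Stratified comparison: within each stage the groups do not differ.
stratified_difference = {
    stage: survival[("Treated", stage)] - survival[("Control", stage)]
    for stage in ("early", "late")
}

print(round(crude_difference, 2), stratified_difference)
```

The crude comparison suggests a 16 percentage point survival advantage for the ‘Treated’ group, while the comparison within each stage shows none.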

A statistically robust analysis, especially when multiple confounding variables are included, requires multivariate analysis which is not currently available within KERUS™. However, multivariate analysis can be performed on the saved simulated data in third party software. See How do I save my simulated data to analyse in another program? for details on exporting KERUS™ simulated data. Multivariate analysis may be available in future versions of KERUS™.

1.6. What does uniform allocation employed by default mean/refer to?

The KERUS™ user may receive a pop-up message dialogue stating “Allocation not correctly set, uniform allocation employed as default” after either defining the first trial variable (if it is independent) or defining the final correlated variable relationship.

This dialogue informs the user that no subject allocation ratios were set for the simulated datasets. This may be accidental, or the user may have chosen not to define intended allocation patterns. The user will not receive this message if allocation ratios have been entered and saved previously. It is not strictly necessary to define allocation ratios; however, KERUS™ requires this information prior to establishing group characteristics and simulation. Therefore, in order to mark the trial as complete and proceed with the definition of trial analysis parameters, an equal distribution of subjects to each group is used as default. Users can return to the subject allocation window at any time through the ‘Allocation’ button to supply KERUS™ with the desired allocation ratios. Users should note, however, that altering the allocation ratios after simulation will invalidate the data and analysis, and so these will be removed from the KERUS™ session.
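
The fallback described above amounts to the following rule, shown here as a hypothetical helper rather than actual KERUS™ code.

```python
def allocation_or_default(ratios, n_groups):
    """Return the user's allocation ratios, or equal ratios if none were set."""
    if not ratios:
        # Uniform allocation: every group receives the same share of subjects.
        return [1 / n_groups] * n_groups
    return list(ratios)

print(allocation_or_default(None, 2))          # no ratios entered
print(allocation_or_default([0.25, 0.75], 2))  # ratios entered and saved
```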

1.7. How do I design a trial to test for equivalency?

In most cases, clinical trials are designed to establish the superiority of one treatment compared to another. The objective in an equivalence trial is the opposite of traditional clinical trials, in that the investigator is attempting to **rule out differences between groups**, and so this requires modifications to the traditional testing methods. KERUS™ does not currently provide these modified testing methods, but future versions of KERUS™ will.

1.8. How do I design a trial to test for non-inferiority?

In most cases, clinical trials are designed to establish the superiority of one treatment compared to another. The objective in a non-inferiority trial is different to that of a traditional clinical trial, in that the investigator is attempting to show that one group is **at least equivalent to or better than** another, and so this requires modifications to the traditional testing methods. The examination of non-inferiority requires the comparison of confidence intervals to a defined value. These are usually the 90% confidence intervals; however, KERUS™ currently provides the 95% confidence intervals. Non-inferiority testing is possible with these default confidence intervals, but the user will be testing at the **0.025** significance level.
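
The link between the confidence level and the significance level can be sketched as follows. This is a generic normal-approximation illustration, not the KERUS™ implementation; the function names, the margin and the example numbers are all hypothetical.

```python
from statistics import NormalDist

def lower_confidence_bound(difference, standard_error, confidence=0.95):
    """Lower limit of a two-sided confidence interval (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return difference - z * standard_error

def is_non_inferior(difference, standard_error, margin, confidence=0.95):
    """Non-inferior if the lower limit lies above -margin.

    `difference` is new minus reference, with larger values being better.
    """
    return lower_confidence_bound(difference, standard_error, confidence) > -margin

# A two-sided 95% interval corresponds to a one-sided test at (1 - 0.95) / 2.
one_sided_alpha = (1 - 0.95) / 2

# Hypothetical numbers: observed difference -0.5, standard error 1.0.
result_wide_margin = is_non_inferior(-0.5, 1.0, margin=3.0)
result_tight_margin = is_non_inferior(-0.5, 1.0, margin=2.0)
print(one_sided_alpha, result_wide_margin, result_tight_margin)
```

Because only the lower limit of the interval is compared to the margin, a 95% interval tests at the one-sided 0.025 level, whereas the more usual 90% interval would test at 0.05.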

1.9. How do I design a trial to test a biomarker’s ability to discriminate between treatment responders and non-responders?

It is widely recognised that variation in disease response to treatment is due not only to random biological variation, but also to the presence of disease subtypes which are missed by traditional classification systems. The identification of patient subgroups which are likely to obtain a greater (or lesser) than average response to treatment has been the objective of many trials. Many of these trials focus on the utility of molecular markers (biomarkers) of disease characteristics to predict patient response. KERUS™ can evaluate designs for such trials using the Area Under the Curve of Receiver Operator Characteristics (AUCROC) for continuous biomarker values or sensitivity/specificity statistics for binomial values.

1.9.1. Continuous biomarker data

In this example, we will consider a KERUS™ user wishing to design a trial to test whether increased expression of a microRNA biomarker can predict rheumatoid arthritis response to Rituximab. The KERUS™ user should first define the study and trial parameters such as simulated cohort sizes, trial group structure and allocation ratios as described in the KERUS™ instructional videos. Cohort sizes of 100, 150, 200 and 400 were assigned in a case-control design with subject allocation between groups of 0.25:0.75, 0.5:0.5 and 0.75:0.25. A case-control design should be selected as this is a retrospective trial, i.e. the investigator has knowledge of the response of each subject before the trial starts. As the biomarker is designed to detect a positive response, the ‘Non-Responders’ group should be defined as the ‘Control’ group. This allows KERUS™ to define which group is the reference in any statistical tests performed.

In this example, data on several variables, including inter-correlation coefficients, are available describing the characteristics of each group in the treatment cohort. These variables should be defined within the Variables panel as ‘Correlated’ variables using the summarised data below. See the KERUS™ instructional videos for more details on defining variables.

KERUS™ users should note that mean proportions for binomial variables should lie between 0 and 1 e.g. 0.608 and 0.657 in this example.

Several of these variables are known to be correlated with each other within the groups, so the relationships between them should be defined. See the KERUS™ instructional videos for more details on defining variable relationships. The user has data on three inter-variable correlations, which are summarised in the table below. These data should be used to define the available relationships. Any variable relationship for which the user does not have data should be removed from the ‘Relations’ list using the ‘Delete Relation’ button. See What if I don’t know the correlation coefficients between variables? for more details.

Once the user has defined all relevant variables and any variable relationships, they should proceed to the Analysis tab, where statistical tests and analysis objectives can be defined for the simulated trial data. See the KERUS™ instructional videos for more details on defining statistical analysis options. Within the Statistics panel, the user should select the microRNA biomarker variable from the ‘Variable’ drop-down menu. KERUS™ will automatically populate the adjacent ‘Test/Model’ and ‘Grouping’ menus with the options available to that variable distribution. As the microRNA biomarker variable is a continuous, normally distributed variable, KERUS™ provides the Area Under the Curve of Receiver Operator Characteristics (AUCROC) amongst the available test options. In this example, the user should select AUCROC as the test and ‘Group’ from the ‘Grouping’ drop-down menu to compare between the Responders and Non-Responders groups.

The user can also define additional tests, for example to evaluate the trial design’s power to detect a significant association between the microRNA and response within genders. Clicking the ‘Confirm’ button located at the bottom of the Statistics panel transfers the test information to the Objectives panel. When the microRNA expression variable is selected from the ‘Variable’ drop-down menu in the Objectives panel, the AUCROC test will be available in the ‘Model’ drop-down menu, and the ‘Statistic’ drop-down menu will be automatically populated with the key statistical outputs of the test. In this example only a single grouping option is available: Responders vs Non-Responders. The user can define logical tests on the key test statistics, such as a p-value of less than (“<”) 0.05, or an Area Under the Curve (AUC) of greater than (“>”) 0.7.

These two objectives can be combined using ‘And’ logic to generate a third analysis objective in which the two primary objectives must be satisfied within the same simulated dataset, i.e. a significant (p<0.05) AUC of greater than 0.7.
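
As an illustration of the statistic and of the ‘And’ logic, the AUC can be computed as the share of Responder/Non-Responder pairs that the biomarker ranks correctly. This is a generic sketch, not the KERUS™ implementation; the biomarker values are hypothetical and the p-value is passed in rather than calculated.

```python
def auc(responder_values, non_responder_values):
    """Area under the ROC curve as the share of correctly ranked pairs.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for responder in responder_values:
        for non_responder in non_responder_values:
            if responder > non_responder:
                wins += 1.0
            elif responder == non_responder:
                wins += 0.5
    return wins / (len(responder_values) * len(non_responder_values))

def combined_objective_met(p_value, auc_value, alpha=0.05, auc_threshold=0.7):
    """'And' logic: both primary objectives must hold in the same dataset."""
    return p_value < alpha and auc_value > auc_threshold

responders = [3.0, 4.0, 5.0]      # hypothetical biomarker values
non_responders = [1.0, 2.0, 3.5]  # hypothetical biomarker values
example_auc = auc(responders, non_responders)
print(example_auc, combined_objective_met(0.01, example_auc))
```

The trial success probability for the combined objective is then the proportion of simulated datasets for which the combined check holds.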

Clicking on the ‘Simulate and Evaluate’ button will cause KERUS™ to simulate the datasets, perform the statistical tests and evaluate the analysis objectives. The trial success probabilities for each objective can be viewed in the Output panel as either a bar chart or a heatmap. See the KERUS™ instructional videos for more details on viewing analysis result outputs.

The simulated trial data for the example detailed above indicate that to achieve a greater than 80% probability of establishing a significant AUC of greater than 0.7, which would confirm the utility of the microRNA as a biomarker of response, the trial should contain approximately 150 patients. These patients should be 50% Responders:50% Non-Responders. The user can return to the Setup tab, where trial parameters can be altered to evaluate a narrower range of trial scenarios to refine the optimum design.

1.9.2. Binomial biomarker data

Biomarker values can be in the form of a binomial variable, where the biomarker can be ‘present’ or ‘absent’ as defined by a diagnostic test or threshold value. In this example, we will consider a KERUS™ user wishing to design a trial to test whether the presence of a microRNA biomarker can predict rheumatoid arthritis response to Rituximab. The KERUS™ user should first define the study and trial parameters such as simulated cohort sizes, trial group structure and allocation ratios as described in the KERUS™ instructional videos. Cohort sizes of 100, 150, 200 and 400 were assigned in a case-control design with subject allocation between groups of 0.25:0.75, 0.5:0.5 and 0.75:0.25. A case-control design should be selected as this is a retrospective trial, i.e. the investigator has knowledge of the response of each subject before the trial starts. As the biomarker is designed to detect a positive response, the ‘Non-Responders’ group should be defined as the ‘Control’ group. This allows KERUS™ to define which group is the reference in any statistical tests performed.

In this example, data on several variables, including inter-correlation coefficients, are available describing the characteristics of each group in the treatment cohort. These variables should be defined within the Variables panel as ‘Correlated’ variables using the summarised data below. See the KERUS™ instructional videos for more details on defining variables. Here, the expression of the biomarker is available as a binomial variable; however, if it were available as a continuous variable it could be recoded to binomial using the KERUS™ Derived variable function. To achieve this, the continuous variable should be simulated with any appropriate variable relationships. Then, by adding a variable under the Derived tab, the user can define the criterion by which presence or absence is assessed, e.g. biomarker value is greater than (“>”) 3.8.
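
The recoding performed by such a derived variable is equivalent to the following sketch. The function name and the example values are hypothetical; only the threshold of 3.8 comes from the example above.

```python
def derive_presence(values, threshold=3.8):
    """Recode a continuous biomarker as 1 (present) or 0 (absent)."""
    return [1 if value > threshold else 0 for value in values]

biomarker = [2.9, 3.8, 4.1, 5.0]  # hypothetical continuous values
presence = derive_presence(biomarker)
print(presence)
```

Note that a value exactly equal to the threshold is coded as absent, because the criterion is strictly “greater than”.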

KERUS™ users should note that mean proportions for binomial variables should lie between 0 and 1 e.g. 0.608 and 0.657 for gender in this example.

As several of these variables are known to be correlated with each other within the groups, the relationships between them should be defined. See the KERUS™ instructional videos for more details on defining variable relationships. The user has data on three inter-variable correlations, which are summarised in the table below. These data should be used to define the available relationships. Any variable relationship for which the user does not have data should be removed from the ‘Relations’ list using the ‘Delete Relation’ button. See What if I don’t know the correlation coefficients between variables? for more details.

Once the user has defined all relevant variables and any variable relationships, they should proceed to the Analysis tab, where statistical tests and analysis objectives can be defined for the simulated trial data. See the KERUS™ instructional videos for more details on defining statistical analysis options. Within the Statistics panel, the user should select the microRNA biomarker variable from the ‘Variable’ drop-down menu. KERUS™ will automatically populate the adjacent ‘Test/Model’ and ‘Grouping’ menus with the options available to that variable distribution. As the microRNA biomarker variable is a binomial variable, KERUS™ provides Sensitivity/Specificity analysis amongst the available test options. In this example, the user should select ‘SensSpec’ as the test and ‘Group’ from the ‘Grouping’ drop-down menu to compare between the Responders and Non-Responders groups.

Clicking the ‘Confirm’ button located at the bottom of the Statistics panel transfers the test information to the Objectives panel. When the microRNA expression variable is selected from the ‘Variable’ drop-down menu in the Objectives panel, the SensSpec test will be available in the ‘Model’ drop-down menu, and the ‘Statistic’ drop-down menu will be automatically populated with the key statistical outputs of the test. In this example only a single grouping option is available: Responders vs Non-Responders. The user can define logical tests on the key test statistics, such as a sensitivity of greater than (“>”) 0.65, or a specificity of greater than 0.65.

These two objectives can be combined using ‘And’ logic, to generate a third analysis objective in which the two primary objectives must be simultaneously satisfied within the same simulated dataset.
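
For reference, the two statistics and their combination can be sketched as below. This is a generic illustration, not the KERUS™ SensSpec implementation; Responders are treated as the positive class because the biomarker is meant to detect response, and the flag values are hypothetical.

```python
def sensitivity_specificity(responder_flags, non_responder_flags):
    """Sensitivity and specificity of a present/absent biomarker.

    Flags are 1 (biomarker present) or 0 (biomarker absent).
    """
    sensitivity = sum(responder_flags) / len(responder_flags)
    specificity = non_responder_flags.count(0) / len(non_responder_flags)
    return sensitivity, specificity

def combined_objective_met(sensitivity, specificity, threshold=0.65):
    """'And' logic: both statistics must exceed the threshold in the same dataset."""
    return sensitivity > threshold and specificity > threshold

sens, spec = sensitivity_specificity([1, 1, 1, 0], [0, 0, 0, 1, 0])
print(sens, spec, combined_objective_met(sens, spec))
```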

Clicking on the ‘Simulate and Evaluate’ button will cause KERUS™ to simulate the datasets, perform the statistical tests and evaluate the analysis objectives. The trial success probabilities for each objective can be viewed in the Output panel as either a bar chart or a heatmap. See the KERUS™ instructional videos for more details on viewing analysis result outputs.

The simulated trial data for the example detailed above indicate that to achieve a greater than 80% probability of establishing the biomarker has a sensitivity of greater than 0.65 and a specificity of greater than 0.65, which would confirm the utility of the microRNA as a biomarker of response, the trial should contain approximately 100 patients. These patients should be 50% Responders:50% Non-Responders. The user can return to the Setup tab, where trial parameters can be altered to evaluate a narrower range of trial scenarios to refine the optimum design.
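
The idea of a trial success probability can be illustrated with a much simplified Monte Carlo sketch. It is not the KERUS™ simulation engine: it ignores correlated variables, and the true sensitivity and specificity of 0.8 are hypothetical values chosen for the example.

```python
import random

def simulate_success_probability(n_patients, responder_share, true_sensitivity,
                                 true_specificity, n_simulations, rng,
                                 threshold=0.65):
    """Share of simulated datasets where both statistics exceed the threshold."""
    n_responders = round(n_patients * responder_share)
    n_non_responders = n_patients - n_responders
    successes = 0
    for _ in range(n_simulations):
        # Biomarker present in a Responder counts as a true positive.
        true_positives = sum(rng.random() < true_sensitivity
                             for _ in range(n_responders))
        # Biomarker absent in a Non-Responder counts as a true negative.
        true_negatives = sum(rng.random() < true_specificity
                             for _ in range(n_non_responders))
        sensitivity = true_positives / n_responders
        specificity = true_negatives / n_non_responders
        if sensitivity > threshold and specificity > threshold:
            successes += 1
    return successes / n_simulations

rng = random.Random(0)
for n_patients in (100, 150, 200, 400):
    p = simulate_success_probability(n_patients, 0.5, 0.8, 0.8, 500, rng)
    print(n_patients, p)
```

Repeating the calculation for each cohort size and allocation ratio gives one success probability per trial design scenario, which is the quantity compared against the 80% target above.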

The ability to quickly and accurately identify patients with a disease, specific subtype, or likely response to treatment has received much interest in recent years. Multivariate logistic models based on a variety of patient characteristics and molecular markers (biomarkers) are often generated to **classify patients**. However, these models must be validated in independent cohort trials to confirm their accuracy. KERUS™ can evaluate designs for such trials, including applying defined model coefficients to the simulated data or constructing a **logistic model in each simulated dataset**. This allows the user to design trials capable of validating an existing model, or of gathering sufficient information to construct a reliable and accurate model.

Biomarker values can be in the form of a continuous variable or as a binomial variable, where the biomarker can be ‘present’ or ‘absent’ as defined by a diagnostic test or threshold value. In this example, we will consider a KERUS™ user wishing to design a trial to validate an existing model using 2 microRNA biomarkers and 3 patient characteristics (previous diagnosis, patient age and tumour volume greater than 1.5 cm³) to predict resistance (no pathological response) to neoadjuvant chemotherapy treatment in breast cancer patients.

The KERUS™ user should first define the study and trial parameters such as simulated cohort sizes, trial group structure and allocation ratios as described in the KERUS™ instructional videos. In this example, cohort sizes of 100, 150, 200 and 400 were assigned in a case-control design with subject allocation between groups of 0.25:0.75, 0.5:0.5 and 0.75:0.25. A case-control design should be selected as this is a retrospective trial, i.e. the investigator has knowledge of the response of each subject before the trial starts. As the logistic model is designed to identify non-responders, the ‘Responders’ group should be defined as the ‘Control’ group. This allows KERUS™ to define which group is the reference in any statistical tests performed.

In this example, data on several variables, including inter-correlation coefficients, are available describing the characteristics of each group in the treatment cohort. These variables should be defined within the Variables panel as ‘Correlated’ or ‘Independent’ variables using the summarised data in the table below. The expression levels of microRNA A and microRNA B were shown to have no significant correlation with any other measured variable and so should be simulated independently. See the KERUS™ instructional videos for more details on defining variables.

KERUS™ users should note that mean proportions for binomial variables should lie between 0 and 1 e.g. 0.3 and 0.67 for ‘Previous Diagnosis’ in this example.

As several of these variables are known to be correlated with each other within the groups, the relationships between them should be defined. See the KERUS™ instructional videos for more details on defining variable relationships. The user has data on two inter-variable correlations, which are summarised below. These data should be used to define the available relationships. Any variable relationship for which the user does not have data should be removed from the ‘Relations’ list using the ‘Delete Relation’ button. See What if I don’t know the correlation coefficients between variables? for more details.

Once the user has defined all relevant variables and any variable relationships, they can derive a new variable using logistic regression on which to perform their trial analysis. See the KERUS™ instructional videos for more details on defining derived variable options. Clicking on the ‘Add Variable’ button from within the Derived tab of the Variables panel allows the definition of regression covariates. The KERUS™ user should supply a descriptive name and label for the new variable, after which it will be possible to select the ‘Advanced’ method of variable derivation (linear or logistic regression). With ‘Logistic’ selected as the ‘Model’, the user can add variables of interest to the covariates list. These variables will be used to construct the binomial values which will constitute the new variable.

If the user has a pre-existing model which they would like to validate in the designed trial, its coefficients can be entered. Activating the ‘Define Coefficients’ box creates a new data input table, where the coefficient for each covariate should be entered.
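
Applying a pre-existing logistic model to one subject can be sketched as follows. The coefficients, the intercept, the covariate names and the 0.5 cut-off are all hypothetical; the text does not state how KERUS™ converts the predicted probability into a binomial value, so the cut-off in particular is an assumption.

```python
import math

def predicted_probability(intercept, coefficients, covariates):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * covariates[name]
                             for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

def classify(probability, cutoff=0.5):
    """Binomial classifier value (cut-off of 0.5 is an assumption)."""
    return 1 if probability > cutoff else 0

# Hypothetical coefficients for the five covariates in this example.
coefficients = {
    "mirna_a": 0.8,
    "mirna_b": -0.5,
    "previous_diagnosis": 1.2,
    "age": 0.02,
    "tumour_over_1_5_cm3": 0.9,
}
# Hypothetical simulated subject.
patient = {
    "mirna_a": 2.1,
    "mirna_b": 1.4,
    "previous_diagnosis": 1,
    "age": 57,
    "tumour_over_1_5_cm3": 0,
}

probability = predicted_probability(-3.0, coefficients, patient)
predicted_non_responder = classify(probability)
print(round(probability, 3), predicted_non_responder)
```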

If the ‘Define Coefficients’ tick box is not active, KERUS™ will apply a logistic regression to each simulated dataset using the indicated covariates. This allows the user to design a trial for constructing an initial model with sufficient specificity or sensitivity. The coefficients calculated in each simulation are not made available, to prevent misinterpretation by the user. See How can I view the coefficients that KERUS™ defined for my regressions when deriving new variables? for more details.

After deriving the new classifier variable, the user should proceed to the Analysis tab, where statistical tests and analysis objectives can be defined for the simulated trial data. Within the Statistics panel, the user should select the classifier variable from the ‘Variable’ drop-down menu. KERUS™ will automatically populate the adjacent ‘Test/Model’ and ‘Grouping’ menus with the options applicable to that variable distribution. As the classifier variable is a binomial variable, KERUS™ provides Sensitivity/Specificity analysis amongst the available test options. In this example, the user should select SensSpec as the test and ‘Group’ from the ‘Grouping’ drop-down menu to compare between the Responders and Non-Responders groups.

Clicking the ‘Confirm’ button located at the bottom of the Statistics panel transfers the test information to the Objectives panel. When the classifier variable is selected from the ‘Variable’ drop-down menu in the Objectives panel, the SensSpec test will be available in the ‘Model’ drop-down menu, and the ‘Statistic’ drop-down menu will be automatically populated with the key statistical outputs of the test. In this example only a single grouping option is available: Responders vs Non-Responders. The user can define logical tests on the key test statistics, such as a sensitivity of greater than (“>”) 0.7, or a specificity of greater than 0.7.

These two objectives can be combined using ‘And’ logic to generate a third analysis objective in which the two primary objectives must be satisfied within the same simulated dataset.

Clicking on the ‘Simulate and Evaluate’ button will cause KERUS™ to simulate the datasets, perform the statistical tests and evaluate the analysis objectives. The trial success probabilities for each objective can be viewed in the Output panel as either a bar chart or a heatmap. See the KERUS™ instructional videos for more details on viewing analysis result outputs.

The simulated trial data for the example detailed above indicate that to achieve a greater than 80% probability of establishing that the classifier has a sensitivity of greater than 0.7 **and** a specificity of greater than 0.7, which would confirm the utility of the model as a classifier of response, the trial should contain approximately 100 patients. These patients should be 50% Responders:50% Non-Responders. The user can return to the Setup tab, where trial parameters can be altered to evaluate a narrower range of trial scenarios to refine the optimum design.

The development of diagnostic tests and devices to quickly and accurately identify patients with a disease, and therefore allow treatment to begin sooner, requires substantial investment. Such tests or devices require validation in trials on independent cohorts to confirm their accuracy. KERUS™ can evaluate designs for these validation trials. The results of these tests and devices can be continuous or, more likely, binomial (presence or absence of disease).

The simulation and evaluation of the trial designs can therefore be handled in the same fashion as described in How do I design a trial to test a biomarker’s ability to discriminate between treatment responders and non-responders?

1.12.1. How do I design a trial to test the difference in patient survival between two treatments?

The endpoint of many clinical trials is the evaluation of any difference in survival between treatments. Patient survival can be affected by a number of factors other than treatment such as disease stage, disease subtype or patient age. KERUS™ can model the survival times of patients accounting for correlations to such factors and perform Kaplan-Meier and Cox proportional hazards analysis on the resulting simulated data. Examining the survival of virtual subgroups can allow the user to identify patient characteristics which may be associated with prognosis for evaluation in future trials. The analysis of survival data can have many irregular aspects such as non-uniform censorship rates and varying follow-up time. Before conducting a survival analysis the user should review the Survival Trial Design subsection of the FAQs for details on design options.

In this example, we will consider a KERUS™ user wishing to design a trial with 60-month follow-up to test for a significant increase in prostate cancer patient survival time with Docetaxel and luteinizing hormone-releasing hormone (LHRH) agonists compared to LHRH agonists alone accounting for disease stage.

The KERUS™ user should first define the study and trial parameters such as simulated cohort sizes, trial group structure and allocation ratios as described in the KERUS™ instructional videos. Cohort sizes of 100, 150, 200 and 400 were assigned in a **two group parallel trial design** with subject allocation between groups of 0.25:0.75, 0.5:0.5 and 0.75:0.25. The user would like to compare the enhanced treatment to that of the standard LHRH agonist treatment, therefore, the LHRH agonist only treatment group should be given a rank of 1, and the combination treatment a rank of 2. This allows KERUS™ to define which group is the reference in any statistical tests performed.
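To make the size of this design space concrete, the sketch below lists the scenarios that result if every cohort size is crossed with every allocation ratio. That crossing, the rounding rule for odd splits and the helper name `group_sizes` are assumptions made for illustration, not a description of how KERUS™ enumerates scenarios internally.

```python
from itertools import product

cohort_sizes = [100, 150, 200, 400]
# Allocation ratios as (group 1, group 2) proportions.
allocations = [(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)]

def group_sizes(total, ratio):
    """Split a cohort into two groups.

    Rounding half up for the first group is an assumption; the second
    group takes the remainder so the sizes always sum to the total.
    """
    first = int(total * ratio[0] + 0.5)
    return first, total - first

scenarios = [(n, ratio, group_sizes(n, ratio))
             for n, ratio in product(cohort_sizes, allocations)]

for n, ratio, sizes in scenarios:
    print(n, ratio, sizes)
```

Under these assumptions the example yields 12 candidate scenarios, which is why narrowing the range later in the Setup tab can be worthwhile.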

In this example, data on several variables, including inter-correlation coefficients, are available describing the characteristics of each group in the treatment cohort. These variables should be defined within the Variables panel as ‘Correlated’ or ‘Independent’ variables using the summarised data in the table below. KERUS™ users should note that survival time can only be defined using the exponential or Weibull distributions. When defining the survival time variable, the ‘Study End Point’ should be set to 60 months. The censorship (drop-out) of patients and disease stage were shown to have significant correlations to survival time i.e. censorship was weighted towards earlier timepoints and earlier stage patients achieved greater survival times. Therefore, they should be simulated as correlated variables. See the KERUS™ instructional videos for more details on defining variables and this FAQ for more information on non-uniform censorship rates.

KERUS™ users should note that mean proportions for binomial variables should lie between 0 and 1 e.g. 0.12 and 0.09 for ‘Censorship’ in this example.

As several of these variables are known to be correlated with each other within the groups, the relationships between them should be defined. See the KERUS™ instructional videos for more details on defining variable relationships. The user has data on two inter-variable correlations, which are summarised in the table below. These data should be used to define the available relationships. Any variable relationship for which the user does not have data should be removed from the ‘Relations’ list using the ‘Delete Relation’ button. See What if I don’t know the correlation coefficients between variables? for more details.

Once the user has defined all relevant variables and any variable relationships, they should proceed to the Analysis tab, where statistical tests and analysis objectives can be defined for the simulated trial data. The user should consult the relevant FAQs if they wish to examine the effect of altered censorship rates or study lengths. See the KERUS™ instructional videos for more details on defining statistical analysis options. Within the Statistics panel the user should select the survival time variable from the ‘Variable’ drop-down menu. KERUS™ will automatically populate the adjacent ‘Test/Model’ and ‘Grouping’ menus with the options available to that variable distribution. As the survival time variable is a continuous Weibull variable, KERUS™ provides Kaplan-Meier and Cox Proportional Hazards analysis amongst the available test options. In this example, the user should select ‘CoxPH’ as the test and ‘Group’ from the ‘Grouping’ drop-down menu to compare LHRH with LHRH and Docetaxel treatment groups. A second statistical test should also be defined by selecting ‘CoxPH’ as the test and ‘Group and Stage’ from the ‘Grouping’ drop-down menu to compare LHRH with LHRH and Docetaxel treatment groups subsetted by disease stage. When a survival analysis method (Kaplan-Meier or Cox Proportional Hazards) is selected as the statistical test, a fourth drop-down menu is created where the user can select the variable representing the censorship for each patient in the simulation. In this example, only a single censor option is available – the binomial variable ‘Censorship’.

Clicking the ‘Confirm’ button located at the bottom of the Statistics panel transfers the test information to the Objectives panel. When the survival time variable is selected from the ‘Variable’ drop-down menu in the Objectives panel, the CoxPH test will be available in the ‘Model’ drop-down menu, and the ‘Statistic’ drop-down menu will be automatically populated with the key statistical outputs of the test. In this example, multiple grouping options are available. The user can select LHRH vs LHRH and Docetaxel to examine the overall survival difference. The disease stage variable will have been used during the simulation of the survival times but this will be a univariate analysis by treatment group. The user could also select LHRH vs LHRH and Docetaxel in subgroups of disease stage. This will subset the patients based on disease stage and perform survival analysis on only those patients, allowing the user to determine if certain disease stages are more or less sensitive to the altered treatment regimen. The user can define logical tests on the key test statistics, such as a p-value of less than (“<”) 0.05, or a hazard ratio of greater than (“>”) 1.5.

These two objectives can be combined using ‘And’ logic, to generate a third analysis objective in which the two primary objectives must be simultaneously satisfied within the same simulated dataset.
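The following sketch illustrates how such a pair of objectives might be evaluated across many simulated datasets. It is a deliberately crude stand-in for the CoxPH test, not KERUS™ code: survival times are assumed to be exponential, the hazard ratio is taken as the ratio of events per person-month (reference group over new treatment), and the p-value comes from a Wald test on the log ratio. The median survival times and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_group(n, median_months, follow_up, rng):
    """Return (events, person_months) for exponential survival times
    censored administratively at the end of follow-up."""
    rate = math.log(2) / median_months
    events, exposure = 0, 0.0
    for _ in range(n):
        t = rng.expovariate(rate)
        if t <= follow_up:
            events += 1
            exposure += t
        else:
            exposure += follow_up
    return events, exposure

def objective_probabilities(n_ref, n_new, median_ref=30.0, median_new=50.0,
                            follow_up=60.0, n_datasets=2000, seed=1):
    """Estimate P(p < 0.05), P(HR > 1.5) and P(both) by simulation."""
    rng = random.Random(seed)
    p_ok = hr_ok = both_ok = 0
    for _ in range(n_datasets):
        d_ref, e_ref = simulate_group(n_ref, median_ref, follow_up, rng)
        d_new, e_new = simulate_group(n_new, median_new, follow_up, rng)
        if d_ref == 0 or d_new == 0:
            continue  # ratio undefined; counted as a failed dataset
        hr = (d_ref / e_ref) / (d_new / e_new)
        z = math.log(hr) / math.sqrt(1 / d_ref + 1 / d_new)
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        a, b = p_value < 0.05, hr > 1.5
        p_ok += a
        hr_ok += b
        both_ok += a and b
    return p_ok / n_datasets, hr_ok / n_datasets, both_ok / n_datasets

print(objective_probabilities(75, 75))
```

Repeating the call for each cohort size and allocation ratio gives a table of success probabilities comparable in spirit to the Output panel, although the numbers will differ from a true Cox model fitted to Weibull data.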

The simulated trial data for the example detailed above indicate that to achieve a greater than 80% probability of establishing the original treatment regimen has a significant hazard ratio of greater than 1.5 compared to the combination treatment, which would confirm the utility of the new regimen, the trial should contain approximately 150 patients. These patients could be allocated between the two treatment groups in a ratio of 0.25:0.75 or 0.5:0.5. The user can return to the Setup tab, where trial parameters can be altered to evaluate a narrower range of trial scenarios to refine the optimum design.

1.12.2. How can I define random drop-out of participants within the study?

Trials with a survival analysis component usually have a predetermined duration or number of observed ‘events’ which define the trial end point. If a subject drops out of the trial before the end point without having an ‘event’, that subject is ‘censored’ at their time of leaving. Also, if subjects have not had an ‘event’ before reaching the trial end criteria (duration or number of events), they are ‘censored’ at the trial end point. Subjects can be ‘censored’ or not, and as such KERUS™ is able to use any directly defined, or derived, binomial variable as a record of censorship. Binomial variables created through the multinomial distribution cannot be used. KERUS™ users may model the effect of random subject drop-outs (censorship) from the study by defining either an independent binomial variable, where censorship is expected to be completely random, or a correlated binomial variable, where a relationship between censorship and one or more other variables is expected. See the KERUS™ instructional videos for more details on defining variables. For easy identification of the binomial variable representing censorship, it is advisable to label the variable clearly with “DropOuts”, “Censor”, or similar. The mean value defined for the binomial variable should be the average proportion of subjects expected to drop out of the study (be censored). If, for example, co-morbidity is greater in one group, subjects in that group may be more likely to leave the study. Therefore, it is possible to define different proportions for each subject group.

When Kaplan-Meier is selected as the statistical analysis method in the Statistics panel of the Analysis tab, qualifying binomial variables will be available through the Censor drop-down menu. See the KERUS™ instructional videos for more details on defining statistical analysis methods.

The selected censor variable will allow KERUS™ to generate a censor tag associated with the assigned proportion of subjects during the data simulation. These subjects will be censored within the added Kaplan-Meier analysis.
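For readers unfamiliar with how a censor tag enters the calculation, the sketch below gives a minimal Kaplan-Meier product-limit estimator. It is a textbook illustration rather than the KERUS™ implementation; the function name and the convention that 1 marks a censored subject are assumptions made for this example.

```python
def kaplan_meier(times, censored):
    """Return [(time, survival)] at each distinct event time.

    times    -- observed time for each subject
    censored -- 1 if the subject was censored at that time, 0 if an event occurred
    """
    data = sorted(zip(times, censored))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = removed = 0
        # Gather every subject sharing this observed time.
        while i < len(data) and data[i][0] == t:
            events += 1 - data[i][1]
            removed += 1
            i += 1
        if events:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without lowering survival.
        at_risk -= removed
    return curve

print(kaplan_meier([1, 2, 3, 4], [0, 1, 0, 0]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```

The subject censored at time 2 does not reduce the survival estimate, but it does shrink the number at risk for the later events.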

1.12.3. How can I vary the drop-out rate depending on the time interval from the start of the study?

In many cases, the rate of subject drop-out (censorship) from a trial is not completely random, but can instead vary with time. KERUS™ users may weight subject censorship rates to increase or decrease with time. In order to achieve this, the binomial censorship and Weibull/exponential time-to-event variables should be added as correlated variables within KERUS™. Changing a variable between independent and correlated is achieved through the “Correlated?” tick box within the Add Variable dialogue (See the KERUS™ instructional videos for more details on defining variables).

Defining a relationship between these two variables will vary the censorship rates with time. A positive correlation coefficient will weight censorship to later time points. A negative correlation coefficient will weight censorship to earlier time points. See the KERUS™ instructional videos for more details on defining variable relationships.
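One way to picture this weighting is with a pair of correlated latent normal values, one converted to an exponential time-to-event and the other thresholded into a censor flag. The sketch below takes that approach purely for illustration. The method KERUS™ uses to correlate the distributions is not described here, and the parameter `rho` is a latent correlation that need not equal the coefficient entered in KERUS™. All names and default values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def simulate_censorship(n, rho, censor_prop=0.12, median_months=30.0, seed=1):
    """Return [(time, censored)] with censorship correlated to time."""
    rng = random.Random(seed)
    std = NormalDist()
    rate = math.log(2) / median_months
    cut = std.inv_cdf(1 - censor_prop)  # latent threshold for censorship
    out = []
    for _ in range(n):
        z_time = rng.gauss(0, 1)
        # Second latent value shares a fraction rho of the first.
        z_cens = rho * z_time + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Map the latent value to an exponential time via its quantile.
        u = min(std.cdf(z_time), 1 - 1e-12)
        time = -math.log(1 - u) / rate
        out.append((time, int(z_cens > cut)))
    return out

def mean_time(records, flag):
    return mean(t for t, c in records if c == flag)

later = simulate_censorship(5000, rho=0.6)
earlier = simulate_censorship(5000, rho=-0.6)
print(mean_time(later, 1), mean_time(later, 0))
print(mean_time(earlier, 1), mean_time(earlier, 0))
```

With a positive `rho` the censored subjects have longer average times than the uncensored ones, and with a negative `rho` the pattern reverses, which mirrors the behaviour described above.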

After defining the relationship between these two variables, they can be used to define the Kaplan-Meier analysis method and KERUS™ will correlate the distributions to generate the censorship rate weighted by time.

1.12.4. How can I define variable follow up periods for individual recruits?

It is likely that not all subjects will be recruited to a trial or present for final assessment at the same time, creating the situation where the follow-up interval varies between subjects. KERUS™ users can model the impact of this variation in individual follow-up interval on trial success probabilities by defining either an independent normal variable, where follow-up interval is expected to be completely random, or a correlated variable, where a relationship between follow-up interval and one or more other variables is expected. See the KERUS™ instructional videos for more details on defining variables and variable relationships. For easy identification of the normal variable representing individual follow-up intervals, it is advisable to label the variable clearly with “Follow”, “FollowUp” or similar. The mean and standard deviation values defined for each group should be the average and standard deviation of the follow-up interval for subjects within that group. It is possible to define different values for each group. However, these values must be in the same units (days, months, years) as any associated time-to-event variables. It is also advisable to use the available truncation parameters when defining the normal variable so that the follow-up interval maximum is set to the planned study length. This will not alter the statistical analysis but will aid in data interpretation should the simulated data be exported. Choosing “Force Limits” from the truncation options will replace any subject whose simulated follow-up interval exceeds the planned study length with that study length value.

When Kaplan-Meier is selected as the statistical analysis method in the Statistics panel of the Analysis tab, qualifying normal variables (and qualifying binomial variables) will be available through the Censor drop-down menu.

The selected follow-up interval variable will allow KERUS™ to adjust the individual censoring for each subject based on the actual follow-up interval for that subject. Subjects whose time-to-event value is greater than the respective individual follow-up interval will be censored at their individual follow-up interval within the added Kaplan-Meier analysis.
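The censoring rule in the paragraph above can be written out as a short sketch. The function name and the convention that 1 marks a censored subject are assumptions made for this example.

```python
def apply_follow_up(event_times, follow_ups):
    """Return [(observed_time, censored)] for each subject.

    A subject whose time-to-event exceeds their own follow-up interval
    is censored (flag 1) at that follow-up interval.
    """
    observed = []
    for t, f in zip(event_times, follow_ups):
        if t > f:
            observed.append((f, 1))  # censored at individual follow-up
        else:
            observed.append((t, 0))  # event observed during follow-up
    return observed

print(apply_follow_up([12.0, 48.0, 70.0], [36.0, 36.0, 60.0]))
# [(12.0, 0), (36.0, 1), (60.0, 1)]
```

Both lists must use the same time units, as noted above for the follow-up and time-to-event variables.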

1.12.5. How can I compare the effect of different drop-out rates within the study?

KERUS™ users may wish to model the impact of different subject drop-out rates (censorship) on the trial success probabilities in order to examine the robustness of the trial design to worst-case scenarios. Currently, KERUS™ only allows one censor variable to be analysed for each time-to-event Weibull/exponential variable. Multiple Kaplan-Meier analyses with different grouping parameters can be added, but they must have the same censor variable selected. In order to assess the impact of different censorship rates, a new copy of the time-to-event variable must first be created. This is accomplished through the definition of a variable derived from the original time-to-event variable. Clicking on the ‘Add Variable’ button from within the Derived variable tab allows the definition of parameters capable of copying an already defined variable. The KERUS™ user should supply a name and label for the new variable. The addition of “Copy 1” and “C1” appended to the original time-to-event variable name and label is recommended. Within the ‘Simple’ option, the original time-to-event variable should be selected as Argument 1; the addition (“+”) operator should be selected from the operator list; and the value of 0 (zero) should be selected as Argument 2. Clicking “OK” will save the new variable.

The original time-to-event variable or the copy can be chosen as the subject of a Kaplan-Meier analysis in the Statistics panel of the Analysis tab, with the designation of different censor variables.

The “DropWC” variable indicates a second binomial variable representing the predicted worst-case censorship rates. These two Kaplan-Meier analyses will be individually selectable within the Objectives panel of the Analysis tab, where trial success criteria should be entered for each analysis. Therefore, the analyses with different censor variables will have separate outputs in the final Output tab.

1.12.6. How can I compare the impact of study follow up time on my survival analysis?

In order to minimise financial and safety considerations, KERUS™ users may wish to model the impact of different follow-up intervals (trial lengths) on the trial success probabilities. Currently, KERUS™ only allows one follow-up interval per trial. In order to assess the impact of different follow-up intervals, a new copy of the trial must first be created. This is most easily accomplished by right-clicking on the original, complete trial and selecting “Copy”. This new trial should be assigned a new Trial Label, which clearly indicates its identity.

In the new trial, the time-to-event Weibull/exponential variable should be selected from the Variable list within the Correlated or Independent tabs. Once selected, the “Edit Variable” button will become available, which allows the parameters of the variable to be viewed and edited. Altering the “Follow-up Period” located within the Censoring Parameter area changes the trial follow-up interval.

KERUS™ users should note that there is no need to change the names of any variables in the copied trial, provided the variable distribution types are not changed between trials. The time-to-event and respective censor variables should be added to a Kaplan-Meier analysis to determine survival characteristics and analysis objective success. Each statistical method and analysis objective assigned in the Analysis tab will be evaluated in all trials and therefore trial success probabilities will be calculated for the different follow-up intervals assigned in each copy of the trial.

The trial success probabilities for each trial will be displayed as separate trial scenarios in the Output panel.

Trial scenarios 1 to 4 are generated from within Trial001 with increasing sample numbers in each consecutive scenario. Trial scenarios 5 to 8 are generated from within Trial002, which is a copy of Trial001 with the exception of a longer follow-up interval.

2.1. How can I simulate missing data?

In some cases, values for a variable will not be available for all subjects in a trial, creating missing data. For example, a subject who is not achieving an acceptable level of response to treatment may exit the trial without final measurements being taken. KERUS™ is capable of modelling these missing data in the simulations, by truncating the simulated variable values. Selecting to perform this function recodes any values outside defined thresholds as missing data. The variable truncation options can be set during the initial variable definition or when editing the variable. Clicking the ‘Add Variable’ button under the Correlated or Independent tab in the Variables panel of the KERUS™ Setup screen opens a new window where variable parameters can be added. This window can also be accessed by selecting the variable and clicking the ‘Edit Variable’ button.

When the selected variable distribution is normal, the Truncation Parameters panel will be available at the bottom of the window. Exponential and Weibull distributions have a similar Censoring Parameter panel, which should not be used to define missing data.

Entering values into the ‘Lower’ and ‘Upper’ text boxes defines the thresholds that define the ‘Out of Range’ data. Simulated values less than that specified in the ‘Lower’ box and values greater than that specified in the ‘Upper’ box will be recoded. It is possible to leave either, or both, boxes blank and KERUS™ will not apply a limit to the allowed values in that direction. The KERUS™ user should select ‘Missing’ from the ‘Out of Range’ drop-down menu in order for KERUS™ to recode the values outside the defined range as missing. After simulating the variable values, with any correlation to other variables, the selected values are recoded to missing.
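The recoding rule can be illustrated with a short sketch. The function name is hypothetical, `None` stands in for a missing value, and the simulated normal variable (mean 50, standard deviation 10) is an invented example rather than data from this guide.

```python
import random

def recode_out_of_range(values, lower=None, upper=None):
    """Replace values outside [lower, upper] with None (missing).

    A limit left as None is not applied, mirroring a blank text box.
    """
    result = []
    for v in values:
        too_low = lower is not None and v < lower
        too_high = upper is not None and v > upper
        result.append(None if too_low or too_high else v)
    return result

rng = random.Random(1)
simulated = [rng.gauss(50, 10) for _ in range(1000)]
recoded = recode_out_of_range(simulated, lower=35, upper=None)
print(sum(v is None for v in recoded), "of", len(recoded), "values recoded as missing")
```

Because recoding happens after simulation, the values that remain still reflect the original distribution and any correlations defined for the variable.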

2.2. How can I simulate truncated data?

In some cases, values for a variable will be from a truncated distribution. For example, the age of patients with a disease may have a normal distribution, but the trial protocol may stipulate that only patients within a defined age range can be recruited. KERUS™ can model these truncated distributions, so that all values will be forced within the limits while maintaining the underlying distribution characteristics. The variable truncation options can be set during the initial variable definition or when editing the variable. Clicking the ‘Add Variable’ button under the Correlated or Independent tab in the Variables panel of the KERUS™ Setup screen opens a new window where variable parameters can be added. This window can also be accessed by selecting the variable and clicking the ‘Edit Variable’ button.

When the selected variable distribution is normal, the Truncation Parameters panel will be available at the bottom of the window. Exponential and Weibull distributions have a similar Censoring Parameter panel, which should not be used to define limits of truncated distributions.

Entering values into the ‘Lower’ and ‘Upper’ text boxes defines the thresholds that define the ‘Out of Range’ data. Simulated values will be greater than that specified in the ‘Lower’ box and less than that specified in the ‘Upper’ box. It is possible to leave either, or both, boxes blank and KERUS™ will not apply a limit to the allowed values in that direction. The KERUS™ user should select ‘Force Limits’ from the ‘Out of Range’ drop-down menu in order for KERUS™ to set limits of the range of simulated values. All simulated values for the selected variables will lie within the defined range.

2.3. How can I simulate Winsorised data?

In some cases, values for a variable will be from a distribution with limits of the applicability of the values. For example, many sensor based measurements will have saturation limits which a value cannot exceed. Analytical measurements based on a calibration will have a lower limit of detection, below which the exact value cannot be trusted and so values below this are typically replaced by a fixed value such as 0 or half the detection limit. KERUS™ can model these truncated distributions, so that all values outside a limit are replaced by a fixed value. This fixed value will be half the value of a lower limit or match an upper limit while maintaining the underlying distribution characteristics of values that lie within the dynamic range.

The variable truncation options can be set during the initial variable definition or when editing the variable. Clicking the ‘Add Variable’ button under the Correlated or Independent tab in the Variables panel of the KERUS™ Setup screen opens a new window where variable parameters can be added. This window can also be accessed by selecting the variable and clicking the ‘Edit Variable’ button.

When the selected variable distribution is normal, the Truncation Parameters panel will be available at the bottom of the window. Exponential and Weibull distributions have a similar Censoring Parameter panel, which should not be used to define limits of truncated distributions.

Entering values into the ‘Lower’ and ‘Upper’ text boxes defines the thresholds that define the ‘Out of Range’ data. Simulated values between limits will not be changed, but values below the lower value will be set to half that lower limit value, while values exceeding the upper limit will be set to that upper limit value. The KERUS™ user should select ‘Fixed Value’ from the ‘Out of Range’ drop-down menu in order for KERUS™ to set limits of the range of simulated values. All simulated values for the selected variables will lie within the defined range, but may have elevated proportions at the two limits if a significant proportion of samples fall outside of those limits.
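The replacement rule described above can be sketched as follows. The function name and the example measurements are hypothetical, and a blank limit is represented by `None`.

```python
def fixed_value_limits(values, lower=None, upper=None):
    """Apply the 'Fixed Value' rule described in the text.

    Values below the lower limit become half that limit; values above
    the upper limit become the upper limit; others are unchanged.
    """
    result = []
    for v in values:
        if lower is not None and v < lower:
            result.append(lower / 2)
        elif upper is not None and v > upper:
            result.append(upper)
        else:
            result.append(v)
    return result

print(fixed_value_limits([0.2, 1.5, 7.0, 12.0], lower=1.0, upper=10.0))
# [0.5, 1.5, 7.0, 10.0]
```

In this example a limit of detection of 1.0 and a saturation limit of 10.0 turn the readings 0.2 and 12.0 into 0.5 and 10.0 respectively.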

2.4. How do I model data with outliers?

KERUS™ simulates variables based on the assigned distributions and relationships to other variables. However, in any trial biological or technical variation will cause some subjects to obtain statistically unusual values for a variable given their characteristics. These values are known as outliers. Outliers can be manually added to a simulated variable to evaluate the strength of the trial design to abnormal values. Creating these outliers within KERUS™ occurs in several stages. The first is to define a binomial variable representing the probability of that subject being an outlier, and a continuous variable, such as a normally distributed variable, representing the amount the outlier value should deviate by. For easy identification, it is advisable to label these variables with ‘pOutlier’ and ‘dOutlier’, or similar, respectively. See the KERUS™ instructional videos for more details on defining variables.

These variables can be independent, where outliers are expected to be completely random in both probability and deviation, or correlated, where a relationship between outlier probability, deviation and other variables is expected. For example outlier deviation may be skewed towards larger values, in which case the probability and deviation of outliers would be positively correlated. See the KERUS™ instructional videos for more details on defining variable relationships.

The second stage is to generate a new simple derived variable using ‘pOutlier’ and ‘dOutlier’. Clicking on the ‘Add Variable’ button from within the Derived variable tab opens a new data window where derived variable formulas can be entered. The KERUS™ user should supply a name and label for the new variable, for example ‘Outliers to add’ and ‘OutAdd’, or similar, respectively. Within the ‘Simple’ option, the ‘dOutlier’ variable should be selected as Argument 1; the multiplication (“*”) operator should be selected from the operator list; and the ‘pOutlier’ variable should be selected as Argument 2. Clicking “OK” will save the new variable. Where the binomial variable was 0 (zero) i.e. that subject was not selected as an outlier, the multiplication will yield a 0 (zero). Where the binomial variable was 1 i.e. that subject was selected as an outlier, the multiplication will yield the value from the ‘dOutlier’ variable.

Finally, the KERUS™ user should define an additional derived variable using the original variable of interest in the study, in this case ‘BioA’, and the recently derived ‘OutAdd’ variable. A descriptive name such as ‘BioAOut’ or similar should be supplied. Within the ‘Simple’ option, the ‘BioA’ variable should be selected as Argument 1; the addition (“+”) operator should be selected from the operator list; and the ‘OutAdd’ variable should be selected as Argument 2. Clicking “OK” will save the new variable.
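The two derivation steps amount to simple per-subject arithmetic, sketched below with invented values for four subjects. The function and list names are hypothetical.

```python
def add_outliers(bio_a, p_outlier, d_outlier):
    """Combine the two derived-variable steps described in the text."""
    # First derived variable: dOutlier * pOutlier
    # (zero unless the subject was flagged as an outlier).
    outlier_offset = [d * p for d, p in zip(d_outlier, p_outlier)]
    # Second derived variable: BioA + the offset above.
    return [b + o for b, o in zip(bio_a, outlier_offset)]

bio_a = [5.0, 5.5, 4.8, 5.1]       # original variable of interest
p_outlier = [0, 1, 0, 0]           # binomial flag: 1 = outlier
d_outlier = [3.2, 4.0, 2.7, 3.9]   # deviation to apply if flagged

print(add_outliers(bio_a, p_outlier, d_outlier))
# [5.0, 9.5, 4.8, 5.1]
```

Only the second subject is flagged, so only that value is shifted, by its own deviation of 4.0.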

This new variable will contain outliers defined by the probability and deviation parameters set within ‘pOutlier’ and ‘dOutlier’ and can be used within an appropriate statistical test and analysis objectives. See the KERUS™ instructional videos for more details on defining statistical tests and analysis objectives.

When defining a new derived variable through linear or logistic regression, KERUS™ users have two options. The first is to supply externally pre-calculated covariate coefficients. The second option is to allow KERUS™ to calculate covariate coefficients internally. These coefficients are calculated separately for each simulated dataset, each of which contains different, randomly generated data, and so the resultant covariate coefficients can vary greatly. It is not advisable to interpret the simulated covariate coefficients as a ‘true-to-life’ model of the regression and therefore the ability to view the calculated covariate coefficients has been purposefully excluded from KERUS™.

2.6. I only have medians and ranges, how do I define my variables?

KERUS™ simulates variables based on the assigned distributions and relationships to other variables. KERUS™ must be supplied with sufficient information on the variable distribution type and associated parameters for effective simulation. Medians and ranges do not supply enough information on the distribution and, therefore, cannot be used to define variables within KERUS™. It is advisable to return to the original data to determine the distribution type and appropriate parameters.

2.7. How do I define skewed data?

Not all variables will be describable using KERUS™’s available distribution types and parameters. However, these variables can be of interest to the user and essential to the simulation of a realistic trial. In order to simulate these variables, the user can attempt to transform the data to better approximate an available distribution prior to calculating the distribution parameters and correlations in their preferred statistical application. In many cases the logarithmic or reciprocal transformation of the variable values can create a better distribution fit. Any correlations to other trial variables should be determined using the transformed values; the original values should not be used.

When defining the variable, through the ‘Add Variable’ button, the distribution parameters for the transformed data should be entered. Additionally, any correlations to other variables should be entered based on the transformed data, through the subsequent ‘Edit Relation’ option. See the KERUS™ instructional videos for more details on defining variables and variable relationships.

This will allow KERUS™ to simulate the transformed values in each dataset with the respective correlations to other variables. The user is advised to include information on the transformation in the variable name. For example a normal distribution used to describe log transformed values of variable 1 could be called LogVar1.

Backtransform to distribution of original data.

The KERUS™ user may then wish to use the distribution of the original data in the statistical analysis to assess trial success probabilities. The simulated values for the variable based on the transformed parameters can be converted by deriving a new variable. Clicking ‘Add Variable’ under the Derived tab in the Variable panel on the KERUS™ Setup tab opens a new window where parameters capable of ‘back-transforming’ the variable can be entered. The addition of “Back-transformed” and “bt”, or similar, prefixed to the target variable name and label is recommended. The method for ‘back-transforming’ the data will depend heavily on the original transformation applied. If, for example, the values were log10 transformed, a value of 10 should be selected as Argument 1; the exponent (“^”) operator should be selected from the operator list; and the target variable should be selected as Argument 2. If the values were reciprocally transformed, a value of 1 should be selected as Argument 1; the division (“/”) operator should be selected from the operator list; and the target variable should be selected as Argument 2. Clicking “OK” will save the new variable.
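The two back-transformations mentioned above correspond to the following arithmetic, shown here as a brief sketch with hypothetical function names and invented values.

```python
import math

def back_transform_log10(values):
    """Argument 1 = 10, operator '^', Argument 2 = transformed variable."""
    return [10 ** v for v in values]

def back_transform_reciprocal(values):
    """Argument 1 = 1, operator '/', Argument 2 = transformed variable."""
    return [1 / v for v in values]

# Example: values stored on the log10 scale are returned to the original scale.
log_var1 = [math.log10(x) for x in (2.0, 50.0, 1000.0)]
print(back_transform_log10(log_var1))

print(back_transform_reciprocal([0.5, 0.25]))
# [2.0, 4.0]
```

Values that were natural-log transformed would need the base e rather than 10, so the base used here must match the original transformation.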

The new derived variable will contain data under the original skewed distribution and can be selected for statistical analysis within the Analysis tab panels. KERUS™ users should consider carefully which statistical tests are performed on the skewed data to ensure that underlying test assumptions are not violated.

2.8. How do I view the values of a defined variable?

KERUS™ users may find it necessary to review the inputted parameter values (mean, standard deviation, etc) for a variable if unexpected trial analysis results are returned. The defined parameters for a given variable can be assessed by selecting that variable from within the appropriate Variable tab in the KERUS™ Setup screen. Once selected, the “Edit Variable” button will become available, which allows the parameters of the variable to be viewed and/or edited.

Clicking the ‘Edit Variable’ button will open a new window containing the information used to define that variable such as distribution type, parameters and truncation options. If an error is found in the values entered here they can be replaced and the variable saved. All previous trial simulation and analysis data would then be invalid and is automatically removed from the KERUS™ session. The data should be re-simulated and re-evaluated under the new variable parameters.

If the KERUS™ user wishes to audit multiple variables, it may be more convenient to export all variable parameter settings as a Microsoft Excel Workbook for review. See How do I export the information I used to define my study? for a guide to this process. This file can be opened using Microsoft Excel (2007 or later) or OpenOffice Calc (version 3.0 or later) and will contain two worksheet tabs: ‘Data Summary’, where trial and defined variable parameters can be found, and ‘Correlations’, where defined variable relation coefficients can be found.

If the KERUS™ user instead wishes to view the simulated values for a variable in each trial scenario, the raw simulated data can be viewed within KERUS™ or exported to a comma separated value (.csv) file. This file can be easily opened in a variety of programs such as Microsoft Excel or OpenOffice Calc. See How do I save my simulated data to analyse in another program? for a guide to the data export process.

2.9. Why is the Variable panel in the Setup tab not visible?

In order to successfully define variable and variable relationship parameters within KERUS™, several other parameters, such as number and size of subject groups, must first be set. To prevent erroneous definition of variable parameters, KERUS™ removes the Variable setup panel until these prerequisite parameters are defined. The text box and buttons governing the input of these parameters are found within the Trial Information panel of the Setup tab.

Detailed information on the definition of the Trial Information parameters can be found in the KERUS™ user manual or in the KERUS™ instructional videos. In brief:

• The Trial Label must be changed from ‘unnamed’ to a unique, descriptive label of no more than 8 characters.

• A series of trial Sample Sizes to be simulated should be entered for the sample sizes parameter. This will be the total number of subjects in the simulated trials.

• The Trial Design, which includes the type of design, number of groups and group names, must be defined.

• The allocation ratio of subjects to each previously defined group can be set. If not, the allocation ratio defaults to an even distribution of subjects between groups.

Once defined the respective buttons will turn green to indicate the presence of parameter data, and the Variables panel will appear in the lower area of the KERUS™ Setup tab.

Many of the values defined for the Trial Information parameters can be altered at any time to examine the effect on trial success probabilities. However, this will invalidate any simulated data or analysis and so this data will automatically be removed from the KERUS™ session. The ‘Simulate and Evaluate’ button (located at the bottom of the Objectives panel of the Analysis tab) should be used to re-simulate and analyse the data.

2.10. What happens if I set a range for a distribution parameter?

If variable information is available from multiple sources (which differ in specified parameter values) or if the user has the confidence intervals for the parameter estimate, the KERUS™ user may want to include this uncertainty in the variable simulation. The distribution parameters for each variable can be set using a single value or a range. The range (minimum and maximum) of parameter values can be entered separated by a space.

If parameter values are entered as a range, KERUS™ will randomly generate values within this range for the parameter value in the simulated datasets. These values are generated from a uniform distribution and each simulated dataset will have different parameter values assigned for that variable.
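The range behaviour described above can be sketched in Python. This is an illustration only and not KERUS™ code: the function names are hypothetical, a normally distributed variable is assumed, and the entries mimic the "minimum maximum" text format.

```python
import random

def parse_entry(text):
    """Parse a single value ("10") or a space-separated range ("8 12")."""
    values = [float(v) for v in text.split()]
    return min(values), max(values)

def draw_parameter(text, rng):
    """Draw one parameter value uniformly from the entered range."""
    low, high = parse_entry(text)
    return rng.uniform(low, high)  # a single value gives low == high

def simulate_datasets(mean_entry, sd_entry, n_subjects, n_datasets, seed=1):
    """Simulate datasets, drawing fresh parameter values for each one."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_datasets):
        mean = draw_parameter(mean_entry, rng)
        sd = draw_parameter(sd_entry, rng)
        # Assumption: the variable is normally distributed.
        values = [rng.gauss(mean, sd) for _ in range(n_subjects)]
        datasets.append({"mean": mean, "sd": sd, "values": values})
    return datasets
```

For example, `simulate_datasets("8 12", "2", n_subjects=50, n_datasets=3)` gives three datasets whose means each lie somewhere between 8 and 12, while the standard deviation is fixed at 2.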

2.11. How do I change a variable from independent to correlated or correlated to independent?

KERUS™ users can define variables as ‘Correlated’, ‘Independent’ or ‘Derived’, but may wish to alter the variable type if new information becomes available. Correlated and independent variables use the same distribution information for simulation, however, the values of a correlated variable are simulated with a defined relationship to other variables in the trial. The KERUS™ user can easily switch a correlated variable to an independent variable, and vice versa, without the need to re-enter the distribution parameters. The ‘Edit Variable’ button becomes available after selecting a variable from the Variables list under the appropriate variable tab.

Clicking the ‘Edit Variable’ button opens a new data window, showing the distribution information for the selected variable. The type of variable can be switched through the ‘Correlated?’ tick box. If this box is ticked, the variable will be defined as correlated, will appear in the Variables list within the Correlated tab and the user should define any relations to other correlated variables. If this box is unticked, the variable will be defined as independent, will appear in the Variables list within the Independent tab and its values will be simulated independently of all other trial variables.

Derived variables are calculated based on the values within other simulated variables and cannot be switched to another variable type in this way.

2.12. What do I need to define a derived variable?

KERUS™ allows the user to derive new variables based on existing variables within the simulated data.

There are two methods for defining the new variables: ‘Simple’ and ‘Advanced’.

When ‘Simple’ is selected, the user can choose from a variety of mathematical or logical operators and select up to two variables as arguments. If only a single variable is selected, a specific value must be defined as the second argument to be applied to all values within the first argument. The ‘Simple’ derivation of new variables can not accept variables with multinomial distributions as arguments.

Depending on the variable distributions entered as arguments there are limitations placed on the operators that can be used and the distribution of the outputted variable.

- If one argument is a variable of a binomial distribution, only the multiplication ‘×’ mathematical operator is allowed and the outputted variable is set to the distribution of the non-binomial variable.
- If both arguments are variables of binomial distribution, only the logical operators (<, >, =, etc) can be used and the outputted variable will therefore be of a binomial distribution.
- When performing mathematical operations on two variables of different distributions (excluding binomial) the output variable is set to the distribution of the variable entered as **argument 1**.
- If a continuous variable (normal, exponential or Weibull) is transformed by taking the logarithm, the resulting output variable will always be set as normally distributed.
- If a continuous variable is transformed by defining its exponential, the resulting output variable will always be set as exponentially distributed.
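The rules in this list can be summarised as a small lookup function. The Python sketch below is hypothetical and only encodes the rules as stated here. The operator symbols, the distribution names as strings and the function name are assumptions, and cases the list does not cover (such as a constant second argument) are deliberately left unimplemented.

```python
CONTINUOUS = {"normal", "exponential", "weibull"}
LOGICAL_OPS = {"<", ">", "="}      # assumed subset of the logical operators
MATH_OPS = {"+", "-", "×", "÷"}    # assumed set of mathematical operators

def output_distribution(operator, dist1, dist2=None):
    """Return the distribution of a 'Simple' derived variable."""
    dists = [d for d in (dist1, dist2) if d is not None]
    if "multinomial" in dists:
        raise ValueError("multinomial variables are not accepted")
    # Transformations of a single continuous variable.
    if operator in ("log", "exp"):
        if dist1 not in CONTINUOUS:
            raise ValueError("transformation requires a continuous variable")
        return "normal" if operator == "log" else "exponential"
    if dist2 is None:
        raise NotImplementedError("constant arguments are not covered here")
    n_binomial = dists.count("binomial")
    if n_binomial == 2:
        if operator not in LOGICAL_OPS:
            raise ValueError("two binomial variables need a logical operator")
        return "binomial"
    if n_binomial == 1:
        if operator != "×":
            raise ValueError("only multiplication is allowed with one binomial")
        return dist2 if dist1 == "binomial" else dist1
    if operator in MATH_OPS:
        return dist1  # the output follows argument 1
    raise NotImplementedError("combination not covered by the listed rules")
```

For instance, `output_distribution("+", "weibull", "normal")` returns `"weibull"` because the Weibull variable is argument 1.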

When ‘Advanced’ is selected, the user has the option to choose ‘Logistic’ or ‘Linear’ regression. Logistic regression requires a binomial variable as the defined ‘Response’ in order to generate a *de novo* logistic model e.g. patient group (assuming two patient groups in the trial).

If no suitable binomial variable is available, the derivation panel will require the user to input the coefficients from an externally calculated model. There is no limit on the variable types that can be defined as covariates in the logistic model.

Linear regression requires a normally distributed variable as the defined ‘Response’ in order to generate a *de novo* model.

If no normally distributed variable is available, the derivation panel will require the user to input the coefficients from an externally calculated model. There is no limit on the variable types that can be defined as covariates in the linear model.

2.13. How is subject censorship coded?

Trials with a survival analysis component usually have a predetermined duration or number of observed ‘events’ which define the trial end point. If a subject leaves the trial before the trial end point without having an ‘event’, that subject is ‘censored’ at their time of leaving. Also, if subjects have not had an ‘event’ before reaching the trial end criteria (duration or number of events), they are ‘censored’ at the trial end point. This is more specifically known as ‘Right Censoring’, where it is known that survival is greater than a certain value but it is unknown by how much.

Subjects can be ‘censored’ or not, and as such KERUS™ is able to use any directly defined, or derived, binomial variable as a record of censorship. KERUS™ users should note that when defining subject censorship as a binomial variable, the relative proportion supplied should be that of patients dropping out of the trial. Therefore, in any viewed or exported data, censored subjects are represented with a ‘1’; subjects who remain in the trial are represented with a ‘0’. While this coding pattern matches other 3rd party software, there is no universal standard for how the coding is implemented, so it may differ from coding methods previously encountered by the KERUS™ user.
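If exported data are analysed in a package that expects an event indicator instead (1 = event observed, 0 = censored), the coding must be inverted. The Python sketch below shows one way to do this. The function name is hypothetical, and it assumes that every subject who is not censored had an observed event.

```python
def censor_to_event(censor_flags):
    """Convert censor flags (1 = censored, 0 = remained in the trial)
    into event indicators (1 = event observed, 0 = censored)."""
    events = []
    for flag in censor_flags:
        if flag not in (0, 1):
            raise ValueError(f"unexpected censor flag: {flag!r}")
        events.append(1 - flag)
    return events
```

Checking which convention the receiving program uses before running any survival analysis avoids silently swapping events and censored observations.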

See How can I define random drop-out of participants within the study? for detailed information on the definition of subject censorship.

3.1. What if I don’t know the correlation coefficients between variables?

The power of KERUS™ lies in the ability to examine multiple correlated outcomes in the same simulated dataset to determine the probability of exceeding defined criteria in a given trial design. This requires the definition of the relationship between variables to effectively simulate the data; however, information on the relationship of every variable pair will often not be available. KERUS™ is unable to simulate any data when variable relationships are ‘Unset’. Therefore, if the relationship between two variables is unknown, that relationship should be deleted from the ‘Relations’ list by selecting the relation in question and clicking the ‘Delete Relation’ button.

KERUS™ will automatically complete the internal correlation matrix used for data simulation, with a random Spearman’s Rho coefficient of between -0.5 and 0.5. If this process creates an impossible correlation matrix, the correlation matrix will be adjusted and alerts generated when simulating data. The alerts can be viewed through the ‘View Alerts’ button in the View Panel of the Analysis tab. If the user feels this range is unrealistic, they are advised to input a range, however speculative, that reflects the values they feel are realistic (e.g. if they are sure it is a positive correlation but not sure of its magnitude, they may enter “0 0.9”).
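To illustrate what an ‘impossible’ correlation matrix means, the Python sketch below fills unknown pairs with random coefficients between -0.5 and 0.5 and then tests whether the result is positive definite by attempting a Cholesky factorisation. This is not the KERUS™ implementation: the function names are hypothetical, unknown pairs are represented as `None`, and the adjustment KERUS™ applies to an invalid matrix is not reproduced here.

```python
import math
import random

def fill_unknown(matrix, rng):
    """Replace None entries with random coefficients in [-0.5, 0.5]."""
    n = len(matrix)
    filled = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            if filled[i][j] is None:
                value = rng.uniform(-0.5, 0.5)
                filled[i][j] = value
                filled[j][i] = value  # keep the matrix symmetric
    return filled

def is_positive_definite(matrix):
    """Return True if a Cholesky factorisation of the matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0:
                    return False  # no valid set of correlations
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True
```

For example, three variables cannot have pairwise coefficients of 0.9, -0.9 and 0.9, so `is_positive_definite([[1, 0.9, -0.9], [0.9, 1, 0.9], [-0.9, 0.9, 1]])` returns `False`.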

3.2. What happens if I set a range for a relation value?

If relation information is available from multiple sources which differ in specified value, or if the user has calculated a confidence interval for the correlation from their data, the KERUS™ user may want to include this uncertainty in the trial simulation. The relationship between each pair of correlated variables can be set using a single value or a range. The range (minimum and maximum) of correlation coefficients can be entered separated by a space. The values obtained from multiple sources must be the results from the same correlation test i.e. it is not possible to define a range based on the results of a Kendall’s Tau test and a Pearson’s R test. If more than two values are entered then KERUS™ will use the maximum and minimum value.

If correlation coefficient values are entered as a range, KERUS™ will randomly generate values within this range for the correlation in the simulated datasets. These values are generated from a uniform distribution and each simulated dataset will have a different correlation coefficient assigned for that variable pair.
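As a rough illustration of how a relation entry is interpreted, the Python sketch below parses a space-separated entry, keeps only the minimum and maximum when more than two values are supplied, and draws one coefficient per simulated dataset. The function names are hypothetical, and the check that coefficients lie between -1 and 1 is an added assumption.

```python
import random

def relation_range(entry):
    """Return (minimum, maximum) from a space-separated relation entry."""
    values = [float(v) for v in entry.split()]
    if not values:
        raise ValueError("no coefficient entered")
    if any(abs(v) > 1 for v in values):
        raise ValueError("correlation coefficients must lie in [-1, 1]")
    return min(values), max(values)

def draw_coefficients(entry, n_datasets, seed=0):
    """Draw one coefficient per simulated dataset, uniformly in the range."""
    low, high = relation_range(entry)
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_datasets)]
```

With the entry `"0.2 0.6 0.4"`, the middle value is ignored and coefficients are drawn between 0.2 and 0.6.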

4.1. Why are the panels within the Analysis tab not visible?

KERUS™ populates the various drop-down menus in the panels of the Analysis tab based on the variables previously defined within the Setup tab. In order to stop KERUS™ users defining statistical tests or analysis objectives on variables with insufficient simulation information, and thereby causing an invalid analysis or KERUS™ software crash, the analysis tab is only populated if all variables and trial information are completely defined.

KERUS™ automatically assesses whether all requisite information has been added to a trial, and this is indicated by an ‘(I)’ (Incomplete) appended to the trial name if there are outstanding parameters, or a ‘(C)’ (Complete) if all parameters have been entered.

If KERUS™ is reporting an incomplete trial, users should confirm that Study Description, Number of Simulations, Seed, Trial Label, Sample Size, Trial Design and Allocation have been defined within the Study Information and Trial Information panels. See the KERUS™ instructional videos for more details on defining these parameters.

If the KERUS™ user has added correlated variables, the pair-wise relationship parameter should be defined for each variable combination. These parameters are initially created with ‘Unset’ attributes.

KERUS™ cannot proceed with ‘Unset’ variable relations. If the relation between two variables is unknown, it should be removed from the Relations list by selecting the relation in question and clicking the Delete Relation button. If the relation is known, selecting the relation and clicking the Edit Relation button opens the relation parameter input window. All variable relations must be defined or removed before the trial setup will be marked ‘complete’. See the KERUS™ instructional videos for more details on defining variable relations.

If the Trial Design parameter has been modified since the variables and variable relations were defined, for example, if an additional group is added to a ‘Parallel Group’ design, all variable distribution and relation values will be invalid and are automatically removed. Variables will still exist in the variable list, but will contain no parameter (mean, standard deviation, etc) information. Therefore, they must be re-defined under the new Trial Design before the trial will be ‘complete’. See the KERUS™ instructional videos for more details on defining variable distribution parameters.

If multiple trial designs have been added to the study, all trials must be completely defined before the Analysis tab will be populated. Any incomplete trials should be defined or removed to proceed with the simulation and analysis of data. Trials can be removed from the Trial List by right-clicking on the trial in question and selecting ‘Remove’.

When all trials in the study are marked as ‘Complete’, the Analysis tab should be generated when the tab is selected.

4.2. How do I evaluate multiple possible outcomes with my study?

Modern trials often must accept the heterogeneity of patient cohorts while attempting to examine multiple patient outcomes, such as survival time and adverse medical effects. The power of KERUS™ lies in the ability to examine multiple correlated outcomes in the same simulated dataset to determine the probability of exceeding defined criteria in a given trial design. To achieve this KERUS™ allows the user to define multiple statistical tests which can be performed on different variables in each simulated dataset.

These statistical tests are defined through the Statistics Panel of the Analysis tab. Activating the ‘Variable’ drop-down menu displays all the variables defined within the trial designs. Selecting a variable automatically populates the adjacent ‘Test/Model’ and ‘Grouping’ drop-down menus with all the available statistical tests and grouping options applicable to that variable.

After clicking the Add button (‘+’) the selected test formula will be added to the list of defined tests. The user can select a different variable from the respective drop-down menu and repeat the process until all desired tests have been defined.

Single or multiple objectives can be assigned to each statistical test by defining them within the Objectives panel of the Analysis tab. Activating the Variable drop-down menu within the Analysis panel displays a list of variables on which at least one statistical test was performed. Selecting a variable from this list automatically populates the remaining drop-down menus with the appropriate options.

Objectives are added to the Analysis workflow through the ‘Add Objective’ button. See How can I apply multiple analysis objectives to a single statistical test? or the KERUS™ instructional videos for detailed information on defining Analysis Objectives.

4.3. How can I apply multiple analysis objectives to a single statistical test?

KERUS™ analyses the simulated trial data based on the statistical tests and analysis objectives defined by the user within the analysis tab. These objectives can be relatively simple queries of the data, for example using a two-sample t-test, are a variable’s values significantly different between Group A and Group B? However, to utilize the full power of KERUS™, the user will wish to evaluate each trial design’s ability to satisfy several objectives.

Multiple objectives can be assigned to a single statistical test by defining them within the Objectives panel of the Analysis tab. Activating the Statistic drop-down menu displays all key statistical measures returned from the selected test (‘Model’) on the defined variable. In some cases only a single measure, such as the p-value, is available. After selecting a statistical measure and defining an analysis success criterion through the logic drop-down menu and Value/Limit text box, the objective can be added. The KERUS™ user can return to the Statistic drop-down menu, select a different statistical measure and repeat the process. See the KERUS™ instructional videos for more details on defining analysis objectives.

KERUS™ will determine the trial success probabilities for each objective independently when the ‘Simulate and Evaluate’ button is clicked. Analysis objectives can also be easily combined in KERUS™ once defined, using ‘And’ or ‘Or’ logic gating, to create an additional objective where all or at least one objective must be satisfied in the same simulated dataset, respectively. The KERUS™ user should select the desired analysis objectives from the list. Holding the ‘Control’ key and left-clicking allows the selection of multiple objectives. Right-clicking on a selected objective opens the ‘Combine or Delete’ menu where the logic gating can be selected.

These combinatorial objectives can themselves be combined with additional objectives to create increasingly complex trial success criteria. However, KERUS™ users should note that if a trial scenario fails to achieve a combinatorial objective it is not possible to determine which individual objective was the cause.
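The effect of ‘And’ and ‘Or’ gating on success probabilities can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, and each objective is assumed to be recorded as one pass/fail result per simulated dataset.

```python
def success_probability(results):
    """Fraction of simulated datasets in which the objective was met."""
    return sum(results) / len(results)

def combine(objective_a, objective_b, logic):
    """Combine two objectives dataset by dataset with 'And' or 'Or'."""
    if logic == "And":
        return [a and b for a, b in zip(objective_a, objective_b)]
    if logic == "Or":
        return [a or b for a, b in zip(objective_a, objective_b)]
    raise ValueError("logic must be 'And' or 'Or'")

# Hypothetical pass/fail results for two objectives in four datasets.
p_value_ok = [True, True, False, False]
effect_ok = [True, False, True, False]

both = success_probability(combine(p_value_ok, effect_ok, "And"))   # 0.25
either = success_probability(combine(p_value_ok, effect_ok, "Or"))  # 0.75
```

Because the combination is evaluated within each dataset, the combined probability depends on how often the objectives succeed together, not only on their individual probabilities.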

4.4. How can I check that my confidence intervals do not cross a particular value?

Defining simple analysis objectives within the Analysis tab allows KERUS™ users to quickly and easily evaluate how consistently a statistical output will meet a given criterion. However, many statistical outputs have associated confidence intervals, as is the case with the hazard ratio returned from a Kaplan-Meier analysis. A confidence interval gives a range of plausible values for that output; a 95% confidence interval is constructed so that it contains the true value in 95% of repeated trials. Assigning an analysis objective to the ‘Hazard Ratio’ examines the average hazard ratio value returned from the analysis of each simulated dataset. For example, the objective could be that the hazard ratio is greater than 1. A hazard ratio of 1 would indicate that each group has the same event hazard. This does not, however, incorporate the statistical uncertainty in the calculation of the hazard ratio in each simulation, and therefore the analysis objective cannot evaluate whether the hazard ratio will be significantly greater than 1 in each simulation.

When a statistical output has confidence intervals calculated, it is possible to compare these to a given value, thereby ensuring that its outer statistical estimates pass the objective criteria. In the example of the hazard ratio returned from a Kaplan-Meier analysis, a KERUS™ user could define the analysis objective as the lower confidence interval of the hazard ratio being greater than (“>”) 1. This would assess whether the hazard ratio from each simulation is significantly greater than 1 i.e. there is a less than 5% chance that the ‘real’ hazard ratio is 1.

Conversely, if the KERUS™ user wished to assess if the hazard ratio was significantly less than 1, they should choose the upper confidence interval of the hazard ratio and logical operator of less than (“<”) 1.

In an exploratory study, the KERUS™ user may not have sufficient information to predict the direction of the relative hazard ratio. In this case the user may instead wish to evaluate whether the confidence intervals of the hazard ratio do not contain 1. This would make no assumptions regarding the direction of the hazard ratio, only that it must be significantly different from 1. To achieve this, a combination of analysis objectives must be added. The first objective is the lower confidence interval of the hazard ratio being greater than (“>”) 1. The second objective is the upper confidence interval of the hazard ratio being less than (“<”) 1.

The KERUS™ user should select both these analysis objectives from the list. Holding the ‘Control’ key and left-clicking allows the selection of multiple objectives from the list. These two objectives can be combined using ‘Or’ or ‘And’ logic. Right-clicking on a selected objective opens the Combine or Delete menu. To achieve the criteria detailed above, these two objectives should be combined using ‘Or’ logic.

The newly formed combination objective will report the analysis of that simulated dataset as a success if either the lower confidence limit of the hazard ratio is greater than 1 i.e. the hazard ratio is significantly greater than 1; or if the upper confidence limit of the hazard ratio is less than 1 i.e. the hazard ratio is significantly less than 1. This is equivalent to analysing if the hazard ratio is significantly different from 1.
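The ‘Or’ combination can be expressed compactly in code. The Python sketch below is illustrative only: the function names and the confidence limits are hypothetical, and each simulated dataset is assumed to yield one (lower, upper) pair of limits.

```python
def excludes_value(lower, upper, value=1.0):
    """True if the interval lies wholly above or wholly below value."""
    return lower > value or upper < value

def probability_excluding(intervals, value=1.0):
    """Fraction of simulated datasets whose interval excludes value."""
    hits = [excludes_value(lo, hi, value) for lo, hi in intervals]
    return sum(hits) / len(hits)

# Hypothetical confidence limits of the hazard ratio in three datasets.
intervals = [(1.2, 2.0), (0.4, 0.9), (0.8, 1.3)]
probability = probability_excluding(intervals)  # two of three exclude 1
```

Combining the same two objectives with ‘And’ could never succeed, because the lower limit cannot be above 1 while the upper limit is below 1.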

4.5. Why can I censor with a variable that does not exist in my trial?

KERUS™ populates the drop-down menus within the Analysis tab using a global variable list containing all variables in any trial within the KERUS™ session. If the user has defined trials of the same structure (e.g. Parallel Group) containing different variables (Trial A and Trial B) and if the variable in question is used as a censor in the Kaplan-Meier or Cox PH functions, trial success probabilities **will** be generated. When the Censor variable is invalid, KERUS™ defaults to ‘None Selected’. The user may then observe trial success probabilities greater than expected, as no censorship will be applied, decreasing the uncertainty in calculated survival statistics.

The user should return to the KERUS™ Setup tab and ensure that any censor variables have been defined in each trial where available. The user should also confirm their Censor variable choice within the Analysis tab.

5.1. How do I save my Output results or graphics?

KERUS™ can display trial success probabilities based on defined analysis objectives as barcharts or heatmaps in the Output tab. KERUS™ users will often wish to save the graphics and underlying data summarising trial success probabilities.

The graphic can be saved as a tagged image file format (.tiff) or a joint photographic experts group (.jpeg) file through the ‘Save Image’ button (located on the right of the Output tab). After clicking the ‘Save Image’ button, the user will be prompted to supply a location and name for the created file. Graphics will be saved ‘as is’, with any grouping options or colour schemes applied.

The data underlying the graphic can be exported as a comma separated value (.csv) file through the ‘Save Plot Data’ button (located on the right of the Output tab). After clicking the ‘Save Plot Data’ button, the user will be prompted to supply a location and name for the created file. Only analysis objectives displayed in the current graphic will be saved with any grouping options applied. If the displayed graphic contains error bars (barcharts only), the mean probability is saved with the associated standard deviation in brackets.

5.2. What do the error bars on the Output barchart represent?

KERUS™ can display trial success probabilities based on defined analysis objectives as barcharts or heatmaps in the Output tab. The uncertainty of these probabilities can vary depending on a number of trial design structure, simulation and probability summary options. Error bars are automatically displayed on barcharts when viewing objectives grouped by sample sizes, allocation ratios or trial design structures. When viewing probabilities based on these grouping options, these error bars are the standard deviation of the mean and are calculated from the multiple trial success probabilities within the group. For example, if grouped by sample size, probabilities from all allocation ratios and trial structures with that sample size are used in the calculation. Therefore, as the KERUS™ user defines a narrower range of design scenarios the uncertainty associated with the trial success probabilities will decrease, generating smaller error bars.
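The grouping calculation can be sketched in Python as follows. This is an illustration under stated assumptions and not KERUS™ output: the scenario records and field names are hypothetical, and the sample standard deviation is used because the source does not say which estimator KERUS™ applies.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise_by(scenarios, key):
    """Return {group: (mean probability, standard deviation)}."""
    groups = defaultdict(list)
    for scenario in scenarios:
        groups[scenario[key]].append(scenario["probability"])
    summary = {}
    for group, probabilities in groups.items():
        # A single scenario has no spread, so the error bar is zero.
        spread = stdev(probabilities) if len(probabilities) > 1 else 0.0
        summary[group] = (mean(probabilities), spread)
    return summary

# Hypothetical scenario results for one analysis objective.
scenarios = [
    {"sample_size": 100, "allocation": "1:1", "probability": 0.6},
    {"sample_size": 100, "allocation": "2:1", "probability": 0.8},
    {"sample_size": 200, "allocation": "1:1", "probability": 0.9},
]
summary = summarise_by(scenarios, "sample_size")
```

Here the bar for a sample size of 100 would sit at a mean of 0.7 with an error bar reflecting the spread between the two allocation ratios, while the bar for 200 has no error bar because only one scenario contributes.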

Evaluating the trial success probabilities with error bars in this way allows the KERUS™ user to examine the relative contribution of certain trial design parameters to the risk of trial failure.

5.3. Why are the panels within the Output tab not visible?

KERUS™ populates the Output tab panels with data summarising the probability of achieving an analysis objective in a given trial design scenario, derived from analysing simulated data. Therefore, the simulation and analysis of the data is an absolute prerequisite for the population of the Output tab. If the KERUS™ user has defined analysis objectives but failed to click the ‘Simulate and Evaluate’ button, located at the bottom of the Objective panel in the Analysis tab, KERUS™ will have no data to display in the Output tab.

If the ‘Simulate and Evaluate’ button is clicked without the definition of any analysis objectives, the data will be simulated and any defined statistical tests performed, but no analysis success probabilities will be calculated. The panels within the Output tab will be visible; however, it will not be possible to select any data to display. The KERUS™ user should define analysis objectives through the drop-down menus, adjacent text box and ‘Add Objective’ button before clicking the ‘Simulate and Evaluate’ button again. Additional statistical tests or analysis objectives can be added after data has been simulated; however, this will require the re-simulation and evaluation of the data for the output panel to be updated with the results.

5.4. How do I export the information I used to define my study?

To perform a robust trial simulation several parameters, variables and relations must be defined within a KERUS™ session. KERUS™ users may want to create a record of these data, accessible from outside KERUS™, for future audit or distribution to other KERUS™ users. KERUS™ allows the user to create a Microsoft Excel Workbook (.xlsx) containing all the parameters, variables and relation information for each trial scenario through the ‘View Trial Setup’ menu option. If multiple trials have been added to a single session, information from all trials will be saved within a single Excel Workbook file.

After selecting the ‘View Trial Setup’ menu option, the user will be prompted to supply a location and name for the created file. This file can be opened using Microsoft Excel (2007 or later) or OpenOffice Calc (version 3.0 or later) and will contain two worksheet tabs: ‘Data Summary’ where trial and defined variable parameters can be found, and ‘Correlations’ where defined variable relation coefficients can be found.

5.5. How do I save my simulated data to analyse in another program?

KERUS™ generates a large amount of simulated data, which the user may wish to analyse in another program in addition to evaluating trial success probabilities. KERUS™ allows the user to view and save the simulated data underlying the trial analysis through the View buttons within the Analysis tab.

The View buttons will only be available after the ‘Simulate and Evaluate’ process has been performed. KERUS™ users can use these buttons to select subsets of the simulated data based on simulation iteration and/or trial design scenario. Single or multiple simulations/scenarios can be selected. Multiple elements can be selected by holding the ‘Control’ key and left-clicking on the desired elements, or all elements can be selected by simultaneously pressing the ‘Control’ and ‘A’ keys.

After selecting the desired simulation/scenario subsets, clicking the ‘Preview Data’ button will open a new window containing the selected simulated datasets.

KERUS™ users should note that viewing larger numbers of datapoints can be computationally expensive and may take more time and memory to process. The number of rows that can be viewed in the data window is restricted to 1,000; however, saving these data as a comma separated value (.csv) file through the ‘Save’ button allows access to all the data. After clicking the ‘Save’ button, the user will be prompted to supply a location and name for the created file. This file can be easily opened in a variety of programs such as Microsoft Excel, OpenOffice Calc or other 3rd party statistical packages. KERUS™ users should note that larger numbers of datapoints can generate large files and take more time to process.

5.6. I am colour blind, can I alter the colour scheme of the Output heatmap for visualisation?

KERUS™ can display trial success probabilities based on defined analysis objectives as barcharts or heatmaps. Heatmaps are, as default, coloured on a Red/Green colour scale with a midpoint defined by the set acceptable success rate. However, this colour scale may not be appropriate for all users if, for example, they have Red/Green colour-blindness.

The colour scale of heatmaps generated in the output tab can be changed from within the Plot Type drop-down menus in the Output Selection panel. The currently available options are Red/Green, Blue/Yellow or Red/Blue.

Selecting an option from the drop-down menu and clicking the Preview button (located at the bottom of the Output Selection panel) will update the generated heatmap graphic.

These colour scale options will also be applied to saved images of the generated heatmaps.

5.7. Why do I get ‘NaN’ as a result for analysis objectives?

KERUS™ populates the drop-down menus within the Analysis tab using a global variable list containing all variables in any trial within the KERUS™ session. In some cases, the KERUS™ user may define trials with different variables or structures (Single Group, Parallel Group). If multiple trials have been added to the KERUS™ session, the KERUS™ user may be offered variables as statistical test targets, grouping, subgrouping or censor options that do not co-exist in the same trial. KERUS™ applies statistical tests within each simulated dataset independently of variables in other datasets or trials i.e. a variable in ‘Trial A’ cannot be applied to analysis of ‘Trial B’. If an invalid target, grouping or subgrouping option is selected, the statistical test applied will return ‘NaN’ (Not a Number) as the result for that trial scenario-analysis objective combination in the Output tab.

For example, if the user has Single Group and Parallel Group trials defined and performs a two sample t-test by group, the analysis of the single group trial will return ‘NaN’ for all included scenarios. See the KERUS™ instructional videos for more details on defining analysis objectives.

In this case, a second statistical test of a one sample t-test should be defined and will be applied to both trials.

KERUS™ users should consider carefully which analysis objective success probability in the Output tab is most applicable to each trial design. Probabilities shown in the light blue box above are not valid for that trial design.

If the user has defined trials of the same structure (e.g. Parallel Group) containing different variables (Trial A and Trial B) and selects a two sample t-test grouped by a variable not present in all trials (e.g. Gender in the example below), the analysis will return ‘NaN’ for all included scenarios.

In this case, the user should return to the KERUS™ Setup tab and ensure that all appropriate variables have been added to each trial design. The user should also confirm their grouping and subgrouping variable choices within the Analysis tab are consistent for their analysis purposes and if necessary define a statistical test using appropriate grouping options for each trial design.

6. Who can I contact if I need assistance?

There are a number of resources available should a KERUS™ user encounter an issue preventing the successful use of the software. The KERUS™ user manual is available through the KERUS™ menu bar and online.

In addition, a variety of instructional videos which cover basic KERUS™ setup, simulation analysis procedures and analysis case studies are available online. In the event that any issues cannot be resolved by consulting the user manual or instructional videos, questions can be posted to the KERUS™ support forum, where a member of the KERUS™ support team or a fellow KERUS™ user can provide assistance, or if preferred the KERUS™ support team can be emailed directly.

7. We are a UK-based charity and qualify for VAT exemption. How can I remove this charge?

If your organisation is a UK-based charity and qualifies for VAT exemption, you’ll need to contact us before purchasing, and our support team will be happy to help.