SigmaPlot
Statistical Features

SigmaPlot Has Extensive and Easy-to-Use Statistical Analysis Features

SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical functionality was designed with the non-statistician user in mind: the wizard-based interface guides users through every step of an analysis, so you can perform powerful statistical tests without being a statistical expert.


Each statistical analysis makes certain assumptions that a data set has to meet. If the underlying assumptions are not met, you may be given inaccurate or inappropriate results without knowing it.

SigmaPlot therefore checks whether your data set meets the test criteria and, if it does not, suggests a more appropriate test to run.
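To illustrate the idea, here is a minimal sketch of this kind of decision logic in Python with SciPy. It mirrors the check-assumptions-then-choose-a-test workflow, but it is not SigmaPlot's internal implementation, and the thresholds are illustrative.

```python
# Sketch of assumption-check-then-fallback logic for comparing two groups,
# using SciPy. Not SigmaPlot's code; tests and thresholds are illustrative.
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    # Normality of each group (Shapiro-Wilk) and equal variance (Levene).
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal and equal_var:
        # Assumptions met: run the parametric t-test.
        return "t-test", stats.ttest_ind(a, b)
    # Assumptions violated: fall back to the Mann-Whitney rank sum test.
    return "rank sum test", stats.mannwhitneyu(a, b)
```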

Statistical Analysis Features

Describe Data

Single Group

  • One Sample t-test
  • One Sample Signed Rank test

Compare Two Groups

  • t-test
  • Rank Sum Test

Compare Many Groups

  • One Way ANOVA
  • Two Way ANOVA
  • Three Way ANOVA
  • ANOVA on Ranks
  • One Way ANCOVA

Before and After

  • Paired t-test
  • Signed Rank Test

Repeated Measures

  • One Way Repeated Measures ANOVA
  • Two Way Repeated Measures ANOVA
  • Repeated Measures ANOVA on Ranks

Rates and Proportions

  • z-test
  • Chi-Square
  • Fisher Exact Test
  • McNemar’s Test
  • Relative Risk
  • Odds Ratio

Regression

  • Linear
  • Multiple Logistic
  • Multiple Linear
  • Polynomial
  • Stepwise
  • Best Subsets
  • Regression Wizard
  • Deming

Principal Components Analysis

Correlation

  • Pearson Product Moment
  • Spearman Rank Order

Survival

  • Kaplan-Meier
  • Cox Regression

Normality

New Statistics Macros

  • The Histogram and Kernel Density macro creates graphical estimates of underlying data distributions

Enhancements to Existing Features

  • Analytic P values are implemented for all nonparametric ANOVAs
  • All P values can now be specified for any value between 0 and 1
  • The Akaike Information Criterion (AICc) is now found in the Regression Wizard and Dynamic Fit Wizard reports and the Report Options dialog
  • The Rerun button has returned to the SigmaStat group
  • Implemented the 24 probability functions in the curve fitter in standard.jfl
  • Added seven weighting functions to all curve fit equations in standard.jfl. There is a slight variant added for 3D equations

New Statistics Features Found in Version 13 – One Way ANCOVA

Introduction
A single-factor ANOVA model is based on a completely randomized design in which the subjects of a study are randomly sampled from a population and then each subject is randomly assigned to one of several factor levels or treatments so that each subject has an equal probability of receiving a treatment. A common assumption of this design is that the subjects are homogeneous.

This means that any other variable, where differences between the subjects exist, does not significantly alter the treatment effect and need not be included in the model. However, there are often variables, outside the investigator’s control, that affect the observations within one or more factor groups, leading to necessary adjustments in the group means, their errors, the sources of variability, and the P-values of the group effect, including multiple comparisons.

These variables are called covariates. They are typically continuous variables, but can also be categorical. Since they are usually of secondary importance to the study and, as mentioned above, not controllable by the investigator, they do not represent additional main-effects factors, but can still be included in the model to improve the precision of the results. Covariates are also known as nuisance variables or concomitant variables.

ANCOVA (Analysis of Covariance) is an extension of ANOVA obtained by specifying one or more covariates as additional variables in the model. The ANCOVA data arrangement in a SigmaPlot worksheet has one column with the factor and one column with the dependent variable (the observations) as in an ANOVA design. In addition, you will have one column for each covariate. When using a model that includes the effects of covariates, there is more explained variability in the value of the dependent variable.

This generally reduces the unexplained variance that is attributed to random sampling variability, which increases the sensitivity of the ANCOVA as compared to the same model without covariates (the ANOVA model). Higher test sensitivity means that smaller mean differences between treatments will become significant as compared to a standard ANOVA model, thereby increasing statistical power.

As a simple example of using ANCOVA, consider an experiment where students are randomly assigned to one of three types of teaching methods and their achievement scores are measured. The goal is to measure the effect of the different methods and determine if one method achieves a significantly higher average score than the others.

The methods are Lecture, Self-paced, and Cooperative Learning. Performing a One Way ANOVA on this hypothetical data gives the results in the table below, under the ANOVA column heading. We conclude there is no significant difference among the teaching methods. Also note that the variance unexplained by the ANOVA model, which is due to random sampling variability in the observations, is estimated as 35.17.

It is possible that students in our study may benefit more from one method than the others, based on their previous academic performance. Suppose we refine the study to include a covariate that measures some prior ability, such as a state-sanctioned Standards Based Assessment (SBA). Performing a One Way ANCOVA on this data gives the results in the table below, under the ANCOVA column heading.

                     ANOVA                      ANCOVA
Method      Mean      Std. Error      Adjusted Mean   Std. Error
Coop        79.33     2.421           82.09           0.782
Self        83.33     2.421           82.44           0.751
Lecture     86.83     2.421           84.97           0.764
            P = 0.124                 P = 0.039
            MSres = 35.17             MSres = 3.355

The adjusted mean given in the table for each method is a correction to the group mean that controls for the effects of the covariate. The results show the adjusted means are significantly different, with the Lecture method the most successful. Notice how the standard errors of the means have decreased by about a factor of three, while the variance due to random sampling variability has decreased by a factor of ten. A reduction in error is the usual consequence of introducing covariates and performing an ANCOVA analysis.
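Outside SigmaPlot, a one-way ANCOVA of this kind can be sketched in Python with statsmodels. The data below are synthetic stand-ins for the teaching-method example, and the column names (score, method, sba) are hypothetical; the adjusted means are computed exactly as described above, by predicting each group's response with the covariate held at its grand mean.

```python
# Minimal one-way ANCOVA sketch with statsmodels; synthetic data standing in
# for the teaching-method example (column names are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n = 30
df = pd.DataFrame({
    "method": np.repeat(["Coop", "Self", "Lecture"], n),
    "sba": rng.normal(70, 10, 3 * n),          # covariate: prior ability score
})
effect = df["method"].map({"Coop": 0.0, "Self": 1.0, "Lecture": 3.0})
df["score"] = 60 + effect + 0.3 * df["sba"] + rng.normal(0, 2, 3 * n)

# Equal-slopes ANCOVA model: one factor plus one covariate, no interaction.
fit = ols("score ~ C(method) + sba", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))           # F and P adjusted for the covariate

# Adjusted means: model predictions with the covariate at its grand mean.
grid = pd.DataFrame({"method": ["Coop", "Self", "Lecture"],
                     "sba": df["sba"].mean()})
print(fit.predict(grid))
```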

Assumption Checking 

In addition to the Normality and Equal Variance options, the One Way ANCOVA options will include testing for the equality of slopes (the Interaction Model).

  • Normality Test: There are two options, Shapiro-Wilk and Kolmogorov-Smirnov, as is provided for the parametric ANOVA tests. The default P Value to Reject is .05 and the default test is Shapiro-Wilk. Normality testing is performed on the residuals of the Equal Slopes model or, if the Equality of Slopes Test fails, then the normality test is performed on the residuals of the Interaction Model.
  • Equal Variance Test: The default P Value to Reject is .05. Levene’s mean test is used to assess equal variance. The test is performed on the residuals of the Equal Slopes Model or, if the Equality of Slopes Test fails, then the equal variance test is performed on the residuals of the Interaction Model.
  • Equality of Slopes Test: One of the assumptions of ANCOVA is that there is no interaction between the factor variable (the treatment levels) and the covariate variables. In other words, the coefficient of each covariate in the model is assumed to be the same for all treatments. An Equality of Slopes option provides the calculations to test this assumption, as sketched in the code after this list. If the option is not selected, equality of the slopes is assumed and the report focuses on the results of fitting the ANCOVA Model (also called the Equal Slopes Model) to the data. If the option is selected, the Interaction Model is fit to the data to determine whether any of the interactions is significant. If an interaction between the factor and any covariate is significant (the equality of slopes test fails), the analysis stops, but the regression equations for each group are provided. If there is no significant interaction between the factor and any covariate (the test passes), the report continues with the results of the Equal Slopes Model.
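Continuing the statsmodels sketch above, the equality-of-slopes check amounts to fitting the interaction model and testing the factor-by-covariate term; again, this illustrates the logic rather than SigmaPlot's own implementation.

```python
# Equality-of-slopes check: fit the interaction model and test the
# factor-by-covariate term (reuses df from the ANCOVA sketch above).
import statsmodels.api as sm
from statsmodels.formula.api import ols

inter = ols("score ~ C(method) * sba", data=df).fit()   # Interaction Model
tbl = sm.stats.anova_lm(inter, typ=2)
if tbl.loc["C(method):sba", "PR(>F)"] < 0.05:
    print("Equal slopes rejected: report per-group regression lines only.")
else:
    print("Equal slopes passes: analyze the Equal Slopes (ANCOVA) model.")
```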

ANCOVA Results

An example of the ANCOVA report is shown. The assumption checking results are displayed followed by the ANOVA table and the results interpretation. The adjusted means are displayed and then the multiple comparison results.

One Way Analysis of Covariance

Data source: Data 1 in ANCOVA_DataSets.JNB

Dependent Variable: Length

Group NameNMissingMeanStd DevSEM
control6201.2170.1400.0178
exposed 12701.2060.05780.0111
exposed 24001.0500.09940.0157
Total12901.1630.1370.0121

Normality Test (Shapiro-Wilk): Passed (P = 0.057)

Equal Variance Test: Failed (P < 0.050)

Equal Slopes Test:

The equality of slopes assumption is tested by extending the ANCOVA regression model to include terms for the interactions of the factor with the covariates.

R = 0.625    Rsqr = 0.390    AdjRsqr = 0.365

Analysis of Variance:

Source of Variation   DF    SS       MS       F       P
Col 1                 2     0.124    0.0618   5.166   0.007
Age                   1     0.0369   0.0369   3.086   0.081
Col 1 x Age           2     0.0681   0.0340   2.844   0.062
Residual              123   1.472    0.0120
Total                 128   2.413    0.0189

The effect of the different treatment groups does not depend upon the value of covariate Age, averaging over the values of the remaining covariates. There is not a significant interaction between the factor Col 1 and the covariate Age (P = 0.062).

There are no significant interactions between the factor and the covariates. The equal slopes assumption passes and the equal slopes model is analyzed below.

Analysis of Equal Slopes Model:

R = 0.602    Rsqr = 0.362    AdjRsqr = 0.347

Analysis of Variance:

Source of Variation   DF    SS      MS       F        P
Col 1                 2     0.229   0.114    9.290    <0.001
Age                   1     0.128   0.128    10.421   0.002
Residual              125   1.540   0.0123
Total                 128   2.413   0.0189

The differences in the adjusted means among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P < 0.001). To isolate which group(s) differ most from the others, use a multiple comparison procedure. The adjusted means and their statistics are given in the table below.

The coefficient of covariate Age in the equal slopes regression model is significantly different from zero (P = 0.002). The covariate significantly affects the values of the dependent variable.

Adjusted Means of the Groups:

Group Name   Adjusted Mean   Std. Error   95% Conf-L   95% Conf-U
control      1.200           0.0150       1.170        1.230
exposed 1    1.193           0.0217       1.150        1.236
exposed 2    1.085           0.0206       1.044        1.126

The adjusted means are the predicted values of the model for each group where each covariate variable is evaluated at the grand mean of its sampled values.

All Pairwise Multiple Comparison Procedures (Holm-Sidak method):

Comparisons for factor: Col 1

Comparison                Diff of Means   t       P        P<0.050
control vs. exposed 2     0.115           4.178   <0.001   Yes
exposed 1 vs. exposed 2   0.109           3.456   0.001    Yes
control vs. exposed 1     0.00693         0.271   0.787    No
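The Holm-Sidak method is a step-down procedure: the unadjusted P values are sorted, and each is compared against a progressively relaxed Sidak-corrected level. A minimal sketch of that adjustment (not SigmaPlot's implementation) is:

```python
# Holm-Sidak step-down sketch: compare sorted p-values against
# Sidak-corrected levels 1 - (1 - alpha)**(1/(m - step)).
def holm_sidak(pvalues, alpha=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # smallest p first
    significant = [False] * m
    for step, i in enumerate(order):
        level = 1 - (1 - alpha) ** (1 / (m - step))      # level for this step
        if pvalues[i] > level:
            break        # step-down: once one fails, all larger p values fail
        significant[i] = True
    return significant

# The three comparisons above (p = 0.0005 stands in for "<0.001"):
print(holm_sidak([0.0005, 0.001, 0.787]))   # [True, True, False]
```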

Regression Equations for the Equal Slopes Model:

Each equation below is obtained by restricting the effects-coded dummy variables in the regression model to the values corresponding to each factor group.

A significant difference in the adjusted means of the factor groups is equivalent to a significant difference in the intercepts of the dependent variable for these equations.

Group: control

Length = 1.307 – (0.00280 * Age)

Group: exposed 1

Length = 1.300 – (0.00280 * Age)

Group: exposed 2

Length = 1.192 – (0.00280 * Age)
ANCOVA Result Graphs

There are four ANCOVA result graphs: Regression Lines in Groups, Scatter Plot of Residuals, Adjusted Means with Confidence Intervals, and Normality Probability Plot.

Examples of ANCOVA Graphs

Principal Components Analysis (PCA)

Introduction

Principal component analysis (PCA) is a technique for reducing the complexity of high-dimensional data by approximating the data with fewer dimensions. Each new dimension is called a principal component and represents a linear combination of the original variables. The first principal component accounts for as much variation in the data as possible. Each subsequent principal component accounts for as much of the remaining variation as possible and is orthogonal to all of the previous principal components.

You can examine principal components to understand the sources of variation in your data. You can also use them in forming predictive models. If most of the variation in your data exists in a low-dimensional subset, you might be able to model your response variable in terms of the principal components. You can use principal components to reduce the number of variables in regression, clustering and other statistical techniques.

The primary goal of Principal Components Analysis is to explain the sources of variability in the data and to represent the data with fewer variables while preserving most of the total variance.
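As a concrete illustration of these steps, here is a minimal NumPy sketch of PCA on a correlation matrix. X is a hypothetical stand-in for any n-by-p data array (the crime data analyzed in the report below is not reproduced here).

```python
# PCA-on-a-correlation-matrix sketch with NumPy. X is a placeholder for any
# n-by-p data array, e.g. the 50 x 7 crime data analyzed below.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))                 # hypothetical stand-in data

R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)         # eigendecomposition (symmetric R)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

proportion = 100 * eigvals / eigvals.sum()   # percent of total variance
cumulative = np.cumsum(proportion)

# In-model components: eigenvalues >= the average eigenvalue (always 1.0 for
# a correlation matrix), matching the report's default selection criterion.
k = int(np.sum(eigvals >= eigvals.mean()))

# Loadings: eigenvectors scaled by sqrt(eigenvalue); for a correlation matrix
# these equal the correlations between the variables and the components.
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
```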

Assumption Checking

Normality Test: There are two options, Mardia’s Skewness and Kurtosis tests and the Henze-Zirkler test.

Principal Components Results

An example of the Principal Components report is shown. The assumption checking results are displayed followed by descriptive statistics, the correlation matrix and its eigenvalues. The number of in-model principal components is displayed along with a test for equality of eigenvalues. Based on these results, interpretations are given as to the number of principal components supported.

Principal Components Analysis

Normality Test (Henze-Zirkler):

Statistic = 1.351    Failed (P < 0.050)

Descriptive Statistics:

Variable     Mean       Std Dev
Murder       7.444      3.867
Rape         25.734     10.760
Robbery      124.092    88.349
Assault      211.300    100.253
Burglary     1291.904   432.456
Larceny      2671.288   725.909
Auto_Theft   377.526    193.394

Total Observations: 50

Missing: 0

Valid Observations: 50

An observation is missing if any worksheet cell in its row has a non-numeric value.

Correlation Matrix:

             Murder   Rape    Robbery   Assault   Burglary   Larceny   Auto_Theft
Murder       1.000
Rape         0.601    1.000
Robbery      0.484    0.592   1.000
Assault      0.649    0.740   0.557     1.000
Burglary     0.386    0.712   0.637     0.623     1.000
Larceny      0.102    0.614   0.447     0.404     0.792      1.000
Auto_Theft   0.0688   0.349   0.591     0.276     0.558      0.444     1.000

Total Variance = 7.000

Eigenvalues of the Correlation Matrix:

     Eigenvalue   Difference   Proportion(%)   Cumulative(%)
1    4.115        2.876        58.785          58.785
2    1.239        0.513        17.696          76.481
3    0.726        0.409        10.369          86.850
4    0.316        0.0585       4.520           91.370
5    0.258        0.0359       3.685           95.056
6    0.222        0.0980       3.172           98.228
7    0.124                     1.772           100.000

If two or more eigenvalues have the same value, then the corresponding principal components are not well-defined and any interpretation of them is suspect.

Number of In-Model Principal Components = 2

The in-model components correspond to all eigenvalues greater than or equal to the average eigenvalue. When analyzing the correlation matrix, the average eigenvalue is always 1.0. This criterion can be changed in the Test Options dialog on the Criterion panel. The variance of each principal component equals its corresponding eigenvalue.

Chi-Square Tests for the Equality of Eigenvalues:

Hypothesis: All eigenvalues are equal.
Statistic = 224.295
Degrees of freedom = 21.000
P value < 0.001

There is a significant difference in the eigenvalues. A principal components analysis can be conducted.

Hypothesis: The last 5 eigenvalues are equal.
Statistic = 39.287
Degrees of freedom = 13.209
P value < 0.001

There is a significant difference in the last 5 eigenvalues. You may want to include additional principal components in your model by changing the settings in the Test Options dialog on the Criterion panel.

Eigenvectors of the Correlation Matrix:

             PC 1     PC 2
Murder       0.300    -0.629
Rape         0.432    -0.169
Robbery      0.397    0.0422
Assault      0.397    -0.344
Burglary     0.440    0.203
Larceny      0.357    0.402
Auto_Theft   0.295    0.502

Each principal component is a linear combination of the original variables, after each original variable has been standardized to have unit variance. The coefficients of this linear combination are the entries in the corresponding column of the above table. These coefficients provide the interpretation of the principal components in terms of the original variables.
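In terms of the NumPy sketch above, computing the component scores for each observation corresponds to standardizing the variables and applying the eigenvector coefficients:

```python
# Component scores (continuing the NumPy sketch above): standardize the
# original variables to unit variance, then apply the eigenvector columns.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
scores = Z @ eigvecs[:, :k]                        # n x k component scores
```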

Standard Errors for the Eigenvector Entries:

             PC 1     PC 2
Murder       0.0754   0.0790
Rape         0.0387   0.100
Robbery      0.0476   0.153
Assault      0.0512   0.0888
Burglary     0.0371   0.0898
Larceny      0.0620   0.151
Auto_Theft   0.0729   0.160

Component Loadings:

             PC 1     PC 2
Murder       0.609    -0.700
Rape         0.876    -0.189
Robbery      0.805    0.0470
Assault      0.805    -0.382
Burglary     0.893    0.226
Larceny      0.725    0.448
Auto_Theft   0.599    0.559

If the principal components are standardized to have unit variance, the loadings are the coefficients of the linear combination of in-model principal components used to approximate the original variables. If a correlation matrix is analyzed, then the loadings equal the correlations between the original variables and the principal components.

Fitted Correlation Matrix:

             Murder    Rape    Robbery   Assault   Burglary   Larceny   Auto_Theft
Murder       0.861
Rape         0.666     0.803
Robbery      0.457     0.696   0.650
Assault      0.758     0.777   0.630     0.794
Burglary     0.385     0.739   0.729     0.632     0.848
Larceny      0.128     0.550   0.605     0.412     0.749     0.726
Auto_Theft   -0.0268   0.419   0.508     0.268     0.661     0.684     0.671

This is an estimate of the correlation matrix that results by approximating the original variables with the in-model principal components.
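In the NumPy sketch above, this rank-k approximation is simply the loadings matrix times its transpose; for example, the diagonal entry for Murder is 0.609² + (−0.700)² ≈ 0.861, matching the fitted matrix above.

```python
# Fitted correlation matrix from the in-model components (continuing the
# NumPy sketch above). Diagonal entries are the communalities, e.g.
# 0.609**2 + (-0.700)**2 = 0.861 for Murder in the report above.
R_fitted = loadings @ loadings.T
```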

Principal Components Results Graphs

There are three PCA result graphs – Scree Plot, Component Loadings Plot, and Component Scores Plot. Below are examples of the result graphs together with captions explaining the information the graphs contain. The graphs are based on a study of crime data gathered across the United States. The original variables in the data are seven types of crimes: Murder, Rape, Larceny, Burglary, Auto Theft, Robbery and Assault. The rates per 100,000 people were measured for all 50 states.