The Mite Dataset - Borcard et al. (1992)

Let’s view the dataset! We are working with the mite dataset from Borcard et al. (1992) which contains: community mites data from 70 soil cores, soil core environmental data, and principle components of neighbor matrices.

The Community Component

Starting with the community dataset, each column is a species while each row is a soil core. Each of the mite datasets have 70 rows, with each row being a soil core community.

head(mite[,1:14])
##   Brachy PHTH HPAV RARD SSTR Protopl MEGR MPRO TVIE HMIN HMIN2 NPRA TVEL ONOV
## 1     17    5    5    3    2       1    4    2    2    1     4    1   17    4
## 2      2    7   16    0    6       0    4    2    0    0     1    3   21   27
## 3      4    3    1    1    2       0    3    0    0    0     6    3   20   17
## 4     23    7   10    2    2       0    4    0    1    2    10    0   18   47
## 5      5    8   13    9    0      13    0    0    0    3    14    3   32   43
## 6     19    7    5    9    3       2    3    0    0   20    16    2   13   38

The Environmental Component

We also have environmental variables for our soil cores, which is further described by the table below.

Code Description of variable
SubsDens Substrate density (dry matter) [g dm-3]
WatrCont Water content [g dm-3]
Substrate Substrate [7 unordered classes; 6 substrate types + interface category]
Shrub Shrubs [3 ordered classes]
Topo Microtopography [Blanket - Hummock]
head(mite.env)
##   SubsDens WatrCont Substrate Shrub    Topo
## 1    39.18   350.15   Sphagn1   Few Hummock
## 2    54.99   434.81    Litter   Few Hummock
## 3    46.07   371.72 Interface   Few Hummock
## 4    48.19   360.50   Sphagn1   Few Hummock
## 5    23.55   204.13   Sphagn1   Few Hummock
## 6    57.32   311.55   Sphagn1   Few Hummock

The Spatial Component

Additionally we have a spatial component in the form of a Principle Component Analysis, which reflects the spatial autocorrelation between soil cores to visualize how different sites are spatially related. The PCNM creates a list of vectors, with each explaining spatial autocorrelation from broader to finer scales (V1 = broad, V5 is finer, V15 is even finer etc.). The last few vectors in the list are essentially interpreting statistical noise. Here is a snapshot of the table, although the scale goes all the way down to V22!

head(mite.pcnm[,1:6])
##           V1          V2           V3           V4          V5          V6
## 1 0.01936957 -0.03564740 -0.004243617  0.013606215 -0.05189017 -0.03474468
## 2 0.02327134 -0.04809322 -0.004319021 -0.004129358 -0.06717623 -0.05795898
## 3 0.02553531 -0.05844679 -0.003091072 -0.025699042 -0.07594608 -0.07619106
## 4 0.03065998 -0.07805595 -0.001108683 -0.056124820 -0.08546514 -0.09535844
## 5 0.03105726 -0.08758357  0.003294018 -0.092445741 -0.05775704 -0.08126478
## 6 0.04127819 -0.12060082  0.004167658 -0.126085915 -0.10026023 -0.13218923

1. Unconstrained Ordination Analysis

Background

# 1. Run the Principle Component Analysis (PCA)
PCA = rda(mite.env[1:2])

# 2. View PCA
summary(PCA)
## 
## Call:
## rda(X = mite.env[1:2]) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total           20410          1
## Unconstrained   20410          1
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                             PC1       PC2
## Eigenvalue            2.029e+04 124.71444
## Proportion Explained  9.939e-01   0.00611
## Cumulative Proportion 9.939e-01   1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  34.44874 
## 
## 
## Species scores
## 
##             PC1      PC2
## SubsDens  1.024 -2.69164
## WatrCont 34.328  0.08033
## 
## 
## Site scores (weighted sums of species scores)
## 
##        PC1       PC2
## 1  -1.7605  -0.63174
## 2   0.7172  -5.56245
## 3  -1.1267  -2.95030
## 4  -1.4515  -3.86151
## 5  -6.0240   3.55246
## 6  -2.8682  -7.79272
## 7  -0.9248   0.51483
## 8  -4.1510 -16.92626
## 9  -2.8894  -9.32773
## 10 -5.5334   0.54782
## 11 -8.0508  -1.69209
## 12 -0.1310  -2.84254
## 13 -4.8685   2.35013
## 14 -4.9823  -1.14094
## 15 -1.7282  -8.32852
## 16 -2.5869   0.45439
## 17 -3.3172   2.34980
## 18 -3.9174   3.50344
## 19 -0.7760  -2.08491
## 20 -7.7121  -2.68514
## 21 -6.6066   2.44648
## 22 -6.5469  -0.06083
## 23 -0.3529   2.57898
## 24 -3.4709   1.28450
## 25 -3.4130   0.18084
## 26  2.2783  -3.66056
## 27 -7.7373   3.14183
## 28 -1.6351   0.66803
## 29 -2.5527   1.41477
## 30 -0.2139   0.33215
## 31  2.4373  -0.78762
## 32 -3.6795  -0.67481
## 33 -0.6817   0.79685
## 34 -1.2547  -5.63682
## 35 -3.5993   2.39848
## 36  0.2149  -0.03964
## 37 -0.4992   1.49046
## 38  3.4360  -1.83415
## 39  8.2051  -6.33862
## 40 -0.3663   1.72679
## 41  1.6714   1.74071
## 42 -0.7173   4.43907
## 43  4.9735  -4.55923
## 44  8.6794  -5.27756
## 45  3.7283  -1.37739
## 46  1.1920  -1.30245
## 47 -2.5273   1.00834
## 48  2.8027   0.16508
## 49 -1.3723   4.27520
## 50  0.6786   4.34247
## 51  1.5886   1.26338
## 52  5.1612   5.83680
## 53  0.1060   4.66696
## 54  3.4944  -1.01316
## 55  1.0709   3.12088
## 56  3.5597   0.57581
## 57  5.9163  -4.26354
## 58 -1.6766   3.83617
## 59  7.1631  -2.26949
## 60  1.5987   3.54836
## 61  6.2853  -0.48591
## 62  6.5309  -0.97203
## 63  2.0812   2.37225
## 64  0.6982   4.43166
## 65  1.9853   7.48505
## 66  3.0263   4.05830
## 67 12.1282  -0.15315
## 68  5.2148   5.71602
## 69  1.9878   4.45436
## 70  3.0915   7.46394

PCA Interpretation

# 3. Plot PCA
autoplot(PCA, arrows=TRUE)

PCA Plot Interpretation

2. Constrained Ordination Analysis

Simple Redundancy Analysis

# 1. Run the Redundancy Analysis (RDA)
RDA = rda(mite ~ SubsDens + WatrCont, data = mite.env)

# 2. View RDA
RDA
## Call: rda(formula = mite ~ SubsDens + WatrCont, data = mite.env)
## 
##                 Inertia Proportion Rank
## Total         9098.5913     1.0000     
## Constrained   1977.8327     0.2174    2
## Unconstrained 7120.7586     0.7826   35
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2 
## 1919.3   58.6 
## 
## Eigenvalues for unconstrained axes:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8 
## 6252  289  153  112   92   57   41   30 
## (Showing 8 of 35 unconstrained eigenvalues)

RDA Interpretation

  • Constrained Inertia: Amount of variation explained by the environmental variables.
  • Unconstrained Inertia: Amount of variation not explained by the environmental variables.
  • Proportion: The proportion of variation explained by the constrained and unconstrained inertia respectively.
  • It looks like 21.74% of the variation was explained by the environmental variables, and 78.26% of the variation was not explained by the environmental variables.
  • Eigenvalues for the Constrained Axis: This is a measure of how much variation is explained by the first axis (RDA1) versus the second axis (RDA2).
  • Most of the variation is captured by RDA1 with an eigenvalue of 1919.3 while RDA2 captured much less with an eigenvalue of only 58.6.
# 3. Plot RDA
autoplot(RDA, arrows = TRUE)

RDA Plot Interpretation

  • RDA1 (WaterCont): Positive Score = Sites w/ High Water Content | Negative Score = Sites w/ Low Water Content
  • RDA2 (SubDens): Positive Score = Sites w/ High Shrub Density | Negative Score = Sites w/ Low Shrub Density
  • Notice how the labels have shifted from PCA1 & PCA2 from the previous unconstrained analysis to RDA1 & RDA2 in the constrained analysis!
  • We may want to remove the outlier for a better resolution.

Redundancy Analysis with more than one Predictor Variable Matrix

# 1. Run the RDA with more than one Predictor Variable Matrix
RDA_v2 = rda(mite ~ . + as.matrix(mite.pcnm[,1:3]), data = mite.env)

# 2. View RDA
RDA_v2
## Call: rda(formula = mite ~ SubsDens + WatrCont + Substrate + Shrub +
## Topo + as.matrix(mite.pcnm[, 1:3]), data = mite.env)
## 
##                Inertia Proportion Rank
## Total         9098.591      1.000     
## Constrained   3239.331      0.356   14
## Unconstrained 5859.260      0.644   35
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8   RDA9  RDA10  RDA11 
## 2738.9  295.8   87.2   43.0   27.2   14.2   10.7    9.3    6.4    3.0    1.6 
##  RDA12  RDA13  RDA14 
##    1.1    0.5    0.2 
## 
## Eigenvalues for unconstrained axes:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8 
## 5295  165  103   66   64   42   32   22 
## (Showing 8 of 35 unconstrained eigenvalues)

Model Information

  • Note the period, which represents that we are using all of the variables from mite.env
  • We are only interested in the first three columns of mite.pcnm as we do not need spatial autocorrelation information beyond that degree of resolution. Remember that each proceeding column in the mite.pcnm represents spatial autocorrelation at a finer resolution than the last.

RDA Interpretation

  • This model looks to describe how much variation in community composition of mites is explained by environmental variables, and now also spatial autocorrelation data (mite.pcnm).
  • The Constrained Inertia, or the variation explained by the environmental variables, has increased from the previous model (21.74%) to 35.6%.
  • Notice that we have many more Eigenvalues for all the new environmental variables we’ve added!

Interpretation Caveats to Consider

  • Eigenvalues appear to taper off from RDA3 - RDA14, which poses some difficulty in interpretation: the higher dimension variables may contain information that we are missing, or they are statistical noise.
  • We typically want to stick to the first two variables, and sometimes the third or fourth depending on how fast the eigenvalues attenuate from RDA2 onward.
  • A solution to this issue is to utilize non-metric multidimensional scaling (NMDS) as a way of showing patterns in only two dimensions.
# 3. Plot RDA
autoplot(RDA_v2, arrows = TRUE)

RDA Plot Interpretation

  • RDA1 (WaterCont): Positive Score = Sites w/ High Water Content | Negative Score = Sites w/ Low Water Content
  • RDA2 (SubDens): Positive Score = Sites w/ High Shrub Density | Negative Score = Sites w/ Low Shrub Density
  • There are many sites clustered along the RDA2, indicating that there may be some spatial autocorrelation occurring in the shrub density variable.

Redundancy Analysis with Partitions for Variation Among Space & Environment

The Question: Is there a relative contribution of space versus physical habitat?

Two-Group Partition

# 1. Run the RDA with Two-Group Partitions for Variation Among Space & Environment
v.part = varpart(mite, mite.env[,1:2], mite.pcnm[,1:3])

# 2. View RDA
v.part
## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = mite, X = mite.env[, 1:2], mite.pcnm[, 1:3])
## 
## Explanatory tables:
## X1:  mite.env[, 1:2]
## X2:  mite.pcnm[, 1:3] 
## 
## No. of explanatory tables: 2 
## Total variation (SS): 627803 
##             Variance: 9098.6 
## No. of observations: 70 
## 
## Partition table:
##                      Df R.squared Adj.R.squared Testable
## [a+c] = X1            2   0.21738       0.19402     TRUE
## [b+c] = X2            3   0.06929       0.02698     TRUE
## [a+b+c] = X1+X2       5   0.26410       0.20661     TRUE
## Individual fractions                                    
## [a] = X1|X2           2                 0.17962     TRUE
## [b] = X2|X1           3                 0.01259     TRUE
## [c]                   0                 0.01439    FALSE
## [d] = Residuals                         0.79339    FALSE
## ---
## Use function 'rda' to test significance of fractions of interest

Two-Group Partition RDA Interpretation

  • Individual Fractions: Describes what proportion of the variation is explained by each explanatory variable table.
  • X1 | X2 : The pure effect of environment while we control for space, which indicates that 17.962% of the variation in the community can be explained with just environmental variables.
  • X2 | X1 : The pure effect of the space while we control for the environment, which indicates that 1.259% of the variation in the community can be explained with just the spatial variable.
  • [b] Blank : Represents the interaction between X1 & X2, which is 1.439%, and can be visualized in the Venn Diagram plot of v.part below.
# 3. Plot the RDA
plot(v.part)

Two-Group Partition RDA Plot Interpretation

  • The area outside of the Venn Diagram is the variation (residual error) not accounted for by X1 or X2.
  • We can conclude that the environment is much more important than spatial structure for regulating these mite communities.

Multi-Group Partition

# 1. Run the RDA with Multi-Group Partitions for Variation Among Space & Environment
v.part.mg = varpart(mite, ~ SubsDens + WatrCont, ~ Substrate +
                      Shrub + Topo, mite.pcnm[,1:3], data = mite.env,
                    transfo = 'hel')

# 2. View RDA
v.part.mg
## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = mite, X = ~SubsDens + WatrCont, ~Substrate + Shrub +
## Topo, mite.pcnm[, 1:3], data = mite.env, transfo = "hel")
## Species transformation:  hellinger
## Explanatory tables:
## X1:  ~SubsDens + WatrCont
## X2:  ~Substrate + Shrub + Topo
## X3:  mite.pcnm[, 1:3] 
## 
## No. of explanatory tables: 3 
## Total variation (SS): 27.205 
##             Variance: 0.39428 
## No. of observations: 70 
## 
## Partition table:
##                       Df R.square Adj.R.square Testable
## [a+d+f+g] = X1         2  0.32677      0.30667     TRUE
## [b+d+e+g] = X2         9  0.40395      0.31454     TRUE
## [c+e+f+g] = X3         3  0.33631      0.30614     TRUE
## [a+b+d+e+f+g] = X1+X2 11  0.52650      0.43670     TRUE
## [a+c+d+e+f+g] = X1+X3  5  0.43681      0.39281     TRUE
## [b+c+d+e+f+g] = X2+X3 12  0.50517      0.40100     TRUE
## [a+b+c+d+e+f+g] = All 14  0.57806      0.47066     TRUE
## Individual fractions                                   
## [a] = X1 | X2+X3       2               0.06966     TRUE
## [b] = X2 | X1+X3       9               0.07785     TRUE
## [c] = X3 | X1+X2       3               0.03396     TRUE
## [d]                    0               0.01701    FALSE
## [e]                    0               0.05218    FALSE
## [f]                    0               0.05250    FALSE
## [g]                    0               0.16750    FALSE
## [h] = Residuals                        0.52934    FALSE
## Controlling 1 table X                                  
## [a+d] = X1 | X3        2               0.08668     TRUE
## [a+f] = X1 | X2        2               0.12216     TRUE
## [b+d] = X2 | X3        9               0.09486     TRUE
## [b+e] = X2 | X1        9               0.13003     TRUE
## [c+e] = X3 | X1        3               0.08614     TRUE
## [c+f] = X3 | X2        3               0.08645     TRUE
## ---
## Use function 'rda' to test significance of fractions of interest

Multi-Group Partition Model Information

  • “transfo = ‘hel’” : Utilizes the Hellinger transformation to adjust the community data, and down-weight effect of abundant taxa on our ordination. This transformation is commonly used in ecology when you have abundance data.
# 3. Plot the Multi-Group RDA
plot(v.part.mg)

Notice that there are three circles in the Multi-Group Venn Diagram instead of two like in the Two-Group Partition!

Individual Interpretation

  • X1 = Continuous Environmental Variables, which explains 7% of the variation.
  • X2 = Categorical Environmental Variables, which explains 8% of the variation.
  • X3 = Space Variable, which explains 3% of the variation.

Pariwise Interaction Interpretation

  • X1 | X2 = 2% of the variation
  • X1 | X3 = 5% of the variation
  • X2 | X3 = 5% of the variation
  • X1 | X2 | X3 = 17% of the variation

Variation Partitioning is a common way to determine the relative contributions/influence of different classes of environmental or spatial predictor variables on community.

3. Non-Metric Multidimensional Scaling

Conduct an NMDS Analysis

The goal of NMDS is to be able to show variation in as few dimensions as possible, so we avoid the issue seen where eigenvalues taper off from RDA3 onward, which poses some difficulty in interpretation.

NMDS accomplishes this by starting with a scatter plot of the data that is based on a PCA, but then iteratively shifts the points a little bit at a time and measures how well these points are represented in two-dimensional space. This allows us to visualize all model variation in only two-dimensions, and avoid omitting these higher dimension eigenvectors (RDA2<).

  • Based on Bray-Curtis Distance Measures, which states that a value of zero indicates that the community at Point A is the exact same has the exact same species & abundances as community at Point B. As the value increases from zero towards one, we’re either having turnover or loss of species (beta Diversity). You can have new species that come to replace others, or you just lose some species.
  • k = 2 : Describes the number of dimensions that we want to consider.
  • try = 20 : Number of iterations that we’re going to try to show variation in two dimensions, which is by default 20 times.
  • Note that upon running metaMDS() it automatically transforms your data by default to a Square Root Transform, and uses a Wisconsin double standardization. You want this because if you’re using raw abundance data, it accounts for the issue of abundant taxa having an undue influence on location.
# 1. Create Non-Metric Multidimensional Scaling for Mites
NMDS = metaMDS(mite)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1491318 
## Run 1 stress 0.1581872 
## Run 2 stress 0.1570059 
## Run 3 stress 0.1647107 
## Run 4 stress 0.1619723 
## Run 5 stress 0.1524394 
## Run 6 stress 0.1520924 
## Run 7 stress 0.1592943 
## Run 8 stress 0.1539993 
## Run 9 stress 0.1491318 
## ... Procrustes: rmse 0.0001851763  max resid 0.001307329 
## ... Similar to previous best
## Run 10 stress 0.1566999 
## Run 11 stress 0.168238 
## Run 12 stress 0.1632193 
## Run 13 stress 0.155299 
## Run 14 stress 0.1680435 
## Run 15 stress 0.1510129 
## Run 16 stress 0.1534991 
## Run 17 stress 0.1691308 
## Run 18 stress 0.1628422 
## Run 19 stress 0.1547792 
## Run 20 stress 0.1671669 
## *** Best solution repeated 1 times
# 2. Plot NMDS
autoplot(NMDS, geom = "text", legend = "none")

NMDS Model Interpretation

  • If your Two-Dimensional “Stress” is <0.2, you are ok to interpret the graphic-but we are really aiming for <.15 or <.1 which is where we start to build a lot of confidence that our Two-Dimensional representation of the data is as good as a Three-Dimensional representation.
  • If our 2D “Stress” is 0.2<, we may want to consider modifying (k = 2) to (k = 3) to interpret in Three-Dimensions.
  • We get the feedback “No convergence” which indicates that metaMDS() did not find the solution to minimize the stress. You may want to consider increasing the default (try = 20) parameter above 20 iterations.

NMDS Plot Interpretation

  • NMDS1: Score = Community # Contains Higher Abundances of the associated Taxa in those Communities

Test for Differences in Communities among Variable Groups

The adonis() function from vegan allows us to create a permuted multi-variate analysis of variance, and looks to predict your response variable (mites) based off a categorical variable (Shrubs).

  • Again we are using the Bray-Curtis Distances!
# 1. Test for Differences in Communities Among Shrub Groups (
# Shrub Groups: None, Few, Many
shrub.com = adonis2(mite ~ Shrub, 
                   data = mite.env,
                   permutations = 999,
                   method = "bray")

# 2. View Adonis Output
shrub.com
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = mite ~ Shrub, data = mite.env, permutations = 999, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## Shrub     2   3.1221 0.21244 9.0365  0.001 ***
## Residual 67  11.5742 0.78756                  
## Total    69  14.6963 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Statistics Background

  • The larger the F Model (F Statistic), the more likely you’re going to have a significant relationship between variables.
  • P-value: The probability that we’re going to find an F Statistic equal to or greater than 9.0365.

Adonis Interpretation

  • P-value is <0.05, indicating that there is a statistically signficant difference between the Shrub Groups (None, Few, Many).
  • R2: 21.244% of the variation in the community structure is explained by the shrub factor, with the remaining 78.756% being unexplained as residuals.

Conclusion: There is a significant difference in mite community structure among sites with None, Few, and Many Shrubs!

  • Note that we assume that variance is equal among the shrub groups to make this conclusion!
# 1. Create Pairwise Distances between All Locations
mite.dist = vegdist(mite)

# 2. Determine Differences in Dispersion of the data 
#    within our Distance Classes based on Shrub Group
mite.bdesp      = betadisper(mite.dist, mite.env$Shrub)
mite.dist.anova = anova(mite.bdesp)

# 3. View ANOVA
mite.dist.anova
## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq F value Pr(>F)
## Groups     2 0.03153 0.015767  1.0123 0.3689
## Residuals 67 1.04355 0.015575

Interpretation

  • F-value is low and P-value is high, indicating that the variance within these Shrub groups (None, Few, Many) is equal.
  • This indicates that the adonis2() test above is valid!

Test for Correlations with Continuous Variables

# 1. Create Continuous NMDS Model Test
mite.cont = envfit(NMDS ~ SubsDens + WatrCont, mite.env)

# 2. View Model
mite.cont
## 
## ***VECTORS
## 
##             NMDS1    NMDS2     r2 Pr(>r)    
## SubsDens  0.02937 -0.99957 0.1378  0.004 ** 
## WatrCont  0.88466 -0.46623 0.7010  0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

Interpretation

  • P-values for both SubsDens & WatrCont are <0.05, indicating that there is a significant test of relationship between the environmental variables in the community structure.
  • R2 indicates that 13.81% of the variation in the mite community can be explained by substrate density measure, and 70.09% of the variation in the mite community can be explained by water content.

Final Conclusions

We can conclude that there is variation in the community structure of mites across the seven samples in the landscape, and that it is related to shrub density classes (None, Few, Many), and they are distributed across the landscape along a gradient of water content.