This is the homepage of the **Structure Optimized Proximity Scaling (STOPS)** project. On this page you can find links to papers, talks, data and software related to STOPS. One can also find a tutorial for COPS and STOPS and the MDS functions below.

Except noted otherwise, content on this homepage and this work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Cluster Optimized Proximity Scaling (COPS)

Assessing and Quantifying Clusteredness: The OPTICS Cordillera

Title | Event | Date | Place |
---|---|---|---|

Structure Optimized Proximity Scaling (STOPS): A Framework for Hyperparameter Selection in Multidimensional Scaling | Psychoco 2017 | 09.02.2017-10.02.2017 | WU Vienna, Austria |

COPS and STOPS: Cluster and/or Structure Optimized Proximity Scaling | Brown Bag Seminar, Institute for Statistics and Mathematics | 07.12.2016 | WU Vienna, Austria |

The OPTICS Cordillera: Nonparametric Assessment of Clusteredness | Brown Bag Seminar, Institute for Statistics and Mathematics | 23.10.2016 | WU Vienna, Austria |

COPS: Cluster Optimized Proximity Scaling | Psychoco 2015 | 12.02.2015-13.02.2015 | Amsterdam, The Netherlands |

Scaling for Clusters with COPS: Cluster Optimized Proximity Scaling | CFE-ERCIM 2014 | 06.12.2014-08.12.2014 | Pisa, Italy |

The **project summary page** you can find **here**.

In this document we introduce the functionality avalaiable in `stops`

for fitting multidimensional scaling (MDS; Borg & Groenen 2005) or proximity scaling (PS) models either with a STOPS or COPS idea or not. We start with a short introduction to PS and the models that we have available. We then explain fitting of these models with the `stops`

package. Next, we introduce the reader to COPS (Rusch et al. 2015a) and STOPS (Rusch et al. 2015b) models and show how to fit those. For illustration we use the `smacof::kinshipdelta`

data set (Rosenberg, S. & Kim, M. P., 1975) which lists percentages of how often 15 kinship terms were not grouped together by college students.

`library(stops)`

```
## Loading required package: smacof
## Loading required package: rgl
##
## Attaching package: 'stops'
##
## The following object is masked from 'package:stats':
##
## cmdscale
```

For proximity scaling (PS) or multidimensional scaling (MDS) the input is typically an \(N\times N\) matrix \(\Delta^*=f(\Delta)\), a matrix of proximities with elements \(\delta^*_{ij}\), that is a function of a matrix of observed non-negative dissimilarities \(\Delta\) with elements \(\delta_{ij}\). \(\Delta^*\) usually is symmetric (but does not need to be). The main diagonal of \(\Delta\) is 0. We call a \(f: \delta_{ij} \mapsto \delta^*_{ij}\) a proximity transformation function. In the MDS literature these \(\delta_{ij}^*\) are often called dhats or disparities. The problem that proximity scaling solves is to locate an \(N \times M\) matrix \(X\) (the configuration) with row vectors \(x_i, i=1,\ldots,N\) in low-dimensional space \((\mathbb{R}^M, M \leq N)\) in such a way that transformations \(g(d_{ij}(X))\) of the fitted distances \(d_{ij}(X)=d(x_i,x_j)\)—i.e., the distance between different \(x_i, x_j\)—approximate the \(\delta^*_{ij}\) as closely as possible. We call a \(g: d_{ij}(X) \mapsto d_{ij}^*(X)\) a distance transformation function. In other words, proximity scaling means finding \(X\) so that \(d^*_{ij}(X)=g(d_{ij}(X))\approx\delta^*_{ij}=f(\delta_{ij})\).

This approximation \(D^*(X)\) to the matrix \(\Delta^*\) is found by defining a fit criterion (the loss function), \(\sigma_{MDS}(X)=L(\Delta^*,D^*(X))\), that is used to measure how closely \(D^*(X)\) approximates \(\Delta^*\). Usually, they are closely related to the quadratic loss function. A general formulation of a loss function based on a quadratic loss is \begin{equation} \label{eq:stress} \sigma_{MDS}(X)=\sum^N_{i=1}\sum^N_{j=1} z_{ij} w_{ij}\left[d^*_{ij}(X)-\delta^*_{ij}\right]^2=\sum^N_{i=1}\sum^N_{j=1} z_{ij}w_{ij}\left[g\left(d_{ij}(X)\right)-f(\delta_{ij})\right]^2 \end{equation}Here, the \(w_{ij}\) and \(z_{ij}\) are finite weights, with \(z_{ij}=0\) if the entry is missing and \(z_{ij}=1\) otherwise.

The loss function used is then minimized to find the vectors \(x_1,\dots,x_N\), i.e., \begin{equation} \label{eq:optim} \arg \min_{X}\ \sigma_{MDS}(X). \end{equation}There are a number of optimization techniques one can use to solve this optimization problem.

`stops`

is based on the loss function type Typically, the norm used is the Euclidean norm, so \(p=2\). In standard MDS \(g(\cdot)=f(\cdot)=I(\cdot)\), the identity function.

This formulation enables one to express a large number of PS methods many of which are implemented in `stops`

. In `stops`

we allow to use specific choices for \(f(\cdot)\) and \(g(\cdot)\) from the family of power transformations so one can fit the following stress models:

**Explicitly normalized stress**: \(w_{ij}=(\sum_{ij}\delta^{*2}_{ij})^{-1}\), \(\delta_{ij}^*=\delta_{ij}\), \(d_{ij}(X)^*=d_{ij}(X)\)**Stress-1**: \(w_{ij}=(\sum_{ij} d^{*2}_{ij}(X))^{-1}\), \(\delta_{ij}^*=\delta_{ij}\), \(d_{ij}(X)^*=d_{ij}(X)\)**Sammon stress**(Sammon 1969): \(w_{ij}=\delta^{*-1}_{ij}\) , \(\delta_{ij}^*=\delta_{ij}\), \(d_{ij}(X)^*=d_{ij}(X)\)**Elastic scaling**stress (McGee 1966): \(w_{ij}=\delta^{*-2}_{ij}\), \(\delta_{ij}^*=\delta_{ij}\), \(d_{ij}(X)^*=d_{ij}(X)\)**S-stress**(Takane et al. 1977): \(\delta^*_{ij}=\delta_{ij}^2\) and \(d^*_{ij}(X)=d^2_{ij}(X)\), \(w_{ij}=1\)**R-stress**(de Leeuw, 2014): \(\delta^*_{ij}=\delta_{ij}\) and \(d^*_{ij}=d^{2r}_{ij}\), \(w_{ij}=1\)

**Power MDS**(Buja et al. 2008, Rusch et al. 2015a): \(\delta^*_{ij}=\delta_{ij}^\lambda\) and \(d^*_{ij}=d^\kappa_{ij}\), \(w_{ij}=1\)**Power elastic scaling**(Rusch et al. 2015a): \(w_{ij}=\delta^{*-2}_{ij}\), \(\delta^*_{ij}=\delta_{ij}^\lambda\) and \(d^*_{ij}=d^\kappa_{ij}\)**Power Sammon mapping**(Rusch et al. 2015a): \(w_{ij}=\delta^{*-1}_{ij}\), \(\delta^*_{ij}=\delta_{ij}^\lambda\) and \(d^*_{ij}=d^\kappa_{ij}\)**Powerstress**(encompassing all previous models; Buja et al. 2008, Rusch et al. 2015a): \(\delta^*_{ij}=\delta_{ij}^\lambda\), \(d^*_{ij}=d^\kappa_{ij}\) and \(w_{ij}=w_{ij}^\nu\) for arbitrary \(w_{ij}\) (e.g., a function of the \(\delta_{ij}\))

For all of these models one can use the function `powerStressMin`

which uses majorization to find the solution (de Leeuw, 2014) . The function allows to specify a `kappa`

, `lambda`

and `nu`

argument as well as a `weightmat`

(the \(w_{ij}\)).

The object returned from `powerStressMin`

is of class `smacofP`

which extends the `smacof`

classes (de Leeuw & Mair, 2009) to allow for the power transformations. Apart from that the objects are made so that they have maximum compatibility to methods from `smacof`

. Accordingly, the following S3 methods are available:

Method | Description |
---|---|

Prints the object | |

summary | A summary of the object |

plot | 2D Plots of the object |

plot3d | Dynamic 3D configuration plot |

plot3dstatic | Static 3D configuration plot |

residuals | Residuals |

coef | Model Coefficients |

Let us illustrate the usage

`dis<-as.matrix(smacof::kinshipdelta)`

- A standard MDS (
`stress`

)

```
res1<-powerStressMin(dis,kappa=1,lambda=1)
res1
```

```
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.2667
## Number of iterations: 5219
```

- A
`sammon`

mapping

```
res2<-powerStressMin(dis,kappa=1,lambda=1,nu=-1,weightmat=dis)
res2
```

```
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -1, weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 7.023
## Number of iterations: 74860
```

Alternatively, one can use the faster `sammon`

function from `MASS`

(Venables & Ripley, 2002) for which we provide a wrapper that adds class attributes and methods (and overloads the function).

`res2a<-sammon(dis)`

```
## Initial stress : 0.17053
## stress after 3 iters: 0.10649
```

`res2a`

```
##
## Call: sammon(d = dis)
##
## Model: Sammon Scaling
## Number of objects: 15
## Stress: 0.1065
```

- An
`elastic`

scaling

```
res3<-powerStressMin(dis,kappa=1,lambda=1,nu=-2,weightmat=dis)
res3
```

```
##
## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -2, weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 59.87
## Number of iterations: 1e+05
```

- A
`sstress`

model

```
res4<-powerStressMin(dis,kappa=2,lambda=2)
res4
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 2)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.3516
## Number of iterations: 39427
```

- An
`rstress`

model (with \(r=1\) as \(r=\kappa/2\))

```
res5<-powerStressMin(dis,kappa=2,lambda=1)
res5
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.4133
## Number of iterations: 17686
```

- A
`powermds`

model

```
res6<-powerStressMin(dis,kappa=2,lambda=1.5)
res6
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.3733
## Number of iterations: 39377
```

- A
`powersammon`

model

```
res7<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1,weightmat=dis)
res7
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1,
## weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 7.271
## Number of iterations: 1e+05
```

- A
`powerelastic`

scaling

```
res8<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-2,weightmat=dis)
res8
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -2,
## weightmat = dis)
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 62.61
## Number of iterations: 1e+05
```

- A
`powerstress`

model

```
res9<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1.5,weightmat=2*1-diag(nrow(dis)))
res9
```

```
##
## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1.5,
## weightmat = 2 * 1 - diag(nrow(dis)))
##
## Model: Power Stress SMACOF
## Number of objects: 15
## Stress-1 value: 0.8362
## Number of iterations: 44052
```

`summary(res9)`

```
##
## Configurations:
## D1 D2
## Aunt -0.1228 0.2497
## Brother 0.1961 -0.1405
## Cousin 0.0526 0.3101
## Daughter -0.2048 -0.1259
## Father 0.1637 -0.1824
## Granddaughter -0.2358 -0.0525
## Grandfather 0.2149 -0.1329
## Grandmother -0.2362 -0.0861
## Grandson 0.2148 -0.1073
## Mother -0.2097 -0.1366
## Nephew 0.1709 0.2108
## Niece -0.1234 0.2441
## Sister -0.2216 -0.0973
## Son 0.1700 -0.1710
## Uncle 0.1713 0.2179
##
##
## Stress per point:
## SPP SPP(%)
## Niece 0.0022 4.688
## Nephew 0.0022 4.712
## Aunt 0.0023 4.937
## Uncle 0.0023 5.028
## Daughter 0.0028 6.084
## Son 0.0029 6.275
## Father 0.0030 6.452
## Mother 0.0031 6.686
## Cousin 0.0032 6.798
## Sister 0.0034 7.272
## Brother 0.0034 7.283
## Grandson 0.0037 7.979
## Granddaughter 0.0038 8.102
## Grandmother 0.0041 8.819
## Grandfather 0.0041 8.885
```

```
plot(res9)
plot(res9,"transplot")
plot(res9,"Shepard")
plot(res9,"resplot")
plot(res9,"bubbleplot")
```

`stops`

is based on the loss function type . Here the \(\Delta^*\) are a transformation of the \(\Delta\), \(\Delta^*= f (\Delta)\) so that \(f(\cdot)=-(h\circ l)(\cdot)\) where \(l\) is any function and \(h(\cdot)\) is a double centering operation, \(h(\Delta)=\Delta-\Delta_{i.}-\Delta_{.j}+\Delta_{..}\) where \(\Delta_{i.}, \Delta_{.j}, \Delta_{..}\) are matrices consisting of the row, column and grand marginal means respectively. These then get approximated by (functions of) the inner product matrices of \(X\)
\begin{equation}
\label{eq:dist2}
d_{ij}(X) = \langle x_{i},x_{j} \rangle
\end{equation}
We can thus express classical scaling as a special case of the general PS loss with \(d_{ij}(X)\) as an inner product, \(g(\cdot) = I(\cdot)\) and \(f(\cdot)=-(h \circ I)(\cdot)\).

If we again allow power transformations for \(g(\cdot)\) and \(f(\cdot)\) one can fit the following strain models with `stops`

**Classical scaling**(Torgerson, 1958): \(\delta^*_{ij}=-h(\delta_{ij})\) and \(d^*_{ij}=d_{ij}\)**Powerstrain**(Buja et al. 2008, Rusch et al. 2015a): \(\delta^*_{ij}=-h(\delta_{ij}^\lambda)\), \(d^*_{ij}=d_{ij}\) and \(w_{ij}=w_{ij}^\nu\) for arbitrary \(w_{ij}\)

In `stops`

we have a wrapper to `cmdscale`

(overloading the `base`

function) which extend functionality by offering an object that matches `smacofP`

objects with corresponding methods.

A `powerstrain`

model is rather easy to fit with simply subjecting the dissimilarity matrix to some power. Here we use \(\lambda=3\).

```
resc<-cmdscale(kinshipdelta^3)
resc
```

```
##
## Call: cmdscale(d = kinshipdelta^3)
##
## Model: Torgerson-Gower Scaling
## Number of objects: 15
## GOF: 0.4258 0.6282
```

`summary(resc)`

```
##
## Configurations:
## D1 D2
## Aunt 178193 204987
## Brother -174369 -94357
## Cousin -48355 265057
## Daughter 169149 -109936
## Father -145389 -168604
## Granddaughter 187039 -44851
## Grandfather -180116 -103668
## Grandmother 199145 -83039
## Grandson -169768 -72359
## Mother 185798 -138677
## Nephew -195964 173124
## Niece 168278 208870
## Sister 182224 -70764
## Son -149876 -135883
## Uncle -205989 170101
```

```
summary(resc)
plot(resc)
```

The main contribution of the `stops`

package is not in solely fitting the *powerstress* or *powerstrain* models and their variants from above, but allowing to choose the right transformation to achieve a “structured” MDS result automatically. This can be useful in a variety of contextes: to explore or generate structures, to restrict the target space, to avoid artefacts, to preserve certain types of structures and so forth.

For this, an MDS loss function is subjected to nonlinear transformations and is augmented to include penalties for the type of structures one is aiming for. This combination of an MDS loss with a structuredness penalty is what we call “structure optimized loss” (stoploss) and the resulting MDS is coined “Structured Optimized Proximity Scaling” (or STOPS). The prime example for a STOPS model is “Cluster Optimized Proximity Scaling” (COPS) which selects optimal parameters so that the clusteredness appearance of the configuation is improved (see below).

Following Rusch et al. (2015b) the general idea is that from given observations \(\Delta\) we look for a configuration \(X\). This achieves this by minimizing some loss function \(\sigma_{MDS}(X^*;\Delta^*)\) where the \(\Delta^*, X^*\) are functions of the \(\Delta\) and \(X\). The \(X\) has properties with regards to its structural appearance, which we call *c-structuredness* for configuration-structuredness. There are different types of c-structuredness people might be interested in (say, how clustered the result is, or that dimensions are orthogonal or if there is some submanifold that the data live on). We developed indices for these types of c-structuredness that capture that essence in the configuration.

We have as part of a STOPS model a proximity scaling loss function \(\sigma_{MDS}(\cdot)\), a \(\Delta\) and an \(X\) and some transformation \(f_{ij}(\delta_{ij};\theta)\) and \(g_{ij}(d_{ij};\theta)\) that is parametrized (with \(\theta\) either finite or infinite dimensional, e.g., a transformation applied to all observations like a power transformation or even an individual transformation per object). These transformations achieve a sort of push towards more structure, so different values for \(\theta\) will in general lead to different c-structuredness.

We further have \(K\) different indices \(I_k(X)\) that measure different types of c-structuredness. We can then define as methods that are of the form (additive STOPS) \begin{equation} \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = v_0 \cdot \sigma_{MDS}(X^*(\theta)) + \sum^K_{k=1} v_k I_k(X(\theta)) \end{equation} or (multiplicative STOPS) \begin{equation} \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = \sigma_{MDS}(X^*(\theta))^{v_0} \cdot \prod^K_{k=1} I_k(X(\theta))^{v_k} \end{equation}(which can be expressed as aSTOPS by logarithms). Here the \(v_0,...,v_k\) are weights that determine how the individual parts (mds loss and c-structuredness indices) are aggregated.

The job is then to find \begin{equation} \arg\min_{\vartheta}\ \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta)\ \text{or} \ \arg\min_{\vartheta}\ \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) \end{equation}where \(\vartheta \subseteq \{X,\theta, v_0, \dots, v_k\}\). Typically \(\vartheta\) will be a subset of all possible parameters here (e.g., the weights might be given *a priori*). Currently, the transformations that can be used in `stops`

are limited to power transformations.

Minimizing stoploss can be difficult. In `stops`

we use a nested algorithm combining optimization that internally first solves for \(X\) given \(\theta\), \(\arg\min_X \sigma_{MDS}\left(X,\theta\right)\), and then optimize over \(\theta\) with a metaheuristic. Implemented are a simulated annealing (`optimmethod="SANN"`

) or particle swarm optimization (`optimmethod="pso"`

) and a variant of the Luus-Jaakola (`optimmethod="ALJ"`

) procedure . We suggest to use the latter. A Bayesian optimization approach is currently under way.

Currently the following c-structuredness types are supported:

**c-clusteredness**(`cclusteredness`

): A clustered appearance of the configuration (\(I_k\) is the normed OPTICS cordillera (COPS; Rusch et al. 2015a); 0 means no c-clusteredness, 1 means perfect c-clusteredness)**c-linearity**(`clinearity`

): Projections lie close to a linear subspace of the configuration (\(I_k\) is maximal multiple correlation; 0 means orthogonal, 1 means perfectly linear)**c-manifoldness**(`cmanifoldness`

): Projections lie on a sub manifold of the configuration (\(I_k\) is maximal correlation (Sarmanov, 1958); only available for two dimensions; 1 means perfectly smooth function)**c-dependence**(`cdependence`

): Random vectors of projections onto the axes are stochastically dependent (\(I_k\) is distance correlation (Szekely et al., 2007); only available for two dimensions; 0 means they are stochastically independent)**c-association**(`cassociation`

): Pairwise nonlinear association between dimensions (\(I_k\) is the pairwise maximal maximum information coefficient (Reshef et al. 2011), 1 means perfect functional association)

**c-nonmonotonicity**(`cnonmonotonicity`

): Deviation from monotonicity (\(I_k\) is the pairwise maximal maximum assymmetry score (Reshef et al. 2011), the higher the less monotone)

**c-functionality**(`cfunctionality`

): Pairwise functional, smooth, noise-free relationship between dimensions (\(I_k\) is the mean pairwise maximum edge value (Reshef et al. 2011), 1 means perfect functional association)**c-complexity**(`ccomplexity`

): Measures the degree of complexity of the functional relationship between any two dimensions (\(I_k\) is the pairwise maximal minimum cell number (Reshef et al. 2011), the higher the more complex)**c-faithfulness**(`cfaithfulness`

): How accurate is the neighbourhood of \(\Delta\) preserved in \(D\) (\(I_k\) is the adjusted Md index of Chen & Buja, 2013; note that this index deviates from the others by being a function of both \(X^*\) and \(\Delta^*\) rather than \(X^*\) alone)- c-randomness: How close to a random pattern (under some model) is the configuration (\(I_k\) is not clear yet; not yet implemented)

If we have a single \(I(X)=OC(X)\), the OPTICS cordillera (Rusch et al. 2015a), and the transformations applied are power transformations and the weights for the \(I(X)\) is negative we essentially have COPS (see below).

For the MDS loss (argument `loss`

in functions `stops`

and `cops`

), the functions currently support all losses derived from *powerstress* and *powerstrain* and can in principle be fitted with `powerStressMin`

alone. However, for many models offer dedicated functions that either use workhorses that are more optimized for the problem at hand and/or restrict the parameter space for the distance/proximity transformations and thus can be faster. They are:

`stress`

,`smacofSym`

: Kruskall’s stress; Workhorse:`smacofSym`

, Optimization over \(\lambda\)`smacofSphere`

: Kruskall’s stress for projection onto a sphere; Workhorse`smacofSphere`

, Optimizes over \(\lambda\)`strain`

,`powerstrain`

: Classical scaling; Workhorse:`cmdscale`

, Optimization over \(\lambda\)`sammon`

,`sammon2`

: Sammon scaling; Workhorse:`sammon`

or`smacofSym`

, Optimization over \(\lambda\)`elastic`

: Elastic scaling; Workhorse:`smacofSym`

, Optimization over \(\lambda\)`sstress`

: S-stress; Workhorse:`powerStressMin`

, Optimization over \(\lambda\)`rstress`

: S-stress; Workhorse:`powerStressMin`

, Optimization over \(\kappa\)`powermds`

: MDS with powers; Workhorse:`powerStressMin`

, Optimization over \(\kappa\), \(\lambda\)`powersammon`

: Sammon scaling with powers; Workhorse:`powerStressMin`

, Optimization over \(\kappa\), \(\lambda\)`powerelastic`

: Elastic scaling with powers; Workhorse:`powerStressMin`

, Optimization over \(\kappa\), \(\lambda\)`powerstress`

: Power stress model; Workhorse:`powerStressMin`

, Optimization over \(\kappa\), \(\lambda\), \(\nu\)

The syntax for fitting a `stops`

model is rather straightforward. One has to supply the arguments `dis`

which is a dissimilarity matrix and `structures`

a character vector listing the c-structuredness type that should be used to augment the PS loss (see above for the types of structures and losses). The parameters for the structuredness indices should be given with `strucpars`

, a list whose elements are lists corresponding to each structuredness index and listing the parameters (if the default should be used the list element should be set to `NULL`

). The PS loss can be chosen with the argument `loss`

. The type of aggregation for the multi-objective optimization is specified in `type`

and can be one of `additive`

or `multiplicative`

. One can pass additional parameters to the fitting workhorses with `...`

.

`stops(dis, structures = c("cclusteredness","clinearity"), loss="stress", ...)`

One then has all the S3 methods of `smacofP`

at one’s disposal.

For example, let us fit an mSTOPS model that looks for a transformation of the \(\delta_{ij}\) so that a) the result has maximal c-clusteredness (which is 1 in the best case, so we set a negative weight for this structure) b) the projection onto the principal axes are nearly orthogonal (c-clinearity close to 0, so we set a positive weight for this structure) c) but the projections onto the principal axes should be stochastically dependent (negative weight on c-dependence) and d) the fit of the MDS is also factored in (so we set positive weight on the MDS loss). Since we use mSTOPS and a negative weight for c-linearity and c-dependence, a c-linearity/c-dependence close to 0 will overall dominate the stoploss function with the other two criteria being more of an afterthought - in aSTOPS that would be different.

!!: This is generally the approach to be chosen: We *minimize* the stoploss, so a c-structuredness index that should be (numerically) large needs a *negative weight* and a c-structuredness index that should be (numerically) small needs a *positive weight*.

We first set up the parameters for the structuredness indices. For the OPTICS cordillera we use a \(d_{max}\) of 1, epsilon=10 and minpts=2, for c-linearity we have no parameters (so using `NULL`

will work) and for the c-dependence we have a single parameter, `index`

, which we set to 2.

```
strucpars<-list(list(epsilon=10,minpts=2,rang=c(0,1.3)), #cordillera
NULL, # c-linearity (has no parameters)
list(index=2) #c-dependence
)
```

```
ressm<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="multiplicative")
ressm
```

```
##
## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness",
## "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33,
## 0.33, -0.33), strucpars = strucpars, verbose = 0, type = "multiplicative")
##
## Model: multiplicative STOPS with stress loss function and theta parameters= 1 3.095 1
##
## Number of objects: 15
## MDS loss value: 1
## C-Structuredness Indices: cclusteredness 0.53027 clinearity 0.01802 cdependence 0.01802
## Structure optimized loss (stoploss): 1.233
## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33
## Number of iterations of ALJ optimization: 117
```

`plot(ressm)`

Let us compare this with the corresponding aSTOPS

```
ressa<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="additive")
ressa
```

```
##
## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness",
## "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33,
## 0.33, -0.33), strucpars = strucpars, verbose = 0, type = "additive")
##
## Model: additive STOPS with stress loss function and theta parameters= 1 3.114 1
##
## Number of objects: 15
## MDS loss value: 1
## C-Structuredness Indices: cclusteredness 0.53021 clinearity 0.01785 cdependence 0.01785
## Structure optimized loss (stoploss): 0.825
## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33
## Number of iterations of ALJ optimization: 81
```

`plot(ressa)`

We see that the c-clusteredness is higher as compared to the mSTOPS result - we have a number of distinct object clusters (with at least minpts=2) and they are more spread out and distributed more evenly. The dimensions on the other hand are now farther from being orthogonal but the stochastic dependence is higher (which is a non-linear one obviously).

When choosing a c-structuredness index, one needs to be clear what structure one might be interested in and how it interacts with the PS loss chosen. Consider the following example: We fit a `powermds`

model to the kinship data and want to maximize c-association (i.e., any non-linear relationship) and c-manifoldness but minimize the c-linearity. In other words we try to find a power transformation of \(\Delta\) and \(D\) so that the objects are positioned in the configuration in such a way that the projection onto the principal axes are as close as possible to being related by a smooth but non-linear function.

`resa<-stops(kinshipdelta,structures=c("cassociation","cmanifoldness","clinearity"),loss="powermds",verbose=0,strucpars=list(NULL,NULL,NULL),type="additive",strucweight=c(-0.5,-0.5,0.5))`

`resa`

```
##
## Call: stops(dis = kinshipdelta, loss = "powermds", theta = c(2.9429394,
## 1.67850653, 1.57140404), structures = c("cassociation", "cmanifoldness",
## "clinearity"), strucweight = c(-0.5, -0.5, 0.5), strucpars = list(NULL,
## NULL, NULL), verbose = 0, type = "additive", itmax = 1)
##
## Model: additive STOPS with powermds loss function and theta parameters= 2.943 1.679 1
##
## Number of objects: 15
## MDS loss value: 0.9995
## C-Structuredness Indices: cassociation 1.0000000 cmanifoldness 0.9918892 clinearity 0.0001654
## Structure optimized loss (stoploss): 0.003602
## MDS loss weight: 1 c-structuredness weights: -0.5 -0.5 0.5
## Number of iterations of ALJ optimization: 1
```

We see in this model (`resa`

) that indeed the c-association is 1, which says we have a near perfect non-linear relationship. How does this relationship look like?

`plot(resa)`

It is a parabolic shape, so the projections are so that the points on D2 are a near parabolic function of the D1 (projecting onto some structure resembling a conic section is often the case for r-stress which is essentially what we got here - setting a negative weight on c-assocation can combat that if that is an artefact). What we can also see is that there are three clear clusters, so c-clusteredness should be high. But when looking at the OPTICS cordillera here, we find that the OPTICS cordillera is lower than for the result from above with using `stress`

and lambda=2.66 (the `ressa`

model).

```
c1<-cordillera(resa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
c2<-cordillera(ressa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
c1
```

```
## raw normed
## 7.9441 0.4365
```

`c2`

```
## raw normed
## 9.6499 0.5302
```

This discrepancy comes from the definition of c-clusteredness (Rusch et al, 2015a) where more clusters, more spread-out clusters, more evenly distributed clusters and denser clusters all increase c-clusteredness. In the example with maximizing c-association we have two very dense clusters of 5 points and 1 relatively non-dense cluster of five other points. In the model maximizing c-clusteredness (and others) we get 6 relatively moderate dense clusters with 2 or 3 points each, which is also the minimum number of points we wanted to be grouped together. Most importantly, they are projected onto a much larger range of the target space as the \(X\) obtained from the `stress`

loss is different than the one obtained from the `powermds`

loss, so the \(d_{max}\) is very different. Since we use the normed OPTICS cordillera there, we look at c-clusteredness relative to the most clustered appearance with two points per cluster. Thus, the second result has more c-clusteredness. If we would define a cluster as having at most 5 points then c-clusteredness of the result with high c-association also has a large c-clusteredness because then the clusters found match the definition of high c-clusteredness.

```
c3<-cordillera(resa$fit$conf,minpts=5,epsilon=10,rang=c(0,1.3))
c3
```

```
## raw normed
## 3.8650 0.5946
```

Note that it may just as well be possible to have a high c-association and no c-clusteredness at all (e.g., points lying equidistant on a smooth non-linear curve). Note also that the models are not necessarily comparable due to different stress functions - the transformation in `powermds`

that is optimal with respect to c-clusteredness would be different.

Indeed one can optimize for c-clusteredness alone and using it as a “goodness-of-clusteredness” index (i.e., the \(d_{max}\) is not constant over configurations but varies conditional on the configuration) then we get a projection with c-clusteredness of 0.67.

`resa2<-stops(kinshipdelta,structures=c("cclusteredness"),loss="powermds",verbose=0,strucpars=list(list(epsilon=10,rang=NULL,minpts=2)),type="additive",strucweight=-1,stressweight=0)`

For convenience it is also possible to use the `stops`

function for finding the loss-optimal transformation in the the non-augmented models specified in `loss`

, by setting the `strucweight`

, the weight of any c-structuredness, to 0. Then the function optimizes the MDS loss function only.

`ressa<-stops(kinshipdelta,structure=c("clinearity"),strucweight=0,loss="stress",verbose=0)`

with \(\theta_0=(1,1)^\top\) in case of loss functions with power transformations. Thus an increase of 1 in the MDS loss measure can be compensated by an increase of \(v^0_1/v^0_2\) in c-clusteredness. Selecting \(v_1=1,v_2=v^{0}_2\) this way is in line with the idea of pushing the configurations towards a more clustered appearance relative to the initial solution.

Another possibility is to choose them in such a way that \(\text{coploss}=0\) in the optimum value, i.e., choosing \(v^{opt}_{1}, v^{opt}_2\) so that \begin{equation} v^{opt}_1 \cdot \sigma_{MDS}\left(X(\theta^*),\theta^*\right)-v^{opt}_2 \cdot \text{OC}\left(X(\theta^*);\epsilon,k,q \right) = 0 \end{equation}with \(\theta^*:=\arg\min_\theta \text{coploss}(\theta)\). This is in line with having \(\text{coploss}(\theta)>0\) for \(\theta \neq \theta^*\) and allows to optimize over \(v_1,v_2\).

The optimization problem in COPS is then to find\begin{equation} \label{eq:soemdsopt2} \arg\min_{\theta} \text{coploss}(\theta) \end{equation} by doing \begin{equation} \label{eq:soemdsopt} v_1 \cdot \sigma_{MDS}\left(X(\theta),\theta\right) - v_2 \cdot \text{OC}\left(X(\theta);\epsilon,k,q\right) \rightarrow \min_\theta! \end{equation}

For a given \(\theta\) if \(v_2=0\) than the result of optimizing the above is the same as solving the respective original PS problem. Letting \(\theta\) be variable, \(v_2=0\) will minimize the loss over configurations obtained from using different \(\theta\).

The c-clusteredness index we use is the OPTICS cordillera which measures how clustered a configuration appears. It is based on the OPTICS algorithm that outputs an ordering together with a distance. The OPTICS cordillera is now simply an agregation of that information. Since we know what constitutes a maximally clustered result, we can derive an upper bound and normalize the index to lie between 0 and 1. If it is maximally clustered, the index gets a value of 1,and it gets 0 if all points are equidistant to their nearest neighbours (a matchstick embedding). See Rusch et al (2015a) for details.

Even though one can fit a COPS model with `stops`

, there is a dedicated function `cops`

. Its syntax works pretty much like in `stops`

only that the `structure`

argument is non-existant.

`cops(dis,loss,...)`

For the example we use a COPS model for a classical scaling (`strain`

loss)

```
resc<-cops(kinshipdelta,loss="strain")
resc
```

```
##
## Call: cops(dis = kinshipdelta, loss = "strain")
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.606 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3237
## OPTICS cordillera: Raw 10.87 Normed 0.3087
## Cluster optimized loss (coploss): -0.03458
## MDS loss weight: 1 OPTICS cordillera weight: 1.16
## Number of iterations of ALJ optimization: 90
```

`summary(resc)`

```
##
## Configurations:
## D1 D2
## Aunt -345.9 492.49
## Brother 365.5 -253.89
## Cousin 107.3 560.94
## Daughter -394.8 -262.97
## Father 320.3 -371.50
## Granddaughter -411.9 -99.39
## Grandfather 370.3 -214.95
## Grandmother -412.2 -148.94
## Grandson 370.4 -181.08
## Mother -411.9 -289.88
## Nephew 424.9 398.15
## Niece -340.0 485.92
## Sister -402.9 -183.98
## Son 328.5 -338.10
## Uncle 432.4 407.18
```

A number of plots are availabe

```
plot(resc,"confplot")
plot(resc,"Shepard")
plot(resc,"transplot")
plot(resc,"reachplot")
```

For convenience it is also possible to use the `cops`

function for finding the loss-optimal transformation in the the non-augmented models specified in `loss`

, by setting the `cordweight`

, the weight of the OPTICS cordillera, to 0. Then the function optimizes the MDS loss function only.

```
resca<-cops(kinshipdelta,cordweight=0,loss="strain")
resca
```

```
##
## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 0)
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.586 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3237
## OPTICS cordillera: Raw 10.87 Normed 0.3087
## Cluster optimized loss (coploss): 0.3237
## MDS loss weight: 1 OPTICS cordillera weight: 0
## Number of iterations of ALJ optimization: 48
```

Here the results match the result from using the standard `cordweight`

. We can give more weight to the c-clusteredness though:

```
rescb<-cops(kinshipdelta,cordweight=20,loss="strain")
rescb
```

```
##
## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 20)
##
## Model: COPS with strain loss function and parameters kappa= 1 lambda= 2.06 nu= 1
##
## Number of objects: 15
## MDS loss value: 0.3395
## OPTICS cordillera: Raw 11.01 Normed 0.3128
## Cluster optimized loss (coploss): -5.916
## MDS loss weight: 1 OPTICS cordillera weight: 20
## Number of iterations of ALJ optimization: 94
```

```
plot(resca,main="with cordweight=0")
plot(rescb,main="with cordweight=20")
```

This result has more c-clusteredness but less fit. The higher c-clusteredness is discernable in the Grandfather/Brother and Grandmother/Sister clusters (we used a minimum number of 2 observations to make up a cluster, `minpts=2`

).

The package also provides functions that are used by the `cops`

and `stops`

and `powerStressMin`

functions but may be of interest to a end user beyond that.

For calculating a COPS solution, we need the OPTICS algorithm and the OPTICS cordillera. In the package we also provide a rudimentary interface to the OPTICS impementation in ELKI.

```
data(iris)
res<-optics(iris[,1:4],minpts=2,epsilon=1000)
print(res)
```

```
## observation reachability
## 1 ID=1 reachability=∞
## 2 ID=18 reachability=0.1
## 3 ID=41 reachability=0.14142136
## 4 ID=5 reachability=0.14142136
## 5 ID=38 reachability=0.14142136
## 6 ID=40 reachability=0.14142136
## 7 ID=8 reachability=0.1
## 8 ID=50 reachability=0.14142136
## 9 ID=29 reachability=0.14142136
## 10 ID=28 reachability=0.14142136
## 11 ID=36 reachability=0.2236068
## 12 ID=49 reachability=0.2236068
## 13 ID=11 reachability=0.1
## 14 ID=27 reachability=0.2236068
## 15 ID=24 reachability=0.2
## 16 ID=44 reachability=0.2236068
## 17 ID=12 reachability=0.2236068
## 18 ID=30 reachability=0.2236068
## 19 ID=31 reachability=0.14142136
## 20 ID=35 reachability=0.14142136
## 21 ID=10 reachability=0.1
## 22 ID=2 reachability=0.14142136
## 23 ID=46 reachability=0.14142136
## 24 ID=13 reachability=0.14142136
## 25 ID=26 reachability=0.17320508
## 26 ID=4 reachability=0.17320508
## 27 ID=48 reachability=0.14142136
## 28 ID=3 reachability=0.14142136
## 29 ID=43 reachability=0.2236068
## 30 ID=39 reachability=0.2
## 31 ID=9 reachability=0.14142136
## 32 ID=7 reachability=0.2236068
## 33 ID=20 reachability=0.24494897
## 34 ID=22 reachability=0.14142136
## 35 ID=47 reachability=0.14142136
## 36 ID=14 reachability=0.24494897
## 37 ID=25 reachability=0.3
## 38 ID=37 reachability=0.3
## 39 ID=21 reachability=0.3
## 40 ID=32 reachability=0.28284271
## 41 ID=17 reachability=0.34641016
## 42 ID=6 reachability=0.34641016
## 43 ID=19 reachability=0.33166248
## 44 ID=33 reachability=0.34641016
## 45 ID=34 reachability=0.34641016
## 46 ID=45 reachability=0.36055513
## 47 ID=16 reachability=0.36055513
## 48 ID=15 reachability=0.41231056
## 49 ID=23 reachability=0.45825757
## 50 ID=42 reachability=0.6244998
## 51 ID=99 reachability=1.64012195
## 52 ID=58 reachability=0.38729833
## 53 ID=94 reachability=0.14142136
## 54 ID=61 reachability=0.36055513
## 55 ID=82 reachability=0.64807407
## 56 ID=81 reachability=0.14142136
## 57 ID=70 reachability=0.17320508
## 58 ID=90 reachability=0.24494897
## 59 ID=54 reachability=0.2
## 60 ID=93 reachability=0.26457513
## 61 ID=83 reachability=0.14142136
## 62 ID=68 reachability=0.24494897
## 63 ID=100 reachability=0.26457513
## 64 ID=97 reachability=0.14142136
## 65 ID=96 reachability=0.14142136
## 66 ID=95 reachability=0.17320508
## 67 ID=89 reachability=0.17320508
## 68 ID=91 reachability=0.26457513
## 69 ID=62 reachability=0.3
## 70 ID=56 reachability=0.31622777
## 71 ID=67 reachability=0.3
## 72 ID=85 reachability=0.2
## 73 ID=79 reachability=0.33166248
## 74 ID=92 reachability=0.2
## 75 ID=64 reachability=0.14142136
## 76 ID=74 reachability=0.2236068
## 77 ID=72 reachability=0.34641016
## 78 ID=98 reachability=0.33166248
## 79 ID=75 reachability=0.2
## 80 ID=76 reachability=0.26457513
## 81 ID=66 reachability=0.14142136
## 82 ID=59 reachability=0.24494897
## 83 ID=55 reachability=0.24494897
## 84 ID=52 reachability=0.31622777
## 85 ID=57 reachability=0.26457513
## 86 ID=87 reachability=0.31622777
## 87 ID=53 reachability=0.28284271
## 88 ID=51 reachability=0.26457513
## 89 ID=78 reachability=0.31622777
## 90 ID=77 reachability=0.31622777
## 91 ID=80 reachability=0.34641016
## 92 ID=86 reachability=0.37416574
## 93 ID=60 reachability=0.38729833
## 94 ID=148 reachability=0.41231056
## 95 ID=111 reachability=0.2236068
## 96 ID=112 reachability=0.34641016
## 97 ID=117 reachability=0.36055513
## 98 ID=138 reachability=0.14142136
## 99 ID=104 reachability=0.24494897
## 100 ID=129 reachability=0.33166248
## 101 ID=133 reachability=0.1
## 102 ID=105 reachability=0.3
## 103 ID=146 reachability=0.36055513
## 104 ID=142 reachability=0.24494897
## 105 ID=141 reachability=0.36055513
## 106 ID=145 reachability=0.24494897
## 107 ID=121 reachability=0.26457513
## 108 ID=144 reachability=0.2236068
## 109 ID=125 reachability=0.3
## 110 ID=113 reachability=0.34641016
## 111 ID=140 reachability=0.17320508
## 112 ID=116 reachability=0.37416574
## 113 ID=149 reachability=0.3
## 114 ID=137 reachability=0.24494897
## 115 ID=147 reachability=0.37416574
## 116 ID=124 reachability=0.24494897
## 117 ID=127 reachability=0.17320508
## 118 ID=128 reachability=0.24494897
## 119 ID=139 reachability=0.14142136
## 120 ID=71 reachability=0.2236068
## 121 ID=150 reachability=0.28284271
## 122 ID=143 reachability=0.33166248
## 123 ID=102 reachability=0
## 124 ID=114 reachability=0.26457513
## 125 ID=122 reachability=0.31622777
## 126 ID=84 reachability=0.36055513
## 127 ID=134 reachability=0.33166248
## 128 ID=73 reachability=0.36055513
## 129 ID=103 reachability=0.4
## 130 ID=126 reachability=0.38729833
## 131 ID=130 reachability=0.34641016
## 132 ID=65 reachability=0.42426407
## 133 ID=101 reachability=0.42426407
## 134 ID=120 reachability=0.43588989
## 135 ID=108 reachability=0.43588989
## 136 ID=131 reachability=0.26457513
## 137 ID=115 reachability=0.48989795
## 138 ID=63 reachability=0.48989795
## 139 ID=69 reachability=0.50990195
## 140 ID=88 reachability=0.26457513
## 141 ID=106 reachability=0.52915026
## 142 ID=123 reachability=0.26457513
## 143 ID=119 reachability=0.41231056
## 144 ID=136 reachability=0.53851648
## 145 ID=135 reachability=0.53851648
## 146 ID=109 reachability=0.55677644
## 147 ID=110 reachability=0.63245553
## 148 ID=107 reachability=0.73484692
## 149 ID=118 reachability=0.81853528
## 150 ID=132 reachability=0.41231056
```

`summary(res)`

```
##
## An OPTICS results with minpts= 2 and epsilon= 1000
##
## Five Point Summary of the Minimum Reachabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.173 0.265 0.292 0.346 1.640 1
##
## Stem and Leaf Display of the Minimum Reachabilities:
##
## The decimal point is 1 digit(s) to the left of the |
##
## 0 | 0
## 1 | 00000444444444444444444444444447777777
## 2 | 00000022222222222244444444444466666666666888
## 3 | 0000000022222233333355555555566666666777999
## 4 | 011112244699
## 5 | 13446
## 6 | 235
## 7 | 3
## 8 | 2
## 9 |
## 10 |
## 11 |
## 12 |
## 13 |
## 14 |
## 15 |
## 16 | 4
```

`plot(res,withlabels=TRUE)`

and a function for calculating and displaying the OPTICS cordillera.

```
cres<-cordillera(iris[,1:4],minpts=2,epsilon=1000,scale=FALSE)
cres
```

```
## raw normed
## 14.96797 0.06125
```

`summary(cres)`

```
##
## OPTICS cordillera values with minpts= 2 and epsilon= 1000
##
## Raw OC: 14.97
## Normalization: 244.4
## Normed OC: 0.06125
```

`plot(cres)`

Since the inner optimization problem in STOPS models is hard and takes long, Rusch et al. (2015a) developed a metaheuristic for the outer optimization problem that needs typically less calls to the inner minimization than `pso`

or `SANN`

, albeit without the guarantees of convergence to a global minimum for non-smooth functions. It is an adaptation of the Luus-Jaakola random search (Luus & Jaakola 1973). It can used with the function `ljoptim`

which modeled its output after `optim`

. It needs as arguments `x`

a starting value, `fun`

a function to optimize, a `lower`

and `upper`

box constraint for the search region. By using the argument `adaptive=TRUE`

or `FALSE`

one can switch between our adaptive version and the original LJ algorithm. Accuracy of the optimization can be controlled with the `maxit`

(maximum number of iterations), `accd`

(terminates after the length of the search space is below this number ) and `acc`

arguments (terminates if difference of two subsequent function values are below this value).

We optimize a “Wild Function” with the non-adaptive LJ version (and numerical accuracies of at least `1e-16`

for `accd`

and `acc`

).

```
set.seed(210485)
fwild <- function (x) 10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
res2<-ljoptim(50, fwild,lower=-50,upper=50,adaptive=FALSE,accd=1e-16,acc=1e-16)
res2
```

```
## $par
## [1] -15.82
##
## $value
## [1] 67.47
##
## $counts
## function gradient
## 463 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
```

```
plot(fwild, -50, 50, n = 1000, main = "ljoptim() minimising 'wild function'")
points(res2$par,res2$value,col="red",pch=19)
```

We also provide a procrustes adjustment to make two configurations visually comparable. The function is `conf_adjust`

and takes two configurations `conf1`

the reference configuration and `conf2`

another configuration. It returns the adjusted versions

`conf_adjust(conf1,conf2) `

Borg I, Groenen PJ (2005). Modern multidimensional scaling: Theory and applications. 2nd edition. Springer, New York

Buja A, Swayne DF, Littman ML, Dean N, Hofmann H, Chen L (2008). Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics, 17 (2), 444-472.

Chen L, Buja A (2013). Stress functions for nonlinear dimension reduction, proximity analysis, and graph drawing. Journal of Machine Learning Research, 14, 1145-1173.

de Leeuw J (2014). Minimizing r-stress using nested majorization. Technical Report, UCLA, Statistics Preprint Series.

de Leeuw J, Mair P (2009). Multidimensional Scaling Using Majorization: SMACOF in R. Journal of Statistical Software, 31 (3), 1-30.

Kruskal JB (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29 (1), 1-27.

Luus R, Jaakola T (1973). by direct search and systematic reduction of the size of search region. American Institute of Chemical Engineers Journal (AIChE), 19 (4), 760-766.

McGee VE (1966). The multidimensional analysis of ‘elastic’ distances. British Journal of Mathematical and Statistical Psychology, 19 (2), 181-196.

Reshef D, Reshef Y, Finucane H, Grossman S, McVean G, Turnbaugh P, Lander E, Mitzenmacher M, Sabeti P (2011). Detecting novel associations in large datasets. Science, 334, 6062.

Rosenberg, S. & Kim, M. P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

Rusch, T., Mair, P. and Hornik, K. (2015a) COPS: Cluster Optimized Proximity Scaling. Discussion Paper Series / Center for Empirical Research Methods, 2015/1. WU Vienna University of Economics and Business, Vienna.

Rusch, T., Mair, P. and Hornik, K. (2015b). Structuredness Indices and Augmented Nonlinear Dimension Reduction. In preparation.

Sammon JW (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18 (5), 401-409

Sarmanov OV (1958) The maximum correlation coefficient (symmetric case). Dokl. Akad. Nauk SSSR, 120 : 4 (1958), 715 - 718.

Székely, G. J. Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing independence by correlation of distances, The Annals of Statistics, 35:6, 2769–2794.

Takane Y, Young F, de Leeuw J (1977). Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features. Psychometrika, 42 (1), 7-67.

Torgerson WS (1958). Theory and methods of scaling. Wiley.

Venables WN, Ripley BD (2002). Modern Applied Statistics with S. Fourth edition. Springer, New York.