R-Forge Logo

This is the homepage of the Structure Optimized Proximity Scaling (STOPS) project. On this page you can find links to papers, talks, data and software related to STOPS. One can also find a tutorial for COPS and STOPS and the MDS functions below.

Creative Commons License
Except noted otherwise, content on this homepage and this work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Papers:

Technical Reports:

Cluster Optimized Proximity Scaling (COPS)
Assessing and Quantifying Clusteredness: The OPTICS Cordillera

Talks:

(title links lead to slides)
Title Event Date Place
Structure Optimized Proximity Scaling (STOPS): A Framework for Hyperparameter Selection in Multidimensional Scaling Psychoco 2017 09.02.2017-10.02.2017 WU Vienna, Austria
COPS and STOPS: Cluster and/or Structure Optimized Proximity Scaling Brown Bag Seminar, Institute for Statistics and Mathematics 07.12.2016 WU Vienna, Austria
The OPTICS Cordillera: Nonparametric Assessment of Clusteredness Brown Bag Seminar, Institute for Statistics and Mathematics 23.10.2016 WU Vienna, Austria
COPS: Cluster Optimized Proximity Scaling Psychoco 2015 12.02.2015-13.02.2015 Amsterdam, The Netherlands
Scaling for Clusters with COPS: Cluster Optimized Proximity Scaling CFE-ERCIM 2014 06.12.2014-08.12.2014 Pisa, Italy

Software:

The project summary page you can find here.

The most recent build is available for Windows and Linux here: STOPS Package

People:

  • Thomas Rusch
  • Patrick Mair
  • Kurt Hornik
  • Jan de Leeuw
  • A tutorial on Stucture Optimized Proximity Scaling (STOPS)

    In this document we introduce the functionality avalaiable in stops for fitting multidimensional scaling (MDS; Borg & Groenen 2005) or proximity scaling (PS) models either with a STOPS or COPS idea or not. We start with a short introduction to PS and the models that we have available. We then explain fitting of these models with the stops package. Next, we introduce the reader to COPS (Rusch et al. 2015a) and STOPS (Rusch et al. 2015b) models and show how to fit those. For illustration we use the smacof::kinshipdelta data set (Rosenberg, S. & Kim, M. P., 1975) which lists percentages of how often 15 kinship terms were not grouped together by college students.

    library(stops)
    ## Loading required package: smacof
    ## Loading required package: rgl
    ## 
    ## Attaching package: 'stops'
    ## 
    ## The following object is masked from 'package:stats':
    ## 
    ##     cmdscale

    Proximity Scaling

    For proximity scaling (PS) or multidimensional scaling (MDS) the input is typically an \(N\times N\) matrix \(\Delta^*=f(\Delta)\), a matrix of proximities with elements \(\delta^*_{ij}\), that is a function of a matrix of observed non-negative dissimilarities \(\Delta\) with elements \(\delta_{ij}\). \(\Delta^*\) usually is symmetric (but does not need to be). The main diagonal of \(\Delta\) is 0. We call a \(f: \delta_{ij} \mapsto \delta^*_{ij}\) a proximity transformation function. In the MDS literature these \(\delta_{ij}^*\) are often called dhats or disparities. The problem that proximity scaling solves is to locate an \(N \times M\) matrix \(X\) (the configuration) with row vectors \(x_i, i=1,\ldots,N\) in low-dimensional space \((\mathbb{R}^M, M \leq N)\) in such a way that transformations \(g(d_{ij}(X))\) of the fitted distances \(d_{ij}(X)=d(x_i,x_j)\)—i.e., the distance between different \(x_i, x_j\)—approximate the \(\delta^*_{ij}\) as closely as possible. We call a \(g: d_{ij}(X) \mapsto d_{ij}^*(X)\) a distance transformation function. In other words, proximity scaling means finding \(X\) so that \(d^*_{ij}(X)=g(d_{ij}(X))\approx\delta^*_{ij}=f(\delta_{ij})\).

    This approximation \(D^*(X)\) to the matrix \(\Delta^*\) is found by defining a fit criterion (the loss function), \(\sigma_{MDS}(X)=L(\Delta^*,D^*(X))\), that is used to measure how closely \(D^*(X)\) approximates \(\Delta^*\). Usually, they are closely related to the quadratic loss function. A general formulation of a loss function based on a quadratic loss is \begin{equation} \label{eq:stress} \sigma_{MDS}(X)=\sum^N_{i=1}\sum^N_{j=1} z_{ij} w_{ij}\left[d^*_{ij}(X)-\delta^*_{ij}\right]^2=\sum^N_{i=1}\sum^N_{j=1} z_{ij}w_{ij}\left[g\left(d_{ij}(X)\right)-f(\delta_{ij})\right]^2 \end{equation}

    Here, the \(w_{ij}\) and \(z_{ij}\) are finite weights, with \(z_{ij}=0\) if the entry is missing and \(z_{ij}=1\) otherwise.

    The loss function used is then minimized to find the vectors \(x_1,\dots,x_N\), i.e., \begin{equation} \label{eq:optim} \arg \min_{X}\ \sigma_{MDS}(X). \end{equation}

    There are a number of optimization techniques one can use to solve this optimization problem.

    Stress Models

    The first popular type of PS supported in stops is based on the loss function type stress (Kruskall 1964). This uses some type of Minkowski distance (\(p > 0\)) as the distance fitted to the points in the configuration, \begin{equation} \label{eq:dist} d_{ij}(X) = ||x_{i}-x_{j}||_p=\left( \sum_{m=1}^M |x_{im}-x_{jm}|^p \right)^{1/p} \ i,j = 1, \dots, N. \end{equation}

    Typically, the norm used is the Euclidean norm, so \(p=2\). In standard MDS \(g(\cdot)=f(\cdot)=I(\cdot)\), the identity function.

    This formulation enables one to express a large number of PS methods many of which are implemented in stops. In stops we allow to use specific choices for \(f(\cdot)\) and \(g(\cdot)\) from the family of power transformations so one can fit the following stress models:

    For all of these models one can use the function powerStressMin which uses majorization to find the solution (de Leeuw, 2014) . The function allows to specify a kappa, lambda and nu argument as well as a weightmat (the \(w_{ij}\)).

    The object returned from powerStressMin is of class smacofP which extends the smacof classes (de Leeuw & Mair, 2009) to allow for the power transformations. Apart from that the objects are made so that they have maximum compatibility to methods from smacof. Accordingly, the following S3 methods are available:

    Method Description
    print Prints the object
    summary A summary of the object
    plot 2D Plots of the object
    plot3d Dynamic 3D configuration plot
    plot3dstatic Static 3D configuration plot
    residuals Residuals
    coef Model Coefficients

    Let us illustrate the usage

    dis<-as.matrix(smacof::kinshipdelta)
    res1<-powerStressMin(dis,kappa=1,lambda=1)
    res1
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 0.2667 
    ## Number of iterations: 5219
    res2<-powerStressMin(dis,kappa=1,lambda=1,nu=-1,weightmat=dis)
    res2
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -1, weightmat = dis)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 7.023 
    ## Number of iterations: 74860

    Alternatively, one can use the faster sammon function from MASS (Venables & Ripley, 2002) for which we provide a wrapper that adds class attributes and methods (and overloads the function).

    res2a<-sammon(dis)
    ## Initial stress        : 0.17053
    ## stress after   3 iters: 0.10649
    res2a
    ## 
    ## Call: sammon(d = dis)
    ## 
    ## Model: Sammon Scaling 
    ## Number of objects: 15 
    ## Stress: 0.1065
    res3<-powerStressMin(dis,kappa=1,lambda=1,nu=-2,weightmat=dis)
    res3
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 1, lambda = 1, nu = -2, weightmat = dis)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 59.87 
    ## Number of iterations: 1e+05
    res4<-powerStressMin(dis,kappa=2,lambda=2)
    res4
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 2)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 0.3516 
    ## Number of iterations: 39427
    res5<-powerStressMin(dis,kappa=2,lambda=1)
    res5
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 0.4133 
    ## Number of iterations: 17686
    res6<-powerStressMin(dis,kappa=2,lambda=1.5)
    res6
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 0.3733 
    ## Number of iterations: 39377
    res7<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1,weightmat=dis)
    res7
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1, 
    ##     weightmat = dis)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 7.271 
    ## Number of iterations: 1e+05
    res8<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-2,weightmat=dis)
    res8
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -2, 
    ##     weightmat = dis)
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 62.61 
    ## Number of iterations: 1e+05
    res9<-powerStressMin(dis,kappa=2,lambda=1.5,nu=-1.5,weightmat=2*1-diag(nrow(dis)))
    res9
    ## 
    ## Call: powerStressMin(delta = dis, kappa = 2, lambda = 1.5, nu = -1.5, 
    ##     weightmat = 2 * 1 - diag(nrow(dis)))
    ## 
    ## Model: Power Stress SMACOF 
    ## Number of objects: 15 
    ## Stress-1 value: 0.8362 
    ## Number of iterations: 44052
    summary(res9)
    ## 
    ## Configurations:
    ##                    D1      D2
    ## Aunt          -0.1228  0.2497
    ## Brother        0.1961 -0.1405
    ## Cousin         0.0526  0.3101
    ## Daughter      -0.2048 -0.1259
    ## Father         0.1637 -0.1824
    ## Granddaughter -0.2358 -0.0525
    ## Grandfather    0.2149 -0.1329
    ## Grandmother   -0.2362 -0.0861
    ## Grandson       0.2148 -0.1073
    ## Mother        -0.2097 -0.1366
    ## Nephew         0.1709  0.2108
    ## Niece         -0.1234  0.2441
    ## Sister        -0.2216 -0.0973
    ## Son            0.1700 -0.1710
    ## Uncle          0.1713  0.2179
    ## 
    ## 
    ## Stress per point:
    ##                  SPP SPP(%)
    ## Niece         0.0022  4.688
    ## Nephew        0.0022  4.712
    ## Aunt          0.0023  4.937
    ## Uncle         0.0023  5.028
    ## Daughter      0.0028  6.084
    ## Son           0.0029  6.275
    ## Father        0.0030  6.452
    ## Mother        0.0031  6.686
    ## Cousin        0.0032  6.798
    ## Sister        0.0034  7.272
    ## Brother       0.0034  7.283
    ## Grandson      0.0037  7.979
    ## Granddaughter 0.0038  8.102
    ## Grandmother   0.0041  8.819
    ## Grandfather   0.0041  8.885
    plot(res9)
    plot(res9,"transplot")
    plot(res9,"Shepard")
    plot(res9,"resplot")
    plot(res9,"bubbleplot")

    Strain Models

    The second popular type of PS supported in stops is based on the loss function type . Here the \(\Delta^*\) are a transformation of the \(\Delta\), \(\Delta^*= f (\Delta)\) so that \(f(\cdot)=-(h\circ l)(\cdot)\) where \(l\) is any function and \(h(\cdot)\) is a double centering operation, \(h(\Delta)=\Delta-\Delta_{i.}-\Delta_{.j}+\Delta_{..}\) where \(\Delta_{i.}, \Delta_{.j}, \Delta_{..}\) are matrices consisting of the row, column and grand marginal means respectively. These then get approximated by (functions of) the inner product matrices of \(X\) \begin{equation} \label{eq:dist2} d_{ij}(X) = \langle x_{i},x_{j} \rangle \end{equation}

    We can thus express classical scaling as a special case of the general PS loss with \(d_{ij}(X)\) as an inner product, \(g(\cdot) = I(\cdot)\) and \(f(\cdot)=-(h \circ I)(\cdot)\).

    If we again allow power transformations for \(g(\cdot)\) and \(f(\cdot)\) one can fit the following strain models with stops

    In stops we have a wrapper to cmdscale (overloading the base function) which extend functionality by offering an object that matches smacofP objects with corresponding methods.

    A powerstrain model is rather easy to fit with simply subjecting the dissimilarity matrix to some power. Here we use \(\lambda=3\).

    resc<-cmdscale(kinshipdelta^3)
    resc
    ## 
    ## Call: cmdscale(d = kinshipdelta^3)
    ## 
    ## Model: Torgerson-Gower Scaling 
    ## Number of objects: 15 
    ## GOF: 0.4258 0.6282
    summary(resc)
    ## 
    ## Configurations:
    ##                    D1      D2
    ## Aunt           178193  204987
    ## Brother       -174369  -94357
    ## Cousin         -48355  265057
    ## Daughter       169149 -109936
    ## Father        -145389 -168604
    ## Granddaughter  187039  -44851
    ## Grandfather   -180116 -103668
    ## Grandmother    199145  -83039
    ## Grandson      -169768  -72359
    ## Mother         185798 -138677
    ## Nephew        -195964  173124
    ## Niece          168278  208870
    ## Sister         182224  -70764
    ## Son           -149876 -135883
    ## Uncle         -205989  170101
    summary(resc)
    plot(resc)

    Augmenting MDS with structure considerations: STOPS and COPS

    The main contribution of the stops package is not in solely fitting the powerstress or powerstrain models and their variants from above, but allowing to choose the right transformation to achieve a “structured” MDS result automatically. This can be useful in a variety of contextes: to explore or generate structures, to restrict the target space, to avoid artefacts, to preserve certain types of structures and so forth.

    For this, an MDS loss function is subjected to nonlinear transformations and is augmented to include penalties for the type of structures one is aiming for. This combination of an MDS loss with a structuredness penalty is what we call “structure optimized loss” (stoploss) and the resulting MDS is coined “Structured Optimized Proximity Scaling” (or STOPS). The prime example for a STOPS model is “Cluster Optimized Proximity Scaling” (COPS) which selects optimal parameters so that the clusteredness appearance of the configuation is improved (see below).

    STOPS

    Following Rusch et al. (2015b) the general idea is that from given observations \(\Delta\) we look for a configuration \(X\). This achieves this by minimizing some loss function \(\sigma_{MDS}(X^*;\Delta^*)\) where the \(\Delta^*, X^*\) are functions of the \(\Delta\) and \(X\). The \(X\) has properties with regards to its structural appearance, which we call c-structuredness for configuration-structuredness. There are different types of c-structuredness people might be interested in (say, how clustered the result is, or that dimensions are orthogonal or if there is some submanifold that the data live on). We developed indices for these types of c-structuredness that capture that essence in the configuration.

    We have as part of a STOPS model a proximity scaling loss function \(\sigma_{MDS}(\cdot)\), a \(\Delta\) and an \(X\) and some transformation \(f_{ij}(\delta_{ij};\theta)\) and \(g_{ij}(d_{ij};\theta)\) that is parametrized (with \(\theta\) either finite or infinite dimensional, e.g., a transformation applied to all observations like a power transformation or even an individual transformation per object). These transformations achieve a sort of push towards more structure, so different values for \(\theta\) will in general lead to different c-structuredness.

    We further have \(K\) different indices \(I_k(X)\) that measure different types of c-structuredness. We can then define as methods that are of the form (additive STOPS) \begin{equation} \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = v_0 \cdot \sigma_{MDS}(X^*(\theta)) + \sum^K_{k=1} v_k I_k(X(\theta)) \end{equation} or (multiplicative STOPS) \begin{equation} \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) = \sigma_{MDS}(X^*(\theta))^{v_0} \cdot \prod^K_{k=1} I_k(X(\theta))^{v_k} \end{equation}

    (which can be expressed as aSTOPS by logarithms). Here the \(v_0,...,v_k\) are weights that determine how the individual parts (mds loss and c-structuredness indices) are aggregated.

    The job is then to find \begin{equation} \arg\min_{\vartheta}\ \text{aSTOPS}(X, \theta, v_0, \dots, v_k; \Delta)\ \text{or} \ \arg\min_{\vartheta}\ \text{mSTOPS}(X, \theta, v_0, \dots, v_k; \Delta) \end{equation}

    where \(\vartheta \subseteq \{X,\theta, v_0, \dots, v_k\}\). Typically \(\vartheta\) will be a subset of all possible parameters here (e.g., the weights might be given a priori). Currently, the transformations that can be used in stops are limited to power transformations.

    Minimizing stoploss can be difficult. In stops we use a nested algorithm combining optimization that internally first solves for \(X\) given \(\theta\), \(\arg\min_X \sigma_{MDS}\left(X,\theta\right)\), and then optimize over \(\theta\) with a metaheuristic. Implemented are a simulated annealing (optimmethod="SANN") or particle swarm optimization (optimmethod="pso") and a variant of the Luus-Jaakola (optimmethod="ALJ") procedure . We suggest to use the latter. A Bayesian optimization approach is currently under way.

    Currently the following c-structuredness types are supported:

    If we have a single \(I(X)=OC(X)\), the OPTICS cordillera (Rusch et al. 2015a), and the transformations applied are power transformations and the weights for the \(I(X)\) is negative we essentially have COPS (see below).

    For the MDS loss (argument loss in functions stops and cops), the functions currently support all losses derived from powerstress and powerstrain and can in principle be fitted with powerStressMin alone. However, for many models offer dedicated functions that either use workhorses that are more optimized for the problem at hand and/or restrict the parameter space for the distance/proximity transformations and thus can be faster. They are:

    Usage

    The syntax for fitting a stops model is rather straightforward. One has to supply the arguments dis which is a dissimilarity matrix and structures a character vector listing the c-structuredness type that should be used to augment the PS loss (see above for the types of structures and losses). The parameters for the structuredness indices should be given with strucpars, a list whose elements are lists corresponding to each structuredness index and listing the parameters (if the default should be used the list element should be set to NULL). The PS loss can be chosen with the argument loss. The type of aggregation for the multi-objective optimization is specified in type and can be one of additive or multiplicative. One can pass additional parameters to the fitting workhorses with ....

    stops(dis, structures = c("cclusteredness","clinearity"), loss="stress", ...)

    One then has all the S3 methods of smacofP at one’s disposal.

    For example, let us fit an mSTOPS model that looks for a transformation of the \(\delta_{ij}\) so that a) the result has maximal c-clusteredness (which is 1 in the best case, so we set a negative weight for this structure) b) the projection onto the principal axes are nearly orthogonal (c-clinearity close to 0, so we set a positive weight for this structure) c) but the projections onto the principal axes should be stochastically dependent (negative weight on c-dependence) and d) the fit of the MDS is also factored in (so we set positive weight on the MDS loss). Since we use mSTOPS and a negative weight for c-linearity and c-dependence, a c-linearity/c-dependence close to 0 will overall dominate the stoploss function with the other two criteria being more of an afterthought - in aSTOPS that would be different.

    !!: This is generally the approach to be chosen: We minimize the stoploss, so a c-structuredness index that should be (numerically) large needs a negative weight and a c-structuredness index that should be (numerically) small needs a positive weight.

    We first set up the parameters for the structuredness indices. For the OPTICS cordillera we use a \(d_{max}\) of 1, epsilon=10 and minpts=2, for c-linearity we have no parameters (so using NULL will work) and for the c-dependence we have a single parameter, index, which we set to 2.

    strucpars<-list(list(epsilon=10,minpts=2,rang=c(0,1.3)), #cordillera
                    NULL,     # c-linearity (has no parameters)
                    list(index=2) #c-dependence
                    ) 
    ressm<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="multiplicative")
    ressm
    ## 
    ## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness", 
    ##     "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33, 
    ##     0.33, -0.33), strucpars = strucpars, verbose = 0, type = "multiplicative")
    ## 
    ## Model: multiplicative STOPS with stress loss function and theta parameters= 1 3.095 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 1 
    ## C-Structuredness Indices: cclusteredness 0.53027 clinearity 0.01802 cdependence 0.01802 
    ## Structure optimized loss (stoploss): 1.233 
    ## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33 
    ## Number of iterations of ALJ optimization: 117
    plot(ressm)

    Let us compare this with the corresponding aSTOPS

    ressa<-stops(kinshipdelta,loss="stress",stressweight=1,structures=c("cclusteredness","clinearity","cdependence"),strucweight=c(-0.33,0.33,-0.33),verbose=0,strucpars=strucpars,type="additive")
    ressa
    ## 
    ## Call: stops(dis = kinshipdelta, loss = "stress", structures = c("cclusteredness", 
    ##     "clinearity", "cdependence"), stressweight = 1, strucweight = c(-0.33, 
    ##     0.33, -0.33), strucpars = strucpars, verbose = 0, type = "additive")
    ## 
    ## Model: additive STOPS with stress loss function and theta parameters= 1 3.114 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 1 
    ## C-Structuredness Indices: cclusteredness 0.53021 clinearity 0.01785 cdependence 0.01785 
    ## Structure optimized loss (stoploss): 0.825 
    ## MDS loss weight: 1 c-structuredness weights: -0.33 0.33 -0.33 
    ## Number of iterations of ALJ optimization: 81
    plot(ressa)

    We see that the c-clusteredness is higher as compared to the mSTOPS result - we have a number of distinct object clusters (with at least minpts=2) and they are more spread out and distributed more evenly. The dimensions on the other hand are now farther from being orthogonal but the stochastic dependence is higher (which is a non-linear one obviously).

    When choosing a c-structuredness index, one needs to be clear what structure one might be interested in and how it interacts with the PS loss chosen. Consider the following example: We fit a powermds model to the kinship data and want to maximize c-association (i.e., any non-linear relationship) and c-manifoldness but minimize the c-linearity. In other words we try to find a power transformation of \(\Delta\) and \(D\) so that the objects are positioned in the configuration in such a way that the projection onto the principal axes are as close as possible to being related by a smooth but non-linear function.

    resa<-stops(kinshipdelta,structures=c("cassociation","cmanifoldness","clinearity"),loss="powermds",verbose=0,strucpars=list(NULL,NULL,NULL),type="additive",strucweight=c(-0.5,-0.5,0.5))
    resa
    ## 
    ## Call: stops(dis = kinshipdelta, loss = "powermds", theta = c(2.9429394, 
    ##     1.67850653, 1.57140404), structures = c("cassociation", "cmanifoldness", 
    ##     "clinearity"), strucweight = c(-0.5, -0.5, 0.5), strucpars = list(NULL, 
    ##     NULL, NULL), verbose = 0, type = "additive", itmax = 1)
    ## 
    ## Model: additive STOPS with powermds loss function and theta parameters= 2.943 1.679 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 0.9995 
    ## C-Structuredness Indices: cassociation 1.0000000 cmanifoldness 0.9918892 clinearity 0.0001654 
    ## Structure optimized loss (stoploss): 0.003602 
    ## MDS loss weight: 1 c-structuredness weights: -0.5 -0.5 0.5 
    ## Number of iterations of ALJ optimization: 1

    We see in this model (resa) that indeed the c-association is 1, which says we have a near perfect non-linear relationship. How does this relationship look like?

    plot(resa)

    It is a parabolic shape, so the projections are so that the points on D2 are a near parabolic function of the D1 (projecting onto some structure resembling a conic section is often the case for r-stress which is essentially what we got here - setting a negative weight on c-assocation can combat that if that is an artefact). What we can also see is that there are three clear clusters, so c-clusteredness should be high. But when looking at the OPTICS cordillera here, we find that the OPTICS cordillera is lower than for the result from above with using stress and lambda=2.66 (the ressa model).

    c1<-cordillera(resa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
    c2<-cordillera(ressa$fit$conf,minpts=2,epsilon=10,rang=c(0,1.3))
    c1
    ##    raw normed 
    ## 7.9441 0.4365
    c2
    ##    raw normed 
    ## 9.6499 0.5302

    This discrepancy comes from the definition of c-clusteredness (Rusch et al, 2015a) where more clusters, more spread-out clusters, more evenly distributed clusters and denser clusters all increase c-clusteredness. In the example with maximizing c-association we have two very dense clusters of 5 points and 1 relatively non-dense cluster of five other points. In the model maximizing c-clusteredness (and others) we get 6 relatively moderate dense clusters with 2 or 3 points each, which is also the minimum number of points we wanted to be grouped together. Most importantly, they are projected onto a much larger range of the target space as the \(X\) obtained from the stress loss is different than the one obtained from the powermds loss, so the \(d_{max}\) is very different. Since we use the normed OPTICS cordillera there, we look at c-clusteredness relative to the most clustered appearance with two points per cluster. Thus, the second result has more c-clusteredness. If we would define a cluster as having at most 5 points then c-clusteredness of the result with high c-association also has a large c-clusteredness because then the clusters found match the definition of high c-clusteredness.

    c3<-cordillera(resa$fit$conf,minpts=5,epsilon=10,rang=c(0,1.3))
    c3
    ##    raw normed 
    ## 3.8650 0.5946

    Note that it may just as well be possible to have a high c-association and no c-clusteredness at all (e.g., points lying equidistant on a smooth non-linear curve). Note also that the models are not necessarily comparable due to different stress functions - the transformation in powermds that is optimal with respect to c-clusteredness would be different.

    Indeed one can optimize for c-clusteredness alone and using it as a “goodness-of-clusteredness” index (i.e., the \(d_{max}\) is not constant over configurations but varies conditional on the configuration) then we get a projection with c-clusteredness of 0.67.

    resa2<-stops(kinshipdelta,structures=c("cclusteredness"),loss="powermds",verbose=0,strucpars=list(list(epsilon=10,rang=NULL,minpts=2)),type="additive",strucweight=-1,stressweight=0)

    For convenience it is also possible to use the stops function for finding the loss-optimal transformation in the the non-augmented models specified in loss, by setting the strucweight, the weight of any c-structuredness, to 0. Then the function optimizes the MDS loss function only.

    ressa<-stops(kinshipdelta,structure=c("clinearity"),strucweight=0,loss="stress",verbose=0)

    COPS

    A special STOPS model is COPS (Rusch et al. 2015a) for “Cluster Optimized Proximity Scaling”. This is also one of the main use cases for STOPS models. Let us write \(X(\theta)=\arg\min_X \sigma_{MDS}(X,\theta)\) for the optimal configuration for given transformation parameter \(\theta\). Following the outline of STOPS the overall objective function, which we call , is a weighted combination of the \(\theta-\)parametrized loss function, \(\sigma_{MDS}\left(X(\theta),\theta\right)\), and a c-clusteredness measure, the OPTICS cordillera or \(OC(X(\theta);\epsilon,k,q)\) to be optimized as a function of \(\theta\) or \begin{equation} \label{eq:spstress} \text{coploss}(\theta) = v_1 \cdot \sigma_{MDS}\left(X(\theta),\theta \right) - v_2 \cdot \text{OC}\left(X(\theta);\epsilon,k,q\right) \end{equation} with \(v_1,v_2 \in \mathbb{R}\) controlling how much weight should be given to the scaling fit measure and the c-clusteredness. In general \(v_2,v_2\) are either determined values that make sense for the application or may be used to trade-off fit and c-clusteredness in a way for them to be commensurable. In the latter case we suggest taking the fit function value as it is (\(v_1=1\)) and fixing the scale such that \(\text{coploss}=0\) for the scaling result with no transformations (\(\theta=\theta_0\)), i.e., \begin{equation} \label{eq:spconstant0} v^{0}_{1}=1, \quad v^{0}_2=\frac{\sigma_{MDS}\left(X(\theta_0),\theta_0\right)}{\text{OC}\left(X(\theta_0);\epsilon,k,q\right)}, \end{equation}

    with \(\theta_0=(1,1)^\top\) in case of loss functions with power transformations. Thus an increase of 1 in the MDS loss measure can be compensated by an increase of \(v^0_1/v^0_2\) in c-clusteredness. Selecting \(v_1=1,v_2=v^{0}_2\) this way is in line with the idea of pushing the configurations towards a more clustered appearance relative to the initial solution.

    Another possibility is to choose them in such a way that \(\text{coploss}=0\) in the optimum value, i.e., choosing \(v^{opt}_{1}, v^{opt}_2\) so that \begin{equation} v^{opt}_1 \cdot \sigma_{MDS}\left(X(\theta^*),\theta^*\right)-v^{opt}_2 \cdot \text{OC}\left(X(\theta^*);\epsilon,k,q \right) = 0 \end{equation}

    with \(\theta^*:=\arg\min_\theta \text{coploss}(\theta)\). This is in line with having \(\text{coploss}(\theta)>0\) for \(\theta \neq \theta^*\) and allows to optimize over \(v_1,v_2\).

    The optimization problem in COPS is then to find
    \begin{equation} \label{eq:soemdsopt2} \arg\min_{\theta} \text{coploss}(\theta) \end{equation} by doing \begin{equation} \label{eq:soemdsopt} v_1 \cdot \sigma_{MDS}\left(X(\theta),\theta\right) - v_2 \cdot \text{OC}\left(X(\theta);\epsilon,k,q\right) \rightarrow \min_\theta! \end{equation}

    For a given \(\theta\) if \(v_2=0\) than the result of optimizing the above is the same as solving the respective original PS problem. Letting \(\theta\) be variable, \(v_2=0\) will minimize the loss over configurations obtained from using different \(\theta\).

    The c-clusteredness index we use is the OPTICS cordillera which measures how clustered a configuration appears. It is based on the OPTICS algorithm that outputs an ordering together with a distance. The OPTICS cordillera is now simply an agregation of that information. Since we know what constitutes a maximally clustered result, we can derive an upper bound and normalize the index to lie between 0 and 1. If it is maximally clustered, the index gets a value of 1,and it gets 0 if all points are equidistant to their nearest neighbours (a matchstick embedding). See Rusch et al (2015a) for details.

    Usage

    Even though one can fit a COPS model with stops, there is a dedicated function cops. Its syntax works pretty much like in stops only that the structure argument is non-existant.

    cops(dis,loss,...)

    For the example we use a COPS model for a classical scaling (strain loss)

    resc<-cops(kinshipdelta,loss="strain")
    resc
    ## 
    ## Call: cops(dis = kinshipdelta, loss = "strain")
    ## 
    ## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.606 nu= 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 0.3237 
    ## OPTICS cordillera: Raw 10.87 Normed 0.3087 
    ## Cluster optimized loss (coploss):  -0.03458 
    ## MDS loss weight: 1  OPTICS cordillera weight: 1.16 
    ## Number of iterations of ALJ optimization: 90
    summary(resc)
    ## 
    ## Configurations:
    ##                   D1      D2
    ## Aunt          -345.9  492.49
    ## Brother        365.5 -253.89
    ## Cousin         107.3  560.94
    ## Daughter      -394.8 -262.97
    ## Father         320.3 -371.50
    ## Granddaughter -411.9  -99.39
    ## Grandfather    370.3 -214.95
    ## Grandmother   -412.2 -148.94
    ## Grandson       370.4 -181.08
    ## Mother        -411.9 -289.88
    ## Nephew         424.9  398.15
    ## Niece         -340.0  485.92
    ## Sister        -402.9 -183.98
    ## Son            328.5 -338.10
    ## Uncle          432.4  407.18

    A number of plots are availabe

    plot(resc,"confplot")
    plot(resc,"Shepard")
    plot(resc,"transplot")
    plot(resc,"reachplot")

    For convenience it is also possible to use the cops function for finding the loss-optimal transformation in the the non-augmented models specified in loss, by setting the cordweight, the weight of the OPTICS cordillera, to 0. Then the function optimizes the MDS loss function only.

    resca<-cops(kinshipdelta,cordweight=0,loss="strain")
    resca
    ## 
    ## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 0)
    ## 
    ## Model: COPS with strain loss function and parameters kappa= 1 lambda= 1.586 nu= 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 0.3237 
    ## OPTICS cordillera: Raw 10.87 Normed 0.3087 
    ## Cluster optimized loss (coploss):  0.3237 
    ## MDS loss weight: 1  OPTICS cordillera weight: 0 
    ## Number of iterations of ALJ optimization: 48

    Here the results match the result from using the standard cordweight. We can give more weight to the c-clusteredness though:

    rescb<-cops(kinshipdelta,cordweight=20,loss="strain")
    rescb
    ## 
    ## Call: cops(dis = kinshipdelta, loss = "strain", cordweight = 20)
    ## 
    ## Model: COPS with strain loss function and parameters kappa= 1 lambda= 2.06 nu= 1 
    ## 
    ## Number of objects: 15 
    ## MDS loss value: 0.3395 
    ## OPTICS cordillera: Raw 11.01 Normed 0.3128 
    ## Cluster optimized loss (coploss):  -5.916 
    ## MDS loss weight: 1  OPTICS cordillera weight: 20 
    ## Number of iterations of ALJ optimization: 94
    plot(resca,main="with cordweight=0")
    plot(rescb,main="with cordweight=20")

    This result has more c-clusteredness but less fit. The higher c-clusteredness is discernable in the Grandfather/Brother and Grandmother/Sister clusters (we used a minimum number of 2 observations to make up a cluster, minpts=2).

    Other Functions

    The package also provides functions that are used by the cops and stops and powerStressMin functions but may be of interest to a end user beyond that.

    OPTICS and OPTICS cordillera

    For calculating a COPS solution, we need the OPTICS algorithm and the OPTICS cordillera. In the package we also provide a rudimentary interface to the OPTICS impementation in ELKI.

    data(iris)
    res<-optics(iris[,1:4],minpts=2,epsilon=1000)
    print(res)
    ##     observation            reachability
    ## 1          ID=1          reachability=∞
    ## 2         ID=18        reachability=0.1
    ## 3         ID=41 reachability=0.14142136
    ## 4          ID=5 reachability=0.14142136
    ## 5         ID=38 reachability=0.14142136
    ## 6         ID=40 reachability=0.14142136
    ## 7          ID=8        reachability=0.1
    ## 8         ID=50 reachability=0.14142136
    ## 9         ID=29 reachability=0.14142136
    ## 10        ID=28 reachability=0.14142136
    ## 11        ID=36  reachability=0.2236068
    ## 12        ID=49  reachability=0.2236068
    ## 13        ID=11        reachability=0.1
    ## 14        ID=27  reachability=0.2236068
    ## 15        ID=24        reachability=0.2
    ## 16        ID=44  reachability=0.2236068
    ## 17        ID=12  reachability=0.2236068
    ## 18        ID=30  reachability=0.2236068
    ## 19        ID=31 reachability=0.14142136
    ## 20        ID=35 reachability=0.14142136
    ## 21        ID=10        reachability=0.1
    ## 22         ID=2 reachability=0.14142136
    ## 23        ID=46 reachability=0.14142136
    ## 24        ID=13 reachability=0.14142136
    ## 25        ID=26 reachability=0.17320508
    ## 26         ID=4 reachability=0.17320508
    ## 27        ID=48 reachability=0.14142136
    ## 28         ID=3 reachability=0.14142136
    ## 29        ID=43  reachability=0.2236068
    ## 30        ID=39        reachability=0.2
    ## 31         ID=9 reachability=0.14142136
    ## 32         ID=7  reachability=0.2236068
    ## 33        ID=20 reachability=0.24494897
    ## 34        ID=22 reachability=0.14142136
    ## 35        ID=47 reachability=0.14142136
    ## 36        ID=14 reachability=0.24494897
    ## 37        ID=25        reachability=0.3
    ## 38        ID=37        reachability=0.3
    ## 39        ID=21        reachability=0.3
    ## 40        ID=32 reachability=0.28284271
    ## 41        ID=17 reachability=0.34641016
    ## 42         ID=6 reachability=0.34641016
    ## 43        ID=19 reachability=0.33166248
    ## 44        ID=33 reachability=0.34641016
    ## 45        ID=34 reachability=0.34641016
    ## 46        ID=45 reachability=0.36055513
    ## 47        ID=16 reachability=0.36055513
    ## 48        ID=15 reachability=0.41231056
    ## 49        ID=23 reachability=0.45825757
    ## 50        ID=42  reachability=0.6244998
    ## 51        ID=99 reachability=1.64012195
    ## 52        ID=58 reachability=0.38729833
    ## 53        ID=94 reachability=0.14142136
    ## 54        ID=61 reachability=0.36055513
    ## 55        ID=82 reachability=0.64807407
    ## 56        ID=81 reachability=0.14142136
    ## 57        ID=70 reachability=0.17320508
    ## 58        ID=90 reachability=0.24494897
    ## 59        ID=54        reachability=0.2
    ## 60        ID=93 reachability=0.26457513
    ## 61        ID=83 reachability=0.14142136
    ## 62        ID=68 reachability=0.24494897
    ## 63       ID=100 reachability=0.26457513
    ## 64        ID=97 reachability=0.14142136
    ## 65        ID=96 reachability=0.14142136
    ## 66        ID=95 reachability=0.17320508
    ## 67        ID=89 reachability=0.17320508
    ## 68        ID=91 reachability=0.26457513
    ## 69        ID=62        reachability=0.3
    ## 70        ID=56 reachability=0.31622777
    ## 71        ID=67        reachability=0.3
    ## 72        ID=85        reachability=0.2
    ## 73        ID=79 reachability=0.33166248
    ## 74        ID=92        reachability=0.2
    ## 75        ID=64 reachability=0.14142136
    ## 76        ID=74  reachability=0.2236068
    ## 77        ID=72 reachability=0.34641016
    ## 78        ID=98 reachability=0.33166248
    ## 79        ID=75        reachability=0.2
    ## 80        ID=76 reachability=0.26457513
    ## 81        ID=66 reachability=0.14142136
    ## 82        ID=59 reachability=0.24494897
    ## 83        ID=55 reachability=0.24494897
    ## 84        ID=52 reachability=0.31622777
    ## 85        ID=57 reachability=0.26457513
    ## 86        ID=87 reachability=0.31622777
    ## 87        ID=53 reachability=0.28284271
    ## 88        ID=51 reachability=0.26457513
    ## 89        ID=78 reachability=0.31622777
    ## 90        ID=77 reachability=0.31622777
    ## 91        ID=80 reachability=0.34641016
    ## 92        ID=86 reachability=0.37416574
    ## 93        ID=60 reachability=0.38729833
    ## 94       ID=148 reachability=0.41231056
    ## 95       ID=111  reachability=0.2236068
    ## 96       ID=112 reachability=0.34641016
    ## 97       ID=117 reachability=0.36055513
    ## 98       ID=138 reachability=0.14142136
    ## 99       ID=104 reachability=0.24494897
    ## 100      ID=129 reachability=0.33166248
    ## 101      ID=133        reachability=0.1
    ## 102      ID=105        reachability=0.3
    ## 103      ID=146 reachability=0.36055513
    ## 104      ID=142 reachability=0.24494897
    ## 105      ID=141 reachability=0.36055513
    ## 106      ID=145 reachability=0.24494897
    ## 107      ID=121 reachability=0.26457513
    ## 108      ID=144  reachability=0.2236068
    ## 109      ID=125        reachability=0.3
    ## 110      ID=113 reachability=0.34641016
    ## 111      ID=140 reachability=0.17320508
    ## 112      ID=116 reachability=0.37416574
    ## 113      ID=149        reachability=0.3
    ## 114      ID=137 reachability=0.24494897
    ## 115      ID=147 reachability=0.37416574
    ## 116      ID=124 reachability=0.24494897
    ## 117      ID=127 reachability=0.17320508
    ## 118      ID=128 reachability=0.24494897
    ## 119      ID=139 reachability=0.14142136
    ## 120       ID=71  reachability=0.2236068
    ## 121      ID=150 reachability=0.28284271
    ## 122      ID=143 reachability=0.33166248
    ## 123      ID=102          reachability=0
    ## 124      ID=114 reachability=0.26457513
    ## 125      ID=122 reachability=0.31622777
    ## 126       ID=84 reachability=0.36055513
    ## 127      ID=134 reachability=0.33166248
    ## 128       ID=73 reachability=0.36055513
    ## 129      ID=103        reachability=0.4
    ## 130      ID=126 reachability=0.38729833
    ## 131      ID=130 reachability=0.34641016
    ## 132       ID=65 reachability=0.42426407
    ## 133      ID=101 reachability=0.42426407
    ## 134      ID=120 reachability=0.43588989
    ## 135      ID=108 reachability=0.43588989
    ## 136      ID=131 reachability=0.26457513
    ## 137      ID=115 reachability=0.48989795
    ## 138       ID=63 reachability=0.48989795
    ## 139       ID=69 reachability=0.50990195
    ## 140       ID=88 reachability=0.26457513
    ## 141      ID=106 reachability=0.52915026
    ## 142      ID=123 reachability=0.26457513
    ## 143      ID=119 reachability=0.41231056
    ## 144      ID=136 reachability=0.53851648
    ## 145      ID=135 reachability=0.53851648
    ## 146      ID=109 reachability=0.55677644
    ## 147      ID=110 reachability=0.63245553
    ## 148      ID=107 reachability=0.73484692
    ## 149      ID=118 reachability=0.81853528
    ## 150      ID=132 reachability=0.41231056
    summary(res)
    ## 
    ##  An OPTICS results with minpts= 2 and epsilon= 1000 
    ## 
    ##  Five Point Summary of the Minimum Reachabilities:
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    ##   0.000   0.173   0.265   0.292   0.346   1.640       1 
    ## 
    ##  Stem and Leaf Display of the Minimum Reachabilities:
    ## 
    ##   The decimal point is 1 digit(s) to the left of the |
    ## 
    ##    0 | 0
    ##    1 | 00000444444444444444444444444447777777
    ##    2 | 00000022222222222244444444444466666666666888
    ##    3 | 0000000022222233333355555555566666666777999
    ##    4 | 011112244699
    ##    5 | 13446
    ##    6 | 235
    ##    7 | 3
    ##    8 | 2
    ##    9 | 
    ##   10 | 
    ##   11 | 
    ##   12 | 
    ##   13 | 
    ##   14 | 
    ##   15 | 
    ##   16 | 4
    plot(res,withlabels=TRUE)

    and a function for calculating and displaying the OPTICS cordillera.

    cres<-cordillera(iris[,1:4],minpts=2,epsilon=1000,scale=FALSE)
    cres
    ##      raw   normed 
    ## 14.96797  0.06125
    summary(cres)
    ## 
    ##   OPTICS cordillera values with minpts= 2 and epsilon= 1000 
    ## 
    ##    Raw OC: 14.97 
    ##    Normalization: 244.4 
    ##    Normed OC: 0.06125
    plot(cres)

    Optimization

    Since the inner optimization problem in STOPS models is hard and takes long, Rusch et al. (2015a) developed a metaheuristic for the outer optimization problem that needs typically less calls to the inner minimization than pso or SANN, albeit without the guarantees of convergence to a global minimum for non-smooth functions. It is an adaptation of the Luus-Jaakola random search (Luus & Jaakola 1973). It can used with the function ljoptim which modeled its output after optim. It needs as arguments x a starting value, fun a function to optimize, a lower and upper box constraint for the search region. By using the argument adaptive=TRUE or FALSE one can switch between our adaptive version and the original LJ algorithm. Accuracy of the optimization can be controlled with the maxit (maximum number of iterations), accd (terminates after the length of the search space is below this number ) and acc arguments (terminates if difference of two subsequent function values are below this value).

    We optimize a “Wild Function” with the non-adaptive LJ version (and numerical accuracies of at least 1e-16 for accd and acc).

    set.seed(210485)
    fwild <- function (x) 10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
    res2<-ljoptim(50, fwild,lower=-50,upper=50,adaptive=FALSE,accd=1e-16,acc=1e-16)
    res2
    ## $par
    ## [1] -15.82
    ## 
    ## $value
    ## [1] 67.47
    ## 
    ## $counts
    ## function gradient 
    ##      463       NA 
    ## 
    ## $convergence
    ## [1] 0
    ## 
    ## $message
    ## NULL
    plot(fwild, -50, 50, n = 1000, main = "ljoptim() minimising 'wild function'")
    points(res2$par,res2$value,col="red",pch=19)

    Procrustes Adjustment

    We also provide a procrustes adjustment to make two configurations visually comparable. The function is conf_adjust and takes two configurations conf1 the reference configuration and conf2 another configuration. It returns the adjusted versions

    conf_adjust(conf1,conf2) 

    References