<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">AJCM</journal-id><journal-title-group><journal-title>American Journal of Computational Mathematics</journal-title></journal-title-group><issn pub-type="epub">2161-1203</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ajcm.2012.21004</article-id><article-id pub-id-type="publisher-id">AJCM-17956</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  VATdt: Visual Assessment of Cluster Tendency Using Diagonal Tracing
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hu</surname><given-names>Yingkang</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>Department of Mathematical Sciences, Georgia Southern University, Statesboro, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>yhu@georgiasouthern.edu</email></corresp></author-notes><pub-date pub-type="epub"><day>21</day><month>03</month><year>2012</year></pub-date><volume>02</volume><issue>01</issue><fpage>27</fpage><lpage>41</lpage><history><date date-type="received"><day>31</day><month>12</month><year>2011</year></date><date date-type="rev-recd"><day>31</day><month>01</month><year>2012</year></date><date date-type="accepted"><day>10</day><month>02</month><year>2012</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  The visual assessment of tendency (VAT) technique for visually finding the number of meaningful clusters in data, developed by J. C. Bezdek, R. J. Hathaway and J. M. Huband, is very useful, but there is room for improvement. Instead of displaying the ordered dissimilarity matrix (ODM) as a 2D gray-level image for human interpretation, as is done by VAT, we trace the changes in dissimilarities along the diagonal of the ODM. This changes the 2D data structure (matrices) into 1D arrays, displayed as what we call the tendency curves, which enables one to concentrate on only one variable, namely the height. One of these curves, called the d-curve, clearly shows the existence of cluster structure as patterns in peaks and valleys, which can be caught not only by human eyes but also by the computer. Our numerical experiments showed that the computer can catch cluster structures from the d-curve even in some cases where human eyes see no structure in the visual outputs of VAT. Success on all numerical experiments was obtained using the same (fixed) set of program parameter values.
 
</p></abstract><kwd-group><kwd>Clustering; Dissimilarity Measures; Data Visualization; Clustering Tendency</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Clustering is the problem of partitioning a set of objects <img src="4-1100084\c452ca43-8f3d-4904-8d1e-0dc1addf0331.jpg" /> into c self-similar subsets (clusters) based on available data and some well-defined measure of similarity. The type of clusters found depends strongly on the mathematical model that underlies the clustering algorithm. All clustering algorithms will find any number (up to n) of clusters, even if no meaningful clusters exist. Therefore, before choosing a clustering method, one has to decide whether there are meaningful clusters and, if so, how many there are. This is called assessing the clustering tendency.</p><p>Numerous formal (statistics-based) and informal techniques for such assessment are discussed in Jain and Dubes [<xref ref-type="bibr" rid="scirp.17956-ref1">1</xref>] and Everitt [<xref ref-type="bibr" rid="scirp.17956-ref2">2</xref>]. None of these existing methods is totally satisfactory, nor will they ever be. Visual approaches for assessing clustering tendency have been widely studied in the last few decades; Tukey [<xref ref-type="bibr" rid="scirp.17956-ref3">3</xref>] and Cleveland [<xref ref-type="bibr" rid="scirp.17956-ref4">4</xref>] are standard references for visual approaches in various data analysis problems. 
Recently the research on the visual assessment of tendency (VAT) technique has been quite active; see the original VAT paper by Bezdek and Hathaway [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>], also see VATr by Bezdek, Hathaway and Huband [<xref ref-type="bibr" rid="scirp.17956-ref6">6</xref>], sVAT by Hathaway, Bezdek and Huband [<xref ref-type="bibr" rid="scirp.17956-ref7">7</xref>], and reVAT and bigVAT by Huband, Bezdek and Hathaway [8,9].</p><p>The object set O is usually represented in the following two ways. When each object <img src="4-1100084\43495da3-2885-437c-ab21-c01334736c1f.jpg" /> is represented by a vector<img src="4-1100084\6e365cc7-4507-411e-836d-3de634737bdb.jpg" />, the set <img src="4-1100084\4182ee41-d302-4a4d-80a1-3b1a51a5bc7b.jpg" /> is called an object data representation of O. The s components of <img src="4-1100084\0d0cbb82-c696-4ef6-beda-49da4cac06cd.jpg" /> represent the s features of the object<img src="4-1100084\3446c96d-7f00-41dd-87af-0f3d784303a6.jpg" />. It is in this feature space that people sometimes seek descriptors of the clusters, cluster centers or prototypes, as they are called. Alternatively, when each pair of objects in O is represented by a relationship, it is called relational data. Most of the time, the relationship between <img src="4-1100084\728da6c1-c3e4-48f9-b2e9-e39486661264.jpg" /> and <img src="4-1100084\7a9c840f-c779-4479-ae12-1cb0b4943f94.jpg" /> is given by their dissimilarity <img src="4-1100084\56aec500-08fe-4909-916a-06a362b65f2e.jpg" /> (a distance or some other measure; see [10,11]). These <img src="4-1100084\b8ac05a8-13f8-4d74-aacd-0a644b72089c.jpg" /> data items form a symmetric matrix <img src="4-1100084\a69e7265-8171-416a-b4aa-ea5d3a3ce7e2.jpg" /></p><p>Our method, which we call VATdt, standing for Visual Assessment of cluster Tendency using diagonal tracing, replaces the visual output of the VAT algorithms (the original one or its variations). 
VAT applies directly to a dissimilarity matrix. If the original data consist of a (symmetric) matrix of pair-wise similarities <img src="4-1100084\e53d2da5-fa27-4249-9e9e-483b330f9afe.jpg" /> then a dissimilarity matrix R can be obtained through a simple transformation such as</p><p><img src="4-1100084\8aa94b17-546a-425d-9262-68c3bb54a2e3.jpg" /> where <img src="4-1100084\27b1f56c-3c59-4228-b894-03544521291b.jpg" /> denotes the largest similarity value. If the original data are represented by object data <img src="4-1100084\b4beaea9-800b-4bd7-9858-9915c60921eb.jpg" />, then <img src="4-1100084\126aa675-8fea-4bed-ba77-1f374820774a.jpg" /> can be computed as the distance between <img src="4-1100084\d70bc8ad-deb6-4e12-b604-7fd58209db4e.jpg" /> and <img src="4-1100084\29cb0524-538e-4b6c-838a-18231eab20f8.jpg" /> measured by some norm or metric in the feature space <img src="4-1100084\ec28b047-9618-4a13-875a-3878c2b9de98.jpg" /> Hence the VAT algorithms can always be applied, and so can our VATdt algorithm. They are applicable even if some components of the original data are missing; see [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>] and the references therein. In this paper, if the data are given as object data X, the dissimilarity matrix R will be given by the square root of the Euclidean norm of <img src="4-1100084\a14bbd8e-79f8-423c-a6cb-e173c1d4909c.jpg" /></p><disp-formula id="scirp.17956-formula89806"><label>(1)</label><graphic position="anchor" xlink:href="4-1100084\1fa2f400-14f1-431a-81d1-f7b20f14e174.jpg"  xlink:type="simple"/></disp-formula><p><img src="4-1100084\2361d599-7c18-4b70-9753-7b3b3ea73c2d.jpg" /> VAT reorders the points in a data set so that points that are close to one another in the feature space will generally have similar indices (see the example below). 
Some versions, such as sVAT [<xref ref-type="bibr" rid="scirp.17956-ref7">7</xref>], reduce the size of R by choosing a subset of the original set O. Their numeric output is an ordered dissimilarity matrix (ODM). We will still use the letter R for the ODM. This will not cause confusion since it is the only information on the data we are going to use. The ODM satisfies</p><p><img src="4-1100084\28336a49-5a90-4d29-b1bf-178bd791b0ca.jpg" /></p><p>The largest element of R is 1 because the VAT algorithms scale the elements of R.</p><p>VAT displays the ODM on the screen in a straightforward way, as an ordered dissimilarity image (ODI). In the ODI, the gray level <img src="4-1100084\803c4df6-c143-491b-950e-c5d60e1313e6.jpg" /> of pixel (i,j) is proportional to the value of <img src="4-1100084\cce6c7a3-33f5-4418-8d20-cb3d3ebdc3c2.jpg" /> with <img src="4-1100084\446a493a-9701-4b2a-95ec-1da258df6d1c.jpg" /> (pure black) if <img src="4-1100084\d3361717-28c8-4fb2-9376-df0caa8f7a7c.jpg" /> and <img src="4-1100084\d66a30d6-71ba-40d6-9441-3615c83b41b3.jpg" /> (pure white) if <img src="4-1100084\c80f540c-93c8-4e6f-b3ed-2abbe8503267.jpg" />. The idea of VAT is shown in the following example.</p><p>Example 1. A data set <img src="4-1100084\a625f931-1ed6-4015-ad94-b2fae75004ac.jpg" /> of 20 points containing three well-defined clusters is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. As is likely in applications, the points in each of the clusters are not indexed together. <xref ref-type="fig" rid="fig2">Figure 2</xref>(a) shows the original (random) order of the points in X, with <img src="4-1100084\fe997180-95d0-4560-b57d-469d23cf95be.jpg" /> represented by a diamond. The corresponding ODI in <xref ref-type="fig" rid="fig2">Figure 2</xref>(b) shows no useful visual information about the structure in X. The VAT technique can reorder the points in X so that nearby points are (generally) indexed closely. 
<xref ref-type="fig" rid="fig3">Figure 3</xref>(a) shows the new order of the data set X, with the diamond in the lower left corner representing the first point in the ordered data set. <xref ref-type="fig" rid="fig3">Figure 3</xref>(b) gives the corresponding ODI. Now the three clusters are represented by the three well-formed black blocks.</p><p>The VAT algorithms are certainly very useful, but there is room for improvement. It seems to us that our eyes are not very sensitive to structures in gray-level images. One example is given in <xref ref-type="fig" rid="fig4">Figure 4</xref>. There are three clusters in the data, as we will show later. The clusters are not well separated, and the ODI from VAT reveals almost no sign of the existence of the structure.</p><p>The approach of this paper is to trace changes in dissimilarities along the diagonal of the ODM, the numeric output of VAT that underlies its visual output, the ODI. This will result in what we call the tendency curves. The borders of clusters in the ODM (or blocks in the ODI) are reflected as certain patterns in peaks and valleys on the tendency curves. To be exact, we will actually use only one of these curves, called the d-curve, which is the difference of two other curves. The patterns on the d-curve can be caught not only by human eyes but also by the computer. It seems that the computer is more sensitive to these patterns than human eyes are to them, or to the gray-level patterns in the ODI. For example, the computer caught three clusters in the data set that produced the virtually useless ODI in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>Remark: The positions of the patterns on the tendency curves only roughly match the block borders in the ODI, and the sizes of these blocks do not closely approximate the sizes of clusters in the data, either. 
This is because the VAT algorithms tend to index each cluster’s most outlying points at the very end, after all the denser cluster cores are indexed. Whenever we say in this paper “catch clusters/blocks”, we mean the program reveals the existence of clusters/blocks. The sizes and members (or memberships) will have to be found by a clustering method, not by a tendency algorithm such as ours.</p><p>We will describe our method in detail in &#167;2 below, give numerical examples in &#167;3, and conclude the paper with discussions and future plans in the last section.</p></sec><sec id="s2"><title>2. Visual Assessment of Cluster Tendency Using Diagonal Tracing</title><p>We try to catch possible diagonal blocks in the ordered dissimilarity matrix R, the numeric output of VAT. We do so by using various averages of dissimilarities, which are stored as vectors and displayed as curves. The goal is to catch the borders of black blocks in an ODI such as <xref ref-type="fig" rid="fig3">Figure 3</xref>(b). Imagine that a horizontal line segment running from the left edge of the ODI to the diagonal (exclusive) moves down. The line segment is dark when it is inside the first block, and becomes light once it gets out of the block. If the clusters are well separated, the change in the darkness should be large enough to catch. We use the row-average, which is the average of the elements to the left of the diagonal in a row of the ODM, to represent (the darkness of) the line segment. We call its graph (versus the row number) the r-curve. The darker the line segment, the smaller the row-average, and thus the lower the r-curve. When the line segment crosses a border, the r-curve should first show a peak because the numbers to the left of the diagonal element <img src="4-1100084\2b83b6cb-ee39-4ada-ac54-e4a6067769ca.jpg" /> will suddenly increase. It should drop back down rather quickly when the line moves well inside the next black block. 
Therefore a border between two blocks should induce a peak on the curve. There is a complication, though: we cannot keep using all the elements from the very left edge to the diagonal. This is because, from the second block down, the beginning part of the line, which is rather light, would drag down its average darkness and decrease the change in its darkness when the line goes across another possible border. In terms of the graph, the r-curve would become flatter and flatter, and its peaks lower and lower, when moving to the right, and thus harder and harder for the program to catch. Our way to solve this problem is to restrict ourselves to a subdiagonal band of width w, called the w-band, as shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>. That is, we cap by w the number of elements in the average. To be exact, we define the i-th element of the r-curve (the i-th row-average) as</p><disp-formula id="scirp.17956-formula89807"><label>(2)</label><graphic position="anchor" xlink:href="4-1100084\7c4f82da-580a-47bf-b361-65f70fbeb521.jpg"  xlink:type="simple"/></disp-formula><p>where <img src="4-1100084\d7eb65f5-0295-4fae-85d9-fe23e2370f8c.jpg" /> This is the average of the elements of row i in the w-band shown in <xref ref-type="fig" rid="fig5">Figure 5</xref> below.</p><p>When the situation is less than ideal, there will be noise, sometimes very “loud” noise, on the r-curve, which may destroy possible patterns on it. 
To overcome this, we extend the idea of averaging to more rows, which leads to the m-curve, whose i-th element is the average of all elements <img src="4-1100084\e63c39a3-0d35-4f8a-a23b-119930e7b2d0.jpg" /> such that</p><p><img src="4-1100084\d8130484-6986-4fb8-9dd3-7d1afa63f715.jpg" /></p><p>and</p><p><img src="4-1100084\871b64c6-daf8-4169-9faf-fda433d15af6.jpg" /></p><p>These are the elements in up to m rows above row i, inclusive, that fall in the w-band, corresponding to the region between the two horizontal line segments in <xref ref-type="fig" rid="fig5">Figure 5</xref>.</p><p>The m-curve often reveals the pattern beneath the noisy r-curve. Since the ODM is scaled so that <img src="4-1100084\de4e47e9-61b7-4713-8c62-022088035a71.jpg" /> <img src="4-1100084\a2ddbdfc-d8c1-43c0-bfca-4f46cc84f742.jpg" /> the heights of peaks on the m-curve remain roughly the same from case to case, provided that the clusters are well formed.</p><p>But again there are less-than-ideal situations, in which there are outliers. The VAT algorithms tend to order outliers near the end, so the m-curve tends to move up toward the right, which is fine for human eyes but makes it hard for the program to identify peaks and valleys using thresholds. This is why we introduce the M-row moving average, called the M-curve. The M-curve is defined in the same way as the m-curve except with m replaced by M. The M-curve shows long-term trends of the r-curve. We are, however, not interested in the M-curve itself. We use the M-curve to “correct”, or to level, the m-curve, by subtracting the former from the latter. It is the difference of the m- and M-curves, which we call the d-curve, that we are interested in. The d-curve retains the shape of the m-curve but is more horizontal, basically lying on the horizontal axis. Furthermore, the M-curve changes more slowly than the m-curve; thus, when moving from one block into another block in the ODM, it will tend to be lower than the m-curve. 
As a result, the d-curve will show a valley, most likely below the horizontal axis, after a peak. It is the peak-valley, or high-low, patterns on the d-curve that signal the existence of cluster structures. This will become clear in our examples in the section that follows.</p><p>Although the d-curve is the only curve we really need, we will also show other tendency curves, that is, the r-, m- and M-curves, in the first few examples to show the reader how the idea evolved from an intuitive r-curve to the final, rather technical, d-curve.</p><p>Remark: It may seem much more natural to define the i-th element of the r-curve as the average of all <img src="4-1100084\0da98226-448e-4797-96b5-e4895972a984.jpg" /> such that the object <img src="4-1100084\88c12330-7681-47c8-9876-60016628dee6.jpg" /> is in the same cluster as <img src="4-1100084\ffc68823-31dc-447f-9e52-48622249d58e.jpg" /> and <img src="4-1100084\40a8c0d8-8c2f-450f-9621-77e62814e13b.jpg" /> Actually this is what we tried at the very beginning of this work. More precisely, we set <img src="4-1100084\63969c5d-72cd-4efa-ba2a-f532ac8aca2d.jpg" /> in definition (2) at the beginning of the calculation, and once we believed we had found a new cluster, we reset <img src="4-1100084\0d3fbf26-0f5e-4f7b-a504-bae6f0e83734.jpg" /> to the index of the element we believed to be the first one in the new cluster. There were several problems. First, neither the VAT algorithms nor our program can accurately locate the borders of clusters in terms of the index values. Second, any possible patterns obtained that way were self-fulfilling: once we reset <img src="4-1100084\6b3058ee-2d8d-4ee2-9ea3-d7f0739e43c3.jpg" /> all curves went back to zero, and then it would look like there was indeed a new cluster. It would literally tear the tendency curves apart, and distort all possible high-low patterns.</p></sec><sec id="s3"><title>3. 
Numerical Examples</title><p>In all the examples in this paper, we will use the values</p><disp-formula id="scirp.17956-formula89808"><label>(3)</label><graphic position="anchor" xlink:href="4-1100084\a23f48fa-1abe-4479-bbd2-0a3e39e748d0.jpg"  xlink:type="simple"/></disp-formula><p>where n is the number of objects in the data set. Here the ceiling function is used for m so that it is at least 1 even if n is very small. And these are the values we recommend to possible users of our algorithm when there is no clear reason to change them. Discussion on how the values of these, and two other, parameters were chosen can be found later in the section.</p><p>We first give one group of examples in <img src="4-1100084\1bed8ce2-26e2-49fb-ae6c-7634862b0474.jpg" /> so that we can use their scatterplots to show how well/poorly the clusters are separated. We also give the visual outputs (ODIs) of VAT for comparison. These sets are generated by choosing <img src="4-1100084\f1600d2a-927a-420b-8fdc-256a3036e615.jpg" /> 8, 4, 3, 2, 1 and 0 in the following settings: 2000 points are generated in three groups from multivariate normal distribution having mean vectors</p><p><img src="4-1100084\478b3624-5937-4d68-a123-fc9ebee2f24b.jpg" /><img src="4-1100084\8089fa8c-f562-4523-bcc7-de2b1b619779.jpg" />and</p><p><img src="4-1100084\6a6c0926-69e8-4c98-b15b-7bd89f112b23.jpg" />The probabilities for a point to fall into each of the three groups are 0.35, 0.4 and 0.25, respectively. 
The covariance matrices for all three groups are <img src="4-1100084\0ca46d53-d6e8-41ab-9dee-d28d0899a92c.jpg" /> Note that <img src="4-1100084\beeb0fcf-f0d8-459f-971c-f7812a41cca7.jpg" /> <img src="4-1100084\e709b5f5-6300-44b7-86b8-95b41c191f9f.jpg" /> and <img src="4-1100084\14a0b9b6-09dd-44e6-9468-1b04070df513.jpg" /> form an equilateral triangle of side length <img src="4-1100084\554c11b9-685c-4815-a315-20dfe91205a6.jpg" /></p><p>The pictures for <img src="4-1100084\bb03ab1f-73dc-4fd5-afea-9e53450ef690.jpg" /> (<xref ref-type="fig" rid="fig6">Figure 6</xref>) show what we should look for on the curves. The clusters are very well separated, and the ODI has three black blocks on the diagonal with sharp borders. Our r-curve (the one with “noise”) has two vertical rises and the m-curve (the solid curve going through the r-curve where it is relatively flat) has two peaks, corresponding to the two block borders in the ODI. The M-curve, the smoother, dash-dotted curve, is only interesting in its relative position with respect to the m-curve. That is, it is only useful in generating the d-curve, the difference of these two curves. The d-curve looks almost identical to the m-curve, also having two peaks and two valleys. The major difference is that it is in the lower part of the figure, around the horizontal axis.</p><p><xref ref-type="fig" rid="fig7">Figure 7</xref> shows the case <img src="4-1100084\c908a729-315b-4c63-9275-a8027cca274f.jpg" /> The clusters are less separated than the case <img src="4-1100084\c238e6cd-9e74-4ac0-805f-dabb2fe008ce.jpg" /> and the slopes of the tendency curves are smaller. There are still two vertical rises on the r-curve, and two peaks followed by two valleys on all other curves where the block borders are in the ODI in part (b). What is really different here from the case <img src="4-1100084\95766ced-7f72-4243-98fc-fa3d08702561.jpg" /> is the wild oscillations near the end of the r-curve, bringing up all other three curves. 
This corresponds to the small region in the lower-right corner of the ODI, where no pattern is visible. Note that no valley follows the third rise or peak. This is understandable because a valley appears only when the curve index (the horizontal variable of the graphs) runs into a cluster, shown as a block in the ODI.</p><p>Now we know what we should look for: peaks followed by valleys, or high-low patterns, on the r- and d-curves. Later on we will show that even the r-curve is not good enough and only the d-curve will do the job.</p><p>The case <img src="4-1100084\5ea77f73-506e-4431-8318-f2051470a66e.jpg" /> is given in <xref ref-type="fig" rid="fig8">Figure 8</xref>. One can still easily make out three clusters in the scatterplot, but it is hard to say to which cluster many points in the middle belong. It is expected that every visual method will have difficulties with them, as evidenced by the lower right corner of the ODI and the oscillations on the last one-fifth of the r-curve. The oscillations bring up the m- and M-curves, but not the d-curve. The d-curve remains almost the same as those in the two previous cases, except that the third peak becomes larger and decreases moderately near the end, without forming a valley. The two high-low patterns on the m- and d-curves show the existence of three clusters. As we said earlier, it is a valley on the m-curve and, especially, the d-curve that signals the beginning of a new cluster.</p><p>Note that the m-curve goes up with wild oscillations so much in <xref ref-type="fig" rid="fig8">Figure 8</xref>(c) that its right end rises higher than any peak on it, which makes it hard for the computer to catch the high-low patterns, at least with thresholds. That is why we introduced the more technical d-curve to replace the intuitive m-curve, which had earlier replaced the more intuitive but often noisy r-curve. 
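The definitions of Section 2 can be summarized in a short sketch. It is an illustration under stated assumptions, not the program used for the experiments in this paper: the names band_row_sums, moving_band_average and tendency_curves are hypothetical, the ODM R is taken to be a list of rows with 0-based indices, the first element of every curve is set to 0 because row 0 has no entries to the left of the diagonal, and "up to m rows above row i, inclusive" is read as rows i-m+1 through i. The values of m, M and w are left as arguments.
<preformat><![CDATA[
```python
def band_row_sums(R, w):
    """Sum and count of the entries of each row that lie in the w-band."""
    n = len(R)
    sums, counts = [0.0] * n, [0] * n
    for i in range(1, n):
        lo = max(i - w, 0)            # cap the row segment at w entries
        sums[i] = sum(R[i][lo:i])     # entries strictly left of the diagonal
        counts[i] = i - lo
    return sums, counts


def moving_band_average(sums, counts, rows):
    """Pooled average of the band entries in the last `rows` rows up to row i."""
    out = []
    for i in range(len(sums)):
        lo = max(i - rows + 1, 0)     # assumption: the window includes row i
        total = sum(counts[lo:i + 1])
        out.append(sum(sums[lo:i + 1]) / total if total else 0.0)
    return out


def tendency_curves(R, m, M, w):
    """Return the r-, m-, M- and d-curves of an ordered dissimilarity matrix R."""
    sums, counts = band_row_sums(R, w)
    r_curve = moving_band_average(sums, counts, 1)   # one row at a time
    m_curve = moving_band_average(sums, counts, m)   # short moving average
    M_curve = moving_band_average(sums, counts, M)   # long moving average
    d_curve = [a - b for a, b in zip(m_curve, M_curve)]
    return r_curve, m_curve, M_curve, d_curve
```
]]></preformat>
Under these assumptions, a call such as tendency_curves(R, m, M, w) returns all four curves, and only the last one, the d-curve, is needed for detection. 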
The d-curve remains mostly level, close to the horizontal axis, thanks to the compensation it gets from the M-curve. Also, unlike the other three curves, its values never get too high or too low, which enables us to catch the high-lows easily. As a consequence, our VATdt algorithm uses only the d-curve to assess the tendency. We will display only the d-curve in the remaining part of the paper to show the reader a cleaner view, although we ourselves often find the r- and m-curves visually informative.</p><p>We use two thresholds to detect high-lows. When the d-curve hits a ceiling, set at 0.04, and then a floor, set at 0, the program reports one new cluster. These ceiling and floor values work for all cases in our numerical experiments, even those not reported here, where the clusters are reasonably, sometimes only barely, separated.</p><p>If we lowered the ceiling and raised the floor, we would be able to catch some of the blended clusters we know we have missed, but it would also increase the chance of “catching” false clusters. We are not saying these values are the best. Any values are arguable, as arguable as the number of clusters is when the clusters are blended. We do not like the idea of tuning parameters to particular examples, and will stick to the same ceiling and floor values throughout this paper. In fact, we will stick to the same set of values for all parameters in our program, that is, the values for the ceiling and floor set here, and those for m, M and w given in (3).</p><p>The situation in the case <img src="4-1100084\a639294e-ae72-4ba3-bf25-af29f2c33a8b.jpg" />, shown in <xref ref-type="fig" rid="fig9">Figure 9</xref>, really deteriorates. One can barely make out the three clusters in part (a) that are supposed to be there; the ODI in part (b) is a mess. In fact, this is the same ODI as the one in <xref ref-type="fig" rid="fig4">Figure 4</xref>, put here again for side-by-side comparison with the scatterplot and the d-curve. 
The d-curve, however, picks up cluster structure from the ODM. It has several high-lows, with two of them large enough to hit both the ceiling and floor, whose peaks are near the 600 and 1000 marks on the horizontal axis, respectively. This example shows that our tendency curves are more sensitive than the raw block structure in the 2D display ODI. The largest advantage of the tendency curves is probably the quantization of gray-level patterns, which enables the computer, not only human eyes, to catch possible patterns.</p><p>One may question how many clusters this data set truly has, but that depends on what one means by “truly”. This may be subjective. We see three clusters in <xref ref-type="fig" rid="fig9">Figure 9</xref>(a); if one sees only one cluster there, one may want to tune down the sensitivity of the program by raising its ceiling value and lowering its floor value. We are only saying that our program can be sensitive enough to “see” three clusters in this case.</p><p>When <img src="4-1100084\9286ad72-b894-4de5-9845-1d5372e0abde.jpg" /> goes down to zero, the cluster structure disappears. The scatterplots for <img src="4-1100084\473adc6c-16dc-4736-8ded-676a97cd9e18.jpg" /> (<xref ref-type="fig" rid="fig1">Figure 1</xref>0(a)) and <img src="4-1100084\f5c787cc-8f06-413d-80af-e752b7ed7f51.jpg" /> (not shown) are almost identical, showing a single cluster in the center. The d-curves for both cases (Figures 10(b) and (c)) have no high-lows large enough to hit the ceiling and then the floor, which is the way they should be.</p><p>We now show that, without modifying any parameter values, our VATdt algorithm works on small data sets, too. The data sets in this group of examples are similar to those in Figures 6-10 of Bezdek and Hathaway [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>]. These are data sets in <img src="4-1100084\1dfe84fb-d14c-4c78-aa4f-712772591d74.jpg" /> generated in the same way as the examples in the first group. 
A total of 120 observations were generated in four groups from multivariate normal distributions having mean vectors <img src="4-1100084\b84ca9af-4c12-4bea-8da9-fff4b7c41064.jpg" /> <img src="4-1100084\a1d0ab0d-6406-4ebe-859f-a2e2b627cd2e.jpg" /> where <img src="4-1100084\a395b776-b007-4e07-a682-4c044bc96d9a.jpg" /> are the unit axis vectors. A point has an equal chance of falling into each of the four groups, that is, the probability is 0.25 for every group. The covariance matrices are all <img src="4-1100084\52acf28e-e9f9-4a3f-9de8-b176ca243e52.jpg" /> Note the distances between cluster centers are still <img src="4-1100084\c8768ace-14da-42bb-a590-71454c45748d.jpg" /> the same as in the previous examples. The appearances of the ODIs and the d-curves are similar to those in <img src="4-1100084\589f8f22-c241-443e-90b0-aeddc00b3321.jpg" /> and so is the way they deteriorate as the value of <img src="4-1100084\e27ae6ec-8aea-43ba-9e03-b1555c98b51e.jpg" /> decreases. The d-curves for <img src="4-1100084\2a983b0d-af7a-43f7-bb7b-7839b1712e0e.jpg" /> 4 and 3 are given in <xref ref-type="fig" rid="fig1">Figure 1</xref>1. There are three clear high-lows on each of them, revealing the existence of four clusters, which we think is appropriate since the <img src="4-1100084\d5840b4e-074f-40d0-93f2-b6aa5b7c395c.jpg" /> values are relatively large.</p><p><xref ref-type="fig" rid="fig1">Figure 1</xref>2 gives another example in which the ODI from VAT (in part (a)) fails to show any useful visual information on the structure, but our program identified three high-lows (between the 40 and 80 marks), or four clusters, from the d-curve.</p><p>Remark: We remind the reader that the positions of the peaks and valleys do not reflect the sizes of the clusters closely unless the clusters are very well separated.</p><p>Does our method always say what it should say? Well, there is not, and there will never be, an infallible method to determine the number of clusters. 
In many cases, there are no right or wrong answers; it all depends on what one means by “should”. The data set used in <xref ref-type="fig" rid="fig1">Figure 1</xref>3 was generated the same way as that in Figures 10(a) and (b) where <img src="4-1100084\11e5bd50-feb1-4f41-85fe-2b7526523474.jpg" /> except that it contains only 100 observations. So there “should” be only a single cluster. But both the ODI from VAT and our d-curve show some structure, and our program “caught” three clusters. If one compares the scatterplots in Figures 10 and 13, one finds that there is a single well-shaped cluster in <xref ref-type="fig" rid="fig1">Figure 1</xref>0 while there are only scattered points in <xref ref-type="fig" rid="fig1">Figure 1</xref>3. If the number of points is small, there is a difference between what a random generator is intended to generate and what it actually generates.</p><p>We now give two examples where the points are regularly arranged, on a rectangular grid and along a pair of concentric circles, respectively. These are similar to the data sets in Figures 12 and 13 of Bezdek and Hathaway [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>]. These examples show that the d-curve works better on these rows and rings than the ODI, which does not contain black blocks anymore. In <xref ref-type="fig" rid="fig1">Figure 1</xref>4, 32 points are equally spaced on each of the 8 lines, with the distance between two consecutive points on the same line equal to 0.05, and that between two lines 0.4. The ODI in part (b) has a periodic nature, but no blocks. Bezdek and Hathaway [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>] conclude from the ODI generated from a similar data set that “it is reasonable to conjecture that the underlying data fall into 8 very regular clusters”. 
The d-curve shown in part (c) is almost sinusoidal, with the highs and lows far beyond the ceiling and floor, strongly indicating the existence of 8 clusters (which are rows).</p><p>Remark: Our program works just as well on the original example in [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>]. We made the changes here so that the data look more like 8 clusters than a single rectangular cluster.</p><p>In <xref ref-type="fig" rid="fig1">Figure 15</xref>, 64 points are equally distributed on a circle of radius 0.45 centered at (0.5, 0.5), and another 64 on a concentric circle of radius 0.25. This ODI does not contain black blocks either, only a block form; see <xref ref-type="fig" rid="fig1">Figure 15</xref>(b). Based on this <img src="4-1100084\c458c2df-5523-4c1e-9c62-3b09db968168.jpg" /> block form Bezdek and Hathaway [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>] infer that the data consist of two similar, regular structures. Our d-curve in <xref ref-type="fig" rid="fig1">Figure 15</xref>(c) has a single high-low pattern on it, and the program reported the existence of two clusters.</p><p>It is almost a sacred ritual that everybody tries the Iris data in a paper on clustering, so we also tried our program on it. It is well known that the data consist of values of four features of each of 150 irises (150 points in a four-dimensional feature space). These irises are of three different physical types, 50 of each type, so the data have three physically labeled classes. But two of the three flower types yield data points that largely overlap in this particular feature space, so many argue that the unlabeled data are naturally clustered into two geometrically well-defined clusters; see [<xref ref-type="bibr" rid="scirp.17956-ref5">5</xref>]. 
The d-curve our program produced is given in <xref ref-type="fig" rid="fig1">Figure 16</xref>(b), and the computer caught the large high-low on the left and ignored the small one on the right, and reported the existence of two clusters. Once again one may argue about the correctness of the program ignoring the smaller high-low (and thus about the choice of the ceiling value), just as one can argue about the “correct” number of clusters in the Iris data.</p><p>We conclude this section with some comments on the choice of the parameter values of the program. Since enough has been said about the floor and ceiling values, here we only discuss the values of m, M and w. We arrived at the values in (3) through experiments. First, we want as small a value for m as possible so that relatively small clusters will not get lost in the averaging process. But if it gets too small, the m-curve gets noisier and noisier and eventually falls back to the r-curve. We also want it to be a percentage of n so that we do not have to change it to suit data sets of different sizes. Five percent is the smallest we dare go (n often gets below 100, and then we are only looking at the average of a few rows), and it works very well. The performance of the program is not sensitive at all to changes in M. As long as it is several times larger than m, we did not see much difference. The value of w makes a difference only occasionally, and, when it does, only marginally. We tried values from <img src="4-1100084\a7ee0b57-b74e-4916-b525-ad5a02b1b031.jpg" /> to 5n, and all of them worked fine. All in all, the value <img src="4-1100084\82018c71-154d-4454-9f54-7def1aa07fa3.jpg" /> worked best, but the difference was insignificant. 
Thus we decided that a single set of parameter values could successfully be used for all cases, which is a rare situation for clustering procedures involving user-selected parameter values.</p><p>One scenario in which we foresee the need to change parameters is when the ratios of the cluster sizes in a data set are so large that (relatively) small clusters get lost in the averaging, causing the d-curve valleys to be too shallow to hit the floor. One will then need to decrease the values of m and w, which may help form larger high-low patterns on the d-curve. We would feel comfortable adjusting the values of the ceiling and floor if there are “clean” high-low patterns on the d-curve, that is, if there are not many zigzags when the curve goes up and down. When changing parameter values, we recommend that the user look at the r-curve, too. One should feel more confident if the r-curve does not show too much noise.</p></sec><sec id="s4"><title>4. Conclusions</title><p>Our VATdt algorithm is meant to replace the straightforward visual displaying part of the VAT algorithms mentioned in the second paragraph of &#167;1. For that matter, it can start from an ordered dissimilarity matrix produced by any algorithm of that kind. Instead of displaying the matrix as a 2-dimensional gray-level image (ODI) for human interpretation, VATdt analyzes the matrix by taking averages of various kinds along its diagonal and produces the tendency curves, the most useful of which is the d-curve. This changes 2D data (a matrix) into a 1D array, which is certainly easier for both human eyes and the computer to interpret, since attention is now focused on only one variable: the height.</p><p>Possible cluster structure is reflected as high-low patterns on the d-curve with a relatively uniform range that enables the computer to catch them with thresholds. The values of thresholds may be arguable, but no more so than the “right” number of clusters that exist in a given data set. 
For example, some see only a single cluster in <xref ref-type="fig" rid="fig9">Figure 9</xref>(a) while we see three. Our experiments show that the computer is more sensitive to high-low patterns on the d-curve than human eyes are to patterns in 2D gray-level images.</p><p>We are truly encouraged by the two examples in Figures 14 and 15 where the ODI images do not have blocks but the d-curve still did the job nicely. An ODI shows blocks only if the data set contains (elliptical) disk-shaped clusters in 2-dimensional feature space, or ellipsoid- or ball-shaped clusters in feature spaces of higher dimensions. Clusters of other shapes show different patterns in the ODI, whose meaning one can only guess. Our d-curve, however, clearly shows the cluster structures in both cases, where chain-shaped clusters exist.</p><p>We plan to further investigate and improve the VATdt algorithm, experimenting with it on clusters of different shapes, even with mixed shapes in the same data set. We are also interested in using different metrics, and even dissimilarities that are not metrics. We are mainly interested in cases where structures in the ODM exist but the ODI does not show them clearly, at least not in black blocks.</p></sec><sec id="s5"><title>5. Acknowledgements</title><p>The author would like to thank Professor Richard J. Hathaway for his numerous helpful thoughts and suggestions, including the final title of this paper.</p></sec><sec id="s6"><title>REFERENCES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.17956-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">A. K. Jain and R. C. Dubes, “Algorithms for Clustering Data,” Prentice-Hall, Englewood Cliffs, 1988.</mixed-citation></ref><ref id="scirp.17956-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">B. S. 
Everitt, “Graphical Techniques for Multivariate Data,” Elsevier, New York, 1978.</mixed-citation></ref><ref id="scirp.17956-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">J. W. Tukey, “Exploratory Data Analysis,” Addison-Wesley, Reading, 1977.</mixed-citation></ref><ref id="scirp.17956-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">W. S. Cleveland, “Visualizing Data,” Hobart Press, Summit, 1993.</mixed-citation></ref><ref id="scirp.17956-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">J. C. Bezdek and R. J. Hathaway, “VAT: A Tool for Visual Assessment of (Cluster) Tendency,” Proceedings of the 2002 International Joint Conference on Neural Networks, Honolulu, 12-17 May 2002, pp. 2225-2230.</mixed-citation></ref><ref id="scirp.17956-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">J. C. Bezdek, R. J. Hathaway and J. M. Huband, “Visual Assessment of Clustering Tendency for Rectangular Dissimilarity Matrices,” IEEE Transactions on Fuzzy Systems, Vol. 15, No. 5, 2007, pp. 890-903. 
doi:10.1109/TFUZZ.2006.889956</mixed-citation></ref><ref id="scirp.17956-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">R. J. Hathaway, J. C. Bezdek and J. M. Huband, “Scalable Visual Assessment of Cluster Tendency for Large Data Sets,” Pattern Recognition, Vol. 39, No. 7, 2006, pp. 1315-1324. doi:10.1016/j.patcog.2006.02.011</mixed-citation></ref><ref id="scirp.17956-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">J. M. Huband, J. C. Bezdek and R. J. Hathaway, “Revised Visual Assessment of (Cluster) Tendency (reVAT),” Proceedings of the North American Fuzzy Information Processing Society (NAFIPS), Banff, 27-30 June 2004, pp. 101-104.</mixed-citation></ref><ref id="scirp.17956-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">J. M. Huband, J. C. Bezdek and R. J. Hathaway, “bigVAT: Visual Assessment of Cluster Tendency for Large Data Set,” Pattern Recognition, Vol. 38, No. 11, 2005, pp. 1875-1886. doi:10.1016/j.patcog.2005.03.018</mixed-citation></ref><ref id="scirp.17956-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">I. Borg and J. Lingoes, “Multidimensional Similarity Structure Analysis,” Springer-Verlag, New York, 1987. 
doi:10.1007/978-1-4612-4768-5</mixed-citation></ref><ref id="scirp.17956-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">M. Kendall and J. D. Gibbons, “Rank Correlation Methods,” Oxford University Press, New York, 1990.</mixed-citation></ref></ref-list></back></article>