# I. INTRODUCTION

The generalization of principal component analysis (PCA) is an important research theme in symbolic data analysis [1][2][3][4]. The main purpose of traditional PCA is to transform a number of possibly correlated variables into a small number of uncorrelated variables called principal components. Chouakria [5] proposed an extension of PCA to interval data called vertices principal component analysis (V-PCA). Chouakria et al. [6] also proposed the centers method of PCA (C-PCA) for interval data and presented a comparative example of V-PCA and C-PCA. Lauro and Palumbo [7] proposed symbolic object principal component analysis (SO-PCA) as an extension of PCA to any numerical data structure. Lauro et al. [8] summarize various methods of SO-PCA for interval data. The author also proposed a general "Symbolic PCA" (S-PCA) based on a quantification method using the generalized Minkowski metrics [9,10]. In that approach, we first transform the given symbolic data table into a standard numerical data table, and then execute traditional PCA on the transformed table.

In this article, another quantification method for symbolic data tables, based on the monotone structures of objects, is presented. In Section 2, we first describe the case of point sequences in a d-dimensional Euclidean space. The monotone structures are characterized by the nesting of the Cartesian join regions associated with pairs of objects. If the given point sequence is monotone in the d-dimensional Euclidean space, the same property holds along every feature axis. In other words, a nesting structure of the given point sequence in the d-dimensional space constrains the orders of the points along each feature axis to be similar. Therefore, we can evaluate the degree of similarity between features by Kendall's or Spearman's rank correlation coefficient, and then execute a traditional PCA based on the correlation matrix of the selected rank correlation coefficient.
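The pipeline just described can be sketched in a few lines. The four points in R^2 below are an illustrative assumption (not data from the paper); the sequence is monotone increasing in both features, so the Spearman matrix is all ones and a single large eigenvalue emerges.

```python
import numpy as np

# Sketch of the pipeline on a hypothetical point sequence:
# rank-transform each feature, form the Spearman correlation matrix,
# then run classical PCA via eigendecomposition of that matrix.
X = np.array([[1.0, 5.0],
              [2.0, 6.5],
              [3.5, 7.0],
              [4.0, 9.0]])          # N = 4 points in R^2, no ties

ranks = X.argsort(axis=0).argsort(axis=0).astype(float)  # per-feature ranks
S = np.corrcoef(ranks, rowvar=False)                     # Spearman matrix
eigvals, eigvecs = np.linalg.eigh(S)                     # eigenvalues ascending
pc1 = eigvecs[:, -1]                                     # first principal axis
```

Because both features are monotone increasing here, `S` has off-diagonal value one and the first principal axis weights both features equally (up to sign).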
Secondly, we describe the "object splitting method" for SO-PCA for interval-valued data [11]. This method splits each of N symbolic objects described by d interval-valued features into two d-dimensional vertices called the "minimum sub-object" and the "maximum sub-object". We should point out that any interval object can be reproduced from its minimum and maximum sub-objects. Moreover, the nesting structure of interval objects in the d-dimensional space constrains the orders of the minimum and maximum sub-objects along each feature axis to be similar. Therefore, we can again evaluate the degree of similarity between features by Kendall's or Spearman's rank correlation coefficient on the resulting (2 × N) × d standard numerical data table, and execute a traditional PCA based on the correlation matrix of the selected rank correlation coefficient.

As a further extension to handle histogram data, nominal multi-valued data, and other data types, we describe the "quantile method" for S-PCA [12] in Section 4. The problem is how to obtain a common numerical representation of objects described by mixed types of features. For example, in histogram data, the numbers of subintervals (bins) of the given histograms are in general mutually different. Therefore, we first define the cumulative distribution function for each histogram. Then, we select a common integer m to generate the "quantiles" for all histograms. As a result, for each histogram we obtain an (m + 1)-tuple composed of (m - 1) quantiles together with the minimum and maximum values of the whole interval of the histogram. Then, we split each object into (m + 1) sub-objects: the minimum sub-object, (m - 1) quantile sub-objects, and the maximum sub-object. By virtue of the monotonicity of the distribution function, the (m + 1) sub-objects of an object automatically satisfy a nesting structure.
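The quantile construction can be sketched as follows. The piecewise-linear cumulative distribution over the bin edges, and the example histogram (edges `[0, 1, 3, 6]`, frequencies `[2, 5, 3]`, m = 4), are illustrative assumptions, not the paper's definition or data; an interval then appears as the one-bin special case.

```python
import numpy as np

def quantile_tuple(edges, freqs, m):
    """Return the (m + 1)-tuple: minimum, (m - 1) quantiles, maximum,
    assuming a piecewise-linear cumulative distribution over the bins."""
    edges = np.asarray(edges, float)
    cum = np.concatenate([[0.0], np.cumsum(freqs)])
    cdf = cum / cum[-1]                   # CDF values at the bin edges
    probs = np.linspace(0.0, 1.0, m + 1)  # 0, 1/m, ..., 1
    return np.interp(probs, cdf, edges)   # invert the CDF by interpolation

# Histogram with 3 bins of unequal width on [0, 6].
t = quantile_tuple([0, 1, 3, 6], [2, 5, 3], m=4)
```

The returned tuple is non-decreasing by construction, which is exactly the nesting property the (m + 1) sub-objects must satisfy.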
Therefore, the nesting of N objects described by their minimum and maximum sub-objects in the d-dimensional space constrains the orders of the N × (m + 1) sub-objects along each feature axis to be similar. Again, we can evaluate the degree of similarity between features by Kendall's or Spearman's rank correlation coefficient, and then execute a traditional PCA. Interval-valued data may be regarded as special histogram-valued data in which a single bin organizes the histogram. Furthermore, we can also split nominal multi-valued data into (m + 1) sub-objects based on the distribution function associated with rank values attached to the categorical values of an object. Therefore, by the quantile method we can transform a given general N × d symbolic data table into an {N × (m + 1)} × d standard numerical data table, and then execute a traditional PCA on the transformed data table. In Section 5, we describe several experimental results in order to show the effectiveness of the quantile method. Section 6 is a summary.

# II. MONOTONE STRUCTURES AND OBJECT SPLITTING METHOD

In this section, we describe some properties of monotone structures for point sequences and for interval objects. Then, we describe the object splitting method for S-PCA.

# Monotone Structures for Point Sequences

Let a set of N objects U be represented by U = {ω_1, ω_2, ..., ω_N}. Let each object ω_i be described by d numerical features, i.e., a vector x_i = (x_i1, x_i2, ..., x_id) in a d-dimensional Euclidean space R^d.

DEFINITION 1: Rectangular region spanned by x_i and x_j. Let J(ω_i, ω_j) be the rectangular region in R^d spanned by the vectors x_i and x_j, defined by the following Cartesian product of d closed intervals:

J(ω_i, ω_j) = [min(x_i1, x_j1), max(x_i1, x_j1)] × [min(x_i2, x_j2), max(x_i2, x_j2)] × ... × [min(x_id, x_jd), max(x_id, x_jd)], (1)

where min(a, b) and max(a, b) are the operators that take the minimum value and the maximum value of a and b, respectively.
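Definition 1 translates directly into code. The two helper functions below are hypothetical names introduced for illustration: one builds the Cartesian product of per-feature closed intervals of Eq. (1), the other tests the containment used in the nesting property.

```python
import numpy as np

def cartesian_join(x_i, x_j):
    """Rectangular region J(x_i, x_j) of Eq. (1): one [min, max]
    interval per feature, returned as a (d, 2) array."""
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    return np.stack([np.minimum(x_i, x_j), np.maximum(x_i, x_j)], axis=1)

def contains(outer, inner):
    """True if region `inner` lies inside `outer` in every feature."""
    return bool(np.all(outer[:, 0] <= inner[:, 0])
                and np.all(inner[:, 1] <= outer[:, 1]))

# Example in R^2: the join of (1, 2) and (3, 5) is [1, 3] x [2, 5].
J = cartesian_join([1.0, 2.0], [3.0, 5.0])
```

For a nested series, `contains(cartesian_join(x1, x_next), cartesian_join(x1, x_k))` holds at every step, which is the relation Definition 2 formalizes.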
London Journal of Research in Science: Natural and Formal

In the following, we call J(ω_i, ω_j) the Cartesian join (region) of objects ω_i and ω_j [9,10,13].

DEFINITION 2: Nesting structure. If a series of objects ω_1, ω_2, ..., ω_N satisfies the nesting property

J(ω_1, ω_k) ⊆ J(ω_1, ω_{k+1}), k = 1, 2, ..., N - 1, (2)

the series is called a "nesting structure with the starting point ω_1 and the ending point ω_N". In Fig. 1, (a) is a monotone increasing series, and (b) is a monotone decreasing series of objects. It should be noted that the two series of objects show the same nesting structure with starting point ω_1 and ending point ω_5.

PROPOSITION 1: If a series of objects ω_1, ω_2, ..., ω_N is a nesting structure with the starting point ω_1 and the ending point ω_N in the space R^d, the series satisfies the same structure along each feature (axis) of the space R^d.

Proof: From the definition of the rectangular region in Eq. (1), we have

J(ω_1, ω_k) = [min(x_11, x_k1), max(x_11, x_k1)] × [min(x_12, x_k2), max(x_12, x_k2)] × ... × [min(x_1d, x_kd), max(x_1d, x_kd)]. (3)

Therefore, the relations of the Cartesian join regions

J(ω_1, ω_k) ⊆ J(ω_1, ω_{k+1}), k = 1, 2, ..., N - 1, (4)

in Definition 2 require the following relations for each feature, i.e., for each j (= 1, 2, ..., d):

[min(x_1j, x_kj), max(x_1j, x_kj)] ⊆ [min(x_1j, x_{k+1,j}), max(x_1j, x_{k+1,j})], k = 1, 2, ..., N - 1. (5)

Although there exist several ways to define monotone sequences of objects, i.e., monotone structures, we use the following definition.

DEFINITION 3: Monotone structure of a series of points. A series of objects ω_1, ω_2, ..., ω_N is called a monotone structure if the series satisfies the nesting structure of Definition 2.

Since, for a pair of features, we can evaluate the degree of similarity between the two orders of objects for the same object set U by Kendall's or Spearman's rank correlation coefficient, we have Proposition 2.
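The consequence of Proposition 1 can be checked numerically. The point series below is a hypothetical example: each feature is strictly monotone (two increasing, one decreasing), so the series is nested with starting point x_1 and every feature pair has a rank correlation of ±1.

```python
import numpy as np

# Hypothetical nested point series in R^3: features 1 and 2 are strictly
# increasing, feature 3 strictly decreasing, so Definition 2 holds.
X = np.array([[1.0, 10.0, 9.0],
              [2.0, 12.0, 7.0],
              [4.0, 15.0, 4.0],
              [7.0, 20.0, 1.0]])

def kendall_tau(u, v):
    """Plain O(N^2) Kendall rank correlation between two sequences."""
    n = len(u)
    s = sum(np.sign(u[i] - u[j]) * np.sign(v[i] - v[j])
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

# Per Propositions 1 and 2: |tau| = 1 for every pair of features.
taus = [kendall_tau(X[:, a], X[:, b])
        for a in range(3) for b in range(a + 1, 3)]
```

The pairs mixing an increasing with the decreasing feature give tau = -1; the matrix S built from these values is what the subsequent PCA decomposes.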
PROPOSITION 2: Correlation matrix S. If a series of objects ω_1, ω_2, ..., ω_N is a monotone structure in the space R^d, the absolute value of each off-diagonal element of the d × d correlation matrix S takes the maximum value one in the sense of Kendall's or Spearman's rank correlation coefficient.

Proof: From Definition 3, any monotone structure must satisfy the nesting property of Definition 2. Then, from Proposition 1, the given series of objects has the identical nesting structure along each feature. This property restricts the order of objects for each feature to be exactly the same or exactly the reverse, according to whether the series of objects is monotone increasing or monotone decreasing along that feature. Therefore, if a series of objects is a monotone structure in R^d, the absolute value of the correlation coefficient for each pair of features takes the maximum value one in the sense of Kendall's or Spearman's rank correlation coefficient.

From Proposition 2, if many off-diagonal elements of S take highly correlated values, we can expect the existence of a large eigenvalue of S, whose corresponding eigenvector reproduces well the original nesting property of the set of objects in the space R^d.

EXAMPLE 1: As an intuitive example, suppose that the given set of objects in R^d organizes an approximate monotone structure which is monotone increasing along each of the d features, and that the degrees of similarity between two features are the same for all possible pairs. Then, all off-diagonal elements of S take an identical value ρ, 0
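The situation of Example 1 can be verified numerically: when all off-diagonal elements of S equal ρ, the largest eigenvalue is 1 + (d - 1)ρ and the remaining d - 1 eigenvalues are 1 - ρ, so a single dominant component emerges as ρ grows. The values d = 4 and ρ = 0.8 below are arbitrary choices for illustration.

```python
import numpy as np

# Equicorrelation matrix S of Example 1 (illustrative d and rho).
d, rho = 4, 0.8
S = (1 - rho) * np.eye(d) + rho * np.ones((d, d))

eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
# Largest eigenvalue: 1 + (d - 1) * rho; the other d - 1 equal 1 - rho.
```

With ρ = 0.8 the first component carries 3.4 of the total variance d = 4, i.e. 85%, matching the intuition that a strong approximate monotone structure is summarized by one principal axis.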