SORT Vectors or Mixed Numeric-Text Matrix Files

How to index-sort, rank-sort, and cluster-sort vectors or mixed numeric-text files. Standard and cluster sorts in MatrixExplorer are also described.

• See EDIT (⇾ Matrix Explorer) for a list of commands.
A 6-row example file (disk name "num_example.csv"):
• 1,15.1,119.1
• 2,6.2,94.2
• 3,3.3,121.3
• 4,14.4,93.4
• 5,2.5,120.5
• 6,17.6,93.6
To work with this example, we give it the symbolic name numfile. We then open numfile with 3 numeric values per row and display it in Text mode:
• OPEN(FIle=numfile, Format='3F')
• DLG(Text=numfile, Format="i2, 2f6.1", ColTitle="row,col1,col2", Ti="Cluster sort a numeric file")
See AXIS, INTPOL, and LINE for how to plot the data shown here.
• Clicking a standard column button cycles the column sort through ascending, descending, and unsorted.
• The sort algorithm is stable, i.e. elements of equal value keep their original order.
• Activating a column trackbar sorts the column into groups (clusters) of similar values.
[Figure: with the column trackbars, col1 was first sorted into 2 clusters; col2 was then sorted into 2 clusters as well.]
[Figure: this time col2 was cluster-sorted first; col1 is then subclustered.]
1. Clicking the col1 button and setting its trackbar to position 1 have the same effect:
• col1 is sorted in ascending order
• the title displays 1 for sort column number 1, and a "^" symbol for ascending
• the row captions display the reordered row numbers
• the column is displayed in red to show that it forms a single group
2. Setting the col1 trackbar to position 2:
• col1 is kept in the same ascending order
• the row captions now show the cluster numbers; the decimal places are a sequence number
• the 2 clusters are displayed in alternating colors
3. Setting the col2 trackbar to position 2:
• each col1 cluster is divided into 2 col2 clusters
• the 1st digit of the row captions is the col1 cluster number; the 2nd digit is the col2 cluster number
• clusters of cluster sorted columns are displayed in alternating colors
• Cluster borders are placed at the largest differences between adjacent elements, taken in decreasing order of size.
• Column clusters are preserved when other columns are sorted subsequently.
• The number of clusters for each column ranges from 1 to a maximum of 15 (hex 0F).
• Multi-column clusters are numbered with one hexadecimal digit (1, 2, ..., 9, A, B, C, D, E, F) per column. For example, "A3" denotes cluster 10 of the first sorted column and, within it, cluster 3 of the second sorted column.
• Up to 8 columns can be subsorted with up to 15 subclusters each. The maximum cluster number is "FFFFFFFF" (decimal display would be "4294967295")
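The cluster rules above can be illustrated with a short Python sketch. It is not the HicEst implementation; it assumes that each existing cluster is stable-sorted along the next column, cut at its (groups - 1) largest positive gaps between adjacent values, and labelled with one hexadecimal digit per sorted column. The names split_at_largest_gaps and cluster_sort are hypothetical. Python indices are 0-based, so col1 and col2 of numfile are tuple positions 1 and 2. With 2 groups per column it reproduces the row order 2, 5, 3, 4, 6, 1 and the captions 11, 12, 12, 21, 21, 22.

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_at_largest_gaps(values, groups):
    """Assign 1-based cluster numbers to an ascending list of numbers.

    Borders are placed at the (groups - 1) largest positive differences
    between adjacent elements (equal gaps: the earlier position wins).
    """
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    gaps.sort(key=lambda gap: (-gap[0], gap[1]))
    borders = {i for diff, i in gaps[: max(groups - 1, 0)] if diff > 0}
    labels, cluster = [], 1
    for i in range(len(values)):
        labels.append(cluster)
        if i in borders:
            cluster += 1
    return labels


def cluster_sort(rows, columns, groups):
    """Hierarchical cluster sort (hypothetical sketch).

    rows    : list of tuples, one per file row
    columns : 0-based column positions, in sort sequence
    groups  : maximum number of clusters per column (1..15)
    Returns (idx, clu): 1-based original row numbers in sorted order and
    the hex-digit cluster label of each sorted row.
    """
    blocks = [("", list(range(len(rows))))]  # (label prefix, row positions)
    for col, max_groups in zip(columns, groups):
        assert 1 <= max_groups <= 15, "one hex digit per column"
        refined = []
        for prefix, members in blocks:
            # stable sort inside an existing cluster, so earlier clusters are kept
            members = sorted(members, key=lambda r: rows[r][col])
            labels = split_at_largest_gaps([rows[r][col] for r in members], max_groups)
            for k in sorted(set(labels)):
                sub = [r for r, lab in zip(members, labels) if lab == k]
                refined.append((prefix + HEX_DIGITS[k], sub))
        blocks = refined
    idx = [r + 1 for _, members in blocks for r in members]
    clu = [prefix for prefix, members in blocks for _ in members]
    return idx, clu


numfile = [
    (1, 15.1, 119.1), (2, 6.2, 94.2), (3, 3.3, 121.3),
    (4, 14.4, 93.4), (5, 2.5, 120.5), (6, 17.6, 93.6),
]
idx, clu = cluster_sort(numfile, columns=[1, 2], groups=[2, 2])
print(idx)  # [2, 5, 3, 4, 6, 1]
print(clu)  # ['11', '12', '12', '21', '21', '22']
```

For col1 the largest gap (8.2, between 6.2 and 14.4) becomes the only border, which splits the rows into {5, 3, 2} and {4, 1, 6} before col2 is subclustered inside each half.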
• Instead of sorting interactively, the SORT function can do the same in a single statement:
• SORT(FIle=numfile, Column=2,Groups=2, Column=3,Groups=2, Index=idx, CLUSter=clu, Rank=rnk)
The vectors clu, rnk, and idx should be dimensioned to the number of rows in the file.
  element:  1,  2,  3,  4,  5,  6
  idx:      2,  5,  3,  4,  6,  1    original_row_nr = idx(sorted_row_nr)
  rnk:      6,  1,  3,  4,  2,  5    sorted_row_nr = rnk(original_row_nr)
  clu:     11, 12, 12, 21, 21, 22    cluster_value(sorted_row_nr) = clu(sorted_row_nr)
                                     cluster_value(original_row_nr) = clu(rnk(original_row_nr))
(Note: Without PhysSort=1 the sorted file is not saved.)
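The index, rank, and cluster relations listed above can be checked with a few lines of Python. This sketch takes the idx and clu values from the example, derives rnk as the inverse permutation of idx, and evaluates cluster_value(original_row_nr) = clu(rnk(original_row_nr)). Python lists are 0-based, so each 1-based HicEst position is shifted by one when indexing; the helper rank_from_index is hypothetical.

```python
def rank_from_index(idx):
    """Invert an index vector: rnk[original_row] = sorted_row (1-based values)."""
    rnk = [0] * len(idx)
    for sorted_row, original_row in enumerate(idx, start=1):
        rnk[original_row - 1] = sorted_row
    return rnk


idx = [2, 5, 3, 4, 6, 1]        # original_row_nr = idx(sorted_row_nr)
clu = [11, 12, 12, 21, 21, 22]  # cluster value per sorted row
rnk = rank_from_index(idx)      # sorted_row_nr = rnk(original_row_nr)
print(rnk)  # [6, 1, 3, 4, 2, 5]

# cluster_value(original_row_nr) = clu(rnk(original_row_nr))
rows = range(1, len(idx) + 1)
cluster_of_original = [clu[rnk[row - 1] - 1] for row in rows]
print(cluster_of_original)  # [22, 11, 12, 21, 12, 21]

# idx and rnk are inverse permutations of each other
assert all(idx[rnk[row - 1] - 1] == row for row in rows)
```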
• File sort works with text as well. The first 3 lines of the next example are:
• Tokyo; Japan; Asia; 4.0
• Seoul; Korea; Asia; 4.0
• Mexico City; Mexico; America; 2.8
• OPEN(FIle=Cities, Format="3;,N;,") ! 3 semicolon separated strings and 1 numeric column
• DLG(Edit=Cities, Format="A20, 2A10, F5.1") ! open MatrixExplorer
The table was created by cluster-sorting first the continents (4 groups), second the countries (7 groups), and third the population density in thousand people per km² (1 group). To export it to disk, use More button -> Export -> Export to file. This adds the cluster numbers as the first column. A change in cluster number is marked by a blank line.
• 111 Cairo Egypt Africa 9.0
• 211 Buenos Aires Argentina America 1.2
• 221 Sao Paulo Brazil America 2.2
• 231 Mexico City Mexico America 2.8
• 241 New York City USA America 1.1
• 241 Los Angeles USA America 1.4
• 311 Beijing China Asia 1.9
• 311 Shanghai China Asia 3.2
• 311 Hong Kong-Shenzhen China Asia 5.2
• 321 Delhi India Asia 5.8
• 321 Mumbai India Asia 8.2
• 321 Calcutta India Asia 8.5
• 331 Jakarta Indonesia Asia 3.7
• 341 Osaka-Kobe-Kyoto Japan Asia 2.5
• 341 Tokyo Japan Asia 4.0
• 351 Seoul Korea Asia 4.0
• 361 Karachi Pakistan Asia 0.7
• 371 Metro Manila Philippin Asia 6.5
• 411 London England Europe 1.1
• 421 Moscow Russia Europe 1.0
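The exported order is the same as an ordinary stable sort on continent, country, and density. The Python sketch below reproduces the table order and cluster numbers under a stated simplification: each distinct continent and each distinct country within a continent is assumed to form its own cluster (this holds here because 4 and 7 groups are at least the number of distinct values at each level), and density uses a single group. It is an illustration, not HicEst's text-clustering rule. Rows after the first three are taken from the exported table (their order in the original file is not known), "Philippin" is written out in full, and the comparison ignores case, following the Option=0 default; parse and label_rows are hypothetical names.

```python
lines = [
    "Tokyo; Japan; Asia; 4.0", "Seoul; Korea; Asia; 4.0",
    "Mexico City; Mexico; America; 2.8", "Cairo; Egypt; Africa; 9.0",
    "Buenos Aires; Argentina; America; 1.2", "Sao Paulo; Brazil; America; 2.2",
    "New York City; USA; America; 1.1", "Los Angeles; USA; America; 1.4",
    "Beijing; China; Asia; 1.9", "Shanghai; China; Asia; 3.2",
    "Hong Kong-Shenzhen; China; Asia; 5.2", "Delhi; India; Asia; 5.8",
    "Mumbai; India; Asia; 8.2", "Calcutta; India; Asia; 8.5",
    "Jakarta; Indonesia; Asia; 3.7", "Osaka-Kobe-Kyoto; Japan; Asia; 2.5",
    "Karachi; Pakistan; Asia; 0.7", "Metro Manila; Philippines; Asia; 6.5",
    "London; England; Europe; 1.1", "Moscow; Russia; Europe; 1.0",
]


def parse(line):
    """3 semicolon-separated strings followed by 1 number."""
    city, country, continent, density = (f.strip() for f in line.split(";"))
    return city, country, continent, float(density)


def label_rows(rows):
    """Hex cluster label per sorted row: continent, country, density (1 group)."""
    labels, cont_nr, country_nr = [], 0, 0
    prev_cont = prev_country = None
    for _city, country, continent, _density in rows:
        if continent != prev_cont:
            cont_nr, country_nr = cont_nr + 1, 0
            prev_cont, prev_country = continent, None
        if country != prev_country:
            country_nr, prev_country = country_nr + 1, country
        labels.append(f"{cont_nr:X}{country_nr:X}1")
    return labels


rows = sorted(
    (parse(line) for line in lines),
    key=lambda r: (r[2].lower(), r[1].lower(), r[3]),  # continent, country, density
)
labels = label_rows(rows)
for label, (city, country, continent, density) in zip(labels, rows):
    print(label, city, country, continent, density)
```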
• File sort: sort text and/or numeric columns
• SORT(FIle=filename, Column=nr [, options])
• The file must first be OPENed for matrix access, e.g.:
• OPEN(FIle=name, Format="A") ! sort text lines ( 1 column sort)
• OPEN(fi=name, fmt="10fb4") ! a 10-column 4 byte floating point file
• OPEN(fi=name, fmt="10,;") ! 10 comma separated columns
• File sort options (keyword, type, short form: description). More file sort options: Index, Rank, CLUSter, Groups, Descending, ERRor (see the options common to file and array sorts below).
• FIle (txt), fi=name: required. name is a ⇾ matrix file as described in Matrix Explorer.
• Column (num, (1)), col=nr: nr is the file column number to be sorted (default is 1). Up to 8 columns can be sorted in a single SORT(...) statement: SORT(FIle=filename, Col=4, Col=2) sorts along column 4 and then along column 2. For identical elements of column 2, the column 4 sort order is preserved. To hierarchically classify the elements of one or more columns into groups or clusters, add Groups=...; its presence activates the cluster sort (⇾ examples of cluster sorts).
• Option (num), opt=1: 1 = sort character columns case-sensitively (default is 0). Ignored for numeric columns.
• DELDouBLes (num), DelDbl=d: d = 0 keeps doubles; d /= 0 keeps just 1 copy.
• PhySsort (log), phys=P: no sort if the Index=... option is present. P=1: physical file sort to disk (this closes the file). P=0: logical sort only (default), no disk save; useful e.g. to pre-sort a file for subsequent calls to MatrixExplorer or to READ and WRITE.
(1) Sequence is relevant for this keyword. The keyword may appear repeatedly.
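The multi-column rule for Column=... follows from sorting with a stable algorithm in several passes: the last column becomes the primary key, and earlier columns break its ties. The Python sketch below assumes this pass-by-pass interpretation; sort_along_columns is a hypothetical helper and the 4-column rows are made-up illustration data, not from the source.

```python
from operator import itemgetter


def sort_along_columns(rows, columns):
    """Sort rows along each listed (1-based) column in turn.

    sorted() is stable, so rows that tie in a later column keep the
    order produced by the earlier columns.
    """
    for col in columns:
        rows = sorted(rows, key=itemgetter(col - 1))
    return rows


# hypothetical 4-column rows (values made up for illustration)
rows = [
    ("a", 1, "x", 3),
    ("b", 2, "y", 1),
    ("c", 1, "z", 2),
    ("d", 2, "w", 2),
    ("e", 1, "v", 1),
]
result = sort_along_columns(rows, [4, 2])  # in the spirit of SORT(FIle=..., Col=4, Col=2)
print([r[0] for r in result])  # ['e', 'c', 'a', 'b', 'd']
```

The two passes give the same order as a single sort on the key (column 2, column 4): rows with equal column 2 values stay in their column 4 order.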
• Vector sort:
• SORT(Vector=vector1, Sorted=vector2 [, options])
• Matrices should be stored to disk first and then sorted as files (see File sort above).
• Vector sort options (keyword, type, short form: description). More vector sort options: Index, Rank, CLUSter, Groups, Descending, ERRor.
• Vector (vec, (1)), v=O: O is the original vector of length lenO to be sorted (required).
• Sorted (vec, (1)), s=S: vector S (length lenS) is assigned the sorted vector O. Only the first MIN(lenO, lenS) elements are assigned. Sorted=... must be postfixed to its Vector=... Example, SORT(Vector=O, Sorted=S):
              1     2     3     4     5     6     7     8     9
    Orig O: 13.1  27.6  12.3  11.7  26.2  10.4  16.5  29.8  16.9
    Sorted S: 10.4  11.7  12.3  13.1  16.5  16.9  26.2  27.6  29.8
  O and S can be identical (in-place sort): SORT(Vector=O, Sorted=O) ! Original O is overwritten
• Co-sort of up to 8 vectors in a single SORT(...) statement. The last Vec=... defines the order sequence. Example, SORT(Vec=O, Sorted=S, Vec=B, Sorted=T):
         1     2     3     4     5     6     7     8     9
    O: 13.1  27.6  12.3  11.7  26.2  10.4  16.5  29.8  16.9
    S: 16.5  13.1  27.6  26.2  16.9  11.7  12.3  29.8  10.4
    B:  0.1   0.1   0.8   0.3   0.1   0.9   0.0   0.8   0.2
    T:  0.0   0.1   0.1   0.1   0.2   0.3   0.8   0.8   0.9
  (O is co-sorted following B)
(1) Sequence is relevant for this keyword. The keyword may appear repeatedly.
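The vector sort and co-sort examples can be reproduced with a stable sort in Python. In this sketch (hypothetical helpers sort_into and co_sort, not HicEst functions) only the last vector decides the order, and ties keep their original order: that is why the three B = 0.1 entries carry O values 13.1, 27.6, 26.2 in their original sequence. sort_into mimics the rule that only the first MIN(lenO, lenS) elements of S are assigned.

```python
def sort_into(original, target):
    """Assign sorted(original) to target; only min(len) elements are written."""
    n = min(len(original), len(target))
    target[:n] = sorted(original)[:n]


def co_sort(key_vector, *others):
    """Reorder key_vector and all other vectors along key_vector (stable)."""
    order = sorted(range(len(key_vector)), key=lambda i: key_vector[i])
    return [[v[i] for i in order] for v in (key_vector, *others)]


O = [13.1, 27.6, 12.3, 11.7, 26.2, 10.4, 16.5, 29.8, 16.9]
B = [0.1, 0.1, 0.8, 0.3, 0.1, 0.9, 0.0, 0.8, 0.2]

S = [0.0] * len(O)
sort_into(O, S)
print(S)        # [10.4, 11.7, 12.3, 13.1, 16.5, 16.9, 26.2, 27.6, 29.8]

short = [0.0] * 4
sort_into(O, short)
print(short)    # [10.4, 11.7, 12.3, 13.1]

T, S2 = co_sort(B, O)  # like SORT(Vec=O, Sorted=S, Vec=B, Sorted=T)
print(S2)       # [16.5, 13.1, 27.6, 26.2, 16.9, 11.7, 12.3, 29.8, 10.4]
print(T)        # [0.0, 0.1, 0.1, 0.1, 0.2, 0.3, 0.8, 0.8, 0.9]
```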
• Options common to file and array sorts (keyword, type, short form: description). X, R, and C are vectors for a vector sort. For file sorts, X, R, and C can be either a file column number or a vector with 1 element per row.
• Index (num), idx=X: the index vector X is assigned the index of Sorted in Original. File sort only: if the Column=... option is missing, the file is sorted along the index vector: SORT(FIle=name, Index=X). File sort only: there is no logical or physical sort of the file if both the Column=... and the Index=... options are present. This allows subsequent manipulations of name: SORT(FIle=name, Column=nr, Index=X, ...)
• Rank (num), rnk=R: the rank vector R is assigned the index of Original in Sorted (⇾ example).
• CLUSter (num), clus=C: the cluster vector C is assigned the compounded cluster value of all sorted columns. A column is divided into up to 15 clusters. For file sorts a maximum of Groups=... clusters is generated.
• Groups (num, (2)), grps=g: maximum number of Vector cluster groups. Groups= must be postfixed to its Column= or Vector= (⇾ example).
• Descending (log, (2)), desc=1: sort descending. Descending= must be postfixed to its Column= or Vector=.
• ERror (LBL), err=999: on error goto 999.
(2) For files this keyword may occur repeatedly and must be postfixed to its Column=... option
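The Index and Rank definitions can be made concrete with the 9-element vector O from the vector sort example. In this Python sketch (hypothetical helpers index_and_rank and reorder_along_index, 1-based values as in HicEst), X lists, for each sorted position, the position in Original, and R lists, for each original position, the position in Sorted. Reordering along X corresponds to what the text describes for SORT(FIle=name, Index=X), applied here to a plain list rather than a file.

```python
def index_and_rank(values):
    """Return 1-based index vector X and rank vector R of a stable ascending sort."""
    idx = sorted(range(1, len(values) + 1), key=lambda i: values[i - 1])
    rnk = [0] * len(values)
    for sorted_pos, orig_pos in enumerate(idx, start=1):
        rnk[orig_pos - 1] = sorted_pos
    return idx, rnk


def reorder_along_index(rows, idx):
    """Sorted[k] = Original[X[k]] (1-based X)."""
    return [rows[i - 1] for i in idx]


O = [13.1, 27.6, 12.3, 11.7, 26.2, 10.4, 16.5, 29.8, 16.9]
X, R = index_and_rank(O)
print(X)  # [6, 4, 3, 1, 7, 9, 5, 2, 8]
print(R)  # [4, 8, 3, 2, 7, 1, 5, 9, 6]

S = reorder_along_index(O, X)
assert S == sorted(O)
# Original[i] = Sorted[R[i]]
assert all(O[i] == S[R[i] - 1] for i in range(len(O)))
```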
