Advanced search
Start date
(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Fast median computation for symmetric, orthogonal matrices under the rank distance

Full text
Meidanis, Joao [1] ; Chindelevitch, Leonid [2]
Total Authors: 2
[1] Univ Estadual Campinas, Campinas, SP - Brazil
[2] Imperial Coll, London - England
Total Affiliations: 2
Document type: Journal article
Source: Linear Algebra and its Applications; v. 614, n. SI, p. 394-414, APR 1 2021.
Web of Science Citations: 0

Biological genomes can be represented as square, symmetric, orthogonal, 0-1 matrices. It turns out that the rank distance applied to two genome matrices has a biological significance: it is related to the smallest number of basic rearrangement mutations, such as reversals, translocations, transpositions (taken with weight 2), etc. that explain the differences between the two genomes. Therefore, closer genomes will produce smaller rank distances. An important tool in this context is the median problem: given three genomes A, B, and C, find a fourth genome M that minimizes d(A, M) + d(B, M) + d(C, M). For genome matrices, the computational complexity of this problem is currently unknown. However, for orthogonal matrices, there are fast algorithms that solve it exactly. One such algorithm uses a ``walk towards the median{''} paradigm. Starting from any of the input matrices, say, B, the algorithm produces rank-1 ``steps{''}, which are rank-1 matrices that, added to B, decrease its rank distance to both A and C simultaneously. It can be shown that such steps always exist for orthogonal matrices, and can be found in polynomial time. The algorithm stops when no more improvement can be made, which is equivalent to saying that B lies between A and C in terms of the rank distance (the triangle inequality becomes an equality). There is an O(n(omega+1)) algorithm implementing this idea, where w is the matrix multiplication exponent. Here we propose a novel scheme that works for symmetric orthogonal matrices, and produces a median, also guaranteed to be symmetric, in O(n(omega)) time. There is another O(n(omega)) time algorithm that produces the so-called M-I median, which agrees with the majority in the subspaces where A = B, B = C, or C = A, and is equal to the identity in the orthogonal complement of the sum of these subspaces. However, this algorithm produces a different median, and has only been proved to be correct for genomic matrices. The algorithm we present here is more general. (C) 2020 Elsevier Inc. All rights reserved. (AU)

FAPESP's process: 18/00031-7 - Studies on genome comparison
Grantee:João Meidanis
Support type: Regular Research Grants