Home | Issues | About JGAA | Instructions for Authors |
Special Issue on Selected Papers from the Second International Workshop on Algorithms and
Computation, WALCOM 2008
DOI: 10.7155/jgaa.00175
On the Approximability of Comparing Genomes with Duplicates
Vol. 13, no. 1, pp. 19-53, 2009. Regular paper.
Abstract A central problem in comparative genomics consists in computing a
(dis-)similarity measure between two genomes, e.g. in order to
construct a phylogenetic tree. A large number of such measures has
been proposed in the recent past: number of reversals,
number of breakpoints, number of common or
conserved intervals etc. In their initial
definitions, all these measures suppose that genomes contain no
duplicates. However, we now know that genes can be duplicated within
the same genome. One possible approach to overcome this difficulty is
to establish a one-to-one correspondence (i.e. a matching) between
genes of both genomes, where the correspondence is chosen in order to
optimize the studied measure. Then, after a gene relabeling according
to this matching and a deletion of the unmatched signed genes, two
genomes without duplicates are obtained and the measure can be
computed.
In this paper, we are interested in three measures (number of
breakpoints, number of common intervals and number of
conserved intervals) and three models of matching (exemplar,
intermediate and maximum matching
models).
We prove that, for each model and each measure M, computing a
matching between two genomes that optimizes M is
. We show that this result remains true even for two
genomes G1 and G2 such that G1 contains no duplicates and no
gene of G2 appears more than twice. Therefore, our results extend
those of [,,].
Besides, in order to evaluate the possible existence of approximation
algorithms concerning the number of
breakpoints, we also study the complexity of the
following decision problem: is there an exemplarization (resp. an
intermediate matching, a maximum matching) that induces no
breakpoint ? In particular, we extend a result
of [] by proving the
problem to be in the exemplar model for a new class of
instances, we note that the problems are equivalent in the
intermediate and the exemplar models and we show that the problem is in in the
maximum matching model.
Finally, we focus on a fourth measure, closely related to the number of breakpoints: the number
of adjacencies, for which we give several constant ratio
approximation algorithms in the maximum matching model, in the case where genomes contain
the same number of duplications of each gene.
Keywords: genome rearrangements, , duplicate genes,
breakpoints, adjacencies, common intervals, conserved intervals,
approximation algorithms.
|
Submitted: January 2008.
Published: February 2008.
Reviewed: July 2008.
Revised: July 2008.
Accepted: November 2008.
Final: December 2008.
Communicated by
Md. Saidur Rahman
|
Journal Supporters
|