Subscribe to Syndicate

Anna Katharina Hildebrandt

We are interested in the greedy method to compute an hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of O(n3) with a space requirement of O(n) and one having a running time of O(n2 log n) with a space requirement of (n2), where n is the number of points to be clustered. Both methods are not capable to handle large point sets. In this paper, we give an algorithm with a space requirement of O(n) which is able to cluster one million points in a day.
Estimation of the pose in which two given molecules might bind together to form a potential complex is a crucial task in structural biology. To solve this so-called "docking problem", most algorithms initially generate large numbers of candidate poses (or decoys) which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates ranges from thousands to millions, performing the clustering on standard CPUs is highly time consuming.
The computation of root mean square deviations (RMSD) is an important step in many bioinformatics applications. If approached naively, each RMSD computation takes time linear in the number of atoms. In addition, a careful implementation is required to achieve numerical stability, which further increases runtimes. In practice, the structural variations under consideration are often induced by rigid transformations of the protein, or are at least dominated by a rigid component.

Pages