Author(s): Bansal N, Blum A, Chawla S
We consider the following clustering problem: we have a complete graph on n
vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v
have been deemed to be similar or different. The goal is to produce a partition of the vertices (a
clustering) that agrees as much as possible with the edge labels. That is, we want a clustering
that maximizes the number of + edges within clusters plus the number of − edges between
clusters (equivalently, minimizes the number of disagreements: the number of − edges inside
clusters plus the number of + edges between clusters). This formulation is motivated by a
document clustering problem in which one has a pairwise similarity function f learned from
past data, and the goal is to partition the current set of documents in a way that correlates with
f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.
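The objective can be made concrete with a short sketch. It assumes the + labels are given as a set of unordered vertex pairs, with every unlisted pair labeled −, and a clustering given as a list of cluster ids, one per vertex. The names `score`, `positive_edges` and `cluster_of` are hypothetical and do not come from the paper. Because every pair is either an agreement or a disagreement, the two counts always sum to n(n−1)/2. The two objectives therefore share the same optimal clustering, although they behave differently as approximation problems.

```python
from itertools import combinations


def score(n, positive_edges, cluster_of):
    """Return (agreements, disagreements) of a clustering of the complete graph on n vertices.

    positive_edges: set of frozenset({u, v}) pairs labeled '+'; all other pairs are '-'.
    cluster_of: list mapping each vertex 0..n-1 to a cluster id (hypothetical encoding).
    """
    agree = disagree = 0
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive_edges
        # A '+' edge inside a cluster or a '-' edge between clusters is an agreement;
        # a '-' edge inside a cluster or a '+' edge between clusters is a disagreement.
        if same_cluster == is_positive:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


# Usage: '+' triangle {0, 1, 2} plus a '+' edge (2, 3), with vertex 3 in its own cluster.
plus = {frozenset(p) for p in [(0, 1), (0, 2), (1, 2), (2, 3)]}
print(score(4, plus, [0, 0, 0, 1]))  # (5, 1): only the cut '+' edge (2, 3) disagrees
```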
An interesting feature of this clustering formulation is that one does not need to specify
the number of clusters k as a separate parameter, as in measures such as k-median or min-sum
or min-max clustering. Instead, in our formulation, the optimal number of clusters could be
any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and maximizing agreements. For minimizing
disagreements, we give a constant-factor approximation. For maximizing agreements, we give
a PTAS, building on ideas of Goldreich, Goldwasser and Ron (1998) and de la Vega (1996).
We also show how to extend some of these results to graphs with edge labels in [−1, +1], and
give some results for the case of random noise.
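The following exhaustive search illustrates that the labels, and not a parameter k, determine the number of clusters. It finds an exact minimum-disagreement clustering by trying every partition of the vertices. This is feasible only for very small n, because the number of partitions is the Bell number B_n. It is not the constant-factor approximation or the PTAS described above. The input encoding and all function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def set_partitions(n):
    """Yield every partition of range(n) as a restricted-growth list of cluster ids."""
    def extend(prefix, max_id):
        if len(prefix) == n:
            yield list(prefix)
            return
        # The next vertex joins an existing cluster or opens cluster max_id + 1.
        for c in range(max_id + 2):
            prefix.append(c)
            yield from extend(prefix, max(max_id, c))
            prefix.pop()
    yield from extend([], -1)


def disagreements(n, positive_edges, cluster_of):
    # Count '-' edges kept inside a cluster plus '+' edges cut between clusters.
    return sum(
        (cluster_of[u] == cluster_of[v]) != (frozenset((u, v)) in positive_edges)
        for u, v in combinations(range(n), 2)
    )


def exact_min_disagreements(n, positive_edges):
    """Brute force over all B_n partitions; returns (cost, clustering)."""
    best = min(set_partitions(n), key=lambda c: disagreements(n, positive_edges, c))
    return disagreements(n, positive_edges, best), best


# All '+' labels give one cluster; all '-' labels give n singletons.
print(exact_min_disagreements(4, {frozenset(p) for p in combinations(range(4), 2)}))
print(exact_min_disagreements(4, set()))
# Two '+' triangles with '-' edges between them give two clusters.
triangles = {frozenset(p) for p in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]}
print(exact_min_disagreements(6, triangles))
```

In each case the number of clusters in the returned partition (1, 4 and 2, respectively) comes from the edge labels alone.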
Author(s): Sakurai MH, Matsumoto T, Kiyohara H, Yamada H
Author(s): Yang YY, Tang YZ, Fan CL, Luo HT, Guo PR, et al.
Author(s): Zhu S, Shimokawa S, Shoyama Y, Tanaka H
Author(s): Huang HQ, Zhang X, Xu ZX, Su J, Yan SK, et al.
Author(s): Shyu KG, Tsai SC, Wang BW, Liu YC, Lee CC
Author(s): Zong Z, Fujikawa-Yamamoto K, Ota T, Guan X, Murakami M, et al.
Author(s): Wong VK, Zhou H, Cheung SS, Li T, Liu L
Author(s): Sui C, Zhang J, Wei J, Chen S, Li Y, et al.
Author(s): Yen MH, Lin CC, Chang CH, Lin SC
Author(s): Ling RF
Author(s): Ashour ML, Wink M
Author(s): Yamamoto M, Kumagai A, Yamamura Y