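Classifying Items
Next, we need a function that assigns an item to a group (cluster). For a given item, we compute its distance to each of the means and assign the item to the cluster whose mean is closest.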
def Classify(means, item):
    # Assign item to the mean with the smallest distance
    minimum = float("inf")  # start above any possible distance
    index = -1
    for i in range(len(means)):
        # Distance from item to the i-th mean
        dis = EuclideanDistance(item, means[i])
        if dis < minimum:
            minimum = dis
            index = i
    return index
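The same nearest-mean rule can be written more compactly with Python's built-in `min`. This is a sketch under two assumptions: `classify_compact` is a hypothetical name, and `math.dist` (Python 3.8+) stands in for the tutorial's `EuclideanDistance`, which is assumed to compute plain Euclidean distance.

```python
import math

def classify_compact(means, item):
    # Same rule as Classify: return the index of the nearest mean.
    # math.dist stands in for EuclideanDistance (an assumption).
    # min() returns the first of several equal minima, so ties go to the lower index.
    return min(range(len(means)), key=lambda i: math.dist(item, means[i]))

means = [[0.0, 0.0], [10.0, 10.0]]
print(classify_compact(means, [1.0, 2.0]))  # 0
print(classify_compact(means, [9.0, 8.5]))  # 1
print(classify_compact(means, [5.0, 5.0]))  # 0 (equidistant, lower index wins)
```

Like the loop version with its strict `<`, this keeps the lower-indexed mean when an item is exactly equidistant from two means.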
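To actually compute the means, we loop over all items, assign each one to its nearest cluster, and update that cluster's mean. We repeat this process up to a fixed maximum number of iterations. If no item changes cluster during a full pass, the algorithm has converged and we stop. The result is a local optimum that depends on the initial means, not necessarily the best possible clustering.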
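Updating a cluster's mean after every assignment only needs a running average: with the mean of n - 1 items already known, the mean of n items is `mean + (item - mean) / n`, which equals `(mean * (n - 1) + item) / n`. The sketch below is a hypothetical implementation of `UpdateMean` that assumes it computes exactly this, applied to each coordinate.

```python
def UpdateMean(n, mean, item):
    # Hypothetical helper: fold item into a running mean that now covers n items.
    # Per coordinate this equals (mean * (n - 1) + item) / n.
    return [m + (x - m) / n for m, x in zip(mean, item)]

# Feeding points one at a time reproduces the ordinary average.
running = [0.0, 0.0]
points = [[2.0, 4.0], [4.0, 8.0], [6.0, 0.0]]
for n, p in enumerate(points, start=1):
    running = UpdateMean(n, running, p)
print(running)  # [4.0, 4.0]
```

With n = 1 the update replaces the starting value with the item itself, so the random initial position stops mattering once a cluster receives its first item. CalculateMeans never resets clusterSizes between passes, so n keeps growing across iterations and later passes move the means less and less.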
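The following function takes as input k (the number of desired clusters), the items, and the maximum number of iterations, and returns the final means. The cluster each item is assigned to is stored in the array belongsTo, and the number of items assigned to each cluster is stored in clusterSizes.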
def CalculateMeans(k, items, maxIterations=100000):
    # Find the minimum and maximum of each column
    cMin, cMax = FindColMinMax(items)

    # Initialize the means at random points
    means = InitializeMeans(items, k, cMin, cMax)

    # Number of items assigned to each cluster
    clusterSizes = [0 for i in range(len(means))]

    # Cluster each item belongs to; -1 means "not yet assigned",
    # so the first pass always counts as a change
    belongsTo = [-1 for i in range(len(items))]

    # Calculate means
    for e in range(maxIterations):
        # If no item changes cluster during this pass, halt
        noChange = True
        for i in range(len(items)):
            item = items[i]

            # Classify the item into a cluster and update that cluster's mean
            index = Classify(means, item)
            clusterSizes[index] += 1
            cSize = clusterSizes[index]
            means[index] = UpdateMean(cSize, means[index], item)

            # The item changed cluster
            if index != belongsTo[i]:
                noChange = False
            belongsTo[i] = index

        # Nothing changed: the means have converged
        if noChange:
            break

    return means
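To run CalculateMeans end to end, it needs the helper functions it calls. The sketch below supplies hypothetical implementations of all of them, labeled as assumptions: `FindColMinMax` returns per-column minima and maxima, `InitializeMeans` draws each mean uniformly at random inside the data's bounding box, `UpdateMean` is a running average, and `EuclideanDistance` is plain Euclidean distance. Note that this scheme updates a mean immediately after each assignment (an online variant) rather than recomputing all means at the end of each pass as batch k-means does.

```python
import math
import random

# Hypothetical helper implementations (assumptions, not the tutorial's own code).

def EuclideanDistance(x, y):
    # Straight-line distance between two equal-length vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def FindColMinMax(items):
    # Per-column minimum and maximum over all items
    n = len(items[0])
    cMin = [min(item[j] for item in items) for j in range(n)]
    cMax = [max(item[j] for item in items) for j in range(n)]
    return cMin, cMax

def InitializeMeans(items, k, cMin, cMax):
    # Place k means uniformly at random inside the data's bounding box
    return [[random.uniform(cMin[j], cMax[j]) for j in range(len(cMin))]
            for _ in range(k)]

def UpdateMean(n, mean, item):
    # Fold item into a running mean that now covers n items
    return [m + (x - m) / n for m, x in zip(mean, item)]

def Classify(means, item):
    # Index of the nearest mean (ties go to the lower index)
    distances = [EuclideanDistance(item, mean) for mean in means]
    return distances.index(min(distances))

def CalculateMeans(k, items, maxIterations=100000):
    cMin, cMax = FindColMinMax(items)
    means = InitializeMeans(items, k, cMin, cMax)
    clusterSizes = [0] * k
    belongsTo = [-1] * len(items)  # -1: not yet assigned
    for _ in range(maxIterations):
        noChange = True
        for i, item in enumerate(items):
            index = Classify(means, item)
            clusterSizes[index] += 1
            means[index] = UpdateMean(clusterSizes[index], means[index], item)
            if index != belongsTo[i]:
                noChange = False
                belongsTo[i] = index
        if noChange:
            break
    return means

random.seed(1)  # fixed seed so the run is reproducible
items = [[0.0], [1.0], [9.0], [10.0]]  # four one-feature items
means = CalculateMeans(2, items)
labels = [Classify(means, item) for item in items]
print(means)
print(labels)
```

On this small data set the two low values and the two high values end up in different clusters whatever the random starting means are; only which cluster gets index 0 depends on the seed.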