CMEANS
Fuzzy c-means clustering allows each data point to belong to multiple clusters with varying degrees of membership. It is a soft clustering approach, in contrast to k-means, where each data point belongs to exactly one cluster.
The algorithm minimizes the following objective function:
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^m ||x_i - c_j||^2
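where N is the number of samples, C is the number of clusters, u_{ij} is the degree of membership of sample x_i in cluster j, c_j is the center of cluster j, and m > 1 is the fuzziness parameter. The memberships of each sample sum to 1 across clusters, and larger values of m produce softer (more evenly shared) memberships.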
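To make the objective concrete, the following is a minimal NumPy sketch that evaluates J_m and performs one round of the standard alternating updates (centers from memberships, then memberships from centers). The helper names `objective` and `fcm_step` are illustrative and not part of any library. For readability the arrays follow the indices of the formula: `x` is N x S (one sample per row) and `u` is N x C, which differs from the S x N layout the CMEANS function expects.

```python
import numpy as np

def objective(x, u, centers, m):
    """J_m for x of shape (N, S), u of shape (N, C), centers of shape (C, S)."""
    # Squared Euclidean distance between every sample and every center: (N, C)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((u ** m) * d2).sum())

def fcm_step(x, u, m):
    """One alternating update: centers from memberships, then memberships from centers."""
    um = u ** m
    # Each center is the membership-weighted mean of the samples: (C, S)
    centers = (um.T @ x) / um.sum(axis=0)[:, None]
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, np.finfo(float).eps)  # guard against division by zero
    # u_ij is proportional to d_ij^(-2/(m-1)), normalized over the clusters j
    inv = d2 ** (-1.0 / (m - 1.0))
    new_u = inv / inv.sum(axis=1, keepdims=True)
    return centers, new_u

# Hypothetical toy data: two loose groups of two samples each
x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
u = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])

centers, u_new = fcm_step(x, u, 2.0)
j_old_u = objective(x, u, centers, 2.0)      # old memberships, new centers
j_new_u = objective(x, u_new, centers, 2.0)  # updated memberships, same centers
```

For fixed centers the membership update is the minimizer of J_m under the sum-to-one constraint, so `j_new_u` should not exceed `j_old_u`. Repeating `fcm_step` until `u` stops changing gives the basic iteration.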
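The algorithm accepts data with any number of features and, because memberships are graded rather than all-or-nothing, is well suited to overlapping clusters. The function returns the cluster centers, the final partition matrix, the final distance matrix, the objective function history, the number of iterations run, and the fuzzy partition coefficient.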
Excel Usage
=CMEANS(data, c, m, error, maxiter)
- data (list[list], required): 2D array of data to be clustered, where rows are features and columns are samples (S x N). Note that this is the transpose of the typical scikit-learn layout, which is N x S (samples in rows, features in columns).
- c (int, required): Desired number of clusters or classes.
- m (float, required): Exponent applied to the membership values (fuzziness parameter, typically 2.0).
- error (float, required): Stopping criterion; iteration stops early if the change in the partition matrix is less than this value (e.g., 0.005).
- maxiter (int, required): Maximum number of iterations allowed.
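Because the data argument expects features in rows, data stored with one sample per row has to be transposed before it is passed in. A minimal sketch, assuming a hypothetical `samples` array that is not part of the original example:

```python
import numpy as np

# Hypothetical data with one sample per row: 4 samples, 2 features (N x S)
samples = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.2]])

# Transpose to features-in-rows (S x N), the layout the data argument expects
data = samples.T.tolist()
```

In a worksheet, Excel's built-in TRANSPOSE function does the same for a range.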
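Returns (dict): Dictionary of clustering results. Its basic value is the fuzzy partition coefficient (fpc); its properties hold the cluster centers (cntr), the final fuzzy partition matrix (u), the final distance matrix (d), the objective function history (jm), and the number of iterations run (p).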
Example 1: Fuzzy c-means clustering of 2D data
Inputs:
| data | | c | m | error | maxiter |
|---|---|---|---|---|---|
| 1 | 2 | 1 | 2 | 0.005 | 2 |
| 1 | 2 | | | | |
Excel formula:
=CMEANS({1,2;1,2}, 1, 2, 0.005, 2)
Expected output:
{"type":"Double","basicValue":1,"properties":{"fpc":{"type":"Double","basicValue":1},"cntr":{"type":"Array","elements":[[{"type":"Double","basicValue":1.5},{"type":"Double","basicValue":1.5}]]},"u":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1}]]},"d":{"type":"Array","elements":[[{"type":"Double","basicValue":0.707107},{"type":"Double","basicValue":0.707107}]]},"jm":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"p":{"type":"Double","basicValue":1}}}
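The values in this output can be checked by hand, since with a single cluster every membership is 1 regardless of initialization. The sketch below recomputes the center, distances, objective value, and partition coefficient with plain NumPy. It does not call scikit-fuzzy, and it assumes the partition coefficient is the usual trace(u uᵀ) / N.

```python
import numpy as np

# Example data in S x N layout: 2 features (rows), 2 samples (columns)
data = np.array([[1.0, 2.0], [1.0, 2.0]])
m = 2.0

# With a single cluster every membership equals 1, so u has shape (1, N)
u = np.ones((1, data.shape[1]))
um = u ** m

# Center: membership-weighted mean of the samples, shape (1, S)
cntr = um @ data.T / um.sum(axis=1, keepdims=True)

# Euclidean distance from the center to each sample, shape (1, N)
d = np.linalg.norm(data.T[None, :, :] - cntr[:, None, :], axis=2)

# Objective value and partition coefficient (assumed definition, see above)
jm = float((um * d ** 2).sum())
fpc = float(np.trace(u @ u.T) / u.shape[1])
```

The two samples are (1, 1) and (2, 2), so the center is (1.5, 1.5), each distance is √0.5 ≈ 0.707107, and the objective is 0.5 + 0.5 = 1, matching the output above.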
Python Code
import numpy as np
from skfuzzy import cmeans as fuzz_cmeans


def cmeans(data, c, m, error, maxiter):
    """
    Perform fuzzy c-means clustering on data.

    See: https://pythonhosted.org/scikit-fuzzy/api/skfuzzy.html#skfuzzy.cmeans

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of data to be clustered, where rows are features and columns are samples (S x N). Note that this is the transpose of the typical scikit-learn layout, which is N x S.
        c (int): Desired number of clusters or classes.
        m (float): Exponent applied to the membership values (fuzziness parameter, typically 2.0).
        error (float): Stopping criterion; stop early if the change in partition matrix is less than this error (e.g., 0.005).
        maxiter (int): Maximum number of iterations allowed.

    Returns:
        dict: Dictionary of clustering results, including cluster centers and the fuzzy partition matrix.
    """
    try:
        data_np = np.array(data, dtype=float)
        if data_np.ndim != 2:
            return "Error: data must be a 2D array"

        # u0 (the initial partition matrix) is returned by the library but not reported
        cntr, u, u0, d, jm, p, fpc = fuzz_cmeans(
            data=data_np,
            c=c,
            m=m,
            error=error,
            maxiter=maxiter
        )

        # Pack the results as an Excel entity: fpc is the basic value,
        # and every output is also exposed as a named property
        return {
            "type": "Double",
            "basicValue": float(fpc),
            "properties": {
                "fpc": {"type": "Double", "basicValue": float(fpc)},
                "cntr": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in cntr]
                },
                "u": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in u]
                },
                "d": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in d]
                },
                "jm": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)}] for val in jm]
                },
                "p": {"type": "Double", "basicValue": float(p)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
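A common follow-up is to turn the fuzzy partition matrix into hard cluster labels by picking, for each sample, the cluster with the highest membership. The sketch below does this for a dictionary shaped like the one returned above. The helper `hard_labels` and the sample `result` fragment are hypothetical and only illustrate the structure.

```python
import numpy as np

def hard_labels(result):
    """Return the index of the highest-membership cluster for each sample."""
    elements = result["properties"]["u"]["elements"]
    # Rebuild the C x N partition matrix from the nested entity cells
    u = np.array([[cell["basicValue"] for cell in row] for row in elements])
    return u.argmax(axis=0).tolist()

# Hypothetical result fragment: 2 clusters (rows) by 3 samples (columns)
result = {"properties": {"u": {"type": "Array", "elements": [
    [{"type": "Double", "basicValue": 0.9},
     {"type": "Double", "basicValue": 0.3},
     {"type": "Double", "basicValue": 0.2}],
    [{"type": "Double", "basicValue": 0.1},
     {"type": "Double", "basicValue": 0.7},
     {"type": "Double", "basicValue": 0.8}],
]}}}

labels = hard_labels(result)
```

Here the first sample is assigned to cluster 0 and the other two to cluster 1. Taking the argmax discards the membership degrees, so it is best kept as a reporting step rather than a replacement for the partition matrix.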