CMEANS

Fuzzy c-means clustering allows each data point to belong to multiple clusters with varying degrees of membership. This is a soft clustering approach, in contrast to k-means, where each data point belongs to exactly one cluster.

The algorithm minimizes the following objective function:

J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^m ||x_i - c_j||^2

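where N is the number of samples, C is the number of clusters, u_{ij} is the degree of membership of sample x_i in cluster j, c_j is the center of cluster j, and m > 1 is the fuzziness parameter. The memberships of each sample sum to 1 across clusters (\sum_{j=1}^{C} u_{ij} = 1); larger values of m produce softer, more overlapping clusters.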
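To make the objective concrete, the following is a minimal NumPy sketch that evaluates J_m for a given set of centers and memberships. The function name `fcm_objective` and its argument layout are illustrative assumptions, not part of skfuzzy; the data is taken in the S x N layout (features as rows) used by this function.

```python
import numpy as np

def fcm_objective(data, centers, u, m=2.0):
    """Evaluate J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2 (illustrative helper).

    data:    (S, N) array, features as rows and samples as columns
    centers: (C, S) array, one cluster center per row
    u:       (C, N) array of memberships
    """
    x = np.asarray(data, dtype=float)
    c = np.asarray(centers, dtype=float)
    u = np.asarray(u, dtype=float)
    # Squared Euclidean distance from every center to every sample, shape (C, N)
    sq_dist = ((c[:, :, None] - x[None, :, :]) ** 2).sum(axis=1)
    return float(((u ** m) * sq_dist).sum())

# Two samples (1, 1) and (2, 2), a single cluster centered at (1.5, 1.5)
print(fcm_objective([[1, 2], [1, 2]], [[1.5, 1.5]], [[1, 1]], m=2.0))  # 1.0
```

Each sample lies at squared distance 0.5 from the center and has membership 1, so the two terms sum to 1.0.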

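Because memberships are graded rather than all-or-nothing, the method is suited to data with overlapping clusters, and the input may have any number of features. The function returns the cluster centers, the final partition matrix, the final distance matrix, the objective function history, the number of iterations run, and the fuzzy partition coefficient (FPC).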
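The centers, partition matrix, and objective history come out of an alternating update: recompute the centers from the memberships, then recompute the memberships from the distances, until the partition matrix stops changing. The sketch below is a pure-NumPy illustration of that standard loop, not the skfuzzy implementation; the name `fcm_sketch`, the `seed` argument, and the random initialization are assumptions made for the example.

```python
import numpy as np

def fcm_sketch(data, c, m=2.0, error=0.005, maxiter=100, seed=0):
    """Minimal fuzzy c-means loop (illustrative only; assumes maxiter >= 1, m > 1).

    data has shape (S, N): features as rows, samples as columns.
    Returns (centers, u, history, iterations).
    """
    x = np.asarray(data, dtype=float).T            # (N, S), one sample per row
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))                # random initial memberships
    u /= u.sum(axis=0, keepdims=True)              # memberships of a sample sum to 1
    history = []
    for iteration in range(1, maxiter + 1):
        um = u ** m
        # Centers are membership-weighted means of the samples, shape (c, S)
        centers = (um @ x) / um.sum(axis=1, keepdims=True)
        # Euclidean distance from each center to each sample, shape (c, N)
        d = np.linalg.norm(centers[:, None, :] - x[None, :, :], axis=2)
        d = np.fmax(d, np.finfo(float).eps)        # guard against division by zero
        history.append(float((um * d ** 2).sum()))  # objective J_m
        # Membership update: proportional to d^(-2/(m-1)), normalized per sample
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        change = np.linalg.norm(u_new - u)
        u = u_new
        if change < error:                         # stopping criterion
            break
    return centers, u, history, iteration

# Two well-separated groups of three samples each (S = 2 features, N = 6 samples)
data = [[0, 0, 1, 10, 10, 11],
        [0, 1, 0, 10, 11, 10]]
centers, u, history, n_iter = fcm_sketch(data, c=2)
print(centers.round(2), u.argmax(axis=0), n_iter)
```

The order in which the two clusters are reported depends on the random initialization, so results should be compared up to a permutation of the cluster indices.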

Excel Usage

=CMEANS(data, c, m, error, maxiter)
  • data (list[list], required): 2D array of data to be clustered, where rows are features and columns are samples (S x N). Note that this is the transpose of the typical scikit-learn layout (N x S, samples as rows).
  • c (int, required): Desired number of clusters or classes.
  • m (float, required): Fuzziness parameter; the exponent applied to the membership values. Must be greater than 1 (typically 2.0).
  • error (float, required): Stopping criterion; stop early if the change in partition matrix is less than this error (e.g., 0.005).
  • maxiter (int, required): Maximum number of iterations allowed.
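Since many datasets are stored with one sample per row, the data usually needs to be transposed before it is passed in. A minimal sketch, assuming a small made-up dataset with three samples and two features:

```python
import numpy as np

# Hypothetical dataset in the common N x S layout: 3 samples (rows), 2 features (columns)
samples_by_features = np.array([
    [1.0, 1.0],
    [2.0, 2.0],
    [9.0, 8.0],
])

# Transpose to the S x N layout expected here: 2 feature rows, 3 sample columns
features_by_samples = samples_by_features.T

print(features_by_samples.shape)      # (2, 3)
print(features_by_samples.tolist())   # [[1.0, 2.0, 9.0], [1.0, 2.0, 8.0]]
```

In a worksheet, wrapping the input range in Excel's built-in TRANSPOSE function should achieve the same reshaping, though that usage is not shown in the original documentation.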

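Returns (dict): Dictionary of clustering results. Its basic value is the fuzzy partition coefficient (fpc); its properties are the cluster centers (cntr), the final fuzzy partition matrix (u), the final distance matrix (d), the objective function history (jm), the number of iterations run (p), and fpc. If the input is invalid or clustering fails, a string beginning with "Error:" is returned instead.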
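The returned dictionary nests every number inside a {"type": ..., "basicValue": ...} cell, which is awkward to work with directly in Python. The sketch below flattens it back into NumPy arrays and derives a hard label per sample. The helper names `unpack_entity_array` and `unpack_cmeans_result` are hypothetical, and the property names (fpc, cntr, u, d, jm, p) are the ones that appear in the example output and Python code on this page.

```python
import numpy as np

def unpack_entity_array(entity):
    """Turn {"type": "Array", "elements": [[{"basicValue": v}, ...], ...]} into an ndarray."""
    return np.array(
        [[cell["basicValue"] for cell in row] for row in entity["elements"]],
        dtype=float,
    )

def unpack_cmeans_result(result):
    """Flatten the nested result dictionary into plain Python and NumPy values."""
    props = result["properties"]
    u = unpack_entity_array(props["u"])            # (C, N) partition matrix
    return {
        "fpc": float(props["fpc"]["basicValue"]),
        "cntr": unpack_entity_array(props["cntr"]),    # (C, S) cluster centers
        "u": u,
        "d": unpack_entity_array(props["d"]),          # (C, N) distances
        "jm": unpack_entity_array(props["jm"]).ravel(),  # objective history
        "p": int(props["p"]["basicValue"]),            # iterations run
        "labels": u.argmax(axis=0),                    # hard label = largest membership
    }

def _cell(v):
    return {"type": "Double", "basicValue": v}

# Small hand-built result with one cluster and two samples
example = {
    "type": "Double",
    "basicValue": 1.0,
    "properties": {
        "fpc": _cell(1.0),
        "cntr": {"type": "Array", "elements": [[_cell(1.5), _cell(1.5)]]},
        "u": {"type": "Array", "elements": [[_cell(1.0), _cell(1.0)]]},
        "d": {"type": "Array", "elements": [[_cell(0.707107), _cell(0.707107)]]},
        "jm": {"type": "Array", "elements": [[_cell(1.0)]]},
        "p": _cell(1.0),
    },
}

flat = unpack_cmeans_result(example)
print(flat["cntr"].tolist(), flat["labels"].tolist())  # [[1.5, 1.5]] [0, 0]
```

Taking the arg-max over the cluster axis discards the graded memberships, so it is best treated as a convenience for plotting or reporting rather than as the clustering result itself.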

Example 1: Fuzzy c-means clustering of 2D data

Inputs:

| data |   | c | m | error | maxiter |
|------|---|---|---|-------|---------|
| 1    | 2 | 1 | 2 | 0.005 | 2       |
| 1    | 2 |   |   |       |         |

Excel formula:

=CMEANS({1,2;1,2}, 1, 2, 0.005, 2)

Expected output:

{"type":"Double","basicValue":1,"properties":{"fpc":{"type":"Double","basicValue":1},"cntr":{"type":"Array","elements":[[{"type":"Double","basicValue":1.5},{"type":"Double","basicValue":1.5}]]},"u":{"type":"Array","elements":[[{"type":"Double","basicValue":1},{"type":"Double","basicValue":1}]]},"d":{"type":"Array","elements":[[{"type":"Double","basicValue":0.707107},{"type":"Double","basicValue":0.707107}]]},"jm":{"type":"Array","elements":[[{"type":"Double","basicValue":1}]]},"p":{"type":"Double","basicValue":1}}}
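With a single cluster, every membership is 1, so the values in this output can be checked by hand. The following sketch reproduces them with plain NumPy, without calling skfuzzy. The FPC formula used here, trace(u u^T) / N, is the usual definition of the fuzzy partition coefficient and is assumed to match what the library reports.

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [1.0, 2.0]])          # S = 2 features, N = 2 samples: (1, 1) and (2, 2)
m = 2.0
u = np.ones((1, data.shape[1]))        # one cluster, so every membership is 1

um = u ** m
# Center: membership-weighted mean of the samples, shape (1, 2)
cntr = (um @ data.T) / um.sum(axis=1, keepdims=True)
# Distance from the center to each sample, shape (1, 2)
d = np.linalg.norm(cntr[:, :, None] - data[None, :, :], axis=1)
jm = float((um * d ** 2).sum())        # objective function value
fpc = float(np.trace(u @ u.T) / u.shape[1])

print(cntr.tolist())                   # [[1.5, 1.5]]
print(d.round(6).tolist())             # [[0.707107, 0.707107]]
print(round(jm, 6), fpc)               # 1.0 1.0
```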

Python Code

import numpy as np
from skfuzzy import cmeans as fuzz_cmeans

def cmeans(data, c, m, error, maxiter):
    """
    Perform fuzzy c-means clustering on data.

    See: https://pythonhosted.org/scikit-fuzzy/api/skfuzzy.html#skfuzzy.cmeans

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): 2D array of data to be clustered, where rows are features and columns are samples (S x N). Note that this is the transpose of the typical scikit-learn layout (N x S, samples as rows).
        c (int): Desired number of clusters or classes.
        m (float): Fuzziness parameter; the exponent applied to the membership values. Must be greater than 1 (typically 2.0).
        error (float): Stopping criterion; stop early if the change in partition matrix is less than this error (e.g., 0.005).
        maxiter (int): Maximum number of iterations allowed.

    Returns:
        dict: Dictionary of clustering results, including cluster centers and the fuzzy partition matrix.
    """
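    try:
        data_np = np.array(data, dtype=float)
        if data_np.ndim != 2:
            return "Error: data must be a 2D array"
        # The membership update uses the exponent 2 / (m - 1), so m must exceed 1
        if float(m) <= 1:
            return "Error: m must be greater than 1"

        # Excel may pass whole numbers as floats; the counts must be integers
        cntr, u, u0, d, jm, p, fpc = fuzz_cmeans(
            data=data_np,
            c=int(c),
            m=float(m),
            error=float(error),
            maxiter=int(maxiter)
        )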

        return {
            "type": "Double",
            "basicValue": float(fpc),
            "properties": {
                "fpc": {"type": "Double", "basicValue": float(fpc)},
                "cntr": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in cntr]
                },
                "u": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in u]
                },
                "d": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)} for val in row] for row in d]
                },
                "jm": {
                    "type": "Array",
                    "elements": [[{"type": "Double", "basicValue": float(val)}] for val in jm]
                },
                "p": {"type": "Double", "basicValue": float(p)}
            }
        }
    except Exception as e:
        return f"Error: {str(e)}"
