Clustering model selection matrix

This notebook compares the performance of various clustering models in order to select the optimal model for delineating spatial signatures.

Dimensions

Dimensions along which the models are tested.

Algorithms

  • K-Means

  • K-Medoid

  • SOM

    • different architectures

      • grid dimensions

      • parameter selection

  • GMM

Data normalisation

  • MinMax stretch

  • Standardise

  • RobustScaler?
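The three candidate normalisations can be compared on a toy column with one extreme value. This is a minimal numpy sketch on hypothetical data. The robust variant centres on the median and divides by the interquartile range, which is what scikit-learn's RobustScaler does with its default settings.

```python
import numpy as np

# Hypothetical feature column with one extreme value
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# MinMax stretch: rescale to [0, 1] using the column range
min_max = (x - x.min()) / (x.max() - x.min())

# Standardise: zero mean, unit sample standard deviation (pandas/dask .std() use ddof=1)
standardized = (x - x.mean()) / x.std(ddof=1)

# Robust scaling: centre on the median, divide by the interquartile range
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)

print(min_max.round(3))
print(standardized.round(3))
print(robust.round(3))
```

With MinMax the four ordinary values are squeezed into roughly [0, 0.03]. The robust variant keeps them on a unit scale and leaves the outlier far out.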

Dimensionality reduction

  • PCA

  • t-SNE?

  • K-means with k>200?
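PCA is scale-sensitive, so it would be applied to the standardised data. Below is a minimal scikit-learn sketch on hypothetical synthetic data: 20 features driven by 3 latent factors. It keeps the smallest number of components that explain at least 95% of the variance. The 95% threshold is an illustrative choice, not one taken from this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical data: 1,000 observations, 20 correlated features plus a little noise
latent = rng.normal(size=(1000, 3))
loadings = rng.normal(size=(3, 20))
X = latent @ loadings + 0.1 * rng.normal(size=(1000, 20))

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.n_components_, round(pca.explained_variance_ratio_.sum(), 3))
```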

Number of clusters

  • n -> m

Input data

  • Form

  • Function

  • Form & Function

Comparison

Quantitative data

  • Mean sampled silhouette score

  • Calinski-Harabasz

  • Davies-Bouldin

  • BIC
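The quantitative metrics can be collected for a sweep over the number of clusters. This is a minimal sketch with scikit-learn on hypothetical toy blobs rather than the notebook's data. BIC is defined for probabilistic models, so it is taken from a GaussianMixture fitted with the same k. The silhouette is computed on a random subsample, mirroring the "mean sampled silhouette score".

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical toy data
X, _ = make_blobs(n_samples=2_000, centers=4, random_state=42)

scores = {}
for k in [2, 3, 4, 5, 6]:
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    gmm = GaussianMixture(n_components=k, random_state=42).fit(X)
    scores[k] = {
        # silhouette on a random subsample of observations (higher is better)
        "silhouette": metrics.silhouette_score(X, labels, sample_size=500, random_state=42),
        # ratio of between- to within-cluster dispersion (higher is better)
        "calinski": metrics.calinski_harabasz_score(X, labels),
        # mean similarity of each cluster to its most similar cluster (lower is better)
        "davies": metrics.davies_bouldin_score(X, labels),
        # Bayesian information criterion of the k-component GMM (lower is better)
        "bic": gmm.bic(X),
    }

for k, s in scores.items():
    print(k, {name: round(float(value), 3) for name, value in s.items()})
```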

Qualitative data

  • label frequencies

  • cross tabulation

    • postcode classification

    • modum

    • worldpop

  • N-S, E-W distribution of cluster centers

    • weighted by area?

  • Signatures

    • polygon areas

    • distances between signatures of the same kind

    • number of polygons/components (how many times we see the signature type)

  • maps for a few cities

    • Liverpool

    • Glasgow

    • London

  • clustergram for every algorithm
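The "number of polygons/components" of each signature type can be counted either by dissolving cells by label and exploding the result, or on a contiguity graph. The sketch below uses the graph route. It assumes an edge list of neighbouring cells, e.g. derived from a contiguity matrix, which is not built in this notebook. The helper count_components is hypothetical. It drops edges between cells with different labels and counts connected components per label.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def count_components(labels, edges):
    """Count contiguous patches (components) of each label.

    labels : array of cluster labels, one per cell
    edges  : (m, 2) array of index pairs of neighbouring cells (assumed input)
    """
    labels = np.asarray(labels)
    edges = np.asarray(edges).reshape(-1, 2)
    # keep only edges that join two cells with the same label
    kept = edges[labels[edges[:, 0]] == labels[edges[:, 1]]]
    n = len(labels)
    graph = coo_matrix(
        (np.ones(len(kept)), (kept[:, 0], kept[:, 1])), shape=(n, n)
    ).tocsr()
    # every connected component of the filtered graph is one patch of one label
    _, component = connected_components(graph, directed=False)
    return {
        lab.item(): len(np.unique(component[labels == lab]))
        for lab in np.unique(labels)
    }


# Hypothetical strip of six cells in a row: A A B A A B
labels = np.array(["A", "A", "B", "A", "A", "B"])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
counts = count_components(labels, edges)
print(counts)
```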

First phase

  • Each algorithm on a few similar

Data normalisation

Since we will reuse the normalised/standardised data repeatedly, we perform the transformation once and store the results in chunked parquet files.

import dask.dataframe
form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/convolutions/conv_*.pq")
standardized = (form - form.mean()) / form.std()
%%time
standardized.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")
CPU times: user 3min 43s, sys: 1min 14s, total: 4min 57s
Wall time: 3min 47s
min_max = (form - form.min()) / (form.max() - form.min())
%%time
min_max.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")
CPU times: user 3min 1s, sys: 1min 7s, total: 4min 8s
Wall time: 3min 48s
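Both transformations divide by a per-column statistic. A column that is constant in the data, such as a land use type that never occurs, therefore produces 0/0 = NaN. This is why infinite and missing values are later replaced with 0 before clustering. The pandas sketch below uses a hypothetical two-column frame to show the effect and that clean-up.

```python
import numpy as np
import pandas as pd

# Hypothetical chunk: column "b" is constant
form = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0]})

standardized = (form - form.mean()) / form.std()           # std of "b" is 0 -> NaN
min_max = (form - form.min()) / (form.max() - form.min())  # range of "b" is 0 -> NaN

print(standardized["b"].isna().all(), min_max["b"].isna().all())  # True True

# Infinities -> NaN -> 0, as done before clustering
clean = standardized.replace([np.inf, -np.inf], np.nan).fillna(0)
```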

Harmonize chunks

Some chunks are missing columns because certain land use types are not present in them. We need to harmonize the chunks so that each of them has the same columns.

import pandas as pd
import pyarrow.parquet as pq

path = "../../urbangrammar_samba/spatial_signatures/functional/functional/func_{}.pq"

# collect the union of column names across all 103 chunks (reads schemas only)
columns = set()
for i in range(103):
    schema = pq.read_schema(path.format(i))
    columns.update(schema.names)

# add every missing column filled with zeros and overwrite the chunk
for i in range(103):
    df = pd.read_parquet(path.format(i))
    missing = [c for c in columns if c not in df.columns]
    df[missing] = 0
    df.to_parquet(path.format(i))
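The loop above appends missing columns at the end of each chunk, so the column order can still differ between chunks. An alternative is DataFrame.reindex with fill_value=0, which adds the missing columns and enforces one column order in every chunk. The sketch below uses hypothetical in-memory chunks instead of the parquet files.

```python
import pandas as pd

# Hypothetical chunks; "retail" is absent from the second one, "industry" from the first
chunks = [
    pd.DataFrame({"housing": [1, 2], "retail": [3, 4]}, index=[10, 11]),
    pd.DataFrame({"housing": [5], "industry": [6]}, index=[12]),
]

# Union of column names across all chunks, in a fixed (sorted) order
columns = sorted(set().union(*(df.columns for df in chunks)))

# reindex adds missing columns filled with 0 and applies the same order everywhere;
# the row index is left untouched
harmonized = [df.reindex(columns=columns, fill_value=0) for df in chunks]
```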
%%time
function = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/functional/functional/func_*.pq")
standardized = (function - function.mean()) / function.std()
min_max = (function - function.min()) / (function.max() - function.min())
standardized.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
min_max.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/normalized/")
CPU times: user 5min 35s, sys: 1min 39s, total: 7min 15s
Wall time: 3min 3s

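Copy hindex from the function data to the form data so that each observation carries its hindex. This relies on both datasets storing their rows in the same order.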

stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")
standardized_form['hindex'] = stand_fn.index.values
standardized_form.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")

normalized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")
normalized_form['hindex'] = stand_fn.index.values
normalized_form.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")

Test cases

We use three test cases (number of observations in brackets):

  • Chunk 68 - Glasgow (155,609)

  • Chunk 51 - Merseyside (121,188)

  • Random sample (250,000)

standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/").compute().set_index('hindex')
sample = standardized_form.sample(n=250_000, random_state=42)
sample.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_standardized.pq")
normalized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/").compute().set_index('hindex')
sample_norm = normalized_form.loc[sample.index]
sample_norm.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_normalized.pq")
stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/").compute()
sample_stand_fn = stand_fn.loc[sample.index]
sample_stand_fn.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_standardized.pq")
norm_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/normalized/").compute()
sample_norm_fn = norm_fn.loc[sample.index]
sample_norm_fn.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_normalized.pq")
geoms = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_*.pq").compute().set_index("hindex")
sample_geoms = geoms.loc[sample.index]
sample_geoms.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq")
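The sample is drawn once from the form table with a fixed seed. The same observations are then selected from every other table by label with .loc, which works regardless of row order. The later pd.concat(..., axis=1) also aligns on this shared index. The sketch below shows this on hypothetical tables whose rows are stored in different orders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical tables sharing an hindex-style index
hindex = pd.Index([f"c{i:03d}" for i in range(100)], name="hindex")
form = pd.DataFrame({"area": rng.random(100)}, index=hindex)
# same observations, shuffled row order
function = pd.DataFrame({"pop": rng.random(100)}, index=hindex).sample(frac=1, random_state=1)

# draw the sample from one table ...
sample = form.sample(n=10, random_state=42)
# ... and select the same observations from the other table by label
sample_fn = function.loc[sample.index]

# column-wise concatenation aligns rows on the index
combined = pd.concat([sample, sample_fn], axis=1)
```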

Evaluation

Link auxiliary data to each test case.

import geopandas as gpd
import numpy as np
import pandas as pd
import rasterstats
import rioxarray
import tobler

parts = {}
parts["chunk51"] = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq")
parts["chunk68"] = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq")
parts["sample"] = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq", columns=["tessellation"])
parts["sample"] = gpd.GeoDataFrame(parts["sample"])
parts["sample"]["tessellation"] = gpd.GeoSeries.from_wkb(parts["sample"].tessellation, crs=27700)
parts["sample"] = parts["sample"].set_geometry("tessellation")


# the jochem raster is the same for every part, so open and reproject it only once
foot = rioxarray.open_rasterio("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem.tif")
foot_osgb = foot.rio.reproject("EPSG:27700")

for key, gdf in parts.items():
    # murray: buffer the features by 80 units (square caps) and attach "ward" via tobler's area_join
    murray = gpd.read_file("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray.gpkg", bbox=tuple(gdf.total_bounds))
    murray.geometry = murray.buffer(80, cap_style=3)
    joined = tobler.area_weighted.area_join(murray, gdf, variables=["ward"])
    joined.reset_index()[["hindex", 'ward']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_{key}.pq")

    # modum: attach "CLUSTER_LA" via tobler's area_join
    modum = gpd.read_file("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modumew2016.zip", bbox=tuple(gdf.total_bounds))
    joined = tobler.area_weighted.area_join(modum, gdf, variables=["CLUSTER_LA"])
    joined.reset_index()[["hindex", 'CLUSTER_LA']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_{key}.pq")

    # jochem: raster value at a representative point of each cell
    clipped = foot_osgb.rio.clip_box(*gdf.total_bounds)
    arr = clipped.values
    affine = clipped.rio.transform()
    stats = rasterstats.zonal_stats(
        gdf.representative_point(),
        raster=arr[0],
        affine=affine,
        stats=['mean'],
        nodata=np.nan,
    )
    gdf['jochem'] = [x["mean"] for x in stats]
    gdf.reset_index()[["hindex", 'jochem']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_{key}.pq")
    print(f"Part {key} done.")
/opt/conda/lib/python3.8/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
  for feature in features_lst:
/opt/conda/lib/python3.8/site-packages/tobler/area_weighted/area_join.py:63: UserWarning: Cannot preserve dtype of 'ward'. Falling back to `dtype=object`.
  warnings.warn(
Part sample done.
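As we understand it, tobler's area_join gives each target cell the value of the source polygon it overlaps most. The pure-Python sketch below illustrates that "largest overlap wins" rule. It uses hypothetical axis-aligned rectangles so that intersection areas are easy to compute, and it is not tobler's implementation.

```python
def overlap_area(a, b):
    """Intersection area of two rectangles (minx, miny, maxx, maxy); 0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def largest_overlap_join(sources, source_values, targets):
    """For each target, take the value of the source with the largest overlap.

    Targets that overlap no source get None.
    """
    result = []
    for t in targets:
        areas = [overlap_area(s, t) for s in sources]
        best = max(range(len(sources)), key=areas.__getitem__)
        result.append(source_values[best] if areas[best] > 0 else None)
    return result


sources = [(0, 0, 10, 10), (10, 0, 20, 10)]                # hypothetical wards
values = ["ward_1", "ward_2"]
targets = [(2, 2, 4, 4), (8, 2, 14, 4), (30, 30, 31, 31)]  # hypothetical cells
print(largest_overlap_join(sources, values, targets))      # ['ward_1', 'ward_2', None]
```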
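def evaluation(data, labels, case, identifier, murray, modum, jochem, geom, sample_size=None):
    """Get evaluation metrics for a given clustering

    Parameters:
        data : array
            data the clustering was fitted on
        labels : array
            cluster label of each observation
        case : string {"chunks", "sample"}
            test case; maps of signatures are produced only for "chunks"
        identifier : string
            ID of clustering model, used in the file names of saved maps
        murray, modum, jochem : DataFrame
            validation data, aligned row by row with labels
        geom : GeoDataFrame
            geometry of each observation, aligned row by row with labels
        sample_size : int
            sample size for silhouette_score

    Returns:
        dict of metrics and tables
    """
    from sklearn import metrics
    import numpy as np
    import pandas as pd
    import scipy as sp
    import scipy.stats  # ensures sp.stats is available
    import matplotlib.pyplot as plt
    import contextily as ctx
    import urbangrammar_graphics as ugg
    import dask_geopandas
    from utils.dask_geopandas import dask_dissolve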
    
    
    results = {}
    
    try:
        results['silhouette'] = metrics.silhouette_score(data, labels, sample_size=sample_size, random_state=42)
    except ValueError:
        results['silhouette'] = np.nan
    results['calinski'] = metrics.calinski_harabasz_score(data, labels)
    results['davies'] = metrics.davies_bouldin_score(data, labels)

    results['frequencies'] = pd.Series(labels).value_counts()
    
    # cross tabulation

    modum['labels'] = labels
    mod_crosstab = pd.crosstab(modum.dropna()['labels'], modum.dropna()["CLUSTER_LA"])
    results['mod_chi'], results['mod_p'], results['mod_dof'], results['mod_exp'] = sp.stats.chi2_contingency(mod_crosstab)
    results['mod_cramers_v'] = cramers_v(mod_crosstab)
    results['mod_crosstab'] = mod_crosstab

    murray['labels'] = labels
    mur_crosstab = pd.crosstab(murray.dropna()['labels'], murray.dropna()["ward"])
    results['mur_chi'], results['mur_p'], results['mur_dof'], results['mur_exp'] = sp.stats.chi2_contingency(mur_crosstab)
    results['mur_cramers_v'] = cramers_v(mur_crosstab)
    results['mur_crosstab'] = mur_crosstab


    jochem['labels'] = labels
    joc_crosstab = pd.crosstab(jochem.dropna()['labels'], jochem.dropna()["jochem"])
    results['joc_chi'], results['joc_p'], results['joc_dof'], results['joc_exp'] = sp.stats.chi2_contingency(joc_crosstab)
    results['joc_cramers_v'] = cramers_v(joc_crosstab)
    results['joc_crosstab'] = joc_crosstab
    
    
    if case == "chunks":
        # signatures
        
        geom['labels'] = labels        
        ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
        spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()
        
        results['signature_abundance'] = spsig.labels.value_counts()
        results['signature_areas'] = spsig.area
        
        
        cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
        token = ""
        
        ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
        ax.set_axis_off()

        plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/{identifier}_lpool.png")
        plt.close()   

        ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
        ax.set_axis_off()
        plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/{identifier}_gla.png")
        plt.close()    
    
#     else:
        
#         geom['labels'] = labels        
#         ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
#         spsig = dask_dissolve(ddf, by='labels').compute().reset_index().explode()
#         centroid = spsig.centroid
#         results['x_coords'] = centroid.x
#         results['y_coords ']= centroid.y
        
    return results


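def cramers_v(confusion_matrix):
    """Bias-corrected Cramér's V of a contingency table (DataFrame)."""
    import numpy as np
    import scipy as sp
    import scipy.stats  # ensures sp.stats is available

    chi2 = sp.stats.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()
    phi2 = chi2 / n
    r, k = confusion_matrix.shape
    # correct phi2 and the table dimensions for their bias
    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))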
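The cramers_v helper appears to implement the bias-corrected Cramér's V of Bergsma (2013). Written out, let $\chi^2$ be the statistic returned by chi2_contingency, $n$ the total count, and $r \times k$ the shape of the table. Note that chi2_contingency applies Yates' continuity correction by default when the table has one degree of freedom.

$$\varphi^2 = \frac{\chi^2}{n}, \qquad \tilde{\varphi}^2 = \max\left(0,\ \varphi^2 - \frac{(k-1)(r-1)}{n-1}\right)$$

$$\tilde{r} = r - \frac{(r-1)^2}{n-1}, \qquad \tilde{k} = k - \frac{(k-1)^2}{n-1}, \qquad \tilde{V} = \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k}-1,\ \tilde{r}-1)}}$$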
# Combine form and function data into one table per transformation for both test cases.
for transformation in ["normalized", "standardized"]:
    # chunks 51 (Merseyside) and 68 (Glasgow)
    c51 = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/form/{transformation}/part.51.parquet")
    c68 = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/form/{transformation}/part.68.parquet")
    form = pd.concat([c51, c68]).reset_index(drop=True).drop(columns="hindex")

    c51f = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/function/{transformation}/part.51.parquet")
    c68f = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/function/{transformation}/part.68.parquet")
    fn = pd.concat([c51f, c68f]).reset_index(drop=True)

    # replace infinite and missing values (e.g. from constant columns) with 0
    data = pd.concat([form, fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
    data.to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_{transformation}_data.pq")

    # random sample
    form = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_{transformation}.pq")
    fn = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_{transformation}.pq")
    data = pd.concat([form, fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
    data.to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_{transformation}_data.pq")

Test matrix

# !pip install scikit-learn-extra
# !pip install minisom
from itertools import product
from time import time

import pandas as pd
import numpy as np
import geopandas as gpd

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.mixture import GaussianMixture
from sklearn_extra.cluster import KMedoids
from minisom import MiniSom


labels = {}
times = {}
evaluations = {}
quant_errors = {}
# for case in ["chunks", "sample"]:
for case in ["sample"]:
    
    if case == "chunks":
        mod51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_chunk51.pq")
        mod68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_chunk68.pq")
        modum = pd.concat([mod51, mod68]).reset_index(drop=True)

        mur51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_chunk51.pq")
        mur68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_chunk68.pq")
        murray = pd.concat([mur51, mur68]).reset_index(drop=True)

        joc51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_chunk51.pq")
        joc68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_chunk68.pq")
        jochem = pd.concat([joc51, joc68]).reset_index(drop=True)

        geom51 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq", columns=["tessellation", "hindex"])
        geom68 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq", columns=["tessellation", "hindex"])
        geom = pd.concat([geom51, geom68]).reset_index(drop=True).rename_geometry("geometry")

    else:
        modum = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_sample.pq")
        murray = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_sample.pq")
        jochem = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_sample.pq")
        geom = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq", columns=["tessellation", "hindex"])
        geom = gpd.GeoDataFrame(geom, geometry=gpd.GeoSeries.from_wkb(geom.tessellation, crs=27700))

#     for transformation in ["normalized", "standardized"]:
    for transformation in ["standardized"]:
        
        # load data and prepare numpy.array
        if case == "chunks":
            data = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_{transformation}_data.pq").values
        else:
            data = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_{transformation}_data.pq").values 

        for k in [10, 15, 20, 30]:
            # KMeans
            identifier = f"{case}_{transformation}_KMeans_{k}"
            s = time()
            km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(data)
            times[identifier] = time() - s
            labels_ = km.labels_
            labels[identifier] = labels_
            
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
            
        for k in [10, 15, 20, 30]:
            # MiniBatchKMeans
            identifier = f"{case}_{transformation}_MiniBatchKMeans_{k}"
            s = time()
            km = MiniBatchKMeans(n_clusters=k, batch_size=25_000, n_init=10, random_state=42).fit(data)
            times[identifier] = time() - s
            labels_ = km.labels_
            labels[identifier] = labels_
            
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
            
#         for k in [10, 15, 20, 30]:
#             # K-Medoid
#             identifier = f"{case}_{transformation}_KMedoid_{k}"
#             s = time()
#             km = KMedoids(n_clusters=k, random_state=42).fit(data)
#             times[identifier] = time() - s
#             labels_ = km.labels_
#             labels[identifier] = labels_
            
#             evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            
#             print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
        
        for k in [10, 15, 20, 30]:
            # GMM
            identifier = f"{case}_{transformation}_GMM_{k}"
            s = time()
            gmm = GaussianMixture(n_components=k, n_init=10, random_state=42, covariance_type="full", max_iter=500).fit(data)
            times[identifier] = time() - s
            labels_ = gmm.predict(data)
            labels[identifier] = labels_
            
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")

        # SOM
        for som_shape in [(3, 3), (2, 5), (3, 4), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5)]:
            for (sigma, rate) in product([.01, .1, .25, .5, 1], [.01, .05, .1, .2, .5, 1]):
                identifier = f"{case}_{transformation}_SOM_{som_shape}_sigma-{sigma}_rate-{rate}"
                s = time()
                som = MiniSom(som_shape[0], som_shape[1], data.shape[1], sigma=sigma, learning_rate=rate,
                              topology="hexagonal", random_seed=42)
                som.train_batch(data, 50000, verbose=False)
                winner_coordinates = np.array([som.winner(x) for x in data])
                labels_ = np.apply_along_axis(lambda x: str(tuple(x)), 1, winner_coordinates)
                # record the time for every configuration so that the print below never fails
                times[identifier] = time() - s
                # evaluate only if the SOM splits the data into more than one cluster
                if len(np.unique(labels_)) > 1:
                    labels[identifier] = labels_

                    evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
                    quant_errors[identifier] = som.quantization_error(data)

                print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
sample_standardized_KMeans_10 done. Time to fit the model: 33.980648040771484 seconds.
sample_standardized_KMeans_15 done. Time to fit the model: 36.58458423614502 seconds.
sample_standardized_KMeans_20 done. Time to fit the model: 50.62095856666565 seconds.
sample_standardized_KMeans_30 done. Time to fit the model: 88.40085291862488 seconds.
sample_standardized_MiniBatchKMeans_10 done. Time to fit the model: 17.53959631919861 seconds.
sample_standardized_MiniBatchKMeans_15 done. Time to fit the model: 19.377883672714233 seconds.
sample_standardized_MiniBatchKMeans_20 done. Time to fit the model: 19.961794137954712 seconds.
sample_standardized_MiniBatchKMeans_30 done. Time to fit the model: 27.228529930114746 seconds.
sample_standardized_GMM_10 done. Time to fit the model: 8817.877690076828 seconds.
sample_standardized_GMM_15 done. Time to fit the model: 8004.2522094249725 seconds.
sample_standardized_GMM_20 done. Time to fit the model: 11593.263016223907 seconds.
sample_standardized_GMM_30 done. Time to fit the model: 21957.19268512726 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.041452169418335 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.0424699783325195 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.0597083568573 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.382687568664551 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.059550046920776 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-1 done. Time to fit the model: 6.28273344039917 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.27402925491333 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.091947555541992 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.291349411010742 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.2743775844573975 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.5 done. Time to fit the model: 6.125728130340576 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-1 done. Time to fit the model: 6.091755628585815 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.207939147949219 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.399035930633545 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.149229049682617 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.255995750427246 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.120840549468994 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-1 done. Time to fit the model: 6.245414733886719 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.01 done. Time to fit the model: 5.998126745223999 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.2665159702301025 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.202017545700073 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.271490812301636 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.278396368026733 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-1 done. Time to fit the model: 6.106752872467041 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.01 done. Time to fit the model: 6.008735656738281 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.05 done. Time to fit the model: 6.033384323120117 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.1 done. Time to fit the model: 6.1437554359436035 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.2 done. Time to fit the model: 6.351598739624023 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.5 done. Time to fit the model: 6.337004899978638 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-1 done. Time to fit the model: 6.048030853271484 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.246315956115723 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.338611841201782 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.214588403701782 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.299610137939453 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.38018798828125 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-1 done. Time to fit the model: 6.186944007873535 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.1516852378845215 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.371593952178955 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.331070899963379 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.6916892528533936 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 6.45002007484436 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-1 done. Time to fit the model: 6.447925567626953 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.230453252792358 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.4163124561309814 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.312195301055908 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.496335744857788 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.320337533950806 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-1 done. Time to fit the model: 6.391258239746094 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.227715969085693 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.272166013717651 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.29688572883606 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.512700319290161 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.449718236923218 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-1 done. Time to fit the model: 6.2205281257629395 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.01 done. Time to fit the model: 6.2186102867126465 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.05 done. Time to fit the model: 6.249726295471191 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.1 done. Time to fit the model: 6.2725629806518555 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.2 done. Time to fit the model: 6.410464763641357 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.5 done. Time to fit the model: 6.647894382476807 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-1 done. Time to fit the model: 6.202193737030029 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.466960906982422 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.4868175983428955 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.603813648223877 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.70729660987854 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.687565326690674 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-1 done. Time to fit the model: 6.6126062870025635 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.622690200805664 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.527747392654419 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.1 done. Time to fit the model: 8.404837846755981 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.2 done. Time to fit the model: 9.995216131210327 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.5 done. Time to fit the model: 8.759083271026611 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-1 done. Time to fit the model: 6.665789604187012 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.517737150192261 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.59808087348938 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.585826396942139 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.641624450683594 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.5 done. Time to fit the model: 8.689597129821777 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-1 done. Time to fit the model: 9.790433645248413 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.01 done. Time to fit the model: 8.342492580413818 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.496647596359253 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.587351083755493 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.779898166656494 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.784470319747925 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-1 done. Time to fit the model: 6.452801465988159 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.01 done. Time to fit the model: 6.6474609375 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.05 done. Time to fit the model: 6.5371527671813965 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.1 done. Time to fit the model: 6.913776159286499 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.2 done. Time to fit the model: 6.674077033996582 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.5 done. Time to fit the model: 6.895212173461914 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-1 done. Time to fit the model: 6.550818204879761 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.406692743301392 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.451681852340698 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.651016712188721 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.768008470535278 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.5549585819244385 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-1 done. Time to fit the model: 6.604512453079224 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.488897800445557 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.474110841751099 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.891201496124268 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.849219799041748 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.5 done. Time to fit the model: 7.015116453170776 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-1 done. Time to fit the model: 6.593651533126831 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.737154245376587 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.05 done. Time to fit the model: 8.17149806022644 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.1 done. Time to fit the model: 9.674012899398804 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.2 done. Time to fit the model: 8.839396953582764 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.666623592376709 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-1 done. Time to fit the model: 6.6030731201171875 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.458607912063599 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.374103784561157 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.520724058151245 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.621204376220703 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.6905903816223145 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-1 done. Time to fit the model: 6.467045783996582 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.01 done. Time to fit the model: 6.614234685897827 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.05 done. Time to fit the model: 6.433612108230591 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.1 done. Time to fit the model: 6.642040967941284 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.2 done. Time to fit the model: 6.810717821121216 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.5 done. Time to fit the model: 6.835365056991577 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-1 done. Time to fit the model: 6.449937343597412 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.848511457443237 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.851857423782349 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.921448469161987 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 7.021340370178223 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.932762384414673 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-1 done. Time to fit the model: 7.124608039855957 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.972275733947754 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 7.044497013092041 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 7.2646777629852295 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 8.82405710220337 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 10.416626453399658 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-1 done. Time to fit the model: 9.215104579925537 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 8.639707326889038 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 10.058373928070068 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 8.89886999130249 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 7.006105422973633 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 7.002037286758423 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-1 done. Time to fit the model: 6.9071714878082275 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.878051280975342 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.978433847427368 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 7.020828485488892 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 7.195865869522095 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 7.115154266357422 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-1 done. Time to fit the model: 7.015010356903076 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.01 done. Time to fit the model: 6.9420506954193115 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.05 done. Time to fit the model: 7.148162841796875 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.1 done. Time to fit the model: 7.281223773956299 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.2 done. Time to fit the model: 7.0804712772369385 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.5 done. Time to fit the model: 7.378613233566284 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-1 done. Time to fit the model: 7.000572204589844 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 7.723738193511963 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 7.72091817855835 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 7.813778638839722 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 7.847135543823242 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 8.101349592208862 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-1 done. Time to fit the model: 7.922775506973267 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 7.866520166397095 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 7.924614191055298 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 7.896871566772461 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 7.926555156707764 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 7.869636058807373 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-1 done. Time to fit the model: 8.035306930541992 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 7.94612717628479 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 7.792319059371948 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 7.886402368545532 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 7.945167303085327 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 7.895480155944824 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-1 done. Time to fit the model: 8.203335523605347 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 7.717268705368042 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 7.8255932331085205 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 8.04367995262146 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 9.806514978408813 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 11.029207229614258 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-1 done. Time to fit the model: 9.586603164672852 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.01 done. Time to fit the model: 7.6841535568237305 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.05 done. Time to fit the model: 7.742480993270874 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.1 done. Time to fit the model: 8.154315948486328 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.2 done. Time to fit the model: 9.876065254211426 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.5 done. Time to fit the model: 11.988449573516846 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-1 done. Time to fit the model: 9.684459686279297 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 8.574947834014893 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 8.547278642654419 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 8.799760341644287 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 8.887267827987671 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 8.837449073791504 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-1 done. Time to fit the model: 8.59840703010559 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 8.567549228668213 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 8.490204811096191 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 9.251555919647217 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 8.830044507980347 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 8.744437217712402 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-1 done. Time to fit the model: 10.71109390258789 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 8.963495254516602 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 8.572678327560425 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 8.890587568283081 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 8.855430364608765 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 9.030456781387329 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-1 done. Time to fit the model: 8.97514533996582 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 8.725110054016113 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 8.969297409057617 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 8.507286548614502 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 11.285863161087036 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 10.909534215927124 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-1 done. Time to fit the model: 8.99370288848877 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.01 done. Time to fit the model: 8.653746128082275 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.05 done. Time to fit the model: 8.670263767242432 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.1 done. Time to fit the model: 8.526506185531616 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.2 done. Time to fit the model: 8.697363138198853 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.5 done. Time to fit the model: 9.55558180809021 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-1 done. Time to fit the model: 8.891328811645508 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 11.544363021850586 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 14.227508068084717 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 15.08832836151123 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 14.320567846298218 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 12.988727807998657 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-1 done. Time to fit the model: 12.774047136306763 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 12.655745267868042 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 13.632773637771606 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 13.69734263420105 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 11.825275182723999 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 11.564648151397705 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-1 done. Time to fit the model: 13.11095905303955 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 11.38231635093689 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 10.2324857711792 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 10.075611352920532 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 11.45384669303894 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 13.136953353881836 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-1 done. Time to fit the model: 12.988116979598999 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 13.676304578781128 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 13.65310263633728 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 11.345702409744263 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 10.200783014297485 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 10.253746271133423 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-1 done. Time to fit the model: 9.905344724655151 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.01 done. Time to fit the model: 9.97464895248413 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.05 done. Time to fit the model: 9.808958053588867 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.1 done. Time to fit the model: 9.867895603179932 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.2 done. Time to fit the model: 10.446823596954346 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.5 done. Time to fit the model: 10.370377779006958 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-1 done. Time to fit the model: 11.236787557601929 seconds.
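As the log shows, each model identifier encodes the input (`sample` or `chunks`), the normalisation, the algorithm and its parameters. The sketch below parses those identifiers into columns with a regular expression, so that fit times can be summarised per SOM grid. It assumes the real `times` dict maps identifiers to fit times in seconds. The `parse_option` helper and both regex patterns are hypothetical, and `toy_times` holds three values rounded from the log above.

```python
import re

import pandas as pd

# Hypothetical pattern for identifiers such as
# "sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.2" or "chunks_normalized_KMeans_10"
PATTERN = re.compile(
    r"^(?P<data>sample|chunks)_(?P<scaling>standardized|normalized)_"
    r"(?P<algorithm>[A-Za-z]+)_(?P<params>.+)$"
)
SOM_PARAMS = re.compile(r"^\((\d+), (\d+)\)_sigma-([\d.]+)_rate-([\d.]+)$")


def parse_option(option):
    """Split a model identifier into its parts (hypothetical helper)."""
    match = PATTERN.match(option)
    if match is None:
        raise ValueError(f"unrecognised identifier: {option}")
    parts = match.groupdict()
    som = SOM_PARAMS.match(parts["params"])
    if som:
        rows, cols, sigma, rate = som.groups()
        parts.update(grid=f"{rows}x{cols}", n_clusters=int(rows) * int(cols),
                     sigma=float(sigma), rate=float(rate))
    else:  # KMeans, MiniBatchKMeans and GMM only carry the number of clusters
        parts.update(grid=None, n_clusters=int(parts["params"]), sigma=None, rate=None)
    return parts


# Toy stand-in for the real `times` dict (values rounded from the log above)
toy_times = {
    "sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.2": 6.64,
    "sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.01": 8.34,
    "sample_standardized_SOM_(6, 5)_sigma-1_rate-1": 11.24,
}
table = pd.DataFrame([{**parse_option(op), "seconds": t} for op, t in toy_times.items()])
print(table.groupby("grid")["seconds"].mean())
```

Keeping the parsed parts as columns lets you use `groupby` on normalisation, algorithm or grid instead of repeated `str.contains` filters.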
import pickle
with open("all_data.pickle", "wb") as f:
    pickle.dump((labels, times, evaluations, quant_errors), f)
with open("all_data.pickle", "rb") as f:
    labels, times, evaluations, quant_errors = pickle.load(f)
options = evaluations.keys()
len(options)
1008
one_eval = evaluations[list(options)[0]]
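# Discard degenerate models in which a single cluster holds more than 90% of the observations
useless = []

for op in list(options):
    # number of observations in the `chunks` and `sample` inputs, respectively
    n_obs = 276797 if 'chunks' in op else 250000
    if ((evaluations[op]["frequencies"] / n_obs) > .9).any():
        useless.append(op)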
len(useless)
388
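The filter above drops a model when one cluster holds more than 90% of the observations, and it divides by hard-coded input sizes. The sketch below applies the same rule but takes the total from the label counts themselves. It assumes `frequencies` is a pandas Series of per-cluster counts that sums to the input size. The `is_degenerate` helper is hypothetical.

```python
import pandas as pd


def is_degenerate(frequencies, threshold=0.9):
    """Return True if any single cluster holds more than `threshold` of all observations."""
    shares = frequencies / frequencies.sum()
    return bool((shares > threshold).any())


# Toy label frequencies for two hypothetical models
balanced = pd.Series([400, 350, 250])
dominated = pd.Series([950, 30, 20])
print(is_degenerate(balanced), is_degenerate(dominated))
```

With the real data this would become `useless = [op for op in evaluations if is_degenerate(evaluations[op]["frequencies"])]`. That should reproduce the list above as long as the counts sum to 276797 and 250000.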
options = [o for o in list(evaluations.keys()) if o not in useless]
# op[7:] strips the "chunks_" / "sample_" prefix (both are seven characters long)
silhouettes_chunks = pd.Series(dtype=float)
silhouettes_sample = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        silhouettes_chunks[op[7:]] = evaluations[op]['silhouette']
    else:
        silhouettes_sample[op[7:]] = evaluations[op]['silhouette']
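The same loop is repeated below for the Calinski-Harabasz and Davies-Bouldin scores. The sketch below gathers all three metrics in one pass into a single DataFrame and splits the input type into an index level. It assumes, as `one_eval.keys()` shows further down, that each entry of `evaluations` has `silhouette`, `calinski` and `davies` keys. The model names and values in `toy_evaluations` are made up.

```python
import pandas as pd

METRICS = ("silhouette", "calinski", "davies")

# Toy stand-in for the real `evaluations` dict (hypothetical model names and values)
toy_evaluations = {
    "chunks_model_a": {"silhouette": 0.02, "calinski": 9000.0, "davies": 1.5},
    "sample_model_a": {"silhouette": 0.03, "calinski": 6000.0, "davies": 1.4},
    "sample_model_b": {"silhouette": 0.01, "calinski": 5000.0, "davies": 1.8},
}

# One row per model, one column per metric
metric_table = pd.DataFrame.from_dict(
    {op: {m: ev[m] for m in METRICS} for op, ev in toy_evaluations.items()},
    orient="index",
)
# Turn the "chunks_"/"sample_" prefix into the first level of a MultiIndex
metric_table.index = pd.MultiIndex.from_tuples(
    [tuple(op.split("_", 1)) for op in metric_table.index], names=["data", "model"]
)
sample_silhouettes = metric_table.xs("sample", level="data")["silhouette"].sort_values()
print(sample_silhouettes)
```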
silhouettes_chunks.sort_values()[:40]
standardized_SOM_(5, 5)_sigma-1_rate-1         -0.024278
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01    -0.017866
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01   -0.017866
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01   -0.017779
standardized_SOM_(3, 4)_sigma-1_rate-0.5       -0.017356
standardized_SOM_(2, 5)_sigma-1_rate-0.5       -0.015143
standardized_SOM_(3, 4)_sigma-1_rate-0.2       -0.015097
standardized_SOM_(3, 4)_sigma-1_rate-1         -0.014982
standardized_SOM_(6, 5)_sigma-1_rate-1         -0.012165
standardized_SOM_(2, 6)_sigma-1_rate-0.5       -0.007150
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01    -0.005978
standardized_SOM_(3, 5)_sigma-1_rate-1         -0.005481
standardized_SOM_(2, 6)_sigma-1_rate-1         -0.004917
standardized_SOM_(6, 5)_sigma-1_rate-0.2       -0.004576
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01   -0.003037
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01    -0.003037
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01   -0.003037
standardized_SOM_(4, 5)_sigma-1_rate-1         -0.002892
standardized_SOM_(5, 5)_sigma-1_rate-0.5        0.000401
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     0.001457
standardized_SOM_(4, 5)_sigma-1_rate-0.5        0.001654
standardized_SOM_(3, 5)_sigma-1_rate-0.5        0.001845
standardized_SOM_(6, 5)_sigma-1_rate-0.5        0.002844
standardized_SOM_(2, 6)_sigma-1_rate-0.1        0.004279
standardized_SOM_(4, 5)_sigma-1_rate-0.05       0.005213
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01    0.009532
standardized_SOM_(4, 5)_sigma-0.1_rate-0.01     0.009661
standardized_SOM_(4, 5)_sigma-0.01_rate-0.01    0.009661
standardized_SOM_(2, 5)_sigma-1_rate-1          0.009980
standardized_SOM_(5, 5)_sigma-1_rate-0.1        0.010408
standardized_SOM_(3, 3)_sigma-1_rate-0.1        0.012268
standardized_SOM_(6, 5)_sigma-1_rate-0.1        0.013664
standardized_SOM_(2, 5)_sigma-1_rate-0.2        0.013981
standardized_SOM_(6, 5)_sigma-1_rate-0.05       0.014087
standardized_SOM_(3, 3)_sigma-1_rate-0.5        0.015501
standardized_GMM_20                             0.015964
standardized_SOM_(3, 3)_sigma-1_rate-1          0.016246
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    0.016587
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     0.016587
standardized_SOM_(5, 5)_sigma-1_rate-0.05       0.017124
dtype: float64
silhouettes_sample.sort_values()[:20]
standardized_MiniBatchKMeans_15                 0.003868
standardized_MiniBatchKMeans_30                 0.006452
standardized_GMM_30                             0.006930
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    0.007282
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     0.007616
standardized_SOM_(6, 5)_sigma-1_rate-0.01       0.011388
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    0.012175
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     0.012175
standardized_SOM_(4, 5)_sigma-1_rate-0.01       0.012226
standardized_SOM_(5, 5)_sigma-1_rate-0.01       0.012382
normalized_GMM_30                               0.012945
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01     0.013606
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01    0.013606
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01    0.015820
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     0.018615
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01     0.020928
standardized_SOM_(5, 5)_sigma-1_rate-0.05       0.024459
normalized_GMM_20                               0.024881
standardized_SOM_(6, 5)_sigma-1_rate-0.05       0.026259
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01    0.026938
dtype: float64
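The lowest silhouette scores above are close to zero, and some chunk-based SOMs score slightly below zero. A value near zero means observations are, on average, about as far from their own cluster as from the nearest other cluster. A negative value means they sit closer to another cluster. This section does not show how the scores were computed. For reference, the NumPy-only sketch below computes the mean silhouette on a random subsample. scikit-learn's `silhouette_score` computes the same quantity when given `sample_size`. Treat it as an illustration, not the notebook's implementation.

```python
import numpy as np


def sampled_silhouette(X, labels, sample_size=1000, seed=0):
    """Mean silhouette coefficient over a random subsample of the rows of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    X, labels = np.asarray(X, dtype=float)[idx], np.asarray(labels)[idx]
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters in the subsample")
    # Pairwise Euclidean distances via |x|^2 + |y|^2 - 2xy, avoiding an (n, n, d) array
    sq = (X ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0))
    np.fill_diagonal(dist, 0)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # points alone in their cluster keep a silhouette of 0
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(0, 0.1, (200, 2)), rng.normal(5, 0.1, (200, 2))])
truth = np.repeat([0, 1], 200)
print(sampled_silhouette(blobs, truth, sample_size=300))  # close to 1: well separated
print(sampled_silhouette(blobs, rng.permutation(truth), sample_size=300))  # close to 0: random labels
```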
calinski_chunks = pd.Series(dtype=float)
calinski_sample = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        calinski_chunks[op[7:]] = evaluations[op]['calinski']
    else:
        calinski_sample[op[7:]] = evaluations[op]['calinski']
calinski_chunks.sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     6959.598231
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    6959.598231
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    6960.295718
standardized_SOM_(6, 5)_sigma-1_rate-1          6990.208635
standardized_SOM_(5, 5)_sigma-0.01_rate-1       7421.136225
standardized_SOM_(5, 5)_sigma-0.1_rate-1        7421.136225
standardized_SOM_(5, 5)_sigma-0.25_rate-1       7424.063839
standardized_SOM_(6, 5)_sigma-0.1_rate-1        7430.500504
standardized_SOM_(6, 5)_sigma-0.01_rate-1       7430.500504
standardized_SOM_(6, 5)_sigma-0.25_rate-1       7433.264305
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     7748.246318
standardized_SOM_(5, 5)_sigma-1_rate-1          7825.409446
standardized_SOM_(4, 5)_sigma-1_rate-1          8107.615230
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01    8112.822740
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01     8112.822740
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01    8112.861026
standardized_SOM_(6, 5)_sigma-1_rate-0.1        8151.871415
standardized_SOM_(6, 5)_sigma-1_rate-0.05       8325.496573
standardized_SOM_(6, 5)_sigma-1_rate-0.5        8435.576391
standardized_SOM_(6, 5)_sigma-1_rate-0.2        8531.387056
standardized_SOM_(5, 5)_sigma-1_rate-0.5        8637.871332
standardized_SOM_(6, 5)_sigma-1_rate-0.01       8692.361927
standardized_SOM_(5, 5)_sigma-1_rate-0.05       8757.975202
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     8880.918331
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    8880.918331
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    8889.270389
standardized_SOM_(6, 5)_sigma-0.5_rate-1        8916.942159
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5     9182.328743
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5     9191.094510
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5      9191.094510
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5      9257.700244
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     9268.506869
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      9268.506869
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     9270.206260
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      9272.037507
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     9272.037507
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     9272.245939
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     9293.160804
standardized_SOM_(3, 4)_sigma-1_rate-1          9301.033645
standardized_SOM_(3, 4)_sigma-1_rate-0.5        9387.162748
dtype: float64
calinski_sample.sort_values()[:20]
standardized_SOM_(6, 5)_sigma-1_rate-0.01       4824.010917
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    4889.160249
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     4889.160249
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    4915.344589
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     4980.880058
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     5014.890213
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      5014.890213
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5     5014.890213
standardized_MiniBatchKMeans_30                 5134.759793
standardized_SOM_(6, 5)_sigma-1_rate-0.05       5153.592227
standardized_SOM_(6, 5)_sigma-0.5_rate-1        5286.382837
standardized_SOM_(6, 5)_sigma-1_rate-0.1        5364.270909
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5      5555.249455
standardized_SOM_(5, 5)_sigma-1_rate-0.01       5559.596842
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01    5678.900502
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01     5678.900502
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01    5739.259915
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5     5772.861153
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5      5772.861153
standardized_SOM_(6, 5)_sigma-1_rate-0.2        5774.465544
dtype: float64
davies_chunks = pd.Series(dtype=float)
davies_sample = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        davies_chunks[op[7:]] = evaluations[op]['davies']
    else:
        davies_sample[op[7:]] = evaluations[op]['davies']
davies_chunks.sort_values()[:20]
standardized_SOM_(3, 4)_sigma-0.5_rate-0.2      1.327454
standardized_SOM_(3, 4)_sigma-0.5_rate-0.5      1.328461
standardized_SOM_(3, 5)_sigma-0.5_rate-1        1.336137
standardized_SOM_(3, 5)_sigma-0.1_rate-0.2      1.351700
standardized_SOM_(3, 5)_sigma-0.01_rate-0.2     1.351700
standardized_SOM_(3, 5)_sigma-0.25_rate-0.2     1.354574
standardized_SOM_(2, 6)_sigma-0.5_rate-1        1.363487
standardized_SOM_(2, 5)_sigma-0.5_rate-0.5      1.381653
standardized_SOM_(3, 4)_sigma-0.5_rate-1        1.393961
standardized_SOM_(2, 5)_sigma-0.5_rate-1        1.394819
standardized_SOM_(4, 5)_sigma-0.5_rate-1        1.411874
normalized_SOM_(2, 6)_sigma-0.5_rate-0.01       1.426189
standardized_SOM_(2, 6)_sigma-0.5_rate-0.2      1.426299
standardized_SOM_(5, 5)_sigma-0.25_rate-1       1.437903
standardized_SOM_(5, 5)_sigma-0.01_rate-1       1.440224
standardized_SOM_(5, 5)_sigma-0.1_rate-1        1.440224
standardized_SOM_(3, 5)_sigma-0.5_rate-0.2      1.448240
standardized_SOM_(3, 3)_sigma-0.5_rate-0.01     1.469553
standardized_SOM_(3, 3)_sigma-0.1_rate-0.01     1.478125
standardized_SOM_(3, 3)_sigma-0.01_rate-0.01    1.478125
dtype: float64
davies_sample.sort_values()[:20]
normalized_SOM_(5, 5)_sigma-0.25_rate-1        0.803414
normalized_SOM_(6, 5)_sigma-0.25_rate-1        0.803414
normalized_SOM_(3, 4)_sigma-0.25_rate-1        0.962934
normalized_SOM_(2, 5)_sigma-0.25_rate-1        0.962934
normalized_SOM_(4, 5)_sigma-0.25_rate-1        0.962934
normalized_SOM_(3, 5)_sigma-0.25_rate-1        0.962934
normalized_SOM_(2, 6)_sigma-0.25_rate-1        0.962934
standardized_SOM_(6, 5)_sigma-0.5_rate-1       1.144874
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5    1.193206
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5    1.198918
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5     1.198918
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5    1.198918
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5    1.212181
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5     1.212181
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5    1.212815
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5    1.212815
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5     1.212815
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5     1.264575
normalized_SOM_(3, 3)_sigma-0.25_rate-1        1.308968
normalized_SOM_(3, 4)_sigma-0.5_rate-0.05      1.345703
dtype: float64
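The three indices point in different directions. Higher silhouette and Calinski-Harabasz scores indicate better-separated clusters, while a lower Davies-Bouldin index is better. The ascending sorts above therefore list the weakest models first for the first two metrics and the strongest first for the third. The sketch below shows one possible way to combine them into a single ranking by averaging per-metric ranks. This is an illustrative choice, not necessarily the project's selection rule. It assumes a DataFrame with one row per model and the hypothetical column names `silhouette`, `calinski` and `davies`.

```python
import pandas as pd

# Direction of each metric: True if higher is better
HIGHER_IS_BETTER = {"silhouette": True, "calinski": True, "davies": False}


def mean_rank(metrics):
    """Average per-metric rank (1 = best) across the metrics in HIGHER_IS_BETTER."""
    ranks = pd.DataFrame({
        name: metrics[name].rank(ascending=not higher)
        for name, higher in HIGHER_IS_BETTER.items()
    })
    return ranks.mean(axis=1).sort_values()


# Toy metric table for three hypothetical models
toy_metrics = pd.DataFrame(
    {"silhouette": [0.05, 0.02, 0.10],
     "calinski": [9000.0, 7000.0, 9500.0],
     "davies": [1.4, 1.8, 1.3]},
    index=["model_a", "model_b", "model_c"],
)
print(mean_rank(toy_metrics))
```

Rank averaging avoids mixing scales: Calinski-Harabasz values in the thousands would otherwise dominate a raw sum.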
one_eval.keys()
dict_keys(['silhouette', 'calinski', 'davies', 'frequencies', 'mod_chi', 'mod_p', 'mod_dof', 'mod_exp', 'mod_cramers_', 'mod_crosstab', 'mur_chi', 'mur_p', 'mur_dof', 'mur_exp', 'mur_cramers_v', 'mur_crosstab', 'joc_chi', 'joc_p', 'joc_dof', 'joc_exp', 'joc_cramers_v', 'joc_crosstab', 'signature_abundance', 'signature_areas'])
fragmentation = pd.Series(dtype="int64")

for op in options:
    if 'chunks' in op:
        fragmentation[op[7:]] = evaluations[op]['signature_abundance'].sum()
fragmentation_area = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        fragmentation_area[op[7:]] = evaluations[op]['signature_areas'].median()
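Summing `signature_abundance` gives a fragmentation measure. It appears to count how many separate contiguous patches each signature type breaks into, which matches the "number of polygons/components" criterion listed at the top of the notebook. This section does not show how those counts were derived. The sketch below counts patches per cluster label using union-find over edges that join same-label neighbours. It assumes spatial units with a known list of adjacent pairs, for example taken from a contiguity weights matrix. The `count_patches` helper is hypothetical.

```python
from collections import Counter


def count_patches(labels, edges):
    """Count contiguous patches per label; `edges` are (i, j) pairs of adjacent units."""
    parent = list(range(len(labels)))

    def find(i):
        # Walk up to the root, halving the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if labels[i] == labels[j]:  # only merge neighbours of the same signature type
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    roots = {find(i) for i in range(len(labels))}
    return Counter(labels[r] for r in roots)


# Toy example: six units in a row, each adjacent to the next
toy_labels = ["a", "a", "b", "a", "b", "b"]
toy_edges = [(i, i + 1) for i in range(5)]
print(count_patches(toy_labels, toy_edges))  # a: 2 patches, b: 2 patches
```

Summing the returned counts, as the cell above does with `.sum()`, gives the total number of patches for one model.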
fragmentation.loc[fragmentation.index.str.contains('stand')].sort_values()[:50]
standardized_SOM_(3, 3)_sigma-0.1_rate-0.05      838
standardized_SOM_(3, 3)_sigma-0.01_rate-0.05     838
standardized_SOM_(3, 3)_sigma-0.25_rate-0.05     840
standardized_SOM_(2, 5)_sigma-0.5_rate-1         908
standardized_SOM_(3, 3)_sigma-0.5_rate-0.01      927
standardized_SOM_(3, 3)_sigma-0.25_rate-0.01     931
standardized_SOM_(2, 5)_sigma-0.1_rate-0.05      937
standardized_SOM_(2, 5)_sigma-0.01_rate-0.05     937
standardized_SOM_(2, 6)_sigma-0.5_rate-1         940
standardized_SOM_(2, 5)_sigma-0.25_rate-0.05     940
standardized_SOM_(3, 3)_sigma-0.1_rate-0.01      941
standardized_SOM_(3, 3)_sigma-0.01_rate-0.01     941
standardized_SOM_(3, 4)_sigma-0.5_rate-1         953
standardized_SOM_(3, 4)_sigma-0.5_rate-0.2       964
standardized_SOM_(2, 6)_sigma-0.5_rate-0.2       967
standardized_SOM_(2, 5)_sigma-0.01_rate-0.1     1023
standardized_SOM_(2, 5)_sigma-0.1_rate-0.1      1023
standardized_SOM_(2, 5)_sigma-0.25_rate-0.1     1023
standardized_SOM_(3, 5)_sigma-0.5_rate-1        1029
standardized_SOM_(3, 4)_sigma-0.25_rate-0.1     1040
standardized_SOM_(2, 6)_sigma-0.1_rate-0.1      1040
standardized_SOM_(3, 4)_sigma-0.01_rate-0.1     1040
standardized_SOM_(2, 6)_sigma-0.25_rate-0.1     1040
standardized_SOM_(2, 6)_sigma-0.01_rate-0.1     1040
standardized_SOM_(3, 4)_sigma-0.1_rate-0.1      1040
standardized_SOM_(3, 3)_sigma-0.25_rate-0.1     1101
standardized_SOM_(3, 3)_sigma-0.01_rate-0.1     1103
standardized_SOM_(3, 3)_sigma-0.1_rate-0.1      1103
standardized_SOM_(3, 5)_sigma-0.25_rate-0.2     1146
standardized_SOM_(2, 5)_sigma-0.5_rate-0.5      1147
standardized_SOM_(3, 5)_sigma-0.1_rate-0.2      1148
standardized_SOM_(3, 5)_sigma-0.01_rate-0.2     1148
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01    1151
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01     1151
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01    1151
standardized_SOM_(3, 3)_sigma-0.5_rate-0.1      1185
standardized_SOM_(3, 3)_sigma-0.5_rate-0.05     1188
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01     1203
standardized_SOM_(2, 5)_sigma-0.5_rate-0.05     1203
standardized_SOM_(3, 4)_sigma-0.5_rate-0.5      1218
standardized_SOM_(2, 5)_sigma-0.5_rate-0.1      1219
standardized_SOM_(3, 4)_sigma-0.5_rate-0.1      1248
standardized_SOM_(2, 6)_sigma-0.25_rate-0.05    1328
standardized_SOM_(3, 4)_sigma-0.01_rate-0.05    1329
standardized_SOM_(3, 4)_sigma-0.1_rate-0.05     1329
standardized_SOM_(3, 4)_sigma-0.25_rate-0.05    1329
standardized_SOM_(2, 6)_sigma-0.1_rate-0.05     1329
standardized_SOM_(2, 6)_sigma-0.01_rate-0.05    1329
standardized_SOM_(3, 5)_sigma-0.5_rate-0.2      1353
standardized_SOM_(5, 5)_sigma-0.25_rate-1       1354
dtype: int64
fragmentation.loc[fragmentation.index.str.contains('KMeans')]
normalized_KMeans_10               2547
normalized_KMeans_15               3197
normalized_KMeans_20               3771
normalized_KMeans_30               4779
normalized_MiniBatchKMeans_10      2364
normalized_MiniBatchKMeans_15      3414
normalized_MiniBatchKMeans_20      4120
normalized_MiniBatchKMeans_30      5145
standardized_KMeans_10             1661
standardized_KMeans_15             1994
standardized_KMeans_20             2365
standardized_KMeans_30             3174
standardized_MiniBatchKMeans_10    1654
standardized_MiniBatchKMeans_15    2163
standardized_MiniBatchKMeans_20    2735
standardized_MiniBatchKMeans_30    3389
dtype: int64
fragmentation.loc[fragmentation.index.str.contains('GMM')]
normalized_GMM_10      1409
normalized_GMM_15      1453
normalized_GMM_20      1229
normalized_GMM_30      1379
standardized_GMM_10    1520
standardized_GMM_15    1373
standardized_GMM_20    1531
standardized_GMM_30    1369
dtype: int64
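The `fragmentation` counts above appear to record how many separate polygons (contiguous patches) the signatures of each option break into. Fewer patches mean more coherent, less fragmented signatures, which is why `fragmentation_count` is ranked in ascending order when the options are scored. The sketch below is a simplified, hypothetical stand-in for that count. It uses a small raster grid instead of tessellation cells and rook (edge) adjacency instead of shared polygon boundaries, and it counts patches with a breadth-first flood fill. `count_patches` is an illustrative helper, not part of the notebook.

```python
from collections import deque


def count_patches(grid):
    """Count 4-connected patches of equal labels in a 2D grid (list of lists).

    Hypothetical helper: grid cells stand in for tessellation cells and
    rook adjacency stands in for shared polygon edges.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Unvisited cell: start a new patch and flood-fill all connected cells with the same label.
            patches += 1
            label = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and grid[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return patches


compact = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
scattered = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [2, 2, 2, 2],
]
print(count_patches(compact))    # 3
print(count_patches(scattered))  # 9
```

Both grids use the same three labels, but the checkerboard pattern in `scattered` turns every 0 and 1 cell into its own patch, which is the kind of behaviour a high fragmentation count penalises.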
fragmentation_area.loc[fragmentation_area.index.str.contains('stand')].sort_values(ascending=False)[:40]
standardized_GMM_30                             2700.965074
standardized_SOM_(2, 6)_sigma-1_rate-0.1        2391.975084
standardized_SOM_(2, 5)_sigma-1_rate-0.05       2374.398935
standardized_SOM_(3, 3)_sigma-1_rate-0.1        2368.246530
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2      2356.992203
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    2344.947563
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     2344.947563
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    2307.287479
standardized_SOM_(3, 5)_sigma-1_rate-0.01       2284.809471
standardized_SOM_(4, 5)_sigma-0.1_rate-0.1      2281.160246
standardized_SOM_(4, 5)_sigma-0.25_rate-0.1     2281.160246
standardized_SOM_(4, 5)_sigma-0.01_rate-0.1     2281.160246
standardized_SOM_(2, 5)_sigma-1_rate-0.2        2270.651929
standardized_SOM_(3, 3)_sigma-1_rate-0.2        2266.443011
standardized_MiniBatchKMeans_30                 2244.820890
standardized_GMM_15                             2239.217112
standardized_SOM_(2, 6)_sigma-1_rate-0.01       2225.597443
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2     2211.483474
standardized_SOM_(3, 3)_sigma-1_rate-0.5        2209.690410
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      2208.273922
standardized_SOM_(4, 5)_sigma-1_rate-0.2        2181.945189
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    2181.621606
standardized_KMeans_15                          2178.768019
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01     2172.631353
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05    2171.944870
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05     2171.944870
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01    2167.586785
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01     2167.586785
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01    2167.586785
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2     2162.585085
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2      2162.585085
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01     2160.063375
standardized_GMM_20                             2150.874923
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    2146.214265
standardized_KMeans_30                          2141.359582
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     2135.201701
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     2131.123645
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      2131.123645
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    2129.098278
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     2129.098278
dtype: float64
evaluations["chunks_standardized_GMM_30"]['frequencies']
17    24262
18    21483
12    20069
28    19235
13    17831
5     17506
11    16281
6     16210
3     15925
27    13116
19    12653
1      8949
15     8100
16     8015
9      7612
2      7590
14     6497
0      6306
24     5686
23     5089
29     4582
20     3989
21     3505
26     3180
4      1669
25      610
8       361
10      221
22      179
7        86
dtype: int64
postcode = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        # op[7:] drops the "chunks_" prefix from the option name
        postcode[op[7:]] = evaluations[op]['mur_cramers_v']
postcode.sort_values()[-20:]
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    0.193043
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    0.193126
standardized_SOM_(6, 5)_sigma-1_rate-0.01       0.193576
standardized_GMM_15                             0.195398
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      0.195544
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      0.195676
normalized_GMM_20                               0.196112
standardized_SOM_(6, 5)_sigma-1_rate-0.05       0.197000
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     0.197602
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    0.197952
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    0.198055
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     0.198055
standardized_GMM_20                             0.200421
normalized_KMeans_30                            0.201362
standardized_KMeans_20                          0.202131
normalized_MiniBatchKMeans_30                   0.204752
normalized_GMM_30                               0.204883
standardized_MiniBatchKMeans_30                 0.210613
standardized_KMeans_30                          0.214844
standardized_GMM_30                             0.215383
dtype: float64
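The `postcode` series (and the `jochem` and `modum` series built the same way) holds Cramér's V between each clustering and an external reference classification, read from the precomputed `evaluations`. Cramér's V measures association in a contingency table on a 0 to 1 scale, `V = sqrt(chi2 / (n * (min(r, c) - 1)))`, where `r` and `c` are the numbers of rows and columns. How `evaluations` computed it is not shown in this section, so the sketch below is an assumption: an uncorrected version built from a pandas cross-tabulation, without any bias correction. `cramers_v` is an illustrative helper.

```python
import numpy as np
import pandas as pd


def cramers_v(labels_a, labels_b):
    """Uncorrected Cramér's V between two categorical label arrays."""
    table = pd.crosstab(pd.Series(labels_a), pd.Series(labels_b)).to_numpy().astype(float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins divided by n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


clusters = [0, 0, 1, 1, 2, 2]
reference = ["a", "a", "b", "b", "c", "c"]
print(round(cramers_v(clusters, reference), 6))  # 1.0: each cluster maps onto one reference class
```

Because V is normalised by the smaller table dimension, values remain comparable across options with 10, 15, 20 or 30 clusters, although the uncorrected statistic tends to be inflated for larger tables.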
jochem = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        # op[7:] drops the "chunks_" prefix from the option name
        jochem[op[7:]] = evaluations[op]['joc_cramers_v']
jochem.sort_values()[-20:]
standardized_SOM_(6, 5)_sigma-1_rate-0.05      0.293012
standardized_SOM_(5, 5)_sigma-1_rate-0.5       0.293537
standardized_SOM_(6, 5)_sigma-1_rate-0.2       0.293610
normalized_MiniBatchKMeans_30                  0.293773
standardized_GMM_15                            0.295652
standardized_SOM_(6, 5)_sigma-1_rate-0.01      0.295684
standardized_SOM_(5, 5)_sigma-1_rate-0.01      0.296311
standardized_SOM_(6, 5)_sigma-1_rate-0.5       0.296498
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05    0.299157
normalized_GMM_30                              0.299246
standardized_MiniBatchKMeans_15                0.299745
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5     0.302494
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2     0.304802
standardized_GMM_20                            0.308456
standardized_MiniBatchKMeans_20                0.311887
standardized_KMeans_15                         0.313963
standardized_KMeans_20                         0.315366
standardized_GMM_30                            0.320287
standardized_KMeans_30                         0.326838
standardized_MiniBatchKMeans_30                0.329180
dtype: float64
modum = pd.Series(dtype=float)

for op in options:
    if 'chunks' in op:
        # op[7:] drops the "chunks_" prefix from the option name
        modum[op[7:]] = evaluations[op]['mod_cramers_']
modum.sort_values()[-20:]
standardized_SOM_(5, 5)_sigma-1_rate-0.1        0.298132
standardized_SOM_(4, 5)_sigma-1_rate-0.5        0.298227
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     0.301453
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      0.301453
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     0.301463
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     0.301649
standardized_SOM_(5, 5)_sigma-1_rate-0.5        0.302658
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    0.303264
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     0.303293
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    0.303293
standardized_KMeans_30                          0.304986
standardized_GMM_30                             0.305577
standardized_SOM_(4, 5)_sigma-1_rate-0.05       0.305997
standardized_SOM_(6, 5)_sigma-1_rate-0.05       0.307816
standardized_SOM_(6, 5)_sigma-1_rate-0.1        0.308840
standardized_SOM_(5, 5)_sigma-1_rate-0.01       0.308846
standardized_SOM_(6, 5)_sigma-1_rate-0.01       0.311538
standardized_MiniBatchKMeans_30                 0.311836
standardized_SOM_(6, 5)_sigma-1_rate-0.2        0.312026
standardized_SOM_(6, 5)_sigma-1_rate-0.5        0.316889
dtype: float64
# For each metric, rank all options from 1 to n (rank 1 = first entry after sorting);
# summing the ranks gives an overall score where lower is better.
n = len(modum)
score = pd.DataFrame(index=modum.index)
score["modum"] = pd.Series(range(1, n + 1), index=modum.sort_values(ascending=False).index)
score["postcode_class"] = pd.Series(range(1, n + 1), index=postcode.sort_values(ascending=False).index)
score["jochem"] = pd.Series(range(1, n + 1), index=jochem.sort_values(ascending=False).index)
score["fragmentation_count"] = pd.Series(range(1, n + 1), index=fragmentation.sort_values(ascending=True).index)
score["fragmentation_area"] = pd.Series(range(1, n + 1), index=fragmentation_area.sort_values(ascending=False).index)
score["davies"] = pd.Series(range(1, n + 1), index=davies_chunks.sort_values(ascending=True).index)
score["silhouette"] = pd.Series(range(1, n + 1), index=silhouettes_chunks.sort_values(ascending=True).index)
score["calinski"] = pd.Series(range(1, n + 1), index=calinski_chunks.sort_values(ascending=True).index)
score["total"] = score.sum(axis=1)

# Partial rank sums per family of criteria
score["comparative"] = score.modum + score.postcode_class + score.jochem
score["internal"] = score.davies + score.silhouette + score.calinski
score["fragmentation"] = score.fragmentation_count + score.fragmentation_area
score.total.sort_values()[:20]
standardized_GMM_30                             459
normalized_GMM_30                               553
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2     637
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      640
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2      641
standardized_GMM_20                             683
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5      683
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    694
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      694
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     698
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    700
standardized_GMM_15                             709
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05    709
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      709
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05     711
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     715
standardized_SOM_(4, 5)_sigma-0.1_rate-0.1      715
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    716
standardized_SOM_(4, 5)_sigma-0.25_rate-0.1     717
standardized_SOM_(4, 5)_sigma-0.01_rate-0.1     717
Name: total, dtype: int64
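The `total` column is a rank-sum (Borda-style) aggregation: every metric orders all options, each option receives its position 1..n, and the positions are added so that a lower total is better. The sort direction has to match each metric: higher Cramér's V, silhouette and Calinski-Harabasz values indicate a better result, whereas for Davies-Bouldin lower is better. The sketch below reproduces the idea on toy data. The option names and values are made up, and it uses `Series.rank(method="first")` instead of the `range`-based assignment, so ties are broken by order of appearance rather than by sort order.

```python
import pandas as pd

# Hypothetical metric values for three clustering options.
metrics = pd.DataFrame(
    {
        "cramers_v": [0.30, 0.25, 0.20],     # higher is better
        "davies_bouldin": [1.5, 1.2, 2.0],   # lower is better
        "silhouette": [0.10, 0.15, 0.05],    # higher is better
    },
    index=["GMM_30", "KMeans_30", "SOM_(6, 5)"],
)
higher_is_better = {"cramers_v": True, "davies_bouldin": False, "silhouette": True}

# Rank 1 goes to the best value of each metric.
ranks = pd.DataFrame(
    {
        name: metrics[name].rank(ascending=not better, method="first").astype(int)
        for name, better in higher_is_better.items()
    }
)
ranks["total"] = ranks.sum(axis=1)
print(ranks.sort_values("total"))
```

Here `KMeans_30` comes first with a total of 4 even though `GMM_30` has the highest Cramér's V: a rank sum rewards consistently good positions across metrics rather than a single outstanding value.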
score.comparative.sort_values()[:20]
standardized_MiniBatchKMeans_30                  7
standardized_GMM_30                             13
standardized_KMeans_30                          14
standardized_KMeans_20                          32
standardized_SOM_(6, 5)_sigma-1_rate-0.01       37
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     39
standardized_SOM_(6, 5)_sigma-1_rate-0.05       40
standardized_SOM_(6, 5)_sigma-1_rate-0.2        42
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      45
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     45
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    48
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    48
standardized_SOM_(6, 5)_sigma-1_rate-0.5        49
standardized_SOM_(5, 5)_sigma-1_rate-0.01       57
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      57
standardized_SOM_(5, 5)_sigma-1_rate-0.5        70
standardized_MiniBatchKMeans_20                 79
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     80
standardized_SOM_(6, 5)_sigma-1_rate-0.1        83
standardized_KMeans_15                          87
Name: comparative, dtype: int64
score.internal.sort_values()[:60]
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5     163
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5     177
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5      178
standardized_SOM_(4, 5)_sigma-0.5_rate-1        185
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5     188
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5      189
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      199
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     200
standardized_SOM_(5, 5)_sigma-0.5_rate-1        202
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5     211
standardized_SOM_(6, 5)_sigma-0.01_rate-1       218
standardized_SOM_(6, 5)_sigma-0.1_rate-1        219
standardized_SOM_(6, 5)_sigma-0.25_rate-1       220
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5     223
standardized_SOM_(6, 5)_sigma-0.5_rate-1        237
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5      247
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5      259
standardized_SOM_(3, 4)_sigma-1_rate-0.5        267
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2     269
standardized_SOM_(4, 5)_sigma-1_rate-1          271
standardized_SOM_(2, 6)_sigma-0.5_rate-0.5      277
standardized_SOM_(2, 5)_sigma-1_rate-0.5        286
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2     288
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      289
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2      291
standardized_SOM_(3, 5)_sigma-0.5_rate-0.5      302
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     307
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     307
standardized_SOM_(3, 4)_sigma-1_rate-1          309
standardized_SOM_(6, 5)_sigma-1_rate-1          309
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     309
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    310
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2      310
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    310
standardized_SOM_(5, 5)_sigma-1_rate-1          311
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      311
standardized_SOM_(2, 6)_sigma-1_rate-0.5        312
standardized_SOM_(6, 5)_sigma-1_rate-0.1        312
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     313
standardized_SOM_(5, 5)_sigma-0.01_rate-1       318
standardized_SOM_(3, 3)_sigma-1_rate-0.2        318
standardized_SOM_(5, 5)_sigma-0.1_rate-1        319
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05     320
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2      320
standardized_SOM_(5, 5)_sigma-0.25_rate-1       320
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     324
standardized_SOM_(6, 5)_sigma-1_rate-0.5        324
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2      324
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2     325
standardized_SOM_(3, 5)_sigma-1_rate-0.5        325
standardized_SOM_(5, 5)_sigma-1_rate-0.5        325
standardized_SOM_(6, 5)_sigma-1_rate-0.2        326
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2     326
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      329
standardized_SOM_(3, 5)_sigma-0.1_rate-0.05     331
standardized_SOM_(5, 5)_sigma-1_rate-0.05       331
standardized_SOM_(3, 4)_sigma-0.1_rate-0.05     332
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1      332
standardized_SOM_(6, 5)_sigma-1_rate-0.05       332
standardized_SOM_(3, 5)_sigma-0.01_rate-0.05    333
Name: internal, dtype: int64
score.internal.loc[score.index.str.contains("GMM")].sort_values()[:60]
standardized_GMM_30    389
normalized_GMM_30      397
standardized_GMM_20    439
standardized_GMM_15    472
normalized_GMM_20      500
standardized_GMM_10    538
normalized_GMM_15      556
normalized_GMM_10      586
Name: internal, dtype: int64
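Within `internal`, the `davies` column sorts options by the Davies-Bouldin index in ascending order, because lower values indicate compact clusters that lie far apart. The index averages, over all clusters, the worst ratio of combined within-cluster scatter to centroid separation. The numpy sketch below spells that definition out with Euclidean distances and scatter taken as the mean member-to-centroid distance, which as far as we know is also how scikit-learn's `davies_bouldin_score` defines it. `davies_bouldin` is an illustrative helper, not the code that produced `davies_chunks`.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in clusters])
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(clusters, centroids)
    ])
    # Pairwise centroid distances; the diagonal is set to inf so a cluster is not compared with itself.
    separation = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(separation, np.inf)
    ratios = (scatter[:, None] + scatter[None, :]) / separation
    # For each cluster keep its worst (largest) ratio, then average over clusters.
    return float(ratios.max(axis=1).mean())


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(davies_bouldin(X, [0, 0, 1, 1]))  # 0.2: tight, distant clusters
```

Labelling the same points as `[0, 1, 0, 1]` instead groups far-apart points with nearly coincident centroids and raises the index to 5.0.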
score.fragmentation.sort_values()[:20]
standardized_GMM_30                              57
normalized_GMM_30                                58
standardized_GMM_15                              84
normalized_GMM_15                                86
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01     89
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01     91
standardized_SOM_(3, 3)_sigma-1_rate-0.2         91
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01      92
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01      93
normalized_SOM_(3, 4)_sigma-0.5_rate-0.01       104
normalized_SOM_(2, 6)_sigma-0.5_rate-0.01       105
standardized_SOM_(3, 3)_sigma-1_rate-0.1        111
normalized_GMM_20                               113
standardized_SOM_(2, 6)_sigma-1_rate-0.1        114
normalized_SOM_(2, 5)_sigma-0.5_rate-0.01       116
standardized_SOM_(2, 5)_sigma-1_rate-0.2        120
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2      125
standardized_SOM_(2, 5)_sigma-1_rate-0.05       125
standardized_GMM_20                             134
standardized_SOM_(3, 3)_sigma-1_rate-0.5        138
Name: fragmentation, dtype: int64
postcode_sample = pd.Series(dtype=float)
jochem_sample = pd.Series(dtype=float)
modum_sample = pd.Series(dtype=float)

for op in options:
    if 'sample' in op:
        # op[7:] drops the "sample_" prefix, so option names match those in `score`
        postcode_sample[op[7:]] = evaluations[op]['mur_cramers_v']
        jochem_sample[op[7:]] = evaluations[op]['joc_cramers_v']
        modum_sample[op[7:]] = evaluations[op]['mod_cramers_']
# Same rank-sum scoring for the sampled data (no fragmentation metrics here)
n_sample = len(modum_sample)
score_sample = pd.DataFrame(index=modum_sample.index)
score_sample["modum"] = pd.Series(range(1, n_sample + 1), index=modum_sample.sort_values(ascending=False).index)
score_sample["postcode_class"] = pd.Series(range(1, n_sample + 1), index=postcode_sample.sort_values(ascending=False).index)
score_sample["jochem"] = pd.Series(range(1, n_sample + 1), index=jochem_sample.sort_values(ascending=False).index)
score_sample["davies"] = pd.Series(range(1, n_sample + 1), index=davies_sample.sort_values(ascending=True).index)
score_sample["silhouette"] = pd.Series(range(1, n_sample + 1), index=silhouettes_sample.sort_values(ascending=True).index)
score_sample["calinski"] = pd.Series(range(1, n_sample + 1), index=calinski_sample.sort_values(ascending=True).index)
score_sample["total"] = score_sample.sum(axis=1)
score_sample["comparative"] = score_sample.modum + score_sample.postcode_class + score_sample.jochem
score_sample["internal"] = score_sample.davies + score_sample.silhouette + score_sample.calinski
score_sample.total.sort_values()[:20]
standardized_KMeans_30                          340
standardized_MiniBatchKMeans_30                 341
standardized_SOM_(6, 5)_sigma-1_rate-0.01       346
standardized_SOM_(6, 5)_sigma-1_rate-0.05       354
standardized_SOM_(6, 5)_sigma-1_rate-0.1        354
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    361
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     361
standardized_SOM_(6, 5)_sigma-1_rate-0.2        371
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     376
standardized_SOM_(5, 5)_sigma-1_rate-0.05       387
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    391
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     392
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    398
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     400
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1      401
standardized_SOM_(5, 5)_sigma-1_rate-0.1        402
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    403
standardized_SOM_(5, 5)_sigma-1_rate-0.01       403
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    424
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     431
Name: total, dtype: int64
score_sample.comparative.loc[score_sample.index.str.contains('stand')].sort_values()[:20]
standardized_SOM_(6, 5)_sigma-1_rate-0.05       36
standardized_SOM_(6, 5)_sigma-1_rate-0.01       37
standardized_MiniBatchKMeans_30                 39
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    41
standardized_SOM_(6, 5)_sigma-1_rate-0.1        42
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     42
standardized_SOM_(5, 5)_sigma-1_rate-0.05       56
standardized_SOM_(6, 5)_sigma-1_rate-0.2        63
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05     63
standardized_SOM_(4, 5)_sigma-1_rate-0.05       65
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05    66
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    69
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     69
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     72
standardized_SOM_(5, 5)_sigma-1_rate-0.01       79
standardized_SOM_(4, 5)_sigma-0.5_rate-0.05     87
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05    88
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    89
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05     90
standardized_SOM_(5, 5)_sigma-1_rate-0.1        91
Name: comparative, dtype: int64
score_sample.internal.sort_values()[:20]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2      208
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2     209
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2     210
standardized_KMeans_30                          218
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5      235
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2      242
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2     258
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      259
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2      259
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2     263
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      264
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    284
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    289
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     290
standardized_SOM_(6, 5)_sigma-0.5_rate-1        295
standardized_SOM_(5, 5)_sigma-0.01_rate-0.1     297
standardized_SOM_(5, 5)_sigma-0.1_rate-0.1      298
standardized_SOM_(5, 5)_sigma-1_rate-0.5        298
standardized_SOM_(6, 5)_sigma-1_rate-1          298
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     302
Name: internal, dtype: int64
(score.total + score_sample.total).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05     1059.0
standardized_MiniBatchKMeans_30                 1059.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05    1061.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01       1095.0
standardized_KMeans_30                          1103.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    1107.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    1118.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     1133.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05    1144.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05       1147.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05     1148.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     1154.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2        1162.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    1184.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     1189.0
standardized_SOM_(5, 5)_sigma-1_rate-0.01       1194.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      1194.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     1196.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    1202.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     1208.0
standardized_GMM_30                             1210.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      1228.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1        1239.0
normalized_GMM_30                               1244.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     1267.0
standardized_SOM_(5, 5)_sigma-1_rate-0.1        1295.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1      1299.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.1     1305.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01     1311.0
standardized_SOM_(5, 5)_sigma-1_rate-0.05       1316.0
standardized_SOM_(4, 5)_sigma-1_rate-0.01       1353.0
standardized_KMeans_20                          1353.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.1     1360.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.1      1360.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05     1366.0
standardized_SOM_(4, 5)_sigma-1_rate-0.05       1380.0
standardized_KMeans_15                          1387.0
standardized_SOM_(5, 5)_sigma-1_rate-0.2        1409.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05    1416.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05     1418.0
Name: total, dtype: float64
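Adding `score.total` and `score_sample.total` aligns the two Series on their index. `score` ranks 311 options while `score_sample` ranks 309, so an option present in only one of them yields `NaN`. That is why the combined sums are upcast to float (e.g. `1059.0`) and why the result is restricted to `score.index.intersection(score_sample.index)`. A toy illustration with made-up option names and totals:

```python
import pandas as pd

# Hypothetical rank totals from two evaluations with partially overlapping options.
chunks_total = pd.Series({"opt_a": 459, "opt_b": 500, "opt_c": 640})
sample_total = pd.Series({"opt_b": 340, "opt_c": 346})

combined = chunks_total + sample_total  # aligned on labels; opt_a has no partner -> NaN
shared = combined.loc[chunks_total.index.intersection(sample_total.index)]
print(shared.sort_values())
```

Here `combined.dropna()` would give the same result, but the explicit intersection makes it clear that only options evaluated in both settings are compared.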
(score.comparative + score_sample.comparative).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_MiniBatchKMeans_30                  46.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01        74.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05        76.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05      87.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05     89.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2        105.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     111.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05    117.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1        125.0
standardized_SOM_(5, 5)_sigma-1_rate-0.01       136.0
standardized_KMeans_30                          136.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     149.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1      187.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05    188.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05    188.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05     192.0
standardized_SOM_(4, 5)_sigma-1_rate-0.05       197.0
standardized_SOM_(5, 5)_sigma-1_rate-0.1        209.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01     209.0
standardized_KMeans_20                          217.0
standardized_SOM_(5, 5)_sigma-1_rate-0.05       223.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01     235.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     236.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    238.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     240.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      242.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    242.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     251.0
standardized_SOM_(4, 5)_sigma-1_rate-0.01       262.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05     269.0
standardized_SOM_(5, 5)_sigma-1_rate-0.2        285.0
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01    304.0
standardized_SOM_(6, 5)_sigma-1_rate-0.5        306.0
normalized_GMM_30                               307.0
standardized_MiniBatchKMeans_15                 314.0
standardized_KMeans_15                          315.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.01     324.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.01    326.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05     331.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05    335.0
Name: comparative, dtype: float64
(score.internal + score_sample.internal).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5      482.0
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5     505.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5     512.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5      514.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5     514.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5      516.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5     520.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5      522.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5     526.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2     527.0
standardized_SOM_(6, 5)_sigma-0.5_rate-1        532.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2      532.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2     534.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2     536.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5     541.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2      550.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2     551.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5      553.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2      562.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5      571.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2      588.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01    594.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01     597.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01    599.0
standardized_SOM_(6, 5)_sigma-1_rate-1          607.0
standardized_SOM_(4, 5)_sigma-1_rate-1          608.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1     611.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1      614.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2      616.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1     619.0
standardized_SOM_(5, 5)_sigma-1_rate-0.5        623.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1        624.0
standardized_SOM_(5, 5)_sigma-1_rate-1          625.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05     627.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01     631.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2        634.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1      635.0
standardized_SOM_(6, 5)_sigma-1_rate-0.5        636.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05       650.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01       651.0
Name: internal, dtype: float64
evaluations["chunks_standardized_KMeans_30"]['frequencies']
23    32317
0     28309
15    25717
4     23572
18    18582
7     17673
29    16669
11    15973
20    15864
1     14967
5     14199
13    10040
22     7694
25     7138
9      5783
3      4641
21     3207
19     3031
27     2957
14     2012
16     1952
12     1629
28      776
2       559
17      409
24      339
26      302
6       221
10      179
8        86
dtype: int64
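Both frequency tables list the number of observations per cluster, sorted from largest to smallest, and show several near-empty clusters (fewer than 100 cells in the smallest). A minimal sketch of how such a table can be built from a label array, assuming the `frequencies` entries are equivalent to `value_counts()` (toy labels):

```python
import numpy as np
import pandas as pd

# Toy label vector standing in for a clustering result
labels = np.array([2, 0, 2, 1, 2, 0, 3, 2])

# value_counts() returns cluster sizes sorted from largest to smallest,
# matching the layout of the frequency tables above
frequencies = pd.Series(labels).value_counts()
print(frequencies)

# Share of all observations per cluster; very small shares flag near-empty classes
share = frequencies / frequencies.sum()
print(share.round(3))
```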
labels["chunks_standardized_KMeans_30"]
array([13, 13, 13, ..., 21, 21, 21], dtype=int32)
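Label frequencies describe each model in isolation. The cross tabulation listed in the comparison plan shows how clusters from two models overlap. A sketch with `pandas.crosstab` on two hypothetical label vectors, not the notebook's `labels` dictionary:

```python
import numpy as np
import pandas as pd

# Toy labels from two clusterings of the same six observations
gmm_labels = np.array([0, 0, 1, 1, 2, 2])
kmeans_labels = np.array([5, 5, 3, 3, 3, 7])

# Rows: GMM clusters, columns: K-Means clusters, cells: shared observations
table = pd.crosstab(gmm_labels, kmeans_labels, rownames=["GMM"], colnames=["KMeans"])
print(table)
```

A row spread over several columns (GMM cluster 2 here) indicates a class that the other model splits.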
# Sample used to fit the model, and the chunk data the fitted model is applied to
data = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_standardized_data.pq").values
chunk_data = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_standardized_data.pq").values

# Fit K-Means on the sample, then label each chunk observation by its nearest centroid
%time km = KMeans(n_clusters=30, n_init=10, random_state=42).fit(data)
%time labels_ = km.predict(chunk_data)

# `geom` (tessellation cells with their hindex) must already be loaded, with one row
# per row of `chunk_data` in the same order, e.g.:
# geom51 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq", columns=["tessellation", "hindex"])
# geom68 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq", columns=["tessellation", "hindex"])
# geom = pd.concat([geom51, geom68]).reset_index(drop=True).rename_geometry("geometry")

geom['labels'] = labels_
CPU times: user 17min 16s, sys: 4min 24s, total: 21min 41s
Wall time: 1min 28s
CPU times: user 5.76 s, sys: 0 ns, total: 5.76 s
Wall time: 605 ms
data.shape
(250000, 331)
chunk_data.shape
(276797, 331)
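The timings above show the asymmetry this approach relies on: fitting on the 250,000-row sample took about 1.5 minutes of wall time, while labelling all 276,797 chunk rows with `predict` took 605 ms, because prediction only computes distances to 30 fixed centroids. A sketch of the same fit-on-sample, predict-on-all pattern on synthetic data (`make_blobs`, the sizes and the sampling step are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-ins: a larger set to label and a random sample to fit on
X_all, _ = make_blobs(n_samples=5_000, n_features=10, centers=5, random_state=0)
rng = np.random.default_rng(0)
sample_idx = rng.choice(len(X_all), size=1_000, replace=False)
X_sample = X_all[sample_idx]

# Fit centroids on the sample only (the expensive, iterative step)
km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_sample)

# Nearest-centroid assignment is a single pass, so it scales to the full set
labels_all = km.predict(X_all)
print(labels_all.shape)
```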
from sklearn import metrics
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import contextily as ctx
import urbangrammar_graphics as ugg
import dask_geopandas
from utils.dask_geopandas import dask_dissolve
ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()

cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
token = ""

ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()

plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans_predicted_lpool.png")
plt.close()   

ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans_predicted_gla.png")
plt.close() 
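The maps are built by dissolving tessellation cells that share a label and exploding the result into single-part polygons; the notebook does this in parallel with the project's `dask_dissolve` helper. A minimal sketch of the same idea with plain GeoPandas on four toy cells, which also counts the components per signature type mentioned in the comparison plan:

```python
import geopandas as gpd
from shapely.geometry import box

# Four toy tessellation cells with cluster labels; the first two cells touch
cells = gpd.GeoDataFrame(
    {"labels": [0, 0, 1, 1]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 0, 6, 1), box(8, 0, 9, 1)],
    crs=27700,
)

# Merge all cells sharing a label, then split multi-part results so that
# each contiguous patch of one cluster becomes its own polygon
patches = cells.dissolve(by="labels").reset_index().explode(index_parts=False)

# Number of separate polygons (components) per signature type
components = patches["labels"].value_counts()

# Bounding-box crop, as done with .cx for the Liverpool and Glasgow maps
window = patches.cx[0:6, 0:1]
print(len(patches), len(window))
```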
%time km = KMeans(n_clusters=30, n_init=100, random_state=42).fit(chunk_data)
%time labels_ = km.labels_
CPU times: user 2h 42min 15s, sys: 44min 14s, total: 3h 26min 30s
Wall time: 14min 20s
CPU times: user 104 µs, sys: 0 ns, total: 104 µs
Wall time: 9.3 µs
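`n_init` sets how many times K-Means restarts from different initial centroids; scikit-learn keeps the run with the lowest inertia (within-cluster sum of squares). Raising it from 10 to 100, and fitting on the chunk data instead of the sample, is what pushes the wall time from about 1.5 to over 14 minutes. The selection rule can be sketched manually on synthetic data (sizes and seeds are arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2_000, n_features=8, centers=6, random_state=1)

# Run single-initialisation K-Means with several seeds and keep the best,
# which is conceptually what n_init does internally
runs = [KMeans(n_clusters=6, n_init=1, random_state=seed).fit(X) for seed in range(10)]
inertias = [model.inertia_ for model in runs]
best = min(runs, key=lambda model: model.inertia_)
print(f"worst: {max(inertias):.1f}, best: {best.inertia_:.1f}")
```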
geom['labels'] = labels_        

ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()

cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
token = ""

ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()

plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans30_100_lpool.png")
plt.close()   

ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans30_100_gla.png")
plt.close() 

Full scale

import numpy as np
standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/").set_index('hindex')
stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
data = dask.dataframe.multi.concat([standardized_form, stand_fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
%time data = data.compute()
CPU times: user 2min 37s, sys: 1min 25s, total: 4min 2s
Wall time: 2min 44s
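The cell above joins the standardised form and function tables column-wise on `hindex` and replaces non-finite values with 0 before clustering, since K-Means cannot handle `inf` or `NaN`. The same steps on two toy pandas frames (column names borrowed from the real data, values made up):

```python
import numpy as np
import pandas as pd

# Toy form and function tables sharing the same hindex
index = pd.Index(["c0t0", "c0t1", "c0t2"], name="hindex")
form = pd.DataFrame({"sdbAre_q1": [-0.9, 0.1, 0.4]}, index=index)
function = pd.DataFrame({"Code_18_521_q1": [0.0, np.inf, np.nan]}, index=index)

# Align the tables on the index, turn infinities into NaN, then fill all NaN with 0
data = pd.concat([form, function], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
print(data)
```

Filling with 0 places the affected cells at the column mean of the standardised data, which is a modelling choice rather than a neutral one.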
from sklearn.cluster import KMeans, MiniBatchKMeans
data
sdbAre_q1 sdbAre_q2 sdbAre_q3 sdbPer_q1 sdbPer_q2 sdbPer_q3 sdbCoA_q1 sdbCoA_q2 sdbCoA_q3 ssbCCo_q1 ... Code_18_521_q2 Code_18_334_q3 Code_18_244_q1 Code_18_244_q2 Code_18_331_q3 Code_18_132_q2 Code_18_132_q3 Code_18_521_q1 Code_18_222_q2 Code_18_521_q3
hindex
c000e094707t0000 -0.947406 -0.371977 0.020285 -0.901199 -0.237045 -0.023143 -0.000419 -0.001515 -0.010221 -0.046170 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c000e094763t0000 -0.913567 -0.420861 -0.271703 -0.903627 -0.428003 -0.336729 -0.000419 -0.001515 -0.010221 -0.035325 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c000e094763t0001 -0.878137 -0.411587 -0.284021 -0.900393 -0.416250 -0.350010 -0.000419 -0.001515 -0.010221 -0.034917 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c000e094763t0002 -0.952475 -0.421566 -0.283919 -0.968400 -0.429947 -0.343165 -0.000419 -0.001515 -0.010221 -0.065649 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c000e094764t0000 -0.964878 -0.420861 -0.271703 -0.972440 -0.420006 -0.315861 -0.000419 -0.001515 -0.010221 -0.066832 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
c102e644989t0111 -0.311466 -0.431706 -0.373463 -0.082269 -0.459270 -0.389532 -0.000419 -0.001515 -0.010221 0.132837 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c102e644989t0112 -0.326671 -0.461825 -0.371855 -0.149873 -0.528701 -0.386678 -0.000419 -0.001515 -0.010221 0.136559 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c102e644989t0113 -0.094236 -0.364761 -0.304254 0.024972 -0.347371 -0.283669 -0.000419 -0.001515 -0.010221 0.021411 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c102e644989t0114 -0.477667 -0.568464 -0.390033 -0.600170 -0.646516 -0.472676 -0.000419 -0.001515 -0.010221 0.424887 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0
c102e644989t0115 -0.413094 -0.545952 -0.382834 -0.400108 -0.610332 -0.440413 -0.000419 -0.001515 -0.010221 0.160613 ... 0.0 0.0 0.0 0.0 -0.008758 0.0 -0.000679 0.0 -0.009142 0.0

14539578 rows × 331 columns
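At full scale the matrix is large. Assuming all 331 columns are float64 (8 bytes per value; the dtypes are not shown in the output), a quick estimate of its in-memory size:

```python
rows, cols = 14_539_578, 331
bytes_per_value = 8  # assumes float64 columns

total_bytes = rows * cols * bytes_per_value
print(f"{total_bytes:,} bytes ≈ {total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.1f} GiB)")
```

This prints roughly 38.5 GB (35.9 GiB), before any copies made during fitting.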

%time km = KMeans(n_clusters=20, n_init=1, random_state=42).fit(data)
CPU times: user 27min 19s, sys: 1min 9s, total: 28min 29s
Wall time: 5min 11s
%time kmb = MiniBatchKMeans(n_clusters=20, n_init=1, random_state=42, batch_size=1_000_000).fit(data)
CPU times: user 5min 13s, sys: 3min 53s, total: 9min 7s
Wall time: 1min 40s
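MiniBatchKMeans cut the wall time from 5 min 11 s to 1 min 40 s. It updates centroids from random batches rather than the full data, so its partition can differ from full K-Means. A sketch for checking the agreement on synthetic data; the sizes, `batch_size` and the use of the adjusted Rand index are assumptions, not taken from the notebook:

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=20_000, n_features=10, centers=8, random_state=3)

full = KMeans(n_clusters=8, n_init=1, random_state=42).fit(X)
# Centroids are updated from random batches of 2,048 rows instead of all rows
mini = MiniBatchKMeans(n_clusters=8, n_init=1, random_state=42, batch_size=2_048).fit(X)

# Agreement between partitions (1.0 = identical up to a permutation of labels)
ari = adjusted_rand_score(full.labels_, mini.labels_)
print(f"ARI: {ari:.3f}")
# Ratio of inertias; values above 1 mean the mini-batch solution is less compact
print(f"inertia ratio: {mini.inertia_ / full.inertia_:.3f}")
```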