Clustering model selection matrix¶
This notebook compares the performance of several clustering models in order to select the best one for delineating spatial signatures.
Dimensions¶
Dimensions of models to be tested.
Algorithms¶
K-Means
K-Medoid
SOM
different architectures
grid dimensions
parameter selection
GMM
Data normalisation¶
MinMax stretch
Standardise
RobustScaler?
Dimensionality reduction¶
PCA
t-SNE?
K-means with k>200?
Number of clusters¶
n -> m
Input data¶
Form
Function
Form & Function
Comparison¶
Quantitative data¶
Mean sampled silhouette score
Calinski-Harabasz
Davies-Bouldin
BIC
Qualitative data¶
label frequencies
cross tabulation
postcode classification
modum
worldpop
N-S, E-W distribution of cluster centers
weighted by area?
Signatures
polygon areas
distances between signatures of the same kind
number of polygons/components (how many times we see the signature type)
maps for a few cities
Liverpool
Glasgow
London
clustergram for every algorithm
First phase¶
Each algorithm on a few similar
Data normalisation¶
Since we’ll be reusing normalised/standardised data repeatedly, we do the transformation once and store results in chunked parquet files.
import dask.dataframe
form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/convolutions/conv_*.pq")
standardized = (form - form.mean()) / form.std()
%%time
standardized.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")
CPU times: user 3min 43s, sys: 1min 14s, total: 4min 57s
Wall time: 3min 47s
min_max = (form - form.min()) / (form.max() - form.min())
%%time
min_max.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")
CPU times: user 3min 1s, sys: 1min 7s, total: 4min 8s
Wall time: 3min 48s
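As a minimal sketch of the two transformations above, the same expressions can be tried on a small in-memory pandas DataFrame. The column names and values are made up for illustration; the notebook itself applies the expressions lazily to dask dataframes.

```python
import pandas as pd

# Toy data standing in for the morphometric characters (hypothetical values).
toy = pd.DataFrame({"area": [10.0, 20.0, 30.0], "height": [3.0, 6.0, 12.0]})

# Z-score standardisation: zero mean and unit sample standard deviation per column.
toy_standardized = (toy - toy.mean()) / toy.std()

# Min-max stretch: every column is rescaled to the [0, 1] range.
toy_min_max = (toy - toy.min()) / (toy.max() - toy.min())

print(toy_standardized.round(3))
print(toy_min_max.round(3))
```

Standardised columns are centred on zero with comparable spread, while min-max columns are bounded but keep the shape of the original distribution, which is why both variants are stored for later comparison.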
Harmonize chunks¶
Some chunks are missing columns as certain land use types are not present. We need to harmonize our chunks to have the same columns in each of them.
import pandas as pd
import pyarrow.parquet as pq
# Collect the union of column names across all chunks.
columns = set()
for i in range(103):
    schema = pq.read_schema(f"../../urbangrammar_samba/spatial_signatures/functional/functional/func_{i}.pq")
    for c in schema.names:
        columns.add(c)
# Add the missing columns, filled with zeros, and overwrite each chunk in place.
for i in range(103):
    df = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/functional/functional/func_{i}.pq")
    missing = [c for c in columns if c not in df.columns]
    df[missing] = 0
    df.to_parquet(f"../../urbangrammar_samba/spatial_signatures/functional/functional/func_{i}.pq")
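The harmonisation step can be sketched on in-memory data as follows. This is an illustrative variant, not the notebook's code: the chunk contents and column names are hypothetical, and `reindex` is used in place of the column assignment above.

```python
import pandas as pd

# Two toy chunks; the second one lacks the "retail" column (hypothetical names).
chunks = [
    pd.DataFrame({"parks": [0.1, 0.2], "retail": [1.0, 0.0]}),
    pd.DataFrame({"parks": [0.3]}),
]

# Union of all column names, sorted so every chunk ends up with the same order.
all_columns = sorted(set().union(*(chunk.columns for chunk in chunks)))

# reindex adds the missing columns and fills them with 0.
harmonized = [chunk.reindex(columns=all_columns, fill_value=0) for chunk in chunks]

for chunk in harmonized:
    print(list(chunk.columns))
```

Unlike the loop above, this variant also fixes the column order, which may be convenient if a later step relies on column positions rather than names.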
%%time
function = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/functional/functional/func_*.pq")
standardized = (function - function.mean()) / function.std()
min_max = (function - function.min()) / (function.max() - function.min())
standardized.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
min_max.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/normalized/")
CPU times: user 5min 35s, sys: 1min 39s, total: 7min 15s
Wall time: 3min 3s
Ensure that each observation has hindex.
stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")
standardized_form['hindex'] = stand_fn.index.values
standardized_form.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/")
normalized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")
normalized_form['hindex'] = stand_fn.index.values
normalized_form.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/")
Test cases¶
Test cases:
Chunk 68 - Glasgow 155609
Chunk 51 - Merseyside 121188
Random sample - 250000
standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/").compute().set_index('hindex')
sample = standardized_form.sample(n=250_000, random_state=42)
sample.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_standardized.pq")
normalized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/normalized/").compute().set_index('hindex')
sample_norm = normalized_form.loc[sample.index]
sample_norm.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_normalized.pq")
stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/").compute()
sample_stand_fn = stand_fn.loc[sample.index]
sample_stand_fn.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_standardized.pq")
norm_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/normalized/").compute()
sample_norm_fn = norm_fn.loc[sample.index]
sample_norm_fn.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_normalized.pq")
geoms = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_*.pq").compute().set_index("hindex")
sample_geoms = geoms.loc[sample.index]
sample_geoms.to_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq")
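The sampling pattern above draws one reproducible random sample and then reuses its index to select the same observations from every other table. A minimal sketch with toy tables (the index values and column names are hypothetical):

```python
import pandas as pd

# Toy tables sharing the same index (hypothetical hindex values).
index = pd.Index([f"c{i}" for i in range(10)], name="hindex")
form_toy = pd.DataFrame({"form_char": range(10)}, index=index)
function_toy = pd.DataFrame({"function_char": range(100, 110)}, index=index)

# Draw a reproducible random sample from one table...
sample_toy = form_toy.sample(n=4, random_state=42)

# ...and reuse its index to pull the matching rows from the other table.
function_sample_toy = function_toy.loc[sample_toy.index]

print(sample_toy.join(function_sample_toy))
```

Selecting by index rather than sampling each table separately keeps form, function and geometry aligned row by row.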
Evaluation¶
Link auxiliary data.
import geopandas as gpd
import tobler
import rioxarray
import rasterstats
import numpy as np
parts = {}
parts["chunk51"] = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq")
parts["chunk68"] = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq")
parts["sample"] = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq", columns=["tessellation"])
parts["sample"] = gpd.GeoDataFrame(parts["sample"])
parts["sample"]["tessellation"] = gpd.GeoSeries.from_wkb(parts["sample"].tessellation, crs=27700)
parts["sample"] = parts["sample"].set_geometry("tessellation")
for key, gdf in parts.items():
    murray = gpd.read_file("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray.gpkg", bbox=tuple(gdf.total_bounds))
    murray.geometry = murray.buffer(80, cap_style=3)
    joined = tobler.area_weighted.area_join(murray, gdf, variables=["ward"])
    joined.reset_index()[["hindex", 'ward']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_{key}.pq")
    modum = gpd.read_file("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modumew2016.zip", bbox=tuple(gdf.total_bounds))
    joined = tobler.area_weighted.area_join(modum, gdf, variables=["CLUSTER_LA"])
    joined.reset_index()[["hindex", 'CLUSTER_LA']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_{key}.pq")
    foot = rioxarray.open_rasterio("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem.tif")
    foot_osgb = foot.rio.reproject("EPSG:27700")
    clipped = foot_osgb.rio.clip_box(*gdf.total_bounds)
    arr = clipped.values
    affine = clipped.rio.transform()
    stats = rasterstats.zonal_stats(
        gdf.representative_point(),
        raster=arr[0],
        affine=affine,
        stats=['mean'],
        nodata=np.nan,
    )
    gdf['jochem'] = [x["mean"] for x in stats]
    gdf.reset_index()[["hindex", 'jochem']].to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_{key}.pq")
    print(f"Part {key} done.")
/opt/conda/lib/python3.8/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
/opt/conda/lib/python3.8/site-packages/tobler/area_weighted/area_join.py:63: UserWarning: Cannot preserve dtype of 'ward'. Falling back to `dtype=object`.
warnings.warn(
Part sample done.
def evaluation(data, labels, case, identifier, murray, modum, jochem, geom, sample_size=None):
    """Get evaluation metrics for a given clustering

    Parameters:
        data : array
        labels : array
        case : string {"chunks", "sample"}
        identifier : ID of clustering model
        murray, modum, jochem : DataFrames with validation classes
            ("ward", "CLUSTER_LA", "jochem"), row-aligned with data
        geom : GeoDataFrame with geometries, row-aligned with data
        sample_size : int (silhouette_score sample size)

    Returns a dict of metrics, cross tabulations and signature summaries.
    """
    from sklearn import metrics
    import pandas as pd
    import scipy as sp
    import matplotlib.pyplot as plt
    import contextily as ctx
    import urbangrammar_graphics as ugg
    import dask_geopandas
    from utils.dask_geopandas import dask_dissolve
    results = {}
    try:
        results['silhouette'] = metrics.silhouette_score(data, labels, sample_size=sample_size, random_state=42)
    except ValueError:
        results['silhouette'] = np.nan
    results['calinski'] = metrics.calinski_harabasz_score(data, labels)
    results['davies'] = metrics.davies_bouldin_score(data, labels)
    results['frequencies'] = pd.Series(labels).value_counts()
    # cross tabulation
    modum['labels'] = labels
    mod_crosstab = pd.crosstab(modum.dropna()['labels'], modum.dropna()["CLUSTER_LA"])
    results['mod_chi'], results['mod_p'], results['mod_dof'], results['mod_exp'] = sp.stats.chi2_contingency(mod_crosstab)
    results['mod_cramers_v'] = cramers_v(mod_crosstab)
    results['mod_crosstab'] = mod_crosstab
    murray['labels'] = labels
    mur_crosstab = pd.crosstab(murray.dropna()['labels'], murray.dropna()["ward"])
    results['mur_chi'], results['mur_p'], results['mur_dof'], results['mur_exp'] = sp.stats.chi2_contingency(mur_crosstab)
    results['mur_cramers_v'] = cramers_v(mur_crosstab)
    results['mur_crosstab'] = mur_crosstab
    jochem['labels'] = labels
    joc_crosstab = pd.crosstab(jochem.dropna()['labels'], jochem.dropna()["jochem"])
    results['joc_chi'], results['joc_p'], results['joc_dof'], results['joc_exp'] = sp.stats.chi2_contingency(joc_crosstab)
    results['joc_cramers_v'] = cramers_v(joc_crosstab)
    results['joc_crosstab'] = joc_crosstab
    if case == "chunks":
        # signatures
        geom['labels'] = labels
        ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
        spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()
        results['signature_abundance'] = spsig.labels.value_counts()
        results['signature_areas'] = spsig.area
        cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
        token = ""
        ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
        ax.set_axis_off()
        plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/{identifier}_lpool.png")
        plt.close()
        ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
        ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
        ax.set_axis_off()
        plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/{identifier}_gla.png")
        plt.close()
    # else:
    #     geom['labels'] = labels
    #     ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
    #     spsig = dask_dissolve(ddf, by='labels').compute().reset_index().explode()
    #     centroid = spsig.centroid
    #     results['x_coords'] = centroid.x
    #     results['y_coords'] = centroid.y
    return results
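The three internal validation scores collected by the evaluation function can be tried on synthetic data. This is a small sketch under stated assumptions: the blobs, the number of clusters and the sample size are illustrative choices and do not come from the notebook.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data drawn around three centres (illustrative only).
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.5, random_state=42)
toy_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

scores = {
    # Silhouette on a random subsample, bounded by [-1, 1]; higher is better.
    "silhouette": metrics.silhouette_score(X, toy_labels, sample_size=200, random_state=42),
    # Calinski-Harabasz: between- to within-cluster dispersion ratio; higher is better.
    "calinski": metrics.calinski_harabasz_score(X, toy_labels),
    # Davies-Bouldin: average similarity of each cluster to its most similar one; lower is better.
    "davies": metrics.davies_bouldin_score(X, toy_labels),
}
print(scores)
```

Computing the silhouette on a subsample keeps the cost manageable, since the full score needs pairwise distances between all observations.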
def cramers_v(confusion_matrix):
    """Bias-corrected Cramér's V of a contingency table (pandas DataFrame)."""
    import scipy as sp
    import numpy as np
    chi2 = sp.stats.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()
    phi2 = chi2 / n
    r, k = confusion_matrix.shape
    # Bias correction of phi2 and of the table dimensions
    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))
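To get a feel for the range of Cramér's V, the sketch below applies the same bias-corrected formula to two toy cross tabulations: one where the reference classes match the clusters exactly and one where they are spread evenly over the clusters. The function is re-declared under a different name so the block runs on its own, and the label series are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def cramers_v_sketch(table):
    """Bias-corrected Cramér's V, mirroring cramers_v above."""
    chi2 = stats.chi2_contingency(table)[0]
    n = table.sum().sum()
    r, k = table.shape
    phi2corr = max(0, chi2 / n - ((k - 1) * (r - 1)) / (n - 1))
    rcorr = r - ((r - 1) ** 2) / (n - 1)
    kcorr = k - ((k - 1) ** 2) / (n - 1)
    return np.sqrt(phi2corr / min(kcorr - 1, rcorr - 1))


# Hypothetical cluster labels and two reference classifications.
clusters = pd.Series([0, 1, 2] * 30)
matching = pd.Series(["a", "b", "c"] * 30)  # one reference class per cluster
unrelated = pd.Series(["a"] * 30 + ["b"] * 30 + ["c"] * 30)  # evenly spread over clusters

v_matching = cramers_v_sketch(pd.crosstab(clusters, matching))
v_unrelated = cramers_v_sketch(pd.crosstab(clusters, unrelated))
print(v_matching, v_unrelated)
```

Values close to 1 indicate that the clustering reproduces the reference classification, while values close to 0 indicate no association between the two.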
for transformation in ["normalized", "standardized"]:
    c51 = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/form/{transformation}/part.51.parquet")
    c68 = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/form/{transformation}/part.68.parquet")
    form = pd.concat([c51, c68]).reset_index(drop=True).drop(columns="hindex")
    c51f = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/function/{transformation}/part.51.parquet")
    c68f = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/function/{transformation}/part.68.parquet")
    fn = pd.concat([c51f, c68f]).reset_index(drop=True)
    data = pd.concat([form, fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
    data.to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_{transformation}_data.pq")
    form = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/form_{transformation}.pq")
    fn = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/function_{transformation}.pq")
    data = pd.concat([form, fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
    data.to_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_{transformation}_data.pq")
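The cleaning chain at the end of the loop above replaces infinities and missing values with zeros. One plausible source of such values, assumed here for illustration only, is a column with zero variance, where standardisation divides zero by zero. A toy example with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Toy data with one constant column (hypothetical), e.g. a land use absent everywhere.
toy = pd.DataFrame({"varied": [1.0, 2.0, 3.0], "constant": [0.0, 0.0, 0.0]})

# Standardising the constant column divides 0 by 0 and yields NaN.
standardized_toy = (toy - toy.mean()) / toy.std()

# Same cleaning chain as in the loop above: infinities become NaN, NaN becomes 0.
cleaned = standardized_toy.replace([np.inf, -np.inf], np.nan).fillna(0)

print(standardized_toy)
print(cleaned)
```

Filling with zero places such observations at the column mean in standardised space, so the column carries no information for the clustering.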
Test matrix¶
# !pip install scikit-learn-extra
# !pip install minisom
from itertools import product
from time import time
import pandas as pd
import numpy as np
import geopandas as gpd
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.mixture import GaussianMixture
from sklearn_extra.cluster import KMedoids
from minisom import MiniSom
labels = {}
times = {}
evaluations = {}
quant_errors = {}
import numpy as np
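The SOM part of the test matrix below combines every grid shape with every sigma and learning rate. The sketch below only enumerates those combinations, with the values copied from the loop, to show how many SOM runs are involved and how many clusters each grid shape can produce at most.

```python
from itertools import product

# Values copied from the test matrix loop.
som_shapes = [(3, 3), (2, 5), (3, 4), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5)]
sigmas = [.01, .1, .25, .5, 1]
rates = [.01, .05, .1, .2, .5, 1]

# Every (shape, sigma, rate) combination that the SOM loop visits.
som_grid = list(product(som_shapes, sigmas, rates))
print(len(som_grid))  # 8 * 5 * 6 = 240

# Number of SOM nodes, i.e. the maximum number of clusters, per grid shape.
nodes = {shape: shape[0] * shape[1] for shape in som_shapes}
print(nodes)

# Identifier of the first configuration, built the same way as in the loop.
shape, sigma, rate = som_grid[0]
first_identifier = f"sample_standardized_SOM_{shape}_sigma-{sigma}_rate-{rate}"
print(first_identifier)
```

The grid shapes yield between 9 and 30 nodes, which roughly matches the range of k used for the other algorithms.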
# for case in ["chunks", "sample"]:
for case in ["sample"]:
    if case == "chunks":
        mod51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_chunk51.pq")
        mod68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_chunk68.pq")
        modum = pd.concat([mod51, mod68]).reset_index(drop=True)
        mur51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_chunk51.pq")
        mur68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_chunk68.pq")
        murray = pd.concat([mur51, mur68]).reset_index(drop=True)
        joc51 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_chunk51.pq")
        joc68 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_chunk68.pq")
        jochem = pd.concat([joc51, joc68]).reset_index(drop=True)
        geom51 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq", columns=["tessellation", "hindex"])
        geom68 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq", columns=["tessellation", "hindex"])
        geom = pd.concat([geom51, geom68]).reset_index(drop=True).rename_geometry("geometry")
    else:
        modum = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/modum_sample.pq")
        murray = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/murray_sample.pq")
        jochem = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/validation/jochem_sample.pq")
        geom = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/geometry.pq", columns=["tessellation", "hindex"])
        geom = gpd.GeoDataFrame(geom, geometry=gpd.GeoSeries.from_wkb(geom.tessellation))
    # for transformation in ["normalized", "standardized"]:
    for transformation in ["standardized"]:
        # load data and prepare numpy.array
        if case == "chunks":
            data = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_{transformation}_data.pq").values
        else:
            data = pd.read_parquet(f"../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_{transformation}_data.pq").values
        for k in [10, 15, 20, 30]:
            # KMeans
            identifier = f"{case}_{transformation}_KMeans_{k}"
            s = time()
            km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(data)
            times[identifier] = time() - s
            labels_ = km.labels_
            labels[identifier] = labels_
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
        for k in [10, 15, 20, 30]:
            # MiniBatchKMeans
            identifier = f"{case}_{transformation}_MiniBatchKMeans_{k}"
            s = time()
            km = MiniBatchKMeans(n_clusters=k, batch_size=25_000, n_init=10, random_state=42).fit(data)
            times[identifier] = time() - s
            labels_ = km.labels_
            labels[identifier] = labels_
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
        # for k in [10, 15, 20, 30]:
        #     # K-Medoid
        #     identifier = f"{case}_{transformation}_KMedoid_{k}"
        #     s = time()
        #     km = KMedoids(n_clusters=k, random_state=42).fit(data)
        #     times[identifier] = time() - s
        #     labels_ = km.labels_
        #     labels[identifier] = labels_
        #     evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
        #     print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
        for k in [10, 15, 20, 30]:
            # GMM
            identifier = f"{case}_{transformation}_GMM_{k}"
            s = time()
            gmm = GaussianMixture(n_components=k, n_init=10, random_state=42, covariance_type="full", max_iter=500).fit(data)
            times[identifier] = time() - s
            labels_ = gmm.predict(data)
            labels[identifier] = labels_
            evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
            print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
        # SOM
        for som_shape in [(3, 3), (2, 5), (3, 4), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5)]:
            for (sigma, rate) in product([.01, .1, .25, .5, 1], [.01, .05, .1, .2, .5, 1]):
                identifier = f"{case}_{transformation}_SOM_{som_shape}_sigma-{sigma}_rate-{rate}"
                s = time()
                som = MiniSom(som_shape[0], som_shape[1], data.shape[1], sigma=sigma, learning_rate=rate,
                              topology="hexagonal", random_seed=42)
                som.train_batch(data, 50000, verbose=False)
                winner_coordinates = np.array([som.winner(x) for x in data])
                labels_ = np.apply_along_axis(lambda x: str(tuple(x)), 1, winner_coordinates)
                if len(np.unique(labels_)) > 1:
                    times[identifier] = time() - s
                    labels[identifier] = labels_
                    evaluations[identifier] = evaluation(data, labels_, case, identifier, murray, modum, jochem, geom, sample_size=10_000)
                    quant_errors[identifier] = som.quantization_error(data)
                    print(f"{identifier} done. Time to fit the model: {times[identifier]} seconds.")
sample_standardized_KMeans_10 done. Time to fit the model: 33.980648040771484 seconds.
sample_standardized_KMeans_15 done. Time to fit the model: 36.58458423614502 seconds.
sample_standardized_KMeans_20 done. Time to fit the model: 50.62095856666565 seconds.
sample_standardized_KMeans_30 done. Time to fit the model: 88.40085291862488 seconds.
sample_standardized_MiniBatchKMeans_10 done. Time to fit the model: 17.53959631919861 seconds.
sample_standardized_MiniBatchKMeans_15 done. Time to fit the model: 19.377883672714233 seconds.
sample_standardized_MiniBatchKMeans_20 done. Time to fit the model: 19.961794137954712 seconds.
sample_standardized_MiniBatchKMeans_30 done. Time to fit the model: 27.228529930114746 seconds.
sample_standardized_GMM_10 done. Time to fit the model: 8817.877690076828 seconds.
sample_standardized_GMM_15 done. Time to fit the model: 8004.2522094249725 seconds.
sample_standardized_GMM_20 done. Time to fit the model: 11593.263016223907 seconds.
sample_standardized_GMM_30 done. Time to fit the model: 21957.19268512726 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.041452169418335 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.0424699783325195 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.0597083568573 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.382687568664551 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.059550046920776 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.01_rate-1 done. Time to fit the model: 6.28273344039917 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.27402925491333 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.091947555541992 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.291349411010742 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.2743775844573975 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-0.5 done. Time to fit the model: 6.125728130340576 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.1_rate-1 done. Time to fit the model: 6.091755628585815 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.207939147949219 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.399035930633545 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.149229049682617 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.255995750427246 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.120840549468994 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.25_rate-1 done. Time to fit the model: 6.245414733886719 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.01 done. Time to fit the model: 5.998126745223999 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.2665159702301025 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.202017545700073 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.271490812301636 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.278396368026733 seconds.
sample_standardized_SOM_(3, 3)_sigma-0.5_rate-1 done. Time to fit the model: 6.106752872467041 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.01 done. Time to fit the model: 6.008735656738281 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.05 done. Time to fit the model: 6.033384323120117 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.1 done. Time to fit the model: 6.1437554359436035 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.2 done. Time to fit the model: 6.351598739624023 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-0.5 done. Time to fit the model: 6.337004899978638 seconds.
sample_standardized_SOM_(3, 3)_sigma-1_rate-1 done. Time to fit the model: 6.048030853271484 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.246315956115723 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.338611841201782 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.214588403701782 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.299610137939453 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.38018798828125 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.01_rate-1 done. Time to fit the model: 6.186944007873535 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.1516852378845215 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.371593952178955 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.331070899963379 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.6916892528533936 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 6.45002007484436 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.1_rate-1 done. Time to fit the model: 6.447925567626953 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.230453252792358 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.4163124561309814 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.312195301055908 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.496335744857788 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.320337533950806 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.25_rate-1 done. Time to fit the model: 6.391258239746094 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.227715969085693 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.272166013717651 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.29688572883606 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.512700319290161 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.449718236923218 seconds.
sample_standardized_SOM_(2, 5)_sigma-0.5_rate-1 done. Time to fit the model: 6.2205281257629395 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.01 done. Time to fit the model: 6.2186102867126465 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.05 done. Time to fit the model: 6.249726295471191 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.1 done. Time to fit the model: 6.2725629806518555 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.2 done. Time to fit the model: 6.410464763641357 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-0.5 done. Time to fit the model: 6.647894382476807 seconds.
sample_standardized_SOM_(2, 5)_sigma-1_rate-1 done. Time to fit the model: 6.202193737030029 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.466960906982422 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.4868175983428955 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.603813648223877 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.70729660987854 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.687565326690674 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.01_rate-1 done. Time to fit the model: 6.6126062870025635 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.622690200805664 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.527747392654419 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.1 done. Time to fit the model: 8.404837846755981 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.2 done. Time to fit the model: 9.995216131210327 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-0.5 done. Time to fit the model: 8.759083271026611 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.1_rate-1 done. Time to fit the model: 6.665789604187012 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.517737150192261 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.05 done. Time to fit the model: 6.59808087348938 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.1 done. Time to fit the model: 6.585826396942139 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.2 done. Time to fit the model: 6.641624450683594 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-0.5 done. Time to fit the model: 8.689597129821777 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.25_rate-1 done. Time to fit the model: 9.790433645248413 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.01 done. Time to fit the model: 8.342492580413818 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.496647596359253 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.587351083755493 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.779898166656494 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.784470319747925 seconds.
sample_standardized_SOM_(3, 4)_sigma-0.5_rate-1 done. Time to fit the model: 6.452801465988159 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.01 done. Time to fit the model: 6.6474609375 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.05 done. Time to fit the model: 6.5371527671813965 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.1 done. Time to fit the model: 6.913776159286499 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.2 done. Time to fit the model: 6.674077033996582 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-0.5 done. Time to fit the model: 6.895212173461914 seconds.
sample_standardized_SOM_(3, 4)_sigma-1_rate-1 done. Time to fit the model: 6.550818204879761 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.406692743301392 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.451681852340698 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.651016712188721 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.2 done. Time to fit the model: 6.768008470535278 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.5549585819244385 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.01_rate-1 done. Time to fit the model: 6.604512453079224 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.488897800445557 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.05 done. Time to fit the model: 6.474110841751099 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.1 done. Time to fit the model: 6.891201496124268 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.2 done. Time to fit the model: 6.849219799041748 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-0.5 done. Time to fit the model: 7.015116453170776 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.1_rate-1 done. Time to fit the model: 6.593651533126831 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.01 done. Time to fit the model: 6.737154245376587 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.05 done. Time to fit the model: 8.17149806022644 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.1 done. Time to fit the model: 9.674012899398804 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.2 done. Time to fit the model: 8.839396953582764 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-0.5 done. Time to fit the model: 6.666623592376709 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.25_rate-1 done. Time to fit the model: 6.6030731201171875 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.458607912063599 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.374103784561157 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.1 done. Time to fit the model: 6.520724058151245 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.2 done. Time to fit the model: 6.621204376220703 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-0.5 done. Time to fit the model: 6.6905903816223145 seconds.
sample_standardized_SOM_(2, 6)_sigma-0.5_rate-1 done. Time to fit the model: 6.467045783996582 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.01 done. Time to fit the model: 6.614234685897827 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.05 done. Time to fit the model: 6.433612108230591 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.1 done. Time to fit the model: 6.642040967941284 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.2 done. Time to fit the model: 6.810717821121216 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-0.5 done. Time to fit the model: 6.835365056991577 seconds.
sample_standardized_SOM_(2, 6)_sigma-1_rate-1 done. Time to fit the model: 6.449937343597412 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 6.848511457443237 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 6.851857423782349 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 6.921448469161987 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 7.021340370178223 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 6.932762384414673 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.01_rate-1 done. Time to fit the model: 7.124608039855957 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 6.972275733947754 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 7.044497013092041 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 7.2646777629852295 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 8.82405710220337 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 10.416626453399658 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.1_rate-1 done. Time to fit the model: 9.215104579925537 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 8.639707326889038 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 10.058373928070068 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 8.89886999130249 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 7.006105422973633 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 7.002037286758423 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.25_rate-1 done. Time to fit the model: 6.9071714878082275 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 6.878051280975342 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 6.978433847427368 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 7.020828485488892 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 7.195865869522095 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 7.115154266357422 seconds.
sample_standardized_SOM_(3, 5)_sigma-0.5_rate-1 done. Time to fit the model: 7.015010356903076 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.01 done. Time to fit the model: 6.9420506954193115 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.05 done. Time to fit the model: 7.148162841796875 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.1 done. Time to fit the model: 7.281223773956299 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.2 done. Time to fit the model: 7.0804712772369385 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-0.5 done. Time to fit the model: 7.378613233566284 seconds.
sample_standardized_SOM_(3, 5)_sigma-1_rate-1 done. Time to fit the model: 7.000572204589844 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 7.723738193511963 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 7.72091817855835 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 7.813778638839722 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 7.847135543823242 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 8.101349592208862 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.01_rate-1 done. Time to fit the model: 7.922775506973267 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 7.866520166397095 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 7.924614191055298 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 7.896871566772461 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 7.926555156707764 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 7.869636058807373 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.1_rate-1 done. Time to fit the model: 8.035306930541992 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 7.94612717628479 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 7.792319059371948 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 7.886402368545532 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 7.945167303085327 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 7.895480155944824 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.25_rate-1 done. Time to fit the model: 8.203335523605347 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 7.717268705368042 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 7.8255932331085205 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 8.04367995262146 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 9.806514978408813 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 11.029207229614258 seconds.
sample_standardized_SOM_(4, 5)_sigma-0.5_rate-1 done. Time to fit the model: 9.586603164672852 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.01 done. Time to fit the model: 7.6841535568237305 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.05 done. Time to fit the model: 7.742480993270874 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.1 done. Time to fit the model: 8.154315948486328 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.2 done. Time to fit the model: 9.876065254211426 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-0.5 done. Time to fit the model: 11.988449573516846 seconds.
sample_standardized_SOM_(4, 5)_sigma-1_rate-1 done. Time to fit the model: 9.684459686279297 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 8.574947834014893 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 8.547278642654419 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 8.799760341644287 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 8.887267827987671 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 8.837449073791504 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.01_rate-1 done. Time to fit the model: 8.59840703010559 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 8.567549228668213 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 8.490204811096191 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 9.251555919647217 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 8.830044507980347 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 8.744437217712402 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.1_rate-1 done. Time to fit the model: 10.71109390258789 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 8.963495254516602 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 8.572678327560425 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 8.890587568283081 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 8.855430364608765 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 9.030456781387329 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.25_rate-1 done. Time to fit the model: 8.97514533996582 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 8.725110054016113 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 8.969297409057617 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 8.507286548614502 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 11.285863161087036 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 10.909534215927124 seconds.
sample_standardized_SOM_(5, 5)_sigma-0.5_rate-1 done. Time to fit the model: 8.99370288848877 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.01 done. Time to fit the model: 8.653746128082275 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.05 done. Time to fit the model: 8.670263767242432 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.1 done. Time to fit the model: 8.526506185531616 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.2 done. Time to fit the model: 8.697363138198853 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-0.5 done. Time to fit the model: 9.55558180809021 seconds.
sample_standardized_SOM_(5, 5)_sigma-1_rate-1 done. Time to fit the model: 8.891328811645508 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 done. Time to fit the model: 11.544363021850586 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 done. Time to fit the model: 14.227508068084717 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 done. Time to fit the model: 15.08832836151123 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.2 done. Time to fit the model: 14.320567846298218 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 done. Time to fit the model: 12.988727807998657 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.01_rate-1 done. Time to fit the model: 12.774047136306763 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 done. Time to fit the model: 12.655745267868042 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 done. Time to fit the model: 13.632773637771606 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 done. Time to fit the model: 13.69734263420105 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.2 done. Time to fit the model: 11.825275182723999 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 done. Time to fit the model: 11.564648151397705 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.1_rate-1 done. Time to fit the model: 13.11095905303955 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 done. Time to fit the model: 11.38231635093689 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 done. Time to fit the model: 10.2324857711792 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 done. Time to fit the model: 10.075611352920532 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.2 done. Time to fit the model: 11.45384669303894 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 done. Time to fit the model: 13.136953353881836 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.25_rate-1 done. Time to fit the model: 12.988116979598999 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 done. Time to fit the model: 13.676304578781128 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 done. Time to fit the model: 13.65310263633728 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 done. Time to fit the model: 11.345702409744263 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 done. Time to fit the model: 10.200783014297485 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 done. Time to fit the model: 10.253746271133423 seconds.
sample_standardized_SOM_(6, 5)_sigma-0.5_rate-1 done. Time to fit the model: 9.905344724655151 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.01 done. Time to fit the model: 9.97464895248413 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.05 done. Time to fit the model: 9.808958053588867 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.1 done. Time to fit the model: 9.867895603179932 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.2 done. Time to fit the model: 10.446823596954346 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-0.5 done. Time to fit the model: 10.370377779006958 seconds.
sample_standardized_SOM_(6, 5)_sigma-1_rate-1 done. Time to fit the model: 11.236787557601929 seconds.
import pickle
with open("all_data.pickle", "wb") as f:
    pickle.dump((labels, times, evaluations, quant_errors), f)
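To reuse the stored results in a later session, the tuple can be read back in the same order it was written. The following is a minimal sketch of that round trip; the toy dictionaries and the temporary path are illustrative stand-ins, not the notebook's actual data.

```python
import os
import pickle
import tempfile

# Toy stand-ins for the notebook's result dictionaries (illustrative values only).
labels = {"sample_standardized_MiniBatchKMeans_15": [0, 1, 1, 0]}
times = {"sample_standardized_MiniBatchKMeans_15": 1.5}
evaluations = {"sample_standardized_MiniBatchKMeans_15": {"silhouette": 0.1}}
quant_errors = {}

# A temporary path is used so the sketch does not overwrite "all_data.pickle".
path = os.path.join(tempfile.mkdtemp(), "all_data.pickle")

with open(path, "wb") as f:
    pickle.dump((labels, times, evaluations, quant_errors), f)

# Unpack in the same order as the tuple that was dumped.
with open(path, "rb") as f:
    loaded_labels, loaded_times, loaded_evaluations, loaded_quant_errors = pickle.load(f)
```

Storing all four dictionaries as one tuple keeps them in sync, at the cost of having to remember the unpacking order.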
labels = {}
times = {}
evaluations = {}
quant_errors = {}
options = evaluations.keys()
len(options)
1008
one_eval = evaluations[list(options)[0]]
useless = []
for op in list(options):
    if 'chunks' in op:
        if ((evaluations[op]["frequencies"] / 276797) > .9).any():
            useless.append(op)
    else:
        if ((evaluations[op]["frequencies"] / 250000) > .9).any():
            useless.append(op)
len(useless)
388
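The filter above marks an option as useless when a single cluster absorbs more than 90 % of the observations; the divisors 276797 and 250000 are presumably the observation counts of the `chunks` and `sample` runs. A dependency-free sketch of the same test, using a hypothetical helper name and made-up label counts:

```python
def has_dominant_cluster(frequencies, n_observations, threshold=0.9):
    """Return True if any single cluster holds more than `threshold` of all observations."""
    return any(count / n_observations > threshold for count in frequencies)

# Hypothetical label counts for two clusterings of 250,000 observations.
collapsed = [240_000, 6_000, 4_000]   # 96 % of points fall in one cluster
balanced = [100_000, 90_000, 60_000]  # largest cluster holds 40 %

print(has_dominant_cluster(collapsed, 250_000))
print(has_dominant_cluster(balanced, 250_000))
```

Only the first clustering would be discarded, since it effectively fails to partition the data.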
options = [o for o in list(evaluations.keys()) if o not in useless]
silhouettes_chunks = pd.Series(dtype="float64")
silhouettes_sample = pd.Series(dtype="float64")
for op in options:
    # op[7:] drops the 7-character "chunks_" / "sample_" prefix
    if 'chunks' in op:
        silhouettes_chunks[op[7:]] = evaluations[op]['silhouette']
    else:
        silhouettes_sample[op[7:]] = evaluations[op]['silhouette']
silhouettes_chunks.sort_values()[:60]
standardized_SOM_(5, 5)_sigma-1_rate-1 -0.024278
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 -0.017866
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 -0.017866
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 -0.017779
standardized_SOM_(3, 4)_sigma-1_rate-0.5 -0.017356
standardized_SOM_(2, 5)_sigma-1_rate-0.5 -0.015143
standardized_SOM_(3, 4)_sigma-1_rate-0.2 -0.015097
standardized_SOM_(3, 4)_sigma-1_rate-1 -0.014982
standardized_SOM_(6, 5)_sigma-1_rate-1 -0.012165
standardized_SOM_(2, 6)_sigma-1_rate-0.5 -0.007150
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 -0.005978
standardized_SOM_(3, 5)_sigma-1_rate-1 -0.005481
standardized_SOM_(2, 6)_sigma-1_rate-1 -0.004917
standardized_SOM_(6, 5)_sigma-1_rate-0.2 -0.004576
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 -0.003037
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 -0.003037
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 -0.003037
standardized_SOM_(4, 5)_sigma-1_rate-1 -0.002892
standardized_SOM_(5, 5)_sigma-1_rate-0.5 0.000401
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 0.001457
standardized_SOM_(4, 5)_sigma-1_rate-0.5 0.001654
standardized_SOM_(3, 5)_sigma-1_rate-0.5 0.001845
standardized_SOM_(6, 5)_sigma-1_rate-0.5 0.002844
standardized_SOM_(2, 6)_sigma-1_rate-0.1 0.004279
standardized_SOM_(4, 5)_sigma-1_rate-0.05 0.005213
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01 0.009532
standardized_SOM_(4, 5)_sigma-0.1_rate-0.01 0.009661
standardized_SOM_(4, 5)_sigma-0.01_rate-0.01 0.009661
standardized_SOM_(2, 5)_sigma-1_rate-1 0.009980
standardized_SOM_(5, 5)_sigma-1_rate-0.1 0.010408
standardized_SOM_(3, 3)_sigma-1_rate-0.1 0.012268
standardized_SOM_(6, 5)_sigma-1_rate-0.1 0.013664
standardized_SOM_(2, 5)_sigma-1_rate-0.2 0.013981
standardized_SOM_(6, 5)_sigma-1_rate-0.05 0.014087
standardized_SOM_(3, 3)_sigma-1_rate-0.5 0.015501
standardized_GMM_20 0.015964
standardized_SOM_(3, 3)_sigma-1_rate-1 0.016246
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 0.016587
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 0.016587
standardized_SOM_(5, 5)_sigma-1_rate-0.05 0.017124
dtype: float64
silhouettes_sample.sort_values()[:20]
standardized_MiniBatchKMeans_15 0.003868
standardized_MiniBatchKMeans_30 0.006452
standardized_GMM_30 0.006930
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 0.007282
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 0.007616
standardized_SOM_(6, 5)_sigma-1_rate-0.01 0.011388
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 0.012175
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 0.012175
standardized_SOM_(4, 5)_sigma-1_rate-0.01 0.012226
standardized_SOM_(5, 5)_sigma-1_rate-0.01 0.012382
normalized_GMM_30 0.012945
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 0.013606
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 0.013606
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 0.015820
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 0.018615
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 0.020928
standardized_SOM_(5, 5)_sigma-1_rate-0.05 0.024459
normalized_GMM_20 0.024881
standardized_SOM_(6, 5)_sigma-1_rate-0.05 0.026259
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01 0.026938
dtype: float64
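The same split-by-prefix loop is repeated for every metric in this notebook. One possible way to factor it out is sketched below; `split_metric` is a hypothetical helper, and it returns plain dictionaries to stay dependency-free (wrapping each in `pd.Series` gives the form used in the notebook). The two silhouette values are taken from the listings above.

```python
def split_metric(evaluations, options, key):
    """Collect one metric into separate dicts for the 'chunks' and 'sample' runs."""
    chunks, sample = {}, {}
    for op in options:
        # Keys look like "chunks_<name>" or "sample_<name>"; split off the prefix.
        prefix, _, name = op.partition("_")
        target = chunks if prefix == "chunks" else sample
        target[name] = evaluations[op][key]
    return chunks, sample

# Two entries with silhouette scores copied from the output above.
evaluations = {
    "chunks_standardized_GMM_20": {"silhouette": 0.015964},
    "sample_standardized_GMM_30": {"silhouette": 0.006930},
}
chunks, sample = split_metric(evaluations, evaluations.keys(), "silhouette")
```

Splitting on the first underscore avoids the hard-coded `op[7:]` slice, which only works because both prefixes happen to be seven characters long.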
calinski_chunks = pd.Series(dtype="float64")
calinski_sample = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        calinski_chunks[op[7:]] = evaluations[op]['calinski']
    else:
        calinski_sample[op[7:]] = evaluations[op]['calinski']
calinski_chunks.sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 6959.598231
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 6959.598231
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 6960.295718
standardized_SOM_(6, 5)_sigma-1_rate-1 6990.208635
standardized_SOM_(5, 5)_sigma-0.01_rate-1 7421.136225
standardized_SOM_(5, 5)_sigma-0.1_rate-1 7421.136225
standardized_SOM_(5, 5)_sigma-0.25_rate-1 7424.063839
standardized_SOM_(6, 5)_sigma-0.1_rate-1 7430.500504
standardized_SOM_(6, 5)_sigma-0.01_rate-1 7430.500504
standardized_SOM_(6, 5)_sigma-0.25_rate-1 7433.264305
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 7748.246318
standardized_SOM_(5, 5)_sigma-1_rate-1 7825.409446
standardized_SOM_(4, 5)_sigma-1_rate-1 8107.615230
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 8112.822740
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 8112.822740
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 8112.861026
standardized_SOM_(6, 5)_sigma-1_rate-0.1 8151.871415
standardized_SOM_(6, 5)_sigma-1_rate-0.05 8325.496573
standardized_SOM_(6, 5)_sigma-1_rate-0.5 8435.576391
standardized_SOM_(6, 5)_sigma-1_rate-0.2 8531.387056
standardized_SOM_(5, 5)_sigma-1_rate-0.5 8637.871332
standardized_SOM_(6, 5)_sigma-1_rate-0.01 8692.361927
standardized_SOM_(5, 5)_sigma-1_rate-0.05 8757.975202
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 8880.918331
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 8880.918331
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 8889.270389
standardized_SOM_(6, 5)_sigma-0.5_rate-1 8916.942159
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 9182.328743
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 9191.094510
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 9191.094510
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 9257.700244
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 9268.506869
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 9268.506869
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 9270.206260
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 9272.037507
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 9272.037507
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 9272.245939
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 9293.160804
standardized_SOM_(3, 4)_sigma-1_rate-1 9301.033645
standardized_SOM_(3, 4)_sigma-1_rate-0.5 9387.162748
dtype: float64
calinski_sample.sort_values()[:20]
standardized_SOM_(6, 5)_sigma-1_rate-0.01 4824.010917
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 4889.160249
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 4889.160249
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 4915.344589
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 4980.880058
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 5014.890213
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 5014.890213
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 5014.890213
standardized_MiniBatchKMeans_30 5134.759793
standardized_SOM_(6, 5)_sigma-1_rate-0.05 5153.592227
standardized_SOM_(6, 5)_sigma-0.5_rate-1 5286.382837
standardized_SOM_(6, 5)_sigma-1_rate-0.1 5364.270909
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 5555.249455
standardized_SOM_(5, 5)_sigma-1_rate-0.01 5559.596842
standardized_SOM_(5, 5)_sigma-0.01_rate-0.01 5678.900502
standardized_SOM_(5, 5)_sigma-0.1_rate-0.01 5678.900502
standardized_SOM_(5, 5)_sigma-0.25_rate-0.01 5739.259915
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 5772.861153
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 5772.861153
standardized_SOM_(6, 5)_sigma-1_rate-0.2 5774.465544
dtype: float64
davies_chunks = pd.Series(dtype="float64")
davies_sample = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        davies_chunks[op[7:]] = evaluations[op]['davies']
    else:
        davies_sample[op[7:]] = evaluations[op]['davies']
davies_chunks.sort_values()[:20]
standardized_SOM_(3, 4)_sigma-0.5_rate-0.2 1.327454
standardized_SOM_(3, 4)_sigma-0.5_rate-0.5 1.328461
standardized_SOM_(3, 5)_sigma-0.5_rate-1 1.336137
standardized_SOM_(3, 5)_sigma-0.1_rate-0.2 1.351700
standardized_SOM_(3, 5)_sigma-0.01_rate-0.2 1.351700
standardized_SOM_(3, 5)_sigma-0.25_rate-0.2 1.354574
standardized_SOM_(2, 6)_sigma-0.5_rate-1 1.363487
standardized_SOM_(2, 5)_sigma-0.5_rate-0.5 1.381653
standardized_SOM_(3, 4)_sigma-0.5_rate-1 1.393961
standardized_SOM_(2, 5)_sigma-0.5_rate-1 1.394819
standardized_SOM_(4, 5)_sigma-0.5_rate-1 1.411874
normalized_SOM_(2, 6)_sigma-0.5_rate-0.01 1.426189
standardized_SOM_(2, 6)_sigma-0.5_rate-0.2 1.426299
standardized_SOM_(5, 5)_sigma-0.25_rate-1 1.437903
standardized_SOM_(5, 5)_sigma-0.01_rate-1 1.440224
standardized_SOM_(5, 5)_sigma-0.1_rate-1 1.440224
standardized_SOM_(3, 5)_sigma-0.5_rate-0.2 1.448240
standardized_SOM_(3, 3)_sigma-0.5_rate-0.01 1.469553
standardized_SOM_(3, 3)_sigma-0.1_rate-0.01 1.478125
standardized_SOM_(3, 3)_sigma-0.01_rate-0.01 1.478125
dtype: float64
davies_sample.sort_values()[:20]
normalized_SOM_(5, 5)_sigma-0.25_rate-1 0.803414
normalized_SOM_(6, 5)_sigma-0.25_rate-1 0.803414
normalized_SOM_(3, 4)_sigma-0.25_rate-1 0.962934
normalized_SOM_(2, 5)_sigma-0.25_rate-1 0.962934
normalized_SOM_(4, 5)_sigma-0.25_rate-1 0.962934
normalized_SOM_(3, 5)_sigma-0.25_rate-1 0.962934
normalized_SOM_(2, 6)_sigma-0.25_rate-1 0.962934
standardized_SOM_(6, 5)_sigma-0.5_rate-1 1.144874
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5 1.193206
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 1.198918
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 1.198918
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 1.198918
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 1.212181
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 1.212181
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5 1.212815
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5 1.212815
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5 1.212815
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 1.264575
normalized_SOM_(3, 3)_sigma-0.25_rate-1 1.308968
normalized_SOM_(3, 4)_sigma-0.5_rate-0.05 1.345703
dtype: float64
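When reading these listings, note that `sort_values()` sorts in ascending order, so each output starts with the lowest scores. Silhouette and Calinski-Harabasz scores are better when higher, while Davies-Bouldin is better when lower, so only the Davies-Bouldin listings start with the best-scoring options. A small sketch of ordering each metric best-first, with a hypothetical helper and a few values copied from the chunk listings:

```python
# Whether a higher value indicates a better clustering for each metric.
HIGHER_IS_BETTER = {"silhouette": True, "calinski": True, "davies": False}

def best_first(scores, metric):
    """Return (name, score) pairs ordered from best to worst for the given metric."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=HIGHER_IS_BETTER[metric])

silhouette = {
    "standardized_GMM_20": 0.015964,
    "standardized_SOM_(5, 5)_sigma-1_rate-1": -0.024278,
}
davies = {
    "standardized_SOM_(3, 4)_sigma-0.5_rate-0.2": 1.327454,
    "standardized_SOM_(3, 3)_sigma-0.5_rate-0.01": 1.469553,
}

print(best_first(silhouette, "silhouette")[0])
print(best_first(davies, "davies")[0])
```

With pandas, the equivalent is `sort_values(ascending=False)` for the two higher-is-better metrics.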
one_eval.keys()
dict_keys(['silhouette', 'calinski', 'davies', 'frequencies', 'mod_chi', 'mod_p', 'mod_dof', 'mod_exp', 'mod_cramers_', 'mod_crosstab', 'mur_chi', 'mur_p', 'mur_dof', 'mur_exp', 'mur_cramers_v', 'mur_crosstab', 'joc_chi', 'joc_p', 'joc_dof', 'joc_exp', 'joc_cramers_v', 'joc_crosstab', 'signature_abundance', 'signature_areas'])
fragmentation = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        fragmentation[op[7:]] = evaluations[op]['signature_abundance'].sum()
fragmentation_area = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        fragmentation_area[op[7:]] = evaluations[op]['signature_areas'].median()
fragmentation.loc[fragmentation.index.str.contains('stand')].sort_values()[:50]
standardized_SOM_(3, 3)_sigma-0.1_rate-0.05 838
standardized_SOM_(3, 3)_sigma-0.01_rate-0.05 838
standardized_SOM_(3, 3)_sigma-0.25_rate-0.05 840
standardized_SOM_(2, 5)_sigma-0.5_rate-1 908
standardized_SOM_(3, 3)_sigma-0.5_rate-0.01 927
standardized_SOM_(3, 3)_sigma-0.25_rate-0.01 931
standardized_SOM_(2, 5)_sigma-0.1_rate-0.05 937
standardized_SOM_(2, 5)_sigma-0.01_rate-0.05 937
standardized_SOM_(2, 6)_sigma-0.5_rate-1 940
standardized_SOM_(2, 5)_sigma-0.25_rate-0.05 940
standardized_SOM_(3, 3)_sigma-0.1_rate-0.01 941
standardized_SOM_(3, 3)_sigma-0.01_rate-0.01 941
standardized_SOM_(3, 4)_sigma-0.5_rate-1 953
standardized_SOM_(3, 4)_sigma-0.5_rate-0.2 964
standardized_SOM_(2, 6)_sigma-0.5_rate-0.2 967
standardized_SOM_(2, 5)_sigma-0.01_rate-0.1 1023
standardized_SOM_(2, 5)_sigma-0.1_rate-0.1 1023
standardized_SOM_(2, 5)_sigma-0.25_rate-0.1 1023
standardized_SOM_(3, 5)_sigma-0.5_rate-1 1029
standardized_SOM_(3, 4)_sigma-0.25_rate-0.1 1040
standardized_SOM_(2, 6)_sigma-0.1_rate-0.1 1040
standardized_SOM_(3, 4)_sigma-0.01_rate-0.1 1040
standardized_SOM_(2, 6)_sigma-0.25_rate-0.1 1040
standardized_SOM_(2, 6)_sigma-0.01_rate-0.1 1040
standardized_SOM_(3, 4)_sigma-0.1_rate-0.1 1040
standardized_SOM_(3, 3)_sigma-0.25_rate-0.1 1101
standardized_SOM_(3, 3)_sigma-0.01_rate-0.1 1103
standardized_SOM_(3, 3)_sigma-0.1_rate-0.1 1103
standardized_SOM_(3, 5)_sigma-0.25_rate-0.2 1146
standardized_SOM_(2, 5)_sigma-0.5_rate-0.5 1147
standardized_SOM_(3, 5)_sigma-0.1_rate-0.2 1148
standardized_SOM_(3, 5)_sigma-0.01_rate-0.2 1148
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01 1151
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01 1151
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01 1151
standardized_SOM_(3, 3)_sigma-0.5_rate-0.1 1185
standardized_SOM_(3, 3)_sigma-0.5_rate-0.05 1188
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01 1203
standardized_SOM_(2, 5)_sigma-0.5_rate-0.05 1203
standardized_SOM_(3, 4)_sigma-0.5_rate-0.5 1218
standardized_SOM_(2, 5)_sigma-0.5_rate-0.1 1219
standardized_SOM_(3, 4)_sigma-0.5_rate-0.1 1248
standardized_SOM_(2, 6)_sigma-0.25_rate-0.05 1328
standardized_SOM_(3, 4)_sigma-0.01_rate-0.05 1329
standardized_SOM_(3, 4)_sigma-0.1_rate-0.05 1329
standardized_SOM_(3, 4)_sigma-0.25_rate-0.05 1329
standardized_SOM_(2, 6)_sigma-0.1_rate-0.05 1329
standardized_SOM_(2, 6)_sigma-0.01_rate-0.05 1329
standardized_SOM_(3, 5)_sigma-0.5_rate-0.2 1353
standardized_SOM_(5, 5)_sigma-0.25_rate-1 1354
dtype: int64
fragmentation.loc[fragmentation.index.str.contains('KMeans')]
normalized_KMeans_10 2547
normalized_KMeans_15 3197
normalized_KMeans_20 3771
normalized_KMeans_30 4779
normalized_MiniBatchKMeans_10 2364
normalized_MiniBatchKMeans_15 3414
normalized_MiniBatchKMeans_20 4120
normalized_MiniBatchKMeans_30 5145
standardized_KMeans_10 1661
standardized_KMeans_15 1994
standardized_KMeans_20 2365
standardized_KMeans_30 3174
standardized_MiniBatchKMeans_10 1654
standardized_MiniBatchKMeans_15 2163
standardized_MiniBatchKMeans_20 2735
standardized_MiniBatchKMeans_30 3389
dtype: int64
fragmentation.loc[fragmentation.index.str.contains('GMM')]
normalized_GMM_10 1409
normalized_GMM_15 1453
normalized_GMM_20 1229
normalized_GMM_30 1379
standardized_GMM_10 1520
standardized_GMM_15 1373
standardized_GMM_20 1531
standardized_GMM_30 1369
dtype: int64
fragmentation_area.loc[fragmentation_area.index.str.contains('stand')].sort_values(ascending=False)[:40]
standardized_GMM_30 2700.965074
standardized_SOM_(2, 6)_sigma-1_rate-0.1 2391.975084
standardized_SOM_(2, 5)_sigma-1_rate-0.05 2374.398935
standardized_SOM_(3, 3)_sigma-1_rate-0.1 2368.246530
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 2356.992203
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 2344.947563
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 2344.947563
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 2307.287479
standardized_SOM_(3, 5)_sigma-1_rate-0.01 2284.809471
standardized_SOM_(4, 5)_sigma-0.1_rate-0.1 2281.160246
standardized_SOM_(4, 5)_sigma-0.25_rate-0.1 2281.160246
standardized_SOM_(4, 5)_sigma-0.01_rate-0.1 2281.160246
standardized_SOM_(2, 5)_sigma-1_rate-0.2 2270.651929
standardized_SOM_(3, 3)_sigma-1_rate-0.2 2266.443011
standardized_MiniBatchKMeans_30 2244.820890
standardized_GMM_15 2239.217112
standardized_SOM_(2, 6)_sigma-1_rate-0.01 2225.597443
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 2211.483474
standardized_SOM_(3, 3)_sigma-1_rate-0.5 2209.690410
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 2208.273922
standardized_SOM_(4, 5)_sigma-1_rate-0.2 2181.945189
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 2181.621606
standardized_KMeans_15 2178.768019
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01 2172.631353
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 2171.944870
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 2171.944870
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01 2167.586785
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01 2167.586785
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01 2167.586785
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 2162.585085
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 2162.585085
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 2160.063375
standardized_GMM_20 2150.874923
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 2146.214265
standardized_KMeans_30 2141.359582
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 2135.201701
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 2131.123645
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 2131.123645
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 2129.098278
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 2129.098278
dtype: float64
evaluations["chunks_standardized_GMM_30"]['frequencies']
17 24262
18 21483
12 20069
28 19235
13 17831
5 17506
11 16281
6 16210
3 15925
27 13116
19 12653
1 8949
15 8100
16 8015
9 7612
2 7590
14 6497
0 6306
24 5686
23 5089
29 4582
20 3989
21 3505
26 3180
4 1669
25 610
8 361
10 221
22 179
7 86
dtype: int64
postcode = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        postcode[op[7:]] = evaluations[op]['mur_cramers_v']
postcode.sort_values()[-20:]
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 0.193043
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 0.193126
standardized_SOM_(6, 5)_sigma-1_rate-0.01 0.193576
standardized_GMM_15 0.195398
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 0.195544
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 0.195676
normalized_GMM_20 0.196112
standardized_SOM_(6, 5)_sigma-1_rate-0.05 0.197000
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 0.197602
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 0.197952
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 0.198055
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 0.198055
standardized_GMM_20 0.200421
normalized_KMeans_30 0.201362
standardized_KMeans_20 0.202131
normalized_MiniBatchKMeans_30 0.204752
normalized_GMM_30 0.204883
standardized_MiniBatchKMeans_30 0.210613
standardized_KMeans_30 0.214844
standardized_GMM_30 0.215383
dtype: float64
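The `mur_cramers_v`, `joc_cramers_v` and `mod_cramers_` entries appear to hold Cramér's V between the cluster labels and an external classification. How `evaluations` was filled is not shown in this section, so the following is only a minimal sketch of the statistic itself, assuming the uncorrected textbook definition and using made-up toy labels; `cramers_v` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np
import pandas as pd


def cramers_v(labels_a, labels_b):
    """Cramér's V between two categorical label vectors (no bias correction)."""
    table = pd.crosstab(pd.Series(labels_a), pd.Series(labels_b)).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the marginals over n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    # Dividing by the smaller table dimension minus one bounds the result to [0, 1].
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Hypothetical toy labels, not taken from the notebook's data.
clusters = [0, 0, 1, 1, 2, 2]
reference_same = ["a", "a", "b", "b", "c", "c"]
reference_mixed = ["a", "b", "a", "b", "a", "b"]

v_identical = cramers_v(clusters, reference_same)   # perfect association
v_unrelated = cramers_v(clusters, reference_mixed)  # no association
```

Values close to 1 mean the clustering reproduces the reference classes, values close to 0 mean the two labelings are unrelated, which is why the series above are ranked with the highest value as best.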
jochem = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        # op[7:] drops the leading "chunks_" prefix from the key
        jochem[op[7:]] = evaluations[op]['joc_cramers_v']
jochem.sort_values()[-20:]
standardized_SOM_(6, 5)_sigma-1_rate-0.05 0.293012
standardized_SOM_(5, 5)_sigma-1_rate-0.5 0.293537
standardized_SOM_(6, 5)_sigma-1_rate-0.2 0.293610
normalized_MiniBatchKMeans_30 0.293773
standardized_GMM_15 0.295652
standardized_SOM_(6, 5)_sigma-1_rate-0.01 0.295684
standardized_SOM_(5, 5)_sigma-1_rate-0.01 0.296311
standardized_SOM_(6, 5)_sigma-1_rate-0.5 0.296498
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 0.299157
normalized_GMM_30 0.299246
standardized_MiniBatchKMeans_15 0.299745
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 0.302494
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 0.304802
standardized_GMM_20 0.308456
standardized_MiniBatchKMeans_20 0.311887
standardized_KMeans_15 0.313963
standardized_KMeans_20 0.315366
standardized_GMM_30 0.320287
standardized_KMeans_30 0.326838
standardized_MiniBatchKMeans_30 0.329180
dtype: float64
modum = pd.Series(dtype="float64")
for op in options:
    if 'chunks' in op:
        # op[7:] drops the leading "chunks_" prefix from the key
        modum[op[7:]] = evaluations[op]['mod_cramers_']
modum.sort_values()[-20:]
standardized_SOM_(5, 5)_sigma-1_rate-0.1 0.298132
standardized_SOM_(4, 5)_sigma-1_rate-0.5 0.298227
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 0.301453
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 0.301453
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 0.301463
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 0.301649
standardized_SOM_(5, 5)_sigma-1_rate-0.5 0.302658
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 0.303264
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 0.303293
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 0.303293
standardized_KMeans_30 0.304986
standardized_GMM_30 0.305577
standardized_SOM_(4, 5)_sigma-1_rate-0.05 0.305997
standardized_SOM_(6, 5)_sigma-1_rate-0.05 0.307816
standardized_SOM_(6, 5)_sigma-1_rate-0.1 0.308840
standardized_SOM_(5, 5)_sigma-1_rate-0.01 0.308846
standardized_SOM_(6, 5)_sigma-1_rate-0.01 0.311538
standardized_MiniBatchKMeans_30 0.311836
standardized_SOM_(6, 5)_sigma-1_rate-0.2 0.312026
standardized_SOM_(6, 5)_sigma-1_rate-0.5 0.316889
dtype: float64
# Rank every model on each metric (1 = first under the chosen sort order), then sum the ranks.
score = pd.DataFrame(index=modum.index)
score["modum"] = pd.Series(range(1, 312), index=modum.sort_values(ascending=False).index)
score["postcode_class"] = pd.Series(range(1, 312), index=postcode.sort_values(ascending=False).index)
score["jochem"] = pd.Series(range(1, 312), index=jochem.sort_values(ascending=False).index)
score["fragmentation_count"] = pd.Series(range(1, 312), index=fragmentation.sort_values(ascending=True).index)
score["fragmentation_area"] = pd.Series(range(1, 312), index=fragmentation_area.sort_values(ascending=False).index)
# NOTE: ascending=True gives rank 1 to the lowest value. That suits Davies-Bouldin (lower is better);
# silhouette and Calinski-Harabasz are usually read as higher-is-better, so check this direction is intended.
score["davies"] = pd.Series(range(1, 312), index=davies_chunks.sort_values(ascending=True).index)
score["silhouette"] = pd.Series(range(1, 312), index=silhouettes_chunks.sort_values(ascending=True).index)
score["calinski"] = pd.Series(range(1, 312), index=calinski_chunks.sort_values(ascending=True).index)
# Lower totals are better; "total" sums all eight rank columns above.
score["total"] = score.sum(axis=1)
score["comparative"] = score.modum + score.postcode_class + score.jochem
score["internal"] = score.davies + score.silhouette + score.calinski
score["fragmentation"] = score.fragmentation_count + score.fragmentation_area
score.total.sort_values()[:20]
standardized_GMM_30 459
normalized_GMM_30 553
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 637
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 640
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 641
standardized_GMM_20 683
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 683
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 694
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 694
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 698
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 700
standardized_GMM_15 709
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 709
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 709
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 711
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 715
standardized_SOM_(4, 5)_sigma-0.1_rate-0.1 715
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 716
standardized_SOM_(4, 5)_sigma-0.25_rate-0.1 717
standardized_SOM_(4, 5)_sigma-0.01_rate-0.1 717
Name: total, dtype: int64
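The scoreboard above is a rank-sum aggregation: each model gets a rank per metric and the ranks are added, so the smallest total wins. A minimal sketch of the same idea with `Series.rank`, which avoids hard-coding the number of models, is given below. The model names and metric values are invented for illustration, and the direction of each metric is stated explicitly (Davies-Bouldin is lower-is-better, silhouette and Cramér's V are higher-is-better).

```python
import pandas as pd

# Hypothetical metric values for three made-up models.
metrics = pd.DataFrame(
    {
        "cramers_v": [0.30, 0.20, 0.25],
        "davies": [1.5, 1.2, 2.0],
        "silhouette": [0.10, 0.30, 0.20],
    },
    index=["model_a", "model_b", "model_c"],
)
# Direction of each metric: True means a larger value is better.
higher_is_better = {"cramers_v": True, "davies": False, "silhouette": True}

# Rank 1 is always the best model; ties are broken by order of appearance.
ranks = pd.DataFrame(
    {
        name: metrics[name].rank(ascending=not better, method="first").astype(int)
        for name, better in higher_is_better.items()
    }
)
ranks["total"] = ranks.sum(axis=1)
best = ranks["total"].idxmin()
```

Because only ranks are summed, the result ignores how large the gaps between models are; two models with nearly identical metric values can still end up several ranks apart.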
score.comparative.sort_values()[:20]
standardized_MiniBatchKMeans_30 7
standardized_GMM_30 13
standardized_KMeans_30 14
standardized_KMeans_20 32
standardized_SOM_(6, 5)_sigma-1_rate-0.01 37
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 39
standardized_SOM_(6, 5)_sigma-1_rate-0.05 40
standardized_SOM_(6, 5)_sigma-1_rate-0.2 42
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 45
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 45
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 48
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 48
standardized_SOM_(6, 5)_sigma-1_rate-0.5 49
standardized_SOM_(5, 5)_sigma-1_rate-0.01 57
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 57
standardized_SOM_(5, 5)_sigma-1_rate-0.5 70
standardized_MiniBatchKMeans_20 79
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 80
standardized_SOM_(6, 5)_sigma-1_rate-0.1 83
standardized_KMeans_15 87
Name: comparative, dtype: int64
score.internal.sort_values()[:60]
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5 163
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5 177
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5 178
standardized_SOM_(4, 5)_sigma-0.5_rate-1 185
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 188
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 189
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 199
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 200
standardized_SOM_(5, 5)_sigma-0.5_rate-1 202
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5 211
standardized_SOM_(6, 5)_sigma-0.01_rate-1 218
standardized_SOM_(6, 5)_sigma-0.1_rate-1 219
standardized_SOM_(6, 5)_sigma-0.25_rate-1 220
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 223
standardized_SOM_(6, 5)_sigma-0.5_rate-1 237
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 247
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 259
standardized_SOM_(3, 4)_sigma-1_rate-0.5 267
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 269
standardized_SOM_(4, 5)_sigma-1_rate-1 271
standardized_SOM_(2, 6)_sigma-0.5_rate-0.5 277
standardized_SOM_(2, 5)_sigma-1_rate-0.5 286
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 288
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 289
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 291
standardized_SOM_(3, 5)_sigma-0.5_rate-0.5 302
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 307
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 307
standardized_SOM_(3, 4)_sigma-1_rate-1 309
standardized_SOM_(6, 5)_sigma-1_rate-1 309
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 309
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 310
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 310
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 310
standardized_SOM_(5, 5)_sigma-1_rate-1 311
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 311
standardized_SOM_(2, 6)_sigma-1_rate-0.5 312
standardized_SOM_(6, 5)_sigma-1_rate-0.1 312
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 313
standardized_SOM_(5, 5)_sigma-0.01_rate-1 318
standardized_SOM_(3, 3)_sigma-1_rate-0.2 318
standardized_SOM_(5, 5)_sigma-0.1_rate-1 319
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 320
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2 320
standardized_SOM_(5, 5)_sigma-0.25_rate-1 320
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 324
standardized_SOM_(6, 5)_sigma-1_rate-0.5 324
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2 324
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2 325
standardized_SOM_(3, 5)_sigma-1_rate-0.5 325
standardized_SOM_(5, 5)_sigma-1_rate-0.5 325
standardized_SOM_(6, 5)_sigma-1_rate-0.2 326
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2 326
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 329
standardized_SOM_(3, 5)_sigma-0.1_rate-0.05 331
standardized_SOM_(5, 5)_sigma-1_rate-0.05 331
standardized_SOM_(3, 4)_sigma-0.1_rate-0.05 332
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 332
standardized_SOM_(6, 5)_sigma-1_rate-0.05 332
standardized_SOM_(3, 5)_sigma-0.01_rate-0.05 333
Name: internal, dtype: int64
score.internal.loc[score.index.str.contains("GMM")].sort_values()[:60]
standardized_GMM_30 389
normalized_GMM_30 397
standardized_GMM_20 439
standardized_GMM_15 472
normalized_GMM_20 500
standardized_GMM_10 538
normalized_GMM_15 556
normalized_GMM_10 586
Name: internal, dtype: int64
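The `internal` column combines ranks on three internal validity metrics. As a rough illustration of what those metrics measure, the sketch below computes them with scikit-learn on synthetic blobs; the data, the cluster count and the `sample_size` of 200 are assumptions chosen for the example, not the settings used to produce `davies_chunks`, `silhouettes_chunks` or `calinski_chunks`.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs (illustrative data only).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

internal = {
    # Mean silhouette on a random subsample: higher is better, range [-1, 1].
    "silhouette": metrics.silhouette_score(X, labels, sample_size=200, random_state=0),
    # Ratio of between- to within-cluster dispersion: higher is better.
    "calinski_harabasz": metrics.calinski_harabasz_score(X, labels),
    # Average similarity of each cluster to its closest neighbour: lower is better.
    "davies_bouldin": metrics.davies_bouldin_score(X, labels),
}
```

Sampling the silhouette keeps the cost manageable, since the full score needs pairwise distances between all observations.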
score.fragmentation.sort_values()[:20]
standardized_GMM_30 57
normalized_GMM_30 58
standardized_GMM_15 84
normalized_GMM_15 86
standardized_SOM_(2, 5)_sigma-0.01_rate-0.01 89
standardized_SOM_(2, 5)_sigma-0.25_rate-0.01 91
standardized_SOM_(3, 3)_sigma-1_rate-0.2 91
standardized_SOM_(2, 5)_sigma-0.5_rate-0.01 92
standardized_SOM_(2, 5)_sigma-0.1_rate-0.01 93
normalized_SOM_(3, 4)_sigma-0.5_rate-0.01 104
normalized_SOM_(2, 6)_sigma-0.5_rate-0.01 105
standardized_SOM_(3, 3)_sigma-1_rate-0.1 111
normalized_GMM_20 113
standardized_SOM_(2, 6)_sigma-1_rate-0.1 114
normalized_SOM_(2, 5)_sigma-0.5_rate-0.01 116
standardized_SOM_(2, 5)_sigma-1_rate-0.2 120
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 125
standardized_SOM_(2, 5)_sigma-1_rate-0.05 125
standardized_GMM_20 134
standardized_SOM_(3, 3)_sigma-1_rate-0.5 138
Name: fragmentation, dtype: int64
postcode_sample = pd.Series(dtype="float64")
jochem_sample = pd.Series(dtype="float64")
modum_sample = pd.Series(dtype="float64")
for op in options:
    if 'sample' in op:
        # op[7:] drops the leading "sample_" prefix from the key
        postcode_sample[op[7:]] = evaluations[op]['mur_cramers_v']
        jochem_sample[op[7:]] = evaluations[op]['joc_cramers_v']
        modum_sample[op[7:]] = evaluations[op]['mod_cramers_']
score_sample = pd.DataFrame(index=modum_sample.index)
score_sample["modum"] = pd.Series(range(1, 310), index=modum_sample.sort_values(ascending=False).index)
score_sample["postcode_class"] = pd.Series(range(1, 310), index=postcode_sample.sort_values(ascending=False).index)
score_sample["jochem"] = pd.Series(range(1, 310), index=jochem_sample.sort_values(ascending=False).index)
score_sample["davies"] = pd.Series(range(1, 310), index=davies_sample.sort_values(ascending=True).index)
score_sample["silhouette"] = pd.Series(range(1, 310), index=silhouettes_sample.sort_values(ascending=True).index)
score_sample["calinski"] = pd.Series(range(1, 310), index=calinski_sample.sort_values(ascending=True).index)
score_sample["total"] = score_sample.sum(axis=1)
score_sample["comparative"] = score_sample.modum + score_sample.postcode_class + score_sample.jochem
score_sample["internal"] = score_sample.davies + score_sample.silhouette + score_sample.calinski
score_sample.total.sort_values()[:20]
standardized_KMeans_30 340
standardized_MiniBatchKMeans_30 341
standardized_SOM_(6, 5)_sigma-1_rate-0.01 346
standardized_SOM_(6, 5)_sigma-1_rate-0.05 354
standardized_SOM_(6, 5)_sigma-1_rate-0.1 354
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 361
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 361
standardized_SOM_(6, 5)_sigma-1_rate-0.2 371
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 376
standardized_SOM_(5, 5)_sigma-1_rate-0.05 387
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 391
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 392
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 398
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 400
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 401
standardized_SOM_(5, 5)_sigma-1_rate-0.1 402
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 403
standardized_SOM_(5, 5)_sigma-1_rate-0.01 403
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 424
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 431
Name: total, dtype: int64
score_sample.comparative.loc[score_sample.index.str.contains('stand')].sort_values()[:20]
standardized_SOM_(6, 5)_sigma-1_rate-0.05 36
standardized_SOM_(6, 5)_sigma-1_rate-0.01 37
standardized_MiniBatchKMeans_30 39
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 41
standardized_SOM_(6, 5)_sigma-1_rate-0.1 42
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 42
standardized_SOM_(5, 5)_sigma-1_rate-0.05 56
standardized_SOM_(6, 5)_sigma-1_rate-0.2 63
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05 63
standardized_SOM_(4, 5)_sigma-1_rate-0.05 65
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05 66
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 69
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 69
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 72
standardized_SOM_(5, 5)_sigma-1_rate-0.01 79
standardized_SOM_(4, 5)_sigma-0.5_rate-0.05 87
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 88
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 89
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 90
standardized_SOM_(5, 5)_sigma-1_rate-0.1 91
Name: comparative, dtype: int64
score_sample.internal.sort_values()[:20]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2 208
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2 209
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2 210
standardized_KMeans_30 218
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 235
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2 242
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 258
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 259
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 259
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 263
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 264
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 284
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 289
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 290
standardized_SOM_(6, 5)_sigma-0.5_rate-1 295
standardized_SOM_(5, 5)_sigma-0.01_rate-0.1 297
standardized_SOM_(5, 5)_sigma-0.1_rate-0.1 298
standardized_SOM_(5, 5)_sigma-1_rate-0.5 298
standardized_SOM_(6, 5)_sigma-1_rate-1 298
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 302
Name: internal, dtype: int64
(score.total + score_sample.total).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 1059.0
standardized_MiniBatchKMeans_30 1059.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 1061.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01 1095.0
standardized_KMeans_30 1103.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 1107.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 1118.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 1133.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 1144.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05 1147.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 1148.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 1154.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2 1162.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 1184.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 1189.0
standardized_SOM_(5, 5)_sigma-1_rate-0.01 1194.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 1194.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 1196.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 1202.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 1208.0
standardized_GMM_30 1210.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 1228.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1 1239.0
normalized_GMM_30 1244.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 1267.0
standardized_SOM_(5, 5)_sigma-1_rate-0.1 1295.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 1299.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.1 1305.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 1311.0
standardized_SOM_(5, 5)_sigma-1_rate-0.05 1316.0
standardized_SOM_(4, 5)_sigma-1_rate-0.01 1353.0
standardized_KMeans_20 1353.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.1 1360.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.1 1360.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 1366.0
standardized_SOM_(4, 5)_sigma-1_rate-0.05 1380.0
standardized_KMeans_15 1387.0
standardized_SOM_(5, 5)_sigma-1_rate-0.2 1409.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05 1416.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05 1418.0
Name: total, dtype: float64
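The combined totals are printed as floats even though both inputs are integer ranks. A likely reason is index alignment: the chunk and sample scoreboards were built with 311 and 309 ranks respectively, so some models presumably exist in only one of them, and pandas fills those positions with NaN before the intersection is selected. A minimal sketch with invented model names:

```python
import pandas as pd

# Hypothetical rank totals; model_c and model_d each exist in only one table.
chunks_total = pd.Series({"model_a": 10, "model_b": 20, "model_c": 30})
sample_total = pd.Series({"model_a": 1, "model_b": 2, "model_d": 4})

# Addition aligns on the union of labels and yields NaN where one side is missing,
# which forces the result to float64.
combined = chunks_total + sample_total

# Keeping only labels present in both tables removes the NaN rows.
shared = chunks_total.index.intersection(sample_total.index)
ranking = combined.loc[shared].sort_values()
```

Selecting the intersection after the addition, as the notebook does, gives the same result as restricting both series first; the dtype simply stays float64.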
(score.comparative + score_sample.comparative).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_MiniBatchKMeans_30 46.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01 74.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05 76.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.05 87.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.05 89.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2 105.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 111.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.05 117.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1 125.0
standardized_SOM_(5, 5)_sigma-1_rate-0.01 136.0
standardized_KMeans_30 136.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 149.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 187.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.05 188.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.05 188.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.05 192.0
standardized_SOM_(4, 5)_sigma-1_rate-0.05 197.0
standardized_SOM_(5, 5)_sigma-1_rate-0.1 209.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.01 209.0
standardized_KMeans_20 217.0
standardized_SOM_(5, 5)_sigma-1_rate-0.05 223.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.01 235.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 236.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 238.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 240.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 242.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 242.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 251.0
standardized_SOM_(4, 5)_sigma-1_rate-0.01 262.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.05 269.0
standardized_SOM_(5, 5)_sigma-1_rate-0.2 285.0
standardized_SOM_(4, 5)_sigma-0.25_rate-0.01 304.0
standardized_SOM_(6, 5)_sigma-1_rate-0.5 306.0
normalized_GMM_30 307.0
standardized_MiniBatchKMeans_15 314.0
standardized_KMeans_15 315.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.01 324.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.01 326.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.05 331.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.05 335.0
Name: comparative, dtype: float64
(score.internal + score_sample.internal).loc[score.index.intersection(score_sample.index)].sort_values()[:40]
standardized_SOM_(6, 5)_sigma-0.5_rate-0.5 482.0
standardized_SOM_(4, 5)_sigma-0.25_rate-0.5 505.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.5 512.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.5 514.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.5 514.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.5 516.0
standardized_SOM_(4, 5)_sigma-0.01_rate-0.5 520.0
standardized_SOM_(4, 5)_sigma-0.1_rate-0.5 522.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.5 526.0
standardized_SOM_(5, 5)_sigma-0.25_rate-0.2 527.0
standardized_SOM_(6, 5)_sigma-0.5_rate-1 532.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.2 532.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.2 534.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.2 536.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.5 541.0
standardized_SOM_(5, 5)_sigma-0.1_rate-0.2 550.0
standardized_SOM_(5, 5)_sigma-0.01_rate-0.2 551.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.5 553.0
standardized_SOM_(5, 5)_sigma-0.5_rate-0.2 562.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.5 571.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.2 588.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.01 594.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.01 597.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.01 599.0
standardized_SOM_(6, 5)_sigma-1_rate-1 607.0
standardized_SOM_(4, 5)_sigma-1_rate-1 608.0
standardized_SOM_(6, 5)_sigma-0.01_rate-0.1 611.0
standardized_SOM_(6, 5)_sigma-0.1_rate-0.1 614.0
standardized_SOM_(4, 5)_sigma-0.5_rate-0.2 616.0
standardized_SOM_(6, 5)_sigma-0.25_rate-0.1 619.0
standardized_SOM_(5, 5)_sigma-1_rate-0.5 623.0
standardized_SOM_(6, 5)_sigma-1_rate-0.1 624.0
standardized_SOM_(5, 5)_sigma-1_rate-1 625.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.05 627.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.01 631.0
standardized_SOM_(6, 5)_sigma-1_rate-0.2 634.0
standardized_SOM_(6, 5)_sigma-0.5_rate-0.1 635.0
standardized_SOM_(6, 5)_sigma-1_rate-0.5 636.0
standardized_SOM_(6, 5)_sigma-1_rate-0.05 650.0
standardized_SOM_(6, 5)_sigma-1_rate-0.01 651.0
Name: internal, dtype: float64
evaluations["chunks_standardized_GMM_30"]['frequencies']
17 24262
18 21483
12 20069
28 19235
13 17831
5 17506
11 16281
6 16210
3 15925
27 13116
19 12653
1 8949
15 8100
16 8015
9 7612
2 7590
14 6497
0 6306
24 5686
23 5089
29 4582
20 3989
21 3505
26 3180
4 1669
25 610
8 361
10 221
22 179
7 86
dtype: int64
evaluations["chunks_standardized_KMeans_30"]['frequencies']
23 32317
0 28309
15 25717
4 23572
18 18582
7 17673
29 16669
11 15973
20 15864
1 14967
5 14199
13 10040
22 7694
25 7138
9 5783
3 4641
21 3207
19 3031
27 2957
14 2012
16 1952
12 1629
28 776
2 559
17 409
24 339
26 302
6 221
10 179
8 86
dtype: int64
labels["chunks_standardized_KMeans_30"]
array([13, 13, 13, ..., 21, 21, 21], dtype=int32)
# Fit on the 250,000-row sample, then assign labels to the chunk data with the fitted centroids.
data = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/sample_standardized_data.pq").values
chunk_data = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/sample/chunks_standardized_data.pq").values
%time km = KMeans(n_clusters=30, n_init=10, random_state=42).fit(data)
%time labels_ = km.predict(chunk_data)
# geom51 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_51.pq", columns=["tessellation", "hindex"])
# geom68 = gpd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_68.pq", columns=["tessellation", "hindex"])
# geom = pd.concat([geom51, geom68]).reset_index(drop=True).rename_geometry("geometry")
geom['labels'] = labels_
CPU times: user 17min 16s, sys: 4min 24s, total: 21min 41s
Wall time: 1min 28s
CPU times: user 5.76 s, sys: 0 ns, total: 5.76 s
Wall time: 605 ms
data.shape
(250000, 331)
chunk_data.shape
(276797, 331)
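The cell above fits K-Means on one array and labels another with `predict`, which only assigns each row to the nearest already-learned centroid. The sketch below reproduces that pattern on small synthetic data; the blob layout, sample size and cluster count are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three synthetic blobs of 500 points each (illustrative data only).
centers = np.array([[0.0, 0.0], [8.0, 8.0], [-8.0, 8.0]])
full = np.vstack([c + rng.normal(size=(500, 2)) for c in centers])

# Fit on a random subset only.
sample = full[rng.choice(len(full), size=300, replace=False)]
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(sample)

# predict() does not refit: every row is assigned to the nearest sample-derived centroid.
predicted = km.predict(full)
counts = np.bincount(predicted, minlength=3)
```

This is much cheaper than fitting on everything, at the cost that clusters too rare to appear in the sample cannot be discovered.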
from sklearn import metrics
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import contextily as ctx
import urbangrammar_graphics as ugg
import dask_geopandas
from utils.dask_geopandas import dask_dissolve
ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()
cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
token = ""
ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans_predicted_lpool.png")
plt.close()
ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans_predicted_gla.png")
plt.close()
%time km = KMeans(n_clusters=30, n_init=100, random_state=42).fit(chunk_data)
%time labels_ = km.labels_
CPU times: user 2h 42min 15s, sys: 44min 14s, total: 3h 26min 30s
Wall time: 14min 20s
CPU times: user 104 µs, sys: 0 ns, total: 104 µs
Wall time: 9.3 µs
geom['labels'] = labels_
ddf = dask_geopandas.from_geopandas(geom.sort_values('labels'), npartitions=64)
spsig = dask_dissolve(ddf, by='labels').compute().reset_index(drop=True).explode()
cmap = ugg.get_colormap(spsig.labels.nunique(), randomize=True)
token = ""
ax = spsig.cx[332971:361675, 379462:404701].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans30_100_lpool.png")
plt.close()
ax = spsig.cx[218800:270628, 645123:695069].plot("labels", figsize=(20, 20), zorder=1, linewidth=.3, edgecolor='w', alpha=1, legend=True, cmap=cmap, categorical=True)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('roads', token), zorder=2, alpha=.3)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('labels', token), zorder=3, alpha=1)
ctx.add_basemap(ax, crs=27700, source=ugg.get_tiles('background', token), zorder=-1, alpha=1)
ax.set_axis_off()
plt.savefig(f"../../urbangrammar_samba/spatial_signatures/clustering_data/validation/maps/KMeans30_100_gla.png")
plt.close()
Full scale¶
import numpy as np
standardized_form = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/").set_index('hindex')
stand_fn = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/function/standardized/")
data = dask.dataframe.multi.concat([standardized_form, stand_fn], axis=1).replace([np.inf, -np.inf], np.nan).fillna(0)
%time data = data.compute()
CPU times: user 2min 37s, sys: 1min 25s, total: 4min 2s
Wall time: 2min 44s
from sklearn.cluster import KMeans, MiniBatchKMeans
data
hindex | sdbAre_q1 | sdbAre_q2 | sdbAre_q3 | sdbPer_q1 | sdbPer_q2 | sdbPer_q3 | sdbCoA_q1 | sdbCoA_q2 | sdbCoA_q3 | ssbCCo_q1 | ... | Code_18_521_q2 | Code_18_334_q3 | Code_18_244_q1 | Code_18_244_q2 | Code_18_331_q3 | Code_18_132_q2 | Code_18_132_q3 | Code_18_521_q1 | Code_18_222_q2 | Code_18_521_q3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c000e094707t0000 | -0.947406 | -0.371977 | 0.020285 | -0.901199 | -0.237045 | -0.023143 | -0.000419 | -0.001515 | -0.010221 | -0.046170 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c000e094763t0000 | -0.913567 | -0.420861 | -0.271703 | -0.903627 | -0.428003 | -0.336729 | -0.000419 | -0.001515 | -0.010221 | -0.035325 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c000e094763t0001 | -0.878137 | -0.411587 | -0.284021 | -0.900393 | -0.416250 | -0.350010 | -0.000419 | -0.001515 | -0.010221 | -0.034917 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c000e094763t0002 | -0.952475 | -0.421566 | -0.283919 | -0.968400 | -0.429947 | -0.343165 | -0.000419 | -0.001515 | -0.010221 | -0.065649 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c000e094764t0000 | -0.964878 | -0.420861 | -0.271703 | -0.972440 | -0.420006 | -0.315861 | -0.000419 | -0.001515 | -0.010221 | -0.066832 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
c102e644989t0111 | -0.311466 | -0.431706 | -0.373463 | -0.082269 | -0.459270 | -0.389532 | -0.000419 | -0.001515 | -0.010221 | 0.132837 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c102e644989t0112 | -0.326671 | -0.461825 | -0.371855 | -0.149873 | -0.528701 | -0.386678 | -0.000419 | -0.001515 | -0.010221 | 0.136559 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c102e644989t0113 | -0.094236 | -0.364761 | -0.304254 | 0.024972 | -0.347371 | -0.283669 | -0.000419 | -0.001515 | -0.010221 | 0.021411 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c102e644989t0114 | -0.477667 | -0.568464 | -0.390033 | -0.600170 | -0.646516 | -0.472676 | -0.000419 | -0.001515 | -0.010221 | 0.424887 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
c102e644989t0115 | -0.413094 | -0.545952 | -0.382834 | -0.400108 | -0.610332 | -0.440413 | -0.000419 | -0.001515 | -0.010221 | 0.160613 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.008758 | 0.0 | -0.000679 | 0.0 | -0.009142 | 0.0 |
14539578 rows × 331 columns
%time km = KMeans(n_clusters=20, n_init=1, random_state=42).fit(data)
CPU times: user 27min 19s, sys: 1min 9s, total: 28min 29s
Wall time: 5min 11s
%time kmb = MiniBatchKMeans(n_clusters=20, n_init=1, random_state=42, batch_size=1_000_000).fit(data)
CPU times: user 5min 13s, sys: 3min 53s, total: 9min 7s
Wall time: 1min 40s
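The timings above show `MiniBatchKMeans` finishing in roughly a third of the wall time of `KMeans` on the full data. Whether the two produce comparable partitions is a separate question; one way to check, sketched here on synthetic data, is the adjusted Rand index between the two label vectors. The blobs, cluster count and `batch_size=256` are assumptions for the example and far smaller than the notebook's settings.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Four well-separated synthetic blobs of 250 points each (illustrative data only).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(250, 2)) for c in centers])

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
kmb = MiniBatchKMeans(n_clusters=4, n_init=10, random_state=42, batch_size=256).fit(X)

# The adjusted Rand index ignores label permutations: 1.0 means identical partitions.
agreement = adjusted_rand_score(km.labels_, kmb.labels_)
```

On real, less separable data the agreement will typically be lower, so a check like this is best read alongside the validation metrics used earlier in the notebook.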