Exploration of form-based signatures

This notebook explores form-based signatures to understand what individual clusters represent.

import numpy as np
import pandas as pd
import geopandas as gpd
import dask.dataframe
import matplotlib.pyplot as plt
import urbangrammar_graphics as ugg

from matplotlib.lines import Line2D
%time data = dask.dataframe.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/form/standardized/").replace([np.inf, -np.inf], np.nan).fillna(0)
%time data = data.compute().set_index('hindex')
CPU times: user 105 ms, sys: 138 ms, total: 244 ms
Wall time: 290 ms
CPU times: user 51.5 s, sys: 40.5 s, total: 1min 32s
Wall time: 1min 50s
labels_l1 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/k8_form_labels.pq")
labels_l2_4 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/clustergram_c4_form_labels.pq")
labels_l2_2 = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/clustering_data/clustergram_c2_form_labels.pq")
labels = labels_l1.copy()
labels.loc[labels.k8 == 4, 'k8'] = labels_l2_4['9'].values + 40
labels.loc[labels.k8 == 2, 'k8'] = labels_l2_2['8'].values + 20
labels
k8
hindex
c000e094707t0000 0
c000e094763t0000 5
c000e094763t0001 5
c000e094763t0002 5
c000e094764t0000 5
... ...
c102e644989t0111 5
c102e644989t0112 5
c102e644989t0113 5
c102e644989t0114 5
c102e644989t0115 5

14539578 rows × 1 columns
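The two-level label encoding used above (parent cluster 4 becomes 40 to 48, parent cluster 2 becomes 20 to 27) can be illustrated on a toy frame. This is a minimal sketch with made-up labels: `sub_4` and `sub_2` are hypothetical stand-ins for the clustergram outputs, and it assumes, as the notebook does, that the sub-cluster labels are stored in the same row order as the parent cluster's rows.

```python
import pandas as pd

# Toy first-level labels (hypothetical values, not the real data).
labels = pd.DataFrame(
    {"k8": [0, 4, 2, 4, 5, 2]},
    index=["a", "b", "c", "d", "e", "f"],
)

# Toy second-level labels: one entry per member of the parent cluster,
# assumed to follow the row order of that cluster's rows in `labels`.
sub_4 = pd.Series([0, 3])  # sub-clusters of parent cluster 4
sub_2 = pd.Series([1, 7])  # sub-clusters of parent cluster 2

combined = labels.copy()
# Masks are computed on the untouched frame as a precaution, so a newly
# written code can never be matched by a later mask.
is_4 = labels.k8 == 4
is_2 = labels.k8 == 2
combined.loc[is_4, "k8"] = sub_4.values + 40  # 4 -> 40 + sub-label
combined.loc[is_2, "k8"] = sub_2.values + 20  # 2 -> 20 + sub-label

print(combined.k8.tolist())
```

With this encoding the tens digit identifies the parent cluster and the units digit the sub-cluster, which is why the summary refers to label 25 as "2-5".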

outliers = [6, 44, 48, 43]
mask = ~labels.k8.isin(outliers)
labels.columns = ["label"]
group = data.groupby(labels['label'].values).mean()  # cluster centroids
group.columns.values
array(['sdbAre_q1', 'sdbAre_q2', 'sdbAre_q3', 'sdbPer_q1', 'sdbPer_q2',
       'sdbPer_q3', 'sdbCoA_q1', 'sdbCoA_q2', 'sdbCoA_q3', 'ssbCCo_q1',
       'ssbCCo_q2', 'ssbCCo_q3', 'ssbCor_q1', 'ssbCor_q2', 'ssbCor_q3',
       'ssbSqu_q1', 'ssbSqu_q2', 'ssbSqu_q3', 'ssbERI_q1', 'ssbERI_q2',
       'ssbERI_q3', 'ssbElo_q1', 'ssbElo_q2', 'ssbElo_q3', 'ssbCCM_q1',
       'ssbCCM_q2', 'ssbCCM_q3', 'ssbCCD_q1', 'ssbCCD_q2', 'ssbCCD_q3',
       'stbOri_q1', 'stbOri_q2', 'stbOri_q3', 'sdcLAL_q1', 'sdcLAL_q2',
       'sdcLAL_q3', 'sdcAre_q1', 'sdcAre_q2', 'sdcAre_q3', 'sscCCo_q1',
       'sscCCo_q2', 'sscCCo_q3', 'sscERI_q1', 'sscERI_q2', 'sscERI_q3',
       'stcOri_q1', 'stcOri_q2', 'stcOri_q3', 'sicCAR_q1', 'sicCAR_q2',
       'sicCAR_q3', 'stbCeA_q1', 'stbCeA_q2', 'stbCeA_q3', 'mtbAli_q1',
       'mtbAli_q2', 'mtbAli_q3', 'mtbNDi_q1', 'mtbNDi_q2', 'mtbNDi_q3',
       'mtcWNe_q1', 'mtcWNe_q2', 'mtcWNe_q3', 'mdcAre_q1', 'mdcAre_q2',
       'mdcAre_q3', 'ltcWRE_q1', 'ltcWRE_q2', 'ltcWRE_q3', 'ltbIBD_q1',
       'ltbIBD_q2', 'ltbIBD_q3', 'sdsSPW_q1', 'sdsSPW_q2', 'sdsSPW_q3',
       'sdsSWD_q1', 'sdsSWD_q2', 'sdsSWD_q3', 'sdsSPO_q1', 'sdsSPO_q2',
       'sdsSPO_q3', 'sdsLen_q1', 'sdsLen_q2', 'sdsLen_q3', 'sssLin_q1',
       'sssLin_q2', 'sssLin_q3', 'ldsMSL_q1', 'ldsMSL_q2', 'ldsMSL_q3',
       'mtdDeg_q1', 'mtdDeg_q2', 'mtdDeg_q3', 'lcdMes_q1', 'lcdMes_q2',
       'lcdMes_q3', 'linP3W_q1', 'linP3W_q2', 'linP3W_q3', 'linP4W_q1',
       'linP4W_q2', 'linP4W_q3', 'linPDE_q1', 'linPDE_q2', 'linPDE_q3',
       'lcnClo_q1', 'lcnClo_q2', 'lcnClo_q3', 'ldsCDL_q1', 'ldsCDL_q2',
       'ldsCDL_q3', 'xcnSCl_q1', 'xcnSCl_q2', 'xcnSCl_q3', 'mtdMDi_q1',
       'mtdMDi_q2', 'mtdMDi_q3', 'lddNDe_q1', 'lddNDe_q2', 'lddNDe_q3',
       'linWID_q1', 'linWID_q2', 'linWID_q3', 'stbSAl_q1', 'stbSAl_q2',
       'stbSAl_q3', 'sddAre_q1', 'sddAre_q2', 'sddAre_q3', 'sdsAre_q1',
       'sdsAre_q2', 'sdsAre_q3', 'sisBpM_q1', 'sisBpM_q2', 'sisBpM_q3',
       'misCel_q1', 'misCel_q2', 'misCel_q3', 'mdsAre_q1', 'mdsAre_q2',
       'mdsAre_q3', 'lisCel_q1', 'lisCel_q2', 'lisCel_q3', 'ldsAre_q1',
       'ldsAre_q2', 'ldsAre_q3', 'ltcRea_q1', 'ltcRea_q2', 'ltcRea_q3',
       'ltcAre_q1', 'ltcAre_q2', 'ltcAre_q3', 'ldeAre_q1', 'ldeAre_q2',
       'ldeAre_q3', 'ldePer_q1', 'ldePer_q2', 'ldePer_q3', 'lseCCo_q1',
       'lseCCo_q2', 'lseCCo_q3', 'lseERI_q1', 'lseERI_q2', 'lseERI_q3',
       'lseCWA_q1', 'lseCWA_q2', 'lseCWA_q3', 'lteOri_q1', 'lteOri_q2',
       'lteOri_q3', 'lteWNB_q1', 'lteWNB_q2', 'lteWNB_q3', 'lieWCe_q1',
       'lieWCe_q2', 'lieWCe_q3'], dtype=object)
sel = group[["sicCAR_q2", "lcdMes_q2", "sdcLAL_q2", "stbCeA_q1", "stcOri_q2", "linPDE_q2", "ssbCor_q3" ,"sdbAre_q3", "sddAre_q2"]]
# min-max scale each column to [0, 1] across clusters so the radar axes are comparable
sel = (sel - sel.min()) / (sel.max() - sel.min())
sel
sicCAR_q2 lcdMes_q2 sdcLAL_q2 stbCeA_q1 stcOri_q2 linPDE_q2 ssbCor_q3 sdbAre_q3 sddAre_q2
0 0.326064 0.277856 0.240502 0.706473 0.871973 0.699971 1.119532e-02 0.042313 0.404416
1 0.482047 0.362348 0.083403 0.519900 0.829345 0.619649 1.669432e-02 0.067230 0.237370
3 0.305631 0.287905 0.312750 0.646587 0.965733 0.764049 1.568136e-02 0.036948 0.454185
5 0.413370 0.309513 0.127579 0.610906 0.831341 0.665396 1.290708e-02 0.051744 0.295283
6 0.464284 0.000000 0.210695 0.393272 0.000000 1.000000 3.361835e-18 0.177451 0.262379
7 0.000000 0.087568 1.000000 1.000000 1.000000 0.941431 1.647299e-16 0.065779 1.000000
20 0.551620 0.486684 0.062880 0.395459 0.723952 0.525718 2.955567e-02 0.088063 0.148939
21 0.565900 0.453098 0.053280 0.394376 0.818832 0.530562 5.949657e-02 0.150553 0.121038
22 0.560256 0.442562 0.047341 0.414214 0.744077 0.546407 2.617269e-02 0.108811 0.133056
23 0.550755 0.440602 0.053681 0.420885 0.874504 0.549118 2.954338e-02 0.109329 0.137769
24 0.575068 0.486899 0.043510 0.377864 0.842704 0.505271 3.586106e-02 0.099278 0.122654
25 0.553745 0.464867 0.059534 0.399977 0.840077 0.523696 3.411659e-02 0.121572 0.180358
26 0.550945 0.430609 0.049239 0.432253 0.804648 0.550894 2.826390e-02 0.090354 0.134050
27 0.650376 0.568312 0.028627 0.296350 0.799767 0.440345 1.847212e-01 0.258739 0.090640
40 0.682158 0.652837 0.043809 0.274658 0.797039 0.357952 3.051706e-01 0.386590 0.237481
41 0.930326 1.000000 0.021525 0.035035 0.585487 0.039694 5.286567e-01 0.783033 0.007037
42 0.706732 0.723045 0.038523 0.245126 0.842489 0.287014 4.257728e-01 0.443653 0.093988
43 0.026294 0.323674 0.715510 0.712914 0.998924 0.840615 0.000000e+00 0.015136 0.702403
44 0.582772 0.247936 0.000000 0.483896 0.379204 0.664963 6.723671e-18 0.070398 0.029880
45 0.761153 0.824085 0.022645 0.168886 0.785924 0.195078 3.679056e-01 0.473925 0.034220
46 1.000000 0.970814 0.027575 0.000000 0.782228 0.000000 1.000000e+00 1.000000 0.000428
47 0.953251 0.956072 0.029195 0.017098 0.655929 0.021849 9.829798e-01 0.907551 0.000000
48 0.219788 0.371201 0.238190 0.658567 0.754113 0.563664 1.008551e-17 0.000000 0.233776
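The scaling applied to `sel` is a column-wise min-max normalisation, which is why every column in the table contains exactly one 0 and one 1. A minimal sketch on a toy frame (the column names `feat_a` and `feat_b` and their values are hypothetical):

```python
import pandas as pd

# Hypothetical cluster centroids for two characters.
centroids = pd.DataFrame(
    {"feat_a": [-1.0, 0.0, 3.0], "feat_b": [10.0, 20.0, 15.0]},
    index=[0, 1, 2],
)

# Same expression as in the notebook: subtract the column minimum and
# divide by the column range, so each column spans [0, 1].
scaled = (centroids - centroids.min()) / (centroids.max() - centroids.min())

print(scaled)
```

Note that the scaled values are relative to the set of clusters being compared: a value of 1 marks the cluster with the highest centroid for that character, not an absolute magnitude.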
cmap = ugg.get_colormap(len(sel), randomize=True)
len(sel)
23
fig, axs = plt.subplots(4, 6, figsize=(20, 16), subplot_kw={'projection': 'polar'})

N = len(sel.columns)
angles = [(n / float(N) * 2 * np.pi)  + .05 for n in range(N)]
angles += angles[:1]

for i, ax in enumerate(axs.flatten()):
    try:
        ax.set_xticks(angles[:-1])
        ax.set_xticklabels(sel.columns, color='grey', size=8)
        ax.set_ylim(0, 1.1)
        ax.set_yticks([.25, .5, .75, 1])
        ax.set_yticklabels([.25, .5, .75, 1], color='grey', size=5)
        ax.set_rlabel_position(10)
        ax.set_title(sel.index[i])

        values = sel.iloc[i].values.flatten().tolist()
        values += values[:1]
        ax.plot(angles, values, linewidth=2, linestyle='solid', color=cmap.colors[i])
        ax.fill(angles, values, alpha=0.2, color=cmap.colors[i])
    except IndexError:
        continue

fig.set_facecolor('white')
../_images/exploration_of_form_signatures_14_0.png
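The radar plots rely on one small trick: both the angle list and the value list get their first element appended at the end, so the plotted line returns to its starting point and the polygon closes. A minimal sketch of just that step, with hypothetical values in place of a row of `sel`:

```python
import numpy as np

N = 9          # number of axes, matching the nine selected characters
offset = 0.05  # small rotation, as used in the notebook's plots

# Evenly spaced angles around the circle, then repeat the first one.
angles = [n / N * 2 * np.pi + offset for n in range(N)]
angles += angles[:1]

# Hypothetical normalised values for one cluster, closed the same way.
values = [0.2, 0.4, 0.1, 0.9, 0.5, 0.3, 0.7, 0.6, 0.8]
values += values[:1]

print(len(angles), len(values))
```

Tick positions use `angles[:-1]` so that the duplicated closing angle does not produce a second, overlapping tick label.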
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={'projection': 'polar'})

N = len(sel.columns)
angles = [(n / float(N) * 2 * np.pi)  + .05 for n in range(N)]
angles += angles[:1]

ax.set_xticks(angles[:-1])
ax.set_xticklabels(sel.columns, color='grey', size=8)
ax.set_ylim(0, 1.1)
ax.set_yticks([.25, .5, .75, 1])
ax.set_yticklabels([.25, .5, .75, 1], color='grey', size=5)
ax.set_rlabel_position(10)
custom_points = []

opts = [o for o in sel.index if str(o).startswith('2')]

for n, i in enumerate(opts):
    values = sel.loc[i].values.flatten().tolist()
    values += values[:1]
    ax.plot(angles, values, linewidth=2, linestyle='solid', color=cmap.colors[n + 1])
    ax.fill(angles, values, alpha=0.05, color=cmap.colors[n + 1])  # same colour as the outline
    
    custom_points.append(Line2D([0], [0], marker="o", linestyle="none", markersize=5, color=cmap.colors[n + 1]))

fig.set_facecolor('white')
leg_points = ax.legend(custom_points, opts, loc='lower center', frameon=False, ncol=5, bbox_to_anchor=(0.5, -0.1))
ax.add_artist(leg_points)
../_images/exploration_of_form_signatures_15_1.png
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw={'projection': 'polar'})

cmap = ugg.get_colormap()

N = len(sel.columns)
angles = [(n / float(N) * 2 * np.pi)  + .05 for n in range(N)]
angles += angles[:1]

ax.set_xticks(angles[:-1])
ax.set_xticklabels(sel.columns, color='grey', size=8)
ax.set_ylim(0, 1.1)
ax.set_yticks([.25, .5, .75, 1])
ax.set_yticklabels([.25, .5, .75, 1], color='grey', size=5)
ax.set_rlabel_position(10)
custom_points = []

opts = [o for o in sel.index if str(o).startswith('4') and o not in [43, 44, 48]]

for n, i in enumerate(opts):
    values = sel.loc[i].values.flatten().tolist()
    values += values[:1]
    ax.plot(angles, values, linewidth=2, linestyle='solid', color=cmap.colors[n + 1])
    ax.fill(angles, values, alpha=0.05, color=cmap.colors[n + 1])  # same colour as the outline
    
    custom_points.append(Line2D([0], [0], marker="o", linestyle="none", markersize=5, color=cmap.colors[n + 1]))

fig.set_facecolor('white')
leg_points = ax.legend(custom_points, opts, loc='lower center', frameon=False, ncol=5, bbox_to_anchor=(0.5, -0.1))
ax.add_artist(leg_points)
../_images/exploration_of_form_signatures_16_1.png

Summary

Countryside

Cluster 0

  • Large scale countryside without agriculture

Cluster 3

  • Even larger countryside (Highlands, Lowlands)

Cluster 5

  • Countryside with agriculture

Cluster 7

  • Yet another countryside class (one small area in the Highlands)

Urban areas

Periphery

Cluster 1

  • Green belt area; urban periphery mixing with countryside, low density, small scale development

Outer parts

Cluster 2-5

  • Predominantly residential neighbourhoods with direct access to large urban green open spaces.

Cluster 2-2

  • peripheral, low density residential neighbourhoods; N-S orientation

Cluster 2-3

  • peripheral, low density residential neighbourhoods; rotated (45°) orientation

Cluster 2-6

  • Peripheral residential development with a relatively well connected network, low density, and wide, open streets.

Cluster 2-0

  • relatively well connected but less granular than 2-6

Cluster 2-4

  • Compact peripheral neighbourhoods, potentially centers of peripheral districts with well connected networks

Cluster 2-7

  • Near center, still relatively dense development with slightly larger and more complex buildings

Cluster 2-1

  • peripheral development with a high ratio of large buildings, potentially of industrial origin

Centers

Central signatures (4-x) are organised as a pyramid, from less central at the base to more central at the top. The gradient follows rising density, building area and building complexity.
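The gradient can be checked against the normalised centroids printed earlier in this notebook. The sketch below copies the `sdbAre_q3` column for the non-outlier 4-x clusters from that table and sorts by it; treating `sdbAre_q3` as the building-area character is an assumption based on its name, and a single column is only a rough proxy for the full gradient.

```python
import pandas as pd

# Normalised centroid values copied from the `sel` table in this notebook
# (labels 40-47, with the outlier clusters 43 and 44 left out).
central = pd.DataFrame(
    {"sdbAre_q3": [0.386590, 0.783033, 0.443653, 0.473925, 1.000000, 0.907551]},
    index=[40, 41, 42, 45, 46, 47],
)

# Order clusters from the smallest to the largest normalised value.
order = central.sort_values("sdbAre_q3").index.tolist()

print(order)
```

On this one character the ordering runs 4-0, 4-2, 4-5, 4-1, 4-7, 4-6, which places the London-only signatures at the top of the pyramid.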

Only larger cities have these signatures, and those cities can be ranked by the number of types present.

London is the only place to have all of them (4-6 and 4-7 occur only in London). 4-1 also occurs in Birmingham and, very marginally, in Manchester.

4-5 is more common (centers of Edinburgh and Glasgow, Bristol, Sheffield or Manchester) and 4-2 is more common still (centers of Liverpool or Southampton).