Evaluation of the effect of chip size and origin

Compare the performance of various models in relation to chip size and chip origin (mosaic vs temporal chips)

import json
import glob

from itertools import product

import numpy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
import urbangrammar_graphics as ugg
path = "../../urbangrammar_samba/spatial_signatures/ai/gb_*_shuffled/json/"
results = glob.glob(path + "*")
names = [i[64:-5] + "_" + i[47:49] for i in results]
with open(results[0], "r") as f:
    result = json.load(f)
accuracy = pd.DataFrame(columns=["global"] + result["meta_class_names"], index=pd.MultiIndex.from_product([names, ["train", "val", "secret"]]))
for r in results[:-1]:
    with open(r, "r") as f:
        result = json.load(f)
    
    accuracy.loc[(result["model_name"]+ "_" + r[47:49], "train")] = [result["perf_model_accuracy_train"]] + result["perf_within_class_accuracy_train"]
    accuracy.loc[(result["model_name"]+ "_" + r[47:49], "val")] = [result["perf_model_accuracy_val"]] + result["perf_within_class_accuracy_val"]
    accuracy.loc[(result["model_name"]+ "_" + r[47:49], "secret")] = [result["perf_model_accuracy_secret"]] + result["perf_within_class_accuracy_secret"]
accuracy
                                       global   centres  periphery  Countryside agriculture  Wild countryside  Urban buffer
efficientnet_pooling_256_5_16 train  0.540909  0.716571   0.469743                 0.500086          0.732914      0.285229
                              val    0.532293  0.710400   0.459600                 0.490533          0.728267      0.272667
                              secret 0.537547  0.733733   0.453333                 0.491200          0.730533      0.278933
efficientnet_pooling_256_5_64 train  0.825237  0.999681   0.978119                 0.608968          0.857752      0.681666
                              val    0.599028  0.876449   0.346168                 0.480374          0.777009      0.515140
                              secret 0.537702  0.698892   0.311963                 0.469720          0.779252      0.511028
efficientnet_pooling_256_5_32 train  0.631566  0.912743   0.562200                 0.508400          0.826914      0.347571
                              val    0.561333  0.754533   0.488400                 0.469467          0.796667      0.297600
                              secret 0.561562  0.747511   0.494267                 0.474133          0.800533      0.312933
accuracy.xs('val', level=1).sort_index().plot.bar(figsize=(22, 10), title="validation")
[Figure: bar chart of validation accuracy (global and per class) for each model and chip size]

Notes:

  1. overall accuracy is unimpressive

  2. in principle, larger chips perform better, but with a big caveat - without massive aggregation, there is not enough data for the urban signature types

  3. we're not bad at predicting centres and wild countryside

  4. urban buffer is a challenging class due to the amount of greenery

  5. periphery and countryside agriculture are somewhere in between

  6. we're good at separating urban from non-urban (the two blocks visible in the confusion matrices)

  7. conceptually, we do not assign urban buffer to the countryside classes, but the neural net does, empirically

  8. 32 is empirically a better chip size for approximating signatures than 64 - where there is enough space to fit a 64x64 chip, it usually covers some adjacent green space, because the other signature types are too small to fit enough chips

    • this could be mitigated by preferring chips with a larger number of intersections with the underlying ET cells

      • count the intersections, sort, and keep the N chips with the largest counts; that limits the number of chips coming from large ET cells covering greenery (a sketch of this filtering follows these notes)

  9. the ordered confusion matrix has a pattern similar to the co-occurrence matrix of signature types (empirical paper)
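
A minimal sketch of the filtering idea from note 8, assuming hypothetical GeoDataFrames chips (square chip geometries) and cells (ET cells); neither name comes from the pipeline above:

import geopandas as gpd

def keep_most_intersecting(chips, cells, n):
    """Keep the n chips intersecting the largest number of ET cells."""
    # count, for each chip, how many ET cell geometries it intersects
    joined = gpd.sjoin(chips, cells[["geometry"]], how="inner", predicate="intersects")
    counts = joined.groupby(joined.index).size()
    # chips sitting inside large, homogeneous (often green) ET cells intersect
    # few cells, so keeping the top n by count filters many of them out
    keep = counts.sort_values(ascending=False).index[:n]
    return chips.loc[keep]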

To-do:

  • prediction on 12 classes (combining only the urbanity types) and analysis of the confusion matrix to decide which classes to combine

  • we can try two parallel models, one taking a single chip and another taking the 3x3 chips around it (or a few sampled neighbours), merged before the output with a Dense layer to combine them [spatial lag]; see the sketch after this list
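
A rough sketch of the parallel (spatial lag) idea, merging a single-chip branch and a 3x3-neighbourhood branch with a Dense layer before the output; the stand-in CNN branches, sizes and class count are placeholders, not the EfficientNet setup used above:

from tensorflow import keras
from tensorflow.keras import layers

n_classes = 12  # placeholder

def branch(input_shape, name):
    # small stand-in CNN encoder for one input
    inp = keras.Input(shape=input_shape, name=name)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

# branch 1: the chip itself; branch 2: its 3x3 neighbourhood stitched into one image
chip_in, chip_feat = branch((32, 32, 3), "chip")
ctx_in, ctx_feat = branch((96, 96, 3), "context_3x3")

# merge the two representations with a Dense layer before the classification head
merged = layers.Concatenate()([chip_feat, ctx_feat])
merged = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(n_classes, activation="softmax")(merged)

model = keras.Model(inputs=[chip_in, ctx_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])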

Next steps:

  1. prediction on 12 classes using 32x32 chips

    • derive the aggregation (an illustrative sketch of applying one follows this list)

  2. pipeline for John’s data to get many more chips

  3. replicate the 32x32 training with the new chips

    • both 12 classes and the aggregation

  4. consider eliminating chips from large green ET cells (see note 8)

  5. consider the parallel model
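
One way to apply such an aggregation once it is derived: sum the predicted class probabilities within each group. The grouping below is purely illustrative and is not the aggregation to be derived:

# illustrative grouping only; the real one should come from the confusion /
# co-occurrence structure of the 12 classes
aggregation = {
    "Urbanity": "urban",
    "Dense urban neighbourhoods": "urban",
    "Dense residential neighbourhoods": "urban",
    "Connected residential neighbourhoods": "urban",
    "Gridded residential quarters": "urban",
    "Accessible suburbia": "suburban",
    "Disconnected suburbia": "suburban",
    "Open sprawl": "suburban",
    "Warehouse/Park land": "suburban",
    "Urban buffer": "green",
    "Countryside agriculture": "green",
    "Wild countryside": "green",
}

def aggregate_probabilities(probs, class_names, aggregation):
    """Sum per-class softmax outputs (n_samples x 12) into the aggregated groups."""
    groups = sorted(set(aggregation.values()))
    out = numpy.zeros((probs.shape[0], len(groups)))
    for i, name in enumerate(class_names):
        out[:, groups.index(aggregation[name])] += probs[:, i]
    return out, groups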

Timeline:

  • Revisions by end of Feb

  • Empirical paper submitted by end of Apr

  • AI experiments by end of Apr

  • AI paper submitted by end of June

# validation confusion matrices for the three 5-class runs (chip sizes 16, 32 and 64)
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for i, r in enumerate(sorted(p for p in results if p.endswith("_5.json"))):
    with open(r, "r") as f:
        result = json.load(f)
    a = numpy.array(result['perf_confusion_val'])
    # row-normalise so that each row (true class) sums to 1
    a = a / a.sum(axis=1)[:, numpy.newaxis]
    # reorder classes for display (swap Wild countryside and Urban buffer)
    a = pd.DataFrame(a).iloc[[0, 1, 2, 4, 3], [0, 1, 2, 4, 3]].values
    im = axs[i].imshow(a, cmap="viridis", vmin=0, vmax=1)
    for k, j in product(range(5), range(5)):
        axs[i].text(j, k, "{:.2f}".format(a[k, j]),
                       ha="center", va="center", color="w")
    axs[i].set_title(result['meta_chip_size'])
#     TODO: add class labels to the axes
fig.colorbar(im, ax=axs[:], shrink=.7)
[Figure: row-normalised validation confusion matrices (5 classes) for chip sizes 16, 32 and 64]
# the same validation confusion matrices without the class reordering
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for i, r in enumerate(sorted(p for p in results if p.endswith("_5.json"))):
    with open(r, "r") as f:
        result = json.load(f)
    a = numpy.array(result['perf_confusion_val'])
    a = a / a.sum(axis=1)[:, numpy.newaxis]
    im = axs[i].imshow(a, cmap="viridis", vmin=0, vmax=1)
    for k, j in product(range(5), range(5)):
        axs[i].text(j, k, "{:.2f}".format(a[k, j]),
                       ha="center", va="center", color="w")
    axs[i].set_title(result['meta_chip_size'])
fig.colorbar(im, ax=axs[:], shrink=.7)
[Figure: validation confusion matrices (5 classes) without the class reordering]
# training confusion matrices for the same three 5-class runs
fig, axs = plt.subplots(1, 3, figsize=(18, 6))
for i, r in enumerate(sorted(p for p in results if p.endswith("_5.json"))):
    with open(r, "r") as f:
        result = json.load(f)
    a = numpy.array(result['perf_confusion_train'])
    a = a / a.sum(axis=1)[:, numpy.newaxis]
    im = axs[i].imshow(a, cmap="viridis", vmin=0, vmax=1)
    for k, j in product(range(5), range(5)):
        axs[i].text(j, k, f"{a[k, j]:.2f}",
                       ha="center", va="center", color="w")
    axs[i].set_title(result['meta_chip_size'])
fig.colorbar(im, ax=axs[:], shrink=.7)
[Figure: row-normalised training confusion matrices (5 classes) for chip sizes 16, 32 and 64]

12 classes

results
['../../urbangrammar_samba/spatial_signatures/ai/gb_16_shuffled/json/efficientnet_pooling_256_5.json',
 '../../urbangrammar_samba/spatial_signatures/ai/gb_32_shuffled/json/efficientnet_pooling_256_12.json',
 '../../urbangrammar_samba/spatial_signatures/ai/gb_32_shuffled/json/efficientnet_pooling_256_5.json',
 '../../urbangrammar_samba/spatial_signatures/ai/gb_64_shuffled/json/efficientnet_pooling_256_5.json']
# results[1] is the 12-class 32 px run (see the listing above)
with open(results[1], "r") as f:
    result = json.load(f)
# human-readable signature type names for the 12 classes
result["meta_class_names"] = [
        "Urbanity", 
        "Dense residential neighbourhoods",
        "Connected residential neighbourhoods",
        "Dense urban neighbourhoods",
        "Accessible suburbia",
        "Open sprawl",
        "Warehouse/Park land",
        "Gridded residential quarters",
        "Disconnected suburbia",
        "Countryside agriculture", 
        "Wild countryside", 
        "Urban buffer"
    ]
accuracy = pd.DataFrame(columns=["global"] + result["meta_class_names"], index=pd.Index(["train", "val", "secret"]))
accuracy.loc["train"] = [result["perf_model_accuracy_train"]] + result["perf_within_class_accuracy_train"]
accuracy.loc["val"] = [result["perf_model_accuracy_val"]] + result["perf_within_class_accuracy_val"]
accuracy.loc["secret"] = [result["perf_model_accuracy_secret"]] + result["perf_within_class_accuracy_secret"]
accuracy
                                         train       val    secret
global                                0.623580  0.420283  0.428244
Urbanity                              0.844000  0.618000  0.553905
Dense residential neighbourhoods      0.677061  0.328000  0.286840
Connected residential neighbourhoods  0.789600  0.734500  0.753000
Dense urban neighbourhoods            0.266900  0.174000  0.191500
Accessible suburbia                   0.731106  0.428009  0.384058
Open sprawl                           0.736900  0.302000  0.339000
Warehouse/Park land                   0.612700  0.485500  0.503500
Gridded residential quarters          0.426500  0.290500  0.285000
Disconnected suburbia                 0.453700  0.317500  0.327000
Countryside agriculture               0.822284  0.408560  0.608543
Wild countryside                      0.729964  0.476336  0.294023
Urban buffer                          0.585400  0.498000  0.517500
accuracy.loc['val'].sort_values(ascending=False).plot.bar(figsize=(22, 10), title="validation")
[Figure: bar chart of validation accuracy (global and per class) for the 12-class run, sorted in descending order]
a = numpy.array(result['perf_confusion_val'])
# row-normalise so that each row (true class) sums to 1
a = a / a.sum(axis=1)[:, numpy.newaxis]
# reorder classes roughly from the most to the least urban type
order = numpy.array([0, 3, 1, 2, 7, 4, 8, 5, 6, 11, 9, 10], dtype=int)
a = pd.DataFrame(a).iloc[order, order].values
fig, ax = plt.subplots(figsize=(12, 12))


im = plt.imshow(a, cmap="viridis", vmin=0, vmax=1)
for k, j in product(range(12), range(12)):
    plt.text(j, k, "{:.2f}".format(a[k, j]),
                   ha="center", va="center", color="w")
fig.colorbar(im, ax=ax, shrink=.7)
plt.xticks(range(12),numpy.array(result["meta_class_names"])[order], rotation=90)
plt.yticks(range(12),numpy.array(result["meta_class_names"])[order])
[Figure: validation confusion matrix, 12 classes, ordered from the most to the least urban type]
# training confusion matrix for the 12-class run, classes in the stored order
fig, ax = plt.subplots(figsize=(12, 12))

a = numpy.array(result['perf_confusion_train'])
a = a / a.sum(axis=1)[:, numpy.newaxis]
im = plt.imshow(a, cmap="viridis", vmin=0, vmax=1)
for k, j in product(range(12), range(12)):
    plt.text(j, k, "{:.2f}".format(a[k, j]),
                   ha="center", va="center", color="w")
fig.colorbar(im, ax=ax, shrink=.7)
plt.xticks(range(12),result["meta_class_names"], rotation=90)
plt.yticks(range(12),result["meta_class_names"])
[Figure: training confusion matrix, 12 classes, stored class order]

Temporal data

The same evaluation based on temporal chips

path = "../urbangrammar_samba/spatial_signatures/ai/gb_32_temporal/json/"
results = glob.glob(path + "*")
r = results[0]
with open(r, "r") as f:
    result = json.load(f)
accuracy = pd.DataFrame(columns=["global"] + result["meta_class_names"], index=pd.Index(["train", "val", "secret"]))
accuracy.loc["train"] = [result["perf_model_accuracy_train"]] + result["perf_within_class_accuracy_train"]
accuracy.loc["val"] = [result["perf_model_accuracy_val"]] + result["perf_within_class_accuracy_val"]
accuracy.loc["secret"] = [result["perf_model_accuracy_secret"]] + result["perf_within_class_accuracy_secret"]
accuracy.loc['val'].sort_values(ascending=False).plot.bar(figsize=(22, 10), title="validation")
[Figure: bar chart of validation accuracy (global and per class), temporal chips, sorted in descending order]
a = numpy.array(result['perf_confusion_val'])
a = a / a.sum(axis=1)[:, numpy.newaxis]
# no reordering for this run; the tick labels below follow the stored class order
fig, ax = plt.subplots(figsize=(12, 12))


im = plt.imshow(a, cmap="viridis", vmin=0, vmax=1)
for k, j in product(range(12), range(12)):
    plt.text(j, k, "{:.2f}".format(a[k, j]),
                   ha="center", va="center", color="w")
fig.colorbar(im, ax=ax, shrink=.7)
plt.xticks(range(12), result["meta_class_names"], rotation=90)
plt.yticks(range(12), result["meta_class_names"])
[Figure: validation confusion matrix, 12 classes, temporal chips]

Balanced data

path = "../../urbangrammar_samba/spatial_signatures/ai/gb_32_balanced_named_v2/json/"
results = glob.glob(path + "*")
r = results[0]
with open(r, "r") as f:
    result = json.load(f)
accuracy = pd.DataFrame(columns=["global"] + result["meta_class_names"], index=pd.Index(["train", "val", "secret"]))
accuracy.loc["train"] = [result["perf_model_accuracy_train"]] + result["perf_within_class_accuracy_train"]
accuracy.loc["val"] = [result["perf_model_accuracy_val"]] + result["perf_within_class_accuracy_val"]
accuracy.loc["secret"] = [result["perf_model_accuracy_secret"]] + result["perf_within_class_accuracy_secret"]
accuracy
                                         train       val    secret
global                                0.554045  0.428392  0.439124
Urbanity                              0.769971  0.664600  0.672200
Dense residential neighbourhoods      0.707678  0.507874  0.481884
Connected residential neighbourhoods  0.615457  0.582800  0.581600
Dense urban neighbourhoods            0.538372  0.238587  0.237983
Accessible suburbia                   0.600216  0.219465  0.211743
Open sprawl                           0.477016  0.254962  0.163166
Warehouse_Park land                   0.589415  0.230869  0.377451
Gridded residential quarters          0.393829  0.304000  0.286000
Disconnected suburbia                 0.233229  0.215800  0.215200
Countryside agriculture               0.683429  0.452206  0.383388
Wild countryside                      0.462629  0.327600  0.348200
Urban buffer                          0.762229  0.736600  0.741200
accuracy.loc['val'].sort_values(ascending=False).plot.bar(figsize=(22, 10), title="validation")
[Figure: bar chart of validation accuracy (global and per class), balanced chips, sorted in descending order]
a = numpy.array(result['perf_confusion_val'])
# row-normalise so that each row (true class) sums to 1
a = a / a.sum(axis=1)[:, numpy.newaxis]
# reorder classes from the most to the least urban type; indices refer to the alphabetically sorted class names used for the tick labels below
order = numpy.array([9, 4, 3, 1, 6, 0, 5, 7, 10, 8, 2, 11], dtype=int)
a = pd.DataFrame(a).iloc[order, order].values
fig, ax = plt.subplots(figsize=(12, 12))


im = plt.imshow(a, vmin=0, vmax=1, cmap='viridis')
for k, j in product(range(12), range(12)):
    plt.text(j, k, "{:.2f}".format(a[k, j]),
                   ha="center", va="center", color="w")
fig.colorbar(im, ax=ax, shrink=.7)
ticks = numpy.array(sorted(result["meta_class_names"]))[order]
plt.xticks(range(12),ticks, rotation=90)
plt.yticks(range(12),ticks)
plt.savefig("figs/image_class_conf.pdf")
[Figure: validation confusion matrix, 12 classes, balanced chips, ordered from the most to the least urban type (also saved to figs/image_class_conf.pdf)]