Chip making for a subset of GB

This document extracts a series of chips for a region of GB and stores them as numpy arrays ready to be loaded by TensorFlow.

import tools


import geopandas
import contextily
import xarray, rioxarray
import numpy
import pandas
import pyogrio
from shapely.geometry import box
from dask.distributed import Client, LocalCluster

Specs

specs = {
    'bb': box(321566, 365379, 468106, 437198),
    'chip_size': 32,
    'bands': [1, 2, 3], #RGB
    'mosaic_p': (
        '/home/jovyan/work/urbangrammar_samba/'
        'ghs_composite_s2/GHS-composite-S2.vrt'
    ),
    'spsig_p': (
        '/home/jovyan/work/urbangrammar_samba/spatial_signatures/'
        'signatures/'
        'signatures_combined_levels_simplified.gpkg'
    ),
    'tensor': (
        '/home/jovyan/work/urbangrammar_samba/'
        'spatial_signatures/chips/sample_32.npz'
    ),
    'folder': (
        '/home/jovyan/work/urbangrammar_samba/'
        'spatial_signatures/chips/32/'
    ),
}

Load region

  • Mosaic

r = rioxarray.open_rasterio(
    specs['mosaic_p'], chunks={'x': 1024, 'y': 1024}
)
  • Region

region = r.sel(
    band=specs['bands']
).rio.clip_box(
    *specs['bb'].bounds
).compute()

Make chips

%%time
chips = tools.build_grid(
    region.coords['x'],
    region.coords['y'],
    specs['chip_size'],
    crs=region.rio.crs
)
CPU times: user 1.83 s, sys: 68.8 ms, total: 1.9 s
Wall time: 1.74 s
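
`tools.build_grid` is a project helper whose source is not shown here. As a rough idea of the kind of computation involved, the sketch below derives chip bounding boxes from pixel-centre coordinates with plain numpy. The function `chip_bounds`, the toy coordinates and the choice to drop partial chips at the edges are all assumptions made for illustration, not the helper's actual behaviour.

```python
import numpy


def chip_bounds(xs, ys, chip_size):
    """Return (minx, miny, maxx, maxy) for every complete chip.

    xs and ys are 1-D arrays of evenly spaced pixel-centre coordinates.
    (Hypothetical helper, not part of `tools`.)
    """
    xs = numpy.sort(numpy.asarray(xs, dtype=float))
    ys = numpy.sort(numpy.asarray(ys, dtype=float))
    res_x = xs[1] - xs[0]
    res_y = ys[1] - ys[0]
    # Index of the first pixel of each complete chip; partial chips are dropped
    x0 = numpy.arange(0, len(xs) - chip_size + 1, chip_size)
    y0 = numpy.arange(0, len(ys) - chip_size + 1, chip_size)
    bounds = []
    for i in x0:
        for j in y0:
            bounds.append((
                xs[i] - res_x / 2,                  # left edge of first pixel
                ys[j] - res_y / 2,                  # bottom edge of first pixel
                xs[i + chip_size - 1] + res_x / 2,  # right edge of last pixel
                ys[j + chip_size - 1] + res_y / 2,  # top edge of last pixel
            ))
    return numpy.array(bounds)


# Toy raster: 64 x 32 pixels with a spacing of 10 units
xs = numpy.arange(5, 645, 10)
ys = numpy.arange(5, 325, 10)
grid = chip_bounds(xs, ys, 32)
```

With these toy inputs, `grid` holds two chips side by side, each 320 units wide.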

Filter through signatures

  • Read signature layer

%%time
spsig = pyogrio.read_dataframe(specs['spsig_p'])
CPU times: user 936 ms, sys: 274 ms, total: 1.21 s
Wall time: 2.64 s
  • Join chips to signatures by within to keep only single-class chips

%%time
oc_chips = geopandas.sjoin(
    chips, 
    spsig[['signature_type', 'geometry']], 
    how='inner', 
    predicate='within'
)
CPU times: user 3.81 s, sys: 18.5 ms, total: 3.83 s
Wall time: 3.81 s
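
The `within` predicate is what makes each retained chip single-class: a chip that straddles a boundary between two signature polygons is not within either of them, so the inner join drops it. The sketch below illustrates that rule with axis-aligned rectangles in plain Python rather than with `geopandas.sjoin`; the `within` function, the zones and the toy chips are made up for the example.

```python
def within(inner, outer):
    """True if rectangle inner=(minx, miny, maxx, maxy) lies inside outer."""
    return (
        inner[0] >= outer[0] and inner[1] >= outer[1]
        and inner[2] <= outer[2] and inner[3] <= outer[3]
    )


# Two adjacent zones standing in for signature polygons
zones = {"A": (0, 0, 100, 100), "B": (100, 0, 200, 100)}
toy_chips = [
    (10, 10, 42, 42),    # fully inside A
    (84, 10, 116, 42),   # straddles the A/B boundary
    (150, 50, 182, 82),  # fully inside B
]

# Inner join: keep only (chip, zone) pairs where the chip is within the zone
kept = [
    (chip, name)
    for chip in toy_chips
    for name, zone in zones.items()
    if within(chip, zone)
]
```

Only the first and third chips survive; the straddling chip matches no zone and disappears from the result.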

ax = spsig.plot(facecolor='none', edgecolor='k', figsize=(12, 12))
oc_chips.plot(facecolor='none', edgecolor='red', ax=ax)
minX, minY, maxX, maxY = oc_chips.total_bounds
ax.set_xlim((minX, maxX))
ax.set_ylim((minY, maxY))
contextily.add_basemap(
    ax, crs=oc_chips.crs, source=contextily.providers.CartoDB.Voyager
);
../_images/chip_making_subset_17_0.png
ax = spsig.plot(facecolor='none', edgecolor='k', figsize=(12, 12))
oc_chips.plot(facecolor='none', edgecolor='red', ax=ax)
minX, minY, maxX, maxY = oc_chips.total_bounds
ax.set_xlim((minX, maxX))
ax.set_ylim((minY, maxY))
(365462.947432155, 437199.5895371777)
../_images/chip_making_subset_18_1.png

Load imagery into chips

client = Client(
    LocalCluster(n_workers=16, threads_per_worker=1)
)
client
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
distributed.diskutils - INFO - Found stale lock file and directory '/home/jovyan/work/signature_ai/dask-worker-space/worker-6e0qxdsb', purging

Client

Client-053b92b1-7915-11ec-80b1-412b84e05dc5

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


%time out = tools.bag_of_chips(oc_chips, specs, 16)
CPU times: user 5min 3s, sys: 6min 15s, total: 11min 18s
Wall time: 5min 53s

Shuffle

numpy.random.seed(42)

shuffled_idx = numpy.arange(0, out.shape[0])
numpy.random.shuffle(shuffled_idx)

out = out[shuffled_idx]

labels = oc_chips.signature_type.values[shuffled_idx].reshape((-1,1))
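
Indexing the chips and the labels with the same permutation is what keeps each chip paired with its label after shuffling. The toy example below shows this on small arrays; it uses `numpy.random.default_rng` instead of the global seed, and the array shapes and label values are placeholders.

```python
import numpy

rng = numpy.random.default_rng(42)

# Five toy "chips" of 2 x 2 pixels; chip k contains the values 4k .. 4k+3
toy_chips = numpy.arange(5 * 2 * 2).reshape(5, 2, 2)
toy_labels = numpy.array(["a", "b", "c", "d", "e"])

# One permutation, applied to both arrays
idx = rng.permutation(len(toy_chips))
shuffled_chips = toy_chips[idx]
shuffled_labels = toy_labels[idx].reshape((-1, 1))
```

Because both arrays are reordered by `idx`, row `k` of `shuffled_labels` still describes chip `k` of `shuffled_chips`.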

Write to disk

Once ready, we store the array as a .npz file to be shipped to TensorFlow.

%%time
numpy.savez_compressed(specs["tensor"], labels=labels, chips=out)
CPU times: user 2min 41s, sys: 1min 57s, total: 4min 38s
Wall time: 1min 24s
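
For reference, this is roughly how such a file can be read back with `numpy.load`. The sketch writes a small stand-in archive to a temporary folder; the chip shape `(n, 32, 32, 3)` and the label values are assumptions, since the layout returned by `tools.bag_of_chips` is not shown here.

```python
import os
import tempfile

import numpy

# Stand-in data (assumed layout: n chips x 32 x 32 pixels x 3 bands)
toy_chips = numpy.zeros((4, 32, 32, 3), dtype="uint16")
toy_labels = numpy.array([["0_0"], ["7_0"], ["0_0"], ["4_0"]])

path = os.path.join(tempfile.mkdtemp(), "sample_toy.npz")
numpy.savez_compressed(path, labels=toy_labels, chips=toy_chips)

# Arrays are looked up by the keyword names used when saving
with numpy.load(path) as data:
    loaded_chips = data["chips"]
    loaded_labels = data["labels"]
```

If the labels are stored with `object` dtype, as string columns coming from pandas often are, `numpy.load` needs `allow_pickle=True` to read them back.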

Spill chips to disk for out-of-core computation

%%time
# max value for normalisation
specs['max'] = int(region.quantile(.99))
CPU times: user 733 ms, sys: 135 ms, total: 867 ms
Wall time: 689 ms
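
How `tools.spilled_bag_of_chips` uses `specs['max']` is not shown here. A common way to apply such a cap, sketched below as an assumption, is to divide by it and clip to the 0 to 1 range so that the brightest 1% of pixels saturate instead of compressing everything else. The pixel values and the cap of 1500 are placeholders.

```python
import numpy

# Toy pixel values; 1500 stands in for specs['max']
pixels = numpy.array([0, 500, 1000, 1500, 3000], dtype=float)
max_value = 1500

# Scale by the cap and clip anything brighter to 1
scaled = numpy.clip(pixels / max_value, 0, 1)
```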
client = Client(
    LocalCluster(n_workers=16, threads_per_worker=1)
)
client
Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

Client

Client-e091e21f-791f-11ec-8a38-0753886cb588

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


%%time
tools.spilled_bag_of_chips(oc_chips, specs, npartitions=16)
CPU times: user 38.7 s, sys: 5.86 s, total: 44.6 s
Wall time: 2min 14s

Add shifted chips

Count the number of chips per class

import glob

for f in glob.glob(specs['folder'] + "*"):
    print(f[-3:], len(glob.glob(f + "/*")))
4_0 3090
9_2 11
2_0 379
9_0 18
7_0 26610
2_2 255
1_0 1576
8_0 76
6_0 271
2_1 243
3_0 4827
9_4 1
0_0 25571
5_0 2114
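
The loop above relies on every class folder name being exactly three characters long (`f[-3:]`). A variant that reads the folder name directly could look like the sketch below; `count_per_class` is a hypothetical helper and the temporary folder tree only stands in for `specs['folder']`.

```python
import tempfile
from pathlib import Path


def count_per_class(folder):
    """Map each class sub-folder name to the number of files it holds."""
    return {
        sub.name: sum(1 for p in sub.iterdir() if p.is_file())
        for sub in sorted(Path(folder).iterdir())
        if sub.is_dir()
    }


# Toy folder tree standing in for specs['folder']
root = Path(tempfile.mkdtemp())
for name, n in {"0_0": 3, "9_4": 1}.items():
    (root / name).mkdir()
    for i in range(n):
        (root / name / f"chip_{i}.tif").touch()

toy_counts = count_per_class(root)
```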

Add overlapping chips for the under-represented classes by translating the chip grid and repeating the join.

# translate
from itertools import product

subset_to_add = ["4_0", "9_2", "2_0", "9_0", "2_2", "1_0", "8_0", "6_0", "2_1", "9_4", "5_0"]

for x, y in product([0, 80, 160, 240], [80, 160, 240]):
    chips_translated = chips.copy()
    chips_translated.geometry = chips_translated.geometry.translate(xoff=x, yoff=y)
    chips_translated["X"] = chips_translated["X"] + x
    chips_translated["Y"] = chips_translated["Y"] + y

    oc_chips_sub = geopandas.sjoin(
        chips_translated, 
        spsig[['signature_type', 'geometry']][spsig.signature_type.isin(subset_to_add)], 
        how='inner', 
        predicate='within'
    )

    tools.spilled_bag_of_chips(oc_chips_sub, specs, npartitions=16)
    print(x, y, "done")
0 80 done
0 160 done
0 240 done
80 80 done
80 160 done
80 240 done
160 80 done
160 160 done
160 240 done
240 80 done
240 160 done
240 240 done
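
To make the offsets concrete, the sketch below lists the twelve translations produced by `product` and applies them to a single chip. It assumes a chip side of 320 map units (32 pixels at an assumed 10 m resolution); `translate` is a stand-in for the geometry translation, written for plain bounding boxes.

```python
from itertools import product

# Same offset lists as in the loop above
offsets = list(product([0, 80, 160, 240], [80, 160, 240]))


def translate(bounds, xoff, yoff):
    """Shift a (minx, miny, maxx, maxy) box by the given offsets."""
    minx, miny, maxx, maxy = bounds
    return (minx + xoff, miny + yoff, maxx + xoff, maxy + yoff)


chip = (0, 0, 320, 320)  # assumed chip footprint
shifted = [translate(chip, x, y) for x, y in offsets]
```

Because the second list has no zero, the untranslated grid is never repeated, and every shifted chip overlaps its original by at least a quarter of its height.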
for f in glob.glob(specs['folder'] + "*"):
    print(f[-3:], len(glob.glob(f + "/*")))
4_0 39986
9_2 124
2_0 4931
9_0 218
7_0 26610
2_2 3168
1_0 20442
8_0 934
6_0 3617
2_1 3091
3_0 4827
9_4 7
0_0 25571
5_0 27725

Repeat with a second set of offsets for the classes that are still rare.

subset_to_add = ["9_2", "2_0", "9_0", "2_2", "8_0", "6_0", "2_1", "9_4"]

for x, y in product([0, 40, 120, 200], [40, 120, 200]):
    chips_translated = chips.copy()
    chips_translated.geometry = chips_translated.geometry.translate(xoff=x, yoff=y)
    chips_translated["X"] = chips_translated["X"] + x
    chips_translated["Y"] = chips_translated["Y"] + y

    oc_chips_sub = geopandas.sjoin(
        chips_translated, 
        spsig[['signature_type', 'geometry']][spsig.signature_type.isin(subset_to_add)], 
        how='inner', 
        predicate='within'
    )

    tools.spilled_bag_of_chips(oc_chips_sub, specs, npartitions=16)
    print(x, y, "done")
0 40 done
0 120 done
0 200 done
40 40 done
40 120 done
40 200 done
120 40 done
120 120 done
120 200 done
200 40 done
200 120 done
200 200 done
for f in glob.glob(specs['folder'] + "*"):
    print(f[-3:], len(glob.glob(f + "/*")))
4_0 39986
9_2 261
2_0 9459
9_0 407
7_0 26610
2_2 6108
1_0 20442
8_0 1824
6_0 6948
2_1 5947
3_0 4827
9_4 14
0_0 25571
5_0 27725

Once more, with a denser set of offsets, for the three rarest classes.

subset_to_add = ["9_2", "9_0", "9_4"]

for x, y in product([0, 20, 60, 100, 140, 180, 220, 260, 280, 300], [20, 60, 100, 140, 180, 220, 260, 280, 300]):
    chips_translated = chips.copy()
    chips_translated.geometry = chips_translated.geometry.translate(xoff=x, yoff=y)
    chips_translated["X"] = chips_translated["X"] + x
    chips_translated["Y"] = chips_translated["Y"] + y

    oc_chips_sub = geopandas.sjoin(
        chips_translated, 
        spsig[['signature_type', 'geometry']][spsig.signature_type.isin(subset_to_add)], 
        how='inner', 
        predicate='within'
    )

    tools.spilled_bag_of_chips(oc_chips_sub, specs, npartitions=16)
    print(x, y, "done")
0 20 done
0 60 done
0 100 done
0 140 done
0 180 done
0 220 done
0 260 done
0 280 done
0 300 done
20 20 done
20 60 done
20 100 done
20 140 done
20 180 done
20 220 done
20 260 done
20 280 done
20 300 done
60 20 done
60 60 done
60 100 done
60 140 done
60 180 done
60 220 done
60 260 done
60 280 done
60 300 done
100 20 done
100 60 done
100 100 done
100 140 done
100 180 done
100 220 done
100 260 done
100 280 done
100 300 done
140 20 done
140 60 done
140 100 done
140 140 done
140 180 done
140 220 done
140 260 done
140 280 done
140 300 done
180 20 done
180 60 done
180 100 done
180 140 done
180 180 done
180 220 done
180 260 done
180 280 done
180 300 done
220 20 done
220 60 done
220 100 done
220 140 done
220 180 done
220 220 done
220 260 done
220 280 done
220 300 done
260 20 done
260 60 done
260 100 done
260 140 done
260 180 done
260 220 done
260 260 done
260 280 done
260 300 done
280 20 done
280 60 done
280 100 done
280 140 done
280 180 done
280 220 done
280 260 done
280 280 done
280 300 done
300 20 done
300 60 done
300 100 done
300 140 done
300 180 done
300 220 done
300 260 done
300 280 done
300 300 done
counts = {}
for f in glob.glob(specs['folder'] + "*"):
    counts[f[-3:]] = len(glob.glob(f + "/*"))
counts
{'4_0': 39986,
 '9_2': 1211,
 '2_0': 9459,
 '9_0': 1892,
 '7_0': 26610,
 '2_2': 6108,
 '1_0': 20442,
 '8_0': 1824,
 '6_0': 6948,
 '2_1': 5947,
 '3_0': 4827,
 '9_4': 62,
 '0_0': 25571,
 '5_0': 27725}
group_mapping = [
    ['9_0', '9_1', '9_2', '9_4', '9_5', '2_0', '2_1', '2_2'],
    ['1_0', '3_0', '5_0', '6_0', '8_0'],
    ['0_0', '4_0', '7_0']
]
group_counts = {0:0, 1:0, 2:0}
for key, val in counts.items():
    for i, g in enumerate(group_mapping):
        if key in g:
            group_counts[i] += val
group_counts
{0: 24679, 1: 61766, 2: 92167}
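
The same aggregation can be written with a label-to-group lookup, which avoids the nested loop. The sketch below copies the counts and the grouping printed above; `group_of` is a name introduced only for this example.

```python
counts = {
    '4_0': 39986, '9_2': 1211, '2_0': 9459, '9_0': 1892, '7_0': 26610,
    '2_2': 6108, '1_0': 20442, '8_0': 1824, '6_0': 6948, '2_1': 5947,
    '3_0': 4827, '9_4': 62, '0_0': 25571, '5_0': 27725,
}
group_mapping = [
    ['9_0', '9_1', '9_2', '9_4', '9_5', '2_0', '2_1', '2_2'],
    ['1_0', '3_0', '5_0', '6_0', '8_0'],
    ['0_0', '4_0', '7_0'],
]

# Invert the list of groups into a label -> group index lookup
group_of = {
    label: i for i, group in enumerate(group_mapping) for label in group
}

group_counts = {i: 0 for i in range(len(group_mapping))}
for label, n in counts.items():
    group_counts[group_of[label]] += n
```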

Move the chips into train, validation and secret folders

import glob
import os
import shutil
from pathlib import Path

specs = {
    'chips': "../urbangrammar_samba/spatial_signatures/chips/32/",
}
split = (.6, .2, .2)

subfolders = glob.glob(specs["chips"] + "*")

for t in ["train", "validation", "secret"]:
    os.makedirs(specs["chips"] + t, exist_ok=True)
for sub in subfolders:
    for t in ["train", "validation", "secret"]:
        os.makedirs(f"{specs['chips']}{t}/{Path(sub).stem}", exist_ok=True)
    files = glob.glob(sub + "/*.tif")
    count = len(files)
    for f in files[:int(count * split[0])]:
        f = Path(f)
        shutil.move(f, str(f.parent.parent) + "/train/" + f.parent.stem + "/" + f.name)
    for f in files[int(count * split[0]):int(count * (split[0] + split[1]))]:
        f = Path(f)
        shutil.move(f, str(f.parent.parent) + "/validation/" + f.parent.stem + "/" + f.name)
    for f in files[int(count * (split[0] + split[1])):]:
        f = Path(f)
        shutil.move(f, str(f.parent.parent) + "/secret/" + f.parent.stem + "/" + f.name)
    print(sub, "done")
../urbangrammar_samba/spatial_signatures/chips/32/4_0 done
../urbangrammar_samba/spatial_signatures/chips/32/9_2 done
../urbangrammar_samba/spatial_signatures/chips/32/2_0 done
../urbangrammar_samba/spatial_signatures/chips/32/9_0 done
../urbangrammar_samba/spatial_signatures/chips/32/7_0 done
../urbangrammar_samba/spatial_signatures/chips/32/2_2 done
../urbangrammar_samba/spatial_signatures/chips/32/1_0 done
../urbangrammar_samba/spatial_signatures/chips/32/8_0 done
../urbangrammar_samba/spatial_signatures/chips/32/6_0 done
../urbangrammar_samba/spatial_signatures/chips/32/2_1 done
../urbangrammar_samba/spatial_signatures/chips/32/3_0 done
../urbangrammar_samba/spatial_signatures/chips/32/9_4 done
../urbangrammar_samba/spatial_signatures/chips/32/0_0 done
../urbangrammar_samba/spatial_signatures/chips/32/5_0 done
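
The split above slices the file list in whatever order `glob` returns it. The sketch below isolates the index arithmetic and, as a variation that is not in the original loop, shuffles the list with a fixed seed first; `split_files` and the toy file names are hypothetical.

```python
import random


def split_files(files, split=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validation, secret) lists from a list of file names."""
    files = sorted(files)
    # Seeded shuffle so the split is reproducible (assumed variation)
    random.Random(seed).shuffle(files)
    n = len(files)
    first_cut = int(n * split[0])
    second_cut = int(n * (split[0] + split[1]))
    return files[:first_cut], files[first_cut:second_cut], files[second_cut:]


toy_files = [f"chip_{i}.tif" for i in range(10)]
train, validation, secret = split_files(toy_files)
```

With ten files and a (.6, .2, .2) split, the three lists hold six, two and two files, and every file lands in exactly one of them.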

Todo

  • For GB, do the train/validation/secret split during spilling the chips, not afterwards.

int(region.sel(band=1).quantile(.99)), int(region.sel(band=2).quantile(.99)), int(region.sel(band=3).quantile(.99))
(1386, 1378, 1589)
int(region.sel(band=1).quantile(.02)), int(region.sel(band=2).quantile(.02)), int(region.sel(band=3).quantile(.02))
(376, 637, 801)