Create chips from a temporal set of Sentinel-2 snapshots
- create chip bounds augmented by sliding
- sample S2
    - loop over tiles
        - get bounds by sjoin
        - loop over times
            - loop over bounds (bag)
                - check cloud probability/cloud mask
                - sample a chip
                - check for missing values
                - save a chip
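The two quality checks in the outline (cloud probability and missing values) can be sketched in isolation. This is a minimal pure-Python illustration on nested lists, not the notebook's implementation, which works on rasterio arrays. The helper names `cloud_free` and `complete` are hypothetical; the default thresholds mirror the values used in `process_tile`.

```python
def cloud_free(cloud_prob, prob_threshold=10, max_cloudy=10):
    """True if fewer than `max_cloudy` pixels exceed `prob_threshold`."""
    cloudy = sum(1 for row in cloud_prob for p in row if p > prob_threshold)
    return cloudy < max_cloudy


def complete(red, green, blue, max_identical=3):
    """True if fewer than `max_identical` pixels have identical R, G and B.

    Identical band values are treated as a proxy for missing data, so a
    genuinely grey pixel is counted as well.
    """
    identical = sum(
        1
        for r_row, g_row, b_row in zip(red, green, blue)
        for r, g, b in zip(r_row, g_row, b_row)
        if r == g == b
    )
    return identical < max_identical


# Tiny 2 x 2 example: one pixel above the cloud threshold, one nodata pixel.
cloud_prob = [[0, 3], [40, 5]]
red = [[0, 120], [90, 60]]
green = [[0, 110], [95, 70]]
blue = [[0, 100], [80, 50]]

chip_is_usable = cloud_free(cloud_prob) and complete(red, green, blue)
```

Both checks count offending pixels rather than requiring none at all, so a handful of noisy pixels does not discard an otherwise usable chip.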
# this cell does not work in the gds_env:7.0 and requires newer versions of GDAL and pyogrio. Works with GDAL 3.4.1, pyogrio 0.3.0.
# import pyogrio
# tiles_of_interest = [
# '29UQR', '30UUA', '29UQS', '30UWA', '30UVA', '30UXB', '30UWB',
# '30UVB', '30UXC', '30UWC', '30UVC', '30UXD', '30UWD', '30UUB',
# '29UQT', '30UUC', '29UQU', '30UUD', '30UVD', '30UXE', '30UWE',
# '30UVE', '30UXF', '30UWF', '30UVF', '30UXG', '30UWG', '30UVG',
# '29UQV', '30UUE', '30UUF', '30UUG', '30VWH', '30VVH', '29VPC',
# '30VUH', '30UYB', '31UCS', '30UYC', '31UCT', '31UDT', '30UYD',
# '31UCU', '31UDU', '30UYE', '31UCV'
# ]
# tile_geometry = pyogrio.read_dataframe("https://sentinels.copernicus.eu/documents/247904/1955685/S2A_OPER_GIP_TILPAR_MPC__20151209T095117_V20150622T000000_21000101T000000_B00.kml")
# tile_geometry = tile_geometry[tile_geometry.Name.isin(tiles_of_interest)].explode(index_parts=False)
# tile_geometry[tile_geometry.geom_type == "Polygon"].to_file("../chips_gb/sentinel_tiles.gpkg")
import os
import pyogrio
import geopandas
import glob
import math
import rasterio
import rasterio.mask  # imported explicitly so that rasterio.mask.mask is available
import rioxarray
import matplotlib.pyplot as plt
import pandas
from pathlib import Path
from dask.distributed import Client, LocalCluster, as_completed
client = Client(LocalCluster(n_workers=16))
client
Client: Client-42df0a96-90dd-11ec-b8ad-33d6c79f778e
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Cluster Info
LocalCluster 367344c5
Dashboard: http://127.0.0.1:8787/status | Workers: 16
Total threads: 16 | Total memory: 125.54 GiB
Status: running | Using processes: True
Scheduler Info
Scheduler-552ece40-15b3-4532-8c0e-81ff9a02c8af
Comm: tcp://127.0.0.1:45485 | Workers: 16
Dashboard: http://127.0.0.1:8787/status | Total threads: 16
Started: Just now | Total memory: 125.54 GiB
specs = {
    'chip_size': 32,
    'folder': (
        '/home/jovyan/work/chips_gb/32_temporal/'
    ),
}
s = specs["chip_size"]
tiles = pyogrio.read_dataframe("../chips_gb/sentinel_tiles.gpkg").to_crs(27700)
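`process_tile` reads each tile's EPSG code from the tile description. As a cross-check, the code can also be derived from the tile name, because Sentinel-2 tiles follow the MGRS naming scheme and use WGS 84 / UTM projections (EPSG 326xx in the northern hemisphere, 327xx in the southern). The sketch below assumes a zero-padded two-digit zone followed by a latitude band letter, as in `30UUA`. `utm_epsg_from_tile` is a hypothetical helper, and its result has not been compared against the values stored in the tile descriptions.

```python
# MGRS latitude bands run from C to X, skipping I and O; N and above are north.
LATITUDE_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_epsg_from_tile(tile_id):
    """Return the WGS 84 / UTM EPSG code for an MGRS tile ID such as '30UUA'."""
    zone = int(tile_id[:2])
    band = tile_id[2].upper()
    if not 1 <= zone <= 60:
        raise ValueError(f"UTM zone out of range in tile ID {tile_id!r}")
    if band not in LATITUDE_BANDS:
        raise ValueError(f"Unknown latitude band in tile ID {tile_id!r}")
    base = 32600 if band >= "N" else 32700
    return base + zone


epsg_codes = {name: utm_epsg_from_tile(name) for name in ["29UQR", "30UUA", "31UCS"]}
```

For the tiles covering Great Britain this yields zones 29 to 31 in the northern hemisphere.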
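def process_tile(row):
    """Cut chips out of every Sentinel-2 snapshot available for a single tile."""
    # EPSG code of the tile, parsed from the HTML table in its description
    epsg = pandas.read_html(row.description)[0].loc[1, 1]
    # chip bounds contained by the tile, reprojected to the tile's CRS
    bounds_in = bounds.iloc[bounds.sindex.query(row.geometry, predicate="contains")].to_crs(int(epsg))
    times = glob.glob(f"../../data/Sentinel2/{row.Name}/*")
    for t in times:
        try:
            cloud_proba = glob.glob(f"{t}/*CLDPRB*")[0]
            tci = glob.glob(f"{t}/*TCI*")[0]
            for tup in bounds_in.itertuples():
                with rasterio.open(cloud_proba) as f:
                    cldprb, transform = rasterio.mask.mask(
                        f, [tup.geometry], crop=True, all_touched=True
                    )
                    # keep the chip only if fewer than 10 pixels exceed a cloud probability of 10
                    if (cldprb > 10).sum() < 10:
                        with rasterio.open(tci) as src:
                            # note: the profile keeps the source tile's transform
                            profile = src.profile
                            profile.update(
                                width=s,
                                height=s,
                                driver="GTiff",
                                dtype=rasterio.uint8,
                                tiled=False
                            )
                            try:
                                img, transform = rasterio.mask.mask(
                                    src, [tup.geometry], crop=True, all_touched=True
                                )
                                _, h, w = img.shape  # (bands, rows, columns)
                                # centre-crop to s x s pixels; explicit start and stop offsets avoid
                                # the empty `a:-0` slice when the window already measures s pixels
                                top = (h - s) // 2
                                left = (w - s) // 2
                                if top < 0 or left < 0:
                                    continue  # window is smaller than the chip size
                                img = img[:, top:top + s, left:left + s]
                                # filter missingness: fewer than 3 pixels with identical R, G and B
                                if ((img[0] == img[1]) & (img[1] == img[2])).sum() < 3:
                                    path = f"{specs['folder']}{tup.signature_type}/{tup.X}_{tup.Y}_{Path(t).stem}.tif"
                                    with rasterio.open(path, 'w', **profile) as dst:
                                        dst.write(img.astype(rasterio.uint8))
                            except ValueError:
                                # skip chips that cannot be masked or written
                                pass
        except IndexError:
            # skip snapshots without a cloud probability or TCI file
            pass
    return f"Tile {row.Name} processed successfully."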
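Centre-cropping a masked window to the chip size is easy to get subtly wrong when the window already has the target size, because a slice that ends at `-0` is empty. The following standalone sketch computes explicit start and stop offsets instead. `centre_window` is a hypothetical helper name used only for illustration.

```python
def centre_window(height, width, size):
    """Return (row_slice, col_slice) for a centred size x size window.

    Returns None when the input is smaller than the requested size.
    """
    if height < size or width < size:
        return None
    top = (height - size) // 2
    left = (width - size) // 2
    return slice(top, top + size), slice(left, left + size)


values = list(range(32))

# Pitfall: trimming zero items from the end with a negative stop index
# yields an empty list, because -0 is the same as 0.
naive = values[0:-0]

# Explicit offsets keep all 32 items when nothing needs to be trimmed.
rows, cols = centre_window(32, 34, 32)
kept = values[rows]
```

With a NumPy array of shape `(bands, rows, columns)`, the two slices would be applied as `img[:, rows, cols]`.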
# all available subsets; the second assignment restricts this run to "secret" only
subsets = ["train", "validation", "secret"]
subsets = ["secret"]
for sub in subsets:
    specs['folder'] = f'/home/jovyan/work/chips_gb/32_temporal/{sub}/'
    bounds = geopandas.read_parquet(f"../chips_gb/slided_{sub}_50k.pq")
    centroid = bounds.centroid
    bounds['X'] = centroid.x.astype(int)
    bounds['Y'] = centroid.y.astype(int)
    # one output folder per signature type
    for t in bounds.signature_type.unique():
        os.makedirs(f"{specs['folder']}{t}", exist_ok=True)
    inputs = tiles.itertuples()
    # start with 16 tiles in flight, one per worker
    futures = [client.submit(process_tile, next(inputs)) for i in range(16)]
    ac = as_completed(futures)
    for finished_future in ac:
        # submit a new future for each one that finishes
        try:
            new_future = client.submit(process_tile, next(inputs))
            ac.add(new_future)
        except StopIteration:
            pass
        print(finished_future.result())
Tile 29VPC processed successfully.
Tile 29UQU processed successfully.
Tile 29UQV processed successfully.
Tile 30UUE processed successfully.
Tile 30UUD processed successfully.
Tile 30UWA processed successfully.
Tile 29UQS processed successfully.
Tile 30UUF processed successfully.
Tile 29UQR processed successfully.
Tile 30UUB processed successfully.
Tile 30UUA processed successfully.
Tile 29UQT processed successfully.
Tile 30UUC processed successfully.
Tile 30UVA processed successfully.
Tile 30UVD processed successfully.
Tile 30UUG processed successfully.
Tile 30UVF processed successfully.
Tile 30UXG processed successfully.
Tile 30UVE processed successfully.
Tile 30UVB processed successfully.
Tile 30UWB processed successfully.
Tile 30UYE processed successfully.
Tile 30UWC processed successfully.
Tile 30VUH processed successfully.
Tile 30VWH processed successfully.
Tile 30UYB processed successfully.
Tile 30UXF processed successfully.
Tile 30UYD processed successfully.
Tile 31UCV processed successfully.
Tile 31UDT processed successfully.
Tile 30VVH processed successfully.
Tile 31UDU processed successfully.
Tile 30UVG processed successfully.
Tile 31UCS processed successfully.
Tile 30UWG processed successfully.
Tile 31UCU processed successfully.
Tile 30UXE processed successfully.
Tile 30UVC processed successfully.
Tile 30UWF processed successfully.
Tile 30UXB processed successfully.
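The submission loop above keeps a fixed number of tiles in flight and submits a new one each time a result arrives. The same pattern can be sketched with the standard library's `concurrent.futures` in place of Dask, which makes it runnable without a cluster. `run_bounded` and its parameters are hypothetical names, and the lambda only stands in for `process_tile`. Unlike a priming step built on repeated `next()` calls, this version also tolerates fewer inputs than slots.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_EXHAUSTED = object()  # sentinel marking the end of the input iterator


def run_bounded(func, items, max_in_flight=4):
    """Apply func to every item, keeping at most max_in_flight tasks pending."""
    items = iter(items)
    results = []
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Prime the pool; zip stops cleanly if there are fewer items than slots.
        pending = {
            pool.submit(func, item)
            for _, item in zip(range(max_in_flight), items)
        }
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                # Top up with one new task per finished one, if input is left.
                item = next(items, _EXHAUSTED)
                if item is not _EXHAUSTED:
                    pending.add(pool.submit(func, item))
    return results


tile_names = ["29UQR", "30UUA", "29UQS", "30UWA", "30UVA"]
messages = run_bounded(
    lambda name: f"Tile {name} processed", tile_names, max_in_flight=2
)
```

Results arrive in completion order, not submission order, which matches the unordered tile names in the output above.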
client.restart()
distributed.nanny - WARNING - Worker process still alive after 1.5999980926513673 seconds, killing