Urban morphometrics¶
Morphometric assessment measures a wide range of characters of urban form to derive a complex description of built-up patterns composed of enclosed tessellation, buildings and the street network.
All algorithms used within this notebook are part of the momepy Python toolkit and can be used from there. We have extracted them from momepy, adapted them for dask and pygeos, and used them in a raw form tailored directly to our use case. The enhanced algorithms have been pushed back to momepy and will be part of momepy 0.4.0.
All steps within this notebook are parallelised using dask. The first part, which measures aspects of individual elements (and does not require knowledge of the context), uses a pre-release of dask-geopandas. The rest uses dask to manage parallel iteration over geo-chunks with single-core algorithms.
Some functions are imported from a momepy_utils.py file stored within this directory. Those are either helper functions taken directly from momepy or their enhanced versions, all of which will be included in the next release of momepy:

get_edge_ratios is implemented in momepy 0.4.0 as get_network_ratio
get_nodes is included in get_node_id
the remaining functions have been used to refactor existing momepy classes.
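For context, a hedged sketch of the momepy 0.4.0 equivalents (function names follow the momepy changelog; check the documentation of your installed version for the exact signatures), assuming cells, edges and nodes GeoDataFrames like those produced later in this notebook:

import momepy

# get_edge_ratios -> momepy.get_network_ratio (per-cell edge indices and overlap proportions)
edge_ratios = momepy.get_network_ratio(cells, edges)
# get_nodes -> momepy.get_node_id (ID of the nearest node lying on the linked edge)
cells['nodeID'] = momepy.get_node_id(cells, nodes, edges, node_id='nodeID', edge_id='edgeID')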
Individual elements¶
Note: Requires dask-geopandas and the current master of geopandas to support the dask version, or the gds_py:6.0 container.
# !pip install git+git://github.com/jsignell/dask-geopandas.git
# !pip install git+git://github.com/geopandas/geopandas.git
import os
import warnings
from time import time, sleep
import dask.dataframe as dd
import dask_geopandas
import geopandas
import libpysal
import momepy
import networkx as nx
import numpy as np
import pandas as pd
import pygeos
import scipy
from tqdm.notebook import tqdm
from dask.distributed import Client, LocalCluster, as_completed
from libpysal.weights import Queen
from sqlalchemy import create_engine
from momepy_utils import (
_circle_radius,
centroid_corner,
elongation,
get_corners,
get_edge_ratios,
get_nodes,
solar_orientation_poly,
squareness,
street_profile,
)
We are using a single machine within this notebook with 14 cores, so we start a local dask cluster with 14 workers.
client = Client(LocalCluster(n_workers=14))
client
dask-geopandas is still under development and raises a few warnings at the moment, all of which can be ignored.
warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')
warnings.filterwarnings('ignore', message='.*Assigning CRS to a GeoDataFrame without a geometry.*')
Measuring buildings and enclosed cells¶
In the first step, we iterate over geo-chunks, merge enclosed tessellation and buildings into a single geopandas.GeoDataFrame and convert it to a dask_geopandas.GeoDataFrame. The rest of the code is mostly an extraction from the momepy source code adapted for dask.
for chunk_id in tqdm(range(103), total=103):
# Load data and merge them together
blg = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/buildings/blg_{chunk_id}.pq")
tess = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/tessellation/tess_{chunk_id}.pq")
blg = blg.rename_geometry('buildings')
tess = tess.rename_geometry('tessellation')
df = tess.merge(blg, on='uID', how='left')
# Convert to dask.GeoDataFrame
ddf = dask_geopandas.from_geopandas(df, npartitions=14)
## Measure morphometric characters
# Building area
ddf['sdbAre'] = ddf.buildings.area
# Building perimeter
ddf['sdbPer'] = ddf.buildings.length
# Courtyard area
exterior_area = ddf.buildings.map_partitions(lambda series: pygeos.area(pygeos.polygons(series.exterior.values.data)), meta='float')
ddf['sdbCoA'] = exterior_area - ddf['sdbAre']
# Circular compactness
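# (area of the building relative to the area of the smallest circle enclosing its convex hull)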
hull = ddf.buildings.convex_hull.exterior
radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
ddf['ssbCCo'] = ddf['sdbAre'] / (np.pi * radius ** 2)
# Corners
ddf['ssbCor'] = ddf.buildings.apply(lambda g: get_corners(g), meta='float')
# Squareness
ddf['ssbSqu'] = ddf.buildings.apply(lambda g: squareness(g), meta='float')
# Equivalent rectangular index
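# (square root of the area ratio to the minimum rotated rectangle, times the perimeter ratio; equals 1 for a perfect rectangle)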
bbox = ddf.buildings.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
ddf['ssbERI'] = (ddf['sdbAre'] / bbox.area).pow(1./2) * (bbox.length / ddf['sdbPer'])
# Elongation
ddf['ssbElo'] = bbox.map_partitions(lambda s: elongation(s), meta='float')
# Centroid corner mean distance and deviation
def _centroid_corner(series):
ccd = series.apply(lambda g: centroid_corner(g))
return pd.DataFrame(ccd.to_list(), index=series.index)
ddf[['ssbCCM', 'ssbCCD']] = ddf.buildings.map_partitions(_centroid_corner, meta=pd.DataFrame({0: [0.1], 1: [1.1]}))
# Solar orientation
ddf['stbOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
# Tessellation longest axis length
hull = ddf.tessellation.convex_hull.exterior
ddf['sdcLAL'] = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
# Tessellation area
ddf['sdcAre'] = ddf.tessellation.area
# Circular compactness
radius = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float')
ddf['sscCCo'] = ddf['sdcAre'] / (np.pi * radius ** 2)
# Equivalent rectangular index
bbox = ddf.tessellation.apply(lambda g: g.minimum_rotated_rectangle, meta=geopandas.GeoSeries())
ddf['sscERI'] = (ddf['sdcAre'] / bbox.area).pow(1./2) * (bbox.length / ddf.tessellation.length)
# Solar orientation
ddf['stcOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
# Covered area ratio
ddf['sicCAR'] = ddf['sdbAre'] / ddf['sdcAre']
# Building-cell alignment
ddf['stbCeA'] = (ddf['stbOri'] - ddf['stcOri']).abs()
# Compute all characters using dask
df = ddf.compute()
# Save to parquet file
df.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
client.restart()
sleep(5)
Measuring enclosures¶
All enclosures are loaded as a single dask_geopandas.GeoDataFrame and measured at once.
%%time
# Load data
encl = dask_geopandas.read_parquet("../../urbangrammar_samba/spatial_signatures/enclosures/encl_*.pq")
# Area
encl['ldeAre'] = encl.geometry.area
# Perimeter
encl['ldePer'] = encl.geometry.length
# Circular compactness
hull = encl.geometry.convex_hull.exterior
radius = hull.apply(lambda g: _circle_radius(list(g.coords)) if g is not None else None, meta='float')
encl['lseCCo'] = encl['ldeAre'] / (np.pi * radius ** 2)
# Equivalent rectangular index
bbox = encl.geometry.apply(lambda g: g.minimum_rotated_rectangle if g is not None else None, meta=geopandas.GeoSeries())
encl['lseERI'] = (encl['ldeAre'] / bbox.area).pow(1./2) * (bbox.length / encl['ldePer'])
# Compactness-weighted axis
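# (longest axis length weighted by a function of circular compactness, as in momepy.CompactnessWeightedAxis)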
longest_axis = hull.apply(lambda g: _circle_radius(list(g.coords)), meta='float') * 2
encl['lseCWA'] = longest_axis * ((4 / np.pi) - (16 * encl['ldeAre']) / ((encl['ldePer']) ** 2))
# Solar orientation
encl['lteOri'] = bbox.apply(lambda g: solar_orientation_poly(g), meta='float')
# Compute data and return geopandas.GeoDataFrame
encl_df = encl.compute()
# Weighted number of neighbors
inp, res = encl_df.sindex.query_bulk(encl_df.geometry, predicate='intersects')
indices, counts = np.unique(inp, return_counts=True)
encl_df['neighbors'] = counts - 1
encl_df['lteWNB'] = encl_df['neighbors'] / encl_df['ldePer']
# Load complete enclosed tessellation as a dask DataFrame
tess = dd.read_parquet("../../urbangrammar_samba/spatial_signatures/tessellation/tess_*.pq")
# Measure weighted cells within enclosure
encl_counts = tess.groupby('enclosureID').count().compute()
merged = encl_df[['enclosureID', 'ldeAre']].merge(encl_counts[['geometry']], how='left', left_on='enclosureID', right_index=True)
encl_df['lieWCe'] = merged['geometry'] / merged['ldeAre']
# Save data to parquet
encl_df.drop(columns='geometry').to_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/enclosures.pq")
We can now close the dask client.
client.close()
Generate spatial weights (W)¶
Subsequent steps will require an understanding of the context of each tessellation cell in the form of spatial weights matrices (Queen contiguity and inclusive Queen contiguity up to the 3rd order). We generate them beforehand and store them as npz files representing sparse matrices.
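For reference, a minimal sketch of the round trip between a libpysal weights object and its sparse npz representation, following the pattern used throughout this notebook (file names are illustrative):

import geopandas
import libpysal
import scipy.sparse

# build contiguity weights for one chunk and store them as a sparse matrix
gdf = geopandas.read_parquet("cells_0.pq")  # hypothetical chunk of cells
w = libpysal.weights.Queen.from_dataframe(gdf, geom_col='tessellation')
scipy.sparse.save_npz("w_0.npz", w.sparse)
# restore the W from disk later
w_back = libpysal.weights.WSP(scipy.sparse.load_npz("w_0.npz")).to_W()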
Each geo-chunk is loaded together with the relevant cross-chunk tessellation cells (to avoid edge effects). We use dask to parallelise the iteration. The number of workers is smaller now to ensure enough memory for each chunk.
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client
First we have to specify a function doing the processing itself, where the only argument is the chunk_id.
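The function relies on a cross-chunk lookup table, which needs to be loaded first. A brief illustration of its structure (the values in the comment are made up; only the access pattern is taken from the code below):

cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')
# cross_chunk is indexed by chunk id; its 'indices' column holds, for each
# neighbouring chunk, the positional indices of cells adjacent to the shared boundary,
# e.g. cross_chunk.loc[0].indices -> Series like {1: [10, 42, 97], 5: [3, 8]}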
def generate_w(chunk_id):
# load cells of a chunk
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
cross_chunk_cells.append(add_cells)
df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation')
w3 = momepy.sw_high(k=3, weights=w)
scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz", w.sparse)
scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz", w3.sparse)
return f"Chunk {chunk_id} processed sucessfully."
Then we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes (courtesy of Matthew Rocklin). That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.
%%time
inputs = iter(range(103))
futures = [client.submit(generate_w, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
# submit new future
try:
new_future = client.submit(generate_w, next(inputs))
ac.add(new_future)
except StopIteration:
pass
print(finished_future.result())
client.close()
Spatial distribution and network analysis¶
To measure the spatial distribution of elements, we use single-core algorithms and parallelise the iteration.
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client
We will need to load street network data from a PostGIS database, so we define a connection string which will be used within the loop.
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices.pq')
chunks = geopandas.read_parquet('../../urbangrammar_samba/spatial_signatures/local_auth_chunks.pq')
user = os.environ.get('DB_USER')
pwd = os.environ.get('DB_PWD')
host = os.environ.get('DB_HOST')
port = os.environ.get('DB_PORT')
db_connection_url = f"postgresql+psycopg2://{user}:{pwd}@{host}:{port}/built_env"
Within the same function below, we measure the spatial distribution of elements and network-based characters.
def measure(chunk_id):
# load cells of a chunk
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
cells['keep'] = True
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
add_cells['keep'] = False
cross_chunk_cells.append(add_cells)
df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
# read W
w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w_{chunk_id}.npz")).to_W()
# alignment
def alignment(x, orientation='stbOri'):
orientations = df[orientation].iloc[w.neighbors[x]]
return abs(orientations - df[orientation].iloc[x]).mean()
df['mtbAli'] = [alignment(x) for x in range(len(df))]
# mean neighbour distance
def neighbor_distance(x):
geom = df.buildings.iloc[x]
if geom is None:
return np.nan
return df.buildings.iloc[w.neighbors[x]].distance(df.buildings.iloc[x]).mean()
df['mtbNDi'] = [neighbor_distance(x) for x in range(len(df))]
# weighted neighbours
df['mtcWNe'] = pd.Series([w.cardinalities[x] for x in range(len(df))], index=df.index) / df.tessellation.length
# area covered by neighbours
def area_covered(x, area='sdcAre'):
neighbours = [x]
neighbours += w.neighbors[x]
return df[area].iloc[neighbours].sum()
df['mdcAre'] = [area_covered(x) for x in range(len(df))]
# read W3
w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
# weighted reached enclosures
def weighted_reached_enclosures(x, area='sdcAre', enclosure_id='enclosureID'):
neighbours = [x]
neighbours += w3.neighbors[x]
vicinity = df[[area, enclosure_id]].iloc[neighbours]
return vicinity[enclosure_id].unique().shape[0] / vicinity[area].sum()
df['ltcWRE'] = [weighted_reached_enclosures(x) for x in range(len(df))]
# mean interbuilding distance
# define adjacency list from libpysal
adj_list = w.to_adjlist(remove_symmetric=False)
adj_list["weight"] = (
df.buildings.iloc[adj_list.focal]
.reset_index(drop=True)
.distance(df.buildings.iloc[adj_list.neighbor].reset_index(drop=True)).values
)
G = nx.from_pandas_edgelist(
adj_list, source="focal", target="neighbor", edge_attr="weight"
)
ibd = []
for i in range(len(df)):
try:
sub = nx.ego_graph(G, i, radius=3)
ibd.append(np.nanmean([x[-1] for x in list(sub.edges.data('weight'))]))
except:
ibd.append(np.nan)
df['ltbIBD'] = ibd
# Reached neighbors and area on 3 topological steps on tessellation
df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]
# Save cells to parquet keeping only within-chunk data not the additional neighboring
df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
# Load street network for an extended chunk area
chunk_area = chunks.geometry.iloc[chunk_id].buffer(5000) # we extend the area by 5km to minimise edge effect
engine = create_engine(db_connection_url)
sql = f"SELECT * FROM openroads_200803_topological WHERE ST_Intersects(geometry, ST_GeomFromText('{chunk_area.wkt}',27700))"
streets = geopandas.read_postgis(sql, engine, geom_col='geometry')
# Street profile (measures width, width deviation and openness)
# street_profile needs building footprints; take them from the merged chunk (an assumption about the helper's input)
blg = df[df.buildings.notna()].set_geometry('buildings')
sp = street_profile(streets, blg)
streets['sdsSPW'] = sp[0]
streets['sdsSWD'] = sp[1]
streets['sdsSPO'] = sp[2]
# Street segment length
streets['sdsLen'] = streets.length
# Street segment linearity
streets['sssLin'] = momepy.Linearity(streets).series
# Convert geopandas.GeoDataFrame to networkx.Graph for network analysis
G = momepy.gdf_to_nx(streets)
# Node degree
G = momepy.node_degree(G)
# Subgraph analysis (meshedness, proportion of 0, 3 and 4 way intersections, local closeness)
G = momepy.subgraph(
G,
radius=5,
meshedness=True,
cds_length=False,
mode="sum",
degree="degree",
length="mm_len",
mean_node_degree=False,
proportion={0: True, 3: True, 4: True},
cyclomatic=False,
edge_node_ratio=False,
gamma=False,
local_closeness=True,
closeness_weight="mm_len",
verbose=False
)
# Cul-de-sac length
G = momepy.cds_length(G, radius=3, name="ldsCDL", verbose=False)
# Square clustering
G = momepy.clustering(G, name="xcnSCl")
# Mean node distance
G = momepy.mean_node_dist(G, name="mtdMDi", verbose=False)
# Convert networkx.Graph back to GeoDataFrames and W (denoting relationships between nodes)
nodes, edges, sw = momepy.nx_to_gdf(G, spatial_weights=True)
# Generate inclusive higher order weights
edges_w3 = momepy.sw_high(k=3, gdf=edges)
# Mean segment length
edges["ldsMSL"] = momepy.SegmentsLength(edges, spatial_weights=edges_w3, mean=True, verbose=False).series
# Generate inclusive higher order weights
nodes_w5 = momepy.sw_high(k=5, weights=sw)
# Node density
nodes["lddNDe"] = momepy.NodeDensity(nodes, edges, nodes_w5, verbose=False).series
# Weighted node density
nodes["linWID"] = momepy.NodeDensity(nodes, edges, nodes_w5, weighted=True, node_degree="degree", verbose=False).series
# Save to parquets
edges.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
nodes.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
return f"Chunk {chunk_id} processed sucessfully."
Again we use dask to iterate over all 103 chunks. The following script sends the first 8 chunks to dask together and then submits a new chunk as soon as any of the previous ones finishes. That way we process only 8 chunks at once, ensuring that the cluster will not run out of memory.
inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
# submit new future
try:
new_future = client.submit(measure, next(inputs))
ac.add(new_future)
except StopIteration:
pass
print(finished_future.result())
client.close()
Link elements together¶
For the further analysis, we need to link data measured on individual elements together. We link cells to edges based on the proportion of overlap (if a cell intersects more than one edge) and to nodes based on proximity (with a restriction: the node has to be on the linked edge). Enclosures are linked based on the enclosure ID.
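To illustrate the proportional linkage, each cell carries a dictionary mapping edge indices to the share of overlap with that edge, and edge-based characters are combined as a weighted average. A sketch with made-up numbers (the loop below applies the same logic to all six street-based characters at once):

import pandas as pd

edges = pd.DataFrame({'sdsLen': [120.0, 80.0]})  # hypothetical edge characters
d = {0: 0.7, 1: 0.3}  # the cell overlaps edge 0 by 70 % and edge 1 by 30 %
weighted = (edges.iloc[list(d.keys())]['sdsLen'] * list(d.values())).sum()
# 0.7 * 120 + 0.3 * 80 = 108.0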
As above, we define a single-core function and use dask to manage parallel iteration.
def link(chunk_id):
s = time()
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
edges = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
nodes = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
cells['edgeID'] = get_edge_ratios(cells, edges)
cells['nodeID'] = get_nodes(cells, nodes, edges, 'nodeID', 'edgeID', 'node_start', 'node_end')
characters = ['sdsSPW', 'sdsSWD', 'sdsSPO', 'sdsLen', 'sssLin', 'ldsMSL']
l = []
for d in cells.edgeID:
l.append((edges.iloc[list(d.keys())][characters].multiply(list(d.values()), axis='rows')).sum(axis=0))
cells[characters] = pd.DataFrame(l, index=cells.index)
cells = cells.merge(nodes.drop(columns=['geometry']), on='nodeID', how='left')
cells = cells.rename({'degree': 'mtdDeg', 'meshedness': 'lcdMes', 'proportion_3': 'linP3W', 'proportion_4': 'linP4W',
'proportion_0': 'linPDE', 'local_closeness': 'lcnClo'}, axis='columns')
cells['edgeID_keys'] = cells.edgeID.apply(lambda d: list(d.keys()))
cells['edgeID_values'] = cells.edgeID.apply(lambda d: list(d.values()))
cells.drop(columns='edgeID').to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
workers = 14
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client
%%time
inputs = iter(range(103))
futures = [client.submit(link, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
# submit new future
try:
new_future = client.submit(link, next(inputs))
ac.add(new_future)
except StopIteration:
pass
print(finished_future.result())
client.close()
Enclosures are linked via a simple attribute join and, since the operation does not require any heavy computation, it is done as a simple loop.
enclosures = pd.read_parquet("../../urbangrammar_samba/spatial_signatures/morphometrics/enclosures.pq")
for chunk_id in range(103):
s = time()
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
cells = cells.merge(enclosures.drop(columns=['neighbors']), on='enclosureID', how='left')
cells.to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
print(f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds.")
Inter-element characters¶
The remaining morphometric characters are based on relations between multiple elements. The implementation mirrors the approach above.
workers = 8
client = Client(LocalCluster(n_workers=workers, threads_per_worker=1))
client
def measure(chunk_id):
s = time()
# Load data
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
edges = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/edges/edges_{chunk_id}.pq")
nodes = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/nodes/nodes_{chunk_id}.pq")
# Street Alignment
edges['orient'] = momepy.Orientation(edges, verbose=False).series
edges['edgeID'] = range(len(edges))
keys = cells.edgeID_values.apply(lambda a: np.argmax(a))
cells['edgeID_primary'] = [inds[i] for inds, i in zip(cells.edgeID_keys, keys)]
cells['stbSAl'] = momepy.StreetAlignment(cells,
edges,
'stbOri',
left_network_id='edgeID_primary',
right_network_id='edgeID').series
# Area Covered by each edge
vals = {x:[] for x in range(len(edges))}
for i, keys in enumerate(cells.edgeID_keys):
for k in keys:
vals[k].append(i)
area_sums = []
for inds in vals.values():
area_sums.append(cells.sdcAre.iloc[inds].sum())
edges['sdsAre'] = area_sums
# Building per meter
bpm = []
for inds, l in zip(vals.values(), edges.sdsLen):
bpm.append(cells.buildings.iloc[inds].notna().sum() / l if len(inds) > 0 else 0)
edges['sisBpM'] = bpm
# Cell area
nodes['sddAre'] = nodes.nodeID.apply(lambda nid: cells[cells.nodeID == nid].sdcAre.sum())
# Area covered by neighboring edges + count of reached cells
edges_W = Queen.from_dataframe(edges)
areas = []
reached_cells = []
for i in range(len(edges)):
neighbors = [i] + edges_W.neighbors[i]
# areas
areas.append(edges.sdsAre.iloc[neighbors].sum())
# reached cells
ids = []
for n in neighbors:
ids += vals[n]
reached_cells.append(len(set(ids)))
edges['misCel'] = reached_cells
edges['mdsAre'] = areas
# Area covered by neighboring (3 steps) edges + count of reached cells
edges_W3 = momepy.sw_high(k=3, weights=edges_W)
areas = []
reached_cells = []
for i in range(len(edges)):
neighbors = [i] + edges_W3.neighbors[i]
# areas
areas.append(edges.sdsAre.iloc[neighbors].sum())
# reached cells
ids = []
for n in neighbors:
ids += vals[n]
reached_cells.append(len(set(ids)))
edges['lisCel'] = reached_cells
edges['ldsAre'] = areas
# Link together
e_to_link = ['sdsAre', 'sisBpM', 'misCel', 'mdsAre', 'lisCel', 'ldsAre']
n_to_link = 'sddAre'
cells = cells.merge(nodes[['nodeID', 'sddAre']], on='nodeID', how='left')
l = []
for keys, values in zip(cells.edgeID_keys, cells.edgeID_values):
l.append((edges.iloc[keys][e_to_link].multiply(values, axis='rows')).sum(axis=0)) # weighted by the proportion
cells[e_to_link] = pd.DataFrame(l, index=cells.index)
# Reached neighbors and area on 3 topological steps on tessellation
cells['keep'] = True
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
add_cells['keep'] = False
cross_chunk_cells.append(add_cells)
df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
w3 = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w3_{chunk_id}.npz")).to_W()
# Reached cells in 3 topological steps
df['ltcRea'] = [w3.cardinalities[i] for i in range(len(df))]
# Reached area in 3 topological steps
df['ltcAre'] = [df.sdcAre.iloc[w3.neighbors[i]].sum() for i in range(len(df))]
# Save
df[df['keep']].drop(columns=['keep']).to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
%%time
inputs = iter(range(103))
futures = [client.submit(measure, next(inputs)) for i in range(workers)]
ac = as_completed(futures)
for finished_future in ac:
# submit new future
try:
new_future = client.submit(measure, next(inputs))
ac.add(new_future)
except StopIteration:
pass
print(finished_future.result())
client.close()
At this point, all primary morphometric characters are measured and stored in chunked parquet files.
Convolution¶
Morphometric variables are an input of cluster analysis, which should result in the delineation of spatial signatures. However, primary morphometric characters can't be used directly; we have to understand them in context. For that reason, we introduce a convolution step. Each of the characters above will be expressed as a distance-decay-weighted first, second (median) and third quartile within 10 topological steps on enclosed tessellation. Distance is measured between the centres of the maximum inscribed circles of each geometry. The resulting convolved data will then be used as an input of cluster analysis.
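The core of the convolution is an inverse-distance-weighted quantile. A minimal sketch of that computation with made-up values (the convolute function below applies the same logic to every character):

import numpy as np

values = np.array([2.0, 5.0, 9.0])       # one character within 10 topological steps
distance = np.array([10.0, 25.0, 50.0])  # distances between inscribed-circle centres
weights = 1 / distance                   # distance decay: closer cells count more
sorter = np.argsort(values)
values, weights = values[sorter], weights[sorter]
quantiles = np.cumsum(weights) - 0.5 * weights  # midpoint cumulative weights
quantiles /= np.sum(weights)
q1, q2, q3 = np.interp([.25, .5, .75], quantiles, values)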
Generate weights of 10th order¶
cross_chunk = pd.read_parquet('../../urbangrammar_samba/spatial_signatures/cross-chunk_indices_10.pq')
def generate_w(chunk_id):
s = time()
# load cells of a chunk
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
cross_chunk_cells.append(add_cells)
df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
w = libpysal.weights.Queen.from_dataframe(df, geom_col='tessellation', silence_warnings=True)
w10 = momepy.sw_high(k=10, weights=w)
scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_queen_{chunk_id}.npz", w.sparse)
scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz", w10.sparse)
return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
print(generate_w(i))
Generate distance weights within 10th order¶
def generate_distance_w(chunk_id):
s = time()
# load cells of a chunk
points = geopandas.read_parquet(f'../../urbangrammar_samba/spatial_signatures/inscribed_circle/circle_{chunk_id}.pq')
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/inscribed_circle/circle_{chunk}.pq").iloc[inds]
cross_chunk_cells.append(add_cells)
df = points.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
w = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_{chunk_id}.npz")).to_W()
for i, (radius, geom) in enumerate(points[['radius', 'geometry']].itertuples(index=False)):
neighbours = w.neighbors[i]
vicinity = df.iloc[neighbours]
distance = vicinity.distance(geom).to_list()
distance.append(radius)
w.neighbors[i].append(i)
w.weights[i] = distance
scipy.sparse.save_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_distance_circles_{chunk_id}.npz", w.sparse)
return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
print(generate_distance_w(i))
Measure distance-decay weighted convolutions¶
characters = ['sdbAre', 'sdbPer', 'sdbCoA', 'ssbCCo', 'ssbCor', 'ssbSqu', 'ssbERI', 'ssbElo',
'ssbCCM', 'ssbCCD', 'stbOri', 'sdcLAL', 'sdcAre', 'sscCCo', 'sscERI', 'stcOri',
'sicCAR', 'stbCeA', 'mtbAli', 'mtbNDi', 'mtcWNe', 'mdcAre', 'ltcWRE', 'ltbIBD',
'sdsSPW', 'sdsSWD', 'sdsSPO', 'sdsLen', 'sssLin', 'ldsMSL', 'mtdDeg', 'lcdMes',
'linP3W', 'linP4W', 'linPDE', 'lcnClo', 'ldsCDL', 'xcnSCl', 'mtdMDi', 'lddNDe',
'linWID', 'stbSAl', 'sddAre', 'sdsAre', 'sisBpM', 'misCel', 'mdsAre', 'lisCel',
'ldsAre', 'ltcRea', 'ltcAre', 'ldeAre', 'ldePer', 'lseCCo', 'lseERI', 'lseCWA',
'lteOri', 'lteWNB', 'lieWCe'
]
def convolute(chunk_id):
s = time()
cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk_id}.pq")
cells['keep'] = True
# add neighbouring cells from other chunks
cross_chunk_cells = []
for chunk, inds in cross_chunk.loc[chunk_id].indices.iteritems():
add_cells = geopandas.read_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/cells/cells_{chunk}.pq").iloc[inds]
add_cells['keep'] = False
cross_chunk_cells.append(add_cells)
df = cells.append(pd.concat(cross_chunk_cells, ignore_index=True), ignore_index=True)
# read W
W = libpysal.weights.WSP(scipy.sparse.load_npz(f"../../urbangrammar_samba/spatial_signatures/weights/w10_10_distance_circles_{chunk_id}.npz")).to_W()
# prepare dictionary to store results
convolutions = {}
for c in characters:
convolutions[c] = []
# measure convolutions
for i in range(len(df)):
neighbours = W.neighbors[i]
vicinity = df.iloc[neighbours]
distance = W.weights[i]
distance_decay = 1 / np.array(distance)
for c in characters:
values = vicinity[c].values
sorter = np.argsort(values)
values = values[sorter]
nan_mask = np.isnan(values)
if nan_mask.all():
convolutions[c].append(np.array([np.nan] * 3))
else:
sample_weight = distance_decay[sorter][~nan_mask]
weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
weighted_quantiles /= np.sum(sample_weight)
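# interpolate the distance-decay weighted quartiles (Q1, median, Q3)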
interpolate = np.interp([.25, .5, .75], weighted_quantiles, values[~nan_mask])
convolutions[c].append(interpolate)
# save convolutions to parquet file
conv = pd.DataFrame(convolutions, index=df.index)
exploded = pd.concat([pd.DataFrame(conv[c].to_list(), columns=[c + '_q1', c + '_q2', c + '_q3']) for c in characters], axis=1)
exploded[df.keep].to_parquet(f"../../urbangrammar_samba/spatial_signatures/morphometrics/convolutions/conv_{chunk_id}.pq")
return f"Chunk {chunk_id} processed sucessfully in {time() - s} seconds."
# I am afraid that we would run out of memory if we did this in parallel
for i in tqdm(range(103), total=103):
print(convolute(i))