Generate enclosures for the Great Britain

This notebook generates enclosures for the area of Great Britain based on set barriers. Note that the final version is different from the proof of a concept. It does not use dask since we found that parallelisation is not necessary and it uses different preprocessing steps for railway.

Note: An algorithm to generate enclosures has been implemented in momepy 0.4.0 as momepy.enclosures.

Used barriers:

  • road network (OS OpenRoads)

  • railway network (OS OpenMap Local)

  • rivers (OS OpenRivers)

  • coastline (OS Strategi®)

Connect to db:

import os

import geopandas as gpd

from sqlalchemy import create_engine

user = os.environ.get('DB_USER')
pwd = os.environ.get('DB_PWD')
host = os.environ.get('DB_HOST')
port = os.environ.get('DB_PORT')

db_connection_url = f"postgres+psycopg2://{user}:{pwd}@{host}:{port}/built_env"
engine = create_engine(db_connection_url)

Load initial data:

%%time
sql = f'SELECT * FROM openroads_200803_topological'
roads = gpd.read_postgis(sql, engine, geom_col='geometry')
CPU times: user 2min 7s, sys: 4.94 s, total: 2min 12s
Wall time: 2min 17s
%%time
# filter out tunnels
roads = roads[roads.roadStructure != 'Road In Tunnel']
CPU times: user 1.65 s, sys: 18.1 ms, total: 1.67 s
Wall time: 1.66 s
%%time
sql = f'SELECT * FROM gb_coastline_2016'
coastline = gpd.read_postgis(sql, engine, geom_col='geometry')
CPU times: user 330 ms, sys: 7.78 ms, total: 337 ms
Wall time: 378 ms

Generate enclosures

First level

The first level enclosures are composed of road network and external boundary, which in our case is a coastline.

import pygeos
import pandas as pd

from shapely.ops import polygonize
from shapely.geometry import Point

from snap import line_to_line, close_gaps
from consolidate import topology

There three simple steps:

  1. merge layers to a single GeoSeries.

  2. union geometries to a single MultiLineString. That helps with precision of polygonize.

  3. polygonize data and save them as GeometryArray.

%%time
barriers = pd.concat([roads.geometry, coastline.geometry])
CPU times: user 25.5 ms, sys: 0 ns, total: 25.5 ms
Wall time: 23 ms
%%time
unioned = barriers.unary_union
CPU times: user 14min 16s, sys: 7.33 s, total: 14min 23s
Wall time: 14min 13s
%%time
polygons = polygonize(unioned)
enclosures = gpd.array.from_shapely(list(polygons), crs=roads.crs)
CPU times: user 1min 4s, sys: 482 ms, total: 1min 4s
Wall time: 1min 3s

Additional barriers

Additional barriers are used to further subdivide those generated above.

%%time
sql = f'SELECT * FROM openmap_railwaytrack_200824'
railway = gpd.read_postgis(sql, engine, geom_col='geometry')
CPU times: user 2.95 s, sys: 15.8 ms, total: 2.97 s
Wall time: 3.06 s
%%time
sql = f'SELECT * FROM openrivers_200909'
rivers = gpd.read_postgis(sql, engine, geom_col='geometry')
CPU times: user 9.97 s, sys: 190 ms, total: 10.2 s
Wall time: 10.8 s

Preprocess railways

Railways are not optimal input. OS OpenMap Local is cartographic resource and LineStrings representing railways are split into multiple pieces, sometimes even with gaps in between. Moreover, there is almost always a gap where railways cross roads. All that needs to be fixed, otherwise we won’t get enclosed geometry.

Preprocessing fucntions topology, close_gaps and line_to_line have been implemented in momepy 0.4.0 as momepy.remove_false_nodes, momepy.close_gaps and momepy.extend_lines.

The first step is to clean topology - remove nodes of a degree 2.

%%time
railway_topo = topology(railway)
CPU times: user 4min 42s, sys: 7.19 ms, total: 4min 42s
Wall time: 4min 42s

Second step closes gaps between LineStrings and then fixes resulting topology again.

%%time
closed = close_gaps(railway_topo, tolerance=25)
CPU times: user 5min 8s, sys: 506 ms, total: 5min 8s
Wall time: 5min 7s
%%time
closed_topo = topology(gpd.GeoDataFrame(geometry=closed))
CPU times: user 10.6 s, sys: 3.96 ms, total: 10.6 s
Wall time: 10.6 s

Finally, we extend lines to adjacent road geometry to close the area.

%%time
extended_topo = line_to_line(closed_topo, roads, 25)
CPU times: user 13.5 s, sys: 19.5 ms, total: 13.5 s
Wall time: 13.5 s

Subdivide enclosures

With the preprocessed data, we can subdivide first level enclosures into final ones.

Due to the current transition between pygeos and shapely 2.0, we are using here private _pygeos_to_shapely function from GeoPandas. That will not be needed in future.

import itertools

import numpy as np
import dask.bag as db

from tqdm.notebook import tqdm
from geopandas._vectorized import _pygeos_to_shapely

We need a single GeoSeries of additional barriers.

%%time
additional = pd.concat([rivers.geometry, extended_topo.geometry])
CPU times: user 3.93 ms, sys: 36 µs, total: 3.96 ms
Wall time: 2.6 ms

Using spatial index, we link additional barriers to existing enclosures.

%%time
sindex = gpd.GeoSeries(enclosures).sindex
inp, res = sindex.query_bulk(additional.geometry, predicate='intersects')
CPU times: user 27.8 s, sys: 43.9 ms, total: 27.9 s
Wall time: 27.7 s

Unique enclosure indices in res mark those enclosures which needs to be subdivided.

%%time
unique = np.unique(res)
CPU times: user 12.2 ms, sys: 2 µs, total: 12.2 ms
Wall time: 10.6 ms

We loop over polygons and generate new geometry using additional barriers.

%%time
new = []

for i in tqdm(unique, total=len(unique)):
    poly = enclosures.data[i]  # get enclosure polygon
    crossing = inp[res==i]  # get relevant additional barriers
    buf = pygeos.buffer(poly, 0.01)  # to avoid floating point errors
    crossing_ins = pygeos.intersection(buf, additional.values.data[crossing])  # keeping only parts of additional barriers within polygon
    union = pygeos.union_all(np.append(crossing_ins, pygeos.boundary(poly)))  # union
    polygons = np.array(list(polygonize(_pygeos_to_shapely(union))))  # polygonize
    within = pygeos.covered_by(pygeos.from_shapely(polygons), buf)  # keep only those within original polygon
    new += list(polygons[within])
CPU times: user 3min 7s, sys: 607 ms, total: 3min 7s
Wall time: 3min 6s

Now we replace those polygons which needed subdivision with a new geometry.

%%time
final_enclosures = gpd.GeoSeries(enclosures).drop(unique).append(gpd.GeoSeries(new))
CPU times: user 334 ms, sys: 5 µs, total: 334 ms
Wall time: 331 ms

Before we save it to a file, let’s check the difference between initial and final enclosures.

final_enclosures.shape
(735372,)
enclosures.shape
(619191,)
%%time
gpd.GeoDataFrame(geometry=final_enclosures, crs=roads.crs).to_parquet('../../urbangrammar_samba/enclosures.pq')
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec

This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

To further ignore this warning, you can do: 
import warnings; warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')
  """Entry point for launching an IPython kernel.
CPU times: user 4.81 s, sys: 626 ms, total: 5.43 s
Wall time: 6.81 s