Interpolation of signatures to OA and LSOA¶

Signatures are composed of bespoke geometries but can be transferred to other. In this notebook, we are interpolating signatures to Output Areas and Lower Super Output Area.

The notebook requires tobler > 0.8.2.

import geopandas
import tobler

signatures = geopandas.read_file("spatial_signatures_GB.gpkg")

/opt/conda/lib/python3.8/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
  for feature in features_lst:

Output Areas¶

We fetch OA for England and Wales, and those for Scotlaned and combine them to a single GeoDataFrame.

from download import download
import pandas
import dask_geopandas

download("https://opendata.arcgis.com/api/v3/datasets/09b58d063d4e421a9cad16ba5419a6bd_0/downloads/data?format=geojson&spatialRefId=4326", "oa_ew.geojson")

Downloading data from https://opendata.arcgis.com/api/v3/datasets/09b58d063d4e421a9cad16ba5419a6bd_0/downloads/data?format=geojson&spatialRefId=4326 (1 byte)

file_sizes: 2.62GB [01:15, 34.7MB/s]                                            
Successfully downloaded file to oa_ew.geojson

'oa_ew.geojson'

download("https://www.nrscotland.gov.uk/files/geography/output-area-2011-mhw.zip", "oa_scot", kind="zip")

Creating data folder...
Downloading data from https://www.nrscotland.gov.uk/files/geography/output-area-2011-mhw.zip (31.2 MB)

file_sizes: 100%|██████████████████████████| 32.7M/32.7M [00:00<00:00, 99.8MB/s]
Extracting zip file...
Successfully downloaded / unzipped to oa_scot

'oa_scot'

oa_ew = geopandas.read_file("oa_ew.geojson")

oa_scot = geopandas.read_file('oa_scot')

oa_ew = oa_ew[["OA11CD", "geometry"]].to_crs(27700)

oa_scot = oa_scot[["code", "geometry"]].rename(columns={"code": "OA11CD"})

oa = pandas.concat([oa_ew, oa_scot]).reset_index(drop=True)

OA geometries are very dense. For our purpose, we simplify them a bit to speedup the interpolation.

%%time
oa_ddf = dask_geopandas.from_geopandas(oa, npartitions=16)
oa_ddf["geometry"] = oa_ddf.simplify(1).buffer(0)
oa = oa_ddf.compute()

CPU times: user 2min 18s, sys: 493 ms, total: 2min 18s
Wall time: 12.1 s

Using tobler, we interpolate the data and then clean the DataFrame.

%%time
estimates = tobler.area_weighted.area_interpolate(signatures, oa, spatial_index="target", categorical_variables=["type"], n_jobs=-1)

CPU times: user 1h 11min 21s, sys: 13.3 s, total: 1h 11min 35s
Wall time: 2h 6min 1s

estimates["OA11CD"] = oa.OA11CD.values

We also want to get the primary type covering the largest portion of each OA.

primary = estimates.drop(columns=["geometry", "OA11CD"]).idxmax(axis=1)

estimates["primary_type"] = primary.str.replace("type_", "")

estimates.columns = [col.replace("type_", "") for col in estimates.columns]

order = ['OA11CD', 'primary_type', 'Countryside agriculture', 'Accessible suburbia', 'Open sprawl',
       'Wild countryside', 'Warehouse land', 'Gridded residential quarters',
       'Urban buffer', 'Disconnected suburbia',
       'Dense residential neighbourhoods',
       'Connected residential neighbourhoods', 'Dense urban neighbourhoods',
       'Local urbanity', 'Distilled urbanity', 'Regional urbanity', 'outlier',
       'Metropolitan urbanity', 'Hyper distilled urbanity', 
       ]

oa_estimated = estimates[order]

oa_estimated

	OA11CD	primary_type	Countryside agriculture	Accessible suburbia	Open sprawl	Wild countryside	Warehouse land	Gridded residential quarters	Urban buffer	Disconnected suburbia	Dense residential neighbourhoods	Connected residential neighbourhoods	Dense urban neighbourhoods	Local urbanity	Distilled urbanity	Regional urbanity	outlier	Metropolitan urbanity	Hyper distilled urbanity
0	E00000001	Distilled urbanity	0.000000	0.0	0.000000	0.000000	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0
1	E00000003	Distilled urbanity	0.000000	0.0	0.000000	0.000000	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0
2	E00000005	Distilled urbanity	0.000000	0.0	0.000000	0.000000	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0
3	E00000007	Distilled urbanity	0.000000	0.0	0.000000	0.000000	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0
4	E00000010	Distilled urbanity	0.000000	0.0	0.000000	0.000000	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
227754	S00094726	Wild countryside	0.000000	0.0	0.000000	0.965738	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
227755	S00102583	Urban buffer	0.000000	0.0	0.076019	0.000000	0.0	0.0	0.923981	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
227756	S00119179	Wild countryside	0.000000	0.0	0.000000	0.995102	0.0	0.0	0.000594	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
227757	S00119262	Wild countryside	0.000000	0.0	0.000000	0.982881	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
227758	S00126169	Countryside agriculture	0.999624	0.0	0.000000	0.000377	0.0	0.0	0.000000	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

227759 rows × 19 columns

oa_estimated.to_csv("output_area_estimates.csv")

LSOA¶

We re-use OA geometries and combine them to LSOA using the lookup table.

download("https://www.arcgis.com/sharing/rest/content/items/39b8fd75e06f4588b9242bcad1f1a898/data", "lookup", kind="zip")

Creating data folder...
Downloading data from https://www.arcgis.com/sharing/rest/content/items/39b8fd75e06f4588b9242bcad1f1a898/data (22.4 MB)

file_sizes: 100%|██████████████████████████| 23.5M/23.5M [00:01<00:00, 18.8MB/s]
Extracting zip file...
Successfully downloaded / unzipped to lookup

'lookup'

lookup = pandas.read_csv("lookup/Postcode_to_Output_Area_to_Lower_Layer_Super_Output_Area_to_Middle_Layer_Super_Output_Area_to_Local_Authority_District_November_2018_Lookup_in_the_UK.csv")

/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3165: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,

lookup = lookup[["oa11cd", "lsoa11cd"]]

oa_lsoa = oa.merge(lookup[~lookup.oa11cd.duplicated()], left_on="OA11CD", right_on="oa11cd", how="left")

oa_lsoa

	OA11CD	geometry	oa11cd	lsoa11cd
0	E00000001	POLYGON ((532305.295 181814.350, 532192.035 18...	E00000001	E01000001
1	E00000003	POLYGON ((532215.180 181846.434, 532215.756 18...	E00000003	E01000001
2	E00000005	POLYGON ((532181.931 181763.261, 532176.493 18...	E00000005	E01000001
3	E00000007	POLYGON ((532203.092 181668.419, 532229.800 18...	E00000007	E01000001
4	E00000010	POLYGON ((532129.762 182133.441, 532099.885 18...	E00000010	E01000003
...	...	...	...	...
227754	S00094726	MULTIPOLYGON (((140992.500 757140.000, 140998....	S00094726	S01007285
227755	S00102583	MULTIPOLYGON (((249205.272 658381.836, 249199....	S00102583	S01008322
227756	S00119179	MULTIPOLYGON (((180736.000 822983.000, 181184....	S00119179	S01010669
227757	S00119262	POLYGON ((331504.549 973428.233, 332026.118 97...	S00119262	S01010790
227758	S00126169	POLYGON ((314833.000 734046.000, 314950.484 73...	S00126169	S01011959

227759 rows × 4 columns

Creation of LSOA geometries is done using dask_geopandas dissolve method, that was not released at the time.

# https://github.com/geopandas/dask-geopandas/pull/69
def dissolve(ddf, by=None, aggfunc="first", split_out=1, **kwargs):
    """Dissolve geometries within `groupby` into a single geometry.
    Parameters
    ----------
    by : string, default None
        Column whose values define groups to be dissolved. If None,
        whole GeoDataFrame is considered a single group.
    aggfunc : function,  string or dict, default "first"
        Aggregation function for manipulation of data associated
        with each group. Passed to dask `groupby.agg` method.
        Note that ``aggfunc`` needs to be applicable to all column (i.e. ``"mean:``
        cannot be used with string dtype). Select only required columns before
        ``dissolve`` or pass a dictionary mapping ``aggfunc`` to each column
        separately.
    split_out : int, default 1
        Number of partitions of the output
    **kwargs
        keyword arguments passed to `groupby`
    Examples
    --------
    >>> ddf.dissolve("foo", split_out=12)
    >>> ddf[["foo", "bar", "geometry"]].dissolve("foo", aggfunc="mean")
    >>> ddf.dissolve("foo", aggfunc={"bar": "mean", "baz": "first"})
    """
    import dask.dataframe as dd
    
    if by is None:
        by = lambda x: 0
        drop = [ddf.geometry.name]
    else:
        drop = [by, ddf.geometry.name]

    def union(block):
        merged_geom = block.unary_union
        return merged_geom

    merge_geometries = dd.Aggregation(
        "merge_geometries", lambda s: s.agg(union), lambda s0: s0.agg(union)
    )
    if isinstance(aggfunc, dict):
        data_agg = aggfunc
    else:
        data_agg = {col: aggfunc for col in ddf.columns.drop(drop)}
    data_agg[ddf.geometry.name] = merge_geometries
    aggregated = ddf.groupby(by=by, **kwargs).agg(
        data_agg,
        split_out=split_out,
    )
    return aggregated.set_crs(ddf.crs)

oa_lsoa_ddf = dask_geopandas.from_geopandas(oa_lsoa, npartitions=16)
lsoa_ddf = dissolve(oa_lsoa_ddf, "lsoa11cd", split_out=16)
lsoa = lsoa_ddf.compute()

Now we can interpolate the signature classification to LSOA and clean the DataFrame as above.

%%time
estimates = tobler.area_weighted.area_interpolate(signatures, lsoa, spatial_index="target", categorical_variables=["type"], n_jobs=-1)

CPU times: user 11min 11s, sys: 9.26 s, total: 11min 20s
Wall time: 50min 52s

estimates["LSOA11CD"] = lsoa.index.values

primary = estimates.drop(columns=["geometry", "LSOA11CD"]).idxmax(axis=1)

estimates["primary_type"] = primary.str.replace("type_", "")

estimates.columns = [col.replace("type_", "") for col in estimates.columns]

order = ['LSOA11CD', 'primary_type', 'Countryside agriculture', 'Accessible suburbia', 'Open sprawl',
       'Wild countryside', 'Warehouse land', 'Gridded residential quarters',
       'Urban buffer', 'Disconnected suburbia',
       'Dense residential neighbourhoods',
       'Connected residential neighbourhoods', 'Dense urban neighbourhoods',
       'Local urbanity', 'Distilled urbanity', 'Regional urbanity', 'outlier',
       'Metropolitan urbanity', 'Hyper distilled urbanity', 
       ]

lsoa_estimated = estimates[order]

lsoa_estimated

	LSOA11CD	primary_type	Countryside agriculture	Accessible suburbia	Open sprawl	Wild countryside	Warehouse land	Gridded residential quarters	Urban buffer	Disconnected suburbia	Dense residential neighbourhoods	Connected residential neighbourhoods	Dense urban neighbourhoods	Local urbanity	Distilled urbanity	Regional urbanity	outlier	Metropolitan urbanity	Hyper distilled urbanity
0	E01000007	Dense urban neighbourhoods	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.822703	0.177297	0.0	0.0	0.0	0.0	0.0
1	E01000015	Dense residential neighbourhoods	0.0	0.000000	0.001117	0.0	0.000000	0.000000	0.000000	0.022815	0.707635	0.136794	0.130251	0.000000	0.0	0.0	0.0	0.0	0.0
2	E01000030	Dense residential neighbourhoods	0.0	0.000000	0.000000	0.0	0.427742	0.000000	0.000000	0.000000	0.572258	0.000000	0.000000	0.000000	0.0	0.0	0.0	0.0	0.0
3	E01000085	Dense urban neighbourhoods	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.171808	0.126567	0.701626	0.000000	0.0	0.0	0.0	0.0	0.0
4	E01000118	Connected residential neighbourhoods	0.0	0.339714	0.036230	0.0	0.000000	0.132012	0.000000	0.000000	0.004280	0.487764	0.000000	0.000000	0.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
41724	W01001910	Gridded residential quarters	0.0	0.000000	0.000000	0.0	0.000000	0.981498	0.000000	0.000000	0.000000	0.000000	0.018502	0.000000	0.0	0.0	0.0	0.0	0.0
41725	W01001914	Urban buffer	0.0	0.000000	0.000020	0.0	0.000000	0.000000	0.999980	0.000000	0.000000	0.000000	0.000000	0.000000	0.0	0.0	0.0	0.0	0.0
41726	W01001923	Urban buffer	0.0	0.000000	0.383934	0.0	0.000000	0.005691	0.533735	0.000000	0.003031	0.000000	0.000000	0.000000	0.0	0.0	0.0	0.0	0.0
41727	W01001949	Accessible suburbia	0.0	0.922427	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.077573	0.000000	0.000000	0.000000	0.0	0.0	0.0	0.0	0.0
41728	W01001952	Dense urban neighbourhoods	0.0	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.885102	0.114898	0.0	0.0	0.0	0.0	0.0

41729 rows × 19 columns

lsoa_estimated.to_csv("lsoa_estimates.csv")

estimates.to_file("lsoa.gpkg", driver="GPKG")

Urban Grammar AI | WP2 - Spatial Signatures

Interpolation of signatures to OA and LSOA¶

Output Areas¶

LSOA¶