Data Acquisition

import geopandas, pandas
from pyogrio import read_dataframe
ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/share/proj failed

Greenbelts

WFS request for the England Green Belt (2017-18) layer, with features returned in British National Grid (EPSG:27700):

gb_url = (
    'https://maps.communities.gov.uk/geoserver/dclg_inspire/ows'
    '?service=WFS&version=2.0.0&request=GetFeature&'
    'typeName=dclg_inspire:England_Green_Belt_2017-18_WGS84&'
    'outputFormat=json&srsName=EPSG:27700'
)
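The query string above is a standard WFS 2.0.0 `GetFeature` request. A minimal sketch of building the same URL from a dict with the standard library, so parameters such as `typeName` or `srsName` can be swapped without hand-editing the string (`GB_WFS_BASE` and `PARAMS` are illustrative names, not part of the original notebook):

```python
from urllib.parse import urlencode

GB_WFS_BASE = 'https://maps.communities.gov.uk/geoserver/dclg_inspire/ows'

# Same parameters, in the same order, as the hand-written gb_url
PARAMS = {
    'service': 'WFS',
    'version': '2.0.0',
    'request': 'GetFeature',
    'typeName': 'dclg_inspire:England_Green_Belt_2017-18_WGS84',
    'outputFormat': 'json',
    'srsName': 'EPSG:27700',
}

# safe=':' keeps 'dclg_inspire:...' and 'EPSG:27700' unescaped, as in the original string
gb_url_built = GB_WFS_BASE + '?' + urlencode(PARAMS, safe=':')
```

With these parameters `gb_url_built` is character-for-character identical to `gb_url`.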

Read it straight from the WFS endpoint with pyogrio (uncomment `where` to keep a single green belt):

%%time
gb = read_dataframe(
    gb_url,
    #where="GB_Name='Merseyside and Greater Manchester'"
)
CPU times: user 3.3 s, sys: 340 ms, total: 3.64 s
Wall time: 14.8 s
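The commented-out `where` clause filters by attribute. An alternative is to ask the server to filter before anything is downloaded. The sketch below assumes the endpoint is a GeoServer instance (the URL path suggests it) that honours GeoServer's `CQL_FILTER` vendor parameter, and that the layer exposes the `GB_Name` attribute used above; `with_cql_filter` is a hypothetical helper:

```python
from urllib.parse import urlencode


def with_cql_filter(url, expression):
    """Append a GeoServer CQL_FILTER vendor parameter to an existing WFS URL.

    CQL_FILTER is a GeoServer extension, not part of the WFS standard, so other
    servers may ignore it and return every feature.
    """
    separator = '&' if '?' in url else '?'
    # urlencode percent-encodes '=' and quotes and turns spaces into '+'
    return url + separator + urlencode({'CQL_FILTER': expression})
```

For example, `read_dataframe(with_cql_filter(gb_url, "GB_Name='Merseyside and Greater Manchester'"))` would request only that green belt, if the server supports the parameter.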

Signatures

  • Pull data if not present
import os

# Skip the ~868 MB download when the file is already on disk
if not os.path.exists('signatures.gpkg'):
    ! wget https://figshare.com/ndownloader/files/30904861 -O signatures.gpkg
--2023-02-16 10:27:30--  https://figshare.com/ndownloader/files/30904861
Resolving figshare.com (figshare.com)... 46.137.13.70, 63.33.127.36, 2a05:d018:1f4:d003:2d08:7968:d247:2fb8, ...
Connecting to figshare.com (figshare.com)|46.137.13.70|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/30904861/spatial_signatures_GB.gpkg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230216/eu-west-1/s3/aws4_request&X-Amz-Date=20230216T102730Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=04d758bcf56e9e5628d023e3762610ae92bec7d0f1f592b2943805a8acfba07e [following]
--2023-02-16 10:27:30--  https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/30904861/spatial_signatures_GB.gpkg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230216/eu-west-1/s3/aws4_request&X-Amz-Date=20230216T102730Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=04d758bcf56e9e5628d023e3762610ae92bec7d0f1f592b2943805a8acfba07e
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 52.218.105.122, 52.92.32.48, 52.92.34.24, ...
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.105.122|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 909824000 (868M) [application/octet-stream]
Saving to: ‘signatures.gpkg’

signatures.gpkg     100%[===================>] 867.68M  31.6MB/s    in 23s     

2023-02-16 10:27:53 (37.9 MB/s) - ‘signatures.gpkg’ saved [909824000/909824000]
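A plain existence check treats a half-finished download as complete. A minimal pure-Python sketch that avoids this by downloading to a temporary `.part` file and renaming it only once the transfer finishes; `ensure_download` is a hypothetical helper, and `urllib` follows the figshare 302 redirect to S3 on its own:

```python
import os
import shutil
import tempfile
import urllib.request


def ensure_download(path, url):
    """Download `url` to `path` unless `path` already exists; return `path`."""
    if os.path.exists(path):
        return path
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so os.replace is a cheap rename
    fd, tmp = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as dst, urllib.request.urlopen(url) as src:
            # Stream in chunks instead of holding the whole file in memory
            shutil.copyfileobj(src, dst)
        os.replace(tmp, path)
    except BaseException:
        # Never leave a partial file behind on failure or interruption
        os.remove(tmp)
        raise
    return path
```

Usage: `ensure_download('signatures.gpkg', 'https://figshare.com/ndownloader/files/30904861')`.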

  • Read and clip
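def read_clip(irow, p='signatures.gpkg'):
    """Read the signatures intersecting one green belt polygon and clip them to it.

    `irow` is an (index, row) pair as yielded by `gb.iterrows()`.
    """
    _, row = irow
    geom = row.geometry
    # `mask` loads only features intersecting the polygon (a bare shapely
    # geometry is assumed to share the file's CRS); `clip` trims them to its edge
    out = geopandas.read_file(p, mask=geom).clip(geom)
    # Tag every clipped piece with the attributes of its green belt
    for col in ['LA_Code', 'LA_Name', 'GB_Name']:
        out[col] = row[col]
    return out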

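import dask.bag as db
from dask.diagnostics import ProgressBar

# One task per green belt polygon; each returns a GeoDataFrame of clipped signatures
bag = db.from_sequence(gb.iterrows()).map(read_clip)
with ProgressBar():
    # compute() returns a list of GeoDataFrames, stacked here into one
    clipped = pandas.concat(bag.compute())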
[########################################] | 100% Completed |  3min 57.4s
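The dask step is a fan-out (one `read_clip` call per green belt) followed by a concatenation. The same pattern can be sketched with the standard library's `concurrent.futures` instead of dask; `fan_out` is a hypothetical helper that flattens lists rather than GeoDataFrames, and it uses threads, whereas dask.bag uses the multiprocessing scheduler by default (swap in `ProcessPoolExecutor` for CPU-bound clipping):

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(func, items, max_workers=4):
    """Apply `func` to every item in a thread pool and flatten the returned lists.

    Executor.map yields results in input order, so the output follows the
    order of `items`, like pandas.concat(bag.compute()) above.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(func, items))
    # Each call returns a list of records; join them into one list
    return [record for chunk in results for record in chunk]
```

Here `items` plays the role of `gb.iterrows()` and `func` the role of `read_clip`.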

Store on disk

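(
    clipped
    # concat kept each file's own index: move it to a column and drop it,
    # together with the `id` field carried over from the signatures file
    .reset_index()
    .drop(columns=['index', 'id'])
    .to_parquet('ss_clipped.pq')
)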
/tmp/ipykernel_96/2760393140.py:2: UserWarning: this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec

This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

To further ignore this warning, you can do: 
import warnings; warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')
  clipped
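The warning suggests a global `warnings.filterwarnings` call. A minimal sketch that scopes the same filter to a single call instead, so the Parquet metadata warning is silenced for the write but other warnings still appear; `quiet_parquet` is a hypothetical helper and the regex is the one printed in the warning:

```python
import warnings

# Regex taken from the hint in the geopandas UserWarning above
PARQUET_WARNING = '.*initial implementation of Parquet.*'


def quiet_parquet(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) with the geopandas Parquet metadata warning ignored.

    The filter lives inside catch_warnings(), so it is removed again when the
    call returns rather than staying active for the rest of the session.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', message=PARQUET_WARNING, category=UserWarning)
        return fn(*args, **kwargs)
```

For example, `quiet_parquet(df.to_parquet, 'ss_clipped.pq')`, where `df` is the reset and trimmed `clipped` frame.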