Urban Grammar AI
On Tuesday, November 30th, Martin gave a lightning talk on the Spatial Signatures of Great Britain at the Alan Turing Institute event Towards urban analytics 2.0, held in person in Leeds, UK. The talk introduced the classification of Great Britain that is now available in the form of an interactive map and for download, either from the Consumer Data Research Centre’s open data portal or as an archived version on figshare. While there was no space for a Q&A directly after the talk, it raised interest and led to several great informal discussions afterwards.
You can watch the talk at:
The slides from the talk are here or here is the PDF file.
On December 9th 2021, we held the meeting of the Advisory Board for the project. Still limited by the global pandemic, we had a fruitful Zoom call filled with exciting discussion on the opportunities Spatial Signatures offer.
The session started with an overview of the current progress and an explanation of the whole process of generating spatial signatures, from data acquisition to empirical exploration. We then spent some time discussing the open infrastructure built around the project, resulting in an open data product and maximum transparency of the process behind it. We followed with a focused discussion on the dissemination and impact of the classification.
Three hours later, we finished with a lot of ideas and potential research avenues and collaborations to be explored. We hope the situation permits another Advisory Board meeting soon and, hopefully, in person.
On Monday November 29th, Dani gave a talk on the Spatial Signatures project at the CASA seminars. The talk covered known bits from previous talks, mostly about the framing of the problem and the foundational blocks of the Spatial Signatures but, more importantly, it presented for the first time our results from the British signatures. The audience was engaged and had many super interesting questions. Thanks!
CASA recorded and published the video on YouTube:
The slides used are here or, if you prefer it, here is the PDF file.
Last month, on November 23rd, Dani gave a quick overview of the Urban Grammar project at the monthly catch-up of the Alan Turing Institute. This is an internal call open to all fellows and staff. Dani gave an overview of the project, and covered in a bit more detail the aspects that have already been completed, including the development of a Spatial Signature classification for Great Britain.
There is no video of the talk, but you can check out the slides used here in the talks page of the project or, if you prefer to download a PDF version, you can do so here.
Within the project, we often need to map results in different contexts, ranging from static to interactive maps. We felt the experience could be smoother, so we built xyzservices.
The Python ecosystem offers numerous tools for visualising data on a map. A lot of them depend on XYZ tiles to provide a base map layer, either from OpenStreetMap, satellite imagery or other sources. The issue is that each package offering XYZ support manages its own list of supported tile providers.
We have built the xyzservices package to support any Python library making use of XYZ tiles. I’ll try to explain the rationale behind it, without going into the details of the package. If you want those details, check its documentation.
Let me quickly look at a few popular packages and their approach to tile
management - contextily, folium, ipyleaflet and holoviews.
contextily brings contextual base maps to static geopandas plots. It
comes with a dedicated contextily.providers module, which contains a
hard-coded list of providers scraped from the list used by leaflet
(as of version 1.1.0).
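To illustrate what such a hard-coded provider entry boils down to, here is a minimal, hypothetical sketch of an XYZ provider record and the URL templating it implies. The record structure and the `build_tile_url` helper are illustrative, not taken from contextily or xyzservices; only the OpenStreetMap tile URL template is real.

```python
# A minimal, hypothetical XYZ tile provider record. Packages with XYZ
# support each maintain a list of many records like this one.
provider = {
    "name": "OpenStreetMap.Mapnik",
    "url": "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
    "attribution": "(C) OpenStreetMap contributors",
    "max_zoom": 19,
}

def build_tile_url(provider, z, x, y):
    """Fill the {z}/{x}/{y} placeholders of an XYZ template URL."""
    if z > provider.get("max_zoom", 19):
        raise ValueError(f"zoom {z} exceeds provider maximum")
    return provider["url"].format(z=z, x=x, y=y)

print(build_tile_url(provider, z=3, x=4, y=2))
# https://tile.openstreetmap.org/3/4/2.png
```

Because every mapping package keeps its own copy of such lists, fixes and new providers have to be duplicated everywhere, which is exactly the duplication xyzservices aims to remove.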
On June 30th, Dani provided an overview of ongoing work to build Spatial Signatures for all
of Great Britain at the Turing’s Urban Analytics monthly meetup.
This was the first public presentation covering our work at a national scale and we are excited at how positively it was received.
The talk is published now on YouTube:
You can have a look at Martin’s presentation at:
And if you want to check out the slides, you can see the HTML deck or the PDF version.
We are nearing the point where we can release some of the data products we have been building and that feature in this talk so, if you are interested, stay tuned as we will have more news to share shortly!
On June 30th, Martin presented a classification of Great Britain based on the form
component of Spatial Signatures at the International Seminar on Urban Form 2021, which
was held virtually in Glasgow, Scotland. This was the first time we presented results covering the whole of Great Britain, albeit using a single component only (function has been temporarily excluded). The work was received positively and spurred a bit of discussion. Within a few months, the results should be published in the conference proceedings.
The slides used are here or, if you prefer it, here is the PDF file (24Mb).
Members of the Urban Grammar project are getting involved in developing the next generation of tools for distributed processing of geospatial vector data. For its part, the Urban Grammar project heavily depends on processing vector geospatial data using the GeoPandas Python library. However, to scale GeoPandas algorithms to the extent of Great Britain, we need to do more than the library can do by default. GeoPandas operations are currently all single-threaded, severely limiting their scalability and leaving most of the CPU cores just lying around, doing
nothing. Dask is a library that brings parallel and distributed
computing to the ecosystem. For example, it provides a Dask DataFrame that consists of
partitioned pandas DataFrames. Each partition can be processed by a different process
enabling the computation to be done in parallel or even out-of-core.
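The partitioned model Dask uses can be sketched in plain Python: split the data into chunks, process each chunk concurrently, then combine the results. This toy example uses only the standard library rather than Dask itself, and the `partition` and `process_partition` functions are illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split a list of records into n roughly equal partitions,
    mimicking how a Dask DataFrame wraps partitioned DataFrames."""
    size = (len(rows) + n - 1) // n
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def process_partition(part):
    # A stand-in for per-partition work (e.g. a geometric operation);
    # here we just square each value.
    return [x * x for x in part]

rows = list(range(10))
parts = partition(rows, 4)

# Each partition can be handled by a separate worker; Dask schedules
# this across threads, processes or a cluster, and can also work
# partition-by-partition for out-of-core computation.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_partition, parts))

combined = [x for part in results for x in part]
print(combined)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The same shape of computation, with real geometries in each partition, is what dask-geopandas aims to provide on top of GeoPandas.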
We are using Dask within our workflows in bespoke scripts. However, Dask could provide
ways to scale geospatial operations in GeoPandas in a similar way it does with pandas.
There has been some effort to build a bridge between Dask and GeoPandas, currently
taking the shape of the dask-geopandas
library. While that already supports basic parallelisation, which we used in our code,
some critical components are not ready yet. That should change during this summer within
the Google Summer of Code project Martin is (co-)mentoring. We hope that this effort
will allow us to significantly simplify and even speed up the custom machinery we built
to create spatial signatures in Great Britain.
On April 15th, 2021, we held the second meeting of the Advisory Board for the project. We are delighted that all board members joined us on Zoom for a few hours of exciting discussions on the progress and the future of the project.
Dani started with an overview of our progress since the last meeting, which you can check in his UBDC talk. We followed with a focused discussion on the concepts of Spatial Signature and Enclosed Tessellation and our initial paper illustrating both on a sample of cities worldwide. We discussed the clarity of our ideas, the need for new spatial units and classification methods, and their potential drawbacks and enhancements. In the last part, we tried to zoom out to see the bigger picture and fit the research within existing projects in academia and the public sector.
After three hours of very fruitful discussion, we finished with a lot of food for thought and ideas to be explored in the future. Let’s just hope that the Advisory Board meetings will soon happen physically in Liverpool, for an even more productive and friendly environment!
In this post, we introduce a new Python package to generate clustergrams from clustering solutions. The library has been developed as part of the Urban Grammar project, and it is compatible with scikit-learn and GPU-enabled libraries such as cuML or cuDF within RAPIDS.AI.
When we want to do some cluster analysis to identify groups in our data, we often use algorithms like K-Means, which require the specification of a number of clusters. But the issue is that we usually don’t know how many clusters there are.
There are many methods to determine the correct number, like silhouette scores or the elbow plot, to name a few. But they usually don’t give much insight into what is happening between the different options, so the resulting numbers are a bit abstract.
Matthias Schonlau proposed another approach - a clustergram. A clustergram is a two-dimensional plot capturing the flows of observations between classes as you add more clusters. It tells you how your data reshuffle and how good your splits are. Tal Galili later implemented the clustergram for K-Means in R. I have taken Tal’s implementation, ported it to Python and created clustergram - a Python package to make clustergrams.
clustergram currently supports K-Means (including the Mini-Batch implementation) using scikit-learn and RAPIDS.AI cuML (if you have a CUDA-enabled GPU), Gaussian Mixture Models (scikit-learn only) and hierarchical clustering based on scipy.cluster.hierarchy. Alternatively, we can create a clustergram based on labels and data derived from custom clustering algorithms. It provides a sklearn-like API and plots the clustergram using matplotlib, which gives it a wide range of styling options to match your publication style.
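To make the underlying quantity concrete, here is a toy illustration of what a clustergram plots: for each number of clusters k, the cluster means that observations end up in. It uses a minimal pure-Python 1-D k-means with a deterministic initialisation; this is an illustration of the idea only, not the clustergram package’s actual implementation or API.

```python
# Toy 1-D k-means (Lloyd's algorithm) to illustrate the values a
# clustergram tracks as k grows. NOT the clustergram package's code.

def kmeans_1d(values, k, iters=50):
    """Cluster 1-D data into k groups; returns the cluster means."""
    values = sorted(values)
    # deterministic init: centres spread evenly across the sorted data
    centres = [values[i * (len(values) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        new = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new == centres:  # converged
            break
        centres = new
    return centres

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
for k in (1, 2, 3):
    print(k, [round(c, 2) for c in kmeans_1d(data, k)])
# 1 [11.33]
# 2 [6.5, 21.0]
# 3 [2.0, 11.0, 21.0]
```

A clustergram draws these means (weighted by cluster size) on the y-axis against k on the x-axis, connecting them by the observations that flow between clusters as k increases.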
On March 30th, Martin presented the current progress in the development
of Spatial Signatures at the Spatial Analytics + Data Seminar Series
organised by the University of Newcastle, the University of Bristol and
the Alan Turing Institute. Martin presented the basics of urban morphometrics,
showing examples of relevant research based on the momepy Python package to
provide a background for the second part of the talk focusing on Spatial
Signatures. The research was well received and initiated a great discussion,
which we hope will continue on some other platforms soon.
The recording is available on YouTube:
The slides used are here or, if you prefer it, here is the PDF file (27Mb).
On March 16th, Dani presented ongoing work on the development of Spatial
Signatures at the University of Glasgow’s Urban Big Data Centre webinar
series. This was the first time we took the project on tour and it was very
well received. With an audience that peaked at about 75 folks and many great
questions that extended the session well over one hour, we are super happy
with how the foundational ideas of the project were received.
The link to the webinar, for posterity, is here and
you can have a look at an edited version of the video at:
The slides used are here or, if you prefer it, here is the PDF file (15Mb).
Any work of the size of our Urban Grammar AI project has many outputs. All of them should ideally share the same design language, so once we combine them, they tell a coherent story. Therefore, we have defined a visual style applied to any graphical output we will produce.
We have started with a basic colourmap. A significant part of our work will result in categorical maps, so we need diverse colours. We have looked back at the excellent cartography our predecessors produced and found a gem. A study of wage and nationality in Chicago by Jane Addams and Florence Kelley from 1895 resulted in a series of beautiful maps like this one:
Hull House Maps and Papers (1895) by Jane Addams and Florence Kelley.
We based our primary colours on the six colours you can see on this map. This colourmap offers variety while retaining readability, the colours play nicely with each other and, importantly, they are colour-blind safe.
Primary colours derived from Addams and Kelley.
Have you ever needed to link two sources of data, each attached to a different geometry? In our work in WP2, we do. We have to transfer data from various sources, linked to output areas, urban blocks or other spatial units, to our own bespoke set of geographies. Therefore, we often need to do areal interpolation to correctly map data from one layer to another. Luckily, the open-source Python ecosystem can help.
Tobler, a part of the PySAL family, is a library for areal interpolation and dasymetric mapping which already offered what we needed. However, our data tend to be large, up to 15 million rows onto which we need to interpolate several hundred thousand rows of input data. That can take a while, so each performance improvement helps a lot.
We have looked into the existing code of tobler and contributed to the refactoring of its area_interpolate function. The original implementation used custom code for spatial indexing, which was replaced by a performant vectorised implementation based on the more recent pygeos project.
With sample data, we have been able to speed up the interpolation from 2.4 seconds to less than 400 milliseconds, getting the same result 6x faster.
Such an improvement is great, but the function still uses only a single core (as most geospatial code does, to be honest), leaving the rest of a modern machine (four or more cores is not uncommon these days…) lazily lying around. We have tried to change this and contributed a (still experimental) parallel implementation of the same algorithm (based on joblib).
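The area-weighted idea behind areal interpolation can be sketched with axis-aligned rectangles standing in for real polygons: each source value is distributed to target zones in proportion to the share of the source area they cover. The function names below are illustrative; this is a conceptual sketch, not tobler’s actual implementation.

```python
# Minimal sketch of area-weighted (areal) interpolation for an
# extensive variable (e.g. population counts), using rectangles
# (xmin, ymin, xmax, ymax) in place of real polygons.

def rect_intersection_area(a, b):
    """Overlap area of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def rect_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def area_interpolate(source, values, target):
    """Distribute each source value across target zones proportionally
    to the fraction of the source area each target zone covers."""
    out = [0.0] * len(target)
    for src, val in zip(source, values):
        for j, tgt in enumerate(target):
            share = rect_intersection_area(src, tgt) / rect_area(src)
            out[j] += val * share
    return out

# One source zone holding a population of 100, split evenly
# between two target zones:
source = [(0, 0, 10, 10)]
target = [(0, 0, 5, 10), (5, 0, 10, 10)]
print(area_interpolate(source, [100.0], target))  # [50.0, 50.0]
```

Real implementations replace the all-pairs loop with a spatial index so only overlapping pairs are tested, which is where the vectorised pygeos-based refactoring and the parallelisation pay off.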