Training Material for UN Open GIS OpenData
The following educational material has been drafted within the framework of the OSGeo UN Committee Educational Challenge - Open Geospatial Data and software for UN sustainable development goals. The overarching goal is to show that the combination of globally available open (geo)data and the significant developments in free and open source geospatial software is now sufficient to initiate geospatial analysis at worldwide level, at small and intermediate scales, to better understand our ecosystem. To that end, we have employed OSGeo software solutions to process global open geospatial datasets and compute one selected indicator for a sustainable development goal. The selected indicator is 9.1.1 Proportion of the rural population who live within 2 km of an all-season road (C0901010), which supports the target of developing quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all. The indicator was chosen after a close analysis of all SDGs and the corresponding indicators, so as to meet the following criteria:
- to have a spatial dimension;
- to not be an indicator that is already addressed through another initiative, such as the GEO Wetlands Initiative, WHO Interactive Air Pollution Maps, GEO AquaWatch, ESA CoastColour etc.;
- if possible, to not be yet the subject of a published methodology.
We have prepared this educational material for researchers, educators and professionals in local, regional, national or international agencies with minimal to intermediate knowledge of geospatial information. We assume that our audience already has basic knowledge of geospatial data structures and formats and has already used GIS software, and therefore has basic skills in and an understanding of working with geospatial and tabular data. For that reason, we have limited the interactions with the command line, although we have inserted references to it.
Data and software used
For the calculation of the SDG indicator, we have used only QGIS 3.4. We have also taken into consideration the changes that might occur from one version to another, and have thus focused on the functions used rather than on a step-by-step guide. The datasets used for our exercise are presented in the table in the section Preparing the geospatial data.
After going through the entire educational material, one will be able to:
- Have a broader view of the types of geospatial data that are open at global scale, as well as of their limitations.
- Have a deeper understanding of working with geospatial data using dedicated software.
- Have consistent knowledge of QGIS software fundamentals.
- Create cartographic representations of the obtained results.
Open geospatial information and its role in answering UN Sustainable Development Goals
Population dataset description
Global Administrative Units Dataset description
For road-related data, we have decided to use OpenStreetMap data, as it is the only homogeneously designed, globally available dataset. Without doubt, the amount and the quality of the available data can vary considerably between regions of the world. However, given the clear and consistent definition of each map element and tag, this exercise should be reproducible in any other part of the world.
Yet, given our area of interest, the Tabora county in Tanzania, we must take into consideration the tagging conventions developed specifically for Africa, namely the Highway Tag Africa - Topology of Road Network in African countries and, furthermore, the East Africa Tagging Guidelines.
However, with the global replicability of our educational material in mind, we will also insert specifications on a more general scale. Of course, it must be acknowledged that the workflow presented here could require other adjustments, depending on the specifics of the road dataset used in the calculation.
Preparing the geospatial data
For the scope of this exercise we have chosen the Tabora county of Tanzania. As we strive to create educational material that can be applied regardless of the region of interest, we decided to use datasets that are available at a global level.
The following table presents the datasets used:
|Topic||Name collection/dataset||Abstract||Indicators||Producer/collector||Owner||License||Type of data||Format||Scale/spatial resolution||Edition||CRS||Other URL|
|Administrative units||Database of Global Administrative Areas||GADM provides maps and spatial data for all countries and their sub-divisions.||administrative units||University of California, Berkeley, Museum of Vertebrate Zoology, and the International Rice Research Institute (Global Administrative Areas 2009)||GADM||The data are freely available for academic use and other non-commercial use. Redistribution or commercial use is not allowed without prior permission.||vector||GeoPackage, shapefile, geodatabase, KMZ, R formats||n/a||April 2018||Geographic WGS84||https://gadm.org/metadata.html|
|World Population||WorldPop||Alpha version 2010 and 2015 estimates of numbers of people per grid square, with national totals adjusted to match UN population division estimates (http://esa.un.org/wpp/) and remaining unadjusted.||Settlements, Population numbers, birth and pregnancy, age structures, poverty spatial distribution etc.||GeoData Institute, University of Southampton||GeoData Institute, University of Southampton||CC BY 4.0||raster||GeoTIFF||100 m||July 2013||Geographic WGS84||http://www.worldpop.org.uk/data/methods/|
|World Population||Global Rural-Urban Mapping project (GRUMP), v1||To provide a polygon representation of urban areas with city or agglomeration name and time series population estimates.||urban geometries||Socioeconomic Data and Applications Center (SEDAC)||Socioeconomic Data and Applications Center (SEDAC)||CC BY 4.0||vector||shapefile||30 arc-second||2006||Geographic WGS84||n/a|
|World Population||Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat, v1 (2010)||To provide high spatial resolution estimates of global urban extent derived from global 30m Landsat satellite data for the target year 2010 and a companion dataset to the Global Man-made Impervious Surface (GMIS) dataset.||urban extent||Socioeconomic Data and Applications Center (SEDAC)||Socioeconomic Data and Applications Center (SEDAC)||CC BY 4.0||raster||GeoTIFF||30 m||2017||Geographic WGS84, UTM||n/a|
Add all listed data to your project. Create group layers as you bring in the data, so that it is easier to navigate through all datasets once you start processing. Create and save your project, so you can pick up the work from where you left it.
[Layer]-->[Add layer] opens the Data Source Manager, which allows you to load the data: the administrative units (GADM dataset), the population numbers (WorldPop; we will use the TZA_popmap15adj_v2b.tif file) and the urban extent (we will use the global_urban_extent_polygons_v1.01.shp file).
As mentioned, we will use OpenStreetMap data for the road geometry and condition. Bringing in OSM data will require you to install a new plugin, OSM Downloader.
[Plugins]-->[Manage and install plugins].
The datasets come in different coordinate reference systems, either the geographic EPSG:4326 or the Pseudo-Mercator EPSG:3857. A geographic coordinate system is based on a spheroid and uses angular units (degrees). Thus, when using the QGIS calculator on such data, for example, it returns values in decimal degrees and not in meters. You can see the units used in a projection's description, which can be retrieved from epsg.io.
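To illustrate why angular units are unsuitable for a fixed distance such as the 2 km threshold of the indicator, the short sketch below estimates how many degrees of longitude correspond to 2 km at different latitudes. It is only an approximation: it assumes a spherical Earth with a mean radius of 6,371,008.8 m, and the function names are illustrative rather than part of QGIS.

```python
import math

# Assumption: spherical Earth with this mean radius (a simplification
# of the spheroid that a real geographic CRS is based on).
EARTH_RADIUS_M = 6_371_008.8


def metres_per_degree_longitude(latitude_deg: float) -> float:
    """Approximate ground length of one degree of longitude at a latitude."""
    one_degree_at_equator = math.pi / 180.0 * EARTH_RADIUS_M
    return one_degree_at_equator * math.cos(math.radians(latitude_deg))


def km_to_degrees_longitude(distance_km: float, latitude_deg: float) -> float:
    """Degrees of longitude spanned by distance_km along a parallel."""
    return distance_km * 1000.0 / metres_per_degree_longitude(latitude_deg)


# The same 2 km corresponds to a different number of degrees
# depending on the latitude, so a constant value in degrees cannot
# stand in for a distance in meters.
for latitude in (0.0, -5.0, 60.0):
    print(latitude, km_to_degrees_longitude(2.0, latitude))
```

Because the conversion depends on latitude, distances and buffers are better computed in a projected coordinate system with linear units.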
As we will work with road geometries, we must reproject all the datasets into a projected coordinate system, which is based on a 2D plane (with the spheroid projected onto it) and uses linear units, such as meters. For our study, we need to identify a suitable CRS for our region of interest, the Tabora county in Tanzania. To do that, we will use epsg.io. After a quick search, we find WGS 84 / UTM zone 36S (EPSG:32736) to be appropriate for our region.
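The lookup on epsg.io can be cross-checked with a small calculation, since the WGS 84 / UTM codes follow a regular pattern (326xx for the northern hemisphere, 327xx for the southern one, where xx is the zone number). The sketch below is a minimal version that ignores the special zones around Norway and Svalbard; the function name and the approximate coordinates used for Tabora (about 32.8°E, 5.0°S) are assumptions made for illustration.

```python
import math


def utm_epsg(longitude_deg: float, latitude_deg: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing a point.

    Simplified sketch: the irregular zones around Norway and Svalbard
    are not handled.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if not -80.0 <= latitude_deg <= 84.0:
        raise ValueError("UTM is defined between 80 S and 84 N")
    # Zones are 6 degrees wide and numbered 1..60 eastwards from 180 W.
    zone = min(int(math.floor((longitude_deg + 180.0) / 6.0)) + 1, 60)
    base = 32600 if latitude_deg >= 0 else 32700
    return base + zone


# Approximate location of Tabora (assumed coordinates).
print(utm_epsg(32.8, -5.0))
```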
To reproject vector data using QGIS, we have to save the file with the desired projection.
Right-click on the vector layer you want to reproject and choose [Export]-->[Save Features As...]
For raster datasets, we will use gdalwarp, which is available as a processing tool in the Processing toolbox.
[Processing]-->[Toolbox] We can search by typing the keyword 'reproject' in the search bar.
Fig. 1 Reproject tool for a raster dataset
Then, we will cut all layers by the boundary of the selected county, Tabora.
First, export Tabora county from the level 1 administrative units. Second, clip all layers by its geometry.
Fig. 2 Clip tool identified in the Processing toolbox
Fig. 3 The gdalwarp utility allows cutting a raster by a vector extent
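For readers who prefer the command line, the reprojection and clipping steps can also be expressed with GDAL's ogr2ogr and gdalwarp utilities. The sketch below only assembles and prints the commands; it does not run them. It assumes that GDAL is installed, and every file name except TZA_popmap15adj_v2b.tif is hypothetical.

```python
import shlex

TARGET_CRS = "EPSG:32736"  # WGS 84 / UTM zone 36S, as chosen above


def ogr2ogr_reproject(src, dst, target_crs=TARGET_CRS):
    """Command that reprojects a vector dataset (ogr2ogr takes dst before src)."""
    return ["ogr2ogr", "-t_srs", target_crs, dst, src]


def gdalwarp_reproject(src, dst, target_crs=TARGET_CRS):
    """Command that reprojects a raster dataset (gdalwarp takes src before dst)."""
    return ["gdalwarp", "-t_srs", target_crs, src, dst]


def ogr2ogr_clip(src, dst, clip_polygons):
    """Command that clips a vector dataset by the polygons of another dataset."""
    return ["ogr2ogr", "-clipsrc", clip_polygons, dst, src]


def gdalwarp_clip(src, dst, cutline):
    """Command that cuts a raster to the outline of a vector dataset."""
    return ["gdalwarp", "-cutline", cutline, "-crop_to_cutline", src, dst]


commands = [
    # Hypothetical file names, except for the WorldPop raster.
    ogr2ogr_reproject("admin_level1.gpkg", "admin_level1_utm.gpkg"),
    gdalwarp_reproject("TZA_popmap15adj_v2b.tif", "population_utm.tif"),
    ogr2ogr_clip("roads_utm.gpkg", "roads_tabora.gpkg", "tabora_utm.gpkg"),
    gdalwarp_clip("population_utm.tif", "population_tabora.tif", "tabora_utm.gpkg"),
]
for command in commands:
    print(shlex.join(command))  # to execute instead: subprocess.run(command, check=True)
```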
Step 3 produces the rural areas of the Tabora county. According to Wikipedia, Tanzania is divided into regions (GADM administrative level 1), districts (GADM administrative level 2) and wards (GADM administrative level 3).
We will calculate the Rural Access Index on wards; thus, we will extract the rural areas from the level 3 administrative units by removing the urban extent from them. The resulting dataset will be of vector type. [Vector]-->[Geoprocessing tools]-->[Difference]
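For orientation, the quantity that the rural ward polygons eventually feed into can be written as a small function. This is a minimal sketch that follows directly from the wording of indicator 9.1.1 (the share of the rural population living within 2 km of an all-season road); the function and parameter names are illustrative.

```python
def rural_access_index(rural_pop_within_2km: float, rural_pop_total: float) -> float:
    """Percentage of the rural population living within 2 km of an all-season road."""
    if rural_pop_total <= 0:
        raise ValueError("total rural population must be positive")
    if not 0 <= rural_pop_within_2km <= rural_pop_total:
        raise ValueError("population within 2 km must lie between 0 and the total")
    return 100.0 * rural_pop_within_2km / rural_pop_total


# Hypothetical ward: 1,500 of 6,000 rural inhabitants live within 2 km of a road.
print(rural_access_index(1500, 6000))
```

Both inputs are per-ward sums, which is why the rural extent of each ward is needed before the population can be extracted.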
Step 4 prepares the dataset from which we will extract the population number for all rural areas of each administrative unit. We will use zonal statistics, a tool now available in the Processing Toolbox.
Fig. 4 Zonal statistics run on the administrative units (rural extent)
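Conceptually, the zonal statistics step sums the population raster cells that fall inside each zone. The sketch below illustrates the idea on a tiny grid. It is a simplification: the zones are given as a grid of ward labels rather than as polygons, and the values, labels and nodata marker are made up for the example.

```python
from collections import defaultdict


def zonal_sum(values, zones, nodata=None):
    """Sum raster cell values per zone label.

    values and zones are equally sized 2D lists; cells whose zone is
    None (for example removed urban areas) or whose value equals the
    nodata marker are skipped.
    """
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue
            totals[zone] += value
    return dict(totals)


# Made-up population counts per cell; -1.0 marks nodata.
population = [
    [10.0, 5.0, 50.0],
    [2.0, -1.0, 7.0],
    [4.0, 1.0, 3.0],
]
# Made-up ward labels; None marks a cell outside the rural extent.
wards = [
    ["A", "A", None],
    ["A", "A", "B"],
    ["C", "C", "B"],
]

print(zonal_sum(population, wards, nodata=-1.0))
```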
Bring in the roads!
Step 5 is the most time-consuming processing stage and, moreover, it may vary when this exercise is applied to other regions of the world. For the road geometry and condition we will use the available OpenStreetMap data.
As we will see, Tanzania is very well represented on the OSM map, even though we will encounter various issues, mainly regarding the quality of network connectivity, an important aspect for our study. The good coverage is due to Crowd2Map Tanzania, which was launched in 2015 and was followed by significant crowd mapping campaigns over the following years.
After steps 1 and 2, the road dataset should be imported into QGIS, clipped, reprojected to EPSG:32736 and saved as a GeoPackage file.
Next, we will do a preliminary cleaning by eliminating all road segments that are not suitable for cars. The SDG indicator 9.1.1 Proportion of the rural population who live within 2 km of an all-season road refers to roads that are suitable for any kind of vehicle (average modern automobile); thus, we will select and remove the segments that match: "highway" = 'cycleway' or "highway" = 'pedestrian' or "highway" = 'path' or "highway" = 'footway' or "highway" = 'residential' or "highway" = 'service'. We have also deleted roads under construction, because such roads cannot be used for access.
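The cleaning rule can be sketched outside QGIS as follows. The example builds an equivalent, more compact selection expression using the IN operator and applies the same rule to a few made-up road records. Identifying roads under construction by the tag highway=construction follows common OpenStreetMap tagging and is an assumption here, since the text does not state how these roads were identified.

```python
# Values of the "highway" tag that are removed in the cleaning step.
EXCLUDED_HIGHWAY_VALUES = (
    "cycleway", "pedestrian", "path", "footway", "residential", "service",
)


def exclusion_expression(values=EXCLUDED_HIGHWAY_VALUES, field="highway"):
    """Build a selection expression equivalent to the chain of 'or' tests."""
    quoted = ", ".join("'{}'".format(value) for value in values)
    return '"{}" IN ({})'.format(field, quoted)


def keep_drivable(roads, excluded=EXCLUDED_HIGHWAY_VALUES):
    """Keep the road records that survive the cleaning step."""
    return [
        road for road in roads
        if road.get("highway") not in excluded
        # Assumption: roads under construction carry highway=construction.
        and road.get("highway") != "construction"
    ]


# Made-up records standing in for OSM road features.
sample_roads = [
    {"id": 1, "highway": "trunk"},
    {"id": 2, "highway": "footway"},
    {"id": 3, "highway": "construction"},
    {"id": 4, "highway": "unclassified"},
]

print(exclusion_expression())
print([road["id"] for road in keep_drivable(sample_roads)])
```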
- EPSG.io is an open-source web service with a database of the coordinate systems used in maps worldwide. It allows the discovery of coordinate reference systems used for creating maps and geodata and for identifying geo-positions.
- CRS stands for Coordinate Reference System