Spatial open data is a powerful resource for understanding and improving the environments in which people live and work. It reveals patterns, disparities, and relationships across geographic space, enabling evidence-based decisions in areas such as public health, urban planning, environmental management, economic development, and social services. When made openly available, this data can support transparency and foster accountability by allowing stakeholders to monitor developments, assess needs, and inform more inclusive policies. The ability to extract and use this data empowers a wide range of users—from government agencies and researchers to civil society organizations and citizens—to analyze spatial dynamics, generating deeper insights into complex societal challenges and opportunities.
Extracting open-source spatial data is rarely a straightforward task, due to the diversity of data types, formats, and platforms involved. Spatial data can include vector and raster formats, static and dynamic layers, and a wide range of thematic content, from land use and transportation networks to population distributions and environmental indicators. These datasets are hosted across various platforms, each with its own access protocols, licensing terms, and metadata standards. Users must navigate differences in spatial resolution, coordinate systems, and data structures, which can complicate integration and analysis. As a result, effective use of open spatial data requires not only technical proficiency but also a clear understanding of the data landscape and its limitations.
This document provides comprehensive guidelines for collecting open spatial data to support evidence-based decision-making across various sectors and applications. It outlines standardized methodologies, quality assurance processes, and technical specifications essential for gathering reliable, consistent spatial data. These guidelines are designed to ensure that data collection efforts yield high-quality, interoperable datasets that can be effectively integrated and analyzed to address diverse analytical needs.
The guidelines serve multiple audiences, including government agencies, international development organizations, research institutions, private sector entities, and civil society organizations. By standardizing spatial data collection practices, these guidelines enable more efficient resource utilization, reduce duplication of efforts, and facilitate data sharing and collaboration across institutional boundaries.
Open data serves as a fundamental catalyst for sustainable development and innovation. By making spatial information accessible, verifiable, and usable, open data illuminates previously invisible patterns and relationships that can inform better policies and interventions. Whether analyzing access to healthcare facilities, mapping environmental hazards, understanding urban growth patterns, or assessing infrastructure needs, standardized spatial data collection enables more precise and effective responses to societal challenges.
The principles and practices outlined in this document directly support multiple Sustainable Development Goals (SDGs). High-quality spatial data underpins efforts to reduce inequalities (SDG 10), build sustainable cities and communities (SDG 11), take climate action (SDG 13), and strengthen institutions (SDG 16). Moreover, the open data approach itself embodies the collaborative spirit of SDG 17 (Partnerships for the Goals), facilitating knowledge sharing and capacity building across regions and sectors.
Spatial data collection represents a unique intersection of geographic information science, domain expertise, and technological capability. The complexity of modern spatial analysis requires diverse data types from multiple sources, collected and processed using standardized methods that ensure compatibility and reliability. By following these protocols, stakeholders can build robust spatial data infrastructures that support evidence-based decision-making, enable monitoring and evaluation of interventions, and ultimately contribute to more equitable and sustainable development outcomes.
Open geospatial data is governed by a set of principles and standards designed to ensure that data is accessible, usable, and interoperable across platforms and applications. These principles guide how spatial data should be discovered, accessed, and utilized to maximize its value for analysis and decision-making.
The FAIR principles provide a framework for ensuring that spatial data can be effectively discovered and used. Here's how they apply specifically to geospatial information:
FAIR Principle | What It Means | Spatial Data Requirements | Practical Example |
---|---|---|---|
Findable | Data can be easily discovered by humans and machines | • Include coordinate reference system (CRS) in metadata • Document geographic extent (bounding box) • Specify spatial resolution/scale • Use standardized geographic keywords | A dataset of health facilities includes metadata showing it covers Kenya, uses WGS84 coordinates, and contains point locations at 1:50,000 scale |
Accessible | Data can be retrieved using standard protocols | • Provide data through OGC web services (WFS, WMS, WCS) • Offer downloads in common GIS formats • Include clear access instructions • Document any access restrictions | Users can download road network data as Shapefile or GeoJSON, or access it via a Web Feature Service with documented endpoints |
Interoperable | Data works well with other datasets | • Use standard coordinate systems • Apply consistent boundary definitions • Follow established feature classifications • Document data structure clearly | Administrative boundaries use standard ISO codes and match official national boundary files, enabling easy joining with census data |
Reusable | Data can be used for different purposes | • Include spatial accuracy information • Document collection methods and date • Specify appropriate usage scales • Provide clear licensing terms | Satellite-derived land cover includes accuracy assessment, collection date, recommended zoom levels, and CC-BY license |
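As a quick illustration of these checks, the sketch below uses GeoPandas (Python) to inspect the core spatial metadata that makes a dataset findable and reusable. The file name is hypothetical; any vector dataset works.

```python
import geopandas as gpd

gdf = gpd.read_file("health_facilities.geojson")  # hypothetical file

print(gdf.crs)                 # coordinate reference system, e.g. EPSG:4326
print(gdf.total_bounds)        # geographic extent: [minx, miny, maxx, maxy]
print(gdf.geom_type.unique())  # feature types, e.g. ['Point']
print(list(gdf.columns))       # attribute fields available for reuse
```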
The Open Data Charter principles take on specific meanings when applied to spatial data:
Principle | Core Concept | Spatial Data Application | Implementation Tips |
---|---|---|---|
Open by Default | Data should be open unless there's a good reason not to be | • Publish all non-sensitive geographic data • Aggregate sensitive locations appropriately • Default to open licenses for government spatial data | Aggregate individual addresses to neighborhood level; publish infrastructure locations openly |
Timely and Comprehensive | Current and complete coverage | • Cover entire geographic areas without gaps • Update regularly based on change frequency • Include all relevant features, not just urban areas | Ensure rural areas are mapped; update annually for slow-changing features, monthly for dynamic data |
Accessible and Usable | Easy to access and work with | • Provide in standard GIS formats (Shapefile, GeoJSON, GeoTIFF) • Include in both proprietary and open-source compatible formats • Offer different scales/resolutions for different uses | Offer simplified versions for web visualization and detailed versions for analysis |
Comparable and Interoperable | Can be compared and combined | • Use consistent spatial units across datasets • Standardize geographic classifications • Align to common boundary files | All datasets use the same district boundaries; coding schemes match national standards |
For Improved Governance & Citizen Engagement | Support better decisions and participation | • Prioritize data revealing service gaps • Enable spatial analysis of inequalities • Support evidence-based planning | Publish facility locations to show underserved areas; enable distance-to-service analysis |
For Inclusive Development and Innovation | Enable broad usage and new applications | • Remove technical barriers to access • Provide examples and documentation • Support diverse use cases | Include tutorials; provide API access; offer multiple formats |
When discovering and using spatial open data, understanding quality standards helps assess fitness for purpose:
Quality Dimension | What to Check | Why It Matters | Red Flags |
---|---|---|---|
Positional Accuracy | How precisely features are located | Determines suitable analysis scale | No accuracy statement; obvious misalignments |
Attribute Completeness | Whether all fields contain data | Missing data limits analysis options | Many null values; undocumented codes |
Temporal Currency | How recent the data is | Affects relevance for current decisions | No collection date; data >5 years old for dynamic features |
Logical Consistency | Whether data follows its own rules | Indicates data reliability | Overlapping polygons; disconnected road networks |
Lineage | How data was created/processed | Helps assess appropriateness | No methodology documentation; unclear sources |
Comprehensive metadata is essential for understanding and properly using spatial data:
Metadata Element | Description | Why It's Important | Minimum Requirement |
---|---|---|---|
Geographic Extent | Bounding coordinates or area covered | Determines if data covers area of interest | Bounding box coordinates or place names |
Coordinate System | CRS/projection information | Required for accurate overlay with other data | EPSG code or full CRS definition |
Data Structure | Feature types and attributes | Understand what's included and how it's organized | List of layers/tables and key fields |
Collection Method | How data was gathered | Assess reliability and limitations | Basic method (survey, satellite, etc.) |
Update Frequency | How often data is refreshed | Plan for data currency needs | Statement of update schedule or "static" |
Access Information | How to obtain the data | Enable data retrieval | Download URL or service endpoint |
Use Constraints | Any limitations on use | Ensure compliance and appropriate use | License type or "no restrictions" |
This chapter outlines comprehensive approaches for extracting open spatial data, focusing on methodologies that ensure high-quality, standardized geographic information for various analytical purposes. The extraction process encompasses discovering, accessing, evaluating, and acquiring spatial data from diverse sources while navigating technical and administrative challenges.
Effective spatial analysis requires integrating data from multiple sources to create a comprehensive geographic understanding of the phenomena being studied. Each source type offers unique advantages and challenges in terms of coverage, quality, accessibility, and update frequency.
Official statistics from government sources provide authoritative data with defined administrative boundaries and standardized collection methodologies. However, accessing this data often involves navigating significant administrative and technical challenges.
Key sources include:
- National statistical offices
- Census bureaus
- Sectoral ministries (health, education, infrastructure, transportation, agriculture)
- Regional and local government agencies
- National mapping and cadastral agencies
- Environmental protection agencies
- Electoral commissions
Best practices:
- Obtain data at the smallest available administrative level (ideally district or sub-district)
- Document official geographic boundary definitions used by the source
- Verify the coordinate reference system employed
- Check update frequency and most recent collection date

Figure xx. Diagram showing typical government data flow from collection to publication, with emphasis on points where spatial references are added
Examples of official statistics with spatial components:
Data Type | Spatial Resolution | Typical Update Frequency | Common Spatial Identifier |
---|---|---|---|
Census data | Enumeration area | 5-10 years | Census block ID |
Labor force surveys | Administrative district | 1-2 years | District code |
Infrastructure registries | Exact coordinates | Variable | Latitude/longitude |
Service location directories | Address or coordinates | Annual | Facility ID with coordinates |
Environmental monitoring | Station points | Real-time to monthly | Station coordinates |
Land use/zoning | Parcel level | Quarterly to annual | Parcel ID with geometry |
Health statistics | Health district | Monthly to annual | Health facility code |
Education facilities | School location | Annual | School ID with coordinates |
Access Challenges and Navigation Strategies:
Government sources present distinctive challenges for spatial data extraction. Accessing such data often requires navigating a range of administrative and procedural hurdles:
Challenge Category | Specific Issues | Impact on Data Extraction | Mitigation Strategies |
---|---|---|---|
Access Requirements | • User registration with email verification • Official institutional credentials • Formal request letters • Proof of research purpose • Government-to-government agreements | Delays project timelines; may exclude independent researchers | • Start registration early • Partner with recognized institutions • Prepare documentation templates • Build relationships with data officers |
Technical Barriers | • Platform-specific software requirements • Limited API access or rate limits • CAPTCHA systems preventing automation • Session timeouts during large downloads • Browser-specific compatibility issues | Increases technical complexity; requires manual intervention | • Test platform requirements • Develop workarounds • Use official tools when required • Plan for manual processes |
Format Inconsistencies | • Data in non-machine readable PDFs • Scanned documents requiring OCR • Interactive dashboards with no export • Mixed formats across regions • Proprietary database formats | Significant processing overhead; potential data loss | • Budget extraction time • Acquire necessary tools • Develop conversion pipelines • Document all transformations |
Data Fragmentation | • Separate portals for each ministry • Different systems for each admin level • Historical data in different locations • Spatial and attribute data separated • No unified metadata catalog | Complicates comprehensive collection; increases integration effort | • Map all relevant portals • Create source inventory • Develop integration plan • Track data lineage |
Practical Example: Extracting Health Facility Data
Scenario: Accessing public health facility locations from government sources
Typical Process:
1. Ministry website → Register account (2-3 days approval)
2. Navigate to data section → Find only PDF reports
3. Request spatial data → Directed to different department
4. Submit formal request → Wait 2-4 weeks
5. Receive data in mixed formats:
   - Capital region: Shapefile with coordinates
   - Other regions: Excel with addresses only
   - Rural areas: PDF lists with village names
6. Integration required: Geocoding, digitization, harmonization
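The integration step can be partly scripted. The sketch below, with hypothetical file and column names, loads the region that already has geometry and geocodes the address-only records. It is a minimal sketch, not a production pipeline; note that the public Nominatim service enforces strict usage limits, hence the rate limiter.

```python
import geopandas as gpd
import pandas as pd
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from shapely.geometry import Point

# Region with coordinates: load the shapefile directly
capital = gpd.read_file("capital_facilities.shp").to_crs("EPSG:4326")

# Region with addresses only: geocode each record (rate-limited)
other = pd.read_excel("other_regions.xlsx")  # hypothetical file
geocode = RateLimiter(
    Nominatim(user_agent="facility-harmonization-demo").geocode,
    min_delay_seconds=1,
)
other["location"] = other["address"].apply(geocode)  # "address" is assumed
other = other.dropna(subset=["location"])            # drop failed lookups
other["geometry"] = other["location"].apply(
    lambda loc: Point(loc.longitude, loc.latitude)
)
other_gdf = gpd.GeoDataFrame(
    other.drop(columns="location"), geometry="geometry", crs="EPSG:4326"
)

# Combine into one harmonized layer in a common CRS
combined = pd.concat([capital, other_gdf], ignore_index=True)
```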
Best practices for navigating government data systems:
- Pre-extraction Assessment:
  - Survey all potential government sources
  - Document access procedures for each
  - Identify required credentials or permissions
  - Note available formats and download options
  - Check for usage restrictions or licenses
- Common Spatial Data Types and Typical Access Patterns:
Data Type | Typical Provider | Common Formats | Spatial Reference | Access Complexity |
---|---|---|---|---|
Census boundaries | Statistics office | Shapefile, KML, PDF maps | Usually included | Medium - often requires registration |
Demographic data | Census bureau | CSV, Excel, PDF tables | Admin codes only | High - may need special approval |
Facility locations | Sectoral ministries | Excel, PDF, Web maps | Mixed quality | High - often fragmented |
Infrastructure networks | Transport ministry | CAD files, PDFs | Variable | Very high - technical barriers |
Land records | Cadastral agency | Proprietary GIS, PDF | Good quality | High - restricted access |
- Documentation Throughout the Process:
  - Screenshot access procedures
  - Save all correspondence
  - Record exact download dates and URLs
  - Note any data transformations required
  - Maintain version control
  - Document all license terms
International organizations maintain standardized spatial datasets that enable cross-country comparison and provide data where national statistics may be unavailable. Where possible, users should consult existing manuals and official user guides that detail extraction methods for these sources. These resources provide valuable dataset-specific instructions, metadata documentation, and appropriate extraction techniques tailored to each platform's unique structure and access protocols.
Key spatial data sources:
Data Source | Description | Spatial Data Types | Common Applications | User Guide/Documentation |
---|---|---|---|---|
World Bank Open Data https://data.worldbank.org/ | Development indicators and statistics for countries worldwide | Country-level data with some subnational coverage | Economic analysis, poverty mapping, development planning | World Bank Data Help Desk |
UN Data Portal https://data.un.org/ | Official UN statistics across multiple domains | National and some regional statistics | SDG monitoring, demographic analysis, social indicators | UNdata User Guide |
Humanitarian Data Exchange (HDX) https://data.humdata.org/ | Crisis and humanitarian data from multiple organizations | Administrative boundaries, infrastructure, population | Emergency response, vulnerability assessment, humanitarian planning | HDX Quick Start Guide |
Natural Earth Data https://www.naturalearthdata.com/ | Public domain map datasets | Global coverage at multiple scales (1:10m, 1:50m, 1:110m) | Base mapping, cartography, reference layers | Natural Earth Quick Start |
OpenStreetMap Data Extracts https://download.geofabrik.de/ | Crowdsourced geographic data | Roads, buildings, POIs, land use | Infrastructure analysis, accessibility studies, urban planning | OSM Data User Guide |
FAO GeoNetwork https://www.fao.org/geonetwork/ | Agricultural and environmental data | Land use, soil, climate, agricultural statistics | Food security, agricultural planning, environmental assessment | FAO GeoNetwork Manual |
NASA Earthdata https://earthdata.nasa.gov/ | Satellite observations and derived products | Remote sensing imagery, climate data, land cover | Environmental monitoring, climate analysis, disaster response | Earthdata User Guide |
SEDAC (Socioeconomic Data) https://sedac.ciesin.columbia.edu/ | Population, sustainability, and environmental data | Gridded population, environmental hazards | Population distribution, risk assessment, urban studies | SEDAC User Guide |
GADM Database https://gadm.org/ | Administrative boundaries worldwide | Administrative boundaries at all levels | Spatial analysis framework, administrative mapping | Documentation on website |
Global Forest Watch https://www.globalforestwatch.org/ | Forest monitoring and land use change | Forest cover, deforestation alerts, land use | Environmental monitoring, conservation planning | GFW How-To Guide |
WHO Global Health Observatory https://www.who.int/data/gho | Health statistics and information | Health facility locations, disease data | Health planning, epidemiology, service accessibility | GHO Data Portal Guide |
WorldPop https://www.worldpop.org/ | High-resolution population data | Gridded population distributions, demographics | Population analysis, service planning, accessibility studies | WorldPop Data Access Guide |
UNEP Environmental Data https://wesr.unep.org/ | Environmental statistics and indicators | Environmental quality, natural resources | Environmental assessment, policy planning | Platform-specific guides available |
ILO Statistics https://ilostat.ilo.org/ | Labor and employment data | National and regional employment statistics | Labor market analysis, economic planning | ILOSTAT User Guide |
OECD Data https://data.oecd.org/ | Economic and social statistics | National and regional indicators | Economic analysis, policy comparison | OECD.Stat User Guide |
Best practices:
- Consult platform-specific user guides and API documentation before beginning data extraction
- Verify the spatial harmonization methods used for cross-country datasets
- Document vintage of both data and geographic boundaries
- Check for post-collection spatial adjustments or transformations
- Assess geographic completeness, particularly for small nations, territories, or remote regions
- Identify the methodology used for spatial disaggregation of national statistics
- Review data licenses and citation requirements for each source
- Use bulk download options or APIs for large-scale data extraction when available
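As an example of the API route, the World Bank API (documented via the Data Help Desk) supports scripted extraction. A minimal Python sketch using the requests library, querying the published total-population indicator SP.POP.TOTL for Kenya:

```python
import requests

# World Bank API v2: responses are a two-element list [paging metadata, records]
url = "https://api.worldbank.org/v2/country/KEN/indicator/SP.POP.TOTL"
resp = requests.get(url, params={"format": "json", "per_page": 100})
resp.raise_for_status()

meta, records = resp.json()
for rec in records:
    print(rec["date"], rec["value"])  # year and population value
```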
Participatory mapping and volunteered geographic information can fill critical gaps in official datasets, particularly for rapidly changing environments, areas with limited official coverage, or locally-specific features that may not appear in conventional datasets.
Understanding extraction pathways:
There are several ways to extract crowdsourced spatial data, each suited to different needs and technical capacities:
Extraction Method | Technical Level | Best For | Advantages | Limitations |
---|---|---|---|---|
Direct Downloads | Beginner | Complete datasets for specific regions | • Pre-processed files • Multiple formats available • No API knowledge needed | • Large file sizes • May include unnecessary data • Requires local processing |
GIS Software Plugins | Intermediate | Specific features or small areas | • Query only needed data • Direct integration with analysis • Real-time access | • Requires GIS software • API limits may apply • Internet connection needed |
APIs and Web Services | Advanced | Automated workflows, large-scale analysis | • Programmatic access • Always current data • Efficient for specific queries | • Programming skills required • Rate limits • Complex authentication |
Curated Platforms | Beginner | Analysis-ready datasets | • Pre-cleaned data • Includes metadata • Quality controlled | • May not be fully current • Limited to available extracts • Less customizable |
Key platforms and extraction methods:
Platform | Description | Extraction Methods | Documentation |
---|---|---|---|
OpenStreetMap (OSM) | Global crowdsourced map data | • Geofabrik downloads • Overpass API • Planet OSM files • QuickOSM (QGIS plugin) | OSM Wiki - Downloading Data |
Geofabrik Downloads | Pre-extracted OSM data by region | • Direct downloads (PBF, Shapefile) • Updated daily | Geofabrik Download Server |
HOT Export Tool | Custom OSM extracts with thematic filtering | • Web interface • Scheduled exports • Multiple formats | HOT Export Tool Documentation |
Overpass Turbo | Query-based OSM data extraction | • Web interface • Custom queries • API access | Overpass API User's Manual |
Mapillary | Street-level imagery and derived data | • API access • Web downloads • Developer tools | Mapillary Developer Guide |
Local Ground | Community mapping platform | • Project-based exports • API access | Platform documentation |
Ushahidi | Crisis mapping and crowdsourcing | • Platform exports • API access | Ushahidi Developer Documentation |
Practical extraction example using different methods:
Task: Extract all health facilities in a district
Method 1 - Direct Download (Beginner):
- Visit Geofabrik.de → Select country → Download shapefile
- Load in GIS software → Filter by amenity=hospital/clinic
- Clip to district boundary
Method 2 - QGIS Plugin (Intermediate):
- Open QGIS → Install QuickOSM plugin
- Query: amenity=hospital OR amenity=clinic
- Set district as extent → Run query
Method 3 - Overpass API (Advanced):
[out:json];
area["name"="District Name"]->.searchArea;
(
  node["amenity"~"hospital|clinic"](area.searchArea);
  way["amenity"~"hospital|clinic"](area.searchArea);
);
out body;
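The same query can be run from Python against the public Overpass endpoint. A minimal sketch (mind the endpoint's fair-use limits; "District Name" remains a placeholder, and `out center;` is used so that ways come back with a single center coordinate):

```python
import requests

query = """
[out:json];
area["name"="District Name"]->.searchArea;
(
  node["amenity"~"hospital|clinic"](area.searchArea);
  way["amenity"~"hospital|clinic"](area.searchArea);
);
out center;
"""
resp = requests.post(
    "https://overpass-api.de/api/interpreter", data={"data": query}
)
resp.raise_for_status()

elements = resp.json()["elements"]
print(f"Found {len(elements)} health facility features")
```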
Quality considerations for crowdsourced data:
Quality Aspect | What to Check | Validation Methods |
---|---|---|
Completeness | Coverage gaps, especially in rural areas | Compare with official registries, satellite imagery |
Currency | Last edit dates, mapper activity | Check OSM metadata, changeset history |
Accuracy | Positional accuracy, attribute correctness | Ground truthing, cross-reference with other sources |
Consistency | Tagging variations, naming conventions | Data cleaning scripts, standardization tools |
Best practices:
- Choose the extraction method that matches your technical skills and data needs
- Always check data currency using platform metadata or changeset information
- Implement quality assurance protocols for volunteered geographic information
- Document the extraction date, method, and any filters applied
- Validate critical features against authoritative sources when available
- Consider combining multiple crowdsourced platforms for better coverage
- Review platform-specific tagging guides to understand data structure
- Use established data models and schemas where available (e.g., OSM tagging conventions)
Earth observation and modeling techniques provide consistent spatial coverage across large areas, enabling analysis of physical features, environmental conditions, population distributions, and change over time. This section covers both direct satellite observations and derived/modeled products that combine satellite data with other inputs.
Understanding extraction pathways for satellite data:
Extraction Method | Technical Level | Best For | Infrastructure Needs | Documentation |
---|---|---|---|---|
Direct Download | Beginner-Intermediate | Small areas, specific dates | High bandwidth, storage | USGS EarthExplorer Guide |
Cloud Platforms | Intermediate-Advanced | Large-scale analysis, time series | Internet connection, no storage | Google Earth Engine Guides |
Desktop Plugins | Intermediate | Specific imagery, preprocessing | Local processing power | QGIS SCP Tutorial |
APIs/Web Services | Advanced | Automated workflows | Programming skills | Platform-specific API docs |
Data Cubes | Advanced | National-scale analysis | Significant infrastructure | Open Data Cube Manual |
A. Raw Satellite Data Sources:
Satellite/Sensor | Spatial Resolution | Temporal Resolution | Key Applications | Access Methods | Documentation |
---|---|---|---|---|---|
Sentinel-2 | 10-60m | 5 days | Land cover, vegetation, water | Copernicus Hub, GEE, AWS | Sentinel Online User Guide |
Landsat 8/9 | 15-30m | 16 days | Long-term change, thermal | USGS, GEE, AWS | Landsat User Guide |
Sentinel-1 | 5-40m | 6-12 days | Flood mapping, deformation | Copernicus Hub, GEE | Sentinel-1 User Guide |
MODIS | 250m-1km | Daily | Fire, temperature, vegetation | NASA Earthdata, GEE | MODIS Data User Guide |
Planet | 3-5m | Daily | High-res monitoring | Commercial API | Planet Developer Center |
VIIRS | 375-750m | Daily | Nighttime lights, fires | NOAA, NASA | VIIRS User Guide |
B. Derived Satellite Products:
Product Category | Examples | Resolution | Update Frequency | Access Platform | Use Cases |
---|---|---|---|---|---|
Land Cover/Use | • ESA WorldCover • Dynamic World • MODIS Land Cover | 10m-500m | Annual to real-time | GEE, Copernicus | Habitat mapping, urban growth, agricultural monitoring |
Nighttime Lights | • VIIRS DNB • DMSP-OLS (historical) | 500m-1km | Monthly | NOAA, GEE | Economic activity, electrification, urban extent |
Vegetation Indices | • MODIS NDVI/EVI • Sentinel-2 vegetation | 10m-1km | 5-16 days | GEE, NASA | Agricultural monitoring, drought assessment, phenology |
Water/Flood | • Global Surface Water • Sentinel-1 flood maps | 10-30m | Event-based to annual | GEE, Copernicus EMS | Flood risk, water resources, disaster response |
Elevation | • SRTM • ASTER GDEM • Copernicus DEM | 30-90m | Static | USGS, Copernicus | Terrain analysis, watershed modeling, accessibility |
Climate Variables | • CHIRPS rainfall • MODIS temperature | 1-25km | Daily to monthly | Climate Data Store, GEE | Agricultural planning, climate risk assessment |
C. Modeled Spatial Products (combining satellite with other data):
Product Type | Examples | Resolution | Methodology | Access | Applications |
---|---|---|---|---|---|
Population Distribution | • WorldPop • LandScan • GHS-POP • Facebook HRSL | 30m-1km | Satellite + census + ML | HDX, WorldPop, GEE | Service planning, disaster response, demographic analysis |
Infrastructure | • Global Roads (GRIP) • Building footprints • Global Power Plants | Various | Satellite + OSM + official | Direct download, HDX | Accessibility analysis, infrastructure planning |
Urban Extent | • Global Human Settlement • World Settlement Footprint | 10-30m | Satellite classification | GEE, DLR | Urban planning, growth monitoring |
Environmental Risk | • Global Flood Database • Wildfire risk maps • Landslide susceptibility | Various | Satellite + modeling | Various platforms | Risk assessment, insurance, planning |
Socioeconomic | • Relative Wealth Index • Grid3 settlements | Various | Satellite + ML + surveys | Meta, Grid3 | Development planning, targeting interventions |
Practical extraction workflows:
Example 1: Extracting Land Cover Data
- Small area: Download from Copernicus Browser → Process in QGIS
- Large area: Use GEE → Export to Google Drive
- Time series: GEE or Open Data Cube → Cloud processing
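For the large-area pathway, a minimal sketch with the Earth Engine Python API (earthengine-api): it clips the ESA/WorldCover/v200 catalog asset to an illustrative bounding box and exports a GeoTIFF to Google Drive. It assumes an Earth Engine account and a one-time ee.Authenticate(); the extent shown is arbitrary.

```python
import ee

ee.Initialize()  # run ee.Authenticate() once beforehand

region = ee.Geometry.Rectangle([36.5, -1.5, 37.2, -1.0])  # example extent
worldcover = ee.ImageCollection("ESA/WorldCover/v200").first().clip(region)

task = ee.batch.Export.image.toDrive(
    image=worldcover,
    description="worldcover_extract",
    scale=10,        # WorldCover's native 10 m resolution
    region=region,
    maxPixels=1e9,
)
task.start()  # the exported GeoTIFF appears in your Google Drive
```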
Example 2: Population Distribution Analysis
- Direct download: WorldPop.org → GeoTIFF files
- Cloud analysis: GEE → WorldPop catalog
- API access: WorldPop REST API → Custom queries
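To illustrate the direct-download route, the sketch below clips a downloaded WorldPop GeoTIFF to a district boundary with rasterio and sums the population estimate. File names are hypothetical; WorldPop cell values are persons per pixel.

```python
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.mask import mask

district = gpd.read_file("district_boundary.geojson")  # hypothetical file

with rasterio.open("worldpop_population.tif") as src:
    district = district.to_crs(src.crs)  # align coordinate systems first
    clipped, _ = mask(src, district.geometry, crop=True)
    nodata = src.nodata

# Sum valid pixels only, skipping the raster's nodata fill value
valid = clipped[clipped != nodata] if nodata is not None else clipped
print(f"Estimated population: {np.nansum(valid):,.0f}")
```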
Quality considerations:
Data Type | Key Quality Checks | Validation Approaches |
---|---|---|
Optical Imagery | Cloud cover, atmospheric correction | Visual inspection, quality bands |
Radar Data | Speckle noise, geometric distortion | Filtering, terrain correction |
Derived Products | Classification accuracy, temporal consistency | Confusion matrices, ground truth |
Modeled Data | Model assumptions, input data quality | Cross-validation, uncertainty maps |
Best practices:
- Choose appropriate spatial and temporal resolution for your analysis scale
- Understand the difference between raw imagery and analysis-ready data
- Document all preprocessing steps and product versions used
- Consider seasonal and weather impacts on data quality
- Validate satellite-derived products with ground data when possible
- Review product-specific accuracy assessments and limitations
- Use cloud platforms for large-scale processing to avoid data transfer
- Combine multiple data sources to overcome individual limitations
Systematic quality assessment is essential when using open spatial data. Understanding common quality issues helps you identify potential problems early and make informed decisions about how to use the data appropriately. This section raises awareness of typical challenges without prescribing specific solutions, as the best approach depends on your particular context and analytical needs.
Spatial data accuracy involves two main components that beginners should understand:
- Positional accuracy: How precisely features are located on the map (are things in the right place?)
- Attribute accuracy: How correct the information attached to features is (is the information about each feature correct?)
Common accuracy issues to watch for:

Figure xx. Diagram illustrating common spatial accuracy issues and their potential impact on spatial analysis
The diagram above shows four typical accuracy problems you might encounter:
1. Positional Inaccuracy: When recorded locations don't match true positions
   - Example: A hospital marked 500m from its actual location
   - Impact: Distance calculations and service area analysis become unreliable
2. Boundary Misalignment: When administrative or feature boundaries don't match reality
   - Example: District boundaries from different sources don't align
   - Impact: Data gets assigned to wrong areas, creating false patterns
3. Attribute Inaccuracy: When feature information is wrong or unclear
   - Example: A school misclassified as a hospital in the data
   - Impact: Analysis of available services becomes incorrect
4. Scale Inconsistency: When data detail varies across your study area
   - Example: Urban areas mapped at building level, rural areas only at village level
   - Impact: Some areas appear to have more features simply due to mapping detail
Simple ways to check accuracy:
What to Check | How to Check It | What You're Looking For | Why It Matters |
---|---|---|---|
Location accuracy | • Compare with known landmarks • Check against satellite imagery • Look for obvious errors (facilities in water) | Features should be reasonably close to expected positions | Wrong locations affect all distance-based analysis |
Information accuracy | • Compare feature names/types with local knowledge • Check for duplicate entries • Look for missing or nonsensical values | Information should match what you know about the area | Wrong information leads to wrong conclusions |
Logical consistency | • Check if roads connect properly • Verify administrative units don't overlap • Ensure features are in correct boundaries | Data should follow logical rules | Inconsistencies suggest data processing errors |
Source credibility | • Check who created the data • Look for documentation • Note the data collection date | Authoritative or well-documented sources | Helps assess overall trustworthiness |
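The logical consistency checks in the table are easy to script. A minimal GeoPandas sketch (hypothetical file name) that counts invalid geometries and overlapping administrative units:

```python
import geopandas as gpd

admin = gpd.read_file("admin_units.gpkg")  # hypothetical file

# Invalid geometries (self-intersections, unclosed rings, etc.)
print("Invalid geometries:", (~admin.geometry.is_valid).sum())

# Overlapping administrative units: units should tile space cleanly
overlaps = gpd.sjoin(admin, admin, predicate="overlaps")
overlaps = overlaps[overlaps.index != overlaps["index_right"]]
print("Overlapping unit pairs:", len(overlaps) // 2)  # each pair counted twice
```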
Understanding accuracy terminology (for beginners):
- RMSE (Root Mean Square Error): A measure of average position error - think of it as "typical distance off"
- Ground truth: Real-world verification of what the data shows
- Validation: Checking data against a trusted source
- Anomaly: Something unusual that might indicate an error
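A small worked example of RMSE as "typical distance off", comparing recorded facility coordinates against GPS-verified ground truth. The coordinates here are illustrative projected values in meters.

```python
import numpy as np

# Recorded vs. ground-truth positions (projected coordinates, meters)
recorded = np.array([[583960, 4507523], [584100, 4507800], [583500, 4508000]])
truth    = np.array([[583990, 4507540], [584080, 4507830], [583560, 4507950]])

errors = np.linalg.norm(recorded - truth, axis=1)  # per-point distance (m)
rmse = np.sqrt(np.mean(errors ** 2))
print(f"RMSE: {rmse:.1f} m")  # the dataset is "typically" this far off
```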
Practical tips for beginners:
1. Start with visual checks: Load your data in GIS software and look for obvious problems
   - Do features appear where you expect them?
   - Are there gaps or clusters that seem wrong?
   - Do boundaries make sense?
2. Use multiple sources: When possible, compare data from different sources
   - If they agree, confidence increases
   - If they disagree, investigate further
3. Document what you find: Keep notes about:
   - Which datasets you checked
   - What issues you discovered
   - How you decided to handle them
4. Accept imperfection: No dataset is perfect
   - Understand the limitations
   - Decide if the quality is "good enough" for your purpose
   - Be transparent about known issues
Questions to ask yourself:
- Is this data accurate enough for my analysis needs?
- What are the consequences if some locations or attributes are wrong?
- Can I verify critical features through other means?
- Should I collect additional data to fill quality gaps?
Remember: The goal isn't to achieve perfect data (which rarely exists) but to understand your data's limitations and work appropriately within them. Quality assessment builds your confidence in knowing when and how to use the data effectively.
Completeness is about whether your spatial data tells the whole story. It involves two key questions:
- Geographic coverage: Are all areas in your study region included?
- Feature completeness: Are all relevant features captured in the data?
Why completeness matters: Missing data can lead to biased analysis and poor decisions. For example, if rural health facilities are missing from your dataset, any analysis will unfairly represent rural areas as underserved.
Common completeness issues to watch for:
1. Geographic gaps: Some areas might have no data at all
   - Remote or rural areas often have less complete data
   - Border regions may be missed due to administrative divisions
   - Islands or isolated communities frequently lack coverage
2. Feature gaps: Important features might be missing
   - Informal facilities/services often not captured in official data
   - Recently built infrastructure not yet included
   - Temporary or seasonal features overlooked
3. Inconsistent coverage: Data quality varies across regions
   - Urban areas typically have more complete data
   - Some districts may have better data collection than others
   - Historical areas might have outdated or missing information

Figure xx. Map showing spatial data completeness assessment for a sample dataset, highlighting gaps in coverage
Simple ways to check completeness:
Check Method | What to Do | What to Look For | Example |
---|---|---|---|
Visual inspection | Display data on a map | Obvious gaps or empty areas | No roads shown in certain districts |
Administrative comparison | Check each admin unit has data | Missing districts or regions | 3 out of 20 districts have no health facility data |
Local knowledge | Ask people familiar with the area | Known features not in dataset | Major market not appearing in commercial data |
Coverage statistics | Calculate data density by area | Significant variations | Urban areas: 50 features/km², Rural: 0.5 features/km² |
Multiple sources | Compare different datasets | Features in one but not another | OSM has schools that government data lacks |
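The administrative comparison and coverage statistics checks above can be scripted. A hedged GeoPandas sketch (file and column names such as "name" are hypothetical) that counts features per district and flags districts with no data:

```python
import geopandas as gpd

districts = gpd.read_file("districts.gpkg")  # hypothetical files
facilities = gpd.read_file("health_facilities.geojson").to_crs(districts.crs)

# Count facilities falling within each district
joined = gpd.sjoin(facilities, districts, predicate="within")
counts = joined.groupby("index_right").size()
districts["n_facilities"] = counts.reindex(districts.index, fill_value=0)

print("Districts with no facility data:")
print(districts.loc[districts["n_facilities"] == 0, "name"])
```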
Documenting completeness (simple approach):
When you assess completeness, keep track of what you find:
Geographic Area | Data Theme | What's Missing | Why It Matters | Possible Solutions |
---|---|---|---|---|
Northern Region | Schools | Rural schools | Underestimates education access | Check education ministry data |
Coastal Areas | Roads | Unpaved roads | Misrepresents connectivity | Use satellite imagery or local mapping |
City Center | Businesses | Informal shops | Economic activity understated | Field survey or crowdsourced data |
Mountain District | Health facilities | Mobile clinics | Health access appears worse than reality | Contact health department |
Practical tips for dealing with incomplete data:
1. Be transparent: Always note where data is incomplete
   - "This analysis only includes formally registered facilities"
   - "Rural areas may be underrepresented in this dataset"
2. Understand the impact: Consider how gaps affect your analysis
   - Will missing data change your conclusions?
   - Are the gaps in critical areas for your study?
3. Explore solutions based on your resources:
   - No additional resources: Work with what you have, but document limitations
   - Some time available: Seek supplementary data from other sources
   - Resources for fieldwork: Consider targeted data collection for critical gaps
4. Use completeness as a filter: Sometimes it's better to limit analysis to well-covered areas
   - Analyze only regions with >80% coverage
   - Focus on urban areas if rural data is too sparse
Questions to guide your assessment:
- Where are the obvious gaps in coverage?
- Why might these gaps exist? (accessibility, administrative issues, recent changes)
- How critical are the missing areas/features to your analysis?
- What's the minimum completeness level acceptable for your purpose?
- Can you obtain supplementary data for critical gaps?
Remember about completeness:
- Perfect completeness is rare - most datasets have gaps
- Urban bias is common - expect better coverage in cities
- Official data often misses informal/temporary features
- Completeness can vary by theme (roads may be complete while buildings are not)
- Document what's missing as thoroughly as what's present
The goal is not to achieve 100% completeness but to understand what's missing and how it affects your analysis. This awareness helps you make informed decisions and communicate limitations honestly.
Temporal resolution refers to how current your data is and how often it gets updated. Using outdated data can lead to incorrect conclusions, especially for features that change frequently.
Why timing matters: The world changes constantly - new roads are built, facilities open or close, populations shift. Understanding how fresh your data needs to be depends entirely on what you're analyzing.
Common temporal issues to consider:
Data Type | Typical Change Rate | Ideal Update Frequency | Impact of Outdated Data |
---|---|---|---|
Demographic data | Slow | 1-5 years | Population estimates become less accurate |
Infrastructure | Moderate | 6 months - 2 years | Missing new developments, closed facilities |
Transportation | Variable | Monthly - Annual | Route changes, service updates missed |
Emergency services | Fast | Real-time - Monthly | Critical service availability incorrect |
Land use | Moderate | 1-3 years | Urban expansion not captured |
Economic activity | Fast | Monthly - Quarterly | Business closures, market changes missed |
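Where a dataset carries a last-updated attribute, staleness checks like these are straightforward to script. A minimal sketch (the column name last_updated is hypothetical; pick the threshold from the table above for your data type):

```python
import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("facilities.geojson")  # hypothetical file
gdf["last_updated"] = pd.to_datetime(gdf["last_updated"])

# Age of each record in years, then flag everything older than the threshold
age_years = (pd.Timestamp.now() - gdf["last_updated"]).dt.days / 365.25
stale = gdf[age_years > 2]
print(f"{len(stale)} of {len(gdf)} records are more than 2 years old")
```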

Figure xx. Timeline visualization showing ideal temporal resolution for different data types
Simple ways to check temporal quality:
1. Look for date stamps: When was the data collected or last updated?
   - Check metadata for collection dates
   - Look for "last modified" information
   - Note any version numbers
2. Consider the context: How fast do things change in your study area?
   - Urban areas typically change faster than rural
   - Developing regions may have rapid infrastructure changes
   - Post-disaster areas need very current data
3. Identify seasonal patterns: Some features vary by season
   - Road accessibility in rainy seasons
   - Seasonal businesses or services
   - Agricultural land use changes
Questions to guide your assessment:
- How old is too old for my analysis purpose?
- What features in my area change most rapidly?
- Are there recent events that would make older data obsolete?
- Does my analysis period match my data period?
Practical tips for temporal issues:
1. Date your analysis: Always state when your data was collected
   - "Based on 2023 road network data"
   - "Population figures from 2020 census"
2. Mix time periods carefully: When combining datasets
   - Document different collection dates
   - Consider if changes between dates affect results
   - Avoid comparing different time periods directly
3. Update critical data: Prioritize updating frequently-changing features
   - Emergency services for safety analysis
   - Transportation for accessibility studies
   - Current businesses for economic analysis
Red flags for temporal problems:
- No date information available
- Data more than 5 years old for dynamic features
- Major events (disasters, construction projects) since data collection
- Inconsistent dates across related datasets
Remember: The "right" update frequency depends on your use case. Census data from 5 years ago might be acceptable for general planning, but emergency service locations from last year could be dangerously outdated. Always consider how data age affects your specific analysis goals.
Spatial resolution refers to the level of detail in your data - how small are the units used to represent geographic features? This determines what patterns you can see and what might be hidden.
Why resolution matters: Think of spatial resolution like zoom levels on a map. At low resolution (zoomed out), you see general patterns. At high resolution (zoomed in), you see local details. The right resolution depends on what questions you're asking.
Understanding different resolution levels:
Spatial Resolution | What You Can See | Best Used For | Trade-offs | Common Examples |
---|---|---|---|---|
Building/Parcel | Individual structures | Detailed neighborhood analysis | • Hard to get • Privacy issues • Large file sizes | Building footprints, property maps |
Block/Village | Small area patterns | Community planning | • Good detail • Manageable size • Some boundaries unclear | Census blocks, neighborhood data |
District/Municipality | Area-wide trends | Local government planning | • Matches admin units • Hides local variation | Municipal statistics, service areas |
Province/State | Regional patterns | Policy making | • Complete coverage • Too general for local needs | National surveys, regional data |

Figure xx. Multi-scale visualization of the same area showing how different spatial resolutions reveal or obscure patterns
Common resolution concepts explained simply:
1. Minimum mapping unit: The smallest feature that appears in your data
   - Example: If buildings smaller than 50m² aren't included, that's your minimum
   - Why it matters: Small but important features might be missing
2. Spatial patterns across scales: How patterns change at different resolutions
   - Example: Poverty might look evenly distributed at province level but clustered at neighborhood level
   - Why it matters: Coarse resolution can hide important local variations
3. Scale-appropriate representation: Using the right detail level for your purpose
   - Example: City-wide planning doesn't need individual building details
   - Why it matters: Too much detail can be overwhelming; too little hides important patterns
4. Resolution matching: Making sure different datasets work together
   - Example: Combining neighborhood-level health data with district-level population data
   - Why it matters: Mismatched resolutions create analysis problems
5. Aggregation effects: How combining smaller units into larger ones changes the picture
   - Example: Average income by district versus by neighborhood
   - Why it matters: Aggregation can hide pockets of need or opportunity
Practical considerations:
Choosing the right resolution:
- What decisions will be made with this analysis?
- Who needs to use the results?
- What's the smallest area that matters for action?
- What data is actually available?
Common resolution mismatches and solutions:
Situation | Problem | Practical Solution |
---|---|---|
Population data by district, facilities as points | Can't calculate facility density accurately | Aggregate facilities to districts or find finer population data |
High-res satellite imagery, coarse admin boundaries | Imagery detail wasted | Consider creating finer analysis zones |
Mixed urban (detailed) and rural (coarse) data | Uneven analysis quality | Document the difference, analyze separately |
Different years at different resolutions | Changes confused with resolution effects | Use consistent resolution across years |
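The first mismatch in the table, district-level population versus facility points, is typically resolved by aggregating the points. A hedged GeoPandas sketch (hypothetical file names; EPSG:6933 is an equal-area projection, so the resulting densities are comparable across districts):

```python
import geopandas as gpd

districts = gpd.read_file("districts.gpkg").to_crs("EPSG:6933")
points = gpd.read_file("facilities.geojson").to_crs("EPSG:6933")

# Aggregate points up to the coarser resolution of the population data
joined = gpd.sjoin(points, districts, predicate="within")
districts["n_facilities"] = (
    joined.groupby("index_right").size().reindex(districts.index, fill_value=0)
)

# Density per km² (geometry.area is in m² under an equal-area CRS)
districts["density_per_km2"] = (
    districts["n_facilities"] / (districts.geometry.area / 1e6)
)
```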
Questions to ask yourself:
- Is my data resolution appropriate for my analysis goals?
- Am I seeing real patterns or just resolution effects?
- Where might important details be hidden by coarse resolution?
- Do all my datasets have compatible resolutions?
Tips for working with resolution:
- Start by understanding what resolution you have (check metadata)
- Document any resolution conversions you make
- Be honest about what your resolution can and cannot show
- Consider showing results at multiple resolutions when possible
- Remember: finer resolution isn't always better - it depends on your purpose
Red flags:
- Trying to make local decisions with regional data
- Combining incompatible resolutions without acknowledgment
- Assuming fine resolution data is automatically more accurate
- Ignoring how aggregation might hide important variations
The key is matching your data resolution to your analysis needs while being transparent about what details might be missed at your chosen scale.
Choosing the right tools and file formats can make the difference between a smooth workflow and hours of frustration. This section helps you select appropriate technologies for extracting and working with spatial data.
Different file formats serve different purposes. Choosing the right one depends on your data type, intended use, and the software you're using.
Understanding spatial data formats:
Format | What It's For | Pros | Cons | When to Use |
---|---|---|---|---|
GeoJSON | Sharing vector data online | • Human-readable • Works in web browsers • Easy to edit | • Gets slow with big files • Only for vector data | Web mapping, data sharing, APIs |
Shapefile | General vector data | • Works everywhere • Industry standard • Fast processing | • Multiple files (.shp, .dbf, .shx) • 10-character field names • 2GB size limit | Desktop GIS analysis, data exchange |
GeoTIFF | Raster/image data | • Keeps location info • Compression options • Widely supported | • Can be very large • Only for raster | Satellite imagery, elevation data, continuous surfaces |
GeoPackage | Modern all-purpose | • Everything in one file • No size limits • Vector and raster | • Newer format • Some software doesn't support | Complex projects, data sharing, mobile apps |
CSV with coordinates | Simple point locations | • Opens in Excel • Very simple • Universal | • Only points • No projection info • Easy to break | Lists of locations, simple data transfer |

Figure xx. Decision tree for selecting appropriate spatial data formats based on data characteristics and intended use
Quick format selection guide:
Ask yourself these questions:
1. Is your data vector (points, lines, polygons) or raster (images, grids)?
   - Vector → Consider Shapefile, GeoJSON, GeoPackage
   - Raster → Use GeoTIFF
2. Where will you use it?
   - Web browser → GeoJSON
   - Desktop GIS → Shapefile or GeoPackage
   - Multiple platforms → GeoPackage
3. How big is your data?
   - Small (<50MB) → Any format works
   - Large (>1GB) → Avoid GeoJSON, consider GeoPackage
   - Huge (>2GB) → Can't use Shapefile, need GeoPackage or database
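Conversions between these formats are one-liners in GeoPandas. A minimal sketch (hypothetical file names) that converts a Shapefile and, as the conversion tips below recommend, checks the data afterwards:

```python
import geopandas as gpd

gdf = gpd.read_file("roads.shp")  # Shapefile in (keep the original as backup)

gdf.to_file("roads.gpkg", driver="GPKG")        # GeoPackage: no 2GB limit
gdf.to_file("roads.geojson", driver="GeoJSON")  # GeoJSON for the web

# Verify the round trip: row count and CRS should survive conversion
check = gpd.read_file("roads.gpkg")
assert len(check) == len(gdf) and check.crs == gdf.crs
```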
Format conversion tips:
- Most GIS software can convert between formats
- Always check your data after conversion
- Keep the original file as backup
- Document what conversions you made
Common format problems and solutions:
Problem | Likely Cause | Quick Fix |
---|---|---|
Can't open file | Wrong format for software | Convert to supported format |
Missing location | No projection info | Define projection in GIS |
Broken characters | Encoding issues | Check UTF-8 encoding |
File too large | Inefficient format | Compress or change format |
Lost attributes | Format limitations | Check field name length |
Best practices:
- Choose formats based on your workflow needs, not just familiarity
- Consider your collaborators' software capabilities
- Document which format and projection you're using
- Test format compatibility early in your project
- Keep data in the simplest appropriate format
Remember: No format is perfect for everything. The "best" format depends on your specific needs, software, and sharing requirements.
Different software tools serve different purposes in spatial data work. This section helps you choose the right tool for your needs and skill level.
Desktop and analysis software:
Software | What It Does | Skill Level | Cost | Best For |
---|---|---|---|---|
QGIS | Complete GIS toolkit with visual interface | Beginner to advanced | Free | • Making maps • Basic to advanced analysis • Data viewing and editing |
GDAL/OGR | Converts between formats, processes data via commands | Intermediate to advanced | Free | • Batch converting files • Automating repetitive tasks • Format troubleshooting |
Python (GeoPandas, Rasterio) | Programming for custom analysis | Intermediate to advanced | Free | • Repeating analysis • Combining many datasets • Building custom tools |
R (sf, terra) | Statistical analysis with maps | Intermediate to advanced | Free | • Statistical modeling • Research analysis • Data visualization |
Google Earth Engine | Analyze satellite imagery online | Intermediate | Free | • Large area analysis • Time series • No download needed |
PostgreSQL/PostGIS | Database for storing and querying spatial data | Advanced | Free | • Managing large datasets • Multi-user access • Complex spatial queries |
Understanding PostgreSQL/PostGIS: Think of it as a filing cabinet for maps. Instead of having hundreds of files on your computer, you store everything in one organized system where you can quickly find what you need using searches like "show me all schools within 2km of a road."
Which desktop tool should you start with?
- New to GIS? → Start with QGIS
- Have programming experience? → Try Python or R
- Working with satellite imagery? → Use Google Earth Engine
- Managing lots of data for an organization? → Consider PostgreSQL/PostGIS
Mobile data collection tools:
Sometimes you need to collect data in the field to fill gaps or verify existing information. These tools help you do that:
Tool | Platform | Works Offline? | What It's Like | Use It For |
---|---|---|---|---|
ODK Collect | Android | Yes | Like a digital survey form | • Recording locations • Structured questionnaires • Photo documentation |
KoboToolbox | Web, Android | Yes | User-friendly forms with analysis | • Field surveys • Monitoring visits • Data visualization |
QField | Android | Yes | QGIS on your phone | • Editing existing maps • Professional data collection • Complex geometries |
Mapillary | iOS, Android | Collect offline, process online | Street View-style photos | • Road conditions • Infrastructure inventory • Visual documentation |
Epicollect5 | iOS, Android | Yes | Simple and flexible | • Community mapping • Citizen science • Quick surveys |
Understanding ODK Collect: Imagine a digital clipboard that knows your location. You create forms on a computer (like "Hospital Assessment"), then collectors use phones to fill them out in the field, automatically recording GPS locations. All responses sync to a central database when internet is available.

Figure xx. Field data collection workflow diagram, from planning to integration with existing datasets
Choosing mobile tools:
Ask yourself:
- What am I collecting? Simple points → ODK/KoboToolbox; Complex mapping → QField
- Who's collecting? Community members → Simple tools; GIS professionals → Advanced tools
- Need photos? Documentation → Any tool; Street view → Mapillary
- What happens to the data? Just viewing → Any tool; GIS analysis → QField/ODK
Practical workflow example:
- Plan: Define what data you need
- Design: Create forms/projects in chosen tool
- Test: Try it yourself before deployment
- Collect: Field teams gather data
- Sync: Upload data when connected
- Process: Clean and validate in QGIS
- Integrate: Combine with existing datasets
Tips for software selection:
- Start simple - you can always upgrade later
- Test with a small pilot before full deployment
- Consider your team's technical skills
- Check if your existing data works with the tool
- Look for active user communities for help
Remember: The best tool is the one your team can actually use effectively. Fancy features don't help if they're too complex for your situation.
When combining spatial data from different sources, you need to ensure they all "speak the same language." This means using consistent formats, coordinate systems, and classification methods so your datasets work together properly.
Think of coordinate reference systems (CRS) like different ways of drawing the round Earth on flat maps. Each system has trade-offs, and using the wrong one can make your analysis incorrect.
Why this matters: If your datasets use different coordinate systems, features won't line up properly. Roads might appear to run through buildings, or distances might be wildly wrong.
Common coordinate system concepts (in plain language):
1. Geographic coordinates (lat/lon): Like a global address system using degrees
   - Example: 40.7128°N, 74.0060°W (New York City)
   - Good for: Storing locations, sharing data globally
   - Bad for: Measuring distances or areas
2. Projected coordinates: Flatten the Earth for accurate measurements
   - Example: UTM coordinates like 583960E, 4507523N
   - Good for: Measuring distances, calculating areas
   - Bad for: Large areas (distortion increases)
Key coordinate systems to know:
Purpose | System Name | Code | When to Use | Remember |
---|---|---|---|---|
Storing data | WGS84 | EPSG:4326 | Default for GPS, data sharing | Universal standard |
Web maps | Web Mercator | EPSG:3857 | Online mapping | Distorts sizes badly |
Local analysis | UTM (your zone) | Varies | Distance measurements | Different zones for different regions |
Area calculations | Local equal-area | Varies | Measuring land area | Preserves area, distorts shape |

Figure xx. Map showing how coordinate system choice affects area and distance measurements in different regions
Simple checks for coordinate system issues:
- Visual check: Load all datasets in GIS - do they line up?
- Location check: Is your data in the right country/ocean?
- Unit check: Are coordinates in degrees (-180 to 180) or meters (large numbers)?
Common coordinate system problems:
Problem | What You'll See | Solution |
---|---|---|
Missing CRS | Data appears in wrong location | Define the coordinate system |
Wrong CRS | Data offset by hundreds of meters/miles | Reproject to correct system |
Mixed systems | Layers don't align | Convert all to same system |
Web vs. analysis | Measurements incorrect | Use appropriate system for task |
Practical workflow:
1. Check what coordinate system each dataset uses
2. Choose appropriate system for your analysis:
   - Just viewing? → Keep original
   - Measuring distances? → Use projected system
   - Combining datasets? → Convert all to same system
3. Document your choice
4. Transform datasets as needed
5. Verify alignment visually
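A minimal GeoPandas sketch of this workflow (hypothetical file names; EPSG:32637, UTM zone 37N, is just an example zone, so substitute the zone covering your study area):

```python
import geopandas as gpd

roads = gpd.read_file("roads.geojson")
facilities = gpd.read_file("facilities.geojson")
print(roads.crs, facilities.crs)  # step 1: check what you have

utm = "EPSG:32637"                # steps 2-4: choose, document, transform
roads_utm = roads.to_crs(utm)
facilities_utm = facilities.to_crs(utm)

# With a projected CRS, distances come out in meters
nearest_road_dist = facilities_utm.geometry.apply(
    lambda pt: roads_utm.distance(pt).min()
)
print(nearest_road_dist.describe())
```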
Quick decision guide:
- Sharing data internationally? → Use WGS84 (EPSG:4326)
- Making a web map? → Convert to Web Mercator (EPSG:3857)
- Measuring distances locally? → Use appropriate UTM zone
- Calculating areas? → Use equal-area projection for your region
Tips for beginners:
- Most GPS data comes in WGS84 - this is your starting point
- Your GIS software can convert between systems (called "reprojecting")
- When in doubt, WGS84 is the safest choice for storage
- Always document which system you used
- If data doesn't align, coordinate system mismatch is the likely culprit
Remember: There's no perfect coordinate system for everything. Choose based on what you need to do with the data, and be consistent across all your datasets.
Sometimes administrative boundaries (like districts or provinces) don't work well for analysis. They vary wildly in size, shape, and population. Spatial indexing systems offer an alternative: consistent, regularly shaped units that make comparison and analysis easier.
What are spatial indexes? Think of them as graph paper laid over a map. Instead of irregular administrative units, you get uniform grid cells or hexagons. This makes it easier to:
- Compare different areas fairly
- Aggregate data consistently
- Analyze patterns without boundary bias
Common spatial indexing systems:
System | Shape | Who Uses It | Best For | Simple Explanation |
---|---|---|---|---|
H3 | Hexagons | Uber, data scientists | Smooth analysis | Like honeycomb cells covering the Earth |
S2 | Squares (curved) | Google, big data systems | Global indexing at scale | Squares that fit Earth's curve |
Simple Grid | Squares | Anyone | Basic analysis | Regular graph paper grid |
Quadkeys | Squares | Microsoft, web maps | Online maps | How web map tiles are organized |

Figure xx. Comparison of H3 hexagons versus administrative boundaries for analyzing service access patterns
Why use spatial indexes instead of administrative boundaries?
Issue with Admin Boundaries | How Indexes Help |
---|---|
Vastly different sizes (tiny urban districts vs huge rural ones) | All cells are similar size |
Irregular shapes make distance calculations complex | Regular shapes simplify analysis |
Political boundaries change over time | Grid stays constant |
Hard to compare densities across different areas | Equal areas make comparison fair |
H3 Hexagon System (Most Popular for Analysis):
H3 is like laying hexagonal tiles over your map. You choose the tile size based on your needs:
Resolution | Edge Length | Area | Think of It As | Use When Analyzing |
---|---|---|---|---|
7 | ~1.2 km | 5 km² | Large neighborhoods | City-wide patterns |
8 | ~460 m | 0.7 km² | Several city blocks | Neighborhood services |
9 | ~174 m | 0.1 km² | Single block | Local accessibility |
10 | ~65 m | 0.015 km² | Large buildings | Detailed urban features |
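To make this concrete, here is a minimal sketch assuming the h3 Python package (pip install h3, v4 API): it indexes a single point at two of the resolutions from the table and reports the cell areas.

```python
# Minimal sketch using the h3-py package, v4 API (pip install h3).
import h3

lat, lng = 40.7128, -74.0060  # New York City

# Index the same point at two resolutions; coarser cells contain finer ones.
cell_res8 = h3.latlng_to_cell(lat, lng, 8)  # several city blocks
cell_res9 = h3.latlng_to_cell(lat, lng, 9)  # roughly a single block

print(cell_res8, h3.cell_area(cell_res8, unit="km^2"))  # ~0.7 km²
print(cell_res9, h3.cell_area(cell_res9, unit="km^2"))  # ~0.1 km²
```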
When to use spatial indexing:
- Comparing service access across a city
- Analyzing population density fairly
- Creating heat maps of activity
- Standardizing data from different sources
- Working across administrative boundaries
When to stick with administrative boundaries:
- Your results need to match government units
- You're working with official statistics
- Decision-makers expect traditional boundaries
- Data is only available by admin unit
Simple example:
- Instead of: "District A has 5 hospitals, District B has 2" (But District A is 10x larger!)
- Using H3: "These hexagons average 0.8 hospitals per km², those hexagons average 1.2 hospitals per km²" (Fair comparison! Sketched in code below.)
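A minimal sketch of this kind of density comparison, again assuming the h3 Python package (v4 API) and using made-up hospital coordinates:

```python
# Minimal sketch: counting hypothetical hospital points per H3 cell so that
# densities can be compared across equal-sized cells (h3-py v4 API).
from collections import Counter
import h3

hospitals = [(40.71, -74.00), (40.72, -74.01), (40.80, -73.95)]  # made-up points

res = 8  # several-city-blocks scale, per the table above
counts = Counter(h3.latlng_to_cell(lat, lng, res) for lat, lng in hospitals)

for cell, n in counts.items():
    density = n / h3.cell_area(cell, unit="km^2")  # hospitals per km²
    print(cell, f"{density:.2f} per km^2")
```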
Getting started:
- Most indexing systems have online tools to generate grids
- QGIS plugins available for H3 and simple grids
- Start with resolution 8 for city-level analysis
- Test different resolutions to find what works
Tips:
- Hexagons (H3) are better for distance analysis than squares
- Squares are simpler and more familiar to most users
- You can always aggregate indexed data back to admin boundaries
- Document which system and resolution you used
Remember: Spatial indexes are just another way to organize geographic data. Use them when they make your analysis clearer and fairer, but don't overcomplicate things if traditional boundaries work fine for your purpose.
Once you have data from different sources, you need to combine them meaningfully. This section covers common methods and challenges you'll encounter when bringing datasets together.
Common ways to combine spatial data:
- Spatial joins: Connecting data based on location
  - Example: Which schools are in which districts?
  - How: GIS software matches features by their position (see the join sketch after this list)
- Attribute matching: Using common identifiers
  - Example: Joining census data to districts using district codes
  - How: Match shared ID fields between datasets
- Format conversion: Making different data types work together
  - Example: Converting points to a grid for analysis with raster data
  - How: Use GIS tools to transform between formats
- Unit matching: Handling different spatial units
  - Example: Population by district + services by neighborhood
  - How: Aggregate or disaggregate to common units
- Alignment fixing: Making misaligned boundaries match
  - Example: Two datasets with slightly different coastlines
  - How: Adjust boundaries to a common reference
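The first two methods are the most common and are straightforward to script. Below is a minimal sketch using geopandas and pandas; all file and column names are placeholder assumptions for illustration.

```python
import geopandas as gpd
import pandas as pd

schools = gpd.read_file("schools.geojson")      # point features (placeholder)
districts = gpd.read_file("districts.geojson")  # polygon features (placeholder)

# Spatial join: attach each school to the district polygon that contains it.
# Both layers must be in the same CRS first.
schools = schools.to_crs(districts.crs)
schools_in_districts = gpd.sjoin(schools, districts, how="left", predicate="within")

# Attribute matching: join census figures to districts via a shared ID field.
census = pd.read_csv("census.csv")  # assumed to contain a 'district_code' column
districts_with_pop = districts.merge(census, on="district_code", how="left")
```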

Figure xx. Workflow diagram showing the integration process for combining different types of spatial data
Common integration challenges and practical solutions:
Challenge | What It Looks Like | Simple Solution | Remember to Document |
---|---|---|---|
Boundaries don't match | Features appear on wrong side of borders | Snap to official boundaries | Which boundary file you used |
Different time periods | 2020 population with 2023 facilities | Note the time difference | Date of each dataset |
Different detail levels | City blocks vs whole districts | Aggregate to coarser level | What detail was lost |
Different categories | "School" vs "Primary/Secondary" | Create matching table | How you grouped categories |
Coverage gaps | Some areas have no data | Note gaps or estimate | Where data is missing |
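The "matching table" solution for differing categories can be as simple as a dictionary that maps each source label to a common one. A minimal pandas sketch with made-up facility types:

```python
import pandas as pd

# Made-up facility records with inconsistent category labels.
facilities = pd.DataFrame({
    "name": ["Hillside Primary", "Valley Secondary", "Central School"],
    "type": ["Primary School", "Secondary School", "School"],
})

# The "matching table": map every source category to a common category.
# Keep this mapping in your documentation so others can evaluate the grouping.
category_map = {
    "Primary School": "school",
    "Secondary School": "school",
    "School": "school",
}
facilities["common_type"] = facilities["type"].map(category_map)

# Any NaN in common_type flags a category you forgot to map.
print(facilities)
```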
Understanding temporal issues (time differences):
Real-world data is collected at different times, and things change. As a beginner, you don't need complex adjustments, but you should:
- Know your data dates: Always check when data was collected
- Think about change: Has the area changed significantly since then?
- Document differences: Note when datasets are from different years
- Be transparent: Tell users about time gaps in your analysis
Simple example of time awareness:
- Population data: 2020 census
- School locations: 2023 survey
- → Note: "Population figures are 3 years older than facility data"
- → Consider: Have new neighborhoods been built since 2020?

Figure xx. Timeline showing how different datasets often come from different time periods
Basic temporal alignment approach:
When time differences matter, you can:
- Use everything as-is: Often fine if changes are slow
- Pick a reference year: Try to get all data close to one year
- Update critical data: Refresh the most important/changeable datasets
- Document clearly: Always note the time period of each dataset
Integration workflow checklist:
- List all datasets and their formats
- Check coordinate systems match
- Note the date of each dataset
- Identify common joining fields or locations
- Test integration with a small sample
- Document any transformations made
- Check results make sense visually
- Note any assumptions or limitations
Practical tips:
- Start simple - join just two datasets first
- Always keep original files unchanged
- Document every step you take
- Verify results visually in your GIS
- When in doubt, note the limitation rather than hiding it
Red flags to watch for:
- Features appearing in wrong locations after joining
- Sudden changes in patterns at dataset boundaries
- Missing data after integration (did the join fail?)
- Unrealistic values after aggregation
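Some of these red flags can be caught with simple scripted checks. The sketch below reuses the same placeholder files and column names as the join example earlier and counts rows left unmatched by a spatial join and by an attribute join.

```python
import geopandas as gpd
import pandas as pd

schools = gpd.read_file("schools.geojson")      # placeholder
districts = gpd.read_file("districts.geojson")  # placeholder

joined = gpd.sjoin(schools.to_crs(districts.crs), districts,
                   how="left", predicate="within")

# Red flag 1: features that matched no polygon (did the join fail?).
# With how="left", unmatched rows get NaN in the index_right column.
print("unmatched schools:", joined["index_right"].isna().sum())

# Red flag 2: gaps after an attribute join.
census = pd.read_csv("census.csv")  # assumed 'district_code' and 'population'
merged = districts.merge(census, on="district_code", how="left")
print("districts missing population:", merged["population"].isna().sum())
```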
Remember: Perfect integration is rare. The goal is to combine data thoughtfully, understand the limitations, and document what you did so others can evaluate your work.