Free GIS Book: Metadata
Metadata
Introduction
This chapter of the FreeGIS Book is about geospatial metadata. It first provides a simple definition and explanation of geospatial metadata. It then considers the following topics:
- The Creation of Metadata
- The Use of Metadata
- The Content of Metadata
- Why Metadata Is Important
What You Should Already Know
There are some basic things you need to know about GIS if you want to get the most benefit out of this chapter. Before reading this chapter you should know and understand the following things:
- What a Geographic Information System is and what it can do.
- The differences between the two basic types of GIS Data, raster data and vector data.
- How GIS data is organized into geospatial datasets.
- How GIS data is collected and created.
What You Will Learn
After reading this chapter and completing the exercises you should have learned the following things:
- How metadata is created.
- How metadata is used.
- How metadata standards can be used to guide the content and creation of metadata.
- Why it is important to produce and use metadata.
The Definition Of Metadata in a Geospatial Context
Metadata is often defined as "data about data". For example, the metadata for a document might include the following information:
- The purpose of the document.
- The date the document was written.
- The date the document was last revised.
- The author.
Geospatial metadata is information about a geospatial dataset that may not be contained in, or may not be readily apparent from, the geospatial dataset itself. For example, you could view the coordinates of feature geometry in a vector data set but be unable to determine the spatial reference system from the coordinate values alone. You could view the name of a feature attribute, but the name may not indicate what aspect of the real world that attribute represents. This is the type of information that can be encapsulated in geospatial metadata.
The Creation of Metadata
Metadata can be created by various organizations involved in creating, maintaining, and producing geospatial data. Metadata can be created by organizations that actually create spatial data. This can be an organization converting information from paper maps into digital format, or an organization that creates geospatial data from mapping measurements made in the field or collected from sensors. Metadata can be created by organizations that provide geospatial data to others. This might be an organization that maintains a catalog of GIS data, or an organization that takes geospatial data produced by others and then enhances it. Metadata can be created by organizations that maintain geospatial data. For example, a tax assessor may be responsible for maintaining a geospatial data set of parcels. Metadata can also be created by organizations that consume or use geospatial data. For example, a local utility company might use a geospatial data set of parcels maintained by the tax assessor to manage customer information in its own GIS. Each of these organizations may be interested in different metadata elements or metadata content. For example, an organization that actually creates geospatial data may be interested in documenting the process of data creation and the quality of the results, while an organization that maintains geospatial data might be more interested in tracking edits to geospatial data.
Metadata can be created or modified at various points in the lifecycle of a geospatial dataset. This is implied by the types of organizations identified in the previous paragraph, but let's look at metadata as it relates to the geospatial data set lifecycle a little more closely.
Metadata can be created or modified at the following points in the geospatial data set lifecycle:
- When geospatial data is created.
- When geospatial data is cataloged or organized.
- When geospatial data is modified or edited.
- When geospatial data is archived or "decommissioned".
What is Metadata Used For?
Metadata is used to provide information to potential users of a data set. For example, it can tell a user if a vector data set has the required set of attribute data, or if a raster data set is new enough for their particular needs. In this way metadata can be very helpful for users, because they don't have to open a data set in a GIS software package and examine it for this information. This can save users time when they are searching for geospatial data to use in their GIS.
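As a rough illustration, the Python sketch below screens candidate datasets using only their metadata records, so no dataset has to be opened. The record layout (`name`, `attributes`, `last_updated`) and the sample datasets are hypothetical and do not come from any particular metadata standard.

```python
from datetime import date

# Hypothetical metadata records; the field names are illustrative only.
catalog = [
    {"name": "parcels_2006", "attributes": {"APN", "OWNER"}, "last_updated": date(2006, 5, 1)},
    {"name": "parcels_2008", "attributes": {"APN", "OWNER", "ZONING"}, "last_updated": date(2008, 3, 15)},
    {"name": "roads", "attributes": {"NAME", "SURFACE"}, "last_updated": date(2008, 1, 10)},
]

def find_candidates(records, required_attributes, updated_since):
    """Return names of datasets whose metadata lists every required
    attribute and whose last update is on or after updated_since."""
    return [
        r["name"]
        for r in records
        if required_attributes <= r["attributes"] and r["last_updated"] >= updated_since
    ]

print(find_candidates(catalog, {"APN", "ZONING"}, date(2007, 1, 1)))
```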
Metadata is used to assist in the cataloging or indexing of information. Information contained in the metadata for a data set, but not contained in the data set itself, can be used to place the data in the appropriate category in the catalog or index. For example, geospatial data may be organized in a catalog according to spatial reference system. The spatial reference system for a data set may not be contained in the data set itself, but is usually contained in the metadata for the data set. In this example, metadata would be key to placing a data set in the appropriate category. Other examples of this type include organizing geospatial data by the date it was created or last updated, or by the entity or organization responsible for maintaining the data set.
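Cataloging by a metadata element can be sketched in a few lines of Python. This minimal example groups hypothetical metadata records by the spatial reference system recorded in each one; the field name `spatial_reference` and the dataset names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records; "spatial_reference" is an illustrative field name.
records = [
    {"name": "parcels", "spatial_reference": "NAD83 / California zone 3 (ftUS)"},
    {"name": "roads", "spatial_reference": "NAD83 / California zone 3 (ftUS)"},
    {"name": "hydrography", "spatial_reference": "WGS 84"},
    {"name": "old_scans", "spatial_reference": None},  # metadata lacks this element
]

def build_catalog(records):
    """Group dataset names by the spatial reference system recorded in
    their metadata; datasets without that element go under 'Unknown'."""
    catalog = defaultdict(list)
    for r in records:
        catalog[r.get("spatial_reference") or "Unknown"].append(r["name"])
    return dict(catalog)

for srs, names in sorted(build_catalog(records).items()):
    print(f"{srs}: {', '.join(names)}")
```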
Metadata can be used to prevent the abuse, misuse, or harmful use of GIS data. Metadata can indicate the security level needed to access a data set containing sensitive information. (For example: The location of municipal wells or other water sources.) Metadata can also indicate the purpose of a data set, which can alert potential users to uses of data that might not be appropriate. (For example: Using an urban road data set created for a system to manage addresses may not be appropriate for use in determining alternatives for an emergency evacuation plan.) Metadata can also record restrictions on the use of the dataset that its creators or maintainers impose through a data license or other means.
Metadata Content
What should be included in geospatial Metadata? This depends a great deal upon the following factors:
- The basic data type of the geospatial data. (Metadata for vector data will likely have different content than metadata for raster data.)
- The format of the data. (Metadata for geospatial data stored in a DXF file will likely have different content than metadata for geospatial data stored in an ESRI Shapefile.)
- The organization producing or maintaining the data. (A fire department may use different content for metadata on city addresses than a tax assessor would.)
- The metadata standard used to create the metadata. (See the later section on metadata standards for more information.)
- The intended use of the metadata. (Metadata designed for a GIS data catalog will likely have different content than metadata designed for high-security geospatial data.)
We can identify some common metadata elements despite these different factors influencing the content of metadata.
Common Elements of Metadata
Identification Information: This is information that allows the geospatial data set to be uniquely identified and distinguished from other data sets. This information would also assist in the cataloging of the geospatial dataset. This would include the name of the dataset, keywords, basic description, and the geographic extents of the dataset.
Data Quality Information: This is information about the data quality of the geospatial data set. This could include information about the completeness of the data set, the processes used to create and maintain it, and the amount of validation or verification performed on the dataset.
Spatial Data Representation Information: This is information about the spatial representation of the dataset. It could include information about the precision and accuracy of vector geometry or the resolution of raster data.
Non-Spatial (Attribute or Tabular) Data Information: This is information about the attribute data associated with features in a geospatial dataset in vector format, or attribute data associated with cells of a geospatial dataset in raster format. This could include the meaning of attribute names, the valid values, domain, or range for attribute values, and the method used to collect and update attribute values.
Distribution Information: This is information that can be used to govern the distribution of the geospatial dataset. It includes the identity of the organization creating and maintaining the data set, and the date the dataset was published or made available to the public.
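One way to picture these common elements is as a nested record. The Python dataclasses below are a minimal sketch that mirrors the five groups described above; the class and field names are hypothetical and are not taken from FGDC, ISO 19115, or any other published standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class IdentificationInfo:
    name: str
    description: str
    keywords: List[str] = field(default_factory=list)
    extent: Optional[Tuple[float, float, float, float]] = None  # (west, south, east, north)

@dataclass
class DataQualityInfo:
    completeness: str = ""
    process_steps: List[str] = field(default_factory=list)
    validation: str = ""

@dataclass
class SpatialRepresentationInfo:
    representation: str = "vector"              # "vector" or "raster"
    horizontal_accuracy: Optional[str] = None   # for vector geometry
    cell_size: Optional[float] = None           # for raster data

@dataclass
class AttributeInfo:
    name: str
    definition: str                             # what the attribute means
    domain: Optional[str] = None                # valid values, domain, or range
    collection_method: Optional[str] = None

@dataclass
class DistributionInfo:
    organization: str
    publication_date: Optional[str] = None

@dataclass
class DatasetMetadata:
    identification: IdentificationInfo
    distribution: DistributionInfo
    quality: DataQualityInfo = field(default_factory=DataQualityInfo)
    spatial_representation: SpatialRepresentationInfo = field(
        default_factory=SpatialRepresentationInfo)
    attributes: List[AttributeInfo] = field(default_factory=list)

meta = DatasetMetadata(
    identification=IdentificationInfo(
        name="County Road Centerlines",
        description="Road centerlines for the street sign inventory.",
        keywords=["roads", "transportation"],
    ),
    distribution=DistributionInfo(organization="County Public Works"),
    attributes=[AttributeInfo("SURFACE", "Driving surface type", domain="PAVED, GRAVEL, DIRT")],
)
print(meta.identification.name, len(meta.attributes))
```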
Using a Metadata Standard
What is a metadata standard?
A metadata standard or specification is a document that contains rules or guidelines for the content of metadata. These standards can be created specifically for use within a particular organization, or for much broader adoption. They allow for the creation of tools, often with a graphical interface, that can be used to create and modify metadata that meets the standard. They may also allow the metadata to be validated against the standard. This is another way of saying that instances of the metadata can be checked for conformance with the standard.
Metadata standards often contain rules or guidelines about the following items:
- The content of the metadata. This would include the required elements of the metadata.
- The format of the metadata. A metadata standard may specify a specific file format for metadata documents, such as XML or a delimited text format.
- Creation and update policies. These policies would govern when instances of metadata are created and when they are updated.
- Conformance Rules. These rules are used to determine if a particular instance of metadata complies with the standard.
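Conformance checking can be automated once the rules are written down. The sketch below assumes a hypothetical in-house standard stored as XML, where a conformance rule simply requires certain elements to be present and non-empty. The element names are invented for this example; real standards define far more detailed rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house standard: element paths that every metadata
# document must contain, each with non-empty text.
REQUIRED_ELEMENTS = [
    "identification/title",
    "identification/abstract",
    "spatialReference/horizontalDatum",
    "distribution/organization",
]

def check_conformance(xml_text, required=REQUIRED_ELEMENTS):
    """Return a list of problems; an empty list means the document passes
    this simplified conformance rule."""
    root = ET.fromstring(xml_text)
    problems = []
    for path in required:
        element = root.find(path)
        if element is None:  # compare with None: childless elements are falsy
            problems.append(f"missing element: {path}")
        elif not (element.text or "").strip():
            problems.append(f"empty element: {path}")
    return problems

doc = """<metadata>
  <identification>
    <title>Parcel Map 05-062</title>
    <abstract></abstract>
  </identification>
  <distribution><organization>County Surveyor</organization></distribution>
</metadata>"""

for problem in check_conformance(doc):
    print(problem)
```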
Why use a metadata standard?
We have mentioned some of the benefits of using a metadata standard already. These benefits include the ability to use tools that can ease creation and maintenance of metadata conforming to the standard and the ability to automatically check a metadata instance for conformance with the standard. Another benefit is clarity about metadata requirements and content within your organization, or between organizations that share geospatial data.
Cooking Up Your Own Metadata Standard
Why cook up your own metadata standard? Unless there is specific information you need to capture in your metadata, you probably don't need to cook up your own metadata standard. One advantage of a standard is widespread use. This benefit will be eroded if your organization maintains its own, independent metadata standard.
Still, there may be situations in which it makes sense to track custom metadata. In this case it would be prudent to develop a standard of your own that governs how this metadata is created, stored, and used.
What are some things you need to consider when cooking up your own metadata standard? Here is a short list:
- What metadata elements will I want to capture that are not part of an existing standard? How will this information be represented in my metadata?
- Where in my organization are these metadata elements produced?
- At what point in the process are they produced?
- What procedures do I need to have in place to capture these metadata elements?
- Can I extend an existing metadata standard to capture this information?
- What file format will I use to store my metadata?
- Will this information be shared outside of my organization? If so, how will it be formatted? Will it need to be modified before it is shared?
Why Is Metadata Important?
This article has already mentioned some of the benefits that come from preparing and maintaining metadata for geospatial information. In this section we expand the discussion of the benefits of metadata and focus on three (3) specific ways in which metadata is important. They are (1) ensuring the proper use of geospatial data, (2) using metadata to explain the spatial component of geospatial data, and (3) coordinating data collection to avoid data duplication.
Ensuring The Proper Use of Geospatial Data
One of the most important benefits of metadata is ensuring the proper use of geospatial data. Geospatial data is typically created for a specific purpose or use. This purpose or use is not always evident from the data itself. Problems can (but don’t always) arise when geospatial data is used for purposes that it was not designed for. Here are some examples that can help you appreciate what type of problems we are talking about.
Example #1: A county public works department creates a geospatial data set representing road centerlines. This data set is designed to allow county maintenance crews to map the location of street signs as part of a street sign inventory. No address information suitable for geocoding is collected as part of the road centerline dataset, and topology for the road centerlines is not created. A company specializing in fleet management solutions for package delivery companies acquires the road centerline data set and attempts to use it to model address location and traffic routing.
Example #2: A utility district collects the location of storm water manholes, storm water pump stations, and storm water outlets as part of a one-time project to determine the value of the utility district's physical assets. The data is not maintained with changes to the storm water system. Years after the original geospatial data set is created the utility district tries to use the information about storm water outlets for a pollution monitoring program.
Example #3: A city tax collector maintains a geospatial data set of approximate parcel lines for all of the real estate within the city limits that it assesses. This geospatial dataset does not show possible gaps, overlaps, encroachments or other problems that may exist in parcel boundaries. A company that manages commercial real estate uses this information to show potential purchasers of commercial property where their property lines are.
These three (3) examples illustrate some of the situations that arise because metadata that conveys the intent or purpose of a geospatial dataset is never created or does not contain the necessary information. All three (3) situations could have been prevented if the producers of the datasets created and maintained good metadata and if the potential users of the datasets reviewed this metadata before using the datasets.
Metadata should always identify the purpose of the geospatial dataset it documents and should identify appropriate uses for the data. It is also helpful for the metadata to identify improper or potentially harmful uses of the geospatial dataset it documents.
Where is 5000, 5000, 100? (Using Metadata To Explain The Spatial Component of Geospatial Data)
One of the most frustrating (and common) problems that new GIS users have when acquiring and using existing geospatial data is determining which spatial reference system the coordinates of the dataset are expressed in. The actual ordinate values of feature geometry can provide some clues. (As an example, spatial data referenced to the NAD 83 California State Plane Coordinate System, using United States Survey feet for the distance unit, will have easting and northing values in the six million and two million range, respectively.) However, it is not always possible to determine the spatial reference system for geospatial data based on the ordinate values of the feature geometry, and relying on those values alone can lead to an erroneous identification of the spatial reference system in some cases.
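The coordinate-range clue can be expressed as a simple check. This sketch assumes loose, unofficial bounds of roughly 5 to 8 million feet for eastings and 1 to 3 million feet for northings; a match only suggests a spatial reference system, which is exactly why the metadata should state it.

```python
# Assumed, deliberately loose bounds in US survey feet for NAD 83
# California State Plane coordinates. Illustration only; these are
# not official zone limits.
CA_STATE_PLANE_FTUS = {"easting": (5_000_000, 8_000_000),
                       "northing": (1_000_000, 3_000_000)}

def fits_ranges(coords, ranges):
    """True if every (easting, northing) pair lies inside the ranges.
    A True result is only a clue: other spatial reference systems can
    produce coordinates in the same ranges."""
    (emin, emax), (nmin, nmax) = ranges["easting"], ranges["northing"]
    return all(emin <= e <= emax and nmin <= n <= nmax for e, n in coords)

sample = [(6_250_000.0, 2_010_000.0), (6_251_300.5, 2_011_800.2)]
print(fits_ranges(sample, CA_STATE_PLANE_FTUS))               # True
print(fits_ranges([(5000.0, 5000.0)], CA_STATE_PLANE_FTUS))   # False
```

A point like 5000, 5000 fails the check, but that says nothing about what system it does use; it could be a local survey grid or something else entirely.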
If the spatial reference system of a geospatial dataset is not known, or can’t be determined, the geospatial data set loses much of its value. Although spatial analysis can still be performed on features within the same data set, the data set can’t be reliably analyzed together with other geospatial data sets for the same location that use a different spatial reference system.
As a result, one of the most important uses of metadata is to identify the details of the spatial reference system of a geospatial data set. This includes the following items (not all items will be required for each dataset):
- Horizontal Datum
- Vertical Datum
- Map Projection
- Referenced Survey Control Points and/or Vertical Benchmarks
- Ellipsoid Model and Geoid Model
- Epoch Date and Datum or Control Network Adjustment Information
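A metadata tool can flag which of these items are missing. In this minimal Python sketch, all items are optional because not every item applies to every dataset, and the caller decides which ones a given dataset requires. The class and field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SpatialReferenceMetadata:
    """Spatial reference items from the list above; all optional because
    not every item applies to every dataset."""
    horizontal_datum: Optional[str] = None
    vertical_datum: Optional[str] = None
    map_projection: Optional[str] = None
    control_points: Optional[str] = None
    ellipsoid_and_geoid: Optional[str] = None
    epoch_and_adjustment: Optional[str] = None

def missing_items(srs, required):
    """Return the names of required items that are not filled in."""
    valid = {f.name for f in fields(srs)}
    unknown = set(required) - valid
    if unknown:
        raise ValueError(f"unknown items: {sorted(unknown)}")
    return [name for name in required if not getattr(srs, name)]

srs = SpatialReferenceMetadata(horizontal_datum="NAD 83",
                               map_projection="Lambert Conformal Conic")
# Suppose this dataset carries elevations, so a vertical datum is required too.
print(missing_items(srs, ["horizontal_datum", "map_projection", "vertical_datum"]))
```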
Coordinating Data Collection and Avoiding Data Duplication
An important use of metadata is to identify geospatial datasets in the context of larger groups of geospatial datasets. (For example: Metadata could identify a geospatial dataset of small business office locations created in the last 2 years in the context of datasets containing information for all businesses within the same area, or for datasets containing other demographic data.) This allows cooperating organizations or interested parties to avoid creating or modifying a geospatial dataset when a dataset that meets their needs already exists. Since the costs of geospatial data creation and modification are substantial, this is an important use of metadata that can result in direct financial savings.
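A simple version of this check compares a planned dataset with catalog entries by extent and keywords. The sketch below assumes every catalog entry records a bounding box (west, south, east, north) in the same spatial reference system and a set of keywords; the catalog contents are made up for illustration.

```python
from typing import List, Set, Tuple

# (west, south, east, north); all extents assumed to share one spatial reference system.
BBox = Tuple[float, float, float, float]

def overlaps(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes share any area or touch at an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def possible_duplicates(extent: BBox, keywords: Set[str], catalog: List[dict]) -> List[str]:
    """Names of catalogued datasets that overlap the planned extent and
    share at least one keyword with the planned dataset."""
    return [entry["name"] for entry in catalog
            if overlaps(extent, entry["extent"]) and keywords & entry["keywords"]]

catalog = [
    {"name": "business_licenses_2007", "extent": (-121.6, 38.4, -121.2, 38.7),
     "keywords": {"business", "demographics"}},
    {"name": "school_sites", "extent": (-121.6, 38.4, -121.2, 38.7),
     "keywords": {"education"}},
    {"name": "business_sites_other_county", "extent": (-118.5, 33.7, -118.1, 34.1),
     "keywords": {"business"}},
]
print(possible_duplicates((-121.5, 38.5, -121.3, 38.6), {"business"}, catalog))
```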
Keeping Metadata Up-To-Date
As mentioned previously in this chapter, metadata can be created or modified at the following points in the geospatial data set lifecycle:
- When geospatial data is created.
- When geospatial data is cataloged or organized.
- When geospatial data is modified or edited.
- When geospatial data is archived or "decommissioned".
It is important to keep metadata up-to-date at these points. Metadata that is out-of-date, or that contains incorrect information, can be as dangerous as, or even more dangerous than, a geospatial dataset with no metadata at all. Here is an example illustrating this case:
A county planning department uses a parcel dataset and road centerline dataset to determine if proposed residential subdivisions will have sufficient access to public roads. One of the planners at the county receives a phone call asking about the development potential of a parcel in the county. The planner doesn’t realize that all public roads without a paved driving surface have been removed from the road centerline dataset by someone in the public works department and placed in a new dataset for unimproved roads. He informs the caller that the parcel won’t be eligible for residential development unless a new public access road is built connecting it to the nearest highway, because it isn’t currently served by any public roads. The planner doesn’t know that the parcel is in fact connected to a public road, but it is a gravel road that no longer appears in the road centerline dataset he is using.
What can we learn from this simple example? A couple of things stand out:
- Any major edit or modification to a geospatial dataset should be captured in the metadata for that dataset.
- Users of a geospatial dataset, even those within the same organization, should be informed of changes to existing datasets that they utilize. Updated metadata can be a useful vehicle to communicate and track these changes.
Organizations that create and modify geospatial data should have policies in place that guide the process used to update the metadata for geospatial datasets subject to these changes. Organizations that use geospatial data should have policies in place that guide the process used to track changes to the data they utilize, and to communicate these changes to the impacted users in their organization. This may include obtaining regular data updates and coordination with outside organizations.
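Capturing edits in metadata can be partly automated. The following sketch keeps a change log inside a hypothetical metadata dictionary and flags metadata as stale when the data was modified after the metadata was last revised. The field names and the sample edit, loosely based on the planning example above, are assumptions.

```python
from datetime import datetime

def record_edit(metadata, editor, description, when=None):
    """Append a change-log entry for a major edit and update the
    metadata revision date."""
    if when is None:
        when = datetime.now()
    metadata.setdefault("change_log", []).append({
        "date": when.isoformat(timespec="seconds"),
        "editor": editor,
        "description": description,
    })
    metadata["last_revised"] = when
    return metadata

def is_stale(metadata, data_last_modified):
    """True if the data changed after its metadata was last revised."""
    return data_last_modified > metadata["last_revised"]

roads = {"name": "road_centerlines", "last_revised": datetime(2007, 6, 1)}
print(is_stale(roads, datetime(2008, 1, 15)))   # True: data edited, metadata not
record_edit(roads, "Public Works GIS",
            "Removed unpaved public roads; moved to dataset 'unimproved_roads'.",
            when=datetime(2008, 1, 15))
print(is_stale(roads, datetime(2008, 1, 15)))   # False
```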
Individual Feature Metadata
Metadata is typically associated with a geospatial dataset as a whole, or as a group. Metadata is often not associated with individual features within a dataset. Using individual feature metadata is worth considering. What type of information would be stored in individual feature metadata? It could include some of the following items:
- The source of the data used to create the geometry and attribute values of the feature.*
- The organization that created the feature, the geometry representing this feature, or the feature attributes.*
- The organization responsible for maintaining the feature.*
- A description of the real world object that this feature represents.
- A date indicating when this feature “expires” or when it needs to be updated.
- User comments related to the feature.
- Creation Date and Time
- Creation Notes
- Date and Time of Last Modification
- List of Modifications
It may not be necessary to have information for each one of the items above. In particular, the items marked with an asterisk may not need to be completed and maintained unless the information is different for the specific feature when compared to the dataset as a whole. For example, it may be that a geospatial dataset contains hydrographic features that are maintained by a state government agency, with the exception of features representing navigable waterways used for commerce, which might be created and maintained by a federal agency. The identity of the federal agency might be listed as the maintainer organization on the individual feature metadata for those particular bodies of water, while it is presumed that the state agency is the maintainer for all the other bodies of water in the dataset.
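The fallback rule for the asterisked items can be sketched as a small lookup: use the feature's own value when one is stored, otherwise inherit the dataset-level value. The item names and agency names below are hypothetical.

```python
# Hypothetical keys for the asterisked items: stored per feature only
# when they differ from the dataset-level value.
INHERITABLE = {"source", "creator", "maintainer"}

def feature_metadata_value(item, feature_meta, dataset_meta):
    """Return the feature's own value for an item if present; for the
    inheritable items, fall back to the dataset-level value."""
    if item in feature_meta:
        return feature_meta[item]
    if item in INHERITABLE:
        return dataset_meta.get(item)
    return None

dataset_meta = {"maintainer": "State Water Agency"}
features = {
    "Lake A": {},
    "Commercial Channel": {"maintainer": "Federal Waterways Agency"},
}
for name, meta in features.items():
    print(name, "->", feature_metadata_value("maintainer", meta, dataset_meta))
```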
Automated Metadata Management
What Is Automated Metadata Management?
Automated metadata management uses a computer program to generate metadata with a minimum of human interaction. An example would be a computer program that automatically generates a list of image files and their corresponding world files in a particular folder of the computer filesystem. Such a tool could be set up to scan the folder on a regular basis to keep a current inventory of the images in the folder. Another example would be a program that parses, or reads through, an ESRI Shapefile to determine the geometry type of the features and that creates a list of the feature attributes.
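The first example can be sketched in Python with only the standard library. This version assumes the common world file naming conventions (for example `.tfw` or `.tifw` for a `.tif` image, or `.wld`) and a hypothetical set of image extensions; it is a sketch, not a complete inventory tool.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}

def world_file_candidates(image: Path):
    """Possible world file paths for an image, using the common naming
    conventions: first and last letters of the extension plus 'w'
    (.tif -> .tfw), the full extension plus 'w' (.tif -> .tifw), or .wld.
    Only lowercase extensions are generated, so on a case-sensitive
    filesystem an upper-case world file such as A.TFW is not found."""
    ext = image.suffix.lower().lstrip(".")
    return [image.with_suffix("." + ext[0] + ext[-1] + "w"),
            image.with_suffix("." + ext + "w"),
            image.with_suffix(".wld")]

def inventory(folder):
    """Map each image file name in folder to its world file name, or None."""
    result = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            world = next((w for w in world_file_candidates(path) if w.exists()), None)
            result[path.name] = world.name if world is not None else None
    return result

# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.tif", "a.tfw", "b.jpg", "c.png", "c.pngw", "notes.txt"]:
        (Path(folder) / name).write_text("")
    demo = inventory(folder)
print(demo)
```

Running a script like this on a schedule, for example with cron on Unix-like systems or Task Scheduler on Windows, would provide the regular scan described above.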
Why Would You Want To Use Automated Metadata Management?
Despite its importance in any GIS, metadata creation and maintenance is often neglected because of the time and energy it requires. Creating and maintaining the metadata for your GIS is like changing the oil in your car. You don’t really notice you’ve neglected to do it until you ruin an expensive engine. Imagine for a moment that your automotive maintenance was performed for you automatically when you pulled into your garage at home. That is the type of benefit that automated metadata management can bring to your GIS.
What Are The Limitations Of Automated Metadata Management?
One obstacle to widespread use of automated metadata management is the cost of acquiring or developing the computer programs that produce it. Off-the-shelf or out-of-the-box solutions can be expensive. They may not capture all the metadata elements that you need, as metadata needs are often unique to each organization. Software development is not a simple task, and developing custom metadata tools is not always practical.
Another limitation of automated metadata management is the lack of the human element. (This is also, ironically, one of its advantages.) This lack of human interaction causes two challenges for automated metadata management.
[1] Automated metadata management tools typically need some type of event to trigger their execution. Connecting these tools to the other components of a GIS so they can be properly triggered can be a challenge. (Each stage of the metadata lifecycle should be considered when determining how to trigger the execution of automated metadata management tools.)
[2] The content of metadata generated by an automated process is often limited to two types of information:
(1) Information that is already present, but not clearly evident from or separated in, existing geospatial data. (An example would be the number of layers in a DXF file, or the color type of a georeferenced JPEG image.)
(2) Information that is set up prior to the execution of a tool and that applies to more than one geospatial dataset or more than one part of a geospatial dataset. (An example of this would be the organization identity and contact information for a geospatial dataset. If an organization created multiple datasets, then this information could be reused by an automated metadata tool during metadata creation.)
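Both kinds of information can be combined in a short program. This sketch reads type (1) information (the geometry type, bounding box, and attribute fields) from the headers of a Shapefile's `.shp` and `.dbf` files, following the published Shapefile and dBASE header layouts, and merges it with type (2) information prepared in advance. The organization details are placeholders, and synthetic headers are built in memory so the example runs without real files.

```python
import struct

# Shape type codes from the ESRI Shapefile Technical Description.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
               11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
               21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}

# Type (2) information: set up once, reused for every dataset.
# These values are hypothetical placeholders.
ORGANIZATION_DEFAULTS = {"organization": "Example County GIS",
                         "contact": "gis@example.org"}

def read_shp_header(shp: bytes) -> dict:
    """Type (1) information: geometry type and bounding box taken from
    the 100-byte header of a .shp file."""
    if struct.unpack(">i", shp[0:4])[0] != 9994:      # big-endian file code
        raise ValueError("not a shapefile (.shp) header")
    shape_type = struct.unpack("<i", shp[32:36])[0]   # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", shp[36:68])
    return {"geometry_type": SHAPE_TYPES.get(shape_type, f"unknown ({shape_type})"),
            "extent": (xmin, ymin, xmax, ymax)}

def read_dbf_fields(dbf: bytes) -> list:
    """Type (1) information: attribute names, types and widths from the
    .dbf header. Field descriptors are 32 bytes each, start at byte 32,
    and end with a 0x0D terminator."""
    fields, offset = [], 32
    while dbf[offset] != 0x0D:
        desc = dbf[offset:offset + 32]
        fields.append({"name": desc[0:11].split(b"\x00", 1)[0].decode("ascii"),
                       "type": chr(desc[11]),
                       "width": desc[16]})
        offset += 32
    return fields

def generate_metadata(shp: bytes, dbf: bytes) -> dict:
    """Combine preset organization details with facts read from the data."""
    record = dict(ORGANIZATION_DEFAULTS)
    record.update(read_shp_header(shp))
    record["attributes"] = read_dbf_fields(dbf)
    return record

# Synthetic headers stand in for real files here; in practice you would
# pass Path("roads.shp").read_bytes() and Path("roads.dbf").read_bytes().
def _dbf_field(name, ftype, width):
    return (name.encode("ascii").ljust(11, b"\x00") + ftype.encode("ascii")
            + bytes(4) + bytes([width]) + bytes(15))

shp = (struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
       + struct.pack("<ii", 1000, 3)
       + struct.pack("<8d", 10.0, 20.0, 30.0, 40.0, 0.0, 0.0, 0.0, 0.0))
dbf = bytes(32) + _dbf_field("NAME", "C", 40) + _dbf_field("SURFACE", "C", 10) + b"\x0d"
meta = generate_metadata(shp, dbf)
print(meta)
```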
A Simple Metadata Example
Below is a simple example of geospatial metadata. This metadata was generated for DXF files of survey maps creating subdivisions of land. This is the type of DXF file that might be used within an organization.
Survey Map Name: Parcel Map 05-062
DXF File Name: PM_05-062.dxf
Date Drafting Completed: 2008-02-04
Drafter: Scott Huey
Maximum Parcel Closure Error: 0.68 Feet
Notes:
- The west line of Parcel 3 as shown on the Parcel Map is shown with a length of 536.85 feet. If this length is corrected to 563.85 feet the closure error for Parcel 3 is reduced by 27.00 feet. This correction was made in the DXF file representing this parcel map.
- The Parcel Map shows a Pacific Telegraph Company easement across Parcel 1 and Parcel 4. The width of the easement is not shown on the Parcel Map. The Parcel Map contains a surveyor’s note indicating that the width of the easement was not listed in the deed granting the easement. As a result this easement was not added to the Easement layer of the DXF as required by the drafting procedures.
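Because this example uses a simple `Key: Value` layout, it can also be read by a program. The sketch below parses it into a dictionary and compares the closure error with a tolerance; the parsing rules and the 1.0 foot tolerance are assumptions for illustration, not part of any drafting procedure described here.

```python
def parse_simple_metadata(text):
    """Parse 'Key: Value' lines into a dict. Lines beginning with '- '
    after a 'Notes:' line are collected into a list under 'Notes'."""
    record, notes, in_notes = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_notes and line.startswith("- "):
            notes.append(line[2:])
        elif line == "Notes:":
            in_notes = True
        elif ":" in line:
            key, value = line.split(":", 1)
            record[key.strip()] = value.strip()
    record["Notes"] = notes
    return record

def closure_error_feet(record):
    """Read 'Maximum Parcel Closure Error' as a number of feet."""
    value, unit = record["Maximum Parcel Closure Error"].split()
    if unit.lower() != "feet":
        raise ValueError(f"unexpected unit: {unit}")
    return float(value)

# The example above, with the notes shortened.
example = """Survey Map Name: Parcel Map 05-062
DXF File Name: PM_05-062.dxf
Date Drafting Completed: 2008-02-04
Drafter: Scott Huey
Maximum Parcel Closure Error: 0.68 Feet
Notes:
- West line of Parcel 3 corrected from 536.85 feet to 563.85 feet.
- Pacific Telegraph Company easement not drawn; width unknown.
"""

record = parse_simple_metadata(example)
TOLERANCE_FEET = 1.0  # hypothetical drafting tolerance
print(record["DXF File Name"], closure_error_feet(record) <= TOLERANCE_FEET)
```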