Response to INSPIRE Metadata Draft

From OSGeo

Latest revision as of 11:13, 30 March 2007

Any member of the Free and Open Source Geospatial software community is welcome to participate in creating this response. Please add your name to the list of participants, with contact details on your user page, at any key stage (first stable draft; when sending the response).

Due 15.03.2007? -- Stefan 19:11, 15 March 2007 (CET)

I am sure it was the 30th; perhaps the NAP draft was due on the 15th?

Right now I am trying to get the documents describing the form of the response from the JRC, and to change our contact details for the SDIC. -- Jo 23-03

The deadline is definitely the 30th March, thankfully. Comments must be submitted in this Excel template: http://www.ec-gis.org/inspire/sdic_call/templates/metadataIRComments.xls

The form for each comment is:

Chapter, section or clause no. | Paragraph/Figure/Table | Type of comment | Comment (justification for change) | Proposed change

So I am working on integrating Ian's and Jan's comments and reconnecting mine to chapter, verse and clause. I planned to put it all into the Excel template this evening, then I discovered that the template doesn't work properly in OpenOffice. So I'm just going to dump a pipe-delimited version of what has been collated into this wiki page for others to see, and worry about getting it into the proper format later in the week :/ It makes sense to do that anyway, because there is a lot of structuring to do to connect as much as possible directly to the draft... What is there now is only a start. :/
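Converting the pipe-delimited dump into the template's five columns is mechanical, so here is a minimal Python sketch. It assumes one comment per line, cells separated by "|", and no "|" inside the comment text. The function name is hypothetical, and the output is CSV that a spreadsheet can import, not the .xls template itself.

```python
import csv
import io

# Column order follows the JRC comment template header quoted above.
COLUMNS = [
    "Chapter, section or clause no.",
    "Paragraph/Figure/Table",
    "Type of comment",
    "Comment (justification for change)",
    "Proposed change",
]


def pipe_rows_to_csv(text: str) -> str:
    """Convert pipe-delimited comment rows into CSV text for spreadsheet import."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != len(COLUMNS):
            # A "|" inside comment text, or a missing cell, lands here.
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}: {line!r}")
        writer.writerow(cells)
    return out.getvalue()
```

Empty cells, such as a missing paragraph reference, come through as empty strings, which the spreadsheet leaves blank.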

-- Jo 26-03

Just to mention: I am doing another batch this evening and will upload it either later or first thing in the morning. Meanwhile, the JRC has replaced the spreadsheet template with one that works, which is great.

-- Jo 28-03

Participants

Responses

Note: these will be automatically re-sorted into chapter, clause etc. order when they are submitted, so there is no priority ordering and not much point in numbering them...

5.2.8 | | T | If a resource responsible party for the data is mandated, then contact details (an electronic mail address, an HTTP URL with details, or a phone number) should be included. If access to a data set is restricted and it is not available via a service, the lack of any means of contact will impede the flow of data and the ability to reuse and redistribute it. | Require contact details to be provided for a resource responsible party.
A.2.8 | | T | Contact details should be mandated in the ISO19115 mapping in order to facilitate access to the data set. These could be an email address, a URL or a telephone number. | Add mandatory contact details for the resource responsible party.
A.2.8.3 | | T | A contributor's "role" is not useful for discovery, evaluation or use of the data. In the interests of minimising the burden on the provider, it should not be a mandatory property. | Change Mandatory to Optional for the role of the resource responsible party.
A.4.1 | | T | If a resource responsible party for the metadata is mandated, then contact details (an electronic mail address, an HTTP URL with details, or a phone number) should be included. A property such as an email address can be used as an identifier to facilitate searches. Providing a contact for the metadata assists in its evaluation and reuse. | In the ISO19115 mapping, make contact details (email address, URL or phone number) mandatory if a metadata resource responsible party is provided.
5.2.12 | | T | A free-text lineage element is not structured enough for data evaluation. A more structured and detailed method of describing the accuracy and quality of data should be prescribed; otherwise providers will be able to claim "compliance" with a minimal level of effort that does not further the reuse of public sector information. | If lineage information is required, require it in a more structured form, which may overlap with the Data Specification work.
A.3.2 | | T | A free-text description of data "lineage" tries to perform too many functions at once. A more detailed and structured description of a data set's origins and processing history is needed to express and infer accurate data quality information. The conditions under which lineage must be provided are too vague. | Lineage should be structured information, not free text.
A.3.2 | | E | Presumably should read 'mandatory when...' | Should read 'mandatory when...'
5.3.4 | Table 3 | T | |
6.2 | | T | Identifiers are of two types. This is problematic: it is usually better to have only one type of identifier, and the preferred one should be a URL (URI). The document says that a URL may in some cases not be unique. To make identification reliably unique, combine a URL with a UUID, like this: http://gis.vsb.cz/01f8da38-10d7-11da-b569-000f1f1a7b03. This may be covered in the Data Specification Implementing Rules? | Recommend one preferred, URL-based identifier scheme.
6.3 | Paragraph 4 | T | A reference data set may not be available via a service, for many reasons including infrequent use, access restrictions, scarcity of resources or changes in distribution technologies. | The text should not assume that a reference data set is provided by a 'service'.
6.3 | Paragraph 13 | G | A provider may take a broad view of what constitutes "very fine grained" in order to exempt themselves from giving insight into the detail and structure of a data set. | A more detailed description of what constitutes "very fine grained" is required, together with details of which metadata is exempt for components of a highly detailed data set.
5.2.3 | Paragraph 1 | T | A geographic extent may be three-dimensional, and for some applications (geology, meteorology, climatology) vertical extent is crucial. It would be useful to have examples of how interior/exterior polygons are specified. Servers that only support minimal bounding boxes can degrade gracefully (since it is easy to calculate a minimum bounding rectangle (MBR) from a polygon), while other servers retain the full richness of polygons. The IR should not simply specify a bounding box but should outline the alternatives. | Change 'bounding box' to read 'bounding box, polygon or 3-dimensional set of polygons' (better wording)?
5.3.2 | Table 1 | T | Article 11.1.a dictates that services must be searchable on the basis of "the public authorities responsible [for the data]", i.e. the resource responsible party. | Resource responsible party should be a searchable property (marked with X).
5.3.2 | Table 1 | T | Even a simple search through a large volume of data records cannot rely on a limited range of topic classifications and user-supplied keywords. 'Abstract' is likely to be the most verbose user-supplied description from which useful inferences can be made. | Abstract should be a searchable property (marked with X).
G | Paragraph 1 | E | The meaning of "the confrontation of this interpretation with the discovery set" is unclear. | Rephrase "the confrontation of this interpretation" so that its meaning is clear.
G | Paragraph 2 | G | The text mentions "quality indicators", but the examples given in the table are all derived from the standard discovery metadata. These help to assess the potential suitability of a data set for a specific purpose, but provide no metric for the quality of a data set (which may be partially complete, or have different resolutions within different spatial extents). Article 11.1.a of the directive is specific about "the quality and validity of spatial data sets" as a *specific search criterion*. | Add a treatment of data quality metrics that can inform suitability for a particular purpose.
G | Table G.1 | T | A full-text description of data lineage is not enough to establish restrictions on use of data accurately or efficiently. The listing in Table G.1 blurs use constraints together with data processing and packaging information. | Lineage information should be expressed in a more structured way, and what lineage is meant to contain should be made clearer.
4.1 | Note 6, Paragraphs 4 & 5 | G | These paragraphs suggest that no description of the quality of data available via a service will be mandated by the IR. Service evaluation, and the means to talk about it in a structured way, will be key to a successful data infrastructure. If we can define quality as "fitness for use", then there are ways to transmit and reuse the evaluation of a service or data set for a specific purpose. Expectations that users of a data infrastructure will search "news and mail forums" or learn about service performance "by word of mouth" do not belong in a framework for a useful data infrastructure. | Paragraphs 4 and 5 should be removed, and this Note revised as a whole.
5.2.7 | | T | When implementing a spatial data or service discovery engine, it is necessary to be able to obtain authoritative identifying information about the service type that can be verified without human intervention. The service type should be mandated as a URN or other consistent identifier (OGC:WMS is commonly used), either from a supplied list of well-known standard services, or by referring to an external list in a structured way. | Service type should not be a free-text field, but a guaranteed unique and consistent property.
A.2.7 | | T | In order to be reusable in service discovery or service chaining applications, Service Type should not be a free-text field but a URN or other reusable and definitive description of a service. The ISO19115 model is deficient in this respect. ISO19119 provides structured descriptions of service types, and these are recommended in the current North American Draft Profile for metadata, the conceptual equivalent of this IR, which will replace the current Common Profile. | Service type should be an externally delimited identifier, not a "Free text" property.
4.6 | Paragraph 6 | T | The assertion that data should be available "preferably via a standard interface and/or standard encoding format such as XML" is too vague to be useful. An abstract model for discovery interfaces is common practice in the broader information retrieval community; it is needed in order to ensure and evaluate conformance. | This paragraph should either be removed or supplemented with more detail from the IR on Discovery Services.
A.3.8 | | T | The table refers to a code list for "distributed computing platform", but no reference to a code list is supplied. If a code list is not available, this should be a URN or a URL, with the ability to extend into a namespace supplied by user communities and future applications. | A reference to the code list should be supplied, *or* the property should be a URN or a reference to a URI.
I | | T | Temporal extent is optional in the Dublin Core mapping ("mandatory when it is meaningful for discovering and searching spatial data sets"). Arguably, the temporal characteristics of a data set are *always* useful when searching a large repository and evaluating results. | Temporal extent should be mandatory.
B.6 | | T | Keywords are just as useful for discovering and filtering services as they are for data sets and series. If they are mandatory for data sets, they should also be mandatory for services. | Change the obligation/condition either to M (mandatory) or to C (mandatory for datasets, dataset series, and services).
5.1 | Paragraph 9 | T | "The discovery metadata elements are defined at an abstract level in order to make the Implementing Rules independent of ...". If there is no specified encoding or mapping into the common standard implementations, there is likely to be so much variance in the encoding of metadata as to cause new classes of problems (meta-mapping between different, slightly discrepant models) while not solving the one we face now: the usability of metadata in search engine applications. There is such a thing as too futureproof. | It is hard to suggest a concrete change, as this is a deep conceptual issue.
5.3.3 | Table 2 | T | The Operation name should be moved to discovery level 1 and must be searchable; this is necessary for searching services. | Move 'Operation name' to discovery level 1.
5.1 | Paragraph 2 | T | "the user query expressed through the search interface of the search engine and provided in a form compatible with the metadata repository interface". This is unclear; does it indicate that each repository can have a different interface? | Provide examples of a repository interface, and/or references to the Discovery Services draft.
A.3.2 | | T | States that the value should be an Integer type, but the example given is a floating point value. | Either change the example to an integer, or change the data type to Float.
A.2.3.1 | | T | EX_GeographicBoundingBox should always be defined for services, not made conditional on their behaviour. A global-scope default can always be supplied. | Make EX_GeographicBoundingBox mandatory for all services.
A.1.2.2 | | T | "identifier: MD_identifier. condition: if the identifier is available". An identifier will probably always be available, and if not, it should be generated. | Remove "condition..." and mandate the provision of an identifier.
4.1 | Note 6, Paragraph 2 | T | "Services, including web services, are routinely measured in terms of availability and performance. These parameters are easily quantified and users can easily agree on their value: services that are available more often (seeking the elusive 99.9% "up-time") are more desirable than services that are available less often, and services that provide faster response time are more desirable than similar services that are slower to respond." This is incomplete. A user who wants to use a service needs to know more than its availability and speed: for example cost, the quality of the geodata used, the algorithms used and their quality, the possibility of chaining, the quality of the process's self-description, development, update, security, and the type of call (synchronous vs asynchronous). | Expand the description of service quality to include more useful metrics than uptime and response speed.
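The 6.2 comment suggests combining a URL with a UUID. Here is a minimal Python sketch of that idea, offered as an illustration under assumptions, not as anything the draft specifies. It reuses the http://gis.vsb.cz/ prefix from the example and generates random (version 4) UUIDs. The helper names `mint_identifier` and `split_identifier` are hypothetical.

```python
import uuid

# Hypothetical provider namespace, taken from the example in the 6.2 comment.
BASE_URL = "http://gis.vsb.cz/"


def mint_identifier(base_url: str = BASE_URL) -> str:
    """Return a new identifier: the provider URL followed by a random UUID."""
    return base_url.rstrip("/") + "/" + str(uuid.uuid4())


def split_identifier(identifier: str) -> tuple:
    """Split an identifier into (namespace URL, UUID).

    Raises ValueError if the last path segment is not a UUID.
    """
    base, _, suffix = identifier.rpartition("/")
    return base + "/", uuid.UUID(suffix)


if __name__ == "__main__":
    base, uid = split_identifier("http://gis.vsb.cz/01f8da38-10d7-11da-b569-000f1f1a7b03")
    print(base, uid.version)  # the example UUID is time-based (version 1)
    print(mint_identifier())
```

The URL part keeps the identifier resolvable and tied to a provider. The UUID part stays unique even if two providers, or two catalogues, happen to reuse the same path.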
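The 5.2.3 comment notes that a server supporting only bounding boxes can degrade gracefully, because an MBR is easy to derive from a polygon. A minimal Python sketch follows. It assumes planar (x, y) or (x, y, z) coordinates that do not cross the antimeridian, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Coordinate = Sequence[float]  # (x, y) or (x, y, z)


def minimum_bounding_rectangle(exterior: Sequence[Coordinate]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a polygon's exterior ring.

    Interior rings (holes) lie inside the exterior ring, so they never
    change the bounding box and can be ignored here.
    """
    if not exterior:
        raise ValueError("exterior ring has no vertices")
    xs = [c[0] for c in exterior]
    ys = [c[1] for c in exterior]
    return min(xs), min(ys), max(xs), max(ys)


def vertical_extent(exterior: Sequence[Coordinate]) -> Optional[Tuple[float, float]]:
    """Return (min_z, max_z) if the ring carries heights, else None."""
    zs = [c[2] for c in exterior if len(c) > 2]
    return (min(zs), max(zs)) if zs else None


ring = [(0, 0, -10), (4, 0, -10), (4, 3, -250), (0, 3, -250), (0, 0, -10)]
print(minimum_bounding_rectangle(ring))  # (0, 0, 4, 3)
print(vertical_extent(ring))  # (-250, -10)
```

A server could store the full polygon and compute the MBR on demand for clients that only understand bounding boxes. This keeps the vertical extent the comment asks for available to clients that can use it.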
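The 5.2.7 and A.2.7 comments ask for service types to be consistent identifiers rather than free text. The Python sketch below shows one way a discovery engine might check this. The controlled list, the `NAMESPACE:TOKEN` pattern and the community namespace in the usage note are all hypothetical; only OGC:WMS comes from the comment.

```python
import re
from typing import FrozenSet

# Hypothetical controlled list of well-known service types; only OGC:WMS
# is named in the comment, the other entries are illustrative.
KNOWN_SERVICE_TYPES = {"OGC:WMS", "OGC:WFS", "OGC:WCS", "OGC:CSW"}

# Hypothetical "NAMESPACE:TOKEN" shape for identifiers that refer into an
# externally maintained list (for example one run by a user community).
QUALIFIED = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:[A-Za-z0-9._-]+$")


def check_service_type(value: str, external_namespaces: FrozenSet[str] = frozenset()) -> str:
    """Return a normalised service type identifier, or raise ValueError for free text."""
    token = value.strip()
    if token.upper() in KNOWN_SERVICE_TYPES:
        return token.upper()
    # Accept identifiers from a registered external namespace as given.
    if QUALIFIED.match(token) and token.split(":", 1)[0] in external_namespaces:
        return token
    raise ValueError(f"not a recognised service type identifier: {value!r}")
```

For example, `check_service_type(" ogc:wms ")` returns `"OGC:WMS"`, while a free-text value such as `"Web Map Service"` is rejected. A value like `"EXAMPLE:gazetteer"` is accepted only when `"EXAMPLE"` is passed as a registered namespace.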

General responses that are harder to "fit" but should be offered.

5.2 | | T | Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this but does not say it directly). Section 5.3 then sets out an "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract discovery model (they probably *do* belong in some result record schema). These three attributes seem to belong specifically to a particular service binding (or, as already said, to a very specific kind of returned result record). What I'd really like to see is a much clearer statement of the purpose of the abstract discovery model. Once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of it. In the information retrieval community generally, it's considered really important to have a separate abstract model for discovery (the search access points) and then bind that model onto as many backend schemas as needed. This decoupling is seen as best practice in the information retrieval domain, and is the approach taken in the Z39.50 GEO profile (http://www.blueangeltech.com/standards/GeoProfile/geo22.htm). Most of my concerns here stem from the apparent 1:1 mapping between the abstract model and the implementation. | Add an explanation of the abstract discovery model, or a reference to where the Discovery Services draft covers it.
5.3.2 | | T | Section 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference onto "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes... Maybe I've misunderstood what is implied by Table 1 in 5.3.2. Also, issues similar to those for the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that Note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IR are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled-vocabulary time periods and structured temporal data. This seems a specific example where the abstract IR model needs to go beyond what is defined in the A2 binding. | What is indicated by a Temporal Reference should be consistent between 5.2.2 and 5.3.2, or clearer examples should be offered in the Annexes.
A | | T | What are the expected semantics of resource language for retrieval of language-neutral data sets? Should a result record not be selected because the user specified "Nor" as the search language, even though it matches other criteria (geographic extent, for example)? Normally in information retrieval the answer is "no", but this is less clear when result records aren't primarily text-based. (Actually, this is a slightly wider concern about Annex A and its "CharacterString" elements. In IEEE LOM, for example, there is a "LangString" element that has a "Lang" attribute: that community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants. The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language. That is OK, but possibly not optimal for datasets that aren't primarily language-based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?) | Add a treatment in Annex I that describes the specifics of search across multi-language elements; reconsider the expression of multi-language CharacterStrings in the light of information retrieval best practice (see comment on this change).
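The 5.2 comment argues for an abstract discovery model that is bound onto as many backend schemas as needed. The Python sketch below illustrates that decoupling. The access-point names and the `local_db` column names are invented for illustration. The Dublin Core terms are real, but the choice of term for each access point is only one possible mapping, not a normative one.

```python
from typing import Dict

# Abstract discovery access points, independent of any backend schema.
ACCESS_POINTS = ("title", "abstract", "responsible_party", "temporal_extent")

# Hypothetical bindings from abstract access points to backend field names.
BINDINGS: Dict[str, Dict[str, str]] = {
    "dublin_core": {
        "title": "dc:title",
        "abstract": "dcterms:abstract",
        "responsible_party": "dc:publisher",
        "temporal_extent": "dcterms:temporal",
    },
    "local_db": {  # invented column names for an in-house catalogue
        "title": "ds_name",
        "abstract": "ds_description",
        "responsible_party": "owner_org",
    },
}


def bind_query(query: Dict[str, str], backend: str) -> Dict[str, str]:
    """Rewrite a query on abstract access points into one backend's field names."""
    mapping = BINDINGS[backend]
    unknown = set(query) - set(ACCESS_POINTS)
    if unknown:
        raise ValueError(f"not abstract access points: {sorted(unknown)}")
    unsupported = set(query) - set(mapping)
    if unsupported:
        # The backend has no binding, so it cannot honour this part of the query.
        raise LookupError(f"{backend} has no binding for {sorted(unsupported)}")
    return {mapping[ap]: value for ap, value in query.items()}
```

Clients only ever see the abstract access points. Adding a new backend schema means adding a binding, with no change to the discovery model itself. Binding-specific details such as "Operation name" would live in the bindings or result record schemas, not in `ACCESS_POINTS`.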
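The 5.3.2 comment suggests keeping controlled-vocabulary periods and structured dates as separate access points. Here is one way to sketch that in Python. The vocabulary identifier is hypothetical, and the sketch makes no attempt to map period names onto dates. Python's `datetime.date` only covers years 1 to 9999, so a prehistoric period such as "Neolithic" could not be expressed as a structured date here in any case.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class PeriodTerm:
    vocabulary: str  # hypothetical identifier of a period vocabulary
    term: str        # e.g. "Neolithic"


@dataclass(frozen=True)
class DateInterval:
    start: date
    end: date


TemporalRef = Union[PeriodTerm, DateInterval]


def matches(query: TemporalRef, value: TemporalRef) -> bool:
    """Match within one access point; the two kinds are never compared."""
    if isinstance(query, PeriodTerm) and isinstance(value, PeriodTerm):
        # Same vocabulary, case-insensitive term comparison.
        return query.vocabulary == value.vocabulary and query.term.lower() == value.term.lower()
    if isinstance(query, DateInterval) and isinstance(value, DateInterval):
        # Closed intervals overlap if each starts before the other ends.
        return query.start <= value.end and value.start <= query.end
    return False
```

Keeping the two kinds apart means a period query never silently fails against a date, or the reverse. Any equivalence between period names and dates can be added later, once the study mentioned in Note 11 reaches a decision.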
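For the Annex A comment, here is a Python sketch of LangString-style multi-language values held in one record. It pairs them with one possible retrieval policy, in which the language filter is not applied to language-neutral records. The record layout, the three-letter language codes and the policy itself are assumptions for illustration, since the comment leaves the semantics open.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Record:
    # LangString-style title: one value per language code, in a single record.
    title: Dict[str, str]
    # None marks a language-neutral data set (for example an elevation grid).
    language: Optional[str] = None
    bbox: BBox = (0.0, 0.0, 0.0, 0.0)


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records: Sequence[Record], query_bbox: BBox, query_lang: Optional[str] = None) -> List[Record]:
    """Spatial filter, plus a language filter that language-neutral records bypass."""
    hits = []
    for rec in records:
        if not bboxes_intersect(rec.bbox, query_bbox):
            continue
        if query_lang and rec.language is not None and rec.language != query_lang:
            continue  # a text-based resource in another language
        hits.append(rec)
    return hits


def display_title(rec: Record, lang: str, fallback: str = "eng") -> str:
    """Pick the requested language variant, else the fallback, else any variant."""
    return rec.title.get(lang) or rec.title.get(fallback) or next(iter(rec.title.values()))
```

Under this policy, a search for language "nor" over a matching area returns a language-neutral grid as well as Norwegian-language records. A record titled `{"eng": "Hello", "dan": "Hej"}` then displays as "Hej" for a Danish user and "Hello" otherwise.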