(chapter-idn)=
# Use Case - Collection Search (IDN)

This chapter describes, step by step, how to implement a WGISS
OpenSearch client, including how to retrieve the OSDD for the collection of interest and
how to build an OpenSearch request.

In [None]:
import re
import json, requests, xml
# import pandas as pd
# import ipywidgets as widgets

from xml.dom import minidom
# from IPython.display import Image
from xml.etree import ElementTree
# from IPython.display import HTML
from IPython.display import Markdown as md

In [None]:
def get_api_request(template, os_querystring):
    """Fill a URL template with the OpenSearch parameter values in os_querystring.

    Returns a short HTTP URL from which all unfilled template parameters are removed.
    """
    # Limitation: the OSDD may use a default namespace for OpenSearch instead of using "os".
    # We make a simple correction here so that OpenSearch queryables can be used without namespace in requests.
    # A more generic solution to obtain namespaces from the OSDD and compare them with user supplied namespaces is future work.
    OS_NAMESPACE = 'os:'

    # perform substitutions in template
    for p, value in os_querystring.items():
        # Raw strings and re.escape avoid invalid escape sequences and regex
        # metacharacters in parameter names; the function replacement keeps
        # the value literal (no backslash processing).
        template, n = re.subn(r'\{' + re.escape(p) + r'.*?\}', lambda m: value, template)
        if n < 1:
            if ':' in p:
                print("ERROR: parameter " + p + " not found in template.")
            else:
                # try with explicit namespace
                template, n = re.subn(r'\{' + re.escape(OS_NAMESPACE + p) + r'.*?\}', lambda m: value, template)
                if n < 1:
                    print("ERROR: parameter " + OS_NAMESPACE + p + " not found in template.")

    # remove empty search parameters
    template = re.sub(r'&?[a-zA-Z]*=\{.*?\}', '', template)

    # remove remaining empty search parameters which did not have an HTTP query parameter attached (e.g. /{time:end}).
    template = re.sub(r'.?\{.*?\}', '', template)

    return template

## IDN Systems


The IDN, CMR OpenSearch (for IDN), and the GCMD’s Keyword Management Service (KMS)
each offer only an operational (production) system that end users can access.

- IDN site is available to all users.  Location: https://idn.ceos.org/
- OpenSearch API for IDN (via CMR). Production instance is available to all users.  Location: https://cmr.earthdata.nasa.gov/opensearch/
- KMS - production instance is available to all users.  Location: https://gcmd.earthdata.nasa.gov/kms/capabilities?format=html

The IDN site search interface and the CMR OpenSearch production instance provide access to all collections registered in the IDN. The KMS production instance provides access to all approved GCMD keywords registered by IDN providers.

(section_Retrieve_Collections_via_IDN_OpenSearch)=
## Retrieve Collections via IDN OpenSearch

CEOS OpenSearch supports searching for collections through the IDN. Searching for granules in a specific collection is supported at the data partners via the Granule Gateways (see chapter
“CWIC” and chapter ["FedEO"](chapter-fedeo)). The API executes a collection or inventory search, as appropriate, and returns the matching results. To create a valid request, clients have to obtain the IDN OpenSearch OSDD and fill the request
parameters with proper values.

**Step 1**  
>  Obtain the IDN OpenSearch OSDD to formulate a valid IDN OpenSearch request.

In [None]:
URL_OSDD = "https://cmr.earthdata.nasa.gov/opensearch/collections/descriptor_document.xml?clientId=ceosOpenSearchDoc"

The template of the OpenSearch request is available under the `<Url>` element corresponding to the media type (Atom) in the OSDD and is included below.

In [None]:
response = requests.get( URL_OSDD )
xmlstr = minidom.parseString(response.text).toprettyxml(indent='  ',newl='')
md("```xml\n" + xmlstr + "\n```\n")

**Step 2**  
>  Search collections of interest through IDN OpenSearch with proper request parameters. 

An example request can be formed as follows.

In [None]:
# find URL template for collection search
root = ElementTree.fromstring(response.text)

ns = {'os': 'http://a9.com/-/spec/opensearch/1.1/'}
collection_url_atom = root.find('os:Url[@rel="collection"][@type="application/atom+xml"]', ns)

collection_template = collection_url_atom.attrib['template']
collection_template
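
The same lookup can be tried offline. The following minimal sketch parses a hand-written OSDD fragment instead of the live document and guards against a missing `<Url>` element. The `SAMPLE_OSDD` string, its `example.org` URLs, and the variable names are assumptions made up for this illustration; the real OSDD contains more `<Url>` elements and parameters.

```python
from xml.etree import ElementTree

# Hypothetical, heavily shortened OSDD used only for illustration.
SAMPLE_OSDD = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
                       xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/">
  <ShortName>Sample</ShortName>
  <Url type="text/html" rel="collection"
       template="https://example.org/collections?keyword={searchTerms?}"/>
  <Url type="application/atom+xml" rel="collection"
       template="https://example.org/collections.atom?keyword={searchTerms?}&amp;numberOfResults={count?}&amp;uid={geo:uid?}"/>
</OpenSearchDescription>"""

sample_ns = {'os': 'http://a9.com/-/spec/opensearch/1.1/'}
sample_root = ElementTree.fromstring(SAMPLE_OSDD)

# Select the collection search template that returns Atom.
sample_url = sample_root.find(
    'os:Url[@rel="collection"][@type="application/atom+xml"]', sample_ns)

# find() returns None when no element matches, so check before reading attributes.
sample_template = sample_url.attrib['template'] if sample_url is not None else None
print(sample_template)
```

Checking for `None` is worthwhile because an OSDD is not guaranteed to advertise an Atom template with `rel="collection"`.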

In [None]:
request_url = get_api_request(collection_template, {'count': '10', 'searchTerms': 'Landsat_8'})
request_url

In [None]:
response = requests.get( request_url )
xmlstr = minidom.parseString(response.text).toprettyxml(indent='   ', newl='')
md("```xml\n" + xmlstr + "\n```\n")

**Step 3**  
>  From the IDN OpenSearch response obtain the OSDD endpoint for the collection by parsing the href attribute in the `<link rel="search" type="application/opensearchdescription+xml" />` element. Note that the OSDD endpoint may refer to CWIC or FedEO. 

In [None]:
request_url = get_api_request(collection_template, {'count': '10', 'geo:uid': 'C1235542031-USGS_LTA'})
request_url

In [None]:
response = requests.get( request_url )
xmlstr = minidom.parseString(response.text).toprettyxml(indent='   ', newl='')
md("```xml\n" + xmlstr + "\n```\n")

Obtain the OSDD endpoint for the granule search for this collection by parsing the `href` attribute in `<link rel="search" type="application/opensearchdescription+xml">`.

In [None]:
root = ElementTree.fromstring(response.text)
# Extract <link> element with the OSDD for the granule search with Atom response
el = root.find('{http://www.w3.org/2005/Atom}entry/{http://www.w3.org/2005/Atom}link[@rel="search"][@type="application/opensearchdescription+xml"]')
xmltxt = ElementTree.tostring(el, encoding='unicode', method='xml')
md("```xml\n" + xmltxt + "\n```\n")

Extract the URL of the OSDD endpoint from the `<link>`.

In [None]:
url_osdd = el.attrib['href']
el.attrib['href']
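
A search response usually contains several entries, and not every entry necessarily carries an OSDD link. As a minimal sketch, the helper below collects the OSDD `href` for every entry of an Atom feed. The function name `osdd_links` and the `SAMPLE_FEED` content are hypothetical and only serve as offline illustration.

```python
from xml.etree import ElementTree

ATOM = '{http://www.w3.org/2005/Atom}'

# Hypothetical Atom feed with two entries; only the first one has an OSDD link.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>https://example.org/collections/A</id>
    <title>Collection A</title>
    <link rel="search" type="application/opensearchdescription+xml"
          href="https://example.org/granules/osdd?datasetId=A"/>
  </entry>
  <entry>
    <id>https://example.org/collections/B</id>
    <title>Collection B</title>
  </entry>
</feed>"""

def osdd_links(feed_xml):
    """Return {entry title: OSDD href or None} for every entry in an Atom feed."""
    feed = ElementTree.fromstring(feed_xml)
    links = {}
    for entry in feed.findall(ATOM + 'entry'):
        title = entry.findtext(ATOM + 'title', default='')
        link = entry.find(
            ATOM + 'link[@rel="search"][@type="application/opensearchdescription+xml"]')
        # Entries without a granule-search OSDD are reported as None.
        links[title] = link.attrib.get('href') if link is not None else None
    return links

sample_links = osdd_links(SAMPLE_FEED)
print(sample_links)
```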

**Step 4**  
>  From the collection OSDD found in the IDN OpenSearch response, formulate a valid granule search request. How to do this is explained in chapters “CWIC” and ["FedEO"](chapter-fedeo).  

In principle, CEOS OpenSearch clients compliant with the CEOS OpenSearch Best Practices do
not need to know whether the second step (i.e. the granule search) redirects to CWIC or FedEO, as both
endpoints provide the same interface, which clients discover by obtaining the OSDD.
Search parameters in the URL template which are specific to one of the two endpoints (i.e. not defined
in the CEOS Best Practice for OpenSearch or belonging to a foreign namespace) can be left empty
when preparing the granule search request.
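
Leaving parameters empty can be sketched as follows: the known parameters are substituted and every placeholder that remains is dropped together with its HTTP query parameter. The helper `fill_template`, the `example.org` template, and the `ex:custom` parameter are hypothetical and stand in for an endpoint-specific template. The sketch handles only `key={placeholder}` pairs that follow at least one other query parameter.

```python
import re

def fill_template(template, values):
    """Substitute known parameters and drop every placeholder left unfilled."""
    for name, value in values.items():
        # Match {name} or {name?}; the function replacement keeps the value literal.
        pattern = r'\{' + re.escape(name) + r'\??\}'
        template = re.sub(pattern, lambda m: value, template)
    # Drop "key={placeholder}" pairs that were not filled.
    return re.sub(r'&?[a-zA-Z]*=\{.*?\}', '', template)

# Hypothetical granule template; "ex:custom" stands for an endpoint-specific parameter.
SAMPLE_TEMPLATE = ("https://example.org/granules.atom?datasetId=ABC"
                   "&startIndex={startIndex?}&count={count?}"
                   "&timeStart={time:start?}&custom={ex:custom?}")

sample_request = fill_template(
    SAMPLE_TEMPLATE, {'count': '5', 'time:start': '2020-01-01T00:00:00Z'})
print(sample_request)
```

The client never needs to know what `ex:custom` means; it only has to remove the unfilled placeholder so that the request URL stays valid.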

(section_Available_Collection_Search_Criteria_IDN)=
## Available Collection Search Criteria

The IDN’s collection search is implemented with CEOS OpenSearch, which is based on
the [OpenSearch 1.1 (Draft 6) specification](https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md) and is compliant with the CEOS OpenSearch Best
Practices. The IDN OpenSearch API allows clients to formulate OpenSearch compliant queries
against the IDN collections and to request the search results as OpenSearch
compliant Atom or HTML. The IDN OpenSearch API implements the following search fields for
users’ queries:

**Table of collection search criteria**

In [None]:
%%html

<table>

<tr><th>HTTP Query<br/>
Parameter</th><th align="left">
Description
</th><th>Value &amp;
Cardinality<br/>
(M) = mandatory<br/>
(O) = optional
</th><th>
OpenSearch Parameter</th></tr>


<tr><td>boundingBox</td><td align="left"> Inventory with a spatial extent overlapping
this bounding box
</td><td>(O)</td><td> geo:box</td></tr>

<tr><td>keyword</td><td align="left"> Inventory with terms expressed by these
search terms
</td><td>(O)</td><td> os:searchTerms</td></tr>

<tr><td>instrument</td><td align="left"> Inventory associated with a satellite
instrument expressed by this short name
</td><td>(O)</td><td> echo:instrument</td></tr>


<tr><td>satellite</td><td align="left"> Inventory associated with a
satellite/platform expressed by this short
name
</td><td>(O)</td><td> eo:platform</td></tr>

<tr><td>geometry</td><td align="left"> Inventory with a spatial extent overlapping
this geometry
</td><td>(O)</td><td> geo:geometry</td></tr>

<tr><td>placeName</td><td align="left"> Inventory with a spatial location described
by this name
</td><td>(O)</td><td> geo:name</td></tr>

<tr><td>startTime</td><td align="left"> Inventory with a temporal extent containing
this start time
</td><td>(O)</td><td> time:start</td></tr>

<tr><td>endTime</td><td align="left"> Inventory with a temporal extent containing
this end time
</td><td>(O)</td><td> time:end</td></tr>


<tr><td>cursor</td><td align="left"> Start page for the search result </td><td>(O)</td><td> os:startPage</td></tr>

<tr><td>numberOfResults</td><td align="left"> Maximum number of records in the search
result
</td><td>(O)</td><td> os:count</td></tr>


<tr><td>offset</td><td align="left"> 0-based offset used to skip the specified
number of results in the search result set
</td><td>(O)</td><td> os:startIndex</td></tr>

<tr><td>uid</td><td align="left"> Inventory associated with this unique ID</td><td>(O)</td><td> geo:uid</td></tr>
<tr><td>hasGranules</td><td align="left"> Inventory with granules</td><td>(O)</td><td> echo:hasGranules</td></tr>
<tr><td>isCwic</td><td align="left"> Inventory related to CWIC</td><td>(O)</td><td> echo:isCwic</td></tr>
<tr><td>isGeoss</td><td align="left"> Inventory related to GEOSS</td><td>(O)</td><td> echo:isGeoss</td></tr>
<tr><td>isCeos</td><td align="left"> Inventory related to CEOS</td><td>(O)</td><td></td></tr>
<tr><td>isEosdis</td><td align="left"> Inventory related to EOSDIS</td><td>(O)</td><td> echo:isEosdis</td></tr>
<tr><td>provider</td><td align="left"> Inventory associated with a provider</td><td>(O)</td><td> echo:provider</td></tr>
<tr><td>clientId</td><td align="left"> Client identifier to be used for metrics</td><td>(O)</td><td> cswOpenSearchDoc</td></tr>

</table>

Client developers can also query by specific tags: isCeos, isCwic, isGeoss, and isFedEO.
Tagging allows arbitrary sets of collections to be grouped under a single namespace value. These
sets of collections can be recalled later by searching on the tag fields.

IDN query examples:

- GET the first 10 IDN collections with results in the Atom format:
    https://cmr.earthdata.nasa.gov/opensearch/collections.atom?numberOfResults=10&clientId=cswOpenSearchDoc
- GET the first 10 IDN collections containing the GCMD instrument keyword MODIS
    with results in the Atom output format:
    https://cmr.earthdata.nasa.gov/opensearch/collections.atom?instrument=MODIS&numberOfResults=10&clientId=cswOpenSearchDoc
- GET the first 10 CWIC IDN collections containing the GCMD instrument keyword
    MODIS with results in the HTML format:
    https://cmr.earthdata.nasa.gov/opensearch/collections?instrument=MODIS&isCwic=true&numberOfResults=10&clientId=cswOpenSearchDoc
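
Query URLs like the ones above can also be assembled programmatically. The sketch below uses `urllib.parse.urlencode` with the HTTP query parameter names from the table. The helper name `build_collection_query` is hypothetical, and the endpoint is hard-coded only to mirror the examples; a client following the CEOS OpenSearch Best Practices would take the URL template from the OSDD instead.

```python
from urllib.parse import urlencode

# Endpoint copied from the query examples; hard-coded for illustration only.
BASE_URL = "https://cmr.earthdata.nasa.gov/opensearch/collections"

def build_collection_query(params, fmt="atom"):
    """Build a collection search URL from HTTP query parameter names."""
    # Skip parameters without a value so the URL stays short.
    query = urlencode({k: v for k, v in params.items() if v not in (None, "")})
    # An empty fmt requests the default (HTML) representation.
    suffix = "." + fmt if fmt else ""
    return BASE_URL + suffix + "?" + query

example_url = build_collection_query(
    {"instrument": "MODIS", "numberOfResults": 10, "clientId": "cswOpenSearchDoc"})
print(example_url)
```

`urlencode` also percent-encodes values that contain spaces or other reserved characters, which manual string concatenation would not do.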

