Skip to content

Instantly share code, notes, and snippets.

@timrobertson100
Last active August 29, 2015 14:06
Show Gist options
  • Select an option

  • Save timrobertson100/1f0d68c8339e88b7c7de to your computer and use it in GitHub Desktop.

Select an option

Save timrobertson100/1f0d68c8339e88b7c7de to your computer and use it in GitHub Desktop.
Reducing occurrence download widths to match content

Optimizing the downloads for users

GBIF.org delivers really wide tables, which are unmanageable for many, and slow to work with. By only returning columns with actual values in the data returned for any query, users will have narrower tables and will be easier to manage.

Currently we have 441 fields in occurrence_hdfs. Of these, across all records, only 347 are populated in one or more records.

We could consider

  1. creating occurrence_hdfs only as wide as it needs to be - e.g. skip terms never populated (speeding up download MR jobs)
  2. doing the same query before each download query will likely reduce the width further depending on the biases in the data

Alternatively we could either:

  1. Offer the ability to select fields individually (like on the old portal), possibly populating the typical ones by default, and then letting people enable others.
  2. Offering a simple choice of predefined formats - e.g. Download typical format or Download all fields when the user downloads.

The following query shows how one could run a preliminary job, to determine which fields to actually SELECT in the real query.

SELECT DISTINCT field FROM
(
  SELECT
    explode(
      array(
        CASE abstract IS NULL WHEN FALSE THEN 'abstract' ELSE '' END,
        CASE acceptednameusage IS NULL WHEN FALSE THEN 'acceptednameusage' ELSE '' END,
        CASE acceptednameusageid IS NULL WHEN FALSE THEN 'acceptednameusageid' ELSE '' END,
        CASE accessrights IS NULL WHEN FALSE THEN 'accessrights' ELSE '' END,
        CASE accrualmethod IS NULL WHEN FALSE THEN 'accrualmethod' ELSE '' END,
        CASE accrualperiodicity IS NULL WHEN FALSE THEN 'accrualperiodicity' ELSE '' END,
        CASE accrualpolicy IS NULL WHEN FALSE THEN 'accrualpolicy' ELSE '' END,
        CASE alternative IS NULL WHEN FALSE THEN 'alternative' ELSE '' END,
        CASE associatedoccurrences IS NULL WHEN FALSE THEN 'associatedoccurrences' ELSE '' END,
        CASE associatedreferences IS NULL WHEN FALSE THEN 'associatedreferences' ELSE '' END,
        CASE associatedsequences IS NULL WHEN FALSE THEN 'associatedsequences' ELSE '' END,
        CASE associatedtaxa IS NULL WHEN FALSE THEN 'associatedtaxa' ELSE '' END,
        CASE audience IS NULL WHEN FALSE THEN 'audience' ELSE '' END,
        CASE available IS NULL WHEN FALSE THEN 'available' ELSE '' END,
        CASE basisofrecord IS NULL WHEN FALSE THEN 'basisofrecord' ELSE '' END,
        CASE bed IS NULL WHEN FALSE THEN 'bed' ELSE '' END,
        CASE behavior IS NULL WHEN FALSE THEN 'behavior' ELSE '' END,
        CASE bibliographiccitation IS NULL WHEN FALSE THEN 'bibliographiccitation' ELSE '' END,
        CASE catalognumber IS NULL WHEN FALSE THEN 'catalognumber' ELSE '' END,
        CASE class IS NULL WHEN FALSE THEN 'class' ELSE '' END,
        CASE classkey IS NULL WHEN FALSE THEN 'classkey' ELSE '' END,
        CASE collectioncode IS NULL WHEN FALSE THEN 'collectioncode' ELSE '' END,
        CASE collectionid IS NULL WHEN FALSE THEN 'collectionid' ELSE '' END,
        CASE conformsto IS NULL WHEN FALSE THEN 'conformsto' ELSE '' END,
        CASE continent IS NULL WHEN FALSE THEN 'continent' ELSE '' END,
        CASE contributor IS NULL WHEN FALSE THEN 'contributor' ELSE '' END,
        CASE coordinateaccuracy IS NULL WHEN FALSE THEN 'coordinateaccuracy' ELSE '' END,
        CASE countrycode IS NULL WHEN FALSE THEN 'countrycode' ELSE '' END,
        CASE county IS NULL WHEN FALSE THEN 'county' ELSE '' END,
        CASE coverage IS NULL WHEN FALSE THEN 'coverage' ELSE '' END,
        CASE crawlid IS NULL WHEN FALSE THEN 'crawlid' ELSE '' END,
        CASE created IS NULL WHEN FALSE THEN 'created' ELSE '' END,
        CASE creator IS NULL WHEN FALSE THEN 'creator' ELSE '' END,
        CASE datageneralizations IS NULL WHEN FALSE THEN 'datageneralizations' ELSE '' END,
        CASE datasetid IS NULL WHEN FALSE THEN 'datasetid' ELSE '' END,
        CASE datasetkey IS NULL WHEN FALSE THEN 'datasetkey' ELSE '' END,
        CASE datasetname IS NULL WHEN FALSE THEN 'datasetname' ELSE '' END,
        CASE date_ IS NULL WHEN FALSE THEN 'date_' ELSE '' END,
        CASE dateaccepted IS NULL WHEN FALSE THEN 'dateaccepted' ELSE '' END,
        CASE datecopyrighted IS NULL WHEN FALSE THEN 'datecopyrighted' ELSE '' END,
        CASE dateidentified IS NULL WHEN FALSE THEN 'dateidentified' ELSE '' END,
        CASE datesubmitted IS NULL WHEN FALSE THEN 'datesubmitted' ELSE '' END,
        CASE day IS NULL WHEN FALSE THEN 'day' ELSE '' END,
        CASE decimallatitude IS NULL WHEN FALSE THEN 'decimallatitude' ELSE '' END,
        CASE decimallongitude IS NULL WHEN FALSE THEN 'decimallongitude' ELSE '' END,
        CASE depth IS NULL WHEN FALSE THEN 'depth' ELSE '' END,
        CASE depthaccuracy IS NULL WHEN FALSE THEN 'depthaccuracy' ELSE '' END,
        CASE description IS NULL WHEN FALSE THEN 'description' ELSE '' END,
        CASE disposition IS NULL WHEN FALSE THEN 'disposition' ELSE '' END,
        CASE distanceabovesurface IS NULL WHEN FALSE THEN 'distanceabovesurface' ELSE '' END,
        CASE distanceabovesurfaceaccuracy IS NULL WHEN FALSE THEN 'distanceabovesurfaceaccuracy' ELSE '' END,
        CASE dynamicproperties IS NULL WHEN FALSE THEN 'dynamicproperties' ELSE '' END,
        CASE earliestageorloweststage IS NULL WHEN FALSE THEN 'earliestageorloweststage' ELSE '' END,
        CASE earliesteonorlowesteonothem IS NULL WHEN FALSE THEN 'earliesteonorlowesteonothem' ELSE '' END,
        CASE earliestepochorlowestseries IS NULL WHEN FALSE THEN 'earliestepochorlowestseries' ELSE '' END,
        CASE earliesteraorlowesterathem IS NULL WHEN FALSE THEN 'earliesteraorlowesterathem' ELSE '' END,
        CASE earliestperiodorlowestsystem IS NULL WHEN FALSE THEN 'earliestperiodorlowestsystem' ELSE '' END,
        CASE educationlevel IS NULL WHEN FALSE THEN 'educationlevel' ELSE '' END,
        CASE elevation IS NULL WHEN FALSE THEN 'elevation' ELSE '' END,
        CASE elevationaccuracy IS NULL WHEN FALSE THEN 'elevationaccuracy' ELSE '' END,
        CASE enddayofyear IS NULL WHEN FALSE THEN 'enddayofyear' ELSE '' END,
        CASE establishmentmeans IS NULL WHEN FALSE THEN 'establishmentmeans' ELSE '' END,
        CASE eventdate IS NULL WHEN FALSE THEN 'eventdate' ELSE '' END,
        CASE eventid IS NULL WHEN FALSE THEN 'eventid' ELSE '' END,
        CASE eventremarks IS NULL WHEN FALSE THEN 'eventremarks' ELSE '' END,
        CASE eventtime IS NULL WHEN FALSE THEN 'eventtime' ELSE '' END,
        CASE ext_multimedia IS NULL WHEN FALSE THEN 'ext_multimedia' ELSE '' END,
        CASE extent IS NULL WHEN FALSE THEN 'extent' ELSE '' END,
        CASE family IS NULL WHEN FALSE THEN 'family' ELSE '' END,
        CASE familykey IS NULL WHEN FALSE THEN 'familykey' ELSE '' END,
        CASE fieldnotes IS NULL WHEN FALSE THEN 'fieldnotes' ELSE '' END,
        CASE fieldnumber IS NULL WHEN FALSE THEN 'fieldnumber' ELSE '' END,
        CASE footprintspatialfit IS NULL WHEN FALSE THEN 'footprintspatialfit' ELSE '' END,
        CASE footprintsrs IS NULL WHEN FALSE THEN 'footprintsrs' ELSE '' END,
        CASE footprintwkt IS NULL WHEN FALSE THEN 'footprintwkt' ELSE '' END,
        CASE format_ IS NULL WHEN FALSE THEN 'format_' ELSE '' END,
        CASE formation IS NULL WHEN FALSE THEN 'formation' ELSE '' END,
        CASE fragmentcreated IS NULL WHEN FALSE THEN 'fragmentcreated' ELSE '' END,
        CASE gbifid IS NULL WHEN FALSE THEN 'gbifid' ELSE '' END,
        CASE genericname IS NULL WHEN FALSE THEN 'genericname' ELSE '' END,
        CASE genus IS NULL WHEN FALSE THEN 'genus' ELSE '' END,
        CASE genuskey IS NULL WHEN FALSE THEN 'genuskey' ELSE '' END,
        CASE geologicalcontextid IS NULL WHEN FALSE THEN 'geologicalcontextid' ELSE '' END,
        CASE georeferencedby IS NULL WHEN FALSE THEN 'georeferencedby' ELSE '' END,
        CASE georeferenceddate IS NULL WHEN FALSE THEN 'georeferenceddate' ELSE '' END,
        CASE georeferenceprotocol IS NULL WHEN FALSE THEN 'georeferenceprotocol' ELSE '' END,
        CASE georeferenceremarks IS NULL WHEN FALSE THEN 'georeferenceremarks' ELSE '' END,
        CASE georeferencesources IS NULL WHEN FALSE THEN 'georeferencesources' ELSE '' END,
        CASE georeferenceverificationstatus IS NULL WHEN FALSE THEN 'georeferenceverificationstatus' ELSE '' END,
        CASE group_ IS NULL WHEN FALSE THEN 'group_' ELSE '' END,
        CASE habitat IS NULL WHEN FALSE THEN 'habitat' ELSE '' END,
        CASE hascoordinate IS NULL WHEN FALSE THEN 'hascoordinate' ELSE '' END,
        CASE hasformat IS NULL WHEN FALSE THEN 'hasformat' ELSE '' END,
        CASE hasgeospatialissues IS NULL WHEN FALSE THEN 'hasgeospatialissues' ELSE '' END,
        CASE haspart IS NULL WHEN FALSE THEN 'haspart' ELSE '' END,
        CASE hasversion IS NULL WHEN FALSE THEN 'hasversion' ELSE '' END,
        CASE higherclassification IS NULL WHEN FALSE THEN 'higherclassification' ELSE '' END,
        CASE highergeography IS NULL WHEN FALSE THEN 'highergeography' ELSE '' END,
        CASE highergeographyid IS NULL WHEN FALSE THEN 'highergeographyid' ELSE '' END,
        CASE highestbiostratigraphiczone IS NULL WHEN FALSE THEN 'highestbiostratigraphiczone' ELSE '' END,
        CASE identificationid IS NULL WHEN FALSE THEN 'identificationid' ELSE '' END,
        CASE identificationqualifier IS NULL WHEN FALSE THEN 'identificationqualifier' ELSE '' END,
        CASE identificationreferences IS NULL WHEN FALSE THEN 'identificationreferences' ELSE '' END,
        CASE identificationremarks IS NULL WHEN FALSE THEN 'identificationremarks' ELSE '' END,
        CASE identificationverificationstatus IS NULL WHEN FALSE THEN 'identificationverificationstatus' ELSE '' END,
        CASE identifiedby IS NULL WHEN FALSE THEN 'identifiedby' ELSE '' END,
        CASE identifier IS NULL WHEN FALSE THEN 'identifier' ELSE '' END,
        CASE identifiercount IS NULL WHEN FALSE THEN 'identifiercount' ELSE '' END,
        CASE individualcount IS NULL WHEN FALSE THEN 'individualcount' ELSE '' END,
        CASE individualid IS NULL WHEN FALSE THEN 'individualid' ELSE '' END,
        CASE informationwithheld IS NULL WHEN FALSE THEN 'informationwithheld' ELSE '' END,
        CASE infraspecificepithet IS NULL WHEN FALSE THEN 'infraspecificepithet' ELSE '' END,
        CASE institutioncode IS NULL WHEN FALSE THEN 'institutioncode' ELSE '' END,
        CASE institutionid IS NULL WHEN FALSE THEN 'institutionid' ELSE '' END,
        CASE instructionalmethod IS NULL WHEN FALSE THEN 'instructionalmethod' ELSE '' END,
        CASE isformatof IS NULL WHEN FALSE THEN 'isformatof' ELSE '' END,
        CASE island IS NULL WHEN FALSE THEN 'island' ELSE '' END,
        CASE islandgroup IS NULL WHEN FALSE THEN 'islandgroup' ELSE '' END,
        CASE ispartof IS NULL WHEN FALSE THEN 'ispartof' ELSE '' END,
        CASE isreferencedby IS NULL WHEN FALSE THEN 'isreferencedby' ELSE '' END,
        CASE isreplacedby IS NULL WHEN FALSE THEN 'isreplacedby' ELSE '' END,
        CASE isrequiredby IS NULL WHEN FALSE THEN 'isrequiredby' ELSE '' END,
        CASE issue IS NULL WHEN FALSE THEN 'issue' ELSE '' END,
        CASE issued IS NULL WHEN FALSE THEN 'issued' ELSE '' END,
        CASE isversionof IS NULL WHEN FALSE THEN 'isversionof' ELSE '' END,
        CASE kingdom IS NULL WHEN FALSE THEN 'kingdom' ELSE '' END,
        CASE kingdomkey IS NULL WHEN FALSE THEN 'kingdomkey' ELSE '' END,
        CASE language IS NULL WHEN FALSE THEN 'language' ELSE '' END,
        CASE lastcrawled IS NULL WHEN FALSE THEN 'lastcrawled' ELSE '' END,
        CASE lastinterpreted IS NULL WHEN FALSE THEN 'lastinterpreted' ELSE '' END,
        CASE lastparsed IS NULL WHEN FALSE THEN 'lastparsed' ELSE '' END,
        CASE latestageorhigheststage IS NULL WHEN FALSE THEN 'latestageorhigheststage' ELSE '' END,
        CASE latesteonorhighesteonothem IS NULL WHEN FALSE THEN 'latesteonorhighesteonothem' ELSE '' END,
        CASE latestepochorhighestseries IS NULL WHEN FALSE THEN 'latestepochorhighestseries' ELSE '' END,
        CASE latesteraorhighesterathem IS NULL WHEN FALSE THEN 'latesteraorhighesterathem' ELSE '' END,
        CASE latestperiodorhighestsystem IS NULL WHEN FALSE THEN 'latestperiodorhighestsystem' ELSE '' END,
        CASE license IS NULL WHEN FALSE THEN 'license' ELSE '' END,
        CASE lifestage IS NULL WHEN FALSE THEN 'lifestage' ELSE '' END,
        CASE lithostratigraphicterms IS NULL WHEN FALSE THEN 'lithostratigraphicterms' ELSE '' END,
        CASE locality IS NULL WHEN FALSE THEN 'locality' ELSE '' END,
        CASE locationaccordingto IS NULL WHEN FALSE THEN 'locationaccordingto' ELSE '' END,
        CASE locationid IS NULL WHEN FALSE THEN 'locationid' ELSE '' END,
        CASE locationremarks IS NULL WHEN FALSE THEN 'locationremarks' ELSE '' END,
        CASE lowestbiostratigraphiczone IS NULL WHEN FALSE THEN 'lowestbiostratigraphiczone' ELSE '' END,
        CASE materialsampleid IS NULL WHEN FALSE THEN 'materialsampleid' ELSE '' END,
        CASE maximumdistanceabovesurfaceinmeters IS NULL WHEN FALSE THEN 'maximumdistanceabovesurfaceinmeters' ELSE '' END,
        CASE mediator IS NULL WHEN FALSE THEN 'mediator' ELSE '' END,
        CASE mediatype IS NULL WHEN FALSE THEN 'mediatype' ELSE '' END,
        CASE medium IS NULL WHEN FALSE THEN 'medium' ELSE '' END,
        CASE member IS NULL WHEN FALSE THEN 'member' ELSE '' END,
        CASE minimumdistanceabovesurfaceinmeters IS NULL WHEN FALSE THEN 'minimumdistanceabovesurfaceinmeters' ELSE '' END,
        CASE modified IS NULL WHEN FALSE THEN 'modified' ELSE '' END,
        CASE month IS NULL WHEN FALSE THEN 'month' ELSE '' END,
        CASE municipality IS NULL WHEN FALSE THEN 'municipality' ELSE '' END,
        CASE nameaccordingto IS NULL WHEN FALSE THEN 'nameaccordingto' ELSE '' END,
        CASE nameaccordingtoid IS NULL WHEN FALSE THEN 'nameaccordingtoid' ELSE '' END,
        CASE namepublishedin IS NULL WHEN FALSE THEN 'namepublishedin' ELSE '' END,
        CASE namepublishedinid IS NULL WHEN FALSE THEN 'namepublishedinid' ELSE '' END,
        CASE namepublishedinyear IS NULL WHEN FALSE THEN 'namepublishedinyear' ELSE '' END,
        CASE nomenclaturalcode IS NULL WHEN FALSE THEN 'nomenclaturalcode' ELSE '' END,
        CASE nomenclaturalstatus IS NULL WHEN FALSE THEN 'nomenclaturalstatus' ELSE '' END,
        CASE occurrencedetails IS NULL WHEN FALSE THEN 'occurrencedetails' ELSE '' END,
        CASE occurrenceid IS NULL WHEN FALSE THEN 'occurrenceid' ELSE '' END,
        CASE occurrenceremarks IS NULL WHEN FALSE THEN 'occurrenceremarks' ELSE '' END,
        CASE occurrencestatus IS NULL WHEN FALSE THEN 'occurrencestatus' ELSE '' END,
        CASE order_ IS NULL WHEN FALSE THEN 'order_' ELSE '' END,
        CASE orderkey IS NULL WHEN FALSE THEN 'orderkey' ELSE '' END,
        CASE originalnameusage IS NULL WHEN FALSE THEN 'originalnameusage' ELSE '' END,
        CASE originalnameusageid IS NULL WHEN FALSE THEN 'originalnameusageid' ELSE '' END,
        CASE othercatalognumbers IS NULL WHEN FALSE THEN 'othercatalognumbers' ELSE '' END,
        CASE ownerinstitutioncode IS NULL WHEN FALSE THEN 'ownerinstitutioncode' ELSE '' END,
        CASE parentnameusage IS NULL WHEN FALSE THEN 'parentnameusage' ELSE '' END,
        CASE parentnameusageid IS NULL WHEN FALSE THEN 'parentnameusageid' ELSE '' END,
        CASE phylum IS NULL WHEN FALSE THEN 'phylum' ELSE '' END,
        CASE phylumkey IS NULL WHEN FALSE THEN 'phylumkey' ELSE '' END,
        CASE pointradiusspatialfit IS NULL WHEN FALSE THEN 'pointradiusspatialfit' ELSE '' END,
        CASE preparations IS NULL WHEN FALSE THEN 'preparations' ELSE '' END,
        CASE previousidentifications IS NULL WHEN FALSE THEN 'previousidentifications' ELSE '' END,
        CASE protocol IS NULL WHEN FALSE THEN 'protocol' ELSE '' END,
        CASE provenance IS NULL WHEN FALSE THEN 'provenance' ELSE '' END,
        CASE publisher IS NULL WHEN FALSE THEN 'publisher' ELSE '' END,
        CASE publishingcountry IS NULL WHEN FALSE THEN 'publishingcountry' ELSE '' END,
        CASE publishingorgkey IS NULL WHEN FALSE THEN 'publishingorgkey' ELSE '' END,
        CASE recordedby IS NULL WHEN FALSE THEN 'recordedby' ELSE '' END,
        CASE recordnumber IS NULL WHEN FALSE THEN 'recordnumber' ELSE '' END,
        CASE references IS NULL WHEN FALSE THEN 'references' ELSE '' END,
        CASE relation IS NULL WHEN FALSE THEN 'relation' ELSE '' END,
        CASE replaces IS NULL WHEN FALSE THEN 'replaces' ELSE '' END,
        CASE reproductivecondition IS NULL WHEN FALSE THEN 'reproductivecondition' ELSE '' END,
        CASE requires IS NULL WHEN FALSE THEN 'requires' ELSE '' END,
        CASE rights IS NULL WHEN FALSE THEN 'rights' ELSE '' END,
        CASE rightsholder IS NULL WHEN FALSE THEN 'rightsholder' ELSE '' END,
        CASE samplingeffort IS NULL WHEN FALSE THEN 'samplingeffort' ELSE '' END,
        CASE samplingprotocol IS NULL WHEN FALSE THEN 'samplingprotocol' ELSE '' END,
        CASE scientificname IS NULL WHEN FALSE THEN 'scientificname' ELSE '' END,
        CASE scientificnameid IS NULL WHEN FALSE THEN 'scientificnameid' ELSE '' END,
        CASE sex IS NULL WHEN FALSE THEN 'sex' ELSE '' END,
        CASE source IS NULL WHEN FALSE THEN 'source' ELSE '' END,
        CASE spatial IS NULL WHEN FALSE THEN 'spatial' ELSE '' END,
        CASE species IS NULL WHEN FALSE THEN 'species' ELSE '' END,
        CASE specieskey IS NULL WHEN FALSE THEN 'specieskey' ELSE '' END,
        CASE specificepithet IS NULL WHEN FALSE THEN 'specificepithet' ELSE '' END,
        CASE startdayofyear IS NULL WHEN FALSE THEN 'startdayofyear' ELSE '' END,
        CASE stateprovince IS NULL WHEN FALSE THEN 'stateprovince' ELSE '' END,
        CASE subgenus IS NULL WHEN FALSE THEN 'subgenus' ELSE '' END,
        CASE subgenuskey IS NULL WHEN FALSE THEN 'subgenuskey' ELSE '' END,
        CASE subject IS NULL WHEN FALSE THEN 'subject' ELSE '' END,
        CASE tableofcontents IS NULL WHEN FALSE THEN 'tableofcontents' ELSE '' END,
        CASE taxonconceptid IS NULL WHEN FALSE THEN 'taxonconceptid' ELSE '' END,
        CASE taxonid IS NULL WHEN FALSE THEN 'taxonid' ELSE '' END,
        CASE taxonkey IS NULL WHEN FALSE THEN 'taxonkey' ELSE '' END,
        CASE taxonomicstatus IS NULL WHEN FALSE THEN 'taxonomicstatus' ELSE '' END,
        CASE taxonrank IS NULL WHEN FALSE THEN 'taxonrank' ELSE '' END,
        CASE taxonremarks IS NULL WHEN FALSE THEN 'taxonremarks' ELSE '' END,
        CASE temporal IS NULL WHEN FALSE THEN 'temporal' ELSE '' END,
        CASE title IS NULL WHEN FALSE THEN 'title' ELSE '' END,
        CASE type IS NULL WHEN FALSE THEN 'type' ELSE '' END,
        CASE typestatus IS NULL WHEN FALSE THEN 'typestatus' ELSE '' END,
        CASE typifiedname IS NULL WHEN FALSE THEN 'typifiedname' ELSE '' END,
        CASE unitqualifier IS NULL WHEN FALSE THEN 'unitqualifier' ELSE '' END,
        CASE v_abstract IS NULL WHEN FALSE THEN 'v_abstract' ELSE '' END,
        CASE v_acceptednameusage IS NULL WHEN FALSE THEN 'v_acceptednameusage' ELSE '' END,
        CASE v_acceptednameusageid IS NULL WHEN FALSE THEN 'v_acceptednameusageid' ELSE '' END,
        CASE v_accessrights IS NULL WHEN FALSE THEN 'v_accessrights' ELSE '' END,
        CASE v_accrualmethod IS NULL WHEN FALSE THEN 'v_accrualmethod' ELSE '' END,
        CASE v_accrualperiodicity IS NULL WHEN FALSE THEN 'v_accrualperiodicity' ELSE '' END,
        CASE v_accrualpolicy IS NULL WHEN FALSE THEN 'v_accrualpolicy' ELSE '' END,
        CASE v_alternative IS NULL WHEN FALSE THEN 'v_alternative' ELSE '' END,
        CASE v_associatedmedia IS NULL WHEN FALSE THEN 'v_associatedmedia' ELSE '' END,
        CASE v_associatedoccurrences IS NULL WHEN FALSE THEN 'v_associatedoccurrences' ELSE '' END,
        CASE v_associatedreferences IS NULL WHEN FALSE THEN 'v_associatedreferences' ELSE '' END,
        CASE v_associatedsequences IS NULL WHEN FALSE THEN 'v_associatedsequences' ELSE '' END,
        CASE v_associatedtaxa IS NULL WHEN FALSE THEN 'v_associatedtaxa' ELSE '' END,
        CASE v_audience IS NULL WHEN FALSE THEN 'v_audience' ELSE '' END,
        CASE v_available IS NULL WHEN FALSE THEN 'v_available' ELSE '' END,
        CASE v_basisofrecord IS NULL WHEN FALSE THEN 'v_basisofrecord' ELSE '' END,
        CASE v_bed IS NULL WHEN FALSE THEN 'v_bed' ELSE '' END,
        CASE v_behavior IS NULL WHEN FALSE THEN 'v_behavior' ELSE '' END,
        CASE v_bibliographiccitation IS NULL WHEN FALSE THEN 'v_bibliographiccitation' ELSE '' END,
        CASE v_catalognumber IS NULL WHEN FALSE THEN 'v_catalognumber' ELSE '' END,
        CASE v_class IS NULL WHEN FALSE THEN 'v_class' ELSE '' END,
        CASE v_collectioncode IS NULL WHEN FALSE THEN 'v_collectioncode' ELSE '' END,
        CASE v_collectionid IS NULL WHEN FALSE THEN 'v_collectionid' ELSE '' END,
        CASE v_conformsto IS NULL WHEN FALSE THEN 'v_conformsto' ELSE '' END,
        CASE v_continent IS NULL WHEN FALSE THEN 'v_continent' ELSE '' END,
        CASE v_contributor IS NULL WHEN FALSE THEN 'v_contributor' ELSE '' END,
        CASE v_coordinateprecision IS NULL WHEN FALSE THEN 'v_coordinateprecision' ELSE '' END,
        CASE v_coordinateuncertaintyinmeters IS NULL WHEN FALSE THEN 'v_coordinateuncertaintyinmeters' ELSE '' END,
        CASE v_country IS NULL WHEN FALSE THEN 'v_country' ELSE '' END,
        CASE v_countrycode IS NULL WHEN FALSE THEN 'v_countrycode' ELSE '' END,
        CASE v_county IS NULL WHEN FALSE THEN 'v_county' ELSE '' END,
        CASE v_coverage IS NULL WHEN FALSE THEN 'v_coverage' ELSE '' END,
        CASE v_created IS NULL WHEN FALSE THEN 'v_created' ELSE '' END,
        CASE v_creator IS NULL WHEN FALSE THEN 'v_creator' ELSE '' END,
        CASE v_datageneralizations IS NULL WHEN FALSE THEN 'v_datageneralizations' ELSE '' END,
        CASE v_datasetid IS NULL WHEN FALSE THEN 'v_datasetid' ELSE '' END,
        CASE v_datasetname IS NULL WHEN FALSE THEN 'v_datasetname' ELSE '' END,
        CASE v_date_ IS NULL WHEN FALSE THEN 'v_date_' ELSE '' END,
        CASE v_dateaccepted IS NULL WHEN FALSE THEN 'v_dateaccepted' ELSE '' END,
        CASE v_datecopyrighted IS NULL WHEN FALSE THEN 'v_datecopyrighted' ELSE '' END,
        CASE v_dateidentified IS NULL WHEN FALSE THEN 'v_dateidentified' ELSE '' END,
        CASE v_datesubmitted IS NULL WHEN FALSE THEN 'v_datesubmitted' ELSE '' END,
        CASE v_day IS NULL WHEN FALSE THEN 'v_day' ELSE '' END,
        CASE v_decimallatitude IS NULL WHEN FALSE THEN 'v_decimallatitude' ELSE '' END,
        CASE v_decimallongitude IS NULL WHEN FALSE THEN 'v_decimallongitude' ELSE '' END,
        CASE v_description IS NULL WHEN FALSE THEN 'v_description' ELSE '' END,
        CASE v_disposition IS NULL WHEN FALSE THEN 'v_disposition' ELSE '' END,
        CASE v_dynamicproperties IS NULL WHEN FALSE THEN 'v_dynamicproperties' ELSE '' END,
        CASE v_earliestageorloweststage IS NULL WHEN FALSE THEN 'v_earliestageorloweststage' ELSE '' END,
        CASE v_earliesteonorlowesteonothem IS NULL WHEN FALSE THEN 'v_earliesteonorlowesteonothem' ELSE '' END,
        CASE v_earliestepochorlowestseries IS NULL WHEN FALSE THEN 'v_earliestepochorlowestseries' ELSE '' END,
        CASE v_earliesteraorlowesterathem IS NULL WHEN FALSE THEN 'v_earliesteraorlowesterathem' ELSE '' END,
        CASE v_earliestperiodorlowestsystem IS NULL WHEN FALSE THEN 'v_earliestperiodorlowestsystem' ELSE '' END,
        CASE v_educationlevel IS NULL WHEN FALSE THEN 'v_educationlevel' ELSE '' END,
        CASE v_enddayofyear IS NULL WHEN FALSE THEN 'v_enddayofyear' ELSE '' END,
        CASE v_establishmentmeans IS NULL WHEN FALSE THEN 'v_establishmentmeans' ELSE '' END,
        CASE v_eventdate IS NULL WHEN FALSE THEN 'v_eventdate' ELSE '' END,
        CASE v_eventid IS NULL WHEN FALSE THEN 'v_eventid' ELSE '' END,
        CASE v_eventremarks IS NULL WHEN FALSE THEN 'v_eventremarks' ELSE '' END,
        CASE v_eventtime IS NULL WHEN FALSE THEN 'v_eventtime' ELSE '' END,
        CASE v_extent IS NULL WHEN FALSE THEN 'v_extent' ELSE '' END,
        CASE v_family IS NULL WHEN FALSE THEN 'v_family' ELSE '' END,
        CASE v_fieldnotes IS NULL WHEN FALSE THEN 'v_fieldnotes' ELSE '' END,
        CASE v_fieldnumber IS NULL WHEN FALSE THEN 'v_fieldnumber' ELSE '' END,
        CASE v_footprintspatialfit IS NULL WHEN FALSE THEN 'v_footprintspatialfit' ELSE '' END,
        CASE v_footprintsrs IS NULL WHEN FALSE THEN 'v_footprintsrs' ELSE '' END,
        CASE v_footprintwkt IS NULL WHEN FALSE THEN 'v_footprintwkt' ELSE '' END,
        CASE v_format_ IS NULL WHEN FALSE THEN 'v_format_' ELSE '' END,
        CASE v_formation IS NULL WHEN FALSE THEN 'v_formation' ELSE '' END,
        CASE v_genus IS NULL WHEN FALSE THEN 'v_genus' ELSE '' END,
        CASE v_geodeticdatum IS NULL WHEN FALSE THEN 'v_geodeticdatum' ELSE '' END,
        CASE v_geologicalcontextid IS NULL WHEN FALSE THEN 'v_geologicalcontextid' ELSE '' END,
        CASE v_georeferencedby IS NULL WHEN FALSE THEN 'v_georeferencedby' ELSE '' END,
        CASE v_georeferenceddate IS NULL WHEN FALSE THEN 'v_georeferenceddate' ELSE '' END,
        CASE v_georeferenceprotocol IS NULL WHEN FALSE THEN 'v_georeferenceprotocol' ELSE '' END,
        CASE v_georeferenceremarks IS NULL WHEN FALSE THEN 'v_georeferenceremarks' ELSE '' END,
        CASE v_georeferencesources IS NULL WHEN FALSE THEN 'v_georeferencesources' ELSE '' END,
        CASE v_georeferenceverificationstatus IS NULL WHEN FALSE THEN 'v_georeferenceverificationstatus' ELSE '' END,
        CASE v_group_ IS NULL WHEN FALSE THEN 'v_group_' ELSE '' END,
        CASE v_habitat IS NULL WHEN FALSE THEN 'v_habitat' ELSE '' END,
        CASE v_hasformat IS NULL WHEN FALSE THEN 'v_hasformat' ELSE '' END,
        CASE v_haspart IS NULL WHEN FALSE THEN 'v_haspart' ELSE '' END,
        CASE v_hasversion IS NULL WHEN FALSE THEN 'v_hasversion' ELSE '' END,
        CASE v_higherclassification IS NULL WHEN FALSE THEN 'v_higherclassification' ELSE '' END,
        CASE v_highergeography IS NULL WHEN FALSE THEN 'v_highergeography' ELSE '' END,
        CASE v_highergeographyid IS NULL WHEN FALSE THEN 'v_highergeographyid' ELSE '' END,
        CASE v_highestbiostratigraphiczone IS NULL WHEN FALSE THEN 'v_highestbiostratigraphiczone' ELSE '' END,
        CASE v_identificationid IS NULL WHEN FALSE THEN 'v_identificationid' ELSE '' END,
        CASE v_identificationqualifier IS NULL WHEN FALSE THEN 'v_identificationqualifier' ELSE '' END,
        CASE v_identificationreferences IS NULL WHEN FALSE THEN 'v_identificationreferences' ELSE '' END,
        CASE v_identificationremarks IS NULL WHEN FALSE THEN 'v_identificationremarks' ELSE '' END,
        CASE v_identificationverificationstatus IS NULL WHEN FALSE THEN 'v_identificationverificationstatus' ELSE '' END,
        CASE v_identifiedby IS NULL WHEN FALSE THEN 'v_identifiedby' ELSE '' END,
        CASE v_identifier IS NULL WHEN FALSE THEN 'v_identifier' ELSE '' END,
        CASE v_individualcount IS NULL WHEN FALSE THEN 'v_individualcount' ELSE '' END,
        CASE v_individualid IS NULL WHEN FALSE THEN 'v_individualid' ELSE '' END,
        CASE v_informationwithheld IS NULL WHEN FALSE THEN 'v_informationwithheld' ELSE '' END,
        CASE v_infraspecificepithet IS NULL WHEN FALSE THEN 'v_infraspecificepithet' ELSE '' END,
        CASE v_institutioncode IS NULL WHEN FALSE THEN 'v_institutioncode' ELSE '' END,
        CASE v_institutionid IS NULL WHEN FALSE THEN 'v_institutionid' ELSE '' END,
        CASE v_instructionalmethod IS NULL WHEN FALSE THEN 'v_instructionalmethod' ELSE '' END,
        CASE v_isformatof IS NULL WHEN FALSE THEN 'v_isformatof' ELSE '' END,
        CASE v_island IS NULL WHEN FALSE THEN 'v_island' ELSE '' END,
        CASE v_islandgroup IS NULL WHEN FALSE THEN 'v_islandgroup' ELSE '' END,
        CASE v_ispartof IS NULL WHEN FALSE THEN 'v_ispartof' ELSE '' END,
        CASE v_isreferencedby IS NULL WHEN FALSE THEN 'v_isreferencedby' ELSE '' END,
        CASE v_isreplacedby IS NULL WHEN FALSE THEN 'v_isreplacedby' ELSE '' END,
        CASE v_isrequiredby IS NULL WHEN FALSE THEN 'v_isrequiredby' ELSE '' END,
        CASE v_issued IS NULL WHEN FALSE THEN 'v_issued' ELSE '' END,
        CASE v_isversionof IS NULL WHEN FALSE THEN 'v_isversionof' ELSE '' END,
        CASE v_kingdom IS NULL WHEN FALSE THEN 'v_kingdom' ELSE '' END,
        CASE v_language IS NULL WHEN FALSE THEN 'v_language' ELSE '' END,
        CASE v_latestageorhigheststage IS NULL WHEN FALSE THEN 'v_latestageorhigheststage' ELSE '' END,
        CASE v_latesteonorhighesteonothem IS NULL WHEN FALSE THEN 'v_latesteonorhighesteonothem' ELSE '' END,
        CASE v_latestepochorhighestseries IS NULL WHEN FALSE THEN 'v_latestepochorhighestseries' ELSE '' END,
        CASE v_latesteraorhighesterathem IS NULL WHEN FALSE THEN 'v_latesteraorhighesterathem' ELSE '' END,
        CASE v_latestperiodorhighestsystem IS NULL WHEN FALSE THEN 'v_latestperiodorhighestsystem' ELSE '' END,
        CASE v_license IS NULL WHEN FALSE THEN 'v_license' ELSE '' END,
        CASE v_lifestage IS NULL WHEN FALSE THEN 'v_lifestage' ELSE '' END,
        CASE v_lithostratigraphicterms IS NULL WHEN FALSE THEN 'v_lithostratigraphicterms' ELSE '' END,
        CASE v_locality IS NULL WHEN FALSE THEN 'v_locality' ELSE '' END,
        CASE v_locationaccordingto IS NULL WHEN FALSE THEN 'v_locationaccordingto' ELSE '' END,
        CASE v_locationid IS NULL WHEN FALSE THEN 'v_locationid' ELSE '' END,
        CASE v_locationremarks IS NULL WHEN FALSE THEN 'v_locationremarks' ELSE '' END,
        CASE v_lowestbiostratigraphiczone IS NULL WHEN FALSE THEN 'v_lowestbiostratigraphiczone' ELSE '' END,
        CASE v_materialsampleid IS NULL WHEN FALSE THEN 'v_materialsampleid' ELSE '' END,
        CASE v_maximumdepthinmeters IS NULL WHEN FALSE THEN 'v_maximumdepthinmeters' ELSE '' END,
        CASE v_maximumdistanceabovesurfaceinmeters IS NULL WHEN FALSE THEN 'v_maximumdistanceabovesurfaceinmeters' ELSE '' END,
        CASE v_maximumelevationinmeters IS NULL WHEN FALSE THEN 'v_maximumelevationinmeters' ELSE '' END,
        CASE v_mediator IS NULL WHEN FALSE THEN 'v_mediator' ELSE '' END,
        CASE v_medium IS NULL WHEN FALSE THEN 'v_medium' ELSE '' END,
        CASE v_member IS NULL WHEN FALSE THEN 'v_member' ELSE '' END,
        CASE v_minimumdepthinmeters IS NULL WHEN FALSE THEN 'v_minimumdepthinmeters' ELSE '' END,
        CASE v_minimumdistanceabovesurfaceinmeters IS NULL WHEN FALSE THEN 'v_minimumdistanceabovesurfaceinmeters' ELSE '' END,
        CASE v_minimumelevationinmeters IS NULL WHEN FALSE THEN 'v_minimumelevationinmeters' ELSE '' END,
        CASE v_modified IS NULL WHEN FALSE THEN 'v_modified' ELSE '' END,
        CASE v_month IS NULL WHEN FALSE THEN 'v_month' ELSE '' END,
        CASE v_municipality IS NULL WHEN FALSE THEN 'v_municipality' ELSE '' END,
        CASE v_nameaccordingto IS NULL WHEN FALSE THEN 'v_nameaccordingto' ELSE '' END,
        CASE v_nameaccordingtoid IS NULL WHEN FALSE THEN 'v_nameaccordingtoid' ELSE '' END,
        CASE v_namepublishedin IS NULL WHEN FALSE THEN 'v_namepublishedin' ELSE '' END,
        CASE v_namepublishedinid IS NULL WHEN FALSE THEN 'v_namepublishedinid' ELSE '' END,
        CASE v_namepublishedinyear IS NULL WHEN FALSE THEN 'v_namepublishedinyear' ELSE '' END,
        CASE v_nomenclaturalcode IS NULL WHEN FALSE THEN 'v_nomenclaturalcode' ELSE '' END,
        CASE v_nomenclaturalstatus IS NULL WHEN FALSE THEN 'v_nomenclaturalstatus' ELSE '' END,
        CASE v_occurrencedetails IS NULL WHEN FALSE THEN 'v_occurrencedetails' ELSE '' END,
        CASE v_occurrenceid IS NULL WHEN FALSE THEN 'v_occurrenceid' ELSE '' END,
        CASE v_occurrenceremarks IS NULL WHEN FALSE THEN 'v_occurrenceremarks' ELSE '' END,
        CASE v_occurrencestatus IS NULL WHEN FALSE THEN 'v_occurrencestatus' ELSE '' END,
        CASE v_order_ IS NULL WHEN FALSE THEN 'v_order_' ELSE '' END,
        CASE v_originalnameusage IS NULL WHEN FALSE THEN 'v_originalnameusage' ELSE '' END,
        CASE v_originalnameusageid IS NULL WHEN FALSE THEN 'v_originalnameusageid' ELSE '' END,
        CASE v_othercatalognumbers IS NULL WHEN FALSE THEN 'v_othercatalognumbers' ELSE '' END,
        CASE v_ownerinstitutioncode IS NULL WHEN FALSE THEN 'v_ownerinstitutioncode' ELSE '' END,
        CASE v_parentnameusage IS NULL WHEN FALSE THEN 'v_parentnameusage' ELSE '' END,
        CASE v_parentnameusageid IS NULL WHEN FALSE THEN 'v_parentnameusageid' ELSE '' END,
        CASE v_phylum IS NULL WHEN FALSE THEN 'v_phylum' ELSE '' END,
        CASE v_pointradiusspatialfit IS NULL WHEN FALSE THEN 'v_pointradiusspatialfit' ELSE '' END,
        CASE v_preparations IS NULL WHEN FALSE THEN 'v_preparations' ELSE '' END,
        CASE v_previousidentifications IS NULL WHEN FALSE THEN 'v_previousidentifications' ELSE '' END,
        CASE v_provenance IS NULL WHEN FALSE THEN 'v_provenance' ELSE '' END,
        CASE v_publisher IS NULL WHEN FALSE THEN 'v_publisher' ELSE '' END,
        CASE v_recordedby IS NULL WHEN FALSE THEN 'v_recordedby' ELSE '' END,
        CASE v_recordnumber IS NULL WHEN FALSE THEN 'v_recordnumber' ELSE '' END,
        CASE v_references IS NULL WHEN FALSE THEN 'v_references' ELSE '' END,
        CASE v_relation IS NULL WHEN FALSE THEN 'v_relation' ELSE '' END,
        CASE v_replaces IS NULL WHEN FALSE THEN 'v_replaces' ELSE '' END,
        CASE v_reproductivecondition IS NULL WHEN FALSE THEN 'v_reproductivecondition' ELSE '' END,
        CASE v_requires IS NULL WHEN FALSE THEN 'v_requires' ELSE '' END,
        CASE v_rights IS NULL WHEN FALSE THEN 'v_rights' ELSE '' END,
        CASE v_rightsholder IS NULL WHEN FALSE THEN 'v_rightsholder' ELSE '' END,
        CASE v_samplingeffort IS NULL WHEN FALSE THEN 'v_samplingeffort' ELSE '' END,
        CASE v_samplingprotocol IS NULL WHEN FALSE THEN 'v_samplingprotocol' ELSE '' END,
        CASE v_scientificname IS NULL WHEN FALSE THEN 'v_scientificname' ELSE '' END,
        CASE v_scientificnameauthorship IS NULL WHEN FALSE THEN 'v_scientificnameauthorship' ELSE '' END,
        CASE v_scientificnameid IS NULL WHEN FALSE THEN 'v_scientificnameid' ELSE '' END,
        CASE v_sex IS NULL WHEN FALSE THEN 'v_sex' ELSE '' END,
        CASE v_source IS NULL WHEN FALSE THEN 'v_source' ELSE '' END,
        CASE v_spatial IS NULL WHEN FALSE THEN 'v_spatial' ELSE '' END,
        CASE v_specificepithet IS NULL WHEN FALSE THEN 'v_specificepithet' ELSE '' END,
        CASE v_startdayofyear IS NULL WHEN FALSE THEN 'v_startdayofyear' ELSE '' END,
        CASE v_stateprovince IS NULL WHEN FALSE THEN 'v_stateprovince' ELSE '' END,
        CASE v_subgenus IS NULL WHEN FALSE THEN 'v_subgenus' ELSE '' END,
        CASE v_subject IS NULL WHEN FALSE THEN 'v_subject' ELSE '' END,
        CASE v_tableofcontents IS NULL WHEN FALSE THEN 'v_tableofcontents' ELSE '' END,
        CASE v_taxonconceptid IS NULL WHEN FALSE THEN 'v_taxonconceptid' ELSE '' END,
        CASE v_taxonid IS NULL WHEN FALSE THEN 'v_taxonid' ELSE '' END,
        CASE v_taxonomicstatus IS NULL WHEN FALSE THEN 'v_taxonomicstatus' ELSE '' END,
        CASE v_taxonrank IS NULL WHEN FALSE THEN 'v_taxonrank' ELSE '' END,
        CASE v_taxonremarks IS NULL WHEN FALSE THEN 'v_taxonremarks' ELSE '' END,
        CASE v_temporal IS NULL WHEN FALSE THEN 'v_temporal' ELSE '' END,
        CASE v_title IS NULL WHEN FALSE THEN 'v_title' ELSE '' END,
        CASE v_type IS NULL WHEN FALSE THEN 'v_type' ELSE '' END,
        CASE v_typestatus IS NULL WHEN FALSE THEN 'v_typestatus' ELSE '' END,
        CASE v_valid IS NULL WHEN FALSE THEN 'v_valid' ELSE '' END,
        CASE v_verbatimcoordinates IS NULL WHEN FALSE THEN 'v_verbatimcoordinates' ELSE '' END,
        CASE v_verbatimcoordinatesystem IS NULL WHEN FALSE THEN 'v_verbatimcoordinatesystem' ELSE '' END,
        CASE v_verbatimdepth IS NULL WHEN FALSE THEN 'v_verbatimdepth' ELSE '' END,
        CASE v_verbatimelevation IS NULL WHEN FALSE THEN 'v_verbatimelevation' ELSE '' END,
        CASE v_verbatimeventdate IS NULL WHEN FALSE THEN 'v_verbatimeventdate' ELSE '' END,
        CASE v_verbatimlatitude IS NULL WHEN FALSE THEN 'v_verbatimlatitude' ELSE '' END,
        CASE v_verbatimlocality IS NULL WHEN FALSE THEN 'v_verbatimlocality' ELSE '' END,
        CASE v_verbatimlongitude IS NULL WHEN FALSE THEN 'v_verbatimlongitude' ELSE '' END,
        CASE v_verbatimsrs IS NULL WHEN FALSE THEN 'v_verbatimsrs' ELSE '' END,
        CASE v_verbatimtaxonrank IS NULL WHEN FALSE THEN 'v_verbatimtaxonrank' ELSE '' END,
        CASE v_vernacularname IS NULL WHEN FALSE THEN 'v_vernacularname' ELSE '' END,
        CASE v_waterbody IS NULL WHEN FALSE THEN 'v_waterbody' ELSE '' END,
        CASE v_year IS NULL WHEN FALSE THEN 'v_year' ELSE '' END,
        CASE valid IS NULL WHEN FALSE THEN 'valid' ELSE '' END,
        CASE verbatimcoordinates IS NULL WHEN FALSE THEN 'verbatimcoordinates' ELSE '' END,
        CASE verbatimcoordinatesystem IS NULL WHEN FALSE THEN 'verbatimcoordinatesystem' ELSE '' END,
        CASE verbatimdepth IS NULL WHEN FALSE THEN 'verbatimdepth' ELSE '' END,
        CASE verbatimelevation IS NULL WHEN FALSE THEN 'verbatimelevation' ELSE '' END,
        CASE verbatimeventdate IS NULL WHEN FALSE THEN 'verbatimeventdate' ELSE '' END,
        CASE verbatimlocality IS NULL WHEN FALSE THEN 'verbatimlocality' ELSE '' END,
        CASE verbatimsrs IS NULL WHEN FALSE THEN 'verbatimsrs' ELSE '' END,
        CASE verbatimtaxonrank IS NULL WHEN FALSE THEN 'verbatimtaxonrank' ELSE '' END,
        CASE vernacularname IS NULL WHEN FALSE THEN 'vernacularname' ELSE '' END,
        CASE waterbody IS NULL WHEN FALSE THEN 'waterbody' ELSE '' END,
        CASE xmlschema IS NULL WHEN FALSE THEN 'xmlschema' ELSE '' END,
        CASE year IS NULL WHEN FALSE THEN 'year' ELSE '' END
      )
    ) AS field
  FROM prod_b.occurrence_hdfs
) t1;

Returns:

acceptednameusage
acceptednameusageid
accessrights
associatedoccurrences
associatedreferences
associatedsequences
associatedtaxa
basisofrecord
bed
behavior
bibliographiccitation
catalognumber
class
classkey
collectioncode
collectionid
continent
coordinateaccuracy
countrycode
county
crawlid
created
datageneralizations
datasetid
datasetkey
datasetname
dateidentified
day
decimallatitude
decimallongitude
depth
depthaccuracy
disposition
dynamicproperties
earliestageorloweststage
earliesteonorlowesteonothem
earliestepochorlowestseries
earliesteraorlowesterathem
earliestperiodorlowestsystem
elevation
elevationaccuracy
enddayofyear
establishmentmeans
eventdate
eventid
eventremarks
eventtime
ext_multimedia
family
familykey
fieldnotes
fieldnumber
footprintspatialfit
footprintsrs
footprintwkt
formation
fragmentcreated
gbifid
genericname
genus
genuskey
geologicalcontextid
georeferencedby
georeferenceddate
georeferenceprotocol
georeferenceremarks
georeferencesources
georeferenceverificationstatus
group_
habitat
hascoordinate
hasgeospatialissues
higherclassification
highergeography
highergeographyid
highestbiostratigraphiczone
identificationid
identificationqualifier
identificationreferences
identificationremarks
identificationverificationstatus
identifiedby
identifier
individualid
informationwithheld
infraspecificepithet
institutioncode
institutionid
island
islandgroup
issue
kingdom
kingdomkey
language
lastcrawled
lastinterpreted
lastparsed
latestageorhigheststage
latesteonorhighesteonothem
latestepochorhighestseries
latesteraorhighesterathem
latestperiodorhighestsystem
lifestage
lithostratigraphicterms
locality
locationaccordingto
locationid
locationremarks
lowestbiostratigraphiczone
materialsampleid
maximumdistanceabovesurfaceinmeters
mediatype
member
minimumdistanceabovesurfaceinmeters
modified
month
municipality
nameaccordingto
nameaccordingtoid
namepublishedin
namepublishedinid
namepublishedinyear
nomenclaturalcode
nomenclaturalstatus
occurrencedetails
occurrenceid
occurrenceremarks
occurrencestatus
order_
orderkey
originalnameusage
originalnameusageid
othercatalognumbers
ownerinstitutioncode
parentnameusage
parentnameusageid
phylum
phylumkey
pointradiusspatialfit
preparations
previousidentifications
protocol
publishingcountry
publishingorgkey
recordedby
recordnumber
references
reproductivecondition
rights
rightsholder
samplingeffort
samplingprotocol
scientificname
scientificnameid
sex
source
species
specieskey
specificepithet
startdayofyear
stateprovince
taxonconceptid
taxonid
taxonkey
taxonomicstatus
taxonrank
taxonremarks
type
typestatus
typifiedname
unitqualifier
v_acceptednameusage
v_acceptednameusageid
v_accessrights
v_associatedmedia
v_associatedoccurrences
v_associatedreferences
v_associatedsequences
v_associatedtaxa
v_basisofrecord
v_bed
v_behavior
v_bibliographiccitation
v_catalognumber
v_class
v_collectioncode
v_collectionid
v_continent
v_coordinateprecision
v_coordinateuncertaintyinmeters
v_country
v_countrycode
v_county
v_created
v_datageneralizations
v_datasetid
v_datasetname
v_dateidentified
v_day
v_decimallatitude
v_decimallongitude
v_disposition
v_dynamicproperties
v_earliestageorloweststage
v_earliesteonorlowesteonothem
v_earliestepochorlowestseries
v_earliesteraorlowesterathem
v_earliestperiodorlowestsystem
v_enddayofyear
v_establishmentmeans
v_eventdate
v_eventid
v_eventremarks
v_eventtime
v_family
v_fieldnotes
v_fieldnumber
v_footprintspatialfit
v_footprintsrs
v_footprintwkt
v_formation
v_genus
v_geodeticdatum
v_geologicalcontextid
v_georeferencedby
v_georeferenceddate
v_georeferenceprotocol
v_georeferenceremarks
v_georeferencesources
v_georeferenceverificationstatus
v_group_
v_habitat
v_higherclassification
v_highergeography
v_highergeographyid
v_highestbiostratigraphiczone
v_identificationid
v_identificationqualifier
v_identificationreferences
v_identificationremarks
v_identificationverificationstatus
v_identifiedby
v_identifier
v_individualcount
v_individualid
v_informationwithheld
v_infraspecificepithet
v_institutioncode
v_institutionid
v_island
v_islandgroup
v_kingdom
v_language
v_latestageorhigheststage
v_latesteonorhighesteonothem
v_latestepochorhighestseries
v_latesteraorhighesterathem
v_latestperiodorhighestsystem
v_lifestage
v_lithostratigraphicterms
v_locality
v_locationaccordingto
v_locationid
v_locationremarks
v_lowestbiostratigraphiczone
v_materialsampleid
v_maximumdepthinmeters
v_maximumdistanceabovesurfaceinmeters
v_maximumelevationinmeters
v_member
v_minimumdepthinmeters
v_minimumdistanceabovesurfaceinmeters
v_minimumelevationinmeters
v_modified
v_month
v_municipality
v_nameaccordingto
v_nameaccordingtoid
v_namepublishedin
v_namepublishedinid
v_namepublishedinyear
v_nomenclaturalcode
v_nomenclaturalstatus
v_occurrencedetails
v_occurrenceid
v_occurrenceremarks
v_occurrencestatus
v_order_
v_originalnameusage
v_originalnameusageid
v_othercatalognumbers
v_ownerinstitutioncode
v_parentnameusage
v_parentnameusageid
v_phylum
v_pointradiusspatialfit
v_preparations
v_previousidentifications
v_recordedby
v_recordnumber
v_references
v_reproductivecondition
v_rights
v_rightsholder
v_samplingeffort
v_samplingprotocol
v_scientificname
v_scientificnameauthorship
v_scientificnameid
v_sex
v_source
v_specificepithet
v_startdayofyear
v_stateprovince
v_subgenus
v_taxonconceptid
v_taxonid
v_taxonomicstatus
v_taxonrank
v_taxonremarks
v_type
v_typestatus
v_verbatimcoordinates
v_verbatimcoordinatesystem
v_verbatimdepth
v_verbatimelevation
v_verbatimeventdate
v_verbatimlatitude
v_verbatimlocality
v_verbatimlongitude
v_verbatimsrs
v_verbatimtaxonrank
v_vernacularname
v_waterbody
v_year
verbatimcoordinates
verbatimcoordinatesystem
verbatimdepth
verbatimelevation
verbatimeventdate
verbatimlocality
verbatimsrs
verbatimtaxonrank
vernacularname
waterbody
xmlschema
year
@mdoering
Copy link
Copy Markdown

If there are nearly 100 columns always NULL across all records building the hdfs table with columns that actually do have values is a very good first step. I think we should definitely do that.

To improve the user friendliness I like the idea of providing a few standard formats best as it allows users to easily build standard consuming software on top of those archives. We might want to discuss this with the actual users, but I can see at least the following 3 formats:

  • complete with verbatim
  • complete, only interpreted
  • minimal suitable for niche modelling (gbifID, BoR, lat, lon, altitude, country, taxID, sciName, rank, eventDate)

Another often requested format for downloads is a species checklist. This is actually a spatial aggregation, returning all taxa found in an occurrence search (not necessarily just spatial quries). Thats a different issue but would definitely help to satisfy our users needs: http://dev.gbif.org/issues/browse/POR-242

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment