@david-andrew
Created February 21, 2024 13:49
Several LLM Dojo Annotation Examples
Meta(path=PosixPath('datasets/mock_aqi.csv'), name='Air Quality Index', description='This dataset represents daily air quality observations collected from various monitoring stations across different cities worldwide. Each row corresponds to a single observation with details about the date, time, and location of the observation, along with specific air quality metrics and conditions.')
LLM identified column "year" as a DATE
LLM identified column "month" as a DATE
LLM identified column "day" as a DATE
LLM identified column "time" as a DATE
LLM identified column "lat" as a GEO
LLM identified column "lon" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "AQI" as a FEATURE
LLM identified column "PM2.5" as a FEATURE
LLM identified column "CO_Level" as a FEATURE
LLM identified column "Is_Industrial" as a FEATURE
LLM identified column "Traffic_Density" as a FEATURE
LLM identified DATE column "year" as a YEAR
LLM identified DATE column "month" as a MONTH
LLM identified DATE column "day" as a DAY
LLM identified DATE column "time" as a DATE
LLM identified GEO column "lat" as a LATITUDE
LLM identified GEO column "lon" as a LONGITUDE
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a COUNTY
LLM identified FEATURE column "AQI" as a INT
LLM identified FEATURE column "PM2.5" as a INT
LLM identified FEATURE column "CO_Level" as a INT
LLM identified FEATURE column "Is_Industrial" as a BINARY
LLM identified FEATURE column "Traffic_Density" as a STR
LLM identified no units for feature column "AQI"
LLM provided units and description for feature column "PM2.5": µg/m^3. The units "µg/m^3" represent micrograms per cubic meter, a measure of the concentration of a particulate matter (in this case, particles with diameters of 2.5 micrometers or smaller) in the air.
LLM was unsure about the units for feature column "CO_Level"
LLM identified no units for feature column "Is_Industrial"
LLM identified no units for feature column "Traffic_Density"
LLM identified coordinate pair: ('lon', 'lat')
LLM identified ('lon', 'lat') as the primary geo column(s)
LLM identified date group: ('day', 'year', 'month')
LLM identified ('day', 'year', 'month') as the primary date column(s)
LLM identified DATE/YEAR column "year" strftime format: "%Y"
LLM identified DATE/MONTH column "month" strftime format: "%m"
LLM identified DATE/DAY column "day" strftime format: "%d"
LLM identified DATE/DATE column "time" strftime format: "%H:%M"
LLM provided description for feature column "AQI": "The Air Quality Index (AQI) is a metric used to communicate how clean or polluted the air is on a daily basis. It quantifies the level of air pollution with numerical values; lower values indicate cleaner air, while higher values signify more polluted air. This index is based on the concentrations of several major pollutants, including particulate matter, sulfur dioxide, carbon monoxide, nitrogen dioxide, and ozone. The AQI scale typically ranges from 0 to 500, where a value of 100 generally corresponds to the national air quality standard for the pollutant, with values above 100 indicating poor air quality and potentially harmful health effects for certain sensitive groups of people."
LLM provided description for feature column "PM2.5": "This feature represents the concentration of particulate matter with a diameter of 2.5 micrometers or less (PM2.5) present in the air. PM2.5 is a significant air pollutant due to its ability to penetrate deep into the lungs and bloodstream, potentially resulting in various health problems. The values are measured in micrograms per cubic meter (µg/m^3), indicating the amount of particulate matter per volume of air."
LLM provided description for feature column "CO_Level": "This column quantifies the level of carbon monoxide (CO) in the air, measured as an integer value that typically represents categories or concentrations defined by air quality standards. Higher numbers indicate greater concentrations of CO, a colorless, odorless gas that can be harmful to health at elevated levels."
LLM provided description for feature column "Is_Industrial": "Indicates whether the air quality measurement was taken in an industrial area or not. Values are true for industrial areas and false for non-industrial areas."
LLM provided description for feature column "Traffic_Density": "Indicates the level of vehicle congestion in the area, with classifications such as High, Medium, or Low, reflecting the intensity of traffic flow."
LLM provided description for date column "year": "The column represents the year in which the air quality measurements were recorded. Each entry denotes the specific year associated with the corresponding air quality data, formatted as a four-digit number."
LLM provided description for date column "month": "This column represents the month of the year when the air quality data was recorded, using numerical values where January is 1 and December is 12."
LLM provided description for date column "day": "Represents the day of the month on which air quality measurements were taken. The values are recorded as integers ranging from 1 to 31, corresponding to the days in a month."
LLM provided description for date column "time": "This column records the specific hour of the day when air quality measurements were taken, formatted in 24-hour time from 00:00 to 23:59."
LLM provided description for geo column "lat": "This dataset column contains the latitude coordinates for various locations, representing the north-south position of a point on the Earth's surface. The values are in decimal degrees, where positive values indicate latitudes north of the equator, and negative values indicate latitudes south of the equator."
LLM provided description for geo column "lon": "This dataset contains the longitude coordinates of various locations, measured in degrees. These coordinates indicate the east-west position on the Earth's surface, with values ranging from -180 degrees (west) to +180 degrees (east) of the Prime Meridian."
LLM provided description for geo column "country": "This dataset segment lists the nations where air quality measurements were taken, showcasing a range of locations that includes the United States, the United Kingdom, Japan, and France, among others."
LLM provided description for geo column "admin1": "This dataset column contains names of major administrative subdivisions within countries, such as states in the United States, counties in the United Kingdom, prefectures in Japan, and regions in France. These divisions reflect the geographical areas relevant for air quality management and reporting."
LLM provided description for geo column "admin2": "This column contains the names of cities or local administrative regions where air quality measurements were taken, indicating the specific urban or local area within broader national or administrative boundaries."
LLM provided description for geo column "admin3": "This dataset includes a geographical classification that details specific urban or administrative areas within cities around the world. These areas range from the borough level, such as in New York City, to districts within other global cities, covering different municipal or local governance zones. Each entry identifies a precise location within larger metropolitan areas, providing a granular view of air quality measurements across diverse urban environments."
geo=[GeoAnnotation(name='lat', display_name=None, description="This dataset column contains the latitude coordinates for various locations, representing the north-south position of a point on the Earth's surface. The values are in decimal degrees, where positive values indicate latitudes north of the equator, and negative values indicate latitudes south of the equator.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='lon', display_name=None, description="This dataset contains the longitude coordinates of various locations, measured in degrees. These coordinates indicate the east-west position on the Earth's surface, with values ranging from -180 degrees (west) to +180 degrees (east) of the Prime Meridian.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair='lat', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This dataset segment lists the nations where air quality measurements were taken, showcasing a range of locations that includes the United States, the United Kingdom, Japan, and France, among others.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This dataset column contains names of major administrative subdivisions within countries, such as states in the United States, counties in the United Kingdom, prefectures in Japan, and regions in France. 
These divisions reflect the geographical areas relevant for air quality management and reporting.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='This column contains the names of cities or local administrative regions where air quality measurements were taken, indicating the specific urban or local area within broader national or administrative boundaries.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='This dataset includes a geographical classification that details specific urban or administrative areas within cities around the world. These areas range from the borough level, such as in New York City, to districts within other global cities, covering different municipal or local governance zones. Each entry identifies a precise location within larger metropolitan areas, providing a granular view of air quality measurements across diverse urban environments.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='year', display_name=None, description='The column represents the year in which the air quality measurements were recorded. 
Each entry denotes the specific year associated with the corresponding air quality data, formatted as a four-digit number.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='month', display_name=None, description='This column represents the month of the year when the air quality data was recorded, using numerical values where January is 1 and December is 12.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 'month'>, primary_date=True, time_format='%m', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='day', display_name=None, description='Represents the day of the month on which air quality measurements were taken. The values are recorded as integers ranging from 1 to 31, corresponding to the days in a month.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DAY: 'day'>, primary_date=True, time_format='%d', associated_columns={<TimeField.DAY: 'Day'>: 'day', <TimeField.YEAR: 'Year'>: 'year', <TimeField.MONTH: 'Month'>: 'month'}, qualifies=None, aliases={}), DateAnnotation(name='time', display_name=None, description='This column records the specific hour of the day when air quality measurements were taken, formatted in 24-hour time from 00:00 to 23:59.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%H:%M', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='AQI', display_name=None, description='The Air Quality Index (AQI) is a metric used to communicate how clean or polluted the air is on a daily basis. It quantifies the level of air pollution with numerical values; lower values indicate cleaner air, while higher values signify more polluted air. This index is based on the concentrations of several major pollutants, including particulate matter, sulfur dioxide, carbon monoxide, nitrogen dioxide, and ozone. 
The AQI scale typically ranges from 0 to 500, where a value of 100 generally corresponds to the national air quality standard for the pollutant, with values above 100 indicating poor air quality and potentially harmful health effects for certain sensitive groups of people.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='PM2.5', display_name=None, description='This feature represents the concentration of particulate matter with a diameter of 2.5 micrometers or less (PM2.5) present in the air. PM2.5 is a significant air pollutant due to its ability to penetrate deep into the lungs and bloodstream, potentially resulting in various health problems. The values are measured in micrograms per cubic meter (µg/m^3), indicating the amount of particulate matter per volume of air.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='µg/m^3', units_description='The units "µg/m^3" represent micrograms per cubic meter, a measure of the concentration of a particulate matter (in this case, particles with diameters of 2.5 micrometers or smaller) in the air.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='CO_Level', display_name=None, description='This column quantifies the level of carbon monoxide (CO) in the air, measured as an integer value that typically represents categories or concentrations defined by air quality standards. Higher numbers indicate greater concentrations of CO, a colorless, odorless gas that can be harmful to health at elevated levels.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Is_Industrial', display_name=None, description='Indicates whether the air quality measurement was taken in an industrial area or not. 
Values are true for industrial areas and false for non-industrial areas.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BINARY: 'binary'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Traffic_Density', display_name=None, description='Indicates the level of vehicle congestion in the area, with classifications such as High, Medium, or Low, reflecting the intensity of traffic flow.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
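The run above assigns a separate strftime format to each date column ("%Y", "%m", "%d", and "%H:%M" for "time"). As a minimal sketch of how those per-column formats could be combined into one timestamp, the Python below joins the values and formats and parses them in a single call. The helper name `parse_row` and the sample row are hypothetical; the log does not show any actual values from mock_aqi.csv, and this is not necessarily how the annotation tool itself consumes the formats.

```python
from datetime import datetime

# strftime formats copied from the log above, keyed by column name.
FORMATS = {"year": "%Y", "month": "%m", "day": "%d", "time": "%H:%M"}

# Column order used when joining; chosen here for illustration only.
COLUMNS = ["year", "month", "day", "time"]


def parse_row(row: dict[str, str]) -> datetime:
    """Join the column values and their formats, then parse them in one call."""
    text = " ".join(row[c] for c in COLUMNS)
    fmt = " ".join(FORMATS[c] for c in COLUMNS)
    return datetime.strptime(text, fmt)


# Hypothetical row; real values from the CSV are not shown in the log.
example = {"year": "2024", "month": "02", "day": "21", "time": "13:49"}
print(parse_row(example).isoformat())
```

Joining with a single space keeps the combined format unambiguous, since none of the individual formats contains a space.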
Meta(path=PosixPath('datasets/mock_aqi2.csv'), name='Air Quality Index', description='This dataset represents daily air quality observations collected from various monitoring stations across different cities worldwide. Each row corresponds to a single observation with details about the date, time, and location of the observation, along with specific air quality metrics and conditions.')
LLM identified column "time" as a DATE
LLM identified column "location" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "AQI" as a FEATURE
LLM identified column "PM2.5" as a FEATURE
LLM identified column "CO_Level" as a FEATURE
LLM identified column "Is_Industrial" as a FEATURE
LLM identified column "Traffic_Density" as a FEATURE
LLM identified DATE column "time" as a DATE
LLM identified GEO column "location" as a COORDINATES
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a CITY
LLM identified FEATURE column "AQI" as a INT
LLM identified FEATURE column "PM2.5" as a INT
LLM identified FEATURE column "CO_Level" as a INT
LLM identified FEATURE column "Is_Industrial" as a BOOLEAN
LLM identified FEATURE column "Traffic_Density" as a STR
LLM identified no units for feature column "AQI"
LLM provided units and description for feature column "PM2.5": µg/m^3. The units "µg/m^3" refer to micrograms per cubic meter, a measurement of particle concentration in air, indicating the mass of particles (in this case, PM2.5) per volume of air.
LLM was unsure about the units for feature column "CO_Level"
LLM identified no units for feature column "Is_Industrial"
LLM identified no units for feature column "Traffic_Density"
LLM identified coordinate column "location" as having format: "LATLON"
LLM identified location as the primary geo column(s)
LLM identified 'time' as the primary date
LLM identified DATE/DATE column "time" strftime format: "%Y:%m:%d:%H:%M"
LLM provided description for feature column "AQI": "The Air Quality Index (AQI) is a numerical scale used to communicate the quality of the air in a specific area, with lower values indicating cleaner air and higher values signaling poorer air quality. This index is calculated based on the concentration of air pollutants, such as particulate matter and gases, which can affect human health and environmental conditions."
LLM provided description for feature column "PM2.5": "PM2.5 refers to fine particulate matter that is less than 2.5 micrometers in diameter. These particles are so small that they can be inhaled into the deepest parts of the lungs and may cause various health problems, such as respiratory and cardiovascular issues. The given values represent the concentration of these particles in the air, measured in micrograms per cubic meter (µg/m^3), indicating the level of air pollution and air quality in a specific area."
LLM provided description for feature column "CO_Level": "This field records the concentration levels of Carbon Monoxide (CO) in the air, measured on a scale where the numerical values correspond to specific categories indicating the extent of CO presence. Higher values reflect greater concentrations of Carbon Monoxide."
LLM provided description for feature column "Is_Industrial": "Indicates whether the air quality measurement was taken in an industrial area."
LLM provided description for feature column "Traffic_Density": "Indicates the level of vehicle congestion in a specific area, ranging from low to high."
LLM provided description for date column "time": "This column records the date and time of air quality measurements, formatted as Year:Month:Day:Hour:Minute."
LLM provided description for geo column "location": "This column contains geographical coordinates, specified by latitude and longitude pairs, representing different locations. Each pair's first value denotes the latitude (north or south of the equator) and the second value denotes the longitude (east or west of the Prime Meridian). These coordinates are used to precisely identify places on the Earth's surface."
LLM provided description for geo column "country": "This column represents the countries from which the air quality indexes are recorded. Each row denotes the country associated with a specific measurement, capturing data from various nations including the United States, the United Kingdom, Japan, France, among others."
LLM provided description for geo column "admin1": "This dataset field represents the first-level administrative divisions within various countries, which could include states, provinces, territories, or regions, depending on the nation's administrative structure. These divisions are directly below the national level and are significant in the context of governance, administrative control, and regional differentiation. Examples from the data include regions such as "New York" in the United States, "England" in the United Kingdom, "Tokyo" in Japan, "Île-de-France" in France, and "California" in the United States."
LLM provided description for geo column "admin2": "This dataset column categorizes global locations at a sub-national level, typically representing cities or counties within larger administrative divisions, such as states or provinces."
LLM provided description for geo column "admin3": "This column represents the names of smaller administrative divisions or districts within cities or larger administrative regions across different countries. These could range from neighborhoods, boroughs, or specific urban districts known for their unique identity, governance, or urban characteristics."
geo=[GeoAnnotation(name='location', display_name=None, description="This column contains geographical coordinates, specified by latitude and longitude pairs, representing different locations. Each pair's first value denotes the latitude (north or south of the equator) and the second value denotes the longitude (east or west of the Prime Meridian). These coordinates are used to precisely identify places on the Earth's surface.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COORDINATES: 'coordinates'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=<CoordFormat.LATLON: 'latlon'>, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This column represents the countries from which the air quality indexes are recorded. Each row denotes the country associated with a specific measurement, capturing data from various nations including the United States, the United Kingdom, Japan, France, among others.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This dataset field represents the first-level administrative divisions within various countries, which could include states, provinces, territories, or regions, depending on the nation\'s administrative structure. These divisions are directly below the national level and are significant in the context of governance, administrative control, and regional differentiation. 
Examples from the data include regions such as "New York" in the United States, "England" in the United Kingdom, "Tokyo" in Japan, "Île-de-France" in France, and "California" in the United States.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='This dataset column categorizes global locations at a sub-national level, typically representing cities or counties within larger administrative divisions, such as states or provinces.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='This column represents the names of smaller administrative divisions or districts within cities or larger administrative regions across different countries. 
These could range from neighborhoods, boroughs, or specific urban districts known for their unique identity, governance, or urban characteristics.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='time', display_name=None, description='This column records the date and time of air quality measurements, formatted as Year:Month:Day:Hour:Minute.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=True, time_format='%Y:%m:%d:%H:%M', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='AQI', display_name=None, description='The Air Quality Index (AQI) is a numerical scale used to communicate the quality of the air in a specific area, with lower values indicating cleaner air and higher values signaling poorer air quality. This index is calculated based on the concentration of air pollutants, such as particulate matter and gases, which can affect human health and environmental conditions.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='PM2.5', display_name=None, description='PM2.5 refers to fine particulate matter that is less than 2.5 micrometers in diameter. These particles are so small that they can be inhaled into the deepest parts of the lungs and may cause various health problems, such as respiratory and cardiovascular issues. 
The given values represent the concentration of these particles in the air, measured in micrograms per cubic meter (µg/m^3), indicating the level of air pollution and air quality in a specific area.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='µg/m^3', units_description='The units "µg/m^3" refer to micrograms per cubic meter, a measurement of particle concentration in air, indicating the mass of particles (in this case, PM2.5) per volume of air.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='CO_Level', display_name=None, description='This field records the concentration levels of Carbon Monoxide (CO) in the air, measured on a scale where the numerical values correspond to specific categories indicating the extent of CO presence. Higher values reflect greater concentrations of Carbon Monoxide.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Is_Industrial', display_name=None, description='Indicates whether the air quality measurement was taken in an industrial area.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BOOLEAN: 'boolean'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Traffic_Density', display_name=None, description='Indicates the level of vehicle congestion in a specific area, ranging from low to high.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
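In this second run the date lives in a single "time" column with the format "%Y:%m:%d:%H:%M", and the "location" column is annotated with coordinate format LATLON. The sketch below shows one way such values might be parsed in Python. The time format is taken from the log; the "lat,lon" comma-separated layout of the coordinate string is an assumption, because the log only states the ordering (latitude first), not the delimiter. Function names and sample values are hypothetical.

```python
from datetime import datetime

# Format string reported in the log for the "time" column.
TIME_FORMAT = "%Y:%m:%d:%H:%M"


def parse_time(value: str) -> datetime:
    """Parse a Year:Month:Day:Hour:Minute string into a datetime."""
    return datetime.strptime(value, TIME_FORMAT)


def parse_latlon(value: str) -> tuple[float, float]:
    """Parse a LATLON string, assuming a comma separates the two numbers."""
    lat_text, lon_text = value.split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Reject values outside the valid ranges for decimal degrees.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


# Hypothetical sample values; the CSV contents are not shown in the log.
print(parse_time("2024:02:21:13:49"))
print(parse_latlon("40.7128, -74.0060"))
```

The range check is a cheap way to catch a column that is actually in LONLAT order, at least for rows where the longitude exceeds 90 degrees in magnitude.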
Meta(path=PosixPath('datasets/mock_aqi3.csv'), name='Air Quality Index', description='This dataset represents daily air quality observations collected from various monitoring stations across different cities worldwide. Each row corresponds to a single observation with details about the date, time, and location of the observation, along with specific air quality metrics and conditions.')
LLM identified column "year" as a DATE
LLM identified column "month" as a DATE
LLM identified column "day" as a DATE
LLM identified column "hour" as a DATE
LLM identified column "minute" as a DATE
LLM identified column "lat" as a GEO
LLM identified column "lon" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "AQI" as a FEATURE
LLM identified column "PM2.5" as a FEATURE
LLM identified column "CO_Level" as a FEATURE
LLM identified column "Is_Industrial" as a FEATURE
LLM identified column "Traffic_Density" as a FEATURE
LLM identified DATE column "year" as a YEAR
LLM identified DATE column "month" as a MONTH
LLM identified DATE column "day" as a DAY
LLM identified DATE column "hour" as a DATE
LLM identified DATE column "minute" as a DATE
LLM identified GEO column "lat" as a LATITUDE
LLM identified GEO column "lon" as a LONGITUDE
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a CITY
LLM identified FEATURE column "AQI" as a INT
LLM identified FEATURE column "PM2.5" as a INT
LLM identified FEATURE column "CO_Level" as a INT
LLM identified FEATURE column "Is_Industrial" as a BOOLEAN
LLM identified FEATURE column "Traffic_Density" as a STR
LLM identified no units for feature column "AQI"
LLM provided units and description for feature column "PM2.5": μg/m^3. The unit "μg/m^3" stands for micrograms per cubic meter, measuring the concentration of a substance (in this case, particulate matter with a diameter of less than 2.5 micrometers) in air.
LLM was unsure about the units for feature column "CO_Level"
LLM identified no units for feature column "Is_Industrial"
LLM identified no units for feature column "Traffic_Density"
LLM identified coordinate pair: ('lon', 'lat')
LLM identified ('lon', 'lat') as the primary geo column(s)
LLM identified date group: ('day', 'year', 'month')
LLM identified ('day', 'year', 'month') as the primary date column(s)
LLM identified DATE/YEAR column "year" strftime format: "%Y"
LLM identified DATE/MONTH column "month" strftime format: "%m"
LLM identified DATE/DAY column "day" strftime format: "%d"
LLM identified DATE/DATE column "hour" strftime format: "%H"
LLM identified DATE/DATE column "minute" strftime format: "%M"
LLM provided description for feature column "AQI": "The Air Quality Index (AQI) is a measure used to communicate how polluted the air currently is or how polluted it is forecast to become. AQI values at or below 100 are generally thought to be satisfactory, whereas values above 100 indicate levels of health concern, with higher values denoting increasing levels of air pollution and greater health risks. The AQI is calculated using data from multiple air pollutants, including particulate matter, sulfur dioxide, carbon monoxide, nitrogen dioxide, and ground-level ozone."
LLM provided description for feature column "PM2.5": "Particulate matter (PM2.5) refers to atmospheric particles with a diameter of less than 2.5 micrometers. These fine particles can originate from various sources, including vehicle emissions, industrial processes, and natural sources such as wildfires and volcanic eruptions. Due to their small size, they can adversely affect respiratory and cardiovascular health by penetrating deep into the lungs and entering the bloodstream. Monitoring PM2.5 concentrations in the air is crucial for assessing air quality and managing public health risks."
LLM provided description for feature column "CO_Level": "This feature represents the concentration levels of Carbon Monoxide (CO) in the air, recorded as an integer. CO levels are commonly used as an indicator of air quality, with higher values indicating greater concentrations of Carbon Monoxide."
LLM provided description for feature column "Is_Industrial": "Indicates whether the air quality measurement was taken in an industrial area or not, with 'True' signifying an industrial setting and 'False' indicating a non-industrial or residential area."
LLM provided description for feature column "Traffic_Density": "Indicates the level of vehicle congestion on roads within a given area, categorized into levels such as High, Medium, and Low to reflect the intensity of traffic flow."
LLM provided description for date column "year": "The column represents the year when the air quality index measurements were recorded, using a four-digit format."
LLM provided description for date column "month": "This column represents the month of the year in which air quality data was recorded, with values ranging from 1 to 12 corresponding to January through December."
LLM provided description for date column "day": "This column represents the day of the month, formatted as a two-digit number. It is part of a date system that includes year and month information from associated columns, providing precise temporal context for the data records."
LLM provided description for date column "hour": "This column records the hour of the day when air quality measurements were taken, using a 24-hour clock format."
LLM provided description for date column "minute": "This column records the minutes component of timestamps, indicating the specific minute within an hour at which air quality measurements were taken. The format follows a 0-59 range to represent the first to the last minute of an hour."
LLM provided description for geo column "lat": "This dataset column contains the latitude coordinates for various locations, measured in degrees. Latitudes range from -90 degrees at the South Pole to +90 degrees at the North Pole, with the equator at 0 degrees. These values indicate the north-south position of a point on the Earth's surface."
LLM provided description for geo column "lon": "This column represents the longitude coordinates of various locations, measured in degrees. Longitude values indicate the location's east-west position on the Earth's surface, with negative values representing locations west of the Prime Meridian and positive values for locations east."
LLM provided description for geo column "country": "This field indicates the country where the air quality data was recorded, reflecting the geographical location from which each measurement originates."
LLM provided description for geo column "admin1": "This dataset column contains values representing the first-level administrative divisions within various countries, such as states, provinces, or regions. These divisions include, but are not limited to, locations like New York in the United States, England in the United Kingdom, Tokyo in Japan, Île-de-France in France, and California in the United States. The data is indicative of the geographical areas being analyzed or reported on in the context of air quality."
LLM provided description for geo column "admin2": "This column contains the names of cities around the world, each representing a specific location where air quality measurements have been taken or are being monitored."
LLM provided description for geo column "admin3": "The column contains the names of sub-city level administrative divisions, districts, or neighborhoods from various global cities, indicating the specific areas where air quality measurements were taken."
geo=[GeoAnnotation(name='lat', display_name=None, description="This dataset column contains the latitude coordinates for various locations, measured in degrees. Latitudes range from -90 degrees at the South Pole to +90 degrees at the North Pole, with the equator at 0 degrees. These values indicate the north-south position of a point on the Earth's surface.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='lon', display_name=None, description="This column represents the longitude coordinates of various locations, measured in degrees. Longitude values indicate the location's east-west position on the Earth's surface, with negative values representing locations west of the Prime Meridian and positive values for locations east.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair='lat', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This field indicates the country where the air quality data was recorded, reflecting the geographical location from which each measurement originates.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This dataset column contains values representing the first-level administrative divisions within various countries, such as states, provinces, or regions. These divisions include, but are not limited to, locations like New York in the United States, England in the United Kingdom, Tokyo in Japan, Île-de-France in France, and California in the United States. 
The data is indicative of the geographical areas being analyzed or reported on in the context of air quality.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='This column contains the names of cities around the world, each representing a specific location where air quality measurements have been taken or are being monitored.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='The column contains the names of sub-city level administrative divisions, districts, or neighborhoods from various global cities, indicating the specific areas where air quality measurements were taken.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='year', display_name=None, description='The column represents the year when the air quality index measurements were recorded, using a four-digit format.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='month', display_name=None, description='This column represents the month of the year in which air quality data was recorded, with values ranging from 1 to 12 corresponding to January through December.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 'month'>, primary_date=True, time_format='%m', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='day', display_name=None, description='This column 
represents the day of the month, formatted as a two-digit number. It is part of a date system that includes year and month information from associated columns, providing precise temporal context for the data records.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DAY: 'day'>, primary_date=True, time_format='%d', associated_columns={<TimeField.DAY: 'Day'>: 'day', <TimeField.YEAR: 'Year'>: 'year', <TimeField.MONTH: 'Month'>: 'month'}, qualifies=None, aliases={}), DateAnnotation(name='hour', display_name=None, description='This column records the hour of the day when air quality measurements were taken, using a 24-hour clock format.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%H', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='minute', display_name=None, description='This column records the minutes component of timestamps, indicating the specific minute within an hour at which air quality measurements were taken. The format follows a 0-59 range to represent the first to the last minute of an hour.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%M', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='AQI', display_name=None, description='The Air Quality Index (AQI) is a measure used to communicate how polluted the air currently is or how polluted it is forecast to become. AQI values at or below 100 are generally thought to be satisfactory, whereas values above 100 indicate levels of health concern, with higher values denoting increasing levels of air pollution and greater health risks. 
The AQI is calculated using data from multiple air pollutants, including particulate matter, sulfur dioxide, carbon monoxide, nitrogen dioxide, and ground-level ozone.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='PM2.5', display_name=None, description='Particulate matter (PM2.5) refers to atmospheric particles with a diameter of less than 2.5 micrometers. These fine particles can originate from various sources, including vehicle emissions, industrial processes, and natural sources such as wildfires and volcanic eruptions. Due to their small size, they can adversely affect respiratory and cardiovascular health by penetrating deep into the lungs and entering the bloodstream. Monitoring PM2.5 concentrations in the air is crucial for assessing air quality and managing public health risks.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='μg/m^3', units_description='The unit "μg/m^3" stands for micrograms per cubic meter, measuring the concentration of a substance (in this case, particulate matter with a diameter of less than 2.5 micrometers) in air.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='CO_Level', display_name=None, description='This feature represents the concentration levels of Carbon Monoxide (CO) in the air, recorded as an integer. 
CO levels are commonly used as an indicator of air quality, with higher values indicating greater concentrations of Carbon Monoxide.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Is_Industrial', display_name=None, description="Indicates whether the air quality measurement was taken in an industrial area or not, with 'True' signifying an industrial setting and 'False' indicating a non-industrial or residential area.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BOOLEAN: 'boolean'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Traffic_Density', display_name=None, description='Indicates the level of vehicle congestion on roads within a given area, categorized into levels such as High, Medium, and Low to reflect the intensity of traffic flow.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
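
The run above logs one strftime format per date column ("%Y", "%m", "%d", "%H", "%M"). As a rough sketch of how those pieces could be recombined into a single timestamp, the snippet below joins the per-column values and formats and parses them with one `datetime.strptime` call. The helper name `combine_date_parts` and the sample row are assumptions made for illustration; they do not come from the tool that produced this log.

```python
from datetime import datetime

# Formats copied from the log lines above; the column order here is chosen for readability.
FORMATS = {"year": "%Y", "month": "%m", "day": "%d", "hour": "%H", "minute": "%M"}


def combine_date_parts(row: dict, formats: dict) -> datetime:
    """Join the per-column values and formats, then parse them in one strptime call.

    Hypothetical helper: the annotation tool may combine columns differently.
    """
    columns = [name for name in formats if name in row]
    value = " ".join(str(row[name]) for name in columns)
    fmt = " ".join(formats[name] for name in columns)
    return datetime.strptime(value, fmt)


# Made-up row values, not taken from the dataset behind this log.
row = {"year": "2023", "month": "07", "day": "04", "hour": "09", "minute": "30"}
timestamp = combine_date_parts(row, FORMATS)
```

Joining with a separator keeps adjacent numeric fields unambiguous, which matters if a column stores values without zero padding.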
Meta(path=PosixPath('datasets/mock_aqi4.csv'), name='Air Quality Index', description='This dataset represents daily air quality observations collected from various monitoring stations across different cities worldwide. Each row corresponds to a single observation with details about the date, time, and location of the observation, along with specific air quality metrics and conditions.')
LLM identified column "year" as a DATE
LLM identified column "month" as a DATE
LLM identified column "day" as a DATE
LLM identified column "time" as a DATE
LLM identified column "latitude" as a GEO
LLM identified column "longitude" as a GEO
LLM identified column "Latitude_Station" as a GEO
LLM identified column "Longitude_Station" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "AQI" as a FEATURE
LLM identified column "PM2.5" as a FEATURE
LLM identified column "CO_Level" as a FEATURE
LLM identified column "Is_Industrial" as a FEATURE
LLM identified column "Traffic_Density" as a FEATURE
LLM identified DATE column "year" as a YEAR
LLM identified DATE column "month" as a MONTH
LLM identified DATE column "day" as a DAY
LLM identified DATE column "time" as a DATE
LLM identified GEO column "latitude" as a LATITUDE
LLM identified GEO column "longitude" as a LONGITUDE
LLM identified GEO column "Latitude_Station" as a LATITUDE
LLM identified GEO column "Longitude_Station" as a LONGITUDE
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a CITY
LLM identified FEATURE column "AQI" as a INT
LLM identified FEATURE column "PM2.5" as a INT
LLM identified FEATURE column "CO_Level" as a INT
LLM identified FEATURE column "Is_Industrial" as a BINARY
LLM identified FEATURE column "Traffic_Density" as a STR
LLM identified no units for feature column "AQI"
LLM provided units and description for feature column "PM2.5": µg/m^3. The units "µg/m^3" represent micrograms per cubic meter, a measurement of mass concentration indicating how many micrograms of a substance (in this case, PM2.5 particles) are present in one cubic meter of air.
LLM was unsure about the units for feature column "CO_Level"
LLM identified no units for feature column "Is_Industrial"
LLM identified no units for feature column "Traffic_Density"
LLM identified coordinate pair: ('Longitude_Station', 'Latitude_Station')
LLM identified coordinate pair: ('longitude', 'latitude')
LLM identified ('Longitude_Station', 'Latitude_Station') as the primary geo column(s)
LLM identified date group: ('day', 'year', 'month')
LLM identified ('day', 'year', 'month') as the primary date column(s)
LLM identified DATE/YEAR column "year" strftime format: "%Y"
LLM identified DATE/MONTH column "month" strftime format: "%m"
LLM identified DATE/DAY column "day" strftime format: "%d"
LLM identified DATE/DATE column "time" strftime format: "%H:%M"
LLM provided description for feature column "AQI": "The Air Quality Index (AQI) is a numeric scale used for reporting daily air quality. It indicates how clean or polluted the air is, and what associated health effects might be a concern for the general population. The AQI values range from 0 to 500, where lower values represent good air quality and higher values indicate worsening levels of air pollution and health concerns."
LLM provided description for feature column "PM2.5": "PM2.5 refers to particulate matter that is less than 2.5 micrometers in diameter. These tiny particles can penetrate deeply into the respiratory tract, reaching the lungs and potentially affecting lung function and aggravating conditions such as asthma and heart disease. The concentration of PM2.5 is measured in micrograms per cubic meter of air, providing an indication of air quality with respect to these fine particles."
LLM provided description for feature column "CO_Level": "This column records the carbon monoxide (CO) levels observed at various monitoring stations. The values represent discrete categories indicating the concentration of CO in the air, measured on a scale where higher numbers denote greater concentrations."
LLM provided description for feature column "Is_Industrial": "Indicates whether the air quality measurement was taken in an industrial area or not, with a binary value representing the presence (True) or absence (False) of industrial activities in the vicinity of the measurement site."
LLM provided description for feature column "Traffic_Density": "Indicates the volume of vehicles in a specific area, categorized as High, Medium, or Low, reflecting the relative amount of vehicular traffic present."
LLM provided description for date column "year": "The column records the year when air quality measurements were taken, with each value representing a specific calendar year."
LLM provided description for date column "month": "This column records the month of the year when the air quality data was collected, using numerical values from 1 (January) to 12 (December)."
LLM provided description for date column "day": "This column records the day of the month for air quality observations. Each value represents the numerical day within a specific month and year, indicating when the air quality index was measured."
LLM provided description for date column "time": "This column records the specific hour of the day when air quality index readings were taken, formatted as hours and minutes in 24-hour time."
LLM provided description for geo column "latitude": "This column represents the geographical north-south positions of various locations on the Earth's surface, measured in degrees. The values indicate the locations' distance from the Equator, with positive values denoting locations in the Northern Hemisphere."
LLM provided description for geo column "longitude": "This column contains geographical coordinates representing the east-west position of various locations on Earth. The values are in degrees, where negative values indicate locations west of the Prime Meridian (Greenwich, England) and positive values indicate locations east of the Prime Meridian. For example, -74.0060 suggests a location in the western hemisphere, while 139.6917 indicates a location in the eastern hemisphere."
LLM provided description for geo column "Latitude_Station": "This column represents the geographic latitude coordinates of air quality monitoring stations. Latitude coordinates are numerical measurements given in degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator. These coordinates help in identifying the precise north-south position of the stations on the Earth's surface."
LLM provided description for geo column "Longitude_Station": "This column contains the longitudinal coordinates of air quality monitoring stations. Longitude measures how far east or west a location is from the Prime Meridian, expressed in degrees. Values can range from -180 degrees (west) to +180 degrees (east)."
LLM provided description for geo column "country": "This dataset feature lists the names of countries where air quality measurements have been taken, indicating the geographical location of each measurement."
LLM provided description for geo column "admin1": "This column contains the names of first-level administrative divisions, such as states, provinces, or regions, from various countries around the world. Each entry represents the geographical area where the air quality data was measured or compiled."
LLM provided description for geo column "admin2": "The column contains names of cities around the world, representing specific locations where air quality measurements have been recorded."
LLM provided description for geo column "admin3": "This column represents the name of a specific subdivision or district within a city, often used to identify smaller administrative areas or neighborhoods. These names can vary widely in their level of administrative significance and geographic size, reflecting a range of urban areas from densely populated city centers to more localized community identifiers."
geo=[GeoAnnotation(name='latitude', display_name=None, description="This column represents the geographical north-south positions of various locations on the Earth's surface, measured in degrees. The values indicate the locations' distance from the Equator, with positive values denoting locations in the Northern Hemisphere.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='longitude', display_name=None, description='This column contains geographical coordinates representing the east-west position of various locations on Earth. The values are in degrees, where negative values indicate locations west of the Prime Meridian (Greenwich, England) and positive values indicate locations east of the Prime Meridian. For example, -74.0060 suggests a location in the western hemisphere, while 139.6917 indicates a location in the eastern hemisphere.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair='latitude', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Latitude_Station', display_name=None, description="This column represents the geographic latitude coordinates of air quality monitoring stations. Latitude coordinates are numerical measurements given in degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator. 
These coordinates help in identifying the precise north-south position of the stations on the Earth's surface.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Longitude_Station', display_name=None, description='This column contains the longitudinal coordinates of air quality monitoring stations. Longitude measures how far east or west a location is from the Prime Meridian, expressed in degrees. Values can range from -180 degrees (west) to +180 degrees (east).', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair='Latitude_Station', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This dataset feature lists the names of countries where air quality measurements have been taken, indicating the geographical location of each measurement.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This column contains the names of first-level administrative divisions, such as states, provinces, or regions, from various countries around the world. 
Each entry represents the geographical area where the air quality data was measured or compiled.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='The column contains names of cities around the world, representing specific locations where air quality measurements have been recorded.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='This column represents the name of a specific subdivision or district within a city, often used to identify smaller administrative areas or neighborhoods. These names can vary widely in their level of administrative significance and geographic size, reflecting a range of urban areas from densely populated city centers to more localized community identifiers.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='year', display_name=None, description='The column records the year when air quality measurements were taken, with each value representing a specific calendar year.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='month', display_name=None, description='This column records the month of the year when the air quality data was collected, using numerical values from 1 (January) to 12 (December).', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 'month'>, primary_date=True, time_format='%m', 
associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='day', display_name=None, description='This column records the day of the month for air quality observations. Each value represents the numerical day within a specific month and year, indicating when the air quality index was measured.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DAY: 'day'>, primary_date=True, time_format='%d', associated_columns={<TimeField.DAY: 'Day'>: 'day', <TimeField.YEAR: 'Year'>: 'year', <TimeField.MONTH: 'Month'>: 'month'}, qualifies=None, aliases={}), DateAnnotation(name='time', display_name=None, description='This column records the specific hour of the day when air quality index readings were taken, formatted as hours and minutes in 24-hour time.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%H:%M', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='AQI', display_name=None, description='The Air Quality Index (AQI) is a numeric scale used for reporting daily air quality. It indicates how clean or polluted the air is, and what associated health effects might be a concern for the general population. The AQI values range from 0 to 500, where lower values represent good air quality and higher values indicate worsening levels of air pollution and health concerns.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='PM2.5', display_name=None, description='PM2.5 refers to particulate matter that is less than 2.5 micrometers in diameter. These tiny particles can penetrate deeply into the respiratory tract, reaching the lungs and potentially affecting lung function and aggravating conditions such as asthma and heart disease. 
The concentration of PM2.5 is measured in micrograms per cubic meter of air, providing an indication of air quality with respect to these fine particles.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='µg/m^3', units_description='The units "µg/m^3" represent micrograms per cubic meter, a measurement of mass concentration indicating how many micrograms of a substance (in this case, PM2.5 particles) are present in one cubic meter of air.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='CO_Level', display_name=None, description='This column records the carbon monoxide (CO) levels observed at various monitoring stations. The values represent discrete categories indicating the concentration of CO in the air, measured on a scale where higher numbers denote greater concentrations.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Is_Industrial', display_name=None, description='Indicates whether the air quality measurement was taken in an industrial area or not, with a binary value representing the presence (True) or absence (False) of industrial activities in the vicinity of the measurement site.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BINARY: 'binary'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Traffic_Density', display_name=None, description='Indicates the volume of vehicles in a specific area, categorized as High, Medium, or Low, reflecting the relative amount of vehicular traffic present.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
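
In the dump above, each longitude column carries `is_geo_pair` pointing at its latitude partner, and only the pair logged as primary has `primary_geo=True` on both columns. The sketch below reproduces that pattern from the logged coordinate pairs. `GeoFields` and `apply_pairs` are hypothetical stand-ins written for this example, not the real `GeoAnnotation` class or any function of the tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoFields:
    """Hypothetical stand-in holding only the fields discussed here; not the real GeoAnnotation."""
    name: str
    primary_geo: Optional[bool] = None
    is_geo_pair: Optional[str] = None


def apply_pairs(pairs, primary):
    """Mark each (lon, lat) pair the way the dump shows it.

    The longitude column records its latitude partner in is_geo_pair, and both
    columns of the primary pair get primary_geo=True (others stay None).
    """
    fields = {}
    for lon, lat in pairs:
        is_primary = True if (lon, lat) == primary else None
        fields[lat] = GeoFields(lat, primary_geo=is_primary)
        fields[lon] = GeoFields(lon, primary_geo=is_primary, is_geo_pair=lat)
    return fields


# Pairs and primary choice copied from the log lines for mock_aqi4.csv.
pairs = [("Longitude_Station", "Latitude_Station"), ("longitude", "latitude")]
fields = apply_pairs(pairs, primary=("Longitude_Station", "Latitude_Station"))
```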
Meta(path=PosixPath('datasets/mock_aqi5.csv'), name='Air Quality Index', description='This dataset represents daily air quality observations collected from various monitoring stations across different cities worldwide. Each row corresponds to a single observation with details about the date, time, and location of the observation, along with specific air quality metrics and conditions.')
LLM identified column "year" as a DATE
LLM identified column "month" as a DATE
LLM identified column "day" as a DATE
LLM identified column "time" as a DATE
LLM identified column "latitude" as a GEO
LLM identified column "longitude" as a GEO
LLM identified column "Latitude_Station" as a GEO
LLM identified column "Longitude_Station" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "AQI" as a FEATURE
LLM identified column "PM2.5" as a FEATURE
LLM identified column "CO_Level" as a FEATURE
LLM identified column "Is_Industrial" as a FEATURE
LLM identified column "Traffic_Density" as a FEATURE
LLM identified column "Forecast_Year" as a DATE
LLM identified column "Forecast_Month" as a DATE
LLM identified column "Forecast_Day" as a FEATURE
LLM identified DATE column "year" as a YEAR
LLM identified DATE column "month" as a MONTH
LLM identified DATE column "day" as a DAY
LLM identified DATE column "time" as a DATE
LLM identified DATE column "Forecast_Year" as a YEAR
LLM identified DATE column "Forecast_Month" as a MONTH
LLM identified GEO column "latitude" as a LATITUDE
LLM identified GEO column "longitude" as a LONGITUDE
LLM identified GEO column "Latitude_Station" as a LATITUDE
LLM identified GEO column "Longitude_Station" as a LONGITUDE
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a CITY
LLM identified FEATURE column "AQI" as a INT
LLM identified FEATURE column "PM2.5" as a INT
LLM identified FEATURE column "CO_Level" as a INT
LLM identified FEATURE column "Is_Industrial" as a BOOLEAN
LLM identified FEATURE column "Traffic_Density" as a STR
LLM identified FEATURE column "Forecast_Day" as a INT
LLM identified no units for feature column "AQI"
LLM provided units and description for feature column "PM2.5": µg/m^3. The unit "µg/m^3" refers to micrograms per cubic meter, a measure of concentration indicating how many micrograms of a substance (in this case, PM2.5 particles) are found in one cubic meter of air.
LLM was unsure about the units for feature column "CO_Level"
LLM identified no units for feature column "Is_Industrial"
LLM identified no units for feature column "Traffic_Density"
LLM identified no units for feature column "Forecast_Day"
LLM identified coordinate pair: ('Longitude_Station', 'Latitude_Station')
LLM identified coordinate pair: ('longitude', 'latitude')
LLM identified ('Longitude_Station', 'Latitude_Station') as the primary geo column(s)
LLM identified date group: ('Forecast_Month', 'Forecast_Year')
LLM identified date group: ('day', 'year', 'month')
LLM identified ('day', 'year', 'month') as the primary date column(s)
LLM identified DATE/YEAR column "year" strftime format: "%Y"
LLM identified DATE/MONTH column "month" strftime format: "%m"
LLM identified DATE/DAY column "day" strftime format: "%d"
LLM identified DATE/DATE column "time" strftime format: "%H:%M"
LLM identified DATE/YEAR column "Forecast_Year" strftime format: "%Y"
LLM identified DATE/MONTH column "Forecast_Month" strftime format: "%m"
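The date group and strftime formats logged above can be combined into a single timestamp. The following is a minimal sketch, assuming each column arrives as a string or integer; `combine_date_parts`, `FORMATS`, and the sample row are hypothetical names and values for illustration, not part of the annotation tool.

```python
from datetime import datetime

# Formats as reported in the log for the primary date group and the "time" column.
FORMATS = {"year": "%Y", "month": "%m", "day": "%d", "time": "%H:%M"}

def combine_date_parts(row, columns=("year", "month", "day", "time")):
    """Join the per-column values and parse them with the joined formats."""
    text = " ".join(str(row[c]) for c in columns)
    fmt = " ".join(FORMATS[c] for c in columns)
    return datetime.strptime(text, fmt)

# Hypothetical row; the values are illustrative, not taken from the dataset.
row = {"year": "2023", "month": "07", "day": "15", "time": "14:30"}
print(combine_date_parts(row).isoformat())
```

Passing `columns=("year", "month", "day")` parses only the primary date group and leaves the time of day at midnight.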
LLM provided description for feature column "AQI": "The Air Quality Index (AQI) is a numerical scale used to communicate the quality of the air in a specific location at a given time. Values typically range from 0 to 500, where lower numbers indicate better air quality and higher numbers signify poorer air quality. This metric considers several air pollutants, including particulate matter, ground-level ozone, sulfur dioxide, nitrogen dioxide, and carbon monoxide, to provide a comprehensive view of air health and pollution levels."
LLM provided description for feature column "PM2.5": "PM2.5 refers to particulate matter that is less than 2.5 micrometers in diameter. These particles are so small that they can only be detected using an electron microscope. They are important to monitor because they are small enough to penetrate deep into the human lung and enter the bloodstream, potentially causing significant health problems. PM2.5 is produced from various sources, including combustion (vehicles, power plants, wood burning, etc.), industrial processes, and natural sources (dust, wildfires). High levels of PM2.5 in the air are associated with various adverse health effects, including respiratory and cardiovascular diseases, and can affect the lungs and heart."
LLM provided description for feature column "CO_Level": "This column indicates the level of carbon monoxide (CO) concentration measured at a specific location and time, represented as an integer. Carbon monoxide levels are crucial for assessing air quality as higher concentrations can be harmful to health."
LLM provided description for feature column "Is_Industrial": "Indicates whether the air quality measurement was taken at an industrial location."
LLM provided description for feature column "Traffic_Density": "Indicates the volume of vehicle flow in an area, categorized as high, medium, or low, which can impact air quality by varying amounts of pollutants released from vehicles."
LLM provided description for feature column "Forecast_Day": "This column represents the day of the month on which the air quality index (AQI) forecast is made, ranging from 1 to 31. The values are integers indicating the specific day, without specifying the month or year."
LLM provided description for date column "year": "This column records the year when the air quality measurements were taken, reflecting annual data points for analysis of air quality trends over time."
LLM provided description for date column "month": "This column records the month when the air quality measurements were taken, using numerical values from 1 to 12 to represent January through December, respectively."
LLM provided description for date column "day": "This column records the day of the month for each observed air quality index measurement, presented in a numeric format ranging from 1 to 31. It is part of a date series when combined with corresponding year and month data, providing a more detailed timestamp of each measurement."
LLM provided description for date column "time": "This field records the hour and minute of the day when the air quality index measurement was taken, using a 24-hour clock format."
LLM provided description for date column "Forecast_Year": "This column records the year for which the air quality index forecast is made, formatted in a four-digit year representation."
LLM provided description for date column "Forecast_Month": "The "Forecast_Month" column represents the month for which air quality predictions are made, encoded as integers from 1 to 12 corresponding to January through December."
LLM provided description for geo column "latitude": "This column contains the geographical coordinates specifying the north-south position of various locations on the Earth's surface. The values are measured in degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator."
LLM provided description for geo column "longitude": "This column contains the longitudinal coordinates of various locations, measured in degrees. Longitude values indicate positions west or east of the Prime Meridian, which runs through Greenwich, England. Negative values represent locations west of the Prime Meridian, while positive values denote locations to the east."
LLM provided description for geo column "Latitude_Station": "This column represents the geographical latitude of various air quality monitoring stations. Latitude is a measure of a location's distance north or south of the Earth's equator, expressed in degrees. Here, the values indicate the precise north-south position of each station, enabling the identification of their specific location on the Earth's surface."
LLM provided description for geo column "Longitude_Station": "This column contains the longitudinal coordinates of various air quality monitoring stations around the world. Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface, measured in degrees. Values range from -180 to 180 degrees, with negative values indicating locations west of the Prime Meridian, and positive values indicating locations east of the Prime Meridian."
LLM provided description for geo column "country": "This field records the nations from which air quality data have been collected, indicating the geographic origin of each measurement in the dataset. Each entry corresponds to a specific country identified by its commonly used short name, such as "USA" for the United States of America or "UK" for the United Kingdom."
LLM provided description for geo column "admin1": "This dataset field represents the primary administrative divisions within a country, such as states, provinces, or regions, as they relate to the measurement of air quality. These divisions include a diverse range of areas, from densely populated cities like New York City and Tokyo to larger, broader regions such as California and Île-de-France."
LLM provided description for geo column "admin2": "This field represents the second-level administrative division in which the air quality is being monitored, typically corresponding to cities or metropolitan areas around the globe."
LLM provided description for geo column "admin3": "The values represent third-level administrative divisions or localities within larger metropolitan areas around the world, identifying specific neighborhoods, districts, or boroughs known for their distinct characteristics or administrative importance."
geo=[GeoAnnotation(name='latitude', display_name=None, description="This column contains the geographical coordinates specifying the north-south position of various locations on the Earth's surface. The values are measured in degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='longitude', display_name=None, description='This column contains the longitudinal coordinates of various locations, measured in degrees. Longitude values indicate positions west or east of the Prime Meridian, which runs through Greenwich, England. Negative values represent locations west of the Prime Meridian, while positive values denote locations to the east.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair='latitude', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Latitude_Station', display_name=None, description="This column represents the geographical latitude of various air quality monitoring stations. Latitude is a measure of a location's distance north or south of the Earth's equator, expressed in degrees. Here, the values indicate the precise north-south position of each station, enabling the identification of their specific location on the Earth's surface.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Longitude_Station', display_name=None, description="This column contains the longitudinal coordinates of various air quality monitoring stations around the world. 
Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface, measured in degrees. Values range from -180 to 180 degrees, with negative values indicating locations west of the Prime Meridian, and positive values indicating locations east of the Prime Meridian.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair='Latitude_Station', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This field records the nations from which air quality data have been collected, indicating the geographic origin of each measurement in the dataset. Each entry corresponds to a specific country identified by its commonly used short name, such as "USA" for the United States of America or "UK" for the United Kingdom.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This dataset field represents the primary administrative divisions within a country, such as states, provinces, or regions, as they relate to the measurement of air quality. 
These divisions include a diverse range of areas, from densely populated cities like New York City and Tokyo to larger, broader regions such as California and Île-de-France.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='This field represents the second-level administrative division in which the air quality is being monitored, typically corresponding to cities or metropolitan areas around the globe.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='The values represent third-level administrative divisions or localities within larger metropolitan areas around the world, identifying specific neighborhoods, districts, or boroughs known for their distinct characteristics or administrative importance.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='year', display_name=None, description='This column records the year when the air quality measurements were taken, reflecting annual data points for analysis of air quality trends over time.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='month', display_name=None, description='This column records the month when the air quality measurements were taken, using numerical values from 1 to 12 to represent January through December, respectively.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 
'month'>, primary_date=True, time_format='%m', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='day', display_name=None, description='This column records the day of the month for each observed air quality index measurement, presented in a numeric format ranging from 1 to 31. It is part of a date series when combined with corresponding year and month data, providing a more detailed timestamp of each measurement.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DAY: 'day'>, primary_date=True, time_format='%d', associated_columns={<TimeField.DAY: 'Day'>: 'day', <TimeField.YEAR: 'Year'>: 'year', <TimeField.MONTH: 'Month'>: 'month'}, qualifies=None, aliases={}), DateAnnotation(name='time', display_name=None, description='This field records the hour and minute of the day when the air quality index measurement was taken, using a 24-hour clock format.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%H:%M', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='Forecast_Year', display_name=None, description='This column records the year for which the air quality index forecast is made, formatted in a four-digit year representation.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=None, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='Forecast_Month', display_name=None, description='The "Forecast_Month" column represents the month for which air quality predictions are made, encoded as integers from 1 to 12 corresponding to January through December.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 'month'>, primary_date=None, time_format='%m', associated_columns={<TimeField.MONTH: 'Month'>: 'Forecast_Month', <TimeField.YEAR: 'Year'>: 'Forecast_Year'}, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='AQI', display_name=None, description='The Air Quality Index (AQI) is a 
numerical scale used to communicate the quality of the air in a specific location at a given time. Values typically range from 0 to 500, where lower numbers indicate better air quality and higher numbers signify poorer air quality. This metric considers several air pollutants, including particulate matter, ground-level ozone, sulfur dioxide, nitrogen dioxide, and carbon monoxide, to provide a comprehensive view of air health and pollution levels.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='PM2.5', display_name=None, description='PM2.5 refers to particulate matter that is less than 2.5 micrometers in diameter. These particles are so small that they can only be detected using an electron microscope. They are important to monitor because they are small enough to penetrate deep into the human lung and enter the bloodstream, potentially causing significant health problems. PM2.5 is produced from various sources, including combustion (vehicles, power plants, wood burning, etc.), industrial processes, and natural sources (dust, wildfires). High levels of PM2.5 in the air are associated with various adverse health effects, including respiratory and cardiovascular diseases, and can affect the lungs and heart.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='µg/m^3', units_description='The unit "µg/m^3" refers to micrograms per cubic meter, a measure of concentration indicating how many micrograms of a substance (in this case, PM2.5 particles) are found in one cubic meter of air.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='CO_Level', display_name=None, description='This column indicates the level of carbon monoxide (CO) concentration measured at a specific location and time, represented as an integer. 
Carbon monoxide levels are crucial for assessing air quality as higher concentrations can be harmful to health.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Is_Industrial', display_name=None, description='Indicates whether the air quality measurement was taken at an industrial location.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BOOLEAN: 'boolean'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Traffic_Density', display_name=None, description='Indicates the volume of vehicle flow in an area, categorized as high, medium, or low, which can impact air quality by varying amounts of pollutants released from vehicles.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Forecast_Day', display_name=None, description='This column represents the day of the month on which the air quality index (AQI) forecast is made, ranging from 1 to 31. The values are integers indicating the specific day, without specifying the month or year.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
Meta(path=PosixPath('datasets/idps.csv'), name='IDPs by Country of Origin', description='UNHCR internally displaced people (IDP)s by country and year of origin')
LLM identified column "Year" as a DATE
LLM identified column "Country of origin" as a GEO
LLM identified column "Country of origin (ISO)" as a GEO
LLM identified column "Country of asylum" as a GEO
LLM identified column "Country of asylum (ISO)" as a GEO
LLM identified column "IDPs" as a FEATURE
LLM identified DATE column "Year" as a YEAR
LLM identified GEO column "Country of origin" as a COUNTRY
LLM identified GEO column "Country of origin (ISO)" as a ISO3
LLM identified GEO column "Country of asylum" as a COUNTRY
LLM identified GEO column "Country of asylum (ISO)" as a ISO3
LLM identified FEATURE column "IDPs" as a INT
LLM was unsure about the units for feature column "IDPs"
LLM identified Country of origin as the primary geo column(s)
LLM identified 'Year' as the primary date
LLM identified DATE/YEAR column "Year" strftime format: "%Y"
LLM provided description for feature column "IDPs": "The numerical count of individuals within a country who have been forcibly displaced within their own national borders due to conflict, violence, natural disasters, or human rights violations. These people are known as internally displaced persons (IDPs)."
LLM provided description for date column "Year": "This column records the year associated with each observation of internally displaced persons (IDPs) by their country of origin, ranging at least from 2009 to 2013."
LLM provided description for geo column "Country of origin": "This column lists the countries from which internally displaced persons (IDPs) originate, indicating the national origin of individuals who have been forced to leave their homes but still remain within the borders of their country."
LLM provided description for geo column "Country of origin (ISO)": "This column represents the ISO 3166-1 alpha-3 codes for countries from which internally displaced persons (IDPs) originate, indicating the nationality or country affiliations of individuals who have been forced to leave their homes within their own country's boundaries. The example shown, "AFG," corresponds to Afghanistan."
LLM provided description for geo column "Country of asylum": "This column records the countries providing refuge or temporary protection to individuals who have been internally displaced from their original places of residence."
LLM provided description for geo column "Country of asylum (ISO)": "This column lists the ISO country codes of the nations where internally displaced persons (IDPs) have sought asylum. The "-" indicates missing or not applicable information for those records."
geo=[GeoAnnotation(name='Country of origin', display_name=None, description='This column lists the countries from which internally displaced persons (IDPs) originate, indicating the national origin of individuals who have been forced to leave their homes but still remain within the borders of their country.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Country of origin (ISO)', display_name=None, description='This column represents the ISO 3166-1 alpha-3 codes for countries from which internally displaced persons (IDPs) originate, indicating the nationality or country affiliations of individuals who have been forced to leave their homes within their own country\'s boundaries. The example shown, "AFG," corresponds to Afghanistan.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.ISO3: 'iso3'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Country of asylum', display_name=None, description='This column records the countries providing refuge or temporary protection to individuals who have been internally displaced from their original places of residence.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Country of asylum (ISO)', display_name=None, description='This column lists the ISO country codes of the nations where internally displaced persons (IDPs) have sought asylum. 
The "-" indicates missing or not applicable information for those records.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.ISO3: 'iso3'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='Year', display_name=None, description='This column records the year associated with each observation of internally displaced persons (IDPs) by their country of origin, ranging at least from 2009 to 2013.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='IDPs', display_name=None, description='The numerical count of individuals within a country who have been forcibly displaced within their own national borders due to conflict, violence, natural disasters, or human rights violations. These people are known as internally displaced persons (IDPs).', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={})]
Meta(path=PosixPath('datasets/us_bases_v2.csv'), name='U.S. Military Bases Abroad, 2015 and 2021', description='This dataset provides a count of U.S. military bases abroad by country for 2015 and 2021.')
LLM identified column "Year" as a DATE
LLM identified column "Country Name" as a GEO
LLM identified column "Total U.S. Military Bases" as a FEATURE
LLM identified DATE column "Year" as a YEAR
LLM identified GEO column "Country Name" as a COUNTRY
LLM identified FEATURE column "Total U.S. Military Bases" as a INT
LLM identified no units for feature column "Total U.S. Military Bases"
LLM identified 'Country Name' as the primary geo
LLM identified 'Year' as the primary date
LLM identified DATE/YEAR column "Year" strftime format: "%Y"
LLM provided description for feature column "Total U.S. Military Bases": "Represents the count of United States military bases located in foreign countries at the specified time points. These installations vary in size and purpose, including airfields, training facilities, naval bases, and support hubs."
LLM provided description for date column "Year": "The column records the year when data about U.S. Military Bases Abroad was collected, specifically indicating two points in time, 2015 and 2021."
LLM provided description for geo column "Country Name": "This dataset lists the locations of U.S. military bases outside of the United States, organized by the host nation where each base is situated. The countries mentioned include Germany, Japan, and South Korea, among others, reflecting the global distribution of U.S. military installations."
geo=[GeoAnnotation(name='Country Name', display_name=None, description='This dataset lists the locations of U.S. military bases outside of the United States, organized by the host nation where each base is situated. The countries mentioned include Germany, Japan, and South Korea, among others, reflecting the global distribution of U.S. military installations.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='Year', display_name=None, description='The column records the year when data about U.S. Military Bases Abroad was collected, specifically indicating two points in time, 2015 and 2021.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='Total U.S. Military Bases', display_name=None, description='Represents the count of United States military bases located in foreign countries at the specified time points. These installations vary in size and purpose, including airfields, training facilities, naval bases, and support hubs.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
Meta(path=PosixPath('datasets/2019-10-29-2022-10-31_worldwide.csv'), name='ACLED 2019-2022', description='The Armed Conflict Location & Event Data Project (ACLED) collects real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events around the world.')
LLM identified column "data_id" as a FEATURE
LLM identified column "iso" as a GEO
LLM identified column "event_id_cnty" as a FEATURE
LLM identified column "event_id_no_cnty" as a DATE
LLM identified column "event_date" as a DATE
LLM identified column "year" as a DATE
LLM identified column "time_precision" as a FEATURE
LLM identified column "event_type" as a FEATURE
LLM identified column "sub_event_type" as a FEATURE
LLM identified column "actor1" as a FEATURE
LLM identified column "assoc_actor_1" as a FEATURE
LLM identified column "inter1" as a FEATURE
LLM identified column "actor2" as a FEATURE
LLM identified column "assoc_actor_2" as a FEATURE
LLM identified column "inter2" as a FEATURE
LLM identified column "interaction" as a FEATURE
LLM identified column "region" as a GEO
LLM identified column "country" as a GEO
LLM identified column "admin1" as a GEO
LLM identified column "admin2" as a GEO
LLM identified column "admin3" as a GEO
LLM identified column "location" as a GEO
LLM identified column "latitude" as a GEO
LLM identified column "longitude" as a GEO
LLM identified column "geo_precision" as a GEO
LLM identified column "source" as a FEATURE
LLM identified column "source_scale" as a FEATURE
LLM identified column "notes" as a FEATURE
LLM identified column "fatalities" as a FEATURE
LLM identified column "timestamp" as a DATE
LLM identified column "iso3" as a GEO
LLM identified DATE column "event_id_no_cnty" as a DATE
LLM identified DATE column "event_date" as a DATE
LLM identified DATE column "year" as a YEAR
LLM identified DATE column "timestamp" as a EPOCH
LLM identified GEO column "iso" as a ISO3
LLM identified GEO column "region" as a COUNTRY
LLM identified GEO column "country" as a COUNTRY
LLM identified GEO column "admin1" as a STATE
LLM identified GEO column "admin2" as a COUNTY
LLM identified GEO column "admin3" as a CITY
LLM identified GEO column "location" as a CITY
LLM identified GEO column "latitude" as a LATITUDE
LLM identified GEO column "longitude" as a LONGITUDE
The LLM was unsure about the date type for "geo_precision" with the following values (first 5 rows):
0 1
1 1
2 1
3 1
4 1
prompt='The column has been identified as containing geographic information.\nI need to identify the type of geographic information it contains. '
Select one of the following options: LATITUDE, LONGITUDE, COORDINATES, COUNTRY, ISO2, ISO3, STATE, COUNTY, CITY or None: invalid option: `DATE` out of options=['LATITUDE', 'LONGITUDE', 'COORDINATES', 'COUNTRY', 'ISO2', 'ISO3', 'STATE', 'COUNTY', 'CITY']
The LLM was unsure about the date type for "geo_precision" with the following values (first 5 rows):
0 1
1 1
2 1
3 1
4 1
prompt='The column has been identified as containing geographic information.\nI need to identify the type of geographic information it contains. '
Select one of the following options: LATITUDE, LONGITUDE, COORDINATES, COUNTRY, ISO2, ISO3, STATE, COUNTY, CITY or None: LLM identified GEO column "geo_precision" as a None
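The exchange above shows an answer (`DATE`) being rejected because it is not in the allowed list, followed by a second attempt that returns None. A rough sketch of that validate-and-retry pattern follows; `ask_with_options` and the `ask` callable are hypothetical stand-ins, and the tool's real implementation may differ.

```python
GEO_OPTIONS = ["LATITUDE", "LONGITUDE", "COORDINATES", "COUNTRY",
               "ISO2", "ISO3", "STATE", "COUNTY", "CITY"]

def ask_with_options(ask, options, allow_none=True, max_attempts=3):
    """Call ask() until it returns an allowed option, or None when permitted."""
    for _ in range(max_attempts):
        answer = ask().strip()
        if allow_none and answer.lower() == "none":
            return None
        if answer.upper() in options:
            return answer.upper()
        # Mirror the rejection message format seen in the log.
        print(f"invalid option: `{answer}` out of options={list(options)}")
    raise ValueError(f"no valid option after {max_attempts} attempts")

# Simulated replies standing in for the model: one invalid answer, then None.
replies = iter(["DATE", "None"])
result = ask_with_options(lambda: next(replies), GEO_OPTIONS)
print(result)
```

Raising after the final attempt keeps "the model chose None" distinct from "the model never gave a usable answer".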
LLM identified GEO column "iso3" as a ISO3
LLM identified FEATURE column "data_id" as a INT
LLM identified FEATURE column "event_id_cnty" as a STR
LLM identified FEATURE column "time_precision" as a INT
LLM identified FEATURE column "event_type" as a STR
LLM identified FEATURE column "sub_event_type" as a STR
LLM identified FEATURE column "actor1" as a STR
LLM identified FEATURE column "assoc_actor_1" as a STR
LLM identified FEATURE column "inter1" as a INT
LLM identified FEATURE column "actor2" as a STR
LLM identified FEATURE column "assoc_actor_2" as a STR
LLM identified FEATURE column "inter2" as a INT
LLM identified FEATURE column "interaction" as a INT
LLM identified FEATURE column "source" as a STR
LLM identified FEATURE column "source_scale" as a STR
LLM identified FEATURE column "notes" as a STR
LLM identified FEATURE column "fatalities" as a INT
LLM identified no units for feature column "data_id"
LLM identified no units for feature column "event_id_cnty"
LLM identified no units for feature column "time_precision"
LLM identified no units for feature column "event_type"
LLM identified no units for feature column "sub_event_type"
LLM identified no units for feature column "actor1"
LLM identified no units for feature column "assoc_actor_1"
LLM identified no units for feature column "inter1"
LLM identified no units for feature column "actor2"
LLM identified no units for feature column "assoc_actor_2"
LLM identified no units for feature column "inter2"
LLM identified no units for feature column "interaction"
LLM identified no units for feature column "source"
LLM identified no units for feature column "source_scale"
LLM identified no units for feature column "notes"
LLM identified no units for feature column "fatalities"
LLM identified coordinate pair: ('longitude', 'latitude')
LLM identified ('longitude', 'latitude') as the primary geo column(s)
LLM identified event_date as the primary date column(s)
LLM identified DATE/DATE column "event_id_no_cnty" strftime format: "%d/%m/%Y"
LLM identified DATE/DATE column "event_date" strftime format: "%d %B %Y"
LLM identified DATE/YEAR column "year" strftime format: "%Y"
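A proposed strftime format can be checked against sample values before it is trusted, which would help flag a column such as "event_id_no_cnty" if its values are plain identifiers rather than dates. This is a sketch only: `parse_rate` is a hypothetical helper, the sample values are invented for illustration, and `%B` assumes English month names in the active locale.

```python
from datetime import datetime

def parse_rate(values, fmt):
    """Return the fraction of values that datetime.strptime accepts for fmt."""
    if not values:
        return 0.0
    ok = 0
    for value in values:
        try:
            datetime.strptime(str(value), fmt)
            ok += 1
        except ValueError:
            # The value does not match the proposed format.
            pass
    return ok / len(values)

# Hypothetical sample values, not taken from the dataset.
event_dates = ["29 October 2019", "31 October 2022"]
event_ids = ["1024", "1025"]
print(parse_rate(event_dates, "%d %B %Y"))
print(parse_rate(event_ids, "%d/%m/%Y"))
```

A low rate suggests that either the format or the DATE classification itself deserves a second look.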
LLM provided description for feature column "data_id": "A unique identifier assigned to each event or observation recorded within the dataset, ensuring precise reference and data manipulation without ambiguity."
LLM provided description for feature column "event_id_cnty": "Unique identifier assigned to events, consisting of a country code followed by a sequence number."
LLM provided description for feature column "time_precision": "This attribute indicates the level of precision associated with the event date reported. A value of '1' typically suggests that the event date is precise to the exact day. Higher values might indicate less precise date reporting, such as an approximate date range or month."
LLM provided description for feature column "event_type": "Categorizes the nature of reported incidents within a dataset, including but not limited to protests, strategic developments, battles, and riots. Each entry designates the type of event recorded, reflecting different forms of social, political, or military activities documented during the specified timeframe."
LLM provided description for feature column "sub_event_type": "This column categorizes the nature of events documented in the dataset, detailing the specific type of action or activity that occurred. These categories range from non-violent actions, such as peaceful protests, to incidents of violence or conflict, including looting, property destruction, and territorial takeovers by non-state actors."
LLM provided description for feature column "actor1": "This field captures the primary actor involved in the reported events, detailing the specific group or affiliation, whether they are protestors, rioters, or organized groups such as the Group for Support of Islam and Muslims (JNIM). The entries are specific to incidents within Burkina Faso, indicating the involved party's role or stance in each event."
LLM provided description for feature column "assoc_actor_1": "This column records any actors associated with the primary actor involved in an event, providing details on additional groups or entities that played a secondary role or were indirectly involved. These associated actors can include various groups, organizations, or entities that are not the main focus of the event but have a significant connection to the activities reported."
LLM provided description for feature column "inter1": "This column categorizes the primary actor involved in the incident by encoding their type according to a predefined scale. For instance, values might correspond to different groups such as governments, rebels, civilians, etc., to denote the main entity that initiated the event being logged."
LLM provided description for feature column "actor2": "This feature represents the secondary actors involved in conflict events, detailing the opposing group, individuals, or security forces that are participants in the incident along with the primary actor. These actors can range from civilians and rioters to various national or local governmental forces, specifying their roles and affiliations within the context of the event."
LLM provided description for feature column "assoc_actor_2": "This column records the secondary actors associated with the event, providing additional context on other groups or entities involved alongside the primary actors. These actors can range from government forces, opposition groups, to local or international organizations, capturing a broader spectrum of involvement in the reported incidents."
LLM provided description for feature column "inter2": "This column categorizes the type of actor involved in the event, with integer codes representing different actor types such as governments, rebels, civilians, and external forces. Each number corresponds to a specific category of actor engaged in the event, with '0' often indicating that the actor does not fit into the predefined categories or that the actor is unknown."
LLM provided description for feature column "interaction": "The "interaction" column encodes the type of engagements between actors involved in conflict-related events. Each value represents a specific kind of interaction, with numerical codes categorizing the parties, such as government forces, rebels, civilians, and external forces, and the nature of their engagement, ranging from violence to non-violent actions. These codes help in analyzing the dynamics and relationships between different actors within conflict settings."
LLM provided description for feature column "source": "This feature lists the original media outlets, social media platforms, or other sources that reported the events or incidents recorded in the dataset. Each entry may include multiple sources, indicating that the information was reported by several outlets or channels. The sources can range from local radio stations and national news agencies to social media platforms like Facebook and Whatsapp, as well as other undisclosed sources."
LLM provided description for feature column "source_scale": "The "source_scale" indicates the origin and type of the entity providing the information. It differentiates between sources at various levels, such as national or local, as well as the nature of the source, like new media or local partners. This classification helps in understanding the proximity and perspective of the reporting entity relative to reported events."
LLM provided description for feature column "notes": "Detailed textual accounts of specific events, including date, actors involved, and main actions."
LLM provided description for feature column "fatalities": "Records the number of confirmed deaths resulting from the reported event."
LLM provided description for date column "event_id_no_cnty": "This column records the dates of specific events, formatted as day/month/year, capturing when each event occurred."
LLM provided description for date column "event_date": "Represents the specific day on which an event occurred, formatted as day, full month name, and year."
LLM provided description for date column "year": "This column records the year when an event occurred, formatted as a four-digit number (YYYY), spanning from 2019 to 2022."
LLM provided description for date column "timestamp": "This column records the date and time of each event in Unix epoch format, which counts the seconds since 00:00:00 UTC on 1 January 1970."
LLM provided description for geo column "iso": "This column represents the ISO 3166-1 numeric country codes, which are three-digit numbers assigned to countries for geographical identification. In this dataset, the code "854" corresponds to Burkina Faso."
LLM provided description for geo column "region": "This dataset categorizes incidents based on their geographical location within the continent of Africa, specifically focusing on the western part. This sector, known as Western Africa, encompasses a diverse range of countries along the Atlantic coast and further inland, characterized by its distinct cultures, languages, and historical backgrounds."
LLM provided description for geo column "country": "This field records the names of the countries where the reported incidents took place, indicating the geographical location within which each event occurred."
LLM provided description for geo column "admin1": "This column records the primary administrative divisions within a country, such as states, provinces, or regions, indicating the specific area where recorded incidents took place."
LLM provided description for geo column "admin2": "This column lists the names of secondary-level administrative divisions, such as districts or counties, within a country, indicating the specific region where each recorded event took place."
LLM provided description for geo column "admin3": "This column contains the names of tertiary administrative divisions, such as districts or municipalities, within a country where specific events or data points were recorded. These divisions represent a more localized level of governance or geographical area, coming after the first level (country) and second level (state or province) divisions."
LLM provided description for geo column "location": "This dataset's geographical column lists various locations within Burkina Faso, indicating the specific cities or areas where the recorded events took place. It includes major cities like Ouagadougou, the capital city, as well as smaller towns and regions such as Kamboinsin, Bobo-Dioulasso, Bondokuy, and Taparko."
LLM provided description for geo column "latitude": "This column contains the geographical latitude coordinates, which represent the north-south position of a point on the Earth's surface. The values are in decimal degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator."
LLM provided description for geo column "longitude": "This column contains the longitudinal coordinates for various events, represented in decimal degrees. Each value specifies the east-west position of a point on the Earth's surface, with negative values indicating locations west of the Prime Meridian."
LLM provided description for geo column "iso3": "This column represents the three-letter country codes as defined by the ISO 3166-1 alpha-3 standard, identifying the country where each event in the dataset occurred. In this context, 'BFA' stands for Burkina Faso."
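The descriptions above distinguish a numeric ISO 3166-1 code ("854") in the iso column from an alpha-3 code ('BFA') in the iso3 column, while the annotation dump that follows types both columns as GeoType.ISO3. A small shape check can tell the two encodings apart. This is a sketch under stated assumptions: `classify_iso_code` is a hypothetical helper that only inspects the shape of a value and does not validate it against the actual ISO 3166-1 code list.

```python
import re


def classify_iso_code(value):
    """Guess the ISO 3166-1 encoding of a value from its shape alone."""
    text = str(value).strip()
    if re.fullmatch(r"\d{1,3}", text):
        # Numeric codes are three digits; leading zeros may be lost in CSVs.
        return "numeric"
    if re.fullmatch(r"[A-Za-z]{3}", text):
        return "alpha-3"
    if re.fullmatch(r"[A-Za-z]{2}", text):
        return "alpha-2"
    return "unknown"


# The two example values mentioned in the logged descriptions.
print(classify_iso_code(854))
print(classify_iso_code("BFA"))
```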
geo=[GeoAnnotation(name='iso', display_name=None, description='This column represents the ISO 3166-1 numeric country codes, which are three-digit numbers assigned to countries for geographical identification. In this dataset, the code "854" corresponds to Burkina Faso.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.ISO3: 'iso3'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='region', display_name=None, description='This dataset categorizes incidents based on their geographical location within the continent of Africa, specifically focusing on the western part. This sector, known as Western Africa, encompasses a diverse range of countries along the Atlantic coast and further inland, characterized by its distinct cultures, languages, and historical backgrounds.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='country', display_name=None, description='This field records the names of the countries where the reported incidents took place, indicating the geographical location within which each event occurred.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin1', display_name=None, description='This column records the primary administrative divisions within a country, such as states, provinces, or regions, indicating the specific area where recorded incidents took place.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.STATE: 'state/territory'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin2', display_name=None, description='This column lists 
the names of secondary-level administrative divisions, such as districts or counties, within a country, indicating the specific region where each recorded event took place.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTY: 'county/district'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='admin3', display_name=None, description='This column contains the names of tertiary administrative divisions, such as districts or municipalities, within a country where specific events or data points were recorded. These divisions represent a more localized level of governance or geographical area, coming after the first level (country) and second level (state or province) divisions.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='location', display_name=None, description="This dataset's geographical column lists various locations within Burkina Faso, indicating the specific cities or areas where the recorded events took place. It includes major cities like Ouagadougou, the capital city, as well as smaller towns and regions such as Kamboinsin, Bobo-Dioulasso, Bondokuy, and Taparko.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='latitude', display_name=None, description="This column contains the geographical latitude coordinates, which represent the north-south position of a point on the Earth's surface. 
The values are in decimal degrees, where positive values indicate locations north of the Equator and negative values indicate locations south of the Equator.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LATITUDE: 'latitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='longitude', display_name=None, description="This column contains the longitudinal coordinates for various events, represented in decimal degrees. Each value specifies the east-west position of a point on the Earth's surface, with negative values indicating locations west of the Prime Meridian.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair='latitude', coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='iso3', display_name=None, description="This column represents the three-letter country codes as defined by the ISO 3166-1 alpha-3 standard, identifying the country where each event in the dataset occurred. 
In this context, 'BFA' stands for Burkina Faso.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.ISO3: 'iso3'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='event_id_no_cnty', display_name=None, description='This column records the dates of specific events, formatted as day/month/year, capturing when each event occurred.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=None, time_format='%d/%m/%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='event_date', display_name=None, description='Represents the specific day on which an event occurred, formatted as day, full month name, and year.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DATE: 'date'>, primary_date=True, time_format='%d %B %Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='year', display_name=None, description='This column records the year when an event occurred, formatted as a four-digit number (YYYY), spanning from 2019 to 2022.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=None, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='timestamp', display_name=None, description='This column records the date and time of each event in Unix epoch format, which counts the seconds since 00:00:00 UTC on 1 January 1970.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.EPOCH: 'epoch'>, primary_date=None, time_format='todo', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='data_id', display_name=None, description='A unique identifier assigned to each event or observation recorded within the dataset, ensuring precise reference and data manipulation without ambiguity.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, 
qualifierrole=None, aliases={}), FeatureAnnotation(name='event_id_cnty', display_name=None, description='Unique identifier assigned to events, consisting of a country code followed by a sequence number.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='time_precision', display_name=None, description="This attribute indicates the level of precision associated with the event date reported. A value of '1' typically suggests that the event date is precise to the exact day. Higher values might indicate less precise date reporting, such as an approximate date range or month.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='event_type', display_name=None, description='Categorizes the nature of reported incidents within a dataset, including but not limited to protests, strategic developments, battles, and riots. Each entry designates the type of event recorded, reflecting different forms of social, political, or military activities documented during the specified timeframe.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='sub_event_type', display_name=None, description='This column categorizes the nature of events documented in the dataset, detailing the specific type of action or activity that occurred. 
These categories range from non-violent actions, such as peaceful protests, to incidents of violence or conflict, including looting, property destruction, and territorial takeovers by non-state actors.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='actor1', display_name=None, description="This field captures the primary actor involved in the reported events, detailing the specific group or affiliation, whether they are protestors, rioters, or organized groups such as the Group for Support of Islam and Muslims (JNIM). The entries are specific to incidents within Burkina Faso, indicating the involved party's role or stance in each event.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='assoc_actor_1', display_name=None, description='This column records any actors associated with the primary actor involved in an event, providing details on additional groups or entities that played a secondary role or were indirectly involved. These associated actors can include various groups, organizations, or entities that are not the main focus of the event but have a significant connection to the activities reported.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='inter1', display_name=None, description='This column categorizes the primary actor involved in the incident by encoding their type according to a predefined scale. 
For instance, values might correspond to different groups such as governments, rebels, civilians, etc., to denote the main entity that initiated the event being logged.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='actor2', display_name=None, description='This feature represents the secondary actors involved in conflict events, detailing the opposing group, individuals, or security forces that are participants in the incident along with the primary actor. These actors can range from civilians and rioters to various national or local governmental forces, specifying their roles and affiliations within the context of the event.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='assoc_actor_2', display_name=None, description='This column records the secondary actors associated with the event, providing additional context on other groups or entities involved alongside the primary actors. These actors can range from government forces, opposition groups, to local or international organizations, capturing a broader spectrum of involvement in the reported incidents.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='inter2', display_name=None, description="This column categorizes the type of actor involved in the event, with integer codes representing different actor types such as governments, rebels, civilians, and external forces. 
Each number corresponds to a specific category of actor engaged in the event, with '0' often indicating that the actor does not fit into the predefined categories or that the actor is unknown.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='interaction', display_name=None, description='The "interaction" column encodes the type of engagements between actors involved in conflict-related events. Each value represents a specific kind of interaction, with numerical codes categorizing the parties, such as government forces, rebels, civilians, and external forces, and the nature of their engagement, ranging from violence to non-violent actions. These codes help in analyzing the dynamics and relationships between different actors within conflict settings.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='source', display_name=None, description='This feature lists the original media outlets, social media platforms, or other sources that reported the events or incidents recorded in the dataset. Each entry may include multiple sources, indicating that the information was reported by several outlets or channels. The sources can range from local radio stations and national news agencies to social media platforms like Facebook and Whatsapp, as well as other undisclosed sources.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='source_scale', display_name=None, description='The "source_scale" indicates the origin and type of the entity providing the information. 
It differentiates between sources at various levels, such as national or local, as well as the nature of the source, like new media or local partners. This classification helps in understanding the proximity and perspective of the reporting entity relative to reported events.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='notes', display_name=None, description='Detailed textual accounts of specific events, including date, actors involved, and main actions.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='fatalities', display_name=None, description='Records the number of confirmed deaths resulting from the reported event.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
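The dump above leaves time_format='todo' for the epoch-typed "timestamp" column. Since its description says the values count seconds since 00:00:00 UTC on 1 January 1970, a strftime format is not needed to read them. A minimal sketch, assuming the values are whole seconds (not milliseconds); `epoch_to_utc` is a hypothetical helper, not something the tool provides:

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


# Illustrative inputs only; these are not values from the dataset.
print(epoch_to_utc(0).isoformat())
print(epoch_to_utc(86400).isoformat())
```

If the column turned out to hold milliseconds, the values would need to be divided by 1000 first.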
Meta(path=PosixPath('datasets/nd-gain-combined.csv'), name='ND-GAIN Vulnerability and Readiness Indicators', description="The ND-GAIN Country Index measures climate vulnerability and adaptation readiness based upon compiled indicators. Building on the University's academic expertise, ND-GAIN surveyed recent data and literature and consulted scholars, adaptation practitioners, and global development experts to select 45 indicators. Thirty-six indicators contribute to the vulnerability score and nine indicators contribute to readiness.")
LLM identified column "ISO3" as a GEO
LLM identified column "Name" as a GEO
LLM identified column "Year" as a DATE
LLM identified column "infrastructure" as a FEATURE
LLM identified column "ecosystems" as a FEATURE
LLM identified column "exposure" as a FEATURE
LLM identified column "food" as a FEATURE
LLM identified column "habitat" as a FEATURE
LLM identified column "vulnerability_delta" as a FEATURE
LLM identified column "water" as a FEATURE
LLM identified column "vulnerability" as a FEATURE
LLM identified column "capacity" as a FEATURE
LLM identified column "sensitivity" as a FEATURE
LLM identified column "health" as a FEATURE
LLM identified column "social" as a FEATURE
LLM identified column "readiness_delta" as a FEATURE
LLM identified column "economic" as a FEATURE
LLM identified column "governance" as a FEATURE
LLM identified column "readiness" as a FEATURE
LLM identified DATE column "Year" as a YEAR
LLM identified GEO column "ISO3" as a ISO3
LLM identified GEO column "Name" as a COUNTRY
LLM identified FEATURE column "infrastructure" as a FLOAT
LLM identified FEATURE column "ecosystems" as a FLOAT
LLM identified FEATURE column "exposure" as a FLOAT
LLM identified FEATURE column "food" as a FLOAT
LLM identified FEATURE column "habitat" as a FLOAT
LLM identified FEATURE column "vulnerability_delta" as a FLOAT
LLM identified FEATURE column "water" as a FLOAT
LLM identified FEATURE column "vulnerability" as a FLOAT
LLM identified FEATURE column "capacity" as a FLOAT
LLM identified FEATURE column "sensitivity" as a FLOAT
LLM identified FEATURE column "health" as a FLOAT
LLM identified FEATURE column "social" as a FLOAT
LLM identified FEATURE column "readiness_delta" as a FLOAT
LLM identified FEATURE column "economic" as a FLOAT
LLM identified FEATURE column "governance" as a FLOAT
LLM identified FEATURE column "readiness" as a FLOAT
LLM identified no units for feature column "infrastructure"
LLM identified no units for feature column "ecosystems"
LLM identified no units for feature column "exposure"
LLM identified no units for feature column "food"
LLM was unsure about the units for feature column "habitat"
LLM identified no units for feature column "vulnerability_delta"
LLM was unsure about the units for feature column "water"
LLM identified no units for feature column "vulnerability"
LLM identified no units for feature column "capacity"
LLM identified no units for feature column "sensitivity"
LLM identified no units for feature column "health"
LLM identified no units for feature column "social"
LLM identified no units for feature column "readiness_delta"
LLM identified no units for feature column "economic"
LLM identified no units for feature column "governance"
LLM identified no units for feature column "readiness"
LLM identified ISO3 as the primary geo column(s)
LLM identified 'Year' as the primary date
LLM identified DATE/YEAR column "Year" strftime format: "%Y"
LLM provided description for feature column "infrastructure": "This feature quantitatively represents the condition and capacity of a country's infrastructure, including aspects such as transportation networks, communications systems, water and energy supply, and public facilities. It is measured on a scale reflecting the adequacy and reliability of infrastructure to support current and future societal and economic activities."
LLM provided description for feature column "ecosystems": "This column quantifies the resilience and susceptibility of ecosystems within various regions, using a numerical scale where higher values indicate greater vulnerability. The data represent an assessment of ecological health and its capacity to withstand environmental stressors."
LLM provided description for feature column "exposure": "This value represents the degree to which a system is exposed to significant environmental changes due to climate variations or other external factors. It quantifies the susceptibility of a region's population, economy, and infrastructure to potential hazards, measured on a scale where higher values signify greater exposure."
LLM provided description for feature column "food": "This column quantifies the vulnerability related to food security within a region, represented as a floating-point number. Lower values indicate fewer concerns about food availability, access, and stability, while higher values suggest greater vulnerability to food insecurity. The measurement is not associated with specific units, thus allowing for broad comparative analysis across different locations or times."
LLM provided description for feature column "habitat": "This feature represents a numerical measure of the condition and resilience of natural habitats within a given area, assessing factors such as biodiversity, ecosystem health, and the presence of protected zones. Higher values indicate a stronger, healthier habitat capable of supporting diverse forms of life and withstanding environmental stressors."
LLM provided description for feature column "vulnerability_delta": "This feature represents the change in vulnerability score, quantifying how a country's vulnerability to climate change impacts has shifted over a specified period. It is measured as a floating-point number, where positive values indicate an increase in vulnerability, while negative values suggest a decrease."
LLM provided description for feature column "water": "This feature quantifies water availability and management within a region, reflecting the adequacy, reliability, and sustainability of water resources as well as the capacity to manage water supply and demand. It encompasses aspects such as freshwater availability, access to safe drinking water, and the effectiveness of water resource management policies and practices."
LLM provided description for feature column "vulnerability": "This metric quantifies the susceptibility of a country or region to the harmful impacts of climate change. It considers a range of factors, including but not limited to geographical, social, economic, and environmental aspects that could influence an area's ability to cope with climate stressors. High values indicate greater vulnerability."
LLM provided description for feature column "capacity": "A measure assessing a country or region's ability to leverage its strengths and resources to respond to climate challenges, encompassing aspects such as governance, social, and economic conditions."
LLM provided description for feature column "sensitivity": "This value represents the degree to which a country or region is susceptible to harm from environmental or climate change. It quantifies vulnerability by assessing aspects such as exposure to hazards, sensitivity to impacts, and adaptive capacity, expressed as a float indicating the level of sensitivity where higher values denote greater vulnerability."
LLM provided description for feature column "health": "Represents a numerical assessment of the health sector's vulnerability and readiness in adapting to climate change and other global challenges, encapsulating factors like disease prevalence, health service availability, and the ability to implement effective interventions."
LLM provided description for feature column "social": "This feature quantifies the social dimension of a region's vulnerability relating to climate change and other global challenges. It evaluates aspects like health, education, and social networks that influence a society's capacity to cope with and adapt to these challenges. The values, expressed as floating-point numbers, reflect the relative standing of different regions, with lower numbers indicating lower vulnerability."
LLM provided description for feature column "readiness_delta": "The "readiness_delta" represents the change in a country's readiness to adapt to climate change over a specified time period. This numerical value indicates whether a country's ability to implement adaptation strategies and absorb financial investments for climate change mitigation has increased or decreased, with negative values suggesting a decline in readiness and positive values indicating an improvement."
LLM provided description for feature column "economic": "This column quantifies the economic vulnerability and readiness aspect of a region's ability to adapt and respond to climate change impacts. It reflects a composite score based on various economic indicators, such as income levels, economic diversity, and investment in climate adaptation measures. The values range on a scale, with higher scores indicating better economic preparedness and resilience to environmental challenges."
LLM provided description for feature column "governance": "This feature represents a quantitative assessment of a country's governance quality, measuring aspects such as political stability, government effectiveness, regulatory quality, rule of law, and corruption control. The values are normalized scores, where higher values indicate better governance."
LLM provided description for feature column "readiness": "This column quantifies the readiness of a country or region to adapt to the challenges of climate change. It is represented as a numerical score, scaled between 0 and 1, where higher values denote greater capacity and preparedness to implement effective adaptation strategies to mitigate climate change impacts. The score encompasses various dimensions including governance, social, economic, and environmental factors to assess overall readiness."
LLM provided description for date column "Year": "This field represents the calendar year pertaining to the data collected or assessed, used to track annual changes in vulnerability and readiness indicators as part of the ND-GAIN project."
LLM provided description for geo column "ISO3": "The "ISO3" values represent three-letter country codes defined by the International Organization for Standardization (ISO) under ISO 3166-1 alpha-3. These codes are internationally recognized and used to denote specific countries in a standardized format, facilitating data clarity and global understanding. In the provided dataset, "AFG" corresponds to Afghanistan."
LLM provided description for geo column "Name": "This column lists the names of countries, specifically focusing on instances related to Afghanistan across multiple rows, indicating a repetitive analysis or multiple data entries for Afghanistan within the dataset."
geo=[GeoAnnotation(name='ISO3', display_name=None, description='The "ISO3" values represent three-letter country codes defined by the International Organization for Standardization (ISO) under ISO 3166-1 alpha-3. These codes are internationally recognized and used to denote specific countries in a standardized format, facilitating data clarity and global understanding. In the provided dataset, "AFG" corresponds to Afghanistan.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.ISO3: 'iso3'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Name', display_name=None, description='This column lists the names of countries, specifically focusing on instances related to Afghanistan across multiple rows, indicating a repetitive analysis or multiple data entries for Afghanistan within the dataset.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='Year', display_name=None, description='This field represents the calendar year pertaining to the data collected or assessed, used to track annual changes in vulnerability and readiness indicators as part of the ND-GAIN project.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='infrastructure', display_name=None, description="This feature quantitatively represents the condition and capacity of a country's infrastructure, including aspects such as transportation networks, communications systems, water and energy supply, and public facilities. 
It is measured on a scale reflecting the adequacy and reliability of infrastructure to support current and future societal and economic activities.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='ecosystems', display_name=None, description='This column quantifies the resilience and susceptibility of ecosystems within various regions, using a numerical scale where higher values indicate greater vulnerability. The data represent an assessment of ecological health and its capacity to withstand environmental stressors.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='exposure', display_name=None, description="This value represents the degree to which a system is exposed to significant environmental changes due to climate variations or other external factors. It quantifies the susceptibility of a region's population, economy, and infrastructure to potential hazards, measured on a scale where higher values signify greater exposure.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='food', display_name=None, description='This column quantifies the vulnerability related to food security within a region, represented as a floating-point number. Lower values indicate fewer concerns about food availability, access, and stability, while higher values suggest greater vulnerability to food insecurity. 
The measurement is not associated with specific units, thus allowing for broad comparative analysis across different locations or times.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='habitat', display_name=None, description='This feature represents a numerical measure of the condition and resilience of natural habitats within a given area, assessing factors such as biodiversity, ecosystem health, and the presence of protected zones. Higher values indicate a stronger, healthier habitat capable of supporting diverse forms of life and withstanding environmental stressors.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='vulnerability_delta', display_name=None, description="This feature represents the change in vulnerability score, quantifying how a country's vulnerability to climate change impacts has shifted over a specified period. It is measured as a floating-point number, where positive values indicate an increase in vulnerability, while negative values suggest a decrease.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='water', display_name=None, description='This feature quantifies water availability and management within a region, reflecting the adequacy, reliability, and sustainability of water resources as well as the capacity to manage water supply and demand. 
It encompasses aspects such as freshwater availability, access to safe drinking water, and the effectiveness of water resource management policies and practices.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='vulnerability', display_name=None, description="This metric quantifies the susceptibility of a country or region to the harmful impacts of climate change. It considers a range of factors, including but not limited to geographical, social, economic, and environmental aspects that could influence an area's ability to cope with climate stressors. High values indicate greater vulnerability.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='capacity', display_name=None, description="A measure assessing a country or region's ability to leverage its strengths and resources to respond to climate challenges, encompassing aspects such as governance, social, and economic conditions.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='sensitivity', display_name=None, description='This value represents the degree to which a country or region is susceptible to harm from environmental or climate change. 
It quantifies vulnerability by assessing aspects such as exposure to hazards, sensitivity to impacts, and adaptive capacity, expressed as a float indicating the level of sensitivity where higher values denote greater vulnerability.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='health', display_name=None, description="Represents a numerical assessment of the health sector's vulnerability and readiness in adapting to climate change and other global challenges, encapsulating factors like disease prevalence, health service availability, and the ability to implement effective interventions.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='social', display_name=None, description="This feature quantifies the social dimension of a region's vulnerability relating to climate change and other global challenges. It evaluates aspects like health, education, and social networks that influence a society's capacity to cope with and adapt to these challenges. The values, expressed as floating-point numbers, reflect the relative standing of different regions, with lower numbers indicating lower vulnerability.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='readiness_delta', display_name=None, description='The "readiness_delta" represents the change in a country\'s readiness to adapt to climate change over a specified time period. 
This numerical value indicates whether a country\'s ability to implement adaptation strategies and absorb financial investments for climate change mitigation has increased or decreased, with negative values suggesting a decline in readiness and positive values indicating an improvement.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='economic', display_name=None, description="This column quantifies the economic vulnerability and readiness aspect of a region's ability to adapt and respond to climate change impacts. It reflects a composite score based on various economic indicators, such as income levels, economic diversity, and investment in climate adaptation measures. The values range on a scale, with higher scores indicating better economic preparedness and resilience to environmental challenges.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='governance', display_name=None, description="This feature represents a quantitative assessment of a country's governance quality, measuring aspects such as political stability, government effectiveness, regulatory quality, rule of law, and corruption control. The values are normalized scores, where higher values indicate better governance.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='readiness', display_name=None, description='This column quantifies the readiness of a country or region to adapt to the challenges of climate change. 
It is represented as a numerical score, scaled between 0 and 1, where higher values denote greater capacity and preparedness to implement effective adaptation strategies to mitigate climate change impacts. The score encompasses various dimensions including governance, social, economic, and environmental factors to assess overall readiness.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
Meta(path=PosixPath('datasets/us-trade.csv'), name='U.S. Trade in Goods by Country', description='Balance of U.S. trade in goods by country by the U.S. Census Bureau. All figures are in millions of U.S. dollars on a nominal basis, not seasonally adjusted unless otherwise specified. Details may not equal totals due to rounding. Table reflects only those months for which there was trade.')
LLM identified column "year" as a DATE
LLM identified column "country" as a GEO
LLM identified column "imports" as a FEATURE
LLM identified column "exports" as a FEATURE
LLM identified DATE column "year" as a YEAR
LLM identified GEO column "country" as a COUNTRY
LLM identified FEATURE column "imports" as a FLOAT
LLM identified FEATURE column "exports" as a FLOAT
LLM was unsure about the units for feature column "imports"
LLM was unsure about the units for feature column "exports"
LLM identified 'country' as the primary geo
LLM identified 'year' as the primary date
LLM identified DATE/YEAR column "year" strftime format: "%Y"
LLM provided description for feature column "imports": "The column represents the value of goods imported from various countries to the United States, measured in billions of U.S. dollars."
LLM provided description for feature column "exports": "Values represent the total monetary value of goods exported by the United States to various countries, measured in billions of U.S. dollars."
LLM provided description for date column "year": "This column represents the year in which the trade transactions occurred, formatted as a four-digit number indicating the specific year from the Gregorian calendar."
LLM provided description for geo column "country": "This column lists the names of countries involved in trade relations with the United States, specifying the trading partner's geographical location."
geo=[GeoAnnotation(name='country', display_name=None, description="This column lists the names of countries involved in trade relations with the United States, specifying the trading partner's geographical location.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='year', display_name=None, description='This column represents the year in which the trade transactions occurred, formatted as a four-digit number indicating the specific year from the Gregorian calendar.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='imports', display_name=None, description='The column represents the value of goods imported from various countries to the United States, measured in billions of U.S. dollars.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='exports', display_name=None, description='Values represent the total monetary value of goods exported by the United States to various countries, measured in billions of U.S. dollars.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={})]
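Note on the record above: the dataset description states that figures are in millions of U.S. dollars, while both generated feature descriptions say billions. The following is a minimal sketch of a post-hoc check that could flag this kind of mismatch. The names `SCALE_WORDS`, `scale_words`, and `scale_mismatch` are hypothetical and are not part of the annotation tool that produced this log.

```python
import re

# Hypothetical list of magnitude words to compare; extend as needed.
SCALE_WORDS = ("thousands", "millions", "billions", "trillions")


def scale_words(text: str) -> set[str]:
    """Return the magnitude words (e.g. 'millions') mentioned in text."""
    pattern = r"\b(" + "|".join(SCALE_WORDS) + r")\b"
    return set(re.findall(pattern, text.lower()))


def scale_mismatch(dataset_description: str, column_description: str) -> bool:
    """True when both texts name a magnitude and they have none in common."""
    meta = scale_words(dataset_description)
    col = scale_words(column_description)
    return bool(meta) and bool(col) and meta.isdisjoint(col)


# Fragments quoted from the us-trade record above.
meta_desc = "All figures are in millions of U.S. dollars on a nominal basis."
imports_desc = "measured in billions of U.S. dollars."
print(scale_mismatch(meta_desc, imports_desc))  # True
```

A check like this only compares wording; it cannot tell which of the two texts is correct.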
Meta(path=PosixPath('datasets/overseas military bases.xlsx'), name='Overseas Military Bases - 2020 - China - By Country Location Only', description='Overseas Military Bases - 2020 - China Locations shown at Country Level only NOTE: the accuracy of this data is not clear but it can be used as a placeholder for now Dataset posted on 2022-08-15, 02:40 authored by Chun Yin Man, David Alexander Palmer Description This dataset contains both tabular and geospatial data of eight great powers\' overseas military bases, including China, the United States, the United Kingdoms, Russia, Japan, India, the United Arab Emirates, and France up until November 2020. An interactive view of this dataset: https://www.arcgis.com/apps/mapviewer/index.html?webmap=52bb2daa766e45aaab0e2196d7bfe469 Source All data were collected from multiple public sources and specified in each data point in the Excel file and Shapefile. For metadata, such as data description and available methods for geospatial data processing, please read the readme.pdf. Terms of use This dataset features in a collection of geospatial data "Geo-mapping databases for the Belt and Road Initiative" (https://doi.org/10.6084/m9.figshare.c.6076193). To cite this work, available citation styles can be found here: https://doi.org/10.6084/m9.figshare.c.6076193 FUNDING CRF grant no. C7052-18G, “Infrastructures of Faith: Religious Mobilities on the Belt and Road” ************************************************************* * Point of Contact ************************************************************* * Asian Religious Connections (ASIAR) * Hong Kong Institute for the Humanities and Social Sciences (HKIHSS), The University of Hong Kong * Pokfulam Road, Hong Kong * E-Mail: [email protected] * Facebook: https://fb.me/asiar.hku * Twitter: https://twitter.com/asiarhk * LinkedIn: https://www.linkedin.com/company/asian-religious-connections-hkihss')
LLM identified column "Date" as a DATE
LLM identified column "Base Count" as a FEATURE
LLM identified column "Name" as a FEATURE
LLM identified column "Country" as a GEO
LLM identified column "Division" as a GEO
LLM identified column "Status" as a FEATURE
LLM identified column "Source" as a FEATURE
LLM identified column "Link" as a FEATURE
LLM identified column "Arm" as a FEATURE
LLM identified column "Operator of Military Base" as a FEATURE
LLM identified column "Latitude" as a GEO
LLM identified column "Longitude" as a GEO
LLM identified column "Geo_Precision" as a FEATURE
LLM identified DATE column "Date" as a YEAR
LLM identified GEO column "Country" as a COUNTRY
LLM identified GEO column "Division" as a CITY
LLM identified GEO column "Latitude" as a LONGITUDE
LLM identified GEO column "Longitude" as a LONGITUDE
LLM identified FEATURE column "Base Count" as a INT
LLM identified FEATURE column "Name" as a STR
LLM identified FEATURE column "Status" as a STR
LLM identified FEATURE column "Source" as a STR
LLM identified FEATURE column "Link" as a STR
LLM identified FEATURE column "Arm" as a STR
LLM identified FEATURE column "Operator of Military Base" as a STR
LLM identified FEATURE column "Geo_Precision" as a BOOLEAN
LLM identified no units for feature column "Base Count"
LLM identified no units for feature column "Name"
LLM identified no units for feature column "Status"
LLM identified no units for feature column "Source"
LLM identified no units for feature column "Link"
LLM identified no units for feature column "Arm"
LLM identified no units for feature column "Operator of Military Base"
LLM identified no units for feature column "Geo_Precision"
LLM identified Longitude as the primary geo column(s)
LLM identified 'Date' as the primary date
LLM identified DATE/YEAR column "Date" strftime format: "%Y"
LLM provided description for feature column "Base Count": "Represents the number of military bases that China has in a specific overseas country as of the year 2020. Each value indicates the total bases located within that country."
LLM provided description for feature column "Name": "This column lists the official names or common identifiers of military bases located overseas that are affiliated with China as of 2020. These entries include various types of facilities, such as naval bases, support units, and intelligence gathering sites, some of which may not have a designated name."
LLM provided description for feature column "Status": "Indicates the operational status of China's overseas military bases, distinguishing between those that are fully operational and those whose presence or operations are subject to dispute or controversy."
LLM provided description for feature column "Source": "Indicates the nature of the information regarding China's overseas military bases in 2020, distinguishing between officially acknowledged sources and various unofficial channels, some of which are confirmed (denoted by '_c')."
LLM provided description for feature column "Link": "URLs to online resources providing detailed information about China's overseas military bases, including their locations, strategic importance, and related geopolitical insights."
LLM provided description for feature column "Arm": "This column categorizes the branch of the military or the type of military force that is present at each overseas military base, indicating whether the base is utilized by the navy, army, air force, or a combination of arms. It includes specific details such as access rights for naval bases."
LLM provided description for feature column "Operator of Military Base": "Indicates the country or entity that operates the military base in question."
LLM provided description for feature column "Geo_Precision": "Indicates whether the location coordinates provided for overseas military bases are precise and accurate."
LLM provided description for date column "Date": "The year when the recorded information about the overseas military bases was collected or relevant."
LLM provided description for geo column "Country": "This field lists the countries where China had established overseas military bases as of the year 2020. It includes nations across different regions, indicating the geographical spread of China's military presence abroad, with an additional category labeled "Other" to account for locations not specified in the detailed entries."
LLM provided description for geo column "Division": "This dataset column lists specific geographic locations around the world where China has established overseas military bases as of 2020. These locations can vary widely in nature, including cities, islands, or regions, some of which may be under disputed sovereignty."
LLM provided description for geo column "Latitude": "This column represents the geographic coordinate that specifies the north–south position of each Chinese overseas military base, measured in degrees. Each value notes the latitude where a base is located, indicating its position relative to the Equator. Higher values suggest a location further north, while lower values indicate proximity to or positioning in the southern hemisphere."
LLM provided description for geo column "Longitude": "This column represents the longitudinal coordinates of China's overseas military bases in 2020. Longitude measures how far east or west a location is from the prime meridian, expressed in degrees. These values indicate the east-west position of the bases on the Earth's surface."
geo=[GeoAnnotation(name='Country', display_name=None, description='This field lists the countries where China had established overseas military bases as of the year 2020. It includes nations across different regions, indicating the geographical spread of China\'s military presence abroad, with an additional category labeled "Other" to account for locations not specified in the detailed entries.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Division', display_name=None, description='This dataset column lists specific geographic locations around the world where China has established overseas military bases as of 2020. These locations can vary widely in nature, including cities, islands, or regions, some of which may be under disputed sovereignty.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.CITY: 'municipality/town'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Latitude', display_name=None, description='This column represents the geographic coordinate that specifies the north–south position of each Chinese overseas military base, measured in degrees. Each value notes the latitude where a base is located, indicating its position relative to the Equator. Higher values suggest a location further north, while lower values indicate proximity to or positioning in the southern hemisphere.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Longitude', display_name=None, description="This column represents the longitudinal coordinates of China's overseas military bases in 2020. 
Longitude measures how far east or west a location is from the prime meridian, expressed in degrees. These values indicate the east-west position of the bases on the Earth's surface.", type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.LONGITUDE: 'longitude'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='Date', display_name=None, description='The year when the recorded information about the overseas military bases was collected or relevant.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns=None, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='Base Count', display_name=None, description='Represents the number of military bases that China has in a specific overseas country as of the year 2020. Each value indicates the total bases located within that country.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Name', display_name=None, description='This column lists the official names or common identifiers of military bases located overseas that are affiliated with China as of 2020. 
These entries include various types of facilities, such as naval bases, support units, and intelligence gathering sites, some of which may not have a designated name.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Status', display_name=None, description="Indicates the operational status of China's overseas military bases, distinguishing between those that are fully operational and those whose presence or operations are subject to dispute or controversy.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Source', display_name=None, description="Indicates the nature of the information regarding China's overseas military bases in 2020, distinguishing between officially acknowledged sources and various unofficial channels, some of which are confirmed (denoted by '_c').", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Link', display_name=None, description="URLs to online resources providing detailed information about China's overseas military bases, including their locations, strategic importance, and related geopolitical insights.", type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Arm', display_name=None, description='This column categorizes the branch of the military or the type of military force that is present at each overseas military base, indicating whether the base is utilized by the navy, army, air force, or a combination of arms. 
It includes specific details such as access rights for naval bases.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Operator of Military Base', display_name=None, description='Indicates the country or entity that operates the military base in question.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Geo_Precision', display_name=None, description='Indicates whether the location coordinates provided for overseas military bases are precise and accurate.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.BOOLEAN: 'boolean'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]
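Note on the record above: the column named "Latitude" was assigned the geo type `longitude`, so the record ends up with two longitude columns and no latitude. The following is a rough sketch of a sanity check that compares a column's name against its assigned coordinate type. `GeoColumn` is a hypothetical stand-in that carries only the `name` and `geo_type` values seen in the output; it is not the tool's `GeoAnnotation` class.

```python
from dataclasses import dataclass


@dataclass
class GeoColumn:
    """Hypothetical stand-in for the two GeoAnnotation fields used here."""
    name: str
    geo_type: str  # e.g. 'latitude', 'longitude', 'country'


def suspicious_coordinate_types(columns: list[GeoColumn]) -> list[str]:
    """Names of columns whose name suggests one coordinate but whose type is the other."""
    flagged = []
    for col in columns:
        lowered = col.name.lower()
        name_says_lat = lowered.startswith("lat")
        name_says_lon = lowered.startswith(("lon", "lng"))
        if name_says_lat and col.geo_type == "longitude":
            flagged.append(col.name)
        elif name_says_lon and col.geo_type == "latitude":
            flagged.append(col.name)
    return flagged


# Names and geo types copied from the record above.
columns = [
    GeoColumn("Country", "country"),
    GeoColumn("Division", "municipality/town"),
    GeoColumn("Latitude", "longitude"),
    GeoColumn("Longitude", "longitude"),
]
print(suspicious_coordinate_types(columns))  # ['Latitude']
```

The name-prefix heuristic is an assumption; columns with unrelated names (for example "y" or "northing") would not be caught.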
Meta(path=PosixPath('datasets/faostat-monthly_temperature_change.xlsx'), name='FAOSTAT Temperature Change - Country Level - 1961-2021 - Monthly', description='FAOSTAT estimated monthly temperature change at the country')
LLM identified column "Domain Code" as a GEO
LLM identified column "Domain" as a FEATURE
LLM identified column "Area Code (M49)" as a GEO
LLM identified column "Area" as a GEO
LLM identified column "Element Code" as a FEATURE
LLM identified column "Element" as a FEATURE
LLM identified column "Months Code" as a FEATURE
LLM identified column "Month" as a DATE
LLM identified column "Day" as a DATE
LLM identified column "Year Code" as a DATE
LLM identified column "Year" as a DATE
LLM identified column "Unit" as a FEATURE
LLM identified column "Value" as a FEATURE
LLM identified column "Flag" as a FEATURE
LLM identified column "Flag Description" as a FEATURE
LLM identified DATE column "Month" as a MONTH
LLM identified DATE column "Day" as a DAY
LLM identified DATE column "Year Code" as a YEAR
LLM identified DATE column "Year" as a YEAR
LLM identified GEO column "Domain Code" as a COUNTRY
LLM identified GEO column "Area Code (M49)" as a COUNTRY
LLM identified GEO column "Area" as a COUNTRY
LLM identified FEATURE column "Domain" as a STR
LLM identified FEATURE column "Element Code" as a INT
LLM identified FEATURE column "Element" as a FLOAT
LLM identified FEATURE column "Months Code" as a INT
LLM identified FEATURE column "Unit" as a FLOAT
LLM identified FEATURE column "Value" as a FLOAT
LLM identified FEATURE column "Flag" as a STR
LLM identified FEATURE column "Flag Description" as a STR
LLM was unsure about the units for feature column "Domain"
LLM was unsure about the units for feature column "Element Code"
LLM was unsure about the units for feature column "Element"
LLM identified no units for feature column "Months Code"
LLM provided units and description for feature column "Unit": °C. The "°C" unit represents degrees Celsius, a scale for temperature where 0°C is the freezing point and 100°C is the boiling point of water at 1 atmosphere of pressure.
LLM provided units and description for feature column "Value": °C. The "°C" unit denotes degrees Celsius, a scale for temperature measurement where 0°C represents the freezing point of water, and 100°C its boiling point at sea level.
LLM identified no units for feature column "Flag"
LLM identified no units for feature column "Flag Description"
LLM identified Area Code (M49) as the primary geo column(s)
LLM identified date group: ('Year', 'Month', 'Day')
LLM identified ('Year', 'Month', 'Day') as the primary date column(s)
LLM identified DATE/MONTH column "Month" strftime format: "%m"
LLM identified DATE/DAY column "Day" strftime format: "%d"
LLM identified DATE/YEAR column "Year Code" strftime format: "%Y"
LLM identified DATE/YEAR column "Year" strftime format: "%Y"
LLM provided description for feature column "Domain": "Identifies the broad category of data reported, in this case, indicating measurements related to changes in temperature over time."
LLM provided description for feature column "Element Code": "This code uniquely identifies the type of statistical measure recorded in the dataset, such as temperature anomalies or average temperature changes, for a given country and month."
LLM provided description for feature column "Element": "Indicates the variation in average temperature compared to a historical baseline, measured in degrees Celsius. This change reflects how much warmer or cooler a given location is in a specific month relative to the average conditions of that month over a base period."
LLM provided description for feature column "Months Code": "A unique code representing specific months or ranges of months over which temperature data were collected or aggregated. These codes allow for the identification and differentiation of monthly data, seasonal averages, and other temporal aggregations within the dataset."
LLM provided description for feature column "Unit": "This field specifies the temperature measurement in degrees Celsius. It denotes the recorded temperature change value for a specific location and time, utilizing the Celsius scale where the freezing and boiling points of water are 0°C and 100°C, respectively, under standard atmospheric conditions."
LLM provided description for feature column "Value": "This feature represents the temperature deviation in degrees Celsius for a given country and month from the baseline historical average temperature (1961-1990). Positive values indicate temperatures above the historical average, while negative values denote temperatures below it."
LLM provided description for feature column "Flag": "This field indicates the data quality or status, with "E" typically representing an estimated value based on statistical methods or other data sources."
LLM provided description for feature column "Flag Description": "Indicates the nature of the data value, specifying whether it was directly observed or derived through estimation techniques."
LLM provided description for date column "Month": "Represents the month of the year when the temperature data was recorded, using a numerical format where 1 corresponds to January and 12 to December."
LLM provided description for date column "Day": "This column represents the day of the month for temperature data recordings. Each entry is marked as the 1st day, indicating a standardized collection date to compare monthly temperature changes across years and locations."
LLM provided description for date column "Year Code": "Represents the year associated with the observed or measured temperature change data, ranging from 1961 to 2021."
LLM provided description for date column "Year": "This column records the calendar year during which the observed temperature changes occurred, ranging from the start in 1961 through to 2021. Each entry represents the year for monthly temperature observations at the country level as part of the FAOSTAT Temperature Change dataset."
LLM provided description for geo column "Domain Code": "This column represents the unique code indicating the specific domain of the dataset, focusing on environmental temperature changes at the country level, covering a span from 1961 to 2021, with monthly updates. The code "ET" suggests a standardized identifier within the dataset for easier categorization and reference."
LLM provided description for geo column "Area Code (M49)": "This column represents numerical codes assigned to countries as per the United Nations M49 standard for geographic regions. These codes facilitate the organization and analysis of data on a country-by-country basis by providing a unique identifier for each country, allowing for consistent data processing and comparison."
LLM provided description for geo column "Area": "This column lists the names of countries, specifically focusing on Afghanistan in the provided rows, indicating the geographical location to which the temperature change data corresponds."
geo=[GeoAnnotation(name='Domain Code', display_name=None, description='This column represents the unique code indicating the specific domain of the dataset, focusing on environmental temperature changes at the country level, covering a span from 1961 to 2021, with monthly updates. The code "ET" suggests a standardized identifier within the dataset for easier categorization and reference.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Area Code (M49)', display_name=None, description='This column represents numerical codes assigned to countries as per the United Nations M49 standard for geographic regions. These codes facilitate the organization and analysis of data on a country-by-country basis by providing a unique identifier for each country, allowing for consistent data processing and comparison.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=True, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None), GeoAnnotation(name='Area', display_name=None, description='This column lists the names of countries, specifically focusing on Afghanistan in the provided rows, indicating the geographical location to which the temperature change data corresponds.', type=<ColumnType.GEO: 'geo'>, geo_type=<GeoType.COUNTRY: 'country'>, primary_geo=None, resolve_to_gadm=None, is_geo_pair=None, coord_format=None, qualifies=None, aliases={}, gadm_level=None)] date=[DateAnnotation(name='Month', display_name=None, description='Represents the month of the year when the temperature data was recorded, using a numerical format where 1 corresponds to January and 12 to December.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.MONTH: 'month'>, primary_date=True, time_format='%m', associated_columns=None, qualifies=None, aliases={}), 
DateAnnotation(name='Day', display_name=None, description='This column represents the day of the month for temperature data recordings. Each entry is marked as the 1st day, indicating a standardized collection date to compare monthly temperature changes across years and locations.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.DAY: 'day'>, primary_date=True, time_format='%d', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='Year Code', display_name=None, description='Represents the year associated with the observed or measured temperature change data, ranging from 1961 to 2021.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=None, time_format='%Y', associated_columns=None, qualifies=None, aliases={}), DateAnnotation(name='Year', display_name=None, description='This column records the calendar year during which the observed temperature changes occurred, ranging from the start in 1961 through to 2021. Each entry represents the year for monthly temperature observations at the country level as part of the FAOSTAT Temperature Change dataset.', type=<ColumnType.DATE: 'date'>, date_type=<DateType.YEAR: 'year'>, primary_date=True, time_format='%Y', associated_columns={<TimeField.YEAR: 'Year'>: 'Year', <TimeField.MONTH: 'Month'>: 'Month', <TimeField.DAY: 'Day'>: 'Day'}, qualifies=None, aliases={})] feature=[FeatureAnnotation(name='Domain', display_name=None, description='Identifies the broad category of data reported, in this case, indicating measurements related to changes in temperature over time.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Element Code', display_name=None, description='This code uniquely identifies the type of statistical measure recorded in the dataset, such as temperature anomalies or average temperature changes, for a given country and month.', 
type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Element', display_name=None, description='Indicates the variation in average temperature compared to a historical baseline, measured in degrees Celsius. This change reflects how much warmer or cooler a given location is in a specific month relative to the average conditions of that month over a base period.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units=None, units_description=None, qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Months Code', display_name=None, description='A unique code representing specific months or ranges of months over which temperature data were collected or aggregated. These codes allow for the identification and differentiation of monthly data, seasonal averages, and other temporal aggregations within the dataset.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.INT: 'int'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Unit', display_name=None, description='This field specifies the temperature measurement in degrees Celsius. 
It denotes the recorded temperature change value for a specific location and time, utilizing the Celsius scale where the freezing and boiling points of water are 0°C and 100°C, respectively, under standard atmospheric conditions.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='°C', units_description='The "°C" unit represents degrees Celsius, a scale for temperature where 0°C is the freezing point and 100°C is the boiling point of water at 1 atmosphere of pressure.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Value', display_name=None, description='This feature represents the temperature deviation in degrees Celsius for a given country and month from the baseline historical average temperature (1961-1990). Positive values indicate temperatures above the historical average, while negative values denote temperatures below it.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.FLOAT: 'float'>, units='°C', units_description='The "°C" unit denotes degrees Celsius, a scale for temperature measurement where 0°C represents the freezing point of water, and 100°C its boiling point at sea level.', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Flag', display_name=None, description='This field indicates the data quality or status, with "E" typically representing an estimated value based on statistical methods or other data sources.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={}), FeatureAnnotation(name='Flag Description', display_name=None, description='Indicates the nature of the data value, specifying whether it was directly observed or derived through estimation techniques.', type=<ColumnType.FEATURE: 'feature'>, feature_type=<FeatureType.STR: 'str'>, units='N/A', units_description='N/A', qualifies=None, qualifierrole=None, aliases={})]