Commit 5735f2c

Major update of the extension
- Added `geometry_type` property to Column Object
- Added `table:primary_datetime` #12
- Allow most fields also in the Asset Object #9
- Allow more properties in the Column Object #8
- Deprecated `table:storage_options`: The property is not specific to tables but specific to fsspec. It should be generalized.
- Deprecated `table:tables`: Tables in collections should be summarized using Item Asset Definitions or Collection Summaries instead. #10
- Clarified usage of common metadata and extensions in the Column Object #8 #1
- Improved schema
1 parent 0a851fc commit 5735f2c

7 files changed

Lines changed: 312 additions & 347 deletions

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -1,4 +1,5 @@
 # Changelog
+
 All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
@@ -8,14 +9,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- Added `geometry_type` property to Column Object
+- Added `table:primary_datetime`
+- Allow most fields also in the Asset Object
+
 ### Changed
 
+- Allow more properties in the Column Object
+
 ### Deprecated
 
+- `table:storage_options`: The property is not specific to tables but specific to fsspec. It should be generalized.
+- `table:tables`: Tables in collections should be summarized using Item Asset Definitions or Collection Summaries instead.
+
 ### Removed
 
 ### Fixed
 
+- Clarified usage of common metadata and extensions in the Column Object
+- Improved schema
+
 ## [v1.2.0] - 2021-08-30
 
 - Fixed version number in json schema.

README.md

Lines changed: 71 additions & 31 deletions
@@ -19,54 +19,94 @@ Additionally, Collections can describe many tabular datasets using [Table object
 - [JSON Schema](json-schema/schema.json)
 - [Changelog](./CHANGELOG.md)
 
-## Item Properties and Collection Fields
+## Fields
 
-| Field Name | Type | Description |
-| ---------------------- | ----------------------------------- | ----------------------------------------------------------------- |
-| table:columns | [ [Column Object](#column-object) ] | A list of (#column objects) describing each column. |
-| table:primary_geometry | string | The primary geometry column name. |
-| table:row_count | number | The number of rows in the dataset. |
+The fields in the table below can be used in these parts of STAC documents:
 
-**table:primary_geometry** Is the column name of the "primary" or "active" geometry. This is used by libraries like [geopandas] and [sf]
+- [x] Collections
+- [x] Item Properties (incl. Summaries in Collections)
+- [x] Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
+
+| Field Name | Type | Description |
+| ---------------------- | ---------------------------------- | ----------- |
+| table:columns | \[[Column Object](#column-object)] | A list of (#column objects) describing each column. |
+| table:primary_geometry | string | The primary geometry column name. |
+| table:primary_datetime | string | The primary date/time column name. |
+| table:row_count | number | The number of rows in the dataset. |
+
+---
+
+The fields in the table below can be used in these parts of STAC documents:
+
+- [x] Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
+
+| Field Name | Type | Description |
+| ---------------------- | ----------------------------------- | ----------- |
+| table:storage_options | Map<string, any> | **DEPRECATED** Additional keywords for opening the dataset. |
+
+---
+
+The fields in the table below can be used in these parts of STAC documents:
+
+- [x] Collections
+- [x] Item Properties (incl. Summaries in Collections)
+- [x] Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
+
+They can be used to catalog a collection of tables, where each table is stored as an `Item`, without
+having to include column-level metadata from each table on the Collection.
+
+| Field Name | Type | Description |
+| ------------ | ------------------------------------------ | ------------------------------------------ |
+| table:tables | Map<string, [Table Object](#table-object)> | **DEPRECATED** A mapping of table names to |
+
+### table:primary_geometry
+
+This is the column name of the "primary" or "active" geometry. This is used by libraries like [geopandas] and [sf]
 to control which geometry column is used. When a STAC item uses both the [projection] and `table` extensions, it's understood that the
-values in `proj:espg`, `proj:bbox`, etc. refer to the `primary_geometry` column.
+values in `proj:espg`, `proj:bbox`, etc. that (implicitly) apply to the asset refer to the `primary_geometry` column.
+
+### table:storage_options
+
+This can be used with [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to specify additional keywords
+necessary to open the data. For example, an asset might use ``{"account_name": "ai4edataeuwest"}`` to indicate that the asset is
+in the ``ai4edataeuwest`` storage account. Libraries like [adlfs](https://github.com/dask/adlfs) use this information to open the dataset.
 
 ### Column Object
 
 Column objects contain information about each colum in the table.
 
-| Field Name | Type | Description |
-| ----------- | ------ | -------------------------------------------------------------------------------------------------------------------------- |
-| name | string | **REQUIRED**. The column name |
-| description | string | Detailed multi-line description to explain the dimension. CommonMark 0.29 syntax MAY be used for rich text representation. |
-| type | string | Data type of the column. If using a file format with a type system (like Parquet), we recommend you use those types. |
+| Field Name | Type | Description |
+| ------------- | ------ | ----------- |
+| name | string | **REQUIRED**. The column name. |
+| description | string | Detailed multi-line description to explain the dimension. CommonMark 0.29 syntax MAY be used for rich text representation. |
+| type | string | Native data type of the column. If using a file format with a type system (like Parquet), we recommend you use those types. |
+| geometry_type | string | Geometry type provided by the column. Only applies to geometry columns, e.g. the column identified by the `table:primary_geometry`. |
 
-## *Asset Object* fields
+Other properties such as `description`, `license`, `unit`, `data_type` and `statistics` from
+[STAC common metadata](https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md)
+can be used in the Column Object.
 
-The following fields can be used for assets (in the [`Asset Object`](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-object)).
+`type` and `data_type` describe the same information, but `type` should use the native name in the given file format and `data_type` describes the [standardized data type name according to the STAC specification](https://github.com/radiantearth/stac-spec/blob/master/commons/common-metadata.md#data-types).
 
-| Field Name | Type | Description |
-| --------------------- | ---------------- | -------------------------------------------- |
-| table:storage_options | Map<string, any> | Additional keywords for opening the dataset. |
+Columns can also include additional information from other extensions that are not otherwise covered on the asset-level
+and are column specific, e.g. [projection] extension information for additional geometry columns.
 
-``table:storage_options`` can be used with [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to specify additional keywords
-necessary to open the data. For example, an asset might use ``{"account_name": "ai4edataeuwest"}`` to indicate that the asset is
-in the ``ai4edataeuwest`` storage account. Libraries like [adlfs](https://github.com/dask/adlfs) use this information to open the dataset.
+#### geometry_type
 
-## Collection Fields
+Describes the geometry type provided by the column. Do not provide the property, if mixed geometry types occur.
 
-The following fields apply only to
-[Collections](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md).
-They can be used to catalog a collection of tables, where each table is stored as an `Item`, without
-having to include column-level metadata from each table on the Collection.
+Must be one of the GeoJSON geometry types:
 
-| Field Name | Type | Description |
-| ------------ | ------------------------------------------ | ---------------------------------------- |
-| table:tables | Map<string, [Table Object](#table-object)> | **REQUIRED** A mapping of table names to |
+- `Point`
+- `MultiPoint`
+- `LineString`
+- `MultiLineString`
+- `Polygon`
+- `MultiPolygon`
 
 ### Table Object
 
-Table objects contain high-level summaries about a table.
+**DEPRECATED:** Table objects contain high-level summaries about a table.
 
 | Field Name | Type | Description |
 | ----------- | ------ | -------------------------------------------------------------------------------------------------------------------------- |
@@ -85,7 +125,7 @@ For a dataset consisting of a single table or many tables with the same schema (
 different points in time), you might include `table:columns` on the `Collection` itself, or both the `Collection` and `items`.
 
 For datasets with many tables (for example, [USF Forest Inventory and Analysis](https://github.com/microsoft/AIforEarthDataSets/blob/main/data/forest-inventory-and-analysis.md)),
-we recommend cataloging just the *tables* at the Collection level in `table:tables`, and cataloging the the columns at just the `Item` level in `table:columns`
+we recommend cataloging the the columns at just the `Item` level in `table:columns`
 on each Item.
 
 ## Contributing
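
The Column Object rules introduced in the README changes above (a required `name` string, and a `geometry_type` restricted to six GeoJSON geometry types) can be sketched as a small check. This is only an illustration: `validate_column` and `GEOMETRY_TYPES` are hypothetical names that are not part of the extension, and the authoritative rules live in the extension's JSON Schema.

```python
# Hypothetical helper for illustration only; not part of the table
# extension or of any STAC library.

# The six geometry types listed in the README for `geometry_type`.
GEOMETRY_TYPES = {
    "Point",
    "MultiPoint",
    "LineString",
    "MultiLineString",
    "Polygon",
    "MultiPolygon",
}


def validate_column(column: dict) -> list:
    """Return a list of human-readable problems found in a Column Object."""
    problems = []

    # `name` is the only field the README marks as REQUIRED.
    if not isinstance(column.get("name"), str):
        problems.append("'name' is required and must be a string")

    # `geometry_type` is optional; the README says to omit it when a
    # column holds mixed geometry types.
    geometry_type = column.get("geometry_type")
    if geometry_type is not None and geometry_type not in GEOMETRY_TYPES:
        problems.append(f"unsupported geometry_type: {geometry_type!r}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a `table:columns` array in a single pass.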

examples/collection-2.json

Lines changed: 0 additions & 34 deletions
This file was deleted.

examples/collection.json

Lines changed: 25 additions & 18 deletions
@@ -1,7 +1,6 @@
 {
-  "stac_version": "1.0.0",
+  "stac_version": "1.1.0",
   "stac_extensions": [
-    "https://stac-extensions.github.io/item-assets/v1.0.0/schema.json",
     "https://stac-extensions.github.io/table/v1.2.0/schema.json"
   ],
   "type": "Collection",
@@ -31,7 +30,30 @@
   },
   "assets": {
     "example": {
-      "href": "https://example.com/examples/file.xyz"
+      "href": "https://example.com/examples/stac.geoparquet",
+      "table:columns": [
+        {
+          "name": "geometry",
+          "description": "The observation location.",
+          "type": "binary",
+          "geometry_type": "Polygon"
+        },
+        {
+          "name": "datetime",
+          "description": "The observation datetime.",
+          "type": "datetime"
+        },
+        {
+          "name": "id",
+          "description": "The numerical identifier",
+          "type": "int64"
+        },
+        {
+          "name": "value"
+        }
+      ],
+      "table:primary_geometry": "geometry",
+      "table:primary_datetime": "datetime"
     }
   },
   "item_assets": {
@@ -48,21 +70,6 @@
       "maximum": "2019-07-10T13:44:56Z"
     }
   },
-  "table:columns": [
-    {
-      "name": "geometry",
-      "description": "The observation location.",
-      "type": "int64"
-    },
-    {
-      "name": "id",
-      "description": "The numerical identifier"
-    },
-    {
-      "name": "value"
-    }
-  ],
-  "table:primary_geometry": "geometry",
   "links": [
     {
       "href": "https://example.com/examples/collection.json",

examples/item.json

Lines changed: 22 additions & 22 deletions
@@ -1,5 +1,5 @@
 {
-  "stac_version": "1.0.0",
+  "stac_version": "1.1.0",
   "stac_extensions": [
     "https://stac-extensions.github.io/table/v1.2.0/schema.json"
   ],
@@ -39,26 +39,7 @@
     ]
   },
   "properties": {
-    "datetime": "2020-12-11T22:38:32Z",
-    "table:columns": [
-      {
-        "name": "geometry",
-        "description": "The observation location.",
-        "type": "byte_array"
-      },
-      {
-        "name": "id",
-        "description": "The numerical identifier",
-        "type": "int64"
-      },
-      {
-        "name": "value",
-        "description": "The observed value",
-        "type": "float64"
-      }
-    ],
-    "table:primary_geometry": "geometry",
-    "table:row_count": 100
+    "datetime": "2020-12-11T22:38:32Z"
   },
   "links": [
     {
@@ -68,7 +49,26 @@
   ],
   "assets": {
     "data": {
-      "href": "https://example.com/examples/file.xyz"
+      "href": "https://example.com/examples/file.geoparquet",
+      "table:columns": [
+        {
+          "name": "geometry",
+          "description": "The observation location.",
+          "type": "byte_array"
+        },
+        {
+          "name": "id",
+          "description": "The numerical identifier",
+          "type": "int64"
+        },
+        {
+          "name": "value",
+          "description": "The observed value",
+          "type": "float64"
+        }
+      ],
+      "table:primary_geometry": "geometry",
+      "table:row_count": 100
     }
   }
 }
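
Both example files now carry the table fields on the asset rather than on the Collection or in the Item properties. As a rough sketch of how a reader might consume that layout, the snippet below resolves the columns named by `table:primary_geometry` and `table:primary_datetime` on an asset. `primary_columns` is a hypothetical helper, not part of the extension, and the asset dictionary is copied from the updated `examples/collection.json`.

```python
# Hypothetical helper for illustration only; not part of the table
# extension or of any STAC library.


def primary_columns(asset: dict) -> dict:
    """Map each primary-column field set on the asset to its Column Object.

    A field maps to None when it names a column that is missing from
    `table:columns`.
    """
    # Index the asset's Column Objects by their required `name`.
    columns = {col["name"]: col for col in asset.get("table:columns", [])}

    resolved = {}
    for field in ("table:primary_geometry", "table:primary_datetime"):
        column_name = asset.get(field)
        if column_name is not None:
            resolved[field] = columns.get(column_name)
    return resolved


# Asset taken from the updated examples/collection.json in this commit.
asset = {
    "href": "https://example.com/examples/stac.geoparquet",
    "table:columns": [
        {
            "name": "geometry",
            "description": "The observation location.",
            "type": "binary",
            "geometry_type": "Polygon",
        },
        {
            "name": "datetime",
            "description": "The observation datetime.",
            "type": "datetime",
        },
        {"name": "id", "description": "The numerical identifier", "type": "int64"},
        {"name": "value"},
    ],
    "table:primary_geometry": "geometry",
    "table:primary_datetime": "datetime",
}

resolved = primary_columns(asset)
print(resolved["table:primary_geometry"]["geometry_type"])  # Polygon
print(resolved["table:primary_datetime"]["type"])  # datetime
```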
