mirror of https://github.com/dathere/ckan_geoconnex_bulk_runner.git
synced 2026-07-05 15:12:20 +00:00

feat: enhanced cargo workspace, NM usage, Dockerfile

parent 71b08a53f0
commit 3a79fb2b0a
18 changed files with 362 additions and 2478 deletions

README.md (39 lines changed)
@@ -4,38 +4,11 @@ https://github.com/user-attachments/assets/779fe866-d511-44f3-91a9-a1c2e1cfa189
> Status: This codebase is currently a work in progress and more documentation is planned.

The `ckan_geoconnex_bulk_runner` codebase is meant to run as a container that bulk-integrates a [CKAN](https://ckan.org) instance's relevant datasets and vector geospatial features (e.g. for water data hubs) into the [Geoconnex](https://geoconnex.us) knowledge graph. The program writes JSON-LD to standard output, one line per approved dataset/location, which the Geoconnex crawler then uses to update the Geoconnex knowledge graph.
The `ckan_geoconnex_bulk_runner` codebase is part of a multi-service infrastructure to sync [CKAN](https://ckan.org)-based water data hubs to the [Geoconnex](https://geoconnex.us) knowledge graph.

Refer to the "Contributing via Bulk Containers" documentation for more information: https://docs.geoconnex.us/contributing/bulk/
- [**geoconnex_utils**](geoconnex_utils): Helper functions used throughout the `ckan_geoconnex_bulk_runner` project, including JSON-LD construction and JSON Schema validation.
- [**geoconnex_release**](geoconnex_release): Compatible CKAN datasets and vector geospatial features for all connected water data hubs are uploaded to a `ckan-geoconnex-web-resources.jsonl` file in the latest GitHub release.
- [**bulk_loader**](bulk_loader): Requests and outputs the latest JSONL file from the latest GitHub release. Geoconnex runs this periodically as a Docker container to upload all water data hub web resources to the Geoconnex knowledge graph, following the Geoconnex [bulk contribution specification](https://docs.geoconnex.us/contributing/bulk/).
- [**ckan_geoconnex_bulk_runner_py**](ckan_geoconnex_bulk_runner_py): Python library intended for use by the ckanext-gztr and [DataPusher+](https://github.com/dathere/datapusher-plus) CKAN extensions.

This runner is expected to be implemented for a water data hub with the relevant fields and/or ckanext-gztr (not open-source yet) and/or [DataPusher+](https://github.com/dathere/datapusher-plus) enabled. For questions, reach out to [datHere](https://dathere.com), [Center for Geospatial Solutions](https://cgsearth.org/), or add an issue/discussion.

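The "one JSON-LD document per line" output described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Dataset` struct, its fields, the `to_jsonld_line` helper, and the bare schema.org `Dataset` shape are hypothetical and are not taken from `geoconnex_utils`, whose real documents are likely richer and are validated against a JSON schema.

```rust
use std::io::{self, Write};

/// Hypothetical record; the field names are assumptions, not the project's types.
struct Dataset {
    uri: String,
    name: String,
}

/// Escape a string so it can sit inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize one dataset as a single-line JSON-LD document (assumed shape).
fn to_jsonld_line(d: &Dataset) -> String {
    format!(
        r#"{{"@context":"https://schema.org/","@type":"Dataset","@id":"{}","name":"{}"}}"#,
        escape_json(&d.uri),
        escape_json(&d.name)
    )
}

fn main() -> io::Result<()> {
    // Placeholder data; a real run would read approved datasets from CKAN.
    let datasets = vec![
        Dataset { uri: "https://example.org/dataset/1".to_string(), name: "Well levels".to_string() },
        Dataset { uri: "https://example.org/dataset/2".to_string(), name: "Stream gauges".to_string() },
    ];
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for d in &datasets {
        // One document per line, so a consumer can parse the stream line by line.
        writeln!(out, "{}", to_jsonld_line(d))?;
    }
    Ok(())
}
```

Escaping newlines inside values matters here: a raw line break in a field would split one document across two lines and break line-oriented consumers.
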
## Installation and setup

```bash
cargo run -p ckan_geoconnex_bulk_runner --release
```

To ignore standard error output and only show valid output:

```bash
cargo run -p ckan_geoconnex_bulk_runner --release 2>/dev/null
```

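The `2>/dev/null` redirect only helps when records and diagnostics travel on separate streams. The sketch below shows one way a program could keep them apart; the `emit` helper and the "skip empty record" rule are hypothetical and are not taken from the repository.

```rust
use std::io::{self, Write};

/// Write valid records to `out` and diagnostics to `err`.
/// Returns the number of records written. (Hypothetical helper.)
fn emit<W: Write, E: Write>(records: &[&str], out: &mut W, err: &mut E) -> io::Result<usize> {
    let mut written = 0;
    for r in records {
        if r.trim().is_empty() {
            // Diagnostics go to the error stream so they never mix with data.
            writeln!(err, "skipping empty record")?;
            continue;
        }
        writeln!(out, "{}", r)?;
        written += 1;
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    let records = [r#"{"@id":"https://example.org/dataset/1"}"#, ""];
    let stdout = io::stdout();
    let stderr = io::stderr();
    let n = emit(&records, &mut stdout.lock(), &mut stderr.lock())?;
    // The summary is a diagnostic too, so it also goes to standard error.
    eprintln!("wrote {} record(s)", n);
    Ok(())
}
```

Taking the two writers as generic parameters keeps the routing testable: a test can pass in-memory buffers instead of the real streams.
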
## Run tests

```bash
cargo test -p ckan_geoconnex_bulk_runner
```

To include print statements in test output, run:

```bash
cargo test -p ckan_geoconnex_bulk_runner -- --nocapture
```

If you have the local dump files set up, you can run those tests with:

```bash
cargo test -p ckan_geoconnex_bulk_runner -F local -- --nocapture
```
This runner is expected to be implemented for a water data hub with the relevant fields and/or ckanext-gztr (not open-source yet) and/or DataPusher+ enabled. For questions, reach out to [datHere](https://dathere.com), [Center for Geospatial Solutions](https://cgsearth.org/), or add an issue/discussion.