Provider
The Provider calls a Retriever object when the variables requested for a given source and dates are not already in a local cache. The user must supply the cache path. The Provider then returns the data as an xarray.Dataset object that the user can work with directly.
- class CacheRetriever(path: PathLike)
Provides caching functionality for retrieved data.
- provide(source: str, variables: list[tuple[str, dict]], dates: datetime.date | str | pandas._libs.tslibs.timestamps.Timestamp | list[Any], is_static: bool = False, kwargs_str: str = '') → tuple[xarray.core.dataset.Dataset, list[Any]]
Load cached data if available, or return missing variables/dates if not.
- Parameters:
source (str) – Identifier of the data source.
variables (list of tuple (str, dict)) – List of variable definitions (variable name and parameters).
dates (list of datetime.date or datetime.date) – The date or dates for which to load cached data.
is_static (bool, optional) – Flag indicating if the variables are static. Default is False.
kwargs_str (str, optional) – String representation of the keyword arguments used for retrieval.
- Returns:
A tuple containing the merged dataset of cached data (xr.Dataset) and the missing variables, if any, per date (list or dict).
- Return type:
tuple
- save(data: Dataset, source: str, variables: list[tuple[str, dict]], dates: list[datetime.date] | datetime.date, is_static: bool = False, kwargs_str: str = '') → None
Save the data to cache.
- Parameters:
data (xarray.Dataset) – The dataset to be saved.
source (str) – Identifier of the data source.
variables (list of tuple (str, dict)) – List of variable definitions.
dates (list of datetime.date or datetime.date) – Date or dates associated with the data.
is_static (bool, optional) – Flag indicating if the variables are static, in which case the dates are irrelevant.
kwargs_str (str, optional) – String representation of the keyword arguments used for retrieval.
- Return type:
None
- Raises:
PermissionError – If the cache directory is read-only.
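The provide/save contract described above (return what is cached, report what is missing, write new data back) can be sketched with a toy file cache. This is a minimal illustration and not the weathermart implementation: the `ToyCache` class, the one-JSON-file-per-(source, variable, date) layout, and the use of plain lists in place of xarray.Dataset objects are all assumptions made for the example.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


class ToyCache:
    """Hypothetical stand-in for a cache; not the weathermart implementation."""

    def __init__(self, path):
        self.path = Path(path)

    def _file(self, source, variable, day):
        # One JSON file per (source, variable, date); this layout is an assumption.
        return self.path / source / variable / f"{day.isoformat()}.json"

    def provide(self, source, variables, dates):
        """Return (cached values, missing (variable, date) pairs)."""
        cached, missing = {}, []
        for name, _params in variables:
            for day in dates:
                f = self._file(source, name, day)
                if f.exists():
                    cached[(name, day)] = json.loads(f.read_text())
                else:
                    missing.append((name, day))
        return cached, missing

    def save(self, data, source):
        """Write each (variable, date) entry of `data` to its own file."""
        for (name, day), values in data.items():
            f = self._file(source, name, day)
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_text(json.dumps(values))


with tempfile.TemporaryDirectory() as tmp:
    cache = ToyCache(tmp)
    variables = [("temperature", {})]
    days = [date(2024, 1, 1), date(2024, 1, 2)]

    # Nothing is cached yet, so every (variable, date) pair is reported missing.
    cached, missing = cache.provide("toy_source", variables, days)
    first_missing = list(missing)

    # Save one day, then ask again: only the other day is still missing.
    cache.save({("temperature", days[0]): [1.5, 2.0]}, "toy_source")
    cached, missing = cache.provide("toy_source", variables, days)
```

Returning the missing entries alongside the cached data lets the caller decide what to fetch, which keeps the cache itself free of any retrieval logic.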
- class DataProvider(cache: weathermart.provide.CacheRetriever | None, retrievers: Sequence[BaseRetriever])
Provides input data by retrieving from sources and utilizing caching.
This class integrates data retrieval with caching and exposes methods to obtain metadata fields, variable mappings, and coordinate reference systems (CRS) for a given source.
- get_crs(source: str) → str | dict[str, str] | None
Get the coordinate reference system (CRS) for the given source.
- get_kwargs_str(kwargs: dict[str, Any]) → str
Generate a string representation of keyword arguments.
This method processes a dictionary of keyword arguments and constructs a string by concatenating processed representations of each key-value pair, ignoring those in DataProvider._ignored_kwargs. Specific keys have custom formatting rules.
- Parameters:
kwargs (dict) – A dictionary containing keyword arguments to be processed.
- Returns:
A concatenated string representation of the keyword arguments, or an empty string if kwargs is empty.
- Return type:
str
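As a rough illustration of the idea behind get_kwargs_str, the sketch below joins key-value pairs into one string while skipping ignored keys. The `IGNORED_KWARGS` set, the `key-value` formatting, the `_` separator, and the sorting of keys are assumptions chosen for the example; the custom per-key formatting rules mentioned above are not reproduced here.

```python
# Hypothetical set; stands in for DataProvider._ignored_kwargs.
IGNORED_KWARGS = {"verbose"}


def kwargs_to_str(kwargs):
    """Build a deterministic string from keyword arguments, skipping ignored keys."""
    parts = []
    for key in sorted(kwargs):  # sorting is an assumption, used for a stable result
        if key in IGNORED_KWARGS:
            continue
        parts.append(f"{key}-{kwargs[key]}")
    return "_".join(parts)


print(kwargs_to_str({"resolution": "1km", "verbose": True, "level": 850}))
print(repr(kwargs_to_str({})))
```

A string of this kind can serve as part of a cache key, so that data retrieved with different keyword arguments is not mixed up.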
- get_static_flag(source: str) → bool
Determine if the source has static variables.
- get_variable_mapping(source: str) → dict[Any, Any]
Get variable mapping for the specified source.
- provide(source: str, variables: list[tuple[str, dict]], dates: datetime.date | str | pandas._libs.tslibs.timestamps.Timestamp | list[Any], **kwargs: Any) → Dataset
Retrieve data from cache and source as needed.
This method checks the cache for the requested data. If data for any date or variable is missing in the cache, it retrieves the missing parts from the source, saves them to the cache, and then concatenates all datasets.
- Parameters:
source (str) – The source identifier.
variables (list of tuple (str, dict)) – List of variable definitions.
dates (list of datetime.date or datetime.date) – Date or dates for which data is requested.
**kwargs (dict) – Additional keyword arguments to pass to the retriever.
- Returns:
The merged dataset containing the requested data.
- Return type:
xarray.Dataset
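The check-cache, fetch-missing, save, merge sequence described for provide can be sketched as follows. This is a simplified model under stated assumptions, not the library's code: the cache is a plain dictionary, `fake_retrieve` is a hypothetical retriever that returns dummy strings, and a dictionary keyed by (variable, date) stands in for the merged xarray.Dataset.

```python
from datetime import date


def provide(cache, retrieve, source, variables, dates):
    """Return values for every (variable, date), fetching only what the cache lacks."""
    wanted = [(name, day) for name, _params in variables for day in dates]
    missing = [key for key in wanted if (source, *key) not in cache]

    if missing:
        fetched = retrieve(source, missing)  # call the source only for the gaps
        for key, value in fetched.items():
            cache[(source, *key)] = value  # save so the next call is a cache hit

    return {key: cache[(source, *key)] for key in wanted}


calls = []


def fake_retrieve(source, keys):
    # Hypothetical retriever: records what was requested and returns dummy values.
    calls.append(list(keys))
    return {key: f"{key[0]}@{key[1].isoformat()}" for key in keys}


cache = {}
variables = [("temperature", {})]
d1, d2 = date(2024, 1, 1), date(2024, 1, 2)

first = provide(cache, fake_retrieve, "toy_source", variables, [d1])
second = provide(cache, fake_retrieve, "toy_source", variables, [d1, d2])
```

In this sketch the second call asks the retriever only for the second date, because the first date was saved to the cache by the first call.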
- provide_from_config(config: dict[str, Any], **kwargs: Any) → None
Provide data based on a configuration dictionary.
The configuration dictionary should contain a ‘dates’ key and keys for each source with corresponding variable names. Additional keyword arguments can be specified in the ‘kwargs’ key of the configuration.
- Parameters:
config (dict of {str: list of str}) – Configuration dictionary with key ‘dates’ and source-to-variable mappings.
**kwargs (dict) – Additional keyword arguments to pass to the provide method.
- Return type:
None
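A configuration dictionary of the shape described above might look like the sketch below. The source names, variable names, and the `resolution` argument are hypothetical, and `split_config` is an illustrative helper (not part of the library) showing how the reserved ‘dates’ and ‘kwargs’ keys can be separated from the source-to-variable entries.

```python
config = {
    "dates": ["2024-01-01", "2024-01-02"],
    "toy_source_a": ["temperature", "pressure"],  # hypothetical source and variable names
    "toy_source_b": ["elevation"],
    "kwargs": {"resolution": "1km"},  # hypothetical extra argument
}


def split_config(config):
    """Separate the reserved keys from the source-to-variables mapping."""
    dates = config["dates"]
    extra = config.get("kwargs", {})
    sources = {k: v for k, v in config.items() if k not in ("dates", "kwargs")}
    return dates, sources, extra


dates, sources, extra = split_config(config)
for source, variables in sources.items():
    # Each iteration corresponds to one request for a source and its variables.
    print(source, variables, dates, extra)
```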
- update_retriever_with_kwargs(source: str, **kwargs: Any) → DataProvider
Update the retriever for the specified source with new keyword arguments.
- Parameters:
source (str) – The source identifier.
**kwargs (dict) – Keyword arguments to reinitialize the retriever.
- Returns:
The updated DataProvider instance.
- Return type:
DataProvider
- chunk_data(data: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) → Any
Chunk data, with a special case for station indexes.