Utilities
Various utilities used in the basic example and ndj_pipeline.
my_project.utils
Utility functions for modeling, data processing, and one-time-use dev tools.
- my_project.utils.start()
Program entry point, whether run with python or as a script.
- Raises
TypeError – If there are problems converting the environment inputs to path objects.
- Return type
None
- my_project.utils.update_environments(main_env='environment.yml', opt_env='environment2.yml')
Utility to assist updating the project environment.yml.
If we simply run conda env export > environment2.yml, we obtain every installed library, not just the key ones we selected for inclusion. This makes it hard to tell which libraries form the key subset, and reduces the chances of successfully replicating the environment.
This utility takes the current environment.yml (with its current hashes and #-commented libraries) and the environment2.yml generated by the command above, and prints the minimal set of libraries needed.
While experimenting with new libraries, it can be useful to comment them out in the file until they are ready to be included.
- Parameters
main_env (Union[Path, str]) – path to the curated environment file, as a Path or string
opt_env (Union[Path, str]) – path to the full exported environment file, as a Path or string
- Return type
None
ndj_pipeline.config
Configuration variables for the ndj_pipeline project.
ndj_pipeline.utils
Mix of utilities.
- ndj_pipeline.utils.clean_column_names(column_list)
Simple string cleaning rules for columns.
- Parameters
column_list (List[str]) – Column names to be cleaned
- Return type
Dict[str, str]
- Returns
A dict mapping old and cleaned column names.
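A minimal sketch of clean_column_names, assuming the "simple string cleaning rules" are: trim whitespace, lowercase, and collapse every run of non-alphanumeric characters into a single underscore. These rules are an assumption; the ones in ndj_pipeline may differ.

```python
import re
from typing import Dict, List


def clean_column_names(column_list: List[str]) -> Dict[str, str]:
    """Map each original column name to a cleaned name (assumed cleaning rules)."""
    mapping: Dict[str, str] = {}
    for column in column_list:
        cleaned = column.strip().lower()
        # Replace any run of characters outside [0-9a-z] with one underscore.
        cleaned = re.sub(r"[^0-9a-z]+", "_", cleaned)
        mapping[column] = cleaned.strip("_")
    return mapping
```

The returned dict can be passed directly to pandas' DataFrame.rename(columns=...) to apply the cleaned names.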
- ndj_pipeline.utils.create_model_folder(model_config)
Create model asset folder and write config if it doesn’t exist.
- Return type
None
- ndj_pipeline.utils.create_tables_html()
Scan schemas directory to create HTML page for data documentation.
- Return type
None
- ndj_pipeline.utils.get_model(function)
Simple redirection to get a named function from model.py.
- Return type
Callable
- ndj_pipeline.utils.get_model_path(model_config)
Returns the model path from config file.
- Return type
Path
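A sketch of how get_model_path and create_model_folder might fit together. The 'run_name' config key, the data/models base directory, the root parameter, and the config.json file name are all hypothetical; the real config schema is not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict

MODEL_ROOT = Path("data") / "models"  # hypothetical base directory for model assets


def get_model_path(model_config: Dict[str, Any], root: Path = MODEL_ROOT) -> Path:
    # Assumes the config names its asset folder under a 'run_name' key.
    return root / model_config["run_name"]


def create_model_folder(model_config: Dict[str, Any], root: Path = MODEL_ROOT) -> None:
    model_path = get_model_path(model_config, root)
    model_path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    config_file = model_path / "config.json"
    if not config_file.exists():  # never overwrite a config that is already there
        config_file.write_text(json.dumps(model_config, indent=2))
```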
- ndj_pipeline.utils.get_post(function)
Simple redirection to get a named function from post.py.
- Return type
Callable
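get_model and get_post are described as simple redirections to a named function in model.py or post.py. A common way to implement this kind of lookup is importlib.import_module followed by getattr; the sketch below assumes that approach. The helper get_named_function and the module paths ndj_pipeline.model and ndj_pipeline.post are guesses.

```python
import importlib
from typing import Callable


def get_named_function(module_name: str, function: str) -> Callable:
    """Hypothetical helper: import a module and return the attribute named `function`."""
    module = importlib.import_module(module_name)
    return getattr(module, function)  # raises AttributeError if the name is missing


# Hypothetical wrappers; the real module paths in ndj_pipeline may differ.
def get_model(function: str) -> Callable:
    return get_named_function("ndj_pipeline.model", function)


def get_post(function: str) -> Callable:
    return get_named_function("ndj_pipeline.post", function)
```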
- ndj_pipeline.utils.load_model_config(model_config_path)
Loads the model config from either YAML or JSON format.
- Return type
Dict[str, Any]
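A minimal sketch of load_model_config that picks the parser from the file extension. It assumes .yaml/.yml and .json suffixes, uses PyYAML's yaml.safe_load for YAML (a third-party dependency, imported only when needed), and raises ValueError for anything else; the real function may detect the format differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_model_config(model_config_path: Union[Path, str]) -> Dict[str, Any]:
    path = Path(model_config_path)
    suffix = path.suffix.lower()
    if suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML; only required when a YAML config is loaded
        with path.open() as f:
            return yaml.safe_load(f)
    if suffix == ".json":
        with path.open() as f:
            return json.load(f)
    raise ValueError(f"Unsupported config format: {path.suffix!r}")
```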
- ndj_pipeline.utils.main()
Run selected utility function.
Can be run from the command line using: python -m ndj_pipeline.utils --tables
- Return type
None
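main dispatches to a utility based on a command-line flag. The sketch below shows that dispatch with argparse, assuming --tables is a boolean flag. Here create_tables_html is a placeholder that only prints, and the argv parameter exists only to make the parser testable.

```python
import argparse
from typing import List, Optional


def create_tables_html() -> None:
    # Placeholder for ndj_pipeline.utils.create_tables_html.
    print("Building data documentation tables...")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run selected utility function.")
    parser.add_argument(
        "--tables",
        action="store_true",  # False unless the flag is given
        help="build the HTML data documentation from the schemas directory",
    )
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    if args.tables:
        create_tables_html()


if __name__ == "__main__":
    main()
```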
- ndj_pipeline.utils.parse_schema_to_table(schema)
Parses a table schema into an HTML table for use in documentation.
- Return type
str
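A sketch of parse_schema_to_table, assuming a schema is a dict with a "fields" list whose entries carry name, type and description keys (similar to a Frictionless Table Schema). Both the schema layout and the table columns are assumptions; html.escape keeps field text containing characters such as < or & from breaking the markup.

```python
from html import escape
from typing import Any, Dict

COLUMNS = ("name", "type", "description")  # assumed keys on each schema field


def parse_schema_to_table(schema: Dict[str, Any]) -> str:
    header = "".join(f"<th>{column.title()}</th>" for column in COLUMNS)
    rows = ["<table>", f"<tr>{header}</tr>"]
    for field in schema.get("fields", []):
        # Missing keys render as empty cells; all values are HTML-escaped.
        cells = "".join(f"<td>{escape(str(field.get(c, '')))}</td>" for c in COLUMNS)
        rows.append(f"<tr>{cells}</tr>")
    rows.append("</table>")
    return "\n".join(rows)
```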