Utilities

Various utilities used in the basic example and ndj_pipeline.

my_project.utils

Utility functions for modeling, data processing, and one-time-use dev tools.

my_project.utils.start()

Program entry point, for use from Python or as a script.

Raises
  • TypeError – If there are problems converting the environment inputs to path objects.

Return type

None

my_project.utils.update_environments(main_env='environment.yml', opt_env='environment2.yml')

Utility to assist updating the project environment.yml.

If we simply run conda env export > environment2.yml, we obtain every installed library, not just the key ones we selected for inclusion. This makes it hard to identify the subset of key libraries and reduces the chances of successfully replicating the environment.

This util takes the current environment.yml (with current hashes and # commented libraries) and an environment file generated by the above command, and prints the minimal set of libraries needed.

While experimenting with new libraries, it can be useful to mark the new libraries as a comment in the file until ready to include.

Parameters
  • main_env (Union[Path, str]) – path to the current environment.yml, given as a string or Path

  • opt_env (Union[Path, str]) – path to the exported environment file, given as a string or Path

Return type

None
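The filtering idea described above can be sketched as follows. This is a minimal illustration and not the project's implementation: the helper names (dependency_entries, package_name, minimal_dependencies) are hypothetical, the sketch works on file contents passed as strings, it treats # commented libraries as key libraries, and it does not handle nested pip sections specially.

```python
import re
from typing import Iterator, List


def dependency_entries(text: str) -> Iterator[str]:
    """Yield entries such as 'numpy=1.21.0=py39_0' from the dependencies section."""
    in_deps = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Top-level keys ("name:", "channels:", "prefix:") start at column 0
        # and are neither comments nor list items.
        is_top_level_key = not raw[0].isspace() and line[0] not in "#-"
        if is_top_level_key:
            in_deps = line == "dependencies:"
            continue
        if in_deps:
            # Accept both "- pkg" and commented "# - pkg" entries (assumption).
            entry = line.lstrip("#").strip()
            if entry.startswith("- "):
                yield entry[2:].strip()


def package_name(entry: str) -> str:
    """Return the bare package name, dropping any version or build string."""
    return re.split(r"[=<>!~\s]", entry, maxsplit=1)[0]


def minimal_dependencies(main_text: str, export_text: str) -> List[str]:
    """Keep only exported entries whose package is listed in the main file."""
    key_names = {package_name(e) for e in dependency_entries(main_text)}
    return [e for e in dependency_entries(export_text) if package_name(e) in key_names]


main_text = """name: demo
channels:
  - defaults
dependencies:
  - numpy=1.20.0=old
  # - seaborn
"""
export_text = """name: demo
channels:
  - defaults
dependencies:
  - libgcc=9.3=h1
  - numpy=1.21.0=py39_0
  - seaborn=0.11.2=py_0
prefix: /tmp/demo
"""
for entry in minimal_dependencies(main_text, export_text):
    print(f"  - {entry}")
```

In this sketch the exported file supplies the current versions and hashes, while the main file only decides which packages are kept.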

ndj_pipeline.config

Configuration variables for the ndj_pipeline project.

ndj_pipeline.utils

Mix of utilities.

ndj_pipeline.utils.clean_column_names(column_list)

Simple string cleaning rules for columns.

Parameters

column_list (List[str]) – Column names to be cleaned

Return type

Dict[str, str]

Returns

A dict mapping each original column name to its cleaned name.
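The exact cleaning rules are not listed here, so the following is only a sketch of what such a mapping could look like. The rules (trim, lowercase, collapse non-alphanumeric runs into underscores) and the function name clean_column_names_sketch are assumptions, not the library's behavior.

```python
import re
from typing import Dict, List


def clean_column_names_sketch(column_list: List[str]) -> Dict[str, str]:
    """Map each original column name to a cleaned version (assumed rules)."""
    mapping: Dict[str, str] = {}
    for old in column_list:
        # Assumed rules: trim, lowercase, replace runs of other characters with "_".
        new = re.sub(r"[^0-9a-z]+", "_", old.strip().lower()).strip("_")
        mapping[old] = new
    return mapping


print(clean_column_names_sketch(["Total Sales ($)", " Customer-ID "]))
```

A mapping of this shape can be passed directly to pandas via df.rename(columns=mapping).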

ndj_pipeline.utils.create_model_folder(model_config)

Create model asset folder and write config if it doesn’t exist.

Return type

None

ndj_pipeline.utils.create_tables_html()

Scan the schemas directory to create an HTML page for data documentation.

Return type

None

ndj_pipeline.utils.get_model(function)

Simple redirection to get named function from model.py.

Return type

Callable
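A name-based redirection like this is commonly built on importlib and getattr. The sketch below shows that general pattern under stated assumptions: get_named_function is a hypothetical helper, and the standard-library math module stands in for model.py so the example runs on its own.

```python
import importlib
from typing import Callable


def get_named_function(module_name: str, function_name: str) -> Callable:
    """Look up a callable by name inside a module (hypothetical helper)."""
    module = importlib.import_module(module_name)
    function = getattr(module, function_name, None)
    if not callable(function):
        raise AttributeError(f"{module_name} has no callable named {function_name!r}")
    return function


# "math" stands in for model.py here so the example is self-contained.
sqrt = get_named_function("math", "sqrt")
print(sqrt(9.0))
```

Looking functions up by name lets a config file select the model function as a plain string.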

ndj_pipeline.utils.get_model_path(model_config)

Returns the model path from the config file.

Return type

Path

ndj_pipeline.utils.get_post(function)

Simple redirection to get named function from post.py.

Return type

Callable

ndj_pipeline.utils.load_model_config(model_config_path)

Loads model config, either from yaml or json format.

Return type

Dict[str, Any]
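One plausible way to support both formats is to dispatch on the file extension. The sketch below assumes that approach; load_config_sketch is a hypothetical name, and the YAML branch assumes the third-party PyYAML package is installed (it is imported lazily so the JSON path works without it).

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_config_sketch(path: Union[Path, str]) -> Dict[str, Any]:
    """Load a config dict from a .json, .yaml or .yml file (assumed dispatch rule)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix in {".yaml", ".yml"}:
        import yaml  # PyYAML, third-party; only needed for YAML files

        with path.open(encoding="utf-8") as handle:
            return yaml.safe_load(handle)
    raise ValueError(f"Unsupported config format: {path.suffix!r}")
```

Raising on unknown extensions keeps a mistyped config path from silently producing an empty configuration.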

ndj_pipeline.utils.main()

Run selected utility function.

Can be run from the command line using: python -m ndj_pipeline.utils --tables

Return type

None
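A flag-driven entry point of this kind is typically written with argparse. The sketch below is an assumption about the general shape, not the module's actual code: main_sketch and build_tables_stub are hypothetical names, and only the tables flag mentioned above is modeled.

```python
import argparse
from typing import List, Optional


def build_tables_stub() -> None:
    """Hypothetical stand-in for the real table-building utility."""
    print("building schema tables")


def main_sketch(argv: Optional[List[str]] = None) -> str:
    """Parse flags and run the selected utility; returns the action taken."""
    parser = argparse.ArgumentParser(description="Run a selected utility function.")
    parser.add_argument("--tables", action="store_true", help="build the HTML schema page")
    args = parser.parse_args(argv)
    if args.tables:
        build_tables_stub()
        return "tables"
    # No utility selected: show usage instead of doing nothing silently.
    parser.print_help()
    return "help"


main_sketch(["--tables"])
```

Passing argv explicitly keeps the function testable; when argv is None, argparse reads the real command-line arguments.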

ndj_pipeline.utils.parse_schema_to_table(schema)

Parses a table schema into an HTML table for use in documentation.

Return type

str
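The schema format is not specified here, so the following sketch makes one up for illustration: a dict with a "name" and a list of "fields", each carrying "name", "type" and "description". Both that layout and the function name schema_to_html_sketch are assumptions.

```python
from html import escape
from typing import Any, Dict

COLUMNS = ("name", "type", "description")


def schema_to_html_sketch(schema: Dict[str, Any]) -> str:
    """Render an assumed schema dict as an HTML table string."""
    header = "<tr>" + "".join(f"<th>{column}</th>" for column in COLUMNS) + "</tr>"
    rows = []
    for field in schema.get("fields", []):
        # Escape every value so characters like "<" cannot break the markup.
        cells = "".join(f"<td>{escape(str(field.get(column, '')))}</td>" for column in COLUMNS)
        rows.append(f"<tr>{cells}</tr>")
    caption = f"<caption>{escape(str(schema.get('name', '')))}</caption>"
    return "<table>" + caption + header + "".join(rows) + "</table>"


example_schema = {
    "name": "users",
    "fields": [{"name": "id", "type": "int", "description": "a < b"}],
}
print(schema_to_html_sketch(example_schema))
```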