Configurations

Melusine components can be instantiated using parameters defined in configurations. The from_config method accepts a config_dict argument.

from melusine.processors import Normalizer

normalizer_conf = {
    "input_columns": ["text"],
    "output_columns": ["normalized_text"],
    "form": "NFKD",
    "lowercase": False,
}

normalizer = Normalizer.from_config(config_dict=normalizer_conf)

Alternatively, from_config accepts a config_key argument.

from melusine.pipeline import MelusinePipeline

pipeline = MelusinePipeline.from_config(config_key="demo_pipeline")

When config_key="demo_pipeline" is passed, the parameters are read from the melusine.config object at the key demo_pipeline.
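The two ways of calling from_config can be pictured with a small toy class. This is only a sketch of the general pattern, not Melusine's implementation: ToyNormalizer, CONFIG_STORE, and the rule that exactly one argument must be given are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the melusine.config object (illustration only)
CONFIG_STORE: Dict[str, Dict[str, Any]] = {
    "demo_normalizer": {"form": "NFKD", "lowercase": False},
}


class ToyNormalizer:
    """Toy component, not Melusine's Normalizer."""

    def __init__(self, form: str, lowercase: bool) -> None:
        self.form = form
        self.lowercase = lowercase

    @classmethod
    def from_config(
        cls,
        config_key: Optional[str] = None,
        config_dict: Optional[Dict[str, Any]] = None,
    ) -> "ToyNormalizer":
        # Assumption of this sketch: exactly one source of parameters is given
        if (config_key is None) == (config_dict is None):
            raise ValueError("Provide exactly one of config_key or config_dict")
        if config_key is not None:
            # Look the parameters up in the central store
            config_dict = CONFIG_STORE[config_key]
        return cls(**config_dict)


from_key = ToyNormalizer.from_config(config_key="demo_normalizer")
from_dict = ToyNormalizer.from_config(config_dict={"form": "NFC", "lowercase": True})
print(from_key.form, from_dict.form)
```

Both calls end up passing a plain dict of parameters to the constructor. The only difference is where that dict comes from.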

Access Configurations

Melusine configurations can be accessed through the config object.

from melusine import config

print(config["demo_pipeline"])

The configuration of the demo_pipeline can then be easily inspected.

{
  'steps': [
    {'class_name': 'Cleaner', 'config_key': 'body_cleaner', 'module': 'melusine.processors'},
    {'class_name': 'Cleaner', 'config_key': 'header_cleaner', 'module': 'melusine.processors'},
    {'class_name': 'Segmenter', 'config_key': 'segmenter', 'module': 'melusine.processors'},
    {'class_name': 'ContentTagger', 'config_key': 'content_tagger', 'module': 'melusine.processors'},
    {'class_name': 'TextExtractor', 'config_key': 'text_extractor', 'module': 'melusine.processors'},
    {'class_name': 'Normalizer', 'config_key': 'demo_normalizer', 'module': 'melusine.processors'},
    {'class_name': 'EmergencyDetector', 'config_key': 'emergency_detector', 'module': 'melusine.detectors'}
  ]
}
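Each step in this structure names a class, the module it lives in, and the configuration key holding its parameters. As a minimal sketch, the structure can be summarized with plain dict and list operations. The dict below is a trimmed copy of the output above, and summarize_steps is a hypothetical helper, not part of Melusine.

```python
from typing import Any, Dict, List

# Trimmed copy of the demo_pipeline configuration printed above
demo_pipeline_conf: Dict[str, Any] = {
    "steps": [
        {"class_name": "Cleaner", "config_key": "body_cleaner", "module": "melusine.processors"},
        {"class_name": "Normalizer", "config_key": "demo_normalizer", "module": "melusine.processors"},
        {"class_name": "EmergencyDetector", "config_key": "emergency_detector", "module": "melusine.detectors"},
    ]
}


def summarize_steps(pipeline_conf: Dict[str, Any]) -> List[str]:
    """Return one 'module.ClassName <- config_key' string per pipeline step."""
    return [
        f"{step['module']}.{step['class_name']} <- {step['config_key']}"
        for step in pipeline_conf["steps"]
    ]


for line in summarize_steps(demo_pipeline_conf):
    print(line)
```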

Modify Configurations

The simplest way to modify configurations is to build a new dictionary from the existing configuration and reset Melusine with it.

from melusine import config

# Get a dict of the existing conf
new_conf = config.dict()

# Add/Modify a config key
new_conf["my_conf_key"] = "my_conf_value"

# Reset Melusine configurations
config.reset(new_conf)
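When the value to change is nested inside an existing key, it can be safer to work on a deep copy so that the original dict is left untouched. The sketch below uses a plain dict as a stand-in for the result of config.dict(). The contents of demo_normalizer shown here are hypothetical, and whether config.dict() already returns an independent copy is not assumed.

```python
import copy

# Stand-in for the dict returned by config.dict(); contents are hypothetical
existing_conf = {"demo_normalizer": {"form": "NFKD", "lowercase": False}}

# Deep copy so that nested dicts are not shared with the original
new_conf = copy.deepcopy(existing_conf)
new_conf["demo_normalizer"]["lowercase"] = True

# With Melusine, the next step would be: config.reset(new_conf)
print(existing_conf["demo_normalizer"]["lowercase"])
print(new_conf["demo_normalizer"]["lowercase"])
```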

In a production environment, configuration files should be preferred over modifying the configurations on the fly. Melusine lets you specify the path to a folder containing YAML files and loads them (the OmegaConf package is used behind the scenes).

from melusine import config

# Specify the path to a conf folder
conf_path = "path/to/conf/folder"

# Reset Melusine configurations
config.reset(config_path=conf_path)

# >> Using config_path : path/to/conf/folder
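To make the idea of a conf folder concrete, here is a minimal sketch that creates one with a single YAML file, using only the standard library. The file name my_normalizer.yaml and the key my_normalizer are hypothetical. The assumption that a top-level YAML key becomes a configuration key should be checked against the exported default configuration.

```python
import tempfile
from pathlib import Path

# Create a throwaway conf folder (a real project would use a fixed path)
conf_dir = Path(tempfile.mkdtemp()) / "conf"
conf_dir.mkdir()

# Hypothetical configuration: same parameters as the Normalizer example
yaml_text = (
    "my_normalizer:\n"
    "  input_columns:\n"
    "    - text\n"
    "  output_columns:\n"
    "    - normalized_text\n"
    "  form: NFKD\n"
    "  lowercase: false\n"
)
(conf_dir / "my_normalizer.yaml").write_text(yaml_text, encoding="utf-8")

# List the YAML files that a loader would find in the folder
yaml_files = sorted(p.name for p in conf_dir.glob("*.yaml"))
print(yaml_files)

# With Melusine, the next step would be: config.reset(config_path=str(conf_dir))
```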

When the MELUSINE_CONFIG_DIR environment variable is set, Melusine loads the configuration files directly from the path it specifies.

import os

from melusine import config

# Specify the MELUSINE_CONFIG_DIR environment variable
os.environ["MELUSINE_CONFIG_DIR"] = "path/to/conf/folder"

# Reset Melusine configurations
config.reset()

# >> Using config_path from env variable MELUSINE_CONFIG_DIR
# >> Using config_path : path/to/conf/folder

Tip

If the MELUSINE_CONFIG_DIR environment variable is set before melusine is imported (e.g., before starting the program), you don't need to call config.reset().
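The lookup of the configuration folder can be pictured with a small helper. This is a sketch under stated assumptions, not Melusine's code: resolve_config_dir is a hypothetical function, and the precedence it applies (explicit path first, then the environment variable, then defaults) is an assumption.

```python
import os
from typing import Optional


def resolve_config_dir(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return the conf folder to use, or None to fall back to defaults.

    Assumed precedence: explicit path, then MELUSINE_CONFIG_DIR.
    """
    if explicit_path is not None:
        return explicit_path
    return os.environ.get("MELUSINE_CONFIG_DIR")


os.environ["MELUSINE_CONFIG_DIR"] = "path/to/conf/folder"
print(resolve_config_dir())
print(resolve_config_dir("another/conf/folder"))
```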

Export Configurations

Creating your configuration folder from scratch would be cumbersome. It is advised to export the default configurations and then modify just the files you need.

from melusine import config

# Specify the path to a folder (created if it doesn't exist)
conf_path = "path/to/conf/folder"

# Export default configurations to the folder
files_created = config.export_default_config(path=conf_path)

Tip

The export_default_config method returns a list of paths to all the files created.