mapreader.classify.classifier

Module Contents

Classes

ClassifierContainer

class mapreader.classify.classifier.ClassifierContainer(model, labels_map, dataloaders=None, device='default', input_size=(224, 224), is_inception=False, load_path=None, force_device=False, **kwargs)
Parameters:
  • model (str | torch.nn.Module | None)

  • labels_map (dict[int, str] | None)

  • dataloaders (dict[str, torch.utils.data.DataLoader] | None)

  • device (str | None)

  • input_size (int | None)

  • is_inception (bool)

  • load_path (str | None)

  • force_device (bool | None)

generate_layerwise_lrs(min_lr, max_lr, spacing='linspace')

Calculates layer-wise learning rates for a given set of model parameters.

Parameters:
  • min_lr (float) – The minimum learning rate to be used.

  • max_lr (float) – The maximum learning rate to be used.

  • spacing (str, optional) – How to space the learning rates between min_lr and max_lr. Can be either "linspace" or "geomspace": "linspace" spaces the learning rates evenly over the interval, while "geomspace" spaces them evenly on a log scale (a geometric progression). By default "linspace".

Returns:

A list of dictionaries containing the parameters and learning rates for each layer.

Return type:

list of dicts
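
The two spacing options can be illustrated without PyTorch. The sketch below is not MapReader's implementation: the helper spaced_lrs, the layer names, and the "params"/"lr" dictionary keys are assumptions made for illustration. It only shows how "linspace" and "geomspace" distribute values between min_lr and max_lr, and what a list of per-layer dictionaries might look like.

```python
def spaced_lrs(min_lr, max_lr, n, spacing="linspace"):
    """Return n learning rates between min_lr and max_lr (hypothetical helper)."""
    if n == 1:
        return [min_lr]
    if spacing == "linspace":
        # Constant difference between consecutive values.
        step = (max_lr - min_lr) / (n - 1)
        return [min_lr + i * step for i in range(n)]
    if spacing == "geomspace":
        # Constant ratio between consecutive values (even spacing on a log scale).
        ratio = (max_lr / min_lr) ** (1 / (n - 1))
        return [min_lr * ratio**i for i in range(n)]
    raise ValueError(f"Unknown spacing: {spacing!r}")


# Hypothetical layer names, ordered from the first layer to the last.
layer_names = ["conv1", "layer1", "fc"]

linear = spaced_lrs(1e-4, 1e-2, len(layer_names), "linspace")
geometric = spaced_lrs(1e-4, 1e-2, len(layer_names), "geomspace")

# One dictionary per layer. The layer name stands in for the layer's
# parameters here; the key names are an assumption.
param_groups = [
    {"params": name, "lr": lr} for name, lr in zip(layer_names, geometric)
]
```

With three layers, the middle layer gets roughly 0.00505 under "linspace" and roughly 0.001 under "geomspace", so the geometric option keeps the earlier layers much closer to min_lr.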

initialize_optimizer(optim_type='adam', params2optimize='default', optim_param_dict=None, add_optim=True)

Initializes an optimizer for the model and adds it to the classifier object.

Parameters:
  • optim_type (str, optional) – The type of optimizer to use. Can be set to "adam" (default), "adamw", or "sgd".

  • params2optimize (str or iterable, optional) – The parameters to optimize. If set to "default", all model parameters that require gradients will be optimized. Default is "default".

  • optim_param_dict (dict, optional) – The parameters to pass to the optimizer constructor as a dictionary, by default {"lr": 1e-3}.

  • add_optim (bool, optional) – If True, adds the optimizer to the classifier object, by default True.

Returns:

optimizer – The initialized optimizer. Only returned if add_optim is set to False.

Return type:

torch.optim.Optimizer

Notes

If add_optim is True, the optimizer will be added to the classifier object.

Note that the first argument of an optimizer is the set of parameters to optimize, e.g. params2optimize = model_ft.parameters():

  • model_ft.parameters(): all parameters are being optimized

  • model_ft.fc.parameters(): only parameters of final layer are being optimized

Here, we use:

filter(lambda p: p.requires_grad, self.model.parameters())

add_optimizer(optimizer)

Add an optimizer to the classifier object.

Parameters:

optimizer (torch.optim.Optimizer) – The optimizer to add to the classifier object.

Return type:

None

initialize_scheduler(scheduler_type='steplr', scheduler_param_dict=None, add_scheduler=True)

Initializes a learning rate scheduler for the optimizer and adds it to the classifier object.

Parameters:
  • scheduler_type (str, optional) – The type of learning rate scheduler to use. Can be either "steplr" (default) or "onecyclelr".

  • scheduler_param_dict (dict, optional) – The parameters to pass to the scheduler constructor, by default {"step_size": 10, "gamma": 0.1}.

  • add_scheduler (bool, optional) – If True, adds the scheduler to the classifier object, by default True.

Raises:

ValueError – If the specified scheduler_type is not implemented.

Returns:

scheduler – The initialized learning rate scheduler. Only returned if add_scheduler is set to False.

Return type:

torch.optim.lr_scheduler._LRScheduler
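
As a worked example of the default parameters, and assuming that "steplr" behaves like PyTorch's StepLR, the learning rate is multiplied by gamma once every step_size epochs. The helper name steplr_value and the starting rate of 1e-3 below are illustrative, not part of the API.

```python
def steplr_value(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate after `epoch` epochs of step decay (illustrative helper)."""
    # The rate drops by a factor of gamma each time step_size epochs complete.
    return initial_lr * gamma ** (epoch // step_size)


# Learning rates at a few selected epochs, starting from 1e-3.
schedule = {epoch: steplr_value(1e-3, epoch) for epoch in (0, 9, 10, 25)}
```

Under these assumptions, epochs 0 to 9 run at 1e-3, epochs 10 to 19 at about 1e-4, and epoch 25 at about 1e-5.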

add_scheduler(scheduler)

Add a scheduler to the classifier object.

Parameters:

scheduler (torch.optim.lr_scheduler._LRScheduler) – The scheduler to add to the classifier object.

Raises:

ValueError – If no optimizer has been set. Use initialize_optimizer or add_optimizer to set an optimizer first.

Return type:

None

add_criterion(criterion='cross entropy')

Add a loss criterion to the classifier object.

Parameters:

criterion (str or torch.nn.modules.loss._Loss) – The loss criterion to add to the classifier object. Accepted string values are “cross entropy” or “ce” (cross-entropy), “bce” (binary cross-entropy) and “mse” (mean squared error).

Returns:

The function only modifies the criterion attribute of the classifier and does not return anything.

Return type:

None

model_summary(input_size=None, trainable_col=False, **kwargs)

Print a summary of the model.

Parameters:
  • input_size (tuple or list, optional) – The size of the input data. If None, input size is taken from “train” dataloader (self.dataloaders["train"]).

  • trainable_col (bool, optional) – If True, adds a column showing which parameters are trainable. Defaults to False.

  • **kwargs (Dict) – Keyword arguments to pass to torchinfo.summary() (see https://github.com/TylerYep/torchinfo).

Return type:

None

Notes

Other ways to check params:

sum(p.numel() for p in myclassifier.model.parameters())
sum(p.numel() for p in myclassifier.model.parameters()
    if p.requires_grad)

And:

for name, param in myclassifier.model.named_parameters():
    print(name, param.requires_grad)

freeze_layers(layers_to_freeze=None)

Freezes the specified layers in the neural network by setting requires_grad attribute to False for their parameters.

Parameters:

layers_to_freeze (list of str, optional) – List of names of the layers to freeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are frozen. Otherwise, only the parameters with an exact match to the layer name are frozen. By default, [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

Notes

Wildcards are accepted in the layers_to_freeze parameter.
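
The matching rule described above can be sketched on plain strings. This is an illustration of the documented rule, not the library's code: the helper select_parameter_names and the parameter names are hypothetical.

```python
def select_parameter_names(param_names, layers):
    """Return the parameter names matched by `layers` (hypothetical helper)."""
    selected = []
    for name in param_names:
        for layer in layers:
            if layer.endswith("*"):
                # Wildcard: match any parameter whose name contains the prefix.
                matched = layer[:-1] in name
            else:
                # No wildcard: the parameter name must match exactly.
                matched = name == layer
            if matched:
                selected.append(name)
                break
    return selected


# Illustrative parameter names only.
names = ["conv1.weight", "layer1.0.conv1.weight", "fc.weight", "fc.bias"]

frozen_wildcard = select_parameter_names(names, ["conv1*"])
frozen_exact = select_parameter_names(names, ["fc.weight"])
frozen_none = select_parameter_names(names, ["fc"])
```

Because the wildcard form is a "contains" test, "conv1*" selects both "conv1.weight" and "layer1.0.conv1.weight", while "fc" without an asterisk selects nothing since no parameter is named exactly "fc".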

unfreeze_layers(layers_to_unfreeze=None)

Unfreezes the specified layers in the neural network by setting requires_grad attribute to True for their parameters.

Parameters:

layers_to_unfreeze (list of str, optional) – List of names of the layers to unfreeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are unfrozen. Otherwise, only the parameters with an exact match to the layer name are unfrozen. By default, [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

Notes

Wildcards are accepted in the layers_to_unfreeze parameter.

only_keep_layers(only_keep_layers_list=None)

Only keep the specified layers (only_keep_layers_list) for gradient computation during backpropagation.

Parameters:

only_keep_layers_list (list, optional) – List of layer names to keep. All other layers will have their gradient computation turned off. Default is [].

Returns:

The function only modifies the requires_grad attribute of the specified parameters and does not return anything.

Return type:

None

inference(set_name='infer', verbose=False, print_info_batch_freq=5)

Run inference on a specified dataset (set_name).

Parameters:
  • set_name (str, optional) – The name of the dataset to run inference on, by default "infer".

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

  • print_info_batch_freq (int, optional) – The frequency of printouts, by default 5.

Return type:

None

Notes

This method calls the train() method of this class with num_epochs set to 1 and all the other parameters specified in the function arguments.

train_component_summary()

Print a summary of the optimizer, criterion, and trainable model components.

Returns:

None

Return type:

None

train(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, remove_after_load=True, print_info_batch_freq=5)

Train the model on the specified phases for a given number of epochs.

Wrapper around the train_core() method of this class that captures exceptions (KeyboardInterrupt is currently the only supported exception).

Parameters:
  • phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].

  • num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.

  • save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

  • tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.

  • tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.

  • remove_after_load (bool, optional) – Whether to remove the temporary file after loading it. Default is True.

  • print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.

Returns:

The function saves the model to the save_model_dir directory, and optionally to a temporary file. If interrupted with a KeyboardInterrupt, the function tries to load the temporary file. If no temporary file is found, it continues without loading.

Return type:

None

Notes

Refer to the documentation of train_core() for more information.
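
The interrupt handling described under Returns can be pictured with a small stand-alone sketch. Everything here is hypothetical (the helper run_with_checkpoint_fallback, the callables passed to it, and the temporary file name); it only mirrors the described behaviour: on KeyboardInterrupt, load the temporary file if it exists, otherwise continue without loading.

```python
import os
import tempfile


def run_with_checkpoint_fallback(train_fn, load_fn, tmp_path, remove_after_load=True):
    """Run train_fn; on Ctrl+C, try to restore from tmp_path (hypothetical helper)."""
    try:
        train_fn()
        return "completed"
    except KeyboardInterrupt:
        if not os.path.isfile(tmp_path):
            # No temporary file: continue without loading anything.
            return "interrupted, nothing to load"
        load_fn(tmp_path)
        if remove_after_load:
            os.remove(tmp_path)
        return "interrupted, checkpoint loaded"


def interrupted_training():
    # Stand-in for a training loop stopped by the user.
    raise KeyboardInterrupt


loaded = []  # records which paths the stand-in loader was asked to load
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = os.path.join(tmp_dir, "tmp_checkpoint.obj")
    with open(checkpoint, "w") as handle:
        handle.write("placeholder state")
    with_file = run_with_checkpoint_fallback(interrupted_training, loaded.append, checkpoint)
    file_removed = not os.path.exists(checkpoint)
    without_file = run_with_checkpoint_fallback(interrupted_training, loaded.append, checkpoint)
```

The first call finds the temporary file, loads it and removes it; the second call finds nothing and simply returns.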

train_core(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, print_info_batch_freq=5)

Trains/fine-tunes a classifier for the specified number of epochs on the given phases using the specified hyperparameters.

Parameters:
  • phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].

  • num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.

  • save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.

  • verbose (bool, optional) – Whether to print verbose outputs, by default False.

  • tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.

  • tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.

  • print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.

Raises:
  • ValueError

    If the criterion is not set. Use the add_criterion method to set the criterion.

    If the optimizer is not set and the phase is “train”. Use the initialize_optimizer or add_optimizer method to set the optimizer.

  • KeyError – If the specified phase cannot be found in the keys of the object’s dataloaders dictionary property.

Return type:

None

calculate_add_metrics(y_true, y_pred, y_score, phase, epoch=-1, tboard_writer=None)

Calculate and add metrics to the classifier’s metrics dictionary.

Parameters:
  • y_true (array-like of shape (n_samples,)) – True binary labels or multiclass labels. Can be considered ground truth or (correct) target values.

  • y_pred (array-like of shape (n_samples,)) – Predicted binary labels or multiclass labels. The estimated targets as returned by a classifier.

  • y_score (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class. Only required when y_pred is not binary.

  • phase (str) – Name of the current phase, typically "train" or "val". See train function.

  • epoch (int, optional) – Current epoch number. Default is -1.

  • tboard_writer (object, optional) – TensorBoard SummaryWriter object to write the metrics. Default is None.

Return type:

None

Notes

This method uses both the sklearn.metrics.precision_recall_fscore_support and sklearn.metrics.roc_auc_score functions from scikit-learn to calculate the metrics for each average type ("micro", "macro" and "weighted"). The results are then added to the metrics dictionary. It also writes the metrics to the TensorBoard SummaryWriter, if tboard_writer is not None.

plot_metric(y_axis, y_label, legends, x_axis='epoch', x_label='epoch', colors=5 * ['k', 'tab:red'], styles=10 * ['-'], markers=10 * ['o'], figsize=(10, 5), plt_yrange=None, plt_xrange=None)

Plot the metrics of the classifier object.

Parameters:
  • y_axis (list of str) – A list of metric names to be plotted on the y-axis.

  • y_label (str) – The label for the y-axis.

  • legends (list of str) – The legend labels for each metric.

  • x_axis (str, optional) – The metric to be used as the x-axis. Can be "epoch" (default) or any other metric name present in the dataset.

  • x_label (str, optional) – The label for the x-axis. Defaults to "epoch".

  • colors (list of str, optional) – The colors to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 5 * ["k", "tab:red"].

  • styles (list of str, optional) – The line styles to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["-"].

  • markers (list of str, optional) – The markers to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["o"].

  • figsize (tuple of int, optional) – The size of the figure in inches. Defaults to (10, 5).

  • plt_yrange (tuple of float, optional) – The range of values for the y-axis. Defaults to None.

  • plt_xrange (tuple of float, optional) – The range of values for the x-axis. Defaults to None.

Return type:

None

Notes

This function requires the matplotlib package.
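
The default style arguments are list-repetition expressions, so each default holds ten entries. The sketch below expands them and adds a hypothetical length check (check_style_lengths is not part of the API, and the metric names are placeholders) to show when custom lists are needed.

```python
default_colors = 5 * ["k", "tab:red"]  # 10 entries, alternating black and red
default_styles = 10 * ["-"]            # 10 solid-line styles
default_markers = 10 * ["o"]           # 10 circle markers


def check_style_lengths(y_axis, colors, styles, markers):
    """Raise if any style list is shorter than y_axis (hypothetical check)."""
    for label, values in (("colors", colors), ("styles", styles), ("markers", markers)):
        if len(values) < len(y_axis):
            raise ValueError(
                f"{label} has {len(values)} entries for {len(y_axis)} metrics"
            )


# Placeholder metric names; two metrics fit comfortably within the defaults.
metrics = ["metric_train", "metric_val"]
check_style_lengths(metrics, default_colors, default_styles, default_markers)
```

Plotting more than ten metrics at once would therefore require passing longer colors, styles and markers lists.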

show_sample(set_name='train', batch_number=1, print_batch_info=True, figsize=(15, 10))

Displays a sample of training or validation data in a grid format with their corresponding class labels.

Parameters:
  • set_name (str, optional) – Name of the dataset ("train"/"validation") to display the sample from, by default "train".

  • batch_number (int, optional) – Which batch to display, by default 1.

  • print_batch_info (bool, optional) – Whether to print information about the batch size, by default True.

  • figsize (tuple, optional) – Figure size (width, height) in inches, by default (15, 10).

Returns:

Displays the sample images with their corresponding class labels.

Return type:

None

Raises:

StopIteration – If the specified number of batches to display exceeds the total number of batches in the dataset.

Notes

This method uses the dataloader of the ImageClassifierData class and the torchvision.utils.make_grid function to display the sample data in a grid format. It also calls the _imshow method of the ImageClassifierData class to show the sample data.

print_batch_info(set_name='train')

Print information about a dataset’s batches, samples, and batch size.

Parameters:

set_name (str, optional) – Name of the dataset to display batch information for (default is "train").

Return type:

None

show_inference_sample_results(label, num_samples=6, set_name='test', min_conf=None, max_conf=None, figsize=(15, 15))

Shows a sample of the results of the inference.

Parameters:
  • label (str) – The label for which to display results.

  • num_samples (int, optional) – The number of sample results to display. Defaults to 6.

  • set_name (str, optional) – The name of the dataset split to use for inference. Defaults to "test".

  • min_conf (float, optional) – The minimum confidence score for a sample result to be displayed. Samples with lower confidence scores will be skipped. Defaults to None.

  • max_conf (float, optional) – The maximum confidence score for a sample result to be displayed. Samples with higher confidence scores will be skipped. Defaults to None.

  • figsize (tuple[int, int], optional) – Figure size (width, height) in inches, displaying the sample results. Defaults to (15, 15).

Return type:

None
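
The effect of min_conf and max_conf can be sketched as a simple filter. The helper within_confidence and the sample data are hypothetical; the sketch assumes that None means "no bound" and that values equal to a bound are kept.

```python
def within_confidence(conf, min_conf=None, max_conf=None):
    """True if conf passes both optional bounds (hypothetical helper)."""
    if min_conf is not None and conf < min_conf:
        return False  # below the minimum: skipped
    if max_conf is not None and conf > max_conf:
        return False  # above the maximum: skipped
    return True


# Made-up (patch id, confidence) pairs.
predictions = [("patch_a", 0.55), ("patch_b", 0.80), ("patch_c", 0.99)]

# Keep only predictions with confidence between 0.5 and 0.9.
uncertain = [pid for pid, conf in predictions if within_confidence(conf, 0.5, 0.9)]
```

Setting only max_conf is one way to inspect low-confidence predictions for a label, for example when looking for likely misclassifications.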

save(save_path='default.obj', force=False)

Save the object to a file.

Parameters:
  • save_path (str, optional) – The path to the file to write. If the file already exists and force is not True, a FileExistsError is raised. Defaults to "default.obj".

  • force (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.

Raises:

FileExistsError – If the file already exists and force is not True.

Return type:

None

Notes

The object is saved in two parts. First, a serialized copy of the object’s dictionary is written to the specified file using the joblib.dump function. The object’s model attribute is excluded from this dictionary and saved separately using the torch.save function, with a filename derived from the original save_path.
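
The overwrite guard controlled by force can be sketched as follows. This is a minimal illustration using plain text files, not the library's joblib/torch-based implementation; guarded_write is a hypothetical helper.

```python
import os
import tempfile


def guarded_write(save_path, payload, force=False):
    """Write payload to save_path, refusing to overwrite unless force=True."""
    if os.path.exists(save_path) and not force:
        raise FileExistsError(f"{save_path} already exists; pass force=True to overwrite")
    with open(save_path, "w") as handle:
        handle.write(payload)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "default.obj")
    guarded_write(path, "first")  # file does not exist yet: written
    try:
        guarded_write(path, "second")  # exists and force=False: refused
        refused = False
    except FileExistsError:
        refused = True
    guarded_write(path, "third", force=True)  # exists but force=True: overwritten
    with open(path) as handle:
        final_content = handle.read()
```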

save_predictions(set_name, save_path=None, delimiter=',')
Parameters:
  • set_name (str)

  • save_path (str | None)

  • delimiter (str)

load_dataset(dataset, set_name, batch_size=16, sampler=None, shuffle=False, num_workers=0, **kwargs)

Creates a DataLoader from a PatchDataset and adds it to the dataloaders dictionary.

Parameters:
  • dataset (PatchDataset) – The dataset to add

  • set_name (str) – The name to use for the dataset

  • batch_size (Optional[int], optional) – The batch size to use when creating the DataLoader, by default 16

  • sampler (Optional[Union[Sampler, None]], optional) – The sampler to use when creating the DataLoader, by default None

  • shuffle (Optional[bool], optional) – Whether to shuffle the PatchDataset, by default False

  • num_workers (Optional[int], optional) – The number of worker subprocesses to use for loading data, by default 0 (data is loaded in the main process).

Return type:

None

load(load_path, force_device=False)

This function loads the state of a class instance from a saved file using the joblib library. It also loads a PyTorch model from a separate file and maps it to the device used to load the class instance.

Parameters:
  • load_path (str) – Path to the saved file to load.

  • force_device (bool or str, optional) – Whether to force the use of a specific device, or the name of the device to use. If set to True, the default device is used. Defaults to False.

Raises:

FileNotFoundError – If the specified file does not exist.

Return type:

None

cprint(type_info, bc_color, text)

Print colored text with additional information.

Parameters:
  • type_info (str) – The type of message to display.

  • bc_color (str) – The color to use for the message text.

  • text (str) – The text to display.

Returns:

The colored message is displayed on the standard output stream.

Return type:

None

update_progress(progress, text='', barLength=30)

Update the progress bar.

Parameters:
  • progress (float or int) – The progress value to display, between 0 and 1. If an integer is provided, it will be converted to a float. If a value outside the range [0, 1] is provided, it will be clamped to the nearest valid value.

  • text (str, optional) – Additional text to display after the progress bar, defaults to "".

  • barLength (int, optional) – The length of the progress bar in characters, defaults to 30.

Raises:

TypeError – If progress is not a floating point value or an integer.

Returns:

The progress bar is displayed on the standard output stream.

Return type:

None
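
The type check and clamping described for progress can be sketched as below. The helper render_progress and its output format are assumptions made for illustration; the real method prints to standard output and its exact bar layout is not specified here.

```python
def render_progress(progress, text="", bar_length=30):
    """Return a one-line progress bar string (hypothetical helper)."""
    if not isinstance(progress, (int, float)):
        raise TypeError("progress must be a float or an int")
    # Convert integers to float and clamp to the range [0, 1].
    progress = min(max(float(progress), 0.0), 1.0)
    filled = int(round(bar_length * progress))
    bar = "#" * filled + "-" * (bar_length - filled)
    return f"[{bar}] {progress * 100:.0f}% {text}".rstrip()


half = render_progress(0.5, bar_length=10)
clamped_high = render_progress(2, "done", bar_length=10)
clamped_low = render_progress(-1, bar_length=4)
```

In this sketch an out-of-range value such as 2 is rendered as a full bar, and a non-numeric value such as "half" raises TypeError.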