mapreader.classify.classifier
Module Contents
Classes
- class mapreader.classify.classifier.ClassifierContainer(model, labels_map, dataloaders=None, device='default', input_size=(224, 224), is_inception=False, load_path=None, force_device=False, **kwargs)
- Parameters:
model (str | torch.nn.Module | None)
labels_map (dict[int, str] | None)
dataloaders (dict[str, torch.utils.data.DataLoader] | None)
device (str | None)
input_size (int | None)
is_inception (bool)
load_path (str | None)
force_device (bool | None)
- generate_layerwise_lrs(min_lr, max_lr, spacing='linspace')
Calculates layer-wise learning rates for a given set of model parameters.
- Parameters:
min_lr (float) – The minimum learning rate to be used.
max_lr (float) – The maximum learning rate to be used.
spacing (str, optional) – The type of sequence to use for spacing the learning rates over the specified interval. Can be either "linspace" or "geomspace", where "linspace" uses evenly spaced learning rates and "geomspace" uses learning rates spaced evenly on a log scale (a geometric progression). By default "linspace".
- Returns:
A list of dictionaries containing the parameters and learning rates for each layer.
- Return type:
list of dicts
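The difference between the two spacing options can be illustrated with a small pure-Python sketch. This is not MapReader's implementation; the helper name spaced_learning_rates and its n_layers argument are hypothetical, and the sketch only returns the learning rates rather than the per-layer dictionaries described above.

```python
def spaced_learning_rates(n_layers, min_lr, max_lr, spacing="linspace"):
    """Return n_layers learning rates between min_lr and max_lr (inclusive)."""
    if n_layers < 2:
        return [min_lr]
    if spacing == "linspace":
        # Constant difference between consecutive learning rates.
        step = (max_lr - min_lr) / (n_layers - 1)
        return [min_lr + i * step for i in range(n_layers)]
    if spacing == "geomspace":
        # Constant ratio between consecutive learning rates.
        ratio = (max_lr / min_lr) ** (1 / (n_layers - 1))
        return [min_lr * ratio**i for i in range(n_layers)]
    raise ValueError(f"Unknown spacing: {spacing!r}")


linear = spaced_learning_rates(3, 1e-4, 1e-2, spacing="linspace")
geometric = spaced_learning_rates(3, 1e-4, 1e-2, spacing="geomspace")
print(linear)     # middle value is the arithmetic mean of the endpoints
print(geometric)  # middle value is the geometric mean of the endpoints
```

With "geomspace", the intermediate layers receive learning rates much closer to min_lr than with "linspace", which matters when the two bounds differ by orders of magnitude.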
- initialize_optimizer(optim_type='adam', params2optimize='default', optim_param_dict=None, add_optim=True)
Initializes an optimizer for the model and adds it to the classifier object.
- Parameters:
optim_type (str, optional) – The type of optimizer to use. Can be set to "adam" (default), "adamw", or "sgd".
params2optimize (str or iterable, optional) – The parameters to optimize. If set to "default", all model parameters that require gradients will be optimized. Default is "default".
optim_param_dict (dict, optional) – The parameters to pass to the optimizer constructor as a dictionary, by default {"lr": 1e-3}.
add_optim (bool, optional) – If True, adds the optimizer to the classifier object, by default True.
- Returns:
optimizer – The initialized optimizer. Only returned if add_optim is set to False.
- Return type:
torch.optim.Optimizer
Notes
If add_optim is True, the optimizer will be added to the classifier object.
Note that the first argument of an optimizer is the parameters to optimize, e.g. params2optimize = model_ft.parameters():
model_ft.parameters(): all parameters are being optimized
model_ft.fc.parameters(): only parameters of the final layer are being optimized
Here, we use:
filter(lambda p: p.requires_grad, self.model.parameters())
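As a rough illustration of the filter expression in the notes above, the following sketch builds an optimizer directly with PyTorch. The two-layer toy model is a hypothetical stand-in for a classifier's model, and the learning rate mirrors the documented default of {"lr": 1e-3}; it does not go through ClassifierContainer itself.

```python
import torch
from torch import nn

# Toy two-layer model standing in for a classifier's model (hypothetical).
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Freeze the first layer so that only the final layer requires gradients.
for param in model[0].parameters():
    param.requires_grad = False

# Same filter as in the notes above: keep only trainable parameters.
trainable_params = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.Adam(trainable_params, lr=1e-3)

# Count the tensors handed to the optimizer.
n_optimized = sum(len(group["params"]) for group in optimizer.param_groups)
print(n_optimized)  # weight and bias of the final layer only
```

Frozen parameters are never passed to the optimizer, so they are neither updated nor tracked in its state.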
- add_optimizer(optimizer)
Add an optimizer to the classifier object.
- Parameters:
optimizer (torch.optim.Optimizer) – The optimizer to add to the classifier object.
- Return type:
None
- initialize_scheduler(scheduler_type='steplr', scheduler_param_dict=None, add_scheduler=True)
Initializes a learning rate scheduler for the optimizer and adds it to the classifier object.
- Parameters:
scheduler_type (str, optional) – The type of learning rate scheduler to use. Can be either "steplr" (default) or "onecyclelr".
scheduler_param_dict (dict, optional) – The parameters to pass to the scheduler constructor, by default {"step_size": 10, "gamma": 0.1}.
add_scheduler (bool, optional) – If True, adds the scheduler to the classifier object, by default True.
- Raises:
ValueError – If the specified scheduler_type is not implemented.
- Returns:
scheduler – The initialized learning rate scheduler. Only returned if add_scheduler is set to False.
- Return type:
torch.optim.lr_scheduler._LRScheduler
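To make the default scheduler parameters concrete, here is a minimal sketch that uses torch.optim.lr_scheduler.StepLR directly with step_size=10 and gamma=0.1. It assumes "steplr" corresponds to PyTorch's StepLR; the toy model, the SGD optimizer and the starting learning rate of 1e-3 are illustrative choices only.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(20):
    # Record the learning rate in effect at the start of this epoch.
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # no gradients were computed, so nothing is updated
    scheduler.step()   # advance the schedule by one epoch

print(lrs[9], lrs[10])  # learning rate before and after the first decay
```

The learning rate stays constant for step_size epochs and is then multiplied by gamma.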
- add_scheduler(scheduler)
Add a scheduler to the classifier object.
- Parameters:
scheduler (torch.optim.lr_scheduler._LRScheduler) – The scheduler to add to the classifier object.
- Raises:
ValueError – If no optimizer has been set. Use initialize_optimizer or add_optimizer to set an optimizer first.
- Return type:
None
- add_criterion(criterion='cross entropy')
Add a loss criterion to the classifier object.
- Parameters:
criterion (str or torch.nn.modules.loss._Loss) – The loss criterion to add to the classifier object. Accepted string values are "cross entropy" or "ce" (cross-entropy), "bce" (binary cross-entropy) and "mse" (mean squared error).
- Returns:
The function only modifies the criterion attribute of the classifier and does not return anything.
- Return type:
None
- model_summary(input_size=None, trainable_col=False, **kwargs)
Print a summary of the model.
- Parameters:
input_size (tuple or list, optional) – The size of the input data. If None, input size is taken from the "train" dataloader (self.dataloaders["train"]).
trainable_col (bool, optional) – If True, adds a column showing which parameters are trainable. Defaults to False.
**kwargs (Dict) – Keyword arguments to pass to torchinfo.summary() (see https://github.com/TylerYep/torchinfo).
- Return type:
None
Notes
Other ways to check params:
sum(p.numel() for p in myclassifier.model.parameters())
sum(p.numel() for p in myclassifier.model.parameters() if p.requires_grad)
And:
for name, param in self.model.named_parameters():
    n = name.split(".")[0].split("_")[0]
    print(name, param.requires_grad)
- freeze_layers(layers_to_freeze=None)
Freezes the specified layers in the neural network by setting the requires_grad attribute to False for their parameters.
- Parameters:
layers_to_freeze (list of str, optional) – List of names of the layers to freeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are frozen. Otherwise, only the parameters with an exact match to the layer name are frozen. By default, [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
Notes
Wildcards are accepted in the
layers_to_freeze
parameter.
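The matching rule described for layers_to_freeze can be sketched in plain Python. The helper matches_layer and the parameter names below are hypothetical; the sketch only reproduces the documented rule (trailing asterisk means "contains", otherwise exact match) and does not touch a real model.

```python
def matches_layer(param_name, layer_pattern):
    """Apply the documented rule: trailing '*' means 'contains', else exact match."""
    if layer_pattern.endswith("*"):
        return layer_pattern[:-1] in param_name
    return param_name == layer_pattern


# Hypothetical parameter names, in the style of model.named_parameters().
param_names = [
    "conv1.weight",
    "layer1.0.conv1.weight",
    "layer1.0.bn1.bias",
    "fc.weight",
    "fc.bias",
]
layers_to_freeze = ["layer1*", "fc.bias"]

frozen = [
    name
    for name in param_names
    if any(matches_layer(name, pattern) for pattern in layers_to_freeze)
]
print(frozen)  # ['layer1.0.conv1.weight', 'layer1.0.bn1.bias', 'fc.bias']
```

Because the wildcard rule is a substring test rather than a prefix test, a pattern such as "conv1*" would match both "conv1.weight" and "layer1.0.conv1.weight" in this sketch.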
- unfreeze_layers(layers_to_unfreeze=None)
Unfreezes the specified layers in the neural network by setting the requires_grad attribute to True for their parameters.
- Parameters:
layers_to_unfreeze (list of str, optional) – List of names of the layers to unfreeze. If a layer name ends with an asterisk ("*"), then all parameters whose name contains the layer name (excluding the asterisk) are unfrozen. Otherwise, only the parameters with an exact match to the layer name are unfrozen. By default, [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
Notes
Wildcards are accepted in the
layers_to_unfreeze
parameter.
- only_keep_layers(only_keep_layers_list=None)
Only keep the specified layers (only_keep_layers_list) for gradient computation during backpropagation.
- Parameters:
only_keep_layers_list (list, optional) – List of layer names to keep. All other layers will have their gradient computation turned off. Default is [].
- Returns:
The function only modifies the requires_grad attribute of the specified parameters and does not return anything.
- Return type:
None
- inference(set_name='infer', verbose=False, print_info_batch_freq=5)
Run inference on a specified dataset (set_name).
- Parameters:
set_name (str, optional) – The name of the dataset to run inference on, by default "infer".
verbose (bool, optional) – Whether to print verbose outputs, by default False.
print_info_batch_freq (int, optional) – The frequency of printouts, by default 5.
- Return type:
None
Notes
This method calls the mapreader.train.classifier.classifier.train() method with num_epochs set to 1 and all the other parameters specified in the function arguments.
- train_component_summary()
Print a summary of the optimizer, criterion, and trainable model components.
Returns:
None
- Return type:
None
- train(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, remove_after_load=True, print_info_batch_freq=5)
Train the model on the specified phases for a given number of epochs.
Wrapper function for the mapreader.train.classifier.classifier.train_core() method to capture exceptions (KeyboardInterrupt is the only supported exception currently).
- Parameters:
phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].
num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.
save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.
verbose (int, optional) – Whether to print verbose outputs, by default False.
tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.
tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.
remove_after_load (bool, optional) – Whether to remove the temporary file after loading it. Default is True.
print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.
- Returns:
The function saves the model to the save_model_dir directory, and optionally to a temporary file. If interrupted with a KeyboardInterrupt, the function tries to load the temporary file. If no temporary file is found, it continues without loading.
- Return type:
None
Notes
Refer to the documentation of
mapreader.train.classifier.classifier.train_core()
for more information.
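The interrupt-handling behaviour described for train() can be sketched as a generic wrapper. Everything here is an assumption for illustration: train_with_recovery, load_checkpoint, interrupted_training and the temporary file name are hypothetical and are not part of MapReader; the sketch only mirrors the documented flow (catch KeyboardInterrupt, load the temporary file if it exists, optionally remove it afterwards).

```python
import os
import tempfile


def train_with_recovery(train_core, load_checkpoint, tmp_path, remove_after_load=True):
    """Run train_core(); on Ctrl+C, fall back to the temporary checkpoint if present."""
    try:
        train_core()
        return "finished"
    except KeyboardInterrupt:
        if not os.path.isfile(tmp_path):
            # No temporary file found: continue without loading anything.
            return "interrupted"
        load_checkpoint(tmp_path)
        if remove_after_load:
            os.remove(tmp_path)
        return "restored"


def interrupted_training():
    # Stand-in for a training loop that the user interrupts.
    raise KeyboardInterrupt


loaded = []
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, "tmp_checkpoint.bin")
    without_file = train_with_recovery(interrupted_training, loaded.append, tmp_path)
    with open(tmp_path, "wb") as handle:
        handle.write(b"placeholder weights")
    with_file = train_with_recovery(interrupted_training, loaded.append, tmp_path)
    file_removed = not os.path.exists(tmp_path)

print(without_file, with_file, file_removed)
```

Only KeyboardInterrupt is caught in this sketch, matching the statement above that it is the only supported exception; any other error raised by the training loop propagates to the caller.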
- train_core(phases=None, num_epochs=25, save_model_dir='models', verbose=False, tensorboard_path=None, tmp_file_save_freq=2, print_info_batch_freq=5)
Trains/fine-tunes a classifier for the specified number of epochs on the given phases using the specified hyperparameters.
- Parameters:
phases (list of str, optional) – The phases to run through during each training iteration. Default is ["train", "val"].
num_epochs (int, optional) – The number of epochs to train the model for. Default is 25.
save_model_dir (str or None, optional) – The directory to save the model in. Default is "models". If set to None, the model is not saved.
verbose (bool, optional) – Whether to print verbose outputs, by default False.
tensorboard_path (str or None, optional) – The path to the directory to save TensorBoard logs in. If set to None, no TensorBoard logs are saved. Default is None.
tmp_file_save_freq (int, optional) – The frequency (in epochs) to save a temporary file of the model. Default is 2. If set to 0 or None, no temporary file is saved.
print_info_batch_freq (int, optional) – The frequency (in batches) to print training information. Default is 5. If set to 0 or None, no training information is printed.
- Raises:
ValueError – If the criterion is not set. Use the add_criterion method to set the criterion.
ValueError – If the optimizer is not set and the phase is "train". Use the initialize_optimizer or add_optimizer method to set the optimizer.
KeyError – If the specified phase cannot be found in the keys of the object's dataloaders dictionary property.
- Return type:
None
- calculate_add_metrics(y_true, y_pred, y_score, phase, epoch=-1, tboard_writer=None)
Calculate and add metrics to the classifier’s metrics dictionary.
- Parameters:
y_true (array-like of shape (n_samples,)) – True binary labels or multiclass labels. Can be considered ground truth or (correct) target values.
y_pred (array-like of shape (n_samples,)) – Predicted binary labels or multiclass labels. The estimated targets as returned by a classifier.
y_score (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class. Only required when y_pred is not binary.
phase (str) – Name of the current phase, typically "train" or "val". See the train function.
epoch (int, optional) – Current epoch number. Default is -1.
tboard_writer (object, optional) – TensorBoard SummaryWriter object to write the metrics. Default is None.
- Return type:
None
Notes
This method uses both the sklearn.metrics.precision_recall_fscore_support and sklearn.metrics.roc_auc_score functions from scikit-learn to calculate the metrics for each average type ("micro", "macro" and "weighted"). The results are then added to the metrics dictionary. It also writes the metrics to the TensorBoard SummaryWriter, if tboard_writer is not None.
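The three average types can give different values on imbalanced data. The sketch below computes precision by hand for each of them, purely as an illustration; it is not how the method computes its metrics (that is done with scikit-learn, as noted above), and the function name precision_by_average and the toy labels are hypothetical.

```python
def precision_by_average(y_true, y_pred):
    """Hand-rolled single-label precision under micro, macro and weighted averaging."""
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = {c: 0 for c in classes}
    predicted = {c: 0 for c in classes}
    support = {c: 0 for c in classes}
    for truth, pred in zip(y_true, y_pred):
        predicted[pred] += 1
        support[truth] += 1
        if truth == pred:
            true_pos[pred] += 1

    per_class = {
        c: (true_pos[c] / predicted[c] if predicted[c] else 0.0) for c in classes
    }
    return {
        # micro: pool all decisions, then compute one precision.
        "micro": sum(true_pos.values()) / len(y_pred),
        # macro: unweighted mean of the per-class precisions.
        "macro": sum(per_class.values()) / len(classes),
        # weighted: per-class precisions weighted by class support.
        "weighted": sum(per_class[c] * support[c] for c in classes) / len(y_true),
    }


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
scores = precision_by_average(y_true, y_pred)
print(scores)
```

Here class 0 has precision 4/5 and class 1 has precision 1/1, so the macro average (0.9) is higher than the micro average (5/6), with the weighted average (about 0.867) in between.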
- plot_metric(y_axis, y_label, legends, x_axis='epoch', x_label='epoch', colors=5 * ['k', 'tab:red'], styles=10 * ['-'], markers=10 * ['o'], figsize=(10, 5), plt_yrange=None, plt_xrange=None)
Plot the metrics of the classifier object.
- Parameters:
y_axis (list of str) – A list of metric names to be plotted on the y-axis.
y_label (str) – The label for the y-axis.
legends (list of str) – The legend labels for each metric.
x_axis (str, optional) – The metric to be used as the x-axis. Can be "epoch" (default) or any other metric name present in the dataset.
x_label (str, optional) – The label for the x-axis. Defaults to "epoch".
colors (list of str, optional) – The colors to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 5 * ["k", "tab:red"].
styles (list of str, optional) – The line styles to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["-"].
markers (list of str, optional) – The markers to be used for the lines of each metric. It must be at least the same size as y_axis. Defaults to 10 * ["o"].
figsize (tuple of int, optional) – The size of the figure in inches. Defaults to (10, 5).
plt_yrange (tuple of float, optional) – The range of values for the y-axis. Defaults to None.
plt_xrange (tuple of float, optional) – The range of values for the x-axis. Defaults to None.
- Return type:
None
Notes
This function requires the
matplotlib
package.
- show_sample(set_name='train', batch_number=1, print_batch_info=True, figsize=(15, 10))
Displays a sample of training or validation data in a grid format with their corresponding class labels.
- Parameters:
set_name (str, optional) – Name of the dataset ("train"/"validation") to display the sample from, by default "train".
batch_number (int, optional) – Which batch to display, by default 1.
print_batch_info (bool, optional) – Whether to print information about the batch size, by default True.
figsize (tuple, optional) – Figure size (width, height) in inches, by default (15, 10).
- Returns:
Displays the sample images with their corresponding class labels.
- Return type:
None
- Raises:
StopIteration – If the specified number of batches to display exceeds the total number of batches in the dataset.
Notes
This method uses the dataloader of the ImageClassifierData class and the torchvision.utils.make_grid function to display the sample data in a grid format. It also calls the _imshow method of the ImageClassifierData class to show the sample data.
- print_batch_info(set_name='train')
Print information about a dataset’s batches, samples, and batch-size.
- Parameters:
set_name (str, optional) – Name of the dataset to display batch information for (default is "train").
- Return type:
None
- show_inference_sample_results(label, num_samples=6, set_name='test', min_conf=None, max_conf=None, figsize=(15, 15))
Shows a sample of the results of the inference.
- Parameters:
label (str, optional) – The label for which to display results.
num_samples (int, optional) – The number of sample results to display. Defaults to 6.
set_name (str, optional) – The name of the dataset split to use for inference. Defaults to "test".
min_conf (float, optional) – The minimum confidence score for a sample result to be displayed. Samples with lower confidence scores will be skipped. Defaults to None.
max_conf (float, optional) – The maximum confidence score for a sample result to be displayed. Samples with higher confidence scores will be skipped. Defaults to None.
figsize (tuple[int, int], optional) – Figure size (width, height) in inches, displaying the sample results. Defaults to (15, 15).
- Return type:
None
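The effect of min_conf and max_conf can be pictured as a simple filter over confidence scores. The helper within_confidence and the sample data are hypothetical, and treating the bounds as inclusive is an assumption of this sketch rather than something the description above specifies.

```python
def within_confidence(conf, min_conf=None, max_conf=None):
    """Return True if conf passes the optional lower and upper bounds."""
    if min_conf is not None and conf < min_conf:
        return False  # lower than the minimum: skipped
    if max_conf is not None and conf > max_conf:
        return False  # higher than the maximum: skipped
    return True


# Hypothetical (patch name, confidence) pairs for one label.
predictions = [("patch_a", 0.55), ("patch_b", 0.72), ("patch_c", 0.98)]

kept = [
    name
    for name, conf in predictions
    if within_confidence(conf, min_conf=0.6, max_conf=0.9)
]
print(kept)  # ['patch_b']
```

Leaving both bounds as None keeps every sample, which matches the documented defaults.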
- save(save_path='default.obj', force=False)
Save the object to a file.
- Parameters:
save_path (str, optional) – The path to the file to write. If the file already exists and force is not True, a FileExistsError is raised. Defaults to "default.obj".
force (bool, optional) – Whether to overwrite the file if it already exists. Defaults to False.
- Raises:
FileExistsError – If the file already exists and force is not True.
- Return type:
None
Notes
The object is saved in two parts. First, a serialized copy of the object's dictionary is written to the specified file using the joblib.dump function. The object's model attribute is excluded from this dictionary and saved separately using the torch.save function, with a filename derived from the original save_path.
- save_predictions(set_name, save_path=None, delimiter=',')
- Parameters:
set_name (str)
save_path (str | None)
delimiter (str)
- load_dataset(dataset, set_name, batch_size=16, sampler=None, shuffle=False, num_workers=0, **kwargs)
Creates a DataLoader from a PatchDataset and adds it to the dataloaders dictionary.
- Parameters:
dataset (PatchDataset) – The dataset to add
set_name (str) – The name to use for the dataset
batch_size (Optional[int], optional) – The batch size to use when creating the DataLoader, by default 16
sampler (Optional[Union[Sampler, None]], optional) – The sampler to use when creating the DataLoader, by default None
shuffle (Optional[bool], optional) – Whether to shuffle the PatchDataset, by default False
num_workers (Optional[int], optional) – The number of worker subprocesses to use for loading data, by default 0 (data is loaded in the main process).
- Return type:
None
- load(load_path, force_device=False)
This function loads the state of a class instance from a saved file using the joblib library. It also loads a PyTorch model from a separate file and maps it to the device used to load the class instance.
- Parameters:
load_path (str) – Path to the saved file to load.
force_device (bool or str, optional) – Whether to force the use of a specific device, or the name of the device to use. If set to True, the default device is used. Defaults to False.
- Raises:
FileNotFoundError – If the specified file does not exist.
- Return type:
None
- cprint(type_info, bc_color, text)
Print colored text with additional information.
- Parameters:
type_info (str) – The type of message to display.
bc_color (str) – The color to use for the message text.
text (str) – The text to display.
- Returns:
The colored message is displayed on the standard output stream.
- Return type:
None
- update_progress(progress, text='', barLength=30)
Update the progress bar.
- Parameters:
progress (float or int) – The progress value to display, between 0 and 1. If an integer is provided, it will be converted to a float. If a value outside the range [0, 1] is provided, it will be clamped to the nearest valid value.
text (str, optional) – Additional text to display after the progress bar, defaults to "".
barLength (int, optional) – The length of the progress bar in characters, defaults to 30.
- Raises:
TypeError – If progress is not a floating point value or an integer.
- Returns:
The progress bar is displayed on the standard output stream.
- Return type:
None
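The conversion, clamping and type check described for update_progress can be sketched as follows. This is an illustrative stand-in, not the library function: render_progress is a hypothetical name, it returns the bar as a string instead of writing to standard output, and the bar characters and percentage format are assumptions.

```python
def render_progress(progress, text="", bar_length=30):
    """Build a text progress bar for a progress value in [0, 1]."""
    if not isinstance(progress, (int, float)):
        raise TypeError("progress must be a float or an int")
    # Convert integers to float and clamp to the valid range.
    progress = min(max(float(progress), 0.0), 1.0)
    filled = int(round(bar_length * progress))
    bar = "#" * filled + "-" * (bar_length - filled)
    return f"[{bar}] {progress * 100:.1f}% {text}".rstrip()


half = render_progress(0.5, bar_length=10)
over = render_progress(2, bar_length=4)      # clamped to 1.0
under = render_progress(-1, bar_length=4)    # clamped to 0.0
print(half)
print(over)
print(under)
```

Clamping out-of-range values instead of raising keeps a long-running loop from failing on a small rounding overshoot, while non-numeric input is still rejected with a TypeError.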