mapreader.classify.classifier ============================= .. py:module:: mapreader.classify.classifier Classes ------- .. autoapisummary:: mapreader.classify.classifier.ClassifierContainer Module Contents --------------- .. py:class:: ClassifierContainer(model = None, labels_map = None, dataloaders = None, device = 'default', input_size = (224, 224), is_inception = False, load_path = None, force_device = False, huggingface = False, **kwargs) A class to store and train a PyTorch model. :param model: The PyTorch model to add to the object. - If passed as a string, will run ``_initialize_model(model, **kwargs)``. See https://pytorch.org/vision/0.8/models.html for options. - Must be ``None`` if ``load_path`` is specified as model will be loaded from file. :type model: str, nn.Module or None :param labels_map: A dictionary containing the mapping of each label index to its label, with indices as keys and labels as values (i.e. idx: label). Can only be ``None`` if ``load_path`` is specified as labels_map will be loaded from file. :type labels_map: Dict or None :param dataloaders: A dictionary containing set names as keys and dataloaders as values (i.e. set_name: dataloader). :type dataloaders: Dict or None :param device: The device to be used for training and storing models. Can be set to "default", "cpu", "cuda:0", etc. By default, "default". :type device: str, optional :param input_size: The expected input size of the model. Default is ``(224,224)``. :type input_size: int, optional :param is_inception: Whether the model is an Inception-style model. Default is ``False``. :type is_inception: bool, optional :param load_path: The path to an ``.obj`` file containing a :type load_path: str, optional :param force_device: Whether to force the use of a specific device. If set to ``True``, the default device is used. Defaults to ``False``. :type force_device: bool, optional :param kwargs: Keyword arguments to pass to the :meth:`~.classify.classifier.ClassifierContainer._initialize_model` method (if passing ``model`` as a string). :type kwargs: Dict .. attribute:: device The device being used for training and storing models. :type: torch.device .. attribute:: dataloaders A dictionary to store dataloaders for the model. :type: dict .. attribute:: labels_map A dictionary mapping label indices to their labels. :type: dict .. attribute:: dataset_sizes A dictionary to store sizes of datasets for the model. :type: dict .. attribute:: model The model. :type: torch.nn.Module .. attribute:: input_size The size of the input to the model. :type: Tuple of int .. attribute:: is_inception A flag indicating if the model is an Inception model. :type: bool .. attribute:: optimizer The optimizer being used for training the model. :type: None or torch.optim.Optimizer .. attribute:: scheduler The learning rate scheduler being used for training the model. :type: None or torch.optim.lr_scheduler._LRScheduler .. attribute:: loss_fn The loss function to use for training the model. :type: None or nn.modules.loss._Loss .. attribute:: metrics A dictionary to store the metrics computed during training. :type: dict .. attribute:: last_epoch The last epoch number completed during training. :type: int .. attribute:: best_loss The best validation loss achieved during training. :type: torch.Tensor .. attribute:: best_epoch The epoch in which the best validation loss was achieved during training. :type: int .. attribute:: tmp_save_filename A temporary file name to save checkpoints during training and validation. :type: str .. py:method:: generate_layerwise_lrs(min_lr, max_lr, spacing = 'linspace') Calculates layer-wise learning rates for a given set of model parameters. :param min_lr: The minimum learning rate to be used. :type min_lr: float :param max_lr: The maximum learning rate to be used. :type max_lr: float :param spacing: The type of sequence to use for spacing the specified interval learning rates. Can be either ``"linspace"`` or ``"geomspace"``, where `"linspace"` uses evenly spaced learning rates over a specified interval and `"geomspace"` uses learning rates spaced evenly on a log scale (a geometric progression). By default ``"linspace"``. :type spacing: str, optional :returns: A list of dictionaries containing the parameters and learning rates for each layer. :rtype: list of dicts .. py:method:: initialize_optimizer(optim_type = 'adam', params2optimize = 'default', optim_param_dict = None) Initializes an optimizer for the model and adds it to the classifier object. :param optim_type: The type of optimizer to use. Can be set to ``"adam"`` (default), ``"adamw"``, or ``"sgd"``. :type optim_type: str, optional :param params2optimize: The parameters to optimize. If set to ``"default"``, all model parameters that require gradients will be optimized. Default is ``"default"``. :type params2optimize: str or iterable, optional :param optim_param_dict: The parameters to pass to the optimizer constructor as a dictionary, by default ``{"lr": 1e-3}``. :type optim_param_dict: dict, optional .. rubric:: Notes Note that the first argument of an optimizer is parameters to optimize, e.g. ``params2optimize = model_ft.parameters()``: - ``model_ft.parameters()``: all parameters are being optimized - ``model_ft.fc.parameters()``: only parameters of final layer are being optimized Here, we use: .. code-block:: python filter(lambda p: p.requires_grad, self.model.parameters()) .. py:method:: add_optimizer(optimizer) Add an optimizer to the classifier object. :param optimizer: The optimizer to add to the classifier object. :type optimizer: torch.optim.Optimizer :rtype: None .. py:method:: initialize_scheduler(scheduler_type = 'steplr', scheduler_param_dict = None) Initializes a learning rate scheduler for the optimizer and adds it to the classifier object. Only `StepLR` is implemented - otherwise use `torch.optim.lr_scheduler` directly and the `add_scheduler` method. :param scheduler_type: The type of learning rate scheduler to use. Default is ``"steplr"``. :type scheduler_type: str, optional :param scheduler_param_dict: The parameters to pass to the scheduler constructor, by default ``{"step_size": 10, "gamma": 0.1}``. :type scheduler_param_dict: dict, optional :raises ValueError: If the specified ``scheduler_type`` is not implemented. .. py:method:: add_scheduler(scheduler) Add a scheduler to the classifier object. Note that during training, `scheduler.step()` is called after each epoch - i.e. do not use schedulers that should be stepped after each batch! :param scheduler: The scheduler to add to the classifier object. :type scheduler: torch.optim.lr_scheduler._LRScheduler :raises ValueError: If no optimizer has been set. Use ``initialize_optimizer`` or ``add_optimizer`` to set an optimizer first. :rtype: None .. py:method:: add_loss_fn(loss_fn = 'cross entropy') Add a loss function to the classifier object. :param loss_fn: The loss function to add to the classifier object. Accepted string values are "cross entropy" or "ce" (cross-entropy), "bce" (binary cross-entropy) and "mse" (mean squared error). :type loss_fn: str or torch.nn.modules.loss._Loss :returns: The function only modifies the ``loss_fn`` attribute of the classifier and does not return anything. :rtype: None .. py:method:: model_summary(input_size = None, trainable_col = False, **kwargs) Print a summary of the model. :param input_size: The size of the input data. If None, input size is taken from "train" dataloader (``self.dataloaders["train"]``). :type input_size: tuple or list, optional :param trainable_col: If ``True``, adds a column showing which parameters are trainable. Defaults to ``False``. :type trainable_col: bool, optional :param \*\*kwargs: Keyword arguments to pass to ``torchinfo.summary()`` (see https://github.com/TylerYep/torchinfo). :type \*\*kwargs: Dict .. rubric:: Notes Other ways to check params: .. code-block:: python sum(p.numel() for p in myclassifier.model.parameters()) .. code-block:: python sum(p.numel() for p in myclassifier.model.parameters() if p.requires_grad) And: .. code-block:: python for name, param in self.model.named_parameters(): n = name.split(".")[0].split("_")[0] print(name, param.requires_grad) .. py:method:: freeze_layers(layers_to_freeze) Freezes the specified layers in the neural network by setting ``requires_grad`` attribute to False for their parameters. :param layers_to_freeze: List of names of the layers to freeze. If a layer name ends with an asterisk (``"*"``), then all parameters whose name contains the layer name (excluding the asterisk) are frozen. Otherwise, only the parameters with an exact match to the layer name are frozen. :type layers_to_freeze: list of str :returns: The function only modifies the ``requires_grad`` attribute of the specified parameters and does not return anything. :rtype: None .. rubric:: Notes e.g. ["layer1*", "layer2*"] will freeze all parameters whose name contains "layer1" and "layer2" (excluding the asterisk). e.g. ["layer1", "layer2"] will freeze all parameters with an exact match to "layer1" and "layer2". .. py:method:: unfreeze_layers(layers_to_unfreeze) Unfreezes the specified layers in the neural network by setting ``requires_grad`` attribute to True for their parameters. :param layers_to_unfreeze: List of names of the layers to unfreeze. If a layer name ends with an asterisk (``"*"``), then all parameters whose name contains the layer name (excluding the asterisk) are unfrozen. Otherwise, only the parameters with an exact match to the layer name are unfrozen. :type layers_to_unfreeze: list of str :returns: The function only modifies the ``requires_grad`` attribute of the specified parameters and does not return anything. :rtype: None .. rubric:: Notes e.g. ["layer1*", "layer2*"] will unfreeze all parameters whose name contains "layer1" and "layer2" (excluding the asterisk). e.g. ["layer1", "layer2"] will unfreeze all parameters with an exact match to "layer1" and "layer2". .. py:method:: only_keep_layers(only_keep_layers_list) Only keep the specified layers (``only_keep_layers_list``) for gradient computation during the backpropagation. :param only_keep_layers_list: List of layer names to keep. All other layers will have their gradient computation turned off. :type only_keep_layers_list: list :returns: The function only modifies the ``requires_grad`` attribute of the specified parameters and does not return anything. :rtype: None .. py:method:: inference(set_name = 'infer', verbose = False, print_info_batch_freq = 5) Run inference on a specified dataset (``set_name``). :param set_name: The name of the dataset to run inference on, by default ``"infer"``. :type set_name: str, optional :param verbose: Whether to print verbose outputs, by default False. :type verbose: bool, optional :param print_info_batch_freq: The frequency of printouts, by default ``5``. :type print_info_batch_freq: int, optional :rtype: None .. rubric:: Notes This method calls the :meth:`~.train.classifier.classifier.train` method with the ``num_epochs`` set to ``1`` and all the other parameters specified in the function arguments. .. py:method:: train_component_summary() Print a summary of the optimizer, loss function, and trainable model components. Returns: -------- None .. py:method:: train(phases = None, num_epochs = 25, save_model_dir = 'models', verbose = False, tensorboard_path = None, tmp_file_save_freq = 2, remove_after_load = True, print_info_batch_freq = 5) Train the model on the specified phases for a given number of epochs. Wrapper function for :meth:`~.train.classifier.classifier.train_core` method to capture exceptions (``KeyboardInterrupt`` is the only supported exception currently). :param phases: The phases to run through during each training iteration. Default is ``["train", "val"]``. :type phases: list of str, optional :param num_epochs: The number of epochs to train the model for. Default is ``25``. :type num_epochs: int, optional :param save_model_dir: The directory to save the model in. Default is ``"models"``. If set to ``None``, the model is not saved. :type save_model_dir: str or None, optional :param verbose: Whether to print verbose outputs, by default ``False``. :type verbose: int, optional :param tensorboard_path: The path to the directory to save TensorBoard logs in. If set to ``None``, no TensorBoard logs are saved. Default is ``None``. :type tensorboard_path: str or None, optional :param tmp_file_save_freq: The frequency (in epochs) to save a temporary file of the model. Default is ``2``. If set to ``0`` or ``None``, no temporary file is saved. :type tmp_file_save_freq: int, optional :param remove_after_load: Whether to remove the temporary file after loading it. Default is ``True``. :type remove_after_load: bool, optional :param print_info_batch_freq: The frequency (in batches) to print training information. Default is ``5``. If set to ``0`` or ``None``, no training information is printed. :type print_info_batch_freq: int, optional :returns: The function saves the model to the ``save_model_dir`` directory, and optionally to a temporary file. If interrupted with a ``KeyboardInterrupt``, the function tries to load the temporary file. If no temporary file is found, it continues without loading. :rtype: None .. rubric:: Notes Refer to the documentation of :meth:`~.train.classifier.classifier.train_core` for more information. .. py:method:: train_core(phases = None, num_epochs = 25, save_model_dir = 'models', verbose = False, tensorboard_path = None, tmp_file_save_freq = 2, print_info_batch_freq = 5) Trains/fine-tunes a classifier for the specified number of epochs on the given phases using the specified hyperparameters. :param phases: The phases to run through during each training iteration. Default is ``["train", "val"]``. :type phases: list of str, optional :param num_epochs: The number of epochs to train the model for. Default is ``25``. :type num_epochs: int, optional :param save_model_dir: The directory to save the model in. Default is ``"models"``. If set to ``None``, the model is not saved. :type save_model_dir: str or None, optional :param verbose: Whether to print verbose outputs, by default ``False``. :type verbose: bool, optional :param tensorboard_path: The path to the directory to save TensorBoard logs in. If set to ``None``, no TensorBoard logs are saved. Default is ``None``. :type tensorboard_path: str or None, optional :param tmp_file_save_freq: The frequency (in epochs) to save a temporary file of the model. Default is ``2``. If set to ``0`` or ``None``, no temporary file is saved. :type tmp_file_save_freq: int, optional :param print_info_batch_freq: The frequency (in batches) to print training information. Default is ``5``. If set to ``0`` or ``None``, no training information is printed. :type print_info_batch_freq: int, optional :raises ValueError: If the loss function is not set. Use the :meth:`~.classify.classifier.ClassifierContainer.add_loss_fn` method to set the loss function. If the optimizer is not set and the phase is "train". Use the :meth:`~.classify.classifier.ClassifierContainer.initialize_optimizer` or :meth:`~.classify.classifier.ClassifierContainer.add_optimizer` method to set the optimizer. :raises KeyError: If the specified phase cannot be found in the keys of the object's :attr:`~.classify.classifier.ClassifierContainer.dataloaders` dictionary property. :rtype: None .. py:method:: calculate_add_metrics(y_true, y_pred, y_score, phase, epoch = -1, tboard_writer=None) Calculate and add metrics to the classifier's metrics dictionary. :param y_true: True binary labels or multiclass labels. Can be considered ground truth or (correct) target values. :type y_true: 1d array-like of shape (n_samples,) :param y_pred: Predicted binary labels or multiclass labels. The estimated targets as returned by a classifier. :type y_pred: 1d array-like of shape (n_samples,) :param y_score: Predicted probabilities for each class. :type y_score: array-like of shape (n_samples, n_classes) :param phase: Name of the current phase, typically ``"train"`` or ``"val"``. See ``train`` function. :type phase: str :param epoch: Current epoch number. Default is ``-1``. :type epoch: int, optional :param tboard_writer: TensorBoard SummaryWriter object to write the metrics. Default is ``None``. :type tboard_writer: object, optional :rtype: None .. rubric:: Notes This method uses both the :func:`sklearn.metrics.precision_recall_fscore_support` and :func:`sklearn.metrics.roc_auc_score` functions from ``scikit-learn`` to calculate the metrics for each average type (``"micro"``, ``"macro"`` and ``"weighted"``). The results are then added to the ``metrics`` dictionary. It also writes the metrics to the TensorBoard SummaryWriter, if ``tboard_writer`` is not None. .. py:method:: list_metrics(phases = 'all') Prints the available metrics for the specified phases. :param phases: The phases to find metrics for, by default "all" :type phases: str | list[str], optional .. py:method:: plot_metric(metrics, phases = 'all', colors = None, figsize = (10, 5), plt_yrange = None, plt_xrange = None) Plot the metrics of the classifier object. :param metrics: A string of list of strings containing metric names to be plotted on the y-axis. :type metrics: str or list of str :param phases: The phases for which the metric is to be plotted. Defaults to ``"all"``. :type phases: str or list of str, optional :param colors: Colors to be used for the lines of each metric. Length must be at least the length of the number of phases being plotted (``phases``). If None, will use the default matplotlib colors. Defaults to ``None``. :type colors: list of str, optional :param figsize: The size of the figure in inches. Defaults to ``(10, 5)``. :type figsize: tuple of int, optional :param plt_yrange: The range of values for the y-axis. Defaults to ``None``. :type plt_yrange: tuple of float, optional :param plt_xrange: The range of values for the x-axis. Defaults to ``None``. :type plt_xrange: tuple of float, optional .. py:method:: print_batch_info(set_name = 'train') Print information about a dataset's batches, samples, and batch-size. :param set_name: Name of the dataset to display batch information for (default is ``"train"``). :type set_name: str, optional :rtype: None .. py:method:: show_inference_sample_results(label, num_samples = 6, set_name = 'test', min_conf = None, max_conf = None, figsize = (15, 15)) Shows a sample of the results of the inference with current model. :param label: The label for which to display results. :type label: str, optional :param num_samples: The number of sample results to display. Defaults to ``6``. :type num_samples: int, optional :param set_name: The name of the dataset split to use for inference. Defaults to ``"test"``. :type set_name: str, optional :param min_conf: The minimum confidence score for a sample result to be displayed. Samples with lower confidence scores will be skipped. Defaults to ``None``. :type min_conf: float, optional :param max_conf: The maximum confidence score for a sample result to be displayed. Samples with higher confidence scores will be skipped. Defaults to ``None``. :type max_conf: float, optional :param figsize: Figure size (width, height) in inches, displaying the sample results. Defaults to ``(15, 15)``. :type figsize: tuple[int, int], optional :rtype: None .. py:method:: save(save_path = 'default.obj', force = False) Save the object to a file. :param save_path: The path to the file to write. If the file already exists and ``force`` is not ``True``, a ``FileExistsError`` is raised. Defaults to ``"default.obj"``. :type save_path: str, optional :param force: Whether to overwrite the file if it already exists. Defaults to ``False``. :type force: bool, optional :raises FileExistsError: If the file already exists and ``force`` is not ``True``. .. rubric:: Notes The object is saved in two parts. First, a serialized copy of the object's dictionary is written to the specified file using the ``joblib.dump`` function. The object's ``model`` attribute is excluded from this dictionary and saved separately using the ``torch.save`` function, with a filename derived from the original ``save_path``. .. py:method:: save_predictions(set_name, save_path = None, delimiter = ',') .. py:method:: load_dataset(dataset, set_name, batch_size = 16, sampler = None, shuffle = False, num_workers = 0, **kwargs) Creates a DataLoader from a PatchDataset and adds it to the ``dataloaders`` dictionary. :param dataset: The dataset to add :type dataset: PatchDataset :param set_name: The name to use for the dataset :type set_name: str :param batch_size: The batch size to use when creating the DataLoader, by default 16 :type batch_size: Optional[int], optional :param sampler: The sampler to use when creating the DataLoader, by default None :type sampler: Optional[Union[Sampler, None]], optional :param shuffle: Whether to shuffle the PatchDataset, by default False :type shuffle: Optional[bool], optional :param num_workers: The number of worker threads to use for loading data, by default 0. :type num_workers: Optional[int], optional .. py:method:: load(load_path, force_device = False) This function loads the state of a class instance from a saved file using the joblib library. It also loads a PyTorch model from a separate file and maps it to the device used to load the class instance. :param load_path: Path to the saved file to load. :type load_path: str :param force_device: Whether to force the use of a specific device, or the name of the device to use. If set to ``True``, the default device is used. Defaults to ``False``. :type force_device: bool or str, optional :raises FileNotFoundError: If the specified file does not exist. :rtype: None .. py:method:: cprint(type_info, bc_color, text) Print colored text with additional information. :param type_info: The type of message to display. :type type_info: str :param bc_color: The color to use for the message text. :type bc_color: str :param text: The text to display. :type text: str :returns: The colored message is displayed on the standard output stream. :rtype: None .. py:method:: update_progress(progress, text = '', barLength = 30) Update the progress bar. :param progress: The progress value to display, between ``0`` and ``1``. If an integer is provided, it will be converted to a float. If a value outside the range ``[0, 1]`` is provided, it will be clamped to the nearest valid value. :type progress: float or int :param text: Additional text to display after the progress bar, defaults to ``""``. :type text: str, optional :param barLength: The length of the progress bar in characters, defaults to ``30``. :type barLength: int, optional :raises TypeError: If progress is not a floating point value or an integer. :returns: The progress bar is displayed on the standard output stream. :rtype: None