mapreader.classify.datasets
===========================

.. py:module:: mapreader.classify.datasets


Attributes
----------

.. autoapisummary::

   mapreader.classify.datasets.parhugin_installed


Classes
-------

.. autoapisummary::

   mapreader.classify.datasets.PatchDataset
   mapreader.classify.datasets.PatchContextDataset


Module Contents
---------------

.. py:data:: parhugin_installed
   :value: True


.. py:class:: PatchDataset(patch_df, transform, delimiter = ',', patch_paths_col = 'image_path', label_col = None, label_index_col = None, image_mode = 'RGB')

   Bases: :py:obj:`torch.utils.data.Dataset`


   A PyTorch Dataset class for loading image patches from a DataFrame.

   :param patch_df: DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
   :type patch_df: str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame
   :param transform: The transform to use on the image.
                     A string can be used to call default transforms - options are "train", "test" or "val".
                     Alternatively, a callable object (e.g. a torchvision transform or torchvision.transforms.Compose) that takes in an image
                     and performs image transformations can be used.
                     At minimum, transform should be ``torchvision.transforms.ToTensor()``.
   :type transform: Union[str, transforms.Compose, Callable]
   :param delimiter: The delimiter to use when reading the CSV/TSV file. By default ``","``.
   :type delimiter: str, optional
   :param patch_paths_col: The name of the column in the DataFrame containing the image paths. Default is "image_path".
   :type patch_paths_col: str, optional
   :param label_col: The name of the column containing the image labels. Default is None.
   :type label_col: str, optional
   :param label_index_col: The name of the column containing the indices of the image labels. Default is None.
   :type label_index_col: str, optional
   :param image_mode: The color format to convert the image to. Default is "RGB".
   :type image_mode: str, optional

   .. attribute:: patch_df

      DataFrame containing the paths to image patches and their labels.

      :type: pandas.DataFrame or gpd.GeoDataFrame

   .. attribute:: label_col

      The name of the column containing the image labels.

      :type: str

   .. attribute:: label_index_col

      The name of the column containing the label indices.

      :type: str

   .. attribute:: patch_paths_col

      The name of the column in the DataFrame containing the image paths.

      :type: str

   .. attribute:: image_mode

      The color format to convert the image to.

      :type: str

   .. attribute:: unique_labels

      The unique labels in the label column of the patch_df DataFrame.

      :type: list

   .. attribute:: transform

      A callable object (a torchvision transform) that takes in an image and performs image transformations.

      :type: callable

   .. method:: __len__()

      Returns the length of the dataset.

   .. method:: __getitem__(idx)

      Retrieves the image, its label and the index of that label at the given index in the dataset.

   .. method:: return_orig_image(idx)

      Retrieves the original image at the given index in the dataset.

   .. method:: _default_transform(t_type, resize2)

      Returns a transforms.Compose containing the default image transformations for the train and validation sets.

   :raises ValueError: If ``label_col`` not in ``patch_df``.
   :raises ValueError: If ``label_index_col`` not in ``patch_df``.
   :raises ValueError: If ``transform`` passed as a string, but not one of "train", "test" or "val".


   .. py:attribute:: label_col
      :value: None


   .. py:attribute:: label_index_col
      :value: None


   .. py:attribute:: image_mode
      :value: 'RGB'


   .. py:attribute:: patch_paths_col
      :value: 'image_path'


   .. py:attribute:: unique_labels
      :value: []


   .. py:method:: return_orig_image(idx)

      Return the original image associated with the given index.

      :param idx: The index of the desired image, or a Tensor containing the index.
      :type idx: int or Tensor

      :returns: The original image associated with the given index.
      :rtype: PIL.Image.Image

      .. rubric:: Notes

      This method returns the original image associated with the given index by loading the image file using the file path stored in the ``patch_paths_col`` column of the ``patch_df`` DataFrame at the given index. The loaded image is then converted to the format specified by the ``image_mode`` attribute of the object. The resulting :class:`PIL.Image.Image` object is returned.


   .. py:method:: create_dataloaders(set_name = 'infer', batch_size = 16, shuffle = False, num_workers = 0, **kwargs)

      Creates a dictionary containing a PyTorch dataloader.

      :param set_name: The name to use for the dataloader.
      :type set_name: str, optional
      :param batch_size: The batch size to use for the dataloader. By default ``16``.
      :type batch_size: int, optional
      :param shuffle: Whether to shuffle the PatchDataset. By default ``False``.
      :type shuffle: bool, optional
      :param num_workers: The number of worker subprocesses to use for loading data. By default ``0``.
      :type num_workers: int, optional
      :param \*\*kwargs: Additional keyword arguments to pass to PyTorch's ``DataLoader`` constructor.

      :returns: Dictionary containing dataloaders.
      :rtype: Dict


.. py:class:: PatchContextDataset(patch_df, total_df, transform, delimiter = ',', patch_paths_col = 'image_path', label_col = None, label_index_col = None, image_mode = 'RGB', context_dir = './maps/maps_context', create_context = False, parent_path = './maps')

   Bases: :py:obj:`PatchDataset`


   A PyTorch Dataset class for loading contextual information about image patches from a DataFrame.

   :param patch_df: DataFrame or path to CSV/TSV/geojson file containing the paths to image patches and their labels.
   :type patch_df: str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame
   :param total_df: DataFrame or path to CSV/TSV/geojson file containing the paths to all images and their labels.
   :type total_df: str or pathlib.Path or pandas.DataFrame or gpd.GeoDataFrame
   :param transform: Torchvision transform to be applied to context images. Either "train" or "val".
   :type transform: str
   :param delimiter: The delimiter to use when reading the CSV/TSV file. By default ``","``.
   :type delimiter: str
   :param patch_paths_col: The name of the column in the DataFrame containing the image paths. Default is "image_path".
   :type patch_paths_col: str, optional
   :param label_col: The name of the column containing the image labels. Default is None.
   :type label_col: str, optional
   :param label_index_col: The name of the column containing the indices of the image labels. Default is None.
   :type label_index_col: str, optional
   :param image_mode: The color space of the images. Default is "RGB".
   :type image_mode: str, optional
   :param context_dir: The path to context maps (or, where to save context if not created yet). Default is "./maps/maps_context".
   :type context_dir: str, optional
   :param create_context: Whether or not to create context maps. Default is False.
   :type create_context: bool, optional
   :param parent_path: The path to the directory containing parent images. Default is "./maps".
   :type parent_path: str, optional

   .. attribute:: patch_df

      DataFrame with columns representing image paths, labels, and object bounding boxes.

      :type: pandas.DataFrame or gpd.GeoDataFrame

   .. attribute:: label_col

      The name of the column containing the image labels.

      :type: str

   .. attribute:: label_index_col

      The name of the column containing the label indices.

      :type: str

   .. attribute:: patch_paths_col

      The name of the column in the DataFrame containing the image paths.

      :type: str

   .. attribute:: image_mode

      The color space of the images.

      :type: str

   .. attribute:: parent_path

      The path to the directory containing parent images.

      :type: str

   .. attribute:: create_context

      Whether or not to create context maps.

      :type: bool

   .. attribute:: context_dir

      The path to context maps.

      :type: str

   .. attribute:: unique_labels

      The unique labels in ``label_col``.

      :type: list or str


   .. py:attribute:: label_col
      :value: None


   .. py:attribute:: label_index_col
      :value: None


   .. py:attribute:: image_mode
      :value: 'RGB'


   .. py:attribute:: patch_paths_col
      :value: 'image_path'


   .. py:attribute:: parent_path
      :value: './maps'


   .. py:attribute:: create_context
      :value: False


   .. py:attribute:: context_dir
      :value: b'.'


   .. py:method:: save_context(processors = 10, sleep_time = 0.001, use_parhugin = True, overwrite = False)

      Save context images for all patches in the patch_df.

      :param processors: The number of required processors for the job, by default 10.
      :type processors: int, optional
      :param sleep_time: The time to wait between jobs, by default 0.001.
      :type sleep_time: float, optional
      :param use_parhugin: Whether to use Parhugin to parallelize the job, by default True.
      :type use_parhugin: bool, optional
      :param overwrite: Whether to overwrite existing parent files, by default False.
      :type overwrite: bool, optional

      :rtype: None

      .. rubric:: Notes

      Parhugin is a Python package for parallelizing computations across multiple CPU cores. The method uses Parhugin to parallelize the computation of saving parent patches to disk. When Parhugin is installed and ``use_parhugin`` is set to True, the method parallelizes the calling of the ``get_context_id`` method and its corresponding arguments. If Parhugin is not installed or ``use_parhugin`` is set to False, the method executes the loop over patch indices sequentially instead.


   .. py:method:: get_context_id(id, overwrite = False, save_context = False, return_image = True)

      Save the parents of a specific patch to the specified location.

      :param id: Index of the patch in the dataset.
      :param overwrite: Whether to overwrite the existing parent files. Default is False.
      :type overwrite: bool, optional
      :param save_context: Whether to save the context image. Default is False.
      :type save_context: bool, optional
      :param return_image: Whether to return the context image. Default is True.
      :type return_image: bool, optional

      :raises ValueError: If the patch is not found in the dataset.

      :rtype: None


   .. py:method:: plot_sample(idx)

      Plot a sample patch and its corresponding context from the dataset.

      :param idx: The index of the sample to plot.
      :type idx: int

      :returns: Displays the plot of the sample patch and its corresponding context.
      :rtype: None

      .. rubric:: Notes

      This method plots a sample patch and its corresponding context side-by-side in a single figure with two subplots. The figure size is set to 10in x 5in, and the titles of the subplots are set to "Patch" and "Context", respectively. The resulting figure is displayed using the ``matplotlib`` library (required).
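The ``PatchDataset`` contract documented above (``__len__``, a ``__getitem__`` that yields the image, its label and the label index, the three ``ValueError`` conditions, and a ``create_dataloaders`` result keyed by ``set_name``) can be illustrated with a small standard-library sketch. Everything in it is an assumption made for illustration: ``ToyPatchDataset`` is a hypothetical stand-in, not MapReader's implementation. It takes a list of dicts instead of a DataFrame, uses the path string in place of a loaded image, and returns a plain list of batches instead of a ``torch.utils.data.DataLoader``.

```python
# Toy stand-in for the documented PatchDataset contract. NOT MapReader's code:
# rows are plain dicts instead of a DataFrame and no image file is opened.

VALID_TRANSFORM_NAMES = ("train", "test", "val")


class ToyPatchDataset:
    def __init__(self, rows, transform, patch_paths_col="image_path",
                 label_col=None, label_index_col=None):
        # Mirror the documented ValueError conditions.
        for col in (label_col, label_index_col):
            if col is not None and any(col not in row for row in rows):
                raise ValueError(f"{col!r} not in patch data")
        if isinstance(transform, str):
            if transform not in VALID_TRANSFORM_NAMES:
                raise ValueError(f"transform must be one of {VALID_TRANSFORM_NAMES}")
            # The real class builds a default pipeline here; the toy uses identity.
            transform = lambda image: image
        self.rows = rows
        self.transform = transform
        self.patch_paths_col = patch_paths_col
        self.label_col = label_col
        self.label_index_col = label_index_col
        # Unique labels in order of first appearance (a choice made for this toy).
        self.unique_labels = (
            list(dict.fromkeys(row[label_col] for row in rows)) if label_col else []
        )

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        # The path string stands in for the loaded, converted image.
        image = self.transform(row[self.patch_paths_col])
        # None marks a missing label in this toy; the real placeholder may differ.
        label = row[self.label_col] if self.label_col else None
        label_index = row[self.label_index_col] if self.label_index_col else None
        return image, label, label_index

    def create_dataloaders(self, set_name="infer", batch_size=16):
        # A list of batches stands in for a torch DataLoader.
        batches = [
            [self[i] for i in range(start, min(start + batch_size, len(self)))]
            for start in range(0, len(self), batch_size)
        ]
        return {set_name: batches}


rows = [
    {"image_path": "patch_0.png", "label": "rail", "label_index": 1},
    {"image_path": "patch_1.png", "label": "no_rail", "label_index": 0},
    {"image_path": "patch_2.png", "label": "rail", "label_index": 1},
]
dataset = ToyPatchDataset(rows, "val", label_col="label", label_index_col="label_index")
loaders = dataset.create_dataloaders(batch_size=2)
print(len(dataset), dataset[1], list(loaders))
```

With the real class, the same pattern would apply to a DataFrame and an actual torchvision transform. The sketch shows only the shape of the values that callers can expect back.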