`mapreader.classify.load_annotations`

Module Contents

Classes

AnnotationsLoader

class mapreader.classify.load_annotations.AnnotationsLoader

load(annotations, delimiter=',', id_col='image_id', patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)

Loads annotations from a csv file or dataframe and can be used to set the id_col, patch_paths_col and label_col attributes.

Parameters:

annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.
delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.
id_col (Optional[str], optional) – The name of the column which contains the image IDs, by default “image_id”.
patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.
label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.
append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, existing dataframe will be overwritten. By default True.
scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.
reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.

Raises:

ValueError – If annotations is passed as something other than a string or pd.DataFrame.
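As a rough illustration of the input that load expects, the sketch below builds a small tab-separated annotations table in memory and checks that it contains the three default columns. The file contents, the patch names and the helper check_columns are hypothetical. The sketch uses only Python's csv module and is not MapReader's own loading code.

```python
import csv
import io

# Hypothetical annotations file, tab-separated, using the default column names.
RAW = (
    "image_id\timage_path\tlabel\n"
    "patch-0-0.png\tpatches/patch-0-0.png\trailspace\n"
    "patch-0-1.png\tpatches/patch-0-1.png\tno\n"
)


def check_columns(text, delimiter=",", required=("image_id", "image_path", "label")):
    """Parse delimited text and return its rows, raising if a column is missing."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


rows = check_columns(RAW, delimiter="\t")
labels = sorted({row["label"] for row in rows})
```

Parsing the same text with the default "," delimiter would find a single fused header column, which is why a file with a different separator needs a matching delimiter value.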

show_patch(patch_id)

Display a patch and its label.

Parameters: patch_id (str) – The image ID of the patch to show.
Return type: None

print_unique_labels()

Prints the unique labels in the annotations.

Raises: ValueError – If no annotations are found.
Return type: None

review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')

Perform image review on annotations and update labels for a given label or all labels.

Parameters:

label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.
chunks (int, optional) – The number of images to display at a time, by default 24.
num_cols (int, optional) – The number of columns in the display, by default 8.
exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.
include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.
deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".

Return type:

None

Notes

This method reviews images with their corresponding labels and allows the user to change the label for each image.

Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.
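The include, exclude and deduplication rules described above can be pictured with plain Python. This is a sketch only: rows are modelled as dictionaries rather than DataFrame rows, select_for_review and deduplicate are hypothetical helpers rather than MapReader functions, and keeping the most recent entry on deduplication is an assumption.

```python
def select_for_review(rows, label_to_review=None, exclude_ids=(), include_ids=None, key="image_id"):
    """Return the rows that would be offered for review under the rules above."""
    exclude_ids = set(exclude_ids)
    selected = []
    for row in rows:
        if label_to_review is not None and row["label"] != label_to_review:
            continue  # only the target label is reviewed
        if row[key] in exclude_ids:
            continue  # images listed for exclusion are skipped
        if include_ids is not None and row[key] not in include_ids:
            continue  # with an include list, nothing else is reviewed
        selected.append(row)
    return selected


def deduplicate(rows, key="image_id"):
    """Keep one row per key value; the latest entry wins (an assumption)."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


annotations = [
    {"image_id": "a", "label": "railspace"},
    {"image_id": "b", "label": "no"},
    {"image_id": "c", "label": "railspace"},
]
to_review = select_for_review(annotations, label_to_review="railspace", exclude_ids={"c"})
reviewed = deduplicate(
    [{"image_id": "a", "label": "railspace"}, {"image_id": "a", "label": "no"}]
)
```

Here only image "a" is offered for review, and the two review records for "a" collapse into one.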

show_sample(label_to_show, num_samples=9)

Show a random sample of images with the specified label (label_to_show).

Parameters:

label_to_show (str) – The label of the images to show.
num_samples (Optional[int], optional) – The number of images to show. If None, all images with the specified label will be shown. By default 9.

Return type:

None

create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test')

Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.

Parameters:

frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.
frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.
frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.
random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.
train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.
val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.
test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.

Raises:

ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.

Return type:

None

Notes

This method saves the split datasets as a dictionary in self.datasets.

The split follows the fractions provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in that column). The splitting is performed by running train_test_split() twice.

See PatchDataset for more information on transforms.
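One way to derive the held-out fractions for two successive calls to train_test_split() is sketched below. The helper split_fractions is hypothetical, and how MapReader computes these values internally is not stated here, so treat this as an illustration of the arithmetic rather than the library's implementation.

```python
import math


def split_fractions(frac_train=0.7, frac_val=0.15, frac_test=0.15):
    """Return the held-out fractions for two successive splits."""
    total = frac_train + frac_val + frac_test
    if not math.isclose(total, 1.0):
        raise ValueError(f"Fractions must sum to 1, got {total}.")
    # First pass: separate the training rows from everything else.
    first_holdout = 1.0 - frac_train
    # Second pass: divide the held-out rows between validation and test.
    rest = frac_val + frac_test
    second_test_share = frac_test / rest if rest > 0 else 0.0
    return first_holdout, second_test_share


first_holdout, second_test_share = split_fractions()
```

With the default fractions, the first pass holds out about 30% of the rows and the second pass divides that remainder evenly between validation and test.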

create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)

Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.

Parameters:

batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default “default”.
shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.
num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0, which loads data in the main process.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.

Returns:

Dictionary containing dataloaders.

Return type:

Dict

Notes

sampler will only be applied to the training dataset (datasets["train"]).
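The note above can be pictured as building one set of loader options per split, with the sampler attached to the training split only. The helper loader_options and the placeholder sampler object are hypothetical, and the handling of shuffle is an assumption based on PyTorch's rule that a DataLoader cannot take both a sampler and shuffle=True. This is not MapReader's own code.

```python
def loader_options(split, batch_size=16, sampler=None, shuffle=False, num_workers=0, **kwargs):
    """Build keyword arguments for one split's dataloader (hypothetical helper)."""
    options = {
        "batch_size": batch_size,
        "shuffle": shuffle,
        "num_workers": num_workers,
        **kwargs,
    }
    if split == "train" and sampler is not None:
        # PyTorch's DataLoader rejects a sampler combined with shuffle=True.
        options["sampler"] = sampler
        options["shuffle"] = False
    return options


my_sampler = object()  # stand-in for a torch.utils.data.Sampler instance
options = {
    split: loader_options(split, sampler=my_sampler, pin_memory=True)
    for split in ("train", "val", "test")
}
```

Extra keyword arguments such as pin_memory are carried through unchanged for every split, mirroring how **kwargs are described as being forwarded to the DataLoader constructor.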
