mapreader.classify.load_annotations
Module Contents
Classes
- class mapreader.classify.load_annotations.AnnotationsLoader
- load(annotations, delimiter=',', id_col='image_id', patch_paths_col='image_path', label_col='label', append=True, scramble_frame=False, reset_index=False)
Loads annotations from a csv file or dataframe and can be used to set the id_col, patch_paths_col and label_col attributes.
- Parameters:
annotations (Union[str, pd.DataFrame]) – The annotations. Can either be the path to a csv file or a pandas.DataFrame.
delimiter (Optional[str], optional) – The delimiter to use when loading the csv file as a dataframe, by default “,”.
id_col (Optional[str], optional) – The name of the column which contains the image IDs, by default “image_id”.
patch_paths_col (Optional[str], optional) – The name of the column containing the image paths, by default “image_path”.
label_col (Optional[str], optional) – The name of the column containing the image labels, by default “label”.
append (Optional[bool], optional) – Whether to append the annotations to a pre-existing annotations dataframe. If False, existing dataframe will be overwritten. By default True.
scramble_frame (Optional[bool], optional) – Whether to shuffle the rows of the dataframe, by default False.
reset_index (Optional[bool], optional) – Whether to reset the index of the dataframe (e.g. after shuffling), by default False.
- Raises:
ValueError – If annotations is passed as something other than a string or pd.DataFrame.
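As an illustration of the input that load expects, the sketch below builds a small annotations table with the default column names (image_id, image_path, label) and round-trips it through CSV text with pandas. The patch file names and label values are invented for the example; only the column names and the default delimiter come from the parameter list above.

```python
import io

import pandas as pd

# Hypothetical annotations: file names and labels are made up for illustration.
annotations_df = pd.DataFrame(
    {
        "image_id": ["patch-0-0.png", "patch-0-1.png", "patch-1-0.png"],
        "image_path": [
            "patches/patch-0-0.png",
            "patches/patch-0-1.png",
            "patches/patch-1-0.png",
        ],
        "label": ["building", "no_building", "building"],
    }
)

# Serialise to CSV text using the default delimiter ",".
csv_text = annotations_df.to_csv(index=False, sep=",")

# Read the text back the same way a CSV file on disk would be read.
round_tripped = pd.read_csv(io.StringIO(csv_text), sep=",")

print(round_tripped.columns.tolist())
```

Either a frame like annotations_df or the path to a CSV file with this layout could then be passed as the annotations argument.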
- show_patch(patch_id)
Display a patch and its label.
- Parameters:
patch_id (str) – The image ID of the patch to show.
- Return type:
None
- print_unique_labels()
Prints the unique labels found in the annotations.
- Raises:
ValueError – If no annotations are found.
- Return type:
None
- review_labels(label_to_review=None, chunks=8 * 3, num_cols=8, exclude_df=None, include_df=None, deduplicate_col='image_id')
Perform image review on annotations and update labels for a given label or all labels.
- Parameters:
label_to_review (str, optional) – The target label to review. If not provided, all labels will be reviewed, by default None.
chunks (int, optional) – The number of images to display at a time, by default 24.
num_cols (int, optional) – The number of columns in the display, by default 8.
exclude_df (pandas.DataFrame, optional) – A DataFrame of images to exclude from review, by default None.
include_df (pandas.DataFrame, optional) – A DataFrame of images to include for review, by default None.
deduplicate_col (str, optional) – The column to use for deduplicating reviewed images, by default "image_id".
- Return type:
None
Notes
This method reviews images with their corresponding labels and allows the user to change the label for each image.
Updated labels are saved in self.annotations and in a newly created self.reviewed DataFrame. If exclude_df is provided, images found in this df are skipped in the review process. If include_df is provided, only images found in this df are reviewed. The self.reviewed DataFrame is deduplicated based on the deduplicate_col.
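The filtering described in these notes can be pictured with plain pandas operations. This is a sketch of the described behaviour, not MapReader's implementation: the helper name select_for_review and the toy frames are hypothetical, and keeping the last duplicate when deduplicating is an assumption that the text does not confirm.

```python
import pandas as pd


def select_for_review(annotations, exclude_df=None, include_df=None, id_col="image_id"):
    """Return the rows that would be offered for review (illustrative only)."""
    selected = annotations
    if exclude_df is not None:
        # Skip images that appear in exclude_df.
        selected = selected[~selected[id_col].isin(exclude_df[id_col])]
    if include_df is not None:
        # Keep only images that appear in include_df.
        selected = selected[selected[id_col].isin(include_df[id_col])]
    return selected


# Toy annotations; IDs and labels are invented for the example.
annotations = pd.DataFrame(
    {"image_id": ["a", "b", "c", "d"], "label": ["rail", "rail", "none", "rail"]}
)
exclude_df = pd.DataFrame({"image_id": ["b"]})
include_df = pd.DataFrame({"image_id": ["a", "b", "c"]})

to_review = select_for_review(annotations, exclude_df, include_df)

# Two review passes that both touched image "a"; deduplicate on image_id.
# keep="last" (the most recent decision wins) is an assumption.
reviewed = pd.DataFrame(
    {"image_id": ["a", "c", "a"], "label": ["rail", "none", "none"]}
)
reviewed = reviewed.drop_duplicates(subset="image_id", keep="last")

print(to_review["image_id"].tolist())
```

Here "b" is dropped by exclude_df and "d" is dropped because it is missing from include_df, so only "a" and "c" remain for review.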
- show_sample(label_to_show, num_samples=9)
Show a random sample of images with the specified label (label_to_show).
- Parameters:
label_to_show (str) – The label of the images to show.
num_samples (Optional[int], optional) – The number of images to show. If None, all images with the specified label will be shown. Default is 9.
- Return type:
None
- create_datasets(frac_train=0.7, frac_val=0.15, frac_test=0.15, random_state=1364, train_transform='train', val_transform='val', test_transform='test')
Splits the dataset into three subsets: training, validation, and test sets (DataFrames) and saves them as a dictionary in self.datasets.
- Parameters:
frac_train (float, optional) – Fraction of the dataset to be used for training. By default 0.70.
frac_val (float, optional) – Fraction of the dataset to be used for validation. By default 0.15.
frac_test (float, optional) – Fraction of the dataset to be used for testing. By default 0.15.
random_state (int, optional) – Random seed to ensure reproducibility. The default is 1364.
train_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the training dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “train”.
val_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the validation dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “val”.
test_transform (str, torchvision.transforms.Compose or Callable, optional) – The transform to use on the test dataset images. Options are “train”, “test” or “val”, or a callable object (e.g. a torchvision transform or torchvision.transforms.Compose). By default “test”.
- Raises:
ValueError – If the sum of fractions of training, validation and test sets does not add up to 1.
- Return type:
None
Notes
This method saves the split datasets as a dictionary in self.datasets.
The split follows the fractional ratios provided by the user, and each subset is stratified by the values in a specific column (that is, each subset has the same relative frequency of the values in the column). The splitting is performed by running train_test_split() twice.
See PatchDataset for more information on transforms.
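The two-stage split described in the notes can be sketched with scikit-learn's train_test_split. This is a minimal illustration under stated assumptions, not MapReader's code: the helper name split_three_ways, the label_col argument and the toy data are hypothetical, and the sketch returns plain DataFrames without applying any transforms.

```python
import math

import pandas as pd
from sklearn.model_selection import train_test_split


def split_three_ways(df, frac_train=0.7, frac_val=0.15, frac_test=0.15,
                     label_col="label", random_state=1364):
    """Stratified train/val/test split done as two successive two-way splits."""
    if not math.isclose(frac_train + frac_val + frac_test, 1.0):
        raise ValueError("frac_train, frac_val and frac_test must sum to 1.")

    # First split: training set versus everything else.
    df_train, df_rest = train_test_split(
        df,
        train_size=frac_train,
        stratify=df[label_col],
        random_state=random_state,
    )

    # Second split: divide the remainder between validation and test.
    # The test share is rescaled so that it is relative to the remainder.
    relative_frac_test = frac_test / (frac_val + frac_test)
    df_val, df_test = train_test_split(
        df_rest,
        test_size=relative_frac_test,
        stratify=df_rest[label_col],
        random_state=random_state,
    )
    return {"train": df_train, "val": df_val, "test": df_test}


# Toy data: 60 "building" rows and 40 "no_building" rows.
toy = pd.DataFrame(
    {
        "image_id": [f"patch-{i}" for i in range(100)],
        "label": ["building"] * 60 + ["no_building"] * 40,
    }
)
splits = split_three_ways(toy)
print({name: len(part) for name, part in splits.items()})
```

The rescaling step matters: with the default fractions the remainder holds 30% of the rows, so the test set must take 0.15 / 0.30 = 0.5 of that remainder to end up with 15% of the full dataset.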
- create_dataloaders(batch_size=16, sampler='default', shuffle=False, num_workers=0, **kwargs)
Creates a dictionary containing PyTorch dataloaders, saves it as self.dataloaders and returns it.
- Parameters:
batch_size (int, optional) – The batch size to use for the dataloader. By default 16.
sampler (Sampler, str or None, optional) – The sampler to use when creating batches from the training dataset. By default “default”.
shuffle (bool, optional) – Whether to shuffle the dataset during training. By default False.
num_workers (int, optional) – The number of worker subprocesses to use for loading data. By default 0.
**kwargs – Additional keyword arguments to pass to PyTorch’s DataLoader constructor.
- Returns:
Dictionary containing dataloaders.
- Return type:
Dict
Notes
sampler will only be applied to the training dataset (datasets[“train”]).
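The note above can be illustrated with plain PyTorch. The sketch below builds one DataLoader per split and passes a sampler only to the training loader. It is an assumption-laden illustration, not MapReader's implementation: the toy TensorDataset objects stand in for the real datasets, and the inverse-class-frequency WeightedRandomSampler is one common choice picked for the example, since the text does not say what the “default” sampler is.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


def make_dataset(labels):
    """Build a toy dataset of zero-valued features paired with integer labels."""
    features = torch.zeros(len(labels), 3)
    return TensorDataset(features, torch.tensor(labels))


# Toy stand-ins for the "train", "val" and "test" datasets.
train_labels = [0] * 12 + [1] * 4
datasets = {
    "train": make_dataset(train_labels),
    "val": make_dataset([0, 1, 0, 1]),
    "test": make_dataset([0, 0, 1, 1]),
}

# Assumed sampler: weight each training example by the inverse of its
# class frequency, so that rare classes are drawn more often.
counts = Counter(train_labels)
weights = [1.0 / counts[label] for label in train_labels]
train_sampler = WeightedRandomSampler(
    weights, num_samples=len(weights), replacement=True
)

dataloaders = {
    name: DataLoader(
        dataset,
        batch_size=4,
        # Only the training loader receives the sampler.
        sampler=train_sampler if name == "train" else None,
        shuffle=False,
        num_workers=0,
    )
    for name, dataset in datasets.items()
}

print({name: len(loader) for name, loader in dataloaders.items()})
```

Note that PyTorch's DataLoader does not accept shuffle=True together with an explicit sampler, which is why shuffle stays False here.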