import os
import polars as pl

# base_dir is expected to be defined before this cell runs
img_dir = os.path.join(base_dir, "images")
bb_file = os.path.join(base_dir, "bounding_boxes.txt")
classes_translation_file = os.path.join(base_dir, "classes_fixed.txt")
Create class for our dataset
There are many options for reading in the images.
Here, we use imageio.imread from imageio, which is an excellent option because it automatically creates a NumPy ndarray, choosing a dtype based on the image, and it is faster than other options (scikit-image now uses it instead of its own implementation).
PyTorch provides torch.utils.data.Dataset, an abstract class representing a dataset. You need to write a subclass of it (let’s call it NABirdsDataset) that inherits from torch.utils.data.Dataset while matching the characteristics of our own dataset.
Load the packages:
from torch.utils.data import Dataset, DataLoader
import imageio.v3 as iio
import matplotlib.pyplot as plt
A PyTorch custom Dataset class must implement three methods:
__init__: initializes a new instance (object) of the class,
__len__: returns the number of samples in the new dataset class, and
__getitem__: loads and returns a sample from the dataset at a given index idx:
class NABirdsDataset(Dataset):
    """NABirds dataset class."""
    def __init__(self, metadata_file, data_dir, transform=None):
        ...
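As a minimal sketch of how the three methods could fit together, here is a toy map-style dataset. It is not the tutorial's NABirdsDataset: the class name ToyBirdsDataset, the list-of-dicts metadata layout, the keys path, bb and id, and the loader argument are all assumptions made for illustration. The fallback to object is only there so the sketch still runs on a machine without PyTorch.

```python
import os

try:
    from torch.utils.data import Dataset  # the real base class
except ImportError:
    Dataset = object  # fallback so this sketch runs without PyTorch


class ToyBirdsDataset(Dataset):
    """Hypothetical map-style dataset, for illustration only."""

    def __init__(self, metadata, data_dir, loader, transform=None):
        # metadata: list of dicts, one per image (assumed layout)
        # loader: callable turning a file path into an image,
        #         for example imageio.v3.imread
        self.metadata = metadata
        self.data_dir = data_dir
        self.loader = loader
        self.transform = transform

    def __len__(self):
        # Number of samples in the dataset
        return len(self.metadata)

    def __getitem__(self, idx):
        # Look up one row of metadata and load the matching image
        row = self.metadata[idx]
        path = os.path.join(self.data_dir, row["path"])
        sample = {"image": self.loader(path), "bb": row["bb"], "id": row["id"]}
        if self.transform:
            sample = self.transform(sample)
        return sample


# Usage with made-up metadata and a stand-in loader that just returns the path
toy_metadata = [
    {"path": "a.jpg", "bb": (1, 2, 3, 4), "id": "a"},
    {"path": "b.jpg", "bb": (5, 6, 7, 8), "id": "b"},
]
toy = ToyBirdsDataset(toy_metadata, "images", loader=lambda p: p)
print(len(toy), toy[1]["id"])
```

Passing the loader in as an argument keeps the sketch testable without image files; a real class would more likely call the image reader directly inside __getitem__.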
nabirds_train = NABirdsDataset(
    metadata_train,
    img_dir  # img_dir is built as os.path.join(base_dir, "images"), so base_dir is not joined again
)
Display a data sample
Let’s display the first 4 images and their bounding boxes:
import matplotlib.patches as patches  # needed for patches.Rectangle below

fig = plt.figure()

for i, sample in enumerate(nabirds_train):
    print(i, sample['image'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title(
        f"Sample {i}, identification: {sample['id']}, picture by {sample['photographer']}"
    )
    ax.axis('off')
    ax.imshow(sample['image'])

    # Draw the bounding box: anchor point (bb[0], bb[1]), then width and height
    rect = patches.Rectangle(
        (sample['bb'][0], sample['bb'][1]),
        sample['bb'][2],
        sample['bb'][3],
        linewidth=2,
        edgecolor='r',
        facecolor='none'
    )
    ax.add_patch(rect)

    # Stop after the first 4 samples
    if i == 3:
        plt.show()
        break
Notice how the images all have different sizes. This is a problem. We are also not making use of the bounding boxes that come with this dataset, so we are needlessly processing parts of the images that we know contain no bird.
We will address these problems in the next section.
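As a preview of the idea, here is a minimal sketch of cropping an image array to its bounding box. It assumes the box is stored as (x, y, width, height) in pixels with the origin at the top-left corner, which is what the Rectangle call in the plotting code suggests. The helper name crop_to_bb is hypothetical, and the approach taken in the next section may differ.

```python
import numpy as np


def crop_to_bb(image, bb):
    """Return the part of `image` inside `bb` = (x, y, width, height)."""
    x, y, w, h = (int(v) for v in bb)
    # Rows are indexed by y and columns by x in a (height, width, channels) array
    return image[y:y + h, x:x + w]


# Synthetic stand-in for an image: 10 rows, 8 columns, 3 channels
image = np.arange(10 * 8 * 3).reshape(10, 8, 3)
cropped = crop_to_bb(image, (2, 3, 4, 5))
print(cropped.shape)  # (5, 4, 3)
```

Cropping alone does not make all images the same size, since the boxes themselves differ; a resizing step would still be needed afterwards.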