Configuration file in the TensorFlow Object Detection API

gandham vignesh babu
3 min read · Jan 24, 2020
  1. num_classes defines the number of object classes we are detecting.
  2. image_resizer resizes the input image to a fixed size while maintaining the aspect ratio.
  3. feature_extractor is the backbone network that the image is passed through to generate the feature map. In this case we are using ResNet, though VGG16 is also commonly used.
  4. first_stage_features_stride: the cells of the feature map are what we use to generate the anchor boxes, and anchors are generated by sliding a window over the image. The stride tells us how many pixels to move in the original image when stepping from one feature-map cell to the next; if the stride is 16, we move 16 pixels. A sketch of how these fields appear in the config file follows this list.
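
For reference, here is a minimal, abbreviated sketch of how these fields appear in a Faster R-CNN pipeline.config (field values are illustrative, not taken from a specific model):

model {
  faster_rcnn {
    num_classes: 37
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
  }
}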
We are using smaller anchor boxes here, so in the Python-style config below we reduce the scales from the paper's values:

# Anchor box scales
# Note that if im_size is smaller, anchor_box_scales should be scaled accordingly
# The original anchor_box_scales in the paper are [128, 256, 512]
self.anchor_box_scales = [64, 128, 256]
These are the aspect ratios of the anchor boxes we generate: 1:1, 1:2 and 2:1. The config file we have written uses the same three aspect ratios:

# Anchor box ratios
self.anchor_box_ratios = [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]
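
The sqrt(2) factors keep the area of each anchor constant across aspect ratios. A quick check in plain Python (illustrative, not from the article):

import math

scale = 128
for rw, rh in [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]:
    w, h = scale * rw, scale * rh
    print(f"{w:.1f} x {h:.1f}, area = {w * h:.0f}")  # area is always 16384 = 128**2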

Calculation of the stride:

First, calculate the feature-map size produced by the network you are using.
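
A minimal sketch of reading the feature-map size off a Keras backbone (this assumes a VGG16 backbone cut at a total stride of 16; the exact size depends on the network and its padding):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

vgg = VGG16(include_top=False, input_shape=(1700, 2200, 3))
# Cut the network before the final pooling layer, giving a total stride of 16
feature_extractor = Model(vgg.input, vgg.get_layer("block5_conv3").output)
print(feature_extractor.output_shape)  # (None, 106, 137, 512)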

We have to generate anchor boxes for every unit in the feature map. For each unit we generate number_of_aspect_ratios * number_of_scales anchor boxes. We choose the anchor scale, say 128 or 256, based on the size of the objects: if the objects are big we choose 256, and if they are small we choose 128.
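
A minimal sketch of this generation loop (plain Python; variable names are assumed, not from the article):

import math

anchor_scales = [64, 128, 256]
anchor_ratios = [[1, 1],
                 [1. / math.sqrt(2), 2. / math.sqrt(2)],
                 [2. / math.sqrt(2), 1. / math.sqrt(2)]]
stride = 16

def generate_anchors(fmap_h, fmap_w):
    anchors = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # centre of this feature-map cell, in original-image coordinates
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for scale in anchor_scales:
                for rw, rh in anchor_ratios:
                    w, h = scale * rw, scale * rh
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# 3 scales x 3 ratios = 9 anchors per feature-map cell
print(len(generate_anchors(107, 134)))  # 107 * 134 * 9 = 129,042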

Why do we use the stride in the configuration?

We do this so that the generated anchors reach every corner of the original image. The optimum value of the stride is calculated from the output shape of the feature map.

If the feature map is of size 107 x 134 and the image is of size 1700 x 2200, then the stride is roughly (round(1700/107), round(2200/134)) = (16, 16).
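
The same calculation in plain Python:

img_h, img_w = 1700, 2200
fmap_h, fmap_w = 107, 134
stride = (round(img_h / fmap_h), round(img_w / fmap_w))
print(stride)  # (16, 16)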

General configuration file:

Here we are extracting the features using the ResNet architecture; we can also extract the features using VGGNet.

Do we have dense layers when generating the region proposal network?

No, the RPN itself is fully convolutional. But the final classification head does have dense layers, whose output is reshaped to n * c, where n is the number of region proposals generated and c is the number of classes.
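
A minimal sketch of this split (Keras; layer sizes are illustrative, not the article's exact model). The RPN head uses only convolutions, while the classification head applies Dense layers to each proposal, giving an output of shape (n, c):

from tensorflow.keras import layers

num_anchors = 9  # 3 scales x 3 aspect ratios

def rpn_head(feature_map):
    x = layers.Conv2D(512, (3, 3), padding="same", activation="relu")(feature_map)
    objectness = layers.Conv2D(num_anchors, (1, 1), activation="sigmoid")(x)  # object vs background
    bbox_deltas = layers.Conv2D(num_anchors * 4, (1, 1))(x)                   # box regression, no Dense layers
    return objectness, bbox_deltas

def classifier_head(pooled_rois, num_classes):
    # pooled_rois: (batch, n_proposals, pool_h, pool_w, channels) after ROI pooling
    x = layers.TimeDistributed(layers.Flatten())(pooled_rois)
    x = layers.TimeDistributed(layers.Dense(4096, activation="relu"))(x)
    class_scores = layers.TimeDistributed(layers.Dense(num_classes, activation="softmax"))(x)
    return class_scores  # shape (batch, n_proposals, num_classes)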

