Object Detection: YOLO (You Only Look Once)

gandham vignesh babu
2 min read · Jan 30, 2020

What is YOLO?

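YOLO is an object detection algorithm that gives good accuracy at high speed. It predicts all bounding boxes and their class probabilities in a single forward pass of one network, which is where the name comes from.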

What exactly happens in YOLO?

We divide the image into cells arranged in a grid, for example a 3×3 grid. Each grid cell has its own set of anchor boxes, and each cell gets a label vector (the "class labels" below). For a cell that contains no object, pc = 0 and the remaining entries of its label vector are "don't cares": they can hold any value, such as small random numbers, because the loss ignores them.

The length of this label vector depends on the number of anchor boxes.

If we have two anchor boxes and three object classes, the label vector of each cell is [pc, x, y, w, h, c1, c2, c3, pc, x, y, w, h, c1, c2, c3], with one block of eight values per anchor.

pc is the confidence score that an object is present in that anchor slot (1 in the training label when there is one, 0 otherwise).

x, y, w, h describe the bounding box (its center coordinates, width and height). c1, c2, c3 indicate which class the object belongs to, such as dog or cat: if the object is of class 1, then c1 = 1, c2 = 0, c3 = 0.
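
As a minimal sketch, assuming two anchors, three classes, and the per-anchor ordering [pc, x, y, w, h, c1, c2, c3] described above, one cell's label vector could be built as follows. The function name `cell_label` and the `(anchor_index, box, class_index)` input format are hypothetical and chosen only for illustration. Anchor slots without an object keep pc = 0, and their don't-care entries are simply left at 0 here.

```python
NUM_ANCHORS = 2
NUM_CLASSES = 3
SLOT = 5 + NUM_CLASSES  # pc, x, y, w, h, then one entry per class


def cell_label(objects):
    """Build the label vector for one grid cell.

    `objects` is a list of (anchor_index, (x, y, w, h), class_index) tuples
    (a hypothetical input format). Anchor slots without an object keep
    pc = 0; their other entries are don't-cares, left at 0 here.
    """
    label = [0.0] * (NUM_ANCHORS * SLOT)
    for anchor, (x, y, w, h), cls in objects:
        start = anchor * SLOT
        label[start] = 1.0                          # pc: an object is present
        label[start + 1:start + 5] = [x, y, w, h]   # box coordinates
        label[start + 5 + cls] = 1.0                # one-hot class entry
    return label


# A class-1 object (e.g. a dog) matched to the first anchor; second anchor empty.
print(cell_label([(0, (0.4, 0.3, 0.9, 0.5), 0)]))
```

The coordinate values in the example are arbitrary; how x, y, w, h are normalized (relative to the cell or to the image) depends on the YOLO version.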

What does the training data look like in YOLO?

As usual, for training we provide the image, the coordinates of each object, and the label associated with those coordinates. These are then transformed into targets according to the grid size we are using.

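The target for each image is a tensor of shape S × S × B × (5 + C), where S is the grid size, B is the number of anchor boxes per cell and C is the number of classes; the 5 values per anchor are pc, x, y, w and h. For a 3×3 grid with two anchors and three classes this is 3 × 3 × 2 × 8, i.e. 3 × 3 × 16. (The original YOLOv1 instead uses S × S × (B × 5 + C), with one set of class scores shared by the B boxes of a cell.)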
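
The shape arithmetic can be checked with a few lines of Python. This is a sketch with hypothetical helper names. `target_shape` follows the per-anchor layout used in this post, and `yolov1_shape` shows the original YOLOv1 layout, where the class scores are shared by all boxes in a cell.

```python
def target_shape(grid_size, num_anchors, num_classes):
    """Per-anchor layout: each anchor slot stores pc, x, y, w, h plus C class scores."""
    return (grid_size, grid_size, num_anchors * (5 + num_classes))


def yolov1_shape(grid_size, num_boxes, num_classes):
    """YOLOv1 layout: B boxes of 5 values each, plus one set of C class scores per cell."""
    return (grid_size, grid_size, num_boxes * 5 + num_classes)


print(target_shape(3, 2, 3))   # (3, 3, 16): 3x3 grid, 2 anchors, 3 classes
print(yolov1_shape(7, 2, 20))  # (7, 7, 30): the YOLOv1 setting on PASCAL VOC
```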

Each object in a training image is assigned to the grid cell that contains the object's midpoint, and to the anchor box of that cell that has the highest IoU with the object's ground-truth box.
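
Here is a minimal sketch of this assignment. It rests on a few assumptions that the post does not state. Boxes are given in pixels as (center x, center y, width, height). Anchors are (width, height) pairs in pixels. Anchor matching compares shapes only, by placing the anchor and the box at the same center, which is a common choice in YOLOv2-style implementations. The function names and anchor sizes are hypothetical.

```python
def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center, so only width/height matter."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def assign(box, image_size, grid_size, anchors):
    """Return (row, col, anchor_index) for one ground-truth box.

    box: (cx, cy, w, h) in pixels; anchors: list of (w, h) in pixels.
    """
    cx, cy, w, h = box
    cell = image_size / grid_size
    # The cell that contains the object's midpoint (clamped to the last cell).
    col = min(int(cx // cell), grid_size - 1)
    row = min(int(cy // cell), grid_size - 1)
    # The anchor whose shape overlaps the box best.
    best = max(range(len(anchors)), key=lambda i: shape_iou(w, h, *anchors[i]))
    return row, col, best


anchors = [(30.0, 60.0), (60.0, 30.0)]  # hypothetical: one tall, one wide anchor
print(assign((80.0, 20.0, 50.0, 25.0), image_size=100, grid_size=3, anchors=anchors))
# (0, 2, 1): top-right cell, wide anchor
```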

Can someone tell me: during training we use the term anchor box, and at prediction time it becomes a bounding box. Is that right? At prediction time anchor boxes are not used, only bounding boxes, right?

REPLY

Largely, yes. The anchor box is used during training to check how well it matches (by IoU) the ground-truth bounding box. If the IoU between an anchor box and the ground-truth box of a particular object is greater than 0.5, we consider that anchor, and its label is [object confidence = 1, the bounding-box coordinates of that object, 1 for that object's class and 0 for the remaining classes]. At prediction time the network outputs bounding boxes, although in YOLOv2 and later the predicted width and height are expressed relative to the anchor dimensions, so the anchor shapes are still used to decode them.
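
The IoU check described in the reply can be sketched as follows. This assumes boxes in corner format (x1, y1, x2, y2). The function name `iou` and the constant `IOU_THRESHOLD` are illustrative and simply mirror the 0.5 threshold mentioned above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


IOU_THRESHOLD = 0.5
anchor = (20, 20, 60, 80)        # an anchor box placed in some cell
ground_truth = (25, 30, 65, 85)  # a ground-truth object box
score = iou(anchor, ground_truth)
print(round(score, 3), score > IOU_THRESHOLD)  # 0.614 True
```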

Note:

  1. Anchors are representative bounding-box shapes (widths and heights), computed by running k-means clustering on the ground-truth boxes of a dataset such as COCO.
  2. If the input image is 100×100 and it is divided into a 3×3 grid, each cell covers 100/3 × 100/3, i.e. about 33.3 × 33.3 pixels.
  3. Non-max suppression can be used to solve the problem of multiple detections of the same object. It can be applied independently for each class, and it uses intersection over union (IoU) to decide which overlapping boxes to suppress.
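
Greedy per-class non-max suppression can be sketched in plain Python as below. The sketch makes three assumptions. Each detection is a (score, class_id, box) tuple, with the box in corner format (x1, y1, x2, y2). A box is discarded when its IoU with a kept box of the same class exceeds `iou_threshold`. That threshold is 0.5 here, a common but not mandatory choice. The function names are hypothetical.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_per_class(detections, iou_threshold=0.5):
    """Per class: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    by_class = defaultdict(list)
    for det in detections:
        by_class[det[1]].append(det)
    kept = []
    for dets in by_class.values():
        remaining = sorted(dets, key=lambda d: d[0], reverse=True)
        while remaining:
            best = remaining.pop(0)
            kept.append(best)
            remaining = [d for d in remaining if iou(best[2], d[2]) <= iou_threshold]
    return kept


detections = [
    (0.9, "dog", (10, 10, 50, 50)),
    (0.8, "dog", (12, 12, 52, 52)),    # near-duplicate of the first dog box: dropped
    (0.7, "cat", (12, 12, 52, 52)),    # same place, different class: kept
    (0.6, "dog", (100, 100, 140, 140)),
]
print(nms_per_class(detections))
```

Because suppression only compares boxes of the same class, the cat box survives even though it overlaps the dog box almost completely. In practice a confidence threshold is usually applied to the scores before running NMS.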

gandham vignesh babu

Data scientist and machine learning engineer with a strong math background. Working on various problem statements involving modeling, data processing and data min