What's the difference between BB regression algorithms used in R-CNN variants vs BB in YOLO localization techniques?

0 votes


What's the difference between the bounding box (BB) produced by "BB regression algorithms in region-based object detectors" and the "bounding box in single-shot detectors"? Can they be used interchangeably, and if not, why?

While studying variants of the R-CNN and YOLO algorithms for object detection, I came across two major techniques for performing object detection, i.e. region-based (R-CNN) and niche sliding-window based (YOLO).

Both regimes have many variants (from complicated to simple), but in the end they all just localize objects in the image using bounding boxes. Below I focus only on localization (assuming classification is happening as well), since that is what the question is about, and briefly explain my understanding:

  • Region-based:

    • Here, we let the neural network predict continuous variables (the BB coordinates) and refer to that as regression.
    • The regression that is defined (which is not linear at all) is just a CNN or another variant (all layers are differentiable). Its outputs are four values (𝑟,𝑐,ℎ,𝑤), where (𝑟,𝑐) specify the position of the left corner and (ℎ,𝑤) the height and width of the BB.
    • To train this NN, a smooth L1 loss is used to learn the precise BB by penalizing outputs of the NN that are very different from the labeled (𝑟,𝑐,ℎ,𝑤) in the training set.
  • Niche sliding-window (convolutionally implemented) based:

    • First, we divide the image into, say, 19*19 grid cells.
    • An object is assigned to a grid cell by taking the midpoint of the object and assigning the object to whichever grid cell contains that midpoint. So each object, even if it spans multiple grid cells, is assigned to only one of the 19 by 19 grid cells.
    • Now, you take the two coordinates of this grid cell and calculate the precise BB (bx, by, bh, bw) for that object using some method such as
    • (bx, by, bh, bw) are relative to the grid cell: bx and by give the center point, and bh and bw give the height and width of the precise BB, each specified as a fraction of the corresponding grid-cell dimension, so bh and bw can be > 1.
    • There are multiple ways of calculating the precise BB specified in the paper.
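
To make my understanding of the grid-cell encoding concrete, here is a minimal sketch in plain Python. It is not taken from any paper or library: the function names, the 608x608 image size, and the assumption that (r, c) is the top-left corner given as (row, column) in pixels are all mine, for illustration only.

```python
def encode_box_to_grid(r, c, h, w, img_h, img_w, grid=19):
    """Encode a pixel box (r, c, h, w) relative to the grid cell holding its midpoint.

    Assumption (mine): (r, c) is the top-left corner as (row, column) in pixels.
    Returns the (row, col) index of the responsible cell and (bx, by, bh, bw).
    """
    cell_h = img_h / grid
    cell_w = img_w / grid

    # Midpoint of the object decides which single cell "owns" it.
    mid_r = r + h / 2.0
    mid_c = c + w / 2.0
    row = min(int(mid_r // cell_h), grid - 1)
    col = min(int(mid_c // cell_w), grid - 1)

    # Centre offsets lie in [0, 1); sizes are in units of one cell and may exceed 1.
    by = mid_r / cell_h - row
    bx = mid_c / cell_w - col
    bh = h / cell_h
    bw = w / cell_w
    return (row, col), (bx, by, bh, bw)


def decode_box_from_grid(cell, box, img_h, img_w, grid=19):
    """Invert encode_box_to_grid: recover the pixel box (r, c, h, w)."""
    row, col = cell
    bx, by, bh, bw = box
    cell_h = img_h / grid
    cell_w = img_w / grid
    h = bh * cell_h
    w = bw * cell_w
    r = (row + by) * cell_h - h / 2.0
    c = (col + bx) * cell_w - w / 2.0
    return r, c, h, w


if __name__ == "__main__":
    cell, box = encode_box_to_grid(100, 200, 64, 48, img_h=608, img_w=608)
    print(cell, box)
    print(decode_box_from_grid(cell, box, img_h=608, img_w=608))
```

If this sketch is right, the two representations carry the same information about one box: the round trip recovers the original (r, c, h, w), so the difference would lie in how the network is asked to predict the numbers rather than in the box itself.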

Both algorithms:

  • output precise bounding boxes.

  • work in supervised learning settings: they use labeled datasets where the labels are bounding boxes (manually marked by an annotator using tools like labelImg) stored for each image in a JSON/XML file.
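
Since both approaches are trained against labeled boxes, the smooth L1 loss I mentioned for the region-based case can be sketched as below. This is only my reading of the loss, written in plain Python with a hypothetical function name; as far as I understand, the R-CNN papers apply it to normalized offsets relative to a region proposal rather than to raw pixel values, which I use here only for simplicity.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the box values.

    Quadratic for small errors (|diff| < beta), linear for large ones,
    so a badly wrong box does not dominate the gradient the way a
    squared error would. beta=1.0 is an assumed default.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total


if __name__ == "__main__":
    predicted = (10.0, 20.0, 30.0, 40.0)   # network output (r, c, h, w)
    labeled = (10.5, 20.0, 33.0, 40.0)     # annotated (r, c, h, w)
    print(smooth_l1(predicted, labeled))
```

The small error in r contributes quadratically and the larger error in h contributes linearly.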

I am trying to understand the two localization techniques on a more abstract level (as well as getting an in-depth idea of both techniques) to get more clarity on:

  • in what sense they are different,

  • why two were created, i.e. what are the failure/success points of one compared with the other, and

  • whether they can be used interchangeably, and if not, why.

Please feel free to correct me if I am wrong somewhere; feedback is highly appreciated! Citations of a particular section of a research paper would be even more rewarding!

Apr 11, 2022 in Machine Learning by Dev
• 6,000 points


