Question:

What's the difference between the bounding box (BB) produced by "BB regression algorithms in region-based object detectors" and the "bounding box in single-shot detectors"? Can they be used interchangeably, and if not, why?

While studying variants of the R-CNN and YOLO algorithms for object detection, I came across two major approaches to object detection: region-based (R-CNN) and sliding-window based (YOLO).

Both regimes come in different variants (from complicated to simple), but in the end they all localize objects in the image using bounding boxes! Below I focus only on localization (assuming classification is already happening), since that is what the question is about, and briefly explain my understanding:

Region-based:

• Here, we let the neural network predict continuous variables (the BB coordinates) and refer to that as regression.
• The regression model (which is not linear at all) is just a CNN or one of its variants (all layers are differentiable). It outputs four values (r, c, h, w), where (r, c) is the position of the top-left corner and (h, w) is the height and width of the BB.
• To train this network, a smooth L1 loss is used to learn the precise BB: it penalizes outputs that are far from the labeled (r, c, h, w) in the training set.
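To make the regression targets concrete, here is a minimal Python sketch. One point worth flagging: in the R-CNN and Fast R-CNN papers the regressor does not output the raw (r, c, h, w) directly; it predicts offsets relative to a region proposal (center shifts scaled by the proposal's size, plus log-scale changes in height and width), and Fast R-CNN applies smooth L1 to those offsets. The helper names `encode`, `decode` and `smooth_l1` are hypothetical, not a library API.

```python
import math

# Boxes are (r, c, h, w): (r, c) = top-left corner (row, column), (h, w) = height, width.
# Targets follow the R-CNN-style offset parameterization relative to a proposal box.

def to_center(box):
    """Convert (r, c, h, w) corner format to (cy, cx, h, w) center format."""
    r, c, h, w = box
    return r + h / 2.0, c + w / 2.0, h, w

def encode(proposal, gt):
    """Regression targets (t_y, t_x, t_h, t_w) that map `proposal` onto `gt`."""
    py, px, ph, pw = to_center(proposal)
    gy, gx, gh, gw = to_center(gt)
    return ((gy - py) / ph,       # vertical center shift, scaled by proposal height
            (gx - px) / pw,       # horizontal center shift, scaled by proposal width
            math.log(gh / ph),    # log-scale height change
            math.log(gw / pw))    # log-scale width change

def decode(proposal, t):
    """Apply predicted offsets t to `proposal`; returns a box in (r, c, h, w) format."""
    py, px, ph, pw = to_center(proposal)
    ty, tx, th, tw = t
    cy, cx = py + ty * ph, px + tx * pw
    h, w = ph * math.exp(th), pw * math.exp(tw)
    return cy - h / 2.0, cx - w / 2.0, h, w

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 summed over coordinates: quadratic for |d| < beta, linear beyond."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

proposal = (10, 20, 40, 30)   # hypothetical region proposal
gt = (12, 18, 50, 36)         # hypothetical labeled box
targets = encode(proposal, gt)
print(decode(proposal, targets))  # recovers gt up to floating-point error
```

The quadratic region keeps gradients small for nearly correct boxes, while the linear region makes the loss less sensitive to outliers than plain L2.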
Sliding-window based (convolutionally implemented!):

• First, divide the image into a grid of, say, 19 × 19 cells.
• To assign an object to a grid cell, take the midpoint of the object and assign the object to whichever grid cell contains that midpoint. So each object is assigned to exactly one of the 19 × 19 grid cells, even if it spans multiple cells.
• Then, relative to this grid cell, compute the precise BB (bx, by, bh, bw) for that object; the paper specifies multiple ways of calculating it.
• (bx, by) is the center point and (bh, bw) the height and width of the precise BB, all specified relative to the grid cell: bx and by lie between 0 and 1, while bh and bw are expressed as fractions of the grid cell's size and can be > 1.
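A minimal sketch of the grid assignment and encoding described above, assuming an image of `img_w` × `img_h` pixels, an S × S grid, and a labeled box given in absolute pixels as (x_min, y_min, x_max, y_max). The function names are hypothetical. Note that this follows the convention in the question (sizes in units of the cell); the original YOLO (v1) paper instead predicts width and height relative to the whole image.

```python
def assign_and_encode(box, img_w, img_h, S=19):
    """Return ((row, col), (bx, by, bh, bw)) for the single cell that owns `box`."""
    x_min, y_min, x_max, y_max = box
    cell_w, cell_h = img_w / S, img_h / S
    # The object's midpoint decides which one cell it is assigned to.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    col = min(int(cx // cell_w), S - 1)   # clamp in case the midpoint lies on the far edge
    row = min(int(cy // cell_h), S - 1)
    bx = cx / cell_w - col                # center offset inside the cell, between 0 and 1
    by = cy / cell_h - row
    bw = (x_max - x_min) / cell_w         # size in units of the cell; may exceed 1
    bh = (y_max - y_min) / cell_h
    return (row, col), (bx, by, bh, bw)

def decode_cell(cell, enc, img_w, img_h, S=19):
    """Invert assign_and_encode back to absolute (x_min, y_min, x_max, y_max)."""
    row, col = cell
    bx, by, bh, bw = enc
    cell_w, cell_h = img_w / S, img_h / S
    cx, cy = (col + bx) * cell_w, (row + by) * cell_h
    w, h = bw * cell_w, bh * cell_h
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# A 608 x 608 image with a 19 x 19 grid gives 32-pixel cells.
cell, enc = assign_and_encode((100, 200, 300, 260), 608, 608)
print(cell, enc)  # (7, 6) (0.25, 0.1875, 1.875, 6.25)
```

The box here spans several cells horizontally (bw = 6.25), yet only cell (7, 6), which holds its midpoint, is responsible for predicting it.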

Both algorithms:

• Output precise bounding boxes.

• Work in a supervised learning setting: they are trained on a labeled dataset where the labels are bounding boxes (manually marked by an annotator using tools like labelImg), stored for each image in a JSON/XML file.
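Since both families start from the same kind of labels, a small sketch of that shared starting point: parse one annotation and convert it to the (r, c, h, w) format used in the region-based description. Assumptions: the annotation uses the PASCAL VOC XML layout (one of the formats labelImg can save), and the filename, class name and coordinates below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical PASCAL VOC-style annotation for one image with one object.
VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>608</width><height>608</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""

def load_boxes(xml_text):
    """Return a list of (label, (x_min, y_min, x_max, y_max)) tuples in pixels."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(float(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, coords))
    return boxes

def to_rchw(coords):
    """Corner coordinates -> (r, c, h, w): top-left row/column plus height/width."""
    x_min, y_min, x_max, y_max = coords
    return y_min, x_min, y_max - y_min, x_max - x_min

for label, coords in load_boxes(VOC_XML):
    print(label, to_rchw(coords))  # car (200.0, 100.0, 60.0, 200.0)
```

From this point the two families diverge only in how they turn the same absolute box into training targets: offsets relative to a proposal on one side, offsets relative to a grid cell on the other.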

I am trying to understand the two localization techniques at a more abstract level (as well as in depth) to get more clarity on:

• In what sense are they different?

• Why were two approaches created? That is, what are the failure/success points of one relative to the other?

• Can they be used interchangeably? If not, why?

Please feel free to correct me if I am wrong somewhere; feedback is highly appreciated! Citations to specific sections of research papers would be especially welcome.

Apr 11, 2022
