<div dir="ltr"><div><div><div>I have an application where I would like to significantly reduce the resolution of an image using pyramid reductions, run a detector on that reduced-resolution image, and then warp the resulting bounding box back up to the native-resolution coordinate system. However, I have some questions about how to handle the bounding box coordinates and how detectors treat, or should treat, these coordinates.<br><br>I wrote a new OCV image_pyramid process, which can be configured to apply multiple iterations of OCV's pyrUp or pyrDown and can emit a homography representing the warping from the output image back to the source image (there is actually a 0.5 pixel translation on each level, so it's not just a scaling). I am also writing a <b>detected_object_bounding_box_warp</b> node to apply a homography to each bounding box in a detected_object_set. Basically, it will warp the top-left and bottom-right coordinates with the homography. I admit it is a little weird to apply a homography to a bounding box that has to remain an axis-aligned rectangle, so maybe there should be a discussion about renaming or reworking this.<br><br></div>My question is about the precise definition of the bounding box, which encodes the upper-left and lower-right coordinates of the box. Do we enforce a standard for how detectors populate these bounding boxes? Say a detection covers a rectangle of pixels from upper-left pixel indices (x1,y1) to lower-right (x2,y2), inclusive. With pixel centers at integer coordinates, I would think the bounding box should run from upper-left (x1-0.5,y1-0.5) to lower-right (x2+0.5,y2+0.5), which is the box in image coordinates that completely contains the area of those pixels. A difference of half a pixel may seem trivial, but in my case I am detecting on a highly reduced version of the image and then upscaling the bounding boxes, so that half-pixel difference can be compounded many times.<br><br></div>Thanks,<br></div>Matt<br></div>
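<div dir="ltr">P.S. To make the half-pixel concern concrete, here is a minimal sketch, not the actual image_pyramid or detected_object_bounding_box_warp code. It assumes pixel centers sit at integer coordinates and that each reduction level maps reduced coordinates back to source coordinates as x_src = 2*x + 0.5, which is one reading of the 0.5 pixel translation described above rather than a verified property of pyrDown. The helper names (level_homography, pixels_to_box, warp_box) are hypothetical.</div>

```python
import numpy as np


def level_homography(scale=2.0, shift=0.5):
    """Assumed map from one reduced level back to its source level."""
    return np.array([[scale, 0.0, shift],
                     [0.0, scale, shift],
                     [0.0, 0.0, 1.0]])


def pixels_to_box(x1, y1, x2, y2):
    """Box enclosing the full area of pixels (x1, y1)..(x2, y2), inclusive."""
    return (x1 - 0.5, y1 - 0.5, x2 + 0.5, y2 + 0.5)


def warp_box(H, box):
    """Warp the two box corners with H and return an axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, x2],
                        [y1, y2],
                        [1.0, 1.0]])
    warped = H @ corners
    xs, ys = warped[0] / warped[2], warped[1] / warped[2]
    return (xs.min(), ys.min(), xs.max(), ys.max())


# Three reduction levels: the composed map is x_src = 8*x + 3.5.
levels = 3
H = np.linalg.matrix_power(level_homography(), levels)

# A detection covering the single reduced pixel (0, 0).
area_box = warp_box(H, pixels_to_box(0, 0, 0, 0))  # half-pixel-padded box
index_box = warp_box(H, (0, 0, 0, 0))              # raw pixel indices as the box

print(area_box)
print(index_box)
```

<div dir="ltr">Under these assumptions, the padded box warps to the source region spanning pixels 0 through 7, while the index-only box collapses to a single point at the center of that region, so each edge is off by half a pixel times the total scale factor.</div>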