From matt.brown at kitware.com  Mon Mar 12 10:59:39 2018
From: matt.brown at kitware.com (Matt Brown)
Date: Mon, 12 Mar 2018 10:59:39 -0400
Subject: [Kwiver-users] Bounding box coordinates
Message-ID: <CAMo5cUhXD1wGOA0iBBGTVdcZMMXwZ4CKdY8fmYPSup09g76JKA@mail.gmail.com>

I have an application where I would like to significantly reduce the
resolution of an image using pyramid reductions, run a detector on that
reduced-resolution image, and then warp the resulting bounding box back up
to the native-resolution coordinate system. However, I have some questions
about dealing with the bounding box coordinates and how detectors treat, or
should treat, these coordinates.

I wrote a new OCV image_pyramid process, which can be configured to apply
multiple iterations of OCV's pyrUp or pyrDown and can emit a homography
(there is actually a 0.5 pixel translation on each level, so it's not just
a scaling) representing the warping from output image back to source image.
I am also writing a *detected_object_bounding_box_warp* node to apply a
homography to each bounding box in a detected_object_set. Basically, it
will warp the top-left and bottom-right coordinates with the homography.
Though, I admit it is a little weird applying a homography to a bounding
box that has to remain an axis-aligned rectangle. So, maybe there is a
discussion about renaming or reworking this.
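
For illustration, the pyramid-to-source mapping and the box warp described
above could be sketched as follows. This is a minimal sketch, not the actual
process code: it assumes integer coordinates sit at pixel centers and that
each pyrDown level maps back to its input as x_src = 2*x_dst + 0.5, and the
helper names (level_to_source, pyramid_to_source, warp_box) are made up for
this example. To keep the result an axis-aligned rectangle, it warps all
four corners and takes their enclosing box.

```python
def level_to_source(scale=2.0, shift=0.5):
    """3x3 homography from one reduced level back to its input level.

    Assumed convention: x_src = scale * x_dst + shift (same for y).
    """
    return [[scale, 0.0, shift],
            [0.0, scale, shift],
            [0.0, 0.0, 1.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pyramid_to_source(levels):
    """Compose the per-level homographies for `levels` reductions."""
    h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(levels):
        h = matmul3(h, level_to_source())
    return h


def warp_point(h, x, y):
    """Apply homography h to (x, y), including the homogeneous divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def warp_box(h, box):
    """Warp (x_min, y_min, x_max, y_max) and return the enclosing
    axis-aligned box of the four warped corners."""
    x0, y0, x1, y1 = box
    pts = [warp_point(h, x, y) for x in (x0, x1) for y in (y0, y1)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


h = pyramid_to_source(3)          # three reductions: x_src = 8 * x + 3.5
box_native = warp_box(h, (1.0, 2.0, 3.0, 4.0))
```

For a pure scale-plus-translation mapping like this one, warping only the
top-left and bottom-right corners gives the same result; the four-corner
version only matters if the homography ever includes rotation or
perspective.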

My question is related to the precise definition of the bounding box, which
encodes the upper left and lower right coordinates of the box. Do we have a
standard enforced for how detectors populate these bounding boxes? Let's
say you have a detection covering a rectangle of pixels from upper-left
pixel indices (x1,y1) to lower-right (x2,y2), inclusive. I would think the
bounding box should then run from upper-left (x1-0.5,y1-0.5) to lower-right
(x2+0.5,y2+0.5), which is the box in image coordinates completely
containing the area of the pixels. A difference of a half pixel may seem
trivial, but in my case, I am detecting on a highly reduced version of the
image and then upscaling the bounding boxes, so that half pixel of
difference can be compounded many times.
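
To make the compounding concrete, here is a small sketch comparing the two
conventions after upscaling. It assumes pixel centers at integer coordinates
and a hypothetical per-level mapping x_src = 2*x_dst + 0.5; the function
names are illustrative only and are not part of any existing API.

```python
def pixel_extent_box(x1, y1, x2, y2):
    """Box enclosing the full area of inclusive pixel indices."""
    return (x1 - 0.5, y1 - 0.5, x2 + 0.5, y2 + 0.5)


def to_source(v, levels):
    """Map a coordinate from `levels` reductions down back to native
    resolution, assuming x_src = 2 * x_dst + 0.5 at every level."""
    for _ in range(levels):
        v = 2.0 * v + 0.5
    return v


levels = 3
index_box = (10, 20, 12, 22)                  # raw pixel indices as the box
extent_box = pixel_extent_box(*index_box)     # half-pixel padded box

native_from_index = tuple(to_source(v, levels) for v in index_box)
native_from_extent = tuple(to_source(v, levels) for v in extent_box)

# Gap between the two conventions on the left edge, in native pixels.
left_edge_gap = native_from_index[0] - native_from_extent[0]
```

Under these assumptions, three reduction levels turn the half-pixel offset
into a 4-pixel gap per edge at native resolution (2^3 / 2), so the box
grows or shrinks by 8 pixels in each dimension depending on the convention.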

Thanks,
Matt

From paul.tunison at kitware.com  Mon Mar 12 12:05:35 2018
From: paul.tunison at kitware.com (Paul Tunison)
Date: Mon, 12 Mar 2018 12:05:35 -0400
Subject: [Kwiver-users] Bounding box coordinates
In-Reply-To: <CAMo5cUhXD1wGOA0iBBGTVdcZMMXwZ4CKdY8fmYPSup09g76JKA@mail.gmail.com>
References: <CAMo5cUhXD1wGOA0iBBGTVdcZMMXwZ4CKdY8fmYPSup09g76JKA@mail.gmail.com>
Message-ID: <CAEYMaswNj71ToK6NWEfK-vzCqNxP5t7qQ-2CJu9FU0GTCtBGCA@mail.gmail.com>

I don't remember the name, but the process could be constrained to only
allow the type of transform that allows translation and scaling (no
rotation, etc.).
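
A check along those lines might look like the sketch below. It is only an
illustration: the function name and tolerance are made up, and it simply
tests that a 3x3 homography has no rotation, shear, or perspective terms,
so an axis-aligned box stays axis-aligned.

```python
def is_scale_and_translation(h, tol=1e-9):
    """True if 3x3 matrix h only scales (positively) and translates,
    i.e. no rotation, shear, mirroring, or perspective terms."""
    w = h[2][2]
    if abs(w) <= tol:
        return False
    off_diagonal = (h[0][1], h[1][0], h[2][0], h[2][1])
    if any(abs(v / w) > tol for v in off_diagonal):
        return False
    # Positive scales keep top-left as top-left (no mirroring).
    return h[0][0] / w > 0 and h[1][1] / w > 0


ok = is_scale_and_translation([[8.0, 0.0, 3.5],
                               [0.0, 8.0, 3.5],
                               [0.0, 0.0, 1.0]])
rotated = is_scale_and_translation([[0.0, -1.0, 0.0],
                                    [1.0, 0.0, 0.0],
                                    [0.0, 0.0, 1.0]])
```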


-- 
Paul Tunison
Senior R&D Engineer
Kitware, Inc.
28 Corporate Drive
Clifton Park, NY 12065 USA

Phone: (518) 371-3971 Ext.164