I. INTRODUCTION
The MPEG-4 standard is a generic coding standard that, unlike earlier frame-based standards, permits an object-based
audio-visual representation of data. The advantages of this object-based
representation are the following. First, different object types may have
different suitable coded representations. Coding them separately could achieve
better compression efficiency. Second, it allows harmonious integration of
different types of data into one scene.
Each
video sequence may have several video objects. A video object at a given time
instant is called a video object plane (VOP). VOPs can have arbitrary shapes
and can be encoded independently of each other, or dependently using motion
compensation. Compressed video objects are very vulnerable to channel errors;
hence, error concealment techniques are needed to mitigate the effects of data
corruption or loss. In this paper, we describe an error concealment technique
for object-based video sequences coded using the MPEG-4 video coding standard.
II. SHAPE CODING IN MPEG-4
MPEG-4 adopted alpha planes for shape
representation. There are two types of alpha planes: binary alpha planes and
grayscale alpha planes. Binary alpha planes are binary images representing the
shape of the VOPs. Grayscale alpha planes are typically 8-bit grayscale images
representing the degree of transparency of the VOPs.
The
information representing a VOP is composed of two parts, shape and texture. A
VOP with an arbitrary shape is first extended into a rectangular VOP based on
its shape information such that the width and height of the extended
rectangular VOP are the smallest integer multiples of 16. This process is
called “VOP formation”. After VOP formation, a binary image having the size of
the extended rectangular VOP, normally referred to as the binary mask for the
VOP, is used to represent the shape information of the VOP. Pixels with value 1
represent the VOP and pixels with value 0 represent the background. The binary
mask is also divided into 16x16 blocks called binary alpha blocks (BABs). Each
BAB is encoded using a shape encoder. At the decoder, the BABs will be decoded
to reconstruct the arbitrarily shaped VOP.
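As a concrete illustration, the following Python sketch (the function names are ours, not part of the standard) performs VOP formation on an arbitrary binary mask and splits the extended mask into BABs:

```python
import numpy as np

def form_vop(mask: np.ndarray) -> np.ndarray:
    """VOP formation: pad a binary mask (1 = object, 0 = background) so that
    its width and height become the smallest integer multiples of 16."""
    h, w = mask.shape
    H, W = ((h + 15) // 16) * 16, ((w + 15) // 16) * 16
    extended = np.zeros((H, W), dtype=np.uint8)
    extended[:h, :w] = mask
    return extended

def split_into_babs(extended: np.ndarray):
    """Yield the 16x16 binary alpha blocks (BABs) of an extended VOP mask."""
    H, W = extended.shape
    for i in range(0, H, 16):
        for j in range(0, W, 16):
            yield (i // 16, j // 16), extended[i:i + 16, j:j + 16]
```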
When
coding the shape of an object, each of its BABs is either intracoded using a
block-based context-based arithmetic coder or intercoded using a combination of
motion estimation/compensation and context-based arithmetic encoding (CAE).
III. PREVIOUS ERROR CONCEALMENT FOR SHAPE CODING IN MPEG-4 VIDEO
Error
resilience and concealment techniques are needed for video transmission over
unreliable communication channels. Since MPEG-4 is aimed at making media
delivery possible over all kinds of delivery technologies, including
error-prone networks such as wireless networks and the Internet, error
resilience and concealment techniques are necessary.
Although
both shape and texture are essential for the representation of a VOP, shape is
always considered more important. In addition, binary shape information is more
sensitive to errors than texture information. Thus, shape information is more critical than texture, and error concealment for shape is considered essential.
Many
techniques have been developed for texture recovery that cannot be used for
shape recovery for the following reasons: 1) a binary mask represents shape
while texture is represented using grayscale values; 2) the only important feature
in shape data is the boundary of the video object plane. All pixels within the
boundary are opaque, and the rest are transparent; and 3) even though shape and
texture are coded using motion estimation and compensation techniques, the
coded representations are different. Shape is coded in the spatial domain using
block-based context-based arithmetic encoding, while texture is coded in
the transform domain (DCT coefficients). Thus, the nature of shape data requires
alternative techniques for error concealment.
In one previously proposed approach, a binary Markov random field (MRF) is used as the underlying image
model, and maximum a posteriori (MAP) estimation is employed to yield the most
likely image given the observed image data. Each missing pixel is estimated
from the pixels in its clique as the median of weighted clique pixel values.
The weights are based on the likelihood of an edge being in the direction of
the missing pixel and a pixel in its neighborhood. The rationale behind this
selection is to weigh more the difference between the missing pixel and a pixel
in its clique in a direction along which pixels have a tendency to be the same.
Although this method performs reasonably well, it has several problems. First, it makes use of spatial redundancy only to recover missing pixels. Second, the outcome of the method depends on the error pattern of the image: the distribution of the correctly received pixels plays a very strong role in the recovery of a damaged area and may produce unpredictable results in some cases. Third, the method recovers image data block by block rather than by reconstructing the boundary; for shape information, where only the boundary is of concern, this may not be very effective. Fourth, the method is computationally intensive: the number of iterations needed before convergence depends on the content of the image.
The iteration involves
a) finding the
boundary pixels in the neighboring blocks;
b) calculating
the slope of the best line-fit for each of the border pixels;
c) estimating missing pixels.
Steps a) and b) are needed for each
binary alpha block and step c) is required for every missing pixel.
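To make the per-pixel estimation step concrete, here is a deliberately simplified Python sketch: it replaces the direction-dependent clique weights of the MAP method with uniform weights and estimates each missing binary pixel as the median of its correctly received 8-neighbours. The `known` array, which marks correctly received pixels, is our own bookkeeping and not part of the original formulation.

```python
import numpy as np

def estimate_missing_pixel(mask: np.ndarray, known: np.ndarray, i: int, j: int) -> int:
    """Simplified stand-in for the MAP estimation step: the missing pixel is
    the median of its known 8-neighbours. The actual method weights each
    neighbour by the likelihood of an edge along its direction; uniform
    weights are used here for brevity."""
    vals = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1] and known[ni, nj]:
                vals.append(int(mask[ni, nj]))
    return int(np.median(vals) >= 0.5) if vals else 0
```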
IV. GLOBAL MOTION COMPENSATION METHOD FOR SHAPE CONCEALMENT
Predictive coding makes a compressed video stream more vulnerable to errors. One way to make the
compressed video stream more resilient is to use intracoding. This can
effectively prevent error propagation, but results in a loss in compression efficiency.
This loss, however, can be ameliorated if rate control schemes are used.
The use of intracoding means that any error concealment strategy will only be able to utilize spatial data, as no information regarding the temporal correlation between VOPs at different time instants is available at the decoder. Some temporal information (motion data) is necessary in these cases for
effective recovery. One way a decoder can obtain temporal information when
intracoding is used is for the encoder to insert it into the bit stream.
However, this raises the problem of how to insert additional data, in particular motion data, into the bit stream without changing its format. In other words, how does one add motion information to the bit stream without affecting its syntax? Fortunately, MPEG-4 defines a variable-length USER_DATA field, which allows additional application-defined data to be transmitted along with the stream. This field can be used to insert motion data. The decoder can then retrieve this data and use it to conceal shape damage resulting from transmission errors.
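As an illustration, the sketch below packs the four global motion parameters of Section IV-B into a user-data payload. The 0x000001B2 start code is the MPEG-4 Visual user_data start code, but the four-float little-endian layout is our own convention, not one defined by the standard.

```python
import struct

USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-4 Visual user_data start code

def pack_global_motion(cx: float, cy: float, dtheta: float, scale: float) -> bytes:
    """Serialize centroid, rotation angle, and scale after the user_data start
    code. A real encoder must also avoid start-code emulation in the payload."""
    return USER_DATA_START_CODE + struct.pack("<4f", cx, cy, dtheta, scale)

def unpack_global_motion(payload: bytes):
    """Recover the four global motion parameters at the decoder."""
    assert payload[:4] == USER_DATA_START_CODE
    return struct.unpack("<4f", payload[4:20])
```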
Another problem raised by motion data
insertion is how much motion data is enough for concealing errors without
compromising compression efficiency. Since shape coding deals with boundaries,
we need not trace the motion of all the macroblocks belonging to an object.
Only the motion of the object’s boundary is required. Hence, one can utilize
global motion estimation to trace the motion of the object. This can be
employed by the decoder to perform error concealment, when necessary.
It is important to note that block-based
motion estimation usually produces better results than global motion estimation
in the case of texture. In block-based motion estimation, uniform motion is
assumed for each block, whereas in global motion estimation, uniform motion is
assumed for the entire object. Since macroblocks may have different texture, block-based motion estimation for texture tends to be more accurate. However, since all BABs inside a VOP are opaque, it is not necessary to use block-based motion estimation for shape: the motion information that matters is that belonging to the boundary. Furthermore, by applying global motion estimation,
the continuity of the boundary is maintained, while block-based motion
estimation may not guarantee continuity at the junction points between
neighboring blocks. An advantage of block-based motion estimation is that it
may be more accurate when there are changes between consecutive VOPs that
cannot be accurately reflected by global motion. Since such cases do not occur very often in a video sequence, and even local motion estimation may not produce good results for all of them, global motion estimation for the boundary may be the better choice in most cases.
A. GLOBAL MOTION COMPENSATION TECHNIQUES
Apparent motion in most image sequences obtained by a camera can be attributed either to camera motion or to the movement of objects in the scene. Local motion estimation is applied to trace the movement of objects, whereas global motion estimation is applied to follow camera motion. A great amount of redundancy between successive frames can be removed by applying global motion estimation. Global motion parameters can be effectively used to predict the stationary parts of a scene, thus saving the bandwidth otherwise needed to transmit motion vectors for them.
Three-dimensional object or camera movement can be classified into the following categories:
1) Change of the camera focal length (zoom)
2) Rotation around the camera axis (swing)
3) Translation in the plane normal to the camera axis
4) Rotation around an axis normal to the camera axis (pan)
5) Translation along the camera axis
These characteristics are represented by applying a rotation transformation followed by a translation transformation to the 3-D point to be mapped. Twelve parameters are needed to specify the transformation, which is given by

$$\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = R \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} + T, \qquad R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}, \qquad T = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where $R$ is the rotation transformation matrix, whose elements are obtained from the pan angle $\alpha$, the tilt angle $\beta$, and the swing angle $\gamma$; $T$ is the translation transformation matrix derived from the 3-D translation; $(x_1, y_1, z_1)^T$ is the coordinate of a 3-D point before the global motion transform is applied; and $(x_2, y_2, z_2)^T$ is the coordinate of the point after the global motion transform is applied.
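A minimal Python sketch of this 12-parameter model follows; the pan-tilt-swing composition order, and the axis assigned to each angle, are assumptions on our part, since the text does not fix them:

```python
import numpy as np

def rotation_matrix(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Compose R (elements r1..r9) from pan, tilt, and swing angles.
    The composition order swing @ tilt @ pan is an assumption."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    pan = np.array([[ca, 0, sa], [0, 1, 0], [-sa, 0, ca]])    # about the vertical axis
    tilt = np.array([[1, 0, 0], [0, cb, -sb], [0, sb, cb]])   # about the horizontal axis
    swing = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])  # about the camera axis
    return swing @ tilt @ pan

def global_motion_transform(p1: np.ndarray, R: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Map a 3-D point: p2 = R @ p1 + T (the 12-parameter model above)."""
    return R @ p1 + T
```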
B.
GLOBAL MOTION ESTIMATION PARAMETERS
NEEDED FOR SHAPE RECOVERY
When applying global motion estimation to perform shape error concealment, only the shape of the video object plane and its boundary are important. The only aspects of 3-D camera motion utilized in the proposed method are the change of the camera focal length, camera rotation around its axis, and translation in the plane normal to the camera axis. Of the remaining aspects, translation along the camera axis can be approximated by zoom, and rotation around an axis normal to the camera axis (pan), which is usually very slight, can be approximated by translation in the plane normal to the camera axis. Thus, the 3-D camera movement reduces to a 2-D global movement with the following characteristics:
1) Change of the camera focal length (zoom or scale)
2) Rotation around the camera axis
3) Translation in the plane normal to
the camera axis
The relative coordinates of a VOP in a frame are encoded through the motion vectors, so translation is already embedded in the bit stream. Thus, the only global motion parameters required are zoom and 2-D rotation.
Four parameters are used to represent the global motion:
1) Two parameters are used for the centroid of the object:

$$x' = \frac{\sum_i \sum_j b_{i,j}\, j}{\mathrm{area(VOP)}}, \qquad y' = -\frac{\sum_i \sum_j b_{i,j}\, i}{\mathrm{area(VOP)}}$$

2) The orientation angle $\theta$ of an object in a binary image can be obtained as

$$a = \sum_i \sum_j (x_{i,j} - x')^2\, b_{i,j}$$
$$b = 2 \sum_i \sum_j (x_{i,j} - x')(y_{i,j} - y')\, b_{i,j}$$
$$c = \sum_i \sum_j (y_{i,j} - y')^2\, b_{i,j}$$
$$\theta = \tan^{-1}\!\left(\frac{b}{a - c}\right)$$

The rotation angle is $\Delta\theta = \theta_{cur} - \theta_{ref}$.
Knowing the centroids of the current VOP $(x'_{cur}, y'_{cur})$ and the reference VOP $(x'_{ref}, y'_{ref})$, the scale, and the rotation angle of the current VOP relative to the reference VOP, it is possible to use these global motion parameters to map a boundary pixel in the reference VOP to one in the current VOP, or vice versa.
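A Python sketch of the parameter computation is given below. Following the text, x is the column index j and y is the negated row index i. Two caveats: we use the quadrant-safe form 0.5 * atan2(b, a - c), where the 1/2 factor is the conventional moment-based orientation (the formula above omits it), and we also return the VOP area, since one plausible way to obtain the scale, not specified in the text, is sqrt(area_cur / area_ref).

```python
import numpy as np

def global_motion_parameters(mask: np.ndarray):
    """Centroid (x', y'), orientation theta, and area of a binary mask b[i, j]."""
    ii, jj = np.nonzero(mask)           # rows i and columns j of opaque pixels
    area = len(ii)
    xc = jj.mean()                      # x' = sum(b * j) / area
    yc = -ii.mean()                     # y' = -sum(b * i) / area
    x = jj - xc                         # x - x'
    y = -ii - yc                        # y - y'
    a = np.sum(x * x)
    b = 2.0 * np.sum(x * y)
    c = np.sum(y * y)
    theta = 0.5 * np.arctan2(b, a - c)  # quadrant-safe; 1/2 is the moment convention
    return xc, yc, theta, area
```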
The mapping of the coordinates of boundary pixels in the current VOP to those of the reference VOP is given by

$$x_{ref} = \frac{(x_{cur} - x'_{cur})\cos(-\Delta\theta) - (y_{cur} - y'_{cur})\sin(-\Delta\theta)}{scale} + x'_{ref}$$
$$y_{ref} = \frac{(y_{cur} - y'_{cur})\cos(-\Delta\theta) + (x_{cur} - x'_{cur})\sin(-\Delta\theta)}{scale} + y'_{ref}$$

and the inverse mapping by

$$x_{cur} = \left[(x_{ref} - x'_{ref})\cos(\Delta\theta) - (y_{ref} - y'_{ref})\sin(\Delta\theta)\right] \cdot scale + x'_{cur}$$
$$y_{cur} = \left[(y_{ref} - y'_{ref})\cos(\Delta\theta) + (x_{ref} - x'_{ref})\sin(\Delta\theta)\right] \cdot scale + y'_{cur}$$
where $(x_{ref}, y_{ref})$ are the coordinates of a boundary pixel in the reference VOP, and $(x_{cur}, y_{cur})$ are the coordinates of a boundary pixel in the current VOP.
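In code, the two mappings look as follows (a sketch; the function names are ours, and the scale estimate via the area ratio is our assumption, as noted above):

```python
import numpy as np

def map_cur_to_ref(x_cur, y_cur, cur_c, ref_c, dtheta, scale):
    """Map a boundary pixel of the current VOP into the reference VOP.
    cur_c and ref_c are the centroids (x', y') of the two VOPs."""
    dx, dy = x_cur - cur_c[0], y_cur - cur_c[1]
    ct, st = np.cos(-dtheta), np.sin(-dtheta)
    return ((dx * ct - dy * st) / scale + ref_c[0],
            (dy * ct + dx * st) / scale + ref_c[1])

def map_ref_to_cur(x_ref, y_ref, cur_c, ref_c, dtheta, scale):
    """Inverse mapping: reference VOP boundary pixel -> current VOP."""
    dx, dy = x_ref - ref_c[0], y_ref - ref_c[1]
    ct, st = np.cos(dtheta), np.sin(dtheta)
    return ((dx * ct - dy * st) * scale + cur_c[0],
            (dy * ct + dx * st) * scale + cur_c[1])

# Example: scale estimated from the VOP areas (our assumption, not from the text)
# scale = np.sqrt(area_cur / area_ref)
```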
C. ERROR CONCEALMENT FOR SHAPE USING GLOBAL MOTION COMPENSATION
It is assumed that the current VOP is corrupted and the reference VOP is undamaged. The steps involved in shape recovery based on global motion are:
1) extracting the boundary of the current VOP;
2) patching the boundary of the current VOP;
3) filling the reconstructed boundary of the current VOP with opaque pixels.
Since the reference VOP is assumed to be undamaged, its boundary is always intact. The boundary of the current VOP may or may not be intact, depending on the error pattern. If the boundary of the current damaged VOP is continuous, no global motion compensation is needed, and only step 3) is carried out to fill in the missing pixels and recover the VOP. If the boundary of the current damaged VOP is not intact, global motion compensation is first applied to recover the boundary of the VOP, and the VOP is then filled in with opaque pixels. The recovered VOP can then be used as a reference VOP for a subsequent VOP. If a sequence of VOPs is damaged, shape recovery commences at the first damaged VOP and proceeds until the last damaged VOP is recovered.
D. BOUNDARY EXTRACTION
We consider two methods for extracting the boundary of the binary image. While both techniques involve scanning all pixels in the image to find those that lie on the boundary, the criteria used for determining whether a pixel is a boundary pixel differ. In the first method, if any pixel in the 4-neighbourhood of the current pixel does not belong to the object, the current pixel is considered a boundary pixel. In the second method, if any pixel in the 8-neighbourhood of the current pixel does not belong to the object, the current pixel is considered a boundary pixel. The boundaries extracted by these two methods are different: the boundary extracted using the 4-neighbourhood is 8-connected, and that extracted using the 8-neighbourhood is 4-connected.
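A straightforward Python sketch of the chosen (8-neighbourhood) extraction follows; pixels at the image border are treated as adjacent to background, which is our own boundary-condition choice:

```python
import numpy as np

def extract_boundary(mask: np.ndarray) -> np.ndarray:
    """Mark every object pixel that has a non-object pixel in its
    8-neighbourhood; the resulting boundary is 4-connected."""
    h, w = mask.shape
    padded = np.zeros((h + 2, w + 2), dtype=mask.dtype)  # zero padding: outside
    padded[1:-1, 1:-1] = mask                            # the image counts as
    boundary = np.zeros_like(mask)                       # background
    for i in range(h):
        for j in range(w):
            if padded[i + 1, j + 1] and np.any(padded[i:i + 3, j:j + 3] == 0):
                boundary[i, j] = 1
    return boundary
```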
Due to the ease of traversing a 4-connected boundary, 8-neighbourhoods are chosen to perform boundary extraction. Although this method of boundary extraction seems effective, it does not perform well when applied directly to a corrupted mask, where isolated erroneous pixels produce spurious boundary fragments; the mask is therefore first smoothed with a median filter. The window size of the median filter chosen was 5. The process is as follows:
1) Apply the 1-D median filter to the image horizontally.
2) Apply the 1-D median filter vertically to the image resulting from step 1).
3) Compare the image resulting from step 2) with the image before step 1). If there is no difference, terminate; the resulting image is used for boundary extraction. Otherwise, go to step 1) and repeat.
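A sketch of this smoothing loop, assuming edge replication at the image borders (a padding choice of ours):

```python
import numpy as np

def median_1d(line: np.ndarray, w: int = 5) -> np.ndarray:
    """1-D median filter of window size w with edge replication."""
    pad = w // 2
    padded = np.pad(line, pad, mode="edge")
    out = [np.median(padded[k:k + w]) for k in range(len(line))]
    return np.array(out).astype(line.dtype)

def smooth_mask(mask: np.ndarray, w: int = 5) -> np.ndarray:
    """Iterate horizontal then vertical 1-D median filtering until stable."""
    cur = mask.copy()
    while True:
        out = np.apply_along_axis(median_1d, 1, cur, w)  # step 1: horizontal
        out = np.apply_along_axis(median_1d, 0, out, w)  # step 2: vertical
        if np.array_equal(out, cur):                     # step 3: converged?
            return out
        cur = out
```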
E. BOUNDARY PATCHING BY GLOBAL MOTION COMPENSATION
After extracting the boundary of the corrupted VOP, a decision has to be made as to whether boundary patching is needed. The criterion for deciding whether the extracted boundary is closed is simple: if the number of endpoints is zero, the boundary is considered closed. If the boundary is broken, it is patched using global motion compensation. The process can be divided into the following steps.
1) Partition the endpoints in the current VOP into pairs such that each pair of endpoints belongs to one separate part of the boundary. Note that the total number of endpoints is always even. Also, a pair of endpoints can be identified by starting from one endpoint and continuously traversing the connected boundary pixels in either direction until another endpoint is met.
2) Extract the closed boundary of the reference VOP and map the endpoints in the current VOP to boundary pixels in the reference VOP using the current-to-reference mapping above.
3) Traverse the boundary of the reference VOP starting from endpoint 1 along the direction of 1->1′, and record the order of the endpoints traversed.
4) Traverse the boundary of the reference VOP according to the order derived in step 3); each curve whose endpoints belong to different pairs is mapped back to the current VOP using the reference-to-current mapping, bridging the corresponding break in the boundary.
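A sketch of step 1) and the endpoint detection it relies on is shown below, assuming a one-pixel-wide 4-connected boundary as produced by the extraction in Section IV-D; an endpoint is taken to be a boundary pixel with exactly one boundary neighbour:

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def find_endpoints(boundary: np.ndarray):
    """Boundary pixels with exactly one boundary pixel among their 8 neighbours.
    A closed boundary yields no endpoints (the closedness criterion above)."""
    pts = set(zip(*np.nonzero(boundary)))
    return [p for p in pts
            if sum(((p[0] + di, p[1] + dj) in pts) for di, dj in NEIGHBOURS) == 1]

def pair_endpoints(boundary: np.ndarray):
    """Step 1: pair endpoints by walking each open boundary segment from one
    endpoint until the other endpoint of the same segment is reached."""
    pts = set(zip(*np.nonzero(boundary)))
    pairs, seen = [], set()
    for start in find_endpoints(boundary):
        if start in seen:
            continue
        cur = start
        while True:
            seen.add(cur)
            nxt = [(cur[0] + di, cur[1] + dj) for di, dj in NEIGHBOURS
                   if (cur[0] + di, cur[1] + dj) in pts
                   and (cur[0] + di, cur[1] + dj) not in seen]
            if not nxt:
                break
            cur = nxt[0]
        pairs.append((start, cur))
    return pairs
```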
F. VOP RECOVERY BY FILLING IN THE CLOSED BOUNDARY
After the discontinuous boundary is reconnected, the final step is to fill in the closed boundary with the opaque pixels that make up the VOP. The algorithm employed for filling in the closed boundary is a scan-line seed-filling algorithm. It needs a stack of stored seeds (pixels) for the line scan and an initial seed to start the filling process. The algorithm works as long as the whole area to be filled is 4-connected, regardless of whether the shape is convex, concave, or ring-like.
1) Initially, pick a pixel inside the boundary as the initial seed and push it onto the empty stack. In our method, the centroid is used as the seed if it lies inside the VOP; if the centroid is outside the VOP, an arbitrary pixel inside the VOP is chosen.
2) Check whether the stack is empty. If the stack is empty, the filling process is over; terminate. Otherwise, go to step 3).
3) Pop a seed from the stack and set it to opaque. Denote this seed pixel as (x, y). Traverse from this point to the left and right along the line on which the seed lies, setting each traversed pixel to opaque, until the boundaries of the VOP are met. This process is known as "line-scan filling". Record the left boundary pixel as $(x_l, y)$ and the right boundary pixel as $(x_r, y)$.
4) Scan the line right above the current line, from $(x_l, y+1)$ to $(x_r, y+1)$, to see if there is any pixel that is transparent and within the boundary. If so, there are some 4-connected pixels in the line above the current line to be filled; find the rightmost such pixel $(x_{r1}, y+1)$ of each run in the line above and push it onto the stack.
5) Apply the same process as in step 4) to the line right below the current line.
6) Go to step 2).
Note that steps 4) and 5) guarantee that the whole area is scanned as
long as it is 4-connected, regardless of whether the shape is convex, concave,
or ring-like.
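The following Python sketch implements the scan-line seed fill as described; `boundary` is the binary image of the closed boundary, and `seed` is the (row, column) of an interior pixel (the centroid, when it lies inside):

```python
import numpy as np

def seed_fill(boundary: np.ndarray, seed) -> np.ndarray:
    """Fill the inside of a closed boundary with opaque (1) pixels."""
    filled = boundary.copy()
    h, w = filled.shape
    stack = [seed]                                   # step 1: initial seed
    while stack:                                     # step 2: stop when empty
        y, x = stack.pop()                           # step 3: pop a seed
        if filled[y, x]:
            continue
        xl = x
        while xl > 0 and not filled[y, xl - 1]:      # scan left to the boundary
            xl -= 1
        xr = x
        while xr < w - 1 and not filled[y, xr + 1]:  # scan right to the boundary
            xr += 1
        filled[y, xl:xr + 1] = 1                     # line-scan filling
        for ny in (y + 1, y - 1):                    # steps 4 and 5
            if 0 <= ny < h:
                for nx in range(xl, xr + 1):
                    # push the rightmost pixel of each unfilled 4-connected run
                    if not filled[ny, nx] and (nx == xr or filled[ny, nx + 1]):
                        stack.append((ny, nx))
    return filled
```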
V. CONCLUSION
In this
paper, a new method for shape concealment based on intrashape coding and global
motion compensation was proposed. This method, which utilizes global motion
data inserted as part of the USER_DATA field in the compressed stream, consists
of three steps: 1) boundary extraction from shape; 2) boundary patching using
global motion compensation; and 3) boundary fill to reconstruct the shape of
damaged video object planes. The method can achieve good results on some QCIF/CIF
video sequences, even if the shape is severely damaged. It works well for video sequences with slow motion or a high frame rate; for video sequences with fast motion or a low frame rate, the results are acceptable.