I. INTRODUCTION
The MPEG-4 standard is a generic coding standard that, unlike earlier frame-based standards, permits an object-based
audio-visual representation of data. The advantages of this object-based
representation are the following. First, different object types may have
different suitable coded representations. Coding them separately could achieve
better compression efficiency. Second, it allows harmonious integration of
different types of data into one scene.
Each
video sequence may have several video objects. A video object at a given time
instant is called a video object plane (VOP). VOPs can have arbitrary shapes
and can be encoded independently of each other, or dependently using motion
compensation. Compressed video objects are very vulnerable to channel errors;
hence, error concealment techniques are needed to mitigate the effects of data
corruption or loss. In this paper, we describe an error concealment technique
for object-based video sequences coded using the MPEG-4 video coding standard.
II. SHAPE CODING IN MPEG-4
MPEG-4 adopted alpha planes for shape
representation. There are two types of alpha planes: binary alpha planes and
grayscale alpha planes. Binary alpha planes are binary images representing the
shape of the VOPs. Grayscale alpha planes are typically 8-bit grayscale images
representing the degree of transparency of the VOPs.
The
information representing a VOP is composed of two parts, shape and texture. A
VOP with an arbitrary shape is first extended into a rectangular VOP based on
its shape information such that the width and height of the extended
rectangular VOP are the smallest integer multiples of 16. This process is
called “VOP formation”. After VOP formation, a binary image having the size of
the extended rectangular VOP, normally referred to as the binary mask for the
VOP, is used to represent the shape information of the VOP. Pixels with value 1
represent the VOP and pixels with value 0 represent the background. The binary
mask is also divided into 16x16 blocks called binary alpha blocks (BABs). Each
BAB is encoded using a shape encoder. At the decoder, the BABs will be decoded
to reconstruct the arbitrarily shaped VOP.
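As a concrete illustration, the following Python sketch (the function names are ours, not part of the standard) performs VOP formation on an arbitrary binary mask and splits the extended mask into BABs:

```python
import numpy as np

def form_vop(mask: np.ndarray) -> np.ndarray:
    """VOP formation: pad a binary mask (1 = object, 0 = background) so that
    its width and height become the smallest integer multiples of 16."""
    h, w = mask.shape
    H, W = ((h + 15) // 16) * 16, ((w + 15) // 16) * 16
    extended = np.zeros((H, W), dtype=np.uint8)
    extended[:h, :w] = mask
    return extended

def split_into_babs(extended: np.ndarray):
    """Yield the 16x16 binary alpha blocks (BABs) of an extended VOP mask."""
    H, W = extended.shape
    for i in range(0, H, 16):
        for j in range(0, W, 16):
            yield (i // 16, j // 16), extended[i:i + 16, j:j + 16]
```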
When
coding the shape of an object, each of its BABs is either intracoded using a
block-based context-based arithmetic coder or intercoded using a combination of
motion estimation/compensation and context-based arithmetic encoding (CAE).
III. PREVIOUS ERROR CONCEALMENT FOR SHAPE CODING IN MPEG-4 VIDEO
Error
resilience and concealment techniques are needed for video transmission over
unreliable communication channels. Since MPEG-4 is aimed at making media
delivery possible over all kinds of delivery technologies, including
error-prone networks such as wireless networks and the Internet, error
resilience and concealment techniques are necessary.
Although
both shape and texture are essential for the representation of a VOP, shape is
always considered more important. In addition, binary shape information is more
sensitive to errors than texture information. Thus, shape information is more critical than texture, and error concealment for shape is considered essential.
Many
techniques have been developed for texture recovery that cannot be used for
shape recovery for the following reasons: 1) a binary mask represents shape
while texture is represented using grayscale values; 2) the only important feature
in shape data is the boundary of the video object plane. All pixels within the
boundary are opaque, and the rest are transparent; and 3) even though shape and
texture are coded using motion estimation and compensation techniques, the
coded representations are different. Shape is coded in the spatial domain using
block-based context-based arithmetic encoding, while texture is coded in
the transform domain (DCT coefficients). Thus, the nature of shape data requires
alternative techniques for error concealment.
In one previously proposed approach, a binary Markov random field (MRF) is used as the underlying image
model, and maximum a posteriori (MAP) estimation is employed to yield the most
likely image given the observed image data. Each missing pixel is estimated
from the pixels in its clique as the median of weighted clique pixel values.
The weights are based on the likelihood of an edge being in the direction of
the missing pixel and a pixel in its neighborhood. The rationale behind this
selection is to weigh more the difference between the missing pixel and a pixel
in its clique in a direction along which pixels have a tendency to be the same.
Although this method performs reasonably well, it has several problems. First, it makes use of spatial redundancy only to recover missing pixels. Second, the outcome of the method depends on the error pattern of the image: the distribution of the correctly received pixels plays a very strong role in the recovery of a damaged area and may produce unpredictable results in some cases. Third, the method recovers image data block by block rather than by reconstructing the boundary; for shape information, where only the boundary is of concern, this may not be very effective. Fourth, the method is computationally intensive: the number of iterations needed before convergence depends on the content of the image.
The iteration involves
a) finding the
boundary pixels in the neighboring blocks;
b) calculating
the slope of the best line-fit for each of the border pixels;
c) estimating missing pixels.
Steps a) and b) are needed for each
binary alpha block and step c) is required for every missing pixel.
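To make the per-pixel estimation step concrete, here is a deliberately simplified Python sketch: it replaces the direction-dependent clique weights of the MAP method with uniform weights and estimates each missing binary pixel as the median of its correctly received 8-neighbours. The `known` array, which marks correctly received pixels, is our own bookkeeping and not part of the original formulation.

```python
import numpy as np

def estimate_missing_pixel(mask: np.ndarray, known: np.ndarray, i: int, j: int) -> int:
    """Simplified stand-in for the MAP estimation step: the missing pixel is
    the median of its known 8-neighbours. The actual method weights each
    neighbour by the likelihood of an edge along its direction; uniform
    weights are used here for brevity."""
    vals = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1] and known[ni, nj]:
                vals.append(int(mask[ni, nj]))
    return int(np.median(vals) >= 0.5) if vals else 0
```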
IV. GLOBAL MOTION COMPENSATION METHOD FOR SHAPE CONCEALMENT
Predictive coding makes a compressed video stream more vulnerable to errors. One way to make the
compressed video stream more resilient is to use intracoding. This can
effectively prevent error propagation, but results in a loss in compression efficiency.
This loss, however, can be ameliorated if rate control schemes are used.
The use of intracoding means that any error concealment strategy will only be able to utilize spatial data, as no information regarding the temporal correlation between VOPs at different time instants is available at the decoder. Some temporal information (motion data) is necessary in these cases for
effective recovery. One way a decoder can obtain temporal information when
intracoding is used is for the encoder to insert it into the bit stream.
However, this raises the problem of how to insert additional data, in particular motion data, into the bit stream without changing its format. In other words, how does one add motion information to the bit stream without affecting its syntax? Fortunately, MPEG-4 defines a variable-length USER_DATA field, which allows additional application-defined data to be transmitted along with the stream. This field can be used to insert motion data. The decoder can then retrieve this data and use it to conceal shape damage resulting from transmission errors.
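As an illustration, the sketch below packs the four global motion parameters of Section IV-B into a user-data payload. The 0x000001B2 start code is the MPEG-4 Visual user_data start code, but the four-float little-endian layout is our own convention, not one defined by the standard.

```python
import struct

USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG-4 Visual user_data start code

def pack_global_motion(cx: float, cy: float, dtheta: float, scale: float) -> bytes:
    """Serialize centroid, rotation angle, and scale after the user_data start
    code. A real encoder must also avoid start-code emulation in the payload."""
    return USER_DATA_START_CODE + struct.pack("<4f", cx, cy, dtheta, scale)

def unpack_global_motion(payload: bytes):
    """Recover the four global motion parameters at the decoder."""
    assert payload[:4] == USER_DATA_START_CODE
    return struct.unpack("<4f", payload[4:20])
```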
Another problem raised by motion data
insertion is how much motion data is enough for concealing errors without
compromising compression efficiency. Since shape coding deals with boundaries,
we need not trace the motion of all the macroblocks belonging to an object.
Only the motion of the object’s boundary is required. Hence, one can utilize
global motion estimation to trace the motion of the object. This can be
employed by the decoder to perform error concealment, when necessary.
It is important to note that block-based
motion estimation usually produces better results than global motion estimation
in the case of texture. In block-based motion estimation, uniform motion is
assumed for each block, whereas in global motion estimation, uniform motion is
assumed for the entire object. Since macroblocks may have different texture, block-based motion estimation for texture tends to be more accurate. However, since all BABs inside a VOP are opaque, it is not necessary to use block-based motion estimation for shape: the motion information that matters is that belonging to the boundary. Furthermore, by applying global motion estimation,
the continuity of the boundary is maintained, while block-based motion
estimation may not guarantee continuity at the junction points between
neighboring blocks. An advantage of block-based motion estimation is that it
may be more accurate when there are changes between consecutive VOPs that
cannot be accurately reflected by global motion. Since such cases do not occur very often in a video sequence, and even local motion estimation may not produce good results for all of them, global motion estimation for the boundary may be the better choice in most cases.
A. GLOBAL MOTION COMPENSATION TECHNIQUES
Apparent motion in most image sequences obtained by a camera can be attributed either to camera motion or to the movement of objects in the scene. Local motion estimation is applied to trace the movement of objects, whereas global motion estimation is applied to follow camera motion. A great amount of redundancy between successive frames can be removed by applying global motion estimation. Global motion parameters can be effectively used to predict the stationary parts of a scene, thus saving the bandwidth otherwise needed to transmit motion vectors for them.
Three-dimensional object or camera movement can be classified into the following categories:
1) Change of the camera focal length (zoom)
2) Rotation around the camera axis (swing)
3) Translation in the plane normal to the camera axis
4) Rotation around an axis normal to the camera axis (pan)
5) Translation along the camera axis
These characteristics are represented by applying a rotation transformation followed by a translation transformation to the 3-D point to be mapped. Twelve parameters are needed to specify the transformation, which is given by

$$\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = R \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} + T, \qquad R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}, \qquad T = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where $R$ is the rotation transformation matrix, whose elements are obtained from the pan angle $\alpha$, the tilt angle $\beta$, and the swing angle $\gamma$; $T$ is the translation transformation matrix derived from the 3-D translation; $(x_1, y_1, z_1)^T$ is the coordinate of a 3-D point before the global motion transform is applied; and $(x_2, y_2, z_2)^T$ is the coordinate of the point after the global motion transform is applied.
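A minimal Python sketch of this 12-parameter model follows; the pan-tilt-swing composition order, and the axis assigned to each angle, are assumptions on our part, since the text does not fix them:

```python
import numpy as np

def rotation_matrix(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Compose R (elements r1..r9) from pan, tilt, and swing angles.
    The composition order swing @ tilt @ pan is an assumption."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    pan = np.array([[ca, 0, sa], [0, 1, 0], [-sa, 0, ca]])    # about the vertical axis
    tilt = np.array([[1, 0, 0], [0, cb, -sb], [0, sb, cb]])   # about the horizontal axis
    swing = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])  # about the camera axis
    return swing @ tilt @ pan

def global_motion_transform(p1: np.ndarray, R: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Map a 3-D point: p2 = R @ p1 + T (the 12-parameter model above)."""
    return R @ p1 + T
```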
B.
GLOBAL MOTION ESTIMATION PARAMETERS
NEEDED FOR SHAPE RECOVERY
When applying global motion estimation to perform shape error concealment, only the shape of the video object plane and its boundary are important. The only aspects of 3-D camera motion utilized in the proposed method are the change of the camera focal length, camera rotation around its axis, and translation in the plane normal to the camera axis. Of the remaining aspects, translation along the camera axis can be approximated by zoom, and rotation around an axis normal to the camera axis (pan), which is usually very slight, can be approximated by translation in the plane normal to the camera axis. Thus, the 3-D camera movement reduces to a 2-D global movement with the following characteristics:
1) Change of the camera focal length (zoom or scale)
2) Rotation around the camera axis
3) Translation in the plane normal to
the camera axis
The relative coordinates of a VOP in a frame are encoded through the motion vectors, so translation is already embedded in the bit stream. Thus, the only global motion parameters required are zoom and 2-D rotation.
Four parameters are used to represent the global motion:
1) Two parameters are used for the centroid of the object:

$$x' = \frac{\sum_i \sum_j b_{i,j}\, j}{\mathrm{area(VOP)}}, \qquad y' = -\frac{\sum_i \sum_j b_{i,j}\, i}{\mathrm{area(VOP)}}$$

2) The orientation angle $\theta$ of an object in a binary image can be obtained as

$$a = \sum_i \sum_j (x_{i,j} - x')^2\, b_{i,j}$$
$$b = 2 \sum_i \sum_j (x_{i,j} - x')(y_{i,j} - y')\, b_{i,j}$$
$$c = \sum_i \sum_j (y_{i,j} - y')^2\, b_{i,j}$$
$$\theta = \tan^{-1}\!\left(\frac{b}{a - c}\right)$$

The rotation angle is $\Delta\theta = \theta_{cur} - \theta_{ref}$.
Knowing the centroids of the current VOP $(x'_{cur}, y'_{cur})$ and the reference VOP $(x'_{ref}, y'_{ref})$, the scale, and the rotation angle of the current VOP relative to the reference VOP, it is possible to use these global motion parameters to map a boundary pixel in the reference VOP to one in the current VOP, or vice versa.
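A Python sketch of the parameter computation is given below. Following the text, x is the column index j and y is the negated row index i. Two caveats: we use the quadrant-safe form 0.5 * atan2(b, a - c), where the 1/2 factor is the conventional moment-based orientation (the formula above omits it), and we also return the VOP area, since one plausible way to obtain the scale, not specified in the text, is sqrt(area_cur / area_ref).

```python
import numpy as np

def global_motion_parameters(mask: np.ndarray):
    """Centroid (x', y'), orientation theta, and area of a binary mask b[i, j]."""
    ii, jj = np.nonzero(mask)           # rows i and columns j of opaque pixels
    area = len(ii)
    xc = jj.mean()                      # x' = sum(b * j) / area
    yc = -ii.mean()                     # y' = -sum(b * i) / area
    x = jj - xc                         # x - x'
    y = -ii - yc                        # y - y'
    a = np.sum(x * x)
    b = 2.0 * np.sum(x * y)
    c = np.sum(y * y)
    theta = 0.5 * np.arctan2(b, a - c)  # quadrant-safe; 1/2 is the moment convention
    return xc, yc, theta, area
```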
The mapping of the coordinates of boundary pixels in the current VOP to those of the reference VOP is given by

$$x_{ref} = \frac{(x_{cur} - x'_{cur})\cos(-\Delta\theta) - (y_{cur} - y'_{cur})\sin(-\Delta\theta)}{scale} + x'_{ref}$$
$$y_{ref} = \frac{(y_{cur} - y'_{cur})\cos(-\Delta\theta) + (x_{cur} - x'_{cur})\sin(-\Delta\theta)}{scale} + y'_{ref}$$

and the inverse mapping by

$$x_{cur} = \left[(x_{ref} - x'_{ref})\cos(\Delta\theta) - (y_{ref} - y'_{ref})\sin(\Delta\theta)\right] \cdot scale + x'_{cur}$$
$$y_{cur} = \left[(y_{ref} - y'_{ref})\cos(\Delta\theta) + (x_{ref} - x'_{ref})\sin(\Delta\theta)\right] \cdot scale + y'_{cur}$$
where $(x_{ref}, y_{ref})$ are the coordinates of a boundary pixel in the reference VOP, and $(x_{cur}, y_{cur})$ are the coordinates of a boundary pixel in the current VOP.
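In code, the two mappings look as follows (a sketch; the function names are ours, and the scale estimate via the area ratio is our assumption, as noted above):

```python
import numpy as np

def map_cur_to_ref(x_cur, y_cur, cur_c, ref_c, dtheta, scale):
    """Map a boundary pixel of the current VOP into the reference VOP.
    cur_c and ref_c are the centroids (x', y') of the two VOPs."""
    dx, dy = x_cur - cur_c[0], y_cur - cur_c[1]
    ct, st = np.cos(-dtheta), np.sin(-dtheta)
    return ((dx * ct - dy * st) / scale + ref_c[0],
            (dy * ct + dx * st) / scale + ref_c[1])

def map_ref_to_cur(x_ref, y_ref, cur_c, ref_c, dtheta, scale):
    """Inverse mapping: reference VOP boundary pixel -> current VOP."""
    dx, dy = x_ref - ref_c[0], y_ref - ref_c[1]
    ct, st = np.cos(dtheta), np.sin(dtheta)
    return ((dx * ct - dy * st) * scale + cur_c[0],
            (dy * ct + dx * st) * scale + cur_c[1])

# Example: scale estimated from the VOP areas (our assumption, not from the text)
# scale = np.sqrt(area_cur / area_ref)
```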
C. ERROR CONCEALMENT FOR SHAPE USING GLOBAL MOTION COMPENSATION
It is assumed that the current VOP is corrupted and the reference VOP is undamaged. The steps involved in shape recovery based on global motion are:
1) extracting the boundary of the current VOP;
2) patching the boundary of the current VOP;
3) filling the reconstructed boundary of the current VOP with opaque pixels.
Since the reference VOP is assumed to be undamaged, its boundary is always intact. The boundary of the current VOP may or may not be intact, depending on the error pattern. If the boundary of the current damaged VOP is continuous, no global motion compensation is needed, and only step 3) is carried out to fill in the missing pixels and recover the VOP. If the boundary of the current damaged VOP is not intact, global motion compensation is first applied to recover the boundary of the VOP, and the VOP is then filled in with opaque pixels. The recovered VOP can then be used as a reference VOP for a subsequent VOP. If a sequence of VOPs is damaged, shape recovery commences at the first damaged VOP and proceeds until the last damaged VOP is recovered.
D. BOUNDARY EXTRACTION
We consider two methods for extracting the boundary of the binary image. While both techniques involve scanning all pixels in the image to find those that lie on the boundary, the criteria used for determining whether a pixel is a boundary pixel differ. In the first method, if any pixel in the 4-neighbourhood of the current pixel does not belong to the object, the current pixel is considered a boundary pixel. In the second method, if any pixel in the 8-neighbourhood of the current pixel does not belong to the object, the current pixel is considered a boundary pixel. The boundaries extracted by these two methods are different: the boundary extracted using the 4-neighbourhood is 8-connected, and that extracted using the 8-neighbourhood is 4-connected.
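A straightforward Python sketch of the chosen (8-neighbourhood) extraction follows; pixels at the image border are treated as adjacent to background, which is our own boundary-condition choice:

```python
import numpy as np

def extract_boundary(mask: np.ndarray) -> np.ndarray:
    """Mark every object pixel that has a non-object pixel in its
    8-neighbourhood; the resulting boundary is 4-connected."""
    h, w = mask.shape
    padded = np.zeros((h + 2, w + 2), dtype=mask.dtype)  # zero padding: outside
    padded[1:-1, 1:-1] = mask                            # the image counts as
    boundary = np.zeros_like(mask)                       # background
    for i in range(h):
        for j in range(w):
            if padded[i + 1, j + 1] and np.any(padded[i:i + 3, j:j + 3] == 0):
                boundary[i, j] = 1
    return boundary
```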
Due to the ease of traversing a 4-connected boundary, 8-neighbourhoods are chosen to perform boundary extraction. Although this method of boundary extraction seems effective, it does not perform well when applied directly to a corrupted mask, where isolated erroneous pixels produce spurious boundary fragments; the mask is therefore first smoothed with a median filter. The window size of the median filter chosen was 5. The process is as follows:
1) Apply the 1-D median filter to the image horizontally.
2) Apply the 1-D median filter vertically to the image resulting from step 1).
3) Compare the image resulting from step 2) with the image before step 1). If there is no difference, terminate; the resulting image is used for boundary extraction. Otherwise, go to step 1) and repeat.
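A sketch of this smoothing loop, assuming edge replication at the image borders (a padding choice of ours):

```python
import numpy as np

def median_1d(line: np.ndarray, w: int = 5) -> np.ndarray:
    """1-D median filter of window size w with edge replication."""
    pad = w // 2
    padded = np.pad(line, pad, mode="edge")
    out = [np.median(padded[k:k + w]) for k in range(len(line))]
    return np.array(out).astype(line.dtype)

def smooth_mask(mask: np.ndarray, w: int = 5) -> np.ndarray:
    """Iterate horizontal then vertical 1-D median filtering until stable."""
    cur = mask.copy()
    while True:
        out = np.apply_along_axis(median_1d, 1, cur, w)  # step 1: horizontal
        out = np.apply_along_axis(median_1d, 0, out, w)  # step 2: vertical
        if np.array_equal(out, cur):                     # step 3: converged?
            return out
        cur = out
```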
E. BOUNDARY PATCHING BY GLOBAL MOTION COMPENSATION
After extracting the boundary of the corrupted VOP, a decision has to be made as to whether boundary patching is needed. The criterion for deciding whether the extracted boundary is closed is simple: if the number of endpoints is zero, the boundary is considered closed. If the boundary is broken, it is patched using global motion compensation. The process can be divided into the following steps.
1) Partition the endpoints in the current VOP into pairs such that each pair of endpoints belongs to one separate part of the boundary. Note that the total number of endpoints is always even. Also, a pair of endpoints can be identified by starting from one endpoint and continuously traversing the connected boundary pixels in either direction until another endpoint is met.
2) Extract the closed boundary of the reference VOP and map the endpoints in the current VOP to boundary pixels in the reference VOP using the current-to-reference mapping above.
3) Traverse the boundary of the reference VOP starting from endpoint 1 along the direction of 1->1′, and record the order of the endpoints traversed.
4) Traverse the boundary of the reference VOP according to the order derived in step 3); each curve whose endpoints belong to different pairs is mapped back to the current VOP using the reference-to-current mapping, bridging the corresponding break in the boundary.
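A sketch of step 1) and the endpoint detection it relies on is shown below, assuming a one-pixel-wide 4-connected boundary as produced by the extraction in Section IV-D; an endpoint is taken to be a boundary pixel with exactly one boundary neighbour:

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def find_endpoints(boundary: np.ndarray):
    """Boundary pixels with exactly one boundary pixel among their 8 neighbours.
    A closed boundary yields no endpoints (the closedness criterion above)."""
    pts = set(zip(*np.nonzero(boundary)))
    return [p for p in pts
            if sum(((p[0] + di, p[1] + dj) in pts) for di, dj in NEIGHBOURS) == 1]

def pair_endpoints(boundary: np.ndarray):
    """Step 1: pair endpoints by walking each open boundary segment from one
    endpoint until the other endpoint of the same segment is reached."""
    pts = set(zip(*np.nonzero(boundary)))
    pairs, seen = [], set()
    for start in find_endpoints(boundary):
        if start in seen:
            continue
        cur = start
        while True:
            seen.add(cur)
            nxt = [(cur[0] + di, cur[1] + dj) for di, dj in NEIGHBOURS
                   if (cur[0] + di, cur[1] + dj) in pts
                   and (cur[0] + di, cur[1] + dj) not in seen]
            if not nxt:
                break
            cur = nxt[0]
        pairs.append((start, cur))
    return pairs
```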
F. VOP RECOVERY BY FILLING IN THE CLOSED BOUNDARY
After the discontinuous boundary is reconnected, the final step is to fill in the closed boundary with the opaque pixels that make up the VOP. The algorithm employed for filling in the closed boundary is a scan-line seed-filling algorithm. It needs a stack of stored seeds (pixels) for the line scan and an initial seed to start the filling process. The algorithm works as long as the whole area to be filled is 4-connected, regardless of whether the shape is convex, concave, or ring-like.
1) Initially, pick a pixel inside the boundary as the initial seed and push it onto the empty stack. In our method, the centroid is used as the seed if it lies inside the VOP; if the centroid is outside the VOP, an arbitrary pixel inside the VOP is chosen.
2) Check whether the stack is empty. If the stack is empty, the filling process is over; terminate. Otherwise, go to step 3).
3) Pop a seed from the stack and set it to opaque. Denote this seed pixel as (x, y). Traverse from this point to the left and right along the line on which the seed lies, setting each traversed pixel to opaque, until the boundaries of the VOP are met. This process is known as "line-scan filling". Record the left boundary pixel as $(x_l, y)$ and the right boundary pixel as $(x_r, y)$.
4) Scan the line right above the current line, from $(x_l, y+1)$ to $(x_r, y+1)$, to see if there is any pixel that is transparent and within the boundary. If so, there are some 4-connected pixels in the line above the current line to be filled; find the rightmost such pixel $(x_{r1}, y+1)$ of each run in the line above and push it onto the stack.
5) Apply the same process as in step 4) to the line right below the current line.
6) Go to step 2).
Note that steps 4) and 5) guarantee that the whole area is scanned as
long as it is 4-connected, regardless of whether the shape is convex, concave,
or ring-like.
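The following Python sketch implements the scan-line seed fill as described; `boundary` is the binary image of the closed boundary, and `seed` is the (row, column) of an interior pixel (the centroid, when it lies inside):

```python
import numpy as np

def seed_fill(boundary: np.ndarray, seed) -> np.ndarray:
    """Fill the inside of a closed boundary with opaque (1) pixels."""
    filled = boundary.copy()
    h, w = filled.shape
    stack = [seed]                                   # step 1: initial seed
    while stack:                                     # step 2: stop when empty
        y, x = stack.pop()                           # step 3: pop a seed
        if filled[y, x]:
            continue
        xl = x
        while xl > 0 and not filled[y, xl - 1]:      # scan left to the boundary
            xl -= 1
        xr = x
        while xr < w - 1 and not filled[y, xr + 1]:  # scan right to the boundary
            xr += 1
        filled[y, xl:xr + 1] = 1                     # line-scan filling
        for ny in (y + 1, y - 1):                    # steps 4 and 5
            if 0 <= ny < h:
                for nx in range(xl, xr + 1):
                    # push the rightmost pixel of each unfilled 4-connected run
                    if not filled[ny, nx] and (nx == xr or filled[ny, nx + 1]):
                        stack.append((ny, nx))
    return filled
```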
V. CONCLUSION
In this
paper, a new method for shape concealment based on intrashape coding and global
motion compensation was proposed. This method, which utilizes global motion
data inserted as part of the USER_DATA field in the compressed stream, consists
of three steps: 1) boundary extraction from shape; 2) boundary patching using
global motion compensation; and 3) boundary fill to reconstruct the shape of
damaged video object planes. The method can achieve good results on some QCIF/CIF
video sequences, even if the shape is severely damaged. It works well for video sequences with slow motion or a high frame rate; for video sequences with fast motion or a low frame rate, the results are acceptable.