24 August 2009

Video and Audio Compression

from
A Practical Guide to Video and Audio Compression
Cliff Wootton 2005, Elsevier Inc.
ISBN: 0-240-80630-1

Intro

Video compression is all about trade-offs. Ask yourself what constitutes the best video experience for your customers. That is what determines where you are going to compromise. Which of these are the dominant factors for you?

Image quality
Sound quality
Frame rate
Saving disk space
Moving content around our network more quickly
Saving bandwidth
Reducing the playback overhead for older processors
Portability across platforms
Portability across players
Open standards
Licensing costs for the tools
Licensing costs for use of content
Revenue streams from customers to you
Access control and rights management
Reduced labor costs in production

You will need to weigh these factors against each other. Some of them are mutually exclusive.
The actual compression process itself is almost trivial in comparison to the contextual setting (the context in which the video is arriving as well as the context where it is going to be deployed once it has been processed) and the preprocessing activity. It is not necessary to use mathematical theory to understand compression.

What Is a Video Compressor?
All video compressors share common characteristics. In fact, these terms describe the step-by-step process of compressing video:

Frame difference
Motion estimation
Discrete cosine transformation
Entropy coding

Video compression is only a small part of the end-to-end process. That process starts with deciding what to shoot, continues through the editing and composition of the footage, and usually ends with delivery on some kind of removable media or broadcast system. In a domestic setting, the end-to-end process might be the capture of analogue video directly off the air followed by digitization and efficient storage inside a home video server. This is what a TiVo Personal Video Recorder (PVR) does, and compression is an essential part of how that product works.

There is usually a lot of setting up involved before you ever compress anything. Preparing the content first so the compressor produces the best-quality output is very important. A rule of thumb is that about 90% of the work happens before the compression actually begins.

The same rule of thumb applies to this material:
about 90% of the coverage is about things you need to know in order to use, in the most effective way possible,
the 10% of the time you will actually spend compressing video.

The word codec is derived from coder–decoder and is used to refer to both ends of the process—squeezing video down and expanding it to a viewable format again on playback.
Compatible coders and decoders must be used, so they tend to be paired up when they are delivered in a system like QuickTime or Windows Media. Sometimes the coder is provided for no charge and is included with the decoder. Other times you will have to buy the coder separately. By the way, the terms coder and encoder in general refer to the same thing.

Hot-pluggable connections are those that are safe to connect while your equipment is turned on. This is, in general, true of a signal connection but not a power connection. Some hardware, such as SCSI drives, must never be connected or disconnected while powered on. On the other hand, FireWire interfaces for disk drives are designed to be hot pluggable.

It is important to know whether we are working with high-definition or standard-definition content. Moving images shot on film are quite different from TV pictures due to the way that TV transmission interlaces alternate lines of a picture.

Interlacing separates the odd and even lines and transmits them separately. It allows the overall frame rate to be half what it would need to be if the whole display were delivered progressively. Thus, it reduces the bandwidth required to 50% and is therefore a form of compression.
Interlacing is actually a pretty harsh kind of compression given the artifacts that it introduces and the amount of processing complexity involved when trying to eliminate the unwanted effects. Harsh compression is a common result of squashing the video as much as possible, which often leads to some compromises on the viewing quality. The artifacts you can see are the visible signs of that compression.
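
The field structure described above can be sketched in a few lines of Python. This is only an illustration: the frame is modeled as a plain list of scan lines, and the function names split_fields and weave are hypothetical, not taken from any video library.

```python
def split_fields(frame):
    """Split a progressive frame (a list of scan lines) into two fields."""
    # Scan lines are conventionally numbered from 1, so list indices
    # 0, 2, 4... hold the odd-numbered lines.
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Re-interleave two fields into a single full frame."""
    frame = []
    for i in range(len(odd_field)):
        frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = [f"line {n}" for n in range(1, 9)]
odd_field, even_field = split_fields(frame)
print(odd_field)
print(even_field)
```

Each field carries half of the lines, which is where the 50% bandwidth figure comes from. Weaving the fields back together only reproduces the original frame exactly when nothing moved between the two field times; with motion, the mismatch shows up as the interlacing artifacts mentioned above.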

Because the sampling and compression of audio and video are essentially the same, artifacts that affect one will affect the other. They just present themselves differently to your ears and eyes.

Live video is content that is delivered to you as a continuous series of pictures, and your system has to keep up. There is little opportunity to pause or buffer things to be dealt with later. Your system has to process the video as it arrives. It is often a critical part of a much larger streaming service that is delivering the encoded video to many thousands or even millions of subscribers. It has to work reliably all the time, every time. That ability will be compromised if you make suboptimum choices early on. Changing your mind about foundational systems you have already deployed can be difficult or impossible.

How do we store video in files? Some applications require particular kinds of containers and will not work if you present your video in the wrong kind of file. It is a bit like taking a flight with a commercial airline. Your suitcase may be the wrong size or shape or may weigh too much. You have to do something about it before you will be allowed to take it on the plane. It is the same with video. You may need to run some conversions on the video files before presenting the contents for compression.

In the context of video encoding, we have to make sure the right licenses are in place. We need
rights control because the content we are encoding may not always be our own. Playback
clients make decisions of their own based on the metadata in the content, or they can interact
with the server to determine when, where, and how the content may be played. Your playback client is the hardware apparatus, software application, movie player, or web page plug-in that you use to view the content.

Where do you want to put your finished compressed video output? Are you doing this so you can archive some content? Is there a public-facing service that you are going to provide? This is often called deployment. It is a process of delivering your content to the right place.

How is your compressed video streamed to your customers? Streaming comes in a variety of formats. Sometimes we are just delivering one program, but even then we are delivering several streams of content at the same time. Audio and video are processed and delivered to the viewer independently, even though they appear to be delivered together. That is actually an illusion because they are carefully synchronized. It is quite obvious when they are not in sync, however, and it could be your responsibility to fix the problem.

About the client players for which you are creating your content: Using open standards helps to reach a wider audience. Beware of situations where a specific player is mandated. This is either because you have chosen a proprietary codec or because the open standard is not supported correctly. That may be accidental or purposeful. Companies that manufacture encoders and players will sometimes advertise that they support an open standard but then deliver it inside a proprietary container.

You are likely to hit a few bumps along the way as you try your hand at video compression. These will manifest themselves in a particularly difficult-to-encode video sequence. You will no doubt have a limited bit rate budget and the complexity of the content may require more data than you can afford to send. So you will have to trade off some complexity to reduce the bandwidth requirements. Degrading the picture quality is one option, or you can reduce the frame rate. The opportunities to improve your encoded video quality begin when you plan what to shoot.

Conventions

Film size is always specified in metric values measured in millimeters (mm).
Sometimes scanning is described as dots per inch or lines per inch.
TV screen sizes are always described in inches measured diagonally. Most of the time, this won’t matter to us, since we are describing digital imagery measured in pixels. The imaging area of film is measured in mm, and therefore a film-scanning resolution in dots per mm seems a sensible compromise.

TV pictures generally scan with interlaced lines, and computers use a progressive scanning layout. The difference between them is the delivery order of the lines in the picture. Frame rates are also different.

The convention for describing a scanning format is to indicate the number of physical lines, the scanning model, and the field rate. For interlaced displays, the field rate is twice the frame rate, while for progressive displays, they are the same.

For example, 525i60 and 625i50 describe the American and European display formats, respectively.
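
As a rough illustration of this naming convention, the following Python sketch splits a label such as 525i60 into its three parts and derives the frame rate. The function name parse_scan_format is hypothetical, the rates are treated as nominal values, and 720p60 is included only as an assumed example of a progressive label.

```python
import re


def parse_scan_format(label):
    """Split a label such as '525i60' into lines, scanning model, and rates."""
    match = re.fullmatch(r"(\d+)([ip])(\d+(?:\.\d+)?)", label)
    if match is None:
        raise ValueError(f"not a scanning format label: {label!r}")
    lines = int(match.group(1))
    model = "interlaced" if match.group(2) == "i" else "progressive"
    field_rate = float(match.group(3))
    # Interlaced: two fields make one frame. Progressive: the rates are equal.
    frame_rate = field_rate / 2 if model == "interlaced" else field_rate
    return {
        "lines": lines,
        "model": model,
        "field_rate": field_rate,
        "frame_rate": frame_rate,
    }


for label in ("525i60", "625i50", "720p60"):
    print(label, parse_scan_format(label))
```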

In the abbreviations we use, note that uppercase B refers to bytes, and lowercase b is
bits. So GB is gigabytes. When we multiply bits or bytes by each increment, the value 1000 is actually replaced by the nearest equivalent base-2 number. So we multiply memory size by 1024 instead of 1000 to get kilobytes.
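
A small Python sketch of this unit convention follows. The constant and function names are illustrative only; the sketch simply applies the base-2 multiplier of 1024 described above and the fact that one byte is eight bits.

```python
BITS_PER_BYTE = 8

# Base-2 multipliers, following the convention described above.
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE
GIGABYTE = 1024 * MEGABYTE


def to_bits(size_in_bytes):
    """Convert a size in bytes (B) to a size in bits (b)."""
    return size_in_bytes * BITS_PER_BYTE


print(GIGABYTE)           # bytes in 1 GB under this convention
print(to_bits(KILOBYTE))  # bits in 1 KB
```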

The MPEG-4 Part 10 codec, otherwise known as H.264, is part of a family of video encoders that is listed below.

Engineering people use the term H.264, and commercial or marketing people prefer AVC.

Further confusion arises during discussion of the Windows Media codecs, since they have been lodged with SMPTE for ratification as an open standard. All of the naming conventions in Table 1-3 have been used in documents about video compression and codecs. Unless it is necessary to refer to the Windows Media codec by a different alias, the term VC-1 will be used here as far as possible.

Why Do We Need Video Compression?

There are quite a few products and services available today that just wouldn’t be possible without compression. Many more are being developed.

Delivering digital video and audio through the available networks is simply impossible without compressing the content first.
To give you some history, there has been a desire to deliver TV services through telephone networks for many years. Trials were carried out during the 1980s. Ultimately, they were all unsuccessful because they couldn’t get the information down the wire quickly enough. Now we are on the threshold of being able to compress TV services enough that they can fit into the bandwidth being made available to broadband users. The crossing point of those two technologies is a very important threshold. Beyond it, even more sophisticated services become available as the broadcast on-air TV service comes to occupy a smaller percentage of the available bandwidth. So, as bandwidth increases and compressors get better, all kinds of new ways to enjoy TV and Internet services come online. For example, a weather forecasting service could be packaged as an interactive presentation and downloaded in the background. If this is cached on a local hard disk, it will always be available on demand, at an instant’s notice. An updated copy can be delivered in the background as often as needed. Similar services can be developed around airline flight details, traffic conditions, and sports results.

Compression Is About Trade-Offs
Compressing video is all about making the best compromises possible without giving up too much quality. To that end, anything that reduces the amount of video to be encoded will help reduce the overall size of the finished output file or stream.

Compression is not only about keeping overall file size small. It also deals with optimizing data throughput—the amount of data that will steadily move through your playback pipeline and get onto the screen.

If you don’t compress the video properly, it will not fit the pipe and therefore cannot be streamed in real time.
Reducing the number of frames to be delivered helps reduce the capacity required, but the motion becomes jerky and unrealistic. Keeping the frame count up may mean you have to compromise on the amount of data per frame. That leads to loss of quality and a blocky appearance. Judging the right setting is difficult, because certain content compresses more easily, while other material creates a spike in the bit rate required. That spike can be allowed to momentarily absorb a higher bit rate, in which case the quality will stay the same. Alternatively, you can cap the bit rate that is available. If you cap the bit rate, the quality will momentarily decline and then recover after the spike has passed. A good example of this is a dissolve between two scenes when compressed using MPEG-2 for broadcast TV services operating within a fixed and capped bit rate.
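
The arithmetic behind this trade-off can be sketched in Python. Everything here is an illustrative assumption: the bit rates are made-up numbers, the function names are hypothetical, and the fraction of demanded data that actually gets delivered is used as a very crude stand-in for picture quality.

```python
def bits_per_frame(bit_rate_bps, frame_rate):
    """Average number of bits available for each frame at a given bit rate."""
    return bit_rate_bps / frame_rate


def apply_cap(demand_bps, cap_bps):
    """For each second of content, return (delivered rate, fraction delivered)."""
    result = []
    for demand in demand_bps:
        delivered = min(demand, cap_bps)
        # A fraction below 1.0 means the encoder had to discard detail.
        result.append((delivered, delivered / demand))
    return result


# Halving the frame rate doubles the data budget of each remaining frame.
print(bits_per_frame(4_000_000, 25))
print(bits_per_frame(4_000_000, 12.5))

# The third second is a spike, such as a dissolve between two scenes.
demand = [3_000_000, 3_000_000, 6_000_000, 3_000_000]
for delivered, fraction in apply_cap(demand, cap_bps=4_000_000):
    print(delivered, round(fraction, 2))
```

With the cap in place, only the spike is affected: the delivered fraction dips for that second and returns to 1.0 afterward, which mirrors the momentary decline and recovery described above.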

First We Have to Digitize
Although some compression can take place while video is still in an analog form, we only get the large compression ratios by first converting the data to a digital representation and then reducing the redundancy. Converting from analog to digital form is popularly called digitizing.

We now have techniques for digitally representing virtually everything that we might consume.
The whole world is being digitized, but we aren’t yet living in the world of The Matrix.
Digitizing processes are normally only concerned with creating a representation of a view. Video structure allows us to isolate a view at a particular time, but unless we apply a lot more processing, we cannot easily isolate objects within a scene or reconstruct the 3D spatial model of a scene.

Software exists that can do that kind of analysis, but it is very difficult. It does lead to very efficient compression, though. So standards like MPEG-4 allow for 3D models of real-world objects to be used. That content would have the necessary structure to exploit this kind of compression because it was preserved during the creation process. Movie special effects use 3D-model and 2D-view digitizing to combine artificially created scene components and characters with real-world pictures. Even so, many measurements must still be taken when the plates (footage) are shot.

Spatial Compression
Spatial compression squashes a single image. The encoder considers only the data that is self-contained within a single picture and bears no relationship to other frames in a sequence. We use this process all the time when we take pictures with digital still cameras and upload them as JPEG files. GIF and TIFF images are also examples of spatial compression. Simple video codecs just create a sequence of still frames that are coded in this way. Motion JPEG is an example in which every frame is discrete from the others.

The process starts with uncompressed data that describes a color value at a Cartesian (or X–Y) point in the image. Figure 2-1 shows a basic image pixel map. The next stage is to apply some run-length encoding, which is a way of describing a range of pixels whose value is the same.

Descriptions of the image, such as “pixels 0,0 to 100,100 are all black,” are recorded in the file. A much more compact description is shown in Figure 2-2. This coding mechanism assumes that the coding operates on scan lines. Otherwise it would just describe a diagonal line.
The run-length encoding technique eliminates much redundant data without losing quality. A lossless compressor such as this reduces the data to about 50% of the original size,
depending on the image complexity. This is particularly good for cel-animated footage.
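
Run-length encoding of a single scan line can be sketched as follows. This is a minimal illustration rather than the layout of any real file format: pixel values are plain strings, and rle_encode and rle_decode are hypothetical names.

```python
def rle_encode(scan_line):
    """Collapse a scan line into (value, run length) pairs."""
    runs = []
    for pixel in scan_line:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run length) pairs back into a scan line."""
    scan_line = []
    for value, count in runs:
        scan_line.extend([value] * count)
    return scan_line


scan_line = ["black"] * 6 + ["white"] * 3 + ["black"] * 1
encoded = rle_encode(scan_line)
print(encoded)
print(rle_decode(encoded) == scan_line)
```

Decoding returns exactly the original pixels, which is what makes the technique lossless. The saving depends entirely on how long the runs are, so flat areas of color compress well and noisy images barely compress at all.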

The TIFF image format can use a related lossless technique called LZW compression, named after its inventors, Lempel, Ziv, and Welch. Use of LZW coding was subject to royalty fees if you wanted to implement it, because the concepts embodied in it were patented (the patents have since expired). Where a fee applies, it should be included in the purchase price of any tools you buy.

The next level of spatial compression in terms of complexity is the JPEG technique, which breaks the image into macroblocks and applies the discrete cosine transform (DCT). This kind of compression starts to become lossy. Minimal losses are undetectable by the human eye, but as the compression ratio increases, the image visibly degrades. Compression using the JPEG technique reduces the data to about 10% of the original size.
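
To give a feel for why the DCT helps, here is a rough Python sketch that transforms one 8 x 8 block and then quantizes the result. It makes several simplifying assumptions: it uses a single uniform quantization step (a real JPEG encoder uses a table of per-frequency steps), it skips the level shift and entropy coding stages, and the names dct_2d and quantize are illustrative only.

```python
import math

N = 8  # block size used for this sketch


def dct_2d(block):
    """Forward 2D DCT-II of an N x N block, with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Divide every coefficient by one step size and round (the lossy part)."""
    return [[round(c / step) for c in row] for row in coeffs]


# A smooth gradient, typical of a patch of sky or a softly lit wall.
block = [[100 + 2 * x + 2 * y for y in range(N)] for x in range(N)]
quantized = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(nonzero, "of", N * N, "coefficients survive quantization")
```

For smooth content, most of the energy lands in a handful of low-frequency coefficients and the rest round to zero, so far fewer values need to be stored. A larger step size discards more detail, which is the visible degradation described above.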

Temporal Compression
Video presentation is concerned with time and the presentation of the images at regular
intervals. The time axis gives us extra opportunities to save space by looking for redundancy
across multiple images.

This kind of compression is always lossy. It is founded on the concept of looking for differences between successive images and describing those differences, without having to repeat the description of any part of the image that is unchanged.

Spatial compression is used to define a starting point or key frame. After that, only the differences are described. Reasonably good quality is achieved at a data rate of about one tenth of that of the original uncompressed format. Research efforts are underway to investigate ever more complex ways to encode the video without requiring the decoder to work much harder. The innovation in encoders leads to significantly improved compression factors during the deployment lifetime of a player without needing to replace the player.
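
The key frame plus differences idea can be sketched in Python as follows. This is a deliberately simplified illustration: frames are flat lists of pixel values, the differences are stored exactly (real codecs also quantize them and use motion estimation, which is where the loss comes from), and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """List (index, new value) for every pixel that changed since the last frame."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new]


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus its differences."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


key_frame = [10, 10, 10, 10, 50, 50, 10, 10]   # a bright object on a dark row
next_frame = [10, 10, 10, 10, 10, 50, 50, 10]  # the object moved one pixel right

delta = frame_delta(key_frame, next_frame)
print(delta)
print(apply_delta(key_frame, delta) == next_frame)
```

Only two of the eight pixels need to be described for the second frame. The unchanged background costs nothing, which is the redundancy that temporal compression exploits.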

A shortcut to temporal compression is to lose some frames, but this is not recommended.
In any case, it is not a suitable option for TV transmission that must maintain the frame rate.

Why Do I Need Video Compression?
Service providers and content owners are constantly looking for new avenues of profit from the material they own the rights to. For this reason, technology that provides a means to facilitate the delivery of that content to new markets is very attractive to them. Content owners require an efficient way to deliver content to their centralized repositories. Cheap and effective ways to provide that content to end users are needed, too. Video compression can be used at the point where video is imported into your workflow at the beginning of the content chain as well as at the delivery end. If you are using video compression at the input, you must be very careful not to introduce undesirable artifacts. For archival and transcoding reasons, you should store only uncompressed source video if you can afford sufficient storage capacity.

Some Real-World Scenarios
Let’s examine some of the possible scenarios where video compression can provide assistance.
In some of these examples, video compression enables an entire commercial activity that simply would not be possible otherwise. We’ll take a look at some areas of business to see how compression helps them.

Mobile Journalism
News-gathering operations used to involve a team of people going out into the field to operate bulky and very expensive equipment. As technology has progressed, cameras have gotten smaller and easier to use. A film crew used to typically include a sound engineer, camera person, and producer, as well as the journalist being filmed. These days, the camera is very likely carried by the journalist and is set up to operate automatically.

Broadcast news coverage is being originated on videophones, mini-cams, and video-enabled
mobile-phone devices. The quality of these cameras is rapidly improving. To maintain a comfortable size and weight for portable use, the hardware storage capacity has very strict limits. Video compression increases the capacity and thus the recording time available by condensing the data before recording takes place.

Current practice is to shoot on a small DV camera, edit the footage on a laptop, and then send it back to base via a videophone or satellite transceiver. The quality will clearly not be the same as that from a studio camera, but it is surprisingly good even though a high compression ratio is used.

Trials are underway to determine whether useful results can be obtained with a PDA device fitted with a video camera and integral mobile phone to send the material back to a field headquarters. The problem is mainly one of picture size and available bandwidth for delivery.

Online Interactive Multi-Player Games
Multi-player online gaming systems have become very popular in recent years. The realism of the visuals increases all the time. So, too, does the requirement to hurl an ever growing quantity of bits down a very narrow pipe. The difficulty increases as the games become more popular, with more streams having to be delivered simultaneously. Online games differ significantly from normal video, because for a game to be compelling, some aspects of what you see must be computed as a consequence of your actions. Otherwise, the experience is not interactive enough.

There are some useful techniques to apply that will reduce the bit rate required. For example, portions of the image can be static. Static images don’t require any particular bit rate from one frame to the next since they are unchanged. Only pixels containing a moving object need to be delivered. More sophisticated games are evolving, and interactivity becomes more interesting if you cache the different visual components of the scene in the local player hardware and then composite them as needed. This allows some virtual-reality (VR) techniques to be employed to animate the backdrop from a large static image.

Nevertheless, compression is still required in order to shrink these component assets down to a reasonable size, even if they are served from a local cache or CD-ROM. New standards-based codecs will facilitate much more sophisticated game play. Codecs such as H.264 are very efficient. Fully exploiting the capabilities of the MPEG-4 standard will allow you to create non-rectangular, alpha-blended areas of moving video. You could map that video onto a 3D mesh that represents some terrain or even a face. The MPEG-4 standard also provides scene construction mechanisms so that video assets can be projected into a 3D environment at the player. This allows the user to control the point of view. It also reduces the bit rate required for delivery, because only the flat, 2D versions of the content need to be delivered as component objects. As the scene becomes more realistic, video compression helps keep games such as first-person shooters (FPS) small enough to deploy online or on some kind of sell-through, removable-disk format.

Online Betting
Betting systems are sometimes grouped together with online gaming, and that may be appropriate in some cases. But online gaming is more about the interaction between groups of users and may involve the transfer of large amounts of data on a peer-to-peer basis.

Betting systems can be an extension of the real-world betting shop where you place your wager and watch the outcome of the horse race or sports event on a wall of monitor screens. The transfer of that monitor wall to your domestic PC or TV screen is facilitated by efficient and cheap video compression. Real-time compression comes to the fore here because you cannot introduce more than fractions of a second of delay—the end users have wagered their own money and they expect the results to arrive in a timely manner.

Another scenario could involve a virtual poker game. These are often based around VR simulations of a scene, but with suitable compression a live game could be streamed to anyone who wants to dial in and watch. Virtualizing a pack of cards is possible by simulating the cards on the screen, and a video-conferencing system could be used to enable observation of facial expressions of the other players in the game.

Sports and News Coverage
Of all the different genres of content that broadcasters provide to end users, news and sports have some particularly important criteria that directly affect the way that video is compressed for presentation.

News and sports are both very information-rich genres. Archiving systems tend to be large in both cases because there is a lot of material available. The metadata associated with the content assists the searching process and also facilitates the digital rights management (DRM) process. The content is easily accessible and widely available, but the playback can be controlled. Video may need to be encrypted as well as encoded. Other technologies such as watermarking are used, and these present additional technical problems. In general, the rights protection techniques that are available impose further loads on an already hardworking compression system.

The nature of news content is that the material must be encoded quickly and presented as soon after the event as possible. The same is true of sports coverage, and services that present the highlights of a sporting event need to be able to select and encode fragments of content easily, quickly, and reliably. These demands lead to the implementation of very large infrastructure projects such as the BBC Colledia-based Jupiter system deployed in its news division. This facilitates the sharing of media assets as soon as they start to arrive. Editing by multiple teams at the same time is possible, and the finished packages are then routed to transmission servers in a form that is ready to deploy to the national TV broadcast service as well as to the Internet.

Advertising
Advertising on the Internet is beginning to use video to present more compelling content. The newer codecs such as H.264 allow the macroblocks to be presented in quite sophisticated
geometrical arrangements. It is now feasible to fit video into the traditional banner advertising rectangles that have a very different aspect ratio from normal video. Creating the content may need to be done using video editing tools that allow non-standard raster sizes to be used.
More information about these standard sizes is available at the Interactive Advertising Bureau (IAB) Web site.

Video Conferencing
Large corporations have used video conferencing for many years. As far back as the 1980s,
multinational corporations were prepared to permanently lease lines from the telecommunications companies in order to link headquarters offices in the United States with
European offices. This generally required a dedicated room to be set aside and was sufficiently
expensive that only one video-conferencing station would be built per site. Only one group of people could participate at a time, and the use of the technology was reserved for important meetings.

Video conferencing can now be deployed to a desktop or mobile phone. This is only possible because video compression reduces the data-transfer rate to a trickle compared with the systems in use just a few years ago. Video conferencing applications currently lack the levels of interoperability between competing systems that telephone users enjoy for speech. That will come in time. For now, the systems being introduced are breaking new ground in making this available to the general public and establishing the fundamental principles of how the infrastructure should support it.
An example of an advanced video-conferencing user interface that supports multiple simultaneous users is iChat AV, which is available in version 10.4 of the Mac OS X operating system.

Remote Medicine
The use of remote apparatus and VR techniques for medicine is starting to facilitate so-called
“telemedicine,” where an expert in some aspect of the medical condition participates in a surgical operation being performed on the other side of the world. Clearly there are issues here regarding the need for force feedback when medical instruments are being operated remotely. Otherwise, how can the operating surgeon “feel” what the instrument is doing on the remote servo-operated system? Game players have used force-feedback systems for some time. The challenge is to adapt this for other situations and maintain a totally synchronized remote experience. Video compression is a critical technology that allows multiple simultaneous camera views to be delivered over long distances. This will also work well for MRI, ultrasound, and X-ray imaging systems that could all have their output fed in real time to a remote surgeon. The requirements here are for very high resolution. X-ray images need to be digitized in grayscale at increased bit depths and at a much higher resolution than TV. This obviously increases the amount of data to be transferred.

Remote Education
Young people often have an immediate grasp of technology and readily participate in interactive games and educational uses of video and computing systems. The education community has fully embraced computer simulation, games, and interactive software. Some of the most advanced CD-ROM products were designed for educational purposes. With equal enthusiasm, the education community has embraced the Internet, mainly by way of Web sites. Video compression provides opportunities to deploy an even richer kind of media for use in educational systems. This enhances the enjoyment of consumers when they participate. Indeed, it may be the only way to enfranchise some special-needs children who already have learning difficulties.

Online Support and Customer Services
Online help systems may be implemented with video-led tuition. When designing and
implementing such a system, it is important to avoid alienating the user. Presenting users with an experience that feels like talking to a machine would be counterproductive. Automated answering systems already bother some users due to the sterile nature of the interchange. An avatar-based help system might fare no better and present an unsatisfying experience unless it is backed up by well-designed artificial intelligence.

Entertainment
Online gaming and betting could be categorized as entertainment. Uses of video compression
with other forms of entertainment are also possible. DVD sales have taken off faster than anyone could ever have predicted. They are cheap to manufacture and provide added-value features that can enhance the viewer’s enjoyment. The MPEG-4 standard offers packaging for interactive material in a way that the current DVD specification cannot match. Hybrid DVD disks with MPEG-4 interactive content and players with MPEG-4 support could herald a renaissance in content authoring similar to what took place in the mid-1990s with CD-ROMs.
The more video can be compressed into smaller packages without losing quality, the better the experience for the viewer within the same delivery form factor (disk) or capacity (bit rate). The H.264 codec is being adopted widely as the natural format for delivering high-definition TV (HDTV) content on DVD and allows us to store even longer high-definition programs on the existing 5-GB and 9-GB disks.

Religion
All of the major religions have a presence on the Internet. There are Web sites that describe their philosophy, theology, and origins. Video compression provides a way to involve members of the community who may not be physically able to attend ceremonies. They may even be able to participate through a streamed-video broadcast. This may well be within the financial reach of medium to large churches, and as costs are reduced, even small communities may be able to deploy this kind of service. There are great social benefits to be gained from community-based use of video-compression systems. Such applications could be built around video-conferencing technologies quite inexpensively.

Commerce
Quite unexpectedly, the shopping channel has become one of the more popular television
formats. This seems to provide an oddly compelling kind of viewing. Production values are very low cost, and yet people tune in regularly to watch. Broadcasting these channels at 4.5 megabits per second (Mbps) on a satellite link may ultimately prove to be too expensive. As broadband technology improves its reach to consumers, these channels could be delivered at much lower bit rates through a networked infrastructure.

A variation of this that can be combined with a video-conferencing system is the business-to-business application. Sales pitches; demos; and all manner of commercial meetings, seminars, and presentations could take place courtesy of fast and efficient video compression.

Security and Surveillance
Modern society requires that a great deal of our travel and day-to-day activity take place under surveillance. A commuter traveling from home to a railway station by car, then by train, and then on an inner-city rapid-transit system may well be captured by as many as 200 cameras between home and the office desk. That is a lot of video, which until recently has been recorded on VHS tapes, sometimes at very low resolution, at reduced frame rates, and presented four at once in quarter-frame panes in order to save tape.

Newer systems are being introduced that use video compression to preserve video quality, increase frame rates, and automate the storage of the video on centralized repositories. By using digital video, the searching and facial-recognition systems can be connected to the repository. Suspects can be followed from one camera to another by synchronizing the streams and playing them back together. This is a good thing if it helps to trace and then arrest a felon. Our legislators have to draw a very careful line between using this technology for the good of society as a whole and infringing on our rights to go about our daily lives without intervention by the state. You may disagree with or feel uncomfortable about this level of surveillance, but it will likely continue to take place.

Compliance Recording
Broadcasters are required to record their output and store it for 90 days, so that if someone wants to complain about something that was said or a rights issue needs to be resolved, the evidence is there to support or deny the claim. This is called compliance recording, and historically it was accomplished through a manually operated bank of VHS recorders running in LP mode and storing 8 hours of video per tape, requiring three cassettes per day per channel. The BBC outputs at least six full-frame TV services that need to be monitored in this way. The archive for 90 days of recording is some 1620 tapes. These all have to be labeled, cataloged, and stored for easy access in case of a retrieval request.
The TX-2 compliance recorder was built on a Windows platform and was designed according to the requirements of the regulatory organizations so that UK broadcasters could store 90 days’ worth of content in an automated system. The compliance recorder is based on a master node with attached slaves, which can handle up to 16 channels in a fully configured system. Access to the archived footage is achieved via a Web-based interface, and the video is then streamed back to the requesting client. This recorder could not have been built without video compression, and it is a good example of the kind of product that can be built on top of a platform such as Windows Media running on a Windows operating system, or on another manufacturer’s technology. Because this is a software-based system, the compression ratio and hence the capacity and quality of the video storage can be configured. Less video but at a higher quality can be stored, or maximal time at low quality. The choice is yours.
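The tape count quoted above, and the capacity-versus-quality trade-off of a disk-based recorder, can be checked with a little arithmetic. The sketch below is only an illustration: the tape figures come from the text, but the bit rates passed to disk_gigabytes are hypothetical values chosen for the example, not settings of the TX-2 recorder.

```python
HOURS_PER_TAPE = 8     # VHS in LP mode (from the text)
CHANNELS = 6           # BBC full-frame services (from the text)
RETENTION_DAYS = 90    # required retention period (from the text)


def tapes_needed(channels, days, hours_per_tape=HOURS_PER_TAPE):
    """Cassettes needed to hold `days` of round-the-clock output per channel."""
    tapes_per_day = -(-24 // hours_per_tape)  # ceiling division: 24 h / 8 h = 3
    return tapes_per_day * channels * days


def disk_gigabytes(channels, days, bitrate_mbps):
    """Disk space in GB (10**9 bytes) for continuous recording.

    Assumes a constant bit rate in megabits per second; real encoders
    vary, so treat the result as an estimate.
    """
    seconds = days * 24 * 3600
    total_bits = bitrate_mbps * 1_000_000 * seconds * channels
    return total_bits / 8 / 1_000_000_000


print(tapes_needed(CHANNELS, RETENTION_DAYS))  # 1620, matching the text

# Hypothetical bit rates: halving the rate halves the storage needed,
# at the cost of picture quality.
for mbps in (2.0, 1.0, 0.5):
    print(mbps, "Mbps ->", disk_gigabytes(CHANNELS, RETENTION_DAYS, mbps), "GB")
```

Read the other way round, a fixed amount of disk holds twice as many days at half the bit rate, which is the "maximal time at low quality" choice described above.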

Conference Proceedings
Using large-screen displays at conferences is becoming very popular. These are being driven by a video feed shot by professional camera people, and the video is often captured and made available to delegates after the conference. Siggraph conference proceedings, for example, make significant use of compression to create the DVD proceedings disk, and the Apple developer conference proceedings have for some years been a showcase of Apple’s prowess with video workflow and production processes as well as its engineering work on codecs.

Broadband Video on Demand
During 2004, the BBC tested a system called the Internet Media Player (BBC iMP). This system presents an electronic program guide (EPG) over a 14-day window. The user browses the EPG listings and is able to call up something that was missed during the previous week. Alternatively, a recording can be scheduled during the next few days. In order to adequately protect the content, the BBC iMP trials are run on a Windows-based platform that supports the Windows Media DRM functionality. If the iMP player were used on a laptop connected to a fixed broadband service, the downloaded material could be taken on the road and viewed remotely. This enables video to be as mobile as music carried around on Walkman and iPod devices. Future experiments in the area of broadband-delivered TV will explore some interesting peer-to-peer file techniques, which are designed to alleviate the bandwidth burden on service providers. For this to work, we must have reliable and robust DRM solutions, or the super-distribution model will fail to get acceptance from the content providers.

Home Theatre Systems
Hollywood movies are designed to be viewed on a large screen in a darkened room with a surround-sound system. There is now a growing market for equipment to be deployed at home to give you the same experience. The media is still mostly available in standard definition but some high-definition content is being broadcast already. More high-definition services will be launched during the next few years. Plasma, LCD, or LED flat screens are available in sizes up to 60 inches diagonal. If you want to go larger than that, you will need to consider a projection system. At large screen sizes, it helps to increase the resolution of the image that is being projected, and that may require some special hardware to scale it up and interpolate the additional pixels. At these increased screen sizes, any artifacts that result from the compression will be very obvious. Higher bit rates will be necessary to allow a lower compression ratio. Some DVD products are shipped in special editions that give up all the special features in order to increase the bit rate. The gradual advancement of codec technology works in your favor. New designs yield better performance for the same bit rate as technology improves. The bottom line is that compressing video to use on a standard-definition TV set may not be good enough for home-cinema purists.
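To give a feel for what "interpolating the additional pixels" means, here is a minimal sketch of bilinear interpolation on a tiny grayscale image. It is an assumption made for illustration only: real scaling hardware typically uses more elaborate filters, and the function name upscale_bilinear is invented for this example.

```python
def upscale_bilinear(image, new_w, new_h):
    """Scale a grayscale image (list of rows of numbers) to new_w x new_h.

    Each output pixel is a weighted average of the four nearest source
    pixels; the corner pixels of the source are preserved exactly.
    """
    old_h, old_w = len(image), len(image[0])
    out = []
    for y in range(new_h):
        # Map the output row onto a fractional source row.
        sy = y * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, old_h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            # Map the output column onto a fractional source column.
            sx = x * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, old_w - 1)
            fx = sx - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = [[0, 100],
         [100, 200]]
for row in upscale_bilinear(small, 3, 3):
    print(row)
```

The new pixels are blends of their neighbors, so no detail is created. That is why compression artifacts in the source become more obvious, not less, when the picture is enlarged.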

Digital Cinema
Interestingly, the high-definition TV standards that are emerging seem to be appropriate for use in digital-cinema (D-cinema) situations. The same content will play in the domestic environment just as easily. As high-definition TV becomes more popular and more people install home theatre systems, commercial cinema complexes will need to develop their business in new ways. They will have to do this in order to differentiate their product and give people a reason to visit the cinema instead of watching the movie at home.

Platforms
With the increasing trends toward technological convergence, devices that were inconceivable as potential targets for video content are now becoming viable. Science fiction writers have been extolling the virtues of portable hand-held video devices for years, and now the technology is here to realize that capability. In fact, modern third-generation mobile phones are more functional and more compact than science fiction writers had envisaged being available hundreds of years into the future. Handheld video, and touch-screen, flat-screen, and large-screen video, are all available here and now. They are being rolled out in a front room near you right this minute. What we take for granted and routinely use every day is already way beyond the futuristic technologies of the Star Trek crew.

Portable Video Shoot and Edit
Portable cameras have been around for a long time. Amateur film formats were made available to the consumer as 8-mm home movie products; they replaced earlier and more unwieldy film gauges. The 8-mm formats became increasingly popular in the 1950s and ’60s. The major shortcomings of these were that they held only enough footage to shoot 4 minutes, and most models required that the film be turned over halfway through, so your maximum shot length was only 2 minutes. At the time, battery technology was less sophisticated than what we take for granted now, and many cameras were driven by clockwork mechanisms. These devices were displaced quite rapidly with the introduction of VHS home-video systems in the late 1970s. Several formats were introduced to try to encourage mass appeal. But editing the content was cumbersome and required several expensive four-head video recorders. Just after the start of the new millennium, digital cameras reached a price point that was affordable for the home-movie enthusiast.

Now that the cameras can be fitted with FireWire interfaces (also called i.LINK and IEEE 1394), their connection to a computer has revolutionized the video workflow. These cameras use the digital video (DV) format that is virtually identical to the DVCAM format used by professional videographers and TV companies. The DV format was originally conceived by Sony as digital 8-mm tape for use in Sony Handycam® recorders. The current state of the art is represented by a system such as an Apple Macintosh G4 12-inch laptop with a FireWire connection to a Sony DCR PC 105 camera. The camera and laptop fit in a small briefcase. This combination is amazingly capable for a very reasonable total purchase price of less than $3000. The Apple laptop comes already installed with the iMovie video-editing software that is sufficient to edit and then burn a DVD (with the iDVD application). You can walk out of the store with it and start working on your movie project right away. Of course, there are alternative software offerings, and other manufacturers’ laptops support the same functionality. Sony VAIO computers are very video capable because they are designed to complement Sony’s range of cameras, and the Adobe Premiere and Avid DV editing systems are comparable to the Apple Final Cut Pro software if you want to use Windows-based machines.
This is all done more effectively on desktop machines with greater processing power. The laptop solution is part of an end-to-end process of workflow that allows a lot of work to be done in the field before content is shipped back to base.

Video Playback on Handheld Devices
Handheld video playback is becoming quite commonplace. There are several classes of device available depending on what you need. Obviously, the more sophisticated they are, the more expensive the hardware. There is a natural convergence here, so that ultimately all of these capabilities may be found in a single generic device. These fall into a family of mobile video devices that include

Portable TV sets supporting terrestrial digital-TV reception
Portable DVD viewers
Diskless portable movie players
PDA viewers

Video Phones
The new generation of mobile-phone devices is converging with the role of the handheld personal digital assistant (PDA). These mobile phones are widely available and have cameras and video playback built in. They also have address books and other PDA-like applications, although these may be less sophisticated than those found in a genuine PDA. Some services were being developed for so-called 2.5G mobile phones, but now that the genuine 3G phones are shipping, they will likely replace the 2.5G offerings.

H.264 on Mobile Devices
H.264 is designed to be useful for mobile devices and consumer playback of video. Rolling this standard out for some applications must take account of the installed base of players, and that will take some time. So it is likely that, initially, H.264 will be used primarily as a mobile format.

The Ultimate Handheld Device
Taking the capabilities of a portable TV, DVD player, PDA, and mobile phone and integrating them into a single device gets very close to the ultimate handheld portable media device. Well, it might, if the capabilities of a sub-notebook computer are included. A fair use policy is now required for consumer digital video that allows us to transfer our legitimately purchased DVD to a “memory card” or other local-storage medium. These memory cards are useful gadgets to take on a long journey to occupy us as we travel, but the content owners are not comfortable with us being able to make such copies. There are still issues with the form factor for a handheld device like this. To create a viewing experience that is convenient for long journeys, we might end up with a device that is a little bulkier than a phone should be. Maybe a hands-free kit addresses that issue or possibly a Bluetooth headset. Usable keypads increase the size of these devices. Currently, it is quite expensive to provide sufficient storage capacity without resorting to an embedded hard disk. That tends to reduce the battery life, so we might look to the new storage technologies that are being developed. Terabyte memory chips based on holographic techniques may yield the power–size–weight combination that is required. Newer display technologies such as organic LED devices may offer brighter images with less power consumed. Cameras are already reduced to a tiny charge-coupled device (CCD) assembled on a chip, which is smaller than a cubic centimeter. The key to this will be standardization. Common screen sizes, players, video codecs, and connection protocols could enable an entire industry to be built around these devices. Open standards facilitate this sort of thing, and there are high hopes that H.264 (AVC) and the other parts of the MPEG-4 standard will play an important role here.

Personal Video Recorders
Personal video recorders (PVRs) are often generically referred to as TiVo, although they are manufactured by a variety of different companies. Some of them do indeed license the TiVo software, but others do not. Another popular brand is DirecTV.

Analog Off-Air PVR Devices
A classic TiVo device works very hard to compress incoming analog video to store it effectively and provide trick-play features. The compression quality level can be set in the preferences. The compromise is space versus visible artifacts. At the lowest quality, the video is fairly noisy if the picture contains a lot of movement. This is okay if you are just recording a program that you don’t want to keep forever, for example, when you are simply time-shifting it to view at a different time. If you want to record a movie, you will probably choose a higher-quality recording format than you would for a news program. The functionality is broadly divided into trick-play capabilities and a mechanism to ensure that you record all the programs you want to, even if you do not know when they are going to be aired. In the longer term, these devices scale from a single-box solution up to a home media server with several connected clients. This would be attractive to schools for streaming TV services directly to the classroom. University campus TV, hospital TV services, and corporate video-distribution services are candidates. Standards-based solutions offer good economies of scale, low thresholds of lock-in to one supplier, and good commercial opportunities for independent content developers.

Digital Off-Air PVR Devices
When digital PVR devices are deployed, recording television programs off-air becomes far more efficient. In the digital domain, the incoming transport stream must be de-multiplexed, and packets belonging to the program stream we are interested in are stored on a hard disk. The broadcaster already optimally compresses these streams. Some storage benefits could be gained by transcoding them. Note that we certainly cannot add any data back to the video that has already been removed at source. Future development work on PVR devices will focus on storing and managing content that has been delivered digitally. This is within the reach of software-based product designs and does not require massive amounts of expensive hardware.
There are complex rights issues attached to home digital-recording technology, and it is in constant evolution.
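
The de-multiplexing step described above can be sketched in a few lines. An MPEG-2 transport stream is a sequence of fixed 188-byte packets, each starting with the sync byte 0x47 and carrying a 13-bit packet identifier (PID) in its header. The sketch below simply keeps the packets whose PID we want. It is a simplification made for illustration: a real PVR first reads the program tables in the stream to discover which PIDs belong to the chosen program, and the PID values and the make_packet helper here are hypothetical.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-2 transport packet length in bytes
SYNC_BYTE = 0x47      # first byte of every transport packet


def packet_pid(packet):
    """Return the 13-bit PID held in bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pids(stream, wanted_pids):
    """Return only the packets whose PID is in wanted_pids, unmodified.

    The video is not re-encoded; the broadcaster's compressed data is
    stored exactly as it arrived.
    """
    kept = bytearray()
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        if packet_pid(packet) in wanted_pids:
            kept += packet
    return bytes(kept)


def make_packet(pid):
    """Hypothetical helper: build a dummy packet with a padding payload."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + b"\xff" * (TS_PACKET_SIZE - len(header))


# Three packets from two made-up PIDs; keep only PID 0x100.
stream = make_packet(0x100) + make_packet(0x101) + make_packet(0x100)
recorded = filter_pids(stream, {0x100})
print(len(recorded) // TS_PACKET_SIZE, "packets kept")
```

Because the packets are copied rather than decoded, this kind of recording needs very little processing power, which is why it is within reach of software-based designs.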

Mobile PVR Solutions
Another interesting product was demonstrated by Pace at the International Broadcasting Convention (IBC) in 2004. It was a handheld PVR designed to record material being broadcast using the DVB-H mobile-TV standard. Coupling this with the H.264 codec and a working DRM solution brings us very close to a system that could be rolled out very soon. Provided rights issues and the content-delivery technology can be developed at the front end, products such as the PVR2GO shown in Figure 2-10 could be very successful.

The Future
The technology that enables PVR devices is getting cheaper, and the coding techniques are pushing the storage capacity (measured in hours) ever upward. Nevertheless, not every household will want to own a PVR. In addition, the higher end of the functionality spectrum may only ever be available to users with a lot of disposable income. Some of the basic functionality may just be built into a TV set. As TV receivers are gradually replaced with the new technology, they will ship with video compression and local storage already built in. Pause and rewind of live video, for instance, is very likely to be built into TV sets. For it to be manufactured cheaply enough, the functionality will be implemented in just a few integrated circuits and will then be as ubiquitous as the Teletext decoders found in European TV sets.
Broadband connectivity is penetrating the marketplace very rapidly, perhaps not quite as fast as DVD players did, but in quite large numbers all the same. A critical threshold is reached when video codecs are good enough to deliver satisfactory video at a bit rate that is equal to or less than what is available on a broadband connection. Indeed, H.264 encoding packed into MPEG-4 multimedia containers coupled with a PVR storage facility and a fast, low-contention broadband link is a potential fourth TV platform that offers solutions to many of the problems that cannot be easily solved on the satellite-, terrestrial-, and cable-based digital-TV platforms. MPEG-4 interactive multimedia packages could be delivered alongside the existing digital-TV content in an MPEG-2 transport stream. Indeed, the standards body has made special provision to allow this delivery mechanism, and MPEG-4 itself does not need to standardize a transport stream because there is already one available.
