User:Ryan Cooley/MPEG1: Difference between revisions
imported>Rcooley (4) |
imported>Rcooley (5) |
Revision as of 13:01, 18 March 2008
MPEG-1 articles (MPEG-1, MP1, MP2, MP3) on Wikipedia are complete crap. Disorganized, slanted, incomplete, misconstrued, etc. It's far easier to start from scratch than to try to fix all the individual existing ones, and it will give far better end results; I will copy some content from the existing articles.
Do not make any changes to this page for now. This is my mind-dump and accommodating others before I'm done will just make much, much more work for me. Put any suggestions on the Talk page, and I will eventually address them.
-RC
MPEG-1 was an early standard for lossy compression of video and audio. It was designed to compress raw video and CD audio to 1.5 Mbit/s without discernible quality loss, making Video CDs and Digital Video Broadcasting possible.
Perhaps the most well-known part of the MPEG-1 standard today is the MP3 audio format it introduced.
The MPEG-1 standard is published as ISO/IEC 11172.
History
Modeled on the successful collaborative approach and technologies developed by the Joint Photographic Experts Group (which created the JPEG still-image compression standard) and CCITT's Experts Group on Telephony (which created the H.261 standard for video conferencing over ISDN lines), the MPEG working group was established in January 1988. MPEG was to address the need for standard video and audio encoding formats, and to build on H.261 to get better quality through the use of more complex, non-realtime encoding methods. [1]
Development of the MPEG-1 standard began in May 1988. Fourteen video and fourteen audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human-perceived) quality, at combined video and audio data rates of 1.5 Mbit/s. The codecs that excelled in this testing were used as the basis for the standard and refined further, with additional features and other improvements being incorporated. [2]
After 20 meetings of the full group in various cities around the world, and four and a half years of development and testing, the final standard was approved in early November 1992. A draft standard had been produced in September 1990, and only minor changes were introduced after that. [3] Before the MPEG-1 standard had even been finalized, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast quality at high bitrates (3 to 15 Mbit/s) and support for interlaced video. [4] Due in part to the similarity between the two codecs, all standard MPEG-2 decoders include full support for playing MPEG-1 video.
Today, MPEG-1 is by far the most widely compatible lossy audio/video format in the world. Due to its age, most patents on MPEG-1 Video and Layer II audio technology have expired (MP3 being a notable exception), so these formats can be implemented without payment of license fees in almost all countries. Most computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats. The immense popularity of MP3 audio has established a massive installed base of hardware that can play back all 3 layers of MPEG-1 audio. The widespread popularity of MPEG-2 (mostly with broadcasters) means MPEG-1 is playable by most digital cable/satellite set-top boxes, and by digital disc and tape players.
Notably, the MPEG-1 standard strictly defines the bitstream and the decoder function, but does not define how MPEG-1 encoding is to be performed (although a reference implementation was provided). This means that MPEG-1 coding efficiency can vary drastically depending on the encoder used, and that newer encoders generally perform significantly better than their predecessors.
Began development in 1988
Approved November 1992
Published August 1993
Lossy
most compatible format
MPEG-2
Application
VCD players
DVB
DAB
MP3
MPEG-2?
audio: SVCD, DVD players (not surround), ATSC/HDTV (failed)
Video
Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC 11172-2.
DCT
Each 8x8 block is encoded using the Forward Discrete Cosine Transform (FDCT). This process by itself is lossless in principle (apart from rounding error), and is reversed by the Inverse DCT (IDCT) at playback.
The FDCT process converts the 64 uncompressed pixel values (such as brightness) into 64 different frequency values: one large value that is proportional to the average of the entire 8x8 block (the DC coefficient), and 63 generally smaller, positive or negative values (the AC coefficients) that describe how the block varies around that average.
The (large) DC coefficient remains mostly consistent from one block to the next, and so can be compressed quite effectively. A significant number of the AC coefficients will be at or near 0, especially after quantization, which allows them to be compressed very efficiently in a later step. Additionally, the frequency conversion is necessary for the quantization step.
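To make the transform concrete, here is a minimal Python sketch of an 8x8 forward and inverse DCT. It is a direct, unoptimized evaluation of the textbook DCT-II formula, not code taken from the standard or from any real encoder; the names `fdct` and `idct` and the sample gradient block are assumptions made purely for illustration.

```python
import math

N = 8  # the transform operates on 8x8 blocks


def _scale(k):
    # Normalisation factor of the orthonormal DCT-II
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis function for sample position i and frequency index k
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def fdct(block):
    """Forward 2-D DCT: 64 sample values in, 64 frequency values out."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * _basis(x, u) * _basis(y, v)
            out[u][v] = _scale(u) * _scale(v) * total
    return out


def idct(coeffs):
    """Inverse 2-D DCT: rebuilds the 64 sample values from the coefficients."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[u][v]
                              * _basis(x, u) * _basis(y, v))
            out[x][y] = total
    return out


# Illustrative input: a smooth brightness gradient (hypothetical sample data)
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
coeffs = fdct(block)
restored = idct(coeffs)

mean = sum(sum(row) for row in block) / (N * N)
max_error = max(abs(block[x][y] - restored[x][y])
                for x in range(N) for y in range(N))

print("DC coefficient:", coeffs[0][0])
print("8 x block average:", N * mean)
print("largest round-trip error:", max_error)
```

With this scaling the DC coefficient comes out as eight times the block average, and the round trip reproduces the input to within floating-point rounding, which is the sense in which the DCT step on its own loses nothing.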
Quantization
A quantization table is a set of 64 numbers (each from 1 to 255) that tells the encoder which visual information is important and which is not. Each number corresponds to a certain frequency component of the video image.
Each value (frequency) of the DCT-transformed block is divided by its corresponding value in the quantization table and rounded to an integer. Frequencies deemed less visually important are given larger divisors, so their information is reduced in precision, and some frequency components are rounded to zero and eliminated completely.
This quantization process eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. It is also the source of most MPEG-1 video artifacts, such as blockiness, color banding, noise, ringing and discoloration, which appear when video is encoded with an insufficient bitrate.
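The divide-and-round step can be sketched in a few lines of Python. This is a simplified illustration only: `QTABLE` is a made-up table whose entries simply grow with frequency (it is not the default matrix from the standard), the coefficient block is synthetic, and the extra scale factor that real encoders apply on top of the table is left out.

```python
import math

N = 8

# Hypothetical quantization table: larger divisors for higher frequencies.
# These numbers are an assumption for illustration, not the standard's values.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table=QTABLE):
    """Divide each coefficient by its table entry and round to nearest."""
    return [[int(math.floor(coeffs[u][v] / table[u][v] + 0.5))
             for v in range(N)] for u in range(N)]


def dequantize(levels, table=QTABLE):
    """Decoder side: multiply the stored integers back by the table."""
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]


# Synthetic coefficients: one large DC value, AC values shrinking with frequency
coeffs = [[880.0 / (1 + u + v) ** 3 for v in range(N)] for u in range(N)]

levels = quantize(coeffs)
restored = dequantize(levels)

zeros = sum(1 for row in levels for value in row if value == 0)
worst = max(abs(coeffs[u][v] - restored[u][v])
            for u in range(N) for v in range(N))

print("quantized DC level:", levels[0][0])
print("coefficients rounded to zero:", zeros, "of", N * N)
print("largest reconstruction error:", worst)
```

The reconstruction error of each coefficient is at most half of its table entry, so a larger divisor means both fewer bits to store and a coarser approximation. That trade-off is the loss this step introduces.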
Part 2
Dimensions 4095x4095
Datarate
Constrained Parameters Bitstream
Luma
Chroma
I-frames (Intraframe)
Seeking
P-frames (Predicted)
B-frames (Bidirectional)
Complexity (memory)
Delay
"The DC-picture type is used to make fast searches possible on sequential DSMs such as tape recorders with a fast search mechanism. The DC-picture type is never used in conjunction with the other picture types." ???Any relation to DC coefficients???
GOP
Keyframe placement
DCT (reversible)*
Quantization*
Table (num 1-255)*
Quantizer Noise*
Banding*
Ringing (large coefficients in high frequency sub-bands)
Coefficients*
AC*
DC (Spatial prediction)*
1/2 or 1/3 interpolation?
zigzag
Macroblocks
16 dimensions
Blockiness
Motion Vectors/Estimation
Black borders/Noise
pel precision (half pixel IIRC)
Two MV per macroblock (forward/backward pred)
Prediction error
Huffman
Table (for frequent values)
RLE (fixed length for uncommon codes)
Variable RLE?
Others?
CBR/VBR
Spatial Complexity
Temporal Complexity
Audio
Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC 11172-3.
MPEG-1 audio uses perceptual masking together with sub-band coding, based on a 32 sub-band polyphase filter bank, to reduce the bitrate of the audio stream. It has been shown to be particularly efficient on high quality percussive sounds (impulses) thanks to the very effective time-domain concealment characteristics of that filter bank.
- Sampling rates: 32, 44.1 and 48 kHz
- Bitrates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
The audio is coded as a succession of frames, each covering 1152 samples (in Layers II and III; a Layer I frame covers 384 samples), in one of four possible channel modes:
- mono format
- stereo format
- joint stereo format (stereo irrelevance)
- dual channel (uncorrelated) format
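The sampling rates, bitrates and 1152-sample frame length listed above fix how long each frame lasts and roughly how many bytes it occupies. The following Python sketch works through that arithmetic; the function names are hypothetical, and the byte count assumes the usual constant-bitrate relationship for 1152-sample frames (bits per frame = bitrate x frame duration), with any padding byte left as an optional parameter.

```python
SAMPLES_PER_FRAME = 1152  # frame length quoted in the text


def frame_duration_ms(sample_rate_hz):
    """Playing time of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Whole bytes per frame at a constant bitrate.

    1152 samples / 8 bits per byte = 144, so the size is
    144 * bitrate / sample_rate, rounded down, plus optional padding.
    """
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + padding


for rate in (32000, 44100, 48000):
    print(rate, "Hz:", round(frame_duration_ms(rate), 3), "ms per frame")

print("256 kbit/s at 44.1 kHz:", frame_bytes(256000, 44100), "bytes per frame")
print("384 kbit/s at 48 kHz:", frame_bytes(384000, 48000), "bytes per frame")
```

At 44.1 kHz the division does not come out even, which is why a stream at that rate needs an occasional padded frame to hold its average bitrate.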
mono, stereo, joint stereo (intensity, m/s), dual.
efficient time-domain concealment characteristics
Layer I
MPEG-1 Layer I is a simplified version of Layer II, designed for low delay and low complexity to facilitate real-time encoding on the hardware available in 1990, for applications like teleconferencing and studio editing. With the substantial performance improvements in digital processing since then, it has long been obsolete.
It saw limited adoption in its time, and most notably was used on the defunct Digital Compact Cassette. Layer I audio files usually use the extension .mp1.
file extension .mp1*
Simple*
Realtime*
Delay*
Digital Compact Cassette*
Obsolete today*
Layer II
Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performance on highly critical audio material such as castanets, symphonic orchestra, male and female voices, and particularly high quality percussive sounds (impulses) like triangle and glockenspiel. Testing has shown MP2 to be equivalent or superior to much more recent audio codecs, such as Dolby Digital AC-3. [5]
Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbit/s for 16-bit 44.1 kHz CD audio. [6] That (approximately) 1:6 compression ratio for CD audio is particularly impressive since it is quite close to the upper theoretical limit of Perceptual Entropy, at just over 1:8. [7] [8] Achieving much higher compression is simply not possible without discarding some perceptible information.
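The compression ratio quoted above follows from simple arithmetic on the CD audio parameters. A short Python sketch, assuming two-channel (stereo) CD audio and using hypothetical helper names:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_bitrate = pcm_bitrate(44100, 16, 2)   # stereo CD audio
mp2_bitrate = 256000                      # the transparent rate cited in the text

ratio = cd_bitrate / mp2_bitrate
limit_bitrate = cd_bitrate / 8            # bitrate implied by a 1:8 ratio

print("CD audio:", cd_bitrate, "bit/s")
print("compression ratio at 256 kbit/s: 1:%.2f" % ratio)
print("bitrate at a 1:8 ratio:", limit_bitrate, "bit/s")
```

The exact figure is a little over 1:5.5, which the text rounds to 1:6; a 1:8 ratio would correspond to roughly 176 kbit/s.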
audio broadcasting
error resilient
Musicam
32 sub-bands
Exceeds MP3 somewhere between 192-256 kbps
dominant standard*
Audiophile*
impulses*
superior to AC-3*
pro-transparent at 256kbps*
same fundamental problem today*
Focus on [time-domain] critical audio*
Layer III/MP3
9 months?
ASPEC (Fraunhofer)
freq transform encoder
entropy coding
Hybrid
MDCT
pre-echo worse
aliasing issues
"aliasing compensation"
mid/side (or intensity) joint stereo
576 frequency components
selectivity
"If there is a transient, 192 samples are taken instead of 576 to limit the temporal spread of quantization noise"?
psychoacoustic model and frame format from MP1/2
ringing
CBR/VBR
Frames are not independent
Systems
Part 1 of the MPEG-1 standard covers systems, which is the logical layout of the encoded audio, video, and other bitstream data.
"The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." [9]
Program Stream
Interleaving
PES
Wrap-around
DTS
Timebase correction
Pixel/Display Aspect Ratio
See Also
References
- [1] http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf p. 2
- [2] http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm
- [3] http://www.chiariglione.org/mpeg/meetings.htm
- [4] http://www.chiariglione.org/mpeg/meetings/london/london_press.htm
- [5] Wustenhagen et al, Subjective Listening Test of Multi-channel Audio Codecs, AES 105th Convention Paper 4813, San Francisco 1998
- [6] http://www.faqs.org/faqs/mpeg-faq/part1/ "You can compress the same stereo program down to 256 Kbits/s with no loss in discernable quality." (the original papers would be much, much better refs, but I can't seem to find them! This just proves they exist!)
- [7] J. Johnston, Estimation of Perceptual Entropy Using Noise Masking Criteria, in Proc. ICASSP-88, pp. 2524-2527, May 1988.
- [8] J. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE J. Sel. Areas in Comm., pp. 314-323, Feb. 1988.
- [9] http://www.chiariglione.org/mpeg/faq/mp1-sys/mp1-sys.htm
External Links
- http://www.chiariglione.org/mpeg/ Official home page of the Moving Picture Experts Group (MPEG), a working group of ISO/IEC