Television or broadcast video has been around for well over 50 years. It has mostly been developed and disseminated in analog form, and most of the fundamental agreed standards apply to that analog form. Digital video was initially developed to serve the analog broadcast marketplace, with those standards unchanged. It is important to remember these basic facts.
Broadcast video uses raster scan, where the picture is made by a series of regular scans called lines and fields. Video is ultimately a spot of light travelling at uniform speed across a cathode ray tube; the changing brightness of the spot traces out a picture. The electronic signal that we use to control the spot has two important parameters: the voltage and the timing. The voltage controls the brightness; the timing decides where on the screen the spot is located. The vertical position is stepped down sequentially in what we call lines, and the horizontal position is determined by the delay after a reference start-of-line marker. If we forget for now all the nuts and bolts that allow the system to work (sync pulses, color subcarrier bursts, etc.), then what actually counts in defining the video standard is the number of lines and the time width of the picture content on each line, which together create the actual visible image. That time width is normally called the Active Line Time.
NTSC is defined as having 486 active lines with a nominal active line length of 52.7 microseconds. PAL is 576 active lines with a nominal active length of 52 microseconds. These format specifications are what represent the image with its 4:3 aspect ratio. This in turn has formed the specification for image generating kit like cameras, and IT HAS NOT CHANGED through digital video adoption. Note though that the analog video specifications include tolerances (hence use of nominal above) and for convenience we will not always use the middle of the range. There is also some argument about the exact number of active analog lines, but the above numbers are the usually accepted values.
Early on in the evolution of digital video, international committees met and decided to create formats that would ease manufacturing and format conversion. It was therefore decided to use a common number of digital samples per line of video. It was also a requirement that the digitally sampled portion of each line be a little longer than the actual active line time.
It was decided to use a sample frequency of 13.5 MHz, and to have 720 samples for each line. This results in about 712 active samples for NTSC and about 702 for PAL. Remember that these numbers represent a nominal 4:3 image.
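The arithmetic behind those sample counts is simply the sample rate multiplied by the active line time. A minimal sketch in Python, using the nominal values quoted above (the function name is illustrative only):

```python
SAMPLE_RATE_MHZ = 13.5  # common sampling frequency for both formats


def active_samples(active_line_us):
    """Samples inside the active line (MHz x microseconds = samples)."""
    return SAMPLE_RATE_MHZ * active_line_us


ntsc = active_samples(52.7)  # nominal NTSC active line time
pal = active_samples(52.0)   # nominal PAL active line time
print(f"NTSC: {ntsc:.2f} of 720 samples")
print(f"PAL:  {pal:.2f} of 720 samples")
```

With the nominal 52.7 microseconds, NTSC comes out a shade under 712; the analog tolerances are why the figure is only "about" 712.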
If the digital video is used to produce analog video, then the extra pixel samples will be cropped and lost in the blanking. Conversely any analog source will only produce active pictures for the narrower period, and in fact many digital cameras are designed to also have pictures for only this period. If the full 720 sample width of the digital image is used, then it is clearly a little bit wider than 4:3; it is if you like a 4:3 image with some bits on the sides. Image origination for the digital format must therefore take this into account.
Let us therefore look at the specifics of image production for the two digital video formats. Paint and draw type applications, and most animation origination, work with square pixels, as do devices like digital scanners used to produce images. This means that each pixel is displayed on the computer screen as a square, so an image with equal numbers of pixels in height and width will come out looking square. This is not true of digital video. We have seen that a 4:3 NTSC image is represented by 712x486 pixels, but clearly a 4:3 square pixel image would be 712x534 (or 648x486). The video image thus appears horizontally stretched on a computer screen; it is what we call an anamorphic image. Many applications use an anamorphic ratio (e.g. 0.9), but since these are often approximations I don't believe they are helpful to a good understanding. On the PAL side, the 4:3 image is represented in video by 702x576 pixels, with a square pixel equivalent being 768x576. The PAL video image thus appears horizontally squashed on a computer screen.
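For readers who want to check these figures, here is a small sketch that derives the pixel shape and the square pixel equivalents from the active picture sizes given above. The function names are my own, and exact fractions are used so that no approximation creeps in:

```python
from fractions import Fraction

FOUR_BY_THREE = Fraction(4, 3)


def pixel_aspect_ratio(active_width, active_height, display_aspect=FOUR_BY_THREE):
    """Width-to-height ratio of one video pixel for the given active picture."""
    return display_aspect / Fraction(active_width, active_height)


def square_pixel_sizes(active_width, active_height, display_aspect=FOUR_BY_THREE):
    """Two square pixel equivalents: keep the width, or keep the height."""
    keep_width = (active_width, round(active_width / display_aspect))
    keep_height = (round(active_height * display_aspect), active_height)
    return keep_width, keep_height


ntsc_par = pixel_aspect_ratio(712, 486)
pal_par = pixel_aspect_ratio(702, 576)
print("NTSC pixel aspect:", ntsc_par, "=", round(float(ntsc_par), 4))
print("PAL pixel aspect: ", pal_par, "=", round(float(pal_par), 4))
print("NTSC square pixel sizes:", square_pixel_sizes(712, 486))
print("PAL square pixel size (height kept):", square_pixel_sizes(702, 576)[1])
```

The NTSC ratio comes out a little above the 0.9 that many applications use, which is the kind of approximation mentioned above.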
The NTSC situation is then complicated somewhat, because the popular DV consumer format adopted a cropped 480 line frame size. Since the same analog display equipment is still being used, and indeed the same cameras as before, it stands to reason that the aspect and anamorphic ratios remained the same. I believe the easiest way to think of this is to imagine a full 4:3 NTSC image at 712x486. If we crop 6 pixels from the height, then clearly we need to crop 8 pixels from the width to maintain a 4:3 image. (OK, this is an approximation, but near enough.) We thus arrive at an active digital video image of 704x480. The square pixel equivalent to this would be 704x528 (or 640x480).
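How near is "near enough"? A quick check, assuming the pixel shape implied by the full 712x486 image carries over unchanged to DV (the helper name is hypothetical):

```python
from fractions import Fraction

FOUR_BY_THREE = Fraction(4, 3)

# Pixel shape implied by a 4:3 picture occupying 712 x 486 samples.
par = FOUR_BY_THREE / Fraction(712, 486)


def displayed_aspect(width, height, pixel_aspect):
    """Aspect ratio of the picture as seen on an analog display."""
    return Fraction(width, height) * pixel_aspect


dv_aspect = displayed_aspect(704, 480, par)
error_percent = float(abs(dv_aspect - FOUR_BY_THREE) / FOUR_BY_THREE) * 100

# Square pixel equivalents of the 704 x 480 active image.
keep_width = (704, round(704 / FOUR_BY_THREE))
keep_height = (round(480 * FOUR_BY_THREE), 480)

print(f"704x480 displays at {float(dv_aspect):.4f} (4:3 is {float(FOUR_BY_THREE):.4f})")
print(f"Deviation from 4:3: {error_percent:.2f}%")
print("Square pixel equivalents:", keep_width, "or", keep_height)
```

The deviation is a small fraction of one percent, well inside the tolerances of the analog specification.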
Lately there has been use of widescreen video with a 16:9 image aspect ratio. This simply applies a further 4:3 anamorphic ratio on top of that already used (16:9 is 4:3 times 4:3). Clearly the square pixel widths need to be modified pro rata, but otherwise all the numbers remain the same as before.
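As a sketch of the pro rata adjustment: the square pixel width of a 16:9 image is the height times 16/9. Rounding up to a whole pixel is my assumption here; the function name is illustrative only.

```python
import math
from fractions import Fraction

# The extra stretch that turns a 4:3 picture into a 16:9 one.
WIDESCREEN_FACTOR = Fraction(16, 9) / Fraction(4, 3)


def widescreen_width(height):
    """Square pixel width of a 16:9 image, rounded up to a whole pixel."""
    return math.ceil(Fraction(16, 9) * height)


print("Extra stretch:", WIDESCREEN_FACTOR)
for name, height in [("NTSC D1", 486), ("NTSC DV", 480), ("PAL", 576)]:
    print(f"{name}: {widescreen_width(height)} x {height}")
```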
In all formats, we have two obvious alternative generation procedures:
1) We can generate a 4:3 (or 16:9) image and then drop it into the slightly wider full digital frame, leaving some blank image area at the sides.
2) We can generate an image slightly wider than 4:3 (or 16:9) that will fill the whole digital frame.
To preserve quality, it is customary to generate an image based upon always having too many pixels, so that the final digital video frame is a reduction in size.
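The two procedures can be put into numbers with a short sketch. For the first, it gives the blank samples left on each side of the active picture; for the second, the square pixel width needed to fill all 720 samples. The function names are hypothetical, and rounding to the nearest whole pixel is an assumption (published figures may round differently, for instance to an even number).

```python
FRAME_WIDTH = 720  # samples per line in the full digital frame


def side_padding(active_width):
    """Procedure 1: blank samples on each side of the active picture."""
    return (FRAME_WIDTH - active_width) / 2


def full_width_square(square_width, active_width):
    """Procedure 2: square pixel width that fills the whole 720 sample frame."""
    return round(square_width * FRAME_WIDTH / active_width)


for name, active in [("NTSC D1", 712), ("NTSC DV", 704), ("PAL", 702)]:
    print(f"{name}: {side_padding(active)} blank samples per side")

print("PAL 4:3 full width:     ", full_width_square(768, 702))
print("PAL 16:9 full width:    ", full_width_square(1024, 702))
print("NTSC D1 16:9 full width:", full_width_square(864, 712))
```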
In summary, some obvious image sizes in pixels to use to generate square pixel images for digital video are as follows:
NTSC D1 (720 x 486) normal
    4:3 square pixels:        712 x 534
    Full width square pixels: 720 x 534
NTSC D1 (720 x 486) widescreen
    16:9 square pixels:       864 x 486
    Full width square pixels: 874 x 486
NTSC DV (720 x 480) normal
    4:3 square pixels:        704 x 528
    Full width square pixels: 720 x 528
NTSC DV (720 x 480) widescreen
    16:9 square pixels:       854 x 480
    Full width square pixels: 874 x 480
PAL (720 x 576) normal
    4:3 square pixels:        768 x 576
    Full width square pixels: 788 x 576
PAL (720 x 576) widescreen
    16:9 square pixels:       1024 x 576
    Full width square pixels: 1050 x 576
Some practical points
At some stage the square pixel image has to be converted to the anamorphic image actually used in the digital video. This involves some processing, normally called interpolation, and the quality can vary widely. The artist should determine where in the production chain the format change can be done with the best quality. For example, Photoshop (if set to best) does a very good job, After Effects is good, but Premiere is mediocre.
Many NLE applications like Premiere and Final Cut Pro have magic number recognition. If they see a graphic image size that they think is appropriate to a video frame, then they will automatically apply a transformation to the size. Many applications use only an approximation of the requisite ratios, and since this transformation may not be to your required quality, it is much safer to always apply your own transformation before import.
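If you do apply your own transformation, it helps to know exactly which frame size to scale to. The sketch below simply looks up the full width square pixel sizes from the summary list and returns the matching digital frame and the scale factors involved. The table and function names are hypothetical, and the actual resizing is left to whichever tool gives you the best interpolation.

```python
# Full width square pixel sizes from the summary list, mapped to the
# digital frame each one should be scaled to before import.
FULL_WIDTH_TARGETS = {
    (720, 534): (720, 486),   # NTSC D1 normal
    (874, 486): (720, 486),   # NTSC D1 widescreen
    (720, 528): (720, 480),   # NTSC DV normal
    (874, 480): (720, 480),   # NTSC DV widescreen
    (788, 576): (720, 576),   # PAL normal
    (1050, 576): (720, 576),  # PAL widescreen
}


def scale_factors(source_size):
    """Return (target_size, x_scale, y_scale) for a known square pixel size."""
    target = FULL_WIDTH_TARGETS[source_size]
    x_scale = target[0] / source_size[0]
    y_scale = target[1] / source_size[1]
    return target, x_scale, y_scale


target, sx, sy = scale_factors((788, 576))
print(f"Scale 788x576 to {target[0]}x{target[1]}: x by {sx:.4f}, y by {sy:.4f}")
```

Every entry is a reduction in one dimension only, in keeping with the custom of starting with too many pixels.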