Byte: Difference between revisions

Revision as of 21:18, 12 April 2007

Image: the Hexer hex editor displaying the Linux kernel, version 2.6.20.6. The image shows the values of the bytes that make up a program, as they appear in hexadecimal format.

In computer science, a byte is a unit of data consisting of eight bits. When grouped together, bytes can contain the information to form a document, such as a photograph or a book. All data represented on a computer are composed of bytes, from e-mails and pictures to programs and data stored on a hard drive. Although the byte may appear to be a simple concept at first glance, its definition, ordering, and naming involve a number of subtleties.

Technical definition

In electronics, information is represented by switching between two states, usually referred to as 'on' and 'off'. To represent these states, computer scientists use the values 0 (off) and 1 (on); a single such value is called a bit.

Each byte is made of eight bits, and can represent any number from 0 to 255. That is 256 possible values, counting 0. The figure is obtained by raising the number of values a bit can take (two) to the power of the number of bits in a byte (eight): 2⁸ = 256.
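The arithmetic above can be checked with a short Python sketch. The names `BITS_PER_BYTE`, `value_count`, and `largest_value` are chosen here for illustration only.

```python
# Number of bits in a byte, as defined in this article.
BITS_PER_BYTE = 8

# Each bit has two states, so n bits give 2**n combinations.
value_count = 2 ** BITS_PER_BYTE
largest_value = value_count - 1

print(value_count)    # 256
print(largest_value)  # 255

# The largest byte value written out as eight bits.
print(format(largest_value, "08b"))  # 11111111

# Python's built-in bytes type enforces the same 0-255 range.
try:
    bytes([256])
except ValueError as error:
    print("rejected:", error)
```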

Bytes can be used to represent a wide variety of data, from characters in a string of text to the assembled and linked machine code of a binary executable file. Every file, region of system memory, and network stream is composed of bytes.
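As a rough illustration of how any data can be viewed as a sequence of bytes, the following Python sketch prints a small hexadecimal listing similar in spirit to what a hex editor shows. The function name `hex_dump` and the sample text are hypothetical, and the layout is a simplification rather than the format of any particular editor.

```python
def hex_dump(data: bytes, width: int = 8) -> str:
    """Return a simple hex listing: an offset followed by `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte is shown as two hexadecimal digits.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


sample = "Hello, byte!".encode("ascii")
print(hex_dump(sample))
# 00000000  48 65 6c 6c 6f 2c 20 62
# 00000008  79 74 65 21
```

The same function works on the contents of a file opened in binary mode, since reading such a file also yields a `bytes` object.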

Perhaps the oldest use of bytes was plain text, that is, plain alphanumeric characters with no punctuation. To make up for the absence of basic punctuation, telegrams would often use the word "STOP" in place of a period. The numeric value assigned to each character has varied between systems over the years. Today, however, we have the American Standard Code for Information Interchange (ASCII), which allows text to remain readable when it is transferred between different systems, such as from one operating system to another. For instance, a user who types a plain text document in Linux will be able to read the file correctly on a Macintosh computer. In ASCII, for example, the capital letters of the English alphabet range from 65 for "A" to 90 for "Z" in decimal (101 to 132 in octal).
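The mapping between characters and ASCII values can be inspected with Python's built-in `ord` and `str.encode`. This is only a small sketch; the variable names are illustrative.

```python
# ASCII codes of the first and last capital letters, in decimal.
first_code = ord("A")
last_code = ord("Z")
print(first_code, last_code)            # 65 90

# The same codes in octal, the notation used by some older ASCII tables.
print(oct(first_code), oct(last_code))  # 0o101 0o132

# Encoding a word as ASCII yields one byte per character.
encoded = "STOP".encode("ascii")
print(list(encoded))                    # [83, 84, 79, 80]

# Decoding the bytes gives back the original text.
print(encoded.decode("ascii"))          # STOP
```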

Endianness

For more information, see: Endianness.

Of course, since data almost always consist of more than one byte, these bytes must be arranged in an agreed order for a device to read them correctly. In computer science, this ordering is called endianness. Just as some human languages are written from left to right, such as English, while others are written from right to left, such as Arabic and Hebrew, bytes are not always arranged in the same fashion.

The arrangement in which the most significant byte comes first, that is, is stored at the lowest memory address, is called 'big-endian'. Its opposite is 'little-endian', in which the most significant byte is stored at the highest memory address. Neither order is universal across computer hardware. For this reason, data formats and network protocols must specify which order they use; the Internet protocols, for example, use big-endian order, often called network byte order.
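The two byte orders can be compared with Python's `int.to_bytes` and `int.from_bytes`. This is a minimal sketch; the value `0x12345678` is an arbitrary example, not taken from any particular system.

```python
import sys

value = 0x12345678  # arbitrary 32-bit example value

# Big-endian: the most significant byte (0x12) comes first.
big = value.to_bytes(4, byteorder="big")
# Little-endian: the least significant byte (0x78) comes first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading the bytes back with the matching order recovers the value.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value

# Reading with the wrong order silently produces a different number.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0x78563412

# The byte order of the machine running this script.
print(sys.byteorder)
```

The misread value shows why both sides of an exchange have to agree on the order in advance: nothing in the bytes themselves indicates which convention was used.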

Word origin and ambiguity

Although the exact origin of the word 'byte' is uncertain, it is generally believed to have been coined by Dr. Werner Buchholz of IBM in 1956. It is a play on the word 'bit', and was originally used to refer to the number of bits used to represent a character.^[1] This number is usually eight, but in some cases (especially in times past) it has been any number between 2 and as many as 128 bits. Thus, the word 'byte' is actually an ambiguous term. For this reason, an eight-bit byte is sometimes referred to as an 'octet'.^[2]

Sub-units

For more information, see: SI prefix.

Although the byte is the basic unit, it is rarely the most convenient one for describing data. Because files are normally many thousands or even billions of times larger than a byte, larger units are used to improve readability. Metric prefixes are added to the word byte: kilo for one thousand bytes (kilobyte), mega for one million (megabyte), giga for one billion (gigabyte), and tera for one trillion (terabyte). One thousand gigabytes make up a terabyte, and even the largest consumer hard drives today hold only three-fourths of a terabyte (750 'gigs', or gigabytes). The rapid pace of technology may make the terabyte a common sight in the future, however.
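The way metric prefixes shorten large byte counts can be sketched in Python as follows. The function `describe_size` and the list `SI_UNITS` are hypothetical names, and the sketch uses powers of 1000 throughout.

```python
# Unit abbreviations for successive powers of 1000.
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]


def describe_size(byte_count: int) -> str:
    """Express a byte count using the largest metric prefix that fits."""
    size = float(byte_count)
    unit_index = 0
    # Divide by 1000 until the number is below 1000 or units run out.
    while size >= 1000 and unit_index < len(SI_UNITS) - 1:
        size /= 1000
        unit_index += 1
    return f"{size:g} {SI_UNITS[unit_index]}"


print(describe_size(1500))          # 1.5 kB
print(describe_size(750 * 10**9))   # 750 GB
print(describe_size(10**12))        # 1 TB
```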

Conflicting definitions

For more information, see: Binary prefix.

Traditionally, the computer world has used a value of 1024 instead of 1000 when referring to a kilobyte. The reason is that programmers needed a number compatible with base 2, and 1024 is equal to 2 to the 10th power. This usage, however, is now non-standard; the term 'kibibyte', abbreviated KiB, has been introduced for 1024 bytes, and prefixes of this kind are known as 'binary prefixes'.

While the difference between 1000 and 1024 may seem trivial, the discrepancy grows with each larger prefix, and so with the size of a disk. The difference between 1 TB and 1 TiB, for instance, is approximately 10%. As hard drives become larger, the need for a distinction between these two kinds of prefix will grow. This has been a problem for hard disk drive manufacturers in particular. For example, one well-known disk manufacturer, Western Digital, has recently been taken to court for its use of base 10 when labeling the capacity of its drives.^[3]
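To see what the two conventions mean for a single drive, the following Python sketch converts a capacity labeled in decimal gigabytes into binary gibibytes. The 750 GB figure is used purely as an example, and the variable names are illustrative.

```python
GB = 10 ** 9   # decimal gigabyte, in bytes
GIB = 2 ** 30  # binary gibibyte, in bytes

# A drive labeled with a decimal capacity of 750 GB.
labeled_bytes = 750 * GB

# The same number of bytes expressed in gibibytes.
capacity_in_gib = labeled_bytes / GIB
print(f"{capacity_in_gib:.1f} GiB")  # 698.5 GiB
```

Software that reports sizes in powers of 1024 would show such a drive as roughly 698 units rather than 750, although the number of bytes is identical.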

Table of prefixes

SI prefix (abbreviation)	Value (bytes)	Binary prefix (abbreviation)	Value (bytes)	Difference
kilobyte (kB)	10³	kibibyte (KiB)	2¹⁰	2.4%
megabyte (MB)	10⁶	mebibyte (MiB)	2²⁰	4.9%
gigabyte (GB)	10⁹	gibibyte (GiB)	2³⁰	7.4%
terabyte (TB)	10¹²	tebibyte (TiB)	2⁴⁰	10.0%
petabyte (PB)	10¹⁵	pebibyte (PiB)	2⁵⁰	12.6%
exabyte (EB)	10¹⁸	exbibyte (EiB)	2⁶⁰	15.3%
zettabyte (ZB)	10²¹	zebibyte (ZiB)	2⁷⁰	18.1%
yottabyte (YB)	10²⁴	yobibyte (YiB)	2⁸⁰	20.9%
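The gap between each decimal prefix and its binary counterpart can be computed directly. The Python sketch below prints, for each pair, how much larger the binary unit is than the decimal one, as a percentage; the names `PREFIX_PAIRS` and `percent_difference` are chosen here for illustration.

```python
# Decimal and binary prefix names, in order of increasing power.
PREFIX_PAIRS = [
    ("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi"),
    ("peta", "pebi"), ("exa", "exbi"), ("zetta", "zebi"), ("yotta", "yobi"),
]


def percent_difference(power: int) -> float:
    """Percentage by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


for power, (decimal_name, binary_name) in enumerate(PREFIX_PAIRS, start=1):
    difference = percent_difference(power)
    print(f"{decimal_name}byte vs {binary_name}byte: {difference:.1f}%")
```

Each step up multiplies the ratio by 1.024 again, which is why the discrepancy grows from about 2.4% at the kilobyte level to about 10% at the terabyte level.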

References

[1] Dave Wilton (2006-04-08). Wordorigins.org; bit/byte.
[2] Bob Bemer (accessed 2007-04-12). Origins of the Term "BYTE".
[3] Nate Mook (2006-06-28). Western Digital Settles Capacity Suit.

@@ Line 18: / Line 18: @@
 The method of which the most significant byte is first, or stored in the lowest memory sector, rather, is called the 'Big Endian'. Its antonym is the 'Little Endian', in which the most significant byte is stored in the highest memory sector. Unfortunately, neither order of data is standard. For this reason, network communications must specify which method they are using before they transmit information.
-===Word origin===
+===Word origin and ambiguity===
 Although the origin of the word 'byte' is unknown, it is believed to have been coined by Dr. Werner Buchholz of [[IBM]] in 1964. It is a play on the word 'bit', and was originally used to refer to a the number of bits used to represent a character.<ref>{{cite web
 | url=http://www.wordorigins.org/index.php/bit_byte/
