Bits, Nibbles, Bytes
Bit
A Bit is a single 0 or 1. On its own this is not very meaningful; what would be more useful are numbers and characters that can be used to calculate things or be printed on the screen.
Nibble
A Nibble is a series of 4 bits, e.g., 0110, 1111. This is more meaningful, as 4 bits can store any of 16 different values (0 to 15), for example the decimal numbers 6 (0110 in Binary) and 15 (1111 in Binary).
Byte
A Byte is a series of 8 bits, or 2 Nibbles, e.g., 0110 0110, 1111 1111. This is what most computers use today as a single unit of information. A commonly cited reason for choosing 8 bits is that 8 is itself a power of two (2³), and a Byte can hold 2⁸ = 256 different values (0 to 255). So the above 2 bytes can represent the decimal numbers 102 (0110 0110 in Binary) and 255 (1111 1111 in Binary).
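To check these conversions yourself, here is a minimal Python sketch. The helper name bits_to_int is made up for this example; it only wraps Python's built-in int(text, 2).

```python
def bits_to_int(bits: str) -> int:
    """Convert a string of 0s and 1s to an integer (spaces are ignored)."""
    return int(bits.replace(" ", ""), 2)

# The Nibble and Byte examples from the text.
print(bits_to_int("0110"))       # 6
print(bits_to_int("1111"))       # 15
print(bits_to_int("0110 0110"))  # 102
print(bits_to_int("1111 1111"))  # 255

# How many different values fit in a Nibble and in a Byte.
print(2 ** 4, 2 ** 8)            # 16 256
```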
Word, DWord, QWord
Word
1 Word is a series of 2 Bytes. Most computers today are 16-bit, 32-bit, or 64-bit, not 8-bit, so we need names for these larger units too. 2 Bytes, or 16 bits, is called a Word. (This naming comes from 16-bit processors; some architectures define a word differently.)
DWord
1 Double Word/DWord is a series of 4 Bytes. For 32-bit addressing or numbers, we use DWord.
QWord
1 Quad Word/QWord is a series of 8 Bytes. For 64-bit addressing or numbers, we use QWord.
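As a rough illustration of how much each of these units can hold, the sketch below computes the largest unsigned value per size. The dictionary and the helper max_unsigned are names invented for this example, and the sizes simply follow the naming used in this article.

```python
# Sizes follow the naming used in this article (Word = 2 Bytes).
SIZES_IN_BYTES = {"Byte": 1, "Word": 2, "DWord": 4, "QWord": 8}

def max_unsigned(n_bytes: int) -> int:
    """Largest value that fits when every bit of n_bytes is set to 1."""
    return 2 ** (8 * n_bytes) - 1

for name, n_bytes in SIZES_IN_BYTES.items():
    print(f"{name}: {8 * n_bytes} bits, largest value {max_unsigned(n_bytes):#x}")
```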
Kilo Bytes, Mega Bytes, Giga Bytes, Tera Bytes and Peta Bytes
Now, of course, the size of things is no longer just how many bits our CPU can handle but how big a piece of information like a number, text, image, or video is.
Kilo Bytes/KB
1 KB = 1024 B = 2¹⁰ B = 2¹³ b (Bits)
Mega Bytes/MB
1 MB = 1024 X 1024 B = 2²⁰ B = 2²³ b (Bits)
Giga Bytes/GB
1 GB = 1024 X 1024 X 1024 B = 2³⁰ B = 2³³ b (Bits)
Tera Bytes/TB
1 TB = 1024 X 1024 X 1024 X 1024 B = 2⁴⁰ B = 2⁴³ b (Bits)
Peta Bytes/PB
1 PB = 1024 X 1024 X 1024 X 1024 X 1024 B = 2⁵⁰ B = 2⁵³ b (Bits)
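The pattern in this list (multiply by 1024 at each step, and by 8 to go from Bytes to Bits) can be sketched in a few lines of Python. UNITS and unit_in_bytes are names made up for this illustration.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]

def unit_in_bytes(unit: str) -> int:
    """Each step up the list multiplies the size by 1024 (2**10)."""
    return 1024 ** (UNITS.index(unit) + 1)

for unit in UNITS:
    n_bytes = unit_in_bytes(unit)
    # 1 Byte is 8 Bits, so the Bit count is always 8 times the Byte count.
    print(f"1 {unit} = {n_bytes} B = {n_bytes * 8} b")
```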
Bytes and Hexadecimal Numbers
Hexadecimal Numbers
Binary has two digits, 0 and 1. Decimal, as the name implies, has ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Similarly, Hexadecimal has sixteen digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f.
So, as an example, 1f is a Hexadecimal number which can be converted to Decimal as we convert Binary to Decimal:
1f = 1 X 16¹ + f X 16⁰ = 16 + 15 = 31
Observe f = 15, so we can deduce a = 10, b = 11, …, f = 15 in Hexadecimal.
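The same positional arithmetic can be written out as a short Python sketch. The function hex_to_decimal is a hypothetical helper written for this example; in practice Python's built-in int(text, 16) does the same job.

```python
HEX_DIGITS = "0123456789abcdef"

def hex_to_decimal(text: str) -> int:
    """Convert a Hexadecimal string to an integer, one digit at a time."""
    value = 0
    for ch in text.lower():
        # Shift what we have so far one place to the left (times 16),
        # then add the value of the new digit (a = 10, ..., f = 15).
        value = value * 16 + HEX_DIGITS.index(ch)
    return value

print(hex_to_decimal("1f"))  # 31
print(int("1f", 16))         # 31, using the built-in conversion
```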
Bytes and Hexadecimal Numbers
But what is so special about Hexadecimal Numbers? Each Hexadecimal digit corresponds to exactly 4 bits (one Nibble), so a Byte is always exactly two Hexadecimal digits. So, let's take the two Byte examples from earlier.
0110 0110
This can be represented in Hexadecimal as just 0x66.
1111 1111
Similarly, this can be represented in Hexadecimal as just 0xff.
Note: We put ‘0x’ as the prefix of all Hexadecimal numbers.
You may have seen addresses on 32-bit (4 B) machines written as 0x12ff3410. This is in hex and can be broken up into bits as:
0001 0010 1111 1111 0011 0100 0001 0000
Which do you like better? I like the hex version of the same 4 Bytes much better than the binary version.
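If you want to move between the two views yourself, here is a small Python sketch. The helper to_nibbles is a made-up name; it formats a number as bits and groups them four at a time, so each group lines up with one hex digit.

```python
def to_nibbles(value: int, n_bytes: int) -> str:
    """Format value as n_bytes * 8 bits, grouped into Nibbles."""
    bits = format(value, f"0{n_bytes * 8}b")
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))

print(to_nibbles(0x66, 1))        # 0110 0110
print(to_nibbles(0xff, 1))        # 1111 1111
print(to_nibbles(0x12ff3410, 4))  # 0001 0010 1111 1111 0011 0100 0001 0000

# And back again: binary literal to hex.
print(hex(0b01100110))            # 0x66
```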
Signed Integer 16b (2B), 32b (4B), 64b (8B)
Finally, we can start representing numbers. The first type of numbers we generally deal with are called Integers. These can be Signed or Unsigned. Signed Integers can be positive or negative, e.g., -1, 10, -999, etc.
We generally represent them as 16-bit, 32-bit, or 64-bit values, which is the same as 2 Bytes, 4 Bytes, or 8 Bytes.
We generally take the Leftmost Bit (Most Significant Bit) to represent the Sign, so we are left with 16 - 1 = 15 bits for 16-bit, 32 - 1 = 31 bits for 32-bit, or 64 - 1 = 63 bits for 64-bit Integers, respectively.
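To get a feel for what giving up one bit means, the sketch below computes the smallest and largest value for each size. It assumes the Twos Complement scheme covered in the next section, and signed_range is a name invented for this example.

```python
def signed_range(bits: int):
    """Smallest and largest Signed Integer for a given width,
    assuming Twos Complement representation."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (16, 32, 64):
    low, high = signed_range(bits)
    print(f"{bits}-bit: {low} to {high}")
```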
Negative Integers (Twos Complement)
I just mentioned that the Leftmost Bit (Most Significant Bit) marks an Integer as Negative, but that is not the full story. Rather, a Negative Integer is represented as the bit pattern that, when added to the Positive version of the same Integer, results in 0.
So -6 is represented as the Integer that, when added to +6, results in 0. So how does that work?
Let's take 6 and convert it into a 16-bit (2 Bytes) Integer: 0000 0000 0000 0110.
- Then, we turn it into its Complement (1s Complement):
1111 1111 1111 1001.
What this means is that all the bits that were 0 are flipped to 1, and all the bits that were 1 are flipped to 0.
- Then, we add 1 to this new number (2s Complement):
1111 1111 1111 1001 + 0000 0000 0000 0001
= 1111 1111 1111 1010.
This is how we represent -6: 1111 1111 1111 1010.
And to show you that I didn't lie when I said +6 + (-6) is actually equal to 0, let's add these Binary numbers together:
Adding the +6 + (-6):
0000 0000 0000 0110
+ 1111 1111 1111 1010
- - - - - - - - - - -
1 0000 0000 0000 0000
But we do get a 1 at the Leftmost position, don't we? Yes, but it gets dropped as it doesn't fit in a 16-bit Integer. So the final result is indeed 0, or in hex: 0x0000.
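The flip-then-add-1 recipe and the final addition can be replayed in Python. This is only a sketch: twos_complement is a hypothetical helper, and because Python integers have no fixed width, a mask is used to imitate a 16-bit Integer.

```python
def twos_complement(value: int, bits: int = 16) -> int:
    """Return the bit pattern that represents -value in the given width."""
    mask = (1 << bits) - 1      # 16 ones for a 16-bit Integer
    ones = ~value & mask        # 1s Complement: flip every bit
    return (ones + 1) & mask    # 2s Complement: add 1, drop any overflow

neg6 = twos_complement(6)
print(format(neg6, "016b"))     # 1111111111111010
print(hex(neg6))                # 0xfffa

# +6 + (-6): the carry out of the 16th bit is dropped by the mask.
print((6 + neg6) & 0xFFFF)      # 0
```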
Unsigned Integer 16b (2B), 32b (4B), 64b (8B)
The other kind of Integers are Unsigned Integers. In this case, we don't need a Sign Bit, as no number is Negative. So, we can use the complete set of bits to represent the value of the Integer. This type of Integer is generally used to store the size of an element or something similar.
Note: The article here: https://towardsdatascience.com/unsinged-signed-integers-and-casting-in-rust-9a847bfc398f is a great resource to read, as it goes into detail using Rust.
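One way to see the difference between Signed and Unsigned is to read the same 2 Bytes both ways. The sketch below uses Python's int.from_bytes for this; the variable names are made up for the example.

```python
# The 16-bit pattern 1111 1111 1111 1010, written as 2 Bytes.
raw = bytes([0xFF, 0xFA])

as_unsigned = int.from_bytes(raw, "big", signed=False)
as_signed = int.from_bytes(raw, "big", signed=True)

print(as_unsigned)  # 65530: every bit counts towards the value
print(as_signed)    # -6: the same bits read as a Signed Integer
```

The bits are identical in both cases; only the interpretation changes.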
Double 64b (8B), 128b (16B)
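Double usually refers to a double-precision Floating point number, which takes twice the space of a 32-bit (4 Bytes) single-precision Float, i.e., 64 bits (8 Bytes). This size is the same on 32-bit and 64-bit systems. Some systems also offer wider 128-bit (16 Bytes) Floating point formats.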
Float: Exponent and Mantissa
Floating points are numbers like 3.414. So how are these represented? Well, for a start, they are stored in the same 16-bit, 32-bit, or 64-bit space as Integers, but unlike Integers, they are not stored as a direct Binary representation of the number, as that would not be memory efficient. Rather, the components of the number are stored.
- First the number is converted to its Binary Format. For our example 3.414 this is approximately 11.0110100111 (the fraction does not terminate in Binary, so it is cut off here).
- Then it is normalized into Implicit or Explicit Form. Let's just use the Explicit Form right now: 0.110110100111 X 2²
- The number is then broken up into 3 separate components: (1) Sign (2) Exponent (3) Mantissa.
The Leftmost Bit (Most Significant Bit) stores the Sign, i.e., 0 for Positive, 1 for Negative.
The Exponent, which in our case is 2, goes through a process called Biasing, so that Negative Exponents can be stored without a Sign of their own. With a bias of 8, values from:
-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7 are converted to
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
In our case, 2 is converted to 10, which in Binary is 1010.
And finally, the Mantissa is the digits after the binary point, of which the first 10 are 1101101001.
So it is represented in 16-bit as 0 01010 1101101001.
Here the Exponent takes 5 bits and the Mantissa takes 10 bits.
Then for 32-bit it is 0 00001010 11011010011111101111100.
Here the Exponent takes 8 bits and the Mantissa takes 23 bits, so more digits of the fraction are kept.
Note: This is a simplified layout. The IEEE 754 standard used by most hardware stores the Implicit Form (1.xxx X 2ⁿ) and uses a bias of 15 for 16-bit and 127 for 32-bit Floats, so the actual bits differ.
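To look at the three components of a real 32-bit Float, here is a sketch using Python's struct module, which packs Floats in the IEEE 754 format. IEEE 754 normalizes to the Implicit Form (1.xxx X 2ⁿ) with a bias of 127 for 32-bit Floats, so the bits it prints will not match the simplified walkthrough above. The function float32_fields is a name made up for this example.

```python
import struct

def float32_fields(x: float):
    """Split the IEEE 754 32-bit encoding of x into (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit Float, then reread the same 4 Bytes
    # as an unsigned 32-bit Integer so we can pick bits out of it.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                 # 1 bit
    exponent = (raw >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = raw & 0x7FFFFF        # 23 bits after the implicit leading 1
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(3.414)
print(sign, format(exponent, "08b"), format(fraction, "023b"))

# Rebuild the number from its parts: 3.414 = 1.707 X 2^1, so the
# stored Exponent is 1 + 127 = 128.
rebuilt = (-1) ** sign * (1 + fraction / 2 ** 23) * 2 ** (exponent - 127)
print(rebuilt)
```

The rebuilt value is close to 3.414 but not exactly equal, since 23 bits cannot hold the full Binary fraction.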
ASCII and Unicode
We are done with numbers: Integers, Floating points, and Doubles, in the sizes mentioned above. But how do we represent characters like ‘a’ and ‘%’?
These were initially represented as 1 Byte (8 bits) with an encoding rule called ASCII. Here is how the table looks:
So we can see that ‘a’ is represented as 97, or in bits 0110 0001, or in hex 0x61.
Similarly, ‘%’ is represented as 37, or in bits 0010 0101, or in hex 0x25.
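These lookups are easy to reproduce with Python's built-in ord and chr. The helper describe_char is a name invented for this sketch.

```python
def describe_char(ch: str):
    """Return (decimal, bits, hex) for a single character."""
    code = ord(ch)
    return code, format(code, "08b"), hex(code)

print(describe_char("a"))  # (97, '01100001', '0x61')
print(describe_char("%"))  # (37, '00100101', '0x25')

# chr goes the other way, from number back to character.
print(chr(97), chr(37))    # a %
```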
This normal ASCII table goes from 0–127, which can be represented with 7 bits. What happened to the remaining bit? Well, for that we have the extended ASCII table, which covers the values 128–255:
But with time, we needed to represent many more characters, and that was not possible with 8 bits (1 Byte), whose range is from 0 to 255. For this we needed a much more flexible approach, which led to Unicode and its UTF-8 encoding, which uses 1 to 4 Bytes to represent each character.
Note: Unicode is backwards compatible with ASCII, i.e., the characters with Decimal values 0–127 are the same in Unicode as in ASCII, and UTF-8 stores each of them in a single Byte.
So how is a character like ‘😀’ represented?
Well here is what its Decimal Value looks like: 128512
And its Hex Value: 0x1F600
There are a total of 149,878 characters currently in the Unicode standard.
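To see both the number assigned to a character and the Bytes that UTF-8 actually stores for it, here is a short Python sketch. The helper utf8_hex is a made-up name, and the emoji is written as the escape "\U0001F600" so the example does not depend on your editor's font.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 Bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")

grinning = "\U0001F600"  # the grinning face emoji

print(ord(grinning))       # 128512
print(hex(ord(grinning)))  # 0x1f600
print(utf8_hex(grinning))  # f0 9f 98 80 (4 Bytes)
print(utf8_hex("a"))       # 61 (1 Byte, same as ASCII)
```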
Conclusion
It helps to learn how humans and computers represent information under the hood, as it demystifies things at a deeper level. Hopefully this guide helped you learn something new today!