OCTAL NUMBERS
You can represent a number in the octal number system, that is,
base 8. For example, the decimal number 10 is represented in octal as
12, that is, 1*8^1 + 2*8^0.
Explanation
You can print numbers in octal form by using the format "%o". In source code, an octal constant is written with a leading 0, for example 012.
Hexadecimal numbers use base 16. The characters used in hexadecimal
numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. For example,
the decimal number 22 is represented as 16 in hexadecimal:
1*16^1 + 6*16^0.
Explanation
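You can print numbers in hexadecimal form by using the format "%x" (or "%X" for uppercase digits). In source code, a hexadecimal constant is written with the prefix 0x, for example 0x16.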
A floating-point number has two components: one is an exponent and
the other is a fraction. For example, the number 200.07 can be represented as 0.20007*10^3, where 0.20007 is the
fraction and 3 is the exponent. In binary form, numbers are represented
similarly. There are two types of representation: the short, or single-precision,
floating-point number and the long, or double-precision, floating-point number. Single precision occupies 4 bytes (32 bits), while double precision occupies 8 bytes (64 bits).
Program/Example
In C, the short or single-precision floating point is represented by the data type float and is declared as:
float f;
A single-precision floating-point number is laid out as follows:
The fractional part occupies 23 bits, from bit 0 to bit 22. The exponent
part occupies 8 bits, from bit 23 to bit 30, and is stored as a biased exponent, that is, the exponent plus
0111 1111 (127). The sign bit occupies bit 31.
Suppose the decimal number is 100.25. It can be converted as follows:
1.
Convert 100.25 into its equivalent binary representation: 1100100.01.
2.
Then represent this number so that there is only a single 1 bit on the left side
of the binary point: 1.10010001*2^6
3.
In binary, the exponent 6 is 110. Now add the
bias, 0111 1111, to get the biased exponent: 1000 0101
Since the number is positive, the sign bit is 0. The significand, or
fractional part, is:
1001 0001 0000 0000 0000 000
Note that the fractional part holds only the bits that are on the
right side of the binary point; the leading 1 is not stored. The 0s are added on the right
to make the fractional part take up 23 bits.
Special rules are applied for some numbers:
1.
The number 0 is stored as all 0s in the exponent and the fractional part; the sign bit may be 0 (positive zero) or 1 (negative zero).
2.
Positive infinity is represented as all 1s in the exponent and all 0s in
the fractional part with the sign bit 0.
3.
Negative infinity is represented as all 1s in the exponent and all 0s in
the fractional part with the sign bit 1.
4.
A NaN (not a number) is an invalid floating-point number in which all the
exponent bits are 1 and the fractional part contains at least one 1 bit.
The range of the float data type is about 10^-38 to 10^38 for
positive values and -10^38 to -10^-38 for
negative values.
The values are accurate to 6 or 7 significant digits depending on the
actual implementation.
Conversion of a number in the floating-point form to a decimal number
Suppose the number has the following components:
a.
Sign bit: 1
b.
Exponent: 1000 0011
c.
Significand or fractional part: 1001 0010 0000 0000 0000 000
The conversion proceeds as follows:
1.
Since the exponent is biased, find the unbiased exponent: 1000 0011 - 0111 1111 = 100 (the number 4).
2.
Restore the leading 1 that is not stored and represent the number as 1.1001001*2^4.
3.
Represent the number without the exponent as 11001.001.
4.
Convert the binary number to decimal and apply the sign: -25.125.
For double precision, you can declare the variable as double d; it is
laid out as follows:
The fractional part occupies 52 bits, from bit 0 to bit 51. The exponent
part occupies 11 bits, from bit 52 to bit 62 (the biased exponent is the exponent plus 011
1111 1111, that is, 1023). The sign bit occupies bit 63. The range of the double representation is
about +10^-308 to +10^308 and -10^308 to -10^-308.
The precision is at least 10 significant digits, and typically 15.
Formats for representing floating points
Following are the valid representations of floating points:
0.23456
2.3456E-1
.23456
2.3456E-4
-.232456E-4
2345.6
23.456E2
-23456
23456e3
Following are the invalid formats:
e1
2.5e-.5
25.2-e5
2.5.3
You can determine whether a format is valid or invalid based on the
following rules:
1.
The value can include a sign, must include a numerical part, and
may or may not have an exponent part.
2.
The numerical part can take one of the following forms:
d.d, d., .d, d, where d is a set of digits.
3.
If the exponent part is present, it is introduced by ‘e’ or ‘E’,
which is followed by a positive or negative integer. The exponent must not contain a
decimal point, and there must be at least 1 digit after ‘E’.
4.
All floating constants have a decimal point or ‘e’ (or both).
5.
When ‘e’ or ‘E’ is used, it is called scientific notation.
6.
When you write a constant such as 50, it is interpreted as an integer.
To interpret it as floating point you have to write it as 50.0, 50., or 50e0.
You can use the format %f for printing floating numbers. For example, printf("%f\n", f);
%f prints output
with 6 decimal places. If you want to print output with a width of 8 columns and 3 decimal
places, you can use the format %8.3f. printf uses %f for double as well, because a float argument is promoted to double; %lf is required only when you read a double with scanf.
Floating-point computation may give incorrect results in the following
situations:
2.
If the calculated value exceeds the range allowable for the type;
3.
If the operands are themselves approximations, the result of an operation on
them is also approximate, and the errors may accumulate.
Points to Remember
1.
C provides two main floating-point representations: float (single
precision) and double (double precision).
2.
A floating-point number has a fractional part and a biased exponent.
3.
Float occupies 4 bytes and double occupies 8 bytes.
Type conversion occurs when an expression has data of mixed data types,
for example, when an integer value is converted into a float value, or when the
value of an expression is assigned to a variable of a different data type.
Program/Example
In type conversion, the data type is promoted from lower to higher
because converting higher to lower involves loss of precision and value.
For type conversion, C maintains a hierarchy of data types using the
following rules:
1.
Integer types are lower than floating-point types.
2.
Signed types are lower than unsigned types.
3.
Short whole-number types are lower than longer types.
4.
The hierarchy of data types, from highest to lowest, is as follows: double, float, long, int,
short, char.
These general rules are accompanied by specific rules, as follows:
1.
If either operand is of the double data type, the other operand is
also converted to double and the result will be double.
2.
If either operand is of the unsigned long data type, then the
other operand is also converted to unsigned long and the result will be unsigned long.
3.
If either operand is float, the other operand is converted to float and the result will be float. Pre-ANSI compilers instead promoted every float to double.
4.
If the expression includes long and unsigned integer data types, the
unsigned integer is converted to long when long can hold every unsigned integer value; otherwise both are converted to unsigned long and the result will be unsigned
long.
5.
If the expression contains long and any other data type, that data type
is converted to long and the result will be long.
6.
If the expression includes unsigned integer and any other data type, the
other data type is converted to an unsigned integer and the result will be
unsigned integer.
7.
Character and short data are promoted to integer.
8.
Unsigned char and unsigned short are promoted to integer when an integer can hold all of their values, which is the usual case; otherwise they are converted to unsigned integer.
Forced conversion occurs when you are converting the value of a larger
data type to the value of a smaller data type. For example, suppose the
declaration is char c;
and you use the expression c = 300; Assuming an 8-bit signed char, the maximum possible value
for c is 127, so the value 300 cannot be accommodated in c. In such a case, the integer 300 is
converted to char using forced conversion.
Program/Example
In general, forced conversion occurs in the following cases:
1.
When an expression gives a larger data type but the variable has a
smaller data type.
2.
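When a function is written using a smaller data type but you call the
function by using a larger data type. For example, a function declared with an int parameter is called with a floating-point value. Note that this does not apply to printf: if you specify %d but provide a floating-point value, no conversion takes place and the output is meaningless.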
Forced conversion is performed according to the following rules:
1.
Normally, when floating points are converted to integers, truncation
occurs. For example, 10.76 is converted to 10.
2.
When double is converted to float, the values are rounded or truncated,
depending on implementation.
3.
When longer integers are converted to shorter ones, only the lower bits
are preserved and the high-order bits are discarded. For example, the bit
representation of 300 is 1 0010 1100. If it is assigned to a character, only the lower
8 bits are preserved, since a character can have 8 bits. So you will get the
number 0010 1100 (44 in decimal).
In the case of type conversion, lower data types are converted to higher
data types, so it is better to write a function using higher data types such
as int or double even if you call the function with char or float. C provides built-in mathematical
functions such as sqrt (square root), which take the argument as the double data type. Suppose
you want to call the function by using the integer variable ‘k’. You can call the function as
sqrt((double) k)
This is called type casting, that is, converting the data
type explicitly. Here the value ‘k’ is properly converted to the double data
type value.
Points to Remember
1.
C makes forced conversion when it converts from higher data type to
lower data type.
2.
Forced conversion may decrease the precision or convert the value to one
that doesn't have a relation with the original value.
3.
Type casting is the preferred method of forced conversion.
Type casting is used when you want to convert the value of a variable
from one type to another. Suppose you want to print the value of a double data
type in integer form. You can use type casting to do this. Type casting is done
with the cast operator, which is the name of the target data type in parentheses.
Program
#include <stdio.h>

int main(void)
{
    double d1 = 123.56;    // A
    int i1 = 456;          // B
    // C and E pass the wrong type on purpose: the behavior is undefined
    printf("the value of d1 as int without cast operator %d\n", d1);           // C
    printf("the value of d1 as int with cast operator %d\n", (int)d1);         // D
    printf("the value of i1 as double without cast operator %f\n", i1);        // E
    printf("the value of i1 as double with cast operator %f\n", (double)i1);   // F
    i1 = 10;
    printf("effect of multiple unary operator %f\n", (double)++i1);            // G
    i1 = 10;               // H
    // printf("effect of multiple unary operator %f\n", (double) ++ -i1);      // I: error
    i1 = 10;
    printf("effect of multiple unary operator %f\n", (double)- ++i1);          // J
    i1 = 10;               // K
    printf("effect of multiple unary operator %f\n", (double)- -i1);           // L
    i1 = 10;               // M
    printf("effect of multiple unary operator %f\n", (double)-i1++);           // N
    return 0;
}
Explanation
1.
Statement A defines variable d1 as double.
2.
Statement B defines variable i1 as int.
3.
Statement C tries to print d1 as an integer using the placeholder %d. printf performs no conversion here: the behavior is undefined, and you will typically see that some random value
is printed.
4.
Statement D prints the value of d1 using a cast operator. You will see that it
prints the truncated value, 123, correctly.
5.
Statement E has the same problem as statement C in the other direction: i1 is an int but the placeholder is %f, so the output is meaningless. Statement F uses a cast operator and
prints 456.000000 correctly.
6.
Statements from G onwards give you the effects of multiple unary
operators. A cast operator is also a unary operator.
7.
Unary operators are associated from right to left, that is, the
operator nearest to the operand is applied first.
8.
Statement G combines the cast operator double with the increment
operator. Here i1 is first incremented and then type casting is done, so 11.000000 is printed.
9.
If you do not comment out statement I you will get an error. The
increment operator ++ can be applied only to a variable, and -i1 is a value,
not a variable. For the same reason +++i is an error: the compiler always forms
the longest possible operator, so it reads ++ followed by +i, and +i is not a variable.
10.
Statement J does not introduce any error, because ++ is applied to the variable i1 first and the unary - is then applied to the result. In statement L, the space between the two - signs is needed so that they are not read as the decrement operator --.
Points to Remember
1.
Type casting is used when you want to convert the value of one data type
to another.
2.
Type casting does not change the actual value of the variable, but the
resultant value may be put in temporary storage.
3.
Type casting is done using a cast operator that is also a unary
operator.
4.
The unary operators are associated from right to left.