A question about The Practice of Programming (English edition) # JobHunting - 待字闺中
j*y
Post #1
On pages 193-194, the book discusses the following issue:
---
Signedness of char. In C and C++, it is not specified whether the char data type is signed or unsigned. This can lead to trouble when combining chars and ints, such as in code that calls the int-valued routine getchar(). If you say
? char c; /* should be int */
? c = getchar();
the value of c will be between 0 and 255 if char is unsigned, and between -128 and 127 if char is signed, for the almost universal configuration of 8-bit characters on a two's complement machine. This has implications if the character is to be used as an array subscript or if it is to be tested against EOF, which usually has value -1 in stdio.
For instance, we had developed this code in Section 6.1 after fixing a few boundary conditions in the original version. The comparison s[i] == EOF will always fail if char is unsigned:
? int i;
? char s[MAX];
?
? for (i = 0; i < MAX-1; i++)
?     if ((s[i] = getchar()) == '\n' || s[i] == EOF)
?         break;
? s[i] = '\0';
When getchar returns EOF, the value 255 (0xFF, the result of converting -1 to unsigned char) will be stored in s[i]. If s[i] is unsigned, this will remain 255 for the comparison with EOF, which will fail.
Even if char is signed, however, the code isn't correct. The comparison will succeed at EOF, but a valid input byte of 0xFF will look just like EOF and terminate the loop prematurely. So regardless of the sign of char, you must always store the return value of getchar in an int for comparison with EOF.
Here is how to write the loop portably:
int c, i;
char s[MAX];

for (i = 0; i < MAX-1; i++) {
    if ((c = getchar()) == '\n' || c == EOF)
        break;
    s[i] = c;
}
s[i] = '\0';
---
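At first glance this seems reasonable, but what if the machine uses sign extension when converting char to int (see page 44 of K&R's The C Programming Language)? Wouldn't there still be a problem?
Suppose the file really does contain a 0xFF byte. After getchar() reads it, the value has to be converted to int before it is assigned to c. If that conversion uses sign extension, won't it still become -1? Wouldn't the loop then stop before reaching the end of the file?
Is my understanding correct?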
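
One way to check this is a small experiment. As far as I can tell from the C standard, fgetc (and therefore getchar) obtains the character as an unsigned char converted to int, so a 0xFF byte should come back as 255 rather than -1. Sign extension would only matter once that int is stored back into a signed char. The sketch below assumes a hosted C implementation where tmpfile() works; read_back_byte and check_ff_is_distinct_from_eof are hypothetical names of my own, not from the book.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the book): write one byte to a temporary
   binary file, rewind, and store in *out what fgetc reports for it.
   Returns 0 on success, -1 if the temporary file could not be set up. */
int read_back_byte(unsigned char byte, int *out)
{
    FILE *fp = tmpfile();            /* opened in binary update mode */

    if (fp == NULL)
        return -1;
    if (fputc(byte, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                      /* a positioning call is needed between writing and reading */
    *out = fgetc(fp);                /* getchar() is getc(stdin); same return rule */
    fclose(fp);
    return 0;
}

/* Hypothetical self-check: a real 0xFF byte must not compare equal to EOF
   while the value is still held in an int. */
void check_ff_is_distinct_from_eof(void)
{
    int c = 0;

    if (read_back_byte(0xFF, &c) != 0)
        return;                      /* could not create the file; nothing to check */
    assert(c == 0xFF);               /* 255, not -1 */
    assert(c != EOF);                /* EOF is a negative int */
}
```

Calling check_ff_is_distinct_from_eof() from main should pass silently. By contrast, copying c into a plain char on a machine where char is signed would typically turn 255 into -1, which is exactly the collision with EOF that the book warns about.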