请教一道老题目 - 未名空间MITBBS历史存档

国际科技财经博客移民网络热点娱乐民生时事公众号

Redian新闻

>未名空间

>JobHunting - 待字闺中

请教一道老题目

请教一道老题目# JobHunting - 待字闺中

U*A2015-01-16 08:01

1 楼

就是没有看懂题目。
Given a byte array, which is an encoding of characters. Here is the rule:
a. If the first bit of a byte is 0, that byte stands for a one-byte
character
b. If the first bit of a byte is 1, that byte and its following byte
together stand for a two-byte character
Now implement a function to decide if the last character is a one-byte
character or a two-byte character
Constraint: You must scan the byte array from the end to the start.
Otherwise it will be very trivial.
能否请高人给个例子。谢谢

w*a2015-01-16 08:01

2 楼

这个不是日本更年期阿姨出的题么？

U*A2015-01-16 08:01

3 楼

就是啊，我没有看懂题目，太弱了。

U*A2015-01-16 08:01

4 楼

顶顶。

l*n2015-01-16 08:01

5 楼

二进制表示的byte串
01101111 10101010 01010101
第一个byte 01101111 组成了第一个character,因为第一个byte以0开头
第二三个byte 10101010 01010101组成了第二个character，因为第二个byte以1开头
从尾向前扫描一遍，就可以判断最后一个byte是属于一个单字节的字符还是双字节的字符

【在 U***A 的大作中提到】

: 就是啊，我没有看懂题目，太弱了。

U*A2015-01-16 08:01

6 楼

got it. Thanks.

b*i2015-01-16 08:01

7 楼

Hi, not so hard using recursion:
Explanation:
For every index, I maintain three states: ONEBYTE (single byte),
TWOBYTEFIRST (1st element of a two-byte), TWOBYTELAST (2nd element of a two-
byte). If you now the state of i-1'th element, you can deduce the state of i
'th.
One minor case is that if the last element is identified as the first byte
of a two-byte, the below logic still returns one-byte. But this can be
modified.
------------------------
class Solution {
public:
int last_byte_class(string bytes) {
// return 1: one-byte, 2: two-byte
int n = bytes.length();
int st = helper(bytes, n-1);
if(st == TWOBYTELAST)
return 2;
else
return 1;
}
int helper(string bytes, int last_index) {
char c = bytes[last_index];
int first_bit = find_first_bit(c), curr_state;
if(last_index == 0) {
curr_state = (first_bit == 0) ? ONEBYTE : TWOBYTEFIRST;
} else {
int prev_state = helper(bytes, last_index-1);
if(prev_state == ONEBYTE || prev_state == TWOBYTELAST)
curr_state = (first_bit == 0) ? ONEBYTE: TWOBYTEFIRST;
else
curr_state = TWOBYTELAST;
}
return curr_state;
}
int find_first_bit(char c) {
return (c >> 7) & 1;
}
private:
enum State {ONEBYTE, TWOBYTELAST, TWOBYTEFIRST};
};

【在 U***A 的大作中提到】

: 就是没有看懂题目。
: Given a byte array, which is an encoding of characters. Here is the rule:
: a. If the first bit of a byte is 0, that byte stands for a one-byte
: character
: b. If the first bit of a byte is 1, that byte and its following byte
: together stand for a two-byte character
: Now implement a function to decide if the last character is a one-byte
: character or a two-byte character
: Constraint: You must scan the byte array from the end to the start.
: Otherwise it will be very trivial.

n*n2015-01-16 08:01

8 楼

是不就是当年 UC-DOS 这类系统实现汉字编码的原理嘛。
标准 ascii 码定义了127个符号。编号为1-127，转换为二进制就是 0000 0001 - 0111
1111 这个段。编号128-255的字符是扩展字符，在不同的语言中代表不同的符号。
但是汉字的数量远大于128，一个字节的编码是不够的，于是就需要两个字节来代表一
个汉字。在解码的是时候需要判断第一个字节的最高位是否为0，如果为0，那这个字节
代表一个ascii字符，如果是1，那么这个字节和下一个字节组成一个双字节的汉字字符
。