avatar
请教一道老题目# JobHunting - 待字闺中
U*A
1
就是没有看懂题目。
Given a byte array, which is an encoding of characters. Here is the rule:
a. If the first bit of a byte is 0, that byte stands for a one-byte
character
b. If the first bit of a byte is 1, that byte and its following byte
together stand for a two-byte character
Now implement a function to decide if the last character is a one-byte
character or a two-byte character
Constraint: You must scan the byte array from the end to the start.
Otherwise it will be very trivial.
能否请高人给个例子。谢谢
avatar
w*a
2
这个不是日本更年期阿姨出的题么?
avatar
U*A
3
就是啊,我没有看懂题目,太弱了。
avatar
U*A
4
顶顶。
avatar
l*n
5
二进制表示的byte串
01101111 10101010 01010101
第一个byte 01101111 组成了第一个character,因为第一个byte以0开头
第二三个byte 10101010 01010101组成了第二个character,因为第二个byte以1开头
从尾向前扫描一遍,就可以判断最后一个byte是属于一个单字节的字符还是双字节的字符

【在 U***A 的大作中提到】
: 就是啊,我没有看懂题目,太弱了。
avatar
U*A
6
got it. Thanks.
avatar
b*i
7
Hi, not so hard using recursion:
Explanation:
For every index, I maintain three states: ONEBYTE (single byte),
TWOBYTEFIRST (1st element of a two-byte), TWOBYTELAST (2nd element of a two-
byte). If you now the state of i-1'th element, you can deduce the state of i
'th.
One minor case is that if the last element is identified as the first byte
of a two-byte, the below logic still returns one-byte. But this can be
modified.
------------------------
class Solution {
public:
int last_byte_class(string bytes) {
// return 1: one-byte, 2: two-byte
int n = bytes.length();
int st = helper(bytes, n-1);
if(st == TWOBYTELAST)
return 2;
else
return 1;
}
int helper(string bytes, int last_index) {
char c = bytes[last_index];
int first_bit = find_first_bit(c), curr_state;
if(last_index == 0) {
curr_state = (first_bit == 0) ? ONEBYTE : TWOBYTEFIRST;
} else {
int prev_state = helper(bytes, last_index-1);
if(prev_state == ONEBYTE || prev_state == TWOBYTELAST)
curr_state = (first_bit == 0) ? ONEBYTE: TWOBYTEFIRST;
else
curr_state = TWOBYTELAST;
}
return curr_state;
}
int find_first_bit(char c) {
return (c >> 7) & 1;
}
private:
enum State {ONEBYTE, TWOBYTELAST, TWOBYTEFIRST};
};

【在 U***A 的大作中提到】
: 就是没有看懂题目。
: Given a byte array, which is an encoding of characters. Here is the rule:
: a. If the first bit of a byte is 0, that byte stands for a one-byte
: character
: b. If the first bit of a byte is 1, that byte and its following byte
: together stand for a two-byte character
: Now implement a function to decide if the last character is a one-byte
: character or a two-byte character
: Constraint: You must scan the byte array from the end to the start.
: Otherwise it will be very trivial.

avatar
n*n
8
是不就是当年 UC-DOS 这类系统实现汉字编码的原理嘛。
标准 ascii 码定义了127个符号。编号为1-127,转换为二进制就是 0000 0001 - 0111
1111 这个段。编号128-255的字符是扩展字符,在不同的语言中代表不同的符号。
但是汉字的数量远大于128,一个字节的编码是不够的,于是就需要两个字节来代表一
个汉字。在解码的是时候需要判断第一个字节的最高位是否为0,如果为0,那这个字节
代表一个ascii字符,如果是1,那么这个字节和下一个字节组成一个双字节的汉字字符
avatar
s*x
9
不会出现nest的情况吧?
1aaa 1bbb 0ccc ...
相关阅读
logo
联系我们隐私协议©2024 redian.news
Redian新闻
Redian.news刊载任何文章,不代表同意其说法或描述,仅为提供更多信息,也不构成任何建议。文章信息的合法性及真实性由其作者负责,与Redian.news及其运营公司无关。欢迎投稿,如发现稿件侵权,或作者不愿在本网发表文章,请版权拥有者通知本网处理。