How do I get the raw bytes of a string?
# Java - 爪哇娇娃
l*c
1
String.charAt() gives me a character, which has already gone through an encoding. How can I get the i-th raw byte?
getBytes() seems to return encoded data as well.
g*g
2
What do you mean by raw bytes? A String is always in some encoding, and you
get different bytes under different encodings. Internally, Java uses
UTF-16.
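
To illustrate the point that the same text yields different bytes under different encodings, here is a minimal sketch. The class and method names are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingComparison {
    // Hypothetical helper: how many bytes the text occupies in a given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "a\u00e9"; // 'a' followed by e-acute

        // length() counts UTF-16 code units, not bytes.
        System.out.println(s.length());                                    // 2
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 3
        // UTF_16BE is used so that no byte-order mark is prepended.
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // 4
    }
}
```

The byte count only exists once a charset has been chosen, which is why there is no charset-free way to ask for "the i-th byte".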

[l*****c wrote:]
: String.charAt() gives me a character, which has already gone through an encoding. How can I get the i-th raw byte?
: getBytes() seems to return encoded data as well.

l*c
3
Put another way, I want the string's internal memory block, without any encoding
applied when I retrieve it. I'm a C++ guru, not a Java guru. In C++, I always
have the raw memory block.
Currently, when I call String.charAt(), it does the encoding for me. Say the
string is Unicode: it gives me the i-th char, but I want the i-th byte.

[g*****g wrote:]
: What do you mean by raw bytes? A String is always in some encoding, and you
: get different bytes under different encodings. Internally, Java uses
: UTF-16.

o*g
4
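You're thinking about this the wrong way.
In Java, a String means a sequence of chars.
A char means a character, and that character has nothing to do with bytes.
The internal memory block is something that char and String hide from you; normally you have no need to know about it.
If the implementation wants to store a char as a picture, that's its own business.
Also, the JVM hides the memory block. Even for an array, it does not guarantee that the elements are stored in contiguous memory.
You can't make that assumption.
The same goes for String.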

[l*****c wrote:]
: Put another way, I want the string's internal memory block, without any encoding
: applied when I retrieve it. I'm a C++ guru, not a Java guru. In C++, I always
: have the raw memory block.
: Currently, when I call String.charAt(), it does the encoding for me. Say the
: string is Unicode: it gives me the i-th char, but I want the i-th byte.

g*g
5
In C++, you do use an encoding. Look at it this way: even the simplest
ASCII is an encoding scheme. Everything in a computer is encoded: data is
encoded, instructions are encoded. How can a string not be encoded?
In most C++ systems, the encoding of the native char array is OS dependent.
For example, on Western-locale Windows the default is typically Windows-1252,
which is close to ISO-8859-1. If you are using western characters only, you
can call String.getBytes("ISO-8859-1"), then you can apply your C++ tricks.
That being said, Java provides a strong String

[l*****c wrote:]
: Put another way, I want the string's internal memory block, without any encoding
: applied when I retrieve it. I'm a C++ guru, not a Java guru. In C++, I always
: have the raw memory block.
: Currently, when I call String.charAt(), it does the encoding for me. Say the
: string is Unicode: it gives me the i-th char, but I want the i-th byte.

l*c
6
That won't do; this is exactly what I need to do right now. Can you help me figure out how?

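[o***g wrote:]
: You're thinking about this the wrong way.
: In Java, a String means a sequence of chars.
: A char means a character, and that character has nothing to do with bytes.
: The internal memory block is something that char and String hide from you; normally you have no need to know about it.
: If the implementation wants to store a char as a picture, that's its own business.
: Also, the JVM hides the memory block. Even for an array, it does not guarantee that the elements are stored in contiguous memory.
: You can't make that assumption.
: The same goes for String.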

l*c
7
I think you know what I mean, or rather, you know what I left out of my statement.
OK, what I actually need is to do some special encoding and escaping. So I need
the raw byte sequence of the string (in UTF-8) in order to apply my own encoding.
Any suggestions on how to get stupid Java to do this?
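
The post does not say what the "special encoding and escaping" is, so the following is only a sketch under an assumed scheme: percent-escaping every UTF-8 byte that is not printable ASCII. The `ByteEscaper` class and the escaping rule are hypothetical; the point is just the shape of the loop over the encoded bytes.

```java
import java.nio.charset.StandardCharsets;

final class ByteEscaper {
    // Hypothetical rule: copy printable ASCII (except '%') as-is,
    // write every other UTF-8 byte as %XX.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // Java bytes are signed; mask to get 0..255
            if (v >= 0x20 && v < 0x7F && v != '%') {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u4eca")); // a%E4%BB%8A
        System.out.println(escape("50%"));     // 50%25
    }
}
```

The mask `b & 0xFF` matters because a Java `byte` ranges from -128 to 127, so bytes such as 0xE4 would otherwise show up as negative numbers.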

[g*****g wrote:]
: In C++, you do use an encoding. Look at it this way: even the simplest
: ASCII is an encoding scheme. Everything in a computer is encoded: data is
: encoded, instructions are encoded. How can a string not be encoded?
: In most C++ systems, the encoding of the native char array is OS dependent.
: For example, on Western-locale Windows the default is typically Windows-1252,
: which is close to ISO-8859-1. If you are using western characters only, you
: can call String.getBytes("ISO-8859-1"), then you can apply your C++ tricks.
: That being said, Java provides a strong String

c*t
8
Why don't you use byte[] for stuff read in? Why did you have to use
String in the first place? How did you get your data into String?

[l*****c wrote:]
: I think you know what I mean, or rather, you know what I left out of my statement.
: OK, what I actually need is to do some special encoding and escaping. So I need
: the raw byte sequence of the string (in UTF-8) in order to apply my own encoding.
: Any suggestions on how to get stupid Java to do this?

g*g
9
If you need to encode your string in UTF-8, use
String.getBytes("UTF-8") and you'll get it.
String.getBytes(charset) does encoding
new String(bytes, charset) does decoding
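
A minimal round-trip sketch of those two calls. It assumes Java 7 or later and uses `StandardCharsets.UTF_8` rather than the string name "UTF-8", because `getBytes(String)` declares the checked `UnsupportedEncodingException`. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTrip {
    // Encoding: String -> bytes in a chosen charset.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes in a known charset -> String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "hello" with e-acute
        byte[] utf8 = encode(original);

        // Values are signed: 0xC3 and 0xA9 print as -61 and -87.
        System.out.println(Arrays.toString(utf8));
        System.out.println(decode(utf8).equals(original)); // true
    }
}
```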

[l*****c wrote:]
: I think you know what I mean, or rather, you know what I left out of my statement.
: OK, what I actually need is to do some special encoding and escaping. So I need
: the raw byte sequence of the string (in UTF-8) in order to apply my own encoding.
: Any suggestions on how to get stupid Java to do this?

l*c
10
Sigh, because the input is passed to me as a String. I don't control their
API.

[c*****t wrote:]
: Why don't you use byte[] for stuff read in? Why did you have to use
: String in the first place? How did you get your data into String?

l*c
11
Thanks, let me try it.

[g*****g wrote:]
: If you need to encode your string in UTF-8, use
: String.getBytes("UTF-8") and you'll get it.
: String.getBytes(charset) does encoding
: new String(bytes, charset) does decoding

b*y
13
UTF-8 is a good choice. The poster above has a point.
F*n
14
goodbug is right. Everything is encoded. There are no "raw bytes" in either C++ or Java; there is only a default encoding.
l*c
15
I assume everyone already knows what goodbug said; it is well known.
But there are "raw bytes". What I did not make clear is that I mean the raw
bytes of a specific encoding.
Say you encode the string "今狐冲" in UTF-8: it is actually stored in memory
as "E4 BB 8A E7 8B 90 E5 86 B2" (of course, when I paste it here, it is
encoded in another encoding scheme). What I need is each of these individual
raw bytes.
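
A minimal sketch that reproduces the byte sequence quoted above and picks out individual bytes. The `Utf8HexDump` class and its method names are made up for illustration, and Java 7 or later is assumed for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

final class Utf8HexDump {
    // Space-separated uppercase hex of the string's UTF-8 bytes.
    static String toHex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // The i-th UTF-8 byte as an unsigned value (0..255).
    static int byteAt(String s, int i) {
        return s.getBytes(StandardCharsets.UTF_8)[i] & 0xFF;
    }

    public static void main(String[] args) {
        // The thread's example string, written with Unicode escapes so the
        // result does not depend on the source file's encoding.
        String s = "\u4eca\u72d0\u51b2";
        System.out.println(toHex(s));     // E4 BB 8A E7 8B 90 E5 86 B2
        System.out.println(byteAt(s, 3)); // 231, i.e. 0xE7
    }
}
```

`byteAt` re-encodes the whole string on every call; for repeated access it would be cheaper to call `getBytes` once and index into the resulting array.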

[F****n wrote:]
: goodbug is right. Everything is encoded. There are no "raw bytes" in either C++ or Java; there is only a default encoding.