What do you mean by raw bytes? A String is always in a certain encoding, and you get different bytes with different encodings. Internally, Java uses UTF-16.
Put another way: I want the string's internal memory block, and I do not want any encoding applied when I retrieve it. I'm a C++ guru, not a Java guru. In C++, I always have the raw memory block. Currently, when I call String.charAt(), it does the decoding for me. Say the string is Unicode: it gives me the ith char, but I want the ith byte.
[In reply to g*****g]
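To make the char-versus-byte distinction concrete, here is a minimal sketch. The class name `EncodingSizes` and the sample text are made up for illustration; only the standard `String.getBytes(Charset)` and `String.charAt` calls are relied on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and sample text are illustrative only.
final class EncodingSizes {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes once encoded as UTF-16 big-endian (no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00E9";                  // "café": the last character is outside ASCII
        System.out.println(s.length());          // 4 UTF-16 code units
        System.out.println(utf8Length(s));       // 5 bytes: é takes two bytes in UTF-8
        System.out.println(utf16Length(s));      // 8 bytes: two bytes per code unit
        System.out.println((int) s.charAt(3));   // 233, the code unit for é, not a byte
    }
}
```

The same four characters give 5 or 8 bytes depending on the charset, which is why "the ith byte" only has a meaning once an encoding has been chosen.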
g*g
Post #5
In C++, you do use an encoding. Look at it this way: even the simplest ASCII is an encoding scheme. Everything in a computer is encoded: data is encoded, instructions are encoded. How can a string not be encoded? In most C++ systems, the encoding of a native char array is OS dependent. For example, the default on Western-locale Windows is roughly ISO-8859-1 (strictly speaking, the closely related Windows-1252). If you are using Western characters only, you can call String.getBytes("ISO-8859-1"), then you can apply your C++ tricks. That being said, Java provides a strong String
[In reply to l*****c]
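A small sketch of the ISO-8859-1 suggestion, assuming Java 7 or later so that `StandardCharsets` is available. The class name `Latin1Bytes` and the sample strings are illustrative. It also shows why the advice is limited to Western characters: anything ISO-8859-1 cannot represent is replaced on encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and sample strings are illustrative only.
final class Latin1Bytes {
    // Encode the string as ISO-8859-1: one byte per character.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] western = toLatin1("na\u00EFve");   // "naïve"
        System.out.println(western.length);          // 5: one byte per character
        System.out.println(western[2] & 0xFF);       // 239 (0xEF), the byte for ï

        byte[] cjk = toLatin1("\u4ECA");             // a character Latin-1 cannot represent
        System.out.println((char) cjk[0]);           // '?', the charset's replacement byte
    }
}
```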
I think you know what I mean. Or rather, you know what I left out of my statement. OK, what I actually need is to do some special encoding and escaping. So I need to get the raw byte sequence of the string (in UTF-8) and apply my own encoding to it. Any suggestions on how to get stupid Java to do this?
[In reply to g*****g]
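The thread never says what the special encoding and escaping actually is, so the following is only a sketch under an assumed scheme: ASCII letters and digits pass through, every other UTF-8 byte becomes `%XX`. The class `ByteEscaper` and its `escape` method are hypothetical names, not part of any API mentioned here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical escaper; the %XX scheme is an assumption made for illustration.
final class ByteEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        // Walk the UTF-8 bytes, not the UTF-16 chars.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // Java bytes are signed; mask to get 0..255
            boolean plain = (v >= 'a' && v <= 'z')
                         || (v >= 'A' && v <= 'Z')
                         || (v >= '0' && v <= '9');
            if (plain) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a b\u00E9")); // a%20b%C3%A9
    }
}
```

The point is only the shape of the loop: once the bytes of a chosen encoding are in a `byte[]`, per-byte processing works much as it would on a C++ char buffer.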
c*t
Post #8
Why don't you use byte[] for the data you read in? Why did you have to use String in the first place? How did you get your data into a String?
[In reply to l*****c]
g*g
Post #9
If you need to encode your string in UTF-8, use String.getBytes("UTF-8") and you'll get it.
String.getBytes(charset) does the encoding.
new String(bytes, charset) does the decoding.
[In reply to l*****c]
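The encode and decode pair above can be sketched as a round trip. This version assumes Java 7 or later and uses `StandardCharsets.UTF_8` in place of the `"UTF-8"` string, which avoids the checked `UnsupportedEncodingException` that `getBytes(String)` declares. The class name `Utf8RoundTrip` is made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name and sample text are illustrative only.
final class Utf8RoundTrip {
    // String -> bytes: encoding.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String: decoding.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00E9llo";               // "héllo"
        byte[] bytes = encode(original);
        // Java bytes are signed, so 0xC3 0xA9 print as -61 and -87.
        System.out.println(Arrays.toString(bytes));    // [104, -61, -87, 108, 108, 111]
        System.out.println(decode(bytes).equals(original)); // true
    }
}
```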
l*c
Post #10
Sigh, because the input is passed to me as a String. I don't control their API.
[In reply to c*****t]
l*c
Post #11
Thanks, let me try it.
[In reply to g*****g]
b*y
Post #13
UTF-8 is pretty good. The poster above has a point.
F*n
Post #14
goodbug is right. Everything is encoded. There are no "raw bytes" in either C++ or Java. There is only a default encoding.
l*c
Post #15
I assume everyone knows what goodbug said; it is well known, right? But there are "raw bytes". What I did not make clear is that I mean the raw bytes of a specific encoding. Say you encode the string "今狐冲" in UTF-8: it is actually stored in memory as "E4 BB 8A E7 8B 90 E5 86 B2" (of course, when I paste it here, it is encoded in another encoding scheme). What I need is each of these individual raw bytes.
[In reply to F****n]
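For reference, a short sketch that prints exactly those bytes for the three characters quoted above. The class `Utf8Hex` and its `toHex` helper are illustrative names. The string is written with `\u` escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helper are illustrative only.
final class Utf8Hex {
    // Render the UTF-8 bytes of a string as space-separated uppercase hex.
    static String toHex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4ECA U+72D0 U+51B2, the three characters from the post.
        System.out.println(toHex("\u4ECA\u72D0\u51B2")); // E4 BB 8A E7 8B 90 E5 86 B2
    }
}
```

Each element of the `byte[]` is one of the individual raw bytes being asked for, so `bytes[i]` plays the role that indexing the char buffer would in C++.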