HOW TO DELETE IN KEY-VALUE STORE - MITBBS (未名空间) historical archive


Board: Programming - 葵花宝典

j*u 2015-11-24 08:11

#1

For example web design, or dancing, or something like that. Thanks!

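j*s 2015-11-24 08:11

#2

I don't know when it started, but the phone has become a necessity that life can no longer do without. It is like a soulmate of the kind you rarely find in a thousand years: at any time, in any setting, it silently keeps you company. It understands all your troubles. It is there when you are alone and lonely, it is there when you run into the awkwardness of having nothing to say to people you barely know, and it is there when something suddenly comes to mind while you are studying or working...
It cannot talk, cannot dance, and does not need to eat, yet when you need it, it is the first to give you the warmth of a boyfriend (or girlfriend) and the comfort of family. Sometimes I wonder how lucky I am to live in an era that has it.
At first I was confident that I was its whole world. Watching it lie quietly beside me every day, waiting for me to stroke and pamper it, I felt inexplicably proud: "Heh, it is only a toy. Who says playthings must sap your ambition?"
But gradually, at some point, the roles quietly reversed. One day I was surprised to find how infatuated I was with this thing that almost never left me before or after eating, studying, working, walking, and sleeping. It had plainly become my whole world.
Thinking it over, without my phone I might really not survive, haha.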

w*g 2015-11-24 08:11

#3

Betting on a few things about the IP5: its looks, its performance, and so on.

k*r 2015-11-24 08:11

#4

To the DANIU (experts),
I am trying to study distributed key-value stores and have a question.
In systems like Dynamo and Voldemort, data is written append-only and the index is cached. I understand the write and retrieve procedures. However, how do these systems handle deletes?
Thanks,

d*m 2015-11-24 08:11

#5

This is a very good question.

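d*g 2015-11-24 08:11

#6

Haha, someone is about to call this a PA (personal attack). Let's wait and see.

[Quoting w*****g:]

: Betting on a few things about the IP5: its looks, its performance, and so on.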

g*g 2015-11-24 08:11

#7

Isn't a delete just a write? The delete operation is appended to the commit log, and the row is removed during compaction.
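
A minimal sketch of the "delete is just a write" idea, not taken from any real system. The names `AppendOnlyStore` and `TOMBSTONE` are hypothetical, and a Python list stands in for the commit log. A delete appends a marker record, and only compaction physically drops the row.

```python
TOMBSTONE = object()  # hypothetical sentinel meaning "this key was deleted"


class AppendOnlyStore:
    def __init__(self):
        self.log = []  # (key, value) records, oldest first

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        # A delete is just another appended record.
        self.log.append((key, TOMBSTONE))

    def get(self, key):
        # The newest record for the key wins.
        for k, v in reversed(self.log):
            if k == key:
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Keep only the latest record per key, then drop deleted keys.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


store = AppendOnlyStore()
store.put("keyA", "valueA1")
store.put("keyB", "valueB1")
store.delete("keyB")       # the log grows to three records
deleted_before_compaction = store.get("keyB")
store.compact()            # the log shrinks to the single live record
```

Until `compact()` runs, the deleted row still occupies space in the log; reads simply hide it.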

[Quoting k****r:]

: To the DANIU (experts),
: I am trying to study distributed key-value stores and have a question.
: In systems like Dynamo and Voldemort, data is written append-only and the index is
: cached. I understand the write and retrieve procedures. However, how do these
: systems handle deletes?
: Thanks,

d*m 2015-11-24 08:11

#8

My own answer is: I can't.

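w*g 2015-11-24 08:11

#9

Pfft, that is a pet name among us Apple fans.
You Android WSN will never understand us Apple geek guys.

[Quoting d********g:]

: Haha, someone is about to call this a PA (personal attack). Let's wait and see.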

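w*g 2015-11-24 08:11

#10

I don't know how specific systems implement it. The usual practice in log-structured storage should be to locate the corresponding record first and mark it with an X to show that it has been deleted. (If you can search, you can locate the record.)
Then, once enough records have been marked, garbage collection reclaims the space.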
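
A rough sketch of the mark-then-collect approach, under stated assumptions: `MarkedLog`, the in-memory `index`, and the `gc_threshold` ratio are all hypothetical choices for illustration, and a list stands in for the on-disk record file.

```python
class MarkedLog:
    """Records carry a deleted flag; space is reclaimed lazily by GC."""

    def __init__(self, gc_threshold=0.5):
        self.records = []   # each record is [key, value, deleted]
        self.index = {}     # key -> position of its live record
        self.dead = 0       # number of records marked as deleted
        self.gc_threshold = gc_threshold

    def put(self, key, value):
        if key in self.index:
            self._mark(self.index[key])   # the old version becomes garbage
        self.index[key] = len(self.records)
        self.records.append([key, value, False])
        self._maybe_gc()

    def delete(self, key):
        pos = self.index.pop(key, None)   # locate the record via the index
        if pos is not None:
            self._mark(pos)
            self._maybe_gc()

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.records[pos][1]

    def _mark(self, pos):
        self.records[pos][2] = True
        self.dead += 1

    def _maybe_gc(self):
        # Collect once the fraction of marked records exceeds the threshold.
        if self.records and self.dead / len(self.records) > self.gc_threshold:
            live = [r for r in self.records if not r[2]]
            self.records = live
            self.index = {r[0]: i for i, r in enumerate(live)}
            self.dead = 0


log = MarkedLog(gc_threshold=0.5)
for k, v in [("keyA", "valueA1"), ("keyB", "valueB1"), ("keyC", "valueC1")]:
    log.put(k, v)
log.delete("keyB")   # 1 of 3 records marked: below the threshold, no GC yet
size_after_first_delete = len(log.records)
log.delete("keyC")   # 2 of 3 records marked: GC runs and compacts the list
```

The threshold trades wasted space against how often the whole record list is rewritten.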

[Quoting k****r:]

d*m 2015-11-24 08:11

#11

This is a common ailment of modern people.

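d*g 2015-11-24 08:11

#12

You are slandering aaaty. Are you saying that for it, "the magistrates may set fires but the common people may not light lamps"? I can't stand watching you PA it like this.

[Quoting w*****g:]

: Pfft, that is a pet name among us Apple fans.
: You Android WSN will never understand us Apple geek guys.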

k*r 2015-11-24 08:11

#13

re: Isn't a delete just a write? The delete operation is appended to the commit log, and the row is removed during compaction.
If a delete is one special type of write, can I understand it as a rewrite of a key?
BTW, how is the page cache for keys swapped? Let's say the older data file and index file hold keyA-valueA1, keyB-valueB1, keyC-valueC1, and the newly arriving records are keyA-valueA2, keyC-valueC2. Then what does the swap do to the cache? It should still look like keyA...keyB...keyC, with only the values of A and C updated, right? Then, if the next operation is delB, is the value of keyB set to a special value?
re: I don't know how specific systems implement it. The usual practice in log-structured storage should be to locate the corresponding record first and mark it with an X to show that it has been deleted. (If you can search, you can locate the record.)
Then, once enough records have been marked, garbage collection reclaims the space.
Is garbage collection performed during the SWAP to the new version of the data?

j*s 2015-11-24 08:11

#14

Yes, I feel I can't do without it either, haha.

[Quoting d********m:]

: This is a common ailment of modern people.

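r*y 2015-11-24 08:11

#15

First, you can't cozy up to people under the gay banner; that isn't safe either.
Second, how about a little originality? Copying pet names from the Android WSN is an embarrassment to gf.

[Quoting w*****g:]

: Pfft, that is a pet name among us Apple fans.
: You Android WSN will never understand us Apple geek guys.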

g*g 2015-11-24 08:11

#16

Every row/column has a flag; you mark it as invalid, and that's a delete. All write operations will also invalidate and/or update the cache for the given row.
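
A small sketch of the flag-plus-cache idea, assuming a plain read cache in front of the stored rows. `Row`, `CachedStore`, and the `valid` field are hypothetical names; real systems differ in where the flag lives and how the cache is keyed.

```python
class Row:
    def __init__(self, value):
        self.value = value
        self.valid = True   # the per-row flag; False means "deleted"


class CachedStore:
    def __init__(self):
        self.rows = {}    # stand-in for the stored rows
        self.cache = {}   # read cache, key -> value

    def put(self, key, value):
        self.rows[key] = Row(value)
        self.cache.pop(key, None)   # every write invalidates the cached row

    def delete(self, key):
        row = self.rows.get(key)
        if row is not None:
            row.valid = False       # mark invalid instead of removing
        self.cache.pop(key, None)   # a delete is a write, so invalidate too

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        row = self.rows.get(key)
        if row is None or not row.valid:
            return None
        self.cache[key] = row.value   # populate the cache on a read miss
        return row.value


store = CachedStore()
store.put("keyB", "valueB1")
first_read = store.get("keyB")   # fills the cache
store.delete("keyB")             # flips the flag and drops the cache entry
```

Invalidating on delete is what keeps a stale cached value from outliving the row's flag.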

[Quoting k****r:]

: re: Isn't a delete just a write? The delete operation is appended to the commit log,
: and the row is removed during compaction.
: If a delete is one special type of write, can I understand it as a rewrite
: of a key?
: BTW, how is the page cache for keys swapped? Let's say the older data file and
: index file hold keyA-valueA1, keyB-valueB1, keyC-valueC1, and the newly
: arriving records are keyA-valueA2, keyC-valueC2. Then what does the swap do to
: the cache? It should still look like keyA...keyB...keyC, with only the values of
: A and C updated, right? Then, if the next operation is delB, is the value of
: keyB set to a special value?

o*o 2015-11-24 08:11

#17

If the bet is lost, the socialite posts nude photos.

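w*g 2015-11-24 08:11

#18

Going by what goodbug said, the log is simply
keyA-valueA1
keyB-valueB1
keyC-valueC1
keyA-valueA2
keyC-valueC2
delB
On a read you search backwards, and the first record you find wins. Seeing delB means B is gone.
That really is a pure log structure.
But I couldn't figure out how to build the on-disk index, which is why I suggested the X-marking approach.
When the log structure was conceived, the assumption was that a large memory would guarantee a high enough cache hit rate, so poor disk read efficiency would not matter, while writing in log order would maximize write throughput.
The cache lives in memory, so updating it has nothing to do with log order. If the index is in memory, it is also independent of the log.
In fact, as SSDs become more common, the advantage of sequential writes will matter less.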
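
The example log above can be turned into a short sketch. The tuple layout `(op, key, value)` and the helper names `lookup` and `build_index` are assumptions for illustration. `lookup` reads by scanning backwards, and `build_index` shows one way an in-memory index could map each live key to its offset in the log.

```python
log = [
    ("put", "keyA", "valueA1"),
    ("put", "keyB", "valueB1"),
    ("put", "keyC", "valueC1"),
    ("put", "keyA", "valueA2"),
    ("put", "keyC", "valueC2"),
    ("del", "keyB", None),
]


def lookup(log, key):
    # Scan from the newest record back; the first match decides the answer.
    for op, k, v in reversed(log):
        if k == key:
            return None if op == "del" else v
    return None


def build_index(log):
    # One forward pass: remember the offset of the latest live record per key.
    index = {}
    for offset, (op, k, _v) in enumerate(log):
        if op == "del":
            index.pop(k, None)
        else:
            index[k] = offset
    return index


index = build_index(log)
value_via_index = log[index["keyA"]][2]   # one lookup instead of a scan
```

With the index in memory, a read costs one dictionary lookup plus one log access, regardless of how long the log has grown.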

[Quoting k****r:]

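w*g 2015-11-24 08:11

#19

What a mess.
I am now very pessimistic about the prospects of the revolution.

[Quoting r****y:]

: First, you can't cozy up to people under the gay banner; that isn't safe either.
: Second, how about a little originality? Copying pet names from the Android WSN is an embarrassment to gf.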

k*r 2015-11-24 08:11

#20

Thank you for your reply. It makes sense, especially for a system with an LRU cache.
However, I don't remember any related field in the Dynamo/Voldemort systems. Also, I thought their caching of versions is not similar to LRU....

[Quoting g*****g:]

: Every row/column has a flag; you mark it as invalid, and that's a delete. All
: write operations will also invalidate and/or update the cache for the given row.

s*3 2015-11-24 08:11

#21

What could it look like? I have no idea at all. Are there any Apple employees here? Describe the shape in Hanyu Pinyin, please...

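w*g 2015-11-24 08:11

#22

The log-structured file system appeared around 1990 and was a very impressive discovery in the HDD era.
Now that SSDs are widespread, it really has no advantage anymore. I think it is still popular mainly because the industry has not caught up yet, and also because it is relatively easy to implement. Similarly, event-vs-thread and the greatness of nginx belong to the pre-multi-core era.
Cores keep multiplying and single-core compute power keeps growing, so the future direction clearly favors threads.
But the industry has not caught up yet, so for a while continuation-passing style will be all the rage.
Things like node.js probably won't stay hot for more than a few years.

[Quoting w***g:]

: Going by what goodbug said, the log is simply
: keyA-valueA1
: keyB-valueB1
: keyC-valueC1
: keyA-valueA2
: keyC-valueC2
: delB
: On a read you search backwards, and the first record you find wins. Seeing delB means B is gone.
: That really is a pure log structure.
: But I couldn't figure out how to build the on-disk index, which is why I suggested the X-marking approach.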

h*8 2015-11-24 08:11

#23

Leaving my name here to see how long this post survives.

k*r 2015-11-24 08:11

#24

Also, what is the relation between the log and the data on disk or in memory?
Thanks,

c*y 2015-11-24 08:11

#25

A trap thread.
Leaving my name.

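g*g 2015-11-24 08:11

#26

What makes deletion special in a distributed system is that not every node will receive the delete instruction; it may get lost. If you really deleted the data, how would you tell whether this node missed a write or some other node missed the delete?
So systems like Cassandra write a special value called a tombstone and only compact it away after a certain amount of time, which makes the probability of an error very low.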
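
A toy illustration of the ambiguity described above, not Cassandra's actual reconciliation code. Each replica holds a hypothetical `(timestamp, value)` cell or `None`, and `reconcile` simply lets the newest cell win. With a tombstone the delete is visible; with a physical delete the stale value looks like a write the other replica missed.

```python
TOMBSTONE = "<tombstone>"  # hypothetical marker value for a deleted cell


def reconcile(*replica_cells):
    """Return the value of the newest (timestamp, value) cell, or None.

    A replica that holds nothing for the key contributes None.
    """
    cells = [c for c in replica_cells if c is not None]
    if not cells:
        return None
    _ts, value = max(cells, key=lambda c: c[0])
    return None if value == TOMBSTONE else value


stale = (1, "valueB1")           # replica 1 never received the delete

# Replica 2 recorded the delete as a tombstone with a newer timestamp.
with_tombstone = reconcile(stale, (2, TOMBSTONE))

# Replica 2 physically removed the row, so it has nothing to report.
with_hard_delete = reconcile(stale, None)
```

In the second case the reader cannot distinguish "deleted here" from "never written here", so the old value wins.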

[Quoting w***g:]

g*g 2015-11-24 08:11

#27

Every system is different. My knowledge is of Cassandra and is not necessarily accurate for other systems. Think of it as one possible solution.

[Quoting k****r:]

: Thank you for your reply. It makes sense, especially for a system with an LRU cache.
: However, I don't remember any related field in the Dynamo/Voldemort systems.
: Also, I thought their caching of versions is not similar to LRU....

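w*g 2015-11-24 08:11

#28

In a traditional database the data mainly lives in an on-disk B+ tree (or hash) table, and the log is a mechanism for guaranteeing data integrity. Before the B+ tree is updated, the corresponding operation is first written to the log. That way, if the power fails halfway through a B+ tree update and corrupts the data structure, the B+ tree can be rebuilt by replaying the log. Later people realized that all the data is in the log anyway, and that it can be fetched from the log when needed, so they decided the B+ tree could be thrown away.
The log then became the only on-disk data structure. A database log is not the same as an ordinary program's log.
A database log stores the data itself, so it must live on disk. Only an index or a cache can live in memory.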
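
A minimal sketch of the write-ahead and replay idea from this reply. A plain dict stands in for the B+ tree and a list stands in for the on-disk log; `apply_record` and `replay` are hypothetical helper names.

```python
def apply_record(table, record):
    # Apply one logged operation to the table (the stand-in for the B+ tree).
    op, key, value = record
    if op == "put":
        table[key] = value
    elif op == "del":
        table.pop(key, None)


def replay(log):
    # Rebuild the table from scratch by replaying every logged operation.
    table = {}
    for record in log:
        apply_record(table, record)
    return table


wal = []     # the log: written first, append-only
table = {}   # the structure that could be corrupted by a crash

operations = [
    ("put", "keyA", "valueA1"),
    ("put", "keyB", "valueB1"),
    ("del", "keyB", None),
]
for record in operations:
    wal.append(record)             # 1. log the operation
    apply_record(table, record)    # 2. only then update the table

recovered = replay(wal)            # what recovery after a crash would produce
```

Because `replay` needs nothing but the log, the log alone is a complete copy of the data, which is the observation that lets the B+ tree be dropped.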

[Quoting k****r:]

: Also, what is the relation between the log and the data on disk or in memory?
: Thanks,

k*r 2015-11-24 08:11

#29

Many thanks to the DANIU!!! I think I still have a lot to learn :)

w*z 2015-11-24 08:11

#30

In the case of C* (Cassandra), make sure to run repair at least once within the GC grace period. Otherwise, deleted data may come back to life. As goodbug said, deletes might not reach all the nodes because of eventual consistency, and repair will fix that.
Read this blog post if you want to know more; delete in distributed systems is tricky.
http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones
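
A toy simulation of why repair has to run within the grace period. This is a sketch under simplifying assumptions, not Cassandra's implementation: nodes are dicts of `key -> (timestamp, value)`, `repair` copies the newest cell everywhere, and `compact` purges tombstones older than a hypothetical `gc_grace`.

```python
TOMBSTONE = "<tombstone>"  # hypothetical marker value for a deleted cell


def repair(nodes, key):
    # Toy anti-entropy: copy the newest cell for the key to every node.
    cells = [n[key] for n in nodes if key in n]
    if cells:
        newest = max(cells, key=lambda c: c[0])
        for n in nodes:
            n[key] = newest


def compact(node, now, gc_grace):
    # Purge tombstones that are older than the grace period.
    expired = [k for k, (ts, v) in node.items()
               if v == TOMBSTONE and now - ts > gc_grace]
    for k in expired:
        del node[k]


def scenario(repair_in_time):
    a = {"k": (1, "v1")}
    b = {"k": (1, "v1")}
    a["k"] = (2, TOMBSTONE)        # the delete reaches node a only
    if repair_in_time:
        repair([a, b], "k")        # the tombstone spreads to node b
    for node in (a, b):
        compact(node, now=20, gc_grace=10)   # grace period has passed
    repair([a, b], "k")            # a later repair
    return a.get("k"), b.get("k")


repaired = scenario(repair_in_time=True)
not_repaired = scenario(repair_in_time=False)
```

Without the timely repair, node a forgets the delete while node b still holds the old value, so the later repair copies the old value back to node a.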

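[Quoting g*****g:]

: What makes deletion special in a distributed system is that not every node will receive the delete instruction; it may get lost.
: If you really deleted the data, how would you tell whether this node missed a write or some other node missed the delete?
: So systems like Cassandra write a special value called a tombstone and only compact it away after a certain amount of time,
: which makes the probability of an error very low.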

w*z 2015-11-24 08:11

#31

The commit log is for durability. C* writes to the memtable along with the commit log. If a node crashes before the memtable is flushed to disk, it recovers from the commit log.
Reads don't go to the commit log; they go to the memtable and the SSTables and merge the results.
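
A simplified sketch of the write and read paths described in this reply. It is an illustration only: the `Node` class is hypothetical, lists and dicts stand in for files, and the read path resolves conflicts by recency of the table rather than by per-cell timestamps as a real system would.

```python
TOMBSTONE = "<tombstone>"  # hypothetical marker value for a deleted cell


class Node:
    def __init__(self):
        self.commit_log = []   # durable, append-only (a list stands in for a file)
        self.memtable = {}     # in-memory table, lost on a crash
        self.sstables = []     # immutable flushed tables, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value

    def delete(self, key):
        self.write(key, TOMBSTONE)             # a delete is a write

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log = []   # flushed data no longer needs the log

    def read(self, key):
        # Reads never touch the commit log: memtable first, then the
        # SSTables from newest to oldest; the first hit wins.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value == TOMBSTONE else value
        return None

    def recover(self):
        # After a crash, rebuild the memtable by replaying the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = Node()
node.write("a", "1")
node.write("b", "x")
node.flush()
node.delete("b")
node.write("a", "2")
```

If the memtable is lost before the next flush, `recover()` restores both the newer value of `a` and the tombstone for `b` from the commit log.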

[Quoting w***g:]

k*r 2015-11-24 08:11

#32

Nice information. Let me study first :P

[Quoting w**z:]

: In the case of C* (Cassandra), make sure to run repair at least once within the GC grace period.
: Otherwise, deleted data may come back to life. As goodbug said, deletes might
: not reach all the nodes because of eventual consistency, and repair
: will fix that.
: Read this blog post if you want to know more; delete in distributed systems is
: tricky.
: http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones