Data Structure: Log-Structured Merge-Tree (LSM)

An LSM tree is a data structure that restricts the datastore to append-only operations: the datastore is essentially a log containing totally ordered key-value pairs. Entries are sorted by key, which can be any byte sequence, and the values can likewise be arbitrary byte sequences. For example, in Cassandra's data model, a column family (akin to a table in a relational database) maps a row key (the key) to a set of columns that can vary in number from row to row (the value).

1. Fast writes are the goal.
2. Data in a relational database is typically stored in a B-tree.

B-tree:

  • it's optimized for reads
  • updates are relatively expensive, as they involve random I/O to update multiple pages on disk
  • this limits how fast a B-tree can ingest data

3. An LSM tree works differently: writes are batched in memory as they arrive, in a structure called a memtable.
   a. The memtable is ordered by object key and is usually implemented as a balanced binary tree.
4. When the memtable reaches a certain size, it is flushed to disk as an immutable Sorted String Table (SSTable).
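The buffering-and-flush step above can be sketched as follows. This is a minimal illustration, not a real implementation: a Python dict sorted at flush time stands in for the balanced binary tree, and the class name and flush threshold are hypothetical.

```python
class Memtable:
    """In-memory write buffer; entries are emitted in key order on flush."""

    def __init__(self, flush_threshold=3):
        self.entries = {}                     # real systems use a balanced tree
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        """Buffer a write; return True when the memtable should be flushed."""
        self.entries[key] = value
        return len(self.entries) >= self.flush_threshold

    def flush(self):
        """Emit all entries sorted by key as an immutable SSTable segment.

        On disk this would be one sequential write, which is what makes
        LSM ingestion fast."""
        sstable = sorted(self.entries.items())
        self.entries = {}
        return sstable

mt = Memtable(flush_threshold=3)
mt.put("b", "2")
mt.put("a", "1")
full = mt.put("c", "3")     # threshold reached -> time to flush
segment = mt.flush()        # [("a", "1"), ("b", "2"), ("c", "3")]
```

Note that writes never touch existing data: the memtable absorbs them in memory, and each flush produces a brand-new sorted segment.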

(Figure: LSM tree)


SSTable:

  1. An SSTable stores key-value pairs in sorted order. These writes are all sequential I/O, which is fast on any storage medium.
  2. The new SSTable becomes the most recent segment of the LSM tree.
  3. As more data comes in, more and more of these immutable SSTables are created and added to the LSM tree,
  4. with each one representing a small chronological subset of the incoming changes.
  5. An update to an existing object key does not overwrite an existing SSTable.
  6. Instead, the new entry is added to the most recent SSTable, which supersedes any entries for that key in older SSTables.
  7. To perform a delete, a marker called a tombstone is added to the most recent SSTable for the object key.
  8. When we encounter a tombstone on a read, we know the object has been deleted.
  9. As keys are updated and tombstones are added, outdated entries pile up and take precious disk space. To fix this, periodic merging and compaction processes run in the background.
  10. They merge SSTables and discard outdated or deleted values. This helps reclaim disk space and caps the number of SSTables a reader has to look through.
  11. Since the SSTables are sorted, this merging and compaction process is simple and efficient;
  12. the approach is similar to the merge phase of the mergesort algorithm.
  13. This, in a nutshell, is how an LSM tree provides fast writes.
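The merge-and-discard step can be sketched as below. This is a simplification under stated assumptions: `None` stands in for the tombstone marker, tables fit in memory, and a dict replaces the streaming k-way merge (the merge phase of mergesort) that real systems use.

```python
TOMBSTONE = None  # hypothetical representation of a delete marker

def compact(sstables):
    """Merge sorted SSTables into one, discarding superseded and deleted entries.

    Tables are given oldest first, so later entries win -- mirroring how the
    most recent SSTable supersedes entries for the same key in older ones."""
    merged = {}
    for table in sstables:           # oldest -> newest
        for key, value in table:
            merged[key] = value      # a newer value supersedes the older one
    # Drop tombstones and emit the survivors in sorted key order.
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)

old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", 20), ("c", TOMBSTONE)]   # "b" was updated, "c" was deleted
compact([old, new])                   # -> [("a", 1), ("b", 20)]
```

The output is itself a sorted segment, so compaction both reclaims the space held by the old "b" and the deleted "c" and keeps the result in SSTable form.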


Recap:

* An LSM tree buffers incoming writes in memory.
* When the buffer fills up, we sort it and flush it to disk as an immutable SSTable.
* As updates and deletes accumulate, more and more SSTables are created, and superseded entries and tombstones consume precious disk space. To resolve this, the LSM tree merges and compacts SSTables in the background, using an approach similar to the merge phase of the mergesort algorithm.
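One piece the recap implies but does not show is the read path: a lookup checks the memtable first, then each SSTable from newest to oldest, so the most recent write for a key always wins. A minimal sketch, assuming `None` as the tombstone and linear scans in place of the binary searches a real system would use:

```python
TOMBSTONE = None  # hypothetical representation of a delete marker

def get(key, memtable, sstables):
    """Read path: memtable first, then SSTables newest-first.

    The first entry found wins; a tombstone means the key was deleted."""
    if key in memtable:
        value = memtable[key]
        return None if value is TOMBSTONE else value
    for table in reversed(sstables):      # newest segment first
        for k, v in table:                # real systems binary-search here
            if k == key:
                return None if v is TOMBSTONE else v
    return None                           # key was never written

sstables = [[("a", 1), ("b", 2)], [("b", 20), ("c", TOMBSTONE)]]  # oldest first
memtable = {"d": 4}
get("b", memtable, sstables)   # -> 20 (the newest SSTable supersedes the old one)
get("c", memtable, sstables)   # -> None (tombstoned)
```

This newest-first order is why a single new entry can supersede older ones without rewriting any existing SSTable.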


Compaction Process:

  • When SSTables are merged, they are organized into levels.
  • This is where the "tree" part of the LSM tree comes in.
  • There are different strategies that determine when and how SSTables are merged and compacted:
1. Tiering => write-optimized
2. Leveling => read-optimized

Key Points:

  1. Compaction keeps the number of SSTables manageable.
  2. SSTables are organized into levels, where each level is exponentially larger than the one above it, as SSTables from the level above are merged into it.
  3. Compaction consumes a lot of I/O; mistuned compaction can starve the system and slow down both reads and writes.


Standard Optimization for LSM trees in Production Systems

  • To look up a key, the system searches SSTables at every level. Though searching sorted data is fast, going through all of it consumes a lot of I/O.
  • Many systems keep a summary table in memory containing the min/max key range of each disk block at every level. This allows the system to skip searching any disk block whose range does not contain the key, saving a lot of random I/O.
  • Another potentially expensive case is looking up a key that does not exist:
  • this would require searching all the eligible blocks at every level.
  • Most systems keep a Bloom filter on each level. A Bloom filter is a space-efficient data structure that returns
    1. a firm no if the key does not exist
    2. a probable yes if the key might exist
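The in-memory min/max summary described above can be sketched as follows. The block ranges and function name here are hypothetical; a real system would derive them from the index blocks of its SSTable files.

```python
# Hypothetical per-block summary for one level: (min_key, max_key) of each disk block.
block_ranges = [("apple", "cherry"), ("date", "grape"), ("kiwi", "mango")]

def blocks_to_search(key, ranges):
    """Return indices of the only disk blocks whose key range could hold `key`.

    Every block whose [min, max] range excludes the key is skipped without a
    disk read, which is where the random-I/O savings come from."""
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= key <= hi]

blocks_to_search("fig", block_ranges)       # -> [1]: one disk read instead of three
blocks_to_search("hazelnut", block_ranges)  # -> []: key cannot exist at this level
```

Note the second case: for a key falling between block ranges, the summary alone proves the key is absent from this level, with zero disk reads.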

Bloom Filter:

  • It's a space-efficient probabilistic data structure
  • used to answer the question, "Is this element in the set?"
    1. A Bloom filter answers with either a "firm no" or a "probable yes".
    2. False positives are possible -> the element is not there, but the filter may say it is.
    3. False negatives are not possible -> if the element is there, the filter never says it is not.
    4. Items can't be removed from a Bloom filter; it never forgets.
  • This property allows the system to skip a level entirely if the key does not exist there, which dramatically reduces the number of random I/Os required.
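A minimal Bloom filter can be sketched as below. The sizes and hash scheme are illustrative choices (SHA-256 with a salt prefix simulating independent hash functions), not what any particular database uses.

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership: answers "firm no" or "probable yes"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a plain integer used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False -> key is definitely absent; True -> key may be present."""
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = BloomFilter()
bf.add("apple")
bf.might_contain("apple")   # True: a "probable yes" (never a false negative)
bf.might_contain("banana")  # almost certainly False: a "firm no", level skipped
```

Adding a key only ever sets bits, which is why membership can never be "forgotten" and why removal is impossible: clearing a bit might erase evidence of a different key that hashed to the same position.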

(Figure: LSM tree with a Bloom filter at every level)


A NoSQL database backed by an LSM tree can be tuned to support a very high write rate.

NoSQL databases backed by LSM trees: Cassandra, RocksDB, InfluxDB

References: LSM tree video (YouTube), LSM Tree blog
