There are three forms of
column-orientation deployed by database systems today. Each builds
upon the last. The simplest form uses column-orientation to provide better data
compression. The next level of maturity stores columnar data in separate
structures to support columnar projection. The most mature implementations
support a columnar database engine that performs relational algebra directly on
column-oriented data. Let me explain…
Imagine a simple table with 1M rows…
with the schema and the first several rows depicted in Figure 1. Conceptually,
a row-orientation lays out data on disk and in memory as depicted in Figure 2,
and a column-orientation lays out data as depicted in
Figure 3. The actual layout may be significantly different, as we will see.
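To make the two layouts concrete, here is a toy sketch using a hypothetical (RowID, Name, DeptID) schema standing in for Figure 1; real engines lay out bytes in blocks, not Python lists:

```python
# Toy illustration only: a hypothetical (RowID, Name, DeptID) schema
# standing in for Figure 1.
rows = [
    (1, "Alice", 10),
    (2, "Bob",   20),
    (3, "Carol", 10),
]

# Row-orientation (as in Figure 2): the values of each row sit together.
row_store = [value for row in rows for value in row]

# Column-orientation (as in Figure 3): the values of each column sit together.
col_store = {
    "RowID":  [r[0] for r in rows],
    "Name":   [r[1] for r in rows],
    "DeptID": [r[2] for r in rows],
}

print(row_store)            # [1, 'Alice', 10, 2, 'Bob', 20, 3, 'Carol', 10]
print(col_store["DeptID"])  # [10, 20, 10]
```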
Note that I am going to throw out
some indicative numbers around compression. I will suggest that applying
compression to rows provides from 1.5X to 3.5X compression with an average
of 2.5X… and that applying compression to columns provides from 3X
to 50X compression with the average around 10X. These are supportable numbers,
but the compression you see for any specific data set will vary.
There are two powerful compression
techniques that, individually or combined, provide most of the benefits:
dictionary-encoding and run-length encoding. For the purposes of this blog I
will describe only dictionary-encoding in any depth, and I will do it an
injustice by explaining it just briefly and conceptually… enough that you get the idea.
Run-length encoding provides further compression by
reducing runs of repeated values to the value plus the number of times it
repeats, so that the sixteen-bit stream 0000000000000000 could be represented
as the value 0 plus a count of 16.
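A toy run-length encoder over a bit string shows the shape of the idea; real engines work at the byte or word level and pack the counts far more cleverly, but the sketch below (plain Python, nothing vendor-specific) is enough to see the round trip:

```python
def rle_encode(bits: str):
    """Run-length encode a bit string into (value, run_length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

def rle_decode(runs) -> str:
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * length for value, length in runs)

encoded = rle_encode("0000000000000000")  # sixteen 0s
print(encoded)                            # [('0', 16)]
assert rle_decode(encoded) == "0000000000000000"
```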
You can now also start to see why a column-orientation
compresses better than a row-orientation. In the row-oriented block of Figure 2
there is little opportunity to encode whole rows in a dictionary… the
cardinality of rows in a table is too high (note that this may not be true for
a dimension table, which is, in effect, a dictionary). There is some
opportunity to encode the bit runs in a row… as noted, you can expect to get
2X-2.5X from row compression for a fact table. Column-orientation allows
dictionary encoding to be applied effectively to low-cardinality columns… and
this accounts for the advantage there.
Dictionary-encoding reduces data to
a compressed form by building a map that translates each distinct value in a
column to a tightly compressed code. For example, if there
are indeed only three values possible in the DeptID field above, then we might
build a dictionary for that column as depicted in Figure 4. You can see… by
encoding and storing the data in the minimal number of bits required,
significant storage reduction is possible… and the lower the cardinality of a
column, the smaller the resulting bit representation.
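To make the arithmetic concrete, here is a minimal sketch, assuming a hypothetical DeptID column with the three values suggested above:

```python
import math

# Hypothetical DeptID column with only three distinct values, as in Figure 4.
dept_ids = [10, 20, 10, 30, 10, 20]

# Build the dictionary: each distinct value maps to a small integer code.
dictionary = {value: code for code, value in enumerate(sorted(set(dept_ids)))}
# {10: 0, 20: 1, 30: 2}

# With 3 distinct values, each code needs only ceil(log2(3)) = 2 bits,
# versus, say, 32 bits for a raw integer.
bits_per_code = math.ceil(math.log2(len(dictionary)))

encoded = [dictionary[v] for v in dept_ids]  # [0, 1, 0, 2, 0, 1]
print(bits_per_code, encoded)
```

Two bits per value instead of, say, 32 is a 16X reduction on this one column… which is where the outsized columnar compression numbers come from.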
Note that there is no free lunch
here. There is a cost to be paid in CPU cycles to compress and
decompress data… but for a read-optimized data warehouse database, compression
is cool. Exactly how cool depends on the level of maturity, and we will get to
that as we go.
It is crucial to remember that
column store databases are relational. They ingest rows and emit rows and
perform relational algebra in between. So there has to be some magic that turns
tuples into columns and restores them from columns. The integrity of a row has
to persist. Again I am going to defer the details and point you at the
references below… but imagine that for each row a bit map is built that, for
each column, points to the entry in the column dictionary with the proper
value.
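Here is one conceptual way to picture the restore path; in this minimal sketch (all names and values hypothetical), positional alignment across the encoded columns plays the role of that per-row bit map:

```python
# Hypothetical per-column dictionaries and encoded columns; row integrity is
# preserved because position i in every encoded column belongs to row i.
name_dict = ["Alice", "Bob", "Carol"]  # code -> value
dept_dict = [10, 20, 30]

name_codes = [0, 1, 2, 0]  # encoded Name column
dept_codes = [0, 1, 0, 2]  # encoded DeptID column

def fetch_row(i):
    """Reassemble row i by looking up each column's code in its dictionary."""
    return (name_dict[name_codes[i]], dept_dict[dept_codes[i]])

print(fetch_row(2))  # ('Carol', 10)
```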
There is no free lunch with a column
store either… no free lunch anywhere, it seems. Building this bit map on INSERT is
very expensive, and modifying it on UPDATE is fairly expensive. This is why
column-orientation is not suitable for OLTP workloads without some extra
effort. But the cost is amortized by significant performance gains for READs.
One last concept: since peripheral
I/O reads blocks, imagine two approaches to column compression: one applies the
concepts above to an entire table, breaking each table into separate
column-oriented files that may be read separately; the other applies the
concepts individually to each large block in a table file. Imagine, in the
first case, that Figure 3 represents a picture of the first few rows in our
1M-row table. Imagine, in the second case, that Figure 3 represents the rows in
one block of data re-oriented into columns.
This second, block-oriented,
approach is called PAX, and it is more-or-less the approach used by Exadata. In
the PAX approach each block contains its own mini column store and a dictionary
for encoding the values in that block. Because the cardinality of a column
within a block will often be lower than across the entire table, there
are some distinct advantages to PAX compression: compression will be more than
a little higher than for full-table columnar compression.
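To see why per-block cardinality helps, consider a toy column whose values cluster by block (the clustering here is contrived, and each block must also carry its own dictionary, which offsets some of the gain):

```python
import math

def bits_needed(values):
    """Bits per code for a dictionary over the distinct values (min 1)."""
    return max(1, math.ceil(math.log2(len(set(values)))))

# Hypothetical column: 8 distinct values overall, but values cluster so that
# each block of 4 sees only 2 of them.
column = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
block_size = 4

whole_table_bits = bits_needed(column)  # 3 bits/code for a table-wide dictionary
blocks = [column[i:i + block_size] for i in range(0, len(column), block_size)]
per_block_bits = [bits_needed(b) for b in blocks]  # 1 bit/code per block

print(whole_table_bits, per_block_bits)  # 3 [1, 1, 1, 1]
```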
When Exadata reads a block from disk
it decompresses the data back into rows and performs row-oriented processing to
complete the query. This is very cool for Exadata… a great feature. As noted,
column compression may be 4X better than row compression on average. This
reduces the storage requirements and reduces the overhead of I/O by 4X… and
this is a very significant improvement. But Exadata stops here. It is not
a column-oriented DBMS, and it misses the significant advantages that come
from the next two levels of column-orientation… I’ll take these up in the next
post.
To be clear, all of the databases
that use these more mature techniques (Teradata, HANA, Greenplum, Vertica,
ParAccel, DB2, and SQL Server) gain from columnar compression, even if the PAX
approach provides some small advantage as a compression technique.
It is also worth noting that
Teradata does not gain as much as the others in this regard. This is not because
of poor design; rather, to their credit, Teradata implemented its own
dictionary-based compression scheme long ago.
Columnar compression lets others catch up to what Teradata has offered for
years.
And before you ask… Netezza offers
no columnar orientation… preferring to compress deeply, using an FPGA
co-processor to decompress, and to reduce I/O using zone maps rather than the
mid-level column projection techniques covered in the next post.
References
- C-Store: A Column-oriented DBMS (Stonebraker et al., VLDB 2005)
- Column-oriented Database Systems (Harizopoulos, Abadi, Boncz, VLDB 2009)
- A Tour Through Hybrid Column/Row-Oriented DBMS Schemes (Abadi, DBMS Musings)