Afaik the win of columnar storage comes from the fact that you can very quickly ...

Someone · 2025-10-02T07:16:50 1759389410

> so queries like select a where b = 'x' are very quick

I wouldn’t say “very quick”. They only need to read the data for columns a and b, whereas a row-oriented layout on block-based storage reads additional data, often the entire dataset.

That’s faster, but for large datasets you need an index to make things “very quick”. This format supports that, but whether you have an index is orthogonal to being row- or column-oriented.
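To make that concrete, here is a rough Python sketch of the three cases. The toy table, the function names, and the dict-based index are all made up for illustration and are not how any real format stores data.

```python
# Toy table with three columns: a, b, c (hypothetical data).
rows = [
    {"a": 1, "b": "x", "c": "pad"},
    {"a": 2, "b": "y", "c": "pad"},
    {"a": 3, "b": "x", "c": "pad"},
]

# Column-oriented layout: one list per column.
columns = {name: [r[name] for r in rows] for name in ("a", "b", "c")}


def select_a_where_b_row(rows, value):
    """Row layout: every whole row is visited, including column c."""
    return [r["a"] for r in rows if r["b"] == value]


def select_a_where_b_col(columns, value):
    """Column layout: only a and b are read, but b is still scanned in full."""
    return [a for a, b in zip(columns["a"], columns["b"]) if b == value]


def build_index(column):
    """Hypothetical index: maps each value to the positions where it occurs."""
    index = {}
    for pos, v in enumerate(column):
        index.setdefault(v, []).append(pos)
    return index


def select_a_where_b_indexed(columns, index, value):
    """With an index, only the matching positions of column a are looked up."""
    return [columns["a"][pos] for pos in index.get(value, [])]


b_index = build_index(columns["b"])
result_row = select_a_where_b_row(rows, "x")
result_col = select_a_where_b_col(columns, "x")
result_idx = select_a_where_b_indexed(columns, b_index, "x")
```

All three return the same rows. The difference is only in how much data each one has to touch, and the index could be bolted onto either layout.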

mr_toad · 2025-10-02T13:25:13 1759411513

Sum(x) is a better example. Indexing x won’t help when you need all the values.

rovr138 · 2025-10-02T05:18:47 1759382327

Another useful one is aggregations. Think sum(), concat(), max(), etc. You can operate on the column.

This is in contrast to row-based storage, where you have to scan the full row to get a single column. Think of how you'd usually read a CSV (read line, parse line).
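A small Python sketch of that contrast, assuming a made-up three-column CSV and an in-memory list standing in for a stored column:

```python
import csv
import io

# Hypothetical row-based data: x is buried inside every line.
csv_text = "id,x,name\n1,10,a\n2,20,b\n3,30,c\n"


def sum_x_row_based(text):
    """Row-based: read every line and parse every field just to get x."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(int(row["x"]) for row in reader)


# Hypothetical column-based data: the values of x are stored together.
x_column = [10, 20, 30]


def sum_x_column_based(column):
    """Column-based: aggregate directly over the one column that is needed."""
    return sum(column)


row_total = sum_x_row_based(csv_text)
col_total = sum_x_column_based(x_column)
```

Both give the same total, but the row-based version has to parse id and name on every line even though it never uses them.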

PhilippGille · 2025-10-02T11:13:38 1759403618

When b = 'x' is true for many rows and you select * or multiple columns, then it's the opposite: the row-based layout wins, because reading all the data of a row is slower in column-based data structures than in row-based ones.
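Roughly, in Python (toy data and hypothetical function names, only to show where the extra work comes from):

```python
# The same hypothetical table in both layouts.
columns = {
    "a": [1, 2, 3],
    "b": ["x", "x", "y"],
    "c": [0.5, 1.5, 2.5],
}
rows = [(1, "x", 0.5), (2, "x", 1.5), (3, "y", 2.5)]


def select_star_col(columns, value):
    """Column layout: find matching positions, then gather from every column."""
    positions = [i for i, b in enumerate(columns["b"]) if b == value]
    names = list(columns)
    # One lookup per column per matching row to stitch each row back together.
    return [tuple(columns[n][i] for n in names) for i in positions]


def select_star_row(rows, value):
    """Row layout: each match is already a complete row."""
    return [r for r in rows if r[1] == value]


star_col = select_star_col(columns, "x")
star_row = select_star_row(rows, "x")
```

In the column layout every matching row costs one access per column, and on disk those accesses land in different places. In the row layout the matching row is already sitting there in one piece.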

IMO it's easier to explain in terms of workload:

- OLTP (T = transactional workloads), row based, for operating on rows
- OLAP (A = analytical workloads), column based, for operating on columns (sum/min/max/...)