Starburst Galaxy

    Partitioning

    Query engines have become a popular way for companies to adopt the agility and flexibility of modern data lake architectures. However, scanning and processing the massive amounts of data in a data lake to serve users' queries remains a significant challenge.

    Partitioning has been a common solution to reduce query time and cost, but it has limitations.

    Partitioning does not reduce data reads for queries that filter on columns other than the partition columns. It also does not help when query patterns are dynamic and involve predicates on more than a few columns.
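    The limitation above can be illustrated with a minimal sketch. It assumes a toy table partitioned on a hypothetical `event_date` column, with a hypothetical non-partition `country` column; the names `partition_table` and `scan` are illustrative only and are not part of Starburst Galaxy. Rows are grouped by partition key, and the scan counts how many rows it has to read.

```python
from dataclasses import dataclass


@dataclass
class Row:
    event_date: str  # hypothetical partition column
    country: str     # hypothetical non-partition column


SAMPLE_ROWS = [
    Row("2024-01-01", "US"), Row("2024-01-01", "DE"),
    Row("2024-01-02", "US"), Row("2024-01-02", "DE"),
    Row("2024-01-03", "US"), Row("2024-01-03", "FR"),
]


def partition_table(rows):
    """Group rows by event_date, as a date-partitioned table would store them."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row.event_date, []).append(row)
    return partitions


def scan(partitions, column, value, partition_column="event_date"):
    """Return (matching rows, number of rows read).

    Partition pruning only applies when the filter is on the partition column;
    for any other column, every partition has to be read.
    """
    if column == partition_column:
        candidate_partitions = [partitions.get(value, [])]
    else:
        candidate_partitions = list(partitions.values())

    rows_read = 0
    matches = []
    for partition in candidate_partitions:
        for row in partition:
            rows_read += 1
            if getattr(row, column) == value:
                matches.append(row)
    return matches, rows_read
```

    In this toy table, filtering on `event_date` reads only the two rows of the `2024-01-02` partition, while filtering on `country` reads all six rows even though only two of them match.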

    Smart Indexing

    Smart Indexing addresses these limitations. It creates separate index files that can be used to quickly identify the data relevant to a query. This eliminates the need for extensive data reads and enables faster query performance.
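    The general idea of a separate index can be sketched as follows. This is a simple inverted index (value to row positions) kept apart from the column data; it is an assumption chosen for illustration and does not describe the file format Starburst Galaxy actually uses. The helper names `build_index` and `indexed_filter` are hypothetical.

```python
from collections import defaultdict


def build_index(values):
    """Build a separate value -> row-position index for one column (illustrative only)."""
    index = defaultdict(list)
    for position, value in enumerate(values):
        index[value].append(position)
    return dict(index)


def indexed_filter(values, index, target):
    """Look up matching positions in the index, then fetch only those rows.

    The column itself is never scanned end to end; only the positions the
    index returns are read.
    """
    positions = index.get(target, [])
    return [values[p] for p in positions], positions
```

    Unlike partition pruning, an index like this can be built on any column, so a filter such as `country = 'DE'` can be answered without reading every row.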

    Starburst Galaxy uses an indexing mechanism called nanoblock indexing, which is uniquely optimized for high-performance analytics. Instead of storing one large index for each column that the user selects, Starburst Warp Speed dynamically creates millions of nanoblocks: sub-sections of the indexed column, each a few dozen kilobytes in size. This removes the need to worry about data partitioning and layout.
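    To illustrate the difference between one large per-column index and many small per-block indexes, here is a minimal sketch. It assumes each hypothetical "nanoblock" covers a fixed number of values (the source sizes nanoblocks in kilobytes; a row count is used here for simplicity) and stores a min/max summary plus a small value-to-position map. These structures, and the names `Nanoblock`, `build_nanoblocks`, and `find`, are assumptions for illustration, not the actual Warp Speed implementation.

```python
from dataclasses import dataclass


@dataclass
class Nanoblock:
    """Hypothetical small index over one sub-section of a column."""
    start: int                       # column position of the block's first value
    min_value: int                   # smallest value in the block
    max_value: int                   # largest value in the block
    positions: dict[int, list[int]]  # value -> column positions inside this block


def build_nanoblocks(column, block_size):
    """Split the column into fixed-size sub-sections and index each one separately."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        positions = {}
        for offset, value in enumerate(chunk):
            positions.setdefault(value, []).append(start + offset)
        blocks.append(Nanoblock(start, min(chunk), max(chunk), positions))
    return blocks


def find(blocks, target):
    """Return matching column positions and how many blocks were consulted."""
    matches = []
    consulted = 0
    for block in blocks:
        if not (block.min_value <= target <= block.max_value):
            continue  # the min/max summary rules this block out; its index is not read
        consulted += 1
        matches.extend(block.positions.get(target, []))
    return matches, consulted
```

    With `column = [3, 1, 2, 10, 12, 11, 20, 22, 21, 12]` and `block_size=3`, looking up `12` consults only the two blocks whose ranges can contain it, and no particular partitioning or sort order of the data is required for the lookup to work.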