Tag: Storage

Understanding the Parquet file format

Published: September 27, 2021

Apache Parquet is a columnar storage file format used widely across the Hadoop ecosystem. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.

tags: r, big-data, parquet, feather, storage

