Tag: Parquet

Parquet vs the RDS Format

Published: February 1, 2024

A benefit of using the {arrow} package with parquet files is that it enables you to work with ridiculously large data sets from the comfort of an R session. In this post, we explore the timings associated with different methods of data storage.

tags: r, arrow, parquet, rds

Reading and Writing Data with {arrow}

Author: Colin Gillespie

Published: January 18, 2024

Apache Arrow is a cross-language format for super fast in-memory data. It's designed for efficient analytic operations. In this post, we look at reading and writing data using Arrow and the advantages of the parquet file format.

tags: r, arrow, parquet

Understanding the Parquet file format

Author: Colin Gillespie

Published: September 27, 2021

Apache Parquet is a columnar storage file format used by many Hadoop systems. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.

tags: r, big-data, parquet, feather, storage
