Apache Parquet is a popular columnar storage file format used by Hadoop systems such as Pig, Spark, and Hive. The format is language independent and binary. Parquet is designed to store large data sets efficiently, and its files use the extension .parquet. This blog post looks at how Parquet works and the tricks it uses to store data efficiently.

Key features of Parquet are:

- it's cross platform
- it's a recognised file format used by many systems
- it stores data in a column layout
- it stores metadata

The latter two points are what allow for efficient storage and querying of data: a query can read only the columns it needs, and the metadata lets a reader skip blocks of data that cannot contain matching rows.
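To make the last two points concrete, here is a minimal sketch in plain Python. It is not Parquet's actual on-disk format or any library's API: the `Chunk` class, the helper functions and the sales data are all hypothetical, chosen only to show how a column layout combined with per-chunk min/max metadata lets a query avoid reading data it does not need.

```python
from dataclasses import dataclass

# Hypothetical row-oriented records, for illustration only.
rows = [
    {"city": "Leeds", "year": 2021, "sales": 120},
    {"city": "Leeds", "year": 2022, "sales": 135},
    {"city": "York", "year": 2021, "sales": 80},
    {"city": "York", "year": 2022, "sales": 95},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [r[name] for r in records] for name in records[0]}


@dataclass
class Chunk:
    """A block of rows stored column by column, plus min/max per column."""
    columns: dict
    stats: dict


def make_chunk(records):
    columns = to_columns(records)
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return Chunk(columns, stats)


def sum_where(chunks, value_col, filter_col, lo, hi):
    """Sum value_col over rows where lo <= filter_col <= hi.

    Returns (total, number of chunks whose data was actually scanned).
    """
    total, scanned = 0, 0
    for chunk in chunks:
        cmin, cmax = chunk.stats[filter_col]
        if cmax < lo or cmin > hi:
            continue  # the metadata shows no row in this chunk can match
        scanned += 1
        # Only the two columns involved are touched; "city" is never read.
        for f, v in zip(chunk.columns[filter_col], chunk.columns[value_col]):
            if lo <= f <= hi:
                total += v
    return total, scanned


# Sort by year so that each chunk covers a narrow range of years.
ordered = sorted(rows, key=lambda r: r["year"])
chunks = [make_chunk(ordered[:2]), make_chunk(ordered[2:])]

total, scanned = sum_where(chunks, "sales", "year", 2022, 2022)
print(total, scanned)
```

In this toy setup the query for 2022 sales scans only one of the two chunks and never touches the `city` column. How much data can be skipped depends on how the rows are ordered before they are split into chunks.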