Apache Parquet is a popular columnar storage file format used by Hadoop systems such as Pig, Spark, and Hive. The file format is language independent and has a binary representation. Parquet is used to efficiently store large data sets and has the extension .parquet. This blog post aims to explain how Parquet works and the tricks it uses to efficiently store data.

Key features of Parquet are:

- it’s cross platform
- it’s a recognised file format used by many systems
- it stores data in a column layout
- it stores metadata

The latter two points allow for efficient storage and querying of data.
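To make the last two points concrete, here is a toy sketch in plain Python. It is not Parquet's actual on-disk format, and the names `ColumnChunk`, `to_columns`, and `might_contain` are hypothetical, invented for this illustration. It only shows the general idea: values are grouped by column, and each column carries a little metadata (here, a minimum and maximum) that a reader can consult before touching the data.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """Toy stand-in for one stored column plus its summary metadata."""
    name: str
    values: list
    min_value: object
    max_value: object


def to_columns(rows):
    """Pivot a list of row dicts into one ColumnChunk per field."""
    columns = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        columns[name] = ColumnChunk(name, values, min(values), max(values))
    return columns


def might_contain(chunk, value):
    """Use only the metadata to decide whether the column is worth scanning."""
    return chunk.min_value <= value <= chunk.max_value


# Made-up example data, stored row by row as in a CSV file.
rows = [
    {"id": 1, "city": "Leeds", "temp": 11.5},
    {"id": 2, "city": "Newcastle", "temp": 9.0},
    {"id": 3, "city": "London", "temp": 14.2},
]

columns = to_columns(rows)

# A query on one column reads only that column's values.
temps = columns["temp"].values
mean_temp = sum(temps) / len(temps)

# The metadata alone rules out a scan for temperatures outside the range.
needs_scan = might_contain(columns["temp"], 20.0)
```

In this sketch, computing the mean temperature never touches the `id` or `city` values, and the check for a temperature of 20.0 is answered from the stored minimum and maximum without reading any data. Under these simplifying assumptions, that is the kind of saving a column layout with metadata makes possible.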