No binary format will ever kill CSV: plain-text based formats embody the UNIX ph...

nyokodo · on March 13, 2024

> You won't remember Parquet in 15 years, but you will have CSV files in 50 years.

You're probably right about CSV, but probably not about Parquet. Parquet is already 11 years old, there are vast data warehouses that store Parquet, it's first-class in the Spark ecosystem, and it's a key component of Iceberg. Crucially, formats like Parquet are "good enough" for a use case that doesn't appear to be going away. In my estimation there is a high probability that enough places will still be using them in 15 years for them to be memorable, even if they aren't as common or as visible.

dheera · on March 13, 2024

CSV would actually be a nice format if it weren't for literal newlines being allowed INSIDE values. That alone makes it much harder to parse correctly with simple code, because you can't count on ASCII-mode readline()-like functions to fetch one record in its entirety.
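To make the problem concrete, here is a minimal Python sketch. The sample data and variable names are made up for illustration; the only library used is the standard `csv` module. It compares a naive "one line = one record" split with a quote-aware reader on the same input:

```python
import csv
import io

# Hypothetical sample: the second record has a quoted field containing a newline.
data = 'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'

# Naive approach: treat every physical line as one record.
# The quoted newline splits record 1 into two bogus "records".
naive_records = [line.split(",") for line in data.splitlines()]

# Quote-aware approach: csv.reader tracks quote state across physical lines,
# so the embedded newline stays inside the field.
parsed_records = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_records), "records from the naive split")
print(len(parsed_records), "records from csv.reader")
```

The naive version finds one more "record" than the file actually contains, which is exactly the failure you hit when reading the file line by line.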

Considering it also separates records with newlines, they really should have replaced newlines inside values with "\n" and required escaping "\" as "\\".
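A rough Python sketch of that escaping scheme, just to show it round-trips. The function names `escape_field` and `unescape_field` are hypothetical, this is not part of any CSV standard or library, and it only covers newlines and backslashes (commas, quotes, and carriage returns would still need their own handling):

```python
def escape_field(value: str) -> str:
    """Encode a field so it never contains a raw newline."""
    # Escape backslashes first, otherwise the backslash introduced for
    # the newline would itself get doubled.
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_field(value: str) -> str:
    """Reverse escape_field by scanning left to right."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, "")  # character following the backslash, if any
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            # Unknown or dangling escape: keep it unchanged (an assumption
            # of this sketch, a stricter parser might raise instead).
            out.append("\\" + nxt)
    return "".join(out)


original = "first line\nsecond line with a literal \\n in it"
escaped = escape_field(original)
restored = unescape_field(escaped)
```

With fields encoded like this, every record sits on exactly one physical line, so a plain readline() loop is enough to fetch a whole record.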