Why Spark Suffers on Wide Iceberg Tables

At my previous job, we struggled mightily with reading and writing wide Iceberg tables with Spark. After watching this Future Data Systems talk by Russell Spitzer, I think I finally understand why. He mentions it almost as a footnote, in response to a question about the Iceberg REST catalog:

One of the most common problems that people have with really wide tables…the way Parquet is constructed, even though they have columnar representations of your data, you have to keep all of the column vectors for the same row group in the same file…your files are very wide and your columns end up being very short to end up in the same file
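You can see exactly the structure he describes in a Parquet file's footer metadata. Here is a minimal sketch using pyarrow (the file path is a placeholder) that prints how many rows land in each row group of a file:

```python
# Minimal sketch: inspect a Parquet file's footer to see how rows are grouped.
# The file path is a placeholder; point it at any data file Spark wrote.
import pyarrow.parquet as pq

pf = pq.ParquetFile("part-00000-example.snappy.parquet")
meta = pf.metadata

print(f"{meta.num_columns} columns, {meta.num_rows} rows, {meta.num_row_groups} row groups")
for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    print(f"row group {i}: {rg.num_rows} rows, {rg.total_byte_size / 1024**2:.1f} MB uncompressed")
```

With thousands of columns, the per-row-group row counts printed here get surprisingly small.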

We had tables with 4000+ columns and about 300 million rows, and we were using Spark to process the data. We wanted to keep Parquet file sizes small so that reads and writes stayed fast. But because each row spans thousands of columns, every row takes up a lot of space, so a single Spark task could only fit a small number of rows into a file before hitting the size constraint.
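To make the arithmetic concrete, here is a rough back-of-envelope sketch; the per-value byte size and target file size are illustrative assumptions, not measurements from our tables:

```python
# Back-of-envelope estimate of rows per Parquet file for a very wide table.
# All numbers are illustrative assumptions, not measurements.

num_columns = 4000
avg_bytes_per_value = 8            # assumed average encoded size per column value
target_file_size = 128 * 1024**2   # assumed target file size for illustration (128 MB)

bytes_per_row = num_columns * avg_bytes_per_value
rows_per_file = target_file_size // bytes_per_row

print(f"~{bytes_per_row / 1024:.0f} KB per row, ~{rows_per_file:,} rows per file")
# Under these assumptions, only about 4,000 rows fit per file, so 300 million rows
# fan out across tens of thousands of files and tasks.
```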

Looking back, one way around this is to split the table into multiple narrower Iceberg tables, though that approach would itself need to be benchmarked. The broader reminder: sanity-check the number of rows per Parquet file in Spark's output when working with large, wide data sets.
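One way to run that sanity check is to query the table's Iceberg `files` metadata table, which records a `record_count` and `file_size_in_bytes` for each data file. A sketch, assuming a SparkSession already configured with an Iceberg catalog (the catalog and table names are placeholders):

```python
# Sketch of the sanity check, assuming `spark` is a SparkSession wired to an
# Iceberg catalog. The catalog/table names are placeholders.
from pyspark.sql import functions as F

files = spark.read.table("my_catalog.db.wide_table.files")  # Iceberg `files` metadata table

(files
 .agg(
     F.count("*").alias("num_files"),
     F.avg("record_count").alias("avg_rows_per_file"),
     F.min("record_count").alias("min_rows_per_file"),
     F.avg("file_size_in_bytes").alias("avg_file_size_bytes"),
 )
 .show(truncate=False))
```

If the average rows per file is far lower than you expected, the width of the table, not the row count, is what is driving your file and task counts.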