Improve your PySpark ETL's performance by providing an explicit schema
Have you ever come across a Spark ETL job and wondered how simply loading a dataset can take hours, even though the filtered subset you asked for is relatively small?

Dan