2024-06-18
PAMREIN's daily Open Notebook (COMMONS Lab)
Check Github
Todo - [ ]
Meetings
Daily report (What did I learn?)
Again I have problems with storage. I get about 900 GB of memory per task, and if I read in all the .parquet files at once, that is not enough.
Even though all the .parquet files together are only about 180 GB, the compressed columnar format expands considerably once it is read into memory.
I am looking into parallelising this step, which may be possible with "use_legacy_dataset=False" (https://stackoverflow.com/questions/74236493/why-reading-a-parquet-dataset-requires-much-more-memory-than-the-size-of-the-dat).
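A minimal sketch of what I have in mind, assuming the files sit in a hypothetical directory "data/parquet/" and that pyarrow is installed: the pyarrow.dataset API (the non-legacy path that use_legacy_dataset=False switches to) scans files lazily and yields record batches, so peak memory stays near one batch instead of the fully decompressed 180 GB dataset.

```python
import pyarrow.dataset as ds

# Hypothetical path to the .parquet files; adjust to the real location.
PARQUET_DIR = "data/parquet/"

# The dataset API only inspects file metadata here; nothing is materialised yet.
dataset = ds.dataset(PARQUET_DIR, format="parquet")

# Stream the data as record batches and process each batch independently,
# so memory use is bounded by one batch rather than the whole dataset.
total_rows = 0
for batch in dataset.to_batches():
    total_rows += batch.num_rows  # replace with the real per-batch work
print(f"rows scanned: {total_rows}")
```

Selecting only the columns that are actually needed (e.g. `dataset.to_batches(columns=[...])`) should cut memory further, since parquet is columnar and untouched columns never get decompressed.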