2024-06-18

PAMREIN's daily Open Notebook (COMMONS Lab)

Todo - Check GitHub

- [ ]

Meetings

Daily report (What did I learn?)

Again I have problems with memory. I get about 900 GB of memory per task, which is not enough to read in all the .parquet files. Even though all the .parquet files together are only about 180 GB, the compressed format explodes in size when read in.
I am looking into parallelizing this step, which may be possible with "use_legacy_dataset=False" (https://stackoverflow.com/questions/74236493/why-reading-a-parquet-dataset-requires-much-more-memory-than-the-size-of-the-dat); see the sketch below.
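A minimal sketch of both options, assuming pyarrow is the reader and using a hypothetical "data/" directory in place of the real parquet location: the first call uses the use_legacy_dataset=False flag from the linked thread, while the second streams record batches via pyarrow.dataset so peak memory stays near one batch instead of the fully decompressed table.

```python
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Flag from the linked Stack Overflow thread; in recent pyarrow
# versions the datasets-based reader is already the default.
# Note: this still materializes the full decompressed table in memory.
table = pq.read_table("data/", use_legacy_dataset=False)  # "data/" is a placeholder path

# Alternative: iterate over record batches instead of loading
# everything at once, so peak memory stays close to one batch.
dataset = ds.dataset("data/", format="parquet")
total_rows = 0
for batch in dataset.to_batches(batch_size=100_000):
    total_rows += batch.num_rows  # replace with the real per-batch processing
print(f"processed {total_rows} rows")
```

Whether this helps depends on the workload: batch-wise streaming caps memory but only works if the downstream step can also run incrementally.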

Future perspective

Keywords

Abbreviations