2024-06-14
PAMREIN's daily Open Notebook (COMMONS Lab)
Check Github
Todo --[]
Meetings
Daily report (What did I learn?)
scontrol show partition - shows the configuration of each partition, i.e. which hardware (nodes) is available and how many CPUs / how much memory can be requested.
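sacct -j 12820165 --format=JobID,JobName%30,MaxRSS,Elapsed - shows the peak memory (MaxRSS) and the elapsed wall-clock time of the job and its steps (JobName%30 widens the name column to 30 characters).
seff 12820165 - gives a more comprehensive summary of the CPU and memory efficiency of this job ID.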
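To compare memory use across many jobs, the sacct output can be parsed instead of read by eye. A minimal sketch, assuming sacct is run with `--parsable2 --noheader` (pipe-delimited, no header) and that MaxRSS comes with a K/M/G/T suffix; the helper names `parse_mem`, `parse_sacct` and `job_memory` are hypothetical.

```python
import subprocess

# Multipliers for the unit suffixes sacct uses in memory fields such as MaxRSS
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value: str) -> int:
    """Convert a sacct memory string like '1536K' or '2.5G' to bytes (0 if empty)."""
    value = value.strip()
    if not value:
        return 0  # MaxRSS may be empty, e.g. on the top-level job line
    if value[-1] in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1]])
    return int(float(value))


def parse_sacct(text: str) -> list[dict]:
    """Parse pipe-delimited sacct output with the fields JobID|JobName|MaxRSS|Elapsed."""
    rows = []
    for line in text.strip().splitlines():
        job_id, name, max_rss, elapsed = line.split("|")
        rows.append({"JobID": job_id, "JobName": name,
                     "MaxRSS_bytes": parse_mem(max_rss), "Elapsed": elapsed})
    return rows


def job_memory(job_id: str) -> list[dict]:
    """Run sacct for one job ID (requires Slurm's sacct on the PATH)."""
    out = subprocess.run(
        ["sacct", "-j", job_id, "--format=JobID,JobName%30,MaxRSS,Elapsed",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)
```

On the cluster this would be called as `job_memory("12820165")`; the peak memory of the job is then the largest `MaxRSS_bytes` over its steps.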
All predicted files together (n = 45) take up around 771.22 GB. Because of storage limits, I have to reduce their size after extracting the important columns (here: all columns). The initial idea was to compress them with gzip.
For comparison, I will save each file once as .gzip and once as .parquet (a binary columnar file format, written with polars).
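A minimal sketch of the two conversions, assuming the input files are tab-separated with a header row and that polars is installed for the Parquet step; the function names are hypothetical, and this is not necessarily how the files above were produced. Both steps stream the data, so a multi-GB file does not have to fit into memory.

```python
import gzip
import os
import shutil


def gzip_file(src: str, dst: str, level: int = 6) -> None:
    """Stream-compress src into dst with gzip (level 6 = default of the gzip CLI)."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)


def tsv_to_parquet(src: str, dst: str) -> None:
    """Convert a TSV file to Parquet with polars' lazy (streaming) API."""
    import polars as pl  # only needed for the Parquet conversion
    # zstd is polars' default Parquet compression; stated explicitly here
    pl.scan_csv(src, separator="\t").sink_parquet(dst, compression="zstd")


def size_gib(path: str) -> float:
    """File size in GiB (1024**3 bytes), the unit behind the 'G' of `ls -lh`."""
    return os.path.getsize(path) / 1024**3
```

Running `gzip_file` and `tsv_to_parquet` on the same TSV and comparing `size_gib` of the outputs gives a size comparison like the one for split 14.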
Results for split 14:
Filename: compounds1_generalized_230106_frozen_metadata_for_MINES_split_14.csv_85718.53, shape: (24_035_930, 6)
  .tsv: 4.7G | .gzip: 1.4G | .parquet: 1.3G
Filename: reactions1_generalized_230106_frozen_metadata_for_MINES_split_14.csv_85718.53
  .tsv: 13G | .gzip: 3.1G | .parquet: 2.9G
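The savings are easier to compare as compression ratios (uncompressed size / compressed size). A small sketch using the sizes listed for split 14; since those sizes are already rounded, the ratios are approximate.

```python
# File sizes in G as listed above for split 14
sizes = {
    "compounds1": {"tsv": 4.7, "gzip": 1.4, "parquet": 1.3},
    "reactions1": {"tsv": 13.0, "gzip": 3.1, "parquet": 2.9},
}


def ratios(s: dict) -> dict:
    """Compression ratio (tsv size / compressed size) for each format."""
    return {fmt: round(s["tsv"] / s[fmt], 2) for fmt in ("gzip", "parquet")}


for name, s in sizes.items():
    print(name, ratios(s))
```

This prints `compounds1 {'gzip': 3.36, 'parquet': 3.62}` and `reactions1 {'gzip': 4.19, 'parquet': 4.48}`, so for both files Parquet is slightly smaller than gzip.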