--- cluster:cluster [2023-03-20 11:37] – [Szenario 2: parallel] Markus Rosenstihl
+++ cluster:cluster [2024-10-08 11:54] (current) – [AG Vogel] Markus Rosenstihl
@@ Line 1: / Line 1: @@
 ====== Tips/Tricks ======
-===== Profiling ====
+===== Profiling C Programs ====
 You can profile programs with [[https://valgrind.org/docs/manual/cl-manual.html#cl-manual.options|''valgrind'']] and analyze the output file with ''kcachegrind''.
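A minimal sketch of that workflow, assuming ''valgrind'' and ''kcachegrind'' are installed. The helper name ''profile_with_callgrind'' and the output file name are illustrative choices, not part of the cluster setup. Compiling the program with ''-g'' gives source-level annotations in the viewer.

```shell
# Hypothetical helper; the function name and the output file name are assumptions.
profile_with_callgrind() {
    out="callgrind.out.$(basename "$1")"
    # Run the program under the callgrind tool and write the profile to a fixed file name.
    valgrind --tool=callgrind --callgrind-out-file="$out" "$@"
    echo "profile written to $out; open it with: kcachegrind $out"
}
```

Usage would look like ''profile_with_callgrind ./my_program arg1'', where ''./my_program'' stands for your own binary.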
@@ Line 20: / Line 20: @@
 === General Points ===
 == Bigger is better ==
-Prefer a few large files over many small ones: large files generate fewer IOPS, which puts less srtain on the network and the SSDs.
+Prefer a few large files over many small ones: large files generate fewer IOPS, which puts less strain on the network and the SSDs.
 Keep in mind that many others may also be accessing the same file system.
@@ Line 123: / Line 123: @@
 </code>
-With many (>1000) small files it is also advisable to use only the local (scratch) file system, not a network drive. Unfortunately this makes collecting the results more complex again. One suggestion from the admins is to pack the job's files with ''tar'' (''tar cfz files.tar.gz files*.dat''), then unpack and merge these from all nodes on one node.
+With many (>1000) small files it is advisable to use only the local (scratch) file system, not a network drive. Unfortunately this makes collecting the results more complex again. One suggestion from the admins is to pack the job's files with ''tar'' (''tar cfz files.tar.gz files*.dat''), then unpack and merge these files from all nodes on one node.
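The pack-and-merge suggestion can be sketched as follows. The node names, directory layout and file names are hypothetical; two directories under a temporary folder stand in for the local scratch space of two nodes. File names must be unique across nodes, otherwise later archives overwrite earlier files when unpacked.

```shell
# Simulate the node-local scratch directories of two nodes (hypothetical layout).
work=$(mktemp -d)
for node in node1 node2; do
    mkdir -p "$work/$node"
    for n in 1 2; do
        echo "data from $node" > "$work/$node/files_${node}_$n.dat"
    done
    # Pack on each node: one archive instead of many small files.
    (cd "$work/$node" && tar cfz "$work/$node.tar.gz" files*.dat)
done

# On a single node: unpack all archives into one directory.
mkdir -p "$work/collected"
for archive in "$work"/node*.tar.gz; do
    tar xfz "$archive" -C "$work/collected"
done
merged_count=$(ls "$work/collected" | wc -l | tr -d ' ')
echo "collected $merged_count files"
```

Only the archives travel over the network, so the shared file system sees a handful of large transfers instead of thousands of small ones.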
 ====== SLURM Job Submission  ======
@@ Line 291: / Line 291: @@
 Another possible way is the ''--multi-prog'' parameter for srun. As an example you can use this [[https://hpc.nmsu.edu/discovery/slurm/serial-parallel-jobs/#_using_multi_prog|document]].
+Jobs can also be made to wait for each other with the ''-d, --dependency=singleton'' parameter.
+This tells the job to begin execution only after all previously launched jobs sharing the same job name and user have [[https://ulhpc-tutorials.readthedocs.io/en/latest/sequential/basics/|terminated]]. The job name is set with the ''-J'' parameter.
+<code bash>
+# Abstract search space parameters
+min=1
+max=2000
+chunksize=200
+for i in $(seq $min $chunksize $max); do
+    ${CMD_PREFIX} sbatch \
+                  -J ${JOBNAME}_$(($i/$chunksize%${MAXNODES})) --dependency singleton \
+                  ${LAUNCHER} --joblog log/state.${i}.parallel.log  "{$i..$((i+$chunksize))}";
+done
+</code>
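To see how the loop above groups chunks into chains, the following dry run prints the job names instead of calling ''sbatch''. The values of ''JOBNAME'' and ''MAXNODES'' are assumptions for illustration; the snippet above expects them to be set elsewhere.

```shell
# Assumed values for illustration; JOBNAME and MAXNODES are not set in the snippet above.
JOBNAME=search
MAXNODES=4
min=1
max=2000
chunksize=200
names=""
for i in $(seq $min $chunksize $max); do
    # Same name computation as in the sbatch call.
    name="${JOBNAME}_$((i / chunksize % MAXNODES))"
    echo "chunk starting at $i -> job name $name"
    names="$names $name"
done
# Count how many different job names (i.e. independent chains) were produced.
distinct=$(printf '%s\n' $names | sort -u | wc -l | tr -d ' ')
echo "$distinct distinct job names"
```

Because jobs with the same name run one after another under ''singleton'', the number of distinct names bounds how many of these jobs run concurrently.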
+===== Tools =====
+The following tools exist; on request we will try to make them available on the cluster:
+  * [[https://researchcomputing.princeton.edu/support/knowledge-base/spark|Spark]]
+  * [[https://docs.dask.org/en/stable/deploying.html|Dask]]
+  * [[https://modin.readthedocs.io/en/stable/|Modin]]
+  * [[https://researchcomputing.princeton.edu/support/knowledge-base/apptainer|Apptainer]]
 ====== Group Specific ======
 ===== AG Drossel =====
 ===== AG Liebchen =====
 ===== AG Vogel =====
+The head node ''protein'' does not allow password logins; you need to use SSH keys.
+  - Create a key: ''ssh-keygen -t ed25519''
+  - We admins strongly recommend using a very strong passphrase. With ''ssh-agent'' you only have to type it once per desktop login.
+  - Add the public key to the ''authorized_keys'' file and set the correct permissions: ''cat .ssh/id_ed25519.pub | tee -a .ssh/authorized_keys && chmod 0600 .ssh/authorized_keys''
+  - Now log in to protein.cluster; it may ask for the passphrase.
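Running the ''tee -a'' step more than once appends the same key again. A possible variant that only appends the key when it is missing is sketched below. It works in a throwaway directory with a placeholder key line, so the real ''~/.ssh'' is not touched; the function name ''add_key'' is hypothetical.

```shell
# Work in a throwaway directory so the real ~/.ssh is not touched.
demo_home=$(mktemp -d)
mkdir -m 700 "$demo_home/.ssh"
# Placeholder standing in for the file written by ssh-keygen; not a real key.
echo "ssh-ed25519 AAAAEXAMPLEKEY user@desktop" > "$demo_home/.ssh/id_ed25519.pub"

add_key() {
    pub="$1/.ssh/id_ed25519.pub"
    auth="$1/.ssh/authorized_keys"
    touch "$auth"
    # Append only if this exact line is not present yet.
    grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"
    chmod 0600 "$auth"
}

add_key "$demo_home"
add_key "$demo_home"   # the second call must not duplicate the entry
key_lines=$(wc -l < "$demo_home/.ssh/authorized_keys" | tr -d ' ')
echo "authorized_keys has $key_lines line(s)"
```

For real use, the function would be called with your home directory instead of the temporary one.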
+===== SSH Agent =====
+The ''ssh-agent'' should be started automatically on login; Cinnamon, for example, will show a screen upon login to the desktop. If it is not, you need to set the **GNOME Keyring SSH Agent** to start automatically:
+{{:cluster:startup_apps_cinnamon.png?600 |}}