Commit 3edbea43 authored by midou's avatar midou

Merge branch 'develop' of https://dci-gitlab.cines.fr/dci/abs into develop

parents 2c526d02 b5365ea4
variables:
  GIT_CLONE_PATH: $SCRATCHDIR/abs/$CI_CONCURRENT_PROJECT_ID

stages:
  - download
  - compile
  - run-small
  - run

.runner: &runner
  tags:
    - alfred, occigen

.variables: &smilei
  <<: *runner
  variables:
    APP_NAME: "SMILEI"

.download-app: &download-app
  stage: download
  script:
    - cd ${APP_NAME}
    - ./download.sh

#GLOBAL WORKFLOW
dl-smilei:
  <<: *smilei
  <<: *download-app
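As a sketch of how the anchors above compose, the `dl-smilei` job expands to roughly the following once the `*smilei` and `*download-app` anchors are merged via the YAML `<<` key (this expansion is illustrative, not a file in the repository):

```yaml
# Illustrative expansion of dl-smilei after anchor merging.
dl-smilei:
  tags:
    - alfred, occigen
  variables:
    APP_NAME: "SMILEI"
  stage: download
  script:
    - cd ${APP_NAME}
    - ./download.sh
```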
```
cd testcase_XXX
```
and checking that no error code is returned.
These steps can also be used in a batch file to run the simulation through a job scheduler.
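As a sketch, the launch line inside such a batch file might look like the following; the executable path and process count here are illustrative assumptions, not values taken from this repository (the real batch files live next to each test case):

```shell
#!/bin/bash
# Hypothetical sketch of the scheduler launch line for a test case.
# RAMSES path and MPI task count are placeholders for illustration.
RAMSES=./bin/ramses3d   # assumed path to the ramses executable
NML=cosmo.nml           # namelist file, as used elsewhere in this repo
NTASKS=256
# Print the command instead of running it, since srun is site-specific.
CMD="srun -n $NTASKS $RAMSES $NML"
echo "$CMD"
```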
***Run:***
Information about the jobs to launch can be found in the README.md file of each test case (testcase_small, testcase_medium and testcase_large).
***Validation:***
- Metric: the metric used for the benchmark is the total simulation time in seconds, excluding initialization and I/O (for more details, see the validate.sh script).
- physical: ? (TODO)
#!/bin/bash
#SBATCH --job-name=RamsesLarge
#SBATCH --nodes=320
#SBATCH --ntasks=12800
#SBATCH --threads-per-core=1
#SBATCH --time=00:30:00
#SBATCH --output=ramses3dLarge-jean-zay-cpu-%j.out
#SBATCH --hint=nomultithread
#SBATCH --qos=qos_cpu-t3
#SBATCH -A qbg@cpu
set -x
source ./env
SIMU=$bench_dir/testcase_large/ # Working directory
NML="cosmo.nml" # Namelist file
RAMSES=$ramses_dir/ramses/bin/ramses3d # ramses executable
DATE=`date +"%m-%d-%y-%H-%M-%S"`
Test case presentation
======================
General information:
---------------------
To obtain a representative case of a real astrophysical problem, **we start from an evolved case**: the code has already run (for more than 24 hours in this case) and restarts from checkpoint files located in a folder that must be named "input", because the ramses executable expects that layout (these restart files are unpacked during the "prepare.sh" phase).
The code **cannot change the number of MPI processes** of the initial simulation when it starts from a checkpoint/restart, so **the number of processes is fixed**.
For this case we use **12800 MPI** processes, each needing about **3.5-3.6 GB** of memory. To keep things simple we assign one core per MPI process on machines whose nodes have enough memory, and depopulate nodes when necessary. For example, on Jean-Zay we use 320 nodes with 40 tasks per node, i.e. one task per core.
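The node count quoted above follows directly from the fixed MPI process count and the chosen tasks per node; a quick sanity check:

```shell
#!/bin/bash
# Sanity check of the Jean-Zay node count: with a fixed number of MPI
# processes and a chosen tasks-per-node, the node count follows.
ntasks=12800          # fixed by the restart files
tasks_per_node=40     # Jean-Zay: one task per core, nodes depopulated as needed
nodes=$(( ntasks / tasks_per_node ))
echo "nodes = $nodes"
```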
Some metrics:
---------------
cf. https://dci-gitlab.cines.fr/dci/abs/-/wikis/home :
- TGCC/Joliot Curie AMD:
- Jean-Zay-cpu/Cascade Lake nodes:
Other:
-------
Test case designed by F. Bournaud, information provided by F. Bournaud, integrated in abs by C. Jourdain.
#!/bin/bash
echo "************************************************************"
echo "* Prepare Large case in $bench_dir"
echo "************************************************************"
mkdir -p $bench_dir/testcase_large/
if [[ $HOSTNAME = *"occigen"* ]]; then
source ../machines/occigen-bdw/env
tar xvf /store/CINES/dci/SHARED/abs/ramses/input.tar -C $bench_dir/testcase_large/.
tar xvf /store/CINES/dci/SHARED/abs/ramses/PROD.tar -C $bench_dir/testcase_large/.
elif [[ $HOSTNAME = *"jean-zay"* ]]; then
source ../machines/jean-zay-cpu/env
tar xvf $SCRATCH/abs/ramses/input_big.tar -C $bench_dir/testcase_large/.
tar xvf $SCRATCH/abs/ramses/PROD_BIG.tar -C $bench_dir/testcase_large/.
elif [[ $HOSTNAME = *"irene"* ]]; then
source ../machines/irene-amd/env
tar xvf $CCCSCRATCHDIR/abs/ramses/input_big.tar -C $bench_dir/testcase_large/.
tar xvf $CCCSCRATCHDIR/abs/ramses/PROD_BIG.tar -C $bench_dir/testcase_large/.
else
echo "Hostname not recognized: abort"
exit 1
fi
cp $bench_dir/testcase_large/PROD_BIG/cosmo.nml $bench_dir/testcase_large/.
echo "************************************************************"
echo "* End Prepare large case "
echo "************************************************************"
#!/bin/bash
if [[ $HOSTNAME = *"occigen"* ]]; then
cd ../machines/occigen-bdw/
elif [[ $HOSTNAME = *"jean-zay"* ]]; then
cd ../machines/jean-zay-cpu/
elif [[ $HOSTNAME = *"irene"* ]]; then
cd ../machines/irene-amd/
else
echo "Hostname not recognized: abort"
exit 1
fi
source ./env
if [ -z "$bench_dir" ]; then
echo "bench_dir missing"
exit 1
fi
sbatch batch_large.slurm
#!/bin/bash
if [[ $HOSTNAME = *"occigen"* ]]; then
source ../machines/occigen-bdw/env
elif [[ $HOSTNAME = *"jean-zay"* ]]; then
source ../machines/jean-zay-cpu/env
elif [[ $HOSTNAME = *"irene"* ]]; then
source ../machines/irene-amd/env
else
echo "Hostname not recognized: abort"
exit 1
fi
echo "*******************************************************************************************************************"
echo "* Validate Large case in $bench_dir/testcase_large/ *"
echo "*******************************************************************************************************************"
cd $bench_dir/testcase_large
echo "* ls -1 *.log" && ls -1 *.log
log_file=$1
if [ -z "$log_file" ]; then
log_file=`ls -t run_* | head -1`
echo "* No argument provided (ex: ./validate run_%m-%d-%y-%H-%M-%S.log) -> validation of the last run named $log_file"
else
echo "* Validation of the run named $log_file"
fi
start_t=`grep startup $log_file | awk '{ print $5 }'`
end_t=`grep "Total elapsed time:" $log_file | awk '{ print $4 }'`
perf=`bc -l <<< $end_t-$start_t`
if [ -z "$perf" ] || [ -z "$end_t" ]
then
echo "* Large bench is not validated"
else
echo "* Large bench is validated:"
echo "log file: $log_file"
echo "end = $end_t s (total simulation time)"
echo "perf = $perf s (total simulation time excluding initialization and i/o)"
fi
echo "*******************************************************************************************************************"
#TODO add a physical validation
Test case presentation
======================
The small test case of the template application does nothing but watch videos on YouTube all day long.
It uses no DFT method, no spectral method, nor anything else. FFTW is not used at all for this case.
General information:
---------------------
To obtain a representative case of a real astrophysical problem, **we start from an evolved case**: the code has already run (for more than 24 hours in this case) and restarts from checkpoint files located in a folder that must be named "input", because the ramses executable expects that layout (these restart files are unpacked during the "prepare.sh" phase).
The code **cannot change the number of MPI processes** of the initial simulation when it starts from a checkpoint/restart, so **the number of processes is fixed**.
Case profile
------------
For this case we use **4096 MPI** processes, each needing about **1.4 GB** of memory. To keep things simple we assign one core per MPI process on machines whose nodes have enough memory, and depopulate nodes when necessary. For example, on Occigen/Broadwell nodes with 28 cores per node, we use 147 nodes to get approximately 28 MPI tasks per node.
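The node count quoted above is the ceiling of the fixed MPI process count divided by the cores per node; a quick sanity check:

```shell
#!/bin/bash
# Sanity check of the Occigen/Broadwell node count: ceiling division of
# the fixed MPI process count by the cores available per node.
ntasks=4096
cores_per_node=28     # Occigen/Broadwell
nodes=$(( (ntasks + cores_per_node - 1) / cores_per_node ))
echo "nodes = $nodes"
```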
A profile of a small specfem3D test case, collected on Occigen on one Haswell node (64 GB), is available in this folder:
`profile_occigen-hsw.html`
It was generated with Intel APS (info: https://software.intel.com/sites/products/snapshots/application-snapshot/)
Some metrics:
---------------
cf. https://dci-gitlab.cines.fr/dci/abs/-/wikis/home :
- TGCC/Joliot Curie AMD:
- Jean-Zay-cpu/Cascade Lake nodes:
Other:
-------
Test case designed by F. Bournaud, information provided by F. Bournaud, integrated in abs by C. Jourdain.
mkdir -p $bench_dir/testcase_medium/
if [[ $HOSTNAME = *"occigen"* ]]; then
source ../machines/occigen-bdw/env
#lfs setstripe -c 18 /store/CINES/dci/SHARED/abs/ramses/input.tar # ? squareroot(330)~18
tar xvf /store/CINES/dci/SHARED/abs/ramses/input.tar -C $bench_dir/testcase_medium/.
tar xvf /store/CINES/dci/SHARED/abs/ramses/PROD.tar -C $bench_dir/testcase_medium/.
elif [[ $HOSTNAME = *"jean-zay"* ]]; then
tar xvf $SCRATCH/abs/ramses/PROD.tar -C $bench_dir/testcase_medium/.
elif [[ $HOSTNAME = *"irene"* ]]; then
source ../machines/irene-amd/env
#lfs setstripe -c 18 $CCCSCRATCHDIR/abs/ramses/input.tar # ? squareroot(330)~18
tar xvf $CCCSCRATCHDIR/abs/ramses/input.tar -C $bench_dir/testcase_medium/.
tar xvf $CCCSCRATCHDIR/abs/ramses/PROD.tar -C $bench_dir/testcase_medium/.
else
Test case presentation
======================
General information:
---------------------
For this simulation, we start from a **non-evolved initial condition** (called IC_l9), which is therefore not necessarily representative of a real case.
The small test case (a.k.a. the debug test case) of ramses is a **debugging case** that runs in **less than 10 minutes** on **256 cores**.
In this version the code allocates 2.6 GB/core. If needed, this can be reduced to a little under 2 GB/core by lowering "ngridmax" in the cosmo.nml namelist from 400000 to 300000 and "npartmax" from 2000000 to 1700000.
Debugging:
---------------
To allow debugging on a variable number of cores, we launch the simulation from a non-evolved system, so this test case can be used for functional or debugging tests with any number of cores and nodes. If necessary, you can reduce the memory footprint by lowering "ngridmax" in the cosmo.nml namelist from 400000 to 300000 and "npartmax" from 2000000 to 1700000 (for example).
Please use 256 MPI processes, one per core, in the context of the benchmark.
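The namelist reduction described above could be scripted with sed. The parameter names come from the text; the exact `key=value` layout of cosmo.nml is an assumption (typical of Fortran namelists), so the sketch edits a stand-in file:

```shell
#!/bin/bash
# Hypothetical sketch: shrink the memory footprint by editing the namelist.
# The key=value layout assumed here is typical of Fortran namelists; the
# real cosmo.nml formatting may differ.
nml=$(mktemp)
printf 'ngridmax=400000\nnpartmax=2000000\n' > "$nml"   # stand-in namelist
sed -i -e 's/ngridmax=400000/ngridmax=300000/' \
       -e 's/npartmax=2000000/npartmax=1700000/' "$nml"
cat "$nml"
```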
Warning:
--------
Don't worry about messages like "File IC_l9/ic_tempb not found" in the .log file.
Some metrics:
---------------
- TGCC/Joliot Curie AMD: 468 seconds on 256 cores (January 2020)
- Occigen/Haswell nodes: 563 seconds on 256 cores (11 nodes) (January 2020)
- Occigen/Broadwell nodes: 481 seconds on 256 cores (10 nodes) (February 2020)
- Jean-Zay-cpu/Cascade Lake nodes: 545 seconds on 256 cores (7 nodes) (February 2020)
Other:
-------