Command:    mpiexec -n 4 ./mmult4_c.exe 1024
Resources:  1 node (24 physical, 24 logical cores per node)
Memory:     126 GiB per node
Tasks:      4 processes
Machine:    p0560
Start time: Wed Jan 20 17:38:23 2021
Total time: 1 second
Full path:  .../trial_package/4_profiling_imbalance

Summary: mmult4_c.exe is Compute-bound in this configuration
Compute:  55.8% |=====|
MPI:      34.6% |==|
I/O:       9.6% ||
This application run was Compute-bound. A breakdown of this time and advice for investigating further is in the CPU section below.

CPU:
A breakdown of the 55.8% CPU time:
Scalar numeric ops:   0.0% |
Vector numeric ops:  58.4% |=====|
Memory accesses:     41.6% |===|
The CPU performance appears well-optimized for numerical computation. The biggest gains may now come from running at larger scales.
Significant time is spent on memory accesses. Use a profiler to identify time-consuming loops and check their cache performance.

MPI:
A breakdown of the 34.6% MPI time:
Time in collective calls:               16.7% |=|
Time in point-to-point calls:           83.3% |=======|
Effective process collective rate:      0.00 bytes/s
Effective process point-to-point rate:  472 MB/s

I/O:
A breakdown of the 9.6% I/O time:
Time in reads:                  0.0% |
Time in writes:               100.0% |=========|
Effective process read rate:   0.00 bytes/s
Effective process write rate:  96.1 MB/s
Most of the time is spent in write operations with a low effective transfer rate. This may be caused by contention for the filesystem or inefficient access patterns. Use an I/O profiler to investigate which write calls are affected.

Threads:
A breakdown of how multiple threads were used:
Computation:                 0.0% |
Synchronization:             0.0% |
Physical core utilization:  16.7% |=|
System load:                16.7% |=|
No measurable time is spent in multithreaded code.
Physical core utilization is low. Try increasing the number of processes to improve performance.

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:  59.4 MiB
Peak process memory usage:  72.5 MiB
Peak node memory usage:      2.0% ||
The peak node memory usage is very low. Running with fewer MPI processes and more data on each process may be more efficient.

Energy:
A breakdown of how the 0.00671 Wh was used:
CPU:              100.0% |=========|
System:           not supported
Mean node power:  not supported
Peak node power:  0.00 W
The whole system energy has been calculated using the CPU energy usage.
System power metrics: No Arm IPMI Energy Agent config file found in /var/spool/ipmi-energy-agent. Did you start the Arm IPMI Energy Agent?
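The 34.6% MPI share above is dominated by point-to-point calls, which in a report like this is often waiting time caused by uneven work across ranks. Before reaching for a full profiler, a quick way to check for such imbalance is to time the compute and MPI phases per rank with MPI_Wtime and compare them. The sketch below is not the mmult4 source; local_work is a hypothetical placeholder for the real matrix-multiply kernel, and the file/executable names are made up for illustration.

/* imbalance_check.c - minimal per-rank timing sketch (assumed code, not mmult4) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder compute kernel; does more work on higher ranks on purpose,
 * so the imbalance shows up as MPI (waiting) time on the faster ranks. */
static double local_work(int scale)
{
    double s = 0.0;
    for (long i = 0; i < (long)scale * 10000000L; ++i)
        s += (double)i * 1e-12;
    return s;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Time the compute phase. */
    double t0 = MPI_Wtime();
    volatile double r = local_work(rank + 1);
    double t_compute = MPI_Wtime() - t0;

    /* Time the MPI phase; ranks that finished early wait here,
     * which is what inflates the point-to-point/collective share. */
    t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);
    double t_mpi = MPI_Wtime() - t0;

    /* Gather both timings on rank 0 and print a per-rank breakdown. */
    double times[2] = { t_compute, t_mpi };
    double *all = (rank == 0) ? malloc(2 * (size_t)size * sizeof(double)) : NULL;
    MPI_Gather(times, 2, MPI_DOUBLE, all, 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; ++i)
            printf("rank %d: compute %.3f s   mpi %.3f s\n", i, all[2*i], all[2*i + 1]);
        free(all);
    }

    (void)r;
    MPI_Finalize();
    return 0;
}

Assuming an MPI toolchain like the one used above, this could be built and run with something such as: mpicc -O2 imbalance_check.c -o imbalance_check && mpiexec -n 4 ./imbalance_check. If one rank reports a much larger compute time while the others accumulate MPI time, the fix is to rebalance the data decomposition rather than to tune the communication itself.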