Command:    mpiexec -n 4 ./mmult4_c.exe 1024
Resources:  1 node (24 physical, 24 logical cores per node)
Memory:     126 GiB per node
Tasks:      4 processes
Machine:    p0560
Start time: Wed Jan 20 17:38:23 2021
Total time: 1 second
Full path:  .../trial_package/4_profiling_imbalance

Summary: mmult4_c.exe is Compute-bound in this configuration
Compute:  55.8% |=====|
MPI:      34.6% |==|
I/O:       9.6% ||
This application run was Compute-bound. A breakdown of this time and advice for investigating further is in the CPU section below.

CPU:
A breakdown of the 55.8% CPU time:
Scalar numeric ops:   0.0% |
Vector numeric ops:  58.4% |=====|
Memory accesses:     41.6% |===|
The CPU performance appears well-optimized for numerical computation. The biggest gains may now come from running at larger scales.
Significant time is spent on memory accesses. Use a profiler to identify time-consuming loops and check their cache performance.

MPI:
A breakdown of the 34.6% MPI time:
Time in collective calls:               16.7% |=|
Time in point-to-point calls:           83.3% |=======|
Effective process collective rate:      0.00 bytes/s
Effective process point-to-point rate:  472 MB/s

I/O:
A breakdown of the 9.6% I/O time:
Time in reads:                  0.0% |
Time in writes:               100.0% |=========|
Effective process read rate:   0.00 bytes/s
Effective process write rate:  96.1 MB/s
Most of the time is spent in write operations with a low effective transfer rate. This may be caused by contention for the filesystem or inefficient access patterns. Use an I/O profiler to investigate which write calls are affected.

Threads:
A breakdown of how multiple threads were used:
Computation:                 0.0% |
Synchronization:             0.0% |
Physical core utilization:  16.7% |=|
System load:                16.7% |=|
No measurable time is spent in multithreaded code.
Physical core utilization is low. Try increasing the number of processes to improve performance.

Memory:
Per-process memory usage may also affect scaling:
Mean process memory usage:  59.4 MiB
Peak process memory usage:  72.5 MiB
Peak node memory usage:      2.0% ||
The peak node memory usage is very low. Running with fewer MPI processes and more data on each process may be more efficient.

Energy:
A breakdown of how the 0.00671 Wh was used:
CPU:              100.0% |=========|
System:           not supported
Mean node power:  not supported
Peak node power:  0.00 W
The whole system energy has been calculated using the CPU energy usage.
System power metrics: No Arm IPMI Energy Agent config file found in /var/spool/ipmi-energy-agent. Did you start the Arm IPMI Energy Agent?
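The 34.6% MPI share above is dominated by point-to-point calls, which in a report like this is often waiting time caused by uneven work across ranks. Before reaching for a full profiler, a quick way to check for such imbalance is to time the compute and MPI phases per rank with MPI_Wtime and compare them. The sketch below is not the mmult4 source; local_work is a hypothetical placeholder for the real matrix-multiply kernel, and the file/executable names are made up for illustration.

/* imbalance_check.c - minimal per-rank timing sketch (assumed code, not mmult4) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder compute kernel; does more work on higher ranks on purpose,
 * so the imbalance shows up as MPI (waiting) time on the faster ranks. */
static double local_work(int scale)
{
    double s = 0.0;
    for (long i = 0; i < (long)scale * 10000000L; ++i)
        s += (double)i * 1e-12;
    return s;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Time the compute phase. */
    double t0 = MPI_Wtime();
    volatile double r = local_work(rank + 1);
    double t_compute = MPI_Wtime() - t0;

    /* Time the MPI phase; ranks that finished early wait here,
     * which is what inflates the point-to-point/collective share. */
    t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);
    double t_mpi = MPI_Wtime() - t0;

    /* Gather both timings on rank 0 and print a per-rank breakdown. */
    double times[2] = { t_compute, t_mpi };
    double *all = (rank == 0) ? malloc(2 * (size_t)size * sizeof(double)) : NULL;
    MPI_Gather(times, 2, MPI_DOUBLE, all, 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; ++i)
            printf("rank %d: compute %.3f s   mpi %.3f s\n", i, all[2*i], all[2*i + 1]);
        free(all);
    }

    (void)r;
    MPI_Finalize();
    return 0;
}

Assuming an MPI toolchain like the one used above, this could be built and run with something such as: mpicc -O2 imbalance_check.c -o imbalance_check && mpiexec -n 4 ./imbalance_check. If one rank reports a much larger compute time while the others accumulate MPI time, the fix is to rebalance the data decomposition rather than to tune the communication itself.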