Commits · 0d74d445bcdd27a6eef027ea57e2131f25e41891 · HPCSource / octopus

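This project is mirrored from https://gitlab.com/octopus-code/octopus.git. Pull mirroring failed 9 months ago.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer or owner.
Last successful update 9 months ago.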

May 24, 2024
- Improve the debug message for kernel launch errors · 0d74d445
  Authored by Sebastian Ohlmann 9 months ago
```
Print out the kernel name and also some more detail about the dimensions
that are problematic.
```
April 16, 2024
- Add get_stream function to get the current stream number · 3ac52d18
  Authored by Sebastian Ohlmann 10 months ago
- Add async option to some functions · 73045c50
  Authored by Sebastian Ohlmann 10 months ago
- Add -lineinfo to all kernels · b4055211
  Authored by Sebastian Ohlmann 10 months ago
```
This does not impact performance, but helps in debugging
```
April 09, 2024
- Add check to avoid using CUDA-aware MPI if not available · 5e0b19e1
  Authored by Sebastian Ohlmann 11 months ago
```
Co-authored-by: Cristian Le <cristian.le@mpsd.mpg.de>
```
April 04, 2024
- Fix integer types to new convention · 7ed69801
  Authored by Sebastian Ohlmann 11 months ago
- Fix whitespace issue · 93a81feb
  Authored by Sebastian Ohlmann 11 months ago
- Set threads per block to 256 by default · 0eb428c9
  Authored by Sebastian Ohlmann 11 months ago
```
This is recommended by the CUDA best practices guide, see
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#thread-and-block-heuristics
Make sure that the limit of threads per block for the corresponding
kernel is respected (though it is unlikely to be smaller than 256).
```
- If the resources are not available, the code will compute grid sizes that are not available, · 746a8bbe
  Authored by Nicolas Tancogne-Dejean 1 year ago
```
causing an error. This is now properly captured by the code.
```
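The two commit messages above (0eb428c9 and 746a8bbe) describe a launch-configuration rule: default to 256 threads per block, respect the per-kernel limit, and report an error when the resulting grid size is not available. The following is a minimal Python sketch of that logic, not the Octopus implementation. The names `launch_config`, `LaunchConfig`, `kernel_max_threads` and `device_max_blocks` are hypothetical, and a real code would query the limits from the GPU runtime.

```python
from dataclasses import dataclass

DEFAULT_THREADS_PER_BLOCK = 256  # default named in the commit message


@dataclass
class LaunchConfig:
    threads_per_block: int
    num_blocks: int


def launch_config(n_items, kernel_max_threads, device_max_blocks,
                  default_threads=DEFAULT_THREADS_PER_BLOCK):
    """Pick a 1D launch configuration for n_items work items.

    kernel_max_threads and device_max_blocks are hypothetical inputs that
    stand in for limits a real code would query from the GPU runtime.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    # Respect the per-kernel limit, which may be below the default.
    threads = min(default_threads, kernel_max_threads)
    if threads <= 0:
        raise RuntimeError("kernel cannot be launched: no threads available")
    # Ceiling division so that every work item is covered by a thread.
    blocks = (n_items + threads - 1) // threads
    if blocks > device_max_blocks:
        # Report the problematic dimensions instead of launching blindly.
        raise RuntimeError(
            f"grid size {blocks} exceeds device limit {device_max_blocks} "
            f"(threads per block: {threads})"
        )
    return LaunchConfig(threads, blocks)


# 1000 items with the default block size need 4 blocks of 256 threads.
print(launch_config(1000, kernel_max_threads=1024, device_max_blocks=65535))
```

Raising an error with the offending dimensions, rather than passing an invalid grid to the runtime, keeps the failure close to its cause.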
February 06, 2024
- Simplify mock_autotools.cmake · 40314597
  Authored by Henri Menke 1 year ago
December 19, 2023

Replace `kind_oct_m` with `intrinsic :: iso_fortran_env`. Access to the... · c88b62e9

Authored by Alex Buccheri 1 year ago

Replace `kind_oct_m` with `intrinsic :: iso_fortran_env`. Access to the fortran intrinsics is also available through `global_oct_m`.

Replace all instances of _i4, _i8, _r4 and _r8. Note, this affects routine... · 9ab9ee82

Authored by Alex Buccheri 1 year ago

Replace all instances of _i4, _i8, _r4 and _r8. Note, this affects routine names as well as precision suffixes.

Update OPTION variables to also use standard fortran precision, _int64

Replace declarations using kinds aliases with fortran standard variables.... · 6ff93907

Authored by Alex Buccheri 1 year ago

Replace declarations using kinds aliases with fortran standard variables. Note, formatting has not been preserved, but it's only cosmetic.

December 18, 2023
- Replace CNST macro with explicit, modern precision. · f83fdb60
  Authored by Alex Buccheri 1 year ago
May 10, 2023
- rename messages_print_stress to · fb02f432
  Authored by Meisam Farzalipour Tabriz 1 year ago
```
... messages_print_with_emphasis
```
March 08, 2023
- Make the DOTPV kernel work for OpenCL too. · b62fe310
  Authored by NicolasTD 2 years ago
March 01, 2023
- Fixes the force kernel after the change of the previous commit · 08b217ac
  Authored by NicolasTD 2 years ago
```
Using a complex as two doubles requires the state index to run first
in the GPU kernel.
```
February 01, 2023
- Make the style_checke pipeline happier. · 948dca9d
  Authored by NicolasTD 2 years ago
- Fixing some issues and improving the previous commits · 06d4e32d
  Authored by NicolasTD 2 years ago
January 31, 2023
- Fix a compilation problem when OpenCL is not present. · ccdfebd0
  Authored by NicolasTD 2 years ago
- Fixing more kernels and routines. · ffa411b7
  Authored by NicolasTD 2 years ago
- Making the code run: fixing some kernel arguments that OpenCL is rightfully complaining about. · 3091c6a1
  Authored by NicolasTD 2 years ago
- Fix the debug flag for some kernels. This is now working for both OpenCL and CUDA. · 3a0c7552
  Authored by NicolasTD 2 years ago
- style fix · b93eef28
  Authored by Meisam 2 years ago
```
fix indentations
remove trailing white spaces
```
January 10, 2023
- In order to call the routine from a generic code, the cuda call must be fenced with the proper if. · 54437933
  Authored by NicolasTD 2 years ago
January 03, 2023
- The routine was not working for integers, causing the code to stop if one... · d9d542de
  Authored by Nicolas Tancogne-Dejean 2 years ago
```
The routine was not working for integers, causing the code to stop if one tries to initialize all GPU buffers to zero.
```
December 13, 2022
- fixed variable documentation · da3e2b7e
  Authored by Martin Lueders 2 years ago
December 12, 2022
- renamed variable, and re-introduced bugfix · 136ac522
  Authored by Martin Lueders 2 years ago
December 08, 2022
- fixed typo · 1aac5169
  Authored by Martin Lueders 2 years ago
December 07, 2022
- introduced input option to force initialization · d8d83a31
  Authored by Martin Lueders 2 years ago
- moved initialization to buffer_create · 71a2c319
  Authored by Martin Lueders 2 years ago
November 24, 2022
- revert accidental changes · 28a4f6bd
  Authored by Martin Lueders 2 years ago
- moved initialization to buffer_create · dd157174
  Authored by Martin Lueders 2 years ago
October 11, 2022

Improve message displaying rank and device · c8affd43

Authored by Sebastian Ohlmann 2 years ago

Also show the host name. This allows easier identification of ranks and
devices to hosts. Also get rid of the leading zeros in this output.

September 28, 2022

Batchifying the calculation of the commutator of the position operator and the... · d6a3ce6c

Authored by Nicolas Tancogne-Dejean 2 years ago

Batchifying the calculation of the commutator of the position operator and the DFT+U(+V) term. This allows the current to be computed in a fully batchified way.

September 15, 2022
- Fix some warnings. · 8a504816
  Authored by Nicolas Tancogne-Dejean 2 years ago
- Adding the GPU support of a part of the DFT+U code. · 71254d75
  Authored by Nicolas Tancogne-Dejean 2 years ago
August 17, 2022

Fix conversion warnings · eec44207
Authored by Sebastian Ohlmann 2 years ago

Convert pack_size to i8 type · ed5cf1e8

Authored by Sebastian Ohlmann 2 years ago

Avoid potential integer overflows when computing the product of
pack_size along its dimensions or with other, potentially large numbers.
Add some more low-level routines to allow calling with i4 and i8
integers.

Fix an integer overflow in the set_zero GPU code · 531a4ffc

Authored by Sebastian Ohlmann 2 years ago

For large batch sizes, the multiplication of the number of grid points
with the number of states in a batch could overflow. Change the
integers involved to 8-byte.
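
The overflow described here can be illustrated with a short Python sketch that emulates 4-byte and 8-byte signed integers through `ctypes`. The values of `n_grid_points` and `n_states` are made up for illustration, and the helper names are hypothetical. The wrap-around shown is what two's-complement arithmetic produces, and the behavior of an actual Fortran or GPU build on overflow may differ.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold


def product_int32(a, b):
    """Multiply and wrap the result to a signed 4-byte integer."""
    return ctypes.c_int32(a * b).value


def product_int64(a, b):
    """Multiply and wrap the result to a signed 8-byte integer."""
    return ctypes.c_int64(a * b).value


# Hypothetical sizes: a large grid and a large batch of states.
n_grid_points = 20_000_000
n_states = 128

exact = n_grid_points * n_states           # 2_560_000_000
print(exact > INT32_MAX)                   # True: does not fit in 4 bytes
print(product_int32(n_grid_points, n_states))  # -1734967296 (wrapped)
print(product_int64(n_grid_points, n_states))  # 2560000000 (correct)
```

A wrapped, negative element count is the kind of value that then shows up as an invalid buffer size or launch dimension, which is why the fix widens the integers before the multiplication.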
