This project is mirrored from https://github.com/tensorflow/tensorflow.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer or owner.
Last successful update .
- April 10, 2024
Authored by A. Unique TensorFlower
Skip adding reshapes for resharding tensors whose dimension sizes are not divisible by the mesh dimensions. Also get rid of a duplicate debug print statement. PiperOrigin-RevId: 623571611
-
Authored by A. Unique TensorFlower
Set metadata and frontend attributes for instructions added/modified in space_to_batch_converter HLO pass. PiperOrigin-RevId: 623548534
-
Authored by Feng Wang
PiperOrigin-RevId: 623547261
-
Authored by Kyle Lucke
Change StreamExecutor::HostMemoryAllocate to return a std::unique_ptr<MemoryAllocation> rather than a std::unique_ptr<HostMemoryAllocation>. PiperOrigin-RevId: 623546174
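The pattern behind this change, returning a `std::unique_ptr` to the base class instead of a concrete derived class, can be sketched with simplified stand-in types (the real StreamExecutor classes have richer interfaces; the names below are only illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Simplified stand-in for the abstract allocation interface.
struct MemoryAllocation {
  virtual ~MemoryAllocation() = default;
  virtual void* opaque() const = 0;
};

// One concrete kind of allocation, backed by host memory.
class HostMemoryAllocation : public MemoryAllocation {
 public:
  explicit HostMemoryAllocation(uint64_t size) : buf_(new char[size]) {}
  ~HostMemoryAllocation() override { delete[] buf_; }
  void* opaque() const override { return buf_; }

 private:
  char* buf_;
};

// Returning the base-class pointer keeps call sites decoupled from the
// concrete allocation type, so other allocation kinds can be returned later
// without changing callers.
std::unique_ptr<MemoryAllocation> HostMemoryAllocate(uint64_t size) {
  return std::make_unique<HostMemoryAllocation>(size);
}
```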
-
Authored by Clive Verghese
PiperOrigin-RevId: 623545426
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623544771
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623542918
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623535753
-
Authored by Eugene Zhulenev
PiperOrigin-RevId: 623535133
-
Authored by Ionel Gog
PiperOrigin-RevId: 623531760
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623516339
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623504775
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623500061
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623491147
-
Authored by Henning Becker
jaxlib updated its upstream XLA version, so this ~hack~ compatibility layer is not needed anymore. PiperOrigin-RevId: 623491067
-
Authored by Peter Hawkins
This was added as part of a prototype, but no users appear to exist anymore, and it was never a publicized API. PiperOrigin-RevId: 623488172
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623486944
-
Authored by TJ Xu
Imported from GitHub PR https://github.com/openxla/xla/pull/11182

When the source and target of a collective-permute instruction are located within the same node, we try to use cudaMemcpyPeer to send the data instead of invoking NCCL send and recv kernels. Because memcpy uses the GPU's copy engine instead of launching a kernel, it can fully saturate the bandwidth without occupying any SMs. Since a peer-to-peer memcpy needs the destination pointer on the other device, we first communicate the pointer values with the peer using NCCL calls and then use that pointer to invoke the memcpy.

Some perf comparison, sending a tensor of size bf16[512,24576]{1,0}:
- collective-permute: 445 us
- memcpy: 100 us

Copybara import of the project:

-- 73a2568520d411eb6f62b6383a3841cb0f8edf22 by TJ <tjx@nvidia.com>: optimize collective permute using memcpy
-- 1dd896f544852fa898e937af8c0b94e075db3ec2 by TJ <tjx@nvidia.com>: Remove nccl calls and use Async value
-- a2fbc7d29d3d8106753f36e2a477e0889da6cb58 by TJ <tjx@nvidia.com>: wrap async map and mutex with a class

Merging this change closes #11182

PiperOrigin-RevId: 623481864
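The "communicate the pointer values first, then memcpy" step, and the commit's mention of wrapping an async map and mutex in a class, can be sketched host-side without any CUDA. `PointerRendezvous` below is a hypothetical illustration, not the actual XLA class: each device publishes its buffer address under its device id, and a peer blocks until that address becomes available.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical rendezvous sketch: a mutex-guarded map of device id ->
// buffer address, with a condition variable so a peer can wait until the
// other side has published its pointer before issuing the copy.
class PointerRendezvous {
 public:
  // Called by the device that owns the destination buffer.
  void Publish(int device, std::uintptr_t addr) {
    std::lock_guard<std::mutex> lock(mu_);
    addrs_[device] = addr;
    cv_.notify_all();
  }

  // Called by the peer; blocks until the address for `device` is known.
  std::uintptr_t Await(int device) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return addrs_.count(device) > 0; });
    return addrs_[device];
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<int, std::uintptr_t> addrs_;
};
```

In the real implementation the awaited address would then be passed to a peer-to-peer copy; here the class only shows the synchronization shape.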
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623480882
-
Authored by zoranjovanovic-ns
Imported from GitHub PR https://github.com/openxla/xla/pull/11331

Copybara import of the project:

-- a4f80a3f32a10230bfdaa1dadc12f1f385883d8a by Zoran Jovanovic <zjovanov@amd.com>: Fix compilation issues related to aa08925e2dd1351c68e7838e074fe35908e3ea07 (Triton in XLA for ROCm - ir_emitter_triton related changes)

Merging this change closes #11331

PiperOrigin-RevId: 623476962
-
Authored by ekuznetsov139
Imported from GitHub PR https://github.com/openxla/xla/pull/11398

This contains two fixes:
* A compilation fix for stream_executor/rocm/rocm_executor.cc after it got broken by the commit https://github.com/openxla/xla/commit/3c5cf0ac2adbc0a651d3a2ec822e47927218b153
* A workaround that allows static_assert(false) to be compiled with compilers not implementing https://cplusplus.github.io/CWG/issues/2518.html

Copybara import of the project:

-- 5cdd25c3190140fec619073d25a20952fed2fd8b by Eugene Kuznetsov <eugene.kuznetsov@amd.com>: Build fix
-- b86d12ff02a2a6247a7f991f77dfdafb08a79703 by Eugene Kuznetsov <eugene.kuznetsov@amd.com>: Workaround for static_assert needed for some compilers (https://cplusplus.github.io/CWG/issues/2518.html)

Merging this change closes #11398

PiperOrigin-RevId: 623471087
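The static_assert issue referenced above (CWG 2518) is the classic problem that a plain `static_assert(false)` in an uninstantiated `if constexpr` branch is rejected by pre-CWG2518 compilers. The usual workaround, which may or may not match the exact form used in the PR, is to make the condition dependent on a template parameter so it is only evaluated when that branch is actually instantiated:

```cpp
#include <cassert>
#include <type_traits>

// Dependent-false helper: the value is always false, but because it depends
// on T, pre-CWG2518 compilers only diagnose the static_assert when the
// fallback branch is actually instantiated.
template <typename T>
inline constexpr bool always_false_v = false;

template <typename T>
int ByteWidth() {
  if constexpr (std::is_same_v<T, int>) {
    return sizeof(int);
  } else if constexpr (std::is_same_v<T, double>) {
    return sizeof(double);
  } else {
    // static_assert(false, ...) here would be ill-formed on older compilers
    // even when this branch is never taken; the dependent form is portable.
    static_assert(always_false_v<T>, "unsupported type");
    return 0;
  }
}
```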
-
Authored by Henning Becker
I'm trying to make XLA compatible with cuDNN 9 and stumbled upon a miscompile in the fMHA rewriter. It seems to be related to the causal mask pattern matcher; at least, forcing the causal-mask flag to true makes the broken test pass. I extracted a reproducer and added an (integration) test, which can be removed or converted into a proper unit test once this is fixed. In the meantime, I disable all fMHA features that require cuDNN 8.9.6+. PiperOrigin-RevId: 623470743
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623456155
-
Authored by Adrian Kuegel
PiperOrigin-RevId: 623454637
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623446648
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623442999
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623442112
-
Authored by mmakevic-amd
Imported from GitHub PR https://github.com/openxla/xla/pull/11307

This PR upstreams changes from [#2467](https://github.com/ROCm/tensorflow-upstream/pull/2467/commits/c3d966e559ffd13f6a473d1b57e606314a456ce8) and also upstreams changes to `gpu_layout_assignment.cc` introduced [here](https://github.com/ROCm/tensorflow-upstream/commit/b34b1eb35c095a16208356913a956256cc419e99) and [here](https://github.com/ROCm/tensorflow-upstream/commit/dfb7dd0b2f7e6c6bbcd3f1dbc9e299be9554b4a7).

Copybara import of the project:

-- ceaa8b0ab3dfda7ad93a7c57f5c26e6b31365a80 by mmakevic <Milica.Makevic@amd.com>: adding env variable to control rocm_dnn workspace limit
-- b7661ef811dfff68e03b4fe4306c1f63ffd1e2ee by mmakevic <Milica.Makevic@amd.com>: Changes to use NHWC for FP16 ops on XLA path for ROCm 5.0 and MI100/200; fix double =
-- cfeb2dec629eac32d5afc8256f2c684bbe81fe75 by mmakevic <Milica.Makevic@amd.com>: Check if rocm or cuda are used during runtime

Merging this change closes #11307

PiperOrigin-RevId: 623438181
-
Authored by Adrian Kuegel
We used to delete subcomputations that become dead due to the removal of their calling computation before deleting the calling computation itself. This is asking for trouble, e.g. regarding cleanup routines. PiperOrigin-RevId: 623434223
-
Authored by A. Unique TensorFlower
[XLA] Add argument and return shardings to the while body and condition based on the result shardings, if present. This is needed in case propagation runs before this conversion, in which case the best indicator for the argument and return shardings is the shardings of the while results. PiperOrigin-RevId: 623433069
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623431068
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623429617
-
Authored by Oleg Shyshkov
This feature is under development and will likely remain unstable until we finish work on SymbolicTileAnalysis. PiperOrigin-RevId: 623428765
-
Authored by Johannes Reifferscheid
PiperOrigin-RevId: 623424640
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623421779
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623421390
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420947
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420750
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420486
-
Authored by Quentin Khan
PiperOrigin-RevId: 623416443
-