This project is mirrored from https://github.com/tensorflow/tensorflow.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer or owner.
Last successful update .
- April 10, 2024
Authored by A. Unique TensorFlower
Skip adding reshapes for resharding tensors whose dimension sizes are not divisible by the mesh dimensions. Also get rid of a duplicate debug print statement. PiperOrigin-RevId: 623571611
-
Authored by A. Unique TensorFlower
Set metadata and frontend attributes for instructions added/modified in space_to_batch_converter HLO pass. PiperOrigin-RevId: 623548534
-
Authored by Feng Wang
PiperOrigin-RevId: 623547261
-
Authored by Kyle Lucke
Change StreamExecutor::HostMemoryAllocate to return a std::unique_ptr<MemoryAllocation> rather than a std::unique_ptr<HostMemoryAllocation>. PiperOrigin-RevId: 623546174
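The pattern behind this change, returning a `std::unique_ptr` to the base class instead of a concrete derived class, can be sketched with simplified stand-in types (the real StreamExecutor classes have richer interfaces; the names below are only illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Simplified stand-in for the abstract allocation interface.
struct MemoryAllocation {
  virtual ~MemoryAllocation() = default;
  virtual void* opaque() const = 0;
};

// One concrete kind of allocation, backed by host memory.
class HostMemoryAllocation : public MemoryAllocation {
 public:
  explicit HostMemoryAllocation(uint64_t size) : buf_(new char[size]) {}
  ~HostMemoryAllocation() override { delete[] buf_; }
  void* opaque() const override { return buf_; }

 private:
  char* buf_;
};

// Returning the base-class pointer keeps call sites decoupled from the
// concrete allocation type, so other allocation kinds can be returned later
// without changing callers.
std::unique_ptr<MemoryAllocation> HostMemoryAllocate(uint64_t size) {
  return std::make_unique<HostMemoryAllocation>(size);
}
```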
-
Authored by Clive Verghese
PiperOrigin-RevId: 623545426
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623544771
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623542918
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623535753
-
Authored by Eugene Zhulenev
PiperOrigin-RevId: 623535133
-
Authored by Ionel Gog
PiperOrigin-RevId: 623531760
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623516339
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623504775
-
Authored by Kyle Lucke
PiperOrigin-RevId: 623500061
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623491147
-
Authored by Henning Becker
jaxlib updated its upstream XLA version, so this ~hack~ compatibility layer is not needed anymore. PiperOrigin-RevId: 623491067
-
Authored by Peter Hawkins
This was added as part of a prototype, but no users appear to exist anymore, and it was never a publicized API. PiperOrigin-RevId: 623488172
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623486944
-
Authored by TJ Xu
Imported from GitHub PR https://github.com/openxla/xla/pull/11182

When the source and target of a collective-permute instruction are located within the same node, we try to use cudaMemcpyPeer to send the data instead of invoking NCCL send and recv kernels. Because memcpy uses the GPU's copy engine instead of launching a kernel, it can fully saturate the bandwidth without occupying any SMs. Since a peer-to-peer memcpy needs the destination pointer on the other device, we first communicate the pointer values with the peer using NCCL calls and then use that pointer to invoke the memcpy.

Some perf comparison, sending a tensor of size bf16[512,24576]{1,0}:
- collective-permute: 445 us
- memcpy: 100 us

Copybara import of the project:

-- 73a2568520d411eb6f62b6383a3841cb0f8edf22 by TJ <tjx@nvidia.com>: optimize collective permute using memcpy
-- 1dd896f544852fa898e937af8c0b94e075db3ec2 by TJ <tjx@nvidia.com>: Remove nccl calls and use Async value
-- a2fbc7d29d3d8106753f36e2a477e0889da6cb58 by TJ <tjx@nvidia.com>: wrap async map and mutex with a class

Merging this change closes #11182

PiperOrigin-RevId: 623481864
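The "communicate the pointer values first, then memcpy" step, and the commit's mention of wrapping an async map and mutex in a class, can be sketched host-side without any CUDA. `PointerRendezvous` below is a hypothetical illustration, not the actual XLA class: each device publishes its buffer address under its device id, and a peer blocks until that address becomes available.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical rendezvous sketch: a mutex-guarded map of device id ->
// buffer address, with a condition variable so a peer can wait until the
// other side has published its pointer before issuing the copy.
class PointerRendezvous {
 public:
  // Called by the device that owns the destination buffer.
  void Publish(int device, std::uintptr_t addr) {
    std::lock_guard<std::mutex> lock(mu_);
    addrs_[device] = addr;
    cv_.notify_all();
  }

  // Called by the peer; blocks until the address for `device` is known.
  std::uintptr_t Await(int device) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return addrs_.count(device) > 0; });
    return addrs_[device];
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<int, std::uintptr_t> addrs_;
};
```

In the real implementation the awaited address would then be passed to a peer-to-peer copy; here the class only shows the synchronization shape.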
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623480882
-
Authored by zoranjovanovic-ns
Imported from GitHub PR https://github.com/openxla/xla/pull/11331

Copybara import of the project:

-- a4f80a3f32a10230bfdaa1dadc12f1f385883d8a by Zoran Jovanovic <zjovanov@amd.com>: Fix compilation issues related to aa08925e2dd1351c68e7838e074fe35908e3ea07 (Triton in XLA for ROCm - ir_emitter_triton related changes)

Merging this change closes #11331

PiperOrigin-RevId: 623476962
-
Authored by ekuznetsov139
Imported from GitHub PR https://github.com/openxla/xla/pull/11398

This contains two fixes:
* A compilation fix for stream_executor/rocm/rocm_executor.cc after it got broken by the commit https://github.com/openxla/xla/commit/3c5cf0ac2adbc0a651d3a2ec822e47927218b153
* A workaround that allows static_assert(false) to be compiled with compilers not implementing https://cplusplus.github.io/CWG/issues/2518.html

Copybara import of the project:

-- 5cdd25c3190140fec619073d25a20952fed2fd8b by Eugene Kuznetsov <eugene.kuznetsov@amd.com>: Build fix
-- b86d12ff02a2a6247a7f991f77dfdafb08a79703 by Eugene Kuznetsov <eugene.kuznetsov@amd.com>: Workaround for static_assert needed for some compilers (https://cplusplus.github.io/CWG/issues/2518.html)

Merging this change closes #11398

PiperOrigin-RevId: 623471087
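The static_assert issue referenced above (CWG 2518) is the classic problem that a plain `static_assert(false)` in an uninstantiated `if constexpr` branch is rejected by pre-CWG2518 compilers. The usual workaround, which may or may not match the exact form used in the PR, is to make the condition dependent on a template parameter so it is only evaluated when that branch is actually instantiated:

```cpp
#include <cassert>
#include <type_traits>

// Dependent-false helper: the value is always false, but because it depends
// on T, pre-CWG2518 compilers only diagnose the static_assert when the
// fallback branch is actually instantiated.
template <typename T>
inline constexpr bool always_false_v = false;

template <typename T>
int ByteWidth() {
  if constexpr (std::is_same_v<T, int>) {
    return sizeof(int);
  } else if constexpr (std::is_same_v<T, double>) {
    return sizeof(double);
  } else {
    // static_assert(false, ...) here would be ill-formed on older compilers
    // even when this branch is never taken; the dependent form is portable.
    static_assert(always_false_v<T>, "unsupported type");
    return 0;
  }
}
```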
-
Authored by Henning Becker
I'm trying to make XLA compatible with cuDNN 9 and stumbled upon a miscompile in the fMHA rewriter. It seems to be related to the causal mask pattern matcher; at least, forcing the causal-mask flag to true makes the broken test pass. I extracted a reproducer and added an (integration) test, which can be removed or converted into a proper unit test once this is fixed. In the meantime, I disable all fMHA features that require cuDNN 8.9.6+. PiperOrigin-RevId: 623470743
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623456155
-
Authored by Adrian Kuegel
PiperOrigin-RevId: 623454637
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623446648
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623442999
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623442112
-
Authored by mmakevic-amd
Imported from GitHub PR https://github.com/openxla/xla/pull/11307

This PR upstreams changes from [#2467](https://github.com/ROCm/tensorflow-upstream/pull/2467/commits/c3d966e559ffd13f6a473d1b57e606314a456ce8) and also upstreams changes to `gpu_layout_assignment.cc` introduced [here](https://github.com/ROCm/tensorflow-upstream/commit/b34b1eb35c095a16208356913a956256cc419e99) and [here](https://github.com/ROCm/tensorflow-upstream/commit/dfb7dd0b2f7e6c6bbcd3f1dbc9e299be9554b4a7).

Copybara import of the project:

-- ceaa8b0ab3dfda7ad93a7c57f5c26e6b31365a80 by mmakevic <Milica.Makevic@amd.com>: adding env variable to control rocm_dnn workspace limit
-- b7661ef811dfff68e03b4fe4306c1f63ffd1e2ee by mmakevic <Milica.Makevic@amd.com>: Changes to use NHWC for FP16 ops on XLA path for ROCm 5.0 and MI100/200; fix double =
-- cfeb2dec629eac32d5afc8256f2c684bbe81fe75 by mmakevic <Milica.Makevic@amd.com>: Check if rocm or cuda are used during runtime

Merging this change closes #11307

PiperOrigin-RevId: 623438181
-
Authored by Adrian Kuegel
We used to delete subcomputations that become dead due to the removal of their calling computation before deleting the calling computation itself. This is asking for trouble, e.g. regarding cleanup routines. PiperOrigin-RevId: 623434223
-
Authored by A. Unique TensorFlower
[XLA] Add argument and return shardings to the while body and condition based on the result shardings, if present. This is needed in case propagation runs before this conversion, in which case the best indicator for the argument and return shardings is the shardings of the while results. PiperOrigin-RevId: 623433069
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623431068
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623429617
-
Authored by Oleg Shyshkov
This feature is under development and will likely remain unstable until we finish work on SymbolicTileAnalysis. PiperOrigin-RevId: 623428765
-
Authored by Johannes Reifferscheid
PiperOrigin-RevId: 623424640
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623421779
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623421390
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420947
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420750
-
Authored by A. Unique TensorFlower
PiperOrigin-RevId: 623420486
-
Authored by Quentin Khan
PiperOrigin-RevId: 623416443
-