[xla] hlo_computation: compact instructions' vector on Cleanup()
tl;dr: this gives a 1.26x compilation time speedup for a large, dense model in XLA:GPU.

The largest perf leaf seen in profiles of a large, dense model is related to computing the post order. Surprisingly, it is not the DFS itself that is most expensive; rather, most of the time is spent scanning through HloComputation::Instructions() to identify DFS roots.

The reason this scan becomes expensive as instructions are removed is that the vector holding HloInstructionInfo (introduced in cl/600130708 || https://github.com/openxla/xla/commit/247280ab727) is not shrunk as it flows through the pipeline, so we have to walk through many deleted "tombstone" entries. Here is the histogram of # of tombstones encountered during post order computations for this model:

```
[        1 - 1,536,345) ****************************** (1,300,248)
[1,536,345 - 3,072,690)  (2)
[3,072,690 - 4,609,034)  (364)
[4,609,034 - 6,145,378)  (10,443)
```

To ameliorate this, this CL shrinks the vector periodically, for now only between passes. This is done by running compaction on the vector during HloComputation::Cleanup(), which is called after every pass. The cost of compaction is made proportional to the number of deleted entries by swapping, if needed, each tombstone with the rightmost (within the vector) non-deleted entry.

This brings the number of seen tombstones down significantly:

```
[      1 -   327,699) ****************************** (937,541)
[327,699 -   655,396)  (308)
[655,396 -   983,094)  (0)
[983,094 - 1,310,792)  (1)
```

Note: we could further improve compaction by calling Cleanup() from within some passes, instead of just between passes. However, that would not yield a significant gain; at least for this model, scanning the instructions' vector now takes ~1% of total time (vs. ~17% before).

FUTURE_COPYBARA_INTEGRATE_REVIEW=https://github.com/openxla/xla/pull/10503 from pearu:pearu/log1p d35cef4f5fa09482c49edfee709e86c5ca29adde
PiperOrigin-RevId: 619057964
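The swap-with-rightmost compaction described above can be sketched as follows. This is a minimal illustration, not the code in hlo_computation.cc: `Slot`, `CompactSlots`, and the explicit list of tombstone indices are hypothetical stand-ins, and the sketch assumes the caller already knows which positions were deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction entry; `live == false` marks a
// deleted "tombstone" slot.
struct Slot {
  int id;
  bool live;
};

// Removes all tombstones from `slots`, given the indices of the tombstones.
// Every pop_back() eliminates exactly one tombstone, so the work done is
// proportional to the number of deleted entries, not to the vector size.
// Assumption: `tombstones` lists every dead slot in `slots`.
void CompactSlots(std::vector<Slot>& slots,
                  const std::vector<std::size_t>& tombstones) {
  const std::size_t original_size = slots.size();
  for (std::size_t i : tombstones) {
    assert(i < original_size);
    // Drop tombstones sitting at the right end; they need no swap partner.
    while (!slots.empty() && !slots.back().live) {
      slots.pop_back();
    }
    // Slot i was already trimmed off the end, or was already refilled.
    if (i >= slots.size() || slots[i].live) continue;
    // Fill the hole with the rightmost live entry, then shrink the vector.
    slots[i] = slots.back();
    slots.pop_back();
  }
}
```

Note that this approach does not preserve the relative order of the surviving entries, and any structure that records an entry's position in the vector would have to be updated whenever an entry is moved.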
- tensorflow/compiler/mlir/quantization/stablehlo/BUILD (1 addition, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/passes.td (7 additions, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/quantize_composite_functions.cc (5 additions, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/passes/xla_call_module_to_call.cc (83 additions, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/python/integration_test/quantize_model_test.py (120 additions, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/python/integration_test/quantize_model_test_base.py (50 additions, 0 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/tests/components/post_calibration_component.mlir (2 additions, 7 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/quantize_composite_functions.mlir (3 additions, 3 deletions)
- tensorflow/compiler/mlir/quantization/stablehlo/tests/passes/xla_call_module_to_call.mlir (23 additions, 0 deletions)
- third_party/xla/xla/hlo/ir/hlo_computation.cc (57 additions, 2 deletions)
- third_party/xla/xla/hlo/ir/hlo_computation.h (6 additions, 9 deletions)
- third_party/xla/xla/python/xla_client.py (1 addition, 1 deletion)
- third_party/xla/xla/service/elemental_ir_emitter.cc (55 additions, 11 deletions)
- third_party/xla/xla/tests/BUILD (23 additions, 0 deletions)
- third_party/xla/xla/tests/complex_unary_op_samples.h (1448 additions, 0 deletions)
- third_party/xla/xla/tests/complex_unary_op_test.cc (101 additions, 0 deletions)
- third_party/xla/xla/tests/generate_complex_unary_op_samples.py (231 additions, 0 deletions)