This project is mirrored from https://github.com/hpcaitech/ColossalAI.git. Pull mirror updated:
  1. Sep 06, 2023
  2. Sep 05, 2023
  3. Sep 04, 2023
    • Merge branch 'main' into feature/shardformer · a39a5c66
      Hongxin Liu authored
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      make compatible with pytree.
      
      * [shardformer] disable tp
      
      disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
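The commit above moves the BERT finetune example onto HybridParallelPlugin, which combines tensor parallelism, pipeline parallelism, and ZeRO in one booster plugin. A minimal sketch of that setup, assuming the plugin's 2023-era API; the parallel sizes, micro-batch size, and learning rate here are illustrative, not the example's actual values:

```python
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import BertForSequenceClassification

# Launch the distributed environment (rank/world size come from torchrun).
# The config dict argument reflects the 2023-era signature.
colossalai.launch_from_torch(config={})

# Tensor parallelism x pipeline parallelism x ZeRO-1, the pp+tp+zero1
# combination exercised by the tests in this commit. Sizes are illustrative.
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, zero_stage=1,
                              microbatch_size=4)
booster = Booster(plugin=plugin)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# The criterion maps (model outputs, inputs) to a scalar loss; HuggingFace
# models already return .loss when labels are provided.
def criterion(outputs, inputs):
    return outputs.loss

# Booster wraps everything so the same loop works under any plugin.
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```

With pp_size > 1 the training step goes through booster.execute_pipeline(...) rather than a plain forward/backward, which is presumably why the commit also touches the scheduler, num_microbatches, and microbatch_size settings.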
    • [shardformer] Pytree fix (#4533) · 24c07687
      Jianghai authored
      * pytree test
      
      * test bert
      
      * test bert
      
      * test bert
      
      * revise
      
      * add register
      
      * add register
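The pytree fix matters because the pipeline schedule flattens and rebuilds model outputs when shuttling micro-batches between stages, and HuggingFace ModelOutput objects are not pytree nodes by default. A minimal sketch of the idea (not the exact patch; the output class and helper names are illustrative), using torch's private pytree registry as it existed in 2023:

```python
import torch.utils._pytree as pytree
from transformers.modeling_outputs import SequenceClassifierOutput

def _flatten(output):
    # Split a ModelOutput into its tensor leaves plus the keys needed
    # to reconstruct it (ModelOutput behaves like an ordered dict).
    return list(output.values()), list(output.keys())

def _unflatten(values, keys):
    return SequenceClassifierOutput(**dict(zip(keys, values)))

# Private API in torch 1.13/2.0; later torch versions expose a public
# register_pytree_node instead.
pytree._register_pytree_node(SequenceClassifierOutput, _flatten, _unflatten)
```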
    • Merge pull request #4542 from hpcaitech/chatglm · aaeb520c
      yingliu-hpc authored
      [coati] Add chatglm in coati
    • [doc] add llama2 benchmark (#4604) · 8d7b0229
      binmakeswell authored
      * [doc] add llama2 benchmark
      
      * [doc] add llama2 benchmark
    • [DOC] hotfix/llama2news (#4595) · 7a978eb3
      binmakeswell authored
      * [doc] add llama2 news
      
      * [doc] add llama2 news
      
      * [doc] add llama2 news
    • [checkpointio] optimize zero optim checkpoint io (#4591) · 63ecafb1
      Hongxin Liu authored
      * [zero] update checkpoint io to save memory
      
      * [checkpointio] add device map to save memory
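Since ZeRO shards optimizer state across data-parallel ranks, a naive checkpoint first gathers every shard onto one rank; the memory savings this commit describes come from avoiding that. A hedged sketch of the user-facing side through Booster's sharded checkpoint IO, where the plugin choice, paths, and size_per_shard value are illustrative:

```python
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # 2023-era signature

# ZeRO-1: optimizer state is sharded across data-parallel ranks.
plugin = LowLevelZeroPlugin(stage=1)
booster = Booster(plugin=plugin)

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer, *_ = booster.boost(model, optimizer)

# shard=True writes the optimizer state as multiple files capped at
# size_per_shard (MB) instead of one gathered state dict.
booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)
booster.load_optimizer(optimizer, "ckpt/optim")
```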
  4. Sep 01, 2023
  5. Aug 31, 2023
  6. Aug 30, 2023
  7. Aug 29, 2023
  8. Aug 28, 2023
    • [example] add llama2 example (#4527) · 0b00def8
      Hongxin Liu authored
      * [example] transfer llama-1 example
      
      * [example] fit llama-2
      
      * [example] refactor scripts folder
      
      * [example] fit new gemini plugin
      
      * [cli] fix multinode runner
      
      * [example] fit gemini optim checkpoint
      
      * [example] refactor scripts
      
      * [example] update requirements
      
      * [example] update requirements
      
      * [example] rename llama to llama2
      
      * [example] update readme and pretrain script
      
      * [example] refactor scripts
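The llama2 example is fitted to the new Gemini plugin mentioned in the commit, which places parameters and optimizer state across GPU and CPU memory in chunks. A rough sketch of the example's skeleton, assuming the 2023-era API; the model config, optimizer hyperparameters, and checkpoint path are placeholders:

```python
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})  # 2023-era signature

# Gemini manages chunk-based parameter placement, which is what lets the
# example pretrain Llama-2-scale models on limited GPU memory.
# initial_scale seeds the fp16 loss scaler.
plugin = GeminiPlugin(initial_scale=2**16)
booster = Booster(plugin=plugin)

config = LlamaConfig()  # placeholder; the example loads real Llama 2 configs
model = LlamaForCausalLM(config)
# Gemini pairs with ColossalAI's HybridAdam rather than torch.optim.AdamW.
optimizer = HybridAdam(model.parameters(), lr=3e-4, weight_decay=0.1)

model, optimizer, *_ = booster.boost(model, optimizer)
booster.save_model(model, "ckpt/llama2", shard=True)  # gemini checkpointing
```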