This project is mirrored from https://github.com/hpcaitech/ColossalAI.git. Pull mirror updated:
  1. Sep 06, 2023
  2. Sep 05, 2023
  3. Sep 04, 2023
    • Merge branch 'main' into feature/shardformer · a39a5c66
      Hongxin Liu authored
    • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) · 0a94fcd3
      flybird11111 authored
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * [shardformer] zero1+pp and the corresponding tests (#4517)
      
      * pause
      
      * finish pp+zero1
      
      * Update test_shard_vit.py
      
      * [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
      
      * fix overlap bug and support bert, add overlap as an option in shardconfig
      
      * support overlap for chatglm and bloom
      
      * [shardformer] fix emerged bugs after updating transformers (#4526)
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] Add overlap support for gpt2 (#4535)
      
      * add overlap support for gpt2
      
      * remove unused code
      
      * remove unused code
      
      * [shardformer] support pp+tp+zero1 tests (#4531)
      
      * [shardformer] fix opt test hanging
      
      * fix
      
      * test
      
      * test
      
      * test
      
      * fix test
      
      * fix test
      
      * remove print
      
      * add fix
      
      * [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] pp+tp+zero1
      
      * [shardformer] fix submodule replacement bug when enabling pp (#4544)
      
      * [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
      
      * implement sharded optimizer saving
      
      * add more param info
      
      * finish implementation of sharded optimizer saving
      
      * fix bugs in optimizer sharded saving
      
      * add pp+zero test
      
      * param group loading
      
      * greedy loading of optimizer
      
      * fix bug when loading
      
      * implement optimizer sharded saving
      
      * add optimizer test & arrange checkpointIO utils
      
      * fix gemini sharding state_dict
      
      * add verbose option
      
      * add loading of master params
      
      * fix typehint
      
      * fix master/working mapping in fp16 amp
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] add bert finetune example
      
      * [shardformer] fix epoch change
      
      * [shardformer] broadcast add pp group
      
      * rebase feature/shardformer
      
      * update pipeline
      
      * [shardformer] fix
      
      * [shardformer] fix
      
      * [shardformer] bert finetune fix
      
      * [shardformer] add all_reduce operation to loss
      
      add all_reduce operation to loss
      
      * [shardformer] make compatible with pytree.
      
      make compatible with pytree.
      
      * [shardformer] disable tp
      
      disable tp
      
      * [shardformer] add 3d plugin to ci test
      
      * [shardformer] update num_microbatches to None
      
      * [shardformer] update microbatchsize
      
      * [shardformer] update assert
      
      * update scheduler
      
      * update scheduler
      
      ---------
      
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
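The commit above moves the BERT finetune example onto HybridParallelPlugin, which combines tensor parallelism, pipeline parallelism, and ZeRO in one booster plugin. A minimal sketch of that setup, assuming the plugin's 2023-era API; the parallel sizes, micro-batch size, and learning rate here are illustrative, not the example's actual values:

```python
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import BertForSequenceClassification

# Launch the distributed environment (rank/world size come from torchrun).
# The config dict argument reflects the 2023-era signature.
colossalai.launch_from_torch(config={})

# Tensor parallelism x pipeline parallelism x ZeRO-1, the pp+tp+zero1
# combination exercised by the tests in this commit. Sizes are illustrative.
plugin = HybridParallelPlugin(tp_size=2, pp_size=2, zero_stage=1,
                              microbatch_size=4)
booster = Booster(plugin=plugin)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# The criterion maps (model outputs, inputs) to a scalar loss; HuggingFace
# models already return .loss when labels are provided.
def criterion(outputs, inputs):
    return outputs.loss

# Booster wraps everything so the same loop works under any plugin.
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```

With pp_size > 1 the training step goes through booster.execute_pipeline(...) rather than a plain forward/backward, which is presumably why the commit also touches the scheduler, num_microbatches, and microbatch_size settings.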
    • [shardformer] Pytree fix (#4533) · 24c07687
      Jianghai authored
      * pytree test
      
      * test bert
      
      * test bert
      
      * test bert
      
      * revise
      
      * add register
      
      * add register
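The pytree fix matters because the pipeline schedule flattens and rebuilds model outputs when shuttling micro-batches between stages, and HuggingFace ModelOutput objects are not pytree nodes by default. A minimal sketch of the idea (not the exact patch; the output class and helper names are illustrative), using torch's private pytree registry as it existed in 2023:

```python
import torch.utils._pytree as pytree
from transformers.modeling_outputs import SequenceClassifierOutput

def _flatten(output):
    # Split a ModelOutput into its tensor leaves plus the keys needed
    # to reconstruct it (ModelOutput behaves like an ordered dict).
    return list(output.values()), list(output.keys())

def _unflatten(values, keys):
    return SequenceClassifierOutput(**dict(zip(keys, values)))

# Private API in torch 1.13/2.0; later torch versions expose a public
# register_pytree_node instead.
pytree._register_pytree_node(SequenceClassifierOutput, _flatten, _unflatten)
```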
    • Merge pull request #4542 from hpcaitech/chatglm · aaeb520c
      yingliu-hpc authored
      [coati] Add chatglm in coati
    • [doc] add llama2 benchmark (#4604) · 8d7b0229
      binmakeswell authored
      * [doc] add llama2 benchmark
      
      * [doc] add llama2 benchmark
    • [DOC] hotfix/llama2news (#4595) · 7a978eb3
      binmakeswell authored
      * [doc] add llama2 news
      
      * [doc] add llama2 news
      
      * [doc] add llama2 news
    • [checkpointio] optimize zero optim checkpoint io (#4591) · 63ecafb1
      Hongxin Liu authored
      * [zero] update checkpoint io to save memory
      
      * [checkpointio] add device map to save memory
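Since ZeRO shards optimizer state across data-parallel ranks, a naive checkpoint first gathers every shard onto one rank; the memory savings this commit describes come from avoiding that. A hedged sketch of the user-facing side through Booster's sharded checkpoint IO, where the plugin choice, paths, and size_per_shard value are illustrative:

```python
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # 2023-era signature

# ZeRO-1: optimizer state is sharded across data-parallel ranks.
plugin = LowLevelZeroPlugin(stage=1)
booster = Booster(plugin=plugin)

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer, *_ = booster.boost(model, optimizer)

# shard=True writes the optimizer state as multiple files capped at
# size_per_shard (MB) instead of one gathered state dict.
booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)
booster.load_optimizer(optimizer, "ckpt/optim")
```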
  4. Sep 01, 2023
  5. Aug 31, 2023
  6. Aug 30, 2023
  7. Aug 29, 2023
  8. Aug 28, 2023
    • [example] add llama2 example (#4527) · 0b00def8
      Hongxin Liu authored
      * [example] transfer llama-1 example
      
      * [example] fit llama-2
      
      * [example] refactor scripts folder
      
      * [example] fit new gemini plugin
      
      * [cli] fix multinode runner
      
      * [example] fit gemini optim checkpoint
      
      * [example] refactor scripts
      
      * [example] update requirements
      
      * [example] update requirements
      
      * [example] rename llama to llama2
      
      * [example] update readme and pretrain script
      
      * [example] refactor scripts
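The llama2 example is fitted to the new Gemini plugin mentioned in the commit, which places parameters and optimizer state across GPU and CPU memory in chunks. A rough sketch of the example's skeleton, assuming the 2023-era API; the model config, optimizer hyperparameters, and checkpoint path are placeholders:

```python
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})  # 2023-era signature

# Gemini manages chunk-based parameter placement, which is what lets the
# example pretrain Llama-2-scale models on limited GPU memory.
# initial_scale seeds the fp16 loss scaler.
plugin = GeminiPlugin(initial_scale=2**16)
booster = Booster(plugin=plugin)

config = LlamaConfig()  # placeholder; the example loads real Llama 2 configs
model = LlamaForCausalLM(config)
# Gemini pairs with ColossalAI's HybridAdam rather than torch.optim.AdamW.
optimizer = HybridAdam(model.parameters(), lr=3e-4, weight_decay=0.1)

model, optimizer, *_ = booster.boost(model, optimizer)
booster.save_model(model, "ckpt/llama2", shard=True)  # gemini checkpointing
```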