v0.8.0
ab9c400b · Bump version to 0.8.0 · Sep 13, 2016
Version 0.8

This is a larger release and the first update since our
[publication](http://dx.doi.org/10.1371/journal.pcbi.1004873).

CNVkit now runs under Python 3 as well as 2.7. (#3, #101; thanks @mpschr)

File format changes:

- New "depth" column in .cnn, .cnr, .cns
- In .cns, "weight" is the sum, not mean, of bin-level weights within the segment
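
To make the change in the .cns "weight" column concrete, here is a minimal sketch that computes a segment weight both ways. The bin weights and function names are hypothetical and chosen for illustration; this is not CNVkit's own code.

```python
# Hypothetical bin-level weights for the bins that fall inside one segment.
bin_weights = [0.5, 0.75, 0.25, 1.0]


def segment_weight_mean(weights):
    """Mean of bin-level weights (the convention this release moves away from)."""
    return sum(weights) / len(weights)


def segment_weight_sum(weights):
    """Sum of bin-level weights (the convention described for v0.8 .cns files)."""
    return sum(weights)


old_weight = segment_weight_mean(bin_weights)
new_weight = segment_weight_sum(bin_weights)
print(old_weight, new_weight)
```

With the sum, a segment's weight grows with the number of bins it covers, so weights from pre-v0.8 and v0.8 .cns files are not directly comparable.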

New script `cnn_updater.py` can be used to add the "depth" column to existing
.cnn, .cnr and .cns files. However, most CNVkit commands should still work with
pre-v0.8 files without using this script first. For best results, rebuild the
.cnr and .cns for an ongoing study using the existing targetcoverage,
antitargetcoverage and reference .cnn files.

Algorithmic changes:

- `reference`, `gender`, `call`, `diagram`, `export`: Gender, or chromosomal sex,
  is now inferred with a statistical test instead of a fixed threshold,
  significantly improving the inferences on noisy or aneuploid samples. (#116)
- `reference`, `fix`, `call`: Center log2 values by median of chromosome
  medians, by default. (#114)
- `reference`, `metrics`, `segmetrics`: Improve the calculation of biweight
  location and biweight midvariance (now in descriptives.py).
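
The "median of chromosome medians" centering can be sketched as follows. This is an illustration on toy data using only the standard library; the function name and the input layout (a list of `(chromosome, log2)` pairs) are assumptions, and CNVkit's own implementation may differ, for example in which chromosomes it includes.

```python
from statistics import median


def center_by_chromosome_medians(bins):
    """Shift log2 values so the median of per-chromosome medians is zero.

    `bins` is a list of (chromosome, log2) pairs (a hypothetical layout).
    Returns the centered list and the shift that was subtracted.
    """
    by_chrom = {}
    for chrom, log2 in bins:
        by_chrom.setdefault(chrom, []).append(log2)
    # One median per chromosome, then the median of those medians.
    chrom_medians = [median(values) for values in by_chrom.values()]
    shift = median(chrom_medians)
    centered = [(chrom, log2 - shift) for chrom, log2 in bins]
    return centered, shift


toy_bins = [
    ("chr1", 0.25), ("chr1", 0.5), ("chr1", 0.75),
    ("chr2", -0.5), ("chr2", 0.0), ("chr2", 0.5),
    ("chr3", 1.0), ("chr3", 1.0), ("chr3", 1.5),
]
centered, shift = center_by_chromosome_medians(toy_bins)
print(shift)
```

Taking a median per chromosome first gives each chromosome one vote, so a single large gained or lost chromosome has less influence on the center than it would with a median over all bins.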

These deprecated components (since 0.7.x) have been removed:

- Commands `rescale` and `loh` -- use `call` and `scatter`, respectively, instead
- Some options in `export bed` and `export theta` -- use `call` first instead
- Script `genome2access.py` -- use `cnvkit.py access` instead

Updated commands:

`batch`:

- New option --method, with choices "hybrid" (default), "wgs", "amplicon", to
  simplify/streamline usage with whole-genome or amplicon sequencing protocols.
  See documentation for details; in short, "wgs" and "amplicon" do not use
  antitargets or the edge/density bias correction; "wgs" by default uses the
  sequencing-accessible genome as the targets, and uses a more stringent
  significance threshold for segmentation.
- Hide/deprecate --split option; it's always on now. To ensure bin coordinates
  do not change between `batch` runs (they generally won't anyway), use the
  -r/--reference option instead of specifying -t and -a in `batch`.
- Add --drop-low-coverage option, which is passed to `segment` internally.
- The -p/--processes option is also passed to `coverage` and `segment`
  internally (see below).

`antitarget`:

- Increase the default average bin size from 100kb to 200kb.

`coverage`:

- Parallelize coverage calculation over BED rows. The number of threads can be
  specified with the `-p` option. (#121; thanks @brentp)
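
As a rough illustration of parallelizing over BED rows, the sketch below maps a toy per-row counting function over a worker pool from `concurrent.futures` (the module that the `futures` package backports to Python 2.7). The function names, the tuple layout, and the use of a thread pool are assumptions for this example, not CNVkit's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def count_overlaps(row, reads):
    """Count reads overlapping one BED row, using half-open coordinates."""
    chrom, start, end = row
    n = sum(
        1
        for r_chrom, r_start, r_end in reads
        if r_chrom == chrom and r_start < end and r_end > start
    )
    return (chrom, start, end, n)


def parallel_coverage(bed_rows, reads, processes=2):
    """Process each BED row in a worker; results keep the input row order."""
    with ThreadPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(lambda row: count_overlaps(row, reads), bed_rows))


# Toy inputs: (chromosome, start, end) tuples.
toy_rows = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
toy_reads = [("chr1", 50, 120), ("chr1", 150, 180), ("chr2", 90, 140)]
result = parallel_coverage(toy_rows, toy_reads)
print(result)
```

With `concurrent.futures`, an exception raised inside a worker is re-raised in the caller when that worker's result is consumed, so failures surface at the top level instead of being lost.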

`segment`:

- Parallelize CBS and Haar segmentation methods across chromosomes. (#123, #125;
  thanks @brentp)

`call`:

- New --filter option, with choices 'cn', 'ampdel', 'ci', 'sem' implemented.
- With VCF b-allele frequencies (`-v`, 'baf'), always calculate the
  allele-specific integer copy numbers 'cn1' and 'cn2' so that 'cn1' is the
  larger one. BAF mirror direction stays majority-rules. (#105; thanks @mpschr)
- If b-allele frequencies are used and total copy number is zero, report allelic
  copy numbers as 0, not NaN.
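
The two allelic copy number rules above (the larger value goes in 'cn1'; zero total copy number yields 0, not NaN) can be sketched like this. The rounding scheme is an assumption made for illustration, and the function name is hypothetical; CNVkit's own calculation may differ in detail.

```python
import math


def allelic_copy_numbers(total_cn, baf):
    """Split an integer total copy number into (cn1, cn2) with cn1 >= cn2.

    Assumption for this sketch: one allele's copy number is the total
    multiplied by the b-allele frequency, rounded to the nearest integer.
    """
    if total_cn == 0:
        # Nothing to split: report zeros rather than NaN, whatever the BAF is.
        return 0, 0
    one_allele = int(round(total_cn * baf))
    other_allele = total_cn - one_allele
    # Order the pair so that cn1 is always the larger value.
    return max(one_allele, other_allele), min(one_allele, other_allele)


print(allelic_copy_numbers(3, 0.33))
print(allelic_copy_numbers(0, math.nan))
```

Sorting the pair at the end keeps 'cn1' as the larger value whether the BAF is above or below 0.5.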

`scatter`:

- Add --title option.
- Allow selecting and labeling gene(s) with only segments as input.

`heatmap`, `scatter`:

- Allow saving plots in any image file format supported by matplotlib, not just
  PDF. The file format is determined by the output filename's extension, e.g.
  'png' saves in PNG format -- making it easier to integrate CNVkit plots with
  HTML reports. (#120; thanks @chapmanb)

`diagram`:

- Add -g/--gender option to specify sample's known gender.

`gainloss`:

- Make output tables more consistent across options. Show individual gene names
  (rather than all genes grouped within a segment in 1 row); don't show rows
  with no gene name; report the segment probe count instead of number of probes
  within the gene; show any extra columns present in the input .cns file. (#107,
  #108; thanks @mpschr)

`gender`:

- Show column headers and Y-chromosome log2 values in the output table.

`segmetrics`:

- Add stats options for mean, median, mode
- Add MSE, SEM stats as options
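
For orientation, the sketch below computes several of these per-segment statistics from the log2 values of the bins in one segment, using only the standard library. The definitions are the textbook ones and are assumptions here: SEM is the sample standard deviation divided by the square root of the bin count, and MSE is the mean squared deviation of bins from the segment mean. CNVkit's definitions may differ, and the mode is omitted.

```python
from math import sqrt
from statistics import mean, median, stdev


def segment_stats(log2_values):
    """Summary statistics for the bin-level log2 values of one segment."""
    seg_mean = mean(log2_values)
    return {
        "mean": seg_mean,
        "median": median(log2_values),
        # Standard error of the mean, from the sample standard deviation.
        "sem": stdev(log2_values) / sqrt(len(log2_values)),
        # Mean squared deviation of bins from the segment mean.
        "mse": mean([(x - seg_mean) ** 2 for x in log2_values]),
    }


stats = segment_stats([0.1, 0.3, 0.2, 0.4])
print(stats)
```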

`metrics`, `segmetrics`:

- Add --drop-low-coverage option (like in `segment` and `gainloss`)

Internals:

- New sub-package tabio: a more robust I/O framework unifying support for tabular
  formats, including CNVkit's .cnn/.cnr/.cns, BED, SEG, VCF, GATK/Picard
  interval list, and text coordinates (chr:start:end).
  Base class GenomicArray and its derived classes CopyNumArray and VariantArray
  do not implement their own I/O, but rather are instantiated via tabio.
  The "import-" commands use this as well.
- Removed rary.RegionArray; all functionality is now in tabio and GenomicArray.
- New module "descriptives.py" implements descriptive statistics on plain numpy
  arrays or pandas Series instances, independent of CNVkit.
- Better testing on Travis, covering Python 2.7, 3.4 and 3.5, on both Linux and
  OS X (thanks @kyleabeauchamp, @rmcgibbo, and @mpharrigan; #110)
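
As an example of the kind of statistic `descriptives.py` is described as providing, here is a plain-Python sketch of Tukey's biweight location. It follows the textbook definition, iterated from the median with a tuning constant of 6 and a fixed MAD scale. These choices are assumptions, and CNVkit's implementation (which works on numpy arrays or pandas Series) may differ.

```python
from statistics import median


def biweight_location(values, c=6.0, max_iter=50, tol=1e-9):
    """Robust center of `values`: points far from the center get zero weight."""
    loc = median(values)
    mad = median([abs(x - loc) for x in values])
    if mad == 0:
        # At least half the values are identical; the median is the answer.
        return loc
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for x in values:
            u = (x - loc) / (c * mad)
            if abs(u) < 1:
                w = (1 - u * u) ** 2  # biweight: smoothly decays to zero
                num += (x - loc) * w
                den += w
        if den == 0:
            break
        step = num / den
        loc += step
        if abs(step) < tol:
            break
    return loc


# Five well-behaved values plus one gross outlier.
robust_center = biweight_location([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])
print(robust_center)
```

In this example the outlier at 10.0 lies more than `c` MADs from the center, so it receives zero weight and the estimate stays close to 1.0, whereas the arithmetic mean is pulled up to 2.5.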

Bug fixes:

- `batch`: Errors in parallel processes will immediately be raised as exceptions
  at the top level, rather than dying silently. Previously, no error would occur
  until a missing output file was needed later in the pipeline. (#55)
- `segment`:

    - Skip possible R warning text when parsing CBS output (#106) and run
      Rscript with the --vanilla option (#112; thanks @jsmedmar). Non-isolated
      R processes were prone to add various warning messages to the expected SEG
      output, which could crash the "segment" command for some users.
    - Handle zero-weight bins better (#128; thanks @chapmanb).

- `scatter`:

    - Handle selected segments with an empty gene name (#104; thanks @mpschr).
    - Don't crash on zero-length GenomicArray/CopyNumArray inputs.

- VCF parsing (now within tabio) improved:

    - More robust to missing genotype (GT) & depth (DP) fields (#102)
    - Handle VCFs from MuTect2 (#122)

- `export theta`: don't crash when SNP VCF is a single, unpaired sample, or if
  segmented input (.cns) is empty.
- `heatmap`: Avoid a possible crash if a sample is missing a chromosome.

Packaging:

- Universal wheels are enabled for installation with pip (via setup.cfg).

New & updated dependencies:

- futures
- futurize
- numpy raised to version 1.9
- pandas raised to version 0.18.1
- pysam version 0.9.1.1 is specifically excluded