87a0c6ed · Bump version to 0.9.0 · August 17, 2017
Version 0.9.0
=============

In addition to bug fixes, documentation updates, and usability improvements,
this release includes some larger changes:

- The off-target bins in .cnn and .cnr files are now assigned the label
  "Antitarget" instead of "Background" in the "gene" column. The label
  "Background" in existing files will still be handled the same way, but new
  output files generated with CNVkit 0.9.0 and later will use the "Antitarget"
  label -- so, earlier versions of CNVkit may have problems with files produced
  by CNVkit 0.9.0. Some command line options and API keyword arguments similarly
  replace "background" with "antitarget", with shims in place for compatibility
  with existing scripts. (#171)

- The sub-packages 'genome' and 'tabio' are now in a separate top-level package
  'skgenome', still included in the CNVkit distribution. (See "Python API"
  below.) This does not affect the command-line usage of CNVkit, but clears the
  way to extract a scikit-genome package that can be installed and used
  separately from CNVkit for computing with genomic intervals.
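
For scripts that must read both older and newer .cnn/.cnr files, the label change above could be handled along these lines. This is a minimal sketch, not CNVkit code: the helper names `normalize_gene_label` and `is_off_target` are hypothetical, and only the two label strings come from these notes.

```python
# Hypothetical helpers, not part of CNVkit: treat the legacy "Background"
# label the same as the new "Antitarget" label in the "gene" column.
ANTITARGET_LABEL = "Antitarget"
LEGACY_LABELS = {"Background"}


def normalize_gene_label(gene):
    """Return "Antitarget" for a legacy off-target label, else the input."""
    return ANTITARGET_LABEL if gene in LEGACY_LABELS else gene


def is_off_target(gene):
    """True if a bin's "gene" value marks it as an off-target bin."""
    return normalize_gene_label(gene) == ANTITARGET_LABEL


genes = ["BRCA1", "Background", "Antitarget", "TP53"]
normalized = [normalize_gene_label(g) for g in genes]
```

Normalizing on read, rather than rewriting files, keeps existing files untouched while letting downstream code test for a single label.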

Documentation
-------------

- Link to example VCF in the test suite
- Describe the 'breaks' command's output columns (#220)
- Show an example customizing a plot with pyplot (#196)

Dependencies
------------

- pysam: raise minimum to 0.10; support new version 0.11.2.1 (#218; thanks
  @chapmanb)
- pandas: support new version 0.20.1 (#215)
- numpy: support new version 1.13 (#235, #238)

Commands
--------

`batch`:

- Log the CNVkit version number at the start of the run
- Print a message at the end if no tumor/test samples specified. (#214)
- Clarify error messages for bad option combinations (#216)
- Remove the deprecated, hidden option `--split`. It was a shim in the 0.8
  series to support old scripts.

`reference`:

- Ensure the inferred chromosomal sex matches between the targets and
  antitargets for the same sample. If the inferences do not match, prefer
  antitargets. (#234, #237)

`fix`:

- Warn & don't reweight bins if most antitargets have no/low coverage. This
  avoids a variety of surprising downstream problems when the input was
  specified as hybrid capture (the default), but is actually from
  targeted amplicon sequencing, or otherwise has no reads mapped to most
  off-target bins.
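
The check described above might be sketched as follows. This is an illustration only, not the actual `fix` implementation: the function name `should_skip_reweighting` and both thresholds (`low_log2`, `max_low_fraction`) are assumptions chosen for the example, since these notes do not state how "most" or "low" are defined.

```python
import math


def should_skip_reweighting(antitarget_log2, low_log2=-5.0, max_low_fraction=0.5):
    """Decide whether antitarget coverage is too sparse to reweight bins.

    Hypothetical sketch: a bin counts as "no/low coverage" if its log2
    value is NaN or at most `low_log2`; reweighting is skipped when more
    than `max_low_fraction` of the antitarget bins are in that state.
    """
    if not antitarget_log2:
        # No antitarget bins at all: nothing to judge in this sketch.
        return False
    n_low = sum(1 for v in antitarget_log2 if math.isnan(v) or v <= low_log2)
    return n_low / len(antitarget_log2) > max_low_fraction


# Amplicon-like input: almost nothing maps off-target.
sparse = [-20.0, -20.0, -0.1, float("nan")]
# Hybrid-capture-like input: most off-target bins have usable coverage.
covered = [0.1, -0.2, -20.0]
```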

`segment`:

- Log the segmentation method and p-value/q-value threshold

`call`:

- Add option `--center-at`
- Let `--center` without an argument default to 'median'

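The two centering modes above can be illustrated with a small sketch. It assumes that `--center-at` subtracts a fixed value from every log2 ratio and that median centering subtracts the median; the function `center_log2` is hypothetical, not a CNVkit API.

```python
from statistics import median


def center_log2(log2_values, center_at=None):
    """Shift log2 ratios so the chosen center becomes 0.

    Hypothetical sketch: with `center_at` given, subtract that constant
    (the assumed behavior of --center-at); otherwise subtract the median
    (the assumed default of --center without an argument).
    """
    shift = median(log2_values) if center_at is None else center_at
    return [v - shift for v in log2_values]


by_median = center_log2([0.5, 1.0, 1.5])
by_constant = center_log2([0.5, 1.0], center_at=0.5)
```
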
`diagram`:

- New option `--title` to add a custom title to the top of the generated figure
  (#239; thanks @micknudsen)

`export vcf`:

- When given a .cnr file corresponding to the usual segmented input file (.cns),
  emit the CIPOS and CIEND tags in the generated VCF. These indicate the
  "fuzzy" coordinates of segment breakpoints. Here, the ranges are simply the
  widths of the underlying bins adjacent to each segment breakpoint. These tags
  can help meta-methods aggregate/harmonize CNVkit's calls with those of other
  structural variant callers. (#72)
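
One possible reading of "the widths of the underlying bins adjacent to each segment breakpoint" is sketched below. The function `breakpoint_confidence` and its conventions (half-open coordinates, a width of 0 when no neighboring bin exists) are assumptions for illustration; the actual `export vcf` code may compute the intervals differently.

```python
def bin_width(b):
    """Width of a (start, end) bin with half-open coordinates."""
    return b[1] - b[0]


def breakpoint_confidence(bins, seg_start, seg_end):
    """Return (CIPOS, CIEND) offsets for one segment on one chromosome.

    Hypothetical sketch: `bins` is a sorted list of (start, end) tuples.
    Each interval spans the bin just outside and the bin just inside the
    corresponding segment breakpoint.
    """
    inside = [b for b in bins if b[0] >= seg_start and b[1] <= seg_end]
    left = [b for b in bins if b[1] <= seg_start]
    right = [b for b in bins if b[0] >= seg_end]
    cipos = (-bin_width(left[-1]) if left else 0,
             bin_width(inside[0]) if inside else 0)
    ciend = (-bin_width(inside[-1]) if inside else 0,
             bin_width(right[0]) if right else 0)
    return cipos, ciend


bins = [(0, 100), (100, 250), (250, 400), (400, 600)]
cipos, ciend = breakpoint_confidence(bins, seg_start=100, seg_end=400)
info = "CIPOS={},{};CIEND={},{}".format(*cipos, *ciend)
```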

`import-picard`:

- Don't accept directory as an argument (was deprecated).
- Be a little more flexible in filenames accepted: instead of requiring input
  files to be named `*.targetcoverage.???` or `*.antitargetcoverage.???`, strip
  the full suffix and default to 'targetcoverage.cnn' output suffix, or
  'antitargetcoverage.cnn' if input filename contains 'antitarget'. Works the
  same for filenames following the earlier convention, but now pretty safe for
  amplicon targets with arbitrary filenames, and slightly less spooky.
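
The filename rule above could be sketched like this. The function `picard_output_name` is hypothetical, and the sketch assumes "the full suffix" means everything after the first dot in the base filename, so a sample name that itself contains a dot would be truncated under this assumption.

```python
import os


def picard_output_name(input_path):
    """Derive an output filename from a Picard coverage filename.

    Hypothetical sketch: strip everything after the first dot, then use
    'antitargetcoverage.cnn' if the input name contains 'antitarget',
    else 'targetcoverage.cnn'.
    """
    base = os.path.basename(input_path)
    sample = base.split(".", 1)[0]
    if "antitarget" in base:
        suffix = "antitargetcoverage.cnn"
    else:
        suffix = "targetcoverage.cnn"
    return "{}.{}".format(sample, suffix)


names = [
    picard_output_name("S1.targetcoverage.csv"),
    picard_output_name("/data/S1.antitargetcoverage.csv"),
    picard_output_name("S2.hs_metrics.txt"),
]
```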

Bug fixes
---------

- `antitarget`: Don't crash if -g/--access is not given (#207)
- `batch`: Don't crash in 'wgs' mode when given just targets (-t) without a
  FASTA reference genome sequence (-f)
- `call --filter ampdel`: Drop segments with copy number (`cn` field) between 0
  and 5, exclusive, as the documentation indicates. Previously, it was just
  merging adjacent segments with copy number 1--4, but not dropping them. (#222)
- `export cdt`: Match the CDT spec. Fix a regression in which columns could be
  swapped/misaligned versus the header. Add a dummy "EWEIGHT" row to ensure Java
  TreeView starts reading data from the correct line in the file.
- `export theta`: Don't crash on bins where reference is NaN. (#168)
- `metrics`, `descriptives`: Handle degenerate/trivial cases consistently. (#202)
- `segment`: Handle sample names that are integers with leading zeros (#213)
- `sex`: Don't crash if chrX and chrY are both missing (#236)
- VCF parsing (`call`, `scatter`, `segment`):
    - Safely handle small or empty VCF files that previously could trigger a
      crash during BAF calculation. Now, with an empty VCF an all-blank "baf"
      will be emitted. (#218, #224; thanks @chapmanb)
    - Improve handling of Mutect2 VCF files, somewhat. Mutect2 VCFs are still
      not recommended as input to CNVkit; try FreeBayes or GATK HaplotypeCaller
      instead. (#195)
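
To make the `call --filter ampdel` fix above concrete, here is a minimal sketch of the dropping step only. The function name `drop_non_ampdel` and the dictionary layout are assumptions for illustration; the real filter also merges adjacent segments, which is omitted here.

```python
def drop_non_ampdel(segments):
    """Keep only segments whose copy number is outside the open range (0, 5).

    Hypothetical sketch: each segment is a dict with an integer "cn"
    field. Segments with cn of 1 through 4 are dropped; cn == 0 (deep
    deletion) and cn >= 5 (amplification) are kept.
    """
    return [seg for seg in segments if not 0 < seg["cn"] < 5]


segments = [
    {"gene": "A", "cn": 0},
    {"gene": "B", "cn": 2},
    {"gene": "C", "cn": 4},
    {"gene": "D", "cn": 5},
    {"gene": "E", "cn": 12},
]
kept = drop_non_ampdel(segments)
```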

Python API
----------

Moved sub-packages 'genome' and 'tabio' to separate top-level package 'skgenome'
(#201). The top-level `cnvlib` API is mostly the same otherwise, but supporting
modules were refactored to decouple `skgenome` from `cnvlib` and remove
redundancies. In particular:

- Split module `cnvlib.core` into `skgenome.tabio` and `cnvlib.cmdutil`
- Remove GenomicArray static method `row2label` in favor of functions `to_label`
  and `from_label` in new module `skgenome.rangelabel`.
- The SEG writer in 'tabio' now replaces chromosome names with 1-based integer
  indices, per SEG spec/convention. The `export seg` command now uses this
  writer directly.
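
The chromosome renaming in the SEG writer might look roughly like the sketch below. It assumes indices are assigned in order of first appearance; the function `chromosome_indices` and the row layout are hypothetical and do not reflect the `skgenome.tabio` API.

```python
def chromosome_indices(chromosomes):
    """Map chromosome names to 1-based integers in order of first appearance.

    Hypothetical sketch of the renaming step only.
    """
    mapping = {}
    for chrom in chromosomes:
        if chrom not in mapping:
            mapping[chrom] = len(mapping) + 1
    return mapping


# Rows are (chromosome, start, end, log2) tuples in this illustration.
rows = [
    ("chr1", 100, 500, -0.3),
    ("chr1", 500, 900, 0.1),
    ("chrX", 50, 700, 0.8),
]
ids = chromosome_indices(row[0] for row in rows)
seg_rows = [(ids[chrom], start, end, value) for chrom, start, end, value in rows]
```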

Scripts
-------

- Remove the script `coverage_bin_size.py`, previously deprecated in favor of
  the `autobin` command.
- Add `skg_convert.py` to convert between tabular formats.
- Add `cnn_annotate.py` to replace the 'gene' field for each bin in a .cnn or
  .cnr file, given a gene annotation database like refFlat.txt. The need for
  this comes up occasionally when users notice at the end of an analysis that
  vendor-annotated targets are not the desired gene names.