v0.9.1 · Tags · HPCSource / cnvkit

v0.9.1
b20dc5be · Bump version to 0.9.1 · Nov 09, 2017
Version 0.9.1

Highlights: Useful enhancements and changes to plotting and segmentation, and a
new script for single-exon CNV testing. Plus, bug fixes and usability
improvements to avoid unexpected errors. (#250, #255, #262, etc.)

Dependencies
------------

- Compatible with the most recent pandas version 0.21.0
  (#273, #274; thanks @chapmanb)
- R dependencies were reduced to simplify installation

Scripts
-------

- Renamed "cnn_*.py" to "cnv_*.py"
- New script "cnv_ztest.py" to detect single-bin (e.g. single exon) deep
  deletions and high-level amplifications.
- In "cnv_updater.py", rename "Background" (i.e. off-target) bins to
  "Antitarget", in addition to adding a "depth" column if it's missing.

Commands
--------

`autobin`:

- Raise the maximum target/antitarget bin sizes to 50kb/1Mb.

`fix`:

- Allow specifying sample_id via ``--sample-id``/``-i``, in case the input
  coverage filenames do not have the expected form
  "sample_id.targetcoverage.cnn" and "sample_id.antitargetcoverage.cnn".
  (#269; thanks @chapmanb)
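
The filename convention above can be illustrated with a minimal sketch. The
helper name `infer_sample_id` is hypothetical and is not part of cnvlib; it only
shows how an explicit ID could take precedence over one parsed from the
filename, and the fallback for unexpected names is an assumption.

```python
import os

# Suffixes taken from the filename convention described in the note above.
COVERAGE_SUFFIXES = (".targetcoverage.cnn", ".antitargetcoverage.cnn")


def infer_sample_id(filename, sample_id=None):
    """Return the explicit sample ID if given, else derive one from the filename.

    Hypothetical helper for illustration only; not the cnvlib implementation.
    """
    if sample_id:
        return sample_id
    base = os.path.basename(filename)
    for suffix in COVERAGE_SUFFIXES:
        if base.endswith(suffix):
            return base[:-len(suffix)]
    # Assumed fallback for unexpected names: strip only the final extension.
    return os.path.splitext(base)[0]
```

With a nonstandard name such as "run42.cnn", passing `sample_id="tumor1"`
returns "tumor1" regardless of what the filename would otherwise yield.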

`segment`:

- Process each chromosome arm separately (with 'cbs' and 'haar', but not
  'flasso'). Centromere locations are guessed from the largest gap between
  sequencing-accessible regions, and are not necessarily the true locations,
  although they do match fairly well on the human genome.
- Logging of dropped bins is streamlined somewhat.
- New method `-m none` to only calculate arm-level segment means (for testing
  and experimentation).
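
As a rough illustration of the centromere heuristic described above, the sketch
below picks the largest gap between sorted accessible intervals on one
chromosome. The function names and the (start, end) tuple layout are
assumptions made for this example; the actual code in CNVkit may differ.

```python
def guess_centromere(regions):
    """Return (gap_start, gap_end) of the largest gap between accessible regions.

    `regions` is a list of (start, end) tuples on a single chromosome.
    Returns None if there are fewer than two regions or no gap at all.
    """
    if len(regions) < 2:
        return None
    ordered = sorted(regions)
    best, best_size = None, 0
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap = start - prev_end
        if gap > best_size:
            best, best_size = (prev_end, start), gap
        # Track the furthest end seen so overlapping regions are handled.
        prev_end = max(prev_end, end)
    return best


def arm_of(position, centromere):
    """Label a position 'p' or 'q' relative to the midpoint of the guessed gap."""
    midpoint = (centromere[0] + centromere[1]) // 2
    return "p" if position < midpoint else "q"
```

Bins labeled 'p' and 'q' would then be segmented independently, so no segment
spans the guessed centromere.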

`scatter`:

- Highlight non-neutral segments from .call.cns. If segments have the columns
  'cn' and potentially also 'cn1' and 'cn2' (as added by the `call` command),
  use those fields to display copy number alterations, LOH and allelic imbalance
  with colorized segments (orange by default), and use gray for neutral
  segments. If a VCF is also given, the same is done for SNVs in the lower
  panel.  Otherwise, all segments are colorized as before. (#18, #157)
- New option `--by-bins` to display x-axis positions by sequential bin number on
  each chromosome, rather than genomic coordinates. This makes the plots much
  more useful with targeted amplicon sequencing data, or very small gene panels.
  (#63)
- Trend line (`--trend`) now accounts for bin weights, which generally results
  in a better fit.
- Improved interaction of -c and -g options:

    - Only apply the window margin (-w) if -g is used alone, or -c specifies a small
      chromosomal region with no genes.
    - Allow an empty gene list (-g '' or -g ',') to prevent highlighting and
      labeling of any genes / small non-genic "Selection" in the -c region.
    - If any gene in -g is not fully within the region specified by -c, name that
      gene and its coordinates in the error message.
    - If the -c region has size <=0, show a specific error message.
    - Handle NaN log2 values when calculating y-axis limits.
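
To make the idea behind `--by-bins` concrete, here is a small sketch that
replaces genomic coordinates with a sequential bin number within each
chromosome. The function name and the tuple layout are hypothetical, and the
input is assumed to be already sorted by position within each chromosome.

```python
from collections import defaultdict


def by_bin_positions(bins):
    """Map each bin to (chromosome, x), where x counts bins per chromosome.

    `bins` is an iterable of (chromosome, start, end) tuples, assumed to be
    sorted by position within each chromosome.
    """
    counters = defaultdict(int)
    positions = []
    for chrom, _start, _end in bins:
        # x is 0-based and restarts on every chromosome.
        positions.append((chrom, counters[chrom]))
        counters[chrom] += 1
    return positions
```

Because x depends only on bin order, widely separated amplicons are drawn
evenly spaced instead of collapsing into a few points on a genomic axis.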

`heatmap`:

- Incorporate the `--by-bins` argument to match `scatter`. (#63)
- Warn if selected region contains no data for a sample. This helps troubleshoot
  if a chromosome name was mis-specified on the command line. (#268)

`export seg`:

- Change column headers to match DNAcopy output. The column headers generally
  don't matter in the SEG format, but the DNAcopy dataframe is considered the
  canonical form.
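
For reference, the segment table produced by DNAcopy uses the column names
ID, chrom, loc.start, loc.end, num.mark and seg.mean. The sketch below writes
segments under those headers. The function name, the input dictionary keys, and
passing coordinates through unchanged are assumptions for illustration, not the
actual `export seg` code.

```python
# Column names of the DNAcopy segment data frame.
SEG_COLUMNS = ("ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean")


def segments_to_seg_text(sample_id, segments):
    """Render segments as tab-separated SEG text with DNAcopy-style headers.

    `segments` is a list of dicts with the assumed keys
    'chromosome', 'start', 'end', 'probes' and 'log2'.
    """
    lines = ["\t".join(SEG_COLUMNS)]
    for seg in segments:
        row = (sample_id, seg["chromosome"], seg["start"], seg["end"],
               seg["probes"], seg["log2"])
        lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines) + "\n"
```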

Python API
----------

- cnvlib.do_segment -- new keyword argument min_weight to drop bins with
  'weight' below the specified value. If not used, then only bins with weight 0
  will be dropped. This feature is not recommended for normal usage and is not
  available on the command line.
- cnvlib.do_scatter -- Remove deprecated keyword argument 'background_marker' in
  favor of 'antitarget_marker', corresponding to `scatter` options deprecated in
  v0.9.0.
- cnvlib.cnary.CopyNumArray: Add method 'smoothed', which calculates the
  trendline displayed by the `scatter` command.
- skgenome.tabio: Add read support for samtools 'dict' format, which resembles the
  plain-text SAM header and can contain chromosome names and sizes.
- skgenome.gary.GenomicArray: Add magic methods __bool__ (Py3) and __nonzero__
  (Py2) to ensure an empty GenomicArray, i.e. 0 rows, is treated as false-ish on
  both Python 2.7 and 3.x.
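
The last item can be illustrated with a minimal sketch. `TinyArray` is a
hypothetical stand-in, not skgenome's GenomicArray. It shows why both magic
methods are defined: Python 3 consults `__bool__` while Python 2 consults
`__nonzero__`, and an object defining neither (nor `__len__`) is always truthy.

```python
class TinyArray(object):
    """Hypothetical table-like wrapper; illustration only."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __bool__(self):
        # Python 3 truth test: an array with 0 rows is falsy.
        return len(self.rows) > 0

    # Python 2 looks up __nonzero__ instead, so alias it to the same method.
    __nonzero__ = __bool__


class NoTruthMethods(object):
    """Without __bool__/__nonzero__/__len__, instances are always truthy."""

    def __init__(self, rows):
        self.rows = list(rows)
```

This lets callers write `if not arr:` to skip empty inputs and get the same
result on either interpreter.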