Version 0.9.0
=============

In addition to bug fixes, documentation updates, and usability improvements,
this release includes some larger changes:

- The off-target bins in .cnn and .cnr files are now assigned the label
  "Antitarget" instead of "Background" in the "gene" column. The label
  "Background" in existing files will still be handled the same way, but new
  output files generated with CNVkit 0.9.0 and later will use the "Antitarget"
  label -- so earlier versions of CNVkit may have problems with files produced
  by CNVkit 0.9.0. Some command line options and API keyword arguments
  similarly replace "background" with "antitarget", with shims in place for
  compatibility with existing scripts. (#171)
- The sub-packages 'genome' and 'tabio' are now in a separate top-level package
  'skgenome', still included in the CNVkit distribution. (See "Python API"
  below.) This does not affect the command-line usage of CNVkit, but clears the
  way to extract a scikit-genome package that can be installed and used
  separately from CNVkit for computing with genomic intervals.

Documentation
-------------

- Link to example VCF in the test suite
- Describe the 'breaks' command's output columns (#220)
- Show an example customizing a plot with pyplot (#196)

Dependencies
------------

- pysam: raise minimum to 0.10; support new version 0.11.2.1 (#218; thanks
  @chapmanb)
- pandas: support new version 0.20.1 (#215)
- numpy: support new version 1.13 (#235, #238)

Commands
--------

`batch`:

- Log the CNVkit version number at the start of the run
- Print a message at the end if no tumor/test samples specified. (#214)
- Clarify error messages for bad option combinations (#216)
- Removed deprecated, suppressed/invisible option `--split`. It was a shim in
  the 0.8 series to support old scripts.

`reference`:

- Ensure the inferred chromosomal sex matches between the targets and
  antitargets for the same sample. If the inferences do not match, prefer
  antitargets.
  (#234, #237)

`fix`:

- Warn and don't reweight bins if most antitargets have no/low coverage. This
  avoids a variety of surprising downstream problems when the input was
  specified as hybrid capture (the default), but is actually from targeted
  amplicon sequencing, or otherwise has no reads mapped to most off-target
  bins.

`segment`:

- Log the segmentation and p-value/q-value threshold

`call`:

- Add option `--center-at`
- Let `--center` without an argument default to 'median'

`diagram`:

- New option `--title` to add a custom title to the top of the generated figure
  (#239; thanks @micknudsen)

`export vcf`:

- When given a .cnr file corresponding to the usual segmented input file
  (.cns), emit the CIPOS and CIEND tags in the generated VCF. These indicate
  the "fuzzy" coordinates of segment breakpoints. Here, the ranges are simply
  the widths of the underlying bins adjacent to each segment breakpoint. These
  tags can help meta-methods aggregate/harmonize CNVkit's calls with those of
  other structural variant callers. (#72)

`import-picard`:

- Don't accept directory as an argument (was deprecated).
- Be a little more flexible in filenames accepted: instead of requiring input
  files to be named `*.targetcoverage.???` or `*.antitargetcoverage.???`, strip
  the full suffix and default to 'targetcoverage.cnn' output suffix, or
  'antitargetcoverage.cnn' if input filename contains 'antitarget'. Works the
  same for filenames following the earlier convention, but now pretty safe for
  amplicon targets with arbitrary filenames, and slightly less spooky.

Bug fixes
---------

- `antitarget`: Don't crash if -g/--access is not given (#207)
- `batch`: Don't crash in 'wgs' mode when given just targets (-t) without a
  FASTA reference genome sequence (-f)
- `call --filter ampdel`: Drop segments with copy number (`cn` field) between 0
  and 5, exclusive, as the documentation indicates. Previously, it was just
  merging adjacent segments with copy number 1--4, but not dropping them.
  (#222)
- `export cdt`: Match the CDT spec.
  Fix a regression in which columns could be swapped/misaligned versus the
  header. Add a dummy "EWEIGHT" row to ensure Java TreeView starts reading data
  from the correct line in the file.
- `export theta`: Don't crash on bins where reference is NaN. (#168)
- `metrics`, `descriptives`: Handle degenerate/trivial cases consistently.
  (#202)
- `segment`: Handle sample names that are integers with leading zeros (#213)
- `sex`: Don't crash if chrX and chrY are both missing (#236)
- VCF parsing (`call`, `scatter`, `segment`):
    - Safely handle small or empty VCF files that previously could trigger a
      crash during BAF calculation. Now, with an empty VCF an all-blank "baf"
      will be emitted. (#218, #224; thanks @chapmanb)
    - Improve handling of Mutect2 VCF files, somewhat. Mutect2 VCFs are still
      not recommended as input to CNVkit; try FreeBayes or GATK HaplotypeCaller
      instead. (#195)

Python API
----------

Moved sub-packages 'genome' and 'tabio' to separate top-level package
'skgenome' (#201). The top-level `cnvlib` API is mostly the same otherwise, but
supporting modules were refactored to decouple `skgenome` from `cnvlib` and
remove redundancies. In particular:

- Split module `cnvlib.core` into `skgenome.tabio` and `cnvlib.cmdutil`
- Remove GenomicArray static method `row2label` in favor of functions
  `to_label` and `from_label` in new module `skgenome.rangelabel`.
- The SEG writer in 'tabio' now replaces chromosome names with 1-based integer
  indices, per SEG spec/convention. The `export seg` command now uses this
  writer directly.

Scripts
-------

- Remove the script `coverage_bin_size.py`, previously deprecated in favor of
  the `autobin` command.
- Add `skg_convert.py` to convert between tabular formats.
- Add `cnn_annotate.py` to replace the 'gene' field for each bin in a .cnn or
  .cnr file, given a gene annotation database like refFlat.txt.
  The need for this comes up occasionally when users notice at the end of an
  analysis that the vendor-annotated target names are not the desired gene
  names.
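
To illustrate the corrected `call --filter ampdel` behavior described under "Bug fixes", here is a minimal sketch of the dropping rule only: segments whose `cn` lies strictly between 0 and 5 are discarded. It is not CNVkit's implementation. The function name `filter_ampdel` and the dict-based segment records are hypothetical, chosen to keep the example self-contained, and the merging of adjacent segments mentioned above is not modeled.

```python
def filter_ampdel(segments):
    """Keep only segments with cn <= 0 or cn >= 5.

    `segments` is a list of dicts with an integer "cn" key (a hypothetical
    stand-in for rows of a .cns file). Segments with copy number strictly
    between 0 and 5 are dropped.
    """
    return [seg for seg in segments if seg["cn"] <= 0 or seg["cn"] >= 5]


# Made-up example segments, for illustration only.
segments = [
    {"chromosome": "chr1", "start": 0, "end": 1000, "cn": 2},
    {"chromosome": "chr1", "start": 1000, "end": 2000, "cn": 0},
    {"chromosome": "chr2", "start": 0, "end": 500, "cn": 1},
    {"chromosome": "chr2", "start": 500, "end": 900, "cn": 5},
    {"chromosome": "chr3", "start": 0, "end": 700, "cn": 4},
    {"chromosome": "chr3", "start": 700, "end": 1500, "cn": 8},
]

kept = filter_ampdel(segments)
print([seg["cn"] for seg in kept])
```

Only the deep deletion (`cn` of 0) and the two amplifications (`cn` of 5 and 8) remain in `kept`; the neutral and low-amplitude segments are removed.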