Version 0.8

This is a larger release and the first update since our [publication](http://dx.doi.org/10.1371/journal.pcbi.1004873).

CNVkit now runs under Python 3 as well as 2.7. (#3, #101; thanks @mpschr)

File format changes:
- New "depth" column in .cnn, .cnr, .cns
- In .cns, "weight" is the sum, not mean, of bin-level weights within the segment

New script ``cnn_updater.py`` can be used to add the "depth" column to existing .cnn, .cnr and .cns files. However, most CNVkit commands should still work with pre-v0.8 files without using this script first. For best results, rebuild the .cnr and .cns for an ongoing study using the existing targetcoverage, antitargetcoverage and reference .cnn files.

Algorithmic changes:
- `reference`, `gender`, `call`, `diagram`, `export`: Gender, or chromosomal sex, is now inferred with a statistical test instead of a fixed threshold, significantly improving the inferences on noisy or aneuploid samples. (#116)
- `reference`, `fix`, `call`: Center log2 values by median of chromosome medians, by default. (#114)
- `reference`, `metrics`, `segmetrics`: Improve the calculation of biweight location and biweight midvariance (now in descriptives.py).

These deprecated components (since 0.7.x) have been removed:
- Commands `rescale` and `loh` -- use `call` and `scatter`, respectively, instead
- Some options in `export bed` and `export theta` -- use `call` first instead
- Script `genome2access.py` -- use `cnvkit.py access` instead

Updated commands:

`batch`:
- New option --method, with choices "hybrid" (default), "wgs", "amplicon", to simplify/streamline usage with whole-genome or amplicon sequencing protocols. See documentation for details; in short, "wgs" and "amplicon" do not use antitargets or the edge/density bias correction; "wgs" by default uses the sequencing-accessible genome as the targets, and uses a more stringent significance threshold for segmentation.
- Hide/deprecate --split option; it's always on now.
  To ensure bin coordinates do not change between `batch` runs (they generally won't anyway), use the -r/--reference option instead of specifying -t and -a in `batch`.
- Add --drop-low-coverage option, which is passed to `segment` internally.
- The -p/--processes option is also passed to `coverage` and `segment` internally (see below).

`antitarget`:
- Increase the default average bin size from 100kb to 200kb.

`coverage`:
- Parallelize coverage calculation over BED rows. The number of threads can be specified with the `-p` option. (#121; thanks @brentp)

`segment`:
- Parallelize CBS and Haar segmentation methods across chromosomes. (#123, #125; thanks @brentp)

`call`:
- New --filter option, with choices 'cn', 'ampdel', 'ci', 'sem' implemented.
- With VCF b-allele frequencies (`-v`, 'baf'), always calculate the allele-specific integer copy numbers 'cn1' and 'cn2' so that 'cn1' is the larger one. BAF mirror direction stays majority-rules. (#105; thanks @mpschr)
- If b-allele frequencies are used and total copy number is zero, report allelic copy numbers as 0, not NaN.

`scatter`:
- Add --title option.
- Allow selecting and labeling gene(s) with only segments as input.

`heatmap`, `scatter`:
- Allow saving plots in any image file format supported by matplotlib, not just PDF. The file format is determined by the output filename's extension, e.g. a '.png' extension saves in PNG format -- making it easier to integrate CNVkit plots with HTML reports. (#120; thanks @chapmanb)

`diagram`:
- Add -g/--gender option to specify sample's known gender.

`gainloss`:
- Make output tables more consistent across options. Show individual gene names (rather than all genes grouped within a segment in 1 row); don't show rows with no gene name; report the segment probe count instead of number of probes within the gene; show any extra columns present in the input .cns file. (#107, #108; thanks @mpschr)

`gender`:
- Show column headers and Y-chromosome log2 values in the output table.

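The `call` changes to allele-specific copy numbers listed above can be illustrated with a small sketch. This is not CNVkit's code: the function name `allelic_copy_numbers` and the rounding rule are assumptions made for illustration, and it only shows the two documented behaviors (that 'cn1' is never smaller than 'cn2', and that a total copy number of zero yields 0 for both alleles instead of NaN).

```python
def allelic_copy_numbers(total_cn, baf):
    """Split an integer total copy number into (cn1, cn2) with cn1 >= cn2.

    Hypothetical helper for illustration only; not CNVkit's implementation.
    `baf` is expected to be a valid b-allele frequency between 0 and 1.
    """
    if total_cn == 0:
        # Nothing to split: report 0 for both alleles rather than NaN.
        return 0, 0
    # Mirror the BAF so it describes the more abundant allele.
    major_fraction = max(baf, 1.0 - baf)
    major = int(round(total_cn * major_fraction))
    minor = total_cn - major
    # Rounding can tip an even split the wrong way, so enforce cn1 >= cn2.
    return max(major, minor), min(major, minor)
```

For example, a segment with total copy number 3 and a BAF near 0.33 would be split into two copies of one allele and one of the other, regardless of which side of 0.5 the BAF falls on.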

`segmetrics`:
- Add stats options for mean, median, mode
- Add MSE, SEM stats as options

`metrics`, `segmetrics`:
- Add --drop-low-coverage option (like in `segment` and `gainloss`)

Internals:
- New sub-package tabio: a more robust I/O framework unifying support for tabular formats, including CNVkit's .cnn/.cnr/.cns, BED, SEG, VCF, GATK/Picard interval list, and text coordinates (chr:start-end). Base class GenomicArray and its derived classes CopyNumArray and VariantArray do not implement their own I/O, but rather are instantiated via tabio. The "import-" commands use this as well.
- Removed rary.RegionArray; all functionality is now in tabio and GenomicArray.
- New module "descriptives.py" implements descriptive statistics on plain numpy arrays or pandas Series instances, independent of CNVkit.
- Better testing on Travis, covering Python 2.7, 3.4 and 3.5, on both Linux and OS X (thanks @kyleabeauchamp, @rmcgibbo, and @mpharrigan; #110)

Bug fixes:
- `batch`: Errors in parallel processes are now raised immediately as exceptions at the top level, rather than the process dying silently. Previously, no error would occur until a missing output file was needed later in the pipeline. (#55)
- `segment`:
  - Skip possible R warning text when parsing CBS output (#106) and run Rscript with the --vanilla option (#112; thanks @jsmedmar). Non-isolated R processes were prone to add various warning messages to the expected SEG output, which could crash the "segment" command for some users.
  - Handle zero-weight bins better (#128; thanks @chapmanb).
- `scatter`:
  - Handle selected segments with an empty gene name (#104; thanks @mpschr).
  - Don't crash on zero-length GenomicArray/CopyNumArray inputs.
- VCF parsing (now within tabio) improved:
  - More robust to missing genotype (GT) & depth (DP) fields (#102)
  - Handle VCFs from MuTect2 (#122)
- `export theta`: don't crash when SNP VCF is a single, unpaired sample, or if segmented input (.cns) is empty.
- `heatmap`: Avoid a possible crash if a sample is missing a chromosome.

Packaging:
- Universal wheels are enabled for installation with pip (via setup.cfg).

New & updated dependencies:
- futures
- futurize
- numpy raised to version 1.9
- pandas raised to version 0.18.1
- pysam version 0.9.1.1 is specifically excluded
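The `futures` package is the Python 2 backport of the standard `concurrent.futures` module. Assuming the parallel `coverage` and `segment` work and the `batch` fix for silent failures build on that interface, the following minimal sketch shows how a worker error can surface at the top level. The names `process_sample` and `run_all` are hypothetical, and a thread pool is used here only to keep the example self-contained; `ProcessPoolExecutor` exposes the same `Future.result()` behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(name):
    """Hypothetical per-sample task; fails for one input to simulate a crash."""
    if name == "bad_sample":
        raise ValueError("could not process %s" % name)
    return name + ".cnr"


def run_all(sample_names, max_workers=2):
    """Run tasks in a pool and surface the first worker error to the caller."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_sample, name) for name in sample_names]
        # Future.result() re-raises any exception raised inside the worker,
        # so a failure is reported here instead of being silently dropped.
        return [future.result() for future in futures]
```

Calling `run_all(["a", "bad_sample"])` raises the worker's `ValueError` in the caller right away, rather than leaving a missing output file to be discovered by a later step.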