Tags mark specific points in the commit history as important.
  • v0.8.0
    ab9c400b · Bump version to 0.8.0 ·
    Version 0.8
    
    This is a larger release and the first update since our
    [publication](http://dx.doi.org/10.1371/journal.pcbi.1004873).
    
    CNVkit now runs under Python 3 as well as 2.7. (#3, #101; thanks @mpschr)
    
    File format changes:
    
    - New "depth" column in .cnn, .cnr, .cns
    - In .cns, "weight" is the sum, not mean, of bin-level weights within the segment
    
    New script ``cnn_updater.py`` can be used to add the "depth" column to existing
    .cnn, .cnr and .cns files. However, most CNVkit commands should still work with
    pre-v0.8 files without using this script first. For best results, rebuild the
    .cnr and .cns for an ongoing study using the existing targetcoverage,
    antitargetcoverage and reference .cnn files.
    
    Algorithmic changes:
    
    - `reference`, `gender`, `call`, `diagram`, `export`: Gender, or chromosomal sex,
      is now inferred with a statistical test instead of a fixed threshold,
      significantly improving the inferences on noisy or aneuploid samples. (#116)
    - `reference`, `fix`, `call`: Center log2 values by median of chromosome
      medians, by default. (#114)
    - `reference`, `metrics`, `segmetrics`: Improve the calculation of biweight
      location and biweight midvariance (now in descriptives.py).
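
The median-of-chromosome-medians centering described above can be sketched in plain Python. The function name and the (chromosome, log2) tuple layout are assumptions made for this illustration, not CNVkit's API, and the sketch does not try to reproduce which chromosomes CNVkit actually includes.

```python
from statistics import median


def center_by_chromosome_medians(bins):
    """Shift log2 values so the median of per-chromosome medians is zero.

    `bins` is a list of (chromosome, log2) tuples (hypothetical layout).
    """
    by_chrom = {}
    for chrom, log2 in bins:
        by_chrom.setdefault(chrom, []).append(log2)
    # One median per chromosome, then the median of those medians
    shift = median(median(values) for values in by_chrom.values())
    return [(chrom, log2 - shift) for chrom, log2 in bins]
```

Taking the median of chromosome medians, rather than the median over all bins, keeps a chromosome with many bins from dominating the center.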
    
    These deprecated components (since 0.7.x) have been removed:
    
    - Commands `rescale` and `loh` -- use `call` and `scatter`, respectively, instead
    - Some options in `export bed` and `export theta` -- use `call` first instead
    - Script `genome2access.py` -- use `cnvkit.py access` instead
    
    Updated commands:
    
    `batch`:
    
    - New option --method, with choices "hybrid" (default), "wgs", "amplicon", to
      simplify/streamline usage with whole-genome or amplicon sequencing protocols.
      See documentation for details; in short, "wgs" and "amplicon" do not use
      antitargets or the edge/density bias correction; "wgs" by default uses the
      sequencing-accessible genome as the targets, and uses a more stringent
      significance threshold for segmentation.
    - Hide/deprecate --split option; it's always on now. To ensure bin coordinates
      do not change between `batch` runs (they generally won't anyway), use the
      -r/--reference option instead of specifying -t and -a in `batch`.
    - Add --drop-low-coverage option, which is passed to `segment` internally.
    - The -p/--processes option is also passed to `coverage` and `segment`
      internally (see below).
    
    `antitarget`:
    
    - Increase the default average bin size from 100kb to 200kb.
    
    `coverage`:
    
    - Parallelize coverage calculation over BED rows. The number of threads can be
      specified with the `-p` option. (#121; thanks @brentp)
    
    `segment`:
    
    - Parallelize CBS and Haar segmentation methods across chromosomes. (#123, #125;
      thanks @brentp)
    
    `call`:
    
    - New --filter option, with choices 'cn', 'ampdel', 'ci', 'sem' implemented.
    - With VCF b-allele frequencies (`-v`, 'baf'), always calculate the
      allele-specific integer copy numbers 'cn1' and 'cn2' so that 'cn1' is the
      larger one. BAF mirror direction stays majority-rules. (#105; thanks @mpschr)
    - If b-allele frequencies are used and total copy number is zero, report allelic
      copy numbers as 0, not NaN.
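
To make the 'cn1'/'cn2' convention concrete, here is a minimal sketch, assuming that the major-allele count is obtained by rounding the total copy number times the mirrored BAF. That rounding rule and the function name are assumptions for illustration; the notes above only state that 'cn1' is the larger value and that zero total copy number yields 0, not NaN.

```python
def allelic_copy_numbers(total_cn, baf):
    """Split an integer total copy number into (cn1, cn2) with cn1 >= cn2.

    The rounding rule used here is a hypothetical choice for illustration.
    """
    if total_cn == 0:
        # With no copies at all, report 0 for both alleles instead of NaN
        return 0, 0
    major_fraction = max(baf, 1.0 - baf)  # mirror BAF to 0.5..1
    cn1 = min(total_cn, round(total_cn * major_fraction))
    cn2 = total_cn - cn1
    return cn1, cn2
```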
    
    `scatter`:
    
    - Add --title option.
    - Allow selecting & labeling gene(s) w/ only segments as input
    
    `heatmap`, `scatter`:
    
    - Allow saving plots in any image file format supported by matplotlib, not just
      PDF. The file format is determined by the output filename's extension, e.g.
      'png' saves in PNG format -- making it easier to integrate CNVkit plots with
      HTML reports. (#120; thanks @chapmanb)
    
    `diagram`:
    
    - Add -g/--gender option to specify sample's known gender.
    
    `gainloss`:
    
    - Make output tables more consistent across options. Show individual gene names
      (rather than all genes grouped within a segment in 1 row); don't show rows
      with no gene name; report the segment probe count instead of number of probes
      within the gene; show any extra columns present in the input .cns file. (#107,
      #108; thanks @mpschr)
    
    `gender`:
    
    - Show column headers and Y-chromosome log2 values in the output table.
    
    `segmetrics`:
    
    - Add stats options for mean, median, mode
    - Add MSE, SEM stats as options
    
    `metrics`, `segmetrics`:
    
    - Add --drop-low-coverage option (like in `segment` and `gainloss`)
    
    Internals:
    
    - New sub-package tabio: a more robust I/O framework unifying support for tabular
      formats, including CNVkit's .cnn/.cnr/.cns, BED, SEG, VCF, GATK/Picard
      interval list, and text coordinates (chr:start:end).
      Base class GenomicArray and its derived classes CopyNumArray and VariantArray
      do not implement their own I/O, but rather are instantiated via tabio.
      The "import-" commands use this as well.
    - Removed rary.RegionArray; all functionality is now in tabio and GenomicArray.
    - New module "descriptives.py" implements descriptive statistics on plain numpy
      arrays or pandas Series instances, independent of CNVkit.
    - Better testing on Travis, covering Python 2.7, 3.4 and 3.5, on both Linux and
      OS X (thanks @kyleabeauchamp, @rmcgibbo, and @mpharrigan; #110)
    
    Bug fixes:
    
    - `batch`: Errors in parallel processes will immediately be raised as exceptions
      at the top level, rather than dying silently. Previously, no error would occur
      until a missing output file was needed later in the pipeline. (#55)
    - `segment`:
    
        - Skip possible R warning text when parsing CBS output (#106) and run
          Rscript with the --vanilla option (#112; thanks @jsmedmar). Non-isolated
          R processes were prone to add various warning messages to the expected SEG
          output, which could crash the "segment" command for some users.
        - Handle zero-weight bins better (#128; thanks @chapmanb).
    
    - `scatter`:
    
        - Handle selected segments with an empty gene name (#104; thanks @mpschr).
        - Don't crash on zero-length GenomicArray/CopyNumArray inputs.
    
    - VCF parsing (now within tabio) improved:
    
        - More robust to missing genotype (GT) & depth (DP) fields (#102)
        - Handle VCFs from MuTect2 (#122)
    
    - `export theta`: don't crash when SNP VCF is a single, unpaired sample, or if
      segmented input (.cns) is empty.
    - `heatmap`: Avoid a possible crash if a sample is missing a chromosome.
    
    Packaging:
    
    - Universal wheels are enabled for installation with pip (via setup.cfg).
    
    New & updated dependencies:
    
    - futures
    - futurize
    - numpy raised to version 1.9
    - pandas raised to version 0.18.1
    - pysam version 0.9.1.1 is specifically excluded
  • v0.7.11
    ceeaa0ff · Bump version to 0.7.11 ·
    Version 0.7.11
    
    New dependency on pyfaidx, a Python library for handling samtools-style
    FASTA indexes (.fai).
    
    export vcf:
    
    - Add CNVkit version and current date (i.e. local calendar date that the
      "cnvkit.py export vcf" command was run) to the VCF header.
    
    export theta:
    
    - Given a VCF of SNVs called jointly in paired tumor and normal samples,
      extract SNP allele counts to THetA2's custom input format
      ("snp_formatted.txt"). The two additional files CNVkit generates this way can
      be used with THetA2's "--TUMOR_SNP" and "--NORMAL_SNP" options to improve
      estimates of tumor purity and clonality.
    - Use CNVkit's segment weights and probe counts to estimate normal-sample read
      counts for each segment if no copy number reference profile (.cnn) or paired
      normal sample (.cnr) is given.
      The command's second argument is now optional and deprecated in favor of the
      "-r"/"--reference" option, which does the same thing.
    
    import-theta:
    
    - Save integer copy number in the "cn" column of the output file(s) (CNVkit's
      .cns format).
    
    call, export nexus-ogt:
    
    - When reading structural variants from a VCF file, interpret the END tag as the
      variant end position, not the length, per the VCF 4.2 specification.
      This bug could cause the b-allele frequencies calculated in `call` and
      `export nexus-ogt` to be erroneously repeated across many consecutive bins.
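
The END fix above amounts to reading the INFO tag as a coordinate rather than a length. A small sketch with hypothetical helper names (this is not CNVkit's parser, and it does not handle an empty "." INFO field):

```python
def parse_info(info_field):
    """Parse a VCF INFO string such as 'SVTYPE=DEL;END=5000' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def sv_span(pos, info_field):
    """Return (start, end) of a structural variant record."""
    info = parse_info(info_field)
    end = int(info["END"])  # END is the end position, not a length
    return pos, end
```

Treating END as a length would instead give an end coordinate of `pos + END`, which stretches the variant far beyond its real extent.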
    
    scatter:
    
    - When loading CNVkit files (in any command), identify and drop rows with "NaN"
      log2 values. (CNVkit never emits these, but they could happen if a user
      generates .cnr files from Illumina CGH array data files using a custom
      script.) The other columns (spread, gc, rmask) can be NaN without a problem, but
      plotting with `scatter` would crash when adjusting the y-axis based on NaN
      log2 values. (#95)
    - Detect & warn if input .cnr/.cns/.vcf is not sorted by genomic coordinates.
      This could happen if the input VCF or a manually constructed .cnr/.cns file
      (not generated by CNVkit) was unsorted. Previously the resulting error message
      was cryptic, because some bins/segments/SNVs would be selected successfully
      but plotting would then crash when laying out the x-axis coordinates.
    
    Internals & packaging:
    
    - Use the pyfaidx library to extract sequences from a genome FASTA file (used in
      the `reference` command), replacing some custom code in cnvlib. (#73; thanks
      @mdshw5)
    - Documentation updates.
  • v0.7.10
    4da7808d · Bump version to 0.7.10 ·
    Version 0.7.10
    
    diagram:
    
    - Label genes even when given only segments (.cns). Plotting segments alone,
      without bin-level copy ratios (.cnr), can be convenient to produce an
      uncluttered PDF with a smaller file size while retaining most of the important
      CNV information.
    
    scatter:
    
    - For calculating and plotting SNV b-allele frequencies, select the sample of
      interest from the given VCF based on the .cnr/.cns base filename, unless
      specified with `--sample-id`.
    
    export nexus-ogt:
    
    - Use normal-sample BAFs if normal-sample .cnr given.  Previously, it would load
      tumor BAFs (taking the first tumor sample from the PEDIGREE tag) even if the
      properly-named .cnr file was for the normal sample in the VCF.
    - Add --sample-id option to select VCF sample. Useful in case .cnr filename base
      doesn't match the sample IDs in the VCF header.
    - Add filtering options --min-weight, --min-variant-depth.
    
        - The `--min-variant-depth` option works the same as in `scatter -v`,
          filtering SNVs by coverage depth (INFO field DP, usually) for the b-allele
          frequency calculation.
        - The `--min-weight` option allows the user to discard low-weight bins since
          Nexus Copy Number doesn't use CNVkit's weights for its own segmentation
          and could be misled by the noisier log2 ratios in less-reliable bins.  For
          choosing the cutoff value, 0.5 is suitable in our experience, but check
          the distribution of weights in your own data first.
    
    export vcf:
    
    - Add custom VCF "FORMAT" fields: FOLD_CHANGE, FOLD_CHANGE_LOG2, PROBES. (#91;
      thanks @pcingola)
    
    segment:
    
    - The "flasso" method now works again; it was broken for a few releases. (#88; thanks
      @pcingola)
    
    Packaging & internal:
    
    - Add GRCh37 "access" BED file for users' convenience. The `access` command will
      also now raise an error if the chromosome names don't match between the
      "access" and "target" BED files.
    - Work with the latest version of pysam (0.9). (#86)
    - Silence some superfluous warnings from the latest version of pandas (0.18).
    - Documentation updates, including more details on the `call` command.
  • v0.7.9
    9a069c43 · Bump version to 0.7.9 ·
    Version 0.7.9
    
    Bug fixes, most importantly to work around a regression in pysam.
    
    installation:
    
    - Require pysam version earlier than buggy 0.9 (#86)
    
    fix, reference:
    
    - If the majority of target bins have no or very low coverage, warn the user
      about this, skip bias corrections, and mask out the low-coverage target bins
      during centering to ensure the output is still vaguely usable and sane.
      This issue could occur because the wrong target BED was used initially, or
      maybe hybridization failed in library prep.
    
    reference:
    
    - Ensure the output table's columns are ordered correctly. In some cases it was
      possible for the output table's columns to be ordered differently, which still
      works in CNVkit, but is weird.
    
    call, rescale, export:
    
    - Check specified gender more sensibly; on failure, default to female.
      Specifically, use case-insensitive string comparison to test whether the given
      argument means "male". Treating chrX as having neutral ploidy is probably a
      less surprising fallback, especially if the "-y" flag is forgotten elsewhere
      in the pipeline.
  • v0.7.8
    3a4d3a3b · Bump version to 0.7.8 ·
    Version 0.7.8
    
    call:
    
    - Put absolute copy number in a new "cn" column. When rescaling log2 ratios for
      purity, do not round to integer absolute copy number values. (#83)
    - New `-v`/`--vcf` option: Calculate b-allele frequency (BAF) average for each
      segment and output as a new column "baf". Rescale BAFs if `--purity` is
      specified. Then, using BAF and total copy number (CN, the "cn" column), assign
      major and minor allele copy number to each segment and output as new columns
      "cn1" and "cn2". These values can indicate allelic imbalance, including loss
      of heterozygosity (LOH). (#84)
    - New "--center" option that works the same as in "rescale".
    - New method "-m none" to perform any specified transformations (rescaling,
      re-centering, adding b-allele frequencies), but do not call integer copy
      numbers.
    
    rescale:
    
    - Deprecated in favor of "call" with the "-m none" option, which does the same
      thing.
    - If recentering is specified with `--center`, do it before, not after,
      rescaling log2 values for tumor sample purity.
    
    export bed, vcf:
    
    - Take absolute copy number from "cn" column if present (#83)
    
    antitarget:
    
    - Whitelist chromosomes X and Y along with integer chromosome names for
      inclusion as canonical mammalian chromosomes. Keep the fallback to "short"
      chromosome names if no such canonical chromosome names are detected. (#37)
    
    reference:
    
    - Expose bias corrections (GC, RepeatMasker, targeting density) as command-line
      options `--no-gc`, `--no-rmask`, and `--no-edge`, similar to the `fix`
      command. (#80)
    
    Internal:
    
    - VariantArray.read_vcf: somatic mask was the opposite of what it should have
      been, i.e. skip_somatic was skipping germline and retaining only somatic SNVs.
  • v0.7.7
    c13f7c83 · Bump version to 0.7.7 ·
    Version 0.7.7
    
    Small improvements, bugfixes, and documentation updates.
    
    fix:
    
        - Removed the hard filter on RepeatMasker fraction of antitarget bins. This
          filter doesn't appear to improve calling on current benchmarks.
    
        - Drop bins that have very high coverage in the reference, in addition to
          the low-coverage bins already dropped (normalized log2 values outside +/-
          5).
    
        - Ignore very-low-coverage bins when recentering (by default). For
          good-quality samples this doesn't make much difference, but it's safer and
          seems to improve the centering slightly on lower-quality samples.
    
        - Ensure antitarget bin weights are not set to 0 if the majority of target
          bins have no coverage -- this would cause segmentation to fail. (#82)
    
        - Don't crash if antitargets are empty (to support WGS and targeted amplicon
          capture), fixing a regression.
    
    antitarget:
    
        - Keep untargeted contigs that appear to be "canonical" chromosomes. Prefer
          chromosomes with numeric names (autosomes in most mammalian reference
          genomes); but if none of the targeted chromosomes have numeric names, then
          fall back to chromosomes with names no longer than the longest-named
          targeted chromosome. (#37)
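
The selection rule above can be sketched as follows. The function names and the handling of a "chr" prefix are assumptions for this illustration; the sketch follows the rule as worded in this release only.

```python
def is_numeric_name(chrom):
    """True for names like '7' or 'chr7' (prefix handling is an assumption)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name.isdigit()


def untargeted_to_keep(all_chroms, targeted):
    """Pick untargeted contigs that look like canonical chromosomes."""
    untargeted = [c for c in all_chroms if c not in targeted]
    if any(is_numeric_name(c) for c in targeted):
        # Preferred rule: keep numerically named chromosomes
        return [c for c in untargeted if is_numeric_name(c)]
    # Fallback: keep names no longer than the longest targeted name
    max_len = max(len(c) for c in targeted)
    return [c for c in untargeted if len(c) <= max_len]
```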
    
    batch:
    
        - Disallow input BAMs with duplicate base filenames (#81). Now it will
          trigger an error instead of overwriting some output files.
    
    segment:
    
        - `--drop-outliers` option now masks outliers according to multiples
          (default 10x) of the 95th percentile, not the 90th. Benchmarking looks
          better.
    
    Plots `scatter`, `heatmap`:
    
        - With the "-c/--chromosome" option, handle unbounded ranges (e.g.
          "chr1:100-" or "chr5:-100000") treating the missing start/end of the range
          as the start/end of the specified chromosome.
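
A minimal sketch of how such unbounded ranges can be resolved, assuming a lookup table of chromosome lengths is available. The function name, the `chrom_lengths` argument and the use of 0 as the chromosome start are assumptions for illustration.

```python
def parse_range(text, chrom_lengths):
    """Resolve 'chr1:100-', 'chr5:-100000' or 'chr7:5-10' to (chrom, start, end)."""
    chrom, _, span = text.partition(":")
    start_text, _, end_text = span.partition("-")
    # A missing bound falls back to the start or end of the chromosome
    start = int(start_text) if start_text else 0
    end = int(end_text) if end_text else chrom_lengths[chrom]
    return chrom, start, end
```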
    
    heatmap:
    
        - A more efficient implementation.  Now, plotting a heatmap of .cnr is
          feasible, and behavior is a bit more consistent (e.g. placement of
          rectangles is more accurate; plotting a selection where only some samples
          have data will still show all samples).
    
        - Don't crash if selection overlaps no segments, e.g. if the selection is a
          centromeric or telomeric region. Previously it would crash with an obscure
          error.
    
    Misc. bugfixes:
    
        - batch: log # parallel processes correctly for "-p 0"
        - import-theta: fix crash; namedtuples are immutable (#77).
        - metrics: require --segments (#79)
        - rescale: fix crash if --purity is not specified
        - VariantArray: Fix VCF parsing if filters are not used.
  • v0.7.6
    84a7823a · Bump version to 0.7.6 ·
    Version 0.7.6
    
    Minor bugfixes and improvements.
    
    scatter:
    
    - Tweaked plot colors for better visibility and accessibility: points are
      slightly darker, and segments are now a deep gold color instead of red.
    
    fix:
    
    - Downweight targets or antitargets proportionally to their relative variability
      of bin log2 values; i.e. if targets are twice as variable (by
      interquartile range of bin log2 values) as antitargets, divide all target bin
      weights by 2. This happens after all bias corrections and reference
      normalization, and appears to improve the final segmentation results.
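
The downweighting rule can be illustrated with the standard library alone. This is a sketch under stated assumptions: the function names are hypothetical, and the quartile method is whatever `statistics.quantiles` uses by default, which may differ from CNVkit's own calculation.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def downweight_noisier(target_log2, target_weights, anti_log2, anti_weights):
    """Divide the weights of the more variable bin type by the IQR ratio."""
    t_iqr, a_iqr = iqr(target_log2), iqr(anti_log2)
    if t_iqr > a_iqr > 0:
        target_weights = [w / (t_iqr / a_iqr) for w in target_weights]
    elif a_iqr > t_iqr > 0:
        anti_weights = [w / (a_iqr / t_iqr) for w in anti_weights]
    return target_weights, anti_weights
```

In the release-note example, targets with twice the IQR of antitargets end up with their weights halved, while antitarget weights are left unchanged.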
    
    antitarget:
    
    - Don't emit antitargets for untargeted chromosomes with long names, e.g.
      "chr6_apd_hap1" -- these are presumably alternative/unassigned contigs, not
      real canonical chromosomes that deserve to be included for CNV calling.
      But do continue to keep untargeted chromosomes with names up to the length of
      the longest-named targeted chromosome.
      (Improves on #37)
    - Indicate default --min-size in help message.
    
    batch:
    
    - Log the number of parallel processes correctly when "-p 0" is used to
      automatically detect the number of CPUs -- previously, this option would print
      on the console that samples were being run in serial, but then launch multiple
      parallel processes.
    
    segment:
    
    - Change --drop-outliers default from 5 to 10, based on performance in
      benchmarking.
    
    Internally:
    
    - Fixed detection of autosomes to be used for re-centering bin log2 values and
      detecting gender.
    - Fixed parsing the GATK/Picard "interval list" file format - strand and name
      were swapped.
  • v0.7.5
    e02b12c8 · Bump version to 0.7.5 ·
    Version 0.7.5
    
    Global speedups, friendlier error handling and miscellaneous bug fixes.
    Documentation updates (thanks @kyleabeauchamp; #67).
    Expanded unit tests & restored continuous integration (TravisCI).
    Raised the minimum pandas version to 0.17.1, the latest.
    
    rescale (new command; #64):
    
    - Adjust .cnr or .cns files for normal contamination or subclone fraction.
    - Re-center log2 values by median (the usual), mode, mean, or biweight location.
    
    segment:
    
    - Detect outlier bins and ignore them during segmentation using a method similar
      to BIC-seq. Command line option: `--drop-outliers`; any outlier bins found
      will be logged.
    
    coverage:
    
    - If the given target BED file is missing the 4th column (gene names), fill in
      the dummy name "-" instead of crashing.
    
    segmetrics:
    
    - Expose alpha and #bootstraps as command-line options
    
    antitarget:
    
    - Reduce default bin size from 150kb to 100kb.
    
    fix:
    
    - Speed improvements: now about 20 times faster on exomes.
    
    API changes:
    
    - Gene names to treat as meaningless and to ignore in reporting (by default "-",
      ".", "CGH") can be globally configured in params.py
      (params.IGNORE_GENE_NAMES).
    - vary.VariantArray (used in `scatter`) can now parse VCF files with no samples
      (genotypes).
  • v0.7.4
    5a0c9c93 · Bump version to 0.7.4 ·
    Version 0.7.4
    
    This is primarily a bugfix release.
    
    heatmap:
    
    - Sub-chromosomal regions can now be selected for display with the `-c` option,
      e.g. `-c chr7:125000000-145000000`, just like the same option in `scatter`.
    
    segment:
    
    - Fix the listing of gene names in each segment in the output .cns file.
      Previously, briefly, each gene's name was truncated to 1 character.
    
    export:
    
    - `bed --show variant` now filters CNAs on sex chromosomes correctly, taking
      reference and sample genders into account.
    - `nexus-ogt` format now emits BAFs more similar to the original VCF allele
      frequencies. Previously, if multiple SNVs fell into a single CNVkit genomic
      bin, the allele frequencies of those SNVs would all be "mirrored" above 0.5
      before taking the median. Now the SNVs are mirrored in the direction of the
      majority of the SNVs in the bin, whether above or below 0.5, so that the
      output looks more balanced and low-frequency SNVs are more apparent.
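
The majority-rules mirroring described above can be sketched like this. The function name is hypothetical, and resolving ties toward the upper side is an assumption; the notes do not say how ties are handled.

```python
from statistics import median


def bin_baf(allele_freqs):
    """Median allele frequency of a bin after majority-direction mirroring."""
    above = sum(1 for f in allele_freqs if f > 0.5)
    below = sum(1 for f in allele_freqs if f < 0.5)
    if above >= below:  # tie-breaking upward is an assumption
        mirrored = [max(f, 1.0 - f) for f in allele_freqs]
    else:
        mirrored = [min(f, 1.0 - f) for f in allele_freqs]
    return median(mirrored)
```

Under the earlier behavior every frequency was mirrored above 0.5, so a bin dominated by low-frequency SNVs would still have been reported above 0.5.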
  • v0.7.3
    828ff64d · Set version to 0.7.3 ·
    Version 0.7.3
    
    access:
    
    - New command equivalent to the now-deprecated `genome2access.py` script.
    
    target, antitarget:
    
    - Always write output files in 4-column BED format.
    
    scatter:
    
    - Copy ratios (.cnr) are no longer required. Without this input file, behavior
      is similar to the now-deprecated `loh` command, but still more flexible.
    - VCF input file can include multiple tumor samples and PEDIGREE tags; if a
      tumor sample ID is specified, all PEDIGREE tags will be checked to find the
      matching normal sample.
    - VCFs processed by CLC Genomics Server are now parsed correctly.
    
    loh:
    
    - Deprecated. Use `scatter` with `-v` and no .cnr file instead.
    
    segment:
    
    - Preliminary support for segmenting SNP allele frequencies from a VCF in
      addition to total copy number (`-v` option). Details are likely to change in
      a later release. (#34)
    - In the `weight` column of the output file, values are now the sum, not the
      mean, of the weights of the probes covered by that segment.
    - The `haar` segmentation method is improved to avoid duplicate breakpoints and
      run much faster.
    
    export bed:
    
    - Deprecate `--show-all` in favor of `--show` with possible arguments `all`
      (like --show-all), `ploidy` (default behavior), or `variant` (show the same
      regions as export vcf).
    
    export vcf:
    
    - Fix a typo in the SVLEN tag definition in the VCF header -- Number should be
      1, not -1, which caused GATK parsing to fail. (#57; thanks @chapmanb)
    
    Python library `cnvlib`:
    
    - Logging is now done with the Python standard library's `logging` module,
      making it easier to silence or redirect status messages. In particular, unit
      tests run more quietly. (#52)
    - Internal refactoring (including new features in GenomicArray, RegionArray,
      VariantArray) resulting in changes to the `cnvlib` API, as well as some
      performance improvements.
  • v0.7.2
    b6a7a55f · Version 0.7.2 ·
    Version 0.7.2
    
    A variety of mostly minor improvements and bug fixes over v0.7.1.
    
    segment, gainloss, segmetrics:
    
        - Don't exclude very-low-coverage bins from calculations by default;
          instead, expose this option as `--drop-low-coverage`. (This option
          usually helps on tumor samples with some normal contamination, but leads
          to problems on germline samples with homozygous deletions.)
    
    segment:
    
        - Output .cns files now have a "weight" column which is the mean of the
          weights of the bins it covers.
        - Output of the 'haar' segmentation method now has each segment's gene
          names listed, as with the other methods.
        - Fixed a bug where every segment's probe count (the "probes" column) could
          be overwritten with the `_` character. (#53; thanks @chapmanb)
    
    segmetrics:
    
        - Each statistic is now printed in its own column, instead of squeezing all
          stats into the "gene" column. The confidence/prediction interval stats
          get two columns, `_lo` and `_hi` (lower and upper bound).
    
    loh, scatter:
    
        - Given a VCF called on a tumor-normal pair, use the paired normal to
          select appropriate germline SNPs for plotting.
    
    export:
    
        - New format "nexus-ogt" combines bin-level copy number ratios with
          b-allele frequencies given a VCF and a .cnr file. This replaces
          "nexus-basic" with the `-v` option that was introduced in v0.7.1;
          "nexus-ogt" stores the same info but can be viewed in BioDiscovery Nexus
          Copy Number without any special configuration (load it as the
          "Custom-OGT" data format).
        - Renamed `bed` option `--show-neutral` to `--show-all`.
        - `vcf` option `-g`/`--gender` now works properly for identifying CNVs on
          sex chromosomes.
    
    call:
    
        - Fixed the `threshold` method to calculate absolute copy number on sex
          chromosomes correctly (#49; thanks @tskir).
  • v0.7.1
    9f118335 · Bump version to 0.7.1 ·
    Version 0.7.1
    
    This is primarily a bugfix release. Many more unit test cases were added to the
    automated test suite. Code coverage is now monitored (thanks @stevepeak) at:
    https://codecov.io/github/etal/cnvkit/commits
    
    export nexus-basic:
    
        - New optional argument "-v"/"--vcf" extracts SNV b-allele frequencies from
          the given VCF file, matches them to the bins in the .cnr file, and prints
          an additional "baf" column in the output table. These allele frequencies
          can then be viewed in Nexus Copy Number, similar to a SNP array.
    
    call:
    
        - Fixed a bug in the "threshold" method where the copy number of haploid
          chromosomes was twice what it should be. The "clonal" method already
          handled these chromosomes properly. (#49)
    
    reference:
    
        - Handle blank/empty antitarget BED and coverage (.cnn) files. This was a
          regression from earlier releases in v0.7.0. (#51)
    
    fix:
    
        - Catch duplicated target ranges, e.g. the exact same bait labeled with two
          different gene names, and report those ranges in the error message. The
          "target" command's "--split" option should usually fix these, but
          sometimes it's not used.
    
    faidx:
    
        - Catch invalid ranges that extend beyond the length of the chromosome and
          raise an informative error. This would error before, too, but the message
          would be baffling.
  • v0.7.0
    a19c48ab · Bump version to 0.7.0 ·
    Version 0.7.0
    
    CNVkit now depends on pandas, scipy and pyvcf. The internals were largely
    rewritten, so please report any bugs or other regressions you find.
    
    Documentation is much improved.
    
    export:
        - VCF format is supported (#5, #41). The generated VCFs are compatible with
          many third-party tools, including development versions of MetaSV. (Thanks
          @chapmanb)
        - Removed the "freebayes" sub-command; use "export bed" instead.
    
    segment:
        - The names of genes (or other targeted loci) covered by each segment are
          now included in the output .cns file.
        - The p-value or q-value threshold (depending on the method) can now be
          specified with -t/--threshold.
        - The "haar" method works properly now (#6). This segmentation algorithm is
          implemented in Python and does not require R to run. It is a bit faster
          than CBS, but not as accurate.
    
    loh:
        - Plot variant allele frequencies (VAFs) as their actual values, 0 to 1,
          instead of the mirrored b-allele frequency (0.5 to 1). Draw segment mean
          allele frequencies separately above and below 0.5. This matches how the
          equivalent SNP array data are typically viewed.
    
    antitarget:
        - Generate off-target bins for all chromosomes present in the "access" BED
          file, not just those where targeted regions occur. (#37)
    
    coverage:
        - A minimum read mapping quality (MAPQ) value can now be specified with
          -q/--min-mapq. The default value is 0, i.e. reads are no longer excluded
          for low MAPQ or ambiguous mapping location. This should generally improve
          calling accuracy and avoid some spurious deletion calls.
  • v0.6.1
    d2f17561 · Bump version to 0.6.1 ·
    Version 0.6.1
    
    Small fixes in segmentation, affecting the output of `segment` and preventing
    crashes in `segmetrics`:
    
    - Exclude fewer low-coverage bins from segmentation (using a lower minimum
      coverage threshold).
    - In case the first or last bins on a chromosome were excluded from
      segmentation, adjust the first and last segments on each chromosome so that
      their endpoints match the first and last bins.
    - If no bins on a chromosome passed the coverage filter, instead of omitting
      the chromosome from segmentation output, generate a single segment covering
      the full chromosome, with segment log2 ratio 0.0. (So, all chromosomes in the
      .cnr file will be present in the .cns file, too.)
  • v0.6.0
    7f3521e5 · Bump version to 0.6.0 ·
    Version 0.6.0
    
    Added two new commands, `call` and `segmetrics`, and a new `export` format, BED.
    
    `segmetrics`:
    
    - Calculates summary statistics of the residual bin-level log2 ratio estimates
      from the segment means, similar to the existing `metrics` command, but for
      each segment individually. Results are output in the same format as the CNVkit
      segmentation file (.cns), with the stat names and calculated values printed in
      the "gene" column.
    - Supported stats: standard deviation, median absolute deviation, inter-quartile
      range, Tukey's biweight midvariance (as in `metrics`); also confidence
      interval, estimated by bootstrap; and prediction interval, estimated by the
      range between the 2.5-97.5 percentiles of bin-level log2 ratio values within
      the segment.
    - Thanks to @mjafin for suggesting this feature (#28).
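
The two interval statistics above can be sketched with the standard library. The function names, the choice of the mean as the bootstrapped statistic, the number of resamples and the interpolation method are all assumptions for illustration, not a description of CNVkit's implementation.

```python
import random
from statistics import fmean


def percentile(sorted_vals, pct):
    """Percentile with linear interpolation between neighboring ranks."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    rank = (len(sorted_vals) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def prediction_interval(log2_values, lower_pct=2.5, upper_pct=97.5):
    """Range between two percentiles of the bin-level values in a segment."""
    vals = sorted(log2_values)
    return percentile(vals, lower_pct), percentile(vals, upper_pct)


def bootstrap_ci(log2_values, n_boot=100, lower_pct=2.5, upper_pct=97.5, seed=0):
    """Confidence interval of the segment mean, estimated by resampling bins."""
    rng = random.Random(seed)
    values = list(log2_values)
    means = sorted(
        fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    return percentile(means, lower_pct), percentile(means, upper_pct)
```

The prediction interval describes the spread of individual bins, while the bootstrap interval describes the uncertainty of the segment's central value, so the latter is usually much narrower.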
    
    `call`:
    
    - Given segmented log2 ratio estimates (.cns file), round the copy ratio
      estimates to integer values using either:
    
        - A list of threshold log2 values for each copy number state, or
        - Some algebra, given known tumor cell fraction and normal ploidy.
    
    - The output is another .cns file, where the values in the `log2` column are
      still log2-transformed, but represent integers in log2 scale. E.g. neutral
      diploid state is represented as "0.0", not the integer 2. These output files
      are still compatible with the other CNVkit commands that accept .cns files.
    - These calculations were previously done by the `export freebayes` command.
      That command is deprecated but still available in this release; it will be
      removed in the next release. The recommended approach is to instead run `call`
      first on each .cns file, and then `export bed` on all the adjusted .cns files
      to get an equivalent BED file compatible with FreeBayes `--cnv-map` option.
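
    The two rounding strategies can be sketched as below. This is a rough
    illustration, not the `call` command's code: the function names, the
    threshold values, and the floor used for zero copies are hypothetical, and
    the purity formula assumes the contaminating normal cells carry exactly
    `ploidy` copies.

```python
import math


def call_by_thresholds(log2_ratio, thresholds=(-1.1, -0.25, 0.2, 0.7)):
    """Copy number = how many thresholds the log2 ratio exceeds.

    The default thresholds are illustrative values only. Results are capped
    at len(thresholds), i.e. 4 copies with these defaults.
    """
    return sum(1 for t in thresholds if log2_ratio > t)


def call_by_purity(log2_ratio, purity, ploidy=2):
    """Solve for the tumor copy number given the tumor cell fraction."""
    # Average copies per cell observed in the mixed sample.
    observed = ploidy * 2.0 ** log2_ratio
    # Remove the contribution of normal cells, then rescale to pure tumor.
    tumor = (observed - (1.0 - purity) * ploidy) / purity
    return max(0, int(round(tumor)))


def copies_to_log2(copies, ploidy=2, floor=-5.0):
    """Express an integer copy number on the log2 scale used in .cns files.

    Zero copies has no finite log2 value, so a hypothetical floor is used.
    """
    if copies <= 0:
        return floor
    return math.log2(copies / ploidy)
```

    For example, `copies_to_log2(2)` gives 0.0 for the neutral diploid state,
    and a single-copy loss is written as -1.0.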
    
    `export bed`:
    
    - New format supporting the same features as `export freebayes` that were not
      moved into the `call` command (see above). The output BED file is still
      compatible with the FreeBayes `--cnv-map` option.
    - New option `--show-neutral` to also output neutral-CN segments/regions, in
      addition to the CNV regions output by default.
    
    Smaller changes:
    
    - `gainloss`: Reduced the default log2 ratio threshold from 0.5 to 0.2.
    - `import-picard`: Use the un-normalized mean coverage instead of the normalized
      coverage of each target as the log2 coverage values in the output .cnn file.
      This matches the output of the `coverage` command; CNVkit normalizes coverages
      later in the pipeline.
    - Some internal refactoring. Please report any bugs, real or perceived, on
      our GitHub issue tracker.
  • v0.5.2
    96a33fd3 · Bump version to 0.5.2 ·
  • v0.5.1
    9a63f26e · Bumped version to 0.5.1 ·
  • v0.5.0
    1c362a84 · Bumped version to 0.5.0 ·
    Version 0.5.0:
    
    This release includes a variety of improvements to CNVkit's calling accuracy
    and robustness. All CNVkit files built with previous versions will continue to
    work with this version, but for best results, I recommend rebuilding your
    reference.cnn file(s) from the targetcoverage.cnn and antitargetcoverage.cnn
    files.
    
    `coverage`:
        - Output target/antitarget coverage (.cnn) files are no longer
          median-centered. Read depths in each bin are still log2-scaled, but the
          observed read depth can now be easily recovered from .cnn files.
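
    As a small illustration of why dropping the median-centering matters, the
    sketch below recovers depth from a log2-scaled value and shows that
    centering discards the absolute scale. The helper names are hypothetical,
    and the example assumes the stored value is a plain log2 of the read depth.

```python
import statistics


def depth_from_log2(log2_depth):
    """Invert the log2 scaling to get back the observed read depth."""
    return 2.0 ** log2_depth


def median_center(log2_values):
    """What earlier versions did: shift values so their median is zero."""
    med = statistics.median(log2_values)
    return [v - med for v in log2_values]


# Two samples with the same coverage profile at different depths look
# identical after centering, so the depth cannot be recovered from them.
shallow = [5.0, 6.0, 7.0]   # depths 32, 64, 128
deep = [7.0, 8.0, 9.0]      # depths 128, 256, 512
same_after_centering = median_center(shallow) == median_center(deep)
```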
    
    `reference`, `fix`:
        - Include a "flat pseudocount" in addition to the given normals, making
          paired tumor-normal calling much more robust and accurate.
        - Perform bias corrections on the input normal samples before calculating
          the average and spread of log2 values.
    
    `fix`:
        - Do bias corrections before subtracting the reference, instead of after,
          because the reference already includes bias corrections now.
        - In addition to weighting bins by spread (which can only be observed with
          a pooled reference), also weight by bin size and deviation of reference
          log2 values in each bin from the global median. So, useful bin weights
          are now derived from "flat" and single-normal-sample references, too.
    
    `segment`:
        - Recalculate CBS segment means using bin weights (in the R library this
          is simply the mean, arguably a bug).
        - Set CBS segment start/end positions to match the underlying bin start/end
          positions.
        - Improved centromere detection -- only exclude one "large gap", if any,
          from each chromosome.
        - Tuned CBS calling parameters to improve accuracy (see benchmarks in the
          repo etal/cnvkit-examples).
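
    The difference between a plain and a weighted segment mean can be seen in
    a short sketch. The function name and the example numbers are made up for
    illustration and are not taken from CNVkit.

```python
def weighted_mean(values, weights):
    """Mean of bin-level log2 values, each scaled by its bin weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# One noisy, low-weight bin pulls the plain mean up much more than
# the weighted mean.
log2_values = [0.0, 0.0, 1.0]
bin_weights = [1.0, 1.0, 0.1]
plain = sum(log2_values) / len(log2_values)
weighted = weighted_mean(log2_values, bin_weights)
```

    Here the plain mean is about 0.33 while the weighted mean stays near 0.05,
    so an unreliable bin has less influence on the segment value.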
    
    `diagram`:
        - Label genes using the same criteria as the `gainloss` command: if
          segments are given, use the segment value at each gene, otherwise
          calculate the weighted average of bin-level log2 values within each gene.
        - New option -m/--min-probes to match `gainloss`.
        - Guess gender from chrX more reliably, so that the same gender is called
          from the bin-level (.cnr) and segmented (.cns) values given.
    
    `scatter`, `loh`:
        - When plotting allele frequencies from a VCF, if segments are given
          (.cns), also apply those segments to allele frequencies to show LOH
          regions that match CNVs.
        - Skip somatic variants identified in a VCF, and try to retain only
          germline variants, when plotting LOH. (This is not very well standardized
          across callers, so please watch for bad behavior from callers other than
          FreeBayes and MuTect, and let me know about it!)
        - `scatter` only: Added options `--y-min`, `--y-max` to set y-axis limits
          on the plot.
        - Removed the deprecated `-r` option. Use `-c` instead.
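
    One way to skip somatic records is sketched below, assuming the caller
    marks them with the `SOMATIC` flag that the VCF specification reserves in
    the INFO column. As noted above, callers differ in how they label somatic
    variants, so this is only one possible check and not necessarily the one
    CNVkit uses. The function names are hypothetical.

```python
def is_somatic(vcf_line):
    """True if the record's INFO column carries the SOMATIC flag."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th tab-separated VCF column
    keys = {entry.split("=", 1)[0] for entry in info.split(";")}
    return "SOMATIC" in keys


def germline_records(lines):
    """Yield data lines that are not flagged somatic; skip header lines."""
    for line in lines:
        if line.startswith("#"):
            continue
        if not is_somatic(line):
            yield line
```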
    
    The long-deprecated `cbs` command has been removed. Use `segment` instead.
    
    Bugs in parsing and writing empty and 1-line VCF, BED and CNVkit files, and
    other VCF quirks, have now been fixed (Thanks @chapmanb!)
  • v0.4.1
    e49c99fd · Bumped version to 0.4.1 ·
    Version 0.4.1
    
    New features:
    
    - `scatter` command:
        Option -c can now take coordinate ranges like -r, so -r is deprecated and
        will be removed in the next release.
    
    - `genome2access.py` script:
        New -x option to exclude additional regions. Added a new file
        "data/access-5k-mappable.hg19.bed" which used this option to exclude the
        ENCODE "Duke" and "DAC" low-mappability regions.
    
    Also:
    
    - Improved the help/usage messages for several commands. Added a "version"
      command that prints the current CNVkit version. (Thanks @HenrikBengtsson)
    - Tuned CBS calling parameters to improve segmentation accuracy according to
      some benchmarks.
    - Sped up a few slow functions identified by profiling. In particular,
      `metrics` is much faster now.
    - Fixed bugs/incompatibilities in plotting commands and cleaned up the source
      code (Thanks @chapmanb and @roryk)
    
    CNVkit can now be obtained and run as a Docker container:
    https://registry.hub.docker.com/u/etal/cnvkit/
  • v0.4.0
    2b2f09d6 · Bumped version to 0.4.0 ·
    Version 0.4.0:
    
    - Now safely operates without off-target bins (i.e. empty "antitargets"), so
      CNVkit can be used on WGS and amplicon capture datasets.
    - New options in the "scatter" and "loh" plots, including contributions by
      @chapmanb and @roryk.
    - Bug fixes in export and plotting commands, among others.
    - Substantially improved documentation.