CorrDCA
CorrDCA.compute_weights
CorrDCA.compute_weights
CorrDCA.covariance_matrix
CorrDCA.read_fasta_alignment
CorrDCA.remove_duplicate_sequences
CorrDCA.compute_weights — Method
compute_weights(Z::Matrix{Ti}, theta::Real) where Ti <: Integer
For each sequence in Z, compute the normalized count of sequences at Hamming distance ≤ theta from it.
CorrDCA.compute_weights — Method
compute_weights(Z::Matrix{Ti}, theta::Symbol) where Ti <: Integer
For each sequence in Z, compute the normalized count of sequences whose Hamming distance from it is ≤ a precomputed optimal threshold. For details, see "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092721], in particular Section 2 ("Reweighting Scheme") of the Supplementary Information.
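The reweighting idea can be illustrated with a minimal sketch. It is not the CorrDCA implementation: the name `sketch_weights` is hypothetical, the threshold is assumed to be a fraction of the sequence length, and the weight of a sequence is assumed to be the inverse of its neighbor count, normalized so that all weights sum to 1.

```julia
# Hypothetical helper, not the CorrDCA implementation.
# Z: integer alignment matrix, one sequence per column.
# theta: assumed to be a fractional Hamming-distance threshold in [0, 1].
function sketch_weights(Z::Matrix{<:Integer}, theta::Real)
    N, M = size(Z)
    thresh = floor(Int, theta * N)      # maximum number of mismatches allowed
    counts = ones(Int, M)               # every sequence counts itself
    for i in 1:M-1, j in i+1:M
        # Hamming distance between columns i and j
        d = count(k -> Z[k, i] != Z[k, j], 1:N)
        if d <= thresh
            counts[i] += 1
            counts[j] += 1
        end
    end
    w = 1 ./ counts                     # down-weight sequences with many neighbors
    return w ./ sum(w)                  # normalize so the weights sum to 1
end
```

With this convention, a sequence that has many near-identical copies in the alignment contributes less to downstream statistics than an isolated one.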
CorrDCA.covariance_matrix — Method
covariance_matrix(Z, W; pc::Real=0)
Compute the covariance matrix from the numerical alignment Z and the weights W. The output is a symmetric N(q-1) × N(q-1) matrix (color q is skipped).
Keyword arguments:
pc in [0,1]: pseudocount [default = 0]
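The following sketch shows one way such a matrix could be assembled. It is not the CorrDCA implementation: `sketch_covariance` is a hypothetical name, q is assumed to be the largest entry of Z, the weights are assumed to sum to 1, and the pseudocount rule (mixing the empirical frequencies with a uniform distribution) is one common convention that may differ from what the package does.

```julia
# Hypothetical sketch, not the CorrDCA implementation.
# Z: N × M integer matrix with entries in 1:q (one sequence per column).
# W: length-M weight vector, assumed to sum to 1.
function sketch_covariance(Z::Matrix{<:Integer}, W::Vector{<:Real}; pc::Real=0)
    N, M = size(Z)
    q = maximum(Z)                       # assumed number of colors
    s = q - 1                            # colors kept per position (color q skipped)
    Pi = zeros(N * s)                    # weighted single-site frequencies
    Pij = zeros(N * s, N * s)            # weighted pair frequencies
    for m in 1:M
        # indices of the one-hot entries that are set for sequence m
        idx = [(i - 1) * s + Z[i, m] for i in 1:N if Z[i, m] < q]
        for a in idx
            Pi[a] += W[m]
            for b in idx
                Pij[a, b] += W[m]
            end
        end
    end
    # Assumed pseudocount rule: mix with a uniform distribution over q colors.
    Pi .= (1 - pc) .* Pi .+ pc / q
    for a in 1:N*s, b in 1:N*s
        ia, ib = (a - 1) ÷ s, (b - 1) ÷ s   # zero-based positions of the two entries
        uniform = ia != ib ? 1 / q^2 : (a == b ? 1 / q : 0.0)
        Pij[a, b] = (1 - pc) * Pij[a, b] + pc * uniform
    end
    return Pij - Pi * Pi'                # covariance: pair minus product of singles
end
```

Each position contributes q-1 rows and columns because the one-hot entry for color q is determined by the other q-1 entries.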
CorrDCA.read_fasta_alignment — Method
read_fasta_alignment(filename::AbstractString, max_gap_fraction::Real)
Return an L × M matrix of integers (L is the sequence length and M is the number of sequences) representing the multiple sequence alignment contained in the FASTA file filename. Only sequences whose fraction of gaps (-) is ≤ max_gap_fraction are included.
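As an illustration of the filtering and encoding steps, here is a minimal sketch. It is not the CorrDCA reader: `sketch_read_fasta`, `letter2num` and `ALPHABET` are hypothetical names, the function reads from an IO stream rather than a filename, and the letter-to-integer map (20 amino acids to 1:20, everything else including the gap to 21) is an assumption.

```julia
# Hypothetical sketch, not the CorrDCA reader.
# Assumed encoding: the 20 amino acids map to 1:20, anything else (gap included) to 21.
const ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

letter2num(c::Char) = something(findfirst(==(uppercase(c)), ALPHABET), 21)

function sketch_read_fasta(io::IO, max_gap_fraction::Real)
    seqs = String[]
    current = IOBuffer()
    started = false
    for line in eachline(io)
        line = strip(line)
        isempty(line) && continue
        if startswith(line, ">")
            # a new header closes the previous record, if any
            started && push!(seqs, String(take!(current)))
            started = true
        else
            print(current, line)        # sequences may span several lines
        end
    end
    started && push!(seqs, String(take!(current)))
    # keep sequences whose gap fraction does not exceed the threshold
    kept = [s for s in seqs if count(==('-'), s) / length(s) <= max_gap_fraction]
    # assumes at least one sequence survives and all have equal length
    L = length(first(kept))
    Z = Matrix{Int}(undef, L, length(kept))
    for (m, s) in enumerate(kept), (i, c) in enumerate(s)
        Z[i, m] = letter2num(c)
    end
    return Z
end
```

To read from a file with this sketch, one could wrap the call as `open(io -> sketch_read_fasta(io, 0.9), "alignment.fasta")`, where the path is a placeholder.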
CorrDCA.remove_duplicate_sequences — Method
remove_duplicate_sequences(Z::Matrix{Ti}) where Ti <: Integer
Remove duplicate sequences (columns) in the alignment matrix Z.
Examples
julia> Z = [1 2 3 1;
            1 3 2 1;]
2×4 Array{Int64,2}:
 1  2  3  1
 1  3  2  1

julia> remove_duplicate_sequences(Z)
removing duplicate sequences... done: 4 -> 3
([1 2 3; 1 3 2], [1, 2, 3])
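
The behavior shown in the example, returning the deduplicated matrix together with the indices of the columns that were kept, can be reproduced with a short sketch. The name `sketch_remove_duplicates` is hypothetical and this is not the package's implementation; it keeps the first occurrence of each column and does not print a progress message.

```julia
# Hypothetical sketch, not the CorrDCA implementation.
# Returns the matrix without repeated columns and the indices of the kept columns.
function sketch_remove_duplicates(Z::Matrix{<:Integer})
    seen = Set{Vector{eltype(Z)}}()     # columns encountered so far
    keep = Int[]                        # indices of first occurrences
    for m in 1:size(Z, 2)
        col = Z[:, m]
        if !(col in seen)
            push!(seen, col)
            push!(keep, m)
        end
    end
    return Z[:, keep], keep
end
```

Using a set of columns keeps the pass linear in the number of sequences, rather than comparing every pair of columns.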