CorrDCA
CorrDCA.compute_weights
CorrDCA.covariance_matrix
CorrDCA.read_fasta_alignment
CorrDCA.remove_duplicate_sequences
CorrDCA.compute_weights - Method
compute_weights(Z::Matrix{Ti}, theta::Real) where Ti <: Integer
For each sequence in Z, compute the normalized count of sequences at Hamming distance ≤ theta from it.
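The reweighting idea described above can be sketched in plain Julia. This is a minimal illustration, not CorrDCA's implementation: the helper name `sketch_weights` is hypothetical, and the sketch assumes that `theta` is a fraction of the sequence length, that each weight is the inverse of the neighbour count, and that the weights are normalized to sum to 1. The package may differ on any of these points.

```julia
# Hypothetical helper, not part of CorrDCA: naive O(M^2 * L) reweighting sketch.
# Assumption: theta is a fraction of the sequence length L (rows of Z).
function sketch_weights(Z::Matrix{<:Integer}, theta::Real)
    L, M = size(Z)                     # L = sequence length, M = number of sequences
    threshold = theta * L
    counts = ones(Int, M)              # every sequence counts itself as a neighbour
    for i in 1:M-1, j in i+1:M
        d = count(k -> Z[k, i] != Z[k, j], 1:L)   # Hamming distance between columns i and j
        if d <= threshold
            counts[i] += 1
            counts[j] += 1
        end
    end
    w = 1.0 ./ counts                  # down-weight sequences with many close neighbours
    return w ./ sum(w)                 # assumption: normalize so the weights sum to 1
end

Z = [1 2 3 1;
     1 3 2 1]
w = sketch_weights(Z, 0.0)             # with theta = 0 only identical sequences are neighbours
```

In this toy alignment, columns 1 and 4 are identical, so each receives half the weight of the two unique columns.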
CorrDCA.compute_weights - Method
compute_weights(Z::Matrix{Ti}, theta::Symbol) where Ti<:Integer
Compute the normalized counts of the number of sequences at Hamming distance ≤ a precomputed optimal threshold. For details, see "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092721], in particular Section 2, "Reweighting Scheme", of the Supplementary Information.
CorrDCA.covariance_matrix - Method
covariance_matrix(Z, W; pc::Real=0)
Compute the covariance matrix from the numerical alignment Z and the weights W. The output is a symmetric N(q-1) × N(q-1) matrix (color q is skipped).
Keyword arguments:
pc in [0,1]: pseudocount [default = 0]
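To make the N(q-1) × N(q-1) shape concrete, here is a minimal sketch of a weighted covariance built from a one-hot encoding that skips color q. It is an illustration under stated assumptions, not the package's code: the helper `sketch_covariance` is hypothetical, N is taken to be the number of rows of Z, q defaults to the largest symbol in Z, and the pseudocount `pc` is left out entirely.

```julia
using LinearAlgebra

# Hypothetical helper, not CorrDCA's implementation; no pseudocount is applied.
function sketch_covariance(Z::Matrix{<:Integer}, W::AbstractVector{<:Real};
                           q::Integer = maximum(Z))
    N, M = size(Z)                     # assumption: N = sequence length, M = number of sequences
    s = q - 1                          # colors 1..q-1 are encoded, color q is skipped
    X = zeros(N * s, M)                # one-hot encoding, one column per sequence
    for m in 1:M, i in 1:N
        a = Z[i, m]
        if a < q
            X[(i - 1) * s + a, m] = 1.0
        end
    end
    w = W ./ sum(W)                    # make sure the weights sum to 1
    f1 = X * w                         # weighted single-site frequencies
    f2 = X * Diagonal(w) * X'          # weighted pair frequencies
    return f2 - f1 * f1'               # covariance: pair frequencies minus product of marginals
end

Z = [1 2 3 1;
     1 3 2 1]
C = sketch_covariance(Z, fill(0.25, 4))   # N = 2, q = 3, so C is 4 × 4
```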
CorrDCA.read_fasta_alignment - Method
read_fasta_alignment(filename::AbstractString, max_gap_fraction::Real)
Return an L × M matrix of integers (L is the sequence length and M is the number of sequences) representing the multiple sequence alignment contained in the FASTA file filename, including only sequences with a fraction of gaps (-) ≤ max_gap_fraction.
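The gap filter alone can be illustrated on plain strings. This is a sketch of the criterion only: the helper `gap_fraction` and the toy sequences are hypothetical, and the sketch neither reads a FASTA file nor reproduces the letter-to-integer mapping used by the package.

```julia
# Hypothetical helper illustrating the gap-fraction criterion only.
gap_fraction(seq::AbstractString) = count(==('-'), seq) / length(seq)

seqs = ["AC-D", "A--D", "ACGD"]        # toy aligned sequences, all of length 4
max_gap_fraction = 0.25

# Keep the sequences whose fraction of gaps does not exceed the threshold.
kept = filter(s -> gap_fraction(s) <= max_gap_fraction, seqs)
```

Here "A--D" has a gap fraction of 0.5 and is dropped, while the other two sequences are kept.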
CorrDCA.remove_duplicate_sequences - Method
remove_duplicate_sequences(Z::Matrix{Ti}) where Ti<:Integer
Remove duplicate sequences (columns) in the alignment matrix Z.
Examples
julia> Z = [1 2 3 1;
            1 3 2 1;]
2×4 Array{Int64,2}:
 1  2  3  1
 1  3  2  1

julia> remove_duplicate_sequences(Z)
removing duplicate sequences... done: 4 -> 3
([1 2 3; 1 3 2], [1, 2, 3])