OncoDiversity
- OncoDiversity.CDR3SeqData
- OncoDiversity.DiversityScores
- OncoDiversity.calcIPdiversity
- OncoDiversity.calcOverQ
- OncoDiversity.countframe
- OncoDiversity.countframefull
- OncoDiversity.diversity
- OncoDiversity.filteredDataset
- OncoDiversity.findInflection
- OncoDiversity.patientdiversity
- OncoDiversity.reportInflection
- OncoDiversity.requirecol
- OncoDiversity.summarize
OncoDiversity.CDR3SeqData — Type
CDR3SeqData{T,U}
Stores the two data tables that make up a CDR3 dataset: the per-(patient, sequence) counts and the physicochemical data for each sequence. The tables can be joined with DataFrame(ds::CDR3SeqData).
OncoDiversity.DiversityScores — Type
DiversityScores{T}
Stores a DataFrame of diversity scores along with the range of q values over which they were computed.
OncoDiversity.DiversityScores — Type
DiversityScores(countDF::DataFrame, rangeQ::AbstractVector, samplecol=:patient)
Computes point estimates of diversity from a countframe DataFrame, a range of q values, and an optional sample-column identifier.
The point estimates computed are:
- low q diversity (q=0.01)
- high q diversity (q=100)
- delta qD diversity = low q diversity - high q diversity
- inflection point q
- magnitude of slope at the inflection point
- diversity qD evaluated at q=inflection point q
- Shannon diversity (log qD at q=1)
- Simpson diversity (1/qD at q=2)
Input: DataFrame of counts, vector of q values, optional sample column symbol (defaults to :patient)
Output: DiversityScores struct
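To make the list of point estimates concrete, here is a minimal sketch in plain Julia (no DataFrames) for a single sample's counts. The names hill and point_estimates are hypothetical, not the package API, and the inflection-point estimates are left out; the Shannon and Simpson entries follow the definitions given in the list (log qD at q=1, 1/qD at q=2).

```julia
# Hill number of order q for frequencies p; q == 1 uses its limit, exp(Shannon entropy).
function hill(p::AbstractVector{<:Real}, q::Real)
    p = p[p .> 0]                        # only types that are present contribute
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))
    return sum(p .^ q)^(1 / (1 - q))
end

# Hypothetical helper: DiversityScores-style point estimates for one sample.
function point_estimates(counts::AbstractVector{<:Real})
    p = counts ./ sum(counts)            # counts -> frequencies
    low  = hill(p, 0.01)                 # low q diversity
    high = hill(p, 100)                  # high q diversity
    return (low_q = low,
            high_q = high,
            delta_qD = low - high,
            shannon = log(hill(p, 1)),   # log qD at q = 1
            simpson = 1 / hill(p, 2))    # 1/qD at q = 2
end

est = point_estimates([10, 5, 3, 1, 1])
u = point_estimates([2, 2, 2, 2])        # uniform sample: every qD equals 4
```

For a uniform sample every Hill number equals the number of types, so delta qD is zero; any uneven sample gives low_q > high_q.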
OncoDiversity.calcIPdiversity — Method
calcIPdiversity(df::DataFrame, qrange::Array)
Computes diversity over a range of q values (qrange) for an input DataFrame of frequencies across types, then numerically determines the inflection point and the magnitude of the slope at that point. Returns a DataFrame with a column for the patient name, a column for the q at which the inflection point occurs, and a column for the magnitude of the slope at the inflection point.
Input: DataFrame of frequencies and array of q values
Output: DataFrame of the inflection point q and the slope magnitude at that point, per patient
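The per-patient pipeline described for calcIPdiversity can be sketched without DataFrames, using a Dict from patient name to counts. This is a sketch under stated assumptions, not the package's method: ip_diversity and hill are hypothetical names, slopes are taken against log10(q), and the inflection is placed where the slope magnitude peaks (an extremum of the first derivative).

```julia
# Hill number of order q; q == 1 uses its limit, exp(Shannon entropy).
function hill(p::AbstractVector{<:Real}, q::Real)
    p = p[p .> 0]
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))
    return sum(p .^ q)^(1 / (1 - q))
end

# Hypothetical analogue of calcIPdiversity: profile -> inflection q -> |slope|.
# Assumption: derivative taken w.r.t. log10(q); inflection = peak of |slope|.
function ip_diversity(counts_by_patient::Dict{String,<:AbstractVector{<:Real}},
                      qrange::AbstractVector{<:Real})
    x = log10.(qrange)
    mids = (x[1:end-1] .+ x[2:end]) ./ 2            # where each slope is located
    rows = NamedTuple[]
    for (patient, counts) in sort(collect(counts_by_patient); by = first)
        p = counts ./ sum(counts)
        qD = [hill(p, q) for q in qrange]           # diversity profile
        slopes = diff(qD) ./ diff(x)                # finite-difference slopes
        i = argmax(abs.(slopes))
        push!(rows, (patient = patient, q_inflection = 10^mids[i],
                     slope = abs(slopes[i])))
    end
    return rows
end

qrange = 10 .^ range(-2, 2, length = 81)            # log-spaced q from 0.01 to 100
res = ip_diversity(Dict("P1" => [50, 20, 10, 5, 1], "P2" => [5, 5, 5, 5]), qrange)
```

The uniform patient P2 has a flat profile, so its reported slope is essentially zero and its inflection q is not meaningful.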
OncoDiversity.calcOverQ — Method
calcOverQ(origDF::DataFrame,rangeQ::Array{Float64})
Computes the diversity score over a range of q values. Returns a DataFrame in which each column corresponds to a q value and each row to a patient.
Input: DataFrame of frequencies per patient and range of q values
Output: DataFrame of diversity scores across the range of q, per patient
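The patients-by-q layout that calcOverQ returns can be illustrated with a plain matrix. In this sketch, diversity_over_q and hill are hypothetical names, and a Dict of per-patient counts stands in for the input DataFrame.

```julia
# Hill number of order q; q == 1 uses its limit, exp(Shannon entropy).
function hill(p::AbstractVector{<:Real}, q::Real)
    p = p[p .> 0]
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))
    return sum(p .^ q)^(1 / (1 - q))
end

# Hypothetical analogue of calcOverQ: rows are patients, columns are q values.
function diversity_over_q(counts_by_patient::Dict{String,<:AbstractVector{<:Real}},
                          rangeQ::AbstractVector{<:Real})
    patients = sort(collect(keys(counts_by_patient)))
    M = zeros(length(patients), length(rangeQ))
    for (r, patient) in enumerate(patients)
        counts = counts_by_patient[patient]
        p = counts ./ sum(counts)
        for (c, q) in enumerate(rangeQ)
            M[r, c] = hill(p, q)
        end
    end
    return patients, M
end

patients, M = diversity_over_q(Dict("P1" => [3, 1], "P2" => [1, 1, 1]), [0.5, 1.0, 2.0])
```

Each row is non-increasing in q, and a uniform patient (P2) has the same value, its number of types, in every column.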
OncoDiversity.countframe — Method
countframe(a::CDR3SeqAnalysis, typeDF::DataFrame)
Counts the occurrences of each CDR3 per patient, after checking that the patient and CDR3 columns are present in the DataFrame, and returns the counts as a DataFrame.
Input: CDR3SeqAnalysis struct and DataFrame of CDR3 recoveries
Output: DataFrame
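The counting step can be sketched on a plain vector of records. The field names :patient and :cdr3 and the function count_cdr3 are hypothetical; this sketch signals a missing column with an ArgumentError, whereas the documented function checks columns via assertions.

```julia
# Hypothetical countframe-style counter: (patient, CDR3) => number of occurrences.
function count_cdr3(records::AbstractVector{<:NamedTuple})
    # check that the required columns are present before counting
    for col in (:patient, :cdr3)
        all(r -> haskey(r, col), records) ||
            throw(ArgumentError("missing column $col"))
    end
    counts = Dict{Tuple{String,String},Int}()
    for r in records
        key = (r.patient, r.cdr3)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

recs = [(patient = "P1", cdr3 = "CASS"),
        (patient = "P1", cdr3 = "CASS"),
        (patient = "P2", cdr3 = "CAW")]
c = count_cdr3(recs)
```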
OncoDiversity.countframefull — Method
countframefull(a::CDR3SeqAnalysis, df::DataFrame)
Merges df, a DataFrame of counts, with all the physicochemical information associated with the CDR3 sequences.
Input: CDR3SeqAnalysis struct and DataFrame of counts
Output: DataFrame
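The merge can be pictured as a join keyed on the CDR3 sequence. This sketch assumes an inner join (counts without physicochemical data are dropped), which the docstring does not specify; merge_properties and the property fields cdr3_length and hydrophobicity are hypothetical, with illustrative values.

```julia
# Hypothetical join of (patient, CDR3) counts with per-CDR3 properties.
function merge_properties(counts::Dict{Tuple{String,String},Int},
                          props::Dict{String,<:NamedTuple})
    merged = NamedTuple[]
    for ((patient, cdr3), n) in sort(collect(counts); by = first)
        haskey(props, cdr3) || continue   # inner join: skip sequences without properties
        push!(merged, merge((patient = patient, cdr3 = cdr3, count = n), props[cdr3]))
    end
    return merged
end

counts = Dict(("P1", "CASSL") => 2, ("P2", "CAW") => 1)
props = Dict("CASSL" => (cdr3_length = 5, hydrophobicity = 0.4))   # illustrative values
rows = merge_properties(counts, props)
```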
OncoDiversity.diversity — Method
diversity(seq, q)
Computes the generalized diversity index (Hill number) of order q from the type frequencies of the dataset:
\[ {}^qD = \left( \sum_{i=1}^{n} p_i^q \right)^{1/(1-q)} \]
where p_i is the frequency of type i, q is the order of diversity, and the sum runs over all n types present in the dataset.
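The formula divides by 1 - q, so it is undefined at q = 1; the standard limit there is the exponential of the Shannon entropy. A minimal sketch, with hill_diversity as a hypothetical name (the package's diversity(seq, q) may handle q = 1 differently):

```julia
# Generalized (Hill) diversity of order q from type frequencies p_i.
function hill_diversity(p::AbstractVector{<:Real}, q::Real)
    p = p[p .> 0]                         # sum only over types present in the dataset
    isapprox(sum(p), 1; atol = 1e-8) || throw(ArgumentError("frequencies must sum to 1"))
    if isapprox(q, 1)
        return exp(-sum(p .* log.(p)))    # limit as q -> 1: exp(Shannon entropy)
    end
    return sum(p .^ q)^(1 / (1 - q))
end

p = [0.5, 0.25, 0.25]
profile = [hill_diversity(p, q) for q in (0, 1, 2)]
```

At q = 0 this counts the types present (3 here), and at q = 2 it is the inverse Simpson index, 1/0.375.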
OncoDiversity.filteredDataset — Method
filteredDataset(::Type{CDR3SeqData}, df::DataFrame, receptorname::Vector{String})
Subsets the dataset to the receptor types of interest, allowing users to select combinations of types.
Input: CDR3SeqData type, a DataFrame of counts, and a vector of strings that match receptor types in the dataset
Output: DataFrame
Example: df_filtered = filteredDataset(CDR3SeqData, df, ["TRA","TRB"])
OncoDiversity.filteredDataset — Method
filteredDataset(::Type{CDR3SeqData}, df::DataFrame, receptorname::String)
Subsets the dataset to a single receptor type of interest.
Input: CDR3SeqData type, a DataFrame of counts, and a string that matches a receptor type in the dataset
Output: DataFrame
Example: df_filtered = filteredDataset(CDR3SeqData, df, "TRA")
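The two filteredDataset methods differ only in whether one receptor name or several are passed, which maps naturally onto Julia's multiple dispatch. In this sketch, filter_receptor and the :receptor field are hypothetical, and records stand in for the DataFrame; the single-string method simply forwards to the vector method.

```julia
# Combination of receptor types: keep rows whose receptor is in the list.
filter_receptor(rows::AbstractVector{<:NamedTuple},
                receptorname::AbstractVector{<:AbstractString}) =
    filter(r -> r.receptor in receptorname, rows)

# Single receptor type: wrap it in a one-element vector and reuse the method above.
filter_receptor(rows::AbstractVector{<:NamedTuple}, receptorname::AbstractString) =
    filter_receptor(rows, [receptorname])

rows = [(cdr3 = "CAVR", receptor = "TRA"),
        (cdr3 = "CASS", receptor = "TRB"),
        (cdr3 = "CAWS", receptor = "TRG")]

tra_only = filter_receptor(rows, "TRA")
alpha_beta = filter_receptor(rows, ["TRA", "TRB"])
```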
OncoDiversity.findInflection — Method
findInflection(q::Array, qD::Array)
Numerically computes the inflection point of the diversity profile qD as a function of q.
OncoDiversity.patientdiversity — Method
patientdiversity(countframe::DataFrame,q::Real)
Computes a DataFrame of diversity scores per patient from a countframe DataFrame and a single value of q.
Input: DataFrame and a real number q
Output: DataFrame
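For a single q, the per-patient computation reduces to one Hill number per patient. A sketch with hypothetical names (patient_diversity, hill), using a Dict of counts in place of the countframe DataFrame:

```julia
# Hill number of order q; q == 1 uses its limit, exp(Shannon entropy).
function hill(p::AbstractVector{<:Real}, q::Real)
    p = p[p .> 0]
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))
    return sum(p .^ q)^(1 / (1 - q))
end

# Hypothetical analogue of patientdiversity: patient => qD at a single q.
function patient_diversity(counts_by_patient::Dict{String,<:AbstractVector{<:Real}}, q::Real)
    return Dict(patient => hill(counts ./ sum(counts), q)
                for (patient, counts) in counts_by_patient)
end

d = patient_diversity(Dict("P1" => [1, 1], "P2" => [3, 1]), 2)
```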
OncoDiversity.reportInflection — Method
reportInflection(q::Array, qD::Array)
Computes the inflection point and the magnitude of the slope at that point, using the findInflection and findInflectionLocal functions.
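The findInflection and reportInflection docstrings do not state the numerical criterion. As a minimal sketch, assuming the profile is examined against log10(q) (the axis on which diversity profiles over q = 0.01 to 100 are commonly plotted) and the inflection is taken where the magnitude of the first derivative peaks, the pair could look like the following. find_inflection and report_inflection are hypothetical names; the package's findInflection and findInflectionLocal may work differently.

```julia
# Locate the inflection of qD(q): extremum of d qD / d log10(q).
function find_inflection(q::AbstractVector{<:Real}, qD::AbstractVector{<:Real})
    length(q) == length(qD) || throw(ArgumentError("q and qD must have equal length"))
    length(q) >= 3 || throw(ArgumentError("need at least three points"))
    x = log10.(q)
    slopes = diff(qD) ./ diff(x)                 # forward differences
    mids = (x[1:end-1] .+ x[2:end]) ./ 2         # each slope sits at its interval midpoint
    i = argmax(abs.(slopes))                     # steepest point = inflection
    return 10^mids[i], i, slopes
end

# reportInflection-style output: q at the inflection and |slope| there.
function report_inflection(q, qD)
    q_ip, i, slopes = find_inflection(q, qD)
    return (q_inflection = q_ip, slope_magnitude = abs(slopes[i]))
end

# Synthetic sigmoid in log10(q) with its inflection at q = 1 and slope -4.5 there.
xs = range(-2, 2, length = 401)
qs = collect(10 .^ xs)
prof = 1 .+ 9 ./ (1 .+ exp.(2 .* xs))
r = report_inflection(qs, prof)   # q_inflection ≈ 1, slope_magnitude ≈ 4.5
```

Taking the peak of the first derivative avoids the noise that sign changes of a numerical second derivative can pick up on a finely sampled profile.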
OncoDiversity.requirecol — Method
requirecol(df::DataFrame, colname::AbstractString)
Raises an assertion error if colname is not a column of df.
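The same check can be sketched on a NamedTuple of column vectors standing in for a DataFrame (with DataFrames.jl, the column names would come from names(df)). require_column is a hypothetical name; like the documented function, it raises an AssertionError when the column is missing.

```julia
# Hypothetical requirecol analogue on a NamedTuple of columns.
function require_column(table::NamedTuple, colname::AbstractString)
    @assert Symbol(colname) in keys(table) "column \"$colname\" not found"
    return nothing
end

table = (patient = ["P1", "P2"], cdr3 = ["CASS", "CAW"])
require_column(table, "patient")   # passes silently
```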
OncoDiversity.summarize — Method
summarize(df::CDR3SeqData)
Computes a per-patient summary of the CDR3 sequences recovered in the dataset: the total number of recoveries; the minimum, maximum, average, and median number of recoveries per sequence; and the number of unique CDR3 sequences recovered.
Input: CDR3SeqData struct
Output: DataFrame
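The per-patient statistics listed for summarize can be sketched over a Dict of (patient, CDR3) => count pairs, using the Statistics standard library for mean and median. summarize_counts and its output field names are hypothetical; because each (patient, CDR3) key is unique, the number of entries per patient is the number of unique CDR3 sequences.

```julia
using Statistics

# Hypothetical summarize analogue: one summary record per patient.
function summarize_counts(counts::Dict{Tuple{String,String},Int})
    byp = Dict{String,Vector{Int}}()
    for ((patient, _), n) in counts
        push!(get!(byp, patient, Int[]), n)   # recoveries per sequence, grouped by patient
    end
    return [(patient = p, total = sum(v),
             min_count = minimum(v), max_count = maximum(v),
             mean_count = mean(v), median_count = median(v),
             unique_cdr3 = length(v))
            for (p, v) in sort(collect(byp); by = first)]
end

s = summarize_counts(Dict(("P1", "CASS") => 4, ("P1", "CAW") => 2, ("P2", "CAV") => 1))
```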