OncoDiversity

OncoDiversity.CDR3SeqData - Type
CDR3SeqData{T,U}

Stores the two data tables that make up a CDR3 dataset: the per-(patient, sequence) counts and the physicochemical data for each sequence. These tables can be joined using DataFrame(ds::CDR3SeqData).

OncoDiversity.DiversityScores - Type
DiversityScores(countDF::DataFrame, rangeQ::AbstractVector, samplecol=:patient)

Computes point estimates of diversity from a countframe DataFrame, range of q values, and optional sample column identifier.

Point estimates of diversity of interest:

  • low q diversity (q=0.01)
  • high q diversity (q=100)
  • delta qD diversity = low q diversity - high q diversity
  • inflection point q
  • magnitude of slope at the inflection point
  • diversity qD evaluated at q=inflection point q
  • Shannon diversity (log qD at q=1)
  • Simpson diversity (1/qD at q=2)

Input: DataFrame of counts, vector of q values, optional sample column symbol (defaults to :patient) Output: DiversityScores struct
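The point estimates listed above can be illustrated on a single sample. The following is a minimal sketch in plain Julia, not the package's implementation: the helper `hill`, the toy `counts` vector, and the field names of `estimates` are all assumptions made for illustration, and the inflection-point estimates are left out.

```julia
# Hypothetical helper: Hill diversity of order q for a vector of counts.
function hill(p, q)
    p = p[p .> 0] ./ sum(p)                              # frequencies of the types present
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))     # limit q -> 1
    return sum(p .^ q)^(1 / (1 - q))
end

counts = [50, 30, 15, 5]          # toy counts for one sample (made up)

lowq  = hill(counts, 0.01)        # low q diversity
highq = hill(counts, 100)         # high q diversity

estimates = (
    lowq    = lowq,
    highq   = highq,
    delta   = lowq - highq,               # delta qD
    shannon = log(hill(counts, 1)),       # log qD at q=1
    simpson = 1 / hill(counts, 2),        # 1/qD at q=2
)
```

Because qD does not increase with q, `delta` is non-negative, and it is zero only when all types are equally frequent.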

OncoDiversity.calcIPdiversity - Method
calcIPdiversity(df::DataFrame, qrange::Array)

Computes diversity over a range of q values (qrange) from an input DataFrame of frequencies across types, then numerically determines the inflection point of the diversity curve and the magnitude of the slope at that point. Returns a DataFrame with one column for the patient name, one for the q at which the inflection point occurs, and one for the magnitude of the slope at the inflection point.

Input: DataFrame and array of q values Output: DataFrame of inflection point q and slope diversity per patient
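As a rough illustration of the numerical step, the sketch below evaluates qD on a grid and picks the interior grid point where the central-difference slope is largest in magnitude. On a smooth curve, an interior maximum of the slope magnitude coincides with an inflection point. This is a stand-in heuristic under that assumption and may differ from how the package locates the inflection point; `hill`, `steepest_point`, and the toy data are hypothetical.

```julia
# Hypothetical helper: Hill diversity of order q for a vector of counts.
function hill(p, q)
    p = p[p .> 0] ./ sum(p)
    isapprox(q, 1) && return exp(-sum(p .* log.(p)))     # limit q -> 1
    return sum(p .^ q)^(1 / (1 - q))
end

# Central-difference slope of D with respect to q at the interior grid points;
# returns the q with the steepest slope and the magnitude of that slope.
function steepest_point(qs, D)
    slopes = [(D[i+1] - D[i-1]) / (qs[i+1] - qs[i-1]) for i in 2:length(qs)-1]
    k = argmax(abs.(slopes))
    return (q = qs[k+1], slope = abs(slopes[k]))          # slopes[k] belongs to qs[k+1]
end

counts = [50, 30, 15, 5]                  # toy counts for one patient (made up)
qs = collect(0.01:0.01:10.0)              # grid of q values
D  = [hill(counts, q) for q in qs]        # diversity curve over the grid
ip = steepest_point(qs, D)
```

The result depends on the grid: a coarse or narrow qrange can place the steepest point at the edge of the grid, in which case it should not be read as an inflection point.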

OncoDiversity.calcOverQ - Method
calcOverQ(origDF::DataFrame,rangeQ::Array{Float64})

Computes the diversity score over a range of q values. Returns a DataFrame where each column is a different q value and each row is a different patient.

Input: DataFrame of frequencies per patient and range of q Output: DataFrame of diversity scores across the range of q per patient

OncoDiversity.countframe - Method
countframe(a::CDR3SeqAnalysis, typeDF::DataFrame)

Counts the number of occurrences of each CDR3 per patient, after checking that the patient and CDR3 columns are present in the DataFrame, and returns the counts as a DataFrame.

Input: CDR3SeqAnalysis struct and DataFrame of CDR3 recoveries Output: DataFrame
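The counting step amounts to tallying identical (patient, CDR3) pairs. The sketch below shows the idea in plain Julia with a dictionary instead of a DataFrame. The records are made-up toy data, and the names `records` and `counts` are hypothetical.

```julia
# Toy recoveries as (patient, CDR3) pairs; sequences are placeholders.
records = [("P1", "CASSLG"), ("P1", "CASSLG"), ("P1", "CAVRDS"), ("P2", "CASSLG")]

# Tally how often each (patient, CDR3) pair occurs.
counts = Dict{Tuple{String,String},Int}()
for rec in records
    counts[rec] = get(counts, rec, 0) + 1
end
```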

OncoDiversity.countframefull - Method
countframefull(a::CDR3SeqAnalysis, df::DataFrame)

Merges df, a DataFrame of counts, with all the physicochemical information associated with the CDR3 sequences.

Input: CDR3SeqAnalysis struct and DataFrame Output: DataFrame

OncoDiversity.diversity - Method
diversity(seq, q)

Computes the generalized diversity index for a given value of q and the associated frequencies of the dataset.

\[ {}^qD = \left( \sum_{i=1}^{n} p_i^q \right)^{1/(1-q)} \]

where p_i is the frequency of type i, q is the order of diversity, and the sum runs over all n types present in the dataset.
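The formula can be written in a few lines of plain Julia. This is a minimal sketch rather than the package's `diversity` method: `hill_diversity` is a hypothetical name, it accepts raw counts and normalizes them, and it handles q=1 (where the exponent 1/(1-q) is undefined) with the standard limit, the exponential of the Shannon entropy.

```julia
# Hill diversity of order q for a vector of non-negative counts or frequencies.
# `hill_diversity` is a hypothetical name used for illustration only.
function hill_diversity(counts::AbstractVector{<:Real}, q::Real)
    p = counts ./ sum(counts)      # normalize to frequencies
    p = p[p .> 0]                  # absent types contribute nothing
    if isapprox(q, 1)
        # Limit q -> 1: exponential of the Shannon entropy
        return exp(-sum(p .* log.(p)))
    end
    return sum(p .^ q)^(1 / (1 - q))
end

even   = hill_diversity([10, 10, 10, 10], 2)   # four equally common types
skewed = hill_diversity([97, 1, 1, 1], 2)      # one dominant type
```

For an even sample, qD equals the number of types at every q. For a sample dominated by one type, qD at q=2 drops toward 1.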

OncoDiversity.filteredDataset - Method
filteredDataset(::Type{CDR3SeqData}, df::DataFrame, receptorname::Vector{String})

Subsets the dataset based on receptor types of interest, allowing users to subset based on combinations of types.

Input: CDR3SeqData type, a DataFrame of counts, and a vector of strings that match the types of the dataset Output: DataFrame

Example: df_filtered = filteredDataset(CDR3SeqData, df, ["TRA","TRB"])

OncoDiversity.filteredDataset - Method
filteredDataset(::Type{CDR3SeqData}, df::DataFrame, receptorname::String)

Subsets the dataset based on a single receptor type of interest.

Input: CDR3SeqData type, a DataFrame of counts, and a string that matches the types of the dataset Output: DataFrame

Example: df_filtered = filteredDataset(CDR3SeqData, df, "TRA")

OncoDiversity.patientdiversity - Method
patientdiversity(countframe::DataFrame,q::Real)

Computes a DataFrame of diversity scores across patients from a countframe DataFrame and a single value of q.

Input: DataFrame and a real number Output: DataFrame

OncoDiversity.reportInflection - Method
reportInflection(q::Array, qD::Array)

Computes the inflection point and the magnitude of the slope at that point using findInflection and findInflectionLocal functions.

OncoDiversity.requirecol - Method
requirecol(df::DataFrame, colname::AbstractString)

Raises an assertion error if colname is not a column of df.

OncoDiversity.summarize - Method
summarize(df::CDR3SeqData)

Computes a per-patient summary of the CDR3 sequences recovered in the dataset: the total number of recoveries; the minimum, maximum, average, and median number of recoveries per sequence; and the number of unique CDR3 recoveries.

Input: CDR3SeqData struct Output: DataFrame
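For a single patient, the summary statistics can be sketched as follows. This is only an illustration in plain Julia with the Statistics standard library, not the package's `summarize` method. The `recoveries` vector is made-up toy data holding one count per unique sequence, and the field names of `stats` are hypothetical.

```julia
using Statistics

# Toy per-sequence recovery counts for one patient (one entry per unique CDR3).
recoveries = [12, 3, 3, 1, 1]

stats = (
    total  = sum(recoveries),        # total number of recoveries
    min    = minimum(recoveries),    # fewest recoveries of any sequence
    max    = maximum(recoveries),    # most recoveries of any sequence
    mean   = mean(recoveries),       # average recoveries per sequence
    median = median(recoveries),     # median recoveries per sequence
    unique = length(recoveries),     # number of unique CDR3 sequences
)
```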
