Analysis Methods

Non-statistical Analysis Method

cellphonedb.src.core.methods.cpdb_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) → dict

Non-statistical method for analysis

This method calculates the mean and percent for the cluster interactions and for each gene interaction. Neither shuffling nor DEGs are involved.
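As a rough illustration of the two quantities named above, the sketch below computes, for one gene, the mean expression and the fraction of expressing cells in each cluster, and then combines a ligand and a receptor into a single value for a cluster pair. It is a pure-Python toy and not CellphoneDB's implementation: the data, the helper name cluster_stats, and the rules that the pair mean is the average of the two cluster means and that both fractions must reach the threshold are assumptions made for illustration.

```python
from statistics import mean


def cluster_stats(expression, labels):
    """Return {cluster: (mean expression, fraction of cells with expression > 0)}."""
    by_cluster = {}
    for value, label in zip(expression, labels):
        by_cluster.setdefault(label, []).append(value)
    return {
        label: (mean(values), sum(v > 0 for v in values) / len(values))
        for label, values in by_cluster.items()
    }


# Toy data (hypothetical): expression of one ligand and one receptor in 6 cells.
labels = ["A", "A", "A", "B", "B", "B"]
ligand = [2.0, 0.0, 4.0, 0.0, 0.0, 0.0]
receptor = [0.0, 0.0, 0.0, 1.0, 3.0, 2.0]

ligand_stats = cluster_stats(ligand, labels)
receptor_stats = cluster_stats(receptor, labels)

threshold = 0.1  # same default as the `threshold` parameter
lig_mean, lig_pct = ligand_stats["A"]
rec_mean, rec_pct = receptor_stats["B"]

# Assumption: the value for the cluster pair A|B is the average of the two
# cluster means, and the pair only counts as expressed when both fractions
# reach the threshold.
pair_mean = (lig_mean + rec_mean) / 2
pair_expressed = lig_pct >= threshold and rec_pct >= threshold
```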

Parameters:

cpdb_file_path (str) – CellphoneDB database file path
meta_file_path (str) – Path to metadata csv file
counts_file_path (str) – Path to counts csv file
counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)
microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions
separator (str) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
threshold (float) – Fraction of cells expressing the specific ligand/receptor [0.0 - 1.0]
result_precision (int) – Number of decimal digits in results.
debug (bool) – Store intermediate data as a pickle file (debug_intermediate.pkl).
output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)
score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
threads (int) – Number of threads to be used when scoring interactions

Returns:

Dict with the following keys:

means_result
deconvoluted_result
deconvoluted_percents
interaction_scores_dict

Return type:

dict
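A call to this method might look like the sketch below. The wrapper run_basic_analysis, the helper missing_keys, and the file names in the trailing comment are hypothetical; the keyword arguments come from the signature above, and the key names are copied from the Returns list on this page rather than checked against any particular installed version. The import sits inside the function so that the snippet can be loaded without cellphonedb installed.

```python
# Key names copied from the "Returns" list on this page; treat them as
# documentation-derived rather than verified against an installed version.
DOCUMENTED_KEYS = (
    "means_result",
    "deconvoluted_result",
    "deconvoluted_percents",
    "interaction_scores_dict",
)


def run_basic_analysis(cpdb_file_path, meta_file_path, counts_file_path, output_path):
    """Run the non-statistical method and return its result dict."""
    # Imported lazily so this module loads even when cellphonedb is absent.
    from cellphonedb.src.core.methods import cpdb_analysis_method

    return cpdb_analysis_method.call(
        cpdb_file_path=cpdb_file_path,
        meta_file_path=meta_file_path,
        counts_file_path=counts_file_path,
        counts_data="hgnc_symbol",  # one of the identifier types listed above
        output_path=output_path,
        separator="|",
        threshold=0.1,
        result_precision=3,
    )


def missing_keys(results, expected=DOCUMENTED_KEYS):
    """Return the documented key names that are absent from a result dict."""
    return [key for key in expected if key not in results]


# Example with hypothetical file names (needs cellphonedb and real input files):
# results = run_basic_analysis("cellphonedb.zip", "meta.csv", "counts.csv", "out/")
# print(missing_keys(results))
```

Checking the returned keys before indexing into the dict is a cheap guard, since some entries (for example interaction scores) may only be filled in when the matching option is enabled.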

Statistical Analysis Method

cellphonedb.src.core.methods.cpdb_statistical_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, iterations: int = 1000, threshold: float = 0.1, threads: int = 4, debug_seed: int = -1, result_precision: int = 3, pvalue: float = 0.05, subsampling=False, subsampling_log=False, subsampling_num_pc=100, subsampling_num_cells=None, separator: str = '|', debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False) → dict

Statistical method for analysis

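This method calculates the mean and percent for the cluster interactions and for each gene interaction. Cell type labels are then shuffled across cells (see iterations) to determine which ligand/receptor expression means are statistically significant. No DEGs are involved.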

Parameters:

cpdb_file_path (str) – CellphoneDB database file path
meta_file_path (str) – Path to metadata csv file
counts_file_path (str) – Path to counts csv file
counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)
microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions
active_tfs_file_path (str, optional) – Path to active TFs. Its content is used to limit cluster interactions.
iterations (int) – Number of times cell type labels will be shuffled across cells in order to determine statistically significant ligand/receptor expression means.
threshold (float) – Fraction of cells expressing the specific ligand/receptor [0.0 - 1.0]
threads (int) – Number of threads to be used when shuffling clusters/cell types across cells, and when scoring interactions if score_interactions is True
debug_seed (int) – Used for testing only, and only in single-threaded mode (see: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do).
result_precision (int) – Number of decimal digits in results.
pvalue (float) – A p-value below which a ligand/receptor expression mean is considered to be statistically significant.
subsampling (bool) – Enable subsampling
subsampling_log (bool) – Enable log1p transformation for subsampling when the input data is not log-transformed (mandatory)
subsampling_num_pc (int) – Number of principal components (PCs) to use when subsampling [default: 100]
subsampling_num_cells (int) – Number of cells to subsample to [default: 1/3 of cells]
separator (str) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
debug (bool) – Store intermediate data as a pickle file (debug_intermediate.pkl).
output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)
score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
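The iterations and pvalue parameters describe a label-shuffling test. A minimal sketch of that idea, using only the standard library, is given below. It is not CellphoneDB's code: the function names, the toy data, the use of a one-sided empirical p-value (the share of shufflings whose pair mean is at least the observed one), and the seed argument standing in for debug_seed are all assumptions made for illustration.

```python
import random
from statistics import mean


def pair_mean(ligand, receptor, labels, cluster_a, cluster_b):
    """Average of the ligand mean in cluster_a and the receptor mean in cluster_b."""
    lig = [v for v, lab in zip(ligand, labels) if lab == cluster_a]
    rec = [v for v, lab in zip(receptor, labels) if lab == cluster_b]
    return (mean(lig) + mean(rec)) / 2


def permutation_pvalue(ligand, receptor, labels, cluster_a, cluster_b,
                       iterations=1000, seed=None):
    """Share of label shufflings whose pair mean is at least the observed one."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    observed = pair_mean(ligand, receptor, labels, cluster_a, cluster_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        # Shuffling keeps cluster sizes unchanged, so no cluster becomes empty.
        rng.shuffle(shuffled)
        if pair_mean(ligand, receptor, shuffled, cluster_a, cluster_b) >= observed:
            hits += 1
    return hits / iterations


# Toy data (hypothetical): the ligand is specific to cluster A,
# the receptor is specific to cluster B.
labels = ["A"] * 10 + ["B"] * 10
ligand = [5.0] * 10 + [0.0] * 10
receptor = [0.0] * 10 + [5.0] * 10

p = permutation_pvalue(ligand, receptor, labels, "A", "B", iterations=200, seed=0)
is_significant = p < 0.05  # same default as the `pvalue` parameter
```

With more iterations the empirical p-value has finer resolution (its smallest non-zero value is 1/iterations), at the cost of run time.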

Returns:

Dict with the following keys:

deconvoluted
deconvoluted_percents
means
pvalues
significant_means
interaction_scores

Return type:

dict
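A call to the statistical method might be set up as in the sketch below. The helper names and the file names in the trailing comment are hypothetical. The helper forces threads to 1 whenever a seed is supplied, following the note that debug_seed applies in single-threaded mode only; treating the default of -1 as "no seed" is an assumption based on the signature, not something this page states.

```python
def statistical_kwargs(cpdb_file_path, meta_file_path, counts_file_path,
                       output_path, debug_seed=-1, threads=4):
    """Build keyword arguments for cpdb_statistical_analysis_method.call."""
    if debug_seed >= 0:
        # Assumption: a non-negative seed requests a reproducible run, which
        # the debug_seed description restricts to single-threaded mode.
        threads = 1
    return dict(
        cpdb_file_path=cpdb_file_path,
        meta_file_path=meta_file_path,
        counts_file_path=counts_file_path,
        counts_data="hgnc_symbol",
        output_path=output_path,
        iterations=1000,
        threshold=0.1,
        pvalue=0.05,
        threads=threads,
        debug_seed=debug_seed,
    )


def run_statistical_analysis(**kwargs):
    """Run the statistical method; the import is lazy so this file loads without cellphonedb."""
    from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

    return cpdb_statistical_analysis_method.call(**kwargs)


# Example with hypothetical file names (needs cellphonedb and real input files):
# kwargs = statistical_kwargs("cellphonedb.zip", "meta.csv", "counts.csv", "out/")
# results = run_statistical_analysis(**kwargs)
# significant = results["significant_means"]
```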

Differentially Expressed Genes (DEGs) Analysis Method

cellphonedb.src.core.methods.cpdb_degs_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, degs_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) → dict

Differentially Expressed Genes (DEGs) analysis

This analysis bypasses the statistical analysis described above, in which p-values for the means are computed using a permutation approach. Instead of deriving p-values from a reshuffling strategy to identify relevant means (that is, the mean expression of a ligand/receptor in a cell-cell pair), relevant interactions are identified from a list of differentially expressed genes (DEGs) provided by the user and computed from their counts matrix.

Parameters:

cpdb_file_path (str) – CellphoneDB database file path
meta_file_path (str) – Path to metadata csv file
counts_file_path (str) – Path to counts csv file
degs_file_path (str) – Path to differential expression csv file
counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)
microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions
active_tfs_file_path (str, optional) – Path to active TFs. Its content is used to limit cluster interactions.
separator (str, optional) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
threshold (float, optional) – Fraction of cells expressing the specific ligand/receptor [0.0 - 1.0]
result_precision (int, optional) – Number of decimal digits in results.
debug (bool, optional) – Store intermediate data as a pickle file (debug_intermediate.pkl).
output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)
score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
threads (int) – Number of threads to be used when scoring interactions

Returns:

deconvoluted_result
deconvoluted_percents
means_result
relevant_interactions_result
significant_means
interaction_scores_dict

Return type:

Dict
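To make the DEG-based selection more concrete, the toy sketch below marks an interaction as relevant for a cluster pair. The rule it applies (both partners expressed in at least threshold of the cells of their cluster, and at least one partner listed as a DEG for its cluster) is an assumption chosen for illustration and is not confirmed by this page; the gene names, cluster names, and the is_relevant helper are likewise hypothetical.

```python
def is_relevant(ligand, receptor, cluster_a, cluster_b, degs, percents, threshold=0.1):
    """Return True if the ligand/receptor pair is relevant for cluster_a|cluster_b.

    degs     -- set of (cluster, gene) pairs supplied by the user
    percents -- {(cluster, gene): fraction of cells in the cluster expressing the gene}
    """
    # Assumed rule, part 1: both partners are expressed in enough cells.
    both_expressed = (
        percents.get((cluster_a, ligand), 0.0) >= threshold
        and percents.get((cluster_b, receptor), 0.0) >= threshold
    )
    # Assumed rule, part 2: at least one partner is a DEG in its own cluster.
    any_deg = (cluster_a, ligand) in degs or (cluster_b, receptor) in degs
    return both_expressed and any_deg


# Toy inputs (hypothetical gene and cluster names).
degs = {("A", "LIG1")}
percents = {
    ("A", "LIG1"): 0.60,
    ("B", "REC1"): 0.40,
    ("B", "LIG1"): 0.50,
    ("A", "REC1"): 0.05,
}

relevant_ab = is_relevant("LIG1", "REC1", "A", "B", degs, percents)
relevant_ba = is_relevant("LIG1", "REC1", "B", "A", degs, percents)
```

Because no shuffling is involved, a check of this kind is a single pass over the interactions and cluster pairs, and the direction of the cluster pair matters (A|B and B|A are evaluated separately).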