Analysis Methods

Non-statistical Analysis Method

cellphonedb.src.core.methods.cpdb_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) → dict

Non-statistical method for analysis

This method calculates the mean and percent for the cluster interactions and for each gene interaction. Neither shuffling nor DEGs are involved.

Parameters:
  • cpdb_file_path (str) – CellphoneDB database file path

  • meta_file_path (str) – Path to metadata csv file

  • counts_file_path (str) – Path to counts csv file

  • counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”

  • output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)

  • microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions

  • separator (str) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).

  • threshold (float) – Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]

  • result_precision (int) – Number of decimal digits in results.

  • debug (bool) – Store intermediate data as a pickle file (debug_intermediate.pkl).

  • output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)

  • score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict

  • threads (int) – Number of threads to be used when scoring interactions

Returns:

Dict with the following keys:

  • means_result

  • deconvoluted_result

  • deconvoluted_percents

  • interaction_scores_dict

Return type:

dict
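A minimal sketch of how this method might be invoked, assuming the cellphonedb package is installed. The keyword names are taken from the signature above; every file path is a hypothetical placeholder, and the helper names build_kwargs and run_analysis are illustrative only. The import is deferred so that the argument dictionary can be inspected even where the package is unavailable.

```python
def build_kwargs(output_path: str) -> dict:
    """Collect keyword arguments for cpdb_analysis_method.call.

    All file paths below are hypothetical placeholders.
    """
    return {
        "cpdb_file_path": "path/to/cellphonedb_database_file",  # placeholder
        "meta_file_path": "path/to/metadata.csv",               # placeholder
        "counts_file_path": "path/to/counts.csv",               # placeholder
        "counts_data": "hgnc_symbol",   # or "ensembl" / "gene_name"
        "output_path": output_path,
        "separator": "|",
        "threshold": 0.1,               # fraction of cells, within [0.0, 1.0]
        "result_precision": 3,
        "score_interactions": True,
        "threads": 4,
    }


def run_analysis(kwargs: dict) -> dict:
    """Run the non-statistical analysis (requires cellphonedb to be installed)."""
    from cellphonedb.src.core.methods import cpdb_analysis_method

    return cpdb_analysis_method.call(**kwargs)


kwargs = build_kwargs("results/")
# Uncomment once the placeholder paths point at real files:
# results = run_analysis(kwargs)
# means = results["means_result"]  # key name as listed under Returns
```

Keeping the arguments in a plain dictionary makes it straightforward to log or reuse the same configuration across the other analysis methods, which share most of these parameters.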

Statistical Analysis Method

cellphonedb.src.core.methods.cpdb_statistical_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, iterations: int = 1000, threshold: float = 0.1, threads: int = 4, debug_seed: int = -1, result_precision: int = 3, pvalue: float = 0.05, subsampling=False, subsampling_log=False, subsampling_num_pc=100, subsampling_num_cells=None, separator: str = '|', debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False) → dict

Statistical method for analysis

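This method calculates the mean and percent for the cluster interactions and for each gene interaction. Cell type labels are then shuffled across cells (see iterations) to determine which ligand/receptor expression means are statistically significant.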

Parameters:
  • cpdb_file_path (str) – CellphoneDB database file path

  • meta_file_path (str) – Path to metadata csv file

  • counts_file_path (str) – Path to counts csv file

  • counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”

  • output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)

  • microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions

  • active_tfs_file_path (str, optional) – Path to active TFs. Its content is used to limit cluster interactions.

  • iterations (int) – Number of times cell type labels will be shuffled across cells in order to determine statistically significant ligand/receptor expression means.

  • threshold (float) – Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]

  • threads (int) – Number of threads to be used during the shuffling of clusters/cell types across cells, and in scoring interactions if score_interactions is set to True

  • debug_seed (int) – Used for testing only, and only in single-threaded mode (see: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do).

  • result_precision (int) – Number of decimal digits in results.

  • pvalue (float) – A p-value below which a ligand/receptor expression mean is considered to be statistically significant.

  • subsampling (bool) – Enable subsampling

  • subsampling_log (bool) – Enable log1p transformation during subsampling for input data that is not log-transformed (mandatory)

  • subsampling_num_pc (int) – Number of PCs to use when subsampling (default: 100)

  • subsampling_num_cells (int) – Number of cells to subsample to (default: 1/3 of cells)

  • separator (str) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).

  • debug (bool) – Store intermediate data as a pickle file (debug_intermediate.pkl).

  • output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)

  • score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
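The role of iterations and pvalue can be illustrated with a toy permutation test. The following is a simplified sketch, not CellphoneDB's implementation: it uses a single gene and a single cluster rather than ligand/receptor means over cluster pairs, and the function name permutation_pvalue and the toy data are made up for illustration.

```python
import random
from statistics import mean


def permutation_pvalue(values, labels, cluster, iterations=1000, seed=None):
    """Empirical p-value for the mean of `values` within `cluster`.

    Labels are shuffled across cells `iterations` times; the p-value is the
    fraction of shuffles whose cluster mean is at least the observed mean.
    """
    rng = random.Random(seed)
    observed = mean(v for v, lab in zip(values, labels) if lab == cluster)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # reassign cluster labels to cells at random
        null_mean = mean(v for v, lab in zip(values, shuffled) if lab == cluster)
        if null_mean >= observed:
            hits += 1
    return hits / iterations


# Toy data: one gene, six cells, two clusters (values invented for illustration).
expression = [5.0, 6.0, 7.0, 0.0, 1.0, 0.5]
cell_labels = ["A", "A", "A", "B", "B", "B"]

p_a = permutation_pvalue(expression, cell_labels, "A", iterations=1000, seed=0)
p_b = permutation_pvalue(expression, cell_labels, "B", iterations=1000, seed=0)
```

In this toy data, cluster A holds the three highest values, so only 1 of the 20 possible label assignments reaches its observed mean and p_a should land near 0.05. Cluster B holds the three lowest values, so every shuffle matches or exceeds its mean. More iterations give a finer-grained estimate at the cost of run time.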

Returns:

Dict with the following keys:

  • deconvoluted

  • deconvoluted_percents

  • means

  • pvalues

  • significant_means

  • interaction_scores

Return type:

dict
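A hedged sketch of calling the statistical method with a few basic sanity checks applied first. The checks only encode ranges stated or implied by the parameter descriptions (threshold within [0.0, 1.0], pvalue as a probability, at least one iteration); validate_stat_kwargs and run_statistical_analysis are hypothetical helper names, and all file paths are placeholders.

```python
def validate_stat_kwargs(kwargs: dict) -> None:
    """Raise ValueError if an argument falls outside its documented range."""
    if not 0.0 <= kwargs.get("threshold", 0.1) <= 1.0:
        raise ValueError("threshold must lie in [0.0, 1.0]")
    if not 0.0 < kwargs.get("pvalue", 0.05) < 1.0:
        raise ValueError("pvalue must lie strictly between 0 and 1")
    if kwargs.get("iterations", 1000) < 1:
        raise ValueError("iterations must be a positive integer")


def run_statistical_analysis(kwargs: dict) -> dict:
    """Validate, then run the analysis (requires cellphonedb to be installed)."""
    from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

    validate_stat_kwargs(kwargs)
    return cpdb_statistical_analysis_method.call(**kwargs)


stat_kwargs = {
    "cpdb_file_path": "path/to/cellphonedb_database_file",  # placeholder
    "meta_file_path": "path/to/metadata.csv",               # placeholder
    "counts_file_path": "path/to/counts.csv",               # placeholder
    "counts_data": "hgnc_symbol",
    "output_path": "results/",                               # placeholder
    "iterations": 1000,   # number of label shuffles
    "threshold": 0.1,
    "threads": 4,
    "pvalue": 0.05,
    "score_interactions": False,
}

validate_stat_kwargs(stat_kwargs)
# Uncomment once the placeholder paths point at real files:
# results = run_statistical_analysis(stat_kwargs)
# pvalues = results["pvalues"]  # key name as listed above
```

Validating before the call is a design choice for this sketch: a permutation run with many iterations can be slow, so catching an out-of-range argument early avoids a wasted run.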

Differentially Expressed Genes (DEGs) Analysis Method

cellphonedb.src.core.methods.cpdb_degs_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, degs_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) → dict

Differentially Expressed Genes (DEGs) analysis

This analysis bypasses the statistical analysis, in which p-values for means are computed using a permutation approach. Instead of deriving p-values from a reshuffling strategy to identify relevant means (i.e., the mean expression of a ligand/receptor in a cell-cell pair), relevant interactions are identified from a list of differentially expressed genes (DEGs) provided by the user and computed from their counts matrix.

Parameters:
  • cpdb_file_path (str) – CellphoneDB database file path

  • meta_file_path (str) – Path to metadata csv file

  • counts_file_path (str) – Path to counts csv file

  • degs_file_path (str) – Path to differential expression csv file

  • counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”

  • output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)

  • microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions

  • active_tfs_file_path (str, optional) – Path to active TFs. Its content is used to limit cluster interactions.

  • separator (str, optional) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).

  • threshold (float, optional) – Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]

  • result_precision (int, optional) – Number of decimal digits in results.

  • debug (bool, optional) – Store intermediate data as a pickle file (debug_intermediate.pkl).

  • output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)

  • score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict

  • threads (int) – Number of threads to be used when scoring interactions

Returns:

  • deconvoluted_result

  • deconvoluted_percents

  • means_result

  • relevant_interactions_result

  • significant_means

  • interaction_scores_dict

Return type:

Dict
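A short sketch of a DEGs run followed by a check of the returned dictionary against the keys listed under Returns. It assumes the cellphonedb package is installed. The helper names run_degs_analysis and missing_keys are hypothetical, the file paths are placeholders, and whether interaction_scores_dict is present when score_interactions is False is not stated above, so the check simply reports it.

```python
# Key names copied from the Returns list above.
EXPECTED_KEYS = [
    "deconvoluted_result",
    "deconvoluted_percents",
    "means_result",
    "relevant_interactions_result",
    "significant_means",
    "interaction_scores_dict",
]


def missing_keys(result: dict) -> list:
    """Return the documented keys that are absent from `result`."""
    return [key for key in EXPECTED_KEYS if key not in result]


def run_degs_analysis(kwargs: dict) -> dict:
    """Run the DEGs analysis (requires cellphonedb to be installed)."""
    from cellphonedb.src.core.methods import cpdb_degs_analysis_method

    return cpdb_degs_analysis_method.call(**kwargs)


degs_kwargs = {
    "cpdb_file_path": "path/to/cellphonedb_database_file",  # placeholder
    "meta_file_path": "path/to/metadata.csv",               # placeholder
    "counts_file_path": "path/to/counts.csv",               # placeholder
    "degs_file_path": "path/to/degs.csv",                   # placeholder
    "counts_data": "hgnc_symbol",
    "output_path": "results/",                               # placeholder
    "threshold": 0.1,
    "score_interactions": True,
    "threads": 4,
}

# Uncomment once the placeholder paths point at real files:
# results = run_degs_analysis(degs_kwargs)
# print("Missing keys:", missing_keys(results))
```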