Analysis Methods
Non-statistical Analysis Method
- cellphonedb.src.core.methods.cpdb_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) -> dict
Non-statistical method for analysis
This method calculates the mean and percent for the cluster interactions and for each gene interaction. Neither shuffling nor DEGs are involved.
- Parameters:
cpdb_file_path (str) – CellphoneDB database file path
meta_file_path (str) – Path to metadata csv file
counts_file_path (str) – Path to counts csv file
counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)
microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions
separator (str) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
threshold (float) – Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]
result_precision (int) – Number of decimal digits in results.
debug (bool) – Store intermediate data as a pickle file (debug_intermediate.pkl).
output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)
score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
threads (int) – Number of threads to be used when scoring interactions
- Returns:
Dict with the following keys:
means_result
deconvoluted_result
deconvoluted_percents
interaction_scores_dict
- Return type:
dict
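A call to this method might be set up as in the sketch below. The parameter names and defaults come from the signature above; the file paths are hypothetical placeholders, and the helper `run_basic_analysis` is not part of CellphoneDB. It only exists so the example skips the call when the package or the input files are missing.

```python
import importlib
from pathlib import Path

# Hypothetical paths: replace them with your own files.
kwargs = dict(
    cpdb_file_path="cellphonedb.zip",
    meta_file_path="metadata.csv",
    counts_file_path="counts.csv",
    counts_data="hgnc_symbol",   # or "ensembl" / "gene_name"
    output_path="results",
    separator="|",
    threshold=0.1,
    result_precision=3,
    score_interactions=False,
)


def run_basic_analysis(params):
    """Illustrative helper: call the method only if inputs and package exist."""
    inputs = ("cpdb_file_path", "meta_file_path", "counts_file_path")
    if not all(Path(params[name]).is_file() for name in inputs):
        return None
    try:
        module = importlib.import_module(
            "cellphonedb.src.core.methods.cpdb_analysis_method"
        )
    except ImportError:
        return None
    return module.call(**params)


results = run_basic_analysis(kwargs)
if results is not None:
    # The returned dict is documented to hold the keys listed under "Returns".
    print(sorted(results.keys()))
```

Passing the arguments as a dict keeps one set of inputs reusable across the three analysis methods, which share most of their parameters.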
Statistical Analysis Method
- cellphonedb.src.core.methods.cpdb_statistical_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, iterations: int = 1000, threshold: float = 0.1, threads: int = 4, debug_seed: int = -1, result_precision: int = 3, pvalue: float = 0.05, subsampling=False, subsampling_log=False, subsampling_num_pc=100, subsampling_num_cells=None, separator: str = '|', debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False) -> dict
Statistical method for analysis
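This method calculates the mean and percent for the cluster interactions and for each gene interaction. Cell type labels are then shuffled across cells (see iterations) to determine which ligand/receptor expression means are statistically significant. No DEGs are involved.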
- cpdb_file_path: str
CellphoneDB database file path
- meta_file_path: str
Path to metadata csv file
- counts_file_path: str
Path to counts csv file
- counts_data: str
Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
- output_path: str
Output path used to store the analysis results (and to store intermediate files when debugging)
- microenvs_file_path: str, optional
Path to Micro-environment file. Its content is used to limit cluster interactions
- active_tfs_file_path: str, optional
Path to active TFs. Its content is used to limit cluster interactions.
- iterations: int
Number of times cell type labels will be shuffled across cells in order to determine statistically significant ligand/receptor expression means.
- threshold: float
Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]
- threads: int
Number of threads to be used when shuffling clusters/cell types across cells and, if score_interactions is set to True, when scoring interactions.
- debug_seed: int
Random seed, used for testing only and only in single-threaded mode (see: https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do).
- result_precision: int
Number of decimal digits in results.
- pvalue: float
A p-value below which a ligand/receptor expression mean is considered to be statistically significant.
- subsampling: bool
Enable subsampling
- subsampling_log: bool
Apply log1p during subsampling. Mandatory when the input data is not log-transformed.
- subsampling_num_pc: int
Number of principal components (PCs) to use when subsampling (default: 100)
- subsampling_num_cells: int
Number of cells to subsample to (default: 1/3 of cells)
- separator: str
Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
- debug: bool
Store intermediate data as a pickle file (debug_intermediate.pkl).
- output_suffix: str, optional
Suffix to append to the result file’s name (if not provided, timestamp will be used)
- score_interactions: bool
If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
- Dict with the following keys:
deconvoluted
deconvoluted_percents
means
pvalues
significant_means
interaction_scores
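The role of iterations and pvalue can be illustrated with a toy permutation test. This is a minimal sketch of label shuffling in general, not CellphoneDB's implementation: the function name, the toy data, and the choice to average the ligand mean in one cluster with the receptor mean in the other are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_pvalue(ligand, receptor, labels, cluster_a, cluster_b,
                       iterations=1000, seed=0):
    """Toy permutation test: shuffle cell labels and compare interaction means."""
    rng = random.Random(seed)

    def pair_mean(current_labels):
        # Mean ligand expression in cluster_a and receptor expression in cluster_b.
        lig = [v for v, lab in zip(ligand, current_labels) if lab == cluster_a]
        rec = [v for v, lab in zip(receptor, current_labels) if lab == cluster_b]
        return (mean(lig) + mean(rec)) / 2

    observed = pair_mean(labels)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # reassign labels; cluster sizes stay the same
        if pair_mean(shuffled) >= observed:
            at_least_as_high += 1
    # Fraction of shuffles whose mean reaches the observed mean.
    return observed, at_least_as_high / iterations


# Hypothetical data: six cells, ligand high in "T", receptor high in "B".
labels = ["T", "T", "T", "B", "B", "B"]
ligand = [5, 6, 5, 0, 0, 1]
receptor = [0, 1, 0, 4, 5, 4]
observed, p_value = permutation_pvalue(ligand, receptor, labels, "T", "B")
print(round(observed, 3), p_value)
```

In this sketch, an interaction would count as significant when the returned fraction falls below the chosen pvalue cutoff. More iterations give a finer-grained estimate at the cost of run time.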
Differentially Expressed Genes (DEGs) Analysis Method
- cellphonedb.src.core.methods.cpdb_degs_analysis_method.call(cpdb_file_path: str | None = None, meta_file_path: str | None = None, counts_file_path: str | None = None, degs_file_path: str | None = None, counts_data: str | None = None, output_path: str | None = None, microenvs_file_path: str | None = None, active_tfs_file_path: str | None = None, separator: str = '|', threshold: float = 0.1, result_precision: int = 3, debug: bool = False, output_suffix: str | None = None, score_interactions: bool = False, threads: int = 4) -> dict
Differentially Expressed Genes (DEGs) analysis
This analysis bypasses the statistical analysis, in which p-values for the means are computed using a permutation approach. Instead of deriving p-values from a re-shuffling strategy to identify relevant means (i.e., the mean expression of a ligand/receptor in a cell-cell pair), relevant interactions are identified from a list of differentially expressed genes (DEGs) provided by the user and computed from their counts matrix.
- Parameters:
cpdb_file_path (str) – CellphoneDB database file path
meta_file_path (str) – Path to metadata csv file
counts_file_path (str) – Path to counts csv file
degs_file_path (str) – Path to differential expression csv file
counts_data (str) – Type of gene identifiers in the counts data: “ensembl”, “gene_name”, “hgnc_symbol”
output_path (str) – Output path used to store the analysis results (and to store intermediate files when debugging)
microenvs_file_path (str, optional) – Path to Micro-environment file. Its content is used to limit cluster interactions
active_tfs_file_path (str, optional) – Path to active TFs. Its content is used to limit cluster interactions.
separator (str, optional) – Separator for pairs of genes (gene1|gene2) and clusters (cluster1|cluster2).
threshold (float, optional) – Percentage of cells expressing the specific ligand/receptor [0.0 - 1.0]
result_precision (int, optional) – Number of decimal digits in results.
debug (bool, optional) – Store intermediate data as a pickle file (debug_intermediate.pkl).
output_suffix (str, optional) – Suffix to append to the result file’s name (if not provided, timestamp will be used)
score_interactions (bool) – If True, CellphoneDB interactions will be scored per cell type pair, and returned in interaction_scores_dict
threads (int) – Number of threads to be used when scoring interactions
- Returns:
deconvoluted_result
deconvoluted_percents
means_result
relevant_interactions_result
significant_means
interaction_scores_dict
- Return type:
Dict
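The idea of selecting interactions from a user-supplied DEG list, and the way separator joins gene and cluster pairs, can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's logic: the function `flag_relevant`, the gene names, and the rule "flag a pair when either partner is a DEG in its own cluster" are hypothetical, and the expression threshold is ignored here.

```python
def flag_relevant(interactions, cluster_pairs, degs, separator="|"):
    """Toy rule: an interaction is relevant if either partner is a DEG in its cluster.

    interactions:  iterable of (gene_a, gene_b) tuples
    cluster_pairs: iterable of (cluster_a, cluster_b) tuples
    degs:          set of (cluster, gene) tuples supplied by the user
    """
    flags = {}
    for gene_a, gene_b in interactions:
        for cluster_a, cluster_b in cluster_pairs:
            # Keys follow the gene1|gene2 and cluster1|cluster2 naming pattern.
            key = (separator.join((gene_a, gene_b)),
                   separator.join((cluster_a, cluster_b)))
            is_relevant = (cluster_a, gene_a) in degs or (cluster_b, gene_b) in degs
            flags[key] = int(is_relevant)
    return flags


# Hypothetical input: LIG1 is differentially expressed in cluster "T".
degs = {("T", "LIG1")}
flags = flag_relevant([("LIG1", "REC1")], [("T", "B"), ("B", "T")], degs)
for (genes, clusters), value in flags.items():
    print(genes, clusters, value)
```

Because the flag depends on which cluster expresses which partner, "T|B" and "B|T" are treated as distinct cluster pairs in this sketch.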