Package 'rhcoclust'

Title: Robust Hierarchical Co-Clustering to Identify Significant Co-Cluster
Description: Here we performs robust hierarchical co-clustering between row and column entities of a data matrix in absence and presence of outlying observations. It can be used to explore important co-clusters consisting of important samples and their regulatory significant features. Please see Hasan, Badsha and Mollah (2020) <doi:10.1101/2020.05.13.094946>.
Authors: Md. Bahadur Badsha [aut, cre], Mohammad Nazmol Hasan [aut], Md. Nurul Haque Mollah [aut]
Maintainer: Md. Bahadur Badsha <[email protected]>
License: GPL (>=2)
Version: 2.0.0
Built: 2024-10-12 03:38:59 UTC
Source: https://github.com/mdbahadur/rhcoclust

Help Index


A real glutathione metabolism pathway (GMP) dataset for 'rhcoclust' package

Description

This is the real data matrix that used as an example in 'rhcoclust' package.

Details

The real data matrix collected from Nyström-Persson et al., 2013 (https://toxygates.nibiohn.go.jp/toxygates/#columns).The column of the data matrix represents doses of chemical compounds (DCCs) and row represents genes. Each element in the dataset represents fold change gene expression data.

Value

Matrix

Author(s)

Md. Bahadur Badsha <[email protected]>

References

1. Nyström-Persson et al., (2013). Toxygates: interactive toxicity analysis on a hybrid microarray and linked data platform. Bioinformatics, Volume 29, Issue 23,Pages 3080–3086.

Examples

# Load library
library(rhcoclust)

# Load real data
data("FCGE_Data_GMP")

A real PPAR signaling pathways (PPAR-SP) dataset for 'rhcoclust' package

Description

This is the real data matrix that used as an example in 'rhcoclust' package.

Details

The real data matrix collected from Nyström-Persson et al., 2013 (https://toxygates.nibiohn.go.jp/toxygates/#columns).The column of the data matrix represents doses of chemical compounds (DCCs) and row represents genes. Each element in the dataset represents fold change gene expression data.

Value

Matrix

Author(s)

Md. Bahadur Badsha <[email protected]>

References

1. Nyström-Persson et al., (2013). Toxygates: interactive toxicity analysis on a hybrid microarray and linked data platform. Bioinformatics, Volume 29, Issue 23,Pages 3080–3086.

Examples

# Load library
library(rhcoclust)

# Load real data
data("FCGE_Data_PPARs")

Plot of the 'rhcoclust' objects

Description

This function used for two plots from output of rhcoclust (i) plot results for gene (row) and compound (column) co-cluster graph, and (ii) plot graph of QCC for identification of biomarker co-cluster.

Usage

plot_rhcoclust(CoClustObj, plot.cocluster = FALSE, plot.SCC = FALSE,
cex.xaxis = 0.7, cex.yaxis = 0.5)

Arguments

CoClustObj

Output objects from rhcoclust

plot.cocluster

To set no plotting as the default for cocluster.

plot.SCC

To set no plotting as the default for SCC.

cex.xaxis

A numerical value giving to control/annotation text size in x-axis. Default is 0.7.

cex.yaxis

A numerical value giving to control/annotation text size in y-axis. Default is 0.5.

Value

Plots

Author(s)

Md. Bahadur Badsha <[email protected]>

See Also

rhcoclust for generating a graph objects for clustering network

Examples

# Load necessary library
library(rhcoclust)
library(fields)

# Load real data
data("FCGE_Data_GMP")
data("FCGE_Data_PPARs")
# Load predefined real data
# Real data use: data <- FCGE_Data_PPARs
# Real data use: data <- FCGE_Data_GMP

# Load predefined simulated data
data("simu_data")

# simulated data
data <- simu_data

# Apply rhcoclust to identify significant co-cluster of samples and their regulatory features
CoClustObj <- rhcoclust(data, rk=4, ck=3, method.dist = "manhattan", method.hclust = "ward.D")
# For real data either FCGE_Data_PPARs or FCGE_Data_GMP
#CoClustObj <- rhcoclust(data, rk=3, ck=3, method.dist = "manhattan", method.hclust = "ward.D")

# Plot co-cluster
# Please use par(mar=c(6, 10, 3, 6))  or modify if needed for best fit of the graph
# mar order: bottom, left, top, and right
plot_rhcoclust (CoClustObj, plot.coclust = TRUE, plot.SCC = FALSE,
cex.xaxis = 0.7, cex.yaxis = 0.5)

# Plot SCC
# Please use dev.off() to avoid the figure margin from previous plot
plot_rhcoclust (CoClustObj, plot.coclust = FALSE, plot.SCC = TRUE)
# Please add legend with change or add any parameters if needed.
legend("topleft",
       legend = c("Upper-significant", "Insignificant","Down-significant"),
       col = c("red","black","blue"),
       bty = "n",
       pch = c(20,20,20),
       pt.cex = 2,
       cex = 1.2,
       x.intersp = 0.2,
       y.intersp = 0.4,
       text.col = "black",
       horiz = FALSE ,
       inset = c(0.3, -0.08))

Reverse given string

Description

This is the function for reverse string

Usage

reversestring(string, n = 1)

Arguments

string

Given string or seq

n

By which n-plets we should reverse the given string

Details

This function is uesd to reverse given string or seq

Value

Reversed string or seq

Author(s)

Md. Bahadur Badsha <[email protected]>

Examples

x1 <- c("R1C1", "R2C2", "R3C3")
reversestring(x1, 2)

The function for co-clustering sample and feature to explore significant samples and their regulatory features

Description

Toxicogenomic studies require co-clustering to identify biomarker genes for the assessment of chemical toxicity from gene expression levels. It is also essential in the drug discovery experiments. However, gene expression datasets are often contaminated by outliers due to several steps involve in the data generating process. This package performs robust hierarchical co-clustering between row and column entities of a data matrix by reducing the influence of outlying observations. It can be used to explore biomarker genes those are divided into upregulatory and downregulatory groups by the influence of different chemical compounds groups more accurately. It can also provide the statistical significance of the identified co-clusters.

Usage

rhcoclust(data, rk, ck, method.dist = "manhattan", method.hclust= "ward.D")

Arguments

data

A data matrix containing data having the characteristics of interval and ratio level of measurement or continuous data

rk

Number of clusters in the row entities of the data matrix

ck

Number of clusters in the column entities of the data matrix

method.dist

The distance measure to be used. The default is "manhattan". The other options are "euclidean", "maximum", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

method.hclust

The agglomeration method to be used. The default is "ward.D". The other options are "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

Value

A list of object that containing the following:

Coclust_MeanMat : A data frame containing combination of row and column cluster number in the first column and their ranked co-cluster mean in the second column. In the first column first number indicates row cluster index and second number indicates column cluster index, respectively.

CoClsDtMat : The reorganized transformed data matrix to generate co-cluster graph.

NG_Cocls : The index of gene/row names.

NC_Cocls : The index of column names.

rowclust :The gene/row entity clusters.

colclust : The column entity clusters.

colorsG: Colors of genes/row entity clusters to generate co-cluster graph.

colorsC: Colors of DCCs/column entity clusters to generate co-cluster graph.

CentralLine: Central Line of individual control chart to generate graph of control chart and to identify significant co-clusters.

UpContLimit: Upper Control Limit to generate graph of control chart and to identify significant co-clusters.

LowrContLimit: Lower Control Limit to generate graph of control chart and to identify significant co-clusters.

color: Colors to generate individual control chart.

pchmark: Shape of points to generate individual control chart.

Author(s)

Md. Bahadur Badsha <[email protected]>

Examples

# Load necessary library
library(rhcoclust)
library(fields)

# Load real data
data("FCGE_Data_GMP")
data("FCGE_Data_PPARs")
# Load predefined real data
# Real data use: data <- FCGE_Data_PPARs
# Real data use: data <- FCGE_Data_GMP

# Load predefined simulated data
data("simu_data")

# simulated data
data <- simu_data
# Apply rhcoclust to identify significant co-cluster of samples and their regulatory features
CoClustObj <- rhcoclust(data, rk=4, ck=3, method.dist = "manhattan", method.hclust = "ward.D")
# For real data either FCGE_Data_PPARs or FCGE_Data_GMP
#CoClustObj <- rhcoclust(data, rk=3, ck=3, method.dist = "manhattan", method.hclust = "ward.D")

# A data frame containing combination of row and column cluster number in the first
# column and their ranked co-cluster mean in the second cluster.
#GC_cls_MeanMat <- CoClustObj$Coclust_MeanMat

# The reorganized transformed data matrix to generate co-cluster graph.
CoClsDtMat <- CoClustObj$CoClsDtMat

# The gene/row entity clusters.
rowclust <- CoClustObj$rowclust

# The column entity clusters.
colclust <- CoClustObj$colclust

# Colors of genes/row entity clusters to generate co-cluster graph
colorsG <- CoClustObj$colorsG

# Colors of DCCs/column entity clusters to generate co-cluster graph
colorsC <- CoClustObj$colorsC

# Central Line of individual control chart to generate graph of control chart and to
# identify significant co-clusters.
CntrLine_QC <- CoClustObj$CentralLine

# Upper Control Limit to generate graph of control chart and to identify significant
# co-clusters.
UCL_QC <- CoClustObj$UpContLimit

# Lower Control Limit to generate graph of control chart and to identify significant
# co-clusters.
LCL_QC <- CoClustObj$LowrContLimit

# Colors to generate individual control chart.
ColorQC <- CoClustObj$color

# Shape of points to generate individual control chart.
PcmQC <- CoClustObj$pchmark

# Plot co-cluster
# par(mar=c(6,10,3,6)) # Modify if needed
# mar order: bottom, left, top, and right
# please use different values if needed for cex.xaxis and cex.yaxis
# to adjust xaxis and yaxis text
plot_rhcoclust (CoClustObj, plot.coclust = TRUE, plot.SCC = FALSE)

# Plot SCC
# use dev.off() to avoid the figure margin from previous plot
plot_rhcoclust (CoClustObj, plot.coclust = FALSE, plot.SCC = TRUE)

Interaction network (internet) of the 'rhcoclust' objects

Description

This function is used for visualization of clustering interaction network plot for the objects that are generated by rhcoclust and list of up-regulated and down-regulated variables list. There are three layers in the network plot, (i) genes (rows) are shown in the first layer, (ii) co-cluster are in the second layer and (iii) columns (compounds) are in the last layer. Red and blue color indicates up and down regulated respectively.

Usage

rhcoclust_internet(data, CoClustObj, CoClust.sig =FALSE,
cex.nodes = 0.7, edge.width = 1)

Arguments

data

A data matrix containing data having the characteristics of interval and ratio level of measurement or continuous data

CoClustObj

A list of output objects from rhcoclust

CoClust.sig

To set no plotting as the default for up and down regulated group. Default is FALSE

cex.nodes

A numerical value giving to control/annotation node size in the network. Default is 0.7.

edge.width

A numerical value giving to control/annotation edge width in the network. Default is 1.

Value

Plot A list up-regulated and down-regulated variables.

Author(s)

Md. Bahadur Badsha <[email protected]>

See Also

rhcoclust for generating a graph objects for clustering network

Examples

# Load necessary library
library(rhcoclust)
library(fields)

# Load real data
data("FCGE_Data_GMP")
data("FCGE_Data_PPARs")
# Load predefined real data
# Real data use: data <- FCGE_Data_PPARs
# Real data use: data <- FCGE_Data_GMP

# Load predefined simulated data
data("simu_data")

# simulated data
data <- simu_data

# Apply rhcoclust to identify significant co-cluster of samples and their regulatory features
CoClustObj <- rhcoclust(data, rk=4, ck=3, method.dist = "manhattan", method.hclust = "ward.D")
# For real data either FCGE_Data_PPARs or FCGE_Data_GMP
#CoClustObj <- rhcoclust(data, rk=3, ck=3, method.dist = "manhattan", method.hclust = "ward.D")

# Plot interaction network (internet)
# Please use dev.off() to avoid the figure margin from previous plot
# mar order: bottom, left, top, and right
# please use par(mar=c(5,2,5,2)) or modify when necessary to best fit for the plot
Netrhcoclust <- rhcoclust_internet(data, CoClustObj = CoClustObj,
CoClust.sig = FALSE, cex.nodes = 0.7, edge.width = 1)
# Please change or add any parameter if needed.
text(x = -1, y = 1.1, "Row Cluster", cex = 0.7)
# Please change or add any parameter if needed.
text(x = 0, y = 1.1, "Co-Cluster", cex = 0.7)
# Please change or add any parameter if needed.
text(x = 1, y = 1.1, "Column Cluster", cex = 0.7)

Visualization of clustering network plot

Description

This function is used for visualization of clustering network plot, the plot objects are generated by rhcoclust.

Usage

rhcoclust_network(CoClustObj)

Arguments

CoClustObj

Output objects from rhcoclust

Value

Plot

Author(s)

Md. Bahadur Badsha <[email protected]>

See Also

rhcoclust for generating a graph objects for clustering network

Examples

# Load necessary library
library(rhcoclust)
library(fields)
library(igraph)

# Load real data
data("FCGE_Data_GMP")
data("FCGE_Data_PPARs")
# Load predefined real data
# Real data use: data <- FCGE_Data_PPARs
# Real data use: data <- FCGE_Data_GMP

# Load predefined simulated data
data("simu_data")

# simulated data
data <- simu_data


# Apply rhcoclust to identify significant co-cluster of samples and their regulatory features
CoClustObj <- rhcoclust(data, rk=4, ck=3, method.dist = "manhattan", method.hclust = "ward.D")
# For real data either FCGE_Data_PPARs or FCGE_Data_GMP
#CoClustObj <- rhcoclust(data, rk=3, ck=3, method.dist = "manhattan", method.hclust = "ward.D")

# Visualization of clustering network plot
rhcoclust_network(CoClustObj)

A predefined simulated data for 'rhcoclust' package

Description

This is the predefined simulated data matrix that used as an example in 'rhcoclust' package.

Details

The column of the simulated data matrix represents doses of chemical compounds (DCCs) and row represents genes. Each element in the dataset represents fold change gene expression data.

Value

Matrix

Author(s)

Md Bahadur Badsha <[email protected]>

See Also

simulate_data

Examples

library(rhcoclust)

# Load data
data("simu_data")

Simulate data for robust hirarachical clustering to identify significant co-cluster

Description

We generate fold change gene expression (FCGE) data according to the characteristics of toxicogenomic data.

Usage

simulate_data(no.gene, no.dcc)

Arguments

no.gene

Number of genes in the simulated data.

no.dcc

Numbe of doses of chemical compounds (dcc) in the simulated data.

Details

There are four gene groups and three DCCs groups in the simulated dataset. The gene group 1, 2, 3 and 4 consists the genes G1-G10, G11-G20, G21-G30 and G31-G50 respectively and DCCs group 1, 2 and 3 consists the DCCs C1_High-C5_High- C1_Middle-C5_Middle, C6_High-C10_High- C6_Middle-C10_Middle and C1_Low-C12_Low- C11_Middle- C12_Middle- C11_High- C12_High respectively. Where, G stands for gene and C stands for chemical compound arranged in the row and column of the simulated data matrix respectively. The error term N(0,0.35) from normal distribution with mean 0 and variance 0.35 is added to each element of the simulated dataset. In the simulated dataset the gene group-1 is up-regulated by the DCCs group-1, gene goup-2 is up and down-regulated by the DCCs group-2 and 1 respectively. The gene group-3 is down-regulated by the DCs group-2. The gene group-4 is not regulated by any of the DCCs groups and DCCs group-3 does not influence any of the genes in the dataset.

Value

A list of object that containing the following:

SimData: A simulated data matrix as generated.

SimDataRnd: Randomly distributed row and column entity of simulated data.

GCmat: Transformed simulated data.

Author(s)

Md. Bahadur Badsha <[email protected]>

Examples

# Load library
library(rhcoclust)

# Number of genes in the simulated data.
no.gene <- 50

# Numbe of doses of chemical compounds (dcc) in the simulated data.
no.dcc <- 12

SimulteData <- simulate_data(no.gene,no.dcc)

# A simulated data matrix as generated.
SimulteData$SimData

# A randomly distributed row and column entity of simulated data.
SimulteData$SimDataRnd

# A transformed simulated data.
SimulteData$GCmat