Saturday, December 31, 2016

noiseqbio too many false positives



###################################################
### code chunk number 33: NOISeq.Rnw:875-877
mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "n", factor="Zconditions", lc = 1, r = 20, adj = 1.5, plot = FALSE, a0per = 0.9, filter = 1,
                        # random.seed = 12345, 
                        conditions = c("control","high")
                        )#this runs 20161230 5:52pm
head(mynoiseqbio@results[[1]])
summary(mynoiseqbio@results[[1]]) #too many false positives

So I concluded that noiseqbio did not give reasonable results with these settings, although different parameters may yield different results.

For comparison, I tried noiseq, which gave reasonable results.














Friday, December 30, 2016

NOISeq RNAseq differential analysis


http://bioconductor.org/packages/release/bioc/html/NOISeq.html

M: log2 fold change between the two conditions
D: absolute expression difference between the two conditions

In Tarazona2011GR, the (M, D) pair of each gene is evaluated against a null (noise) distribution estimated from technical or biological replicates, or from simulated replicates.

In NOISEQBIO, theta = (M + D)/2 seems to be the statistic used for the null distribution, based on my understanding of its manual.
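
To make the M, D and theta definitions concrete, here is a minimal Python sketch for a single gene. It is not the NOISeq implementation: the function name md_statistics is hypothetical, M is assumed to be a log2 ratio of condition means, zeros are simply replaced by k before the ratio is taken, and the correction for biological variability that NOISeqBIO applies is left out.

```python
import math

def md_statistics(mean1, mean2, k=0.5):
    """Return (M, D, theta) for one gene from two condition means.

    Hypothetical helper for illustration only. Zero values are replaced
    by k so that the log ratio is defined.
    """
    x1 = mean1 if mean1 > 0 else k
    x2 = mean2 if mean2 > 0 else k
    m = math.log2(x1 / x2)   # M: log2 fold change
    d = abs(x1 - x2)         # D: absolute expression difference
    theta = (m + d) / 2      # uncorrected version of theta = (M + D) / 2
    return m, d, theta

m, d, theta = md_statistics(8.0, 2.0)
print(m, d, theta)  # 2.0 6.0 4.0
```

A gene with a large fold change but a tiny absolute difference (or the reverse) gets a moderate theta, which is the point of combining the two statistics.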




Probability = 0.8 was the cutoff for differentially expressed genes in Tarazona2011GR.
Probability = 0.95 is recommended for biologically replicated samples (NOISeqBIO), where the probability can be read as 1 - FDR.

In Tarazona2011GR, noiseq-real and noiseq-sim were used. These two versions have since evolved into noiseq and noiseqbio.


NOISEQBIO is optimized for biological replicates.

When using noiseq and noiseqbio, normalization and low-count filtering are controlled through function parameters such as 'norm' and, in noiseqbio, 'filter'.
Regarding low-count filtering, it is not necessary for the NOISeq method. In contrast, it is recommended for NOISeqBIO, which by default filters out low-count features with the CPM method (filter = 1).



# noiseq(input, k = 0.5, norm = c("rpkm","uqua","tmm","n"),  replicates = c("technical","biological","no"), factor=NULL, conditions=NULL, pnr = 0.2, nss = 5, v = 0.02, lc = 0)

mynoiseq = noiseq(mydata, k = 0.5, norm = "rpkm", factor="Tissue", pnr = 0.2, 
                  nss = 5, v = 0.02, lc = 1, replicates = "technical")
head(mynoiseq@results[[1]])

> myfactors
           Tissue TissueRun
R1L1Kidney Kidney  Kidney_1
R1L2Liver   Liver   Liver_1
R1L3Kidney Kidney  Kidney_1
R1L4Liver   Liver   Liver_1
R1L6Liver   Liver   Liver_1
R1L7Kidney Kidney  Kidney_1
R1L8Liver   Liver   Liver_1
R2L2Kidney Kidney  Kidney_2
R2L3Liver   Liver   Liver_2
R2L6Kidney Kidney  Kidney_2



mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor="Tissue", lc = 1, r = 20, adj = 1.5, plot = FALSE,   a0per = 0.9, random.seed = 12345, filter = 2)
# "r = 20" seems to set the number of permutations (resampling rounds) used to generate the noise distribution.





Authors stated that the noiseq output "prob" is not equivalent to a p-value (more precisely, not equivalent to 1 - p-value).




Q: Which condition are "up" and "down" DEGs defined relative to?









Output format


mynoiseq.deg1 = degenes(mynoiseq, q = 0.8, M = "up")










References:
[1] S. Tarazona, F. García-Alcalde, J. Dopazo, A. Ferrer, and A. Conesa. Differential expression in RNA-seq: A matter of depth. Genome Research, 21: 2213-2223, 2011.

[2] S. Tarazona, P. Furió-Tarí, D. Turrà, A. Di Pietro, M.J. Nueda, A. Ferrer, and A. Conesa. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Research, 43(21):e140, 2015.

[8] B. Efron, R. Tibshirani, J.D. Storey, V. Tusher. Empirical Bayes Analysis of a Microarray Experiment. Journal of the American Statistical Association, 2001.



Tuesday, December 13, 2016

big challenges in evolution and ecology


https://dynamicecology.wordpress.com/2015/07/06/what-are-the-top-5-grand-challenges-in-biology/

 1) linking genotype to phenotype, and understanding how the environment influences that link and 2) understanding biological diversity (its evolution, its maintenance, and the consequences of its loss).

So, understanding the origin of life would definitely be one of my grand challenges.

A clear fourth challenge relates to understanding the brain – clearly this is a huge, very active area of research, and it’s also something that students will find really engaging. (It’s also one of the challenges on the list I linked to earlier.)


  1. Linking phenotype to genotype
  2. Understanding biodiversity
  3. Origins of life
  4. Understanding the brain
  5. Sustainable agriculture

http://www.imperial.ac.uk/ecosystems-and-environment/
http://www.imperial.ac.uk/ecosystems-and-environment/grand-challenges/

Understanding biodiversity, linking past, present, and future of biodiversity

Environmental monitoring and evaluation, developing new tools and methods for environmental monitoring and evaluation

Engineering complex ecosystems.

Predicting and mitigating environmental change for managing the effect of local, regional and global change

Experiments: manipulation of the natural world to understand the mechanics of ecosystems
Scaling: summing up ecological and evolutionary processes acting locally on individuals to understand regional and global patterns

Ecoinformatics and genomics: integrating genomic and ecological data to understand the natural world






Exemplar Research Questions


The following list provides some examples of topics on which faculty in Ecology and Evolutionary Biology at University of Tennessee, Knoxville, would be interested in recruiting graduate students for entry in August 2017.
This list is not exhaustive – indeed, far from it. There are other faculty members who will be recruiting students in the Department. Also, the listed faculty members may recruit students who have different interests to those listed. But we prepared this list just to illustrate to prospective students some of the diversity of topics on which we envision recruiting, spanning conservation, macroevolution, global change ecology, molecular genetics, biology education and systematics, among many other topics.
  • How can large-scale efforts to conserve biodiversity or ecosystem services, which are led by governments or international nonprofits, most effectively complement bottom-up conservation efforts led by local communities?
  • Conservation organizations often have a hierarchical management structure – how effectively do hierarchies allocate resources to support conservation of biodiversity and ecosystem services?
  • How will species range dynamics drive genetic divergence? How do feedbacks reinforce patterns of genetic divergence on the landscape?
  • Does contemporary evolution along the gradients of global change alter ecosystem function?
Jessica Budke (http://jmbudke.github.io)
  • What morphological and transcriptomic changes do plant species undergo transitioning from terrestrial to aquatic habitats? How do we resolve relationships between morphologically austere taxa?
  • What are the functional roles of maternal structures for offspring survival, development, and fitness?
  • How do different types of selective pressures on individuals shape the evolution of animal social systems?
  • What role (if any) does infectious disease play in conservation management planning for endangered populations?
  • How does among population variation in plant phenotype affect population structuring of herbivores?
  • What role does host breadth play in range size and diversification rate of herbivorous insects?
  • How can we better understand, theoretically, the origins of new species and the links between micro-evolutionary processes and macro-evolutionary patterns?
  • How did human social complexity evolve and what are the implications of our evolutionary past for our social behavior?
  • How do human activities impact species, communities, and ecosystem function across spatial and temporal scales?
  • How will future demand for food and biofuels interact with likely agricultural yield improvements, climate change, and changes in land rental rates to affect future land-cover transformations and their subsequent impact on biodiversity?
  • How do assembly costs and translation errors shape selection on codon usage and how do they play themselves out in the face of biased mutation and genetic drift?
  • Some pathogens replicate intracellularly within hosts and move between host cells through budding or bursting. How does the rate of intracellular replication affect the rates of immune response clearance by the host? How, in turn, does this lead to changes in the survival of the host and transmission of the pathogen between hosts?
  • How are biological processes integrated across scales and levels of biological resolution from within organism level to those operating at population/community/landscape levels?
  • How do we effectively utilize mathematical and computational methods for spatial control – what to do, where to do it, when to do it, and how to assess the resulting solutions – for problems in epidemiology, invasive species management and conservation biology?
  • How do invaders and antagonistic interactions alter soil fungal communities, the function of key plant mutualisms and shape the demography and life history evolution of native community members?
  • What role does the ecological context, specifically selection driven by the absence of mates and pollinators, play in the evolution of selfing and genomic changes within and between species? Is selfing an evolutionary dead end or a reversible mating system?
  • What are the effects (actual and predicted) and ramifications of land-use and climate change, management, and disturbance on biodiversity in natural, managed, and agricultural settings?
  • What important roles do animals play in the seed dispersal process in animal-mediated seed dispersal systems?
  • How can we recognize species of mushroom-forming fungi? Why are there so many species of fungi? How are they related to each other, and what factors have promoted their diversification?
  • What are general biogeographical patterns in fungi? What processes are responsible for patterns we observe?
Gary McCracken (email: gmccrack@utk.edu)
  • How do highly mobile predators (bats) track ephemeral and patchy resources (insects) in three dimensional space?
  • Why are some host species associated with a greater diversity of viral pathogens than are other host species?
  • Are spectral diversity metrics derived from hyperspectral imagery good indicators of forest species richness? What other remotely sensed indices can be used to investigate richness and seasonality of vegetation?
  • Does inclusion of species’ physiological limits improve the precision of ecological niche models and potential distribution estimates?
Susan Riechert (email: sriecher@utk.edu)
  • What is the importance of behavior in adapting animal populations to different and changing environments?
  • What factors limit local adaptation to environmental context and why do weaker strategies persist?
  • What is the parentage of the presumed allopolyploid lettuces (Lactuca) in North America, how many species are present, when did they arrive from Eurasia, what has been the consequence of polyploidy for their biology and evolution.
  • How can biology programs enhance graduate student instruction of introductory biology courses?
  • How do instructor active learning practices relate to student perception of their effectiveness in large introductory biology classes?
Jen Schweitzer (http://jenschweitzer.com)
  • Under what varied circumstances do soils and soil microbial communities determine plant traits and act as selective agents?
  • What is the role of plant-pollinator interactions on soil processes?
  • What are the processes generating spatial patterns of biodiversity? What are the roles of biotic and abiotic factors in determining species’ range limits?
  • How do population-level variation in physiology and climatic variation affect predictions of the impacts of climate change?
  • What are the direct and indirect effects of particular plant invasions? A direct effect might be shading, for example, or allelopathy, while an indirect effect might be changing the nutrient cycle (e.g., by being a nitrogen fixer) or the fire regime.
  • What are the non-target impacts of particular insects introduced for biological control?
  • What is the role of polyploidy in governing the success (in terms of species richness) of plant lineages? Why are some polyploid lineages highly diverse, while others are not?
  • What can contemporary patterns of genetic variation within and among populations tell us about species boundaries and the process of speciation?
  • What are the causes/consequences of diversification of reproductive traits in plants?
  • How does a particular reproductive trait, or set of traits, in a clade of plants develop and how does it contribute to diversification of the clade?



Below is the list, in the chronological order that I plan to introduce them, of “foundational questions in ecology and evolution”:
  1. Why does life exist at all?
  2. What makes life different from non-life?
  3. Why do some individuals die and some live?
  4. How do living things survive?
  5. How random is nature?
  6. Why don’t we live forever?
  7. Why do life forms look the way they do?
  8. Why are there diverse organisms?
  9. How do we partition diversity?
  10. What drives the patterns of diversity that we see across the earth?
  11. What determines the population size of different kinds of organisms?
  12. Why are some places more biodiverse than others?
  13. What are the various ways in which organisms interact with each other?
  14. Is there a difference between interactions between members of the same species versus different?
  15. Why is nature often a very nasty place?
  16. Why do organisms cooperate with each other?
  17. Why are there more plants than animals?
  18. What actually keeps ecosystems going? How do ecosystems work?
  19. How old is the earth?
  20. How do new species come into being?
  21. Are some species more closely related to each other?
  22. Why did some species go extinct?
  23. Why is there sexual reproduction?
  24. Why are there male and female organisms? Why aren’t there more types?
  25. Why are traits heritable?
  26. What are genes and how do they work in conjunction with the environment?
  27. Where do new traits come from?
  28. How are species often so well adapted to their environments?
  29. Why do organisms display behaviors? Different behaviors?
  30. Why do species change over time?
  31. Do different species affect each other’s evolution?
  32. What evolves?
  33. Are humans subject to evolutionary change in the same way as other organisms?
I recognize that if one were to write a list of contemporary “big questions” in ecology and evolution, there would be a lot of additional questions to add to this list. But my goal is not to capture the big questions of now: I want to create a comprehensive list of the questions that led to the formation of these scientific fields.
Feel free to comment on these “foundational questions”:
  • Are these questions well-phrased and clear?
  • Is this a complete list? Are there any critical questions missing?
  • Do any of these questions seem superfluous?
  • Is the order of the list logical?
Below is a list of some of the sources that I used to come up with these questions:
environment360 On His Bicentennial, Mr. Darwin’s Questions Endure
This page has some great commentary on Darwin’s tendency to ask questions about specific observations he made, questions that fall into some of the broad categories of my “foundational questions”. I also really like the “inherent tendency to vary” quote as it relates to the question of the diversity we observe in nature: being a keen observer of this diversity will be a key characteristic of WmD.
Ernst Mayr’s The Growth of Biological Thought
Mayr asserts that Darwin’s central questions were “Can species change, and can one species be transmuted into another?”.
Macroevolution.net Alfred Russel Wallace
This brief biography of Wallace discusses his “why do some die and some live?” question that was inspired in part by his malarial delirium.
UCLA Newsroom Stepping out of Darwin’s shadow
This page discusses some of Wallace’s important questions that relate to biogeography: why organisms exist in particular locations, and why species vary in abundance in different locations.
Natural History Museum London Darwin’s questions on caterpillar colouring
I like this page just because it highlights that Darwin was not above asking Wallace a question related to the evolution of organisms (in this case butterflies).
Thomas N. Sherratt and David M. Wilkinson Big Questions in Ecology and Evolution
This book contains a bunch of nicely-phrased questions that inspired some of my questions above, including the “why the world is green” question of Hairston, Smith, and Slobodkin and questions such as why species exist and why the tropics are more diverse. In particular it looks at the question of chaos, which inspired my question on randomness.
Journal of Ecology “Identification of 100 fundamental ecological questions”
Although these questions are by-and-large a lot more specific — and wonky! — than mine, it was important to see to what degree my questions encompassed these. A lot of these are about human impacts, an area that I will not approach until later in the WmD Project.





GRAND CHALLENGE 1: BIOGEOCHEMICAL CYCLES

GRAND CHALLENGE 2: BIOLOGICAL DIVERSITY AND ECOSYSTEM FUNCTIONING

GRAND CHALLENGE 3: CLIMATE VARIABILITY

GRAND CHALLENGE 4: HYDROLOGIC FORECASTING

GRAND CHALLENGE 5: INFECTIOUS DISEASE AND THE ENVIRONMENT











Sunday, December 11, 2016

UTC, car rental


http://treasurer.tennessee.edu/travel/Web%20announcement.htm

The business rates may not be used for personal travel. For personal travel you must use corporate code XZ56TNP.


Wednesday, December 7, 2016

Computational Geometry: Line Segment Properties ( Two lines Clockwise or Counterclockwise)


https://www.youtube.com/watch?v=3YFUQDRL1s4
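
The clockwise / counterclockwise test for two segments sharing an endpoint is usually done with a 2D cross product. A minimal Python sketch follows; the function names cross and turn are my own and are not taken from the video.

```python
def cross(p0, p1, p2):
    """Cross product of the vectors (p1 - p0) and (p2 - p0)."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

def turn(p0, p1, p2):
    """Describe where segment p0->p2 lies relative to segment p0->p1."""
    c = cross(p0, p1, p2)
    if c > 0:
        return "counterclockwise"
    if c < 0:
        return "clockwise"
    return "collinear"

print(turn((0, 0), (1, 0), (0, 1)))  # counterclockwise
print(turn((0, 0), (0, 1), (1, 0)))  # clockwise
print(turn((0, 0), (1, 1), (2, 2)))  # collinear
```

Only multiplications and subtractions are needed, so with integer coordinates the test is exact and avoids computing angles.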

big data, aging, Alzheimer’s disease

AMP-AD Knowledge Portal

https://www.synapse.org/#!Synapse:syn2580853/wiki/409840

https://www.nia.nih.gov/research/blog/2016/12/increasing-usability-big-data-alzheimers-research?utm_source=20161207_blog&utm_medium=email&utm_campaign=research_blog

simulation of photonic crystals and metamaterials

X Zhang, simcenter thesis defense

https://en.wikipedia.org/wiki/Photonic_crystal

photonic crystals: control the propagation of light
photonic crystals: bandgap properties, in-plane wave propagation

related software
MPB: MIT Photonic Bands (open source)
HFSS: High Frequency Structure Simulator (commercial)
CST MWS (CST Microwave Studio, commercial)

Petrov-Galerkin methods for electromagnetic simulations
Maxwell's equations, 2D version: TE mode and TM mode

simulation at 500 THz

Adjoint variables

Bezier curves -> optimal band and optimization


introduction to systems biology, student training



why systems biology

Uri Alon's systems biology courses: 

cellular aging studies in Qin's lab

MIT quantitative biology course

EdX systems biology



Coursera.org on "systems biology"

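van Emde Boas tree

keys are unique integers drawn from the set {0, 1, 2, 3, ..., u-1}, where u = 2^(2^k) for some integer k, so that the square root of u is again an integer at every level of the recursion.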
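
The reason the universe size matters is that a van Emde Boas tree splits each key into a high part (which cluster) and a low part (position inside the cluster) using the square root of u. A minimal Python sketch of that split, assuming only that u is a perfect square; the names veb_split and veb_index are hypothetical.

```python
import math

def veb_split(x, u):
    """Split key x from {0, ..., u-1} into (high, low) using sqrt(u)."""
    root = math.isqrt(u)
    if root * root != u:
        raise ValueError("u must be a perfect square")
    if not 0 <= x < u:
        raise ValueError("key outside the universe")
    return x // root, x % root   # cluster number, offset within cluster

def veb_index(high, low, u):
    """Inverse of veb_split: rebuild the key from its two halves."""
    return high * math.isqrt(u) + low

high, low = veb_split(13, 16)
print(high, low)                 # 3 1
print(veb_index(high, low, 16))  # 13
```

Each recursive level works on a universe of size sqrt(u), which is where the O(log log u) operation time comes from.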


Tuesday, December 6, 2016

time-lapse image analysis for RLS inference

image data: images produced by HYAA

Dang lab uses FIJI http://fiji.sc/
In Dang lab, a person plays the video of time-lapse images in ImageJ (IJ) at a speed of 5 frames per second. Typically, each cell is counted twice.

Monday, December 5, 2016

structural controllability

two scenarios that make a structure structurally uncontrollable:
inaccessibility
dilation

cactus: a minimal structure that contains neither inaccessible nodes nor dilations.
cacti:


Evolutionary theory and control theory.


Friday, December 2, 2016

Laplace transformation of graphic function


https://www.youtube.com/watch?v=f1mZArY0lLE

https://www.youtube.com/watch?v=ZGPtPkTft8g

UTC grade submission guideline


Final grading is open for the full-term and grades can be entered and changed until 9:00 a.m. on Monday, December 19, 2016. 

To enter grades go to our main webpage https://www.utc.edu/ and click on the MyMocsNet link in the upper right hand corner, enter your UTCID and Password and hit enter, then click on Login to My MocsNet.  Click on the SSB (Self Service Banner) link located in the bottom left area of the Home or Faculty tab, click on Faculty Services, choose XE Midterm & Final Grades, and choose the Final Grades tab.

There are grading guidelines to the right of the grading page along with a link to the training.  If you need assistance you can email me or call me at 425-5780. 

Link to Academic Calendars and Exam Schedules:

Tuesday, November 29, 2016

*** Control systems engineering, control theory, Laplace transform, observability,

A control system has an input, a process, and an output. It can be open loop or closed loop. Open loop systems do not monitor or correct the output. Closed loop systems can monitor output and make adjustments.

linear time-invariant differential equation


A transfer function is another way of mathematically modeling a system. It can be derived from the linear, time-invariant differential equation using the Laplace transform. Transfer functions can only be used for linear systems. (The Laplace transform was developed as a technique for solving differential equations.)

State-space representation is another way to model systems and is also suitable for non-linear systems.
Essentially, a state-space model changes an nth-order differential equation into n simultaneous first-order equations. It seems to me that the state-space model is the most commonly used ODE modeling method in systems biology.
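
As an illustration of turning one nth-order equation into n first-order equations, here is a minimal Python sketch. It assumes a linear equation of the form y^(n) + a_(n-1) y^(n-1) + ... + a_0 y = u with states x1 = y, x2 = y', and so on; the function names and the forward-Euler integrator are my own choices for illustration, not from the textbook.

```python
def companion_form(coeffs):
    """Build (A, B) of x' = A x + B u from coeffs = [a0, a1, ..., a_(n-1)]."""
    n = len(coeffs)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0                 # x_i' = x_(i+1)
    A[n - 1] = [-float(a) for a in coeffs]  # last row carries the ODE itself
    B = [0.0] * (n - 1) + [1.0]
    return A, B

def euler_step(A, B, x, u, dt):
    """Advance the state one forward-Euler step of size dt."""
    n = len(x)
    dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    return [x[i] + dt * dx[i] for i in range(n)]

A, B = companion_form([2, 3])   # y'' + 3 y' + 2 y = u
x = [0.0, 0.0]
for _ in range(2000):           # unit-step input, simulated to t = 20
    x = euler_step(A, B, x, u=1.0, dt=0.01)
print(A, B)
print(round(x[0], 6))           # settles near u / a0 = 0.5
```

The same x' = f(x, u) layout carries over to nonlinear models: only the right-hand side in euler_step would change.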

Test signals with different waveforms can be used to study systems.

The basic analysis of a system is to evaluate the time response of a system.

A sensitivity analysis can yield the percentage of change in a specification as a function of a change in a system parameter.
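
A minimal Python sketch of such a sensitivity calculation, using a central finite difference. The function name relative_sensitivity and the example specification K / (1 + K) are hypothetical, chosen only because the exact answer 1 / (1 + K) is easy to check by hand.

```python
def relative_sensitivity(spec, p, rel_step=1e-6):
    """Approximate (dF / F) / (dP / P) for F = spec(P) at P = p."""
    dp = p * rel_step
    derivative = (spec(p + dp) - spec(p - dp)) / (2 * dp)
    return derivative * p / spec(p)

def steady_state(gain):
    """Hypothetical specification: F(K) = K / (1 + K)."""
    return gain / (1 + gain)

s = relative_sensitivity(steady_state, 9.0)
print(round(s, 6))  # 0.1
```

A value of 0.1 means that a 1% change in the parameter changes the specification by roughly 0.1%.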

In biology, many ODEs have nonlinear terms involving products of variables. So transfer functions cannot be applied, but the state-space method can be used.

Controllability and Observability are well understood in continuous time-invariant linear state-space model, see https://en.wikipedia.org/wiki/State-space_representation#State_variables 

Stability: a system is stable if every bounded input yields a bounded output (BIBO stability). So, does aging change a stable gene network into an unstable network?

Observability: If the initial state vector x(t0) can be found from input u(t) and output y(t) over a finite interval of time from t0, the system is observable; otherwise it is unobservable. 
Observability is the ability to deduce state variables from knowledge of input u(t) and output y(t). 
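
For the linear time-invariant case x' = A x + B u, y = C x, observability can be checked with the standard rank test: stack C, CA, ..., CA^(n-1) and see whether the rank equals the number of states n. A minimal Python sketch using exact fractions; the helper names are my own, and the example matrices are made up for illustration.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination with exact Fraction arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    block, stacked = [row[:] for row in C], []
    for _ in range(len(A)):
        stacked.extend(block)
        block = mat_mul(block, A)
    return stacked

def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)

print(is_observable([[0, 1], [-2, -3]], [[1, 0]]))  # True
print(is_observable([[1, 0], [0, 2]], [[1, 0]]))    # False
```

In the second example the output only sees the first state, and the two states do not interact, so the second state can never be deduced from y(t).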



























































cpsc 5210

RSA
project novelty



genome compression


https://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data

Number theory, data compression for NGS data

Can RSA or other methods be used for NGS sequence compression?

RSA


https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Operation

lab meeting

1a) DE gene lists for RNAseq project
TODO: there are various time points between control and treatment. Should we use the consensus DEG list?

It seems that "GeneID" in BGI report are from NCBI. Example of 57573 is

and 

So, "Gene ID" is a standard NCBI number.

1b) Pathway analysis plan for DE gene lists
TODO: There are different sources of human gene/protein networks. We should try several for comparisons.
TODO: We should try different clustering methods, such as hclust, mcl, etc. (refer to Qin's previous paper for clustering analysis).

2) time-lapse image analysis for yeast replicative lifespan
   We can use ImageJ, MATLAB, or R.

Saturday, November 26, 2016

simcenter qinlab tools

"module load qinlab" can add these to $PATH

hqin@ridgeside[~/demo.lgf/RNAseq.hisat2]->ls /usr/local/qinlab/
bin                            samtools-1.3.1.tar.bz2
hisat2                         share
hisat2-2.0.5                   stringtie
hisat2-2.0.5-Linux_x86_64.zip  stringtie-1.3.1c.Linux_x86_64
samtools-1.3.1                 stringtie-1.3.1c.Linux_x86_64.tar.gz

Monday, November 21, 2016

R library on ridgeside (simcenter)


Global
ls /usr/local/lib/R/site-library/

Local

SimCenter mailing address


University of Tennessee at Chattanooga
701 E. ML King Blvd
Chattanooga, TN 37403

UTC teaching evaluations


http://www.utc.edu/planning-evaluation-institutional-research/student-rating-of-faculty/index.php

RNAseq software installation on qbert or Simcenter clusters

====================For hisat2 and supporting programs
Install hisat2
ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.5-Linux_x86_64.zip

Install stringtie 1.3.1c
http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.1c.Linux_x86_64.tar.gz

Install samtools
https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2
The above link is from http://www.htslib.org/download/
See also https://github.com/samtools/samtools/releases/


====================For R packages
Under shell, run R

Inside of R:
 source("https://bioconductor.org/biocLite.R")
 biocLite('ballgown')

 install.packages('devtools') #A USA mirror site may be chosen

 library(devtools)
 devtools::install_github('alyssafrazee/RSkittleBrewer') 


========== Testing the installation
Download the test files and codes from
ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/

under shell
$ ./rnaseq_pipeline.config.sh
$ ./rnaseq_pipeline.sh out

=========Additional R packages
#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())

#A prompt will ask for a mirror site. Any site from USA should work.

big java sites



http://bcs.wiley.com/he-bcs/Books?action=index&bcsId=9799&itemId=1119056446

Wiley rep


Rep Name / Contact Details
MARY VANN - 0065
Phone: 6175041370
Email:  MVANN@WILEY.COM

Friday, November 18, 2016

bibtex doi bug


In qin_network.bib, I added a reference with a DOI field. This field generated an error in the *.bbl file produced by bibtex. I removed the DOI fields and the bug disappeared.


Wednesday, November 16, 2016

toread, Graph Metrics for Temporal Networks - Springer


http://www.springer.com/cda/content/document/cda_downloaddocument/9783642364600-c1.pdf?SGWID=0-0-45-1393604-p174915729

toread: An Introduction to Temporal Graph Data Management


https://www.cs.umd.edu/sites/default/files/scholarly_papers/Khurana_SchPaper_1.pdf

toread Path Problems in Temporal Graphs


http://www.vldb.org/pvldb/vol7/p721-wu.pdf

Path Problems in Temporal Graphs
Huanhuan Wu∗, James Cheng∗ , Silu Huang∗, Yiping Ke#, Yi Lu∗, Yanyan Xu∗ ∗Department of Computer Science and Engineering, The Chinese University of Hong Kong {hhwu,jcheng,slhuang,ylu,yyxu}@cse.cuhk.edu.hk #Institute of High Performance Computing, Singapore

safety training, UTC

hazardous materials

gasoline can be easily ignited, but diesel cannot.

universal waste:
fluorescent lamps should be recycled.
computer batteries.
motor batteries

DOT hazard marking
Globally Harmonized System (GHS) container markings
NFPA rating explanation guide, NFPA 704, HMIS

423 425 HELP


Tuesday, November 15, 2016

integrating gene expression and network, a reference collection


Convert the p-values of differential expression into Z-scores using the inverse Gaussian (normal) CDF.
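
A minimal Python sketch of this conversion, assuming the convention z = Phi^-1(1 - p), so that small p-values map to large positive Z-scores; that convention is my reading of Ideker02 and should be checked against the paper. The function name p_to_z is hypothetical.

```python
from statistics import NormalDist

def p_to_z(p):
    """Map a p-value in (0, 1) to a Z-score via the inverse normal CDF.

    Uses -inv_cdf(p), which equals inv_cdf(1 - p) by symmetry but does not
    lose precision when p is extremely small.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -NormalDist().inv_cdf(p)

for p in (0.025, 0.5, 0.975):
    print(p, round(p_to_z(p), 3))
```

p-values above 0.5 give negative Z-scores, so both signs occur under this convention.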


At first I thought that, because Ideker02 looks for 'active subnetworks', only positive Z-scores were used. No: both positive and negative Z-scores were calculated.
Ideker02 seems to combine K-means and simulated annealing for network clustering. 


Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.

Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.

Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.



Ma, X., Lee, H., Wang, L., Sun, F.: ‘CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data’, Bioinformatics, 2007, 23, pp. 215–221


Integrating gene expression and protein-protein interaction network to prioritize cancer-associated
genes, Chao Wu, Jun Zhu  and Xuegong Zhang

http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en


Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation

http://www.biomedcentral.com/1755-8794/7/S2/S4

http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html

http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html

http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html

From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been
integrated for gene function prediction. For example, Ideker
et al. (2002) used protein interaction data and gene expression
data to screen for differentially expressed subnetworks between
different conditions. In Tornow and Mewes (2003) and Segal
et al. (2003), gene expression data and protein interactions are
used to group genes into functional modules. These methods provide
insights into the regulatory modules of the whole networks at
the systems biology level. However, it is not clear how to adapt their
methods to identify genes contributing to the phenotype of interest.
Morrison et al. (2005) adapted the Google search engine to prioritize
genes for a phenotype by integrating gene expression profiles
and protein interaction data. However, the algorithm ignores the
information from proteins linked to the target protein through other
intermediate proteins, referred to in the rest of this paper as indirect
neighbors.

Qin: Did the previous methods use human pathogenic genes? Seems not if they did not cite dbSNP or OMIM. 

X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002


WGCNA: an R package for weighted correlation network analysis.




Monday, November 14, 2016

RNAseq demo (hisat2, stringtie) error at GBitVec: index 7 out of bounds (size 7) (osX and linux)



Data and code were downloaded from ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/ 



Byte-5:sva hqin$ ps
  PID TTY           TIME CMD
49035 ttys000    0:00.08 -bash
51361 ttys000    0:00.01 bash ./rnaseq_pipeline.sh out
51377 ttys000    0:00.03 bash ./rnaseq_pipeline.sh out
51378 ttys000    0:00.00 tee ./run.log
52036 ttys000    0:00.07 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52044 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52045 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52046 ttys000    3:02.29 /Users/hqin/bin/hisat2-align-s --wrapper basic-0 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisa
52047 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_2.fastq.gz
52048 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_1.fastq.gz
  492 ttys001    0:00.21 -bash
51812 ttys002    0:00.02 -bash

51867 ttys002    0:11.01 tar xvfz hg38_tran.tar.gz




Byte-5:samtools hqin$ cd
Byte-5:~ hqin$ cd demo.lgf/
Byte-5:demo.lgf hqin$ cd RNAseq.hisat2/
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
ERROR: samtools program not found, please edit the configuration script.
Byte-5:RNAseq.hisat2 hqin$ source /Users/hqin/.bash_profile
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] #> START:  ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] Processing sample: ERR188044_chrX
[2016-11-14 15:07:24]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:07:56]    * Alignments conversion (SAMTools)
[2016-11-14 15:08:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:08:51] Processing sample: ERR188104_chrX
[2016-11-14 15:08:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:09:37]    * Alignments conversion (SAMTools)
[2016-11-14 15:10:24]    * Assemble transcripts (StringTie)
[2016-11-14 15:10:36] Processing sample: ERR188234_chrX
[2016-11-14 15:10:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:11:11]    * Alignments conversion (SAMTools)
[2016-11-14 15:12:23]    * Assemble transcripts (StringTie)
[2016-11-14 15:12:45] Processing sample: ERR188245_chrX
[2016-11-14 15:12:45]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:13:45]    * Alignments conversion (SAMTools)
[2016-11-14 15:14:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:14:51] Processing sample: ERR188257_chrX
[2016-11-14 15:14:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:15:42]    * Alignments conversion (SAMTools)
[2016-11-14 15:16:50]    * Assemble transcripts (StringTie)
[2016-11-14 15:17:04] Processing sample: ERR188273_chrX
[2016-11-14 15:17:04]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:17:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:18:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:18:44] Processing sample: ERR188337_chrX
[2016-11-14 15:18:44]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:21:18]    * Alignments conversion (SAMTools)
[2016-11-14 15:22:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:23:09] Processing sample: ERR188383_chrX
[2016-11-14 15:23:09]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:25:31]    * Alignments conversion (SAMTools)
[2016-11-14 15:27:13]    * Assemble transcripts (StringTie)
[2016-11-14 15:27:36] Processing sample: ERR188401_chrX
[2016-11-14 15:27:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:31:58]    * Alignments conversion (SAMTools)
[2016-11-14 15:33:53]    * Assemble transcripts (StringTie)
[2016-11-14 15:34:12] Processing sample: ERR188428_chrX
[2016-11-14 15:34:12]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:35:46]    * Alignments conversion (SAMTools)
[2016-11-14 15:36:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:36:59] Processing sample: ERR188454_chrX
[2016-11-14 15:36:59]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:39:12]    * Alignments conversion (SAMTools)
[2016-11-14 15:40:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:40:50] Processing sample: ERR204916_chrX
[2016-11-14 15:40:50]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:41:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:43:03]    * Assemble transcripts (StringTie)
[2016-11-14 15:43:18] #> Merge all transcripts (StringTie)
[2016-11-14 15:43:29] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 7 out of bounds (size 7)
Byte-5:RNAseq.hisat2 hqin$ 
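
The first attempt above stopped with "samtools program not found" until .bash_profile was sourced. A minimal preflight sketch that catches this before a long run is shown below. It assumes the tools are reachable on the PATH under the plain names hisat2, samtools and stringtie; the pipeline's own configuration script may point to them differently. The function name check_tools is made up for illustration.

```bash
#!/usr/bin/env bash
# Report which of the given programs cannot be found on PATH.
# Prints one "missing: NAME" line per absent program.
# Returns 0 if every program was found, 1 otherwise.
# check_tools is a hypothetical helper, not part of rnaseq_pipeline.sh.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage, with the tool names as assumptions: check_tools hisat2 samtools stringtie || echo "fix PATH before running the pipeline"
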
Rerunning the shell script on Linux (ridgeside) gave the same error:
Error at GBitVec: index 7 out of bounds (size 7)
./rnaseq_pipeline.sh: line 82:  7126 Segmentation fault      (core dumped) $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam



Downloaded StringTie v1.3.1b and reran the shell script on OS X:
... ... 
[2016-11-15 14:04:53] #> Merge all transcripts (StringTie)
[2016-11-15 14:04:57] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 9 out of bounds (size 9)
./rnaseq_pipeline.sh: line 82:  2422 Abort trap: 6           $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam
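
Both failures come from the abundance-estimation command on line 82 of the script, so one way to narrow things down is to run that command by hand for a single sample, for example with one thread. The sketch below only rebuilds the command from the flags quoted in the error message (-e -B -p -G -o). The function name estimate_one, the DRY_RUN switch and the assumption that stringtie is on the PATH are mine, and every path passed in is a placeholder.

```bash
#!/usr/bin/env bash
# Print (default) or run the abundance-estimation command for one BAM file.
# Flags are copied from the command quoted in the error message above.
# Arguments: BAM file, merged GTF, output GTF, optional thread count (default 1).
# estimate_one and DRY_RUN are hypothetical names used only in this sketch.
estimate_one() {
    local bam="$1"
    local merged_gtf="$2"
    local out_gtf="$3"
    local threads="${4:-1}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show the command without executing it.
        printf 'stringtie -e -B -p %s -G %s -o %s %s\n' \
            "$threads" "$merged_gtf" "$out_gtf" "$bam"
    else
        # Real run: assumes stringtie is on PATH.
        stringtie -e -B -p "$threads" -G "$merged_gtf" -o "$out_gtf" "$bam"
    fi
}
```

With DRY_RUN unset the function only prints the command; setting DRY_RUN=0 executes it. Running it sample by sample may show whether the GBitVec error is tied to one input or appears for every sample.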

Byte-5:RNAseq.hisat2 hqin$ stringtie -v
Command line was:
stringtie -v

StringTie v1.3.1b usage: