Technion - Israel Institute of Technology

02360523 - Introduction to Bioinformatics

Spring 2020-2021

Frequently Asked Questions - HW2 - Q3


How can we get the data per cluster when we use the hclust() function?
You can use the cutree() function (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/cutree). This function lets you cut the dendrogram either into the number of clusters that you want or at a given tree height.
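
A minimal sketch of this, assuming a simulated expression matrix with genes in rows and samples in columns (the matrix, its dimensions, the "average" linkage, and k = 3 are illustrative assumptions, not values from the assignment):

```r
set.seed(1)
# Hypothetical stand-in for the course data: 50 genes (rows) x 20 samples (columns).
expr <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:20)))

# hclust() takes a distance matrix; transpose so that samples are compared.
hc <- hclust(dist(t(expr)), method = "average")

# Cut the dendrogram into a fixed number of clusters ...
clusters_k <- cutree(hc, k = 3)
# ... or at a given height (the median merge height is an arbitrary choice).
clusters_h <- cutree(hc, h = median(hc$height))

# Sample names grouped by cluster.
samples_per_cluster <- split(names(clusters_k), clusters_k)

# Expression data restricted to the samples of cluster 1.
expr_cluster1 <- expr[, clusters_k == 1, drop = FALSE]
```

cutree() returns one integer label per sample, in the same order as the samples passed to dist(), so the labels can be used directly to subset the columns of the expression matrix.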

It's not clear what we should put instead of the "clusters" variable in the code:
tsne = Rtsne(t(sdy$expr))
plot(tsne$Y, col=clusters)
You should build your tSNE plot from the expression data. Then, you should apply the three clustering techniques. After you experiment with different parameters, each technique will give you its own "clusters" vector, which tells you which cluster each sample belongs to. Use this vector to color the samples on the tSNE plot.
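
As a rough sketch of where the "clusters" vector can come from: the block below simulates a small expression matrix (a hypothetical stand-in for sdy$expr), clusters the samples with hclust() and with kmeans(), and uses each result to color the tSNE plot. The simulated data, the choice of kmeans() as a second technique, k = 3, and perplexity = 10 are assumptions made for illustration only.

```r
library(Rtsne)

set.seed(1)
# Hypothetical stand-in for sdy$expr: 100 genes (rows) x 60 samples (columns).
expr <- matrix(rnorm(100 * 60), nrow = 100, ncol = 60)

# tSNE on the samples. perplexity is lowered only because the toy data is small.
tsne <- Rtsne(t(expr), perplexity = 10)

# Each technique returns one integer label per sample.
clusters_hc <- cutree(hclust(dist(t(expr))), k = 3)
clusters_km <- kmeans(t(expr), centers = 3)$cluster

# The labels are in the same sample order as the rows of tsne$Y,
# so they can be passed directly as colors.
plot(tsne$Y, col = clusters_hc, pch = 19, main = "hclust, k = 3")
plot(tsne$Y, col = clusters_km, pch = 19, main = "kmeans, k = 3")
```

The tSNE coordinates are computed once; only the coloring changes between clustering results, which makes the plots directly comparable.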

How can we possibly show/explain the way the data is clustered by using the demographic variables?

For one of the clustering results that you choose, you should compare the different clusters in terms of age, gender, and race.

For age, you can build a boxplot with the cluster number on the X axis and age on the Y axis.

You can use
boxplot(y~x)
or
ggplot(sample_info,aes(x,y,group=x))+geom_boxplot()

The composition of the clusters by gender and race can be shown in a table:
table(y,x)

Here too, x is the cluster assignment and y is the demographic parameter.
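
The comparison above can be sketched in base R as follows. The sample_info table, its column names (age, gender, race), the category labels, and the cluster labels are all simulated here and are assumptions; in practice they would come from the data set and from one of the clustering results.

```r
set.seed(1)
# Hypothetical sample annotation table; column names and values are assumptions.
sample_info <- data.frame(
  age    = round(runif(30, 20, 80)),
  gender = sample(c("F", "M"), 30, replace = TRUE),
  race   = sample(c("A", "B", "C"), 30, replace = TRUE)
)
# Hypothetical cluster labels, one per sample (e.g. the output of cutree()).
clusters <- sample(1:3, 30, replace = TRUE)

# Age distribution per cluster: x = cluster, y = age.
boxplot(sample_info$age ~ clusters, xlab = "Cluster", ylab = "Age")

# Cluster composition: rows = demographic value, columns = cluster.
gender_by_cluster <- table(sample_info$gender, clusters)
race_by_cluster   <- table(sample_info$race, clusters)
print(gender_by_cluster)
print(race_by_cluster)

# Column-wise proportions make clusters of different sizes comparable.
gender_prop <- prop.table(gender_by_cluster, margin = 2)
print(round(gender_prop, 2))
```

Raw counts from table() can be misleading when the clusters differ in size, which is why the sketch also prints per-cluster proportions.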

dbscan gives me no result, or it gives a very high number of clusters (the same as the number of samples).
What could be the reason, and how can I tune this technique to give a reasonable number of clusters?

Because the data has a very large number of dimensions (genes), dbscan has difficulty clustering it.
You can show a result with the minimal number of points per cluster (that is, one point per cluster), because this is what the method allows given the current number of dimensions.

The coloring of the tSNE plot will follow accordingly.

Alternatively, or in addition, you can reduce the dimensionality of your dataset and run dbscan on the data after dimensionality reduction.

That is, you can run tSNE on the data and then run dbscan on the result, in the form

dbscan(tsne$Y,eps=1.5,minPts=5)

Once you have the clusters, you can color the tSNE plot according to them.
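
A minimal end-to-end sketch of this approach, assuming the Rtsne and dbscan packages are installed. The expression matrix is simulated (three shifted groups of samples) and perplexity = 10 is chosen only to suit the small toy data. eps = 1.5 and minPts = 5 are the values from the call above, and they may need adjusting because the scale of the tSNE coordinates depends on the data.

```r
library(Rtsne)
library(dbscan)

set.seed(1)
# Hypothetical stand-in for the expression matrix: 100 genes x 90 samples,
# built from three shifted groups so that there is some structure to find.
expr <- cbind(
  matrix(rnorm(100 * 30, mean = 0), nrow = 100),
  matrix(rnorm(100 * 30, mean = 3), nrow = 100),
  matrix(rnorm(100 * 30, mean = 6), nrow = 100)
)

# Reduce to 2 dimensions first, then cluster the embedding.
tsne <- Rtsne(t(expr), perplexity = 10)
db <- dbscan(tsne$Y, eps = 1.5, minPts = 5)

# db$cluster holds one label per sample; 0 marks noise points.
print(table(db$cluster))

# Shift the labels by 1 so that noise (0) is not drawn with color 0,
# which base graphics treats as the background color.
plot(tsne$Y, col = db$cluster + 1, pch = 19)
```

If almost all samples end up with label 0, eps is probably too small for the scale of the embedding; if everything falls into a single cluster, it is probably too large.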
