Created
May 30, 2016 20:21
Clustering Exercise for TDA
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Thin Disk vs. Thick Disk" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"import seaborn as sns\n", | |
"from sklearn.mixture import GaussianMixture\n", | |
"from sklearn.cluster import KMeans, MeanShift, estimate_bandwidth\n", | |
"from sklearn import preprocessing\n", | |
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n", | |
"from astroML.plotting.tools import draw_ellipse\n", | |
"from astroML.clustering import HierarchicalClustering, get_graph_segments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The spectra obtained by the SEGUE program of the Sloan Digital Sky Survey are of sufficient quality for researchers to derive stellar parameters. These parameters were estimated with an automated pipeline called SSPP (SEGUE Stellar Parameter Pipeline).\n", | |
"\n", | |
"The cell below loads a selection of the data from the ninth Sloan data release (Data Release 9)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>ra</th>\n", | |
" <th>dec</th>\n", | |
" <th>Ar</th>\n", | |
" <th>upsf</th>\n", | |
" <th>uErr</th>\n", | |
" <th>gpsf</th>\n", | |
" <th>gErr</th>\n", | |
" <th>rpsf</th>\n", | |
" <th>rErr</th>\n", | |
" <th>ipsf</th>\n", | |
" <th>...</th>\n", | |
" <th>FeH</th>\n", | |
" <th>FeHErr</th>\n", | |
" <th>Teff</th>\n", | |
" <th>TeffErr</th>\n", | |
" <th>logg</th>\n", | |
" <th>loggErr</th>\n", | |
" <th>alphFe</th>\n", | |
" <th>alphFeErr</th>\n", | |
" <th>radVel</th>\n", | |
" <th>radVelErr</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>40.272091</td>\n", | |
" <td>-0.642501</td>\n", | |
" <td>0.085</td>\n", | |
" <td>19.240999</td>\n", | |
" <td>0.034</td>\n", | |
" <td>17.525999</td>\n", | |
" <td>0.020</td>\n", | |
" <td>16.840000</td>\n", | |
" <td>0.017</td>\n", | |
" <td>16.613001</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.45424</td>\n", | |
" <td>0.074101</td>\n", | |
" <td>5166.100098</td>\n", | |
" <td>23.374001</td>\n", | |
" <td>4.4887</td>\n", | |
" <td>0.12392</td>\n", | |
" <td>0.28739</td>\n", | |
" <td>0.008659</td>\n", | |
" <td>-8.209900</td>\n", | |
" <td>1.5169</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>55.392067</td>\n", | |
" <td>0.857916</td>\n", | |
" <td>0.274</td>\n", | |
" <td>17.273001</td>\n", | |
" <td>0.025</td>\n", | |
" <td>16.242001</td>\n", | |
" <td>0.017</td>\n", | |
" <td>15.846000</td>\n", | |
" <td>0.020</td>\n", | |
" <td>15.686000</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.63559</td>\n", | |
" <td>0.035678</td>\n", | |
" <td>6306.200195</td>\n", | |
" <td>23.291000</td>\n", | |
" <td>3.7930</td>\n", | |
" <td>0.10438</td>\n", | |
" <td>0.11835</td>\n", | |
" <td>0.008271</td>\n", | |
" <td>-31.660000</td>\n", | |
" <td>1.1476</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>58.287651</td>\n", | |
" <td>0.935628</td>\n", | |
" <td>0.682</td>\n", | |
" <td>17.990000</td>\n", | |
" <td>0.023</td>\n", | |
" <td>16.725000</td>\n", | |
" <td>0.019</td>\n", | |
" <td>16.159000</td>\n", | |
" <td>0.016</td>\n", | |
" <td>15.933000</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.50773</td>\n", | |
" <td>0.040073</td>\n", | |
" <td>6241.600098</td>\n", | |
" <td>28.103001</td>\n", | |
" <td>3.9929</td>\n", | |
" <td>0.10319</td>\n", | |
" <td>0.17559</td>\n", | |
" <td>0.010561</td>\n", | |
" <td>26.035999</td>\n", | |
" <td>1.5966</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>30</th>\n", | |
" <td>71.559631</td>\n", | |
" <td>-0.067917</td>\n", | |
" <td>0.191</td>\n", | |
" <td>18.517000</td>\n", | |
" <td>0.023</td>\n", | |
" <td>17.209000</td>\n", | |
" <td>0.022</td>\n", | |
" <td>16.761000</td>\n", | |
" <td>0.016</td>\n", | |
" <td>16.586000</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.30270</td>\n", | |
" <td>0.032576</td>\n", | |
" <td>5940.700195</td>\n", | |
" <td>9.883500</td>\n", | |
" <td>4.1815</td>\n", | |
" <td>0.12755</td>\n", | |
" <td>0.11419</td>\n", | |
" <td>0.014712</td>\n", | |
" <td>39.236000</td>\n", | |
" <td>1.2742</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>40</th>\n", | |
" <td>84.242233</td>\n", | |
" <td>0.752848</td>\n", | |
" <td>0.716</td>\n", | |
" <td>18.823999</td>\n", | |
" <td>0.024</td>\n", | |
" <td>17.229000</td>\n", | |
" <td>0.013</td>\n", | |
" <td>16.471001</td>\n", | |
" <td>0.014</td>\n", | |
" <td>16.141001</td>\n", | |
" <td>...</td>\n", | |
" <td>-0.71515</td>\n", | |
" <td>0.091300</td>\n", | |
" <td>5354.000000</td>\n", | |
" <td>65.188004</td>\n", | |
" <td>4.5545</td>\n", | |
" <td>0.12342</td>\n", | |
" <td>0.43478</td>\n", | |
" <td>0.006252</td>\n", | |
" <td>56.334999</td>\n", | |
" <td>1.6817</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows × 30 columns</p>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" ra dec Ar upsf uErr gpsf gErr rpsf \\\n", | |
"0 40.272091 -0.642501 0.085 19.240999 0.034 17.525999 0.020 16.840000 \n", | |
"10 55.392067 0.857916 0.274 17.273001 0.025 16.242001 0.017 15.846000 \n", | |
"20 58.287651 0.935628 0.682 17.990000 0.023 16.725000 0.019 16.159000 \n", | |
"30 71.559631 -0.067917 0.191 18.517000 0.023 17.209000 0.022 16.761000 \n", | |
"40 84.242233 0.752848 0.716 18.823999 0.024 17.229000 0.013 16.471001 \n", | |
"\n", | |
" rErr ipsf ... FeH FeHErr Teff TeffErr \\\n", | |
"0 0.017 16.613001 ... -0.45424 0.074101 5166.100098 23.374001 \n", | |
"10 0.020 15.686000 ... -0.63559 0.035678 6306.200195 23.291000 \n", | |
"20 0.016 15.933000 ... -0.50773 0.040073 6241.600098 28.103001 \n", | |
"30 0.016 16.586000 ... -0.30270 0.032576 5940.700195 9.883500 \n", | |
"40 0.014 16.141001 ... -0.71515 0.091300 5354.000000 65.188004 \n", | |
"\n", | |
" logg loggErr alphFe alphFeErr radVel radVelErr \n", | |
"0 4.4887 0.12392 0.28739 0.008659 -8.209900 1.5169 \n", | |
"10 3.7930 0.10438 0.11835 0.008271 -31.660000 1.1476 \n", | |
"20 3.9929 0.10319 0.17559 0.010561 26.035999 1.5966 \n", | |
"30 4.1815 0.12755 0.11419 0.014712 39.236000 1.2742 \n", | |
"40 4.5545 0.12342 0.43478 0.006252 56.334999 1.6817 \n", | |
"\n", | |
"[5 rows x 30 columns]" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Note that we can load the data directly from a URL. Neat, right?\n", | |
"sspp_df = pd.read_csv('https://www.dropbox.com/s/8gh2fhsog82jjzr/sspp_data.csv?dl=1', index_col=0)\n", | |
"\n", | |
"sspp_df.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The thin and thick disks are two substructures of the Galactic disk. The thin disk is younger than the thick disk and therefore hosts objects of higher metallicity. A canonical plot for identifying these two components compares the metallicity [Fe/H] with the abundance ratio of the alpha elements relative to Fe ([$\\alpha$/Fe]), which is lower in the thin disk.\n", | |
"\n", | |
"Using the SEGUE data, identify the two components with the clustering methods we have covered, and state which one you think best describes the problem." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Using the method you chose, can you identify which stars belong to which cluster?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If you can identify the stars in each cluster, what does the HR diagram look like (use `Teff` and `logg`)? And the positions on the sky (use `ra` and `dec`)?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |