Created
January 12, 2016 20:10
-
-
Save cthoyt/57ca559d77ce0507b041 to your computer and use it in GitHub Desktop.
Biological Databases Practical 2016 Day 1
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": false | |
}, | |
"source": [ | |
"# Introduction relational databases\n", | |
"\n", | |
"Before we start to import and analyzes biological data in our own database we have to understand the **basic principals** of relational databases and databases in general. Perhaps you have already worked with data, generated by a devise in the wet lab, or colleagues have send you **comma separated files** (csv) you have imported into Excel or Calc (LibreOffice/openOffice). Usually this works fine but at least if the data are too big, the columns or rows are too many you will struggling with the data. You will curse you colleague if you are under time pressure and have to **correct, clean and analyze** the data.\n", | |
"\n", | |
"With Excel it will be already difficult to perform the following task:\n", | |
"* Group all proteins by organism and gene ontology keyword. Calculate the average amino acid length for each of these groups.\n", | |
"\n", | |
"With relational databases this will be an easy task if the schema is correctly designed.\n", | |
"\n", | |
"## Definitions\n", | |
"\n", | |
"If we search for a definition of relational databases we found this on [wikipedia](https://en.wikipedia.org/wiki/Relational_database)\n", | |
"\n", | |
"<cite>A relational database is a digital database whose organization is based on the relational model of data [...] The various software systems used to maintain relational databases are known as **relational database management system** (RDBMS). Virtually all relational database systems use SQL (**Structured Query Language**) as the language for querying and maintaining the database.</cite>\n", | |
"\n", | |
"As a **relational database management system** (RDBMS) we use MySQL and later we will learn the basics of the **Structured Query Language** (SQL).\n", | |
"\n", | |
"### In General\n", | |
"\n", | |
"Relational databases hold data. This data can be of **different data types**, such as numbers, characters, or dates. Within the database, the data are organized into **logical units called tables**. A table is like a spreadsheet, because it contains rows of data. Each row is made up of a number of columns. The columns hold data of a specific data type, like integer values or strings of characters. In most cases, a database has more than one table. To relate the tables together, a database designer takes advantage of natural (or artificial) **links between the tables**. In a spreadsheet, you can link rows in different sheets by cell values. The same idea holds in a relational database, and the column used to make the link is referred to as a key column (foreign key, primary key).\n", | |
"\n", | |
"\n", | |
"\n", | |
"Often data in tables have many cells with the same information again and again. This **data redundancy** is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. \n", | |
"\n", | |
"\n", | |
"### SQL: Structured Query Language\n", | |
"\n", | |
"Structured Query Language (or SQL, pronounced sea-quill) is a standard language for accessing and manipulating relational databases. If you are know standard SQL it's easy to query near all RDBMS (like MySQL,MsSQL,Oracle,sqllite,...). \n", | |
"\n", | |
"SQL has two main components: a **Data Definition Language** (DDL) and a **Data Manipulation Language** (DML). \n", | |
"* DDL\n", | |
" * defining data structures, especially database schemas\n", | |
" * commands are used to **create, modify, or delete** items (such as tables) in a database. \n", | |
"* DML \n", | |
" * commands are used to **selecting, inserting, deleting and updating** data from a table in the database.\n", | |
" \n", | |
"All RDBMSs have addtional functions which are not always available in other RDBMSs.\n", | |
" \n", | |
"### SQL data types\n", | |
"\n", | |
"Data types we will use in this lecture are\n", | |
"\n", | |
"* varchar\n", | |
"* integer\n", | |
"* text\n", | |
"* date\n", | |
"* float\n", | |
"\n", | |
"and later others. Some RDBMS define also their own data types e.g.MySQL\n", | |
"\n", | |
"* enum\n", | |
"\n", | |
"### Example\n", | |
"\n", | |
"Guess you read an article about ***Apoptosis of glia cells*** and want to know now more about which proteins are described in SwissProt (manually annotated and reviewed part of UniProt). \n", | |
"\n", | |
"The problem is often, that the data you can download from a website like UniProt are not in the right format to directly import it into your system. You need perhaps sometime only slightly changes but this could be a nightmare if you don't have the right tool. Often script languages like [Python](https://www.python.org/) provides you with the right tools to do this changes.\n", | |
"\n", | |
"After you have imported the data you often see inconsistencies in the data like typos or strange values. You have to check and clean before you can analyze the data. This is possible with short python scripts we will write at the end of this part of the lecture.\n", | |
"\n", | |
"### Excercise\n", | |
"Open [UniProt](http://www.uniprot.org/) and query in *advanced search* for \"APP\". Restrict your search for \"human\". Download the different formats. Open and describe the characteristics and differences." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment