Prove your database skills using PostgreSQL.
Details | |
---|---|
Due date | 2020-01-24 |
Format | File upload, data upload |
Pages | - |
Submission | via GitHub |
Filename | contributors.md |
Data | Quality-controlled hourly data from Exercise #2 |
Challenges | Manage and analyse your HOBO data using PostgreSQL |
Get used to data management techniques using PostgreSQL as your database. With this exercise, you should be able to store data from measurement campaigns in a secure, performant, distributed and available manner. On top of that, you will be able to use database views to assist data analysis tasks.
Be sure to complete the HOBO exercise first, as you will need its data. You will have to upload the data to the database; recap the lecture covering data imports. The data upload will be evaluated from the database using the HOBO id, therefore the metadata file has to be filled in correctly.
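As a reminder from the import lecture, a CSV upload from the psql client could look like the sketch below. The column list and the file name are assumptions about your data; adapt both to your schema.

```sql
-- psql meta-command: client-side CSV import into an assumed raw_data table.
-- Column names and the file path are placeholders, not the official schema.
\copy raw_data (hobo_id, tstamp, temperature) FROM 'hobo_hourly.csv' WITH (FORMAT csv, HEADER true)
```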
Important: From #2 on, the tasks are meant to be solved by means of SQL statements and R chunks. For brevity, this is not repeated in every task. To complete this exercise, the SQL code, the R code, or both are required along with the solution of the respective task. If a task asks for a table and you use the result of a SQL query as that table, include the SQL!
Note: For this exercise, a day temperature is considered to be between 6 am and 6 pm, while a night temperature is considered to be between 6 pm and 6 am.
Please read this carefully:
The SQL exercises are submitted using GitHub. Everyone has to participate to get the course-wide grade for exercise #6. There are several ways to participate:
The preferred workflow to hand in tasks is:
Do not hesitate to share incomplete, buggy or even wrong code. This gives others a chance to participate, and this way GitHub can help you organize a coding project with 10+ people.
Solve this task as a class. Look carefully into the raw_data table. Implement a suitable table for your quality-checked temperature data. Your data shall be linked to the hobo table and the raw_data table. That means, with a single JOIN query it should be possible to load the hobo metadata and the corresponding raw_data record for each quality-checked value. Also implement the data quality information as normalized as possible. A possible schema is sketched after the next paragraph.
The exercise is considered finished as soon as the example code in the Exercise 6 repo works without errors and shows reasonable results.
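A minimal sketch of such a schema, assuming that hobo and raw_data carry integer primary keys named id; all table and column names here are suggestions, not the agreed class solution. The quality flag labels live in their own lookup table to keep the design normalized.

```sql
-- Lookup table: each quality flag label is stored exactly once (normalized).
CREATE TABLE quality_flags (
    id    serial PRIMARY KEY,
    label text NOT NULL UNIQUE   -- e.g. 'valid', 'out of range', 'spike'
);

-- Quality-checked temperatures, linked to the hobo and raw_data tables.
CREATE TABLE quality_checked (
    id          serial  PRIMARY KEY,
    hobo_id     integer NOT NULL REFERENCES hobo (id),
    raw_data_id integer NOT NULL REFERENCES raw_data (id),
    flag_id     integer NOT NULL REFERENCES quality_flags (id),
    temperature numeric
);

-- With this layout, a single JOIN query reaches the hobo metadata and
-- the corresponding raw_data record for each quality-checked value:
SELECT h.*, r.*, f.label, q.temperature
FROM quality_checked q
JOIN hobo          h ON h.id = q.hobo_id
JOIN raw_data      r ON r.id = q.raw_data_id
JOIN quality_flags f ON f.id = q.flag_id;
```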
Create a view called ‘overview’ with one row for each HOBO. Try to solve this task without intermediate views (remember UNION…). Refer to the table below for the necessary information; it describes the desired columns:
attribute | value |
---|---|
HOBO id | hobo id |
raw data | count() of associated data |
checked data | count() of associated quality checked data |
hobos within 2km 2019 | count of other hobos within a distance of 2 km in 2019 |
hobos within 2km | count of other hobos within a distance of 2 km in all years |
used in 2019 | yes/no |
used in 2018 | yes/no |
used in 2017 | yes/no |
The last three columns can either contain the string ‘yes’ or ‘no’ or just a boolean value. They should indicate whether the same hobo id was used by a student during the last three years.
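A sketch of how such a view could be built without intermediate views. The columns year and geom on the hobo table, the quality_checked table from above, and the PostGIS function ST_DWithin (with a geography cast so the 2000 is in metres) are all assumptions about your schema.

```sql
-- Sketch: one row per HOBO; all names except hobo_id are schema assumptions.
CREATE VIEW overview AS
SELECT h.hobo_id,
       (SELECT count(*) FROM raw_data r        WHERE r.hobo_id = h.id) AS raw_data,
       (SELECT count(*) FROM quality_checked q WHERE q.hobo_id = h.id) AS checked_data,
       (SELECT count(*) FROM hobo o
         WHERE o.id <> h.id AND o.year = 2019
           AND ST_DWithin(o.geom::geography, h.geom::geography, 2000)) AS hobos_within_2km_2019,
       (SELECT count(*) FROM hobo o
         WHERE o.id <> h.id
           AND ST_DWithin(o.geom::geography, h.geom::geography, 2000)) AS hobos_within_2km,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2019) AS used_in_2019,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2018) AS used_in_2018,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2017) AS used_in_2017
FROM hobo h;
```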
Create a view called indices on the hobos table. It should collect a number of indices for each HOBO, indexed by the id (not hobo_id!). Calculate the following numbers:
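The actual list of indices is defined by the task; as a sketch of the pattern only, here are two example indices built on the day/night definition from the note above. The timestamp column r.tstamp and the quality_checked table are assumptions.

```sql
-- Sketch with two example indices only; extend with the numbers the task asks for.
CREATE VIEW indices AS
SELECT h.id,
       avg(q.temperature) FILTER (WHERE extract(hour FROM r.tstamp) BETWEEN 6 AND 17)
           AS mean_day_temperature,    -- day: 6 am to 6 pm
       avg(q.temperature) FILTER (WHERE extract(hour FROM r.tstamp) < 6
                                     OR extract(hour FROM r.tstamp) >= 18)
           AS mean_night_temperature   -- night: 6 pm to 6 am
FROM hobo h
JOIN quality_checked q ON q.hobo_id = h.id
JOIN raw_data r        ON r.id = q.raw_data_id
GROUP BY h.id;
```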
Create a map of all hobos, either using QGIS or leaflet in RMarkdown. Show one index on each map and select the map that best illustrates the spatial variability of one of the indices. Discuss the decision on GitHub before you accept the final pull request to submit the solution.
Use the districts table to join the indices to the city districts in Freiburg. Does this spatial aggregation change the overall picture?
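A possible point-in-polygon aggregation, assuming PostGIS geometry columns named geom on both tables and a name column on districts; the averaged columns are the example indices from the sketch above.

```sql
-- Aggregate the indices per city district via a spatial join.
SELECT d.name,
       avg(i.mean_day_temperature)   AS mean_day_temperature,
       avg(i.mean_night_temperature) AS mean_night_temperature
FROM districts d
JOIN hobo    h ON ST_Within(h.geom, d.geom)
JOIN indices i ON i.id = h.id
GROUP BY d.name;
```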
In this task you’ll have to create various plots in RMarkdown.
Does the temperature show a location drift?
This is an important analysis for many spatial interpolation algorithms, as they often assume the variable to be independent of location. That means: check whether there is a relationship between the temperature and either coordinate dimension.
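One way to check this directly in the database is PostgreSQL's corr() aggregate, which returns Pearson's correlation coefficient; extracting the coordinates with the PostGIS functions ST_X/ST_Y from an assumed geom column:

```sql
-- Correlation of temperature with each coordinate dimension;
-- values close to 0 suggest no linear location drift.
SELECT corr(q.temperature, ST_X(h.geom)) AS corr_temp_x,
       corr(q.temperature, ST_Y(h.geom)) AS corr_temp_y
FROM quality_checked q
JOIN hobo h ON h.id = q.hobo_id;
```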
Create a time series of all temperature measurements of this student generation. Plot the mean and median temperature as solid thick lines and indicate the variability of all sensors by a bounding area around the mean.
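The data for this plot could be prepared in SQL first, e.g. mean, median (via percentile_cont) and standard deviation per timestamp across all sensors; the bounding area could then be drawn as mean ± sd in RMarkdown. Column names are assumptions about your schema.

```sql
-- One row per timestamp, aggregated across all sensors.
SELECT r.tstamp,
       avg(q.temperature) AS mean_temperature,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY q.temperature) AS median_temperature,
       stddev(q.temperature) AS sd_temperature
FROM quality_checked q
JOIN raw_data r ON r.id = q.raw_data_id
GROUP BY r.tstamp
ORDER BY r.tstamp;
```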