Prove your database skills using PostgreSQL.
Details | |
---|---|
Due date | 2020-01-24 |
Format | File upload, data upload |
Pages | - |
Submission | via GitHub |
Filename | contributors.md |
Data | Quality-controlled hourly data from Exercise #2 |
Challenges | Manage and analyse your HOBO data using PostgreSQL |
Get used to data management techniques using PostgreSQL as your database. With this exercise, you should be able to store data from measurement campaigns in a secure, performant, distributed and available manner. On top of that, you will be able to use database views to assist data analysis tasks.
Be sure to complete the HOBO exercise first, as you will need its data. You will have to upload the data to the database; recap the lecture covering data imports. The data upload will be evaluated from the database using the HOBO id, therefore the metadata file has to be filled in correctly.
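As a reminder from the import lecture, a CSV upload from the psql client could look like the sketch below. The column list and the file name are assumptions about your data; adapt both to your schema.

```sql
-- psql meta-command: client-side CSV import into an assumed raw_data table.
-- Column names and the file path are placeholders, not the official schema.
\copy raw_data (hobo_id, tstamp, temperature) FROM 'hobo_hourly.csv' WITH (FORMAT csv, HEADER true)
```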
Important: From #2 on, the tasks are meant to be solved by means of SQL statements and R chunks. For brevity, this is not repeated in every task. To complete this exercise, the SQL code, the R code, or both are required along with the solution of the respective task. If a task asks for a table and you use the result of a SQL query as that table, include the SQL!
Note: For this exercise, a day temperature is considered to be between 6 am and 6 pm, while a night temperature is considered to be between 6 pm and 6 am.
Please read this carefully:
The SQL exercises are submitted using GitHub. Everyone has to participate to get the course-wide grade for exercise #6. There are several ways to participate:
The preferred workflow to hand in tasks is:
Do not hesitate to share incomplete, buggy or even wrong code. This gives others a chance to participate, and this way GitHub can help you organize a coding project with 10+ people.
Solve this task as a class. Look carefully into the raw_data table. Implement a suitable table for your quality-checked temperature data. Your data shall be linked to the hobo table and the raw_data table. That means, with a single JOIN query it should be possible to load the hobo metadata and the corresponding raw_data record for each quality-checked value. Also implement the data quality information as normalized as possible. A possible schema is sketched after the next paragraph.
The exercise is considered finished as soon as the example code in the Exercise 6 repo works without errors and shows reasonable results.
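A minimal sketch of such a schema, assuming that hobo and raw_data carry integer primary keys named id; all table and column names here are suggestions, not the agreed class solution. The quality flag labels live in their own lookup table to keep the design normalized.

```sql
-- Lookup table: each quality flag label is stored exactly once (normalized).
CREATE TABLE quality_flags (
    id    serial PRIMARY KEY,
    label text NOT NULL UNIQUE   -- e.g. 'valid', 'out of range', 'spike'
);

-- Quality-checked temperatures, linked to the hobo and raw_data tables.
CREATE TABLE quality_checked (
    id          serial  PRIMARY KEY,
    hobo_id     integer NOT NULL REFERENCES hobo (id),
    raw_data_id integer NOT NULL REFERENCES raw_data (id),
    flag_id     integer NOT NULL REFERENCES quality_flags (id),
    temperature numeric
);

-- With this layout, a single JOIN query reaches the hobo metadata and
-- the corresponding raw_data record for each quality-checked value:
SELECT h.*, r.*, f.label, q.temperature
FROM quality_checked q
JOIN hobo          h ON h.id = q.hobo_id
JOIN raw_data      r ON r.id = q.raw_data_id
JOIN quality_flags f ON f.id = q.flag_id;
```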
Create a view called ‘overview’ with one row for each HOBO. Try to solve this task without intermediate views (remember UNION…). Refer to the table below for the necessary information; it describes the desired columns:
attribute | value |
---|---|
HOBO id | hobo id |
raw data | count() of associated data |
checked data | count() of associated quality checked data |
hobos within 2km 2019 | count of other hobos within a distance of 2 km in 2019 |
hobos within 2km | count of other hobos within a distance of 2 km in all years |
used in 2019 | yes/no |
used in 2018 | yes/no |
used in 2017 | yes/no |
The last three columns can either contain the string ‘yes’ or ‘no’ or just a boolean value. They should indicate whether the same hobo id was used by a student during the last three years.
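A sketch of how such a view could be built without intermediate views. The columns year and geom on the hobo table, the quality_checked table from above, and the PostGIS function ST_DWithin (with a geography cast so the 2000 is in metres) are all assumptions about your schema.

```sql
-- Sketch: one row per HOBO; all names except hobo_id are schema assumptions.
CREATE VIEW overview AS
SELECT h.hobo_id,
       (SELECT count(*) FROM raw_data r        WHERE r.hobo_id = h.id) AS raw_data,
       (SELECT count(*) FROM quality_checked q WHERE q.hobo_id = h.id) AS checked_data,
       (SELECT count(*) FROM hobo o
         WHERE o.id <> h.id AND o.year = 2019
           AND ST_DWithin(o.geom::geography, h.geom::geography, 2000)) AS hobos_within_2km_2019,
       (SELECT count(*) FROM hobo o
         WHERE o.id <> h.id
           AND ST_DWithin(o.geom::geography, h.geom::geography, 2000)) AS hobos_within_2km,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2019) AS used_in_2019,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2018) AS used_in_2018,
       EXISTS (SELECT 1 FROM hobo o WHERE o.hobo_id = h.hobo_id AND o.year = 2017) AS used_in_2017
FROM hobo h;
```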
Create a view called indices on the hobos table. It should collect a number of indices for each HOBO, indexed by the id (not hobo_id!). Calculate the following numbers:
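The actual list of indices is defined by the task; as a sketch of the pattern only, here are two example indices built on the day/night definition from the note above. The timestamp column r.tstamp and the quality_checked table are assumptions.

```sql
-- Sketch with two example indices only; extend with the numbers the task asks for.
CREATE VIEW indices AS
SELECT h.id,
       avg(q.temperature) FILTER (WHERE extract(hour FROM r.tstamp) BETWEEN 6 AND 17)
           AS mean_day_temperature,    -- day: 6 am to 6 pm
       avg(q.temperature) FILTER (WHERE extract(hour FROM r.tstamp) < 6
                                     OR extract(hour FROM r.tstamp) >= 18)
           AS mean_night_temperature   -- night: 6 pm to 6 am
FROM hobo h
JOIN quality_checked q ON q.hobo_id = h.id
JOIN raw_data r        ON r.id = q.raw_data_id
GROUP BY h.id;
```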
Create a map of all hobos, either using QGIS or leaflet in RMarkdown. Show one index on each map and select the map that best illustrates the spatial variability of one of the indices. Discuss the decision on GitHub before you accept the final pull request to submit the solution.
Use the districts table to join the indices to the city districts in Freiburg. Does this spatial aggregation change the overall picture?
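A possible point-in-polygon aggregation, assuming PostGIS geometry columns named geom on both tables and a name column on districts; the averaged columns are the example indices from the sketch above.

```sql
-- Aggregate the indices per city district via a spatial join.
SELECT d.name,
       avg(i.mean_day_temperature)   AS mean_day_temperature,
       avg(i.mean_night_temperature) AS mean_night_temperature
FROM districts d
JOIN hobo    h ON ST_Within(h.geom, d.geom)
JOIN indices i ON i.id = h.id
GROUP BY d.name;
```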
In this task you’ll have to create various plots in RMarkdown.
Does the temperature show a location drift?
This is an important analysis for many spatial interpolation algorithms, as they often assume the variable to be independent of location. That means: check whether there is a relationship between the temperature and either coordinate dimension.
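One way to check this directly in the database is PostgreSQL's corr() aggregate, which returns Pearson's correlation coefficient; extracting the coordinates with the PostGIS functions ST_X/ST_Y from an assumed geom column:

```sql
-- Correlation of temperature with each coordinate dimension;
-- values close to 0 suggest no linear location drift.
SELECT corr(q.temperature, ST_X(h.geom)) AS corr_temp_x,
       corr(q.temperature, ST_Y(h.geom)) AS corr_temp_y
FROM quality_checked q
JOIN hobo h ON h.id = q.hobo_id;
```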
Create a time series of all temperature measurements of this student generation. Plot the mean and median temperature as solid thick lines and indicate the variability of all sensors by a bounding area around the mean.
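The data for this plot could be prepared in SQL first, e.g. mean, median (via percentile_cont) and standard deviation per timestamp across all sensors; the bounding area could then be drawn as mean ± sd in RMarkdown. Column names are assumptions about your schema.

```sql
-- One row per timestamp, aggregated across all sensors.
SELECT r.tstamp,
       avg(q.temperature) AS mean_temperature,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY q.temperature) AS median_temperature,
       stddev(q.temperature) AS sd_temperature
FROM quality_checked q
JOIN raw_data r ON r.id = q.raw_data_id
GROUP BY r.tstamp
ORDER BY r.tstamp;
```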