2019 Data Course

2019 Data overview

distribution

This short ‘analysis’ script demonstrates how GitHub can be used to create automated reports on the collected data collaboratively. The following table summarizes all sensor locations used in 2019:

All of 2019’s HOBO locations
hobo_id	radiation_influence	longitude	latitude	min_t	max_t	mean_t	p90_t
10088310	high	7.806718	48.01462	2.935209	3.349429	3.150005	3.310494
10347312	moderate	7.846758	47.99385	3.597357	4.183214	3.869199	4.112050
10347315	moderate	7.856484	47.99165	3.000959	3.534959	3.263331	3.478476
10347318	low	7.845553	48.00467	2.468881	2.799980	2.620185	2.757361
10347320	moderate	7.815115	47.99623	3.250396	3.660208	3.443525	3.606123
10347325	moderate	7.852197	47.99449	3.565740	4.078900	3.815450	4.010816
10347326	very high	7.815046	47.99625	3.730811	4.406255	4.055476	4.325709
10347334	high	7.832835	47.99594	2.929849	3.201699	3.060031	3.168557
10347365	moderate	7.857000	48.00600	3.624805	3.828831	3.727947	3.808827
10347370	moderate	7.837758	48.00113	2.724418	3.183245	2.949453	3.137543
10347386	moderate	7.820571	47.99443	3.176835	3.504983	3.337344	3.473536
10347392	moderate	7.821586	48.00696	4.095670	4.401841	4.247455	4.364576
10349993	low	7.852334	48.01661	3.452544	3.889417	3.662897	3.836199
10350007	low	7.835518	47.98811	3.277500	3.585531	3.425900	3.552456
10350009	moderate	7.842000	47.99100	3.021436	3.251051	3.150306	3.234738
10350010	moderate	7.904306	47.98636	3.074440	3.565604	3.293331	3.498308
10350042	high	7.809647	47.98841	2.878318	3.201388	3.034553	3.164152
10350043	very high	7.815046	47.99625	2.719985	3.021985	2.864837	2.986971
10350049	high	7.828954	47.97359	3.309319	3.844928	3.570272	3.786965
10350057	high	7.833801	47.99558	3.256022	3.625774	3.428990	3.580334
10350062	high	7.831752	48.01040	2.500043	2.841713	2.665629	2.805714
10350081	moderate	7.836256	47.98446	3.697644	3.963889	3.825795	3.933556
10350090	moderate	7.836111	47.98450	3.511293	3.773598	3.633039	3.737476
10350097	high	7.815128	47.99635	3.459292	3.756738	3.611459	3.725979
10350099	moderate	7.855889	48.02044	3.917778	4.143911	4.025511	4.114489
10610853	high	7.849450	47.98659	2.296417	3.468750	2.952569	3.376808
10760706	moderate	7.868000	48.04020	2.992475	3.482252	3.224680	3.423634
10760710	moderate	7.811149	48.01132	4.796038	5.185191	4.979109	5.138973
10760763	low	7.866200	48.00960	2.844423	3.120141	2.979783	3.090195
10760810	moderate	7.853889	48.01222	3.397208	3.620208	3.509257	3.597540
10760814	moderate	7.851530	48.01164	2.661575	2.894356	2.770222	2.864765
10760823	low	7.856944	47.94222	1.874965	2.446047	2.145790	2.371102
10801132	moderate	7.853889	48.01222	3.252114	3.607371	3.418845	3.562727
10801134	moderate	7.846786	48.00408	2.455714	2.814943	2.632527	2.772409

The database is queried when this RMarkdown file is compiled. Check the source for the corresponding SQL code. As you can see, no username or password is hardcoded in the file; this information is passed in through environment variables. The calculated indices are the means of the hourly minimum, maximum, mean and 90th percentile temperature, aggregated live in the query. Remember that the database stores 5-minute data. We also exclude the first few days and the last day of the campaign, and only use temperatures recorded at a light intensity of 250 < light < 2500.
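The following is a minimal sketch of the two ideas described above: reading credentials from environment variables and computing the indices in two steps (hourly statistics first, then their mean per sensor). It is not the SQL used by the report. The environment variable names `HOBO_DB_USER` and `HOBO_DB_PASSWORD`, the column names and the synthetic data are assumptions made for illustration only.

```r
# Credentials are read from the environment, never written into the file.
# The variable names are hypothetical.
db_user <- Sys.getenv("HOBO_DB_USER", unset = NA)
db_pass <- Sys.getenv("HOBO_DB_PASSWORD", unset = NA)
if (is.na(db_user) || is.na(db_pass)) {
  message("Credentials not set; using synthetic data instead of the database.")
}

# Synthetic stand-in for two days of 5-minute records from one sensor.
set.seed(42)
tstamp <- seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"),
              by = "5 min", length.out = 2 * 24 * 12)
raw <- data.frame(
  hobo_id = 1L,
  tstamp = tstamp,
  temperature = 3 + sin(seq_along(tstamp) / 40) + rnorm(length(tstamp), sd = 0.2),
  light = runif(length(tstamp), 0, 3000)
)

# Keep only records inside the light intensity range 250 < light < 2500.
used <- raw[raw$light > 250 & raw$light < 2500, ]

# Step 1: aggregate the 5-minute records to hourly statistics.
used$hour <- format(used$tstamp, "%Y-%m-%d %H")
groups <- split(used, list(used$hobo_id, used$hour), drop = TRUE)
hourly <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    hobo_id = d$hobo_id[1],
    hour = d$hour[1],
    min_t = min(d$temperature),
    max_t = max(d$temperature),
    mean_t = mean(d$temperature),
    p90_t = unname(quantile(d$temperature, 0.9))
  )
}))

# Step 2: average the hourly statistics per sensor to get the indices.
indices <- aggregate(cbind(min_t, max_t, mean_t, p90_t) ~ hobo_id,
                     data = hourly, FUN = mean)
print(indices)
```

Because every index is a mean of hourly values, `min_t <= mean_t <= max_t` holds for each sensor, which is a quick sanity check for the table above.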

An overview of all locations used and their spatial distribution is given below:

Let’s group by radiation_influence and have a look at the mean temperatures:

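library(dplyr)    # provides the %>% pipe
library(ggplot2)
# Distribution of the mean temperature index per radiation class
hobos.2019 %>%
  ggplot(aes(x = radiation_influence, y = mean_t)) +
  geom_violin()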
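A numerical summary can complement the violin plot. The sketch below uses only six rows copied from the table above, so the group means are illustrative and not the full 2019 result. The data frame name `hobos_subset` is hypothetical. Declaring `radiation_influence` as an ordered factor is an assumption about how the classes should be sorted; it also keeps the x-axis of a plot in a meaningful order instead of alphabetical.

```r
# Six rows copied from the table above (not the full data set).
hobos_subset <- data.frame(
  hobo_id = c(10088310, 10347312, 10347315, 10347318, 10347326, 10347334),
  radiation_influence = c("high", "moderate", "moderate", "low", "very high", "high"),
  mean_t = c(3.150005, 3.869199, 3.263331, 2.620185, 4.055476, 3.060031)
)

# Order the classes from low to very high (assumed ordering).
hobos_subset$radiation_influence <- factor(
  hobos_subset$radiation_influence,
  levels = c("low", "moderate", "high", "very high"),
  ordered = TRUE
)

# Number of sensors and mean of mean_t per radiation class.
group_n <- table(hobos_subset$radiation_influence)
group_mean <- tapply(hobos_subset$mean_t, hobos_subset$radiation_influence, mean)

print(data.frame(
  radiation_influence = names(group_mean),
  n = as.vector(group_n),
  mean_t = as.vector(group_mean)
))
```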

analysis

Before you continue with in-depth analysis, or run a violin/boxplot on each index value, let’s look at the correlations between the indices:

correlation(hobos.2019)

##        min_t max_t mean_t p90_t
## min_t   1.00  0.95   0.98  0.96
## max_t   0.95  1.00   0.99  1.00
## mean_t  0.98  0.99   1.00  0.99
## p90_t   0.96  1.00   0.99  1.00
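`correlation()` is not a base R function, so it is presumably a helper defined in the source of this report. Assuming it returns rounded Pearson correlations of the four index columns, a comparable matrix can be sketched with base R's `cor()`. The six rows below are copied from the table above, so the resulting values will differ from the full-data matrix; the name `idx` is hypothetical.

```r
# Index columns of six sensors, copied from the table above.
idx <- data.frame(
  min_t  = c(2.935209, 3.597357, 3.000959, 2.468881, 3.250396, 3.565740),
  max_t  = c(3.349429, 4.183214, 3.534959, 2.799980, 3.660208, 4.078900),
  mean_t = c(3.150005, 3.869199, 3.263331, 2.620185, 3.443525, 3.815450),
  p90_t  = c(3.310494, 4.112050, 3.478476, 2.757361, 3.606123, 4.010816)
)

# Pairwise Pearson correlations, rounded to two digits like the output above.
cor_mat <- round(cor(idx, method = "pearson"), 2)
print(cor_mat)
```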

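It looks like there is a high correlation (0.98) between the average hourly minimum and the average hourly mean temperature. A high correlation does not mean that the two indices take the same values, so let’s check with a paired t-test whether they differ systematically: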

t.test(hobos.2019$min_t, hobos.2019$mean_t, paired = TRUE)

## 
##  Paired t-test
## 
## data:  hobos.2019$min_t and hobos.2019$mean_t
## t = -11.411, df = 33, p-value = 5.38e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2286325 -0.1594397
## sample estimates:
## mean of the differences 
##              -0.1940361
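To make the output above easier to read, the sketch below recomputes a paired t-test by hand and compares it with `t.test()`. The test statistic is the mean of the pairwise differences divided by its standard error, with n - 1 degrees of freedom. Only six sensors copied from the table are used, so the numbers will not match the full-data result (df = 33, that is 34 sensors).

```r
# min_t and mean_t of six sensors, copied from the table above.
min_t  <- c(2.935209, 3.597357, 3.000959, 2.468881, 3.250396, 3.565740)
mean_t <- c(3.150005, 3.869199, 3.263331, 2.620185, 3.443525, 3.815450)

# Paired test: work on the per-sensor differences.
d <- min_t - mean_t
n <- length(d)

# t = mean difference / standard error of the mean difference.
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Same test via the built-in function.
res <- t.test(min_t, mean_t, paired = TRUE)

print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(p_manual = p_manual, p_builtin = res$p.value))
```

A small p-value here means that the mean difference between the two indices is unlikely to be zero. It says nothing about how strongly they are correlated.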

Mirko Maelicke

20 1 2019
