Data Analysis

Regular course within the MSc Physical Geography at Marburg University.

Course description

Data analysis is a key competence of professional geographers, but it requires profound knowledge of both (statistical) analysis methods and computer science. While the reason for the former is obvious, the latter is a direct result of a growing data deluge, driven by technological progress in both data collection and data distribution.

In this module we will learn how to describe data and infer information from it using the statistical scripting language R. Using R will not just open the door to a cosmos of data analysis functionality but also provide a flexible (although not really ideal) tool for workflow automation in the disciplinary context.

Please note that this module will focus on datasets which are not spatially explicit. Spatially explicit data analysis will be the focus of the introductory and advanced remote sensing and GIS modules.

The individual sessions can be grouped into four sections:

  • First things first: in sessions 1 and 2 we will have a quick introduction to R and some basic syntax examples. We will also introduce R Markdown along with Git, GitHub and GitHub Classroom, which is required for submitting the student assignments.
  • Data exploration: in sessions 3 and 4 we will shift from a technical to a scientific, task-driven perspective and start with common visual and non-visual data exploration methods. Such methods are typically used for the initial analysis of environmental datasets.
  • Data modelling: in sessions 5 to 11 we will concentrate on the modelling of data dependencies, primarily from a regression analysis perspective. This will encompass simple linear to more complex generalized linear models, cross-validation concepts and also some aspects of time series analysis.
  • Visualization: although intrinsically related to all other content of this module, we will also explicitly focus on visualization features for publication-quality results towards the end of the course.

Have fun!

Syllabus

The course has one session per week; each session lasts three hours.

Session | Topic | Content

First things first
1 | First things first | Data and information, R, RStudio, R Markdown, GitHub, GitHub Classroom
2 | First things second | Working environment, data sets, data types, data structures, logical operators, control structures

Data exploration
3 | Look at your data | Reading and writing (tabulated) data, visual data exploration, descriptive statistics
4 | Clean your data | Tailoring data sets, fill values and NA, aggregating, merging or sub-setting data sets

Data modelling
5 | Explain your data | Linear regression modelling, confidence intervals, sample tests, variance analysis
6 | Predict your data | Cross-validation
7 | Select your variables | Multiple linear models, feature selection
8 | Predict your non-linear data | Generalized additive models
9 | T-4 and holding | Built-in hold to finish up the explanation sessions
10 | Predict your temporal data | Auto-correlation, AR and ARIMA models
11 | Explain your temporal data | Decomposing time series

Visualization and wrap up
12 | Visualize your data | Publication quality graphics
13 | Visualize your map data | Publication quality graphics
14 | Wrap up | Feedback, goodbye

courses/msc/msc-phygeo-data-analysis/description.txt · Last modified: 2017/05/02 15:52 by tnauss