Short-term weather prediction algorithm


I have a set of data from the last 10 days, updated every 15 minutes, which state the temperature and humidity in a closed area (no external influences), for example a greenhouse, in the format: temperature_value, humidity_value, date(dd-mm-yyyy hh:mm:ss).
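A minimal sketch of reading such a log in Python. It assumes one record per line, with the three fields separated by commas and the date written as `dd-mm-yyyy hh:mm:ss` (no comma inside it). `Reading`, `parse_reading` and `parse_log` are hypothetical names, and the two sample lines are made up for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class Reading(NamedTuple):
    temperature: float
    humidity: float
    timestamp: datetime


def parse_reading(line: str) -> Reading:
    """Parse 'temperature, humidity, dd-mm-yyyy hh:mm:ss' into a Reading."""
    # Split on the first two commas only; the date part is expected to contain none.
    temp_str, hum_str, date_str = (part.strip() for part in line.split(",", 2))
    return Reading(
        temperature=float(temp_str),
        humidity=float(hum_str),
        timestamp=datetime.strptime(date_str, "%d-%m-%Y %H:%M:%S"),
    )


def parse_log(lines):
    """Parse all non-empty lines and return the readings in time order."""
    readings = [parse_reading(line) for line in lines if line.strip()]
    return sorted(readings, key=lambda r: r.timestamp)


# Hypothetical sample lines, deliberately out of order.
sample = [
    "23.5, 61.0, 02-08-2016 17:15:00",
    "23.1, 62.5, 02-08-2016 17:00:00",
]
for r in parse_log(sample):
    print(r.timestamp.isoformat(), r.temperature, r.humidity)
```

Sorting by timestamp matters because most forecasting methods assume the series is in chronological order with evenly spaced samples.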

Can anyone point me to some useful short-term prediction algorithms that I can use to make predictions for these values over the following days? The predictions should be produced at 15-minute intervals into the future, just like the training set.
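Since forecasts are wanted on the same 15-minute grid as the training data, the target timestamps can be generated up front. A small sketch, assuming the timestamp of the last observed reading is known; `future_timestamps` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # same spacing as the training data


def future_timestamps(last_observed: datetime, days: float):
    """Return the timestamps every 15 minutes after last_observed, covering `days` days."""
    steps = int(timedelta(days=days) / STEP)  # e.g. 96 steps per day
    return [last_observed + STEP * (i + 1) for i in range(steps)]


horizon = future_timestamps(datetime(2016, 8, 2, 17, 0), days=2)
print(len(horizon), horizon[0], horizon[-1])
# 192 2016-08-02 17:15:00 2016-08-04 17:00:00
```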

Basically, what I want most is to compare those algorithms and see which one turns out to be more precise once the actual values come in.
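One common way to measure "more precise" once the real readings are in is to compute an error metric over the forecast horizon, such as the mean absolute error (MAE) or the root mean squared error (RMSE), and rank the candidate algorithms by it. A sketch in plain Python; the model names and numbers below are made up for illustration.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mae(actual, predicted):
    """Mean absolute error, in the same units as the data (e.g. degrees)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE does."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_models(actual, forecasts):
    """Return (name, MAE, RMSE) tuples, most precise (lowest MAE) first."""
    scores = [(name, mae(actual, pred), rmse(actual, pred)) for name, pred in forecasts.items()]
    return sorted(scores, key=lambda s: s[1])


# Hypothetical observed temperatures and two hypothetical models' forecasts.
actual = [22.0, 22.5, 23.0, 23.5]
forecasts = {
    "model_a": [22.1, 22.4, 23.2, 23.4],
    "model_b": [21.5, 22.0, 23.5, 24.5],
}
for name, m, r in rank_models(actual, forecasts):
    print(f"{name}: MAE={m:.3f}, RMSE={r:.3f}")
```

Temperature and humidity have different units, so it is usually clearer to score them separately rather than combining them into one number.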

From searching the internet, I have found algorithms like "support vector machines", "linear regression" and "random forests", but I am not sure they will help in my particular situation: from what I understood, they also make predictions, but at another level (for example, predicting whether an email is spam based on its characteristics).
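One common way to apply regression models of the kind listed above to a time series is to frame it as supervised learning: the previous few readings are the features and the next reading is the target. The sketch below, assuming a plain list of evenly spaced 15-minute readings, builds such lagged pairs and fits the simplest case, a one-lag linear regression, then forecasts several steps ahead by feeding each prediction back in. The function names are hypothetical, and a random forest or SVM could be trained on the same lagged pairs instead.

```python
def make_lagged(series, n_lags):
    """Turn a series into (features, target) pairs: the previous n_lags values predict the next one."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(list(series[i - n_lags:i]))
        y.append(series[i])
    return X, y


def fit_ar1(series):
    """Least-squares fit of x[t] = a + b * x[t-1] (linear regression on one lag)."""
    X, y = make_lagged(series, 1)
    if not X:
        raise ValueError("need at least two readings")
    xs = [row[0] for row in X]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(y) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0  # constant input: just predict the mean
    cov_xy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def forecast_ar1(series, steps):
    """Recursive multi-step forecast: each prediction becomes the next input."""
    a, b = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out


# Hypothetical temperature readings, one every 15 minutes.
temps = [21.0, 21.4, 21.9, 22.3, 22.6, 22.8, 22.9]
print(forecast_ar1(temps, 4))
```

Errors compound when predictions are fed back in, so accuracy typically drops the further ahead you forecast; this is worth keeping in mind when comparing algorithms over several days.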

Thank you for your kind answers!

Tags: documentation, algorithm, machine-learning, prediction, weather | 2016-08-02 17:08 | 1 answer

Answers (1)

  1. 2016-08-02 20:08

    You can use something along the lines of decision trees (I'd suggest the ID3 decision tree algorithm).

    In the decision tree algorithm, the training data (the most recent weather samples collected) is used to build the tree top-down: at each node, the algorithm splits on the attribute with the highest information gain, i.e. the attribute whose split most reduces the entropy (uncertainty) of the class labels. The most informative attributes therefore end up near the root of the tree, and less informative ones further down.

    The information gain of each candidate attribute is computed at every node, using only the training samples that reach that node, and the attribute with the highest gain is chosen for the split.
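    For reference, these are the standard ID3 definitions, written for a set of training samples $S$ with class labels $c$ (such as sunny, cloudy, rainy) and a candidate attribute $A$ whose values $v$ split $S$ into subsets $S_v$:

$$
H(S) = -\sum_{c} p_c \log_2 p_c, \qquad p_c = \frac{|\{s \in S : \mathrm{label}(s) = c\}|}{|S|}
$$

$$
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
$$

    A small worked example with three hypothetical samples labelled rainy, rainy, sunny: $H(S) = -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \approx 0.918$ bits. If humidity splits them into {humid: rainy, rainy} and {dry: sunny}, both subsets are pure, so $\mathrm{IG} \approx 0.918 - 0 = 0.918$. If temperature splits them into {medium: rainy, sunny} and {high: rainy}, the weighted remainder is $\tfrac{2}{3}\cdot 1 + \tfrac{1}{3}\cdot 0 \approx 0.667$, so $\mathrm{IG} \approx 0.252$. ID3 would therefore put humidity at the root.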

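    In your case, the internal nodes of the tree would test attributes of the sample data set, such as temperature, humidity, wind speed, etc. Because ID3 works with categorical attributes, continuous readings such as temperature would first need to be grouped into ranges (for example low, medium and high).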
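    ID3 splits on discrete attribute values, so continuous readings are usually binned first. A minimal sketch using Python's `bisect` module; the cut points (18 and 26 °C, 40 and 70 %RH) and the category names are hypothetical and would need to be chosen for the actual greenhouse.

```python
from bisect import bisect_right


def discretize(value, thresholds, names):
    """Map a continuous value to a category using ascending cut points.

    A value equal to a cut point goes into the higher bin.
    """
    if len(names) != len(thresholds) + 1:
        raise ValueError("need exactly one more name than thresholds")
    return names[bisect_right(thresholds, value)]


# Hypothetical cut points; tune them to the real data.
TEMP_BINS = ([18.0, 26.0], ["low", "medium", "high"])       # degrees C
HUMIDITY_BINS = ([40.0, 70.0], ["dry", "normal", "humid"])  # % relative humidity


def to_categorical(temperature, humidity):
    """Turn one raw reading into the categorical attributes a tree can split on."""
    return {
        "temperature": discretize(temperature, *TEMP_BINS),
        "humidity": discretize(humidity, *HUMIDITY_BINS),
    }


print(to_categorical(22.0, 75.0))
# {'temperature': 'medium', 'humidity': 'humid'}
```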

    The leaf nodes hold the prediction values. Suppose you want to classify the weather as sunny, cloudy, or rainy; the leaf nodes would then be sunny, cloudy and rainy. Given a test sample (a set of attribute values), the ID3 tree guides you from the root down to one of the leaf nodes, which is the prediction.
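    The walk from the root to a leaf can be sketched as follows, assuming a tree stored as nested dicts (internal node: `{"attribute": ..., "branches": {value: subtree}}`, leaf: a label string). Both this representation and the hand-made `example_tree` are hypothetical, for illustration only.

```python
def classify(tree, sample, default=None):
    """Follow the branch matching each tested attribute until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = sample.get(node["attribute"])
        if value not in node["branches"]:
            return default  # attribute value never seen during training
        node = node["branches"][value]
    return node


# Hypothetical hand-made tree: humidity at the root, temperature below it.
example_tree = {
    "attribute": "humidity",
    "branches": {
        "humid": "rainy",
        "normal": {
            "attribute": "temperature",
            "branches": {"low": "cloudy", "medium": "sunny", "high": "sunny"},
        },
        "dry": "sunny",
    },
}

print(classify(example_tree, {"humidity": "normal", "temperature": "low"}))  # cloudy
```

The `default` return covers a practical gap in plain ID3: a test sample may have an attribute value that no training sample had, so there is no branch to follow.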

    Since you mentioned that new samples arrive every 15 minutes, I'd suggest rebuilding the tree every 15 minutes, recalculating the information gain of each attribute on the updated data set.
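    A minimal sketch of ID3 with a full rebuild on every new sample, assuming each sample is a dict of categorical attributes (already binned) and comes with a class label. It is a basic ID3 with no pruning; `build_id3` and `RetrainingID3` are hypothetical names, not the code the answer refers to.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_id3(rows, labels, attributes):
    """Recursively build a tree: leaves are labels, internal nodes are dicts."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: all samples share one label
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # nothing left to split on: majority label
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_id3(sub_rows, sub_labels, remaining)
    return node


class RetrainingID3:
    """Keeps every labelled sample and rebuilds the tree whenever one arrives."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        self.rows, self.labels = [], []
        self.tree = None

    def add_sample(self, row, label):
        # Intended to be called every 15 minutes with the newest labelled reading.
        self.rows.append(row)
        self.labels.append(label)
        self.tree = build_id3(self.rows, self.labels, self.attributes)

    def predict(self, row, default=None):
        node = self.tree
        while isinstance(node, dict):
            node = node["branches"].get(row.get(node["attribute"]))
        return default if node is None else node
```

Rebuilding from scratch is simple and, for a few thousand samples (10 days at 96 samples per day is under 1,000), cheap enough to run every 15 minutes.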

    You can read more about it here.

    I wrote code for this and posted it on GitHub; I will post it here if I find it.
