Undergraduate Projects

Predicting Nutrient Percentages in NSW Soil Health Data

We were tasked with applying machine learning techniques to a real-world dataset: the NSW Government soil-health dataset, which included scales of key soil features, a chemical analysis and a texture analysis. We developed models to predict the percentage of nutrients in soil. Most of our models were classifiers, which we evaluated using accuracy, precision and recall.
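As a reminder of how the three evaluation metrics relate to each other, here is a minimal sketch that computes them from binary labels. The function name `classification_metrics` and the label lists are made up for illustration and are not taken from the project or the soil dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Accuracy counts all correct predictions, precision asks how many predicted positives were right, and recall asks how many actual positives were found, which is why the three can diverge on imbalanced classes.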

Ridge and lasso regression were fitted first, and showed that nutrient percentages could not be reliably predicted as continuous values. The regression models were then converted to binary classification models, which achieved moderately high performance (84% accuracy).
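The summary does not say how the regression models were converted to classifiers. One possible approach, sketched here with scikit-learn, is to threshold both the true and the predicted values at the training median. The synthetic data, the median threshold and the `alpha` values are all assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the soil features and a continuous nutrient target.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed rule: "high" means above the median of the training target.
threshold = np.median(y_train)
y_test_binary = (y_test > threshold).astype(int)

binary_accuracy = {}
for name, model in {"ridge": Ridge(alpha=1.0), "lasso": Lasso(alpha=0.1)}.items():
    model.fit(X_train, y_train)
    # Threshold the continuous prediction to obtain a class label.
    predicted_binary = (model.predict(X_test) > threshold).astype(int)
    binary_accuracy[name] = accuracy_score(y_test_binary, predicted_binary)

print(binary_accuracy)
```

Thresholding discards the size of each error and keeps only which side of the cut-off a prediction falls on, which may explain why a model with a poor continuous fit can still classify reasonably well.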

A neural network model was developed with 50 hidden layers, a regularisation strength of ~0.075 and a learning rate of ~0.003. This model performed best of all those considered, with the most balanced precision and recall and the highest accuracy (87%).

A K-Nearest Neighbours model was fitted and optimised using principal component analysis (PCA). It performed slightly worse (80% accuracy) than the ridge and lasso models.
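A KNN model combined with PCA might be set up along the following lines in scikit-learn. This is a sketch only: the synthetic data, the scaling step, the grid of component counts and the neighbour counts are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the soil features and a binary nutrient label.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale first: both PCA and KNN distances are sensitive to feature units.
knn_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search grid over retained components and neighbour count.
param_grid = {"pca__n_components": [2, 4, 6], "knn__n_neighbors": [3, 5, 9]}
search = GridSearchCV(knn_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

knn_test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(knn_test_accuracy, 3))
```

Keeping PCA inside the pipeline means the components are refitted on each cross-validation fold, so the held-out fold never influences the projection.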

Finally, a support vector machine model was built with a soft margin of 1. A linear kernel produced the best accuracy, matching that of the neural network (87%).
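A kernel comparison of this kind could be run as in the sketch below. It assumes that "a soft margin of 1" corresponds to scikit-learn's `C=1.0`, and the synthetic data, the scaling step and the list of candidate kernels are likewise assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the soil features and a binary nutrient label.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)

kernel_scores = {}
for kernel in ("linear", "rbf", "poly"):
    # C=1.0 is the assumed reading of the soft-margin setting.
    model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    kernel_scores[kernel] = scores.mean()

best_kernel = max(kernel_scores, key=kernel_scores.get)
print(kernel_scores, best_kernel)
```

On the synthetic data used here the winning kernel may differ from the project's result; the point is only the shape of the comparison.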

Developed By:
Rebecca Adams
Mitchell Carder
Alivavine Rawali
Krish Singh