Using Data Mining Algorithms in Separation of Sediment Sources in Nodeh Watershed, Gonabad

10.22052/deej.2018.7.19.49

Abstract

Introduction: Reducing sediment supply requires soil conservation and sediment control programs implemented as part of watershed management plans. Such programs depend on identifying the relative importance of sediment sources, quantifying their contributions (source ascription), and locating critical areas within the watershed. Sediment source ascription involves two main steps. In the first, several diagnostic tracers are selected that clearly and significantly separate the potential sediment sources. In the second, the values of the selected tracers in the potential sources are compared with the corresponding values measured in sediment samples taken at the watershed outlet. In addition, because of the large volume and complexity of the data now available in the geosciences and environmental sciences, more robust and efficient methods are needed for analysis and modelling. Recent fundamental progress in data mining algorithms can therefore contribute considerably to the emerging field of environmental data science.
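The two steps described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used in this study: the source names, the tracer values, the between-to-within variance ratio used as a discrimination score, the threshold of 10, and the nearest-mean comparison in step 2 are all hypothetical choices made for the example.

```python
from statistics import mean, pvariance

# Hypothetical tracer concentrations per potential source (illustrative values only).
sources = {
    "unit_A": {"Ca": [120.0, 118.0, 125.0], "Fe": [40.0, 42.0, 41.0]},
    "unit_B": {"Ca": [60.0, 63.0, 58.0], "Fe": [41.0, 39.0, 42.0]},
}

def discrimination_score(tracer):
    """Variance of the source means divided by the mean within-source variance."""
    groups = [samples[tracer] for samples in sources.values()]
    between = pvariance([mean(g) for g in groups])
    within = mean([pvariance(g) for g in groups])
    return between / within

# Step 1: keep tracers that separate the sources clearly (threshold is an assumption).
tracers = ["Ca", "Fe"]
selected = [t for t in tracers if discrimination_score(t) > 10.0]

# Step 2: compare the outlet sediment sample with each source on the selected tracers.
outlet_sample = {"Ca": 115.0, "Fe": 40.5}  # hypothetical outlet measurement

def distance_to_source(name):
    """Euclidean distance between the outlet sample and a source's mean tracer values."""
    return sum((outlet_sample[t] - mean(sources[name][t])) ** 2 for t in selected) ** 0.5

closest_source = min(sources, key=distance_to_source)
print(selected, closest_source)
```

In this toy data set only Ca differs strongly between the two sources, so Fe is dropped in step 1 and the outlet sample is matched on Ca alone.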
Methodology: In this research, data mining algorithms were used to separate sediment sources in the Nodeh watershed of Gonabad, located in Razavi-Khorasan province, using geochemical, granulometric and lithological variables. The geochemical variables comprised 21 elements (Mg, Sr, Mn, Ba, Zn, Y, V, Ti, Pb, P, Na, Li, K, Cu, Cr, Co, Ce, B, Ca, Al and Fe). The granulometric variables comprised D90, D50 and D10, the percentages of sand, silt and clay, skewness, kurtosis, and the fractions finer than 1, 2 and 4 millimeters and finer than 500, 250, 125 and 63 microns. The lithological variables comprised quartz, tuff, laterite, dacite, andesite, dolomite, calcite, andesitic tuff, lithic andesite and salt. A set of 11 classification algorithms (decision tree, random forest, regression methods, discriminant analysis, local linear model tree, nearest neighbor analysis, support vector machine, logistic regression, artificial neural network, pattern recognition and group method of data handling) was programmed in MATLAB, and the results were compared on the basis of the coefficient of determination and mean squared error.
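The two comparison metrics named above can be computed as in the following Python sketch (the study itself used MATLAB, so this is not the authors' code). It assumes that each sediment source is coded as an integer label and that the coefficient of determination is defined as 1 - SS_res / SS_tot; the abstract states neither, and the labels and the two classifiers' predictions are hypothetical.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted source labels."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical source labels (1, 2, 3) and predictions from two made-up classifiers.
observed = [1, 1, 2, 2, 3, 3]
predictions = {
    "classifier_a": [1, 1, 2, 2, 3, 3],  # every sample assigned to the correct source
    "classifier_b": [1, 2, 2, 2, 3, 1],  # two samples assigned to the wrong source
}

# Score each classifier as (R2, MSE) and pick the one with the highest R2.
scores = {name: (r_squared(observed, p), mse(observed, p))
          for name, p in predictions.items()}
best = max(scores, key=lambda name: scores[name][0])
print(scores, best)
```

A classifier that assigns every sample to its true source reaches R2 = 1 and MSE = 0, which is the pattern reported for the best algorithms in the training stage.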
Results and Discussion: Geochemical element concentrations in 7 geological units showed that Ca, Fe, Mg and Al had the highest concentrations in the soil samples and B and Co the lowest. Overall evaluation of the classification algorithms in the training stage showed that discriminant analysis, random forest, k-nearest neighbor and support vector machines with linear, polynomial, multiple and RBF kernels were the most accurate in separating sediment sources, reaching the maximum coefficient of determination (R2 = 1) and the minimum error (RMSE = 0), whereas the regression trees method performed worst. In the testing stage, the support vector machine with the RBF kernel was the most accurate algorithm and classification trees, with the maximum error rate, were the least accurate. Among the variable groups, the geochemical variables gave the highest accuracy in sediment source separation and the granulometric variables the lowest. With the geochemical variables, the support vector machine variants, nearest neighbor analysis, discriminant analysis and random forest had the highest coefficients of determination and the lowest errors in both the training and testing stages. With the lithological variables, random forest was the most accurate classifier in both stages, followed by discriminant analysis and support vector machines. Finally, with the granulometric variables, support vector machines were the most accurate in both stages, followed by random forest and nearest neighbor analysis.
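A perfect training score does not by itself show that a classifier generalizes, which is why the testing stage is reported separately. The Python sketch below illustrates the point with a nearest neighbor classifier. It assumes k = 1 (the abstract does not state the value of k used), and the two-tracer samples and their source labels are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, point):
    """Return the label of the training sample closest to `point` (k = 1)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, point)) for x in train_x]
    return train_y[distances.index(min(distances))]

def error_rate(train_x, train_y, xs, ys):
    """Fraction of samples in (xs, ys) that the classifier assigns to the wrong source."""
    wrong = sum(nearest_neighbor_predict(train_x, train_y, x) != y
                for x, y in zip(xs, ys))
    return wrong / len(ys)

# Hypothetical two-tracer samples labeled by source (1 or 2).
train_x = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.1), (2.9, 3.3), (2.0, 2.1)]
train_y = [1, 1, 2, 2, 1]
test_x = [(1.1, 1.0), (3.1, 3.0), (2.2, 2.2)]
test_y = [1, 2, 2]

# On the training set each sample is its own nearest neighbor, so the error is zero.
train_error = error_rate(train_x, train_y, train_x, train_y)
# On unseen samples the same classifier can still misassign a source.
test_error = error_rate(train_x, train_y, test_x, test_y)
print(train_error, test_error)
```

With distinct training samples, a 1-nearest-neighbor classifier always reproduces the training labels exactly, so only the testing error says anything about how well it separates sources in new samples.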
Conclusion: Overall, given the accuracy and performance of data mining classification algorithms, their application in the natural sciences is recommended, especially where large amounts of data are involved. These algorithms find patterns in large data sets and help classify new information. Support vector machines in particular, which are supervised classifiers, have produced successful results in the natural sciences. In watershed management, sediment source ascriptions are difficult to obtain with monitoring techniques because of the time and cost involved, and data mining procedures have emerged as a potentially valuable alternative. The application and evaluation of these methods are therefore suggested for further studies and for other natural sciences data.
