It was 1982. On a Friday, one of my Japanese professors, Professor T Omura, asked me to see him in his office after dinner. At that time I was doing my doctoral research with Dr Bindu Lohani at the Asian Institute of Technology (AIT) in Bangkok. My research was on Optimum Siting of Air Quality Monitors.
Professor Omura had just returned from Tokyo. He greeted me and, after some preliminaries, opened a case and took out a listing of a computer program. In those days we worked on mainframe computers and line printers. The listing was on “132 column” paper and the code was written in FORTRAN.
In his typical Japanese English or Jinglish, Professor Omura said, “Modak, this is the computer program I got from a Professor from Japan. It’s on Group Method of Data Handling (GMDH).”
“GMDH? Never heard about it before, Professor. What is this technique about?” I asked.
Professor Omura made an attempt to explain GMDH to me. Clearly he had difficulty in explaining what GMDH was about – and I only understood that GMDH was a “cybernetic” technique for building mathematical models based on data and for making accurate short term and long term predictions.
The technique used the concept of self-organization and followed a layered process of model building and variable sifting. The layering process was essentially “prediction of predictions” using a reference function (typically a Kolmogorov-Gabor polynomial) and would stop at an “optimal complexity” judged by an external criterion (like Mean Square Error). The data requirements for building and operating a GMDH model were minimal. That was something attractive.
GMDH performed much better than the contemporary statistical (data driven) as well as “causal” models.
Professor Omura then passed on to me two papers to read. He said that these two papers would help me understand the foundations of GMDH.
- A. G. Ivakhnenko. Heuristic Self-Organization in Problems of Engineering Cybernetics. Automatica 6: pp. 207–219, 1970.
- A. G. Ivakhnenko. Polynomial Theory of Complex Systems. IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-1, No. 4, Oct. 1971, pp. 364–378.
“A Russian author?” I exclaimed. I found even the titles of the papers rather scary!
Passing me a copy of the FORTRAN listing, Professor Omura said, “Yes, Modak. This listing is very confidential and not to be shared. I want you to understand the FORTRAN code and implement it on our mainframe computer. Once it starts running, we will play with the program for a few applications on forecasting river water quality. This code originates from Russia, the Kiev School.”
He then paused, gazed at me and said slowly and in all seriousness, “I am aware that the GMDH algorithm has nothing to do with your doctoral research, but it’s an opportunity for you to learn something new – and something very exciting. I would like you to give it a serious try.” I asked Professor Omura to let me read and digest the two papers over the weekend and then meet him on Monday evening. He agreed.
I started reading the two papers by A G Ivakhnenko and realized I would probably need another full week to understand them. Both papers were simply “loaded” with ideas, postulations, evidence and theory that was outstanding. The text was in Ringlish, not easy to comprehend, and there were several hidden messages for the reader. As I went through the papers several times and consulted some “surround” literature, I realized that I was into something very advanced, a technique on the fringe of artificial intelligence and robotics. Actually, I was getting introduced to a technique of modelling most suitable for complex, fuzzy (unsure and uncertain) dynamic situations – most relevant to the “environmental systems”.
A G Ivakhnenko (wish I could meet him in person)
I met Professor Omura on Monday evening and told him that I was game to work on GMDH with him. “I will take a bit more time to get a grip, though, but I will start working on the FORTRAN code immediately.” I decided to use WATFIV, or WATerloo FORTRAN IV, developed at the University of Waterloo, Canada.
Professor Omura was very pleased about my interest, commitment and enthusiasm. He ended our conversation with a very frank statement: “Modak, I know nothing about how GMDH works so I won’t be able to help you. I am only interested in its application, so good luck. But let us keep meeting every Friday night.”
I spoke to my PhD guide, Dr Bindu Lohani, about my “encounter” with GMDH. He readily supported me and said, “Do not let your work on GMDH affect your PhD research. But there could well be possibilities of application of GMDH to your research problem on optimum siting of air monitors.” Dr Lohani had worked extensively on stochastic modelling and optimization and loved advanced mathematical applications on environmental systems.
I started my work. I remember the night when I cracked the GMDH code in WATFIV and could reproduce the outputs of some of the case examples cited by A G Ivakhnenko. It was 3 AM and I was the lone student working at AIT’s Regional Computer Centre (RCC). This was a moment of great achievement for me.
I left RCC and walked to the “Pub” outside the campus (we called the pub Papa’s shop; Papa was a retired Thai Army soldier). I picked up a bottle of Singha beer and sat on a wooden bench all alone (Papa and his wife Mamma were sleeping and there was nobody else in the “pub”). I drank beer watching the traffic on the highway. There was only the sound of rolling wheels and the moving lights from the trail of trucks.
“I am now in the world of GMDH,” I said to myself.
With this progress, we decided that I would take on GMDH related work as “Special Studies” and drop some of the environmental courses that I was supposed to do. That was very kind of Dr Lohani. I ended up doing three Special Studies on GMDH over 8 months while pushing hard on my own doctoral research simultaneously.
Dr Lohani set up a committee to oversee my Special Studies. The Committee consisted of Professor H N Phien and Professor Kiyoshi Hoshi, both outstanding hydrologists and mathematicians. When I gave my seminar on GMDH in the first Special Study, describing the basic form of GMDH and its variants in Russia, Japan and in the US at MIT, both the Professors were very excited. I still recall the numerous “black board based discussion sessions” I used to have with them, floating new ideas and thinking about tweaking the GMDH algorithm specifically for complex environmental systems.
(Professor Kiyoshi Hoshi, Hokkaido University, passed away in 2006 due to cancer)
(Professor H N Phien, at the Center of the photograph – retired from AIT as Dean of School of Advanced Technologies)
(Dr B N Lohani, retired as Vice President of Asian Development Bank)
I made several applications of GMDH, e.g. on river flows, river water quality, air quality and eco-systems. Compared to other models and tools available at that time, GMDH performed far better in all these applications. Importantly, it revealed new vistas of pattern recognition and the power of data mining. Its ability to surprise the modeler in deciding the optimal complexity (in terms of both variables and structure) was something that fascinated me, and it still does today. That is GMDH’s feature of artificial intelligence.
(Illustration of Predictive Power of GMDH)
Today GMDH has become one of the top tools in the family of Artificial Neural Networks (ANN). It is applied across all domains, especially in business and financial modelling. Oddly, research and applications of GMDH in the field of environment have been rather scant. One of the reasons is that courses on Environmental Data Analytics are not offered at the universities, and that’s a pity.
When I returned from AIT and joined the Centre for Environmental Science and Engineering at IIT Bombay, I tried to find students who could take up work based on GMDH. But I simply couldn’t find anyone who would show interest or take it up as a challenge. After I left IIT and started my own consulting company – Environmental Management Centre LLP – I got busy with consulting work.
EMC runs an internship program that is today really well sought after – and it was only in 2007 (25 years later!) that I hit upon an intern from Delhi College of Engineering, Neeraj Kumar. Neeraj worked with me on GMDH and in a period of just one month, we could publish some interesting environmental applications at the Prague Conference on GMDH. We presented the idea and application of “Adaptive GMDH”. I was extremely delighted to see the talent in Neeraj. He hardly spoke to the Team at EMC – and was kind of a loner!
With the pressure of continuous monitoring of emission, effluent and ambient concentrations in India, a considerable amount of online data is being generated by the industries and the regulatory bodies. Unfortunately, this BIG data is hardly used the way it should be. The possibilities for effective use with tools such as GMDH are enormous.
I have now set up a Team at EMC on Environmental Modeling & Data Analytics and have been fortunate to have some bright youngsters working with me. We have developed “environmental dashboards” to handle real time data and will be plugging in some of the smart and most relevant GMDH algorithms. This will help in pattern recognition, in understanding source influences and, importantly, in making reliable short term as well as long term predictions for management & control. I am looking forward to adding more bright minds to my Team. So anyone interested, with the skills and appetite, do write to me.
I told my Professor Friend about GMDH. As usual he was a patient listener. He lit his cigar and pointed me to the book lying on his desk. The book was by J. Scott Armstrong, titled Long-Range Forecasting: From Crystal Ball to Computer, printed in 1985.
“What about it?” I asked. “I know this book, it’s a must read for all modelers.”
The Professor opened the book and showed me the cartoon below, from the section on Planning vs Forecasting. He said, “See, the kid selling shoe shine service knew exactly what to do when it would rain, i.e. he must now offer the waxing service. He did not need a rainfall forecasting model!”
(This cartoon is by the world famous Morrie Turner – Source: The Register and Tribune Syndicate – Wee Pals Comic Strip – I have reproduced the cartoon from Armstrong’s book)
The Professor took a deep puff and said, “Forecasting is often difficult, and maybe your GMDH algorithms predict outcomes close to real. To be practical, however, don’t you think that we should be ready with what to do for, say, scenario A and for scenario B and so on, instead of focusing only or too much on predictions? Many a time, this aspect of management is forgotten!”
I thought the Professor made a good point (as he always did).
I left his office borrowing Armstrong’s book for a second good read!
Message for Environmental Students
Don’t stick to conventional environmental domains all the time. Take up something beyond and it will certainly give you dividends!
So if you meet the equivalent of Professor T Omura, then don’t ever say no.
A bit more on GMDH (drawn from https://en.wikipedia.org/wiki/Group_method_of_data_handling)
Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets. GMDH is used today for data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition.
The most popular base function used in GMDH is the gradually complicated Kolmogorov-Gabor (K-G) polynomial:
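For readers who want to see it written out, the Kolmogorov-Gabor polynomial is commonly given in the form below. The notation is an assumption following the usual convention (inputs x_1 to x_n, output Y, coefficients a estimated from data); it is not taken from the FORTRAN listing described earlier.

```latex
Y(x_1, \dots, x_n) = a_0
  + \sum_{i=1}^{n} a_i x_i
  + \sum_{i=1}^{n} \sum_{j=i}^{n} a_{ij} x_i x_j
  + \sum_{i=1}^{n} \sum_{j=i}^{n} \sum_{k=j}^{n} a_{ijk} x_i x_j x_k
  + \cdots
```

In the common multilayer variant, this full polynomial is not fitted in one go. It is built up gradually, layer by layer, from low-order partial polynomials of two inputs at a time, which is the “prediction of predictions” idea mentioned earlier.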
The method was originated in 1968 by Prof. Alexey G. Ivakhnenko at the Institute of Cybernetics in Kiev (then in the Ukrainian SSR). Thanks to the author’s policy of open code sharing, the method quickly settled in a large number of scientific laboratories worldwide. At that time code sharing was quite a physical action, since the Internet is at least 5 years younger than GMDH. Despite this, the first investigation of GMDH outside the Soviet Union was made as early as 1972 by R. Shankar. Later, different GMDH variants were published by Japanese and Polish scientists.
The external criterion is one of the key features of GMDH. The criterion describes requirements for the model, for example minimization of least squares. It is always calculated on a separate part of the data sample that has not been used for estimation of coefficients.
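To make the idea of an external criterion concrete, here is a minimal sketch of a single GMDH-style selection step in plain Python. It is only an illustration under stated assumptions, not the Kiev code or any library’s implementation: the function names (`solve_linear`, `features`, `fit`, `mse`, `select_pairs`), the synthetic data and the 40/20 split are all hypothetical, and the coefficients are found with ordinary normal equations and Gaussian elimination. Each pair of inputs gets a second-order partial polynomial fitted on the training part, and the pairs are then ranked by mean square error on the held-out part.

```python
import itertools
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def features(u, v):
    """Terms of the two-input, second-order partial polynomial."""
    return [1.0, u, v, u * v, u * u, v * v]

def fit(rows, y, i, j):
    """Least-squares coefficients for inputs i and j (normal equations)."""
    F = [features(r[i], r[j]) for r in rows]
    k = len(F[0])
    A = [[sum(f[a] * f[b] for f in F) for b in range(k)] for a in range(k)]
    rhs = [sum(f[a] * t for f, t in zip(F, y)) for a in range(k)]
    return solve_linear(A, rhs)

def mse(rows, y, i, j, coef):
    """Mean square error of the fitted partial polynomial on given data."""
    preds = [sum(c * f for c, f in zip(coef, features(r[i], r[j]))) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def select_pairs(train_X, train_y, valid_X, valid_y):
    """Rank every input pair by validation MSE (the external criterion)."""
    ranked = []
    for i, j in itertools.combinations(range(len(train_X[0])), 2):
        coef = fit(train_X, train_y, i, j)       # coefficients: training part only
        err = mse(valid_X, valid_y, i, j, coef)  # criterion: held-out part only
        ranked.append((err, (i, j)))
    return sorted(ranked)

# Synthetic data (an assumption for illustration): the output depends only on
# inputs 0 and 2; inputs 1 and 3 are irrelevant.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [1.0 + 2.0 * r[0] - 3.0 * r[0] * r[2] + rng.gauss(0, 0.05) for r in X]

ranking = select_pairs(X[:40], y[:40], X[40:], y[40:])
best_err, best_pair = ranking[0]
print(best_pair, round(best_err, 4))
```

In a full GMDH run, the outputs of the best few pairs would become the inputs to the next layer, and the layering would stop once the best validation error no longer improves. That stopping point is the “optimal complexity”.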
GMDH approach can be useful because:
- Optimal complexity of the model structure is found, adequate to the level of noise in the data sample. For real problems with noisy or short data, simplified optimal models are more accurate.
- The number of layers and neurons in hidden layers, model structure and other optimal neural network parameters are determined automatically.
- It automatically finds interpretable relationships in data and selects effective input variables accordingly.
- It guarantees that the most accurate or unbiased models will be found – the method does not miss the best solution during sorting of all variants (in the given class of functions).
I would recommend the recent book “GMDH-Methodology and Implementation in C (With CD-ROM)”, 2014, by Godfrey Onwubolu (see http://www.amazon.com/GMDH-Methodology-Implementation-C-With-CD-ROM/dp/1848166109).