[More of a technical post, but I hope that it will interest our water quality professionals]
Today we are concerned about water pollution on a scale never seen before. All of us want to know the status of our water quality and whether the measures taken to protect or improve it have been effective. Unfortunately, the picture so far has been rather dismal. But are we collecting, analyzing and reporting data correctly?
We have been monitoring India’s water quality in estuaries, coastal areas, rivers, lakes and ground water wells for years. At these monitoring stations, we have been analyzing a large number of water quality parameters, ranging from simple measurements of pH, temperature and dissolved solids to dissolved oxygen and biochemical oxygen demand, and in many instances checking for the presence of toxic substances such as pesticide residues. Biomonitoring surveys are also carried out on a few stretches, and their results can be compared with those based on physico-chemical parameters.
The number of water quality monitoring stations at the national level has progressively increased. A few years ago, we started deploying automated water quality monitoring instruments that can report as many as 16 parameters at a high frequency, such as every 15 minutes. A considerable body of data on the status of water quality has been built up in this process. High frequency water quality monitoring stations provide better insight into the dynamic behavior of water quality than the limited and sometimes misleading information arising from manual sampling, which is generally carried out only once a month and in many instances on a grab basis. But remember that automated stations, if not operated correctly, can give voluminous but “garbage” data.
Choosing the location of a water quality monitoring station is a very important step. Stations are to be selected based on their purpose, e.g. whether the station to be sited is to serve as a baseline, to detect trends, or to detect violations of standards, especially in the mixing zones where wastewaters are discharged into the river. Stations in the last category are called “impact” monitoring stations. We cannot, for instance, use data from impact stations to infer long term water quality trends.
To answer the question of whether the water quality status is improving or not, the water quality data needs to be processed with rigor. Ideally these computations should include detection of trends on a quantitative basis and assessment of the extent of violations.
Computation of water quality trends is best done using tests such as the Mann-Kendall (MK) statistic. MK has been applied extensively by regulators across the world to detect water quality trends. It is not presently used by India’s Pollution Control Boards (PCBs). The MK statistic, which is non-parametric (distribution free), provides the direction and significance of a trend, and is usually reported together with an estimate of its magnitude. On applying the MK statistic to, say, 5 years of monthly BOD data (i.e. 60 values), one can conclude whether the water quality trend is positive (showing deterioration) and significant (say at the 95% confidence level). When the results are shown on a map, we can spot stations showing significant improvement or deterioration for a parameter and investigate the reasons why. More sophisticated applications of the MK test are also possible, where we assess the trend of a “system” of parameters such as DO and BOD simultaneously. Where the trends are found statistically insignificant, MK can be used to compute revised sampling frequencies. This feature is one of the additional major benefits of quantitative detection of trends.
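To make the computation concrete, here is a minimal sketch of the MK test in Python using only the standard library. The function name `mann_kendall`, the result fields and the choice of Sen's slope as the magnitude estimate are my illustrative assumptions, not a prescribed method. The sketch corrects the variance for tied values but does not handle seasonality or serial correlation, which real monthly data may well show.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median
from typing import NamedTuple, Sequence


class MKResult(NamedTuple):
    s: int              # MK statistic S
    z: float            # standardized test statistic
    p_value: float      # two-sided p-value
    sen_slope: float    # median pairwise slope, in units per time step
    trend: str          # sign of S: "increasing", "decreasing" or "no trend"
    significant: bool   # True when p_value < alpha


def mann_kendall(values: Sequence[float], alpha: float = 0.05) -> MKResult:
    """Mann-Kendall test on an equally spaced series (hypothetical helper)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    s = 0
    slopes = []
    for i, j in combinations(range(n), 2):  # every pair with i < j
        diff = values[j] - values[i]
        s += (diff > 0) - (diff < 0)        # +1, -1 or 0
        slopes.append(diff / (j - i))

    # Variance of S, reduced for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0

    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    trend = "increasing" if s > 0 else "decreasing" if s < 0 else "no trend"
    return MKResult(s, z, p_value, median(slopes), trend, p_value < alpha)
```

For 60 monthly BOD values held in a list `bod`, `mann_kendall(bod)` returns the direction in `trend` and whether it is significant at the 95% confidence level in `significant`. Note that `trend` only reports the sign of S, so it should be read together with `significant`.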
See below a typical “arrow-head” map of water quality trends for a river. S denotes a significant trend and NS a trend that is not significant.
We have to be careful not to process, for trend detection, data from stations lying in the mixing zones of dominant wastewater discharges (a mixing zone typically extends 50 to 100 times the width of the river at the point of wastewater discharge). It is also important to assess the trends in flow, measured at the location of the water quality monitoring station, to understand the influence of flow on the concentrations of water quality parameters. Carrying out seasonal MK tests and/or removing the effect of flow from the concentrations and calculating the trend of the “residuals” can provide better insight into the question “what is dominating the trend?”. These deductions help in coming up with more rounded water quality improvement plans that maintain “environmental flows” in addition to treating wastewaters. Unfortunately, PCBs do not measure flows, and the flow measurement locations of CWC do not coincide with the water quality monitoring locations of CPCB.
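One simple way to separate the influence of flow is sketched below. It assumes that log concentration varies roughly linearly with log flow, which is only one of several possible flow-adjustment models and must be checked against the data. The function name `flow_adjusted_residuals` is hypothetical. A trend test can then be run on the residuals instead of the raw concentrations.

```python
import math
from typing import List, Sequence


def flow_adjusted_residuals(conc: Sequence[float],
                            flow: Sequence[float]) -> List[float]:
    """Residuals of a least-squares fit of log(conc) on log(flow).

    Assumes paired, strictly positive concentration and flow readings
    taken at the same place and time (an assumption of this sketch).
    """
    if len(conc) != len(flow) or len(conc) < 3:
        raise ValueError("need at least 3 paired observations")

    x = [math.log(q) for q in flow]
    y = [math.log(c) for c in conc]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("flow is constant; nothing to adjust for")

    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    # What is left after the flow effect is removed.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

If the residuals show no significant trend while the raw concentrations do, the trend is probably dominated by changes in flow rather than by changes in pollution load.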
To understand the extent of violations, we should be computing the following:
- Percentage of the times the prescribed water quality standard is violated
- The magnitude or extent of violations (calculated as the sum of the squares of the deviations from the standard; squaring captures the severity)
- Percentage of contiguous violations over a specified period. (Such a computation is possible for high frequency water quality monitoring stations. If we specify our interest as 4 hours for dissolved oxygen, then the algorithm finds the instances where dissolved oxygen has stayed below 6 mg/l contiguously for 4 hours or more, and reports the total length of such “data trains” as a percentage of the record. This percentage provides an understanding of the extent of undesirable exposure.)
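The three computations above can be sketched as follows. The function and field names are hypothetical, and I have assumed that the squared deviations are summed over the violating readings only. The minimum run is given in number of readings, so 4 hours at a 15 minute interval corresponds to `min_run=16`.

```python
from typing import NamedTuple, Sequence


class ViolationMetrics(NamedTuple):
    pct_violations: float   # % of readings that violate the standard
    magnitude: float        # sum of squared deviations of violating readings
    pct_contiguous: float   # % of readings lying in runs of >= min_run


def violation_metrics(values: Sequence[float], standard: float,
                      min_run: int,
                      below_is_violation: bool = True) -> ViolationMetrics:
    """Violation statistics for one parameter at one station (a sketch).

    below_is_violation=True suits dissolved oxygen (low is bad);
    set it to False for parameters such as BOD (high is bad).
    """
    n = len(values)
    if n == 0:
        raise ValueError("no readings")

    if below_is_violation:
        flags = [v < standard for v in values]
    else:
        flags = [v > standard for v in values]

    pct_violations = 100.0 * sum(flags) / n
    magnitude = sum((v - standard) ** 2
                    for v, bad in zip(values, flags) if bad)

    # Total length of violation runs that last at least min_run readings.
    in_long_runs = 0
    run = 0
    for bad in flags + [False]:      # the sentinel closes a trailing run
        if bad:
            run += 1
        else:
            if run >= min_run:
                in_long_runs += run
            run = 0

    return ViolationMetrics(pct_violations, magnitude,
                            100.0 * in_long_runs / n)
```

For dissolved oxygen logged every 15 minutes, `violation_metrics(do_values, standard=6.0, min_run=16)` reports the share of the record spent in dips of 4 hours or longer.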
The figure below shows a conceptual representation of the WQVI.
All the above attributes, when pooled together, indicate the criticality of violation or non-compliance at a water quality monitoring station. We can call this aggregation the Water Quality Violation Index (WQVI) for a chosen parameter, e.g. DO or BOD. WQVI can be calculated for more than one parameter as well. We can also use a surrogate such as a Water Quality Index (WQI). WQVI can be reported for all the hundreds of our water quality monitoring stations to prioritize actions. Over the years, the number of water quality monitoring stations with high WQVI should reduce, showing the progress made on enforcement and compliance. The concept of WQVI is my own innovation.
Presenting arrow-head maps of trends and changes in WQVI over the years provides a robust way to communicate the progress made on water quality management. Importantly, such analysis and reporting assists in diagnosis and in taking appropriate actions.
In all of the above, we have to ensure that the water quality data we collect is of acceptable quality. This is possible only when we site monitoring stations correctly, strictly adhere to the water quality monitoring protocol (that we already have) and have trained teams for sampling and analyses. The laboratories should be well equipped and should ideally hold NABL accreditation under ISO/IEC 17025. A lot needs to be done in these areas.
And there are additional challenges to address, such as the role of non-point pollution discharges in influencing water quality trends and violations. Non-point pollution discharges typically include agricultural return waters, storm water runoff, clusters of wastewater drains etc. that are difficult to measure and require estimation. Sadly, little work has been done on this subject in India.
For high frequency automated water quality monitors, we need to develop artificial intelligence (AI) based machine learning algorithms that can detect anomalies and outliers in the data and either reject them or assign them “lower weights” during processing. Developing short term forecasting routines (e.g. using Artificial Neural Networks) will also be useful, especially for acting in advance during accidental spills of toxic substances upstream. Water intake works downstream could be issued warnings accordingly.
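Before any machine learning model is built, a plain statistical baseline is worth having for comparison. The sketch below is not an AI method. It is a rolling median and median absolute deviation (MAD) check, often called a Hampel filter, and the function name, window size and threshold are illustrative assumptions that would need tuning for each parameter and station.

```python
from statistics import median
from typing import List, Sequence


def flag_outliers(values: Sequence[float], half_window: int = 3,
                  threshold: float = 3.0) -> List[bool]:
    """Flag readings far from the median of their neighbours (a sketch).

    A reading is flagged when it lies more than `threshold` robust
    standard deviations (1.4826 * MAD) from the median of the window
    centred on it. The window is shortened at the ends of the series.
    """
    n = len(values)
    flags = []
    for i, v in enumerate(values):
        window = list(values[max(0, i - half_window):
                             min(n, i + half_window + 1)])
        med = median(window)
        mad = median([abs(w - med) for w in window])
        scale = 1.4826 * mad   # MAD scaled to match a normal std. deviation
        # If the window is perfectly flat (scale == 0), any reading that
        # differs from the median at all is flagged.
        flags.append(abs(v - med) > threshold * scale)
    return flags
```

Flagged readings can then be rejected or given lower weights in the trend and violation computations, and the same flags can serve as a benchmark against which a learned anomaly detector is judged.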
In 1985, I wrote a manual for the Central Pollution Control Board (CPCB) on Analyses and Interpretation of Water Quality Data. Then came a phase between 1986 and 1990, when I developed one dimensional and two dimensional water quality models (the STREAM series) for application on the Ganga for decision making. We used the water quality monitoring data available at that time and information on flows and wastewater discharges, and included estimates of non-point wastewater loads. In 2014, I analyzed 7 years of water quality data on the river Godavari in Maharashtra, with interesting conclusions for action. This work remained an isolated activity at MPCB. Currently, I am advising CPCB on processing the water quality data collected on the river Ganga using several of the tools I have cited in this post. This task simply excites me. My team and I are developing a Tableau based application for CPCB and will train the CPCB team.
Many readers of this blog from academia will realize that there are immense opportunities to carry out research on water quality data analytics. We need masters and doctoral students to take up such applied problems as dissertations to add rigor to water quality inferencing. Needless to say, such opportunities also exist for managing air quality and noise data. I will be most happy to help.
In 1990, I conducted a 5 day training program in New Delhi for water pollution engineers and statistical officers of PCBs on the subject of water quality data analytics. I wish that I get an opportunity to conduct such a program once again and re-write the little manual I wrote in 1984. Once the power of these tools is shown, especially to the younger and newly inducted team members, I am sure that magic will happen and the data will dance, presenting an insightful show to the decision makers.
If you like this post, then follow me or circulate it among your colleagues.
Cover image sourced from https://www.smartdatacollective.com/top-7-data-analytics-tools/