
Exploring DATA Sciences with Prof. Dr. Murtaza Haider


Data science is considered one of the most lucrative and challenging career fields of the 21st century. This is an era in which data itself is viewed as the modern world’s most precious resource. From healthcare to law, and from industry to transportation, data science experts and professionals play a vital role in every sphere of life.

At Scientia, we had the privilege of discussing data science with Prof. Murtaza Haider for our special edition on data science. Murtaza Haider is a professor of data and real estate economics at Ryerson University, Toronto, and holds an adjunct professorship in the Faculty of Engineering at McGill University.

Murtaza Haider. Photographed by Peter J. Thompson, 2019.

Additionally, he serves as Research Director at the Urban Analytics Institute and leads Regionomics Inc., a firm specializing in the economics of cities and regions. Dr. Haider has diverse research interests and is a thought leader in business analytics, data science, housing market dynamics, transport and infrastructure planning, and human development in Canada and South Asia. He relies on data-driven analytics to support his arguments, designs, and writing.

Dr. Haider is the author of two books: “Real Estate Markets: An Introduction” (2020) and “Getting Started with Data Science: Making Sense of Data with Analytics” (2016).

Our conversation with Dr. Haider focused on data science and analytics and their relevance for Pakistan. 

Fouz: What made you decide to become a data scientist? How did you overcome career obstacles?

Dr. Haider: Well, it was only when I started pursuing a Master’s degree at the University of Toronto that I started working with big datasets, specifically housing sales data. My Master’s thesis focused on developing hedonic price forecasting models, and for that we had data sets covering nearly half a million properties. It was the late nineties, and working with data sets of that size was relatively uncommon.

Back then, I didn’t even know that I was working in big data or data science. My fascination with data continued after that. When I joined McGill University as an assistant professor, I worked on building a traffic forecasting model. My lab created a large model covering almost every street in Montreal, around 130,000 streets, each with two-way traffic, and we were able to forecast traffic, congestion, and emissions for every street in the city.

Since then, I have worked on numerous projects involving big and small data sets. I didn’t face any obstacles in my career. I have always been an inquisitive person, so I am driven by questions. If I have a question that I cannot readily answer, I can’t sleep until I find a clue or find some datasets and start exploring for answers. Personally, it never occurred to me that I was trying to become a “data scientist”; all I had was a curious mind and lots of unanswered questions, and I kept going from one problem to the next. I didn’t stop.

Fouz: How do you relate Data Science to Pakistan?


Dr. Haider: Data and facts matter for Pakistan, even more so now than before. “Facts are sacred.” Why? Because when we argue with data, consensus emerges organically. For example, on Pakistani talk shows, we see people hurling allegations and accusing each other of corruption worth billions of dollars. Without evidence or solid proof, how can someone claim that their opponents have looted billions of dollars? When data and facts are missing, how can one prove such an allegation? Because we do not have a data culture in Pakistan, it is easy to accuse someone falsely or without evidence.

People should realize that Pakistan’s $278 billion economy is small relative to advanced economies, and that stealing billions of dollars from such a small economy would be quite difficult, even if the allegations of theft and corruption were true. Furthermore, one cannot steal billions of dollars without leaving several paper trails. This is what data science offers in this context: understanding the size and scope of an economy with data helps one gauge the scale of rent-seeking that may or may not have happened in a corruption-infested economic system.
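As a rough illustration of this scale argument, the short Python sketch below expresses a claimed sum as a percentage of the $278 billion figure Dr. Haider cites. The claimed amounts are hypothetical placeholders chosen for illustration, not figures from any actual allegation, and GDP is treated as a single annual total for simplicity. It is a scale check only, not an estimate of corruption.

```python
GDP_USD_BILLION = 278.0  # size of Pakistan's economy, as cited in the interview


def share_of_gdp(amount_usd_billion: float, gdp_usd_billion: float = GDP_USD_BILLION) -> float:
    """Return an amount as a percentage of annual GDP."""
    if gdp_usd_billion <= 0:
        raise ValueError("GDP must be positive")
    return 100.0 * amount_usd_billion / gdp_usd_billion


# Hypothetical claimed amounts (USD billion), for illustration only.
for claimed in (1, 5, 10):
    pct = share_of_gdp(claimed)
    print(f"USD {claimed} billion is {pct:.2f}% of a USD {GDP_USD_BILLION:.0f} billion economy")
```

For example, a USD 1 billion claim works out to about 0.36% of GDP, and a USD 10 billion claim to roughly 3.6%, which is the kind of context that lets a viewer judge whether an on-air accusation is even plausible.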

In my data science courses, I teach not only numbers and methods but also critical thinking: we must establish the facts and understand the context. We must keep our sights on the available data and evidence before making a big claim.

So, data science is all about finding evidence. We can structure our arguments and deliberations with data and facts. If people are required to back their claims with data and facts, lots of conflicts we see in society can be resolved. 

Consider that Pakistani courts are burdened with lawsuits related to land and property. A reliable and comprehensive database of land and property registration could help ease the judicial gridlock in Pakistan.

Fouz: Your profile shows interests in diverse fields like data science, urban planning, transportation, real estate, and human development. What do you like the most about your job and interests?

Dr. Haider: My area of research is the economics of cities, which includes transportation, housing, and labour markets. Being a professor, I have also worked on the management of higher education. I pursue all my research interests with data. I wouldn’t say that every problem can be solved with data, but for many questions, data can help us find an answer. In summary, I research and write about matters that concern the economics and wellbeing of cities.

Fouz: What does your typical day as a data scientist look like in the new normal after the coronavirus pandemic? How do you manage working from home?

Dr. Haider: The pandemic has made almost everyone aware of data and its usefulness. When people talk about COVID-19 positivity, they are speaking about data and metrics. When they refer to flattening the curve, they are speaking in data terms about limiting the spread of the disease. People who have never worked with data have now learned about the importance of data because of the pandemic. 

One of the positive outcomes of the pandemic is that some aspects of planning have become data-centric. For instance, city governments and health professionals must look at data to decide whether a city or province should be put under a complete or partial lockdown. Experts and citizens are looking at metrics such as the positivity rate, the percentage of emergency beds occupied, and the availability and production of oxygen. I believe that even after the pandemic is over, we will continue relying on data-driven approaches to health planning and will expand the use of data to other disciplines.

Cover of Dr. Haider’s book “Getting Started with Data Science”

Fouz: Harvard Business Review famously called data scientist “the sexiest job of the 21st century”. What led you to write “Getting Started with Data Science”, and how does this book help early-career data scientists?

Dr. Haider: Yes, the book took several years to plan and write. The main reason for writing it was to help those who were already employed and interested in data science and analytics but could not return to school full-time. I believed that if someone works for a firm and wants to get acquainted with data science, there should be a book for them. So it is not necessarily a textbook. It’s a guide for people who are already employed and want to reorient their careers toward data. It is also a resource for students who may not have studied math-centric subjects but are still curious about learning data science.

Fouz: How does data science help us understand developing cities better, especially in countries like Pakistan, where the field has not yet taken strong root?

Dr. Haider: Data and data-driven urban planning are relatively underdeveloped in Pakistan. You will appreciate this when I share a North American example with you. If I need data for a city in Canada about the location of roads, bridges, employment centers, shops, and housing, I can get it from the government for free through its open data portals or buy it from a software or data vendor. Similarly, US census data are available at the neighborhood level for free to anyone interested in exploring them.

The establishment in Pakistan will have to change course regarding data, its use and availability. A few years ago, Pakistan’s Parliament passed a law that made digital map-making illegal. This means that one cannot develop a map without the government’s approval. I think the government should create an open data portal to release data rather than caging it.  

Students in Pakistan are disadvantaged because they don’t have relevant data for Pakistan to conduct research. I experienced this when I taught courses and workshops in Pakistan: I didn’t have data for Pakistan to teach data science or analytics, so I had to use data sets from the US and other places to train learners in Pakistan, which is far from ideal.

We desperately need visionary civil servants and political leaders to pivot the government’s approach to data liberation.


We must recognize that banning map-making in Pakistan hurts only Pakistanis, because the rest of the world has access to spatial data on Pakistan. For instance, you can consult the spatial data library at any medium-sized or larger university in Europe or North America to download spatial data for Pakistan at 1- or 10-meter resolution. The rest of the world can avail itself of this opportunity, but learners in Pakistan cannot.

I also believe strongly that the Federal Bureau of Statistics and the Planning Commission should make census data available to researchers and universities. I visited Pakistan in 2003 to train the staff of the Population Census Organization in Geographic Information Systems (GIS). The intent was to replace approximately 125,000 hand-drawn maps with GIS-enabled digital maps to help analyze data from the 1998 Census.

Unfortunately, not much came of that initiative once I left, despite the dedicated efforts of our colleagues at the Population Census Organization. Other government agencies intervened and prevented the PCO from developing its analytic capacities. That was a huge mistake, and we can see the downside today: Pakistani universities have no access to census data at the neighborhood level. Without data, we are clueless in planning. Hence, I plead that census and other government-held data, such as an anonymized version of the data from BISP (the Benazir Income Support Programme and its newer incarnations), be made available to researchers and universities.

A few years ago, an article in The Economist explained why so much research is done about the US. It is not just that American researchers are busy exploring socio-economic challenges in the United States with data and analytics; researchers from Canada, Europe, and other countries are also exploring solutions to American challenges. Hence, when one compares published research on socio-economic challenges, one finds far more of it about the US than about other countries and regions. The Economist explained that one of the enabling reasons is that data on socio-economic challenges in the United States are readily available.

All I hope is that intelligent people in Pakistan will realize that one of the most significant services those in power can do for Pakistan’s future learners, scientists, engineers, statisticians, and economists is to make data available to those who want to research Pakistan.

Fouz: What has been your experience in working with data about Pakistan, whenever it was possible? 

Dr. Haider: It gives me a tremendous amount of pleasure to work in Pakistan. I used to visit Pakistan during summers to teach at NUST. I designed and helped establish the National Institute of Urban Infrastructure Planning at the University of Engineering & Technology in Peshawar and have worked on several projects with the Urban Unit in Lahore.

I’m pleased to share findings from a recent research project with civil engineering students at the National University of Sciences and Technology in Islamabad. A group of five students and I obtained data on traffic collisions in Rawalpindi and Islamabad from Rescue 1122. We analyzed roughly 15,000 collisions and discovered ways to improve traffic safety in Pakistan.

We found that though large trucks constituted a tiny fraction of the traffic fleet operating in Rawalpindi, trucks were involved in more than 50 percent of the collisions involving fatalities. Furthermore, using Geographic Information Systems, we discovered that those hurt in traffic collisions in the Rawalpindi Cantonment areas were relatively disadvantaged because of the lack of a trauma center. As a result, civilians injured in the Cantonment areas had to be transported to trauma centers in Rawalpindi city, which added travel time and put the injured at greater risk.

The work done with the students at NUST is a good example of using data and analytics to find solutions for serious issues, such as traffic safety.

Also Read: Talking data in healthcare and opportunities for women with Dr. Bushra Anjum

Rapid Growth of the IoT Industry


It would not be wrong to call 2021 the year of digitalization. As industries resurface from the COVID crisis, we are witnessing a massive paradigm shift toward technology-based solutions for everything. This deliberate choice seems a cost- and time-saving option for the present, and it also provides a better customer experience.

With connectivity increased manifold, thanks to 5G services and breakthroughs in artificial intelligence and machine learning, we are diving into a bubble of IoT. Do I sound like a tech zombie? Believe me, you’ve become one too. Imagine yourself living without Alexa for a second. BOOM! Told ya.

Recent statistics also hint at the leaps-and-bounds trajectory of the IoT industry in the coming years. The global IoT market is anticipated to reach USD 1,386.06 billion by 2026, up from USD 761.4 billion in 2020, at a CAGR of 10.53% over the forecast period (2021-2026). IoT Analytics expects IoT spending in 2021 to increase by 24.0%, with the overall market reaching $159.8 billion by the end of the year. Meanwhile, the number of global IoT connections is expected to reach 31 billion, a tenfold increase, as IoT Analytics reported last year.
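The CAGR figure ties the 2020 and 2026 market values together, so it can be checked directly. A minimal Python sketch, assuming a standard compound annual growth rate over six compounding years (2020 to 2026); the function name `cagr` is our own, not from any source cited here. Recomputing from the two endpoints gives roughly 10.5% per year, in line with the quoted 10.53% (the small gap presumably reflects rounding in the source).

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market size figures quoted above, in USD billion.
market_2020 = 761.4
market_2026 = 1386.06

rate = cagr(market_2020, market_2026, years=2026 - 2020)
print(f"Implied CAGR: {rate:.2%}")

# Compounding the 2020 value forward at the quoted 10.53% per year
# should land close to the 2026 projection.
projected_2026 = market_2020 * (1 + 0.1053) ** 6
print(f"2026 value at 10.53% per year: USD {projected_2026:,.1f} billion")
```

The forward projection comes out within a few billion dollars of the quoted USD 1,386.06 billion, which suggests the three numbers in the paragraph are mutually consistent.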

China managed to contain the effects of the pandemic with timely decisions. As a result, its IoT spending grew by 23.5%, nearly twice the global average. During the pandemic, spending on IoT cloud and infrastructure services increased by 34.7%.

Industry stats

Industrial View

The Industrial Internet of Things (IIoT) is a fresh opportunity for businesses, with the potential to change the way they operate. IoT has moved beyond smart homes and smartwatches to entrench itself in industry because it enables industries to automate.

In healthcare, IoT appears as IoMT (the Internet of Medical Things), which significantly helps doctors monitor patients remotely. It has also enabled patients to check their vitals at home using various health apps. IoMT has played a considerable role in containing COVID-19, especially in China.


IoT also seems to have become a strong pillar of other big businesses in the form of IoRT (Internet of Retail Things), IoLT (Internet of Logistics Things), and IoWM (Internet of Workforce Management). It can monitor a person’s whereabouts, and it can help surface a likely purchase by prompting customers at the right place and the right time. To be precise, for industries, IoT means big money in no time!

IoT and Data Analytics

One of the main reasons for the upward trajectory of IoT is its partnership with artificial intelligence and machine learning. This partnership has helped IoT expand its horizons: beyond monitoring and storing data, it can now multitask and process the data it receives to make recommendations.

Honeywell’s new Connected Life Safety Services (CLSS) is an excellent example of an IoT-based cloud platform: it covers design, installation, commissioning, and maintenance, reducing the manual work required of technicians. It not only collects data but is also smart enough to make informed decisions and suggestions.

Honeywell also offers enterprise performance management, built on data extracted from an organization’s environment. Similarly, Cisco’s new solutions enable organizations to increase their efficiency, make better business decisions, and accelerate digitization projects.

IoT and Data Processing

IoT has become more a game of processing data than of merely storing and monitoring it. What industries now require most from IoT is faster data processing, so that enterprises can make informed decisions based on the processed data. Faster and more pervasive wireless access, such as 5G networks, can feed fast-paced processing and cloud computing.

IoT plays a big part in driving digital transformation projects through 5G network build-out and edge computing. The new smart vehicle partnership announced by AWS and NXP uses NXP’s smart vehicle controller together with AWS’s edge and cloud services. Huawei Technologies Co. Ltd., in collaboration with China Unicom Group, has also recently launched a jointly developed 5G indoor distributed Massive MIMO solution, a colossal breakthrough for indoor coverage. With the convergence of 5G networks, the IoT trend will only go onwards and upwards!

Though IoT promises an exciting new era of smart devices, with Alexa as our forever partner, the perks will come with some privacy tradeoffs. To give customers a comfortable user experience, the IoT industry will have to put robust security policies in place. Beyond that, the IoT industry is bound to keep soaring!


Also Read: Every data has a story: Visualizing an idea beyond data

Reviewing “Hydro-Tectonic & Fault-Zone Aquifers in Desert Terrains of Saudi Arabian Crystalline Shield”


The hunt for reliable groundwater aquifers keeps intensifying as the population rises and known freshwater resources are depleted at a frightening rate, particularly in deserts. Experts predict that the dynamics of future geopolitics will revolve around water resources. This precious commodity is now considered “blue gold”, a strategic commodity, by every nation with foresight. This book is a great addition to the subject, since it not only addresses the frustrating state of water resources but also pinpoints non-traditional sources of groundwater, which are largely ignored even by experts and subject specialists.

The broad spectrum of the research work carried out under the project is highly commendable. I feel no need to add to the rating of “Excellent Category” for its qualitative and quantitative achievements, along with the remark “that the report documentation exceeds the expectations”, given by the reviewer appointed for the purpose by King Abdulaziz City for Science & Technology (KACST), who must have gone through the report-cum-book at length.

Research papers on scientific topics are usually too dry for general readers, government functionaries, decision-makers, and students to follow. Converting the research papers into a book that is more attractive and appealing without compromising scientific accuracy must have been another uphill task for the authors. Having accomplished this, the authors deserve more recognition and appreciation from their readers and users.

I must admire the areal coverage, originally planned for 12,000 sq. km but later extended to a huge 350,000 sq. km while remaining within the originally estimated cost. This shows the dedication and enthusiasm of the researchers-cum-authors, an example that future researchers working at regional, semi-detailed, and detailed levels should follow.

The subject is covered thematically in 11 chapters, each with several sub-chapters, along with 289 highly illustrative figures, diagrams, and maps; 11 information boxes presenting basic facts, conveniently placed for readers; and 10 tables of the most relevant and significant data. Such extensive coverage turns the research output into a text-cum-reference book.

Though all the chapters are highly relevant to the subject, in my opinion the most important and exemplary are Chapters 3, 4, 7, 8, 10, and 11, given the need to apply their themes to other parts of the world. The tectonics of the Arabian Sea and their impact on hydrogeology, where the Arabian Plate subducts under the Eurasian Plate (both comprising shield rocks) along the Makran Coast and in the volcanic arc regions of northern Balochistan, should be studied following the pattern and style described in Chapter 3 of the book.

Chapter 4 gives pervasive coverage of field investigations of various offshore and onshore ground features, offering an in-depth view of the methods and procedures researchers should adopt. Chapter 7 deals with the most important and latest aspect: analyzing visual surface expressions through remote sensing, which enables researchers to interpret and visualize a large area in a shorter span of time, in practically any study or research work.

Understanding the groundwater recharge mechanisms and sources of an area, as described in Chapter 8, is fundamental to understanding the specific conditions and potential of the area under study. The geophysical delineation of aquifers within fault and fracture zones, elaborated in Chapter 10, is another fast and reliable methodology that should be adopted for quick results. Meeting the future challenges of water and food security, the most important goals of all, can only be achieved with a realistic vision for the future, and this aspect is comprehensively covered in Chapter 11. The bibliography, containing references to both printed books and websites that the authors drew on, will also be highly beneficial for researchers and other stakeholders.

The book precisely identifies target areas for exploring non-conventional groundwater resources in highly deformed and fractured shield rocks; though it focuses on the Arabian Peninsula (KSA), its approach is applicable to drought-hit regions anywhere in the world. Pakistan, particularly water-scarce Balochistan, could be a potential beneficiary, given its similar tectonic features: a major transform fault and the resulting network of offshoot faults and fractures, a subduction zone, and a triple junction where three major tectonic plates meet.

Additionally, in terms of landforms, tectonic features, and climatic conditions, many areas of Pakistan, especially the vast, arid land of Balochistan, the deserts of Sindh and Punjab (Thar and Cholistan), and the coastal areas, are closely comparable to Saudi Arabia. Freshwater hydrogeological resources are already scarce in these areas, and unplanned, injudicious extraction of groundwater at an alarming rate has placed a big question mark over the survival of generations to come. This scenario calls for a similarly pervasive and comprehensive study in these parts of Pakistan.

I strongly believe that the book under review will gain a broad base of readers and end users and open up new vistas in the search for new groundwater aquifers in the study area and elsewhere in the world. Much research and many scientific papers are also expected to emerge from this work soon; hence the book should provide “food for thought” for academia and students for a long time to come, along with firm guidance for planners and decision-makers working toward wise and well-founded planning for the future.

The book entitled “Hydro-Tectonic & Fault-Zone Aquifers in Desert Terrains of Saudi Arabian Crystalline Shield” [ISBN: 9960-06-943-5], written by Prof. Dr. Nayyar Alam Zaigham and Prof. Dr. Omar S. Aburizaiza and published by the Scientific Publishing Centre, King Abdulaziz University Press, Jeddah, Kingdom of Saudi Arabia, is a massive piece of hydrogeological work.

Brief Profiles of Authors:

  • Prof. Dr. Nayyer Alam Zaigham
    Presently, he is working as Executive Director of GeoEnvoTechServices (GETS), a research group of geoscientists and environmentalists in Karachi, Pakistan. In the past, he has worked in different technical and academic disciplines.
  • Prof. Dr. Omar Siraj Aburizaiza
    Prof. Aburizaiza did his MS and Ph.D. in Civil Engineering at Oklahoma University, USA, during 1979-1982. He has been Professor of Water Resources Engineering, Planning and Management in the Department of Civil Engineering, King Abdulaziz University, Jeddah, since 1996.

Also Read: Book Review: Islam, Sci-Fi & Extraterrestrial Life by Jörg Matthias Determann

Data Lies at the Core of the Digital World


Let me Google it!
This application recognizes my personality perfectly. I can earn more revenue because of the precise targeting features of this media website. I feel secure when I shop online. Everybody loves Netflix…!

In today’s digitally revolutionized world, we either undertake plenty of activities online or rely on gadgets and applications to make our lives stress-free. Be it healthcare, logistics, or transportation, data science is everywhere.

People in developing countries like Pakistan mostly think that data science is reserved for skilled professionals who deal with BIG DATA. Surprisingly, this is a misconception: in our daily lives, we all come into contact with products and services that exist only because of data science.

Since the coronavirus pandemic, online shopping has been trending around the globe, and retailers now want to know what their customers think. They have to adopt different marketing tactics if customers show no interest in buying their products again. Most small shops and inexperienced retailers don’t have much information about their clients apart from names, addresses, or occasional purchase information.

Now they use social media platforms like Facebook and Instagram to get the big picture of what customers think of their brand. Companies and organizations are gathering information about their target audiences.
They know what you are watching, reading, buying, or playing.


After purchasing online or downloading an application from the Play Store, we leave feedback on whether we were satisfied or the product disappointed us. But we hardly appreciate the power of the science that makes all this possible.

There has been a significant increase in the big data analytics applications used in everyday life in recent years. Such applications are generally marketing techniques for profit maximization. But they have their own disadvantages when data leaks: leaked data creates privacy issues for customers, especially when application owners do not take responsibility for protecting personal data.

Data plays a leading role in today’s world; it is impossible to pinpoint all the ways big data affects our daily lives. Researchers indicate that about 2.5 quintillion bytes of data are created each day as our internet-connected devices track, produce, and store information. Experts from every field of science are working to apply the knowledge gained from big data in an ever-growing number of ways.

Big data is changing the way people live, as it contributes to medical services, marketing, travel and transportation, public policy, education and employment, and, yes, artificial intelligence. Among all of these, two significant contributions of data science are in the fields of healthcare and environmental protection.

Healthcare has progressed steadily; researchers can access big data on past case histories. They can also store their experimental analyses and reuse them when required. Healthcare applications are trending worldwide because they are easy to use and practical.

Recognizing the considerable impact of data on our daily lives, Scientia Pakistan brings you its June-July bi-monthly edition on DATA SCIENCE. We reached out to prominent experts working as data scientists in different fields. Dr. Murtaza Haider is a professor of Data Science and Urban Economics at Ryerson University, Toronto, and also holds an adjunct professorship at McGill University. Dr. Bushra Anjum is among the leading Pakistani data scientists, inspiring women in science worldwide through her outstanding work and achievements. We also reached out to Dr. Suleman Atique, a Saudi Arabia-based expert in informatics, who briefed us on the contribution of data analysis to healthcare.

The other leading stories of this edition cover deepfake audio and video, the role of forensic science in wildlife protection, progress in medicine and environmental protection, and storytelling through data science. In this one edition, we have tried to show that “our future depends on data analytics”.
Have a lovely weekend!

Assumed extinct bird re-emerges on a Hawaiian volcano!


The kiwikiu, also known as the Maui parrotbill, was thought to have gone extinct in 2019, when the last unfortunate few died in an outbreak of mosquito-borne disease. They had been relocated to a Natural Area Reserve on Maui in October 2019, but almost all of them perished from avian malaria.

Recently, a researcher from the Hawaii Department of Natural Resources reported hearing an unfamiliar birdsong at the volcano-based reserve. He suspected it was a kiwikiu but wasn’t sure, as the song came from some distance. He moved closer to inspect the feathered creature, which was feeding on berries among some kolea trees. He confirmed that it was one of the birds lost in 2019, which had somehow miraculously survived.

A scientist working for the Maui Forest Bird Recovery Project confirmed that this particular bird had contracted the disease but fought it off and survived. Dr. Hanna Mounce of the Maui Forest Bird Recovery Project said, “This is an amazing sign of hope for the species as we still may have time to save them…This is a hopeful sign that a population of kiwikiu and other native forest birds could survive in restored landscapes in the future, especially without mosquitoes and disease.”

The finding offers a new ray of hope for the species, and she hoped that a clean, restored landscape could also benefit other native forest bird species.

Source: nypost.com (Article by Hannah Frishberg 27th July 2021)


Nurturing Environment through Data Science


Earth, a habitable planet that supports millions of lives, is what we all have in common. Undoubtedly, this planet is a blessing worth fighting for. However, the rate at which we are degrading Earth is an eye-opener. Though the figures may not sound alarming at first, in reality we are destroying our planet faster than we can grasp. Even as we claim to be an advanced 21st-century civilization producing innovative ideas and solutions for humankind, we need to return to the grassroots and protect our environment first.

“What’s the use of a fine house if you haven’t got a tolerable planet to put it on?” – Henry David Thoreau

Statistics reveal astonishing facts about environmental degradation. According to the World Health Organization (WHO), between 2030 and 2050 climate change is expected to cause about 250,000 additional deaths per year from malaria, malnutrition, diarrhea, and heat stress. WHO (2016) also attributes some 7,000,000 deaths to air pollution. Extreme fluctuations in rainfall have caused serious problems such as floods, landslides, and droughts. Given the current situation and drastic climatic shifts, the world is likely to be hit by repeated natural disasters before long, at the cost of numerous lives.

Countless inventions and discoveries are under way in fields such as astronomy, genetics, and mechanics, all striving toward a common goal: making daily life easier for humanity. If emerging technologies can ease human activities so greatly, they can surely also help humankind protect the environment. Data science may be part of the solution to these environmental problems.

An interdisciplinary field that is revolutionizing the modern world, data science applies algorithms, computational tools, and machine learning techniques to extract useful information from raw data. That data can come from multiple channels and in different formats. Data science enables rapid processing, storage, and retrieval of huge amounts of data. It follows a workflow of five basic steps to obtain the desired results, as depicted in the picture below.
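To make the point about multiple channels and formats concrete, here is a minimal Python sketch; it does not attempt to reproduce the five steps in the picture. It merges a CSV export and a JSON feed, drops missing readings, summarizes the rest, and flags high values. The station names, the `pm25` field, and the 25.0 limit are all hypothetical.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical sources: the same kind of reading arriving in two formats.
CSV_SOURCE = """station,pm25
A,35.0
B,
C,12.5
"""
JSON_SOURCE = '[{"station": "D", "pm25": 48.0}, {"station": "E", "pm25": null}]'


def load_sources():
    """Read both sources into one list of dictionaries."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += json.loads(JSON_SOURCE)
    return rows


def clean(rows):
    """Drop missing readings (empty CSV cell or JSON null) and convert to float."""
    cleaned = []
    for row in rows:
        value = row.get("pm25")
        if value is None or value == "":
            continue
        cleaned.append({"station": row["station"], "pm25": float(value)})
    return cleaned


def summarize(rows):
    """Basic descriptive statistics of the cleaned readings."""
    values = [row["pm25"] for row in rows]
    return {"count": len(values), "mean": mean(values), "max": max(values)}


def stations_above(rows, limit):
    """Stations whose reading exceeds a chosen (hypothetical) limit."""
    return [row["station"] for row in rows if row["pm25"] > limit]


if __name__ == "__main__":
    data = clean(load_sources())
    print(summarize(data))
    print(stations_above(data, limit=25.0))
```

The point is that once both formats are normalized into the same structure, every later step can ignore where a reading came from.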

Data science offers several benefits for environmental protection: it can help us strategize better and frame measures and action plans to minimize the damage from natural disasters, protect wildlife, and sustain the environment for present and future generations.

To use this complex technology precisely and to our utmost benefit, it is paramount to understand the critical aspects of the environment and how nature works. This will also help us understand how nature and natural processes, in turn, affect human health, food availability, and resource exploitation, and how they influence human activities.

Air Pollution

Air pollution is one of several issues mankind has faced for decades. Air quality has dropped significantly over the years, and access to clean, fresh air has become a hurdle. The effect is pronounced in urban areas, where forest cover is shrinking in the blink of an eye and rising automobile use has been accompanied by a rapid increase in health conditions such as chronic and acute lung diseases, respiratory disorders, and heart disease.

A monitoring system known as MCMS uses sensors and software to measure air quality in real time, providing meaningful data.

It produces microclimate measurements of gases including carbon monoxide, carbon dioxide, nitric oxide, nitrogen dioxide, and sulphur dioxide; carbon monoxide, nitrogen dioxide, and sulphur dioxide are among the EPA’s “criteria pollutants.” It also measures temperature, relative humidity, and light. Its carbon dioxide readings are especially useful, since carbon dioxide is one of the prime contributors to global warming. Installing such systems across cities can provide crucial statistics on air quality, helping to assess air conditions and take effective measures against global warming and poor air quality.
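Raw sensor readings are noisy, so a common first step is to smooth them over a short window before comparing them with a limit. The sketch below is a hypothetical illustration, not the MCMS software: it keeps a rolling mean of one sensor’s readings and reports when that mean exceeds a threshold. The window size, the threshold, and the sample carbon monoxide values are invented and are not regulatory limits.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` readings from one sensor."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be a positive integer")
        self.window = window
        self.readings = deque()
        self.total = 0.0

    def add(self, value: float) -> float:
        """Record a new reading and return the updated rolling mean."""
        self.readings.append(value)
        self.total += value
        if len(self.readings) > self.window:
            self.total -= self.readings.popleft()  # drop the oldest reading
        return self.total / len(self.readings)


def flag_exceedances(readings, window, threshold):
    """Return indices at which the rolling mean rises above `threshold`."""
    avg = RollingAverage(window)
    return [i for i, r in enumerate(readings) if avg.add(r) > threshold]


if __name__ == "__main__":
    co_ppm = [3.0, 5.0, 10.0, 12.0]  # hypothetical carbon monoxide readings (ppm)
    print(flag_exceedances(co_ppm, window=3, threshold=5.0))  # hypothetical threshold
```

Keeping a running total makes each update constant-time, which matters when many sensors report every few seconds.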

The air quality has dropped significantly over the years, and access to clean, fresh air is a hurdle.

Threat to Wildlife

Humankind has been a grave threat to wildlife. Activities such as hunting, poaching, animal trafficking, and overfishing have drastically reduced the number of species left, and biodiversity and species richness have deteriorated. Current statistics on wildlife decline indicate that this matter must be dealt with urgently to save and conserve wildlife in its natural habitat.

Data science can be a powerful tool for conserving wildlife. The Nature Conservancy in Massachusetts and the University of Massachusetts Data Science for the Common Good Fellowship Program have claimed that it is possible to build an algorithm that captures and sorts trail camera images of animals, even at night, when little more than the animals’ eyes may be visible.

For better insight into animals’ movement patterns, motion-sensitive technology is also employed. Data scientists then interpret this information and use it to conserve and restore the animals’ natural habitats.

Once these natural habitats are restored, poachers and hunters can be kept out of the protected areas, reducing the lingering threats. Furthermore, if a tagged animal strays or its track is lost, it can be located and directed back to its habitat.
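Tracking tagged animals usually comes down to GPS fixes, pairs of latitude and longitude. As a minimal sketch, assuming such fixes are available, the haversine formula gives the great-circle distance between two fixes, which can be summed into a path length or compared against a habitat radius to flag a stray. Treating the habitat as a circle around a hypothetical centre point is a simplification; real reserves have irregular boundaries.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(fixes):
    """Total distance covered by a sequence of GPS fixes [(lat, lon), ...]."""
    return sum(haversine_km(*a, *b) for a, b in zip(fixes, fixes[1:]))


def strayed(fix, centre, radius_km):
    """True if a fix lies outside a circular habitat (a simplifying assumption)."""
    return haversine_km(*fix, *centre) > radius_km


if __name__ == "__main__":
    collar_fixes = [(0.0, 0.0), (0.0, 0.3), (0.0, 0.6)]  # hypothetical GPS collar data
    print(round(path_length_km(collar_fixes), 1), "km travelled")
    print(strayed(collar_fixes[-1], centre=(0.0, 0.0), radius_km=50.0))
```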

Predicting Natural Disasters

Approximately 207 natural disasters occurred worldwide in the first six months of 2020, and weather-related disasters accounted for 95% of the resulting losses and destruction. These climate-driven disasters not only cause immediate losses but also leave behind enormous destruction that takes several months to clear and restore. The devastation includes loss of life, crop losses leading to food shortages, lack of clean drinking water, and demolished homes and businesses, all of which exhaust the economy and the lives of the inhabitants.

However, natural disasters do not always come unannounced. Data science can help predict disasters such as hurricanes, cyclones, and floods. Data on previous hurricanes, including their intensity ranges, give an idea of how likely upcoming storms are and which areas they are most likely to hit. This advance knowledge enables inhabitants and local governments of hurricane-prone areas to take suitable measures.
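One simple way to turn a record of past hurricanes into a likelihood for a given area is to assume that strikes arrive as a Poisson process at the historical average rate λ per year; the chance of at least one strike next year is then 1 − e^(−λ). The records and region names below are hypothetical, and operational forecasting uses far richer models than this.

```python
import math
from collections import Counter


def strike_probabilities(records, first_year, last_year):
    """
    records: (year, region) pairs, one per hurricane strike.
    Assumes strikes in each region follow a Poisson process with a constant
    yearly rate; returns P(at least one strike next year) per region.
    """
    years_observed = last_year - first_year + 1
    counts = Counter(region for year, region in records if first_year <= year <= last_year)
    return {region: 1.0 - math.exp(-n / years_observed) for region, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical strike records for two stretches of coast, 2000-2009.
    records = [(2001, "Coast A"), (2003, "Coast A"), (2004, "Coast B"),
               (2005, "Coast A"), (2008, "Coast A"), (2009, "Coast A")]
    for region, p in sorted(strike_probabilities(records, 2000, 2009).items()):
        print(f"{region}: {p:.1%} chance of at least one strike next year")
```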

Data science can predict the occurrence of natural disasters, including hurricanes, cyclones, and floods.

Moreover, satellites such as GOES (Geostationary Operational Environmental Satellite) observe the direction in which a hurricane is moving and track it, producing images of the hemisphere at fixed time intervals. Computer algorithms can locate the storm’s center, called the “eye of the hurricane,” in these images. All of these data help build models that predict a hurricane’s behavior and path. Well-known forecast models include those of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Weather Service’s Global Forecast System (GFS).
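Forecast models such as ECMWF and GFS simulate the physics of the atmosphere, which is far beyond a short example. The crudest baseline, shown here only to illustrate projecting a path from tracked eye positions, assumes the storm keeps moving at its average observed speed and direction. The positions are hypothetical, observations are assumed to be equally spaced in time, and latitude/longitude are treated as flat coordinates, which is only reasonable over short distances.

```python
def extrapolate_track(positions, steps_ahead):
    """
    Naive constant-velocity projection of a storm centre.

    positions: (lat, lon) eye positions observed at equal time intervals.
    Returns the next `steps_ahead` projected (lat, lon) positions.
    """
    if len(positions) < 2:
        raise ValueError("need at least two observed positions")
    intervals = len(positions) - 1
    (lat_first, lon_first), (lat_last, lon_last) = positions[0], positions[-1]
    # Average movement per interval over the whole observed track.
    dlat = (lat_last - lat_first) / intervals
    dlon = (lon_last - lon_first) / intervals
    return [(lat_last + k * dlat, lon_last + k * dlon) for k in range(1, steps_ahead + 1)]


if __name__ == "__main__":
    observed = [(20.0, -60.0), (21.0, -61.0), (22.0, -62.0)]  # hypothetical eye fixes
    print(extrapolate_track(observed, steps_ahead=2))
```

Real storms curve and change speed, which is exactly why physics-based models outperform this kind of straight-line projection.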

Floods are common and catastrophic events that can occur for several reasons: unpredictable rainfall, overflowing rivers, damaged dams, and storm surges. Flood damage can be minimized if sufficient forecasting data are available and appropriate action is taken ahead of time. Satellite imagery from sources such as the Global Flood Detection System (GFDS), together with aerial topography, plays a vital role in understanding overall flood dynamics.

Computer algorithms and machine learning techniques can predict water flow rates, temperature and humidity near drainage sites, and soil moisture content, and can draw on real-time rainfall monitoring and much more. These details give a better idea of when a flood may occur, how severe it may be, and where it is most likely to strike.

Earthquakes are studied and observed primarily by seismologists. Although earthquakes are notoriously hard to predict, scientists are finding ways to use machine learning to foresee the origin of seismic waves. Using details about seismic signals, their path of travel from the source, and the magnitude of the earthquake, data scientists are striving to find ways to predict earthquakes. Asked about using machine learning to forecast earthquakes, Johnson, a seismologist at Los Alamos National Laboratory, said, “I can’t say we will, but I’m much more hopeful we’re going to make a lot of progress within decades. I’m more hopeful now than I’ve ever been.”

Water Pollution

Pollution of water sources is a matter of grave concern for both marine life and water quality. Discharged effluents (oils, toxic chemicals, plastics, and harmful metals) end up in oceans and rivers, threatening species’ habitats.

However, machine learning can offer an easier way to clean seas, rivers, and other natural water bodies and restore them to their original condition. Microsoft gives scientists access to its AI and machine learning technology to help protect the environment and save the planet. One such project is the “Ocean Cleanup,” which focuses on reducing marine plastic and a slew of related issues. AI makes it easier to locate and identify heaps of debris, saving time and labor. The initiative aims to pair this identification system with an automated collection unit that gathers the plastic for systematic removal.

Machine learning can provide an easier way to clean seas, rivers, and other natural water resources to restore their original conditions.

Though data science seems to have a bright future as a critical tool for environmental protection, further advances and refinement are needed to make it a sustainable practice. The 17 Sustainable Development Goals (SDGs) promoted by the United Nations Development Programme (UNDP) should form the basis for using data science to protect the environment and make it sustainable.

Furthermore, applying data science to environmental protection in practice requires substantial financial investment and resources, which is challenging for developing countries that are preoccupied with other large-scale issues such as combating hunger.

Moreover, highly skilled experts are needed to interpret AI results accurately and make good use of them. Attaining that level of education and mastering the skills specific to this domain is costly. Another concern is that although machine learning is automated, it can be prone to errors. Some errors go undetected and continue to influence downstream results in the processing chain, giving data scientists an inaccurate picture. Such anomalies can take considerable time to detect and correct.

Nevertheless, the potential of this technology across fields, especially environmental protection, is immense. Further developments and advances will reveal new capabilities that might help solve the great challenges humanity faces in the modern world.

“The goal is to turn data into information, and information into insight” – Carly Fiorina.


How Data is Changing Medicine


As computers started taking over the world, they made sure not to miss perhaps the most important field for humans – medical science. Analogue became digital, archives became computerized, and so did medical science.

The major contribution of computer science to medicine has to be data analysis: the process of cleaning, transforming, and analyzing different forms of data. Different kinds of tests can then be run on the transformed data to answer specific questions. Whether these tests are statistical or predictive, they depend on data analysis at every step.

Clinical Trials

Before a new medicine or vaccine is introduced to the market, it must be approved by government bodies. Approval requires presenting data that support the drug’s intended purpose, along with an analysis showing that the results are statistically significant.

People are encouraged to volunteer for clinical trials through incentives such as payment. Volunteers are chosen according to certain demographics (such as age and gender) and their symptoms of the disease the drug is intended to treat. They are then randomly assigned to two groups: one receives the drug, while the other receives a saline solution or an inactive drug (a placebo). Participants in neither group know whether they have received the drug or the placebo, which makes the data more reliable. The study’s results, which can involve thousands of participants, are entered into statistical software suited to the type of test being run. The data from both groups are first cleaned to remove extreme outliers and checked to see whether the tests’ assumptions are met. Once this is done, the data are ready to be analyzed.

The researchers then run the tests and compare the results of the two groups. If the drug produces a significant improvement in participants’ illness compared with the placebo, it moves one step closer to approval, mass production, and use.
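The two analysis ideas in this section, random assignment and comparing the groups, can be sketched in Python. The permutation test below is one standard way to ask whether a difference in group means is larger than chance alone would produce; it is not necessarily the test any particular trial or regulator uses. The participant IDs and improvement scores are hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Shuffle participants, then split them into treatment and placebo groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


def permutation_test(treatment, placebo, n_permutations=10_000, seed=0):
    """
    Two-sided permutation test for a difference in group means.

    If the drug had no effect, the group labels would be arbitrary, so we
    reshuffle the labels many times and count how often a difference at
    least as large as the observed one appears by chance.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(placebo)
    pooled = list(treatment) + list(placebo)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed split itself and avoid a p-value of 0.
    return observed, (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    treatment_ids, placebo_ids = randomize(range(1, 11), seed=42)
    print("treatment:", sorted(treatment_ids), "placebo:", sorted(placebo_ids))
    # Hypothetical symptom-improvement scores recorded after the trial.
    diff, p = permutation_test([11, 12, 13, 14, 15], [1, 2, 3, 4, 5])
    print(f"difference in means = {diff}, p ~ {p:.4f}")
```

A small p-value means a difference this large would rarely arise if the drug and placebo were interchangeable.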

Over 300 million ECGs are done in a year which provides sufficient data to help a machine differentiate between a normal heartbeat and an irregular one.

Diagnostics

Another use of data analysis in healthcare is diagnosing disease. Different ethnic groups tend to have different rates of a given disease and may show some variation in how their illnesses present, which can lead a doctor to a false diagnosis if they have not previously treated patients from that group. For example, Hispanic men are the most likely to have high levels of ‘bad’ cholesterol, while Asian men are the least likely. Such differences put patients at risk of being misdiagnosed.

If data on how illnesses present in different ethnic groups are collected and combined, doctors can compare a patient’s symptoms against these records to analyze them more accurately and make a correct diagnosis.

Data analysis also contributes to diagnostics through Artificial Intelligence (AI), which compares new data with previous records, for example to detect irregular heartbeats. Over 300 million ECGs are done each year, providing sufficient data to help a machine differentiate between a normal heartbeat and an irregular one. Similarly, Stanford researchers have developed AI software that examines skin tags and lesions and accurately identifies them as benign or malignant tumors.
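Machine-learning ECG models are far beyond a short example, but a much simpler rule-based stand-in shows what “irregular” can mean numerically: measure the intervals between successive R-peaks (the RR intervals) and check how much they vary. The coefficient-of-variation cut-off of 0.1 and the peak times below are hypothetical, not a clinical criterion.

```python
from statistics import mean, pstdev


def rr_intervals(r_peak_times_ms):
    """Milliseconds between successive R-peaks."""
    return [later - earlier for earlier, later in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def heart_rate_bpm(r_peak_times_ms):
    """Average heart rate implied by the RR intervals."""
    return 60_000 / mean(rr_intervals(r_peak_times_ms))


def looks_irregular(r_peak_times_ms, cv_threshold=0.1):
    """
    Flag a rhythm as irregular when the coefficient of variation
    (standard deviation / mean) of its RR intervals exceeds cv_threshold.
    The default cut-off is illustrative only, not a clinical criterion.
    """
    rr = rr_intervals(r_peak_times_ms)
    if len(rr) < 2:
        raise ValueError("need at least three R-peaks")
    return pstdev(rr) / mean(rr) > cv_threshold


if __name__ == "__main__":
    steady = [0, 800, 1600, 2400, 3200]   # hypothetical R-peak times (ms)
    erratic = [0, 600, 1500, 1900, 2900]
    print(heart_rate_bpm(steady), looks_irregular(steady), looks_irregular(erratic))
```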

Another use of data analysis in diagnostics is AI, which compares new data with previous records.

These data-driven diagnostic methods are fast, effective, and accurate. They could potentially save millions by reducing the need for lengthy procedures such as blood tests and biopsies. A high initial investment can lead to long-term savings that can be redirected to other healthcare departments.

Predicting Outbreaks

We are surrounded by pathogens. These disease-causing organisms are transmissible and pose a major threat to healthcare systems. Recently, we have encountered the coronavirus, a pathogen that spreads between people in close contact with someone already infected. The outbreak began in China and spread all over the world.

Researchers collect data on the probability of a pathogen being present in an area, its rate of transmission, the likelihood of contracting the disease, and the rate of travel within and between countries to estimate how large an outbreak might become and what is driving it. Areas within a country that are likely to see a larger outbreak can better prepare their hospitals and staff. Similarly, areas or countries with low outbreak rates can take precautionary measures, such as imposing travel bans, to prevent wider spread.
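A classic starting point for estimating how large an outbreak might become is the SIR model, which splits a population into susceptible, infected, and recovered groups. The sketch below steps it forward one day at a time. The transmission rate `beta`, recovery rate `gamma`, and population size are hypothetical, and the model ignores travel between regions, which researchers also track. When `beta / gamma` (the basic reproduction number) exceeds 1, infections initially grow.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """
    Discrete-time SIR model, one step per day.

    beta:  transmission rate (new infections per infected person per day
           in a fully susceptible population)
    gamma: fraction of infected people who recover each day (1 / infectious period)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical city of one million with 10 initial cases; beta / gamma = 3.
    run = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=160)
    peak_day = max(range(len(run)), key=lambda d: run[d][1])
    print(f"peak on day {peak_day} with about {run[peak_day][1]:,.0f} people infected")
```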

Staff Requirement

Another unusual use of data analysis in healthcare is predicting staffing needs. Using past admission data for different times of the year and of the day, data analysis can predict how many patients a hospital should expect at any given time. Hospitals are crowded during holidays and at certain times of the day, which can cause an understaffed hospital to run slowly and make mistakes, some of which might prove fatal to a patient. If hospitals use such predictions, they can have enough staff on site for their work to run smoothly. By doing so, they avoid being understaffed and prevent essential workers from being overworked to the point of exhaustion, which is a major cause of error in hospitals.
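A minimal sketch of the staffing idea, assuming past admissions are logged as (weekday, hour, arrivals): average the arrivals for each time slot, then divide by how many patients one staff member can handle and round up. The record format and the patients-per-staff ratio are hypothetical; real rostering also accounts for skills, shifts, and uncertainty.

```python
import math
from collections import defaultdict
from statistics import mean


def expected_arrivals(history):
    """
    history: (weekday, hour, arrivals) records from past admissions (hypothetical format).
    Returns the average number of arrivals for each (weekday, hour) slot.
    """
    slots = defaultdict(list)
    for weekday, hour, arrivals in history:
        slots[(weekday, hour)].append(arrivals)
    return {slot: mean(counts) for slot, counts in slots.items()}


def staff_needed(expected, patients_per_staff):
    """Round up so that expected demand never exceeds staff capacity."""
    return {slot: math.ceil(load / patients_per_staff) for slot, load in expected.items()}


if __name__ == "__main__":
    past = [("Mon", 9, 10), ("Mon", 9, 14), ("Sat", 22, 30), ("Sat", 22, 30)]  # hypothetical
    print(staff_needed(expected_arrivals(past), patients_per_staff=4))  # hypothetical ratio
```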

Feedback

Feedback is an important source of information on how to improve a service. Hospitals and clinics provide an essential service, so they need to know anything that can help improve it. Feedback can be gathered by having outgoing patients or their families fill out a form, or by calling past patients and asking them a few questions about the service. These questions can cover the response times of clinicians and nurses, the behavior of the staff, waiting times, and even the food from the cafeteria. Patients can be asked for suggestions on improving the service they received and how likely they are to recommend it to others. Their answers can then be entered into software and analyzed to identify the most common responses and the most requested improvements, which hospitals can then implement.
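The “how likely are you to recommend” question is the basis of the widely used Net Promoter Score: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (0 to 6). Which metric a given hospital uses is not specified here, so treat this as one possible summary; the theme tags passed to `top_themes` are likewise a hypothetical stand-in for coding free-text comments.

```python
from collections import Counter


def net_promoter_score(ratings):
    """
    Net Promoter Score from 0-10 answers to "How likely are you to recommend us?":
    percentage of promoters (9-10) minus percentage of detractors (0-6).
    One possible summary metric, assumed here for illustration.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def top_themes(tagged_comments, n=3):
    """Most frequent themes across comments, each tagged with a (hypothetical) list of themes."""
    return Counter(theme for tags in tagged_comments for theme in tags).most_common(n)


if __name__ == "__main__":
    print(net_promoter_score([10, 9, 8, 7, 6, 0]))
    print(top_themes([["waiting time", "food"], ["waiting time"], ["staff behavior"]], n=1))
```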

Feedback can be gained by having outgoing patients or their families fill a form or calling past patients and asking them a few questions regarding the service.

Predicting Service Requirements

At times, countries have either too many or too few hospitals for their population. When there are too many, some hospitals may be easily accessible while others are not, leading to imbalances in the ratio of doctors to patients. When there are fewer hospitals than an area requires, the existing hospitals become overcrowded and service slows down, leading to more errors and overworked staff. Carefully analyzing the number of patients using hospitals in a given area can help predict how many hospitals that area needs and how large they should be.

If an area has more hospitals than it needs, some can be shut down, saving the money spent on operating them. These savings can be invested in the remaining hospitals to improve their efficiency and patient satisfaction.

On the other hand, if an area requires more hospitals, the government and the private sector could be made aware of it. This would prove to be a great investment for both sectors and would benefit not only them but also the wider population.

Preventing Unnecessary Emergency Room Visits

Normally, hospitals do not share patient data. This means that if a person goes to the emergency room (ER) at two different hospitals, at two different times, for the same issue, they have to undergo all the tests and checkups twice. This can be prevented if ERs share patient data. A patient’s details could be cross-referenced with records from other ERs, and if they had previously been examined at another hospital for the same complaint, the next doctor would be made aware of it. The doctor could then access the earlier test reports and prescriptions, so that the new treatment builds on the previous one instead of starting from scratch. This helps avoid unnecessary use of lab facilities, which wastes precious time and money. It would also help identify people who visit different ERs to obtain prescription drugs such as painkillers or opioids for misuse, saving resources while also lowering drug abuse rates.
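Assuming ERs shared visit records with a common patient identifier (a big assumption in practice), cross-referencing them could look like the sketch below. It flags patients seen at two or more different hospitals for the same complaint within a time window. The field names and the 30-day window are hypothetical.

```python
from collections import defaultdict
from datetime import date


def repeat_er_visits(visits, window_days=30):
    """
    visits: dicts with hypothetical keys patient_id, hospital, complaint, day (a date).
    Returns the set of patient_ids seen at two or more different hospitals
    for the same complaint within window_days of each other.
    """
    grouped = defaultdict(list)
    for visit in visits:
        grouped[(visit["patient_id"], visit["complaint"])].append(visit)

    flagged = set()
    for (patient_id, _complaint), records in grouped.items():
        records.sort(key=lambda v: v["day"])
        # Checking consecutive visits suffices: between any two qualifying visits,
        # some consecutive pair also switches hospital within the window.
        for earlier, later in zip(records, records[1:]):
            switched = later["hospital"] != earlier["hospital"]
            if switched and (later["day"] - earlier["day"]).days <= window_days:
                flagged.add(patient_id)
                break
    return flagged


if __name__ == "__main__":
    visits = [
        {"patient_id": "p1", "hospital": "A", "complaint": "back pain", "day": date(2021, 1, 1)},
        {"patient_id": "p1", "hospital": "B", "complaint": "back pain", "day": date(2021, 1, 10)},
        {"patient_id": "p2", "hospital": "A", "complaint": "cough", "day": date(2021, 1, 1)},
        {"patient_id": "p2", "hospital": "A", "complaint": "cough", "day": date(2021, 1, 5)},
    ]
    print(repeat_er_visits(visits))
```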

Conclusion

Data analytics may be only one part of data science, but its importance in the medical world is hard to ignore. It helps not only researchers complete their studies but also the general population and hospital workers. It helps allocate funds to the right departments, prevents waste, and saves time for organizations and healthcare workers. If we used data analysis to its full potential, we would reap these benefits and many more. And who knows, we may even be able to cure cancer with it!



Blockchain: The crucial technology behind cryptocurrency


“Bitcoin soars,” “Cryptocurrency Ethereum on the Rise,” “Bitcoin rises 9.8%.” It has become nearly impossible to avoid such headlines. I am sure that anyone reading this is already aware of the ‘craze.’ But you may not know where it came from, who runs it, and, most importantly, what it is. Throughout this article, we will talk about bitcoin, as it is the best-known cryptocurrency, but most of what we describe applies to other cryptocurrencies too, with a few caveats here and there.

Bitcoin is the first of its kind: an intangible, decentralized, and entirely digital currency. This definition has two major parts: decentralized, and digital (hence ethereal). Much of the intrigue surrounding bitcoin comes from the decentralized part, which means there is no central authority (e.g., a bank) regulating anything. To understand it fully, we should first look at how we deal with ‘normal’ money.

Let’s say you want to send some money to your friend. You pick up your phone, open your bank’s website, and send 5000 rupees to her account. Soon after, you will probably receive a thank-you message from your friend. What has happened is that the specified amount has been deducted from your balance at your bank, and the opposite has happened at her end: 5000 rupees have been added to her balance at whichever bank she uses. The bank is clearly of crucial importance here. But what if you use cryptocurrency instead? On the front end, everything will seem quite familiar. You pick up your phone, open your bitcoin wallet, and send some bitcoin (or any other cryptocurrency) to your friend. Behind the scenes, however, the transaction itself is where things change drastically. The coins are deducted from your wallet and added to hers. In the absence of a central authority, who does this? You and your friend yourselves! If, however, your coins are held on an exchange, the exchange acts much like a bank and charges its own transaction fees in the process.

THE FOUNDATION

If there is no central authority keeping a ledger of all transactions and balances, there has to be another way to prevent fraud and the risk of loss. Without such a feature, anyone could claim that a friend had paid them without the friend even knowing. A method that prevents this is what gives a currency its value.

If there is no central authority that has a ledger storing all transactions and balances, there has to be an alternative to avoid fraud and the risk of loss.

Without a central authority, how can you trust anyone’s claims about their balance? The answer is simple: you don’t have to. The need for trust is sidestepped entirely by giving every bitcoin owner an independent record of every transaction that has ever happened. No user can add or edit this information unless the rest of the network accepts the change. This is called ‘blockchain technology,’ and it is what most cryptocurrencies are based on.
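Because everyone holds the full transaction history, anyone can recompute every balance by replaying it from the start and rejecting payments that would overdraw the sender. The sketch below shows that idea with simple account balances; bitcoin itself actually tracks unspent transaction outputs (UTXOs) rather than balances, and the names and amounts are hypothetical.

```python
def replay_ledger(transactions, opening_balances):
    """
    Recompute every balance by replaying the shared history in order.

    transactions: list of (sender, receiver, amount) tuples (hypothetical format)
    opening_balances: starting balances, e.g. coins created by mining
    Returns (final balances, list of rejected transactions).
    """
    balances = dict(opening_balances)
    rejected = []
    for sender, receiver, amount in transactions:
        # Reject non-positive amounts and anything the sender cannot cover.
        if amount <= 0 or balances.get(sender, 0) < amount:
            rejected.append((sender, receiver, amount))
            continue
        balances[sender] -= amount
        balances[receiver] = balances.get(receiver, 0) + amount
    return balances, rejected


if __name__ == "__main__":
    history = [("you", "friend", 4), ("friend", "you", 1), ("friend", "carol", 5)]
    print(replay_ledger(history, {"you": 10}))
```

Since every participant runs the same replay on the same history, they all arrive at the same balances without consulting a bank.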

When a transaction is made, the sender’s device broadcasts a message containing the transaction details to the bitcoin network. This is what happened when you sent the cryptocurrency to your friend: your phone sent a message containing exactly this information into the network. Every transaction is eventually added to a block, which is then added to the blockchain. We’ll discuss this in more detail later.

CRYPTOGRAPHY

Because of the nature of the information being sent, its security is critical. To ensure this, you can digitally sign your message. Digital signatures, like handwritten signatures, offer proof that you approve of what has been signed. A digital signature relies on asymmetric (public-key) cryptography, a method of encrypting and decrypting data using a pair of keys. The private key is known only to its owner, while the public key is available to everyone.

Signing a message with your private key produces what is known as a digital signature. Without your private key, a third ‘friend’ cannot simply broadcast a message saying that you paid him 10 bitcoins, thereby deducting bitcoins from your wallet and adding them to his. For extra security, even a slight change to the message completely changes the digital signature, which prevents anyone from copying your signature onto another message. Anyone who wants to verify a message can use the owner’s public key to check whether the matching private key produced the digital signature.
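To show the sign-with-private-key, verify-with-public-key idea in runnable form, the sketch below uses textbook RSA on a SHA-256 digest. This is a toy: bitcoin uses elliptic-curve signatures (ECDSA on the secp256k1 curve), and real RSA needs much larger keys and padding. The two primes are the Mersenne primes 2^61 − 1 and 2^89 − 1, chosen only so the example runs instantly; the messages are made up.

```python
import hashlib

# Toy textbook-RSA key pair for illustration only (too small, no padding).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_int(message: bytes) -> int:
    """Hash the message with SHA-256 and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_int(message)


if __name__ == "__main__":
    message = b"you pay friend 10 BTC"
    signature = sign(message)
    print(verify(message, signature))                    # genuine message checks out
    print(verify(b"you pay friend 100 BTC", signature))  # altered message does not
```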

An idea central to bitcoin and its functionality is a cryptographic hash function.

But digital signatures are not the only type of cryptography used. An idea central to bitcoin and its functionality is the cryptographic hash function. A hash function is a mathematical function that converts an input message of any length into a fixed-length string of bits according to a set of rules. A cryptographic hash function, more specifically, produces an output that is practically impossible to reverse-engineer into the original message.
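Python’s standard `hashlib` module provides SHA-256, the hash function bitcoin builds on. The sketch below shows two properties described above: the output always has a fixed length (256 bits, or 64 hexadecimal characters), and a one-character change in the input typically flips around half of the output bits. The example messages are made up.

```python
import hashlib


def sha256_hex(message):
    """SHA-256 digest as 64 hexadecimal characters (256 bits), whatever the input length."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def differing_bits(hex_a, hex_b):
    """How many of the 256 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    before = sha256_hex("You paid your friend 5000")  # made-up messages
    after = sha256_hex("You paid your friend 5001")
    print(before)
    print(after)
    print(differing_bits(before, after), "of 256 bits changed")
```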

MINING

We have yet to address a significant problem that may arise in all of this. How can we trust that the message we received has also been received by everyone else, or that the block of transactions we have does not conflict with other blocks? This is where miners step in and earn their due.

When you make a bitcoin transaction, you usually pay a small fee, which serves as a reward for the miner who validates it. When a block is filled with transactions, it is verified by computers owned by miners across the network. Miners are people whose computers add transactions to a new block, but to do so, they must solve an extremely hard mathematical puzzle: finding a number to add to the block that makes the block’s hash meet a target set by the network. In a blockchain, a block contains the hash of the previous block, the transaction details, a 32-bit number called a nonce (the number the miner has to figure out), and the hash of the block itself. The network periodically adjusts the target so that, for bitcoin, a new block is found roughly every ten minutes; as more computing power has joined the network, the puzzle has become increasingly difficult.

This process requires a lot of computational power, as computers essentially guess the number by trial and error rather than calculating it directly. When a lucky miner finds a valid number, their block is added to the chain and they earn newly created bitcoins along with the transaction fees. This ingenious solution, known as mining or ‘proof of work,’ was proposed in the original 2008 paper that launched bitcoin. Many cryptocurrencies use another method known as ‘proof of stake,’ which doesn’t require mining, saving energy and computational power. Following such changes, new cryptocurrencies keep innovating, whether they cater to specific niches or aim to be general-purpose. Whatever opinion anyone holds of this phenomenon, there is no denying that we are all excited and curious to see where it leads us economically and technologically.
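The mining loop described above can be sketched in a few lines. This is a simplified model under stated assumptions: the block is hashed as JSON with a single SHA-256, and “meeting the target” means the hex digest starts with a given number of zeros. Real bitcoin hashes an 80-byte header with double SHA-256 and compares the result numerically against a target, but the guess-and-check nature of proof of work is the same. The transactions are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, transactions, nonce):
    """SHA-256 of the block contents, which include the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions, "nonce": nonce}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine(prev_hash, transactions, difficulty=4):
    """
    Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits.
    There is no shortcut: each extra zero makes the search about 16 times longer.
    """
    target = "0" * difficulty
    for nonce in range(2**32):  # the nonce field in bitcoin is 32 bits wide
        digest = block_hash(prev_hash, transactions, nonce)
        if digest.startswith(target):
            return nonce, digest
    raise RuntimeError("no valid nonce in the 32-bit range")


def chain_is_valid(chain, difficulty=4):
    """Every block must meet the target and link to the hash of the block before it."""
    prev = GENESIS_PREV
    for block in chain:
        digest = block_hash(prev, block["txs"], block["nonce"])
        if digest != block["hash"] or not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True


if __name__ == "__main__":
    chain, prev = [], GENESIS_PREV
    for txs in (["you -> friend: 1"], ["friend -> carol: 0.5"]):  # hypothetical transactions
        nonce, digest = mine(prev, txs)
        chain.append({"txs": txs, "nonce": nonce, "hash": digest})
        prev = digest
    print(chain_is_valid(chain))
    chain[0]["txs"] = ["you -> friend: 100"]  # tamper with history
    print(chain_is_valid(chain))
```

Changing any transaction in an earlier block changes that block’s hash, so its stored hash and every later link no longer match; `chain_is_valid` catches this, which is what makes the shared history tamper-evident.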


Talking Data in Healthcare and Opportunities for Women with Dr. Bushra Anjum


The healthcare industry is one of the most striking beneficiaries of data science. In the post-Covid world, medical diagnostics are becoming more efficient and accessible, medical treatment more personalized, and medical research more data-driven.

Data scientists drive innovation across the healthcare sector with products such as chatbots that help patients find a good physician, productivity applications that automate administrative tasks, and recommendation services that identify patients who could benefit from a new clinical trial.

Dr. Bushra Anjum is among the few inspiring women leading the field of data science and is currently based in California. She was born and raised in a middle-class family in Pakistan, where her father served in the Pakistan Army; she is the youngest of three daughters. While doing her master’s in computer science at LUMS, she became interested in pursuing a Ph.D. abroad. Fifteen years ago, she moved to the US, and through hard work and devotion to her career, she is now a trained data scientist who has worked extensively with predictive analytics. She currently works as the Senior Analytics Manager at Doximity, a San Francisco-based health tech company.

Below is our recent conversation with her. Dr. Anjum discusses the promise of data in healthcare, some fascinating details about the data products she has been building, her volunteer ventures, and how women are essential to creating an equitable data-driven future for all of us.

Tell us about your life and career. Who inspired you most to pursue an offbeat career as a health data scientist?

Dr. Bushra: I am a trained data scientist who has worked extensively with predictive analytics, and I currently work as the Senior Analytics Manager at Doximity, a San Francisco-based health tech company. I received my Ph.D. in Computer Science at North Carolina State University in 2012, then served in academia (both in Pakistan and the USA) for a few years before joining the tech industry. I joined Amazon and worked for the Prime team, where I was a backend engineer for four years. My research background is in performance evaluation and queuing theory. Combining that with the engineering expertise gained during my tenure at Amazon, I switched to the field of data science, which brought me to Doximity. I joined as a data scientist and was later promoted to lead the revenue wing of the company.

The inspiration for my work is the patients and their caregivers, family, and friends who are not hired to do the job but do it out of love and concern. Our health care systems worldwide need to be effective, yes, but equally if not more important, they need to be empathetic. No one needs software complexity, vague instructions, incorrect diagnostics, and unsafe personal and financial data on any given day. Now consider dealing with all this on top of the emotional burden, worry, and life-and-death uncertainty that patients and their caregivers fight through; it becomes almost criminal. Modern-day data analytics and data tools have the potential to make patients’ and caregivers’ lives easier, and I would like to play my part in making it happen.

Can you share an experience when you gathered data from multiple sources and combined it into actionable insights for your company? How did you determine which sources were relevant and how well your product was performing?

Dr. Bushra: I am happy to share some details of a data product called “Press Boost” that I designed and implemented. But before going into the product details, some background on the company, Doximity, is needed. A comparison sometimes used to introduce Doximity to an unfamiliar audience is “LinkedIn for doctors.” Only verified physicians can join the network, which keeps conversations, discussions, and referrals safe and HIPAA-compliant. Doximity aims to be the newsfeed of medicine, personalized for each verified physician. We have the largest readership in medicine, have an in-house editorial team, and analyze (and surface) 200K+ articles per week. “Press Boost” is a (free) data product that helps source articles with high medical value and engagement potential for our clients.

We ingest thousands of medically relevant articles daily from hundreds of online news publications (for non-journal organic articles) and PubMed (for research-based journal articles). After ingestion, the articles go through several rounds of NLP and regex matching to determine their medical relevance and to extract the hospitals and medical facilities mentioned in them. Completing these steps gives us a mapping of how medically relevant each article is and which hospital systems and facilities it mentions.
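For readers curious about the regex part of that extraction step, here is a minimal Python sketch. It assumes a small, hand-maintained list of facility names. The list, the pattern, and the function `find_facilities` are hypothetical illustrations, not Doximity’s actual pipeline, which also relies on NLP that is not shown here.

```python
import re

# Hypothetical facility list; a real system would draw on a much larger,
# curated directory of hospital systems and facilities.
FACILITIES = [
    "Mayo Clinic",
    "Cleveland Clinic",
    "Johns Hopkins Hospital",
]

# One case-insensitive pattern. Longer names come first so a name that is a
# prefix of another cannot win the alternation. Word boundaries mean
# "Mayo Clinic" matches, but "Mayo Clinical" does not.
FACILITY_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(FACILITIES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def find_facilities(text: str) -> list[str]:
    """Return canonical names of facilities mentioned in text, in first-seen order."""
    canonical = {name.lower(): name for name in FACILITIES}
    seen: list[str] = []
    for match in FACILITY_PATTERN.finditer(text):
        name = canonical[match.group(1).lower()]
        if name not in seen:
            seen.append(name)
    return seen

article = ("Researchers at the Mayo Clinic and cleveland clinic report new "
           "results; the Mayo Clinic team will present next month.")
print(find_facilities(article))  # ['Mayo Clinic', 'Cleveland Clinic']
```

Regex matching on a fixed list is simple and predictable, but it misses abbreviations and spelling variants. That gap is one reason such pipelines typically pair it with NLP-based entity recognition.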

The product that I built, “Press Boost,” helps source articles with high engagement potential. First, it tracks internally trending articles based on engagement on Doximity (clicks, likes, comments). It then looks at externally trending research articles by ingesting the Altmetric score (how much and what type of attention a research output has received) and the Mendeley readership score (how researchers engage with research on Mendeley). Finally, all of these factors are weighted and combined into a score. The top-performing content is then redistributed to clients associated with, or interested in, the hospital systems mentioned in those articles.
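The final step, weighting the signals and combining them into a score, can be sketched in Python as follows. The weights, the min-max normalization, and the field names are assumptions for illustration only. The interview does not say how Press Boost actually weights its inputs.

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    internal_engagement: float  # e.g., clicks + likes + comments on the platform
    altmetric: float            # Altmetric attention score
    mendeley_readers: float     # Mendeley reader count

# Hypothetical weights; they sum to 1, so every combined score stays in [0, 1].
WEIGHTS = {"internal_engagement": 0.6, "altmetric": 0.25, "mendeley_readers": 0.15}

def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_articles(articles: list[Article]) -> list[tuple[str, float]]:
    """Normalize each signal across articles, combine with WEIGHTS, and sort descending."""
    scores = [0.0] * len(articles)
    for field, weight in WEIGHTS.items():
        normalized = min_max([getattr(a, field) for a in articles])
        for i, value in enumerate(normalized):
            scores[i] += weight * value
    return sorted(zip((a.title for a in articles), scores),
                  key=lambda pair: pair[1], reverse=True)

articles = [
    Article("A", internal_engagement=120, altmetric=15, mendeley_readers=40),
    Article("B", internal_engagement=300, altmetric=5, mendeley_readers=10),
    Article("C", internal_engagement=60, altmetric=80, mendeley_readers=90),
]
for title, score in rank_articles(articles):
    print(f"{title}: {score:.2f}")
```

Min-max scaling lets raw signals on very different scales (hundreds of clicks versus a handful of attention points) contribute comparably. A log transform or rank-based scaling would be equally reasonable choices.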

For interested readers, there is a beginner-friendly “Product Spotlight: Press Boost” presentation available on our website.

Being a leading woman in health IT data science, how do you overcome career obstacles? Have you encountered gender discrimination or sexism?

Dr. Bushra: I face the same challenge that every other woman in the technology world faces: the dichotomy of dual expectations. This is best explained by Dr. Deborah Gruenfeld, a social psychologist and professor at Stanford Business School. She describes these dual expectations as “playing high,” which means showing your authority, power, and influence, and “playing low,” which means being more approachable and likable. (She explains the concept in a video.) As leaders in the technology field, we are expected to play high, but as women, we are traditionally expected to play low. So, when we play high, we are deemed unlikable, and when we play low, we are considered incompetent! It’s a continuous balancing act. Patience, good judgment, and picking my battles wisely have been my friends on this journey.

Dr. Bushra Anjum speaking at a Computing Research Association (CRA) Advice session. Credit: www.bushraanjum.info

As far as gender discrimination and sexism are concerned, there is no denying that they exist. I do, however, have a particular approach to dealing with them. I once read a great quote by Deepak Chopra: “What you pay attention to grows. If your attention is attracted to negative situations and emotions, then they will grow in your awareness.” Hence, I don’t actively scan for discriminatory behavior, as that puts you too far into the fight-or-flight zone (thanks, amygdala!); however, if discrimination openly finds me, I fight against it with all my strength, courage, and prudence.

As a senior editor at ACM and the chair of the ACM Women (ACM-W) standing committee, what are some of the contributions you are most proud of?

Dr. Bushra: I am a keen advocate for diversity in the STEM fields, especially for encouraging women to be part of the evolving disciplines of computer science and data. I volunteer with the Association for Computing Machinery – Women (ACM-W), the Computing Research Association – Widening Participation (CRA-WP), Rewriting the Code, TechGirlz, and MentorNet, to name a few, as well as regional groups like Pakistani Women in Computing and WomenInTechPK. Some of the most rewarding experiences of my life have been as a volunteer.

I am a senior editor for Ubiquity, ACM’s peer-reviewed web-based magazine devoted to the future of computing and the people who are creating it. I was the first female member of our editorial board, and I started a new section, “Ubiquity: Innovation Leaders,” which features interviews with young professionals who discuss their concerns about the future of computing and their ambitions to shape that future through their leadership. As a result, I have been able to present several moving and compelling stories from diverse backgrounds and computing disciplines.

I am also the chair of the ACM-W Standing Committee. There, I had the opportunity to propose a new initiative, a web series called “ACM-W: Celebrating Technology Leaders.” The idea is to bring stories and advice from engaging speakers, women with diverse careers in computing, directly to our global audience.

Pakistan has a severe lack of peer-reviewed and general science magazines. What role do you think magazines like Scientia Pakistan can play in promoting science writing culture in Pakistan, especially among university students?

Dr. Bushra: I genuinely believe that Scientia, with its mission to re-shape the narrative of science journalism in Pakistan, is doing an excellent service not only for the Pakistani student community but for a global audience. 

Science is not an elitist club’s game, and good writing is not an outdated skill. Scientia is working to dispel both these misconceptions. Science is ubiquitous, working everywhere for everyone; hence, it should be accessible to everyone. One’s writing reflects one’s personality, and “unedited incoherent streams of consciousness riddled with cyberslang, shorthand, and emojis” is not an attractive personality type. Scientia enables, encourages, and brings quality scientific writing to the masses without compromising either accessibility or scientific merit. Kudos to the entire team!

I believe active partnerships with leading universities and software houses in the country, encouraging academicians and practitioners to contribute regularly, will increase the visibility and impact of the initiative.

How do you see the field of health data evolving?

Dr. Bushra: Data science, combined with advances in machine learning and artificial intelligence, has enormous potential for improving the health industry. It can improve the speed and accuracy of testing and diagnosis, advance health research and drug development, strengthen diverse public health interventions, and more, and its utility has never been more apparent than in the COVID-19 era. For example, over the last year and a half, data has been used extensively to:

  • Understand and predict the spread of the pandemic (using principles from network science, econometrics, applied microeconomics, etc.)
  • Create effective treatments, using algorithms capable of computationally generating, screening, and optimizing hundreds of millions of therapeutic antibodies (gene sequencing, computational biology, etc.)
  • Resume and maintain economic activities (epidemiological modeling, epidemic dynamics, social networks, time series analysis, agent-based and network simulation, complex systems, etc.)
  • Track the spatial distribution of COVID-19 (contact tracing, geo-visualization, spatial data science, digital biomarkers, remote sensing, etc.)
  • Understand the evolution of hate speech and misinformation (large-scale measurements, social media, game theory, etc.)

However, this value can only be realized if ethics, empathy, and civil liberties are at the core of algorithmic design, data modeling, deployment, and analytics usage. Two of the most significant issues in the data world are (1) unethical collection and use of patient data and (2) biased algorithms. The World Health Organization (WHO) recently released its report Ethics and Governance of Artificial Intelligence for Health (June 28, 2021), which identifies six principles to ensure that AI works to the public benefit of all countries. These principles are protecting human autonomy, providing informed consent, quality control, transparency of the algorithms, inclusiveness irrespective of age, gender, ethnicity, etc., and transparent continued monitoring during actual use. I firmly believe these are the six areas of future growth, research, and practice in the field of health data.

Do you think that there are significant opportunities for women data scientists in healthcare management?

Dr. Bushra: As I mentioned before, data science faces two significant challenges: unethical collection and use of data, and biased algorithms. A major source of bias in many datasets is that the people who collect, organize, and analyze the data do not represent the people who will actually use the technology. For example, according to Harnham’s Diversity Report for the US Data & Analytics industry 2020-2021, women hold only 18% of data science jobs, and the problem is likely worse in most lower-income countries. Data, in most cases, is like a Rorschach chart; people see their own values, interests, and experiences reflected in it. If we are not careful, this opens the door to bias at every stage of the data value chain (sourced from Open Data Watch).


The underrepresentation of women, or of any demographic group, in data science increases the chance that data-driven decisions and products will not represent their interests or, in extreme cases, may harm them. I highly recommend Caroline Criado Perez’s award-winning book “Invisible Women: Exposing Data Bias in a World Designed for Men,” which discusses the issue in detail. One of the best ways to mitigate bias is to make sure the data team brings diverse experiences and perspectives to begin with. There is a global business realization that interpreting causal relationships and correlations in large data sets requires subtlety, and women bring a different intuition to the table. The field of data science is exploding with opportunity, with its growth fueled further by COVID-19. So yes, not only do women have a lot of scope in data science, but this may be one of the best times to enter the field.

What advice do you have for future data scientists, especially women? 

Dr. Bushra: My advice is broad enough that any young man or woman in the STEM fields may benefit from it, should they agree with my point of view, of course.

We are on the brink of the fourth industrial revolution, powered by a fusion of technologies that are quickly blurring the lines between the real and the virtual, the physical and the digital. We need to guide and inspire a tech workforce ready for this unprecedented, disruptive future, where quick obsolescence may be the biggest threat and remaining relevant the biggest struggle. The most important training in this regard is to help future STEM professionals develop a generalist mindset. Rather than being tied to or specialized in a particular language, framework, or solution, generalists have a basic working knowledge of multiple domains, principles, and technologies. This helps them remain relevant across a variety of engineering jobs and projects. Moreover, they know “how to learn” and thus can quickly come up to speed and adapt to the technical preferences and constraints of a given project.

Second, and this is especially for women readers: you don’t have to negate parts of your personality to be perceived as competent. I have shared this advice before, but I believe it cannot be repeated enough. For example, have you heard (admittedly well-meaning) statements like, “Sure, humility is a good value, BUT it’s time to set it aside and work on self-branding”? It doesn’t have to be one or the other. You can be humble AND work on self-branding. You can be polite, flexible, and accommodating AND not let people take advantage of you. You can use “sorry” in your conversations and “just” in your emails if they are part of your language of politeness, AND make sure you are taken seriously. Whatever comes naturally to you, whatever feels authentic, is OK to hold on to while still evolving into a better self. If you want to change and leave some personality traits behind, that is fine, too, as long as you don’t feel obligated to do so.

Dr. Bushra can be reached on Twitter @DrBushraAnjum or via her website https://www.bushraanjum.info/


Every Data has a Story: Visualizing an Idea beyond Data


“Data is the new gold.” We are all familiar with this phrase. Data can change how corporate giants do business and how sports professionals plan their strategies, and it can even win you elections. Data offers a reliable way to present and test hypotheses and abstract ideas so that they can be accepted as well-supported information, and it is a strong medium for backing claims about almost any everyday world affair. But to make the most of data, we need to extract ideas from it, present them attractively, and use them to tell a story with impact. This is the realm of data communication.

THE VISUAL STORY

Visualizing data means serving it in a beautiful, comprehensible, and compelling form. We tend to remember information presented visually better than information presented in other forms. Data also needs to be represented so that the end user can extract the idea or message from it through careful observation. Data storytelling, or data communication, is the ability to convey an idea to a large (and targeted) audience with a simple, clear message. The story behind the data must have a context, which broadly refers to why the story is being told and for whom it is meant.

The need for a story arises when we want to showcase an idea hidden in a data set, where a data set is obtained from discrete or continuous record-keeping of an event or activity. We analyze the data, distill it until we observe a pattern, find the reason behind that pattern, and draw conclusions. This can help us make decisions, maximize profit, and minimize losses. For example, the following dashboard maintained by the World Health Organization keeps us informed about the global spread of COVID-19 cases in real time. The darker areas in this visual are the most affected, while the lighter-colored areas are relatively less affected. This visualization tells the story of how massive the pandemic has become. Similarly, giving a data set a visual form can help you tell its story.

WHO Coronavirus database
Image source: https://covid19.who.int
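The “darker means more affected” encoding used in such maps is a sequential color scale. Below is a minimal Python sketch of how a value can be mapped to a shade. It assumes linear interpolation between a light and a dark color. The colors, the helper names, and the case counts are made up for illustration; the WHO dashboard’s actual scale is not described here.

```python
def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b

def shade(value: float, vmin: float, vmax: float,
          light: str = "#fee8c8", dark: str = "#b30000") -> str:
    """Map value onto a linear light-to-dark color scale; out-of-range values are clamped."""
    t = 1.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # 0 -> lightest shade, 1 -> darkest shade
    (lr, lg, lb), (dr, dg, db) = hex_to_rgb(light), hex_to_rgb(dark)
    r = round(lr + t * (dr - lr))
    g = round(lg + t * (dg - lg))
    b = round(lb + t * (db - lb))
    return f"#{r:02x}{g:02x}{b:02x}"

# Hypothetical case counts, not real WHO figures.
cases = {"Region A": 1_200, "Region B": 45_000, "Region C": 300_000}
lo, hi = min(cases.values()), max(cases.values())
for region, count in cases.items():
    print(region, count, shade(count, lo, hi))
```

Case counts are heavily skewed, so real maps often bin values into classes or use a logarithmic scale. Passing `math.log10` of each count (and of the bounds) to `shade` would be one simple way to do that.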

TYPES OF DATA VISUALIZATIONS

Data visualizations can be of two types: exploratory and explanatory. Exploratory visuals help us find the fundamental reasons and logic that lie behind the data, answering basic questions such as what is happening and why. They start from contextual hypotheses or presumptions about a possible outcome. For example, how is the literacy rate of a state related to its standard of living? Why are some mineral-rich countries among the poorest in the world? Explanatory visuals are used to communicate a specific aspect of data that is already established. If the data is communicated in a comprehensive and empowering way, the audience can draw insights, identify correlations, recognize trends, and ultimately form their own fact-based story.
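An exploratory question such as the literacy-rate example is often checked numerically alongside a scatter plot. The Python sketch below computes the Pearson correlation coefficient using only the standard library. The figures for the five states are invented for illustration and are not real statistics.

```python
from math import sqrt
from statistics import fmean

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r of two equal-length samples (-1 <= r <= 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)

# Invented figures for five hypothetical states: literacy rate (%) and a
# standard-of-living index on a 0-1 scale.
literacy = [55, 62, 70, 78, 90]
living_index = [0.48, 0.52, 0.61, 0.66, 0.79]
print(f"r = {pearson(literacy, living_index):.2f}")
```

A value near +1 suggests the two measures rise together. Correlation alone does not explain why, though; that is exactly the question an exploratory visual and further analysis are meant to probe.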


DATA VISUALIZATION TOOLS

Data visualization tools provide an easier way to create visual representations of large data sets. These visualizations can be used for various purposes: dashboards, annual reports, sales and marketing materials, investor slide decks, and virtually anywhere else information needs to be interpreted quickly. Using visualized data to interpret what is happening in an organization and to guide its decisions is part of what is called business intelligence (BI). For example, an enterprise could use data visualization to discover which areas of the business are holding it back.

A data visualization tool makes work easier, especially if you’re dealing with big data. However, there are numerous data visualization tools on the market today, so it helps to know what to look for. A good tool has the following qualities. The first is ease of use. Several complex tools are available for representing data in a visually appealing way. Some have excellent tutorials designed in ways that feel intuitive to the user, while others lack this convenience, which rules them out of any list of “best” tools regardless of their other capabilities. A good tool can also handle huge data sets; in fact, the very best can handle multiple data sets in a single visualization. It can also output an array of different chart, graph, and map types, and most tools can output both static images and interactive graphs. There are exceptions to the variety-of-output criterion, though: some data visualization tools focus on a specific type of chart or map and do it very well, and those tools also have a place among the “best” tools out there. Finally, there are cost considerations. A higher price tag doesn’t necessarily disqualify a tool, but it must be justified by better support, better features, and better overall value.

There are mainly three types of data visualization software: open-source, free, and proprietary. Proprietary software is fully paid for and is accessed via the cloud or installed on a standalone client-server setup.

Microsoft Power BI is one of the most popular data visualization tools, largely because of its ease of use. It offers visual-based discovery, augmented analytics, data preparation, and interactive dashboards, along with capabilities such as natural-language queries and custom visuals. It can connect to both cloud-based and on-premises data sources, such as Google Analytics. However, these features come at the cost of a paid subscription from Microsoft. Nevertheless, the features it offers arguably justify the price.

But if you don’t want to spend a penny on BI tools, go with Google Data Studio. Data Studio is Google’s data visualization tool, designed for ease of use. It is free and lets you connect data sets from across the Google ecosystem, which makes it a good fit for enterprises that wish to integrate their Google data quickly. It may not suit businesses that need a high-functioning data visualization tool, as it may lack advanced formatting and visualization options. It turns information into informative, fully customizable dashboards and reports that are easy to read, present, and share. Its best feature is the range of integrations available for creating a data source; you can even use your YouTube channel data to create a dataset. If you want to learn the basics of data visualization, try your hand at Data Studio. Below is a visualization made in Google Data Studio showing how research expenditure differs across countries over a period of time.

How much countries spend on research and development
Image Source: https://datastudio.google.com/gallery

Tableau Public is another free data visualization tool that lets creators get used to Tableau easily with little investment. It provides a platform for sharing data visualizations and insights, and it can also serve as a way to try Tableau before committing to the paid Desktop and Server applications. Tableau Public is a good option for users who do not need vast features and want software that is easy to use and affordable. On the downside, the visualizations and data published with Tableau Public are public and can be accessed by anyone; hence, they are not secure.

Choosing the right data visualization tool can seem tricky, but the process is fairly straightforward. Start by analyzing the features each tool offers, such as language support and cross-browser support. Whichever tool you pick, make sure it meets your needs.

WHY TELL A STORY?

The art of data visualization can involve using geometric shapes, colors, graphs, and other elements to represent your data visually, often through an interactive interface. In visualization, your responsibility is to showcase the data in the best possible way; it is then up to the audience (the end user) to make the most of it. When it comes to storytelling, however, the responsibility is on your shoulders: you are responsible for what the user takes away from your representation of the data. It’s much like the art of communicating expressions. Whatever you deliver, the audience must have a takeaway from it, and that is your responsibility. You have to prepare beforehand and think about the outcome before you start creating the story. If you are the one telling it, it is your job to make sure people get it.

Almost everything can be represented as data, and nearly all data can be expressed visually. Data, when represented in a visually suitable way, always has a story to tell. And a story based on authentic sources is the best way to make an impact and, ultimately, bring about change. Happy communicating!

