Features

Digital Revolution

Big data and artificial intelligence (AI) are not just driving high-tech industries; they are transforming the humanities and social sciences too

The application of big data is powerful, but new challenges such as data security and privacy emerge.

Tsui believes that the digital humanities requires a new mindset to go alongside the new technologies.

Cheng believes that with the rise of big data there are many new opportunities available for social science scholars.

Blockchain may become the backbone system for the media to store and exchange information in a few years' time.

Zhu has moved from studying traditional media to utilising modern computational tools and big data.

The amount of data that is required to understand human language is vast, according to Lee, a computational linguist.

Scholars must remain vigilant and treat the new wealth of information, and its sources, with scepticism if they are to maintain academic standards.

In the 21st century, our everyday social activities are routinely influenced by widespread systems of digital computer technology. This is often referred to as the Fourth Industrial Revolution, in which the increased use of AI, smart technologies and robotics is reshaping our lives.

Higher education is no exception to this transformation. The data that can be derived from modern information systems is continuous and vast. It can be tremendously useful in research studies, but scholars working in the humanities and social sciences must, by necessity, critically examine the new data sources available and the methodologies used to collect data if they are to protect the academic rigour of their work.

A New Academic Learning Environment
With the world changing at an increasingly fast pace, this is no easy feat. Data used to come purely in analogue forms such as printed texts, images and video recordings. Digitisation processes have converted this material into digital form and uploaded it onto the web, providing ample material for digital humanities research.

“Simply speaking, the ‘digital humanities’ involves the careful combination of two things: humanistic study and computational power,” explains Dr TSUI Lik-hang, Assistant Professor at CityU’s Department of Chinese and History, and convenor of the Digital Society research cluster at CityU’s College of Liberal Arts and Social Sciences (CLASS).

Tsui has previously stated that the digital humanities is a way to ask, redefine and answer questions with a more intelligent set of tools. “These ‘tools’ do not only include digital technologies, but also our own renewed and adapted mindset and questions based on the new digital environment that is shaping so much of our academic inquiry,” he adds.

The emergence of AI has taken data collection to the next level. “About 10 years ago, some scientists at Microsoft had already suggested that the development of AI would signify the fourth paradigm shift in scientific inquiry. This means data-intensive scientific discovery is going to fundamentally change the way we approach a lot of social, political and scientific phenomena,” says Associate Professor Dr Edmund CHENG of CityU’s Department of Public Policy. “Also, much research that could not really be done in the past can now be carried out, and any new findings will allow us to revisit extant theories or conclusions.”

As a political scientist, Cheng asserts that the use of big data fosters interdisciplinary collaboration. “For example, I am working with my colleagues in the School of Law to better map public sentiment onto court judgements. These are things that lawyers in the past had no access to or could not do.”

Big Data, Big Challenges
However, the increased processing power and application of AI and big data in the public sphere are creating new issues. “You have power over your own data but you also have a responsibility associated with it,” Cheng explains, before noting obvious concerns regarding data security and online storage via cloud computing. “You can see that many governments are actually enacting new regulations to make sure that these challenges are somehow met.”

Another challenge is that “big humanities data” often differs from data in other fields, according to Tsui. Its scale is much larger than the amount of material that humanities scholars traditionally focus on, even if humanities data in digital form is still much smaller than the data harnessed in certain other fields. However, the relatively manageable size of the digital data does not mean it is easier to analyse. Humanities data is highly uneven and extremely specific. Hence, attempting to create, organise or even “decipher” such data without adequate humanities training carries potentially serious pitfalls.

Bridging this skill gap is imperative for the current generation of students, whatever their discipline. To help, Cheng tries to incorporate as many new technologies into his classes as possible. “When you have many technological advances, you need to help the students understand all this research, and you can help them with tools, techniques, and measurement standards.”


The “digital humanities” involves the careful combination of two things: humanistic study and computational power

Dr Tsui Lik-hang

Despite the difficulties of working with so much information, Cheng is aware of the new opportunities simultaneously being created. “Big data allows researchers from different parts of the world to tap into digital information from all over without having to travel to particular areas, thus lowering costs,” he points out. “I also see an opportunity for unleashing organisational infrastructure and capability. We have some cross-departmental collaborations with colleagues who are developing data repositories. These joint efforts are expected to lead to the establishment of new labs on technology and policy, such as computational social science labs, in the foreseeable future.”

Another difficulty Cheng highlights is the difference in digital literacy among the various disciplines. New technologies and techniques are often received with scepticism or even hostility until the new methods and measurements are proven to be more effective and efficient. “There are always traditional qualitative researchers who tend to suggest you just enter the field to understand, acquire knowledge and draw contexts accurately. Also, different schools have a benchmark among their peers, which creates challenges in using new methods.”


Data-intensive computing is going to fundamentally change the way we approach a lot of social, political and scientific phenomena

Dr Edmund Cheng

Leading the Data Revolution
Though implementing AI and big data in the social sciences and humanities is a relatively new trend, accompanied by a healthy dose of the aforementioned scepticism, one area in which big data has always existed is audience research for the media industry.

“Most people may not know that TV ratings researchers are among the earliest users of big data,” says Professor Jonathan ZHU, Chair Professor of CityU’s Department of Media and Communication and School of Data Science. “TV ratings research, dating back to the 1950s, developed a rich set of concepts and metrics to process continuous audience behaviour, which remain the template for today’s Internet industry to follow for user analytics.”
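To make these metrics concrete, the small sketch below (in Python, with hypothetical figures rather than real audience data) computes two classic measures that this ratings tradition established: a programme’s “rating” (its audience as a share of all TV households) and its “share” (its audience as a share of households actually watching TV at the time):

    # Classic TV ratings metrics; the figures are hypothetical.
    total_tv_households = 1_000_000  # all households with a TV set
    households_using_tv = 600_000    # sets in use during the time slot
    programme_audience = 150_000     # households tuned to the programme

    rating = programme_audience / total_tv_households * 100
    share = programme_audience / households_using_tv * 100

    print(f"Rating: {rating:.1f}%, Share: {share:.1f}%")
    # -> Rating: 15.0%, Share: 25.0%

The same audience-per-population logic underlies today’s Internet user analytics, where page views and active users take the place of tuned-in households.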

The media has always been an early adopter of new technologies. The newspaper industry of the 19th century was among the first to use the telegraph, and radio and television broadcasters were among the first to use radio waves and satellites.

“AI will become essential to the new media because all online media and social media use AI for daily operations. However, the media has not been ready to use all the latest technologies – blockchain technology, for example. Yet I believe a few years down the road blockchain could become the backbone system for the media to store and exchange information.”

Zhu is a social scientist by training, specialising in media and communication. “In the old days, we studied traditional media such as newspapers and television, among others,” he recalls. “The media industry has largely moved online now. We follow the industry, with much of our teaching and our own research migrated to cyberspace.”

There are two approaches to the social sciences: the quantitative and the qualitative. Zhu himself trained in the quantitative tradition, where research was done with sample surveys and lab experiments. But even in those days, he and his team made full use of big data such as TV ratings records.


Most people don’t know that TV ratings researchers are among the earliest users of big data, whose work has remained a major intellectual source for today’s Internet user analytics

Professor Jonathan Zhu

As a result, the transition towards using big data and modern computational tools has been a natural one for him. “The only difference between what we did in the past and what we are doing now is the size of the data and its accessibility,” Zhu explains. “In the past, the data was largely collected and owned by media or marketing research companies. Now most data has become more publicly accessible.”

Not only have these changes altered the way researchers work, but they have also affected the way scholars train their students. It is hugely important, insists Zhu, to develop computationally intensive courses. His Department, for instance, has increased the time spent training students on how to use the latest AI and machine learning methods, and makes good use of the large-scale data that is, so far, freely available to them.

This pivot has been observed not only by teachers but also by students. “Even before they come to our programme, students know that the market is changing its requirements for the future labour force. The students want to come here to study the latest technologies and methods,” says Zhu. That is why the Department has continuously expanded its computational staff and course offerings.

Speak My Language
AI is not only used to gauge audience sentiment; it also helps in perfecting applications and their analytical capabilities. Dr John LEE, Associate Professor at CityU’s Department of Linguistics and Translation, focuses on using AI and big data to help computer programmes understand human language.

Lee’s background in computer science led him to a doctoral degree at MIT, where his PhD research focused on automatic grammatical error correction. In a nutshell, this allows a computer to automatically provide feedback on and correct errors in the text you write. It is a subfield of computational linguistics.

The traditional methodology in computational linguistics is to conduct deep analysis on a small set of examples. To write a programme that can correct grammatical errors, one might consult grammar books, distil the grammatical rules they contain, and implement those rules in the system.

“This traditional practice can be very accurate, but it tends to only cover a relatively small set of errors,” remarks Lee. “Can you imagine writing a whole bunch of rules to recognise all errors? You’d have to work very hard to come up with all these rules and even then they might only cover a small percentage of all the possible problems in a text.”
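A minimal sketch of the rule-based approach Lee describes appears below. The rules are invented illustrations, not taken from his system; a real checker would encode hundreds of such hand-written patterns:

    import re

    # Hand-written correction rules: a pattern plus a suggested fix.
    # These two rules are illustrative only.
    RULES = [
        # Subject-verb agreement: "he/she/it do" -> "does"
        (re.compile(r"\b(he|she|it) do\b", re.IGNORECASE), r"\1 does"),
        # Article choice: "a" before a vowel -> "an" (a crude approximation)
        (re.compile(r"\ba ([aeiou]\w+)\b", re.IGNORECASE), r"an \1"),
    ]

    def correct(text: str) -> str:
        """Apply each hand-written rule in turn and return the result."""
        for pattern, replacement in RULES:
            text = pattern.sub(replacement, text)
        return text

    print(correct("She do not want a apple."))
    # -> "She does not want an apple."

Every additional error type needs another hand-crafted rule, which is precisely the coverage problem Lee points out.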


AI in the humanities and social sciences is as challenging and as sophisticated as in other disciplines. How to make a chatbot automatically generate an appropriate reply is as difficult as anything else

Dr John Lee

In contrast, Lee explains, the more recent approach is to use statistical methods trained on a large number of text samples. The advantage of this approach is broader coverage. The downside is that the samples are “noisy” – not every passage you read online, for example, is written in good English. Even if the majority of your samples are good, there is still the risk of running into lower-quality ones.
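As an illustration of this statistical alternative, the toy sketch below learns word-pair (bigram) counts from a tiny sample corpus and flags pairs it has never seen as candidate errors. The corpus is invented for illustration; real systems train on vastly larger collections and use far more sophisticated models:

    from collections import Counter
    from itertools import pairwise  # Python 3.10+

    # A tiny training corpus standing in for millions of text samples.
    corpus = [
        "she does not want an apple",
        "he does not like the rain",
        "they do not want an answer",
    ]

    # Count every adjacent word pair observed in the training text.
    bigrams = Counter(
        pair for sentence in corpus for pair in pairwise(sentence.split())
    )

    def flag_unseen(sentence: str) -> list[tuple[str, str]]:
        """Return word pairs never observed in training: candidate errors."""
        return [p for p in pairwise(sentence.split()) if bigrams[p] == 0]

    print(flag_unseen("she do not want an apple"))
    # -> [('she', 'do')] : an unseen pair, hence a candidate error

Coverage grows with the training data, but so does the “noise” problem: if the corpus itself contains bad English, genuine errors start to look acceptable.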

The amount of data required to make something like this work for a wide variety of texts is immense. Add to that the fact that the Chinese language differs significantly from, for example, English, and any algorithm carried over from one language to the other will need to be adjusted, a source of considerable work.

“In the Chinese language there is no spacing between characters to indicate where a word begins or where a word ends,” explains Lee. “So there is the extra step of recognising what is a word.” This echoes Tsui’s point that big humanities data is not easy to analyse, regardless of how much digital information is available.
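Word segmentation is therefore a standard first step for processing Chinese. As a simple illustration (using jieba, a widely used open-source segmenter, not a tool from Lee’s own research):

    import jieba  # pip install jieba

    sentence = "我来到北京清华大学"  # "I came to Tsinghua University in Beijing"
    words = jieba.lcut(sentence)    # lcut returns the segmented words as a list
    print("/".join(words))
    # Typical output: 我/来到/北京/清华大学

Only after this step can error-detection techniques like those described above be applied to Chinese text.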

Lee’s latest project is the development of a Cantonese counselling chatbot app. This is an AI-enabled automated chat system that Lee hopes will help enhance students’ mental well-being. “AI in the humanities and social sciences is as challenging and as sophisticated as in other disciplines,” says Lee. “How to make a chatbot automatically generate an appropriate reply is as difficult as anything else.”

Machine learning is a natural next step in computational linguistics. And while AI can contribute to the development of this field, linguistics will also be able to help AI develop. As AI focuses on applying statistical approaches to big data, the complexity of our ever-evolving human languages presents new challenges for computers that function by identifying logic and patterns. Further developing computational linguistics in higher education institutions will, in turn, help advance AI and its near-limitless potential.

The Future is Now
The implementation of AI and big data is moving into the mainstream and reshaping the status quo in research and education. To process digital data for the humanities, according to Tsui, a renewed mindset is key to tackling the new digital environment now pervasive in academia.

As such, the ways in which students are instructed are changing to prepare them adequately for an evolving job market, and the existing labour force must also “upskill” to keep pace. Zhu’s Department created a new stream on media data analytics five years ago to equip students with the knowledge and skills to collect, process and visualise big data from online sources. Graduates from the stream have landed good jobs locally and overseas.

As a result of these technological advances, universities and research centres are entering an era of deeper multidisciplinary cooperation between departments, as exemplified by Cheng, whose public sentiment research has found collaborators in the School of Law. The crunching of huge amounts of data not only helps analyse trends but in turn helps develop the very AI that computes them. Lee’s linguistic programmes are at the forefront of the logic and pattern identification that is core to developing AI.

Whatever scepticism divides those who favour traditional qualitative research from those who embrace innovative, data-rich tools, the application of new digital technologies in the humanities and social sciences is inevitable. Indeed, it has already begun, facilitating exciting opportunities that will usher academia into a new age.