In this episode of the ACTNext Navigator Podcast, we discuss natural language processing (NLP), GPT-2 (Generative Pre-trained Transformer 2, a transformer-based language model), BERT (Bidirectional Encoder Representations from Transformers), and how they’re being used for automatic content generation at ACT.

Our guest is Yuchi Huang, a Senior Manager of the Artificial Intelligence and Machine Learning team at ACTNext. Yuchi takes us through some of the pitfalls and practices of using machine learning for text content analysis and generation.

The views and opinions expressed in this podcast are those of the authors only and do not necessarily reflect the official policy or position of ACT, Inc.

Podcast transcript:

[Adam Burke] Today we’re joined by Yuchi Huang, who’s out in California. We’re going to talk about NLP, or natural language processing. Welcome to the show, Yuchi. Would you introduce yourself?

[Yuchi Huang] Thank you, Adam. My name is Yuchi Huang. I’m a senior manager at ACTNext, the research, development, and business innovation division of ACT. At ACTNext I lead the automated content generation team within the AI and machine learning group, and I spearhead research and development on the use of machine learning and AI in various educational applications. I focus especially on research in automated content generation for learning and educational assessment, along with the related business capture and proposal development, and I also serve as PI on different research programs.

[AB] You’re going to talk about automatic content generation, text-based, so is that NLP?

[YH] Yeah, a large portion of that work is based on NLP, which stands for natural language processing. Natural language refers to the languages we as humans speak, such as English, Chinese, or French.

It is not a computer programming language like Python or Java. NLP focuses on how we can program computers to process large amounts of natural language data, such as news or ordinary conversations, so that machines can handle certain tasks efficiently and productively. Processing and understanding natural language is relatively simple for us humans, but for computers it is really difficult and involves many complex algorithms. To date, scientists have made only limited progress in some specific applications, and little progress has been made on the problem of true language understanding, so current research is not enough for a machine to use language to reason.

Let me give some major examples of NLP tasks in the real world. For instance, automatic summarization, which is the process of computationally shortening text to create a summary that represents the most important or relevant information in the original article. Another example would be machine translation, which automatically converts source text in one language into text in another language; right now this application is quite successful, as we have seen in Google Translate and other automatic translators on other platforms. Speech recognition is also a traditional application of NLP, which enables computers to recognize spoken language and transcribe it into text. The last one I would like to mention is newer but very interesting and now widely used by different businesses: sentiment analysis of language data, which determines the overall attitude, for example positive, neutral, or negative, contained in a piece of text.
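
To make that last example concrete, here is a minimal sketch of sentiment analysis with an off-the-shelf pretrained classifier. The Hugging Face transformers library, the default model it loads, and the sample sentences are illustrative assumptions, not a description of the tools ACT actually uses.

```python
# Minimal sentiment-analysis sketch (assumes the Hugging Face transformers library).
from transformers import pipeline

# Load a pretrained sentiment classifier (downloads a default model on first use).
classifier = pipeline("sentiment-analysis")

texts = [
    "I really enjoyed this reading passage.",
    "The instructions were confusing and frustrating.",
]

# Each result contains a label (POSITIVE/NEGATIVE) and a confidence score.
for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {text}")
```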

I have to say that just several years ago the field of NLP was very difficult. The core idea of NLP is really about understanding and reasoning, which is not the strength of machine learning. Machine learning has been more successful in computer vision and speech processing because problems in those areas are more about sensing, about knowing where, when, and what, but NLP is quite difficult because it involves the inherent logical relationships of human language. Luckily, with the development of deep learning technology we now have many powerful machine learning models to help with understanding and representing natural language.

[AB] Okay, I’ll jump in here. I think, probably like me, many people think of natural language processing as speech recognition or Google Translate, where it is only inputting and deciphering text, and not many people probably think of automatic content generation or even summarization as being NLP. How separate are these things? One is an input, maybe, and one is kind of an output of NLP. And how is that different from, what did you say, content analysis?

[YH] So content analysis and other applications like speech recognition or machine translation operate on existing text. Those areas are very important because we have a very large amount of text and speech data that needs to be processed, and that is a very large portion of the field. But another very big area of NLP is content generation, or text generation, because in a lot of areas we need to create a lot of new content for people to review and enjoy, so this is also a very important area.

[AB] In terms of NLP, I’ve seen bots that can generate kind of nonsense inspirational messages, and there are different bots that rework phrases or texts. In my mind they’re always entertaining, but there are actual applications where we use content generation, and the one I know of is for assessment: creating passages so that you can create tests and questions and answers and so on.

[YH] Yeah that’s the main one.

[AB] Is that what you’re working on in terms of assessment, creating content for test questions?

[YH] Yes, my team and I mainly work on automated content generation. Here the content in the term “automated content generation” could be any kind of educational material that will be used for learning or assessment. It’s not only text. It includes text content such as items, passages, handouts, and computer code; it also includes multimodal content such as audio or video lectures and cartoons; and it even includes interactive content such as computer games or virtual reality. So it can be very diverse. Furthermore, the word “automated” does not mean that human work becomes trivial. In fact, what we hope is to free people from some tedious work so they can create more content for students, faster and better, and that’s why interactive collaboration between people and software is a very important part of our work. Among all those different types of content generation, what we are focusing on right now is automatic passage generation and item generation, and we are also very actively exploring methods for multimodal content generation, especially for videos and graphics.

[AB] Are you doing expression recognition, so you can tell someone is smiling and happy and match that to their speech?

[YH] Yeah, expression recognition is actually widely used in our work. It is combined with text information; as I mentioned above, that is called sentiment analysis, and it is based on facial expressions and other information, for example chat contents, that we can obtain in real time. Based on all this information we can decide whether the person in front of the camera is happy or unhappy, engaged or not engaged. All of this information is very important for judging learning efficiency and effectiveness on our platform, so this kind of technology is being used more and more widely in our different tools.

[AB] You’re going to talk about some of the specifics, some of the models such as GPT-2, and you have to bear with me, I don’t really know anything about it, so start from a very basic level and explain GPT-2-based text generation.

[YH] It’s a quite theoretical research model in machine learning. Literally, GPT stands for Generative Pre-trained Transformer. If I use technical terms to describe it, it is a language generation model built upon transformer decoder blocks; a transformer is a novel neural network module in which a technique called attention is introduced. GPT-2 is the second generation of the Generative Pre-trained Transformer, so at its core it is a kind of transformer model. The attention mechanism I mentioned here may be difficult to understand. It means that the relevant information in each part of an article can be rediscovered to a greater extent, so we can model language in computers. In a sense, our human brains do the same thing: when we read and write, we are constantly looking back for contextually relevant information in the text, and this can also be understood as attention.

In the Generative Pre-trained Transformer model, this kind of algorithm is applied repeatedly to get a better understanding of the context.

In layman’s terms, the GPT-2 model is a language model that predicts the next word, or the next sequence of words, given some input text, so the language generated by this model is very coherent and very close to human language, as we have seen in different experimental validations. If you use poems to train this model, it can produce poems, and if you use songs to train it, it can be used to compose and write lyrics. So it’s very promising and very interesting, but it also has some problems, if you would like to know about those.
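
As a concrete illustration of that next-word prediction, here is a minimal sketch of generating a text continuation with GPT-2. The Hugging Face transformers library and the sampling settings are illustrative assumptions; any GPT-2 implementation works on the same principle.

```python
# Minimal GPT-2 text-generation sketch (assumes the Hugging Face transformers library).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "I'm going to write"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation; GPT-2 repeatedly predicts the next token given the context so far.
output_ids = model.generate(
    **inputs,
    max_length=30,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```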

[AB] And is it related to the predictive text when I’m typing into my phone, where it’s predicting the next word that I’ll use? Is it similar to that?

[YH] Yeah, it can be used to predict what you want to input next in context. For example, when you type “I’m going to write,” a menu may pop up to show you options like “email” or “reports” or something like that. Basically, all of that information is presented to you based on the previous training data. That’s a very obvious use of GPT-2, but it can also be used in more complex situations like legal document writing or narrative story writing.

[AB] So if you trained this GPT-2 model on legal documents, then you would use that; you couldn’t use a poem-trained one to write a legal document, or vice versa. You’d want to train it with legal contracts and things like that, right? Is that how it works?

[YH] Yeah, you have to train a particular model on a specific task, and you need a lot of training data to feed into the system. So if you are going to apply the model to a specific domain, you should have a lot of training sources from that domain to get a better result.

[AB] And, I’m just thinking out loud here, if you’re going to train it to write legal documents, would you train it first on just language in general, or would you go right to legal documents and that’s all you would train it on? You know what I mean? Would there be a base, or do you want it to be good at legal so you just train it with legal? I don’t know.

[YH] This is actually a very good question. Just as in the domain of computer vision, what we do is first train the model on a very large corpus of language material, for example Wikipedia or all kinds of text we can gather from the internet, from one web page to another. We use all of this to train the model, so that in a way the model becomes like a person who has all kinds of general knowledge. After that, we take a training corpus from a specific domain and train on it further, which we call fine-tuning. It’s like teaching a student: at first we teach the student knowledge from all kinds of different domains, and after that, to train him or her to become a domain specialist, we start to teach the student very specific domain knowledge. It’s that kind of process.
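
A minimal sketch of that pretrain-then-fine-tune idea is shown below: a GPT-2 checkpoint already trained on general web text gets a few extra gradient steps on a small domain corpus. The tooling, the tiny made-up legal sentences, and the hyperparameters are illustrative assumptions, not ACTNext’s actual pipeline.

```python
# Fine-tuning sketch: continue training a general GPT-2 on a small domain corpus.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # general-purpose pretrained model
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical domain corpus (legal-style sentences, made up for illustration).
domain_corpus = [
    "This agreement is entered into by and between the parties named below.",
    "The licensee shall not disclose confidential information to third parties.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

# A few extra gradient steps on domain text shift the model's vocabulary and style
# toward that domain without retraining from scratch.
for epoch in range(3):
    for text in domain_corpus:
        batch = tokenizer(text, return_tensors="pt")
        # For language modeling, the labels are the input tokens themselves.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```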

[AB] You said you were gonna go into some of the drawbacks or challenges?

[YH] Right now GPT-2 has a lot of problems. The major problem is that although the content it generates is coherent and looks like human-created text, that content may lack factual support and contain a lot of wrong information, even though the sentences seem true to us. But if you do some digging, you find out they are actually wrong. For example, I previously saw a sentence created by GPT-2 saying that an Iranian president with such-and-such a name had attended a conference, and when I went back to search for that name, I found that no Iranian president in history has had it. So we cannot use this kind of information directly in our testing or in other articles that will be published publicly. We should use it very carefully, especially for scientific articles or articles that must be grounded in truth, like history; we need to check the output very carefully. Right now we can use it to create narratives, to tell stories, where we don’t need facts to support them; that’s entirely okay. But in every case you can’t just trust it to create a passage or a summarization. Humans have to come back and check it.

[AB] Can we talk about Sphinx? How are we using GPT-2 in Sphinx and other programs?

[YH] Right now we do use GPT-2 in Sphinx to generate new sentences based on the part of a draft that has already been composed. Going forward, we may also use GPT-2 and other technologies to perform text style transformation. Style transformation means that if we generate a piece of an article, we would like to transform it into, for example, so-called ACT style, because when we use an article in testing or learning we have format requirements to follow, for example how many words in the whole article, how many paragraphs, and the word difficulty for different grades of students. To do that kind of transformation, we will do some internal development and research. We can also use GPT-2 in some simpler applications, for example to recommend words or phrases to complete a sentence, like the example you mentioned before, where the writer is typing something and words or phrases pop up automatically. It can also be used to evaluate whether a written sequence of text is coherent, because it is a language model, a statistical model: if you put in a sequence, it can tell you how probable the model thinks that sequence is, and therefore whether it is coherent. So it is also a very good tool for deciding whether a piece of text written by a student is coherent. We have a lot of applications built on GPT-2.
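
As an illustration of that last point, here is a minimal sketch of scoring coherence with GPT-2 by computing perplexity: a language model assigns a probability to a sequence, and lower perplexity loosely suggests more fluent, coherent text. The library, the example sentences, and the use of raw perplexity as the score are illustrative assumptions, not the Sphinx implementation.

```python
# Coherence-scoring sketch: perplexity of a text under GPT-2 (assumes transformers + torch).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average negative log-likelihood of the text under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean cross-entropy per token
    return torch.exp(loss).item()

print(perplexity("The student read the passage and answered every question."))
print(perplexity("Question passage the answered read every and student the."))  # scrambled, higher perplexity
```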

[AB] What strikes me is that we’re using it both for generating text and, you’re saying, for analyzing sentences.

[YH] Yeah, the same GPT-2 models.

[AB] What else do you use? What other techniques or NLP practices can we talk about aside from GPT-2?

[YH] We also use other deep learning models, because deep learning has been very hot in recent years, to extract features for various applications. For example, we use another model called BERT. BERT stands for Bidirectional Encoder Representations from Transformers, so it is another model derived from transformers; as I mentioned before, the transformer is a deep learning model. The difference between BERT and GPT-2 is that GPT-2 is a forward-moving network: you input something and it outputs something, always generating text after what you have input. BERT is different; it is a kind of masked language model. During training it removes some words from the sentences, and the training process tries to predict those words and checks whether the predictions align with the original words. After training it can predict very good words to fill the gaps, and at the same time, through this process, we can get very good features from the corpus we train on. In that way, the features of a sentence or a piece of a sequence can be used for, say, text classification, or to predict the sentiment of an input sequence. So the usage of BERT is more for analysis of text, not generation. That is our work based on deep learning. There are other NLP tools we use as well; for example, we use a lot of tools for topic extraction, where we try to extract the major topics from articles, and topic clustering, where we try to cluster different articles into different groups so people can read them. For example, if they are going to read within a specific domain, these clusters can be generated automatically, so they don’t have to do the classification by hand. So a lot of applications and algorithms are used in NLP.
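
To make the two uses of BERT concrete, here is a minimal sketch of masked-word prediction and of taking the encoder’s [CLS] vector as a feature for downstream classification. The library and model names are illustrative assumptions.

```python
# BERT sketch: masked-word prediction plus feature extraction (assumes transformers + torch).
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

# 1) Masked language modeling: BERT predicts the hidden word using context from both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The student passed the [MASK] with a high score."):
    print(candidate["token_str"], round(candidate["score"], 3))

# 2) Feature extraction: the [CLS] vector can feed a classifier (e.g. topic or sentiment labels).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("This reading passage is about marine biology.", return_tensors="pt")
with torch.no_grad():
    cls_vector = encoder(**inputs).last_hidden_state[:, 0, :]  # shape (1, 768)
print(cls_vector.shape)
```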

[AB] Let me ask you: what are some of the challenges of working with NLP?

[YH] The challenges of working in the domain of NLP are basically just like those in other domains of machine learning. First, you have to look into the data very carefully. It’s not just that you collect the data, choose a model, run the model, and get results; it’s not like that. You have to use knowledge of data science to carefully review the data you collected: do you need to do some cleaning of the data, to format it into specific patterns you can use, and sometimes to get rid of noise? Then you have to choose appropriate models. There are so many different NLP models you can adapt to the data, and different models have different capabilities, so you need to think through the problem and select the right model to use on the data. Another thing is that before implementing any algorithm, you should think about the capabilities of the current algorithms.

We should note that computers cannot do everything. They have limitations, and we need to actively incorporate the abilities of human beings, so we build interactive algorithms. In that way we can combine the advantages of both sides, humans and computers, and enlarge our capability as much as possible. This is what we have done in the Sphinx tool: it combines the strengths of both humans and machines. Sometimes we need a human to make decisions based on the information provided by machine learning algorithms. The advantage of machine learning is that it can process a very large amount of data and extract vital information from it; the strength of humans is that we can act on that information to make the right decision. I don’t know how to summarize this as a principle in the field of machine learning, but it is sometimes the best way to handle practical problems.

[AB] What are some of the other challenges of NLP or working with machine learning in general?

[YH] The biggest challenge for machine learning is to perform reasoning, and very little progress has been made in this domain: trying to find the logic inside text or other sensed data like images and videos in order to make very complicated reasoning or predictions. We haven’t made a lot of progress here, and it is always a key focus for frontier machine learning research.

[AB] How do you get past that, or what’s it going to take to get there?

[YH] Wow, yeah, it’s very hard to say. We haven’t seen any potential algorithms or models that can fulfill that. Apparently the current stage of deep learning cannot do it. How can we reach that goal? We don’t know.

[AB] Okay, that’s what I thought, but I thought I’d ask, and maybe you’d have the answer for me. We’ll do this again in three years, and then maybe I’ll have an answer by then. Maybe somewhere between 3 and 30 years.

[AB] What is cross-validation? Is this a good question, or is it too technical?

[YH] So I guess you want to ask what CV is. It can be understood as cross-validation, which is an important technical term when we do experimental validation on data. Cross-validation works like this: you chop the data into, for example, 10 pieces, and you use nine pieces to train and the other piece to validate. That is, you feed the held-out piece into the model to generate predictions, and you compare those predictions to the ground truth. You cannot do this just once: because you have chopped the data into ten pieces, every piece has to take a turn being used for prediction while the rest are used for training, so you do this ten times. This is called 10-fold cross-validation, and it is a technique we use in experiments to validate whether our ideas are really good or not on a specific dataset.
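
Here is a minimal sketch of 10-fold cross-validation as just described, using scikit-learn on a toy dataset; the model and data are illustrative assumptions.

```python
# 10-fold cross-validation sketch (assumes scikit-learn).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=2000)

# The data is split into 10 pieces; each piece takes a turn as the validation set
# while the other 9 are used for training, and the 10 scores are then summarized.
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean(), scores.std())
```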

Actually, I thought you were going to ask what CV is in another sense: CV is more widely used to refer to computer vision, because I have been engaged in research and development in computer vision since I started my PhD studies. So to me, CV more often means computer vision.

[AB] And that’s how you got to know Saad?

[YH] Actually, I know Saad Khan from his publications. I read his papers previously because, as I mentioned, I was a computer vision researcher. I studied computer vision starting in my PhD program, and later, in the first several years of my industry experience, I did a lot of work on face recognition, target detection and tracking, and object recognition and search. After several years I realized I needed to find a specific domain in which to use my expertise, and one domain I like, and where I think different kinds of machine learning expertise can be used, is education. I found out that Saad was working at ETS at the time as a senior research scientist, and there was a position in his group, so I applied for that position and became an NLP and computer vision research scientist at ETS. That’s the starting point of my work with Saad; we worked together for a few years.

[AB] He came to ACT first and then you joined a year later?

[YH] After he decided to join ACT, he asked me whether I would like to join ACT as well. I thought it was a good opportunity, so I chose to join, and I think it was the right decision.

[AB] So let’s talk about you. Where did you grow up, where did you go to school, and what got you into computers?

[YH] I grew up in China. I was born in a very small town alongside the Yangtze River, the biggest river in China, and I went from primary school through college there. I attended Beijing University of Aeronautics and Astronautics in Beijing, and after that I came to the United States to start my PhD in computer science at Rutgers University. When I was a college student my major was automatic control, basically a subfield of electrical engineering, but eventually I realized that computers would rule everything in automatic control and electrical engineering, so I transferred to computer science to learn more about machine learning, computer vision, and later NLP.

[AB] Was there something you remember from growing up that made you say, I’d like to go into this, which at that time was electrical engineering?

[YH] I didn’t even know I was going to be a scientist when I was young, but I really liked studying. I enjoyed the process of learning very much; I liked to swim in the sea of knowledge. Maybe that kind of interest guided me toward research and my current career. I guess that’s the reason. I’m also a huge fan of sci-fi movies, so maybe that’s part of it too.

[AB] What’s the best part of your job? How about that?

[YH] In my daily routine I do a lot of research reading, I have discussions with my team members, I discuss projects during our regular meetings, I do project management work and proposal writing, and sometimes I still do some prototyping and coding. But now, as a manager, I need to spend more time thinking about the directions of the different research problems in the team and performing project management with the team members. I like this kind of work paradigm, which allows me to get in touch with more of the business and to consider issues from a more comprehensive perspective.

[AB] That’s good. Thank you, Yuchi, for joining us. I hope you had fun.

[YH] Thanks for having me, and I really appreciate this opportunity to explain a lot of problems in machine learning and NLP. If you have any questions, please let me know. Thank you.