(Day 32) Language Transformers and KCSE day 3

Ivan Ivanov · February 2, 2024

Hello :) Today is Day 32!

A quick summary of today:

Firstly, regarding the DL specialization


The last part of Sequence Models covers language transformers, which I haven't studied before, but I found out that they power a wide variety of NLP applications.

Transformers excel at capturing context through the self-attention mechanism, which weighs the importance of the other words in the sequence according to their contextual relevance, effectively modeling long-range dependencies and capturing fine-grained relationships within the text.

[Image: self-attention illustration]

The self-attention mechanism maps each word to a query, key, and value vector: the keys help identify which words are most relevant to a given word's query, and the values contribute to that word's final representation. By dynamically adjusting each representation according to context, self-attention gives a richer understanding of word meanings and relationships within a sequence, which improves the transformer network's performance on natural language processing tasks.
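
To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. This is just my own illustration (the function and matrix names are not from the course code), assuming a single sequence of word vectors as input.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) word vectors; W_q/W_k/W_v: (d_model, d_k) projections."""
    Q = X @ W_q                               # queries: what each word is looking for
    K = X @ W_k                               # keys: what each word offers for matching
    V = X @ W_v                               # values: what each word contributes
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # relevance of every word to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                        # context-aware representation of each word
```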

You can run several of these self-attention computations in parallel as separate "heads", and you get multi-head attention.

[Image: multi-head attention illustration]

The multi-head attention mechanism can be briefly described as a for loop over the self-attention mechanism. Each iteration is called a head, and each head computes its own self-attention over the input sequence. These computations run in parallel, and the results are concatenated to form the multi-head output. Through this mechanism, several different questions can be asked about each word, so much richer and more useful representations of each word can be learned. For example, for a sentence like "Jane wants to visit France in September", one head can ask "where?" and attend to "France", another can ask "when?" and attend to "September", another "who?" and attend to "Jane", and so on, each capturing a different aspect of the context.
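
Here is a rough sketch of that "for loop" view of multi-head attention, again just my own NumPy illustration with made-up names: each head gets its own projection matrices, and the head outputs are concatenated and projected back to the model dimension.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, the same computation as in the sketch above."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(X, heads, W_o):
    """heads: list of (W_q, W_k, W_v) tuples, one per head; W_o: output projection."""
    outputs = []
    for W_q, W_k, W_v in heads:                               # the "for loop" over self-attention
        outputs.append(attention(X @ W_q, X @ W_k, X @ W_v))  # each head asks its own question
    concat = np.concatenate(outputs, axis=-1)                 # combine the heads' results
    return concat @ W_o                                       # project back to the model dimension
```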

This was just an introduction to transformers, but I think I need to study them properly to understand NLP more deeply.

As for my presentation

My slides can be found here

And the short paper can be found here, on page 115

My topic is "A review of ChatGPT utilization: An analysis of prompt engineering strategies applied to various domains", but it is essentially a review of papers about prompt engineering.

[Image: table of prompt strategies from the reviewed papers]

You can see the prompt strategies of the reviewed studies in the table above. My favorite is the one from the Oh, 2023 paper, which designs a prompt strategy for solving math problems: using a "Role-Rule-Example-Process" prompt, it reached 91% accuracy on problems from nine Korean math textbooks. ChatGPT is usually not good at math, but it does well with the proposed prompt.
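
Just to show the idea of structuring a prompt this way, here is a hypothetical "Role-Rule-Example-Process" style skeleton. The wording below is entirely my own guess, not the actual prompt from the Oh, 2023 paper.

```python
# Hypothetical Role-Rule-Example-Process prompt skeleton (illustration only,
# not the prompt used in the Oh, 2023 paper).
role    = "You are a math teacher helping a middle-school student."
rule    = "Show every step of your reasoning and state the final answer on its own line."
example = "Q: Solve 3x + 5 = 20. A: Subtract 5 -> 3x = 15; divide by 3 -> x = 5."
process = "Read the problem, identify the unknowns, solve step by step, then verify the answer."

prompt = "\n\n".join([role, rule, example, process, "Problem: <insert math problem here>"])
print(prompt)
```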

I think today's presentation went well. I was a bit nervous at the start but was fine after 2-3 minutes.


That is all for today!

See you tomorrow :)

Original post in Korean