Time Travel: 2014

Chapter 91 The admiration of the top algorithm team (Part 2)

However, a new question immediately arose in Eve Carly's mind.

How had the developer of the Nanfeng APP come up with this brand-new algorithm, tentatively called a generative summarization algorithm?

Her own development team had also dabbled before in so-called generative summarization and similar neural-network-based summarization algorithms.

At the time they had called it an abstractive summarization algorithm, but after multiple rounds of testing, its actual performance had been far from ideal.

This kind of summarization, variously called abstractive or generative text summarization, can produce expressions that never appeared in the original text, which makes it more flexible than extractive summarization.

However, precisely because of that flexibility, generative summarization is more prone to factual errors, producing content that contradicts the original information or plain common sense.

In addition, the generative algorithm showed an obvious weakness when dealing with long news articles.

Pairing the generative algorithm with an extractive algorithm did improve its ability to handle longer news.

Yet testing showed that the generative component was more of a drag than a help; the extractive algorithm performed better on its own.

To be on the safe side, Eve Carly's team had ultimately chosen the more traditional direction of further improving the speed and accuracy of extractive text summarization.
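The extractive approach she had settled on is, at its core, simple: score the sentences of an article and copy the best ones out verbatim. A minimal word-frequency sketch of that idea might look like the following; the scoring scheme here is purely illustrative and is not the algorithm actually used by Eve Carly's team or by the Nanfeng APP.

```python
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    # Split the text into sentences and build word frequencies over the whole document.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Average document frequency of the words in this sentence.
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Keep the highest-scoring sentences, preserving their original order.
    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    return " ".join(s for s in sentences if s in top)

article = ("The central bank raised interest rates today. Markets fell sharply "
           "after the announcement. Analysts said the rate hike was expected. "
           "The central bank cited persistent inflation as the main reason.")
print(extractive_summary(article))
```

Because the output is stitched together from the source's own sentences, it can never contradict the original; the cost is that it can never rephrase anything either, which is exactly the trade-off the generative approach was meant to overcome.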

A direction they themselves had abandoned, picked up again by someone else?

It sounded a bit incredible, but the fact was that the developer of the Nanfeng APP had not only picked up the research direction her team once abandoned, but had also done it better than they had. It could only be called a slap in the face.

Eve Carly was a little bewildered. She could not figure out how the developer of the Nanfeng APP had managed to blaze a trail in a direction her team had judged unfeasible.

But one thing was certain: although the Nanfeng APP also used something resembling an abstractive/generative algorithm, the specific generative algorithm behind it was at least a generation ahead of the one her team had originally built.

Despite her confusion, and despite being slapped so hard in the face, Eve Carly did not get very emotional; at least not as emotional as Nick had been in his letter.

Years of research had long since given Eve Carly a rational temperament, unmoved by either praise or humiliation.

Besides, technological breakthroughs come one after another.

Anyone who agonized over every temporary gain and loss would be better off changing careers as soon as possible.

Excessive emotional swings were not only unnecessary, they would also cloud rational judgment.

After trying the Nanfeng APP in depth, Eve Carly had to admit that although the APP looked like a translation tool thrown together on the fly, its core algorithm was genuinely strong.

Just as the software's own slogan put it: the strongest on the surface.

Moreover, the software's claim of summarization speed and accuracy that overwhelmed similar products also held up.

Wait. Thinking of the accuracy emphasized in the Nanfeng APP's slogan, Eve Carly suddenly realized something.

News summarization software today emphasized speed in its marketing and rarely talked about accuracy.

It was not that accuracy was unimportant in news summarization. On the contrary, accuracy was extremely important; it could be called the most fundamental measure of whether a summarization algorithm was useful at all. Yet the various summarization products almost never advertised their accuracy with precise, quantified figures.

The reason was simple: the industry still lacked a unified standard for measuring accuracy.

It sounds incredible, but it is true. Evaluating the accuracy of a summary may seem easy, yet it is in fact quite a difficult task.

When measuring a summary, it is hard to say there is any standard answer. Unlike many tasks with objective evaluation criteria, summary evaluation relies to a certain extent on subjective judgment.

Summarization tasks lack unified standards for the dimensions along which accuracy is judged, such as grammatical correctness, fluency of language, and completeness of key information.

There are currently two kinds of methods for evaluating the quality of automatic text summarization: manual evaluation and automatic evaluation.

Manual evaluation means inviting a number of experts to set criteria and score summaries by hand; the results come closer to people's actual reading experience.

However, it is time-consuming and labor-intensive; it cannot be used to evaluate summarization output at scale, nor does it match the application scenarios of automatic text summarization.

Most importantly, when summaries are judged by people with their own subjective views, bias creeps in easily. After all, there are a thousand Hamlets in a thousand readers' eyes, and everyone has their own yardstick for a news summary. One evaluation team might hammer out a unified standard, but another team would very likely arrive at a different one.

When it comes to judging accuracy, this easily leads to completely different verdicts on the same summary depending on which team does the judging.

Judging teams vary widely in quality, and a team clearly capable of building a good algorithm can easily be strangled in the cradle by an evaluation team's interference.

The text summarization algorithm built by Eve Carly and her team had once led the world.

That owed a great deal to their deep collaboration in linguistics with Oxford, Harvard, and Yale.

But that was never a long-term solution. Manual evaluation is destined not to go far because of its inherent limitations.

Therefore, text summarization research teams have been actively studying automatic evaluation methods.

Since the late 1990s, some conferences and organizations have worked on formulating standards for summary evaluation and have also run evaluations of automatic text summarization systems.

The better-known ones include SUMMAC, DUC (Document Understanding Conference), and TAC (Text Analysis Conference).

Yet although the relevant teams have been actively researching automatic evaluation, these two approaches, manual evaluation and automatic evaluation, remain the standard ways of judging the quality of automatic text summarization today.

The principle behind many automatic evaluation methods is to compare the summary generated by the algorithm against a reference summary and score it by their degree of overlap.

The evaluation process itself is automated, but the reference summaries are written by hand.

In other words, even the so-called automatic evaluation methods cannot escape the intrusion of subjective factors.
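That overlap-based principle is easy to picture with a small sketch in the spirit of a ROUGE-1 recall score: count how many of the human-written reference summary's words the generated summary recovers. The snippet below is only an illustration of the idea, assuming naive whitespace tokenization; it is not the exact metric used by any conference or team mentioned here.

```python
from collections import Counter

def overlap_recall(generated: str, reference: str) -> float:
    # Unigram counts for both summaries (naive whitespace tokenization).
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # How many reference words (with multiplicity) the generated summary recovers.
    matched = sum(min(count, gen_counts[word]) for word, count in ref_counts.items())
    total = sum(ref_counts.values())
    return matched / total if total else 0.0

reference = "the central bank raised rates to curb inflation"
generated = "rates were raised by the central bank"
print(f"overlap recall: {overlap_recall(generated, reference):.2f}")
```

Even in this toy version, the subjectivity Eve Carly was worried about is plain to see: the score depends entirely on which reference summary a human happened to write.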

If that is the case, why go to the trouble of using an automatic evaluation method at all?

Because of this, many teams still choose manual evaluation when assessing summary quality.

However, something as subjective as manual evaluation makes it difficult to quantify the results objectively.

Because of this, even though many earlier teams' summarization algorithms actually achieved quite good accuracy, everyone selectively forgot to mention accuracy when it came time to promote their products.

Under these circumstances, why had the developer of the Nanfeng APP categorically claimed in the software introduction that its accuracy was 270% higher than that of similar software?

And by what standard had that so-called 270% been measured?
