A novel corpus-based computing method for handling critical word-ranking issues: An example of COVID-19 research articles – Chen – International Journal of Intelligent Systems

March 12, 2021

1 INTRODUCTION

The search for suitable algorithms to optimize today's highly computational real-world problems is a critical and challenging task that has attracted a great deal of effort over the last decade. For instance, Barshandeh and Haghzadeh1 proposed a novel hybrid physics-based, nature-inspired meta-heuristic algorithm named the proposed hybrid optimization algorithm (PHOA). They integrated atom search optimization (ASO) and the tree-seed algorithm (TSA) to effectively optimize traditional meta-heuristic algorithms; moreover, PHOA was also tested on seven real-life engineering problems, and its results were superior to those of traditional algorithms. In addition, Barshandeh et al.2 proposed a novel hybrid multipopulation algorithm (HMPA) that combined the artificial ecosystem-based optimization (AEO) and Harris hawks optimization (HHO) algorithms, then adopted a Levy-flight strategy, a local search mechanism, quasi-oppositional learning, and chaos theory to maximize the efficiency of HMPA. In their research, HMPA was tested on seven constrained/unconstrained real-life engineering problems, and its results were compared with those of similar advanced algorithms. The results indicated that HMPA significantly outperformed the competing algorithms. Extending the ideas of Barshandeh and Haghzadeh1 and Barshandeh et al.,2 it is essential to seek optimization algorithms for handling real-life corpus analysis issues, especially in this era of information explosion.

In the modern digital era, corpus building has evolved from manual collection to automatic collection of textual data. To manage its massive textual data, a corpus usually combines statistics, machine learning algorithms, or artificial intelligence (AI) techniques; this facilitates the efficiency of data collection, information processing, information retrieval (IR), and so forth. Natural languages are among the most ubiquitous formats of information flow among people. Analyzing, integrating, and reproducing textual data inevitably require importing highly accurate algorithms to process natural languages' semantics and syntax. Corpus-based approaches that embed statistical algorithms, such as frequency calculation and the log-likelihood test, are commonly adopted by linguists and data analysts for interpreting linguistic patterns and extracting domain knowledge.3, 4 In addition, in corpus-based approaches, word ranking is an important technique used to define words' importance level and to retrieve essential words from large textual data; this especially helps uncover semantic relationships between lexical items.5, 6

In the face of novel diseases, it is essential to build specialized medical corpora for integrating, managing, and retrieving massive information related to the diseases; such corpora help further effectively analyze, react to, and prevent the diseases. For example, COVID-19, a novel disease that broke out in December 2019, has a close genetic form to the SARS coronavirus (SARS-CoV), and had caused over 40 million confirmed cases and 1 million deaths by the end of October 2020 (less than a year).7-12 Leading researchers from various countries are trying to unveil the mystery of the novel disease. As of the end of October 2020, Web of Science (WOS), an internationally renowned academic database, had published more than 35,000 COVID-19-related research articles (RAs); this number keeps growing. No doubt, governments around the world are seeking direct and effective measures to mitigate the pandemic and speed up the treatment of confirmed cases.13, 14 With massive textual data about COVID-19 being rapidly distributed, it is essential for humans to rely on machine algorithms to compute important semantic information, thereby filtering and retrieving essential messages.15, 16 Hence, adopting corpus-based approaches to process and integrate COVID-19-related English-mediated textual data will improve frontline medical personnel's efficiency of information acquisition and perception.

Since the advent of computer technology, the practicality of corpus-based approaches has received widespread attention and adoption in textual information analysis fields. The frequency criterion is considered one of the core analytical techniques in corpus-based approaches. However, merely relying on tokens' frequency values to determine their importance may be insufficient; tokens' dispersion and concentration conditions also need to be taken into account. For example, in terms of importance, a word occurring 100 times in one RA is not equal to a word occurring 10 times in each of 10 RAs, because the words' dispersion and concentration conditions are different. A potential solution that adopts the Hirsch index (H-index) algorithm to integrate and compute the criteria of dispersion and concentration is required to address this issue. The H-index algorithm was originally used to quantify the cumulative impact and relevance of a researcher's scientific research achievements.17-23 However, this algorithm is not restricted to evaluating academic achievements; it has also seen applications in fields such as risk evaluation22 and medicine.24

Handling critical word-ranking issues with traditional frequency-based approaches may cause distortion and bias, because these approaches neither refine the corpus data nor simultaneously compute words' frequency dispersion and concentration criteria; hence, the allegedly highly important words with high frequency can be challenged. Thus, this paper proposes a novel corpus-based approach that integrates corpus software and the H-index algorithm as a computation method and evaluation metric that can improve the accuracy of word ranking, compensate for the deficiency of the traditional frequency-based approaches, and further boost the efficacy of corpus-based analysis. To verify the proposed approach, 100 COVID-19-related medical RAs indexed in the Science Citation Index (SCI) were retrieved from WOS and compiled as the massive textual data and as an empirical example embedded into the proposed approach. The main reason the researchers adopted this empirical example was that SCI journals represent high-quality academic publications. In addition, understanding the specific linguistic pragmatics of medical RAs will help frontline healthcare personnel process and acquire important COVID-19 medical messages.

The remainder of this paper is organized as follows: Section 2 describes preliminaries, explains the theoretical framework, and introduces the recent novel disease, COVID-19. Section 3 describes the detailed steps of the proposed approach. Section 4 uses COVID-19-related RAs from WOS as the massive textual data (i.e., the target corpus) and as an empirical example to verify the proposed approach. Section 5 concludes the study.

2 PRELIMINARIES

2.1 Conventional frequency-based corpus analysis

With the advance of computer technology, corpus development has enabled people to establish algorithms to integrate, manage, and process natural languages from massive textual data, thereby driving the progress of natural language processing (NLP) and AI-related industries. O'Keeffe et al.25 noted that information on frequency counts of tokens is the basis for understanding the core vocabularies that native speakers use frequently and the common combinations of vocabulary usage. Gathering large data (corpora) from native speakers' written texts and discourse transcripts provides strong evidence for understanding their linguistic patterns. Moreover, ranking words by frequency reveals the words adopted by the majority and the words used in day-to-day communications.26, 27 Hence, frequency-based corpus analytical approaches have been widely adopted by linguists, sociologists, text analysts, and others for extracting strong linguistic evidence for interpreting cultural phenomena, jargon, genre types, and so forth.28, 29 For example, Le and Miller6 adopted Sketch Engine, a corpus software, to cross-examine four medical corpus sources and extract the most frequently occurring medical morphemes in medical RAs. The resulting data indicated 136 specialized medical morphemes that account for 8.5% of the lexical items in the Medical Web Corpus, and the results provided English as a Foreign Language (EFL) medical students with a useful academic resource for improving their comprehension of English medical vocabulary.
Grabowski5 used WordSmith Tools 5.0, a corpus software, to present a corpus-driven description of the use and functions of the top-50 keywords (i.e., based on keyness values), complemented by a similar description of the top-50 lexical bundles (LBs; based on frequency values), in the analysis of a specialized corpus containing patients' prescriptions, outlines of product introductions, clinical trial protocols, and pharmacological RAs. The results offered significant pedagogical value for English for specific purposes (ESP) students and EFL practitioners in the pharmaceutical field.

The traditional corpus-based approach was designed for effectively clarifying, categorizing, and interpreting the patterns of natural languages. Computing word frequency is thus a critical technique that corpus software is capable of (see Equation 1).

Definition 1. (Anthony[30] and Scott[31]) If the accumulated value of a token's overall frequency is the sum of a_f, where f denotes the index of a subcorpus and a_f denotes the token's frequency counted in subcorpus f, then

∑_{f=1}^{n} a_f = a_1 + a_2 + a_3 + ⋯ + a_n.  (1)
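Equation (1) can be sketched in a few lines of Python (an illustration only; the study itself relies on corpus software such as AntConc for this computation, and the toy subcorpora below are invented):

```python
from collections import Counter

def overall_frequency(token, subcorpora):
    """Equation (1): sum a token's frequency a_f over subcorpora f = 1..n."""
    return sum(Counter(text.lower().split())[token] for text in subcorpora)

# Toy subcorpora, purely illustrative
subcorpora = ["virus spread virus", "virus test", "mask test"]
print(overall_frequency("virus", subcorpora))  # 2 + 1 + 0 = 3
```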

2.2 H‐index algorithm

The H-index algorithm was proposed in 2005 by Jorge E. Hirsch,19 a physicist and professor at the University of California, San Diego. The H-index is an evaluation mechanism used to measure a researcher's academic productivity and the citation rate of published articles; the index h represents the number of papers with citation counts of at least h, and it is a useful index for quantifying the academic achievements of a researcher. Nowadays, this mechanism has been widely adopted in several academic databases, such as WOS, Google Scholar, and Scopus, and even in other research fields.18, 20, 22 The algorithm computes the interrelationships between publication quantities and numbers of citations, and defines a researcher's academic influence in a certain field. For example, Li et al.22 adopted the H-index algorithm to assess the significance of urban railroad network structure, taking the topology, passenger volume, and passenger flow correlation of the Beijing urban railroad network into account to refine the rail network structure and reduce operational risks. Gao et al.17 proposed a weighted H-index (hw) by constructing an operator H on weighted edges; the accumulation of the weighted H-index (sh) in a node's neighborhood defines the spreading influence. They then applied the susceptible–infected–recovered (SIR) model to analyze an epidemic spreading process on 12 real-world networks and to further identify the most influential spreaders. Hanna et al.24 developed a novel metric for quantifying patient-level utilization of emergency department (ED) imaging. In their research, the H-index was adopted to measure a patient's annual ED imaging volume, and the resulting data on patients' H-index values were used as referential data for mitigating imaging-related costs and improving throughput in the ED.
In summary, the H-index algorithm integrates several considerations to evaluate research objects and create their importance values. The definition of Hirsch's H-index algorithm is as follows:

Definition 2. (Hirsch[19]) If the value of the function f represents the citation count of each paper, with the papers ranked in descending order (see Equation 2), then find f(n) equal to or larger than n (see Equation 3). The value of the H-index has to satisfy this criterion and can be described as follows:

H-index(f) = max_n min(f(1), …, f(n)),  (2)

f(n) ≥ n,  (3)

where n is the number of papers, f(n) is the citation count of the nth paper, and min(f(1), …, f(n)) is taken over the papers' citation counts ranked from maximum to minimum.

To illustrate this algorithm, two examples are given as follows:

Example 1. Suppose a researcher has 10 published articles (n = 10) identified as A1, A2, A3, …, A10, and the citation numbers are randomly given as 9, 5, 50, 20, 6, 8, 6, 4, 1, 0; thus, f(A1) = 9, f(A2) = 5, f(A3) = 50, f(A4) = 20, f(A5) = 6, f(A6) = 8, f(A7) = 6, f(A8) = 4, f(A9) = 1, f(A10) = 0. Then, rerank the citation numbers in descending order, and they become f(b1) = 50, f(b2) = 20, f(b3) = 9, f(b4) = 8, f(b5) = 6, f(b6) = 6, f(b7) = 5, f(b8) = 4, f(b9) = 1, f(b10) = 0. The results indicate that b6 satisfies the criterion of Equation (2), where f(b6) ≥ 6; thus, H-index = 6 (see Table 1).
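The worked example above can be verified with a short Python sketch of Hirsch's definition (an illustration, not part of the original study):

```python
def h_index(citations):
    """H-index per Equations (2)-(3): the largest n such that the
    n-th highest-cited paper has at least n citations."""
    ranked = sorted(citations, reverse=True)  # descending order, Eq. (2)
    return max((n for n, c in enumerate(ranked, 1) if c >= n), default=0)  # Eq. (3)

citations = [9, 5, 50, 20, 6, 8, 6, 4, 1, 0]  # the ten papers above
print(h_index(citations))  # 6, matching Table 1
```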

Example 2. The illustrative diagram (see Figure 1) also explains the H-index algorithm. There is a reference line on the diagram (i.e., representing that the nth paper must have at least n citations), and a paper's citations have to be on or above the reference line to be included in the value of the H-index. f(b6), in this case, is the sixth paper and also the last paper on the reference line. Meanwhile, its citation count is six and it satisfies Equation (2), f(b6) ≥ 6; thus, the value of the H-index equals 6.

In summary, the H-index algorithm provides an estimation of the significance, importance, and broad influence of a researcher's cumulative academic contributions. It has become a standard measurement and an unbiased criterion for comparing and evaluating the academic achievements of researchers competing in the same research fields.19

Figure 1. Illustration of the H-index algorithm
Table 1.
H-index computing process (original data, then citations reranked in descending order)

| Research paper | Citation count | Ranked paper | Citation count (ranked) | H-index result |
|---|---|---|---|---|
| 1 | 9 | 3 | 50 | 6 |
| 2 | 5 | 4 | 20 | |
| 3 | 50 | 1 | 9 | |
| 4 | 20 | 6 | 8 | |
| 5 | 6 | 5 | 6 | |
| 6 | 8 | 7 | 6 | |
| 7 | 6 | 2 | 5 | |
| 8 | 4 | 8 | 4 | |
| 9 | 1 | 9 | 1 | |
| 10 | 0 | 10 | 0 | |

2.3 COVID-19

COVID-19, whose original nomenclature was SARS-CoV-2, was renamed by the WHO in February 2020. The clusters of the first cases of the virus were discovered in Wuhan city, Hubei province, China.7 Epidemiologists, for now, propose the possibility that the virus, initially carried by wild animals, entered human-to-human transmission routes because locals in the city have a preference for "Yeh-Wei", the meat of wild animals such as bats, birds, and rodents.8, 10 Upon visiting the possible source location of COVID-19, the Huanan market, medical experts found plenty of contaminated carcasses of wild animals stocked and piled for sale. Thus, medical and biological experts speculated that the novel coronavirus may constantly mutate in animal hosts (e.g., bats, pangolins, etc.) and then become capable of infecting humans, especially when people process animal carcasses or eat uncooked food ingredients that host the virus.8 Indeed, many studies have indicated that bats were the initial hosts of COVID-19, because it has over 90% similarity to two SARS-like coronaviruses from bats, bat-SL-CoVZX45 and bat-SL-CoVZX21.9, 12 In terms of etiology, COVID-19 has a genetic form similar to SARS-CoV (i.e., an acute respiratory syndrome coronavirus that broke out in 2002) and MERS-CoV (i.e., the Middle East respiratory syndrome coronavirus that broke out in 2012),12, 32 but its spike (S) protein has mutated, enabling it to attack the host's immune system and making the host too weak to resist the virus.33 A comparison of COVID-19 and the two prior coronaviruses shows that COVID-19 causes a low fatality rate but has extremely high infectious capability.34 Yi et al.12 also pointed out that the majority of the human population lacks immunity to COVID-19 and is thus susceptible to the novel coronavirus.

Reverse transcriptase polymerase chain reaction (RT-PCR) was initially adopted as the primary criterion for diagnosing COVID-19. However, the RT-PCR test method has a high probability of misdiagnosis, which may accelerate the pandemic; thus, several diagnostic test approaches were integrated with investigations of travel history surveys, disease records, clinical symptoms (see Figure 2), lab tests, and X-ray or computed tomography (CT) for making effective diagnoses.35 Following the intensification of the COVID-19 pandemic, rapid test toolkits were invented to quickly detect the RNA, antigen, or antibody of SARS-CoV-2, giving frontline healthcare personnel more time to respond to and treat confirmed cases. In addition, prior studies pointed out that without protective measures (i.e., surgical masks, respiratory filtration, etc.), the three main transmission routes of inhalation, droplets, and contact will cause 57%, 35%, and 8.2% of COVID-19 infection probability, respectively.36 For frontline healthcare personnel in particular, who treat confirmed cases and have prolonged exposure to the virus emission environment and to inhalation of droplets (<10 μm) that contain the virus, the risk of infection may reach over 80%.37 Prior research also showed that social distancing (1.5–2 m) will not be effective if the virus emission source does not wear any protective gear, because the virus can be spread at least 6 m away through patients' coughing and sneezing.38, 39 Hence, although the fatality rate of COVID-19 is not extremely high, high infection rates cause difficulties in pandemic response and prevention.

Figure 2. Clinical symptoms of COVID-19

According to the WHO, as of October 31, 2020, there were 45,408,704 confirmed COVID-19 cases and 1,179,363 COVID-19 deaths (see Figure 3). Because targeted therapeutic medicines are still being developed, governments can presently rely only on quarantine policies and existing indirect medical therapies; thus, they make residents pay attention to personal hygiene, implement border control measures, encourage social distancing and online shopping, and so on, to decrease close contact between people and control the COVID-19 pandemic.40-42

Figure 3. COVID-19 confirmed and death cases (data from January 1 to October 31, 2020)

COVID-19, at the time of this writing, is still a semi-unknown novel disease for medical experts and continues to be explored. To effectively manage the massive medical textual information about it, it is necessary to create a COVID-19-specialized corpus, integrating appropriate algorithms for information processing and mining.

3 METHODOLOGY

Traditional corpus-based computing methods for critical word ranking mainly calculate words' frequency values and rank them. Prior studies believed that high-frequency words can reflect specific linguistic patterns in certain domains, which benefits EFL speakers in more effective acquisition of domain knowledge when reading English texts.3, 5, 6, 43, 44 Thus, with the rapid information flow of COVID-19, establishing a COVID-19-specialized corpus for timely acquisition of updated medical information is highly important for medical care personnel.7, 9, 11, 14, 32 In fact, as of the end of October 2020, more than 38,000 RAs on COVID-19-related topics had been published in the WOS database; this phenomenon indicates that a large number of research results have been produced by leading researchers globally. To effectively integrate and decipher English-mediated professional textual information and to further improve the efficiency of information acquisition, importing algorithms to compute key natural language semantics is quite essential. Corpus-based and NLP technology hence plays a critical role today in helping humans efficiently process the massive textual information available.25, 45

Earlier corpus-based studies that focused on calculating words' frequency values may miss important factors such as citation rates, which indicate the number of times a word is used by different text creators. For example, a medical-oriented word that occurs 10 times in each of 10 RAs (i.e., an overall frequency of 100) is, the researchers believe, more important than a medical-related word that occurs 200 times but in only a single RA. This concept is highly important to healthcare personnel because, under time limitations, access to the most critical domain-related words is crucial. Thus, when handling critical word-ranking issues, the following two important conditions have to be taken into account simultaneously: (1) the dispersion condition of word frequency, and (2) the concentration condition of word frequency.

However, taking existing corpus software such as AntConc 3.5.830 and WordSmith Tools 5.0 as examples, their existing algorithms are still unable to compute these two conditions simultaneously. Their word-ranking results can be based only on frequency value or range value, respectively; hence, the evaluation of words' importance level is biased. Therefore, to compensate for the result bias in the word-ranking issues of the traditional methods, the researchers propose a novel corpus-based approach that integrates AntConc 3.5.830 and the H-index algorithm19 to compute and evaluate the importance of tokens.

The steps are as follows. In the preliminary stage of the proposed approach, sample and compile the textual data as the target corpus in a way that is suitable for the H-index algorithm. Then, adopt Chen et al.'s46 corpus-based optimizing approach to refine the target corpus. In the middle part of the proposed approach, use AntConc 3.5.830 to compute tokens' frequency values and ranges; then, adopt the H-index algorithm to integrally compute tokens' dispersion and concentration conditions and to further obtain their H-index values. Next, rank tokens based on their H-index and frequency values. The post-ranking results will clarify the significance of the proposed approach and imply possible future applications in corpus-based and NLP fields. There are six steps in total in the proposed approach; detailed descriptions are given as follows (see Figure 4):

Figure 4. Flowchart of the proposed approach

Step 1. Compiling a suitable categorization of the massive textual data for H-index analysis.

The H-index algorithm is mainly used to explore the citation rate of research papers. In this study, the authors adopt it to explore the usage rate of tokens. In this step, the target corpus (i.e., the massive textual data) should be segmented into its basic components, treating each article as a unit instead of compiling all files into one big file (see Figure 5). Hence, the H-index of tokens can be computed successfully.

Figure 5. Suitable corpus compilation method for the H-index algorithm

Step 2. Extracting tokens from the massive textual data.

Using AntConc 3.5.8 as the corpus software to calculate and unveil the composition of the massive textual data, the quantitative data will be retrieved and all tokens will be labeled with numbers in this step.

Step 3. Optimizing the massive textual data.

Function words and meaningless words decrease the efficiency of corpus-based approaches; hence, to retrieve the substantive words that best reflect domain knowledge, a refining process is inevitable. In this step, adopt the function wordlist and a machine optimizing process to refine the massive textual data;46 the remaining content words will be processed in subsequent steps.

Step 4. Ranking tokens based on the individual overall frequency criterion.

After each token's overall frequency is calculated by the corpus software based on Equation (1), the wordlist in this step will be ranked by the frequency criterion, from highest to lowest.

Step 5. Ranking tokens based on the H-index algorithm.

In this step, the researchers adopt the H-index algorithm to compute the significance of tokens. Here, citation counts are treated as the tokens' adoption counts (i.e., frequencies); thus, the calculation of a token's H-index is based on the token appearing at least n times in each of n RAs. First, based on Equation (2), rank the word's frequency in each RA in descending order. Then, based on Equation (3), find the word's H-index value that satisfies the criterion.
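Treating a token's per-RA frequencies as "citations", the step above can be sketched in Python; the frequency lists below are invented for illustration and show why dispersion matters:

```python
def token_h_index(per_ra_freqs):
    """A token's H-index: the largest n such that the token occurs
    at least n times in each of n research articles (RAs)."""
    ranked = sorted(per_ra_freqs, reverse=True)  # per-RA frequencies, descending
    return max((n for n, f in enumerate(ranked, 1) if f >= n), default=0)

# Hypothetical tokens: one concentrated in a single RA, one dispersed
concentrated = [200] + [0] * 9   # 200 occurrences, all in one RA
dispersed = [10] * 10            # 10 occurrences in each of 10 RAs
print(token_h_index(concentrated))  # 1
print(token_h_index(dispersed))     # 10
```

Despite its lower overall frequency, the dispersed token receives the far higher H-index, which is exactly the distinction the frequency-only ranking misses.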

Step 6. Integrating tokens' ranking information for future extended applications.

On this step, tokens’ H‐index and frequency values are built-in and proven on the wordlist, furthermore, the sequence of tokens should fulfill the next standards:

  • 1. Rank tokens based on their H-index values in descending order.

  • 2. If tokens have the same H-index values, rank them by their frequency values in descending order.

The proposed approach uses the H-index algorithm to compute a token's degree of importance, simultaneously taking the criteria of dispersion and concentration into account. In addition, when facing identical H-index values, tokens' frequency values are used to define their ranks, avoiding the hesitation that occurs when defining tokens' degree of importance.
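The combined ranking criteria can be sketched as follows; the tokens and values are hypothetical:

```python
def rank_tokens(stats):
    """Order (token, h_index, frequency) tuples by H-index descending,
    breaking ties by overall frequency descending."""
    return sorted(stats, key=lambda t: (-t[1], -t[2]))

# (token, H-index, overall frequency) -- invented values for illustration
stats = [("patient", 8, 300), ("virus", 8, 450), ("mask", 5, 500)]
print(rank_tokens(stats))
# [('virus', 8, 450), ('patient', 8, 300), ('mask', 5, 500)]
```

Note that "mask" ranks last despite the highest raw frequency, since its H-index is lower; frequency only decides among equal H-index values.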

4 EMPIRICAL STUDY

4.1 Overview of the compiled massive textual data

The massive textual data in this paper are 100 RAs collected from WOS. This choice was made because WOS is one of the largest, best-known, and leading databases in the world. Moreover, many academic massive-textual-data analysis studies and NLP studies in scientific fields have adopted RAs from WOS as test data.47-49 Hence, in this study, the researchers chose Medicine, General, and Internal, a category defined by the Journal Citation Reports (JCR) for WOS, and then focused on open access (OA) journals (N = 24). To process these 24 journals, first, the authors calculated their respective annual publications (data retrieved from 2019.9.1 to 2020.8.31), then calculated the number of papers related to the COVID-19 topic. Lastly, they sampled the most recent articles from each journal based on ratio and further compiled the massive textual data (see Table 2). The research fields of the sampled journals comprise (1) environmental sciences; (2) public, environmental, and occupational health; (3) infectious diseases; (4) tropical medicine; (5) microbiology; (6) toxicology; (7) healthcare sciences and services; and (8) health policy and services. Furthermore, the collected RAs all had COVID-19 in their titles, and they discussed problems and solutions during the COVID-19 pandemic in line with their research fields. The paper collection method in this study tried to balance domain and genre type as much as possible to help native and EFL healthcare personnel understand the most important and widely used tokens in medical RAs.

Table 2.
The composition of the massive textual data

| Subject | Category | Journal | Annual publication | COVID-19-related RAs | Actual collected articles |
|---|---|---|---|---|---|
| COVID-19 | Medicine, General, and Internal | International Journal of Environmental Research and Public Health | 7683 | 253 | 41 |
| | | Frontiers in Public Health | 539 | 94 | 15 |
| | | Journal of Global Health | 228 | 45 | 7 |
| | | Lancet Global Health | 399 | 43 | 7 |
| | | Lancet Public Health | 173 | 41 | 7 |
| | | Journal of Infection and Public Health | 252 | 27 | 4 |
| | | Asian Pacific Journal of Tropical Medicine | 102 | 22 | 4 |
| | | BMJ Global Health | 327 | 13 | 2 |
| | | Annals of Global Health | 97 | 13 | 2 |
| | | Globalization and Health | 108 | 12 | 2 |
| | | Journal of Nepal Medical Association | 172 | 11 | 2 |
| | | BMC Public Health | 1817 | 8 | 1 |
| | | Journal of Epidemiology | 79 | 5 | 1 |
| | | Antimicrobial Resistance and Infection Control | 195 | 5 | 1 |
| | | Reproductive Health | 180 | 5 | 1 |
| | | Australian and New Zealand Journal of Public Health | 114 | 5 | 1 |
| | | Archives of Public Health | 91 | 4 | 1 |
| | | Environmental Health Perspectives | 175 | 3 | 1 |
| | | Health Expectations | 185 | 2 | 0 |
| | | Conflict and Health | 79 | 2 | 0 |
| | | Tobacco Induced Diseases | 65 | 2 | 0 |
| | | Environmental Health and Preventive Medicine | 70 | 1 | 0 |
| | | Safety and Health at Work | 68 | 1 | 0 |
| | | Gaceta Sanitaria | 116 | 1 | 0 |
| | | Total | 13,314 | 618 | 100 |

  • Abbreviation: RA, research article.

4.2 Traditional corpus-based computing method for handling critical word-ranking issues

AntConc 3.5.830 works like other corpus software; based on Equation (1), it accumulates the sum of words' occurrence counts (i.e., frequency values) in the corpus and ranks the words. Using the compiled corpus as an example, the traditional method for handling critical word-ranking issues causes the following problems: (1) function and meaningless words are not eliminated, so content words are ranked lower and analytical efficiency decreases; (2) the dispersion condition of frequency is not taken into account; and (3) the concentration condition of frequency is not taken into account. The word-ranking results in Figure 6 indicate that the wordlist is based on words' overall frequency values, ranked in descending order.
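The traditional frequency-only ranking can be sketched in Python (with toy texts, not the compiled corpus); note how an uneliminated function word tops the list, which is exactly problem (1) above:

```python
from collections import Counter

def frequency_wordlist(ra_texts):
    """Traditional method: pool all RAs and rank words by overall frequency."""
    counts = Counter(w for text in ra_texts for w in text.lower().split())
    return counts.most_common()  # (word, frequency) pairs, highest first

ras = ["the virus spread", "the mask and the test"]
print(frequency_wordlist(ras)[0])  # ('the', 3) -- a function word dominates
```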

Figure 6. Traditional corpus-based computing method used to rank words

4.3 The proposed approach

In this section, the compiled massive textual data are embedded into the proposed novel corpus-based approach to calculate its actual results. A detailed description is given as follows:

Step 1. Compiling a suitable categorization of the massive textual data for H-index analysis.

To effectively compute the H-index values of each token, the composition of the corpus should treat each article as a unit. To manage the massive textual data, first, the researchers gave each journal a codename; for example, Annals of Global Health was coded as AGH. The purpose of coding journal names was to rapidly and effectively retrieve the sources of tokens, hence increasing the efficiency of text analysis and mining. Second, the file name of each article is given based on a specific rule; for instance, in 01. AGH-01, 01 means the RA's serial number (i.e., from the perspective of the full massive textual data), AGH is the journal codename, and -01 represents the RA's serial number within the current journal (see Table 3).

Table 3.
Journal codename and data management of RAs
Journal title | Codename | Data management of RAs
Annals of Global Health | AGH | 01. AGH-01, 02. AGH-02
Australian and New Zealand Journal of Public Health | ANZJPH | 03. ANZJPH-01
Archives of Public Health | APH | 04. APH-01
Asian Pacific Journal of Tropical Medicine | APJTM | 05. APJTM-01, 06. APJTM-02, 07. APJTM-03, 08. APJTM-04
Antimicrobial Resistance and Infection Control | ARIC | 09. ARIC-01
BMC Public Health | BMCPH | 10. BMCPH-01
BMJ Global Health | BMJGH | 11. BMJGH-01, 12. BMJGH-02
Environmental Health Perspectives | EHP | 13. EHP-01
Frontiers in Public Health | FPH | 14. FPH-01, 15. FPH-02, 16. FPH-03, 17. FPH-04, 18. FPH-05, 19. FPH-06, 20. FPH-07, 21. FPH-08, 22. FPH-09, 23. FPH-10, 24. FPH-11, 25. FPH-12, 26. FPH-13, 27. FPH-14, 28. FPH-15
Globalization and Health | GAH | 29. GAH-01, 30. GAH-02
International Journal of Environmental Research and Public Health | IJERPH | 31. IJERPH-01, 32. IJERPH-02, 33. IJERPH-03, 34. IJERPH-04, 35. IJERPH-05, 36. IJERPH-06, 37. IJERPH-07, 38. IJERPH-08, 39. IJERPH-09, 40. IJERPH-10, 41. IJERPH-11, 42. IJERPH-12, 43. IJERPH-13, 44. IJERPH-14, 45. IJERPH-15, 46. IJERPH-16, 47. IJERPH-17, 48. IJERPH-18, 49. IJERPH-19, 50. IJERPH-20, 51. IJERPH-21, 52. IJERPH-22, 53. IJERPH-23, 54. IJERPH-24, 55. IJERPH-25, 56. IJERPH-26, 57. IJERPH-27, 58. IJERPH-28, 59. IJERPH-29, 60. IJERPH-30, 61. IJERPH-31, 62. IJERPH-32, 63. IJERPH-33, 64. IJERPH-34, 65. IJERPH-35, 66. IJERPH-36, 67. IJERPH-37, 68. IJERPH-38, 69. IJERPH-39, 70. IJERPH-40, 71. IJERPH-41
Journal of Global Health | JGH | 72. JGH-01, 73. JGH-02, 74. JGH-03, 75. JGH-04, 76. JGH-05, 77. JGH-06, 78. JGH-07
Journal of Infection and Public Health | JIPH | 79. JIPH-01, 80. JIPH-02, 81. JIPH-03, 82. JIPH-04
Journal of Nepal Medical Association | JNMA | 83. JNMA-01, 84. JNMA-02
Journal of Epidemiology | JOE | 85. JOE-01
Lancet Global Health | LGH | 86. LGH-01, 87. LGH-02, 88. LGH-03, 89. LGH-04, 90. LGH-05, 91. LGH-06, 92. LGH-07
Lancet Public Health | LPH | 93. LPH-01, 94. LPH-02, 95. LPH-03, 96. LPH-04, 97. LPH-05, 98. LPH-06, 99. LPH-07
Reproductive Health | RH | 100. RH-01
  • Abbreviation: RA, research article.

Step 2. Extracting tokens from the big textual data.

The data management of Step 1 showed that the coding principle provides great convenience when launching AntConc 3.5.8 to process corpus data. The corpus software analyzed all RAs' word types, tokens, and lexical diversity (i.e., type-token ratio, TTR; see Table 4). The lexical results of the compiled big textual data indicated that the authors of the 100 RAs adopted 13,062 word types, and the whole corpus consists of 366,866 running words. Moreover, its TTR is approximately equal to 0.0356 (also see Table 4).

Table 4.
Lexical data of the compiled big textual data
Data codename | Number of papers | Word types | Tokens | TTR
AGH | 2 | 1543 | 7647 | 0.2018
ANZJPH | 1 | 683 | 1907 | 0.3582
APH | 1 | 695 | 3153 | 0.2204
APJTM | 4 | 1680 | 9062 | 0.1854
ARIC | 1 | 394 | 989 | 0.3984
BMCPH | 1 | 731 | 3108 | 0.2352
BMJGH | 2 | 2130 | 10,730 | 0.1985
EHP | 1 | 868 | 3333 | 0.2604
FPH | 15 | 5352 | 50,993 | 0.1050
GAH | 2 | 1304 | 6548 | 0.1991
IJERPH | 41 | 9124 | 184,639 | 0.0494
JGH | 7 | 3263 | 26,739 | 0.1220
JIPH | 4 | 1699 | 9554 | 0.1778
JNMA | 2 | 973 | 2763 | 0.3522
JOE | 1 | 865 | 3773 | 0.2293
LGH | 7 | 2905 | 20,091 | 0.1446
LPH | 7 | 2411 | 19,153 | 0.1259
RH | 1 | 857 | 2720 | 0.3151
Whole corpus | 100 | 13,062 | 366,866 | 0.0356
  • Abbreviation: TTR, type-token ratio.
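The type, token, and TTR figures in Table 4 reduce to a short computation. A minimal sketch; `lexical_profile` is an invented helper, not part of AntConc:

```python
def lexical_profile(tokens):
    """Word types, token count, and type-token ratio (TTR = types / tokens)."""
    types = len(set(tokens))
    return types, len(tokens), round(types / len(tokens), 4)

# The whole-corpus figures from Table 4 reproduce the reported TTR:
# 13,062 word types over 366,866 tokens is roughly 0.0356.
print(round(13062 / 366866, 4))  # 0.0356
```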

Step 3. Optimizing the big textual data.

On the basis of Chen et al.'s46 research, function words, such as a, an, the, it, is, etc., decrease the efficiency of text mining and IR. Indeed, no matter which algorithm is used to calculate the importance of tokens, the irreplaceability of function words in constructing meaningful sentences causes them to appear in the resulting data and even be ranked very high, which directly decreases the accuracy and efficiency of data processing. Thus, the researchers adopted Chen et al.'s46 big textual data refining approach to optimize the compiled big textual data; the refined wordlist in the corpus software shows that meaningful words are ranked toward the front (see Figure 7). In addition, the data discrepancy showed that the word types of the refined data decreased by 238 words (i.e., function words), while the tokens of the refined data decreased by 157,911 words, a 43% downsizing of the corpus. Moreover, the lexical diversity rose to 0.0614 (see Table 5). Unexpectedly, even in highly specialized medical RAs, function words occupied more than 40% of the corpus. To avoid information distortion, the procedure for eliminating function words is inevitable.

image
Refined traditional corpus-based computing method used to rank words [Color figure can be viewed at wileyonlinelibrary.com]
Table 5.
Data discrepancy between original and refined data
Lexical function | Original data | Refined data | Data discrepancy
Word types | 13,062 | 12,824 | −238 (−1.8%)
Tokens | 366,866 | 208,955 | −157,911 (−43%)
TTR | 0.0356 | 0.0614 |
  • Abbreviation: TTR, type-token ratio.
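Step 3's refinement amounts to filtering tokens against a stoplist. A minimal sketch assuming a tiny illustrative stoplist; the study's actual refining approach46 removed 238 function-word types:

```python
# Illustrative subset only; the study eliminated 238 function-word types.
FUNCTION_WORDS = {"a", "an", "the", "it", "is", "of", "and", "to", "in"}

def refine(tokens):
    """Drop function and meaningless words so content words move to the
    front of the wordlist."""
    return [t for t in tokens if t.lower() not in FUNCTION_WORDS]

print(refine(["the", "dispersion", "of", "frequency", "in", "cases"]))
# ['dispersion', 'frequency', 'cases']
```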

Step 4. Ranking tokens based on individual overall frequency criteria.

After optimizing the compiled big textual data, the authors adopted the refined traditional corpus-based computing method30 to compute the sum of the frequency values of each token (see Figure 7), and to find each token's frequency values in each RA via the Concordance Plot function of the corpus software. In the Concordance Plot, Concordance Hit represents a token's overall frequency value, and Total Plot (with hits) represents how many RAs adopted the token. Take COVID as an example: its Concordance Hit is 3520 (i.e., its overall frequency value) and its Total Plot (with hits) is 100, which means COVID was adopted by the authors of all 100 RAs (see Figure 8). Hence, in this step the authors obtained three important elements: overall frequency values, frequency values in each RA, and how many RAs adopted a token. These elements are critical and will be calculated by the H-index algorithm in the following step.

image
Interface of the Concordance Plot: COVID as an example [Color figure can be viewed at wileyonlinelibrary.com]
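The three elements read off the Concordance Plot, namely the per-RA frequencies, the overall frequency, and the number of RAs containing the token, can be gathered per document. A sketch with invented names; AntConc itself reports the latter two as Concordance Hit and Total Plot (with hits):

```python
def token_profile(documents, token):
    """Per-article frequency of `token`, its overall frequency
    (concordance hits), and how many articles adopted it (plots with hits)."""
    per_doc = [doc.lower().split().count(token) for doc in documents]
    return per_doc, sum(per_doc), sum(1 for c in per_doc if c > 0)

ras = ["covid spread covid", "covid cases rose", "no mention here"]
print(token_profile(ras, "covid"))  # ([2, 1, 0], 3, 2)
```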

Step 5. Ranking tokens based on the H-index algorithm.

In this step, the researchers used the wordlist to compute the tokens (N = 420) that had frequency values over 100. Take mortality as an example: the authors recorded the frequency values of mortality in each RA as the original data and sorted the frequencies from highest to lowest; it was then found that f(9) ≥ 9, which satisfied the criteria of Equation (3), thus the H-index value was given as 9 (see Table 6). This computing approach is used to calculate a token's overall adoption rate and to evaluate its importance level more accurately. The tokens' H-index values were then recorded in Excel for the ranking process.

Table 6.
An example of a token's H-index computing process
Token | Original data (Article / Frequency) | Computing process, sorted by frequency (Article / Frequency) | H-index result
Mortality | 1 / 2 | 6 / 90 | 9
 | 2 / 1 | 38 / 36 |
 | 3 / 1 | 20 / 31 |
 | 4 / 3 | 25 / 30 |
 | 5 / 2 | 37 / 21 |
 | 6 / 90 | 30 / 13 |
 | 7 / 1 | 33 / 12 |
 | 8 / 3 | 22 / 10 |
 | 9 / 4 | 15 / 9 |
 | 10 / 5 | 18 / 6 |
 | 11 / 1 | 23 / 6 |
 | 12 / 2 | 10 / 5 |
 | 13 / 1 | 39 / 5 |
 | 14 / 2 | 9 / 4 |
 | 15 / 9 | 31 / 4 |
 | 16 / 2 | 4 / 3 |
 | 17 / 1 | 8 / 3 |
 | 18 / 6 | 26 / 3 |
 | 19 / 1 | 1 / 2 |
 | 20 / 31 | 5 / 2 |
 | 21 / 1 | 12 / 2 |
 | 22 / 10 | 14 / 2 |
 | 23 / 6 | 16 / 2 |
 | 24 / 1 | 2 / 1 |
 | 25 / 30 | 3 / 1 |
 | 26 / 3 | 7 / 1 |
 | 27 / 1 | 11 / 1 |
 | 28 / 1 | 13 / 1 |
 | 29 / 1 | 17 / 1 |
 | 30 / 13 | 19 / 1 |
 | 31 / 4 | 21 / 1 |
 | 32 / 1 | 24 / 1 |
 | 33 / 12 | 27 / 1 |
 | 34 / 1 | 28 / 1 |
 | 35 / 1 | 29 / 1 |
 | 36 / 1 | 32 / 1 |
 | 37 / 21 | 34 / 1 |
 | 38 / 36 | 35 / 1 |
 | 39 / 5 | 36 / 1 |
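The computing process of Step 5, sorting per-RA frequencies in descending order and finding the largest h with f(h) ≥ h, can be sketched as follows. This is an illustration of the Equation (3) criterion as described above, not the authors' Excel procedure:

```python
def h_index(per_article_freqs):
    """Largest h such that at least h articles each contain the token
    at least h times (frequencies sorted in descending order)."""
    h = 0
    for rank, freq in enumerate(sorted(per_article_freqs, reverse=True), start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h

# Top of the sorted frequency column for "mortality" in Table 6:
# the 9th value is 9 (f(9) >= 9) but the 10th is 6 (< 10), so H = 9.
print(h_index([90, 36, 31, 30, 21, 13, 12, 10, 9, 6, 6, 5, 5, 4, 4]))  # 9
```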

It was found that after the H-index values were used to rank the tokens, the sequences of the wordlist changed significantly, because the H-index calculated the authors' adoption rate in each RA and reinterpreted the importance of the tokens. However, different tokens often produced the same H-index value. When identical H-index values were encountered, the authors sorted those tokens by their frequency values again. That is, this paper considers H-index and frequency values simultaneously to make the importance calculation of tokens more accurate.
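The combined ordering, H-index first with total frequency as the tie-breaker, is a simple compound sort key. A sketch; the tuples use the H-index and frequency values of patients, cases, and during from Table 8:

```python
def rank_tokens(token_stats):
    """Sort (token, h_index, frequency) records by H-index, breaking
    H-index ties with total frequency, both descending."""
    return sorted(token_stats, key=lambda t: (-t[1], -t[2]))

stats = [("cases", 17, 1148), ("patients", 18, 1109), ("during", 17, 871)]
print([t[0] for t in rank_tokens(stats)])  # ['patients', 'cases', 'during']
```

Note that patients outranks cases despite its lower raw frequency, because its H-index is higher; the tie between cases and during (both H = 17) is resolved by frequency.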

Step 6. Integrating tokens' ranking information for future extended applications.

The wordlist of Step 5 showed the combinations of the tokens' H-index and frequency values. The token-ranking issue handled by the proposed approach redefines their importance levels; hence, these data provide important referential indicators for future applications, such as IR, NLP, big data analysis, machine learning, deep learning, etc. Through this study, the authors propose a novel corpus-based approach that integrates corpus software and the H-index algorithm to calculate which tokens are important in medical RAs. The resulting data will improve native and EFL medical researchers' reading and processing efficiency for medical RAs.

4.4 Comparison and discussion

The competitor methods (i.e., the traditional frequency-based approach30 and the refined traditional frequency-based approach46) handle word-ranking issues based only on words' frequency values or range values, respectively, to determine their sequences; that is, the traditional methods do not integrally take a word's dispersion and concentration criteria into account. This deficiency biases critical word-ranking results; in addition, the importance levels of high-frequency critical words can be challenged. Hence, to improve the accuracy and efficiency of big textual data analysis, this section uses the COVID-19-related RAs collected from WOS as the empirical example (i.e., test data) to discuss the differences between the traditional frequency-based approach,30 the refined traditional frequency-based approach,46 and the proposed approach (see Table 7). In addition, the top 50 words ranked by the three approaches are presented to show the discrepancies between them (see Table 8). First, refined corpus data are compared to show which approaches are able to rank content words higher. Second, frequency dispersion criteria are compared to show that the proposed approach can compute frequency dispersion, thus making word-ranking results more accurate. Finally, frequency concentration criteria are compared to show that the proposed approach can compute frequency concentration, thereby compensating for the blind side of truly defining high-frequency words' importance levels.

  • 1.

    Refining corpus data

    According to Table 8, the raw data contain many function and meaningless tokens, such as the, of, and, to, in, etc. The traditional frequency-based approach30 calculated all tokens' frequency values; it was unable to identify which tokens carry more substantial meaning for humans. To enable corpus-based approaches to rank critical words with substantial meanings, the refined traditional frequency-based approach46 and the proposed approach eliminated function and meaningless words. Hence, as Table 8 shows, the refined data present content words that have general or field-oriented purposes. This makes the corpus analytical results more meaningful and enhances the efficiency of retrieving critical words.

  • 2.

    Calculating frequency dispersion criteria

Table 7.
A comparison of corpus-based approaches
Approach | Refining corpus data | Calculating frequency dispersion criteria | Calculating frequency concentration criteria
The traditional frequency-based approach30 | No | No | No
The refined traditional frequency-based approach46 | Yes | No | No
The proposed approach | Yes | Yes | Yes
Table 8.
The top 50 tokens of the three compared approaches (partial data)
Raw data: the traditional frequency-based approach30 (Rank | Frequency | Token) || Refined data: the refined traditional frequency-based approach46 (Rank | Frequency | Token) || Refined data: the proposed approach (Rank | H-index | Frequency | Token)
1 | 23,079 | the || 1 | 3520 | COVID || 1 | 39 | 3520 | COVID
2 | 14,660 | of || 2 | 2325 | health || 2 | 28 | 2325 | health
3 | 13,258 | and || 3 | 1247 | study || 3 | 21 | 1247 | study
4 | 9577 | to || 4 | 1162 | pandemic || 4 | 20 | 1162 | pandemic
5 | 9218 | in || 5 | 1148 | cases || 5 | 18 | 1109 | patients
6 | 5721 | a || 6 | 1109 | patients || 6 | 17 | 1148 | cases
7 | 3891 | with || 7 | 999 | data || 7 | 17 | 871 | during
8 | 3699 | for || 8 | 871 | during || 8 | 16 | 999 | data
9 | 3520 | COVID || 9 | 779 | social || 9 | 15 | 702 | people
10 | 3279 | that || 10 | 714 | public || 10 | 14 | 779 | social
11 | 2857 | is || 11 | 711 | SARS || 11 | 14 | 701 | number
12 | 2631 | as || 12 | 702 | people || 12 | 14 | 660 | risk
13 | 2544 | was || 13 | 701 | number || 13 | 14 | 645 | time
14 | 2417 | were || 14 | 660 | risk || 14 | 14 | 642 | disease
15 | 2353 | on || 15 | 645 | time || 15 | 14 | 599 | care
16 | 2325 | health || 16 | 642 | disease || 16 | 13 | 714 | public
17 | 2084 | be || 17 | 626 | table || 17 | 13 | 711 | SARS
18 | 1968 | by || 18 | 619 | reported || 18 | 13 | 619 | reported
19 | 1882 | this || 19 | 615 | clinical || 19 | 13 | 614 | CoV
20 | 1873 | are || 20 | 614 | CoV || 20 | 13 | 594 | symptoms
21 | 1783 | from || 21 | 599 | care || 21 | 13 | 593 | countries
22 | 1699 | or || 22 | 594 | symptoms || 22 | 13 | 541 | one
23 | 1404 | have || 23 | 593 | countries || 23 | 13 | 491 | transmission
24 | 1383 | we || 24 | 582 | infection || 24 | 12 | 582 | infection
25 | 1275 | not || 25 | 570 | population || 25 | 12 | 570 | population
26 | 1247 | study || 26 | 561 | participants || 26 | 12 | 561 | participants
27 | 1213 | at || 27 | 546 | first || 27 | 12 | 533 | high
28 | 1177 | their || 28 | 541 | one || 28 | 12 | 499 | analysis
29 | 1162 | pandemic || 29 | 533 | high || 29 | 12 | 439 | medical
30 | 1148 | cases || 30 | 531 | control || 30 | 11 | 626 | table
31 | 1112 | it || 31 | 527 | used || 31 | 11 | 615 | clinical
32 | 1111 | an || 32 | 526 | results || 32 | 11 | 546 | first
33 | 1109 | patients || 33 | 506 | based || 33 | 11 | 526 | results
34 | 999 | data || 34 | 499 | analysis || 34 | 11 | 506 | based
35 | 957 | more || 35 | 498 | case || 35 | 11 | 498 | case
36 | 921 | which || 36 | 491 | transmission || 36 | 11 | 471 | information
37 | 871 | during || 37 | 471 | information || 37 | 11 | 470 | research
38 | 861 | can || 38 | 470 | research || 38 | 11 | 466 | related
39 | 834 | has || 39 | 466 | related || 39 | 11 | 465 | higher
40 | 814 | also || 40 | 465 | higher || 40 | 11 | 456 | virus
41 | 805 | these || 41 | 456 | virus || 41 | 11 | 404 | age
42 | 792 | p || 42 | 452 | studies || 42 | 11 | 404 | associated
43 | 788 | may || 43 | 451 | use || 43 | 11 | 404 | showed
44 | 779 | social || 44 | 446 | two || 44 | 11 | 355 | factors
45 | 762 | been || 45 | 442 | coronavirus || 45 | 11 | 340 | model
46 | 752 | they || 46 | 439 | medical || 46 | 10 | 531 | control
47 | 746 | had || 47 | 432 | outbreak || 47 | 10 | 527 | used
48 | 737 | who || 48 | 431 | measures || 48 | 10 | 451 | use
49 | 731 | all || 49 | 420 | China || 49 | 10 | 446 | two
50 | 731 | other || 50 | 406 | new || 50 | 10 | 442 | coronavirus

The authors adopted the proposed approach to compute the top 420 tokens whose frequency values exceeded 100 in the wordlist of the refined data. According to Table 8, there were significant differences in token ranking between the traditional corpus-based computing approaches30, 46 and the proposed approach. The traditional approaches30, 46 only calculated a token's total frequency value to define its rank and importance; the frequency dispersion criteria were not taken into account. That is, a token with high frequency may not be widely adopted by the RA authors; it may be concentrated in only a few RAs, or even occur in just one RA. In contrast, the proposed approach not only used the H-index to compute the dispersion and concentration criteria of frequency simultaneously, but also used frequency values to distinguish tokens that had the same H-index values. Therefore, after taking all criteria into consideration, the proposed approach is more rigorous and accurate. Interestingly, tokens such as COVID, health, study, pandemic, reported, infection, population, participants, and case still remain at their original ranks when the refined traditional frequency-based approach and the proposed approach are compared; that is, after being calculated by the two approaches, their frequency and H-index values were both extremely high, hence these tokens' importance was unquestionable.

The calculation results of the proposed approach redefine the importance of the tokens (N = 420) when compared with the traditional corpus-based computing approaches.30, 46 The authors found that only 11 tokens (2.6%) remained at their original ranks, with only 9 of them (2.1%) in the top 50 wordlists (see Table 8); 15 tokens (3.5%) moved forward more than 100 ranks; 196 tokens (46.6%) moved forward from 1 to 99 ranks; 14 tokens (3.3%) moved backward more than 100 ranks; and 184 tokens (43.8%) moved backward from 1 to 99 ranks. In other words, the proposed approach successfully re-evaluates the importance of tokens and changes more than 97% of the ranks by adopting the H-index algorithm, which simultaneously takes the dispersion and concentration criteria of frequency into account (see Table 9).

Table 9.
Changes of token ranks (N = 420)
Data discrepancy | Token numbers | Proportion
Tokens stay at their original ranks | 11 | 0.0262
Tokens move forward more than 100 ranks | 15 | 0.0357
Tokens move forward from 1 to 99 ranks | 196 | 0.4667
Tokens move backward more than 100 ranks | 14 | 0.0333
Tokens move backward from 1 to 99 ranks | 184 | 0.4381
Tokens with H-index value equal to 1 | 2 | 0.0048
The 9 tokens (2.1%) in the top 50 wordlists were extremely critical and of unquestionable importance, rather than a fault of the proposed approach, as they showed no differences when compared with the traditional corpus-based computing approaches.30, 46 These tokens are important because they were adopted by many RA authors and occurred with very high frequency in the compiled big textual data. Moreover, the proposed approach moved token sequences forward and backward, which implies that the traditional corpus-based computing approaches30, 46 caused distortion when handling token-ranking issues. For example, efforts was ranked at 349 based on the calculation results of the traditional corpus-based computing approaches30, 46 (frequency = 113), but after being computed by the proposed approach (H-index = 7; frequency = 113), its rank moved forward to 179, that is, it moved forward by 170 sequences. In other words, the importance of efforts was promoted; its actual importance level had been underestimated by the traditional corpus-based computing approaches.30, 46 As another instance, news was ranked at 125 based on its calculation results in the traditional corpus-based computing approaches30, 46 (frequency = 229). However, most of its occurrence times were concentrated in a few RAs: news occurred 180 times (78%) in a single RA, coded as 63. IJERPH-33 in the compiled data. After being computed by the proposed approach (H-index = 3; frequency = 229), its rank moved backward to 410, that is, it moved backward by 285 sequences.
The data discrepancy indicates that its actual importance level was overestimated by the traditional corpus-based computing approaches.30, 46 The distorted results arose because the traditional approaches30, 46 did not take tokens' frequency dispersion criteria into account and defined tokens' importance levels based only on their total frequency. On the contrary, the proposed approach takes tokens' frequency dispersion criteria into account; hence, it produces more accurate evaluation results and defines tokens' importance levels more precisely.

The proposed approach can also handle tokens' frequency concentration criteria. For example, hyponatremia was ranked at 231 based on its calculation results in the traditional corpus-based computing approaches30, 46 (frequency = 153), and tobacco was ranked at 391 (frequency = 104). However, after being computed by the proposed approach, both words' H-index values were equal to 1 (see Table 9); hence, their ranks moved backward to 419 and 420, respectively (i.e., they became the two least important words among the 420 tokens), moving backward by 188 and 29 sequences, respectively. Even though hyponatremia and tobacco each had more than 100 occurrence times in the compiled big textual data, each was adopted by just one RA. In other words, their importance was almost negligible, because there is an extremely low probability that people will encounter these two words in future COVID-19-related RAs. Therefore, the traditional corpus-based computing approaches30, 46 again overestimated these tokens' importance levels.
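The hyponatremia and tobacco cases show how concentration collapses the H-index: all occurrences in a single RA give H = 1 regardless of frequency. A sketch reusing the same H-index logic; the spread-out distribution in the second call is invented for contrast, only the single-article total of 153 comes from the text:

```python
def h_index(per_article_freqs):
    """Largest h with at least h articles containing the token >= h times."""
    sorted_freqs = sorted(per_article_freqs, reverse=True)
    return sum(1 for rank, freq in enumerate(sorted_freqs, start=1) if freq >= rank)

# 153 occurrences concentrated in one RA: frequency looks high, yet H = 1.
print(h_index([153]))                      # 1
# 24 occurrences spread over 8 RAs: far lower frequency, yet H = 3.
print(h_index([5, 4, 3, 3, 3, 2, 2, 2]))   # 3
```

Under a frequency-only ranking the first token would dominate the second; the H-index reverses that ordering because it rewards dispersion across articles.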

To conclude this section, the computation of tokens' importance levels affects the analysis and development of big data management and processing, search engines, and other related AI industries. If the frequency value is the only criterion for ranking tokens' importance levels, the evaluation of their importance will be inaccurate and distorted. Hence, this paper proposes a novel corpus-based approach that integrates corpus software and the H-index algorithm to take tokens' frequency dispersion and concentration criteria into account simultaneously, thus handling the token-ranking issue accurately and comprehensively.

5 CONCLUSION

Traditional corpus-based computing methods still present some analytical doubts during corpus processing, for example, in refining corpus data, computing frequency dispersion criteria, and computing frequency concentration criteria. These may decrease corpus data processing efficiency and, more seriously, bias the evaluation of tokens' importance levels, as frequency value is the only indicator used for handling word-ranking issues in traditional corpus-based computing methods. Thus, to compensate for the blind side of the traditional methods, this paper proposed a novel corpus-based approach that integrates corpus software and the H-index algorithm to refine corpus data, to calculate tokens' frequency dispersion and concentration criteria, and further to handle word-ranking issues.

The significant contributions of the proposed approach are as follows: (1) it refines corpus data via machine processing to eliminate function and meaningless words; (2) it computes tokens' frequency dispersion criteria; moreover, when facing tokens with the same H-index values, tokens' frequency values serve as the second ranking criterion, which makes the word-ranking process more accurate and avoids hesitance situations in the ranking process; (3) it computes tokens' frequency concentration criteria, such as cases where a token has high frequency values but is overconcentrated in certain RAs; hence, H-index = 1 indicates that the H-index algorithm precisely evaluates a token's importance level, whereas frequency values overestimate it and distort the ranking results. Furthermore, in relation to textual analysis of COVID-19-related RAs, the proposed approach also helps native and EFL frontline healthcare personnel to integrate and retrieve professional medical knowledge, and to further enhance their information processing efficiency.

This paper has a major limitation that awaits future research to overcome: without the assistance of existing software, the H-index computing process still relies on human processing, and once the data become too abundant, this places a great burden on data analysts. Hence, from a future perspective, this paper suggests that future corpus-based and NLP research import the H-index algorithm into corpus programs (i.e., software) for processing big textual data. This will enhance accuracy and efficiency in handling word-ranking issues and aid the accurate retrieval of critical words from big textual data.

ACKNOWLEDGMENTS

The authors would like to thank the Ministry of Science and Technology, Taiwan, for financially supporting this study under Contract Nos. MOST 108-2410-H-145-001 and MOST 109-2410-H-145-002.

    CONFLICT OF INTERESTS

    The authors declare that there is no conflict of interests.

    REFERENCES

    • 1. Barshandeh S, Haghzadeh M. A new hybrid chaotic atom search optimization based on tree-seed algorithm and Levy flight for solving optimization problems. Eng Comput. 2020. https://doi.org/10.1007/s00366-020-00994-0
    • 2. Barshandeh S, Piri F, Sangani SR. HMPA: an innovative hybrid multi-population algorithm based on artificial ecosystem-based and Harris Hawks optimization algorithms for engineering problems. Eng Comput. 2020. https://doi.org/10.1007/s00366-020-01120-w
    • 3. Chang CF, Kuo CH. A corpus-based approach to online materials development for writing research articles. Engl Specif Purp. 2011; 30(3): 222-234.
    • 4. Gholami J, Zeinolabedini M. Peer-to-peer prescriptions in medical sciences: Iranian field specialists' attitudes toward convenience editing. Engl Specif Purp. 2017; 45: 86-97.
    • 5. Grabowski L. Keywords and lexical bundles within English pharmaceutical discourse: a corpus-driven description. Engl Specif Purp. 2015; 38: 23-33.
    • 6. Le CNN, Miller J. A corpus-based list of commonly used English medical morphemes for students learning English for specific purposes. Engl Specif Purp. 2020; 58: 102-121.
    • 7. Gorbalenya AE, Baker SC, Baric RS, et al. The species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol. 2020; 5(4): 536-544.
    • 8. Guan W, Ni Z, Hu Y, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020; 382(18): 1708-1720.
    • 9. Lu RJ, Zhao X, Li J, et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020; 395(10224): 565-574.
    • 10. Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. Lancet. 2020; 395(10223): 470-473.
    • 11. Wolfel R, Corman VM, Guggemos W, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020; 581(7809): 465-469.
    • 12. Yi Y, Lagniton PNP, Ye S, Li EQ, Xu RH. COVID-19: what has been learned and to be learned about the novel coronavirus disease. Int J Biol Sci. 2020; 16(10): 1753-1766.
    • 13. Forrest JI, Rayner CR, Park JJH, Mills EJ. Early treatment of COVID-19 disease: a missed opportunity. Infect Dis Ther. 2020; 9(4): 715-720.
    • 14. Huff C. Covid-19: People afraid to seek treatment because of the steep cost of their high deductible insurance. BMJ—Br Med J. 2020; 371:m3860.
    • 15. Cheng X, Cao Q, Liao SS. An overview of literature on COVID-19, MERS and SARS: using text mining and latent Dirichlet allocation. J Inf Sci. 2020; 0165551520954674.
    • 16. Glowacki EM, Wilcox GB, Glowacki JB. Identifying #addiction concerns on Twitter during the COVID-19 pandemic: a text mining analysis. Subst Abus. 2021; 42(1): 39-46. https://doi.org/10.1080/08897077.2020.1822489
    • 17. Gao L, Yu SB, Li MH, Shen ZS, Gao ZY. Weighted h-index for identifying influential spreaders. Symmetry—Basel. 2019; 11(10): 1263.
    • 18. Hauer MP, Hofmann XCR, Krafft TD, Zweig KA. Quantitative analysis of automatic performance evaluation systems based on the h-index. Scientometrics. 2020; 123(2): 735-751.
    • 19. Hirsch JE. An index to quantify an individual's scientific research output. Proc Natl Acad Sci USA. 2005; 102(46): 16569-16572.
    • 20. Hu GY, Wang L, Ni R, Liu WS. Which h-index? An exploration within the Web of Science. Scientometrics. 2020; 123(3): 1225-1233.
    • 21. Kung SC, Chien TW, Yeh YT, Lin JCJ, Chou W. Using the bootstrapping method to verify whether hospital physicians have different h-indexes regarding individual research achievement: a bibliometric analysis. Medicine (Baltimore). 2020; 99(33):e21552.
    • 22. Li XL, Zhang P, Zhu GY. Measuring method of node importance of urban rail network based on h index. Appl Sci—Basel. 2019; 9(23): 5189.
    • 23. Pluskiewicz W, Drozdzowska B, Adamczyk P, Noga K. Scientific quality index: a composite size-independent metric compared with h-index for 480 medical researchers. Scientometrics. 2019; 119(2): 1009-1016.
    • 24. Hanna TN, Duszak R, Chahine A, Zygmont ME, Herr KD, Horny M. The introduction and development of the H-index for imaging utilizers: a novel metric for quantifying utilization of emergency department imaging. Acad Emerg Med. 2019; 26(10): 1125-1134.
    • 25. O'Keeffe A, McCarthy M, Carter R. From corpus to classroom: language use and language teaching. Cambridge: Cambridge University Press; 2007.
    • 26. Motschenbacher H. Corpus linguistic onomastics: a plea for a corpus-based investigation of names. Names. 2020; 68(2): 88-103.
    • 27. Seracini FL. Phraseology in multilingual EU legislation: a corpus-based study of translated multi-word terms. Perspect—Stud Transl. https://doi.org/10.1080/0907676X.2020.1800058
    • 28. Gholaminejad R, Sarab MRA. Academic vocabulary and collocations used in language teaching and applied linguistics textbooks: a corpus-based approach. Terminology. 2020; 26(1): 82-107.
    • 29. Zhang XM, Kotze H, Fang J. Explicitation in children's literature translated from English to Chinese: a corpus-based study of personal pronouns. Perspect—Stud Transl. 2020; 28(5): 717-736.
    • 30. Anthony L. AntConc (Version 3.5.8). Corpus Software. 2019. https://www.laurenceanthony.net/software/antconc/
    • 31. Scott M. PC analysis of key words—and key key words. System. 1997; 25: 233-245.
    • 32. Paules CI, Marston HD, Fauci AS. Coronavirus infections—more than just the common cold. JAMA—J Am Med Assoc. 2020; 323(8): 707-708.
    • 33. Wan YS, Shang J, Graham R, Baric RS, Li F. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. J Virol. 2020; 94(7): e00127-20.
    • 34. Zhu N, Zhang DY, Wang WL, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med. 2020; 382(8): 727-733.
    • 35. Ye Z, Zhang Y, Wang Y, Huang ZX, Song B. Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review. Eur Radiol. 2020; 30(8): 4381-4389.
    • 36. Ahamad MM, Aktar S, Rashed-Al-Mahfuz M, et al. A machine learning model to identify early stage symptoms of SARS-CoV-2 infected patients. Expert Syst Appl. 2020; 160:113661.
    • 37. Jones RM. Relative contributions of transmission routes for COVID-19 among healthcare personnel providing patient care. J Occup Environ Hyg. 2020; 17(9): 408-415.
    • 38. Morawska L, Cao JJ. Airborne transmission of SARS-CoV-2: the world should face the reality. Environ Int. 2020; 139:105730.
    • 39. Setti L, Passarini F, De Gennaro G, et al. Airborne transmission route of COVID-19: why 2 meters/6 feet of inter-personal distance could not be enough. Int J Environ Res Public Health. 2020; 17(8): 2932.
    • 40. Czeisler ME, Garcia-Williams AG, Molinari NA, et al. Demographic characteristics, experiences, and beliefs associated with hand hygiene among adults during the COVID-19 pandemic—United States, June 24–30, 2020. MMWR—Morb Mortal Wkly Rep. 2020; 69(41): 1485-1491.
    • 41. Lee CY, Wang PS, Huang YD, Lin YC, Hsu YN, Chen SC. Evacuation of quarantine-qualified nationals from Wuhan for COVID-19 outbreak—Taiwan experience. J Microbiol Immunol Infect. 2020; 53(3): 392-393.
    • 42. Peak CM, Kahn R, Grad YH, et al. Individual quarantine versus active monitoring of contacts for the mitigation of COVID-19: a modelling study. Lancet Infect Dis. 2020; 20(9): 1025-1033.
    • 43. Dang TNY. High-frequency words in academic spoken English: corpora and learners. ELT J. 2020; 74(2): 146-155.
    • 44. Kempen G, Harbusch K. Mutual attraction between high-frequency verbs and clause types with finite verbs in early positions: corpus evidence from spoken English, Dutch, and German. Lang Cogn Neurosci. 2019; 34(9): 1140-1151.
    • 45. Jelodar H, Wang YL, Orji R, Huang SC. Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach. IEEE J Biomed Health Inform. 2020; 24(10): 2733-2742.
    • 46. Chen LC, Chang KH, Chung HY. A novel statistic-based corpus machine processing approach to refine a big textual data: an ESP case of COVID-19 news reports. Appl Sci—Basel. 2020; 10(16): 5505.
    • 47. Lin HX, Wang XT, Huang ML, et al. Research hotspots and trends of bone defects based on Web of Science: a bibliometric analysis. J Orthop Surg Res. 2020; 15(1): 463.
    • 48. Carmona-Serrano N, Lopez-Belmonte J, Cuesta-Gomez JL, Moreno-Guerrero AJ. Documentary analysis of the scientific literature on autism and technology in Web of Science. Brain Sci. 2020; 10(12): 985.
    • 49. Li ZQ, Poon H, Chen W, Fan JT. A comparative analysis of textile schools by journal publications indexed in Web of Science (TM). J Text Inst. https://doi.org/10.1080/00405000.2020.1824434
