BASIC JOURNAL ANALYSIS

This notebook uses frozen metadata snapshots from the summer of 2022 to analyze the journal-level importance of EarthCube (EC) data.

It broadly aims to explore:

  1. the proportion of EC papers that have received more citations than the average paper for the journal/year;

  2. the cumulative citation count for this group and whether it is higher than the sum of the corresponding journal/year averages;

  3. the top 10 papers in terms of % above the average for their journal/year
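
The three questions above can be sketched on a toy table. The column names `ec_cites` and `journal_mean_cites` and every value below are made up for illustration; they are not taken from the notebook inputs.

```python
import pandas as pd

# Hypothetical toy data: one row per EC paper.
toy = pd.DataFrame({
    "doi": ["a", "b", "c", "d"],
    "ec_cites": [10, 2, 30, 5],
    "journal_mean_cites": [5, 4, 10, 5],
})

# 1. proportion of papers cited more than their journal/year average
above = toy[toy["ec_cites"] > toy["journal_mean_cites"]]
proportion_above = len(above) / len(toy)

# 2. cumulative citations of that group vs. the sum of their averages
cumulative_cites = above["ec_cites"].sum()
cumulative_expected = above["journal_mean_cites"].sum()

# 3. papers ranked by percent above the journal/year average
toy["pct_above"] = 100 * (toy["ec_cites"] / toy["journal_mean_cites"] - 1)
top = toy.sort_values("pct_above", ascending=False).head(10)

print(proportion_above, cumulative_cites, cumulative_expected)
print(top[["doi", "pct_above"]])
```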

Fixed inputs to this notebook are:

  • ../inputs/20220805_ec_journal_titles_plus_citations.xlsx: the journal citation data exported from Web of Science (WOS)

  • ../inputs/cr_metadata_20220610012125.json: a fixed snapshot metadata file extracted from Crossref on June 10, 2022.

NOTE: We use fixed input snapshots because both sources change and are updated at different times. Furthermore, WOS data requires a subscription, which makes dynamic replication difficult.
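
One way to confirm that a replication run uses the same frozen snapshots is to record a checksum for each input file. This is only a sketch: the helper `sha256_of` is hypothetical, and no reference digests are published in this notebook, so the code just prints whatever it finds.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Input paths as named in this notebook; the expected digests are not known
# here, so this only prints them for anyone who wants to pin the snapshot.
inputs = [
    Path("../inputs/20220805_ec_journal_titles_plus_citations.xlsx"),
    Path("../inputs/cr_metadata_20220610012125.json"),
]
for p in inputs:
    print(p.name, sha256_of(p) if p.exists() else "missing")
```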

import pandas as pd
import json
df = pd.read_excel("../inputs/20220805_ec_journal_titles_plus_citations.xlsx")
df
Unnamed: 0 doi journal_title publication_title url Year Journal_Avg_cit
0 15 10.5065/p2jj-9878 -- -- https://doi.org/10.5065/p2jj-9878 NaN NI
1 79 10.1594/ieda/100709 -- -- https://doi.org/10.1594/ieda/https://doi.org/1... NaN NI
2 81 10.5281/zenodo.5496306 -- -- https://doi.org/10.5281/zenodo.5496306 NaN NI
3 93 10.13140/rg.2.1.4908.4561 -- -- https://doi.org/10.13140/rg.2.1.4908.4561 NaN NI
4 109 10.6084/m9.figshare.4272164.v1 -- -- https://doi.org/10.6084/m9.figshare.4272164.v1 NaN NI
... ... ... ... ... ... ... ...
236 102 10.1111/tgis.12233 Transactions in GIS Crowdsensing smart ambient environments and se... https://doi.org/10.1111/tgis.12233 2016.0 13.08
237 219 10.1002/2015wr017342 Water Resources Research Hydrocomplexity: Addressing water security and... https://doi.org/10.https://doi.org/10.2/2015wr... 2015.0 40.97
238 27 10.22498/pages NaN Past Global Changes Magazine https://doi.org/10.22498/pages NaN NI
239 60 10.17504/protocols.io.fjjbkkn NaN ECOGEO 'Omics Training: Introduction to Enviro... https://doi.org/10.17504/protocols.io.fjjbkkn NaN NI
240 222 10.1101/647651 NaN Ecological and genomic attributes of novel bac... https://doi.org/10.1https://doi.org/10./647651 NaN NI

241 rows × 7 columns

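# no-op here: the 'NI' markers cannot be cast to float, so errors='ignore' returns the column unchanged
df['Journal_Avg_cit'] = df['Journal_Avg_cit'].astype(float, errors='ignore')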
df
Unnamed: 0 doi journal_title publication_title url Year Journal_Avg_cit
0 15 10.5065/p2jj-9878 -- -- https://doi.org/10.5065/p2jj-9878 NaN NI
1 79 10.1594/ieda/100709 -- -- https://doi.org/10.1594/ieda/https://doi.org/1... NaN NI
2 81 10.5281/zenodo.5496306 -- -- https://doi.org/10.5281/zenodo.5496306 NaN NI
3 93 10.13140/rg.2.1.4908.4561 -- -- https://doi.org/10.13140/rg.2.1.4908.4561 NaN NI
4 109 10.6084/m9.figshare.4272164.v1 -- -- https://doi.org/10.6084/m9.figshare.4272164.v1 NaN NI
... ... ... ... ... ... ... ...
236 102 10.1111/tgis.12233 Transactions in GIS Crowdsensing smart ambient environments and se... https://doi.org/10.1111/tgis.12233 2016.0 13.08
237 219 10.1002/2015wr017342 Water Resources Research Hydrocomplexity: Addressing water security and... https://doi.org/10.https://doi.org/10.2/2015wr... 2015.0 40.97
238 27 10.22498/pages NaN Past Global Changes Magazine https://doi.org/10.22498/pages NaN NI
239 60 10.17504/protocols.io.fjjbkkn NaN ECOGEO 'Omics Training: Introduction to Enviro... https://doi.org/10.17504/protocols.io.fjjbkkn NaN NI
240 222 10.1101/647651 NaN Ecological and genomic attributes of novel bac... https://doi.org/10.1https://doi.org/10./647651 NaN NI

241 rows × 7 columns
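
The output above still shows the "NI" marker in Journal_Avg_cit. An alternative, which is not what this notebook does, is `pd.to_numeric` with `errors="coerce"`, which converts non-numeric entries to NaN and yields a float column. The values below are a hypothetical stand-in for the real column.

```python
import pandas as pd

# Hypothetical values mimicking the column: floats plus the "NI" marker.
col = pd.Series([13.08, "NI", 40.97, "NI"], name="Journal_Avg_cit")

# errors="coerce" turns anything non-numeric into NaN.
numeric = pd.to_numeric(col, errors="coerce")

print(numeric.dtype)         # dtype of the converted column
print(numeric.isna().sum())  # number of entries that could not be parsed
```

With this approach the later `!= 'NI'` filters would become `notna()` checks, so it is a design choice rather than a drop-in change.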

df_subset = df[df['Journal_Avg_cit']!='NI']
df_subset[df_subset['Journal_Avg_cit']>30]
Unnamed: 0 doi journal_title publication_title url Year Journal_Avg_cit
45 134 10.1145/3129246 ACM Transactions on Database Systems EmptyHeaded https://doi.org/10.1145/3129246 2017.0 35.65
53 7 10.1175/bams-d-14-00164.1 Bulletin of the American Meteorological Society The Earth System Prediction Suite: Toward a Co... https://doi.org/10.1175/bams-d-14-00164.1 2016.0 30.85
54 65 10.1175/bams-d-15-00239.1 Bulletin of the American Meteorological Society Sharing Experiences and Outlook on Coupling Te... https://doi.org/10.1175/bams-d-15-00239.1 2016.0 30.85
70 147 10.1007/s40641-018-0107-0 Current Climate Change Reports Rising Oceans Guaranteed: Arctic Land Ice Loss... https://doi.org/10.https://doi.org/10.7/s40641... 2018.0 43
109 106 10.1186/s13073-015-0202-y Genome Medicine Use of semantic workflows to enhance transpare... https://doi.org/10.1186/s13073-015-0202-y 2015.0 42.88
110 234 10.1111/gfl.12114 Geofluids DigitalCrust - a 4D data system of material pr... https://doi.org/10.1111/gfl.12114 2015.0 30.45
133 82 10.1109/tgrs.2014.2382566 IEEE Transactions on Geoscience and Remote Sen... Regular Shape Similarity Index: A Novel Index ... https://doi.org/10.1https://doi.org/10./tgrs.2... 2015.0 43.05
178 225 10.1038/nbt.4306 Nature Biotechnology Minimum Information about an Uncultivated Viru... https://doi.org/10.https://doi.org/10.8/nbt.4306 2019.0 67.85
179 47 10.1038/s41561-018-0272-8 Nature Geoscience Similarity of fast and slow earthquakes illumi... https://doi.org/10.https://doi.org/10.8/s41561... 2019.0 39.7
206 92 10.1016/j.renene.2017.02.052 Renewable Energy Short-term photovoltaic power forecasting usin... https://doi.org/10.https://doi.org/10.6/j.rene... 2017.0 32.19
207 44 10.1126/science.aad7048 Science Liberating field science samples and data https://doi.org/10.1126/science.aad7048 2016.0 87
208 190 10.1126/science.342.6162.1041-b Science Open Data: Crediting a Culture of Cooperation https://doi.org/10.1126/science.342.6162.https... 2013.0 126.97
209 180 10.1038/sdata.2017.88 Scientific Data A global multiproxy database for temperature r... https://doi.org/10.https://doi.org/10.8/sdata.... 2017.0 39.61
237 219 10.1002/2015wr017342 Water Resources Research Hydrocomplexity: Addressing water security and... https://doi.org/10.https://doi.org/10.2/2015wr... 2015.0 40.97
df_json = pd.read_json("../inputs/cr_metadata_20220610012125.json").T
df_json.columns
Index(['indexed', 'reference-count', 'publisher', 'content-domain',
       'short-container-title', 'published-print', 'DOI', 'type', 'created',
       'source', 'is-referenced-by-count', 'title', 'prefix', 'author',
       'member', 'event', 'container-title', 'original-title', 'link',
       'deposited', 'score', 'resource', 'subtitle', 'short-title', 'issued',
       'references-count', 'URL', 'relation', 'published', 'issue', 'license',
       'funder', 'update-policy', 'volume', 'published-online', 'reference',
       'language', 'journal-issue', 'alternative-id', 'archive', 'ISSN',
       'issn-type', 'subject', 'assertion', 'abstract', 'page',
       'published-other', 'accepted', 'publisher-location', 'editor',
       'article-number', 'posted', 'subtype', 'isbn-type', 'ISBN',
       'institution', 'group-title'],
      dtype='object')
df_journals = df.merge(
    df_json[['DOI', 'is-referenced-by-count']].reset_index().drop('index',axis=1).rename(columns={'DOI': 'doi'}),
    on='doi'
)
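
The merge above uses the pandas default (an inner join), so any DOI missing from the Crossref snapshot is dropped without notice. A minimal sketch of how one might inspect the unmatched rows, using small hypothetical stand-in tables rather than the real inputs:

```python
import pandas as pd

# Hypothetical stand-ins for the WOS table and the Crossref table.
wos = pd.DataFrame({"doi": ["10.1/a", "10.1/b", "10.1/c"],
                    "Journal_Avg_cit": [13.08, "NI", 40.97]})
crossref = pd.DataFrame({"doi": ["10.1/a", "10.1/c", "10.1/d"],
                         "is-referenced-by-count": [16, 34, 2]})

# A left merge with indicator=True keeps every WOS row and labels its match status.
checked = wos.merge(crossref, on="doi", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]["doi"].tolist()

# The default inner merge keeps only DOIs present in both tables.
inner = wos.merge(crossref, on="doi")
print(unmatched, len(inner))
```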

Journal Citation Counts vs Actual Citation Counts

Here we turn to how well the actual citation counts (the Crossref is-referenced-by-count field) match the expected journal citation counts from WOS.

df_journal_cva = df_journals.query('Journal_Avg_cit != "NI"') \
    [["Journal_Avg_cit", "Year", "journal_title", "is-referenced-by-count", 'doi']]

df_journal_cva['Journal_Avg_cit'] = df_journal_cva['Journal_Avg_cit'].astype(int) # note: this truncates toward zero (e.g. 35.65 -> 35), it does not round
df_journal_cva['is-referenced-by-count'] = df_journal_cva['is-referenced-by-count'].astype(int)
df_journal_cva.describe()
Journal_Avg_cit Year is-referenced-by-count
count 150.000000 150.000000 150.000000
mean 14.960000 2018.500000 24.173333
std 15.534279 2.071701 47.443042
min 0.000000 2013.000000 0.000000
25% 5.000000 2017.000000 4.000000
50% 12.000000 2019.000000 10.000000
75% 20.750000 2020.000000 21.750000
max 126.000000 2022.000000 344.000000
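
The integer cast truncates rather than rounds, which is why the maximum above is 126 although the WOS value is 126.97. A small sketch of the difference, using three journal averages shown earlier in this notebook; `round()` is shown only for comparison and is not used by the notebook.

```python
import pandas as pd

# Three journal averages that appear in the WOS table earlier in the notebook.
avgs = pd.Series([13.08, 35.65, 126.97])

truncated = avgs.astype(int)          # drops the fractional part
rounded = avgs.round().astype(int)    # rounds to the nearest integer first

print(truncated.tolist())
print(rounded.tolist())
```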
df_journal_cva[df_journal_cva['is-referenced-by-count']>=df_journal_cva['Journal_Avg_cit']]
Journal_Avg_cit Year journal_title is-referenced-by-count doi
32 35 2017.0 ACM Transactions on Database Systems 43 10.1145/3129246
33 3 2018.0 AI Magazine 13 10.1609/aimag.v39i3.2816
34 0 2022.0 Applied Geochemistry 0 10.1016/j.apgeochem.2022.105273
38 9 2020.0 Biogeosciences 14 10.5194/bg-17-2537-2020
39 19 2018.0 BioScience 77 10.1093/biosci/biy068
... ... ... ... ... ...
211 9 2020.0 The Astrophysical Journal 22 10.3847/1538-4357/aba8a6
215 3 2021.0 The Astrophysical Journal 4 10.3847/1538-4357/abf2c8
217 0 2022.0 The Cryosphere 0 10.5194/tc-16-1431-2022
222 13 2016.0 Transactions in GIS 31 10.1111/tgis.12232
223 13 2016.0 Transactions in GIS 16 10.1111/tgis.12233

79 rows × 5 columns

df_journal_cva[['Journal_Avg_cit', 'Year', 'journal_title']].sort_values(by='journal_title')
Journal_Avg_cit Year journal_title
32 35 2017.0 ACM Transactions on Database Systems
33 3 2018.0 AI Magazine
34 0 2022.0 Applied Geochemistry
35 2 2021.0 Atmospheric Measurement Techniques
39 19 2018.0 BioScience
... ... ... ...
217 0 2022.0 The Cryosphere
223 13 2016.0 Transactions in GIS
222 13 2016.0 Transactions in GIS
224 40 2015.0 Water Resources Research
163 17 2019.0 mBio

150 rows × 3 columns

JOURNAL ANALYSIS

  • what is the average journal citation count (over all years)

  • grouping by journal what are the paper averages

journals = df_journal_cva['journal_title'].unique()
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean()
Journal_Avg_cit
journal_title
ACM Transactions on Database Systems 35.0
AI Magazine 3.0
Applied Geochemistry 0.0
Atmospheric Measurement Techniques 2.0
BioScience 19.0
... ...
The Astrophysical Journal Supplement Series 17.0
The Cryosphere 0.0
Transactions in GIS 13.0
Water Resources Research 40.0
mBio 17.0

89 rows × 1 columns

df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').count().sort_values(by='Journal_Avg_cit', ascending=False)
Journal_Avg_cit
journal_title
The Astrophysical Journal 12
Environmental Modelling & Software 8
ISPRS International Journal of Geo-Information 6
Computers & Geosciences 5
Journal of Geophysical Research: Space Physics 5
... ...
International Journal of Digital Earth 1
International Journal of Remote Sensing 1
International Journal of Semantic Computing 1
Journal of Atmospheric and Oceanic Technology 1
mBio 1

89 rows × 1 columns

df_journal_density_counts = \
    df_journal_cva[['Journal_Avg_cit', 'journal_title']]\
    .groupby('journal_title').count()\
    .sort_values(by='Journal_Avg_cit', ascending=False)
df_journal_density_counts = df_journal_density_counts.rename(columns={'Journal_Avg_cit': 'paper_counts'})

df_journal_density_counts
paper_counts
journal_title
The Astrophysical Journal 12
Environmental Modelling & Software 8
ISPRS International Journal of Geo-Information 6
Computers & Geosciences 5
Journal of Geophysical Research: Space Physics 5
... ...
International Journal of Digital Earth 1
International Journal of Remote Sensing 1
International Journal of Semantic Computing 1
Journal of Atmospheric and Oceanic Technology 1
mBio 1

89 rows × 1 columns
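
The per-journal paper counts above come from a groupby followed by `count()`. An equivalent and shorter route is `Series.value_counts()`; the sketch below uses hypothetical journal titles to show how the counts split journals into those with two or more papers and those with exactly one.

```python
import pandas as pd

# Hypothetical journal titles, one per paper.
papers = pd.DataFrame({"journal_title": ["A", "B", "A", "C", "A", "B"]})

# value_counts() counts rows per title and sorts in descending order.
paper_counts = papers["journal_title"].value_counts().rename("paper_counts")

multi = paper_counts[paper_counts > 1].index.tolist()   # journals with 2+ papers
single = paper_counts[paper_counts < 2].index.tolist()  # journals with exactly 1
print(paper_counts.to_dict(), multi, single)
```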

df_journal_density_means = \
    df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean().sort_values(by='Journal_Avg_cit', ascending=False)

df_journal_density_means = df_journal_density_means.rename(columns={'Journal_Avg_cit': 'mean_cites'})

df_journal_density_means
mean_cites
journal_title
Science 106.5
Nature Biotechnology 67.0
Current Climate Change Reports 43.0
IEEE Transactions on Geoscience and Remote Sensing 43.0
Genome Medicine 42.0
... ...
Engaging Science, Technology, and Society 0.0
Remote Sensing in Ecology and Conservation 0.0
The Cryosphere 0.0
Data in Brief 0.0
Applied Geochemistry 0.0

89 rows × 1 columns

For all journals with 2 or more papers, how did they do:

df_journal_cva.columns
Index(['Journal_Avg_cit', 'Year', 'journal_title', 'is-referenced-by-count',
       'doi'],
      dtype='object')

JOURNALS WITH MORE THAN 1 PUBLICATION

journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']>1].index.to_list()

# numeric_only=True keeps the numeric columns and skips 'doi'
# (required in newer pandas, where the default no longer drops non-numeric columns)
df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
        .groupby('journal_title').mean(numeric_only=True) \
        .drop(columns=['Year']) \
        .merge(df_journal_density_counts, left_index=True, right_index=True) \
        .sort_values('paper_counts', ascending=False) \
        .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})

df_journal_ec_mean_analysis.describe()
journal_mean_cites ec_mean_cites paper_counts
count 27.000000 27.000000 27.000000
mean 15.389198 22.391975 3.259259
std 19.555807 33.431244 2.297031
min 1.000000 0.500000 2.000000
25% 7.450000 8.600000 2.000000
50% 12.000000 11.500000 2.000000
75% 17.062500 23.875000 3.000000
max 106.500000 175.000000 12.000000
df_journal_ec_mean_analysis.paper_counts.sum()
88
  • 88 of 150 papers (58.67%) appear in journals with 2 or more EC papers (27 distinct journals)

df_journal_ec_mean_analysis
journal_mean_cites ec_mean_cites paper_counts
journal_title
The Astrophysical Journal 10.833333 9.166667 12
Environmental Modelling & Software 17.125000 11.500000 8
ISPRS International Journal of Geo-Information 9.166667 10.500000 6
Journal of Geophysical Research: Space Physics 7.400000 10.800000 5
Computers & Geosciences 8.400000 8.200000 5
JAWRA Journal of the American Water Resources Association 13.750000 24.250000 4
Bulletin of the American Meteorological Society 28.333333 34.666667 3
Concurrency and Computation: Practice and Experience 1.000000 2.000000 3
Earth and Space Science 15.333333 34.000000 3
Journal of Proteome Research 7.666667 17.666667 3
Future Generation Computer Systems 17.666667 7.666667 3
Journal of Geophysical Research: Solid Earth 5.333333 4.666667 3
Semantic Web 12.500000 13.000000 2
Science 106.500000 30.000000 2
Remote Sensing 13.500000 4.000000 2
Journal of Hydroinformatics 12.000000 10.000000 2
Geophysical Research Letters 17.000000 175.000000 2
GigaScience 20.000000 17.500000 2
Geosphere 5.000000 6.000000 2
Geophysical Journal International 6.000000 10.500000 2
Geology 22.000000 41.500000 2
Frontiers in Microbiology 8.500000 11.500000 2
Frontiers in Astronomy and Space Sciences 2.000000 0.500000 2
Earth Science Informatics 7.500000 16.500000 2
Computing in Science & Engineering 1.000000 9.000000 2
Computers, Environment and Urban Systems 27.000000 61.000000 2
Transactions in GIS 13.000000 23.500000 2
df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
47
  • 47 of 88 papers (53.4%) are in journals whose ec_mean_cites is greater than or equal to journal_mean_cites (across all years, for journals with more than 1 EC publication)
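
Note that this count is made at the journal level: every paper in a journal is counted once the journal's EC mean meets the journal mean, even if a single highly cited paper carries that mean. The sketch below, on hypothetical data, contrasts that with a paper-level count.

```python
import pandas as pd

# Hypothetical per-paper table for two journals.
papers = pd.DataFrame({
    "journal_title": ["A", "A", "A", "B", "B"],
    "journal_mean_cites": [10, 10, 10, 20, 20],
    "ec_cites": [40, 2, 3, 5, 6],
})

per_journal = papers.groupby("journal_title").agg(
    journal_mean_cites=("journal_mean_cites", "mean"),
    ec_mean_cites=("ec_cites", "mean"),
    paper_counts=("ec_cites", "count"),
)

# Journal-level count: every paper in a journal whose EC mean meets the journal mean.
winners = per_journal[per_journal["ec_mean_cites"] >= per_journal["journal_mean_cites"]]
papers_in_winning_journals = int(winners["paper_counts"].sum())

# Paper-level count: only papers that individually meet the journal mean.
papers_meeting_mean = int((papers["ec_cites"] >= papers["journal_mean_cites"]).sum())
print(papers_in_winning_journals, papers_meeting_mean)
```

In this toy case journal A contributes all three of its papers to the journal-level count, although only one of them individually meets the journal mean.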

JOURNALS WITH ONLY 1 PUBLICATION

journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']<2].index.to_list()

# numeric_only=True keeps the numeric columns and skips 'doi'
# (required in newer pandas, where the default no longer drops non-numeric columns)
df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
        .groupby('journal_title').mean(numeric_only=True) \
        .drop(columns=['Year']) \
        .merge(df_journal_density_counts, left_index=True, right_index=True) \
        .sort_values('paper_counts', ascending=False) \
        .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})

df_journal_ec_mean_analysis.describe()
journal_mean_cites ec_mean_cites paper_counts
count 62.000000 62.000000 62.0
mean 16.370968 32.387097 1.0
std 13.698431 56.452027 0.0
min 0.000000 0.000000 1.0
25% 6.250000 3.250000 1.0
50% 12.500000 13.500000 1.0
75% 23.000000 38.500000 1.0
max 67.000000 344.000000 1.0
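  • 62 of 150 papers are the only EC publication in their journal

  • mean of journal averages = 16.37, mean of EarthCube paper citations = 32.39

  • the standard deviations differ sharply (13.70 for the journal averages vs 56.45 for the EC papers), so EC citation counts vary far more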

df_journal_ec_mean_analysis
journal_mean_cites ec_mean_cites paper_counts
journal_title
ACM Transactions on Database Systems 35.0 43.0 1
PLoS ONE 26.0 54.0 1
Journal of Climate 14.0 12.0 1
Journal of Environmental Quality 21.0 73.0 1
Journal of Sedimentary Research 10.0 4.0 1
... ... ... ...
Hydrological Processes 21.0 16.0 1
IEEE Systems Journal 25.0 5.0 1
IEEE Transactions on Geoscience and Remote Sensing 43.0 23.0 1
IEEE Transactions on Parallel and Distributed Systems 24.0 2.0 1
mBio 17.0 55.0 1

62 rows × 3 columns

df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
32
df_journal_ec_mean_analysis[ 
    df_journal_ec_mean_analysis.ec_mean_cites>df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
26
  • 26 of 62 (41.9%) EC papers received more citations than the journal mean

  • 32 of 62 (51.6%) EC papers received at least as many citations as the journal mean
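
The gap between the two counts (32 vs 26) is the number of exact ties, 6 papers. Ties become more likely because the journal averages were truncated to integers, and they include 0 versus 0 rows such as the 2022 papers seen earlier. A toy sketch with made-up values shows how the strict and non-strict comparisons diverge:

```python
import pandas as pd

# Hypothetical values; the first and last rows are exact ties.
toy = pd.DataFrame({
    "ec_cites": [0, 12, 43, 5],
    "journal_mean_cites": [0, 14, 35, 5],
})

at_least = int((toy["ec_cites"] >= toy["journal_mean_cites"]).sum())
strictly_more = int((toy["ec_cites"] > toy["journal_mean_cites"]).sum())
ties = at_least - strictly_more
print(at_least, strictly_more, ties)
```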

  • the top 10 papers in terms of % above the average for their journal/year

df_journal_cva
Journal_Avg_cit Year journal_title is-referenced-by-count doi
32 35 2017.0 ACM Transactions on Database Systems 43 10.1145/3129246
33 3 2018.0 AI Magazine 13 10.1609/aimag.v39i3.2816
34 0 2022.0 Applied Geochemistry 0 10.1016/j.apgeochem.2022.105273
35 2 2021.0 Atmospheric Measurement Techniques 0 10.5194/amt-14-6917-2021
38 9 2020.0 Biogeosciences 14 10.5194/bg-17-2537-2020
... ... ... ... ... ...
216 17 2020.0 The Astrophysical Journal Supplement Series 2 10.3847/1538-4365/aba4aa
217 0 2022.0 The Cryosphere 0 10.5194/tc-16-1431-2022
222 13 2016.0 Transactions in GIS 31 10.1111/tgis.12232
223 13 2016.0 Transactions in GIS 16 10.1111/tgis.12233
224 40 2015.0 Water Resources Research 34 10.1002/2015wr017342

150 rows × 5 columns

df_journal_cva_reindexed = df_journal_cva.rename(columns={
    'Journal_Avg_cit': 'journal_mean_cites',
    'Year': 'year',
    'is-referenced-by-count': 'ec_cites'
    }).reset_index().drop('index', axis=1)
df_journal_cva_reindexed
journal_mean_cites year journal_title ec_cites doi
0 35 2017.0 ACM Transactions on Database Systems 43 10.1145/3129246
1 3 2018.0 AI Magazine 13 10.1609/aimag.v39i3.2816
2 0 2022.0 Applied Geochemistry 0 10.1016/j.apgeochem.2022.105273
3 2 2021.0 Atmospheric Measurement Techniques 0 10.5194/amt-14-6917-2021
4 9 2020.0 Biogeosciences 14 10.5194/bg-17-2537-2020
... ... ... ... ... ...
145 17 2020.0 The Astrophysical Journal Supplement Series 2 10.3847/1538-4365/aba4aa
146 0 2022.0 The Cryosphere 0 10.5194/tc-16-1431-2022
147 13 2016.0 Transactions in GIS 31 10.1111/tgis.12232
148 13 2016.0 Transactions in GIS 16 10.1111/tgis.12233
149 40 2015.0 Water Resources Research 34 10.1002/2015wr017342

150 rows × 5 columns

# filter on items that ec beats (or is equal to) the journal mean
df_journal_cva_reindexed[
    df_journal_cva_reindexed.ec_cites>=df_journal_cva_reindexed.journal_mean_cites
]

# cast year as int
df_journal_cva_reindexed.year = df_journal_cva_reindexed.year.astype(int)

# percent difference from the journal mean (a ratio of 200% becomes "100% more")
df_journal_cva_reindexed['ec_pct_diff'] = \
    100*((df_journal_cva_reindexed.ec_cites / df_journal_cva_reindexed.journal_mean_cites) - 1)
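
As a worked example of the percent-difference formula, the scalar helper below (a hypothetical `pct_diff`, not part of the notebook) reproduces the 1923.53% reported for a paper with 344 citations in a journal averaging 17. Returning NaN for a zero mean is a choice made for this sketch; the vectorized pandas expression instead yields inf or NaN for such rows, which is why they have to be filtered out before ranking.

```python
import math

def pct_diff(ec_cites, journal_mean_cites):
    """Percent by which ec_cites exceeds journal_mean_cites; NaN when the mean is 0."""
    if journal_mean_cites == 0:
        return math.nan
    return 100 * (ec_cites / journal_mean_cites - 1)

print(round(pct_diff(344, 17), 2))  # paper with 344 cites, journal mean 17
print(pct_diff(10, 10))             # equal to the mean
print(pct_diff(34, 40))             # below the mean gives a negative value
```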
  • top 20 EarthCube papers by pct diff from journal mean

df_tmp = df_journal_cva_reindexed.sort_values(by='ec_pct_diff', ascending=False) \
    .query("journal_mean_cites > 0")[:20]
# df_tmp.doi = df_tmp.doi.apply(lambda d: f"[{d}](https://doi.org/{d})")
df_tmp.columns = ['Journal Mean Cites', 'Publication Year', 'Journal Title', 'EC Cites', 'EC Publication DOI', '% diff mean cites']
df_tmp.to_csv("../outputs/ec_journals_top_pct_delta.csv")
df_tmp
Journal Mean Cites Publication Year Journal Title EC Cites EC Publication DOI % diff mean cites
74 17 2017 International Journal of Digital Earth 344 10.1080/17538947.2016.1239771 1923.529412
61 25 2017 Geophysical Research Letters 336 10.1002/2017gl074954 1244.000000
117 10 2018 Quaternary Research 120 10.1017/qua.2017.105 1100.000000
18 1 2021 Computing in Science &amp; Engineering 10 10.1109/mcse.2021.3059437 900.000000
10 4 2019 Communications of the ACM 39 10.1145/3192335 875.000000
19 1 2021 Computing in Science &amp; Engineering 8 10.1109/mcse.2021.3059263 700.000000
68 4 2021 GSA Bulletin 30 10.1130/b35560.1 650.000000
75 8 2019 International Journal of Remote Sensing 49 10.1080/01431161.2018.1516313 512.500000
28 15 2016 Earth and Space Science 91 10.1002/2015ea000136 506.666667
85 13 2017 JAWRA Journal of the American Water Resources ... 72 10.1111/1752-1688.12474 453.846154
106 5 2017 Journal of Visualized Experiments 25 10.3791/54660 400.000000
25 8 2020 Data Intelligence 40 10.1162/dint_a_00033 400.000000
1 3 2018 AI Magazine 13 10.1609/aimag.v39i3.2816 333.333333
125 39 2017 Scientific Data 167 10.1038/sdata.2017.88 328.205128
5 19 2018 BioScience 77 10.1093/biosci/biy068 305.263158
60 1 2021 Geophysical Journal International 4 10.1093/gji/ggab238 300.000000
17 27 2017 Computers, Environment and Urban Systems 108 10.1016/j.compenvurbsys.2016.10.010 300.000000
20 1 2021 Concurrency and Computation: Practice and Expe... 4 10.1002/cpe.6099 300.000000
122 32 2017 Renewable Energy 127 10.1016/j.renene.2017.02.052 296.875000
100 12 2019 Journal of Proteome Research 44 10.1021/acs.jproteome.8b00761 266.666667
df_citations = pd.read_csv("../outputs/full_nsf_doi_project_summary.tsv", sep='\t')[['doi', 'ams_bib']]
print("""
|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|""")
for i, r in df_tmp[:10].iterrows():
    doi = df_citations.query(f"doi == '{r['EC Publication DOI']}'").drop_duplicates()['ams_bib'].unique()[0]
    print(f"{doi} | {r['EC Cites']} | {r['Journal Mean Cites']} | {r['% diff mean cites']:.2f}% |")
|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|
Yang, C., Q. Huang, Z. Li, K. Liu, and F. Hu, 2016: Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10, 13–53, https://doi.org/10.1080/17538947.2016.1239771. | 344 | 17 | 1923.53% |
Morlighem, M., and Coauthors, 2017: BedMachine v3: Complete Bed Topography and Ocean Bathymetry Mapping of Greenland From Multibeam Echo Sounding Combined With Mass Conservation. Geophysical Research Letters, 44, https://doi.org/10.1002/2017gl074954. | 336 | 25 | 1244.00% |
Williams, J. W., and Coauthors, 2018: The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource. Quaternary Research, 89, 156–177, https://doi.org/10.1017/qua.2017.105. | 120 | 10 | 1100.00% |
Abernathey, R. P., and Coauthors, 2021: Cloud-Native Repositories for Big Scientific Data. Computing in Science &amp; Engineering, 23, 26–35, https://doi.org/10.1109/mcse.2021.3059437. | 10 | 1 | 900.00% |
Gil, Y., and Coauthors, 2018: Intelligent systems for geosciences. Communications of the ACM, 62, 76–84, https://doi.org/10.1145/3192335. | 39 | 4 | 875.00% |
Granger, B. E., and F. Perez, 2021: Jupyter: Thinking and Storytelling With Code and Data. Computing in Science &amp; Engineering, 23, 7–14, https://doi.org/10.1109/mcse.2021.3059263. | 8 | 1 | 700.00% |
Schaen, A. J., and Coauthors, 2020: Interpreting and reporting 40Ar/39Ar geochronologic data. GSA Bulletin, 133, 461–487, https://doi.org/10.1130/b35560.1. | 30 | 4 | 650.00% |
Sun, Z., L. Di, and H. Fang, 2018: Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. International Journal of Remote Sensing, 40, 593–614, https://doi.org/10.1080/01431161.2018.1516313. | 49 | 8 | 512.50% |
Gil, Y., and Coauthors, 2016: Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science, 3, 388–415, https://doi.org/10.1002/2015ea000136. | 91 | 15 | 506.67% |
Maidment, D. R., 2016: Conceptual Framework for the National Flood Interoperability Experiment. JAWRA Journal of the American Water Resources Association, 53, 245–257, https://doi.org/10.1111/1752-1688.12474. | 72 | 13 | 453.85% |

| Citation | EC Cites | Journal Mean Cites | % diff mean cites |
|---:|:--:|:--:|:--:|
| Yang, C., Q. Huang, Z. Li, K. Liu, and F. Hu, 2016: Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10, 13–53, https://doi.org/10.1080/17538947.2016.1239771. | 344 | 17 | 1923.53% |
| Morlighem, M., and Coauthors, 2017: BedMachine v3: Complete Bed Topography and Ocean Bathymetry Mapping of Greenland From Multibeam Echo Sounding Combined With Mass Conservation. Geophysical Research Letters, 44, https://doi.org/10.1002/2017gl074954. | 336 | 25 | 1244.00% |
| Williams, J. W., and Coauthors, 2018: The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource. Quaternary Research, 89, 156–177, https://doi.org/10.1017/qua.2017.105. | 120 | 10 | 1100.00% |
| Abernathey, R. P., and Coauthors, 2021: Cloud-Native Repositories for Big Scientific Data. Computing in Science & Engineering, 23, 26–35, https://doi.org/10.1109/mcse.2021.3059437. | 10 | 1 | 900.00% |
| Gil, Y., and Coauthors, 2018: Intelligent systems for geosciences. Communications of the ACM, 62, 76–84, https://doi.org/10.1145/3192335. | 39 | 4 | 875.00% |
| Granger, B. E., and F. Perez, 2021: Jupyter: Thinking and Storytelling With Code and Data. Computing in Science & Engineering, 23, 7–14, https://doi.org/10.1109/mcse.2021.3059263. | 8 | 1 | 700.00% |
| Schaen, A. J., and Coauthors, 2020: Interpreting and reporting 40Ar/39Ar geochronologic data. GSA Bulletin, 133, 461–487, https://doi.org/10.1130/b35560.1. | 30 | 4 | 650.00% |
| Sun, Z., L. Di, and H. Fang, 2018: Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. International Journal of Remote Sensing, 40, 593–614, https://doi.org/10.1080/01431161.2018.1516313. | 49 | 8 | 512.50% |
| Gil, Y., and Coauthors, 2016: Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science, 3, 388–415, https://doi.org/10.1002/2015ea000136. | 91 | 15 | 506.67% |
| Maidment, D. R., 2016: Conceptual Framework for the National Flood Interoperability Experiment. JAWRA Journal of the American Water Resources Association, 53, 245–257, https://doi.org/10.1111/1752-1688.12474. | 72 | 13 | 453.85% |
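
The table above is assembled by printing rows by hand. A small helper can keep the pipes consistent on every line; `markdown_table` below is a hypothetical function written for this sketch, and the two example rows reuse values from the table with the citation text shortened to the DOI link.

```python
def markdown_table(headers, rows):
    """Build a pipe table with a leading and trailing pipe on every line."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "|".join([":--:"] * len(headers)) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

# Two rows taken from the table above (citation text shortened to the DOI link).
rows = [
    ("https://doi.org/10.1080/17538947.2016.1239771", 344, 17, "1923.53%"),
    ("https://doi.org/10.1002/2017gl074954", 336, 25, "1244.00%"),
]
print(markdown_table(["Citation", "EC Cites", "Journal Mean Cites", "% diff mean cites"], rows))
```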

TAKEAWAYS

  • journal average citation: 14.96 (WOS journal/year averages, truncated to integers)

  • average citation of EC papers: 24.17 (from the Crossref is-referenced-by-count field)

  • EC papers met or beat the journal average 79 of 150 times (52.7% of the time)