BASIC JOURNAL ANALYSIS
The notebook uses frozen static metadata files from the summer of 2022 to analyze the journal-level importance of EarthCube data.
It broadly aims to explore:
the proportion of EC papers that have received more citations than the average paper for the journal/year;
the cumulative citation count for this group and whether it is higher than the sum of the corresponding journal/year averages;
the top 10 papers in terms of % above the average for their journal/year
Fixed inputs to this notebook are:
../inputs/20220805_ec_journal_titles_plus_citations.xlsx: the journal citation exported data from Web of Science
../inputs/cr_metadata_20220610012125.json: a fixed snapshot metadata file extracted from Crossref on June 10, 2022.
NOTE: We use fixed input snapshots because both sources change and update at different times. Furthermore, WOS data requires a subscription, which makes dynamic replication difficult.
import pandas as pd
import json
df = pd.read_excel("../inputs/20220805_ec_journal_titles_plus_citations.xlsx")
df
Unnamed: 0 | doi | journal_title | publication_title | url | Year | Journal_Avg_cit | |
---|---|---|---|---|---|---|---|
0 | 15 | 10.5065/p2jj-9878 | -- | -- | https://doi.org/10.5065/p2jj-9878 | NaN | NI |
1 | 79 | 10.1594/ieda/100709 | -- | -- | https://doi.org/10.1594/ieda/https://doi.org/1... | NaN | NI |
2 | 81 | 10.5281/zenodo.5496306 | -- | -- | https://doi.org/10.5281/zenodo.5496306 | NaN | NI |
3 | 93 | 10.13140/rg.2.1.4908.4561 | -- | -- | https://doi.org/10.13140/rg.2.1.4908.4561 | NaN | NI |
4 | 109 | 10.6084/m9.figshare.4272164.v1 | -- | -- | https://doi.org/10.6084/m9.figshare.4272164.v1 | NaN | NI |
... | ... | ... | ... | ... | ... | ... | ... |
236 | 102 | 10.1111/tgis.12233 | Transactions in GIS | Crowdsensing smart ambient environments and se... | https://doi.org/10.1111/tgis.12233 | 2016.0 | 13.08 |
237 | 219 | 10.1002/2015wr017342 | Water Resources Research | Hydrocomplexity: Addressing water security and... | https://doi.org/10.https://doi.org/10.2/2015wr... | 2015.0 | 40.97 |
238 | 27 | 10.22498/pages | NaN | Past Global Changes Magazine | https://doi.org/10.22498/pages | NaN | NI |
239 | 60 | 10.17504/protocols.io.fjjbkkn | NaN | ECOGEO 'Omics Training: Introduction to Enviro... | https://doi.org/10.17504/protocols.io.fjjbkkn | NaN | NI |
240 | 222 | 10.1101/647651 | NaN | Ecological and genomic attributes of novel bac... | https://doi.org/10.1https://doi.org/10./647651 | NaN | NI |
241 rows × 7 columns
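# NOTE: the 'NI' placeholders cannot be cast to float, so errors='ignore' returns the column unchanged
df['Journal_Avg_cit'] = df['Journal_Avg_cit'].astype(float, errors='ignore')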
df
Unnamed: 0 | doi | journal_title | publication_title | url | Year | Journal_Avg_cit | |
---|---|---|---|---|---|---|---|
0 | 15 | 10.5065/p2jj-9878 | -- | -- | https://doi.org/10.5065/p2jj-9878 | NaN | NI |
1 | 79 | 10.1594/ieda/100709 | -- | -- | https://doi.org/10.1594/ieda/https://doi.org/1... | NaN | NI |
2 | 81 | 10.5281/zenodo.5496306 | -- | -- | https://doi.org/10.5281/zenodo.5496306 | NaN | NI |
3 | 93 | 10.13140/rg.2.1.4908.4561 | -- | -- | https://doi.org/10.13140/rg.2.1.4908.4561 | NaN | NI |
4 | 109 | 10.6084/m9.figshare.4272164.v1 | -- | -- | https://doi.org/10.6084/m9.figshare.4272164.v1 | NaN | NI |
... | ... | ... | ... | ... | ... | ... | ... |
236 | 102 | 10.1111/tgis.12233 | Transactions in GIS | Crowdsensing smart ambient environments and se... | https://doi.org/10.1111/tgis.12233 | 2016.0 | 13.08 |
237 | 219 | 10.1002/2015wr017342 | Water Resources Research | Hydrocomplexity: Addressing water security and... | https://doi.org/10.https://doi.org/10.2/2015wr... | 2015.0 | 40.97 |
238 | 27 | 10.22498/pages | NaN | Past Global Changes Magazine | https://doi.org/10.22498/pages | NaN | NI |
239 | 60 | 10.17504/protocols.io.fjjbkkn | NaN | ECOGEO 'Omics Training: Introduction to Enviro... | https://doi.org/10.17504/protocols.io.fjjbkkn | NaN | NI |
240 | 222 | 10.1101/647651 | NaN | Ecological and genomic attributes of novel bac... | https://doi.org/10.1https://doi.org/10./647651 | NaN | NI |
241 rows × 7 columns
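The table above is identical to the previous one because the column still holds the NI placeholder strings, so the cast is skipped. A minimal sketch of one alternative follows; the notebook itself filters on the NI string instead. It assumes a toy series (the name raw and its values are illustrative, not read from the input file) and uses pd.to_numeric with errors="coerce", which turns non-numeric entries into NaN so the column becomes float.

```python
import pandas as pd

# Toy stand-in for the Journal_Avg_cit column: numbers mixed with the "NI" placeholder.
raw = pd.Series([13.08, "NI", 40.97], name="Journal_Avg_cit")

# errors="coerce" converts anything non-numeric (here "NI") to NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Rows with a usable journal average can then be selected with notna().
usable = numeric[numeric.notna()]
print(numeric.dtype, usable.tolist())
```

With this approach the later filters would test for NaN rather than for the string NI.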
df_subset = df[df['Journal_Avg_cit']!='NI']
df_subset[df_subset['Journal_Avg_cit']>30]
Unnamed: 0 | doi | journal_title | publication_title | url | Year | Journal_Avg_cit | |
---|---|---|---|---|---|---|---|
45 | 134 | 10.1145/3129246 | ACM Transactions on Database Systems | EmptyHeaded | https://doi.org/10.1145/3129246 | 2017.0 | 35.65 |
53 | 7 | 10.1175/bams-d-14-00164.1 | Bulletin of the American Meteorological Society | The Earth System Prediction Suite: Toward a Co... | https://doi.org/10.1175/bams-d-14-00164.1 | 2016.0 | 30.85 |
54 | 65 | 10.1175/bams-d-15-00239.1 | Bulletin of the American Meteorological Society | Sharing Experiences and Outlook on Coupling Te... | https://doi.org/10.1175/bams-d-15-00239.1 | 2016.0 | 30.85 |
70 | 147 | 10.1007/s40641-018-0107-0 | Current Climate Change Reports | Rising Oceans Guaranteed: Arctic Land Ice Loss... | https://doi.org/10.https://doi.org/10.7/s40641... | 2018.0 | 43 |
109 | 106 | 10.1186/s13073-015-0202-y | Genome Medicine | Use of semantic workflows to enhance transpare... | https://doi.org/10.1186/s13073-015-0202-y | 2015.0 | 42.88 |
110 | 234 | 10.1111/gfl.12114 | Geofluids | DigitalCrust - a 4D data system of material pr... | https://doi.org/10.1111/gfl.12114 | 2015.0 | 30.45 |
133 | 82 | 10.1109/tgrs.2014.2382566 | IEEE Transactions on Geoscience and Remote Sen... | Regular Shape Similarity Index: A Novel Index ... | https://doi.org/10.1https://doi.org/10./tgrs.2... | 2015.0 | 43.05 |
178 | 225 | 10.1038/nbt.4306 | Nature Biotechnology | Minimum Information about an Uncultivated Viru... | https://doi.org/10.https://doi.org/10.8/nbt.4306 | 2019.0 | 67.85 |
179 | 47 | 10.1038/s41561-018-0272-8 | Nature Geoscience | Similarity of fast and slow earthquakes illumi... | https://doi.org/10.https://doi.org/10.8/s41561... | 2019.0 | 39.7 |
206 | 92 | 10.1016/j.renene.2017.02.052 | Renewable Energy | Short-term photovoltaic power forecasting usin... | https://doi.org/10.https://doi.org/10.6/j.rene... | 2017.0 | 32.19 |
207 | 44 | 10.1126/science.aad7048 | Science | Liberating field science samples and data | https://doi.org/10.1126/science.aad7048 | 2016.0 | 87 |
208 | 190 | 10.1126/science.342.6162.1041-b | Science | Open Data: Crediting a Culture of Cooperation | https://doi.org/10.1126/science.342.6162.https... | 2013.0 | 126.97 |
209 | 180 | 10.1038/sdata.2017.88 | Scientific Data | A global multiproxy database for temperature r... | https://doi.org/10.https://doi.org/10.8/sdata.... | 2017.0 | 39.61 |
237 | 219 | 10.1002/2015wr017342 | Water Resources Research | Hydrocomplexity: Addressing water security and... | https://doi.org/10.https://doi.org/10.2/2015wr... | 2015.0 | 40.97 |
df_json = pd.read_json("../inputs/cr_metadata_20220610012125.json").T
df_json.columns
Index(['indexed', 'reference-count', 'publisher', 'content-domain',
'short-container-title', 'published-print', 'DOI', 'type', 'created',
'source', 'is-referenced-by-count', 'title', 'prefix', 'author',
'member', 'event', 'container-title', 'original-title', 'link',
'deposited', 'score', 'resource', 'subtitle', 'short-title', 'issued',
'references-count', 'URL', 'relation', 'published', 'issue', 'license',
'funder', 'update-policy', 'volume', 'published-online', 'reference',
'language', 'journal-issue', 'alternative-id', 'archive', 'ISSN',
'issn-type', 'subject', 'assertion', 'abstract', 'page',
'published-other', 'accepted', 'publisher-location', 'editor',
'article-number', 'posted', 'subtype', 'isbn-type', 'ISBN',
'institution', 'group-title'],
dtype='object')
df_journals = df.merge(
    df_json[['DOI', 'is-referenced-by-count']].reset_index(drop=True).rename(columns={'DOI': 'doi'}),
    on='doi'
)
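merge defaults to an inner join, so any row of df whose doi has no exact string match in the Crossref snapshot is dropped without notice. The sketch below, on made-up DOIs, shows one way to audit that with a left join and indicator=True. The frames wos and cr are hypothetical stand-ins, and lower-casing the Crossref DOI is an assumption about the snapshot (DOIs are case-insensitive, but string joins are not).

```python
import pandas as pd

# Hypothetical stand-ins for df (WOS export) and df_json (Crossref snapshot).
wos = pd.DataFrame({"doi": ["10.1000/a", "10.1000/b", "10.1000/c"]})
cr = pd.DataFrame({
    "DOI": ["10.1000/A", "10.1000/b"],
    "is-referenced-by-count": [5, 12],
})

# Normalise case before joining (assumption: the snapshot may use mixed case).
cr_norm = cr.rename(columns={"DOI": "doi"})
cr_norm["doi"] = cr_norm["doi"].str.lower()

# A left join with indicator=True keeps every WOS row and labels its match status.
audit = wos.merge(cr_norm, on="doi", how="left", indicator=True)
unmatched = audit.loc[audit["_merge"] == "left_only", "doi"].tolist()
print(unmatched)
```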
Journal Citation Counts vs Actual Citation Counts
Here we examine how well the actual citation counts (Crossref is-referenced-by-count) match the expected journal citation counts from WOS.
# .copy() so the column assignments below modify an independent frame
df_journal_cva = df_journals.query('Journal_Avg_cit != "NI"') \
    [["Journal_Avg_cit", "Year", "journal_title", "is-referenced-by-count", 'doi']].copy()
# note: astype(int) truncates toward zero (e.g. 40.97 -> 40); it does not round
df_journal_cva['Journal_Avg_cit'] = df_journal_cva['Journal_Avg_cit'].astype(int)
df_journal_cva['is-referenced-by-count'] = df_journal_cva['is-referenced-by-count'].astype(int)
df_journal_cva.describe()
Journal_Avg_cit | Year | is-referenced-by-count | |
---|---|---|---|
count | 150.000000 | 150.000000 | 150.000000 |
mean | 14.960000 | 2018.500000 | 24.173333 |
std | 15.534279 | 2.071701 | 47.443042 |
min | 0.000000 | 2013.000000 | 0.000000 |
25% | 5.000000 | 2017.000000 | 4.000000 |
50% | 12.000000 | 2019.000000 | 10.000000 |
75% | 20.750000 | 2020.000000 | 21.750000 |
max | 126.000000 | 2022.000000 | 344.000000 |
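The Journal_Avg_cit statistics in this table are computed after astype(int), which drops the fractional part rather than rounding. A small sketch using three journal averages that appear in the earlier tables (13.08, 40.97, 126.97) shows how the two conversions differ; the variable names are illustrative.

```python
import pandas as pd

avg = pd.Series([13.08, 40.97, 126.97])

truncated = avg.astype(int)          # drops the fractional part
rounded = avg.round().astype(int)    # rounds to the nearest integer first

print(truncated.tolist(), rounded.tolist())
```

Truncation biases every journal average downward by up to one citation, which slightly favors the EC papers in the comparisons that follow.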
df_journal_cva[df_journal_cva['is-referenced-by-count']>=df_journal_cva['Journal_Avg_cit']]
Journal_Avg_cit | Year | journal_title | is-referenced-by-count | doi | |
---|---|---|---|---|---|
32 | 35 | 2017.0 | ACM Transactions on Database Systems | 43 | 10.1145/3129246 |
33 | 3 | 2018.0 | AI Magazine | 13 | 10.1609/aimag.v39i3.2816 |
34 | 0 | 2022.0 | Applied Geochemistry | 0 | 10.1016/j.apgeochem.2022.105273 |
38 | 9 | 2020.0 | Biogeosciences | 14 | 10.5194/bg-17-2537-2020 |
39 | 19 | 2018.0 | BioScience | 77 | 10.1093/biosci/biy068 |
... | ... | ... | ... | ... | ... |
211 | 9 | 2020.0 | The Astrophysical Journal | 22 | 10.3847/1538-4357/aba8a6 |
215 | 3 | 2021.0 | The Astrophysical Journal | 4 | 10.3847/1538-4357/abf2c8 |
217 | 0 | 2022.0 | The Cryosphere | 0 | 10.5194/tc-16-1431-2022 |
222 | 13 | 2016.0 | Transactions in GIS | 31 | 10.1111/tgis.12232 |
223 | 13 | 2016.0 | Transactions in GIS | 16 | 10.1111/tgis.12233 |
79 rows × 5 columns
df_journal_cva[['Journal_Avg_cit', 'Year', 'journal_title']].sort_values(by='journal_title')
Journal_Avg_cit | Year | journal_title | |
---|---|---|---|
32 | 35 | 2017.0 | ACM Transactions on Database Systems |
33 | 3 | 2018.0 | AI Magazine |
34 | 0 | 2022.0 | Applied Geochemistry |
35 | 2 | 2021.0 | Atmospheric Measurement Techniques |
39 | 19 | 2018.0 | BioScience |
... | ... | ... | ... |
217 | 0 | 2022.0 | The Cryosphere |
223 | 13 | 2016.0 | Transactions in GIS |
222 | 13 | 2016.0 | Transactions in GIS |
224 | 40 | 2015.0 | Water Resources Research |
163 | 17 | 2019.0 | mBio |
150 rows × 3 columns
JOURNAL ANALYSIS
What is the average journal citation count (over all years)?
Grouping by journal, what are the paper averages?
journals = df_journal_cva['journal_title'].unique()
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean()
Journal_Avg_cit | |
---|---|
journal_title | |
ACM Transactions on Database Systems | 35.0 |
AI Magazine | 3.0 |
Applied Geochemistry | 0.0 |
Atmospheric Measurement Techniques | 2.0 |
BioScience | 19.0 |
... | ... |
The Astrophysical Journal Supplement Series | 17.0 |
The Cryosphere | 0.0 |
Transactions in GIS | 13.0 |
Water Resources Research | 40.0 |
mBio | 17.0 |
89 rows × 1 columns
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').count().sort_values(by='Journal_Avg_cit', ascending=False)
Journal_Avg_cit | |
---|---|
journal_title | |
The Astrophysical Journal | 12 |
Environmental Modelling & Software | 8 |
ISPRS International Journal of Geo-Information | 6 |
Computers & Geosciences | 5 |
Journal of Geophysical Research: Space Physics | 5 |
... | ... |
International Journal of Digital Earth | 1 |
International Journal of Remote Sensing | 1 |
International Journal of Semantic Computing | 1 |
Journal of Atmospheric and Oceanic Technology | 1 |
mBio | 1 |
89 rows × 1 columns
df_journal_density_counts = \
df_journal_cva[['Journal_Avg_cit', 'journal_title']]\
.groupby('journal_title').count()\
.sort_values(by='Journal_Avg_cit', ascending=False)
df_journal_density_counts = df_journal_density_counts.rename(columns={'Journal_Avg_cit': 'paper_counts'})
df_journal_density_counts
paper_counts | |
---|---|
journal_title | |
The Astrophysical Journal | 12 |
Environmental Modelling & Software | 8 |
ISPRS International Journal of Geo-Information | 6 |
Computers & Geosciences | 5 |
Journal of Geophysical Research: Space Physics | 5 |
... | ... |
International Journal of Digital Earth | 1 |
International Journal of Remote Sensing | 1 |
International Journal of Semantic Computing | 1 |
Journal of Atmospheric and Oceanic Technology | 1 |
mBio | 1 |
89 rows × 1 columns
df_journal_density_means = \
df_journal_cva[['Journal_Avg_cit', 'journal_title']].groupby('journal_title').mean().sort_values(by='Journal_Avg_cit', ascending=False)
df_journal_density_means = df_journal_density_means.rename(columns={'Journal_Avg_cit': 'mean_cites'})
df_journal_density_means
mean_cites | |
---|---|
journal_title | |
Science | 106.5 |
Nature Biotechnology | 67.0 |
Current Climate Change Reports | 43.0 |
IEEE Transactions on Geoscience and Remote Sensing | 43.0 |
Genome Medicine | 42.0 |
... | ... |
Engaging Science, Technology, and Society | 0.0 |
Remote Sensing in Ecology and Conservation | 0.0 |
The Cryosphere | 0.0 |
Data in Brief | 0.0 |
Applied Geochemistry | 0.0 |
89 rows × 1 columns
For all journals with 2 or more papers, how did they do:
df_journal_cva.columns
Index(['Journal_Avg_cit', 'Year', 'journal_title', 'is-referenced-by-count',
'doi'],
dtype='object')
JOURNALS WITH MORE THAN 1 PUBLICATION
journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']>1].index.to_list()
# numeric_only=True skips the non-numeric doi column and avoids the pandas FutureWarning
df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
    .groupby('journal_title').mean(numeric_only=True) \
    .drop(columns=['Year']) \
    .merge(df_journal_density_counts, left_index=True, right_index=True) \
    .sort_values('paper_counts', ascending=False) \
    .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})
df_journal_ec_mean_analysis.describe()
journal_mean_cites | ec_mean_cites | paper_counts | |
---|---|---|---|
count | 27.000000 | 27.000000 | 27.000000 |
mean | 15.389198 | 22.391975 | 3.259259 |
std | 19.555807 | 33.431244 | 2.297031 |
min | 1.000000 | 0.500000 | 2.000000 |
25% | 7.450000 | 8.600000 | 2.000000 |
50% | 12.000000 | 11.500000 | 2.000000 |
75% | 17.062500 | 23.875000 | 3.000000 |
max | 106.500000 | 175.000000 | 12.000000 |
df_journal_ec_mean_analysis.paper_counts.sum()
88
88 of the 150 papers (58.67%) appear in journals with 2 or more EC papers (27 distinct journals)
df_journal_ec_mean_analysis
journal_mean_cites | ec_mean_cites | paper_counts | |
---|---|---|---|
journal_title | |||
The Astrophysical Journal | 10.833333 | 9.166667 | 12 |
Environmental Modelling & Software | 17.125000 | 11.500000 | 8 |
ISPRS International Journal of Geo-Information | 9.166667 | 10.500000 | 6 |
Journal of Geophysical Research: Space Physics | 7.400000 | 10.800000 | 5 |
Computers & Geosciences | 8.400000 | 8.200000 | 5 |
JAWRA Journal of the American Water Resources Association | 13.750000 | 24.250000 | 4 |
Bulletin of the American Meteorological Society | 28.333333 | 34.666667 | 3 |
Concurrency and Computation: Practice and Experience | 1.000000 | 2.000000 | 3 |
Earth and Space Science | 15.333333 | 34.000000 | 3 |
Journal of Proteome Research | 7.666667 | 17.666667 | 3 |
Future Generation Computer Systems | 17.666667 | 7.666667 | 3 |
Journal of Geophysical Research: Solid Earth | 5.333333 | 4.666667 | 3 |
Semantic Web | 12.500000 | 13.000000 | 2 |
Science | 106.500000 | 30.000000 | 2 |
Remote Sensing | 13.500000 | 4.000000 | 2 |
Journal of Hydroinformatics | 12.000000 | 10.000000 | 2 |
Geophysical Research Letters | 17.000000 | 175.000000 | 2 |
GigaScience | 20.000000 | 17.500000 | 2 |
Geosphere | 5.000000 | 6.000000 | 2 |
Geophysical Journal International | 6.000000 | 10.500000 | 2 |
Geology | 22.000000 | 41.500000 | 2 |
Frontiers in Microbiology | 8.500000 | 11.500000 | 2 |
Frontiers in Astronomy and Space Sciences | 2.000000 | 0.500000 | 2 |
Earth Science Informatics | 7.500000 | 16.500000 | 2 |
Computing in Science & Engineering | 1.000000 | 9.000000 | 2 |
Computers, Environment and Urban Systems | 27.000000 | 61.000000 | 2 |
Transactions in GIS | 13.000000 | 23.500000 | 2 |
df_journal_ec_mean_analysis[
df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
47
47 of 88 papers (53.4%) are in journals whose EC mean citation count is greater than or equal to the journal mean (over all years, for journals with more than 1 EC publication)
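Because the comparison above is made on per-journal means, a single heavily cited paper can carry every paper in its journal. A sketch on invented numbers (the frame papers and its values are illustrative only) contrasts this journal-level count with a paper-level count.

```python
import pandas as pd

# Invented example: journal A has one highly cited paper and one uncited paper.
papers = pd.DataFrame({
    "journal_title": ["A", "A", "B", "B"],
    "journal_mean_cites": [10, 10, 20, 20],
    "ec_cites": [50, 0, 5, 8],
})

# Journal-level view: compare each journal's EC mean with its journal mean,
# then count every paper in the journals that pass.
per_journal = papers.groupby("journal_title").agg(
    journal_mean_cites=("journal_mean_cites", "mean"),
    ec_mean_cites=("ec_cites", "mean"),
    paper_counts=("ec_cites", "size"),
)
passing = per_journal[per_journal.ec_mean_cites >= per_journal.journal_mean_cites]
journal_level = int(passing.paper_counts.sum())

# Paper-level view: compare each paper with its own journal mean.
paper_level = int((papers.ec_cites >= papers.journal_mean_cites).sum())

print(journal_level, paper_level)
```

In the toy data both papers in journal A are counted at the journal level even though only one of them individually exceeds the journal mean.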
JOURNALS WITH ONLY 1 PUBLICATION
journal_list = df_journal_density_counts[df_journal_density_counts['paper_counts']<2].index.to_list()
# numeric_only=True skips the non-numeric doi column and avoids the pandas FutureWarning
df_journal_ec_mean_analysis = \
    df_journal_cva[df_journal_cva['journal_title'].isin(journal_list)]\
    .groupby('journal_title').mean(numeric_only=True) \
    .drop(columns=['Year']) \
    .merge(df_journal_density_counts, left_index=True, right_index=True) \
    .sort_values('paper_counts', ascending=False) \
    .rename(columns={'Journal_Avg_cit': 'journal_mean_cites', 'is-referenced-by-count': 'ec_mean_cites'})
df_journal_ec_mean_analysis.describe()
journal_mean_cites | ec_mean_cites | paper_counts | |
---|---|---|---|
count | 62.000000 | 62.000000 | 62.0 |
mean | 16.370968 | 32.387097 | 1.0 |
std | 13.698431 | 56.452027 | 0.0 |
min | 0.000000 | 0.000000 | 1.0 |
25% | 6.250000 | 3.250000 | 1.0 |
50% | 12.500000 | 13.500000 | 1.0 |
75% | 23.000000 | 38.500000 | 1.0 |
max | 67.000000 | 344.000000 | 1.0 |
62 of the 150 papers are the only EC publication in their journal
journal mean = 16.37, EarthCube paper mean = 32.39
the standard deviations differ sharply (13.70 for journals vs 56.45 for EC papers), so EC citation counts are far more variable
df_journal_ec_mean_analysis
journal_mean_cites | ec_mean_cites | paper_counts | |
---|---|---|---|
journal_title | |||
ACM Transactions on Database Systems | 35.0 | 43.0 | 1 |
PLoS ONE | 26.0 | 54.0 | 1 |
Journal of Climate | 14.0 | 12.0 | 1 |
Journal of Environmental Quality | 21.0 | 73.0 | 1 |
Journal of Sedimentary Research | 10.0 | 4.0 | 1 |
... | ... | ... | ... |
Hydrological Processes | 21.0 | 16.0 | 1 |
IEEE Systems Journal | 25.0 | 5.0 | 1 |
IEEE Transactions on Geoscience and Remote Sensing | 43.0 | 23.0 | 1 |
IEEE Transactions on Parallel and Distributed Systems | 24.0 | 2.0 | 1 |
mBio | 17.0 | 55.0 | 1 |
62 rows × 3 columns
df_journal_ec_mean_analysis[
df_journal_ec_mean_analysis.ec_mean_cites>=df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
32
df_journal_ec_mean_analysis[
df_journal_ec_mean_analysis.ec_mean_cites>df_journal_ec_mean_analysis.journal_mean_cites
].paper_counts.sum()
26
26 of 62 (41.9%) EC papers received more citations than their journal mean
32 of 62 (51.6%) EC papers received at least as many citations as their journal mean
the top 10 papers in terms of % above the average for their journal/year
df_journal_cva
Journal_Avg_cit | Year | journal_title | is-referenced-by-count | doi | |
---|---|---|---|---|---|
32 | 35 | 2017.0 | ACM Transactions on Database Systems | 43 | 10.1145/3129246 |
33 | 3 | 2018.0 | AI Magazine | 13 | 10.1609/aimag.v39i3.2816 |
34 | 0 | 2022.0 | Applied Geochemistry | 0 | 10.1016/j.apgeochem.2022.105273 |
35 | 2 | 2021.0 | Atmospheric Measurement Techniques | 0 | 10.5194/amt-14-6917-2021 |
38 | 9 | 2020.0 | Biogeosciences | 14 | 10.5194/bg-17-2537-2020 |
... | ... | ... | ... | ... | ... |
216 | 17 | 2020.0 | The Astrophysical Journal Supplement Series | 2 | 10.3847/1538-4365/aba4aa |
217 | 0 | 2022.0 | The Cryosphere | 0 | 10.5194/tc-16-1431-2022 |
222 | 13 | 2016.0 | Transactions in GIS | 31 | 10.1111/tgis.12232 |
223 | 13 | 2016.0 | Transactions in GIS | 16 | 10.1111/tgis.12233 |
224 | 40 | 2015.0 | Water Resources Research | 34 | 10.1002/2015wr017342 |
150 rows × 5 columns
df_journal_cva_reindexed = df_journal_cva.rename(columns={
'Journal_Avg_cit': 'journal_mean_cites',
'Year': 'year',
'is-referenced-by-count': 'ec_cites'
}).reset_index().drop('index', axis=1)
df_journal_cva_reindexed
journal_mean_cites | year | journal_title | ec_cites | doi | |
---|---|---|---|---|---|
0 | 35 | 2017.0 | ACM Transactions on Database Systems | 43 | 10.1145/3129246 |
1 | 3 | 2018.0 | AI Magazine | 13 | 10.1609/aimag.v39i3.2816 |
2 | 0 | 2022.0 | Applied Geochemistry | 0 | 10.1016/j.apgeochem.2022.105273 |
3 | 2 | 2021.0 | Atmospheric Measurement Techniques | 0 | 10.5194/amt-14-6917-2021 |
4 | 9 | 2020.0 | Biogeosciences | 14 | 10.5194/bg-17-2537-2020 |
... | ... | ... | ... | ... | ... |
145 | 17 | 2020.0 | The Astrophysical Journal Supplement Series | 2 | 10.3847/1538-4365/aba4aa |
146 | 0 | 2022.0 | The Cryosphere | 0 | 10.5194/tc-16-1431-2022 |
147 | 13 | 2016.0 | Transactions in GIS | 31 | 10.1111/tgis.12232 |
148 | 13 | 2016.0 | Transactions in GIS | 16 | 10.1111/tgis.12233 |
149 | 40 | 2015.0 | Water Resources Research | 34 | 10.1002/2015wr017342 |
150 rows × 5 columns
# filter on items that ec beats (or is equal to) the journal mean
df_journal_cva_reindexed[
df_journal_cva_reindexed.ec_cites>=df_journal_cva_reindexed.journal_mean_cites
]
# cast year as int
df_journal_cva_reindexed.year = df_journal_cva_reindexed.year.astype(int)
# percent difference from the journal mean (a ratio of 2.0, i.e. 200%, gives 100, meaning "100% more")
df_journal_cva_reindexed['ec_pct_diff'] = \
100*((df_journal_cva_reindexed.ec_cites / df_journal_cva_reindexed.journal_mean_cites) - 1)
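As a worked check of the formula above, the sketch below applies it to two rows that appear in the top-20 table (344 vs 17 and 336 vs 25 citations) plus one row with a zero journal mean. Replacing a zero journal mean with NaN is an added assumption so the division does not produce inf or an arbitrary value; the notebook itself filters on journal_mean_cites > 0 afterwards. The frame name rows is illustrative.

```python
import numpy as np
import pandas as pd

rows = pd.DataFrame({
    "ec_cites": [344, 336, 0],
    "journal_mean_cites": [17, 25, 0],
})

# Assumption: treat a zero journal mean as undefined rather than dividing by it.
denom = rows.journal_mean_cites.replace(0, np.nan)

# Same formula as the notebook: 100 * (ec / journal - 1).
rows["ec_pct_diff"] = 100 * (rows.ec_cites / denom - 1)

print(rows.ec_pct_diff.round(2).tolist())
```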
Top 20 EarthCube papers by percent difference from the journal mean
df_tmp = df_journal_cva_reindexed.sort_values(by='ec_pct_diff', ascending=False) \
.query("journal_mean_cites > 0")[:20]
# df_tmp.doi = df_tmp.doi.apply(lambda d: f"[{d}](https://doi.org/{d})")
df_tmp.columns = ['Journal Mean Cites', 'Publication Year', 'Journal Title', 'EC Cites', 'EC Publication DOI', '% diff mean cites']
df_tmp.to_csv("../outputs/ec_journals_top_pct_delta.csv")
df_tmp
Journal Mean Cites | Publication Year | Journal Title | EC Cites | EC Publication DOI | % diff mean cites | |
---|---|---|---|---|---|---|
74 | 17 | 2017 | International Journal of Digital Earth | 344 | 10.1080/17538947.2016.1239771 | 1923.529412 |
61 | 25 | 2017 | Geophysical Research Letters | 336 | 10.1002/2017gl074954 | 1244.000000 |
117 | 10 | 2018 | Quaternary Research | 120 | 10.1017/qua.2017.105 | 1100.000000 |
18 | 1 | 2021 | Computing in Science & Engineering | 10 | 10.1109/mcse.2021.3059437 | 900.000000 |
10 | 4 | 2019 | Communications of the ACM | 39 | 10.1145/3192335 | 875.000000 |
19 | 1 | 2021 | Computing in Science & Engineering | 8 | 10.1109/mcse.2021.3059263 | 700.000000 |
68 | 4 | 2021 | GSA Bulletin | 30 | 10.1130/b35560.1 | 650.000000 |
75 | 8 | 2019 | International Journal of Remote Sensing | 49 | 10.1080/01431161.2018.1516313 | 512.500000 |
28 | 15 | 2016 | Earth and Space Science | 91 | 10.1002/2015ea000136 | 506.666667 |
85 | 13 | 2017 | JAWRA Journal of the American Water Resources ... | 72 | 10.1111/1752-1688.12474 | 453.846154 |
106 | 5 | 2017 | Journal of Visualized Experiments | 25 | 10.3791/54660 | 400.000000 |
25 | 8 | 2020 | Data Intelligence | 40 | 10.1162/dint_a_00033 | 400.000000 |
1 | 3 | 2018 | AI Magazine | 13 | 10.1609/aimag.v39i3.2816 | 333.333333 |
125 | 39 | 2017 | Scientific Data | 167 | 10.1038/sdata.2017.88 | 328.205128 |
5 | 19 | 2018 | BioScience | 77 | 10.1093/biosci/biy068 | 305.263158 |
60 | 1 | 2021 | Geophysical Journal International | 4 | 10.1093/gji/ggab238 | 300.000000 |
17 | 27 | 2017 | Computers, Environment and Urban Systems | 108 | 10.1016/j.compenvurbsys.2016.10.010 | 300.000000 |
20 | 1 | 2021 | Concurrency and Computation: Practice and Expe... | 4 | 10.1002/cpe.6099 | 300.000000 |
122 | 32 | 2017 | Renewable Energy | 127 | 10.1016/j.renene.2017.02.052 | 296.875000 |
100 | 12 | 2019 | Journal of Proteome Research | 44 | 10.1021/acs.jproteome.8b00761 | 266.666667 |
df_citations = pd.read_csv("../outputs/full_nsf_doi_project_summary.tsv", sep='\t')[['doi', 'ams_bib']]
print("""
|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|""")
for i, r in df_tmp[:10].iterrows():
    # look up the formatted citation (ams_bib) for this DOI
    citation = df_citations.query(f"doi == '{r['EC Publication DOI']}'").drop_duplicates()['ams_bib'].unique()[0]
    print(f"{citation} | {r['EC Cites']} | {r['Journal Mean Cites']} | {r['% diff mean cites']:.2f}% |")
|Citation| EC Cites | Journal Mean Cites | % diff mean cites|
|---:|:--:|:--:|:--:|
Yang, C., Q. Huang, Z. Li, K. Liu, and F. Hu, 2016: Big Data and cloud computing: innovation opportunities and challenges. International Journal of Digital Earth, 10, 13–53, https://doi.org/10.1080/17538947.2016.1239771. | 344 | 17 | 1923.53% |
Morlighem, M., and Coauthors, 2017: BedMachine v3: Complete Bed Topography and Ocean Bathymetry Mapping of Greenland From Multibeam Echo Sounding Combined With Mass Conservation. Geophysical Research Letters, 44, https://doi.org/10.1002/2017gl074954. | 336 | 25 | 1244.00% |
Williams, J. W., and Coauthors, 2018: The Neotoma Paleoecology Database, a multiproxy, international, community-curated data resource. Quaternary Research, 89, 156–177, https://doi.org/10.1017/qua.2017.105. | 120 | 10 | 1100.00% |
Abernathey, R. P., and Coauthors, 2021: Cloud-Native Repositories for Big Scientific Data. Computing in Science & Engineering, 23, 26–35, https://doi.org/10.1109/mcse.2021.3059437. | 10 | 1 | 900.00% |
Gil, Y., and Coauthors, 2018: Intelligent systems for geosciences. Communications of the ACM, 62, 76–84, https://doi.org/10.1145/3192335. | 39 | 4 | 875.00% |
Granger, B. E., and F. Perez, 2021: Jupyter: Thinking and Storytelling With Code and Data. Computing in Science & Engineering, 23, 7–14, https://doi.org/10.1109/mcse.2021.3059263. | 8 | 1 | 700.00% |
Schaen, A. J., and Coauthors, 2020: Interpreting and reporting 40Ar/39Ar geochronologic data. GSA Bulletin, 133, 461–487, https://doi.org/10.1130/b35560.1. | 30 | 4 | 650.00% |
Sun, Z., L. Di, and H. Fang, 2018: Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. International Journal of Remote Sensing, 40, 593–614, https://doi.org/10.1080/01431161.2018.1516313. | 49 | 8 | 512.50% |
Gil, Y., and Coauthors, 2016: Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science, 3, 388–415, https://doi.org/10.1002/2015ea000136. | 91 | 15 | 506.67% |
Maidment, D. R., 2016: Conceptual Framework for the National Flood Interoperability Experiment. JAWRA Journal of the American Water Resources Association, 53, 245–257, https://doi.org/10.1111/1752-1688.12474. | 72 | 13 | 453.85% |
TAKEAWAYS
journal average citation: 14.96 (WOS journal averages, truncated to integers)
average citation of EC papers: 24.17 (Crossref is-referenced-by-count)
EC papers met or exceeded the journal average 79 of 150 times (52.7%)
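The percentages quoted in this notebook follow directly from the counts in the cell outputs. A short sketch recomputes them; the counts are copied from those outputs, and the dictionary name and labels are illustrative.

```python
# (numerator, denominator) pairs copied from the cell outputs in this notebook.
counts = {
    "papers at or above journal average": (79, 150),
    "papers in journals with 2+ EC papers": (88, 150),
    "multi-paper journals, EC mean >= journal mean": (47, 88),
    "single-paper journals, EC > journal mean": (26, 62),
    "single-paper journals, EC >= journal mean": (32, 62),
}

pct = {label: round(100 * n / d, 2) for label, (n, d) in counts.items()}

for label, value in pct.items():
    print(f"{label}: {value}%")
```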