BASIC DATACITE ANALYSIS
This notebook uses the DataCite source file to explore the resources (datasets, software, reports and similar outputs) linked to EarthCube-funded NSF awards.
It produces two files in ../outputs/datacite/:
datacite_table_01.md
datacite_data_map.csv
import pandas as pd
df = pd.read_json("../outputs/datacite/datacite_data_map.json")
print(df.describe().loc['count'].sort_values(ascending=False).to_markdown())
| | count |
|--------:|--------:|
| 1440351 | 3 |
| 1928393 | 2 |
| 1541390 | 1 |
| 1639683 | 1 |
| 1639764 | 1 |
| 1928406 | 1 |
| 1440066 | 1 |
| 1639694 | 1 |
df
 | 1440351 | 1541390 | 1639683 | 1639764 | 1928406 | 1440066 | 1928393 | 1639694 |
---|---|---|---|---|---|---|---|---|
10.1594/ieda/100709 | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10.1594/ieda/100691 | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10.6084/m9.figshare.4272164.v1 | {'cr_meta': {'type': 'article-journal', 'id': ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10.18739/a24m9198b | NaN | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN |
10.6084/m9.figshare.14848713.v1 | NaN | NaN | {'cr_meta': {'type': 'graphic', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN |
10.1594/pangaea.892680 | NaN | NaN | NaN | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN |
10.5281/zenodo.5496306 | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN | NaN | NaN |
10.13140/rg.2.1.4908.4561 | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'article', 'id': 'https:/... | NaN | NaN |
10.5281/zenodo.4558266 | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN |
10.5281/zenodo.6369184 | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN |
10.5065/p2jj-9878 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'report', 'id': 'https://... |
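The frame above is sparse: each DOI row holds one record, stored under the column for its NSF award. The sketch below walks the same kind of data without the sparse frame. It assumes the JSON file is a mapping of award ID to DOI to record, which is the layout `pd.read_json` would turn into the frame shown. The `sample` literal is a hypothetical miniature, and `iter_records` is an illustrative helper, not part of the notebook.

```python
import json

# Hypothetical miniature of the assumed layout: {nsfid: {doi: record}}.
sample = json.loads("""
{
  "1440351": {
    "10.1594/ieda/100709": {"cr_meta": {"type": "dataset"}},
    "10.1594/ieda/100691": {"cr_meta": {"type": "dataset"}}
  },
  "1928393": {
    "10.5281/zenodo.4558266": {"cr_meta": {"type": "book"}}
  }
}
""")


def iter_records(data_map):
    """Yield (nsfid, doi, record) triples from a nested award -> DOI mapping."""
    for nsfid, by_doi in data_map.items():
        for doi, record in by_doi.items():
            # JSON object keys are strings; convert the award ID to an int.
            yield int(nsfid), doi, record


pairs = [(nsfid, doi) for nsfid, doi, _ in iter_records(sample)]
print(pairs)
```

Iterating this way yields one (award, DOI) pair per record, so no `dropna()` is needed to find the populated cell.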
nsf_project_titles = pd.read_csv("../outputs/nsf/nsfid_project_title_normed.csv")
nsf_project_titles.columns = ['nsfid', 'title']
nsf_project_titles.set_index('nsfid')
nsfid | title |
---|---|
1639588 | Collaborative Proposal: Earthcube Building Blo... |
1639614 | Collaborative Proposal: Earthcube Building Blo... |
1740719 | Collaborative Proposal: Earthcube Integration:... |
1740683 | Collaborative Proposal: Earthcube Integration:... |
1740641 | Collaborative Proposal: Earthcube Integration:... |
... | ... |
1540994 | Earthcubeia: Collaborative Proposal: Building ... |
1340265 | Ec3 - Earth-Centered Communication For Cyberin... |
1928208 | Geosciences Earthcube Community Office |
1324760 | Rcn: Building A Sediment Experimentalist Netwo... |
1343785 | Title: Earthcube Building Blocks: Integrating ... |
215 rows × 1 columns
rows = []
for i in df.index:
    # Take the first populated cell in this DOI row; its column label is the NSF award ID.
    cell = df.loc[i].dropna()
    nsfid = cell.index[0]
    dc_meta = cell.iloc[0]['dc_meta']
    #print(dc_meta['attributes']['types']['schemaOrg'])
    types = dc_meta['attributes']['types']
    doi = dc_meta['attributes']['doi']
    title = dc_meta['attributes']['titles'][0]['title']
    if types['schemaOrg'] in ['Dataset', 'ScholarlyArticle', 'SoftwareSourceCode', 'Report']:
        # Specific schema.org types are used as-is.
        type_string = types['schemaOrg']
    elif types['schemaOrg'] == 'CreativeWork':
        # CreativeWork is too generic; prefer the citeproc type when present.
        type_string = types.get('citeproc', types['schemaOrg']).capitalize()
    else:
        # Anything else falls back to the free-text resourceType, if any.
        type_string = types.get('resourceType', types['schemaOrg'])
    rows.append([nsfid, doi, title, type_string])
# Build the frame once instead of concatenating inside the loop.
df_tmp = pd.DataFrame(rows, columns=['nsfid', 'doi', 'resource_title', 'type'])
df_datacite_data = df_tmp.merge(nsf_project_titles)
df_datacite_data
 | nsfid | doi | resource_title | type | title |
---|---|---|---|---|---|
0 | 1440351 | 10.1594/ieda/100709 | iSamples Sample Management Training Module for... | Dataset | Earthcube Rcn: Isamples: The Internet Of Sampl... |
1 | 1440351 | 10.1594/ieda/100691 | iSamples Sample Management Training Module for... | Dataset | Earthcube Rcn: Isamples: The Internet Of Sampl... |
2 | 1440351 | 10.6084/m9.figshare.4272164.v1 | iSamples user stories: common themes and areas... | ScholarlyArticle | Earthcube Rcn: Isamples: The Internet Of Sampl... |
3 | 1541390 | 10.18739/a24m9198b | Estimating the Freshwater Flux from the Greenl... | Dataset | Earthcube Rcn: Collaborative Research: Engagin... |
4 | 1639683 | 10.6084/m9.figshare.14848713.v1 | Intelligent Databases and Machine-Learning Ana... | Poster | Earthcube Data Infrastructure: Intelligent Dat... |
5 | 1639764 | 10.1594/pangaea.892680 | Land2Sea database, Version 2.0 | Dataset | Earthcube Building Blocks: Collaborative Propo... |
6 | 1928406 | 10.5281/zenodo.5496306 | Mapping ice flow velocity using an easy and in... | SoftwareSourceCode | Collaborative Research: Earthcube Data Capabil... |
7 | 1440066 | 10.13140/rg.2.1.4908.4561 | EarthCube Oceanography and Geobiology Environm... | Article | Earthcube Rcn: An Earthcube Oceanography And G... |
8 | 1928393 | 10.5281/zenodo.4558266 | nsidc/qgreenland: v1.0.1 | SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabl... |
9 | 1928393 | 10.5281/zenodo.6369184 | QGreenland | SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabl... |
10 | 1639694 | 10.5065/p2jj-9878 | Proceedings of the 2020 Improving Scientific S... | Report | Earthcube Building Blocks: Collaborative Propo... |
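The merge above uses the pandas default, an inner join on the shared `nsfid` column, so any resource whose award ID is missing from the titles file is dropped without warning. A minimal sketch of one way to check for that, using a left join with `indicator=True`. The two small frames, the placeholder DOIs and the award ID 9999999 are hypothetical stand-ins, not data from the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for df_tmp and nsf_project_titles.
resources = pd.DataFrame({
    "nsfid": [1440351, 1928393, 9999999],
    "doi": ["10.x/a", "10.x/b", "10.x/c"],
})
titles = pd.DataFrame({
    "nsfid": [1440351, 1928393],
    "title": ["Project A", "Project B"],
})

# A left join keeps every resource; the _merge column records where each row matched.
checked = resources.merge(titles, on="nsfid", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "doi"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that the inner join loses nothing.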
df_datacite_data.nsfid.value_counts()
1440351 3
1928393 2
1541390 1
1639683 1
1639764 1
1928406 1
1440066 1
1639694 1
Name: nsfid, dtype: int64
df_datacite_data.type.value_counts()
Dataset 4
SoftwareSourceCode 3
ScholarlyArticle 1
Poster 1
Article 1
Report 1
Name: type, dtype: int64
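The counts above depend on the three-way fallback used when the table was built, which is why both `ScholarlyArticle` and `Article` appear. That rule can be isolated as a small function for testing. This is a sketch: `resource_type` is a hypothetical helper name, and the input mappings are made-up examples of a DataCite-style `types` block, not records from the source file.

```python
def resource_type(types):
    """Pick a display type from a DataCite-style `types` mapping."""
    schema_org = types["schemaOrg"]
    if schema_org in ("Dataset", "ScholarlyArticle", "SoftwareSourceCode", "Report"):
        # Specific schema.org types are returned unchanged.
        return schema_org
    if schema_org == "CreativeWork":
        # Generic type: prefer the citeproc value, capitalized.
        return types.get("citeproc", schema_org).capitalize()
    # Otherwise use the free-text resourceType, falling back to schemaOrg.
    return types.get("resourceType", schema_org)


# Made-up inputs exercising each branch.
print(resource_type({"schemaOrg": "Dataset", "citeproc": "dataset"}))
print(resource_type({"schemaOrg": "CreativeWork", "citeproc": "article"}))
print(resource_type({"schemaOrg": "Collection"}))
```

Because the `CreativeWork` branch capitalizes the citeproc value, a citeproc `article` becomes `Article` rather than `ScholarlyArticle`; mapping the two onto one label would be a separate design decision.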
with open("../outputs/datacite/datacite_table_01.md", "w") as fo:
    fo.write("| Resource Type | EC Project | Resource Title |\n")
    fo.write("|:---:|:----|:----|\n")
    for r in df_datacite_data.itertuples():
        fo.write(
            f"| {r.type} "
            f"| {r.title} (NSF [#{r.nsfid}](https://nsf.gov/awardsearch/showAward?AWD_ID={r.nsfid}&HistoricalAwards=false)) "
            f"| {r.resource_title} (doi: [{r.doi}](https://doi.org/{r.doi})) |\n"
        )
Resource Type | EC Project | Resource Title |
---|---|---|
Dataset | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples Sample Management Training Module for Soil Cores (doi: 10.1594/ieda/100709) |
Dataset | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples Sample Management Training Module for Rock Outcrop Samples (doi: 10.1594/ieda/100691) |
ScholarlyArticle | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples user stories: common themes and areas for future work (doi: 10.6084/m9.figshare.4272164.v1) |
Dataset | Earthcube Rcn: Collaborative Research: Engaging The Greenland Ice Sheet Ocean (Griso) Science Network (NSF #1541390) | Estimating the Freshwater Flux from the Greenland Ice Sheet Workshop Report, American Geophysical Union, 2018 (doi: 10.18739/a24m9198b) |
Poster | Earthcube Data Infrastructure: Intelligent Databases And Analysis Tools For Geospace Data (NSF #1639683) | Intelligent Databases and Machine-Learning Analysis Tools for Heliophysics (doi: 10.6084/m9.figshare.14848713.v1) |
Dataset | Earthcube Building Blocks: Collaborative Proposal: Earthcube Data Discovery Hub (NSF #1639764) | Land2Sea database, Version 2.0 (doi: 10.1594/pangaea.892680) |
SoftwareSourceCode | Collaborative Research: Earthcube Data Capabilities–Jupyter Meets The Earth: Enabling Discovery In Geoscience Through Interactive Computing At Scale (NSF #1928406) | Mapping ice flow velocity using an easy and interactive feature tracking workflow (doi: 10.5281/zenodo.5496306) |
Report | Earthcube Building Blocks: Collaborative Proposal: The Power Of Many: Ensemble Toolkit For Earth Sciences (NSF #1639694) | Proceedings of the 2020 Improving Scientific Software Conference (doi: 10.5065/p2jj-9878) |
SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF #1928393) | QGreenland (doi: 10.5281/zenodo.6369184) |
SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF #1928393) | nsidc/qgreenland: v1.0.1 (doi: 10.5281/zenodo.4558266) |
Article | Earthcube Rcn: An Earthcube Oceanography And Geobiology Environmental Omics Research Coordination Network (Ecogeo Rcn) (NSF #1440066) | EarthCube Oceanography and Geobiology Environmental ‘Omics Research Coordination Network Workshop 1 Report (doi: 10.13140/rg.2.1.4908.4561) |
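The table is written by interpolating titles straight into Markdown rows. A title containing a pipe character or a line break would split or truncate its row. Nothing in the output above shows that happening, so the sketch below is a precaution rather than a fix. `md_cell` and `md_row` are hypothetical helper names, and the sample values are invented.

```python
def md_cell(text):
    """Collapse whitespace runs (including newlines) and escape pipes for a table cell."""
    return " ".join(str(text).split()).replace("|", "\\|")


def md_row(*cells):
    """Format the given values as one Markdown table row ending in a newline."""
    return "| " + " | ".join(md_cell(c) for c in cells) + " |\n"


# Invented values: one contains a pipe, one contains a line break.
row = md_row("Dataset", "Project A | Part 1", "Title\nwith break")
print(row, end="")
```

With helpers like these, the `fo.write(...)` call in the loop could pass its three cell strings to `md_row` instead of assembling the row by hand.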
df_datacite_data.to_csv("../outputs/datacite/datacite_data_map.csv", index=False)