BASIC DATACITE ANALYSIS

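This notebook uses the DataCite source file to explore the resources (datasets, software, reports, and similar outputs) that DataCite links to EarthCube-funded projects.

It produces two files in ../outputs/datacite/: datacite_table_01.md (a Markdown table of resources by project) and datacite_data_map.csv (the flattened resource table).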

import pandas as pd
df = pd.read_json("../outputs/datacite/datacite_data_map.json")
print(df.describe().loc['count'].sort_values(ascending=False).to_markdown())
|         |   count |
|--------:|--------:|
| 1440351 |       3 |
| 1928393 |       2 |
| 1541390 |       1 |
| 1639683 |       1 |
| 1639764 |       1 |
| 1928406 |       1 |
| 1440066 |       1 |
| 1639694 |       1 |
df
|  | 1440351 | 1541390 | 1639683 | 1639764 | 1928406 | 1440066 | 1928393 | 1639694 |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 10.1594/ieda/100709 | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10.1594/ieda/100691 | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10.6084/m9.figshare.4272164.v1 | {'cr_meta': {'type': 'article-journal', 'id': ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 10.18739/a24m9198b | NaN | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN | NaN |
| 10.6084/m9.figshare.14848713.v1 | NaN | NaN | {'cr_meta': {'type': 'graphic', 'id': 'https:/... | NaN | NaN | NaN | NaN | NaN |
| 10.1594/pangaea.892680 | NaN | NaN | NaN | {'cr_meta': {'type': 'dataset', 'id': 'https:/... | NaN | NaN | NaN | NaN |
| 10.5281/zenodo.5496306 | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN | NaN | NaN |
| 10.13140/rg.2.1.4908.4561 | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'article', 'id': 'https:/... | NaN | NaN |
| 10.5281/zenodo.4558266 | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN |
| 10.5281/zenodo.6369184 | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'book', 'id': 'https://do... | NaN |
| 10.5065/p2jj-9878 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {'cr_meta': {'type': 'report', 'id': 'https://... |
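
The wide, mostly-NaN frame above is what `pd.read_json` tends to produce when the file is a mapping of award ID to DOI to record. That layout is an assumption inferred from the displayed frame, not something the notebook states. A minimal sketch of walking such a nesting directly, using a hypothetical miniature `sample_map` with placeholder records rather than the real file:

```python
# Hypothetical miniature of the assumed layout: {award_id: {doi: record}}.
# The records are placeholders; the real file holds full 'cr_meta'/'dc_meta' dicts.
sample_map = {
    "1440351": {
        "10.1594/ieda/100709": {"dc_meta": {"attributes": {"doi": "10.1594/ieda/100709"}}},
        "10.1594/ieda/100691": {"dc_meta": {"attributes": {"doi": "10.1594/ieda/100691"}}},
    },
    "1541390": {
        "10.18739/a24m9198b": {"dc_meta": {"attributes": {"doi": "10.18739/a24m9198b"}}},
    },
}


def iter_records(data_map):
    """Yield (award_id, doi, record) triples from the nested mapping."""
    for award_id, by_doi in data_map.items():
        for doi, record in by_doi.items():
            yield award_id, doi, record


# One (award_id, doi) pair per resource, with no NaN padding.
pairs = [(award_id, doi) for award_id, doi, _ in iter_records(sample_map)]
print(pairs)
```

Iterating the mapping this way gives one row per resource from the start, which is the same long shape the notebook reaches later by dropping the NaN cells row by row.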
nsf_project_titles = pd.read_csv("../outputs/nsf/nsfid_project_title_normed.csv")
nsf_project_titles.columns = ['nsfid', 'title']
nsf_project_titles.set_index('nsfid')
| nsfid | title |
|:---|:---|
| 1639588 | Collaborative Proposal: Earthcube Building Blo... |
| 1639614 | Collaborative Proposal: Earthcube Building Blo... |
| 1740719 | Collaborative Proposal: Earthcube Integration:... |
| 1740683 | Collaborative Proposal: Earthcube Integration:... |
| 1740641 | Collaborative Proposal: Earthcube Integration:... |
| ... | ... |
| 1540994 | Earthcubeia: Collaborative Proposal: Building ... |
| 1340265 | Ec3 - Earth-Centered Communication For Cyberin... |
| 1928208 | Geosciences Earthcube Community Office |
| 1324760 | Rcn: Building A Sediment Experimentalist Netwo... |
| 1343785 | Title: Earthcube Building Blocks: Integrating ... |

215 rows × 1 columns

# Flatten the nested DataCite metadata into one row per DOI.
rows = []
for i in df.index:
    # Take the first non-null cell in the row; its column label is the NSF award ID.
    record = df.loc[i].dropna()
    nsfid = record.index[0]
    attributes = record.iloc[0]['dc_meta']['attributes']
    types = attributes['types']
    schema_org = types['schemaOrg']

    # Use the schema.org type when it is specific enough; otherwise fall back
    # to the citeproc label (for CreativeWork) or to resourceType.
    if schema_org in ['Dataset', 'ScholarlyArticle', 'SoftwareSourceCode', 'Report']:
        type_string = schema_org
    elif schema_org == 'CreativeWork':
        type_string = types.get('citeproc', schema_org).capitalize()
    else:
        type_string = types.get('resourceType', schema_org)

    rows.append([nsfid, attributes['doi'], attributes['titles'][0]['title'], type_string])

# Build the frame once instead of concatenating inside the loop.
df_tmp = pd.DataFrame(rows, columns=['nsfid', 'doi', 'resource_title', 'type'])
df_datacite_data = df_tmp.merge(nsf_project_titles)
df_datacite_data
|  | nsfid | doi | resource_title | type | title |
|:---|:---|:---|:---|:---|:---|
| 0 | 1440351 | 10.1594/ieda/100709 | iSamples Sample Management Training Module for... | Dataset | Earthcube Rcn: Isamples: The Internet Of Sampl... |
| 1 | 1440351 | 10.1594/ieda/100691 | iSamples Sample Management Training Module for... | Dataset | Earthcube Rcn: Isamples: The Internet Of Sampl... |
| 2 | 1440351 | 10.6084/m9.figshare.4272164.v1 | iSamples user stories: common themes and areas... | ScholarlyArticle | Earthcube Rcn: Isamples: The Internet Of Sampl... |
| 3 | 1541390 | 10.18739/a24m9198b | Estimating the Freshwater Flux from the Greenl... | Dataset | Earthcube Rcn: Collaborative Research: Engagin... |
| 4 | 1639683 | 10.6084/m9.figshare.14848713.v1 | Intelligent Databases and Machine-Learning Ana... | Poster | Earthcube Data Infrastructure: Intelligent Dat... |
| 5 | 1639764 | 10.1594/pangaea.892680 | Land2Sea database, Version 2.0 | Dataset | Earthcube Building Blocks: Collaborative Propo... |
| 6 | 1928406 | 10.5281/zenodo.5496306 | Mapping ice flow velocity using an easy and in... | SoftwareSourceCode | Collaborative Research: Earthcube Data Capabil... |
| 7 | 1440066 | 10.13140/rg.2.1.4908.4561 | EarthCube Oceanography and Geobiology Environm... | Article | Earthcube Rcn: An Earthcube Oceanography And G... |
| 8 | 1928393 | 10.5281/zenodo.4558266 | nsidc/qgreenland: v1.0.1 | SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabl... |
| 9 | 1928393 | 10.5281/zenodo.6369184 | QGreenland | SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabl... |
| 10 | 1639694 | 10.5065/p2jj-9878 | Proceedings of the 2020 Improving Scientific S... | Report | Earthcube Building Blocks: Collaborative Propo... |
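
The merge above uses the default inner join, so any resource whose award ID is missing from the project-title file would drop out silently. A hedged sketch of how that could be checked, using small made-up frames (`resources`, `titles`, and the award ID 9999999 are illustrative, not taken from the real data):

```python
import pandas as pd

# Illustrative stand-ins for the resource table and the project-title table.
resources = pd.DataFrame({
    "nsfid": [1440351, 1928393, 9999999],  # 9999999 is a made-up ID with no title
    "doi": ["10.1594/ieda/100709", "10.5281/zenodo.4558266", "10.0000/example"],
})
titles = pd.DataFrame({
    "nsfid": [1440351, 1928393],
    "title": ["Project A", "Project B"],
})

# A left join keeps every resource; indicator=True adds a '_merge' column,
# and validate raises if an award ID appears more than once in `titles`.
checked = resources.merge(
    titles, on="nsfid", how="left", indicator=True, validate="many_to_one"
)

# Award IDs that found no matching project title.
unmatched = checked.loc[checked["_merge"] == "left_only", "nsfid"].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest the inner join loses nothing; a non-empty one points at award IDs worth checking in the title file.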
df_datacite_data.nsfid.value_counts()
1440351    3
1928393    2
1541390    1
1639683    1
1639764    1
1928406    1
1440066    1
1639694    1
Name: nsfid, dtype: int64
df_datacite_data.type.value_counts()
Dataset               4
SoftwareSourceCode    3
ScholarlyArticle      1
Poster                1
Article               1
Report                1
Name: type, dtype: int64
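
These counts depend on the fallback rule used when the `type` column was built, which is why "Article" and "ScholarlyArticle" appear as separate labels. The sketch below restates that rule as a standalone function so it can be tried on small inputs. The name `resolve_type` and the sample dictionaries are hypothetical; only the keys `schemaOrg`, `citeproc`, and `resourceType` come from the notebook.

```python
# schema.org types that the notebook keeps as-is.
PREFERRED = {"Dataset", "ScholarlyArticle", "SoftwareSourceCode", "Report"}


def resolve_type(types):
    """Pick a display label from a DataCite 'types' dictionary."""
    schema_org = types["schemaOrg"]
    if schema_org in PREFERRED:
        return schema_org
    if schema_org == "CreativeWork":
        # Generic type: prefer the citeproc label, capitalized.
        return types.get("citeproc", schema_org).capitalize()
    # Anything else: prefer the free-text resourceType if present.
    return types.get("resourceType", schema_org)


# Hypothetical inputs covering each branch.
print(resolve_type({"schemaOrg": "Dataset"}))
print(resolve_type({"schemaOrg": "CreativeWork", "citeproc": "article"}))
print(resolve_type({"schemaOrg": "Thing", "resourceType": "Poster"}))
```

One side effect worth knowing: `str.capitalize()` lowercases everything after the first letter, so a `CreativeWork` record with no `citeproc` entry would be labelled "Creativework".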
with open("../outputs/datacite/datacite_table_01.md", "w") as fo:
    fo.write("| Resource Type |  EC Project | Resource Title |\n")
    fo.write("|:---:|:----|:----|\n")
    for r in df_datacite_data.itertuples():
        fo.write(
            f"| {r.type}" +
            f"|{r.title} (NSF [#{r.nsfid}](https://nsf.gov/awardsearch/showAward?AWD_ID={r.nsfid}&HistoricalAwards=false))" +
            f"| {r.resource_title} (doi: [{r.doi}](https://doi.org/{r.doi})) |\n"
            )

| Resource Type | EC Project | Resource Title |
|:---:|:----|:----|
| Dataset | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples Sample Management Training Module for Soil Cores (doi: 10.1594/ieda/100709) |
| Dataset | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples Sample Management Training Module for Rock Outcrop Samples (doi: 10.1594/ieda/100691) |
| ScholarlyArticle | Earthcube Rcn: Isamples: The Internet Of Samples In The Earth Sciences (NSF #1440351) | iSamples user stories: common themes and areas for future work (doi: 10.6084/m9.figshare.4272164.v1) |
| Dataset | Earthcube Rcn: Collaborative Research: Engaging The Greenland Ice Sheet Ocean (Griso) Science Network (NSF #1541390) | Estimating the Freshwater Flux from the Greenland Ice Sheet Workshop Report, American Geophysical Union, 2018 (doi: 10.18739/a24m9198b) |
| Poster | Earthcube Data Infrastructure: Intelligent Databases And Analysis Tools For Geospace Data (NSF #1639683) | Intelligent Databases and Machine-Learning Analysis Tools for Heliophysics (doi: 10.6084/m9.figshare.14848713.v1) |
| Dataset | Earthcube Building Blocks: Collaborative Proposal: Earthcube Data Discovery Hub (NSF #1639764) | Land2Sea database, Version 2.0 (doi: 10.1594/pangaea.892680) |
| SoftwareSourceCode | Collaborative Research: Earthcube Data Capabilities–Jupyter Meets The Earth: Enabling Discovery In Geoscience Through Interactive Computing At Scale (NSF #1928406) | Mapping ice flow velocity using an easy and interactive feature tracking workflow (doi: 10.5281/zenodo.5496306) |
| Report | Earthcube Building Blocks: Collaborative Proposal: The Power Of Many: Ensemble Toolkit For Earth Sciences (NSF #1639694) | Proceedings of the 2020 Improving Scientific Software Conference (doi: 10.5065/p2jj-9878) |
| SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF #1928393) | QGreenland (doi: 10.5281/zenodo.6369184) |
| SoftwareSourceCode | Earthcube Data Capabilities: Qgreenland: Enabling Science Through Gis (NSF #1928393) | nsidc/qgreenland: v1.0.1 (doi: 10.5281/zenodo.4558266) |
| Article | Earthcube Rcn: An Earthcube Oceanography And Geobiology Environmental Omics Research Coordination Network (Ecogeo Rcn) (NSF #1440066) | EarthCube Oceanography and Geobiology Environmental ‘Omics Research Coordination Network Workshop 1 Report (doi: 10.13140/rg.2.1.4908.4561) |
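
The loop that writes datacite_table_01.md interpolates titles straight into Markdown rows, so a title containing a pipe character or a line break would split or break its row. None of the titles shown here do, so this is a precaution rather than a fix. A minimal sketch of escaping cells before writing, where `md_cell` and `md_row` are hypothetical helper names:

```python
def md_cell(text):
    """Make a value safe to place inside a Markdown table cell."""
    # Escape pipes (cell separators) and flatten line breaks.
    return str(text).replace("|", "\\|").replace("\n", " ").strip()


def md_row(cells):
    """Join already-ordered cell values into one Markdown table row."""
    return "| " + " | ".join(md_cell(c) for c in cells) + " |\n"


# Made-up values that would otherwise break the table layout.
row = md_row(["Dataset", "Project A | Part 1", "Title\nwith a line break"])
print(row, end="")
```

In the notebook's loop, each `fo.write(...)` call could pass its three cell strings through `md_row` instead of concatenating f-strings by hand.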

df_datacite_data.to_csv("../outputs/datacite/datacite_data_map.csv", index=False)