
Informatics efforts by the Mass Digitization Program at the Digitization Program Office, OCIO

Increase the quantity, quality, and throughput of digitization - DPO Strategic Goal 1

We are developing systems and tools to help the Smithsonian units create and expand the digital collection records. These include:

  • Simplifying data cleanup (Excel is not always the best tool)
  • Scripting data conversion (If you need to do something more than once, a script saves you time)
  • Linking records to taxonomies or other databases (doing this by hand is not sustainable in any large collection, so we let the computers do the tedious tasks)
  • Spatial databases to improve georeferencing for large numbers of records
    • Detect common issues like an inverted sign in coordinates or values outside the country (a missing sign can send a record from Brazil to Madagascar)
    • Obtain approximate coordinates from a location string using natural language (There is not enough time or people to georeference records one by one)
  • Big Data tools to analyze datasets with millions of records (e.g. Google BigQuery)
  • Consolidating strings by matching terms in a database to a single taxonomy
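As an illustration of the inverted-sign check in the list above, here is a minimal sketch, not the program's actual implementation. It assumes a simple per-country bounding box (the `COUNTRY_BBOX` table and its rounded values for Brazil are hypothetical and approximate); a production check would test against country polygons in a spatial database. The idea is to test the coordinates as recorded, then retry with the latitude sign, the longitude sign, and both signs flipped, and report the first variant that falls inside the stated country.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, rounded bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# Real checks should use country polygons, not boxes.
COUNTRY_BBOX: Dict[str, Tuple[float, float, float, float]] = {
    "BR": (-33.8, 5.3, -74.0, -34.8),
}


def in_bbox(lat: float, lon: float, bbox: Tuple[float, float, float, float]) -> bool:
    """True if the point lies inside the (inclusive) bounding box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def check_coordinates(
    lat: float, lon: float, country: str
) -> Tuple[str, Optional[Tuple[float, float]]]:
    """Classify a record's coordinates and suggest a sign fix when one works.

    Returns (status, suggestion), where status is one of
    "invalid", "ok", "lat_sign", "lon_sign", "both_signs", "outside_country".
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "invalid", None
    bbox = COUNTRY_BBOX[country]
    if in_bbox(lat, lon, bbox):
        return "ok", None
    # Try the common data-entry errors: a dropped or extra minus sign.
    candidates = (
        ("lat_sign", (-lat, lon)),
        ("lon_sign", (lat, -lon)),
        ("both_signs", (-lat, -lon)),
    )
    for label, (new_lat, new_lon) in candidates:
        if in_bbox(new_lat, new_lon, bbox):
            return label, (new_lat, new_lon)
    return "outside_country", None


if __name__ == "__main__":
    # A point near Brasilia whose longitude lost its minus sign lands near Madagascar.
    print(check_coordinates(-15.8, 47.9, "BR"))
```

Flagging a suggested fix rather than silently rewriting the value keeps a person in the loop for ambiguous cases, such as points near the equator or the prime meridian where both versions may be plausible.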

These are some of the projects we have been working on:

  • GBIF Issues Explorer - This Shiny app allows researchers and data/collection managers to navigate the records with issues in a GBIF Darwin Core Archive. Winner of the 2nd Place Award in the GBIF 2018 Ebbe Nielsen Challenge!
  • Match Getty AAT - A prototype app that matches terms in a file to the Getty Art & Architecture Thesaurus using its Linked Open Data portal. When keywords are included with a row, the app uses them to disambiguate the term and find the best match. For terms with many matches, the app lets the user select the best one. Once the process is complete, the results file can be downloaded for further processing or for importing into the CIS or another database.
  • Virtual Barcodes - This system allows the vendor to look up an item in a database and scan its unique identifier from a computer screen. Compared with other methods like paper barcodes, spreadsheets, or call and response, this reduces production time and the error rate. The database is updated twice a day from a view of the unit's CIS.
  • Packages to query EDAN from both R and Python.
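The keyword-based disambiguation described for Match Getty AAT can be sketched roughly as follows. This is a hypothetical illustration, not the app's code: it assumes the candidate matches have already been retrieved and are plain dictionaries with `label` and `note` fields (names chosen for this sketch), and it does not call the Getty Linked Open Data service. Each candidate is scored by how many of the row's keyword tokens appear in its label or scope note; a unique top score is accepted, and ties or zero scores are left for the user to resolve.

```python
import re
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_match(
    candidates: List[Dict[str, str]], keywords: List[str]
) -> Optional[Dict[str, str]]:
    """Return the single best candidate, or None when the user must choose.

    Each candidate is assumed (hypothetically) to have "label" and "note" keys.
    """
    if len(candidates) == 1:
        return candidates[0]
    if not candidates or not keywords:
        return None
    kw_tokens: Set[str] = set().union(*(tokenize(k) for k in keywords))
    # Score = number of keyword tokens found in the candidate's label or note.
    scored = [
        (len(kw_tokens & tokenize(c["label"] + " " + c["note"])), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score = scored[0][0]
    # No overlap at all, or a tie at the top: defer to a person.
    if best_score == 0 or scored[1][0] == best_score:
        return None
    return scored[0][1]


if __name__ == "__main__":
    crowns = [
        {"label": "crown", "note": "headdress worn by a monarch"},
        {"label": "crown", "note": "upper part of a tooth"},
    ]
    print(pick_match(crowns, ["tooth", "dental"]))
```

Returning `None` instead of guessing mirrors the workflow in the description: the computer resolves the easy cases, and the ambiguous ones go to the user.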


We are also looking into training needs at the institution, digitization and data scrubbing tools that can benefit more than one unit, and other innovative approaches to improve the digital records. 

Some areas we are working on:

Software Tools

  • Linking data to enhance the collections

Training

  • What informatics training is needed?
  • Resources

Informatics Resources

  • What is available at SI

DPO Projects

  • Admin and hardware projects at DPO

Spatial Database

  • Georeference
  • Check for errors in coordinates

Publications and Reports

  • By DPO Informatics

  • With contributions by DPO

Reference Info

  • APIs
  • Data sources

Social Media

  • Follow our projects


Have a data problem? We are looking for ways to help the Smithsonian units create and enhance the digital records of the collections. 

Contact us: 


Informatics Training and Events @ SI
