
Data Access & Open Data

Issues around access to data

IASSIST Quarterly (2011: Fall)

Sharing data and building information

With this issue (volume 35-3, 2011) of the IASSIST Quarterly (IQ) we return to the regular format: a collection of articles that do not share a single specialist subject area, unlike the recent special issues of IQ. Naturally the three articles presented here are related to the IQ subject area in general, as in: assisting research with data, acquiring data from research, and making good use of the user community. This last topic could also be spelled “involvement”. The hope is that these articles will carry involvement to the IASSIST community, so that the gained knowledge can be shared and practised widely.


“Mind the gap” is a caveat to passengers on the London Underground. The authors of this article are Susan Noble, Celia Russell and Richard Wiseman, all affiliated with ESDS-International hosted by Mimas at the University of Manchester in the UK. The ESDS, standing for “Economic and Social Data Service”, are extending their reach beyond the UK. In the article “Mind the Gap: Global Data Sharing” they are looking into how today’s research on the important topics of climate change, economic crises, migration and health requires cross-national data sharing. Clearly these topics are international (e.g. the weather or air pollution does not stop at national borders), but the article discusses how existing barriers prevent global data sharing. The paper is based on a presentation in a session on “Sharing data: High Rewards, Formidable Barriers” at the IASSIST 2009 conference. It is demonstrated how even international data produced by intergovernmental organizations like the International Monetary Fund, the International Energy Agency, OECD, the United Nations and the World Bank are often only available with an expensive subscription, presented in complex incomprehensible tables, through special interfaces; such barriers are making the international use of the data difficult. Because of missing metadata standards it is difficult to evaluate the quality of the dataset and to search for and locate the data resources required. The paper highlights the development of e-learning materials that can raise awareness and ease access to international data. In this case the example is e-learning for the “United Nations Millennium Development Goals”.


The second paper is also related to the sharing of data with an introduction to the international level. “The Research-Data-Centre in Research-Data-Centre Approach: A First Step Towards Decentralised International Data Sharing” is written by Stefan Bender and Jörg Heining from the Institute for Employment Research (IAB) in Nuremberg, Germany. In order to preserve the confidentiality of single entities, access to complete datasets is often restricted to monitored on-site analysis. Although off-site access is facilitated in other countries, Germany has relied on on-site security. However, an opportunity has been presented where Research Data Centre sites are placed at Statistical Offices around Germany, and also at a Michigan centre for demography. The article contains historical information on approaches and developments in other countries and has a special focus on the German solution. The project will gain experience in the complex balance between confidentiality and analysis, and the differences between national laws.


The paper by Stuart Macdonald from EDINA in Scotland originated as a poster session at the IASSIST 2010 conference. The name of the paper is “AddressingHistory: a Web2.0 community engagement tool and API”. The community consists of members within and outside academia, as local history groups and genealogists are using the software to enhance and combine data from historical Scottish Post Office Directories with large-scale historical maps. The background and technical issues are presented in the paper, which also looks into issues and perspectives of user generated content. The “crowdsourcing” tool did successfully generate engagement and there are plans for further development, such as upload and attachment of photos of people, buildings, and landmarks to enrich the collection.

Articles for the IQ are always very welcome. They can be papers from IASSIST conferences or other conferences and workshops, from local presentations, or papers written especially for the IQ. If you don’t have anything to offer right now, then please prepare yourself for the next IASSIST conference and start planning for participation in a session there. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is much appreciated, as the information in the form of an IQ issue reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistdata.org.

Authors are very welcome to take a look at the instructions and layout:
http://iassistdata.org/iq/instructions-authors


Authors can also contact me via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.

 

Karsten Boye Rasmussen

December 2011

Open Access to Federally Funded Research

Got something to say about "ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research"?

 

The White House Office for Science and Technology Policy (OSTP) released two public consultations today, one on OA for data and one on OA for publications arising from publicly-funded research. Responses are due in early January. Please spread the word. Submit your own comments and/or work with colleagues to submit comments on behalf of your institution.

(1) "[T]his Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research....Response Date: January 12, 2012...."
http://goo.gl/L1jn3

(2) "[T]his Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and broad public access to the peer-reviewed scholarly publications that result from federally funded scientific research....Response Date: January 2, 2012...."
http://goo.gl/vTP18

86 helpful tools for the data professional PLUS 45 bonus tools

I have been working on this (mostly) annotated collection of tools and articles that I believe would be of help to both the data dabbler and professional. If you are a data scientist, data analyst or data dummy, chances are there is something in here for you. I included a list of tools, such as programming languages and web-based utilities, data mining resources, some prominent organizations in the field, repositories where you can play with data, events you may want to attend and important articles you should take a look at.

The second segment (BONUS!) of the list includes a number of art and design resources the infographic designers might like including color palette generators and image searches. There are also some invisible web resources (if you're looking for something data-related on Google and not finding it) and metadata resources so you can appropriately curate your data. This is in no way a complete list so please contact me here with any suggestions!

Data Tools

  1. Google Refine - A power tool for working with messy data (formerly Freebase Gridworks)
  2. The Overview Project - Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.
  3. Refine, reuse and request data | ScraperWiki - ScraperWiki is an online tool to make acquiring useful data simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.
  4. Data Curation Profiles - This website is an environment where academic librarians of all kinds, special librarians at research facilities, archivists involved in the preservation of digital data, and those who support digital repositories can find help, support and camaraderie in exploring avenues to learn more about working with research data and the use of the Data Curation Profiles Tool.
  5. Google Chart Tools - Google Chart Tools provide a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart gallery provides a large number of well-designed chart types. Populating your data is easy using the provided client- and server-side tools.
  6. 22 free tools for data visualization and analysis
  7. The R Journal - The R Journal is the refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that might be of interest to users or developers of R.
  8. CS 229: Machine Learning - A widely referenced course by Professor Andrew Ng, CS 229: Machine Learning provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.
  9. Google Research Publication: BigTable - Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
  10. Scientific Data Management - An introduction.
  11. Natural Language Toolkit - Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
  12. Beautiful Soup - Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.
  13. Mondrian: Pentaho Analysis - Pentaho Open source analysis OLAP server written in Java. Enabling interactive analysis of very large datasets stored in SQL databases without writing SQL.
  14. The Comprehensive R Archive Network - R is `GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage for further information. CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the CRAN mirror nearest to you to minimize network load.
  15. DataStax - Software, support, and training for Apache Cassandra.
  16. Machine Learning Demos
  17. Visual.ly - Infographics & Visualizations. Create, Share, Explore
  18. Google Fusion Tables - Google Fusion Tables is a modern data management and publishing web application that makes it easy to host, manage, collaborate on, visualize, and publish data tables online.
  19. Tableau Software - Fast Analytics and Rapid-fire Business Intelligence from Tableau Software.
  20. WaveMaker - WaveMaker is a rapid application development environment for building, maintaining and modernizing business-critical Web 2.0 applications.
  21. Visualization: Annotated Time Line - Google Chart Tools - Google Code - An interactive time series line chart with optional annotations. The chart is rendered within the browser using Flash.
  22. Visualization: Motion Chart - Google Chart Tools - Google Code - A dynamic chart to explore several indicators over time. The chart is rendered within the browser using Flash.
  23. PhotoStats - Create gorgeous infographics about your iPhone photos, with PhotoStats.
  24. Ionz - Ionz will help you craft an infographic about yourself.
  25. chart builder - Powerful tools for creating a variety of charts for online display.
  26. Creately - Online diagramming and design.
  27. Pixlr Editor - A powerful online photo editor.
  28. Google Public Data Explorer - The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
  29. Fathom - Fathom Information Design helps clients understand and express complex data through information graphics, interactive tools, and software for installations, the web, and mobile devices. Led by Ben Fry. Enough said!
  30. healthymagination | GE Data Visualization - Visualizations that advance the conversation about issues that shape our lives, and so we encourage visitors to download, post and share these visualizations.
  31. ggplot2 - ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
  32. Protovis - Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.
  33. d3.js - D3.js is a small, free JavaScript library for manipulating documents based on data.
  34. MATLAB - The Language Of Technical Computing - MATLAB® is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.
  35. OpenGL - The Industry Standard for High Performance Graphics - OpenGL.org is a vendor-independent and organization-independent web site that acts as one-stop hub for developers and consumers for all OpenGL news and development resources. It has a very large and continually expanding developer and end-user community that is very active and vested in the continued growth of OpenGL.
  36. Google Correlate - Google Correlate finds search patterns which correspond with real-world trends.
  37. Revolution Analytics - Commercial Software & Support for the R Statistics Language - Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world’s most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses.
  38. 22 Useful Online Chart & Graph Generators
  39. The Best Tools for Visualization - Visualization is a technique to graphically represent sets of data. When data is large or abstract, visualization can help make the data easier to read or understand. There are visualization tools for search, music, networks, online communities, and almost anything else you can think of. Whether you want a desktop application or a web-based tool, there are many specific tools available on the web that let you visualize all kinds of data.
  40. Visual Understanding Environment - The Visual Understanding Environment (VUE) is an Open Source project based at Tufts University. The VUE project is focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.
  41. Bime - Cloud Business Intelligence | Analytics & Dashboards - Bime is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.
  42. Data Science Toolkit - A collection of data tools and open APIs curated by our own Pete Warden. You can use it to extract text from a document, learn the political leanings of a particular neighborhood, find all the names of people mentioned in a text and more.
  43. BuzzData - BuzzData lets you share your data in a smarter, easier way. Instead of juggling versions and overwriting files, use BuzzData and enjoy a social network designed for data.
  44. SAP - SAP Crystal Solutions: Simple, Affordable, and Open BI Tools for Everyday Use
  45. Project Voldemort
  46. ggplot. had.co.nz

Data Mining

  1. Weka - Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.
  2. PSPP - PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions. The most important of these exceptions are, that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package. PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.
  3. Rapid-I - Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large-scale base, i.e. for large amounts of structured data like database systems and unstructured data like texts. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. The discovery and leverage of unused business intelligence from existing data enables better informed decisions and allows for process optimization. The main product of Rapid-I, the data analysis solution RapidMiner is the world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine which can be integrated into own products. By now, thousands of applications of RapidMiner in more than 30 countries give their users a competitive edge. Among the users are well-known companies as Ford, Honda, Nokia, Miele, Philips, IBM, HP, Cisco, Merrill Lynch, BNP Paribas, Bank of America, mobilkom austria, Akzo Nobel, Aureus Pharma, PharmaDM, Cyprotex, Celera, Revere, LexisNexis, Mitre and many medium-sized businesses benefitting from the open-source business model of Rapid-I.
  4. R Project - R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Organizations

  1. Data.gov
  2. SDM group at LBNL
  3. Open Archives Initiative
  4. Code for America | A New Kind of Public Service
  5. The #DataViz Daily
  6. Institute for Advanced Analytics | North Carolina State University | Professor Michael Rappa · MSA Curriculum
  7. BuzzData | Blog, 25 great links for data-lovin' journalists
  8. MetaOptimize - Home - Machine learning, natural language processing, predictive analytics, business intelligence, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
  9. had.co.nz
  10. Measuring Measures - Measuring Measures

Repositories

  1. Repositories | DataCite
  2. Data | The World Bank
  3. Infochimps Data Marketplace + Commons: Download Sell or Share Databases, statistics, datasets for free | Infochimps
  4. Factual Home - Factual
  5. Flowing Media: Your Data Has Something To Say
  6. Chartsbin
  7. Public Data Explorer
  8. StatPlanet
  9. ManyEyes
  10. 25+ more ways to bring data into R

Events

  1. Welcome | Visweek 2011
  2. O'Reilly Strata: O'Reilly Conferences
  3. IBM Information On Demand 2011 and Business Analytics Forum
  4. Data Scientist Summit 2011
  5. IBM Virtual Performance 2011
  6. Wolfram Data Summit 2011—Conference on Data Repositories and Ideas
  7. Big Data Analytics: Mobile, Social and Web

Articles

  1. Data Science: a literature review | (R news & tutorials)
  2. What is "Data Science" Anyway?
  3. Hal Varian on how the Web challenges managers - McKinsey Quarterly - Strategy - Innovation
  4. The Three Sexy Skills of Data Geeks « Dataspora
  5. Rise of the Data Scientist
  6. dataists » A Taxonomy of Data Science
  7. The Data Science Venn Diagram « Zero Intelligence Agents
  8. Revolutions: Growth in data-related jobs
  9. Building data startups: Fast, big, and focused - O'Reilly Radar

BONUS! Art & Design

  1. Periodic Table of Typefaces
  2. Color Scheme Designer 3
  3. Color Palette Generator Generate A Color Palette For Any Image
  4. COLOURlovers
  5. Colorbrewer: Color Advice for Maps

Image Searches

  1. American Memory from the Library of Congress The home page for the American Memory Historical Collections from the Library of Congress. American Memory provides free access to historical images, maps, sound recordings, and motion pictures that document the American experience. American Memory offers primary source materials that chronicle historical events, people, places, and ideas that continue to shape America.
  2. Galaxy of Images | Smithsonian Institution Libraries
  3. Flickr Search
  4. 50 Websites For Free Vector Images Download
  5. Design weblog for designers, bloggers and tech users. Covering useful tools, tutorials, tips and inspirational photos.
  6. Images Google Images. The most comprehensive image search on the web.
  7. Trade Literature - a set on Flickr
  8. Compfight / A Flickr Search Tool
  9. morgueFile free photos for creatives by creatives
  10. stock.xchng - the leading free stock photography site
  11. The Ultimate Collection Of Free Vector Packs - Smashing Magazine
  12. How to Create Animated GIFs Using Photoshop CS3 - wikiHow
  13. IAN Symbol Libraries (Free Vector Symbols and Icons) - Integration and Application Network
  14. Usability.gov
  15. best icons
  16. Iconspedia
  17. IconFinder
  18. IconSeeker

Invisible Web

  1. 10 Search Engines to Explore the Invisible Web Like the header says...
  2. Scirus - for scientific information The most comprehensive scientific research tool on the web. With over 410 million scientific items indexed at last count, it allows researchers to search for not only journal content but also scientists' homepages, courseware, pre-print server material, patents and institutional repository and website information.
  3. TechXtra: Engineering, Mathematics, and Computing TechXtra is a free service which can help you find articles, books, the best websites, the latest industry news, job announcements, technical reports, technical data, full text eprints, the latest research, thesis & dissertations, teaching and learning resources and more, in engineering, mathematics and computing.
  4. Welcome to INFOMINE: Scholarly Internet Resource Collections INFOMINE is a virtual library of Internet resources relevant to faculty, students, and research staff at the university level. It contains useful Internet resources such as databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other types of information.
  5. The WWW Virtual Library The WWW Virtual Library (VL) is the oldest catalogue of the Web, started by Tim Berners-Lee, the creator of HTML and of the Web itself, in 1991 at CERN in Geneva. Unlike commercial catalogues, it is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they are expert; even though it isn't the biggest index of the Web, the VL pages are widely recognised as being amongst the highest-quality guides to particular sections of the Web.
  6. Intute Intute is a free online service that helps you to find web resources for your studies and research. With millions of resources available on the Internet, it can be difficult to find useful material. We have reviewed and evaluated thousands of resources to help you choose key websites in your subject. The Virtual Training Suite can also help you develop your Internet research skills through tutorials written by lecturers and librarians from universities across the UK.
  7. CompletePlanet - Discover over 70,000+ databases and specialty search engines. There are hundreds of thousands of databases that contain Deep Web content. CompletePlanet is the front door to these Deep Web databases on the Web and to the thousands of regular search engines — it is the first step in trying to find highly topical information. By tracing through CompletePlanet's subject structure or searching Deep Web sites, you can go to various topic areas, such as energy or agriculture or food or medicine, and find rich content sites not accessible using conventional search engines. BrightPlanet initially developed the CompletePlanet compilation to identify and tap into many hundreds and thousands of search sources simultaneously to automatically deliver high-quality content to its corporate and enterprise customers. It then decided to make CompletePlanet available as a public service to the Internet search public.
  8. Infoplease: Encyclopedia, Almanac, Atlas, Biographies, Dictionary, Thesaurus. Information Please has been providing authoritative answers to all kinds of factual questions since 1938—first as a popular radio quiz show, then starting in 1947 as an annual almanac, and since 1998 on the Internet at www.infoplease.com. Many things have changed since 1938, but not our dedication to providing reliable information, in a way that engages and entertains.
  9. DeepPeep: discover the hidden web DeepPeep is a search engine specialized in Web forms. The current beta version currently tracks 45,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services. Advanced search allows you to perform more specific queries. Besides specifying keywords, you can also search for specific form element labels, i.e., the description of the form attributes.
  10. IncyWincy: The Invisible Web Search Engine IncyWincy is a showcase of Net Research Server (NRS) 5.0, a software product that provides a complete search portal solution, developed by LoopIP LLC. LoopIP licenses the NRS engine and provides consulting expertise in building search solutions.

Metadata

  1. Description Schema: MODS (Library of Congress) and Outline of elements and attributes in MODS version 3.4: MetadataObject This document contains a listing of elements and their related attributes in MODS Version 3.4 with values or value sources where applicable. It is an "outline" of the schema. Items highlighted in red indicate changes made to MODS in Version 3.4. All top-level elements and all attributes are optional, but you must have at least one element. Subelements are optional, although in some cases you may not have empty containers. Attributes are not in a mandated sequence and not repeatable (per XML rules). "Ordered" below means the subelements must occur in the order given. Elements are repeatable unless otherwise noted. "Authority" attributes are either followed by codes for authority lists (e.g., iso639-2b) or "see" references that link to documents that contain codes for identifying authority lists. For additional information about any MODS elements (version 3.4 elements will be added soon), please see the MODS User Guidelines.
  2. wiki.dbpedia.org : About DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. We hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself.
  3. Semantic Web - W3C In addition to the classic “Web of documents” W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.
  4. RDA: Resource Description & Access | www.rdatoolkit.org Designed for the digital world and an expanding universe of metadata users, RDA: Resource Description and Access is the new, unified cataloging standard. The online RDA Toolkit subscription is the most effective way to interact with the new standard. More on RDA.
  5. Cataloging Cultural Objects Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO) is a manual for describing, documenting, and cataloging cultural works and their visual surrogates. The primary focus of CCO is art and architecture, including but not limited to paintings, sculpture, prints, manuscripts, photographs, built works, installations, and other visual media. CCO also covers many other types of cultural works, including archaeological sites, artifacts, and functional objects from the realm of material culture.
  6. Library of Congress Authorities (Search for Name, Subject, Title and Name/Title) Using Library of Congress Authorities, you can browse and view authority headings for Subject, Name, Title and Name/Title combinations; and download authority records in MARC format for use in a local library system. This service is offered free of charge.
  7. Search Tools and Databases (Getty Research Institute) Use these search tools to access library materials, specialized databases, and other digital resources.
  8. Art & Architecture Thesaurus (Getty Research Institute) Learn about the purpose, scope and structure of the AAT. The AAT is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the AAT's contributors.
  9. Getty Thesaurus of Geographic Names (Getty Research Institute) Learn about the purpose, scope and structure of the TGN. The TGN is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the TGN's contributors.
  10. DCMI Metadata Terms
  11. The Digital Object Identifier System
  12. The Federal Geographic Data Committee — Federal Geographic Data Committee

Videos from the IASSIST 2011 Plenaries

Hello - for those who were not able to attend IASSIST 2011 and for those asking to have access to video presentations, the two videos of the Plenaries from IASSIST 2011 are now available for viewing:

Chuck Humphrey - Data Library Coordinator, University of Alberta
Research Data Infrastructure: Are the Social Sciences on Main Street or a Side Road?

Chuck Humphrey is passionate about data and has been examining research data infrastructure with a global perspective. His talk will locate the social sciences in the broader E-science picture and give us a glimpse of the future.

Plenary II 

Date: Thursday, June 02 

Video   

 

Andrea Reimer - Councillor, City of Vancouver
Open Data in Vancouver: The Inspiration and the Vision


Andrea Reimer is a Councillor for the city of Vancouver and is a passionate advocate for democracy and civic engagement. The City of Vancouver has led the way with the adoption of a resolution in May [2009] that endorsed open and accessible data, open standards, and open source software. Ms Reimer has been heavily involved in this initiative and will share her passion with IASSIST.

Plenary III 

Date: Friday, June 03 

Video

 

The QuickTime .mov files can be viewed on a variety of devices: desktops, iPhone, iPod, iPad, and smartphones.

We have had mixed reports on streaming success. These are large video files (each over an hour in length), so patience is required.


SPARC Digital Repositories meeting includes session on open data

Kathleen Shearer of the Canadian Association of Research Libraries organized and chaired a panel on Open Data, held at the SPARC Digital Repositories meeting on November 8, 2010. IASSIST members Gail Steinhart and Chuck Humphrey were two of the three members of this panel. Kevin Ashley, Director of the Digital Curation Centre (DCC), was the third. more...

US and UK governments embrace 'open data'

The US Open Government Directive, released on December 8, 2009, instructs all federal agencies to provide high-value information to the public online in open, accessible, machine-readable formats. more...

Conference webcasts and presentations online!

A week has passed since IASSIST 2009. I hope most of you have made it safely back home by now and are ready to refresh your memories by watching the conference webcasts and viewing presentations. Webcasts of all three plenaries and of Thursday's and Friday's concurrent sessions in the Small Auditorium are now available. We didn't have cameras available during the Wednesday sessions, so there are no videos of those presentations, sorry! But most of the presentations are already online; a few are still missing, either because we didn't have them or because we are waiting for an updated version. Please send in any missing presentations, or email me if there are mistakes that should be corrected!

 

Tuomas J. Alaterä Information Network Specialist tuomas.alatera@uta.fi Finnish Social Science Data Archive (FSD) http://www.fsd.uta.fi FI-33014 University of Tampere

Special IQ: Moving Research Data Into and Out of Institutional Repositories

The IASSIST Quarterly IQ Vol. 31 issue 3&4 is now available on the web:

http://iassistdata.org/publications/iq/iqvol31.html

This issue will only be available on the web. There will be no printed version mailed out to the membership.

This double issue is the work of its authors, whose articles are introduced below. We are presenting an integrated double issue of high quality. Special thanks are also due to the editors of the issue. Gretchen Gano is the guest editor of this IQ and wrote the notes below. She is the Assistant Curator Librarian for Public Administration & Government Information and Coordinator, Data Service Studio at New York University Libraries. Gretchen Gano collaborated on this issue from the start with former IASSIST president Ann Green. Together with the authors, they have made a great issue.

Enjoy

Karsten Boye Rasmussen, IQ editor, associate professor, kbr@sam.sdu.dk, Marketing & Management, SDU, University of Southern Denmark +45 6550 2115

Guest Editor's Notes:

The 2008 IASSIST Conference, “Technology of Data: Collection, Communication, Access and Preservation” included a session entitled “Moving Research Data Into and Out of Institutional Repositories” from which several papers emerged. In “Interoperability Between Institutional and Data Repositories: a Pilot Project at MIT”, Katherine McNeill describes a pilot project to enhance study discovery between two repository systems housed in the same institution, DSpace and the Institute for Quantitative Social Science Dataverse Network, by enabling the harvesting and replication of metadata and content across the two systems. In a related project across the pond, Libby Bishop scales this discussion in her description of cross-institutional collection sharing between the University of Leeds and the UK Data Archive in the Timescapes project. Bishop asserts that coordination among multiple agents is likely to be challenging under any circumstances. Challenges magnify when the trajectories of different life cycles, for research projects and for data sharing, are considered. Robin Rice echoes these sentiments in her article on the DISC-UK DataShare Project, a collaboration between the Universities of Edinburgh, Oxford and Southampton and the London School of Economics. Rice provides visual evidence in a compelling diagram of the data sharing continuum based on storage, discovery, and preservation conditions of the digital research materials at each level along the scale -- from the lowly thumb drive to the officious national archive. We see plainly that as one moves up the continuum, more and more human effort and intervention is required to craft the discovery, access, analytic and preservation environment. In other words, data curators matter.

Two other papers tackle these challenges by emphasizing the needs of data producers. Luis Martinez-Uribe introduces the University of Oxford’s Scoping Digital Repository Services for Research Data Management project and the findings of a requirements-gathering exercise. While the study results reveal researchers’ needs and workflows, Martinez-Uribe asserts that the study process itself made an impact on the participants. Study participants reflected on and, as a result, fine-tuned how they work with data and why they create these materials in the first place, and were able to articulate reasons for managing these resources the way they do. Similarly, Research Data & Environmental Sciences Librarian Gail Steinhart writes about the development of DataStaR, a Data Staging Repository hosted by Cornell University’s Albert R. Mann Library. The project developed as a “managed workspace” where researchers contribute datasets they are still actively using, in direct response to questions that have to do with sharing in the active research environment rather than an archival one.

While the authors in this issue describe projects going on in many different places and settings, taken together, these articles address common themes. All address the challenge of scaling data exchange between systems and then between institutions. This raises the perennial question of standards: by what mechanisms will we set them, and how well will we be able to follow them and still accommodate local needs? The importance of aligning repository services with researcher needs is another common thread. Data managers must ask, “how will the active researcher benefit from curation efforts”? The answer may be that the benefit is more than finding or accessing a particular resource (yep, I have downloaded the whole thing and all the bits are there); it is being able to examine this resource in many ways (okay, let’s run frequencies, now I want to see it on a map, and let’s include some other variables). This is a rich reuse experience, creating a real digital “laboratory.”

Finally, each contributor notes the expanding role of the data manager. In its own way, each project described here moves data managers upstream, pre-publication, into the place where research is actively happening. Though all of the articles focus on technological choices and architectures to support research data curation, it is striking to realize that each of these choices emerges from old-fashioned personal, social, and organizational relationships. What we can strive for as data and information managers is to work together as fellow researchers and to be ever curious about how these partnerships and the sharing of information back and forth can be enhanced by thoughtful information and technology design. Some call this the digital plumbing, but I like to think of it as e-gilding.

Gretchen Gano, New York University Libraries

New IQ!

The IASSIST Quarterly (IQ Vol. 31 issue 2 - 2007) is now available on the web:

 

http://iassistdata.org/publications/iq/iqvol31.html

 

This issue will be printed and mailed to the membership. Starting with the next issue, IASSIST will save trees and publish the IQ only on the web. We hope you agree with our decision. Thanks.

  more...

Open Access and Reuse of Research Data in Finland

In 2006, motivated by the OECD Open Access guidelines, the Finnish Social Science Data Archive (FSD) carried out an online survey targeting professors of human sciences, social sciences and behavioural sciences in Finnish universities. Professors were asked, for example, whether their department had any guidelines on the preservation of digital research data. A great and alarming majority (90%) said no. The survey also charted what actually happens to research data and what are the barriers to and benefits of open access to research data.


What then happens to research data? The most common practice seems to be that the data remain in the hands of the original researcher(s). Even if the data are stored in the department or research institute, no further processing or documentation takes place. FSD's influence could be seen in the social sciences, where archiving at a data archive is a bit more frequent than in other sciences.

The survey also charted barriers to open access. Professors were concerned about inadvertent misuse of data and consequent mistakes. Of course, without detailed documentation, data reuse may indeed result in inaccurate interpretations. Lack of agreements regarding data ownership and IP rights was also mentioned as a barrier, as were loss of competitive advantage, IT problems, and confidentiality issues.

On the other hand, the professors saw many benefits in open access to research data. The most significant was enhancing the diversity of research designs with the use of archived data. All in all, the benefits were estimated to be more significant than the barriers. The survey also showed, not surprisingly, that it is usual for a researcher to have a positive attitude towards open access in general but a less-than-enthusiastic one towards open access to his/her own data.

The report concludes that, from the viewpoint of long-term preservation and reuse, it is definitely not advisable to leave the responsibility for the preservation and dissemination of data to individual researchers. Changing this practice, which still prevails in Finnish universities and other Finnish research organisations, is one of the key goals in the national implementation of the OECD Recommendation.

An abridged version of the report is available in English:
Arja Kuula & Sami Borg (2008). Open Access to and Reuse of Research Data - The State of the Art in Finland. University of Tampere. Finnish Social Science Data Archive; 7. ISBN: 978-951-44-7479-8.

Download the report as a PDF file.

The survey data is naturally available, too:
FSD2268 Open Access to and Reuse of Research Data 2006

Mari Kleemola
Finnish Social Science Data Archive
IASSIST European Regional Secretary
