Blogs of interest - library matters

Lorcan Dempsey's weblog
(On libraries, services and networks.)
Social tools and science

In her report on Open science at webscale, I was interested to see Liz Lyon give the following list of tools that researchers use to share their work.

Currently, researchers are using open science tools such as:
  • Connotea for reference management
  • Mendeley (which applies LastFM principles associated with music selections to journal articles)
  • Friendfeed (for threaded discussion and aggregation)
  • Scivee and YouTube (for sharing experimental methodologies and protocols)
  • SciLink and Nature Networks (for social networking)
  • myExperiment (for sharing workflows)
  • eyeLIMS (an open source Laboratory Information Management System)
  • LabLit.com (about science/laboratory culture in the literature and media)
  • ConceptWeb (from WikiProfessional and includes WikiPeople and WikiProteins)
[Open science at webscale - PDF]
Libraries and e-science

Emerging data-intensive e-science presents many support challenges for institutions, disciplines and national bodies to work through. The role of the academic library in this multiscale world is also an open question. Two recent reports discuss e-science (or 'cyberinfrastructure' or 'e-research') in general terms and repay reading.

Liz Lyon, the Director of UKOLN, and also a principal in the Digital Curation Centre, has focused on this area for several years now and has produced an interesting synthesising report for the JISC: Open science at web-scale: optimising participation and predictive potential: consultative report [Summary; Full report PDF]. An important theme of the report is 'data informatics', defined in this way: "library and information science methodologies which have been applied to research data".

The report is organized around six 'consultation challenges'. The first is 'scale, complexity and predictive potential'. Here is the summary:

Data-intensive science powered by contemporary computational hardware, software and research techniques, enables scientists to perform experiments and calculations at different orders of magnitude of scale and volume: research that was completed in a year can now be repeated in a weekend. Sustained growth in data modelling, complex simulations and visualisations, facilitate interpretation and analysis by humans and machines, leading to the development of predictive science scenarios in a wider range of disciplines. Examples of data intensive science at these extremes of scale, which enable forecasting and predictive assertions, have been described.
Assessments of the accuracy and robustness of predictions are linked to uncertainty quantification, the accuracy of the underlying model, and the integrity of the data. Key questions address community awareness and understanding of the potential implications and impact of (open) data-intensive science at new extremes of scale and complexity, and the service requirements for associated data curation and preservation. [Open science at web-scale: Optimising participation and predictive potential - summary]

To give some flavor of concerns, here are the other challenges: Continuum of openness; citizen science; credentials, incentives and rewards; institutional readiness and response; data informatics capacity and capability. A brief chapter is devoted to each.

The author is positive about the role of libraries and librarians, particularly in the data informatics section. That said, given the absence of routine service and organizational responses, the library role is still expressed in very general terms. What it might mean in practice is naturally less well developed.

The other publication is a collection of essays assembled in honor of Jim Gray:

In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized. [The fourth paradigm]

For Gray the first three paradigms are experimental, theoretical, and computational.

We said, "Look, computational science is a third leg." Originally, there was just experimental science, and then there was theoretical science, with Kepler's Laws, Newton's Laws of Motion, Maxwell's equations, and so on. Then, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are "looking" through large-scale, complex instruments which relay data to datacenters, and only then do they look at the information on their computers.
The world of science has changed, and there is no question about this. The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration [1]. [Jim Gray on escience - PDF.]

The collection of essays is divided into these sections: Earth and environment; Health and wellbeing; Scientific infrastructure; Scholarly communications. And there are opening and concluding sections. The contributions are readable and in the form of short essays rather than research papers. There is a contribution by Cliff Lynch on the changing scholarly record, by Timo Hannay on the impact of the network on the structure of science, and by Herbert Van de Sompel and Carl Lagoze on the enhancement of the scholarly record with actionable structure.

There is no specific contribution on libraries, and it is interesting to note that most of the occasional mentions of libraries point towards network-level digital libraries.

It is important for libraries to understand these changes. The reshaping impact of the network on learning and research behaviors is a more important factor for libraries than the direct impact of the network on library processes themselves.

Reputation enhancement redux

I wrote recently about the growing interest in reputation management on the web.

Reputation management on the web - individual and institutional - has become a more conscious activity for many, as ranking, assessment and other reputational measures are increasingly influenced by network visibility. In particular, it raises for academic institutions an issue that has become a part of many service decisions: what is it appropriate to do locally? What should be sourced externally? And what should be left to others to do? [Reputation enhancement]

This is a wide-ranging issue, pulling together in various ways overlapping issues such as individual and institutional disclosure of research and other outputs; emerging academic social networking practices; formal expertise and research output management; search engine optimization strategies; practices for improving citation, ranking and reputation measures; social reference/bibliography; and so on. I think that we will see some of this activity become more routine in organizational and operational terms over the next few years.

In this context, I was interested to see a presentation on research support by Rachel Cowan and Alex Hardman from the University of Manchester. They focus on reputation and network identity as important parts of overall research management.


The presentation has three strands: developing reputation through a digital identity, keeping on top of the literature, and extending research connections. Of these, the first and third relate broadly to reputation enhancement or management in a web environment.

They ask the audience if a personal Google search does a good job of showcasing their identity and research. (This reminds me of Tony Hirst's comment that our 'home page' is now the first page of Google results.) Then they talk through some of the ways in which people develop digital identities (blogs, twitter, ...). They also review some social networking and other tools of interest in an academic context.

Here is their overview of activities mapped onto services:

[Image: researcheridentity.png]

QOTD: protocol-based time travel for the web

We are pleased that Herbert Van de Sompel will be talking about Memento, a joint project of Los Alamos National Laboratory and Old Dominion University, at OCLC later this month. We will make a webcast available; see the details here. If you are in Central Ohio, come by ....

Here is a recent paper describing the work:

The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or of searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web. [Memento]
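
The idea in the abstract, reaching an archived copy through the URI of the original by adding a time dimension to an ordinary HTTP request, might be sketched roughly as follows. This is an illustration only, not the project's code. The Accept-Datetime header name is taken from the later Memento specification (RFC 7089), not from the excerpt above, and the example URI and the helper names are hypothetical. Nothing is sent over the network here; the sketch only builds the request.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.request


def accept_datetime_value(moment):
    """Format an aware datetime as an HTTP date in GMT."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # Normalise to UTC so the header always ends in "GMT".
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def build_memento_request(original_uri, moment):
    """Build (but do not send) a request for a past state of original_uri.

    Assumption: the server, or a gateway in front of it, understands the
    Accept-Datetime header named in RFC 7089.
    """
    headers = {"Accept-Datetime": accept_datetime_value(moment)}
    return urllib.request.Request(original_uri, headers=headers)


if __name__ == "__main__":
    # Hypothetical original URI and target date.
    wanted = datetime(2009, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
    request = build_memento_request("http://example.org/page", wanted)
    print(request.full_url, request.header_items())
```

The design point is that the client keeps using the original URI and expresses "when" in a header, so no separate search of an archive for the archival URI is needed.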
Libraries and the long tail: intro

Discussing grades of availability in my last post, I mentioned an article I wrote a few years ago on libraries and the long tail. Here is how it starts:

Discussions of the long tail that I have seen or heard in the library community strike me as somewhat partial. Much of that discussion is about how libraries contain deep and rich collections, and about how their system-wide aggregation represents a very long tail of scholarly and cultural materials (a system may be at the level of a consortium, or a state, or a country). However, I am not sure that we have absorbed the real relevance of the long tail argument, which is about how well supply and demand are matched in a network environment. It is not enough for materials to be present within the system: they have to be readily accessible ('every reader his or her book', in Ranganathan's terms), potentially interested readers have to be aware of them ('every book its reader'), and the system for matching supply and demand has to be efficient ('save the time of the user'). [Libraries and the long tail: some thoughts about libraries in a network age]

Incidentally, I think Ranganathan's 5 laws [Wikipedia entry] remain relevant in lots of ways to current discussions, as above.

synthesize-specialize-mobilize
(According to Robin Murray, libraries are transitioning from an acquire-catalog-circulate model to one that could be described as synthesize-specialize-mobilize. Discuss.)
flatlands and failures of curation
the rise of the verticals
Summon 'web scale'? I don't think so.
WorldCat Local Review
Economist on Google Books
inkdroid
($pithy_personal_mission_statement)
MARCetplace
Last Saturday I passed the time while waiting in line at the DMV by reading the recently released Study of the North American MARC Records Marketplace. The analysis of the survey results seems to focus on the role of the Library of Congress in the marketplace, which is understandable given that LC funded the report. [...]
cloaking and fulltext
It’s comforting to know that California Digital Library are selectively serving up fulltext content in HTML from their institutional repository for search engines to chew on. For example, compare the output of:

  curl http://escholarship.org/uc/item/2896686x

with:

  curl --header "User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)" http://escholarship.org/uc/item/2896686x

You should see full-text content for the article in the latter and not in the former: ... qt2896686x repo "Wholly [...]
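
The same comparison of two requests that differ only in their User-Agent header could be scripted along these lines. This is a rough sketch under stated assumptions, not the post author's method. The URL and the Googlebot string are the ones quoted in the excerpt, while the helper names are hypothetical. Whether the server still responds differently to the two requests is not something this sketch can confirm.

```python
import urllib.request

# Both values are quoted from the excerpt above.
ITEM_URL = "http://escholarship.org/uc/item/2896686x"
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch(request, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def differs_by_user_agent(url, user_agent):
    """Return True if the body served to user_agent differs from the default.

    Caveats: without an override urllib sends its own default User-Agent
    (not curl's), and dynamic pages can differ between any two fetches,
    so a True result is only a hint that content is served selectively.
    """
    default_body = fetch(build_request(url))
    crawler_body = fetch(build_request(url, user_agent))
    return default_body != crawler_body
```

Calling differs_by_user_agent(ITEM_URL, GOOGLEBOT_UA) performs the two fetches; it needs network access and is left out of the block for that reason.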
skos as atom
I’ll be the first to admit the tone and content of my last post was a bit off kilter. I guess it was pretty clear immediately from the title of the post. Chalk it up to a second night of insomnia; and also to my unrealistic and probably unnecessary goal of bringing the Atom/REST camp [...]
alien vs predator: www-style
I finally got around to reading Web Services for Recovery.gov by Erik Wilde, Eric Kansa and Raymond Yee. The authors wrote the report with funding from the Sunlight Foundation, who are deeply engaged in improving the way the US Federal Government provides transparent access to its data assets. I highly recommend giving it a read [...]
hackability
Adam Bosworth has some good advice for would-be standards developers in the form of a 7 item list. It is strangely reassuring to know that someone in the US Federal Government called someone like Adam for advice about standards…even if it was at some inhuman hour. Number 5 really resonated with me: Always have real implementations [...]