Archive for September, 2005



[CSG Fall 2005] Dinner cruise and conversation

Last night featured a cruise on Pittsburgh’s three rivers – the weather was lovely, the beer (Iron City) was free, and the company was great.

Jack McCredie from UC Berkeley regaled us with tales of growing up in Pittsburgh when it was still the Steel City, with mills running all along the rivers – now those sites have been converted to restaurants, parks, and stadiums.

The evening ended with a long chat between Terry Gray, Lisa Dusseault and me. While I won’t try to recap the conversation, it revolved around IMAP, HTTP, connection-oriented vs. connectionless protocols, authentication for WebDAV, etc.

Here’s a photo of Terry and Jack Duwe on the boat.


[CSG Fall 2005] Cliff Lynch on random musings

Cliff Lynch is noting that we talked a tremendous amount about repositories today, and that the approach was largely technical. If you talk to administrators, faculty, or librarians, you’ll get very different views on what these are for, even though when you peel away the social and political layers you get something that looks very similar.

CNI has been concentrating for the past year or two on what are referred to as Institutional Repositories – a service with a significant institutional commitment behind it that documents the academic or cultural life of the institution. That includes a place to put digital materials created by faculty, or documentation of performances that happen in the university community. Typically it’s not a place where you’re doing frontline teaching and learning – it’s not a course management system. Course management systems generally have many more specialized services than repositories. There are lots of questions about how rich the functionality in institutional repositories should be.

If you think about putting documents in, getting them out, and perhaps migrating document formats over time, that’s a good set of functions. Now think about video – you can treat it as a large file: put it in, pull it out, and what you do with it afterward is not the repository’s business. Or you can think of video in a context that includes streaming, handling different bit rates, etc. These are the kinds of scoping questions we see around institutional repositories.

There are lots of repositories on campus besides institutional repositories and course management systems. All kinds of research groups are setting up repositories, often on the same software used for the institutional repository. What’s different is the scale of implementation and the extent of institutional commitment.

There are two major streams of argument that have been used to support deployment of institutional repositories. One talks about the move to production of scholarship in digital form – scholarship that is more than page images, encompassing datasets, software, simulations, etc., that don’t fit into the tradition of scholarly journals or monographs. To keep scholarship healthy, institutions need to take responsibility for archiving and maintaining these materials.

The other argument centers on a set of issues that fall under the rubric of “open access” – a policy position that says the reporting of scholarship should be free and openly accessible, that the Internet makes this possible at low cost, and that doing so breaks down barriers to scientific progress and bridges equity gaps between nations and communities. Another argument with some political traction is that a tremendous amount of research is paid for by the government, so citizens have the right to access it. This is the open access thesis. One strategy is for scholars to deposit copies of their works into public repositories, either institutional or discipline-based. This approach is gaining traction in both the US and Europe.

You have these two justifications, but we don’t really know much about what is in institutional repositories or how many of them are deployed. CNI did a project on repositories in 13 countries and then pulled together a meeting in Amsterdam to understand similarities and variations in implementations. Two articles on this appeared in D-Lib Magazine last week.

Some significant highlights – there are a couple of nations in Europe that have an institutional repository deployed in every higher education institution in the country. There are other nations where deployment rates are very low. In the US they looked at CNI membership, which consists primarily of research institutions. They found around 40% had some sort of repository deployed, and around 80% of the rest had some planning underway.
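Taking the two US figures together, and assuming both are fractions of the same surveyed CNI membership, a rough back-of-the-envelope share of institutions with a repository either deployed or in planning is:

$$0.40 + 0.80 \times (1 - 0.40) = 0.40 + 0.48 = 0.88$$

That is, roughly 88% of those surveyed, keeping in mind that both inputs were approximate.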

In almost all institutions the intellectual leadership for this activity has come from the Libraries.

If you look at the European data, they are doing this mostly for open access, and the material in their repositories is mostly textual. In the US the picture is quite different – there’s lots of material that isn’t textual, everything from architectural models to video, datasets, software, etc. US institutional repositories may be filling a need for places to store data that national data centers fill in other countries.

While we thought we had a reasonable working definition of institutional repository, the thing that came through very clearly is how chaotic the campus environment is. The relationship between repositories and course management is confused, and there are lots of departmental repositories whose people don’t talk to each other or to the central repository. There’s lots of confusion over what’s a digital library and what’s an institutional repository. It would be useful to try to get some working definitions, at least at a campus level.

There is considerable interest at the policy level in the US in getting a handle on the datasets that are produced as a result of research activity. The NIH put a requirement on all grants over $0.5 million to include a data plan. The grant holders naturally want to hand long-term responsibility for this data over to the institution. The National Science Board issued a set of policy recommendations around long-lived data standards. It’s worth looking at, because it marks the beginning of policy principles that will affect grant awards at institutions and will push us to deal with data stewardship. The Office of Science and Technology Policy has also picked up on this report.

Earlier this week, CNI started an informal call for experience from institutional representatives to get additional insight into what’s going on.

[CSG Fall 2005] UMichigan’s Google Library digitization project

John Wilkin from U Michigan is talking about the University’s deal with Google for digitizing library content.

Larry Page from Google is a Michigan grad, and at a dinner on campus he said he’d like to digitize the entire library collection, and they took him seriously. They agreed on non-destructive conversion that would produce files of sufficient resolution to serve as a stand-in for the physical object, and the University would maintain rights to the materials.

The bound print content of the Library will be digitized – the Library holds seven million volumes.

The contract between the University and Google is online at http://www.lib.umich.edu/mdp/. There is a lightweight set of indemnifications in the agreement. There is agreement that the materials will not be out of circulation for long.

Copies of the images go to both Google and the University.

Why did they do this? Ubiquitous access is part of what it means to be a research library. Having access through Google widens access.

Why would Google do this? To “help maintain the preeminence of books and libraries in our increasingly Internet-centric culture…”

The University gets a package of files for every volume, identified by barcode: 600dpi bitonal images for printed text and 300dpi JPEG color/grayscale images for illustrations. Michigan reports that the OCR quality is good.
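To make the per-volume package concrete, here’s a minimal Python sketch of a check a receiving library might run on each package. It assumes the package is a list of image records under one barcode and treats the two resolutions above as exact requirements; the class names, field names, and “kind” labels are all hypothetical and don’t come from the Michigan/Google agreement.

```python
from dataclasses import dataclass, field
from typing import List

# Resolutions described in the talk: 600 dpi bitonal images for printed text,
# 300 dpi JPEG color/grayscale images for illustrations.
# The "kind" labels and all names below are hypothetical.
REQUIRED_DPI = {"bitonal": 600, "color": 300, "grayscale": 300}
JPEG_KINDS = {"color", "grayscale"}


@dataclass
class PageImage:
    filename: str
    kind: str  # "bitonal", "color", or "grayscale"
    dpi: int


@dataclass
class VolumePackage:
    barcode: str  # each volume's package is identified by its barcode
    images: List[PageImage] = field(default_factory=list)


def check_package(pkg: VolumePackage) -> List[str]:
    """Return human-readable problems; an empty list means the package passes."""
    problems = []
    for img in pkg.images:
        expected = REQUIRED_DPI.get(img.kind)
        if expected is None:
            problems.append(f"{img.filename}: unknown image kind {img.kind!r}")
            continue
        if img.dpi != expected:
            problems.append(f"{img.filename}: {img.dpi} dpi, expected {expected}")
        if img.kind in JPEG_KINDS and not img.filename.lower().endswith((".jpg", ".jpeg")):
            problems.append(f"{img.filename}: {img.kind} images should be JPEG")
    return problems
```

Running `check_package` over each delivered package and logging any non-empty result would be one simple way to catch files that fall short of the agreed specs.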

[CSG Fall 2005] UVa Virginia Digital Library

Tim Sigmon is talking about the development of the UVa digital library, where they are attempting to offer truly integrated searching and delivery of digital library content.

One of the issues was coming up with common metadata for describing these digital objects. There was a steering group to review formats and come up with standards. There are descriptive and administrative metadata standards.

They also needed new specs for how images would be stored in these collections. Three content models were developed: uvaHighRes, which includes preview, screen-sized, and high-quality large images; uvaLowRes, with only preview and screen-sized images; and uvaBitonal, with bitonal TIFFs only. One content model and production standard were set for image metadata.
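One way to read the three image content models is as a required set of derivatives per object. This Python sketch encodes them that way and takes “only” literally, so extra derivatives are reported as well as missing ones; the derivative labels (preview, screen, large, bitonal_tiff) and the function name are hypothetical, not UVa’s actual identifiers.

```python
from typing import Dict, Set

# Image content models from the talk, expressed as the derivatives each requires.
# The derivative labels are hypothetical.
IMAGE_CONTENT_MODELS: Dict[str, Set[str]] = {
    "uvaHighRes": {"preview", "screen", "large"},
    "uvaLowRes": {"preview", "screen"},
    "uvaBitonal": {"bitonal_tiff"},
}


def check_image_object(model: str, available: Set[str]) -> Dict[str, Set[str]]:
    """Compare an object's derivatives against its declared content model."""
    if model not in IMAGE_CONTENT_MODELS:
        raise ValueError(f"unknown image content model: {model}")
    required = IMAGE_CONTENT_MODELS[model]
    return {
        "missing": required - available,     # required by the model but absent
        "unexpected": available - required,  # present but not part of the model
    }
```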

Texts are represented in a local extension of the TEI DTD, along with encoding guidelines. There are three content models for text: uvaGenText – transcription with no page images; uvaPageBook – page images with no transcription; and uvaBook, which has both transcription and page images. All page images must conform to the image standards.
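The three text content models differ only in whether a transcription and page images are present, so choosing one can be sketched as a small decision function. This is a hypothetical Python illustration, not UVa’s code, and it assumes that “page images must conform to the image standards” means they must use one of the three image content models (uvaHighRes, uvaLowRes, uvaBitonal).

```python
from typing import Optional

# Image content models that page images may use (assumed from the image standards).
IMAGE_MODELS = {"uvaHighRes", "uvaLowRes", "uvaBitonal"}


def text_content_model(has_transcription: bool, page_image_model: Optional[str] = None) -> str:
    """Pick uvaGenText, uvaPageBook, or uvaBook for a text object.

    page_image_model is None when the object has no page images.
    """
    if page_image_model is not None and page_image_model not in IMAGE_MODELS:
        raise ValueError(f"page images must use an image content model, got {page_image_model!r}")
    has_page_images = page_image_model is not None
    if has_transcription and has_page_images:
        return "uvaBook"
    if has_transcription:
        return "uvaGenText"
    if has_page_images:
        return "uvaPageBook"
    raise ValueError("a text object needs a transcription, page images, or both")
```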

Archival finding aids were specified as uvaEAD (Encoded Archival Description, EAD 2002). Images and texts must conform to the content models.

There are two default disseminators on every object: Default access behavior, including getPreview, getFullView, getLabel, getDefaultContent; and Admin and descriptive metadata behaviors. There are also class-specific disseminators for different kinds of objects.
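Here’s a rough Python sketch of how the two default disseminators and a class-specific one might look as interfaces. The four access behavior names come from the talk; the metadata method names (getDescriptiveMetadata, getAdminMetadata), the ImageObject class, its getScreenImage method, and the disseminate helper are hypothetical, and none of this is meant to reproduce the repository software’s actual API. I’ve kept the camelCase names so they match the behaviors as described.

```python
from abc import ABC, abstractmethod


class DefaultAccess(ABC):
    """Default access behaviors every object carries (names as listed in the talk)."""

    @abstractmethod
    def getPreview(self) -> bytes: ...

    @abstractmethod
    def getFullView(self) -> bytes: ...

    @abstractmethod
    def getLabel(self) -> str: ...

    @abstractmethod
    def getDefaultContent(self) -> bytes: ...


class MetadataAccess(ABC):
    """Admin and descriptive metadata behaviors (method names are hypothetical)."""

    @abstractmethod
    def getDescriptiveMetadata(self) -> str: ...

    @abstractmethod
    def getAdminMetadata(self) -> str: ...


class ImageObject(DefaultAccess, MetadataAccess):
    """Hypothetical image object; getScreenImage stands in for a class-specific disseminator."""

    def __init__(self, label: str, preview: bytes, screen: bytes, large: bytes):
        self._label = label
        self._preview = preview
        self._screen = screen
        self._large = large

    def getPreview(self) -> bytes:
        return self._preview

    def getFullView(self) -> bytes:
        return self._large

    def getLabel(self) -> str:
        return self._label

    def getDefaultContent(self) -> bytes:
        return self._screen

    def getDescriptiveMetadata(self) -> str:
        return f"<title>{self._label}</title>"

    def getAdminMetadata(self) -> str:
        return "<created>unknown</created>"

    def getScreenImage(self) -> bytes:  # class-specific behavior
        return self._screen


def disseminate(obj: object, behavior: str):
    """Invoke a behavior by name, the way a client that only knows behavior names would."""
    method = None if behavior.startswith("_") else getattr(obj, behavior, None)
    if not callable(method):
        raise AttributeError(f"{type(obj).__name__} has no disseminator {behavior!r}")
    return method()
```

Because the default behaviors are abstract, any new object class is forced to provide them, while class-specific behaviors can be added freely.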

They built some tools for users of the DL, including a “shopping cart” for people to collect digital objects as they search and browse, and then do things with that collection, e.g. create a slide show for a lecture.

They had to create processes to convert legacy images, texts, and finding aids, as well as the workflow for getting content into the repository – this was less a technical issue than a matter of changing the way library catalogers do business.

There’s a demo at http://www.lib.virginia.edu/digital/collections/

[CSG Fall 2005] Hypercontent

Alex Vigdor from Columbia University is talking about HyperContent – a web content management system that is a JA-SIG project.

Alex’s slides are here.

The client is browser-based, with data kept in XML. It maintains a history of file revisions and has granular permissions. It allows setup of approvals, notifications, and scheduled publications and emails.

The resulting content is pushed out to be served by Apache or whatever web server you use.

It automatically generates navigation and site maps.

Content authoring tools include WYSIWYG HTML and XML editing, image conversion, drag-and-drop navigation and site map management, Dublin Core metadata, vCard contact info with LDAP lookup, spell checking in multiple languages, etc. It also provides image watermarking.

Access to the repository includes local, FTP, and SFTP, and they plan to support WebDAV in the future. Publishing is handled by a queue system that stages processing and distribution among cluster members.
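The two-stage publishing queue can be sketched roughly like this in Python. It isn’t HyperContent’s implementation; it assumes “staging” means processing each queued item once and then copying the output to every cluster member, and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, Tuple


@dataclass
class PublishJob:
    path: str  # where the page should appear on each web server
    xml: str   # source content, kept as XML in the repository


class PublishQueue:
    """Hypothetical two-stage queue: process jobs once, then distribute output to every member."""

    def __init__(self, members: Iterable[str]):
        # member name -> {path: published output}; stands in for each server's docroot
        self.members: Dict[str, Dict[str, str]] = {m: {} for m in members}
        self.pending: Deque[PublishJob] = deque()
        self.processed: Deque[Tuple[str, str]] = deque()

    def submit(self, job: PublishJob) -> None:
        self.pending.append(job)

    def process(self, transform: Callable[[str], str]) -> None:
        """Stage 1: transform each pending job exactly once."""
        while self.pending:
            job = self.pending.popleft()
            self.processed.append((job.path, transform(job.xml)))

    def distribute(self) -> None:
        """Stage 2: push each processed result to every cluster member."""
        while self.processed:
            path, output = self.processed.popleft()
            for docroot in self.members.values():
                docroot[path] = output
```

Splitting the stages means a potentially expensive transform runs once per job, however many servers are in the cluster.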

There’s a simple but functional workflow model.

There’s pluggable authentication which uses JAAS LoginModules. They’ve made it compatible with CAS too. They will integrate with JA-SIG groups and permissions that have been split out from uPortal.
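Pluggable authentication comes down to trying a configured list of login modules. The Python sketch below borrows the idea from JAAS but not its API: it behaves as if every module carried JAAS’s SUFFICIENT flag, so the first success wins. `static_module` is a hypothetical in-memory module standing in for something like an LDAP or CAS check.

```python
from typing import Callable, Dict, Optional, Sequence

# A login module takes credentials and returns a principal name, or None on failure.
LoginModule = Callable[[str, str], Optional[str]]


def authenticate(modules: Sequence[LoginModule], user: str, password: str) -> Optional[str]:
    """Try each module in order; the first one that succeeds decides the result."""
    for module in modules:
        principal = module(user, password)
        if principal is not None:
            return principal
    return None


def static_module(accounts: Dict[str, str]) -> LoginModule:
    """Hypothetical module backed by an in-memory username -> password table.

    For illustration only: a real module would never keep plaintext passwords.
    """
    def login(user: str, password: str) -> Optional[str]:
        return user if accounts.get(user) == password else None
    return login
```

Swapping in a different backend means adding another callable to the list; `authenticate` itself doesn’t change.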

They can feed XML or XHTML to uPortal channels.

Version 2 beta is winding down.

WSRP and UDDI portlet publishing will be looked into soon.

[CSG Fall 2005] Carnegie Mellon’s Content Management selection

Update 22 Feb 2006 – Doug (not Tom!) wrote to let me know I got his first name wrong – sorry, Doug!

Doug Blair from CMU is talking about their selection process for a Content Management system.

Doug notes that there are institutional problems, and that you have to have institutional will to solve them. They have about 120 web practitioners in a group that is guiding the process. There are six committees: Portal, Search, Standards & Practices, Infrastructure, Marketing, and CMS.

The CMS committee interviewed a couple of dozen people across the campus about their web publishing practices. They brought them back into the room to reflect back their findings, and found that having that discussion changed the answers and refined the results.

From that they wrote an RFP, which they checked with a consultant to make sure they were using the same terminology as vendors use. That RFP was released today.

Doug notes that there are several things a CMS will not do – including improving the quality of content, changing human processes, or making content more timely.

He also notes that you should expect some resistance from practitioners. It’s important that the process be transparent to the people participating, and that requires thinking broadly about governance of the process.

[CSG Fall 2005] Content Management at Georgetown

Piet Niederhausen from Georgetown is talking about their content management.

Piet differentiates managing departmental content from institutional content.

They have a graph of their institutional content and how the major pieces relate to each other. For instance, the CMS manages content about people, so a faculty member would have a CV, media profiles, publications, etc.

The idea of syndicating content for use in different forms, such as RSS and podcasts, becomes important. This implies a cultural shift in which presentation of content is driven by topics rather than by organizations or units. The CMS tools should be able to gather content, aggregate it, and syndicate it. So they find themselves working on a separate syndication layer: it’s about collecting data from different places, caching it, and making it available in different forms.
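The collect, cache, and re-publish pattern Piet described can be sketched in Python with only the standard library. The source callables, the item dictionaries with a “topic” key, and the five-minute cache lifetime are assumptions for illustration; RSS 2.0 stands in as the single output form, with the channel carrying its required title, link, and description elements.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Item = Dict[str, str]  # hypothetical shape: {"title": ..., "link": ..., "topic": ...}


class SyndicationLayer:
    """Collect items from several sources, cache them, and re-publish them by topic."""

    def __init__(self, sources: Dict[str, Callable[[], List[Item]]], ttl_seconds: float = 300.0):
        self.sources = sources
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, List[Item]]] = {}

    def _fetch(self, name: str) -> List[Item]:
        """Return a source's items, calling the source only when its cache entry has expired."""
        now = time.monotonic()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]
        items = self.sources[name]()
        self._cache[name] = (now, items)
        return items

    def by_topic(self, topic: str) -> List[Item]:
        """Aggregate across all sources, organized by topic rather than by unit."""
        return [item for name in self.sources for item in self._fetch(name) if item["topic"] == topic]

    def rss(self, topic: str, channel_link: str) -> str:
        """Render one topic as an RSS 2.0 document."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = topic
        ET.SubElement(channel, "link").text = channel_link
        ET.SubElement(channel, "description").text = f"Items about {topic}"
        for item in self.by_topic(topic):
            entry = ET.SubElement(channel, "item")
            ET.SubElement(entry, "title").text = item["title"]
            ET.SubElement(entry, "link").text = item["link"]
        return ET.tostring(rss, encoding="unicode")
```

Other output forms (an email digest, a portal channel) would be further render methods over the same `by_topic` results.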

The CMS should be able to easily reference information stored in various repositories, such as the course management system, institutional file systems, departmental websites, etc.

This is an interesting and holistic view of content management that bears thinking about.

[CSG Fall 2005] Content Management Survey

Tom Dopirak is going over the survey that was done on Content Management for the CSG meeting. Slides are online at http://www.stonesoup.org/Meeting.next/repos.pres/

Only about ten institutions offer institution-level CM systems, and they’re mostly vendor products rather than open source implementations.

The primary business drivers were to distribute responsibility for content development and to separate content from design.

Most are using Web browsers as publishing targets but many also target mobile devices. Delaware targets both RSS and email for publishing.

Almost none of the respondents are using complex workflow; most are using just two roles. In response to a question, Georgetown stated that they found most departments do all of their content review offline before it gets into the CMS, and that almost no roles are used.
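For a sense of how little machinery a two-role workflow needs, here’s a Python sketch of one as a transition table. The survey didn’t say which two roles institutions use, so “author” and “approver,” along with the state and action names, are hypothetical.

```python
from typing import Dict, Tuple

# (current state, action) -> (role allowed to act, next state). All names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("draft", "submit"): ("author", "pending"),
    ("pending", "approve"): ("approver", "published"),
    ("pending", "reject"): ("approver", "draft"),
    ("published", "revise"): ("author", "draft"),
}


def advance(state: str, action: str, role: str) -> str:
    """Return the next state, or raise if the action or role is not allowed."""
    try:
        allowed_role, next_state = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from state {state!r}") from None
    if role != allowed_role:
        raise PermissionError(f"role {role!r} may not {action!r}")
    return next_state
```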

[CSG Fall 2005] OKI Repository OSIDs

Jeff Merriman from MIT is talking about the OKI Repository OSIDs.

Jeff makes the distinction between data specifications, interface specifications, and protocol specifications as parts of interoperability that should be separated out explicitly.

The Repository OSID is a Service Interface only – it’s silent on protocol/access technology. There are Java, PHP, and Objective-C bindings. The spec supports various kinds of metadata through “typing”.
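To illustrate what a protocol-neutral service interface with typed searches looks like, here’s a Python sketch plus a federated search across two in-memory implementations, similar in spirit to the demo. It is loosely modeled on the OSID idea of types identified by authority, domain, and keyword, but every class, method, and type value here is hypothetical and is not the OKI Repository OSID’s actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class OsidType:
    """A type identified by authority, domain, and keyword."""
    authority: str
    domain: str
    keyword: str


# Hypothetical types used only in this sketch.
KEYWORD_SEARCH = OsidType("example.edu", "repository.search", "keyword")
DUBLIN_CORE = OsidType("example.edu", "repository.record", "dublin-core")


@dataclass
class Asset:
    asset_id: str
    display_name: str
    record_type: OsidType  # which metadata schema the record follows
    record: Dict[str, str] = field(default_factory=dict)


class Repository(ABC):
    """Service interface only: nothing here says how the data is fetched."""

    @abstractmethod
    def supported_search_types(self) -> List[OsidType]: ...

    @abstractmethod
    def search(self, criteria: str, search_type: OsidType) -> List[Asset]: ...


class InMemoryRepository(Repository):
    def __init__(self, assets: Iterable[Asset]):
        self._assets = list(assets)

    def supported_search_types(self) -> List[OsidType]:
        return [KEYWORD_SEARCH]

    def search(self, criteria: str, search_type: OsidType) -> List[Asset]:
        if search_type not in self.supported_search_types():
            raise ValueError(f"unsupported search type: {search_type}")
        needle = criteria.lower()
        return [a for a in self._assets if needle in a.display_name.lower()]


def federated_search(repos: Iterable[Repository], criteria: str, search_type: OsidType) -> List[Asset]:
    """Query every repository that advertises the requested search type."""
    results: List[Asset] = []
    for repo in repos:
        if search_type in repo.supported_search_types():
            results.extend(repo.search(criteria, search_type))
    return results
```

A client written against `Repository` works unchanged whether an implementation reads from memory, a database, or a remote service, which is the point of keeping protocol out of the interface.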

Jeff demonstrated a number of applications searching and retrieving data from a range of disparate repositories using the OSIDs. One compelling application is the Vue2 image-enabled concept mapping tool, which allows hierarchical narratives as presentations.

Documentation is online at http://www.okiproject.org/specs/osid_12.html

In response to a question about the relationship between this Repository OSID and the JSR170 content management spec, Jeff talked about how the OSID is more narrowly focused (JSR170 includes things like workflow that will be in other OSIDs), but that it should be possible to map from one to the other and they plan to work on that at some point.
