Archive for December, 2006



[ECAR 2006] John Willinsky – Sustaining Access to Knowledge and Scholarly Publishing

John Willinsky is Professor in the Department of Language and Literacy at the University of British Columbia.

The state of scholarly publishing and access to knowledge is his topic. The IT professional in universities has a new role that they need to act on. We need to move beyond something like sustainability – as a guitarist he notes that you have to work your fingers to keep the sustain, not just wait. Sustainability is just the status quo – but we know in our hearts that’s not what this is about – we need to be very conscious of the values we want to sustain. And in access to knowledge the status quo is not working – we have degradation in that access.

Sustainability speaks to a business model – which in his business as an academic is the evil twin – what would Socrates have said when asked for a business model?

We need to identify those values that are at the core of what we do and sustain those, but we need to expand and extend what we do, not just sustain.

The good news is that this is a world of the knowledge economy – that is our ship coming in. That should have put the university at the very center of that economy – and anything that interferes with the university’s production and aggregation of knowledge is a threat to that economy – but that didn’t happen.

Google has changed the equation in terms of access to knowledge. Google has offered to digitize all back issues of journals, with the journals maintaining the copyright ownership and only showing ads when the journals ok it, and sharing the revenues with the journals when they do. Only one journal in Canada has taken them up on the offer.

The situation is one of corporate concentration. John Wiley just offered to purchase Blackwell, which would create a publishing house of 1,200 journals. Reed Elsevier has 2,000 journals, and so on: some 6,000 titles are owned by four corporate entities. Libraries are having to buy bundles of titles and sign non-disclosure agreements on the pricing. Very few of those bundles allow you to cancel single titles.

The effect is on academic freedom – the ability to start, subscribe to, and stop new journals is at the heart of academic freedom. If we sustain a situation where it’s difficult to start a new journal, we are interfering with academic freedom.

Fifteen percent of libraries are now canceling print editions and maintaining access to electronic versions. Publishers are either giving no reduction or less than 10 percent reduction in price for giving up print.

What does it mean when a journal goes corporate? 45% of journal titles are in corporate hands. When an association’s journals go corporate, the price goes up. The scholarly societies don’t see a choice – they need support for the electronic distribution of knowledge, and the publishers have very sophisticated mechanisms for that. Ted Bergstrom at UCSB has done work on comparing prices for non-profit vs. commercial journals. He has measured price per citation – in non-profit sector it’s $15, in commercial it’s $90. The commercialization is increasing cost, and unless budgets are rising that means a reduction in the access to knowledge. But this is in an era where the cost to disseminate knowledge is decreasing.

The alternative is the metaphor of openness. The metaphor is important because information technology can be used to restrict access as well as to increase dissemination. The principle is to increase access – anything that increases access to knowledge adds to the public good. The possibility of open data is exciting and a great example. In the humanities, The Stanford Encyclopedia of Philosophy is open – struggling, but open.

Another aspect is the role of the amateur. In astronomy, if we’re going to be hit by an asteroid, we won’t be informed by a professional astronomer. The amateur astronomers are using the data available and are being included in a very substantive way in the field. The idea is that the university is part of a larger community. We reposition the university as a source of knowledge – we are dependent on the good will of the community and we can give back.

Wikipedia is like one long homework project that everyone is doing for no credit. How is it that people can come together to create this? We in universities need to participate in this. How do we connect the work we’re doing in universities with things like Wikipedia? Through open access.

Publishers agree that authors have the right to put articles in institutional repositories or faculty web sites. Repositories are the first step – but how do we get people to fill them? Most faculty think publication is the end of the process. But it’s to the advantage of the faculty, department, and institution to have the article in the repository – it will increase your citation rate. There are figures that suggest a 40% increase in readership from appearing in open access repositories.

Authors can buy open access to an article from the publisher for around $3,000. The Federal Research Public Access Act of 2006 will make it a mandate for every major agency to make articles available for free six months after publication. Organizations are mobilizing to support this.

The Public Knowledge Project started about eight years ago. What can IT professionals do to help scholarly societies? The project built some open source software – an open journal system, an open conference system, and an open metadata harvester. The open journal system allows a group of scholars to manage and publish an online journal – imagine a library setting up such a system so that scholars can publish. These systems make the data available on the web in a form that can be indexed and found by Google and others.

The library and the institution can offer scholars the opportunity to provide an alternative to commercial publication.

The major scholarly societies are running very sophisticated journal operations. He wants to suggest a publishing cooperative, of societies, libraries, and IT professionals. Bring in the libraries to apply knowledge and funding that could be saved from subscriptions. 600 scholarly societies use Blackwell to publish their journals – the scholarly societies have a release clause in the case of a sale – so what alternative do they have? The IT departments could help suggest the alternative.

Instead of sustaining the future, we want to envision a better future. We should be willing to create futures that increase access to knowledge.


National Science Foundation workshop on high performance computing, storage, and large databases

This workshop was sponsored by NSF’s Division of Science Resource Statistics, which collects data on the US science and engineering enterprise to be used for policy making purposes. They have collected data on the physical environment for research at higher education and biomedical institutions since 1986, and since 2003 they have begun to add a survey on cyberinfrastructure. The initial effort was to collect data on networking infrastructure, but now they are interested in also collecting data on high performance computing, storage, and large databases used for science and engineering research purposes. These surveys go to all research performing institutions with greater than $1 million in research expenditures and all biomedical institutions with greater than $1 million in NIH funding.

The workshop gathered a group of about fifteen participants from institutions as large as the UW, Penn State, and UNC, as rarified as Princeton, as specialized as the National Center for Supercomputing Applications (NCSA) and the Scripps Biomedical Institute, and as small as the Mount Desert Island Biological Laboratory, to brainstorm on what data points might possibly be collected on these activities that would be both meaningful and possible to collect.

Most of these institutions, unlike the UW, host some central research computing facility where a central IT organization runs some large high performance computing resources that are used by faculty doing research. But even in those institutions there are many other research computing efforts on the campuses that are not run by central organizations.

Over the couple of days what emerged was a way of classifying high performance systems into: Clusters (which can be either tightly or loosely coupled); Massively Parallel Processor (MPP) machines with distributed memory; Symmetric Multiprocessor (SMP) machines with shared memory; and Parallel Vector Processors (PVP), which it was noted aren’t seen too much in the US.

Common data that can be collected about those kinds of compute resources includes: number of processors (there was an interesting discussion of how to count this in this day of multi-core chip-sets); processor speed; amount of memory per processor; what kinds of interconnects exist between processors; total RAM, total attached disk (and what kind); and total estimate of flops the machine is capable of.
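To make that list of data points concrete, here is a rough sketch of what one survey record might look like. Everything in it is my own assumption, not something the workshop agreed on: the field names, the example numbers, and especially the peak estimate, which simply multiplies cores by clock speed by an assumed flops-per-cycle figure. Counting sockets and cores separately is one possible answer to the multi-core counting question.

```python
from dataclasses import dataclass


@dataclass
class ComputeResource:
    """One hypothetical survey record for a high performance system."""
    name: str
    architecture: str          # "cluster", "mpp", "smp", or "pvp"
    sockets: int               # physical processor packages
    cores_per_socket: int      # kept separate so multi-core chips are not miscounted
    clock_ghz: float           # processor speed
    flops_per_cycle: int       # assumed per-core figure, varies by chip
    memory_gb_per_core: float
    interconnect: str
    attached_disk_tb: float

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_ram_gb(self) -> float:
        return self.cores * self.memory_gb_per_core

    @property
    def peak_gflops(self) -> float:
        # Theoretical peak only; sustained performance is typically lower.
        return self.cores * self.clock_ghz * self.flops_per_cycle


# Made-up numbers, purely for illustration.
example = ComputeResource(
    name="example-cluster",
    architecture="cluster",
    sockets=128,
    cores_per_socket=2,
    clock_ghz=2.5,
    flops_per_cycle=2,
    memory_gb_per_core=2.0,
    interconnect="gigabit ethernet",
    attached_disk_tb=10.0,
)

print(example.cores, example.total_ram_gb, example.peak_gflops)
```

A record like this reports a theoretical peak rather than a measured benchmark, which is probably the only figure every institution could supply in the same way.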

Some interesting items pop up in my notes from the two days:

  • The needs of a research data center are qualitatively different from the needs of a business data center in terms of types of facilities, access policies, and tolerance for what kinds of down time.
  • Support for data management and use of databases is the fastest growing demand for help among researchers using high performance computing.
  • UCLA has grown a strong grid computing initiative, which is not only supporting the other UC systems, but also providing cycles to the Cal State institutions and K-12 institutions in California, through the “Kids On The Grid” program.
  • Princeton has evolved their academic technology support to a new group in OIT, their central IT organization, to support research computing. That group works very closely with PICSciE, the campus’s new center for computational science and engineering work. The group within OIT concentrates on administering high performance systems that are widely used by researchers. They currently run an IBM Blue Gene, an SGI Altix, and a Beowulf cluster. They’re building a 35 terabyte shared storage facility.
  • One institution is building a brand new 11,000 square foot data center with 8 tons of cooling capacity – they figure that amount of capacity will only hold them for a year or two.
  • The University of Houston has an interesting model where the system administrators for their research facility are not university employees but contracted from outsourced firms – they have a lot of folks in Houston with those skills providing outsourced services to the petroleum industry as well as academia.
  • Purdue is running Condor clustering to make unused cycles from student computing labs available to research efforts.

It was a very interesting couple of days – it was great to meet folks I didn’t know, and to get a feel for what’s happening out there in this fast-changing field.


Adventures with the Nokia E62 – Oracle Calendar and Symbian calendar app don’t really understand time zones and events

So I entered all my flight and meeting time information for this week into my Oracle Calendar. But of course those events are entered in the time zone where I entered the data. I synced my Nokia E62 calendar to my Oracle Calendar. When I arrived on the east coast, the phone changed to the current local time, and yep, you guessed it, it shifted all of my calendar entries by three hours.

Sheesh – you’d think devices meant for travelers would be smarter than this.

I know that this is something OSAF’s Chandler software gets right – individual calendar entries can have time zones attached to them. Are there other software and devices that get this right?
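Here is a small sketch of the difference, written as I understand the problem rather than as a description of how Oracle Calendar, the Symbian app, or Chandler actually store events. The date, the 9:00 meeting, and the `as_shown` helper are all hypothetical, and I use fixed UTC offsets (valid for December) to keep it self-contained; real software would want named zones that handle daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained; December means standard time.
PACIFIC = timezone(timedelta(hours=-8), "PST")
EASTERN = timezone(timedelta(hours=-5), "EST")


def as_shown(event_start: datetime, device_zone: timezone) -> datetime:
    """Return the wall-clock time a device in device_zone would display."""
    return event_start.astimezone(device_zone)


# A 9:00 meeting in Arlington, typed in on the west coast. The event is
# stamped with the zone where the data was entered, not where it happens.
entered_on_west_coast = datetime(2006, 12, 4, 9, 0, tzinfo=PACIFIC)
shifted = as_shown(entered_on_west_coast, EASTERN)

# The same meeting with its own zone attached to the entry.
event_with_zone = datetime(2006, 12, 4, 9, 0, tzinfo=EASTERN)
correct = as_shown(event_with_zone, EASTERN)

print(shifted.strftime("%H:%M"), correct.strftime("%H:%M"))
```

With the zone attached to the entry, the meeting stays at 9:00 on the east coast and would show as 6:00 only if I looked at it from home, which is the behavior a traveler wants.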


Travel this week – NSF and ECAR

I’m traveling this week. First I’m in Arlington, VA, for a National Science Foundation workshop on high performance computing, storage, and large databases. NSF is starting to plan to survey research institutions on what’s happening in those areas, so this workshop is going to center on discussions about what data will be useful (and/or possible) to collect. Should be a fascinating discussion.

For the second half of the week I’ll be in Arizona for the annual Symposium from the Educause Center for Applied Research (ECAR). Richard Katz always puts together interesting and unexpected ideas for these meetings, which have been some of the most thought provoking of all of the gatherings I regularly attend. This year’s agenda, which centers on the broad topic of sustainability, promises to be no different.

I’ll post on both of these gatherings as I can during the week.


It’s the little things that make a difference – Firefox 2.0 tab behavior

One of the things I really like about Firefox 2.0 is that if you have your preferences set to “New pages should be opened in: a new tab”, when you click on a link in a web page, Firefox opens that link in a new tab. When you close that tab, Firefox takes you right back to the tab you opened the link from, instead of the nearest tab. For those of us who typically have dozens of tabs open, that’s a real productivity enhancer.
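For the curious, the behavior can be modeled with very little bookkeeping: remember which tab each link-opened tab came from, and prefer that tab when the new one closes. This is a toy model of my own, not Firefox's implementation, and the `TabStrip` class and its method names are made up for illustration.

```python
class TabStrip:
    """Toy model of tabs that remember which tab opened them."""

    def __init__(self):
        self.tabs = []        # tab ids, left to right
        self.opener = {}      # tab id -> id of the tab it was opened from
        self.active = None
        self._next_id = 0

    def open(self, from_link=False):
        tab = self._next_id
        self._next_id += 1
        self.tabs.append(tab)
        if from_link and self.active is not None:
            self.opener[tab] = self.active
        self.active = tab
        return tab

    def select(self, tab):
        self.active = tab

    def close(self, tab):
        index = self.tabs.index(tab)
        self.tabs.remove(tab)
        parent = self.opener.pop(tab, None)
        if tab != self.active:
            return
        if parent in self.tabs:
            # Go back to the tab the link was clicked in.
            self.active = parent
        elif self.tabs:
            # Otherwise fall back to the nearest remaining tab.
            self.active = self.tabs[min(index, len(self.tabs) - 1)]
        else:
            self.active = None


strip = TabStrip()
first = strip.open()
second = strip.open()
third = strip.open()
strip.select(first)                  # reading a page in the first tab
linked = strip.open(from_link=True)  # click a link, new tab opens at the end
strip.close(linked)                  # closing it returns to the first tab
print(strip.active == first)
```

Without the opener map, closing the new tab would land on the third tab, the nearest neighbor, which is exactly the annoyance the 2.0 behavior removes.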


Calendar Mashups

I’ve been too busy lately to blog much, but I have been playing a bit with the ability to embed html views of Google calendars into web pages.

I keep a Google calendar of my travel and events, because it’s much easier for me to see at a glance when I’ll be out of town or unavailable that way rather than sifting through all the entries on my Oracle Calendar, which is cluttered with standing meetings, individual appointments, plane flights, and the like.

Google has now got a nice wizard called the Google Embeddable Calendar Helper that generates html for a view of a Google calendar that can be dropped into any web page, like so:

This month and January are particularly good examples, as I’m traveling a bunch. I’ve embedded this calendar into the sidebar of the blog, down below all the About Oren stuff.

The folks over at 30 Boxes have built a nice calendar mashup engine called 30Boxed that lets you create calendar views from any iCalendar feed. I tried that too, and I like the look and the fact that the month view scrolls a week at a time, but when I feed it my Google calendar iCal feed it doesn’t seem to notice events I’ve since deleted from my Google calendar. Still, there are some very cool mashups that can be made with this gadget, like timeline views of Flickr photo sets.

