Archive for May, 2010

A Drupal tip: Adding taxonomy vocabulary description to a Views header

This is a how-to tip for Drupal 6, which I’m documenting because I couldn’t find this answer anywhere and it took me a day of scratching my head to figure it out. Drupalistas might find this useful; the rest of you can move along, as there’s nothing for you to see here.

I was creating a View that listed all the nodes that have a given vocabulary term assigned to them, where the vocabulary term is passed in as the argument (e.g. http://mysite.myschool.edu/sitename/type/Basic, where “type” is the path to the View and “Basic” is the vocabulary term).

I wanted the description of the vocabulary term to appear at the top of the View. How to do that?

The short answer is to put a short snippet of PHP code in the header of the View. Step by step:

  1. Make sure that the PHP filter module is enabled in the Core – optional section of Modules.
  2. Edit the Header item of your View (in the Basic Settings). If you’re using a WYSIWYG editor, make sure your input format is set to PHP Code.
  3. Paste this code into the Header box:
    <?php
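    // Look up the term named by the second path component (e.g. "Basic" in type/Basic).
    $term = taxonomy_get_term_by_name(arg(1));
    // The lookup returns an array, so print the first match's description if one exists.
    if (!empty($term)) {
      print filter_xss_admin($term[0]->description);
    }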
    ?>
  4. Update the View, then Save it. You won’t necessarily see the result in the Live Preview under the Views menus, but it should work in your site.

The slightly longer story here is that I think there’s a bug in the taxonomy_get_term_by_name() function that makes it so you have to reference $term[0]->description instead of $term->description. I filed that bug on the Drupal.org site at http://drupal.org/node/812164.

Hope that helps other folks besides me – leave a comment if it works or doesn’t work for you.

Levi-Strauss, remix culture, and mining the rock ‘n’ roll past

Last week Wet Paint, my old band from the 70s, got together to play a college reunion gig in Bellingham. Great fun was had by all, and I think the band sounded better than it ever had.

Leading up to the gig I digitized our 1978 single from vinyl, and then I decided to try my hand at doing a remix of one of the sides, Steve Robinson’s very cool Shake A Maraca.

Doing a remix is an interesting process. Starting with the original tracks you visually slice and dice them into parts, adding various levels of audio processing to them, and then combine them with other audio. The tools for digitally manipulating music these days are nothing short of astounding in their power (and complexity). I used the latest version of Apple’s Logic, version 9, but there are a variety of competing tools.

Logic comes with a vast array of software instruments and pre-recorded snippets (known as “loops”) which can be utilized at will, and you can import audio from any other source you can find. So the process of the remix involves sifting through a huge library of available sounds and grooves, and trying to figure out what’s useful to the task at hand, and using those pieces to build up what hopefully becomes a compositionally coherent whole.

That got me thinking about the late Claude Levi-Strauss’ writings on “bricolage” in traditional cultures. Bricolage literally means “tinkering”, or as Wikipedia defines it, “to refer to the construction or creation of a work from a diverse range of things that happen to be available, or a work created by such a process”.

Levi-Strauss wrote about the use of bricolage in the construction of myths in indigenous cultures, saying:

The set of the ‘bricoleur’s’ means cannot therefore be defined in terms of a project… It is to be defined only by its potential use or, putting this another way and in the language of the ‘bricoleur’ himself, because the elements are collected or retained on the principle that ‘they may always come in handy’. Such elements are specialized up to a point, sufficiently for the ‘bricoleur’ not to need the equipment and knowledge of all trades and professions, but not enough for each of them to have only one definite and determinate use. They each represent a set of actual and possible relations; they are ‘operators’ but they can be used for any operations of the same type.

which sounds a lot like the current way music is built up digitally. He recognized that the results of the bricoleur’s technique “can reach brilliant unforeseen results on the intellectual plane,” which I think is completely true of using musical remix techniques, which can often bear only the slightest resemblances to the original source material.

Some of my old fogey contemporaries question whether the technique of building up new musical art by reassembling and manipulating digital pieces is as valid as making music by playing a traditional instrument. Get over it! While I personally will always treasure the pleasure of my hands and ears interacting with strings and wood, I don’t think that any one method of achieving sound necessarily holds any more validity than another – it’s what you can do with the tools that matters. I’m sure if I was just starting out with music, I’d be spending a whole lot of time in front of my computer mastering these tools.

All of which seemed relevant this week with the news of the Rolling Stones’ release of a remastered Exile on Main Street complete with ten new tracks, some of which had some vocal and instrumental parts finished this year. I’ve always loved Exile (though I think Beggars Banquet is still my favorite Stones album), and having just been spending this time mining my own 30-year-old past for a remix, who am I to question whether Mick and Keith should delve into their own unfinished creations? While I haven’t given the new material a good listen, I did really enjoy the All Songs Considered interview with producer Don Was on the project, and the pieces he played during the interview sounded great. If I had a back catalog like the Stones, you can bet I’d be spending time revisiting it – and it sounds a good deal better than any of the Stones’ new material has in some time!

I also think that the bricolage approach has a lot of relevance to software engineering and how we manage IT, particularly in higher education, and I’ll have more to say on that in a coming post.

[CSG Spring 2010] SaaS requirements for higher ed

Tracy Futhey is leading a conversation on SaaS requirements for higher education.

Spent summer gathering docs on shared services from various campuses. In August started looking at email and hosting. Engaged a team from NACUA in October. Came up with email Issues matrix in November and worked out a model contract in March and a draft RFP model in April.

Strategies adopted by sub-team
- Avoid hardcore Technical Requirements list. (outsourcing service/function is not dictating technical solutions)
- Recognize/Leverage limitations on free services (build RFP with expectation of payment for services)
- Assume reuse; organize materials accordingly
- Admit Rumsfeld was right: “there are also unknown unknowns, the ones we don’t know we don’t know”.

Issues spreadsheet – five big issues – Data Stewardship, Privacy, Integration, Functionalities, Service Level

Working with Educause to distribute as open source documents.

What may be next?
- Assess interest in glomming RFP (CSG + …?)
- Finalize plan for Educause to hold docs
- Issue common RFP in June/July?
- Responses in August?
- Campus discussions in fall? Vendor negotiation? (not clear vendor(s) will be responsive to our concerns, or that we will like the responses)
- Decisions by Jan 1, 2011?
- Pilots during spring 2011?
- Fall 2011 go-live dates?

[CSG Spring 2010] Service Management – Service Lifecycle Cradle 2 Grave

Romy Bolton (Iowa) and Bernard Gulachek (Minnesota) are talking about service lifecycle.

At Minnesota they think a lot about service positioning – not to just react to perceived need. An unquenchable appetite with limited resources is not a good recipe. Tried to apply a general administrative services framework for the institution about where services should be placed along a continuum from distributed to centralized. Developed principles and examples to help communicate with people in the distributed units.

At Iowa they started “Project Review” process in the late 90s. Tuesday afternoon meetings – employee time with the directors and CIO. Open to everybody. Re-tooled project framework in 2007, service lifecycle management in 2008. Light ITIL framework

Emphasis on service definition, publication, end user request, provisioning. They still have project review, plus a project called Discovery to explore ideas, ITS Spotlight to call attention of staff to services. IT admins on campus have regular monthly meetings with 100+ people. Beginning to work on Do It Yourself provisioning tool.

Service definition starts in project planning phase
- identify service owner and provider
- identify KPIs for service
- Reassess risks and cost-benefit for service
- Identify criticality of service on scale of 1-4
- Update 5 yr TCO and funding source
- Document service milestones
- Update status in ITS Service Catalog as appropriate

Iowa uses Sharepoint as intranet and for publishing their service catalog and Drupal for IKE (their knowledge management site). They’re just building out the self-provisioning service.

Tom Barton notes that there’s something called a Service Provisioning Markup Language – sort of languishing, but maybe some new energy is flowing into it.

Iowa – triggers for Service Review: User needs; environmental change (e.g. the cloud for email); financial; security event; hardware refresh; new software version; end of life for product. Review is not a small effort. Business and Finance office helps gather info. Includes: Service Overview, Customer Input, Financial Resources, Utilization and customer base, service metrics, market analysis, labor resource, recommendations. Owned by the senior directors.

At Minnesota they do annual service reviews of all of their common good services – “just began to enforce that”, in part born out of frustration at not being able to sunset services. Two or three people focus on this, working with service owners. The current example is what services continue as they roll out Google Apps.

Service Performance and Measurement

Designed for strategic conversations with stakeholders that go beyond the operational. Began gathering availability data about a year ago – looking at whether services are alive. Klara notes that defining whether a service is up can be complex, but that it can be easier to measure simply whether a user can access a service. They have a systems status page showing current status – mixture of automated and human-intervention. Using Cisco’s Intuity product to track monthly/annual measures. They give roll-ups of info to deans and IT leaders. Include benchmark comparisons with Gartner or Burton benchmarks if available. They publish the cost of services annually, so they understand what they’re paying for and how that’s changed over time. http://www.apdex.org is a new alliance for understanding application performance measurement.

At Stanford they’ve established Business Partners – senior people who know the organization who act as the pipeline in to the service managers. They meet with clients at a senior level.

[CSG Spring 2010] Service Management – CIO Panel

The morning is all about service management topics. My notes are going to be pretty sketchy because I’m coordinating the workshop and giving several presentations, but I’ll do my best and put up the slides from my parts.

Klara (Chicago) notes that culture is key in trying to implement service management. Steve (Iowa) agrees. At Iowa they built lightweight service and project management frameworks because that’s what the culture would tolerate. It’s a trigger-based process. Different events are recognized by service managers or owners and then initiate a review of a service. They put a lot of accountability on the service owner – they have to bring the right metrics forward. The review process gives them a chance to have some oversight of those metrics.

Bill Clebsch (Stanford) – doesn’t like ITIL or anything that looks like it comes from the outside to tell the organization what to do. So tries to talk first about accountability – that’s how they brought time tracking into the organization. Before that they did metrics – “a star performer’s best friend”. Put up customer-facing metrics, work they did with MIT. That was foundational to moving culture to more of a performance orientation. They’re a big Remedy shop. Every help desk at the university runs through their Remedy. Often people’s only knowledge of the organization is the service desk, so that’s a good place to start. Now working on change management. Remedy is good if you want to drink the kool-aid. They started a service portfolio effort about three years ago. Budget cuts are the best friend for getting these things done – makes your own organization aware, and makes your clients aware that they can’t behave in aberrant ways. Setting ambitious goals is good.

Kerry (Carnegie-Mellon). In addition to culture, timing is key. Service portfolio effort started at CMU in the central IT organization when Joel first became CIO – didn’t understand what services were being provided. Was beginning to have success when an external advisory board visited – CMU was growing from being a start-up to a global enterprise. Changed the conversation. “Who is responsible for a service” was a hard question. Started answering by “whoever Kerry or Joel calls to fix it.”

Bill – every year do an extensive client survey, and scores have gone way up in recent years, as have metrics and employee surveys. Having the organization much more outwardly focused matters as much as the data. Sense of ownership is huge.

Klara – Chicago is not as mature, yet deans still want to give things up to IT.

Steve – the role of technology is changing, which makes people more willing to cede control of it to the central IT group. Bill – when things get boring or risky, hand off to central IT.

A question about the relationship of project management to services. At CMU the transition from project to service was difficult because they didn’t yet know how to declare a project done. They’re now paying a lot of attention to review of projects and transition to services. Klara – important to be mindful about how to operationalize a project – bring in the other stakeholders like service desk, operations, etc.

Question about how to decide to stop doing services – Steve – service reviews help, when utilization is declining and other alternatives exist, then there can be a project to shut down the service. Bill – have a dedicated service portfolio team that looks at what services should be brought up and shut down. They have actually shut down some services. Team is made up of service managers, some directors, some executive directors. We’re moving into an era of being more service brokers than providers, and will do less provisioning. That will require a different kind of service manager. They have a few people in the organization who are explicitly service managers, with no other role.

Question about cost of services. At Iowa they allocated all of the IT costs to services – it was a lot of work, but the data was very interesting and started good discussions. In the process of trying to automate that. Tension between being efficient and being able to invest to help research and teaching to be better.

Time tracking is essential to doing costs of services.

Critical to not let the perfect be the enemy of the good. Shel notes that Bell Labs decided to go to activity-based accounting and four years later the internal accounting department had grown to 450 people.

Shel – you have to make the judgement on what your allocation model is for given services. You may not make the perfect decision, but you need to decide.

[CSG 2010] Curation, Preservation, & Information Lifecycle Management

Mairead from Penn State is talking about designing and implementing storage architectures and systems to support data curation and preservation needs. Who’s thinking about this, and what are they doing?

Drivers & Incentives – eScience/eResearch. NSF requirement for data management plans. Compliance – e-discovery, FERPA, HIPAA, Sarbanes-Oxley. Institutional record retention regulations and policies. Storage services for libraries, archives, cultural heritage entities. Great efficiencies.

Expectations (not supported) – storage is cheap; storage is smart; stuff on the internet is persistent; digital safer than analog; storage provider – curators and preservation experts; repositories take care of preservation; metadata will take care of it; libraries will take care of it; the cloud will take care of it.

The reality – new roles, new responsibilities, new collaborations, practices, workflows; Intellectual capital requirements – digital preservation; cloud antithetical to preservation?; increased management requirements; scaling issues with preservation requirements.

Standards/Technologies
iRODS – From SDSC, Integrated Rule-Oriented Data System. Second generation of SRB.
Content addressable storage – fixed content storage, retrieval based on content rather than location
eXtensible Access Method (XAM)

Initiatives -
NSF DataNet – Data Conservancy Project – JHU lead with 23 institutions.
Chronopolis – SDSC, UCSD, UMIACS, NCAR – federated data grid using SRB/iRODS
LOCKSS (Lots of Copies Keep Things Safe) – replication of licensed journals and other content
MetaArchive – a private LOCKSS archive
Internet Archive
National Digital Information Infrastructure & Preservation Program (NDIIPP) – Library of Congress project.
California Digital Library
DuraSpace – DuraCloud project to implement a preservation-oriented cloud storage service
HathiTrust – Repository and storage infrastructure initiated for CIC Google book project
Sun Preservation and Archiving SIG (PASIG)
Storage Networking Industry Association

Penn State activities – Content Stewardship Program – strategic collaboration between Libraries and ITS. Goal – a suite of services to support the lifecycle of the digital object – creation, discovery, access, storage, preservation, and archiving. Hired Digital Library Architect and Digital Collections Curator; worked on governance.

Sally Jackson says that the Library School at Illinois now has a program in digital curation.

Cliff – decisions on what to curate, and what to keep, are less binary in digital formats than in print. Eg, Portico for scholarly journals, vs. “digital archaeology” status. It’s about risk management and resource allocation. Some of what we’re trying to understand in bit-management is really about risk and cost. How many redundant copies do you need? Failure modes are not well understood. Very scary data from physics labs about undetected bit flip errors. What does that cost in a preserved object? If it’s encrypted in clever ways it can cost a lot!

[CSG Spring 2010] Storage Futures – Cloud Options discussion

Shel Waggener – Link campus into cloud providers?
- Duraspace integration?
- UC Systemwide storage solution
- Purchase mass storage from commercial provider e.g. Amazon
- Let everybody do their own.

File Sharing through cloud: Institutional sharing?
- Eliminated Xythos (done)
- Common contract with Dropbox?

Student and faculty portfolios?
- Alumni offerings

Bernard – in context of move to Google, they’ve clarified policies around PHI, ITAR data, FERPA.

One institution reports that as far as their CISO is concerned, if it’s verifiably sufficiently encrypted, they’d regard it the same as shredded paper.

[CSG Spring 2010] Storage Strategies

Storage strategy survey results. Storage management is equally distributed between central IT, distributed, both, or not sure.

What’s provided centrally? All offer individual file space. Most offer backups for distributed servers and departmental file space. Half offer desktop backups.

Funding models – just about all have some variety of pay for what you use. Most have some common goods, and about half have base plus cost for extra.

About half do full cost recovery including staff time.

Challenges – data growth is top, tiered storage is next, along with centralizing and virtualization.

Biggest explicit challenges : Data growth, perception of cost, research storage.

Storage at Iowa
Central file storage: Base entitlement, individuals 1-5 GB, depts, 1 GB per FTE. 4 hour recovery objectives. 99.97% uptime. 89% participation. Enterprise level, high availability.

One price fits all network file storage, offered some lower-cost network storage, e.g. without replication or backup, now they’ve got lowest-cost bare server storage – lots of enthusiasm for that model.

http://its/uiowa.edu/spa/storage/

Low cost SAN for servers $0.36 – $1.68 per year, depending on service level. Recovery is hw and sw, no staff time or data center charges.

Storage Census 2010

51% of storage being used by research. 35% Admin and Overhead (including email), 11% Teaching, 3% Public Service.

72% of storage is backup vs. online.

Next steps: identify and promote research solutions; build central backup service; build, promote archival solutions.

Storage @ U Virginia – Jim Jokl

Hierarchical Storage Manager Services: Storage for long-term research data (centrally funded but not well marketed); Library materials (funding via Library contributions to infrastructure); RESSCU (off-campus service for departmental disaster recovery backups).

Enterprise Storage – Based on NetApp clusters. NFS, CIFS for users, iSCSI, SAN internally. Works really well, highly reliable, replicated. Mostly used for central services. For departments it’s $3.20/GB/yr to $3.50 without backups. Lots of incidental sales to people who want a gigabyte or so for additional email quota. Doesn’t work for people who want a lot of storage.

New mid-tier storage service – focus on a reasonable and affordable storage service for departments and researchers.
Requirements: reliable, low cost, low overhead, self service. Unbundled services – optional remote replication and backups. Access via NFS and CIFS. Snapshots – users deal with their own restores. Offering Linux and Windows versions. Doing group files based on their groups infrastructure. Using RAIDKING disk arrays. Using BetterFS on Fedora, Windows server for the Windows side.

Cost model – 1 hour plus $0.34/GB/yr (RAID5, but not replicated). Next year expect to drop price by 50%. Currently about 22 TB leased on NFS and only marginal Windows use to date. All of the complaints about the costs of central storage have gone away. Research groups interested in buying big chunks.

Shel Waggener – Berkeley Storage & Backup Strategy

Shel says scale matters and no matter who says they’re doing it better faster cheaper, without scale they’re not.

2003 – every department runs own storage – including seven within central IT.
2004 – data center move creates opportunity for common architecture
2006 – dedicated storage group formed. No further central storage purchases supported except through storage team.
2007 – Hitachi wins bakeoff. 250 TB. Email team works with storage group to move from direct-attached to SAN
2010 – over 500 hosts using pool – 1.25 PB expanding to 3 PB this year.

SAN-based approach. Lots of serial attached SCSI disk – moving away from fiber-channel.

Cheapest storage is now 25 cents per gigabyte per month. The most expensive tier (now $4.00/GB/Month) bears the cost of the expensive infrastructure that the other tiers leverage.

Failure rate on cheap disk is reliable, but recovery time is longer.

At the cost of storage, they don’t have quotas for email.

One advantage is paying for today’s storage today. Departments buy big arrays and use 5% in the first two years, which is much more expensive. But that’s what’s supported by NIH and NSF.

Backing up 338 users’ desktops (in IST) takes up 1.3 TB.

[CSG Spring 2010] Storing Data Forever

Serge from Princeton is talking about storing data. There’s a piece by MacKenzie Smith called Managing Research Data 101.

What do we mean by data? What about transcribing obsolete formats? Lot of metadata issues. Lots of issues.

What is “forever”? Serge thinks we’re talking about storing for as long as we possibly can, which can’t be precisely defined.

Why store data forever?
- because we have to – funding agencies want data “sharing” plans – e.g. NIH data sharing policy (2003). NIH says that applicants may request funds for data sharing and archiving.
Science Insider May 5 – Ed Seidel says NSF will require applicants to submit a data management plan. That could include saying “we will not retain data”.

- Because we need to encourage honesty – e.g. did Mendel cheat?
- Like open source, it helps uncover mistakes or bugs.
- Open data and access movement – what about research data?

Michael Pickett asks who owns the data? At Brown, the institution claims to own the data.

Cliff Lynch notes that most of the time the data is not copyrightable, so that “ownership” comes down to “possession”.

There’s a great deal of variation by branch of science on what the release schedules look like – planetary research scientists get a couple of years to work their data before releasing to others, whereas in genomics the model is to pump out the data almost every night.

Current storage models
- Let someone else do it
– Government agency/lab/bureau e.g. NASA, NOAA
– Professional society

Dryad is an interesting model – if you publish in a given journal you can deposit your data there. That’s like GenBank.

Duraspace wants to promote a cloud storage model based on DSpace and Fedora.

There are a number of data repositories that are government sponsored that started in universities.

Shel says that researchers will be putting data in the cloud as part of the research process, but where does it migrate to?

Serge’s proposal – Pay once, store endlessly (Terry notes that it’s also called a Ponzi scheme).

Total cost of storage:
I = initial cost
D = rate at which storage costs decrease yearly, expressed as a fraction
R = how often, in years, storage is replaced
T = cost to store data forever

T = I + (1-D)^R * I + (1-D)^(2R) * I + …

If D = 20% and R = 4, T comes to roughly I * 2 (rounding up).

If you charge twice the cost of initial storage, you can store the data forever.
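
The series for T is geometric, so it can be summed in closed form. Here is a quick sketch of that sum, assuming the yearly price decline D stays constant forever and the storage is replaced exactly every R years (both are my simplifications, not something stated in the talk):

```latex
\begin{align*}
T &= I \sum_{k=0}^{\infty} (1-D)^{kR} = \frac{I}{1-(1-D)^{R}}, \qquad 0 < D < 1 \\
D &= 0.2,\ R = 4:\quad (1-D)^{R} = 0.8^{4} = 0.4096 \\
T &= \frac{I}{1-0.4096} = \frac{I}{0.5904} \approx 1.69\, I
\end{align*}
```

Under those assumptions the total works out to about 1.69 times the initial cost, so charging twice the initial cost leaves some margin if prices fall more slowly than 20% a year.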

They’re trying to implement this model at Princeton, calling it DataSpace.

People costs (calculated per gigabyte managed) also go down over time.

Cliff – there was a task force funded by NSF, Mellon, and JISC on sustainable models for digital preservation – http://brtf.sdsc.edu

[CSG Spring 2010] Staffing for Research Computing

Greg Anderson from Chicago is talking about funding staff for research computing.

Most people in the room raise their hand when asked if they dedicate staff to research computing on campus.

At Illinois they have 175 people in NCSA, but it doesn’t report to CIO.

Shel notes that employees have gotten stretched into doing lots of other things besides just providing research support. They’re trying to rein that back in in their career classification structures by requiring people to classify themselves. Now there’s 300 generalists classified as such.

At Princeton they’ve started a group of scientific sysadmins. The central folks are starting to help with technical supervision, creating some coherence across units. At Berkeley the central organization buys some time from some of the technical groups to make sure that they’re available to work with the central organization. Groups don’t get any design or consultation help unless they agree to put their computers in the data center.

At Columbia they have a central IT employee who works in the new center for (social sciences?) research computing – it’s a new model.

Greg asks how people know what the ratio of staff to research computing support should be and how do they make the case?

Shel asks whether anybody has surveyed grad students and postdocs about the sysadmin work they’re pressed into doing. He thinks that they’re seeing that work as more tangential to their research than they did a few years back.

Dave Lambert is talking about how the skill set for sysadmin has gotten sufficiently complex that the grad student or postdoc can’t hope to be successful at it. He cites the example of finding lots of insecure Oracle databases in research groups.

Klara asks why we always put funding at the start of the discussion of research support? Dave says it’s because of the funding model for research at our institutions. The domain scientists see any investment in this space by NSF as competing directly with the research funding. We need to think about how we build the political process to help lead on these issues.

