Starting radiation with good news on the cancer front!

Today (Sunday) I’m off to check into the hospital for my first week of combined radiation and chemo. I’ll be inpatient until Friday evening, getting radiation twice a day and a constant infusion of 5FU and Taxol. This drill is scheduled to repeat every other week for five repetitions. I can’t say that I’m looking forward to it.

The good news this week was that on the most recent CT scan the doctors were unable to find any sign of the primary tumor at all. The chemo has apparently melted that tumor away, which they say happens in about 10-20% of cases. That will allow the radiation area to be smaller and more focused, which should help minimize the side effects.

Our friend Mauri is here from New York this week, so she’ll be keeping me company in the hospital while Michele is working and Mo is at school. And I’ll keep working remotely in between getting zapped.

We went to see The Descendants last night – a terrific, sad yet hopeful movie. Great acting! Now I’m arming myself by downloading TV episodes, podcasts, and music to take to the hospital. Along with the huge pile of books on my bedside table (not to mention all the work I’ve got on my plate), I should be able to occupy my time in the hospital and at home in the coming weeks.

Cancer Update

I’ve now had three chemotherapy infusions.

The first one, on December 20, was an all-day affair that involved three different chemicals (Cetuximab, Cisplatin, and Taxol), along with an assortment of anti-nausea drugs, Benadryl, and steroids.

While I felt fine immediately after the treatment, the next week or so was mostly characterized by tiredness and lack of focus, along with some minor stomach issues.

The two treatments since then have only been one-drug affairs (cetuximab) and have gone better, though I did develop a bad acne-like rash, which is one of the common side effects of that drug (since treated with antibiotics).

I’m also participating in a clinical trial that is evaluating the use of Everolimus in treating squamous cell tumors in the head and neck – so I’m either taking that or a placebo every day.

I did manage to make it out to Seattle for a long New Year’s weekend between treatments. It was great to see everybody, and especially to help celebrate my father’s 90th birthday with the assembled family (though I missed the family trip to Semiahmoo, which was too bad).

Things have been busy at work, as I try to hand off most of my operational responsibilities to colleagues while I’m mostly out of the office during the upcoming combined radiation and chemo treatment, which starts January 29.

I am tremendously grateful for the willingness of people who are already very busy to take on the extra burden – thanks to Byron, Tom, Klara, and Alex! I’ve also been working out with Klara a bunch of tasks I can continue to work on whether I’m in the office or not – all very interesting work where I can continue to contribute even if it’s remote. I am really bummed out to be missing this coming week’s CSG meeting, though – it’s a great set of topics this time and I’ll miss seeing the gang.

I’m heading in for another full-day chemo infusion on Tuesday (Jan 10). We’ll see how that goes!

2011 Favorite Listening

Here, in no particular order, are my choices for the 2011 releases that I keep coming back to.

tUnE-yArDs – WHOKILL

Merrill Garbus weaves her African influence, her loop boxes, and her DIY spirit into something totally new and compelling. A big, bold voice with something to say.

Gillian Welch – The Harrow & The Harvest

I didn’t see a lot of live music in 2011, but Gillian Welch and David Rawlings in Chicago was a real highlight. This album is full of great songs that sound as if they could have been written any time in the last 75 years.

Deep Blue Organ Trio – Wonderful!

Soulful groovin’ organ trio from Chicago, playing Stevie Wonder tunes with fresh new interpretations. Jazz comfort food!

Fountains of Wayne – Sky Full of Holes

I’m a sucker for intelligent, literate pop music, and this filled the bill this year. Raymond Carver meets the Raybeats. Recommended for fans of Squeeze.

James Farm

A terrific new quartet with rising stars Joshua Redman on sax and Seattle native Aaron Parks on piano. Great compositions and thoughtful improvisation. Take a listen even if you think you don’t like jazz.

Larry Goldings – In My Room

A lovely, contemplative, (mostly) solo piano set from Larry Goldings.

Miles Davis Quintet – Live in Europe 1967

Miles’ great quintet captured at the height of their power – 3 CDs and a DVD. One of the high water marks in all of jazz.

Bernstein Goldings Stewart – Live At Small’s

Another fine example of the modern organ trio. This long-standing grouping plays empathically together.

Ry Cooder – Pull Up Some Dust and Sit Down

Who better than Ryland P. Cooder to take on the role of Woody Guthrie for the 99%?

Sunna Gunnlaugs – Long Pair Bond

Two years in a row for Icelandic pianist Sunna Gunnlaugs on my list. This record rewards repeated listening!

Raphael Saadiq – Stone Rollin’

The best of new soul, where Raphael transcends the retro act to produce a new and joyous sound.

The Decemberists – The King Is Dead

In which Colin Meloy and company leave the pretension behind and make great rock tunes.

 

Just getting into:

Youth Lagoon – The Year of Hibernation

Van Hunt – What Were You Hoping For?

Joining a club I wish wouldn’t accept me as a member

Over the past few years I’ve been stunned by the number of people I know who are dealing with cancer of one form or another. I don’t know whether it’s due to arriving at that age where friends and colleagues begin to manifest these diseases, or if it’s part of a general trend caused by better diagnoses, or a result of the environment taking its revenge on the human race, or something else (though this report from the National Cancer Institute says that cancer incidence has been falling since the 1990s). But whatever the cause, it’s been hard to watch people I know and care for suffering through the disease and the treatments.

Now I’ve joined the club of cancer patients myself.

A couple of weeks ago I went in for a routine visit to meet my new primary care doctor. He took a look at me and noted that I had some swollen lymph nodes in my neck which he felt were larger than is usually caused by routine viral events. He sent me down to see the ENT doctor, who took a look in my throat and diagnosed squamous cell carcinoma of my right tonsil which had spread to the lymph nodes. There is apparently a large increase in this type of cancer in males, caused by HPV infection.

That was the beginning of being sucked into the big cancer treatment machine – definitely an E-ticket ride! The good news here is that the University of Chicago Medical Center is one of the best places in the world to be treated for head and neck cancers, and they have a whole team of specialists in that realm.

The other good news is that they are confident in their ability to treat and cure this.

So last Monday I went in and had a biopsy which confirmed the initial diagnosis. But they saw no evidence of the cancer spreading beyond the tonsillar area. They also took out my left tonsil to see if there was any cancer present there – there wasn’t. But that gives me a perfectly good excuse to eat lots of popsicles and ice cream as my throat heals.

The not-so-good news is that I’m in for several months of chemo and radiation therapy. Friday we spent much of the day in consultation with the oncologists who did a very thorough job of outlining the treatment plan. I also got fitted for a radiation mask, which consists of a block of styrofoam molded to fit around the back of my head and a plastic mesh mask that clamps on and covers my face. It’s designed to keep my head completely immobilized when they give radiation (here’s a pic of some other guy in one).

Yesterday (Monday) I went in and got infused with some radioactive fluid and spent the day getting 3D images built up of my head, chest, and abdomen. That’s part of a study where they’re hoping to find better scanning sooner after treatment than the PET scan options typically used now.

And today the chemo starts. More later.

 

Sarah Smith-Robbins (Intellagirl)

More than a help desk – expanding the value proposition of central IT

Marketing the potential benefit of things people don’t understand yet.

Think more like researchers and entrepreneurs.

Value proposition – should differentiate you from your competitors. Give people confidence that you can meet their unmet needs. Are we presenting our value effectively?

What are the perceived vs. real unmet needs?
Are they confident that we offer the right things?
What is IT’s competition?

Perceived Value – actual value is very complex and we can’t expect people to understand it, so market by differentiation against competitors.

Perceived value
Actual value
Competitors
Differentiators

IT has more than one customer. They all see us differently.

Admin – actual value: cost savings, security and regulatory expertise, sells the campus
– competitors: outsourcing, ROI
Differentiators – IT is part of campus culture, higher quality than outsourcing because of our expertise,
The only cost center on campus whose returns increase year after year.
Learn the language of administration
Express cultural value and significance better.
Leverage data in better ways.

Faculty -
Actual value – streamlining and supporting necessary teaching and research tasks, partners for innovation (faculty who you haven’t helped don’t know because faculty don’t talk to each other).
Competitors – “edupunk” (routing around the campus and not telling anyone), contagious griping and misinformation,
Differentiators – making meaningful connections with faculty, seeking opportunities to support/encourage learning and research, even if it’s one faculty at a time, network with department IT professionals, create faculty evangelists, establish trust and confidence by being proactive as well as responsive.

Staff -
perceived value: new and expert at finding new ways to cause delays.
Actual value – problem solving. Need to understand what staff do.
Competitors: budget, ad-hoc solutions, loss of confidence (so they don’t even ask)
Differentiators: transparency – time and costs; providing expertise in processes and tools; partner to learn their challenges (not just tech); provide them with expertise, not just support;

Students
Perceived value – “they’re watching!”
actual value – 99.999%; access to tools and software; discounts and cost savings; enabling student-provided devices;
Competitors: hacker mentality; perceptions that IT is behind the times; edge-user behavior
Differentiators: savings – time and money; transparent efforts to understand usage needs/differentiators;
Don’t assume – ask. Leverage benefits they care about. Create evangelists

Start expressing your value
today:
- get to know one faculty member, one staff member, follow a faculty member on twitter
- act like a marketer: pay attention to conversations, take notes of trends and perceptions.

This month:
- ask for volunteers to become disciplinary experts and department partners.
- create focus groups or listening posts for student sentiment.

This semester/year:
- start teaching basic business acumen to all IT staff.
- brag about the value of IT’s contributions to all audiences

Noshir Contractor – understanding and enabling collaboration [cictf11]

Dr. Contractor is professor of Behavioral Sciences at Northwestern.

Starting with dogs! SNIF – social networking in fur. Device goes on dog’s collar. Dogs exchange business card info when close. “social petworking”

Lovegety – little device programmed with food, music, and movies you like. Flashes in proximity of potential love interest. When asking undergrads who likes this technology, it’s the engineering students, regardless of gender.

These are examples of use of technology to find the right people to connect with. Types of tech we have now are made by engineers for engineers. Important for people in IT space to work closely with social science knowledge.

First use of tech is to substitute. Not enough to offset or justify investments. 2nd stage is enlargement – technology increases activity. Technology increases gap between haves and have-nots. Part of what we need to do is to think creatively about how to reduce that gap. Productivity paradox – investment in IT doesn’t necessarily show return. Why? Giving Pony Express riders cell phones to call ahead to ask for water. Need to achieve 3rd stage – reconfiguration.

Ascendance of teams -
More research being done in teams
Research in teams has a higher impact.
Those in different disciplines have higher impact yet
And those involving different disciplines across different campuses have the highest impact.

But another study found that interdisciplinary distributed research is less likely to succeed. So how do we build tools to enable successful collaboration?

Understanding team assembly is key, and this is a very good time to do that.

Why do we form teams? In past, teams were assigned. But increasingly teams are self-forming. Sometimes based on self-interest. Or it may be based on social exchange. Or mutual interest and collective action. Contagion (everybody wants to work with the popular person). Balance – friends of friends. Homophily (birds of a feather). Proximity – form links with people close by – if you look at your buddy lists, most are people close by. Using tech to facilitate proximate communication. Each of these motivations has a structural signature. If you know what drives these networks, you can understand how to make them better.

Multidimensional network – not all nodes are people. Also includes documents, datasets, etc. Linked Open Data – publicly connecting datasets.

Team assembly for interdisciplinary NSF. When assembling a team, want high productivity from diversity, but also want smooth coordination stemming from shared cognitive models. How do we assemble teams to do both? 1,103 grant proposals submitted to NSF in 2 interdisciplinary programs. Researchers not likely to randomly form collaboration with each other. Researchers from top tier institutions are less likely to collaborate. Those with higher tenure are most likely to collaborate. Researchers with high H-index are less likely to collaborate. Researchers are more likely to collaborate with those they’ve collaborated with before.

Women are more likely to collaborate on funded proposals.
Odds of funding are higher when you collaborate with someone you’ve previously co-authored with, but not cited.

Exemplar 2 – massively multiplayer games. Virtual world exploratorium. Need to work with different characters to be successful. Motivations for creating teams in this context – selectivity and transitivity (friend of friend) exist. Homophily of age and experience is supported. Short distances are important. Gender matters.

Are more diverse groups more successful? Uses Blau’s index to measure. Are groups with cosmopolitan characteristics more successful? Found diversity helps groups achieve more. Being more cosmopolitan helps avoid losses.
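
For reference, Blau’s index for a group is 1 minus the sum of the squared proportions of members in each category: 0 when everyone falls in one category, and closer to 1 as members spread evenly over many categories. A minimal Python sketch follows; the function name and the sample character classes are made up for illustration and are not taken from the talk or the study’s data.

```python
from collections import Counter


def blau_index(categories):
    """Return Blau's index, 1 - sum(p_i ** 2), for a list of category labels."""
    if not categories:
        raise ValueError("need at least one group member")
    counts = Counter(categories)
    total = len(categories)
    # p_i is the share of members in category i
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical four-person game group: two warriors, one healer, one mage
group = ["warrior", "warrior", "healer", "mage"]
print(blau_index(group))  # 0.625
```

A group made up of a single category scores 0, and the maximum for k equally represented categories is 1 - 1/k, so scores are only comparable between groups classified with the same set of categories.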

Using this data to build “match-making” systems for forming research teams. c-iknow1.northwestern.edu.

Initial relationships across disciplines are being encouraged by funding agencies and institutions. Need to keep those relationships fresh and not homogenize interests.

In multi-team systems the connections between the teams are more important than the connections within each team. ABC dimensions – Affective, Behavioral, Cognitive.

Different kinds of goals for teams – exploring, exploiting, mobilizing, bonding, swarming.

Cliff Lynch – wrap-up discussion [ #rdlmw ]

One tag line – scholarly practice is changing, and that’s what put us all here. That won’t go away just because we’re having trouble dealing with it.

There’s a great search for leverage points where we can get a lot of return for a little investment. Lots of wondering whether there aren’t things we can do as consortia, for example. Other players, like instrument manufacturers. We have to keep looking for these leverage points, but need to realize that this is a sizable and expensive problem that we can’t make go away with one or two magic leverage points.

The discussion we just had about scaling and being involved up front, but being scared about whether we can deal with demand, is a real look at our problems.

Some new discussions about putting data lifecycle and funding strategies on different timelines that have complex interactions – certain funding strategies can distort the lifecycle by making it attractive or necessary for investigators to hold on to data that should be migrating.

This is not a NSF problem, nor a funding agency problem. We need to come up with a system that accommodates unsponsored research too. There’s a significant amount of work that goes on in social sciences and humanities with little or no funding attached.

One of the ugly facts we need to be mindful of is the systematic defunding of (particularly public) higher education, and the pressure for defunding of scholarly research in government agencies. We need to come up with means of data curation and management that allow us to make intelligent decisions about priorities. Saw a striking example of this in the UK when they applied massive cuts to their funding agencies, including defunding of national archiving system for arts and humanities.

Pleased to see a session on secure data which said more than “this is hard – let’s run away”, which is the usual response. Secure data is probably not the right word. So much work to do on definitions and common languages, so we don’t spend so much time redefining problems – maybe we need to put some short-term effort into definitions. Also, we’re short on facts on the ground. Serge offered some real data on what’s going on with grants at Princeton – we need that campus by campus and rolled up by discipline and national lines. It’s not that hard to get, and there are various projects talking about it.

What’s the balance between enabling sharing and enabling preservation? Often a lot of the investment starts going into preservation and you never get to sharing. Bill Michener gave us a look at a nice set of investment into discovery and reuse systems (DataONE), maybe that’s something that could be federated so we don’t have to build a ton of them.

Was glad to hear about the PASIG work – developed over 3-4 years between Stanford, a group of other institutions and Sun. As we think about the right kinds of industry/university venues for collaboration that’s one to have a look at. In particular look at the agendas for past meetings. Some of the conversations about expected structure of storage market, things that drive tech refresh cycles in storage, are very helpful.

Panel Discussion – Funding Agencies [ #rdlmw ]

    Michael Huerta – NIH/NLM

Benefits of data sharing include transparency, reanalysis, integration, algorithm development
Data sharing has costs
Sharing data is good, but sharing all data probably isn’t.
Should data be shared? Considerations:
- maturity of science – exploratory vs. well understood (might make more sense to share)
- maturity of means of collections – unique means might not be valuable for others
- amount and complexity of data – more might be better for sharing
- utility of the data – to research community and public
- ethical and policy considerations

At NIH have formulas to guide applicants in formulating data sharing plans – important questions to address (NIH requirements kick in for direct costs > $500k/yr, but are revisiting).
- What data will be shared – domain, file type, format type, QA methods, raw and/or processed, individual/summary etc.
- who will have access? Public? research community? more restricted?
- where will data be located? what’s the plan for maintenance?
- when will data be shared? at collection, or publication? incremental release of longitudinal data?
- How will researchers locate and access data?

NIH success stories -
- Data resources – NLM: GenBank, dbGaP, PubChem, ClinicalTrials.gov; NIH Blueprint for Neuroscience Research: NITRC, NIF, Human Connectome Project; NIH National Database for Autism Research – all autism research data from human subjects, federated with other resources.

    Jennifer Schopf – NSF

NSF Data policy is NOT changing – it just wasn’t enforced very broadly.
What has changed is that since mid-January every proposal that comes in must have a data management plan (DMP). The DMP may include – types of data, standards for data and metadata, policies for access, policies for re-use, plans for archiving. Community driven and reviewed – there aren’t generally accepted definitions and practices across all the disciplines. It is acceptable to have a plan that says “I don’t plan to share my data” – but then you should probably explain why. Expected to grow and change over time, the same way impact and review criteria have changed over time.

Within NSF, looking at implications of sharing data from a computer science point of view. There is a cross-NSF task force called ACCI data task force (Tony Hey and Dan Atkins).

Trying to enable data-enabled science.

What are the perceived roles of internal support mechanisms for data lifecycles? How are we looking to interact with libraries, local archives, etc.? How do researchers, librarians, CIOs, etc. think about linking to regional or national efforts, and how can NSF help support this?

    Don Waters – Mellon Foundation

Most humanist disciplines depend on durable data. The digital humanities are, like e-science, changing. Witnessing massive defunding of higher education across the country, so we need to work to address common problem together.

The definition of data needs to be wider than numerical information, but not as broad as bit-level. Don defines data as being primary sources. Scientific data come in many of the same forms as humanistic data.

Data now depends on sensors and capture instruments. There’s a tendency to treat these curation issues as novel – even if they’re new to scientists, they’re familiar to humanists that have had to interpret very rich types of data – data driven scholarship is not new.

What is new is the formalization of traditional interpretive activities and powerful algorithms that can work on this data. Projects in humanities have moved the needle, but there are problems with curating data.

To achieve the promise, a flexible and scalable repository structure is needed. Mellon has been experimenting for over a decade. ArtStor is one example. Universities and scholarly societies have been willing to step up and provide places to store these data. Bamboo is a virtual research environment for various forms of humanistic data.

A question is raised about how NSF is working with the National Archives – they’ve been collaborating and expect to continue.

Another question is about who is the ultimate owner of data? From Mellon, data are institutionally owned and grants have explicit agreements that require institutions to gather rights from creators. NSF doesn’t have as formal a process, but NSF makes grants to institutions. From NIH, ultimately the owners are those that pay for it, which are the taxpayers. Cliff notes that it’s not clear in the US whether data (a collection of facts) can be owned. So it comes down to who has control over it. Control obligations can be shaped by contracts between funders and institutions. It’s different overseas. Don notes that in the humanities, at least, the data are often other works that have their own IP issues so rights need to be negotiated.

David asks about opportunities for sharing across institutions and disciplines. Mike answers that bringing together resources is useful, and that requires work to converge on common definitions and formats. The work they’ve done with the autism research is a good example. Once you’ve got things defined, the data can reside anywhere and there’s no onus for supporting a large infrastructure. NSF supports a wide variety of research – solutions for sharing are not easy. Shared metadata is a good idea. Don notes that there are differences that get in the way of sharing data and finding ground for shared community is a big part of the work that needs to be done. Some of that has been done at the repository level, much less at the level of tools for use of data.

Grace asks whether funding agencies are starting to do more assessment of longer-term impacts. It seems like innovation is more key to getting funding than sustainability or impact. At Mellon they separate the infrastructure from the innovative – and in Don’s division grants don’t get made unless they have a sustainability story. At NSF, for grants that support infrastructure, evaluation of impact is becoming more common.

Panel Discussion – Vendor and Corporate Relationships [ #rdlmw ]

    Ray Clarke, Oracle

SNIA – 100 Year Archive Requirements Study – key concerns and observations:
- Logical and physical migration do not scale cost-effectively
- A never ending, costly cycle of migration across technology generations

Lots of challenges – Oracle (of course) offers solutions across the stack.
The ability to monitor workflow as data moves through a system is important.
There will always be a plethora of different types of media and storage – important to manage that. There are technology considerations about the shelf life and power/cooling consumption across different technologies that are better with tape than disk. Also with bit error detection. The cost ratio for a terabyte stored long term on SATA disk vs. LTO-4 tape is about 23:1. For energy cost it is about 290:1. Tapes can now hold 5 TB uncompressed on a single cartridge. There are ways to deploy tiered architecture of different kinds of media, from flash, through disk, to tape for archival storage.

We need more data classification to understand how best to store it.

Oracle Preservation and Archiving Special Interest Group (PASIG), founded 2007 by Michael Keller at Stanford and Art Pasquinelli at Oracle.

    Jeff Layton, Dell

Looking at three aspects of DLM:
Data availability – how do you make your repository accessible to users? Perfect example is iRODS.
Data preservation – the “infamous problem of bit-rot” – make sure that data stays the same. Experiments with extended file attributes, and being able to restore in the background.
Metadata techniques – How, what, when, why of data. The key is getting users to fill it out. How do we help the users make this easy? Should be part of the workflow as you go. Investigating as part of job scheduling.

Dell is acquiring pieces – Ocarina (data compression), Equallogic (data tiering), Exanet (scalable file system). Ocarina can actually compress even compressed data another 20%.

Dell prototyping/testing on data access/search methods and extended file attributes for metadata and data checksums for fighting bit-rot. The idea is putting the metadata with the data.
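
The checksum part of that idea can be sketched roughly as follows. This is not Dell’s prototype: it keeps a SHA-256 digest in an in-memory dictionary (a hypothetical `manifest`) rather than in an extended file attribute, because extended attribute support varies by platform (on Linux, `os.setxattr` could hold the digest alongside the file itself). All function names here are illustrative.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(path, manifest):
    """Store the file's current digest (assumption: manifest is a plain dict)."""
    manifest[path] = sha256_of(path)


def is_intact(path, manifest):
    """Return True if the file still matches the digest recorded earlier."""
    return manifest.get(path) == sha256_of(path)


def demo():
    """Record a digest, then simulate bit-rot by changing one byte."""
    manifest = {}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.dat")
        with open(path, "wb") as handle:
            handle.write(b"research data")
        record_checksum(path, manifest)
        before = is_intact(path, manifest)
        with open(path, "wb") as handle:
            handle.write(b"research dat4")  # one corrupted byte
        after = is_intact(path, manifest)
    return before, after


print(demo())  # (True, False)
```

A check like this only detects corruption. Repairing it in the background, as the notes mention, would also need a second good copy to restore from.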

    Imtiaz Khan, IBM

Aspects of Lifecycle Management
- Utilization of research
- Data Management
- Storage Management

Current Challenges – Research & Publishing
- Volume, velocity, and variety – e.g. real-time analysis is about heavy volume.
- Discrete rights management – at very granular levels.
- Metadata management
- Reusability/Transformation
- Analytics
- Long term preservation.

Content analytics and insight – Watson is a great example. Taking text and using natural language processing to extract meaning and leverage that meaning for other applications.

Smart Archive Strategy – content passes through a rule-based content assessment stage before deciding where to put the content (on prem, cloud, etc).

IBM has a Long Term Digital Preservation system.

    Q&A

Oracle working on strategies for infrastructure, database, platform, and software as services in clouds.

A question is raised about intellectual property rights – e.g. proprietary compression schemes impeding scientific progress. Long term preservation and access is an important consideration.

Curt asks about middleware that can manage workflows and ease the metadata burden. Oracle does that in their enterprise content management offerings. Dell is considering enabling users to add metadata in the existing workflow, e.g. in job submission or file opening.

A question is asked about PASIG and whether the other vendors have community groups working with higher ed. Dell is working with Cambridge and University of Texas on some of these issues and invites others to participate, but it’s not a formal group. IBM has various community groups (non-specified). Ray notes that PASIG is about the community, not marketing.

Are there areas beyond preservation where the vendors are working? Dell has worked with bioinformatics data, making it available. Another example is aircraft data that has to be kept available for the life of the aircraft, and we’re still flying planes from the 1940s. Finding the data is not everything – we have to be able to visualize and mine the data wrapped up with the data – it’s all one. IBM’s Smart Archive strategy is not just about preservation, but also compliance and discovery (discovery from the legal perspective is a common use case).

Unstructured data represents about 80% of the data, and growing geometrically. RFID data is a great example of data that need to be captured and extracted. Data access patterns are random and iops-driven, not sequential.

Serge asks about pricing models for long-term storage. Oracle has an ability to charge for unlimited capacity, charging by cores on the servers. Dell honestly admits they don’t have a good answer – what should be charged for that accommodates moving data across generations of technology? IBM’s pricing strategy is based on storage size. Jeffrey notes that the model has to accommodate how many copies are saved and how often they’re checked for integrity.

Vijay concludes by noting that size of data matters, and we may not want to move terabytes of data but rather bring the compute to the data. He tosses out a thought experiment that the vendors could store the data, for free, and charge for the compute cycles people execute on the data.

Breakout session reports [ #rdlmw ]

    Secure research data

– Wanted to focus narrowly on where access to restricted datasets is important in research computing. In social sciences, sometimes researchers have to apply to analyze data from government that is not public. Medical data is protected by regulation. Geospatial data research can use sensitive data on individuals. People working with industry sometimes have restrictions on data. Intellectual property has to be respected. Recommendations:

1. People who manage research computing environments want to know what federal standards need to be complied with – come up with a national working group on how to comply. There is a federal interagency working group on data which might be a good venue to communicate with.
2. A simple catalog of solutions from institutions on how to enable remote access to secure data. Use the Educause Cyberinfrastructure working group.
3. Catalog items for clinical translational study.

    Policy

Recommendations:
1. Develop a set of documentation (elevator speech, exec summary, and extensive report) to describe the need for policies and standards across disciplines as much as possible.
2. Develop workshop for university officers (VP of Research, Provost) to include them in discussions on how institutions can be involved.
3. Catalog of issues on data ownership and responsibility. Reduce mean time to discovery for researchers in how they should deal with their data.
4. Develop workshop for leaders of disciplinary communities.
5. Develop discipline-blind framework – what are the kinds of things a discipline needs to do to develop policies and standards?
6. University librarian is key in this role.
7. It’s time for the researchers to walk into the room with the librarians and say “we’re here”. – Brian Athey.

    Assessment and selection of research data

Is it really a goal to keep all data if possible? Good question.
Good practices with physical materials should be studied for guidance.
Expense of what it takes to manage data shouldn’t be primary consideration for what we keep.
Selection process has to be discipline specific.
What’s the cost of getting rid of something? Is reproduction of the data possible, and if so, what does that cost?
It’s easier to throw things away than to try to collect them after the fact. So collect and manage data before deciding to throw it away.
Researchers will have to provide at least core metadata.
Selection process is not yes/no but a continuum from minimal to full.

1. To make decision easier, develop a framework for making decisions. The researcher is a full partner in this.
2. Educate key audiences on the importance of curatorial concepts – researchers in all disciplines, and catch grad students now.
3. Encourage policy makers to rethink roles across the institution.

    Funding and operation

Recommendations for action:
1. Repository builders should collaborate – build with knowledge and forethought of others. Too many isolated repositories. Think federation.
2. Make data movable. Funding models will change over time. Should be movable from one caretaker to another.
3. Prepare for the hand-off. Anybody organizing a repository must put enough details in plan and budget to enable hand-off at the end of business cycle.
4. It would be useful to have a study of existing repository models.

    Partnering researchers, IT staff, librarians and archivists

30 people in this breakout!
1. Communication of what’s out there – what models exist? Portal that identifies workable solutions. What practices work for training – resources for cross-training?
2. Institute more training for grad students.
3. Substantial workshop report from here – task NSF for developing a generic framework that allows institutions to implement policies and appropriate procedures.
4. Hold a workshop to define best institutional practices in communicating between researchers and librarians.
5. Survey our campuses on data management practices.

    Standards for provenance, metadata, discoverability

Got into a discussion on “what is metadata” – anything that supports the core user needs for information. IFLA definition – can you find it, can you identify it, can you select among resources, can you retain or reuse it? We want our metadata to be interoperable – able to move across repositories, workspaces, etc. We also want trustworthy and reliable data.
Core needs:
1. Common framework for data. Some are emerging, like METS.
2. Role of ontologies – domains recognizing standardized terminologies. Linked Data (semantic web) might be worth exploring for this.
3. Instrumented data – if numeric data is off, then data is useless. How do we know if the data is good? Huge gap in current data – need to work with instrument manufacturers. What captured this data? Usually entered manually.
4. Metadata needs to be captured at point of data creation.
5. Need standards of provenance – what’s the purpose of creating this data? Relationships between datasets are critical. Most scientists spend a long time exploring dimensions of the same set of problems.
Researchers want to develop their own metadata – treat it like any other data stream. Don’t worry about having to bring it into a structure.

    Partnering funding agencies, research institutions, communities, and industrial and corporate partnerships

Recommendations:
1. Joint study of the feasibility of the “digital sheepskin”. Is there a model for a digital container that can be sustained through the ages, including metadata? We’ll probably have to invent some of the social context for this.
2. Conduct an aggregated study of TCO models using trusted party (academia) for storage for perpetuity or for ten years.
3. Identify the missing pieces of the research data software stack, and encourage collaborations between academia and industry.
4. A study on criteria for throwing data away, by discipline.
5. Continue to emphasize that data volume is growing much faster than our ability to move data around. Think about where we need to site data.
6. What are the possible models for joint activity with industrial partners?
