CSG Fall 2014 – CRM workshop

Lisa (Georgetown) – The connected campus. Universities need to create 1:1 experiences, personalized to them. So do faculty and alumni. Students expect universities to know them. Do you know I’m on campus? We have data but it’s siloed. How do we reuse data in different contexts? What is the likelihood of student x with professor y for a course giving back to the university as an alum? Unifying data is what they’re trying to do so they can leverage it.

Salesforce can unify and connect data across the student lifecycle. Right now Salesforce does not do all the functions needed for advancement, but they are partnering with Blackbaud to build that out.

Connected campus: use cases – how do we use data to make better decisions, connect with people, etc.

Am I getting what I am paying for? Getting that question a lot. What is the ROI? Where is that data coming from? Advancement saw that they were missing data that others had that they could use for their work.

Challenges and Opportunities: Central IT can be the aggregator and integrator to achieve a unified data model. Leading and lifting CRM up out of localized, point solutions into an enterprise model: enabling school specific branding and innovation; breaking down the silos. Strategic timing – leveraging innovation in schools without getting out of synch; funding the next significant application domain.

Build heat map of where Salesforce is being used on campus, connecting the dots. Funding is a challenge, trying to create momentum.

Pressuring Salesforce for making their pricing more fitting to higher ed.

Data challenges at Georgetown – data slipping through the cracks. E.g. communication preferences, first year roommate, course collaborations, # of visits to advisor, challenges, event attendance, survey data, study habits, favorite faculty, personal wellness, engagement with faculty, current interests, etc. There’s some discussion in the room about whether the use of data collected in transactional settings for other purposes is appropriate or not. Existing data policies are inadequate for expressing the contexts in which data can be combined and reused. There’s a huge “black market” in data already happening among campus units.

Salesforce@Georgetown – Goal is to seamlessly integrate data environments into one experience. Data collected from departmental spreadsheets and forms, departmental solutions, enterprise reporting, and two way flows to enterprise business applications. Moved a lot of ColdFusion solutions into force.com.

One of the differentiators for Salesforce is a very intuitive UI for people working with the data. Takes away the need to pull data into Excel and do pivot tables, etc. Salesforce has granular security models.

Why use a CRM rather than dashboard reporting from the Data Warehouse? The data warehouse aggregated the four primary data sources into one place. There’s a difference between enterprise reporting and Salesforce daily and transactional reporting. Salesforce not good at longitudinal reporting. Dashboards and enterprise reporting answer larger and longer-based questions, but schools are using Salesforce to deal with current questions and problems – how is the response to this event?

Need to create some data rulesets before customizing objects and processes. GU hopes to be able to put a high level set of standards about that.

Bob Carozzoni – Cornell: Re-positioning enterprise IT’s role in CRM

Not doing typical enterprise IT model of owning the vendor relationship and acting as a reseller. Acting on the side as an advisor. The space is too fluid to take ownership. Consultant recommended consolidation of CRM tools, but CRM project closed without campus agreement on consolidation. Cloud activity allowing consumers to go directly to companies, and in activities like CRM there’s nothing forcing people to work together. What can IT add to CRM? With ERP the drive is to spend less, because it’s not differentiating. CRM is part of how you differentiate your brand and build business value, so it might make sense to spend more.  Do business leaders have a strategy? Does IT? It’s not just higher ed – most social and CRM activities in companies are not being led by IT.

Taking a soft organic approach which got campus talking to IT. Gentle engagements – contracts, distributed free MS Dynamics licenses, interviewed units and published results. Doing facilitation getting units to talk to each other – organized a Salesforce user group, but IT doesn’t lead it. Shine spotlight on campus leaders. Encouraged PaaS for CRM instead of just point solutions.

Enterprise IT funded a small architecture assessment to create some guidance documentation.

Shel – the current trend is the growth of small applets that get bolted on to cloud platforms as micro applications. In the future the lock-in may be to the bolt-ons more than the core application.

Andy Newman – Yale: Value proposition of Salesforce & Force.com at Yale

Problem: had a large number of one-off applications with enterprise data. Had two ways to address it – provide a lightweight environment at low cost (e.g. FileMaker), which requires local knowledge and degrades over time; or sit down with BAs and build a custom app, which is time consuming, expensive, and hard to iterate.

Looked at using Salesforce for rapid development of inexpensive customized apps. Performance, reliability, availability are someone else’s problem. Zero capital footprint.

Early explorations – totally custom small footprint app; tailored sales/service (CRM-ish) business need. The rise of the “citizen developer”.

Should optimizing subscription costs unduly influence application design and architecture?

Are we ok with “citizen data architects”? with institutional data?

About a year ago started to get serious. Hired Bluewolf to help mature environment. Recommendations for “Org” structure. IT wrote recommendations for development platforms (use the technology that the anchor product uses or use the technology that the anchor product recommends… so with Workday that would be ???). Decided that force.com would be the development environment for Workday.

Analysis – Examining three models – Pure configuration of service desk or CRM app; greenfield force.com application; hybrid model – professional engineer partners with citizen developer. Need case studies for hybrid model.

What’s the role of central IT? Traditionally business data stewards partner with IT in standing up technical access to data – there might be an analogous process in Salesforce with a common core org with institutional data to feed to satellite orgs. Each satellite supports applications common to a constituent group.

What about Workday? Is force.com ultimately the extension platform for Workday? Workday’s vision is a cloud dominated by Workday for ERP, Salesforce for CRM, ServiceNow, and Google, Microsoft, etc., with APIs for interop.

Georgetown – Beyond CRM: Platform VS application

Salesforce – not your grandmother’s CRM – engagement platform beyond traditional CRM: integrates and collects data at multiple touch points not just one transactional domain; engages the target and expanded functionality to them – mobile collaborative and cool; goes beyond CRM with workflow/triggers/reporting/DB.

Force.com – build bridge apps to stay aligned with ERP roadmaps: Workday – tuition benefits example (intent not to keep forever once workday makes functionality available); Address the “gap” and reduce proliferation of point solutions apps: student housing selection (future); Difference in experience is not just in reducing length and complexity, but UI and reporting were right through reuse of standard UI dashboards and objects. Emergent database/workflow/reporting needs; reduce security risks via visibility.

App Exchange – community and commercial market place; broad range of app capabilities, plug and play; speed to launch, easy integration, scalable. Is this the new direction for innovation and cohesive services? Admissions apps would be a natural to develop in Salesforce.

Imagine a triangle with Identity and data management at one point, CRM at another, and enterprise data warehouse and reporting at the other. There’s a disconnect with the organic growth of Salesforce and how the connections to enterprise data are managed.

CSG Fall 2014 – IT’s Role in Supporting Evolving Teaching and Learning Landscapes

We’re at Cornell University for the Fall CSG meeting. The morning workshop was all about cloud strategy, presenting the results of our summer working group workshop. I was heavily involved in that workshop, so couldn’t blog it, but I’ll post a link to the document that grew out of the summer meeting when it’s made public.

The afternoon workshop is on the evolving teaching and learning landscapes at our institutions and what IT’s role is in supporting that.

Global Learning Council conference – attended by EdX, Coursera, Khan Academy, Google, as well as universities.

Comments on survey results:

  • Local culture plays an important role.
  • Need to align mission of academic tech and teaching centers. Libraries are also participating in these conversations.  On one hand we can’t have the technology tail wagging the dog, on the other hand we can’t have teaching center staff promoting inappropriate technology.
  • Definitely getting sense that campus leadership is paying more attention to academic technology, who are expecting results.

Use cases:

  • Tom Lewis, Washington: Until recently never had an academic tech organization as part of central IT, but it existed as a separate organization. With Canvas got a chance to get out in front – what can we do to successfully implement this LMS on campus? What do faculty, local IT, students need? What’s the vendor like? Tom has a team that can actually do formal assessment – helps with making data backed decisions and communicating them. Unit is seen as good collaborators by the campus, allowing them to innovate. Need a culture of experimentation among their staff – can try things and throw them away.
  • Ben Maddox, NYU: How many universities have a published teaching and learning with technology strategy? Three-ish in the room. Something is changing in higher education, mostly around new revenue, lower costs, reaching new audiences, and actually helping students learn. NYU is at a point where they remove barriers – deans won’t start strategy without knowing capacity for support from IT, so now they’ve demonstrated capacity. Provost has tasked deans to produce strategies that associate teaching & learning technology with concrete goals. How can the CIO assure that the right parties are in the room to help reach those goals?
  • Linda Jorn, Wisconsin: Provost pushing initiative for empowering faculty to innovate in teaching and learning. Enabled the CIO to push for a Vice-Provost title for the Director of Academic Technology. Three goals: leverage technology for more active learning; Increase online professional masters and capstone classes; increase global learning experiences. Have been increasing staff in online learning and video production and PhD level learning consultants to work with faculty. Faculty need evidence that new environments work before making efforts. Focusing on efforts that scale and can be sustained.
  • Maggie Jesse, Iowa: Evolved into an organizational change. CIO has very strong relationship with provost’s office, and have created partnerships to push active learning ahead. That has helped the campus see IT as a partner in learning. 150 faculty have been through an active learning program, developed in partnership with the center for teaching, but that was only one person. A retirement offered an opportunity to look at organization. IT has a good track record, so the responsibility moved into the IT organization. Faculty have shown some resistance to losing center for teaching, assuming that integration with IT will lose focus on pedagogy. 

With advent of MOOCs teaching centers have had to respond to demand for being a production shop.

At NYU have new staff: eight instructional technologists and eight programmers, within IT. There are new resources at the schools with conformance with standards and architecture as part of their jobs. Every course tracked like project with costs tracked. If demand really rises, none of these models will scale.

At Washington just started online degree completion program in Arts & Sciences. Will be able to correlate student satisfaction with different production costs for each course.

At Berkeley, Extension builds courses and hands them to faculty, which generates little interest from deans and faculty.

Wisconsin has faculty training programs ranging from workshops to full year courses focusing on leadership in blended and online learning. Now IT is invited to discussions in the departments with the deans.

Who provides data for learning analytics from MOOCs or LMS? Can chairs and deans see data for individual courses? At Washington have a hierarchy of permissions from dean, through department chair, curriculum owner, individual faculty. Tom just hired an anthropologist to work with analytics.
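A permission hierarchy like the one described at Washington could be modeled as roles whose view narrows at each level. The sketch below is illustrative only: the role names follow the notes, but the scoping rules and the names `Course`, `Viewer`, and `can_view_course` are assumptions, not Washington's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    code: str
    department: str
    curriculum: str
    instructor: str


@dataclass(frozen=True)
class Viewer:
    name: str
    role: str        # "dean", "chair", "curriculum_owner", or "faculty"
    scope: str = ""  # department (chair) or curriculum (curriculum_owner)


def can_view_course(viewer: Viewer, course: Course) -> bool:
    """Return True if the viewer may see analytics for this course.

    Assumed rule: each level sees a narrower slice than the one above it.
    """
    if viewer.role == "dean":
        return True  # dean sees every course in the school
    if viewer.role == "chair":
        return viewer.scope == course.department
    if viewer.role == "curriculum_owner":
        return viewer.scope == course.curriculum
    if viewer.role == "faculty":
        return viewer.name == course.instructor
    return False  # unknown roles see nothing
```

Denying unknown roles by default keeps a newly added role from seeing course data until someone decides what its scope should be.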

Has anybody figured out what data makes a difference? Within academic arena there has not yet been a conversation about analytics. Is it the faculty member’s data or the institution’s? Educause has been working with Gates Foundation on research in this area – identified 36 colleges and universities trying to build up planning and advising using data. Gates gave them seed money to accelerate adoption. Community Colleges are looking a lot at this. There are lots of change management issues in this area. 

At Wisconsin looking at learning analytics at the course level. There are lots of things that have to change to bring learning analytics to bear – IT, policy, culture, etc.

The use of analytics is different at highly selective private institutions (who don’t need to increase enrollment) vs. publics.

We measure numbers like who’s graduating because we can – but we don’t know globally what we want students to know or achieve, so we can’t measure that.



Information, Interaction, and Influence – Information intensive research initiatives at the University of Chicago

Sam Volchenbaum 

Center for Research Informatics – established in 2011 in response to the need of researchers in Biological Sciences for clinical data. Hired Bob Grossman to come in and start a data warehouse. Governance structure – important to set policies on storage, delivery, and use of data. Set up secure, HIPAA and FISMA compliance in data center, got certified. Allowed storage and serving of data with PHI. Got approval of infrastructure from IRB to serve up clinical data. Once you strip out identifiers, it’s not under HIPAA. Set up data feeds, had to prove compliance to hospital. Had to go through lots of maneuvers. Released under open source software called I2B2 to discover cohorts meeting specific criteria. Developed data request process to gain access to data. Seemingly simple requests can require considerable coding. Will start charging for services next month. Next phase is a new UI with Google-like search.

Alison Brizious – Center on Robust Decision Making for Climate and Energy Policy

RDCEP is very much in the user community. Highly multi-disciplinary – eight institutions and 19 disciplines. Provide methods and tools to provide policy makers with information in areas of uncertainty. Research is computationally and information intensive. Recurring challenge is pulling large amounts of data from disparate sources and qualities. One example is how to evaluate how crops might fail in reaction to extreme events. Need massive global data and highly specific local data. Scales are often mismatched, e.g. between Iowa and Rwanda. Have used Computation Institute facilities to help with those issues. Need to merge and standardize data across multiple groups in other fields. Finding data and making it useful can dominate research projects. Want researchers to concentrate on analysis. Challenges: Technical – data access, processing, sharing, reproducibility; Cultural – multiple disciplines, what data sharing and access means, incentives for sharing might be mis-aligned.

Michael Wilde – Computation Institute

Fundamental importance of model of computation in overall process of research and science. If science is focused on delivery of knowledge in papers, lots of computation is embedded in those papers. Lots of disciplinary coding that represents huge amounts of intellectual capital. Done in a chaotic way – don’t have a standard for how computation is expressed. If we had such a standard could expand on the availability of computation. We could also trace back what has been done. Started about ten years ago – Grid Physics Network to apply these concepts to the LHC, the Sloan Sky Survey, and LIGO – virtual data. If we shipped along with findings a standard codified directory of how data was derived, could ship computation anywhere on planet, and once findings were obtained, could pass along recipes to colleagues. Making lots of headway, lots of projects using tools. SWIFT – high level programming/workflow language for expressing how data is derived. Modeled as a high level programming language that can also be expressed visually. Trying to apply the kind of thinking that the Web brought to society to make science easier to navigate.
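The idea of shipping a codified record of how data was derived can be illustrated with a tiny provenance log. This is a plain Python sketch, not Swift and not the virtual data tooling from the talk; the helper names `record_step`, `lineage`, and `recipe_id` and the step fields are hypothetical.

```python
import hashlib
import json


def record_step(recipe, output, func, inputs, params):
    """Append one derivation step (what produced `output`) to the recipe."""
    recipe.append({
        "output": output,
        "func": func,
        "inputs": list(inputs),
        "params": dict(params),
    })
    return recipe


def lineage(recipe, target):
    """Return the steps needed to re-derive `target`, in execution order."""
    by_output = {step["output"]: step for step in recipe}
    ordered, seen = [], set()

    def visit(name):
        step = by_output.get(name)
        if step is None or name in seen:
            return  # raw input, or a step already scheduled
        seen.add(name)
        for parent in step["inputs"]:
            visit(parent)  # parents must run before this step
        ordered.append(step)

    visit(target)
    return ordered


def recipe_id(recipe):
    """Stable fingerprint of a recipe, so colleagues can confirm they hold the same one."""
    text = json.dumps(recipe, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Passing the output of `lineage` along with a finding would let a colleague see which steps and parameters produced it, without the unrelated steps in the same project.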

Kyle Chard – Computation Institute

Collaboration around data – Globus project. Produce a research data management service. Allow researchers to manage big data – transfer, sharing, publishing. Goal is to make research as easy as running a business from a coffee shop. Base service is around transfer of large data – gets complex with multi-institutions, making sure data is the same from one place to the other. Globus helps with that. Allow sharing to happen from existing large data stores. Need ways to describe, organize, discover. Investigated metadata – first effort is around publishing – wrap up data, put in a place, describe the data. Leverage resources within the institution – provide a layer on top of that with publication and workflow, get a DOI. Services improve collaboration by allowing researchers to share data. Publication helps not only with public discoverability, but sharing within research groups.
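Making sure data is the same from one place to the other usually comes down to comparing checksums on both ends. The sketch below shows that idea with Python's standard library only; it is not the Globus API, and `file_checksum` and `copies_match` are illustrative names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, destination, algorithm="sha256"):
    """True when both files have identical content according to the checksum."""
    return file_checksum(source, algorithm) == file_checksum(destination, algorithm)


if __name__ == "__main__":
    # Small self-contained demo: copy a file and confirm the copy is intact.
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "source.dat"
        dst = Path(workdir) / "copy.dat"
        src.write_bytes(b"genome reads " * 1000)
        dst.write_bytes(src.read_bytes())
        print(copies_match(src, dst))
```

In a real multi-institution transfer each site would compute its own checksum and only the digests would be exchanged, since neither side can read the other's disk directly.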

James Evans – Knowledge Lab

Sociologist, Computation Institute. Knowledge Institute started about a year ago. Driven by a handful of questions: Where does knowledge come from? What drives attention, imagination? What role does social, institutional play in what research gets done? How is knowledge shared? Purpose to marry questions with the explosion of digital information and the opportunities that provides. Answering four questions: How do we efficiently harvest and share knowledge harvested from all over?; How do we learn how knowledge is made from these traces?; Represent, recombine knowledge in novel ways; Improve ways of acquiring knowledge. Interested in long view – what kinds of questions could be asked? Providing mid-scale funding for research projects. Questions they’ve been asking: How science as an institution thinks and how scientists pick the next experiment; What’s the balance of tradition and innovation in research? ; How people understand safety in their environment, using street-view data; Taking data from millions of cancer papers then drive experiments with a knowledge engine; studying peer review – how does review process happen? Challenges – the corpus of science, working with publishers – how to represent themselves as safe harbor that can provide value back; how to engage in rich data annotations at a level that scientists can engage with them?; how to put this in a platform that fosters sustained engagement over time.

Alison Heath – Institute for Genomics and Systems Biology and Center for Data Intensive Science

Open Science Data Cloud – genomics, earth sciences, social sciences. How to leverage cloud infrastructure? How do you download and analyze petabyte size datasets? Create something that looks kind of like Amazon or Google, but with instant access to large science datasets. What ideas do people come up with that involve multiple datasets? How do you analyze millions of genomes? How do you protect the security of that data? How do you create a protected cloud environment for that data? BioNimbus protected data cloud. Hosts bulk of Cancer Genome Project – expected to be about 2.5 petabytes of data. Looked at building infrastructure, now looking at how to grow it and give people access. In past communities themselves have handled data curation – how to make that easier? Tools for mapping data to identifiers, citing data. But data changes – how do you map that? How far back do you keep it? Tech vs. cultural problems – culturally has been difficult. Some data access controlled by NIH – took months to get them to release attributes about who can access data. Email doesn’t scale for those kinds of purposes. Reproducibility – with virtual machines you can save the snapshot to pass it on.


Engagement needs to be well beyond technical. James Evans engaging with humanities researchers. Having equivalent of focus groups around questions over a sustained period – hammering questions, working with data, reimagining projects. Fund people to do projects that link into data. Groups that have multiple post-docs, data-savvy students can work once you give them access. Artisanal researchers need more interaction and interface work. Simplify the pipeline of research outputs – reduce to 4-5 plug and play bins with menus of analysis options.

Alison – helps to be your own first user group. Some user communities are technical, some are not. Globus has Web UI, CLI, APIs, etc. About 95% of community use the web interface, which surprised them. Globus has a user experience team, making it easier to use. Easy to get tripped up on certificates, security, authentication – makes it difficult to create good interfaces. Electronic Medical Record companies have no interest in being able to share data across systems – makes it very difficult.

CRI – some people see them as service provider, some as a research group. Success is measured differently so they’re trying to track both sets of metrics, and figure out how to pull them out of day-to-day workstreams. Satisfaction of users will be seen in repeat business and comments to the dean’s office, not through surveys. Doing things like providing boilerplate language on methods and results for grants and writing letters of support goes a long way towards making life easier for people. CRI provides results with methods and results section ready to use in paper drafts.

Should journals require an archived VM for download? Having recipes at right level of abstraction in addition to data is important. Data stored in repositories is typically not high quality – lacks metadata, curation. Can rerun the exact experiment that was run, but not others. If toolkits automatically produce that recipe for storage and transmission then people will find it easy.


Information, Interaction, and Influence – Digital Science demos

Digital Science is a UK company that is sponsoring this workshop, and they’re starting off the morning by demoing their family of products.

Julia Hawks – VP North America, Symplectic

Founded in 2003 in London to serve the need of researchers and research administrators. Joined Digital Science in 2010. Works with 50 universities – Duke, UC Boulder, Penn, Cornell, Cambridge, Oxford, Melbourne.

Elements – research information management solution. Capturing and collating quality data on faculty members to fulfill reporting needs: annual review, compliance with open access policies, showcasing institutional research through online profiles, tracking the impact of publications (capture citation and bibliometric scores). Trying to reduce burden on faculty members.

How is it done? – automated data capture, organize into profiles, data types configurable, reporting, external systems for visibility (good APIs).

Where does the data come from? External sources – Web of Science, Scopus, host of others, plus internal sources from courses, HR, grants.

Altmetric -

Help researchers find the broader impact of their work. Collect information on articles online in one place. Company founded in 2011 in London. Work with publishers, supplying data to many journals, including Nature, Science, PNAS, JAMA. Also working with librarians and repositories. Some disciplines have better coverage than others.

Altmetric for institutions – allows users within an institution to get an overview of the attention research outputs are getting. Blogs, mainstream media, smaller local papers, and news sources for specific verticals, patents, policy documents.

Product built with an API to feed research information systems, or have a tool called Explorer to browse bibliographies.


Built for researchers, but also have products for publishers and institutions. Manages publications and articles for reading. Manages a library of PDFs. Has highlighting, annotations, reference lookup. Recommends other articles based on articles in your library.

ReadCube for Publishers – free indexing and discovery service, embedded PDF viewer + data, Checkout – individual article level ecommerce.

ReadCube Access for Institutions – enables institutions to close the collections gap with affordable supplemental access to content. Institutions can pick and choose by title and access type.

figshare – Dan Valin

Three offerings – researcher, publisher, institutions

Created by an academic, for academics. Further the open science movement, build a collaborative portal, change existing workflows. Provides open APIs.

Cloud-based research data management system. Manage research outputs openly or privately with controlled collaborative spaces. Public repository for data.

For institutions – research outputs management and dissemination. Unlimited collaborative spaces that can be drilled down to on a departmental level.

Steve Leicht – UberResearch

Workflow processes – portfolio analysis and reporting, classification, etc. Example – Modeling a classification semantically. Seeing difference across different funding agencies. Can compare different institutions, can hook researchers to ORCID.


Information, Interaction, and Influence – Research networking and profile platforms

Research networking and profile platforms: design, technology and adoption of networking tools

Tanu Malik, UChicago CI – treating science as an object. Need to record inputs and outputs, which is difficult, but some things are relatively easy to document: publications, patents, people, institutions, grants. Some of this has been taking place, documenting metadata associated with science. How can we integrate this data and establish relationships in order to get meaningful knowledge out of it? There have been a few success stories: VIVO, Harvard Profiles. This panel will discuss the data integration challenges and the deployment challenges. Computational methods exist but still need to be implemented in easy to use ways.

Simon Porter – University of Melbourne

Implemented VIVO as Find an Expert – oriented towards students and industry. Now gets around 19k unique visitors per week.

Serendipity as research activity – the maximum number of research opportunities are possible when we can maximize the number of people discovering or engaging with our research. Enabled by policy, enabled by search, enabled by standards, enabled by syndication. 

At Australian universities have had to collect the information on research activity all along. Some of it is private, but some is public and the University can assert publication of it.  Most universities have something, but lots of different systems.

Only a small number of people will use the search in your system. Most will come from Google. 

Syndicating information within the university – VIVO – gateway to information – departments take information from VIVO to publish their own web pages. Different brands for different departments. 

Syndication beyond the University – Want to plug into international research profiling efforts. 

New possibilities: Building capability maps. How to support research initiatives. Start from people being appointed to start the effort. Use Find An Expert to identify potential academics. Can put together multiple searches to outline capability sets. Graphing interactions of search results. 

Leslie Yuan – Clinical and Translational Science Institute – UCSF

The Profiles team all came from industry – highly oriented towards execution. When she started they wanted lots of people to use it, so how to get adoption? If you build it, they probably won’t come. Use your data and analyses to drive success with a very lean budget. In four years went to over 90k visits per month. Gets 20% of the traffic of the main UCSF web page.


1. Use Google (both inside and outside the institution).  Used SEO on site. 88% of researcher profiles have been viewed 10+ times. Goal was to get every one of researchers to come up in top 3 results when they type the name in. Partnered with University Relations – any article that the press office writes about a researcher links to their profile.

2. Share the data. APIs provide data to 27 UCSF sites and apps. Has made life easier for IT people across the university, leading to evangelization in the departments. Personalized stats are sent to profile owners – how many times your profile was viewed within the institution, from other universities, from major pharmas. People wanted specifics. Nobody unsubscribed. Vanity trumps all.  Research analytics shared with leadership. Helped epidemiology and biostatistics show that they are the most collaborative unit on campus.

3. Keep looking at the data – monthly traffic reporting, engagement stats (by school, by department, who’s edited profile, who’s got pictures), Network visualizations of co-authorships.

4. Researcher engagement – automated onboarding emails – automatically creating profiles, then letting people know about them as they come on board. Added websites, videos, tweets and more inline. Batch loaded all UCTV videos onto people’s profiles, then got UCTV to send email to researchers letting them know. Changed URLs – profiles.ucsf.edu/leslie.yuan

5. Partnerships – University Relations, Development & Alumni, Library, UC TV, Directory,  School of Medicine, Center for AIDS research, Dept. of Radiology. Was able to give data back to Univ Relations on articles by department or specialty, which they weren’t tracking. Automatic email that goes out if people get an article added. 
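The co-authorship network visualizations mentioned in step 3 start from a simple edge list: one weighted edge per pair of people who share a publication. A minimal sketch of building that list, assuming each publication is just a list of author names (the function names and sample data are made up for illustration, not taken from UCSF Profiles):

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count shared publications for every pair of co-authors.

    `publications` is an iterable of author-name lists; the result maps
    an alphabetically ordered (author_a, author_b) pair to a paper count.
    """
    edges = Counter()
    for authors in publications:
        # set() guards against a name listed twice on the same paper
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def top_collaborators(edges, person, limit=3):
    """Return up to `limit` (co-author, shared paper count) pairs for one person."""
    counts = Counter()
    for (a, b), weight in edges.items():
        if a == person:
            counts[b] += weight
        elif b == person:
            counts[a] += weight
    return counts.most_common(limit)


if __name__ == "__main__":
    papers = [["Lee", "Kim", "Ortiz"], ["Kim", "Lee"], ["Ortiz", "Shah"]]
    print(top_collaborators(coauthor_edges(papers), "Lee"))
```

The same edge counts, summed by department instead of by person, are the kind of data that could back a claim about which unit collaborates most.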

Took 8 or 9 months of concentrated conversations with chairs, deans, etc to convince them that this was a good thing. Only 7 people asked to be taken off the system. Uptake was slow, but now people are seeing the benefit of having their work out there.  6 people on her team have touched the system in some way, but it’s nobody’s full-time job.

Griffin Weber, Harvard – Research Networking at the School, University, and Global Scale

Added passive and active networking to the profiles system. Passive network provided information that people hadn’t seen before, driving adoption, active networks allowed the site to grow over time. Passive network creates networks based on related concepts. Different ways of visualizing the concept maps – list, timeline, co-authors, geography (map), ego-centric radial graph (social network reach), list of similar people

Different kinds of data for Harvard Faculty Finder – comets and stars discovered, cases presented to the Supreme Court, classes taught, etc. Pulled in 500k publications from Web of Science. Derived ontologies in 250 disciplines across those publications using statistical methods. 

Direct2experts.org – federated search across 70 biomed institutions. 

Faculty affairs uses Profiles to form promotions committees, students using it to find mentors. 

Bart Trawick, NCBI – NLM – Easy come, easy go; SciENcv & my bibliography 

NIH gives $15.5 in grants per year. Until 2007 didn’t have a way of seeing what they were getting from the investment. Public access to publications mandated by Congress in 2007. Started using MyBibliography to track. Over 61k grant applications coming in every year, just flat PDFs.

About 125k US trained scientists in the workforce now. Many have been funded by training grants. Want to see how the scientists continue their career. Over 2500 unemployed PhDs in biomedical science.

My NCBI Overview – tools and preferences integrated with NCBI databases. Connected to PubMed, genomics, etc. Uses federated login (e.g. can link Google accounts). Can link eRA Commons account – pull in information about profiles, grants linked.

My Bibliography – make it a tool to capture information and link grant data to publications. Set up to monitor many of the databases that information flows through. End result of public access policy is that all NIH-funded research publications get deposited in PubMed Central. MyBibliography lets scientists know if they’re compliant with policy. Send structured data back out to PubMed, allowing searching by grant numbers, etc.

SciENcv – released second version this week. Helps scientists fill out profile – each agency has their own biosketch format. SciENcv is an attempt to standardize that. NIH set up, working on others, NSF next on list. Wanted to make it easy for researchers who are already funded and using MyBibliography. Data exists out there – would like to get to a point of reuse of data for grant reporting. Added inputs – ORCID, eRA Commons (used to manage grants), MyBibliography. Grants.gov requires biosketches in PDF. Can export from SciENcv in PDF to grants.gov, with rich metadata attached.

Information, Interaction, and Influence 

I’m attending a workshop on Research Information Technologies and their Role in Advancing Science.

Ian Foster from the UChicago Computation Institute is kicking it off. 

We now have the ability to collect data to study how science is conducted. We can also use that data for other purposes: finding collaborators, easing administrative burdens, etc. Those are the reasons we often get funding to build research information systems, but can use those systems to do more interesting things.

Interested in two aspects of this:

1. Treat science itself as an object of study.
2. Can use this information to improve the way we do science. Don Swanson – research program to discover undiscovered public knowledge. 

The challenge we face as a research community as we create research information systems is to bring together large amounts of information from many different places to create systems that are sustainable, scalable, and usable. Universities can’t build them by themselves, and neither can private companies. 

Victoria Stodden – Columbia University (Statistics) – How Science is Different: Digitizing for Discovery

Slides online at: http://www.stanford.edu/~vcs/talks/UCCI-May182014-STODDEN.pdf

Tipping point as science becomes digital – how do we confront issues of transparency and public participation? New area for scientists. Other areas have dealt with this, but what is different about science?

1. Collaboration, reproducibility, and reliability: scientific cornerstones and cyberinfrastructure

Scoping the issue – looking at the June issues of the Journal of the American Statistical Association – how computational is it? Is the code available? 1996 – about half computational, by 2009 almost all computational. In ’96 none talked about getting the code. In 2011, 21% did. Still 4 out of 5 are black boxes.

In 2011 ? looked at 500 papers in biological sciences. Was able to get data in 9% of the cases.

The scientific method:

Deductive: math, formal logic; Empirical (or inductive): largely centered around statistical analysis of controlled experiments. Computational, simulations, data-driven science, might be 3rd and 4th branches. The Fourth Paradigm.

Credibility Crisis: Lots of discussion in journals and pop press about dissemination and reliability of scientific record. 

Ubiquity of Error: central motivation of scientific method – we realize that our reasoning may be flawed, so we want to test it against evidence to get closer to the truth. In deductive branch, we have proofs. In empirical branch, we have the machinery of hypothesis testing. Hundreds of years to come up with standards of reliability and reproducibility. The computational aspect is only a potential new branch, until we develop comparable standards. Jon Claerbout (Stanford): “Really reproducible Research” – an article about computational science is merely the advertisement of the scholarship. The actual scholarship is the set of code and data that generate the article.

Supporting computational science: Dissemination platforms; Workflow tracking and research environments (prepublication); embedded publishing – documents with ways of interacting with code and data. Mostly being put together by academics without reward because they think these are important problems to solve.

Infrastructure design is important and rapidly moving forward.

Research Compendia – a site with dedicated pages which house code and data, so you can download digital scholarly objects. 

2. Driving forces for Infrastructure Development

ICERM Workshop Dec 2012 – reproducibility in computational mathematics. Workshop report that was collaboratively written by attendees. Tries to lay out guidelines for releasing code and data when publishing results. Details about what needs to be described in the publication. 

3. Re-use and re-purposing: Crowd sourcing and evidence-based-***

Reproducible Research is Grassroots.

External drivers: Open science from the White House. OSTP Exec memorandum: federal funding agencies to submit plans within 6 months to say how they will facilitate access to publications and data; in May, an order to federal agencies doing research directing them to make data publicly available. Software is conspicuously absent. Software has a different legal status than data, which makes it harder to cover with a federal mandate – the Bayh-Dole Act allows universities to claim patents on software.

Science policy in congress – how do we fund science and what are the deliverables? Much action around publications.

National Science Board 2011 report on Digital Research Data Sharing and Management

Federal funding agencies have had a long-standing commitment to sharing data and (to a degree) software. NSF grant guidelines expect investigators to share data with other researchers at no more than incremental cost. Also encourages investigators to share software. Largely unenforced. How do you hold people’s feet to the fire when definitions are still very slippery? NIH expects and supports timely release of research data for bigger grants (over $500k). NSF data management plan – looks like it’s trying to put meat on the bones of the grant guidelines.

2003 National Academies report on Data Sharing in the Life Sciences.

Institute of Medicine – report on Evolution of Translational Omics: Lessons Learned and the Path Forward. When people tried to replicate work they couldn’t, and found many mistakes. How did work get approved for clinical trial? New standards were recommended. Recommends standards for locking down software.

Openness in Science: Thinking about infrastructure to support this – not part of our practice as computational scientists. Having some sense of permanence of links to data and code. Standards for sharing data and code so they’re usable by others. Just starting to develop.

Traditional Sources of Error: View of the computer as another possible source of error in the discovery process. 

Reproducibility at Scale – May take specialized hardware and long run times? How do we reproduce that? What do we do with enormous output data?

Stability and Robustness of Results: Are the results stable? If I’m using statistical methods, do they add their own variability to the findings?

The Larger Community – Greater transparency opens scholarship to a much larger group – crowd sourcing and public engagement in science. Currently most engagement is in collecting data, much less in the analysis and discovery. How do we provide context for use of data? How do we ingest and evaluate findings coming from new sources? New legal issues – copyright, patenting, etc. Greater access has possibility of increasing trust and help inform the debates around evidence-based policy making. 

Legal Barriers – making code and data available. Immediately run into copyright. In US there is no copyright on raw facts, per Supreme Court. Original selection and arrangement of facts is copyrightable. Datasets munged creatively could be copyrightable, but it’s murky. Easiest to put data in public domain, or use CC license. Different in Europe. GPL – includes “sharealike”, preventing use of open source code in proprietary ways. Science norms are slightly different – share work openly to fuel further development wherever it happens.

CSG Spring 2014 – Analytics Discussion

ECAR Analytics Maturity Index – could use it to assess which group to partner with to judge feasibility. 

NYU started analytics several years ago and chose certain kinds of data. 

Dave Vernon – Cornell
Hopes and dreams for the Cornell Office of Data Architecture and Analytics (ODAA)
Current state of data usability at Cornell: like a library system with hundreds of libraries, each with unique catalog systems (if any), each requiring esoteric knowledge, each dependent on specialists who don’t talk to each other.

Traditional “BI” – not analytics but report generation. Aging data governance.

ODAA – to support Cornell’s mission by maximizing the value of data resources. Act as a catalyst/focal point to enable access to teaching, research, and admin data. Acknowledge limited resources, but will attempt to maximize value of existing resources.

Rethink governance: success as the norm, restrictions only as needed? Broad campus involvement in data management – “freeing” of structured / unstructured data. Stop arguing over tools: OBIEE vs Tableau, etc. Form user groups – get the analysts talking. 

Service Strategy: Expand Institutional Intelligence initiative: create focused value from a select corpus of admin data (metadata, data provenance, governance, and sustainable funding). Cost recovered reporting and analytics services. User groups, consultants, catalog and promulgate admin and research data resources. 

Resource strategy: What do you put together in this office? Oracle people, reporting people. Re-allocate savings. Add skilled data and analytics professionals. Modest investment in legacy tool refresh. People are getting stuck in discussions of tools.

Measures of Success: ODAA becomes a known and trusted resource. Cultural evolution – open not insular. Data becomes actionable, self-service. Broad campus involvement in data management, “freeing” of data – have to work on data stewards to convince them that they have to make a compelling argument to keep data private. Continued success of legacy services.

At NYU IR owns the data stewardship and governance, but there is a group in a functional unit (not IT) that acts as the front door for data access. Currently just admin data focus, but growing out of that. Two recent challenges – student access to data (pressing governance structure), and learning analytics (people want access to LMS click streams – what about privacy concerns?).

Stanford – IR group (about 15 people) reports to provost and does admin data. A separate group reports to dean of research for research data. Teaching & learning under another VP. Groups specialize, reducing conflict. Data scientists are part of those groups.

Washington spinning up data science studio with research people, IT, library people as a physical space for people to collocate. 

Jim Phelps – can we use the opportunity of replacing ERPs to have the larger discussion about data access and analytics?

Notre Dame halted BI effort to go deeply into a data governance process, and as part of that are getting a handle on all of the sets of data they have. Building a data portal that catalogs reports. More a collection of data definitions than a catalog of data. data.nd.edu – a concept at this point, but moving in that direction. Registry of data – all data must be addressable by URL. Catalog shows existing reports, showing what roles are necessary to get access. Terms used in the data are defined.
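To make the registry idea concrete, here is a minimal sketch of what a catalog entry might hold: a URL as the address, the roles needed for access, and definitions for the terms used. This is an assumption-laden illustration, not Notre Dame's design. The class names, role names, and the example.edu URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One report or data set in the registry (hypothetical structure)."""
    url: str                      # the address of the data; also the registry key
    title: str
    required_roles: frozenset     # roles a user needs to get access
    definitions: dict = field(default_factory=dict)  # term -> plain-language meaning


class DataRegistry:
    """In-memory registry where every entry is addressable by URL."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        if entry.url in self._entries:
            raise ValueError(f"URL already registered: {entry.url}")
        self._entries[entry.url] = entry

    def lookup(self, url):
        return self._entries[url]

    def missing_roles(self, url, user_roles):
        """Return the roles the user still needs before access can be granted."""
        return set(self.lookup(url).required_roles) - set(user_roles)


registry = DataRegistry()
report_url = "https://data.example.edu/reports/enrollment-by-term"  # hypothetical URL
registry.register(
    CatalogEntry(
        url=report_url,
        title="Enrollment by term",
        required_roles=frozenset({"registrar-analyst"}),
        definitions={"enrolled": "Registered for at least one credit in the term."},
    )
)

needed = registry.missing_roles(report_url, {"faculty"})
```

Here `needed` holds the roles a faculty user would still have to request. Keeping the definitions next to the access rules reflects the point in the notes that the portal is closer to a collection of data definitions than a catalog of the data itself.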

Duke – Not hearing demand for this on campus, but getting good information on IT activity using Splunk on data streams. Could get traction by showing competencies in analysis.

At NYU had a new VP for Enrollment Management who had lots of data analysis expertise, who wowed the Board with sophisticated analyses, driving demand for that in other applications. 

Data Science Venn diagram – http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Dave Vernon – There’s an opportunity to ride the big data wave to add value by bringing people together and getting the conversations going and make people feel included. 

How are these teams staffed? Can’t expect to hire someone who knows all the pieces, so you have to have cross-functional teams to bring skills together. Michigan State has a Master’s program in analytics, so bringing some good people in there. Last four hires at Notre Dame have been right off the campus. Now have 8 FTE in BI space. 

