Introducing The AnVIL Data Explorer (Beta Release)

We are excited to introduce users of the AnVIL Portal to the AnVIL Data Explorer, a faceted search tool that allows you to create cohorts across datasets based on your sample-level needs.

What is the AnVIL Data Explorer?

The AnVIL Data Explorer streamlines your data-gathering process: you can build custom cohorts that contain exactly the data you want to work with, which keeps your workspaces efficient and organized.

With the AnVIL Data Explorer, you’ll be able to sort the datasets you’re browsing based on the following managed access categories:

  • Datasets
  • Donors
  • BioSamples
  • Activities
  • Files

What are the Best Practices for working with NIH data?

When working with NIH data, particularly managed access data, users must follow their data use agreements. Terra users working with NIH data agree to our Terms of Service, and we request that users leverage extra security features when working with NIH data.

How do I use the AnVIL Data Explorer?

Below, you’ll find step-by-step instructions for navigating the AnVIL Data Explorer and exporting data to Terra.

Step 1: Finding the AnVIL Data Explorer

To find the AnVIL Data Explorer, go to the AnVIL Portal and click the Datasets button:

Data Explorer

Step 2: Authentication

Ensure you have completed the necessary Terra registration steps.

Note for AnVIL Data Custodians: In many cases, AnVIL users who were part of data-generating consortia will be granted access to workspaces that are already configured to receive data. However, if you intend to add data to workspaces of your own, you may need to configure your own Terra Billing Project. For detailed instructions on setting up Billing Projects, please refer to our article “How to set up billing in Terra.”

Step 3: Selecting Data

Searching and Filtering Data

You can sort and filter studies based on a wide range of facets, from sample- and donor-level information to any of the fields in the editable column view.

Search and Filter

The facets appear in the left-hand column of the screen above. If you click on any of these facets, you can see a more detailed view of the studies that match the filters you’ve chosen.

Select Filter

When you select multiple facets, only data matching all selected facets is displayed (e.g. filter by Anatomical Site AND BioSample Type). When you select multiple values within a facet, data matching any of the facet values is displayed (e.g. selecting both Blood and Tissue above will list studies that include Blood OR Tissue samples).
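
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the logic (AND across facets, OR within a facet), not the Data Explorer's actual implementation; the record layout, the facet names, and the `matches` helper are all hypothetical.

```python
def matches(record, selected):
    """Return True if `record` satisfies every selected facet.

    `selected` maps a facet name to the set of values chosen for it.
    A facet is satisfied when the record carries ANY chosen value (OR);
    the record must satisfy ALL facets that have a selection (AND).
    """
    for facet, wanted in selected.items():
        if not wanted:
            continue  # a facet with nothing selected does not filter
        have = set(record.get(facet, ()))
        if have.isdisjoint(wanted):
            return False
    return True


# Hypothetical studies and facet values, for illustration only.
studies = [
    {"name": "study_a", "biosample_type": ["Blood"], "anatomical_site": ["Arm"]},
    {"name": "study_b", "biosample_type": ["Tissue"], "anatomical_site": ["Liver"]},
    {"name": "study_c", "biosample_type": ["Saliva"], "anatomical_site": ["Arm"]},
    {"name": "study_d", "biosample_type": ["Blood", "Tissue"], "anatomical_site": ["Arm"]},
]

selected = {
    "biosample_type": {"Blood", "Tissue"},  # Blood OR Tissue
    "anatomical_site": {"Arm"},             # AND anatomical site is Arm
}

filtered = [s["name"] for s in studies if matches(s, selected)]
print(filtered)
```

In this sketch, adding a value inside a facet can only widen the result, while adding a selection in another facet can only narrow it.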

Exploring Datasets

When you click on a dataset, you’ll be taken to a summary page where you can find a variety of information and helpful links, including but not limited to:

  • What consortium the dataset is associated with
  • The quantity and types of data
  • Links to APIs for accessing the data programmatically
  • Links to request access
  • A button for exporting the dataset to a Terra workspace

Step 4: Exporting Data

Exporting from The AnVIL Data Explorer

Once you’re ready to export the data, you can either click Export to Terra from within a particular dataset, or select one or more datasets from the "Datasets" filter on the AnVIL Data Explorer’s main page and click the Export button at the top right of your screen.

Exporting with Terra

Clicking this button will take you to a window where you can export to a Terra workspace through the user interface.

Choose Export Method

Once you select Analyze in Terra, you’ll see a button labeled “Request Link”.

Note: For a full export, be sure to select all species and file types.

Export to Terra Process

After you click this button, you will be prompted to wait while the system generates a link to Terra. Once this link is ready, you’ll see a page with a button labeled “Open Terra”. Clicking this button will take you to a workspace selection screen in Terra, where you’ll be able to select the workspace to which you’d like to add this data.

Export to Terra Process Complete

Working with the Data in Terra

Until recently, AnVIL data has been hosted and shared from multiple Terra workspaces, making it hard to generate cohorts across different studies. To resolve this, we created the AnVIL Data Explorer, which lets you create custom cohorts and then hand them off to your own Terra workspaces.

Depending on the AnVIL dataset or study, the data have varying schemas (different columns and structure). To ingest all of the AnVIL datasets, the Broad's Data Sciences Platform created a common subset schema across all AnVIL datasets. When you use the AnVIL Data Explorer, it searches through a specialized subset, called the Findability Subset (FSS), that contains only the attributes most commonly used by researchers across a broad range of study data types and for diverse analyses.
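
As a rough illustration of what a common subset schema means, the sketch below projects rows from two differently shaped studies onto one shared set of attributes. The attribute names, the sample rows, and the `to_common_subset` helper are hypothetical; this article does not list the real Findability Subset fields.

```python
# Hypothetical shared attributes; the real subset schema is not listed here.
COMMON_ATTRIBUTES = ("donor_id", "biosample_type", "file_format")


def to_common_subset(row, attributes=COMMON_ATTRIBUTES):
    """Keep only the shared attributes; attributes a study lacks become None."""
    return {name: row.get(name) for name in attributes}


# Two studies with different columns (made-up values).
study_one_row = {
    "donor_id": "D1",
    "biosample_type": "Blood",
    "file_format": "cram",
    "sequencer": "X",          # study-specific column, dropped
}
study_two_row = {
    "donor_id": "D2",
    "file_format": "vcf",
    "phenotype_code": "P9",    # study-specific column, dropped
}

searchable = [to_common_subset(r) for r in (study_one_row, study_two_row)]
for row in searchable:
    print(row)
```

Once every row has the same columns, a single set of facets can be applied across studies, which is the property the faceted search relies on.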

Working with NIH Data in Terra

When working with NIH data in Terra, we require users to import data to workspaces with the checkbox for protected data marked. Optionally, an Authorization Domain may be applied and is highly recommended if working with controlled access data.

In the data handoff from the AnVIL Data Explorer, you will transition to the Terra import screen. When importing from an NIH repository, you will see on the left-hand side the message “The data you chose to import to Terra are identified as protected and require additional security settings. Please select a workspace that has an Authorization Domain and/or protected data setting.”

Selecting Workspace

Next, you’ll see a workspace selection screen where you can either choose an existing workspace or create a new workspace to receive the data.

If you choose to “start with an existing workspace”, you can only select workspaces for which you have write access and that have the required security settings. If you cannot select a workspace of interest (it is grayed out), it means this workspace is non-compliant and you should create a new one with the required security settings.

Start with Existing Workspace
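
The rule for which existing workspaces can be selected might be modeled as below. This is a sketch under stated assumptions, not Terra's API: the `Workspace` class and its fields are hypothetical, and the check assumes, following the import-screen message quoted earlier, that either the protected data setting or an Authorization Domain satisfies the security requirement.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    """Hypothetical stand-in for a Terra workspace (not a real Terra type)."""
    name: str
    can_write: bool
    protected_data: bool = False
    has_auth_domain: bool = False


def selectable_for_protected_import(ws):
    """True if the workspace would not be grayed out for a protected import.

    Assumption: write access is required, plus the protected data setting
    and/or an Authorization Domain.
    """
    meets_security = ws.protected_data or ws.has_auth_domain
    return ws.can_write and meets_security


# Made-up workspaces for illustration.
workspaces = [
    Workspace("cohort-analysis", can_write=True, protected_data=True),
    Workspace("scratch", can_write=True),
    Workspace("shared-readonly", can_write=False, protected_data=True),
    Workspace("consortium", can_write=True, has_auth_domain=True),
]

selectable = [ws.name for ws in workspaces if selectable_for_protected_import(ws)]
print(selectable)
```

A workspace that fails this check corresponds to the grayed-out case described above, where creating a new workspace with the required settings is the way forward.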

If you choose to “start with a new workspace”, you will see that the import was recognized as coming from an NIH repository and the protected data checkbox is added. You can optionally add an Authorization Domain.

Create New Workspace

Once you’ve completed this step, your workspace will spin up, and you can go to the Data tab of your workspace to see the set of tables that have been successfully imported.
