pydelfini Client¶
The pydelfini
module offers a set of high-level interfaces,
helper classes, and methods for interacting with a Delfini instance. It
has been designed specifically for ease-of-use, performance, and
familiarity relative to other modules in the ecosystem.
The features of PyDelfini currently include:
Browsing, locating, and creating collections
Browsing items within collections
Reading and writing item contents as binary or text file-like streams
Reading and writing tabular item contents as Pandas DataFrames
Future features would likely include:
Searching for data
Reading and updating data elements and item column definitions
Creating, previewing, and updating dataviews
Updating permissions and requesting access to data
Logging in¶
Most operations in PyDelfini require a logged-in session with a Delfini
instance. Logging in is simple and does not require providing
credentials through your script or notebook – instead, the
login()
method generates a one-time URL
which will take you to your Delfini instance to activate the session.
from pydelfini import login
client = login('delfini.bioteam.net') # Your Delfini hostname
The typical output looks like this:
To activate your session, visit the URL below:
https://delfini.bioteam.net/login/activate/fd8wefnef....
Waiting for session activation...
At this point, visit the provided URL, log in if necessary, and click
to approve the session activation. The
login()
method will return with an instance
of DelfiniClient
and you can continue.
If you are working with a long-running script or some other use case
that does not allow for interactive login, you will need to establish
a logged-in client using the mid-level
pydelfini.delfini_core.login.Login
routines, and pass the
resulting AuthenticatedClient
to the constructor of DelfiniClient
.
In the future, we plan to add support for unauthenticated, read-only interactions with a Delfini instance.
General operations¶
Once logged in, your DelfiniClient
interface allows you to perform basic operations:
Get a single collection with
get_collection_by_name()
collection = client.get_collection_by_name('MHSVI')
List all collections with
all_collections()
for collection in client.all_collections(): print(collection.name)
Create a new collection with
new_collection()
collection = client.new_collection('Demo 1', 'A simple demo')
Each of the methods above returns an instance of (or an iterator over)
DelfiniCollection
.
Items and Folders¶
The DelfiniCollection
interface
offers a range of methods for working with items and folders.
Retrieving an existing item is as simple as specifying its name as a key on the collection or folder object:
item = collection['item-name.txt']
Navigating folders can be done by using the
folder()
method, or using
the key-based method mentioned earlier:
folder = collection.folder('folder-name')
# or
folder = collection['folder-name']
Nested folders can be navigated either with chained key lookups or with slashes in the requested key:
subfolder = collection['folder-name/subfolder']
item = collection['folder-name/subfolder/item-name.csv']
Getting a list of items in a collection or folder can be done by iterating through the object:
items = list(collection)
# or
for item in collection['folder-name']:
print(item.name)
A recursive listing can be done on collections or folders using
walk()
:
# will print the full path to every item in the collection
for item in collection.walk():
print(item.path)
Reading and Writing Items¶
An item can be opened to a file-like object using the open()
method on either the collection or the item itself:
# these are equivalent:
with collection.open('folder/item.txt', 'r') as fp:
stuff = fp.read()
with collection['folder/item.txt'].open('r') as fp:
stuff = fp.read()
While both methods allow for reading and writing to existing items, only the first method is supported for creating a new item:
# this works:
with collection.open('a-new-item.txt', 'w') as fp:
fp.write('my new item contents')
# this will return an "item not found" error:
with collection['a-new-item.txt'].open('w') as fp:
fp.write('my new item contents')
When reading or writing to items, it is very important to either use the resulting stream in a context manager, or else close the stream as soon as your code is done with the read or write operation. If this is neglected, the read or write may not complete fully. This is particularly an issue with writing large items, as failure to close the stream can result in incomplete writes and/or corruption.
# recommended:
with collection.open('a-large-item', 'wb') as fp:
fp.write(large_item_contents)
# also ok, but don't forget the close:
fp = collection.open('a-large-item', 'wb')
fp.write(large_item_contents)
fp.close()
# DON'T DO THIS
fp = collection.open('a-large-item', 'wb')
fp.write(large_item_contents)
# missed the close! Danger!
The collection-level
open()
method also
allows you to set key item metadata, such as content type, parser, and
column definitions:
with collection.open(
'data.csv', 'w',
parser='csv',
metadata={'content-type': 'text/csv'},
) as fp:
fp.write('A,B,C\n1,1,1\n2,4,8\n3,9,27\n')
Tables and Dataframes¶
An item that can be parsed as a table (those items that have their
parser
attribute set) can be retrieved as a
pandas.DataFrame
from the collection, folder, or the item
level:
# these are equivalent
df = collection.get_table('folder/item.csv')
df = collection['folder/item.csv'].table()
The dataframe will have the appropriate columns and column types as defined in the source item. Note that retrieval of large item tables may take time to complete.
Writing a dataframe to a collection can be done at the collection or
folder level using
write_table()
:
collection['folder'].write_table('new-item.csv', p_df, format='csv')
Currently, CSV and Parquet output formats are supported.
Documentation¶
Detailed documentation for the PyDelfini client interface can be found
in the pydelfini
API documentation section.
Login and general Delfini operations |
|
Interactions with collections, items, and data tables |
|
File-like (stream) read and write on items |