At the heart of any data science project is a Jupyter notebook. Datalore gives you smart tools to work with Jupyter notebooks. Rely on its smart coding assistance for Python, SQL, R, Scala, and Kotlin to move faster and write higher-quality code with less effort. The Datalore editor gives you quick access to all the essential tools, including attached data sources, automatic visualizations, dataset statistics, report builder, environment manager, versioning, and more.
Watch the video below to see how easy it is to create Jupyter notebooks in Datalore:
Notebooks in Datalore are Jupyter compatible, meaning you can upload your existing IPYNB files and continue working with them in Datalore. Furthermore, you can also export notebooks as IPYNB files. Note that data connections and interactive controls won’t be exported.
Datalore comes bundled with code insight features from PyCharm. For Python notebooks you get first-class code completion, parameter info, inspections, quick-fixes, and refactorings that help you write higher-quality code with less effort.
Get documentation pop-ups for any method, function, package, or class. Datalore will show you the documentation right where you need it.
Datalore supports both pip and conda. Pip is fast and free for everyone, whereas conda is free for non-commercial use only.
In Datalore you can create Kotlin, Scala, and R notebooks. You can use magics to install packages, and when writing code you’ll get code completion.
Add native SQL cells to query your database connections. In addition to SQL syntax-highlighting, you also get code completion based on the introspected database tables. The query result is automatically transferred to a pandas DataFrame and you can continue working on the dataset in Python.
Datalore comes with an integrated package manager that makes your environment reproducible. The package manager allows you to install and manage new packages and makes sure they persist when you reopen the notebook.
Create multiple base environments from custom Docker images. You can pre-configure all the dependencies, package versions, and build tool configurations so that your team doesn’t spend time manually installing things and syncing the package versions.
Install a custom pip-compatible package from a Git repository by attaching a Git branch to your notebook.
Configure a script that you want to run before the notebook starts. Here you can specify all the necessary build tools and required dependencies.
Get automatic visualization options inside the Visualize tab for any pandas DataFrame. Point, Line, Bar, Area, and Correlation plots will help you quickly understand the contents of your data. If the dataset is big it will be automatically sampled. All plots can be then extracted to code or Chart cells for further customization.
Create visualizations with the package of your choice. Matplotlib, plotly, altair, seaborn, lets-plot, and many other packages are supported in Datalore notebooks.
Create production-ready visualizations with just a few clicks. The state of the cells is shared with collaborators so you can work on the visualization together.
Apply filtering and sorting to Pandas DataFrames and SQL query results directly in the cell output. Select the columns to display, sort the dataset by a specific column, filter based on “equals” and “contains” expressions, and easily jump to the top or bottom of the dataset. After you complete the filtering and sorting, use the Export to code cell option to generate the Pandas code snippet and make the table view reproducible.
With one click, you can get essential descriptive statistics for a DataFrame inside a separate Statistics tab. For categorical columns, you will see the distribution of values, and for numerical columns, Datalore will calculate the min, max, median, standard deviation, percentiles, and also highlight the percentage of zeros and outliers.
Add interactive inputs such as dropdowns, sliders, text inputs, and date cells inside your notebooks, and use the input values as variables inside your code. Present the visualizations with Chart cells and highlight specific numbers within Metric cells.
Datalore supports IPyWidgets, the classic Jupyter widgets framework. Add interactive controls with Python code, combine multiple widgets in one cell output, and use the selection as a variable in the following parts of your notebook.
Open CSV and TSV files from the Attached data tab in a separate tab inside Datalore’s editor. Sort the column values and paginate the file contents.
Create and edit CSV and TSV files right inside the Datalore editor. You can start from scratch and create a new file or you can edit the contents of an existing one.
Open Terminal windows inside the editor, execute .py scripts, and access the agent, environment, and file system using standard bash commands.
Browse notebook variables and built-in parameter values from one place.
Create custom history checkpoints that enable you to revert the changes at any time using the history tool. When browsing a checkpoint, you will see the difference between the current version of the notebook and the one that is selected.
In Datalore, you can execute your notebooks on CPUs and GPUs. You can choose the machine you need from the UI. The type and amount of resources you get depends on which plan you have. Find more information here.
You can connect any type of server hardware you are already using to Datalore and make it accessible to your users from Datalore’s interface.
Reactive mode will enforce top-down evaluation order and automatic recalculation of the cells below the modified one. The notebook state will be saved after each cell evaluation and can be restored at any time.
Switch on Background computation to keep your notebook running even when you close the browser tab. You will always have access to your running computations list from the User menu or the Admin panel.
Download CSV reports containing the amount of time you spent running each machine – it might help you understand which projects you’ve paid most attention to.
Use scheduling to run your notebooks on an hourly, daily, weekly, or monthly basis and deliver regular updates to published reports. Choose the schedule parameters from the user interface or use the CRON string. Notify notebook collaborators upon successful or failed runs via email.
It is now possible to trigger running a Datalore notebook or republishing a report with an API call. This feature is an addition to scheduled runs, which allow you to trigger rerunning a notebook on-demand from external apps and internal Datalore notebooks. It is also possible to view the run results from the Scheduled run menu. More information on how to use the API can be found here.
For R notebooks, you can now install packages from public and private R package
repositories supported by
install.packages inside the
Environment manager tab. Using Environment manager helps keep your
environment configuration persistent throughout the notebook runs. It's possible
to configure a custom repo by creating an
.Rprofile file in init.sh
or a custom agent image.
While the conda installation remains the default in the cloud version, Enterprise customers can configure a non-conda custom base environment with the R kernel. This will lead to an absence of conda packages in the Environment manager search results. You can find an example of such an installation here.
Locate specific code sections or find the necessary information in all notebooks across your workspaces. In addition to searching for notebook names, you can now search for variable names and content. You will see your query highlighted in the search results.