
A Look at Lightning AI

What is Lightning AI (and Where Does Grid.ai Fit With It)?

By Femi Ojo, Senior Support Engineer at Lightning AI

Lightning is a free, modular, distributed, and open-source framework for building Lightning Apps, in which the components you choose to use work together.

How is Lightning AI Different from Grid.ai?

Lightning AI is the evolution of Grid.ai. The Grid platform enables users to scale their ML training workflows and removes the burden of having to maintain, or even think about, cloud infrastructure. Lightning AI takes advantage of a lot of what Grid.ai does well; in fact, Grid.ai is the backend that powers Lightning AI. Lightning AI builds upon Grid.ai by expanding further into the world of MLOps, helping to facilitate the entire end-to-end ML workflow. That is how powerful the framework is. It is a platform for ML practitioners, built by ML practitioners and engineers.

By design, Lightning AI is a minimally opinionated framework: it guards developers against disorganized code while staying flexible enough to build cool and interesting AI applications in a matter of days, depending on complexity. It is truly a product made for engineers and creatives, and built by engineers and creatives.

So, What’s So Great About Lightning AI?

Lightning Apps! Lightning Apps can be built for any AI use case, ranging from AI research to production-ready pipelines (and everything in between!). By abstracting away the engineering boilerplate, Lightning AI allows researchers, data scientists, and software engineers to build highly scalable, production-ready Lightning Apps using the tools and technologies of their choice, regardless of their level of engineering expertise.

The problem today is that the AI ecosystem is fragmented, which makes building AI slower and more expensive than it needs to be. For example, getting a model into production and maintaining it takes hundreds, if not thousands, of hours spent maintaining infrastructure. Lightning AI solves this by providing an intuitive user experience for building, running, sharing, and scaling fully functioning Lightning Apps. A nice consequence is that it now takes only days (not years) to build AI applications.

Here are a few cool things you can do with Lightning Apps:

  1. Integrate with your choice of tools – TensorBoard, Weights & Biases, Optuna, and more!
  2. Train models
  3. Serve models in production
  4. Interact with Apps via a UI
  5. And much more to come as we and the community collaborate to make the Lightning Apps experience one to remember
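
To make this concrete, here is a minimal sketch of what a Lightning App looks like, based on the LightningFlow/LightningWork API in the lightning package at the time of writing; the TrainComponent class and its body are placeholders for illustration, not a real gallery component:

```python
# Minimal Lightning App sketch: a LightningWork does the heavy lifting
# (training, serving, etc.) and a LightningFlow orchestrates it.
import lightning as L


class TrainComponent(L.LightningWork):
    def run(self):
        # Any Python code can live here: train a model, launch a script, ...
        print("Training the model...")


class RootFlow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.train = TrainComponent()

    def run(self):
        # The flow decides when each work component runs.
        self.train.run()


app = L.LightningApp(RootFlow())
```

Saved as app.py, an app like this can be started locally with `lightning run app app.py`, or on the cloud by adding the `--cloud` flag. The split between works (the heavy lifting) and the flow (the orchestration) is what lets the same app code run on a laptop or across cloud machines.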

Lightning Apps Gallery

Along with the concept of Lightning Apps, Lightning introduces the Lightning Gallery. The Gallery is the community’s one-stop shop for a diverse set of curated applications and components. The value of the app and component galleries is endless, limited only by the developer’s imagination. For example, there could be components for:

  1. Model Training
  2. Model Serving
  3. Monitoring
  4. Notification

Using only these four components, a complete MLOps pipeline can be built. For example, an anomaly detection app with the following characteristics could be assembled from them (see the sketch after the list):

  1. Model training component – trains the model
  2. Model serving component – the deployed model detects an anomaly in production
  3. Monitoring component – detects data drift and triggers a model update
  4. Notification component – notifies interested parties of the detected anomaly
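
As a rough illustration of how those four pieces could hang together, the sketch below wires hypothetical stand-in components (not actual gallery components) into a single flow:

```python
# Illustrative only: these component classes are hypothetical stand-ins
# for gallery components, composed by a single LightningFlow.
import lightning as L


class TrainModel(L.LightningWork):
    def run(self):
        print("1. Training the model...")


class ServeModel(L.LightningWork):
    def run(self):
        print("2. Serving the model in production...")


class MonitorModel(L.LightningWork):
    def __init__(self):
        super().__init__()
        self.drift_detected = False  # flipped when drift/anomalies are found

    def run(self):
        print("3. Watching for data drift and anomalies...")


class Notify(L.LightningWork):
    def run(self):
        print("4. Notifying interested parties of the anomaly...")


class AnomalyDetectionPipeline(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.trainer = TrainModel()
        self.server = ServeModel()
        self.monitor = MonitorModel()
        self.notifier = Notify()

    def run(self):
        self.trainer.run()
        self.server.run()
        self.monitor.run()
        if self.monitor.drift_detected:
            # Retrain and alert when the monitor flags drift or an anomaly.
            self.trainer.run()
            self.notifier.run()


app = L.LightningApp(AnomalyDetectionPipeline())
```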

Gallery Examples

Prior to launching Lightning as a product, we recognized the need for some existing apps that would give the community a flavor of what can be created and showcase how easy it is.

Components (Building blocks)

  1. PopenPythonScript and TracerPythonScript – Enable an easy transition from Python scripts to Lightning Apps. See our How to Move a Grid Project to Lightning Apps tutorial for an example.
  2. ServeGradio – Enables quick deployment of an interactive UI component (see the sketch after this list).
  3. ModelInferenceAPI – Enables quick prototyping of model inference.
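
For example, a ServeGradio-based demo might look roughly like the following; the import path and the inputs/outputs/build_model/predict hooks follow the Lightning App documentation at the time of writing, and the sentiment “model” is just a placeholder:

```python
# Rough sketch of the ServeGradio component: subclass it, declare the
# Gradio widgets for the UI, and implement build_model() and predict().
import gradio as gr
from lightning.app.components.serve import ServeGradio


class SentimentDemo(ServeGradio):
    # Gradio widgets describing the interactive UI.
    inputs = gr.Textbox(label="Input text")
    outputs = gr.Label(label="Prediction")

    def build_model(self):
        # Load or construct the model to serve (trivial placeholder here).
        return lambda text: {"positive": 0.5, "negative": 0.5}

    def predict(self, text):
        # Assumption: ServeGradio exposes the object returned by
        # build_model() as self.model.
        return self.model(text)
```

This component would then be added to a flow, just like any other LightningWork, and surfaced in the app’s UI.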

Applications (Built from the blocks)

  1. Train & Demo PyTorch Lightning – This app trains a model using PyTorch Lightning and deploys it to be used for interactive demo purposes. This is not meant to be a real-time inference deployment app. There are other apps with real-time inference components that can be used to achieve < 1 ms inference times. This is a great app to use as a starting point for building a more complex app around a PyTorch Lightning model.
  2. Lightning Notebook – Use this app to run multiple Jupyter Notebooks on cloud CPUs, or even on machines with multiple GPUs. Notebooks are great for analysis, prototyping, or any time you need a scratchpad to try new ideas. Pause these notebooks to save money on cloud machines when you’re done with your work.
  3. Lightning Sweeper (HPO) – Train hundreds of models across hundreds of cloud CPUs/GPUs, using advanced hyperparameter tuning strategies.
  4. Collaborative Training – This app showcases how you can train a model across machines spread over the internet. This is useful when you have a mixed set of devices (different types of GPUs) or machines spread across the internet without specialized interconnect between them. Via the UI you can start your own training run or join others via a link! The app handles connecting to, updating, and monitoring your training job through the UI.

Convert Modeling Code into Apps

Converting your code to a Lightning App is a seamless task today with some of our convenience components. Both the PopenPythonScript and TracerPythonScript components are near-zero-code-change solutions for converting Python code to Lightning Apps! See here for all the code changes required to convert this official PyTorch Lightning image classification model to a Lightning App with the TracerPythonScript component.
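
As a rough sketch (assuming the TracerPythonScript import path and arguments documented at the time of writing; the script name and its arguments below are placeholders), wrapping an existing training script looks something like this:

```python
# Minimal sketch of wrapping an existing training script with
# TracerPythonScript; the script itself needs little to no modification.
import lightning as L
from lightning.app.components.python import TracerPythonScript


class RootFlow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        # Point the component at your existing script and its CLI args.
        self.script_runner = TracerPythonScript(
            script_path="train.py",
            script_args=["--max_epochs", "3"],
        )

    def run(self):
        self.script_runner.run()


app = L.LightningApp(RootFlow())
```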

Future Roadmap

A great future lies ahead for Lightning AI, and we are excited to share with you a list of enhancements to come! Over the next few months to a year, we plan to enable the following:

  1. Allow users to locally run multiple apps in tandem – This will be great for users that have multiple Lightning Apps they are testing and developing.
  2. Multi-tenancy apps – This will allow the community to deploy Lightning Apps to their favorite cloud provider.
  3. Built-in app user authentication – It is important for the community to be able to protect access to their Lightning Apps. This is a step in the right direction toward making Lightning Apps more secure.
  4. Hot-reload
  5. Pause and resume Lightning App – Some users will be using Lightning Apps that host an interactive environment like Jupyter Notebooks. For such users we want to remove all pain points related to losing work and being billed when a machine is idle.

We welcome the community to build Lightning Apps that meet their needs and share them in our open-source gallery ecosystem!

Get started today. We look forward to seeing what you #BuildWithLightning!

What is Grid?

The Grid platform enables users to quickly iterate through the model development life cycle by managing the provisioning of machine learning infrastructure on the cloud.

Barriers to Machine Learning Adoption

Machine learning (ML) is a complicated field, especially when you are trying to figure out how to build models around your data. The challenges include:

  • making sure you have access to the infrastructure to test and train your models (and the budget to pay for it),
  • viewing logs to determine why a model is failing,
  • sharing and collaborating with other team members to push your project forward,
  • and moving models into production and then using them in the real world.

There are so many places in your pipeline where everything can fall apart. In fact, there are polls that show 80% of all machine learning models never get deployed!


So, where does Grid fit in?

Your Machine Learning Toolbox

Grid fills the holes within your ML infrastructure.

Many of our data scientist customers don’t have the time or resources to hire and train additional MLOps team members, and they also lack access to critical infrastructure. They are generally building on their laptops or desktops, but those systems can’t handle extensive training.

To tackle these challenges, they look to the Grid platform to provide a solution that allows them to rapidly prototype and train models, and go to production quickly in order to drive innovative research.


One of Grid’s defining features is that there are no code changes needed when training! This is a huge benefit for data scientists who want to take a model and just push it to the cloud. 

Grid makes this process simple: it provides a wide variety of infrastructure options, including CPU and GPU instances, to tackle any workload. Datastores make data available from one shared location, and interactive Grid Sessions give users a full development dashboard in a single UI.

Grid works, it scales, and it meets the needs of data scientists, researchers, engineers, and other users focused on bringing machine learning projects to life.

Grid’s Mission

Grid’s goal is to eliminate the burden of managing infrastructure so users can rapidly prototype and train models, reducing the time spent in the model development lifecycle. Grid meets this goal by maintaining three core pillars:

  1. Community engagement. We’re always working to build products that solve your problems, and we always encourage users to get in touch with feedback.
  2. Research and development. We maintain our commitment to excellence by leveraging state-of-the-art tools and industry knowledge to make products that empower our users.
  3. Constant innovation. We build products that are useful both now and in the long term. By vigilantly evaluating customer needs and user experience, Grid delivers innovative features critical to the future of the industry.

Grid Features

Grid’s core features are:

  1. Datastores. Data storage that is shareable between teams and mountable to both Runs and Sessions.
  2. Runs. Transient jobs that run your Python, Julia, or R code and store the resulting artifacts for download.
  3. Sessions. Interactive Jupyter notebook environments capable of running Python, Julia, and R code. This feature is designed for iteration and prototyping with the ability to pause without losing any work.
  4. Artifact management. Grid offers the ability to manage and download the artifacts created from model training.

As a platform, Grid enables users to scale their workflow and facilitates faster model training. 

We do this by being:

  1. Bigger. Train and scale models using Grid Runs, Sessions, and a variety of cloud-based infrastructure.
  2. Faster. Quickly develop models with one-click access to scalable compute and the ability to run parallel hyperparameter search.
  3. Easier. You don’t need to remember thousands of commands from different environments in order to train a model, generate Artifacts, and increase replicability.

“The intangible value of Grid: I am a happier data scientist because I get to focus on the stuff that I love to work on, and ultimately the reasons that they hired me, which is to research and develop models and translate real world problems in podtech into machine learning products. I think it presents a justification for machine learning engineers and data scientists to focus on what we were hired to do, rather than spinning our wheels on infrastructure.” Chase Bosworth, Machine Learning Engineering Manager (Spotify x Podsights)


So, that’s it in a nutshell! Interested in learning more about how Grid can help you manage machine learning model development for your next project? Get started with Grid’s free community tier account (and get $25 in free credits!) by clicking here. Also, explore our documentation and join the Slack community to learn more about what the Grid platform can do for you.

Happy Grid-ing!

Tensorwerk Announcement

A Letter From Luca Antiga, Co-Founder and CEO of Tensorwerk

Since its inception, Tensorwerk has been about people. First and foremost, people like Rick Izzo and Sherin Thomas, who were there with me materializing our vision of accessible, democratic, data-driven software.

It has also been about the many people with whom we’ve collaborated along the way, exploring synergies (as corporate jargon would put it), or excitedly sharing each other’s ideas (as a regular human would). This group includes the team at Grid. When we met Will and Luis, it became immediately clear that we shared a common vision for the future of machine learning and software development at large, one that was simultaneously more ambitious than anything attempted previously and also firmly grounded in a desire to make technology more accessible, easy to use, and intuitive than it had ever been. Out of our mutual desire to solve big challenges in machine learning like abstraction and composition grew a determination to empower developers with the next generation of deep learning products.

Finally, and perhaps most importantly, it has been about the expansive and constantly growing network of people who use, think about, innovate, develop, and scale up machine learning technology on a daily basis. By working with Grid and the PyTorch Lightning community, our goal is to serve that community more effectively and more profoundly than ever before. We hope, for example, to apply the expertise we’ve developed to make enterprise-level model serving accessible not only to large enterprises, but to the broader machine learning community as well. As deep learning becomes an increasingly integral aspect of software development, and as that development moves towards a combination of prescriptive and data-driven software, we have sought to build out a paradigm that enables the people who will be developing within it to do amazing things.

When I think about how Tensorwerk fits into Grid, I think about the people at the heart of these overlapping, ambitious, and exciting endeavors. Not just the inspiring people with whom I work every single day, but also the developers from across the world who use our tooling to do an array of outstanding things, and the people we have yet to bring into the fold.

We’ve got plenty of challenges ahead of us, and I’m excited to approach them together.

Luca Antiga
Co-Founder, CEO

⚡️ Read the press release here

