🕷️ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 120 (from laksa148)
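The query and response bodies are elided above, so the actual shard function isn't visible here. A common pattern for this kind of lookup is a stable hash of the hostname modulo the shard count; the sketch below is purely hypothetical (both `NUM_SHARDS` and `shard_for_host` are illustrative names, not the real implementation):

```python
import hashlib

NUM_SHARDS = 256  # hypothetical shard count; the real value isn't shown above


def shard_for_host(host: str) -> int:
    """Map a hostname to a shard via a stable hash (illustrative only)."""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # Take the first 4 bytes as an unsigned int, then reduce modulo shard count.
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


print(shard_for_host("docs.ray.io"))
```

The key property such a scheme needs is determinism: the same host always maps to the same shard, so every crawler node agrees on ownership without coordination.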

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled
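The five lookups above run as a short pipeline, and the seen-status check is only meaningful for pages that haven't been crawled yet, which is why it's skipped here. A minimal sketch of that control flow, with a stub backend standing in for the real query services (all names here are hypothetical):

```python
class StubBackend:
    """Stand-in for the real lookup services (illustrative only)."""

    def shard_for(self, url):
        return 120

    def is_crawled(self, url):
        return True  # the crawled-status check above reported CRAWLED

    def robots_allows(self, url):
        return True  # robots.txt check passed

    def is_spam_or_banned(self, url):
        return False  # spam/ban check passed

    def was_seen(self, url):
        return True


def inspect(url, backend):
    """Run the inspector's checks in order (illustrative only)."""
    report = {
        "shard": backend.shard_for(url),
        "crawled": backend.is_crawled(url),
        "robots_allowed": backend.robots_allows(url),
        "spam_or_banned": backend.is_spam_or_banned(url),
    }
    # The seen-status lookup is redundant for already-crawled pages.
    report["seen"] = "skipped" if report["crawled"] else backend.was_seen(url)
    return report


report = inspect(
    "https://docs.ray.io/en/latest/train/getting-started-pytorch.html",
    StubBackend(),
)
print(report)
```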

📄
INDEXABLE
✅
CRAWLED
21 days ago
🤖
ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.7 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
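Taken together, the five filters amount to a single conjunction over the page record. A minimal in-process sketch, using the field names from the condition column (the real checks run as backend queries, and the 6-month window is approximated as 180 days here):

```python
from datetime import datetime, timedelta


def is_indexable(page: dict, now: datetime) -> bool:
    """AND of the five page-info filters (illustrative sketch only)."""
    return (
        page["download_http_code"] == 200
        and page["download_stamp"] > now - timedelta(days=180)  # ~6 months
        and page["history_drop_reason"] is None
        and page["fh_dont_index"] != 1
        and page["ml_spam_score"] == 0
        and page["meta_canonical"] in (None, "", page["src_unparsed"])
    )


# Values taken from the Page Details section of this report.
page = {
    "download_http_code": 200,
    "download_stamp": datetime(2026, 3, 20, 6, 29, 18),
    "history_drop_reason": None,
    "fh_dont_index": 0,
    "ml_spam_score": 0,
    "meta_canonical": None,
    "src_unparsed": "https://docs.ray.io/en/latest/train/getting-started-pytorch.html",
}
print(is_indexable(page, now=datetime(2026, 4, 10)))  # → True
```

Any single failing condition flips the verdict, which is why each filter is reported with its own PASS/FAIL status.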

Page Details

| Property | Value |
|---|---|
| URL | https://docs.ray.io/en/latest/train/getting-started-pytorch.html |
| Last Crawled | 2026-03-20 06:29:18 (21 days ago) |
| First Indexed | 2023-09-18 17:10:38 (2 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Get Started with Distributed Training using PyTorch — Ray 2.54.0 |
| Meta Description | null |
| Meta Canonical | null |
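The relative ages shown in this report ("21 days ago", "0.7 months ago") follow from the Last Crawled timestamp. A minimal sketch, assuming an inspection time of 2026-04-10 (consistent with the 21-day figure; the actual lookup time isn't recorded in the report):

```python
from datetime import datetime

last_crawled = datetime(2026, 3, 20, 6, 29, 18)
inspected_at = datetime(2026, 4, 10, 6, 29, 18)  # assumed inspection time

age_days = (inspected_at - last_crawled).days
age_months = age_days / 30.44  # mean Gregorian month length

print(f"{age_days} days ago (~{age_months:.1f} months)")  # → 21 days ago (~0.7 months)
```

With a 6-month age cutoff, ~0.7 months comfortably passes the freshness filter.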
Boilerpipe Text
This tutorial walks through the process of converting an existing PyTorch script to use Ray Train. Learn how to:

- Configure a model to run distributed and on the correct CPU/GPU device.
- Configure a dataloader to shard data across the workers and place data on the correct CPU or GPU device.
- Configure a training function to report metrics and save checkpoints.
- Configure scaling and CPU or GPU resource requirements for a training job.
- Launch a distributed training job with a TorchTrainer class.

Quickstart

For reference, the final code will look something like the following:

```python
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func():
    # Your PyTorch training code here.
    ...

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
```

- train_func is the Python code that executes on each distributed training worker.
- ScalingConfig defines the number of distributed training workers and whether to use GPUs.
- TorchTrainer launches the distributed training job.

Compare a PyTorch training script with and without Ray Train.

PyTorch + Ray Train

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

import ray.train.torch

def train_func():
    # Model, Loss, Optimizer
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    # [1] Prepare model.
    model = ray.train.torch.prepare_model(model)
    # model.to("cuda")  # This is done by `prepare_model`
    criterion = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.001)

    # Data
    transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
    data_dir = os.path.join(tempfile.gettempdir(), "data")
    train_data = FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
    # [2] Prepare dataloader.
    train_loader = ray.train.torch.prepare_data_loader(train_loader)

    # Training
    for epoch in range(10):
        if ray.train.get_context().get_world_size() > 1:
            train_loader.sampler.set_epoch(epoch)

        for images, labels in train_loader:
            # This is done by `prepare_data_loader`!
            # images, labels = images.to("cuda"), labels.to("cuda")
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # [3] Report metrics and checkpoint.
        metrics = {"loss": loss.item(), "epoch": epoch}
        with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
            torch.save(
                model.module.state_dict(),
                os.path.join(temp_checkpoint_dir, "model.pt")
            )
            ray.train.report(
                metrics,
                checkpoint=ray.train.Checkpoint.from_directory(temp_checkpoint_dir),
            )
        if ray.train.get_context().get_world_rank() == 0:
            print(metrics)

# [4] Configure scaling and resource requirements.
scaling_config = ray.train.ScalingConfig(num_workers=2, use_gpu=True)

# [5] Launch distributed training job.
trainer = ray.train.torch.TorchTrainer(
    train_func,
    scaling_config=scaling_config,
    # [5a] If running in a multi-node cluster, this is where you
    # should configure the run's persistent storage that is accessible
    # across all worker nodes.
    # run_config=ray.train.RunConfig(storage_path="s3://..."),
)
result = trainer.fit()

# [6] Load the trained model.
with result.checkpoint.as_directory() as checkpoint_dir:
    model_state_dict = torch.load(os.path.join(checkpoint_dir, "model.pt"))
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    model.load_state_dict(model_state_dict)
```

PyTorch

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

# Model, Loss, Optimizer
model = resnet18(num_classes=10)
model.conv1 = torch.nn.Conv2d(
    1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
)
model.to("cuda")
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)

# Data
transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
train_data = FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=128, shuffle=True)

# Training
for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to("cuda"), labels.to("cuda")
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    metrics = {"loss": loss.item(), "epoch": epoch}
    checkpoint_dir = tempfile.mkdtemp()
    checkpoint_path = os.path.join(checkpoint_dir, "model.pt")
    torch.save(model.state_dict(), checkpoint_path)
    print(metrics)
```

Set up a training function

First, update your training code to support distributed training. Begin by wrapping your code in a training function:

```python
def train_func():
    # Your model training code here.
    ...
```

Each distributed training worker executes this function.

You can also specify the input argument for train_func as a dictionary via the Trainer's train_loop_config. For example:

```python
def train_func(config):
    lr = config["lr"]
    num_epochs = config["num_epochs"]

config = {"lr": 1e-4, "num_epochs": 10}
trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

Warning: Avoid passing large data objects through train_loop_config to reduce the serialization and deserialization overhead. Instead, it's preferred to initialize large objects (e.g. datasets, models) directly in train_func.

```diff
 def load_dataset():
     # Return a large in-memory dataset
     ...

 def load_model():
     # Return a large in-memory model instance
     ...

-config = {"data": load_dataset(), "model": load_model()}

 def train_func(config):
-    data = config["data"]
-    model = config["model"]
+    data = load_dataset()
+    model = load_model()
     ...

 trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

Set up a model

Use the ray.train.torch.prepare_model() utility function to:

- Move your model to the correct device.
- Wrap it in DistributedDataParallel.

```diff
-from torch.nn.parallel import DistributedDataParallel
+import ray.train.torch

 def train_func():
     ...
     # Create model.
     model = ...

     # Set up distributed training and device placement.
-    device_id = ...  # Your logic to get the right device.
-    model = model.to(device_id or "cpu")
-    model = DistributedDataParallel(model, device_ids=[device_id])
+    model = ray.train.torch.prepare_model(model)
     ...
```

Set up a dataset

Use the ray.train.torch.prepare_data_loader() utility function, which:

- Adds a DistributedSampler to your DataLoader.
- Moves the batches to the right device.

Note that this step isn't necessary if you're passing in Ray Data to your Trainer. See Data Loading and Preprocessing.

```diff
 from torch.utils.data import DataLoader
+import ray.train.torch

 def train_func():
     ...
     dataset = ...

     data_loader = DataLoader(dataset, batch_size=worker_batch_size, shuffle=True)
+    data_loader = ray.train.torch.prepare_data_loader(data_loader)

     for epoch in range(10):
+        if ray.train.get_context().get_world_size() > 1:
+            data_loader.sampler.set_epoch(epoch)

         for X, y in data_loader:
-            X = X.to_device(device)
-            y = y.to_device(device)
             ...
```

Tip: Keep in mind that DataLoader takes in a batch_size which is the batch size for each worker. The global batch size can be calculated from the worker batch size (and vice-versa) with the following equation:

```python
global_batch_size = worker_batch_size * ray.train.get_context().get_world_size()
```

Note: If you already manually set up your DataLoader with a DistributedSampler, prepare_data_loader() will not add another one, and will respect the configuration of the existing sampler.

Report checkpoints and metrics

To monitor progress, you can report intermediate metrics and checkpoints using the ray.train.report() utility function.

```diff
+import os
+import tempfile

+import ray.train

 def train_func():
     ...
     with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
         torch.save(
             model.state_dict(), os.path.join(temp_checkpoint_dir, "model.pt")
         )

+        metrics = {"loss": loss.item()}  # Training/validation metrics.

         # Build a Ray Train checkpoint from a directory
+        checkpoint = ray.train.Checkpoint.from_directory(temp_checkpoint_dir)

         # Ray Train will automatically save the checkpoint to persistent storage,
         # so the local `temp_checkpoint_dir` can be safely cleaned up after.
+        ray.train.report(metrics=metrics, checkpoint=checkpoint)

     ...
```

For more details, see Monitoring and Logging Metrics and Saving and Loading Checkpoints.

Configure scale and GPUs

Outside of your training function, create a ScalingConfig object to configure:

- num_workers - The number of distributed training worker processes.
- use_gpu - Whether each worker should use a GPU (or CPU).

```python
from ray.train import ScalingConfig

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
```

For more details, see Configuring Scale and GPUs.

Configure persistent storage

Create a RunConfig object to specify the path where results (including checkpoints and artifacts) will be saved.

```python
from ray.train import RunConfig

# Local path (/some/local/path/unique_run_name)
run_config = RunConfig(storage_path="/some/local/path", name="unique_run_name")

# Shared cloud storage URI (s3://bucket/unique_run_name)
run_config = RunConfig(storage_path="s3://bucket", name="unique_run_name")

# Shared NFS path (/mnt/nfs/unique_run_name)
run_config = RunConfig(storage_path="/mnt/nfs", name="unique_run_name")
```

Warning: Specifying a shared storage location (such as cloud storage or NFS) is optional for single-node clusters, but it is required for multi-node clusters. Using a local path will raise an error during checkpointing for multi-node clusters.

For more details, see Configuring Persistent Storage.

Launch a training job

Tying this all together, you can now launch a distributed training job with a TorchTrainer.

```python
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(train_func, scaling_config=scaling_config, run_config=run_config)
result = trainer.fit()
```

Access training results

After training completes, a Result object is returned which contains information about the training run, including the metrics and checkpoints reported during training.

```python
result.metrics     # The metrics reported during training.
result.checkpoint  # The latest checkpoint reported during training.
result.path        # The path where logs are stored.
result.error       # The exception that was raised, if training failed.
```

For more usage examples, see Inspecting Training Results.

Next steps

After you have converted your PyTorch training script to use Ray Train:

- See User Guides to learn more about how to perform specific tasks.
- Browse the Examples for end-to-end examples of how to use Ray Train.
- Dive into the API Reference for more details on the classes and methods used in this tutorial.
Classes](https://docs.ray.io/en/latest/ray-core/actors/actor-utils.html) - [Out-of-band Communication](https://docs.ray.io/en/latest/ray-core/actors/out-of-band-communication.html) - [Actor Task Execution Order](https://docs.ray.io/en/latest/ray-core/actors/task-orders.html) - [Objects](https://docs.ray.io/en/latest/ray-core/objects.html) - [Serialization](https://docs.ray.io/en/latest/ray-core/objects/serialization.html) - [Object Spilling](https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html) - [Environment Dependencies](https://docs.ray.io/en/latest/ray-core/handling-dependencies.html) - [Scheduling](https://docs.ray.io/en/latest/ray-core/scheduling/index.html) - [Use labels to control scheduling](https://docs.ray.io/en/latest/ray-core/scheduling/labels.html) - [Resources](https://docs.ray.io/en/latest/ray-core/scheduling/resources.html) - [Accelerator Support](https://docs.ray.io/en/latest/ray-core/scheduling/accelerators.html) - [Placement Groups](https://docs.ray.io/en/latest/ray-core/scheduling/placement-group.html) - [Memory Management](https://docs.ray.io/en/latest/ray-core/scheduling/memory-management.html) - [Out-Of-Memory Prevention](https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html) - [Fault tolerance](https://docs.ray.io/en/latest/ray-core/fault-tolerance.html) - [Task Fault Tolerance](https://docs.ray.io/en/latest/ray-core/fault_tolerance/tasks.html) - [Actor Fault Tolerance](https://docs.ray.io/en/latest/ray-core/fault_tolerance/actors.html) - [Object Fault Tolerance](https://docs.ray.io/en/latest/ray-core/fault_tolerance/objects.html) - [Node Fault Tolerance](https://docs.ray.io/en/latest/ray-core/fault_tolerance/nodes.html) - [GCS Fault Tolerance](https://docs.ray.io/en/latest/ray-core/fault_tolerance/gcs.html) - [Design Patterns & Anti-patterns](https://docs.ray.io/en/latest/ray-core/patterns/index.html) - [Pattern: Using nested tasks to achieve nested 
parallelism](https://docs.ray.io/en/latest/ray-core/patterns/nested-tasks.html) - [Pattern: Using generators to reduce heap memory usage](https://docs.ray.io/en/latest/ray-core/patterns/generators.html) - [Pattern: Using ray.wait to limit the number of pending tasks](https://docs.ray.io/en/latest/ray-core/patterns/limit-pending-tasks.html) - [Pattern: Using resources to limit the number of concurrently running tasks](https://docs.ray.io/en/latest/ray-core/patterns/limit-running-tasks.html) - [Pattern: Using asyncio to run actor methods concurrently](https://docs.ray.io/en/latest/ray-core/patterns/concurrent-operations-async-actor.html) - [Pattern: Using an actor to synchronize other tasks and actors](https://docs.ray.io/en/latest/ray-core/patterns/actor-sync.html) - [Pattern: Using a supervisor actor to manage a tree of actors](https://docs.ray.io/en/latest/ray-core/patterns/tree-of-actors.html) - [Pattern: Using pipelining to increase throughput](https://docs.ray.io/en/latest/ray-core/patterns/pipelining.html) - [Anti-pattern: Returning ray.put() ObjectRefs from a task harms performance and fault tolerance](https://docs.ray.io/en/latest/ray-core/patterns/return-ray-put.html) - [Anti-pattern: Calling ray.get on task arguments harms performance](https://docs.ray.io/en/latest/ray-core/patterns/nested-ray-get.html) - [Anti-pattern: Calling ray.get in a loop harms parallelism](https://docs.ray.io/en/latest/ray-core/patterns/ray-get-loop.html) - [Anti-pattern: Calling ray.get unnecessarily harms performance](https://docs.ray.io/en/latest/ray-core/patterns/unnecessary-ray-get.html) - [Anti-pattern: Processing results in submission order using ray.get increases runtime](https://docs.ray.io/en/latest/ray-core/patterns/ray-get-submission-order.html) - [Anti-pattern: Fetching too many objects at once with ray.get causes failure](https://docs.ray.io/en/latest/ray-core/patterns/ray-get-too-many-objects.html) - [Anti-pattern: Over-parallelizing with too fine-grained tasks harms 
speedup](https://docs.ray.io/en/latest/ray-core/patterns/too-fine-grained-tasks.html) - [Anti-pattern: Redefining the same remote function or class harms performance](https://docs.ray.io/en/latest/ray-core/patterns/redefine-task-actor-loop.html) - [Anti-pattern: Passing the same large argument by value repeatedly harms performance](https://docs.ray.io/en/latest/ray-core/patterns/pass-large-arg-by-value.html) - [Anti-pattern: Closure capturing large objects harms performance](https://docs.ray.io/en/latest/ray-core/patterns/closure-capture-large-objects.html) - [Anti-pattern: Using global variables to share state between tasks and actors](https://docs.ray.io/en/latest/ray-core/patterns/global-variables.html) - [Anti-pattern: Serialize ray.ObjectRef out of band](https://docs.ray.io/en/latest/ray-core/patterns/out-of-band-object-ref-serialization.html) - [Anti-pattern: Forking new processes in application code](https://docs.ray.io/en/latest/ray-core/patterns/fork-new-processes.html) - [Ray Direct Transport (RDT)](https://docs.ray.io/en/latest/ray-core/direct-transport.html) - [Ray Compiled Graph (beta)](https://docs.ray.io/en/latest/ray-core/compiled-graph/ray-compiled-graph.html) - [Quickstart](https://docs.ray.io/en/latest/ray-core/compiled-graph/quickstart.html) - [Profiling](https://docs.ray.io/en/latest/ray-core/compiled-graph/profiling.html) - [Experimental: Overlapping communication and computation](https://docs.ray.io/en/latest/ray-core/compiled-graph/overlap.html) - [Troubleshooting](https://docs.ray.io/en/latest/ray-core/compiled-graph/troubleshooting.html) - [Compiled Graph API](https://docs.ray.io/en/latest/ray-core/compiled-graph/compiled-graph-api.html) - [Resource Isolation With Cgroup v2](https://docs.ray.io/en/latest/ray-core/resource-isolation-with-cgroupv2.html) - [Advanced topics](https://docs.ray.io/en/latest/ray-core/advanced-topics.html) - [Tips for first-time users](https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html) - [Type hints 
in Ray](https://docs.ray.io/en/latest/ray-core/type-hint.html) - [Starting Ray](https://docs.ray.io/en/latest/ray-core/starting-ray.html) - [Ray Generators](https://docs.ray.io/en/latest/ray-core/ray-generator.html) - [Using Namespaces](https://docs.ray.io/en/latest/ray-core/namespaces.html) - [Cross-language programming](https://docs.ray.io/en/latest/ray-core/cross-language.html) - [Working with Jupyter Notebooks & JupyterLab](https://docs.ray.io/en/latest/ray-core/using-ray-with-jupyter.html) - [Lazy Computation Graphs with the Ray DAG API](https://docs.ray.io/en/latest/ray-core/ray-dag.html) - [Miscellaneous Topics](https://docs.ray.io/en/latest/ray-core/miscellaneous.html) - [Authenticating Remote URIs in runtime\_env](https://docs.ray.io/en/latest/ray-core/runtime_env_auth.html) - [Lifetimes of a User-Spawn Process](https://docs.ray.io/en/latest/ray-core/user-spawn-processes.html) - [Head Node Memory Management](https://docs.ray.io/en/latest/ray-core/head-node-memory-management.html) - [Examples](https://docs.ray.io/en/latest/ray-core/examples/overview.html) - [Batch Prediction with Ray Core](https://docs.ray.io/en/latest/ray-core/examples/batch_prediction.html) - [A Gentle Introduction to Ray Core by Example](https://docs.ray.io/en/latest/ray-core/examples/gentle_walkthrough.html) - [Using Ray for Highly Parallelizable Tasks](https://docs.ray.io/en/latest/ray-core/examples/highly_parallel.html) - [A Simple MapReduce Example with Ray Core](https://docs.ray.io/en/latest/ray-core/examples/map_reduce.html) - [Monte Carlo Estimation of π](https://docs.ray.io/en/latest/ray-core/examples/monte_carlo_pi.html) - [Simple Parallel Model Selection](https://docs.ray.io/en/latest/ray-core/examples/plot_hyperparameter.html) - [Parameter Server](https://docs.ray.io/en/latest/ray-core/examples/plot_parameter_server.html) - [Learning to Play Pong](https://docs.ray.io/en/latest/ray-core/examples/plot_pong_example.html) - [Speed up your web crawler by parallelizing it with 
Ray](https://docs.ray.io/en/latest/ray-core/examples/web_crawler.html) - [Ray Core API](https://docs.ray.io/en/latest/ray-core/api/index.html) - [Core API](https://docs.ray.io/en/latest/ray-core/api/core.html) - [Scheduling API](https://docs.ray.io/en/latest/ray-core/api/scheduling.html) - [Runtime Env API](https://docs.ray.io/en/latest/ray-core/api/runtime-env.html) - [Utility](https://docs.ray.io/en/latest/ray-core/api/utility.html) - [Exceptions](https://docs.ray.io/en/latest/ray-core/api/exceptions.html) - [Ray Core CLI](https://docs.ray.io/en/latest/ray-core/api/cli.html) - [State CLI](https://docs.ray.io/en/latest/ray-observability/reference/cli.html) - [State API](https://docs.ray.io/en/latest/ray-observability/reference/api.html) - [Ray Direct Transport (RDT) API](https://docs.ray.io/en/latest/ray-core/api/direct-transport.html) - [Internals](https://docs.ray.io/en/latest/ray-core/internals.html) - [Task Lifecycle](https://docs.ray.io/en/latest/ray-core/internals/task-lifecycle.html) - [Autoscaler v2](https://docs.ray.io/en/latest/ray-core/internals/autoscaler-v2.html) - [RPC Fault Tolerance](https://docs.ray.io/en/latest/ray-core/internals/rpc-fault-tolerance.html) - [Token Authentication](https://docs.ray.io/en/latest/ray-core/internals/token-authentication.html) - [Metric Exporter Infrastructure](https://docs.ray.io/en/latest/ray-core/internals/metric-exporter.html) - [Ray Event Exporter Infrastructure](https://docs.ray.io/en/latest/ray-core/internals/ray-event-exporter.html) - [Port Service Discovery](https://docs.ray.io/en/latest/ray-core/internals/port-service-discovery.html) - [Ray Data](https://docs.ray.io/en/latest/data/data.html) - [Ray Data Quickstart](https://docs.ray.io/en/latest/data/quickstart.html) - [Key Concepts](https://docs.ray.io/en/latest/data/key-concepts.html) - [User Guides](https://docs.ray.io/en/latest/data/user-guide.html) - [Loading Data](https://docs.ray.io/en/latest/data/loading-data.html) - [Inspecting 
Data](https://docs.ray.io/en/latest/data/inspecting-data.html) - [Transforming Data](https://docs.ray.io/en/latest/data/transforming-data.html) - [Aggregating Data](https://docs.ray.io/en/latest/data/aggregating-data.html) - [Iterating over Data](https://docs.ray.io/en/latest/data/iterating-over-data.html) - [Joining Data](https://docs.ray.io/en/latest/data/joining-data.html) - [Shuffling Data](https://docs.ray.io/en/latest/data/shuffling-data.html) - [Saving Data](https://docs.ray.io/en/latest/data/saving-data.html) - [Working with Images](https://docs.ray.io/en/latest/data/working-with-images.html) - [Working with Text](https://docs.ray.io/en/latest/data/working-with-text.html) - [Working with Tensors / NumPy](https://docs.ray.io/en/latest/data/working-with-tensors.html) - [Working with PyTorch](https://docs.ray.io/en/latest/data/working-with-pytorch.html) - [Working with LLMs](https://docs.ray.io/en/latest/data/working-with-llms.html) - [Monitoring Your Workload](https://docs.ray.io/en/latest/data/monitoring-your-workload.html) - [Execution Configurations](https://docs.ray.io/en/latest/data/execution-configurations.html) - [End-to-end: Offline Batch Inference](https://docs.ray.io/en/latest/data/batch_inference.html) - [Advanced: Performance Tips and Tuning](https://docs.ray.io/en/latest/data/performance-tips.html) - [Advanced: Read and Write Custom File Types](https://docs.ray.io/en/latest/data/custom-datasource-example.html) - [Examples](https://docs.ray.io/en/latest/data/examples.html) - [Ray Data API](https://docs.ray.io/en/latest/data/api/api.html) - [Loading Data API](https://docs.ray.io/en/latest/data/api/loading_data.html) - [Saving Data API](https://docs.ray.io/en/latest/data/api/saving_data.html) - [Dataset API](https://docs.ray.io/en/latest/data/api/dataset.html) - [DataIterator API](https://docs.ray.io/en/latest/data/api/data_iterator.html) - [ExecutionOptions API](https://docs.ray.io/en/latest/data/api/execution_options.html) - [Checkpoint 
API](https://docs.ray.io/en/latest/data/api/checkpoint.html) - [Aggregation API](https://docs.ray.io/en/latest/data/api/aggregate.html) - [GroupedData API](https://docs.ray.io/en/latest/data/api/grouped_data.html) - [Expressions API](https://docs.ray.io/en/latest/data/api/expressions.html) - [Data types](https://docs.ray.io/en/latest/data/api/datatype.html) - [Global configuration](https://docs.ray.io/en/latest/data/api/data_context.html) - [Preprocessor](https://docs.ray.io/en/latest/data/api/preprocessor.html) - [Large Language Model (LLM) API](https://docs.ray.io/en/latest/data/api/llm.html) - [API Guide for Users from Other Data Libraries](https://docs.ray.io/en/latest/data/api/from_other_data_libs.html) - [Contributing to Ray Data](https://docs.ray.io/en/latest/data/contributing/contributing.html) - [Contributing Guide](https://docs.ray.io/en/latest/data/contributing/contributing-guide.html) - [How to write tests](https://docs.ray.io/en/latest/data/contributing/how-to-write-tests.html) - [Comparing Ray Data to other systems](https://docs.ray.io/en/latest/data/comparisons.html) - [Ray Data Benchmarks](https://docs.ray.io/en/latest/data/benchmark.html) - [Ray Data Internals](https://docs.ray.io/en/latest/data/data-internals.html) - [Ray Train](https://docs.ray.io/en/latest/train/train.html) - [Overview](https://docs.ray.io/en/latest/train/overview.html) - [PyTorch Guide](https://docs.ray.io/en/latest/train/getting-started-pytorch.html) - [PyTorch Lightning Guide](https://docs.ray.io/en/latest/train/getting-started-pytorch-lightning.html) - [Hugging Face Transformers Guide](https://docs.ray.io/en/latest/train/getting-started-transformers.html) - [XGBoost Guide](https://docs.ray.io/en/latest/train/getting-started-xgboost.html) - [JAX Guide](https://docs.ray.io/en/latest/train/getting-started-jax.html) - [More Frameworks](https://docs.ray.io/en/latest/train/more-frameworks.html) - [Hugging Face Accelerate 
Guide](https://docs.ray.io/en/latest/train/huggingface-accelerate.html) - [DeepSpeed Guide](https://docs.ray.io/en/latest/train/deepspeed.html) - [TensorFlow and Keras Guide](https://docs.ray.io/en/latest/train/distributed-tensorflow-keras.html) - [LightGBM Guide](https://docs.ray.io/en/latest/train/getting-started-lightgbm.html) - [Horovod Guide](https://docs.ray.io/en/latest/train/horovod.html) - [User Guides](https://docs.ray.io/en/latest/train/user-guides.html) - [Data Loading and Preprocessing](https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html) - [Configuring Scale and GPUs](https://docs.ray.io/en/latest/train/user-guides/using-gpus.html) - [Local Mode](https://docs.ray.io/en/latest/train/user-guides/local_mode.html) - [Configuring Persistent Storage](https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html) - [Monitoring and Logging Metrics](https://docs.ray.io/en/latest/train/user-guides/monitoring-logging.html) - [Saving and Loading Checkpoints](https://docs.ray.io/en/latest/train/user-guides/checkpoints.html) - [Validating checkpoints asynchronously](https://docs.ray.io/en/latest/train/user-guides/asynchronous-validation.html) - [Experiment Tracking](https://docs.ray.io/en/latest/train/user-guides/experiment-tracking.html) - [Inspecting Training Results](https://docs.ray.io/en/latest/train/user-guides/results.html) - [Handling Failures and Node Preemption](https://docs.ray.io/en/latest/train/user-guides/fault-tolerance.html) - [Ray Train Metrics](https://docs.ray.io/en/latest/train/user-guides/monitor-your-application.html) - [Reproducibility](https://docs.ray.io/en/latest/train/user-guides/reproducibility.html) - [Hyperparameter Optimization](https://docs.ray.io/en/latest/train/user-guides/hyperparameter-optimization.html) - [Advanced: Scaling out expensive collate functions](https://docs.ray.io/en/latest/train/user-guides/scaling-collation-functions.html) - 
[Tutorials](https://docs.ray.io/en/latest/train/tutorials/content/README.html) - [Introduction to Ray Train workloads](https://docs.ray.io/en/latest/train/tutorials/content/getting-started/01_02_03_intro_to_ray_train.html) - [Computer vision pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04a_vision_pattern.html) - [Tabular workload pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04b_tabular_workload_pattern.html) - [Time series workload pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04c_time_series_workload_pattern.html) - [Generative computer vision pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04d1_generative_cv_pattern.html) - [Diffusion policy pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04d2_policy_learning_pattern.html) - [Recommendation system pattern](https://docs.ray.io/en/latest/train/tutorials/content/workload-patterns/04e_rec_sys_workload_pattern.html) - [Examples](https://docs.ray.io/en/latest/train/examples.html) - [Benchmarks](https://docs.ray.io/en/latest/train/benchmarks.html) - [Ray Train API](https://docs.ray.io/en/latest/train/api/api.html) - [Ray Tune](https://docs.ray.io/en/latest/tune/index.html) - [Getting Started](https://docs.ray.io/en/latest/tune/getting-started.html) - [Key Concepts](https://docs.ray.io/en/latest/tune/key-concepts.html) - [User Guides](https://docs.ray.io/en/latest/tune/tutorials/overview.html) - [Running Basic Experiments](https://docs.ray.io/en/latest/tune/tutorials/tune-run.html) - [Logging and Outputs in Tune](https://docs.ray.io/en/latest/tune/tutorials/tune-output.html) - [Setting Trial Resources](https://docs.ray.io/en/latest/tune/tutorials/tune-resources.html) - [Using Search Spaces](https://docs.ray.io/en/latest/tune/tutorials/tune-search-spaces.html) - [How to Define Stopping Criteria for a Ray Tune 
Experiment](https://docs.ray.io/en/latest/tune/tutorials/tune-stopping.html) - [How to Save and Load Trial Checkpoints](https://docs.ray.io/en/latest/tune/tutorials/tune-trial-checkpoints.html) - [How to Configure Persistent Storage in Ray Tune](https://docs.ray.io/en/latest/tune/tutorials/tune-storage.html) - [How to Enable Fault Tolerance in Ray Tune](https://docs.ray.io/en/latest/tune/tutorials/tune-fault-tolerance.html) - [Using Callbacks and Metrics](https://docs.ray.io/en/latest/tune/tutorials/tune-metrics.html) - [Getting Data in and out of Tune](https://docs.ray.io/en/latest/tune/tutorials/tune_get_data_in_and_out.html) - [Analyzing Tune Experiment Results](https://docs.ray.io/en/latest/tune/examples/tune_analyze_results.html) - [A Guide to Population Based Training with Tune](https://docs.ray.io/en/latest/tune/examples/pbt_guide.html) - [Visualizing and Understanding PBT](https://docs.ray.io/en/latest/tune/examples/pbt_visualization/pbt_visualization.html) - [Deploying Tune in the Cloud](https://docs.ray.io/en/latest/tune/tutorials/tune-distributed.html) - [Tune Architecture](https://docs.ray.io/en/latest/tune/tutorials/tune-lifecycle.html) - [Scalability Benchmarks](https://docs.ray.io/en/latest/tune/tutorials/tune-scalability.html) - [Ray Tune Examples](https://docs.ray.io/en/latest/tune/examples/index.html) - [PyTorch Example](https://docs.ray.io/en/latest/tune/examples/tune-pytorch-cifar.html) - [PyTorch Lightning Example](https://docs.ray.io/en/latest/tune/examples/tune-pytorch-lightning.html) - [XGBoost Example](https://docs.ray.io/en/latest/tune/examples/tune-xgboost.html) - [LightGBM Example](https://docs.ray.io/en/latest/tune/examples/lightgbm_example.html) - [Hugging Face Transformers Example](https://docs.ray.io/en/latest/tune/examples/pbt_transformers.html) - [Ray RLlib Example](https://docs.ray.io/en/latest/tune/examples/pbt_ppo_example.html) - [Keras Example](https://docs.ray.io/en/latest/tune/examples/tune_mnist_keras.html) - [PyTorch with 
ASHA](https://docs.ray.io/en/latest/tune/examples/tune_pytorch_asha/content/tune_pytorch_asha.html) - [Weights & Biases Example](https://docs.ray.io/en/latest/tune/examples/tune-wandb.html) - [MLflow Example](https://docs.ray.io/en/latest/tune/examples/tune-mlflow.html) - [Aim Example](https://docs.ray.io/en/latest/tune/examples/tune-aim.html) - [Comet Example](https://docs.ray.io/en/latest/tune/examples/tune-comet.html) - [Ax Example](https://docs.ray.io/en/latest/tune/examples/ax_example.html) - [HyperOpt Example](https://docs.ray.io/en/latest/tune/examples/hyperopt_example.html) - [Bayesopt Example](https://docs.ray.io/en/latest/tune/examples/bayesopt_example.html) - [BOHB Example](https://docs.ray.io/en/latest/tune/examples/bohb_example.html) - [Nevergrad Example](https://docs.ray.io/en/latest/tune/examples/nevergrad_example.html) - [Optuna Example](https://docs.ray.io/en/latest/tune/examples/optuna_example.html) - [Ray Tune FAQ](https://docs.ray.io/en/latest/tune/faq.html) - [Ray Tune API](https://docs.ray.io/en/latest/tune/api/api.html) - [Tune Execution (tune.Tuner)](https://docs.ray.io/en/latest/tune/api/execution.html) - [Tune Experiment Results (tune.ResultGrid)](https://docs.ray.io/en/latest/tune/api/result_grid.html) - [Training in Tune (tune.Trainable, tune.report)](https://docs.ray.io/en/latest/tune/api/trainable.html) - [Tune Search Space API](https://docs.ray.io/en/latest/tune/api/search_space.html) - [Tune Search Algorithms (tune.search)](https://docs.ray.io/en/latest/tune/api/suggestion.html) - [Tune Trial Schedulers (tune.schedulers)](https://docs.ray.io/en/latest/tune/api/schedulers.html) - [Tune Stopping Mechanisms (tune.stopper)](https://docs.ray.io/en/latest/tune/api/stoppers.html) - [Tune Console Output (Reporters)](https://docs.ray.io/en/latest/tune/api/reporters.html) - [Syncing in Tune](https://docs.ray.io/en/latest/tune/api/syncing.html) - [Tune Loggers (tune.logger)](https://docs.ray.io/en/latest/tune/api/logging.html) - [Tune Callbacks 
(tune.Callback)](https://docs.ray.io/en/latest/tune/api/callbacks.html) - [Environment variables used by Ray Tune](https://docs.ray.io/en/latest/tune/api/env.html) - [External library integrations for Ray Tune](https://docs.ray.io/en/latest/tune/api/integration.html) - [Tune Internals](https://docs.ray.io/en/latest/tune/api/internals.html) - [Tune CLI (Experimental)](https://docs.ray.io/en/latest/tune/api/cli.html) - [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) - [Getting Started](https://docs.ray.io/en/latest/serve/getting_started.html) - [Key Concepts](https://docs.ray.io/en/latest/serve/key-concepts.html) - [Develop and Deploy an ML Application](https://docs.ray.io/en/latest/serve/develop-and-deploy.html) - [Deploy Compositions of Models](https://docs.ray.io/en/latest/serve/model_composition.html) - [Deploy Multiple Applications](https://docs.ray.io/en/latest/serve/multi-app.html) - [Model Multiplexing](https://docs.ray.io/en/latest/serve/model-multiplexing.html) - [Model Registry Integration](https://docs.ray.io/en/latest/serve/model-registries.html) - [Configure Ray Serve deployments](https://docs.ray.io/en/latest/serve/configure-serve-deployment.html) - [Set Up FastAPI and HTTP](https://docs.ray.io/en/latest/serve/http-guide.html) - [Serving LLMs](https://docs.ray.io/en/latest/serve/llm/index.html) - [Quickstart](https://docs.ray.io/en/latest/serve/llm/quick-start.html) - [Examples](https://docs.ray.io/en/latest/serve/llm/examples.html) - [User Guides](https://docs.ray.io/en/latest/serve/llm/user-guides/index.html) - [Cross-node parallelism](https://docs.ray.io/en/latest/serve/llm/user-guides/cross-node-parallelism.html) - [Data parallel attention](https://docs.ray.io/en/latest/serve/llm/user-guides/data-parallel-attention.html) - [Deployment Initialization](https://docs.ray.io/en/latest/serve/llm/user-guides/deployment-initialization.html) - [Prefill/decode 
disaggregation](https://docs.ray.io/en/latest/serve/llm/user-guides/prefill-decode.html) - [KV cache offloading](https://docs.ray.io/en/latest/serve/llm/user-guides/kv-cache-offloading.html) - [Prefix-aware routing](https://docs.ray.io/en/latest/serve/llm/user-guides/prefix-aware-routing.html) - [Multi-LoRA deployment](https://docs.ray.io/en/latest/serve/llm/user-guides/multi-lora.html) - [vLLM compatibility](https://docs.ray.io/en/latest/serve/llm/user-guides/vllm-compatibility.html) - [Fractional GPU serving](https://docs.ray.io/en/latest/serve/llm/user-guides/fractional-gpu.html) - [Observability and monitoring](https://docs.ray.io/en/latest/serve/llm/user-guides/observability.html) - [Architecture](https://docs.ray.io/en/latest/serve/llm/architecture/index.html) - [Architecture overview](https://docs.ray.io/en/latest/serve/llm/architecture/overview.html) - [Core components](https://docs.ray.io/en/latest/serve/llm/architecture/core.html) - [Serving patterns](https://docs.ray.io/en/latest/serve/llm/architecture/serving-patterns/index.html) - [Request routing](https://docs.ray.io/en/latest/serve/llm/architecture/routing-policies.html) - [Benchmarks](https://docs.ray.io/en/latest/serve/llm/benchmarks.html) - [Troubleshooting](https://docs.ray.io/en/latest/serve/llm/troubleshooting.html) - [Production Guide](https://docs.ray.io/en/latest/serve/production-guide/index.html) - [Serve Config Files](https://docs.ray.io/en/latest/serve/production-guide/config.html) - [Deploy on Kubernetes](https://docs.ray.io/en/latest/serve/production-guide/kubernetes.html) - [Custom Docker Images](https://docs.ray.io/en/latest/serve/production-guide/docker.html) - [Add End-to-End Fault Tolerance](https://docs.ray.io/en/latest/serve/production-guide/fault-tolerance.html) - [Handle Dependencies](https://docs.ray.io/en/latest/serve/production-guide/handling-dependencies.html) - [Best practices in production](https://docs.ray.io/en/latest/serve/production-guide/best-practices.html) - 
Storage and Dependencies](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/storage.html) - [RayCluster Configuration](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html) - [KubeRay Autoscaling](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/configuring-autoscaling.html) - [KubeRay label-based scheduling](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/label-based-scheduling.html) - [GCS fault tolerance in KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-gcs-ft.html) - [Tuning Redis for a Persistent Fault Tolerant GCS](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-gcs-persistent-ft.html) - [Configuring KubeRay to use Google Cloud Storage Buckets in GKE](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/gke-gcs-bucket.html) - [Persist KubeRay custom resource logs](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/persist-kuberay-custom-resource-logs.html) - [Persist KubeRay Operator Logs](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/persist-kuberay-operator-logs.html) - [Using GPUs](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/gpu.html) - [Use TPUs with KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/tpu.html) - [Specify container commands for Ray head/worker Pods](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/pod-command.html) - [Helm Chart RBAC](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/helm-chart-rbac.html) - [TLS Authentication](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/tls.html) - [(Advanced) Understanding the Ray Autoscaler in the Context of Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/k8s-autoscaler.html) - [Use kubectl plugin (beta)](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kubectl-plugin.html) - [Configure Ray clusters to use token 
authentication](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-auth.html) - [Reducing image pull latency on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/reduce-image-pull-latency.html) - [Using `uv` for Python package management in KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/uv.html) - [Use KubeRay dashboard (experimental)](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kuberay-dashboard.html) - [Resource Isolation with Writable Cgroups on Google Kubernetes Engine (GKE)](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/resource-isolation-with-writable-cgroups.html) - [Examples](https://docs.ray.io/en/latest/cluster/kubernetes/examples.html) - [Train a PyTorch model on Fashion MNIST with CPUs on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/mnist-training-example.html) - [Serve a StableDiffusion text-to-image model on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/stable-diffusion-rayservice.html) - [Serve a Stable Diffusion model on GKE with TPUs](https://docs.ray.io/en/latest/cluster/kubernetes/examples/tpu-serve-stable-diffusion.html) - [Serve a MobileNet image classifier on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/mobilenet-rayservice.html) - [Serve a text summarizer on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/text-summarizer-rayservice.html) - [RayJob Batch Inference Example](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayjob-batch-inference-example.html) - [Priority Scheduling with RayJob and Kueue](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayjob-kueue-priority-scheduling.html) - [Gang Scheduling with RayJob and Kueue](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayjob-kueue-gang-scheduling.html) - [Distributed checkpointing with KubeRay and 
GCSFuse](https://docs.ray.io/en/latest/cluster/kubernetes/examples/distributed-checkpointing-with-gcsfuse.html) - [Use Modin with Ray on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/modin-example.html) - [Serve a Large Language Model using Ray Serve LLM on Kubernetes](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayserve-llm-example.html) - [Serve Deepseek R1 using Ray Serve LLM](https://docs.ray.io/en/latest/cluster/kubernetes/examples/rayserve-deepseek-example.html) - [Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/examples/verl-post-training.html) - [Deploying Ray Clusters via ArgoCD](https://docs.ray.io/en/latest/cluster/kubernetes/examples/argocd.html) - [KubeRay Ecosystem](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem.html) - [Ingress](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/ingress.html) - [KubeRay metrics references](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/metrics-references.html) - [Using Prometheus and Grafana](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html) - [Profiling with py-spy](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/pyspy.html) - [Gang scheduling, queue priority, and GPU sharing for RayClusters using KAI Scheduler](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/kai-scheduler.html) - [KubeRay integration with Volcano](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html) - [KubeRay integration with Apache YuniKorn](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/yunikorn.html) - [Gang scheduling, Priority scheduling, and Autoscaling for KubeRay CRDs with Kueue](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/kueue.html) - [mTLS and L7 observability with Istio](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/istio.html) - 
[KubeRay integration with scheduler plugins](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/scheduler-plugins.html) - [KubeRay Benchmarks](https://docs.ray.io/en/latest/cluster/kubernetes/benchmarks.html) - [KubeRay memory and scalability benchmark](https://docs.ray.io/en/latest/cluster/kubernetes/benchmarks/memory-scalability-benchmark.html) - [KubeRay Troubleshooting](https://docs.ray.io/en/latest/cluster/kubernetes/troubleshooting.html) - [Troubleshooting guide](https://docs.ray.io/en/latest/cluster/kubernetes/troubleshooting/troubleshooting.html) - [RayService troubleshooting](https://docs.ray.io/en/latest/cluster/kubernetes/troubleshooting/rayservice-troubleshooting.html) - [API Reference](https://docs.ray.io/en/latest/cluster/kubernetes/references.html) - [Deploying on VMs](https://docs.ray.io/en/latest/cluster/vms/index.html) - [Getting Started](https://docs.ray.io/en/latest/cluster/vms/getting-started.html) - [User Guides](https://docs.ray.io/en/latest/cluster/vms/user-guides/index.html) - [Launching Ray Clusters on AWS, GCP, Azure, vSphere, On-Prem](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/index.html) - [Best practices for deploying large clusters](https://docs.ray.io/en/latest/cluster/vms/user-guides/large-cluster-best-practices.html) - [Configuring Autoscaling](https://docs.ray.io/en/latest/cluster/vms/user-guides/configuring-autoscaling.html) - [Log Persistence](https://docs.ray.io/en/latest/cluster/vms/user-guides/logging.html) - [Community Supported Cluster Managers](https://docs.ray.io/en/latest/cluster/vms/user-guides/community/index.html) - [Examples](https://docs.ray.io/en/latest/cluster/vms/examples/index.html) - [Ray Train XGBoostTrainer on VMs](https://docs.ray.io/en/latest/cluster/vms/examples/ml-example.html) - [API References](https://docs.ray.io/en/latest/cluster/vms/references/index.html) - [Cluster Launcher Commands](https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-cli.html) 
- [Cluster YAML Configuration Options](https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html) - [Collecting and monitoring metrics](https://docs.ray.io/en/latest/cluster/metrics.html) - [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) - [Applications Guide](https://docs.ray.io/en/latest/cluster/running-applications/index.html) - [Ray Jobs Overview](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html) - [Quickstart using the Ray Jobs CLI](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/quickstart.html) - [Python SDK Overview](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/sdk.html) - [Python SDK API Reference](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/jobs-package-ref.html) - [Ray Jobs CLI API Reference](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/cli.html) - [Ray Jobs REST API](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/rest.html) - [Ray Client](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html) - [Programmatic Cluster Scaling](https://docs.ray.io/en/latest/cluster/running-applications/autoscaling/reference.html) - [FAQ](https://docs.ray.io/en/latest/cluster/faq.html) - [Ray Cluster Management API](https://docs.ray.io/en/latest/cluster/package-overview.html) - [Cluster Management CLI](https://docs.ray.io/en/latest/cluster/cli.html) - [Python SDK API Reference](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/jobs-package-ref.html) - [Ray Jobs CLI API Reference](https://docs.ray.io/en/latest/cluster/running-applications/job-submission/cli.html) - [Programmatic Cluster Scaling](https://docs.ray.io/en/latest/cluster/running-applications/autoscaling/reference.html) - [Usage Stats 
Collection](https://docs.ray.io/en/latest/cluster/usage-stats.html) - [Monitoring and Debugging](https://docs.ray.io/en/latest/ray-observability/index.html) - [Ray Dashboard](https://docs.ray.io/en/latest/ray-observability/getting-started.html) - [Ray Distributed Debugger](https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html) - [Key Concepts](https://docs.ray.io/en/latest/ray-observability/key-concepts.html) - [User Guides](https://docs.ray.io/en/latest/ray-observability/user-guides/index.html) - [Debugging Applications](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/index.html) - [Common Issues](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/general-debugging.html) - [Debugging Memory Issues](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/debug-memory.html) - [Debugging Hangs](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/debug-hangs.html) - [Debugging Failures](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/debug-failures.html) - [Optimizing Performance](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/optimize-performance.html) - [Ray Distributed Debugger](https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html) - [Using the Ray Debugger](https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/ray-debugging.html) - [Monitoring with the CLI or SDK](https://docs.ray.io/en/latest/ray-observability/user-guides/cli-sdk.html) - [Configuring Logging](https://docs.ray.io/en/latest/ray-observability/user-guides/configure-logging.html) - [Profiling](https://docs.ray.io/en/latest/ray-observability/user-guides/profiling.html) - [Adding Application-Level Metrics](https://docs.ray.io/en/latest/ray-observability/user-guides/add-app-metrics.html) - [Tracing](https://docs.ray.io/en/latest/ray-observability/user-guides/ray-tracing.html) - [Ray Event 
Export](https://docs.ray.io/en/latest/ray-observability/user-guides/ray-event-export.html) - [Reference](https://docs.ray.io/en/latest/ray-observability/reference/index.html) - [State API](https://docs.ray.io/en/latest/ray-observability/reference/api.html) - [State CLI](https://docs.ray.io/en/latest/ray-observability/reference/cli.html) - [System Metrics](https://docs.ray.io/en/latest/ray-observability/reference/system-metrics.html) - [Developer Guides](https://docs.ray.io/en/latest/ray-contribute/index.html) - [API Stability](https://docs.ray.io/en/latest/ray-contribute/stability.html) - [API Policy](https://docs.ray.io/en/latest/ray-contribute/api-policy.html) - [Getting Involved / Contributing](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html) - [Building Ray from Source](https://docs.ray.io/en/latest/ray-contribute/development.html) - [CI Testing Workflow on PRs](https://docs.ray.io/en/latest/ray-contribute/ci.html) - [Contributing to the Ray Documentation](https://docs.ray.io/en/latest/ray-contribute/docs.html) - [How to write code snippets](https://docs.ray.io/en/latest/ray-contribute/writing-code-snippets.html) - [Testing Autoscaling Locally](https://docs.ray.io/en/latest/ray-contribute/fake-autoscaler.html) - [Tips for testing Ray programs](https://docs.ray.io/en/latest/ray-contribute/testing-tips.html) - [Debugging for Ray Developers](https://docs.ray.io/en/latest/ray-contribute/debugging.html) - [Profiling for Ray Developers](https://docs.ray.io/en/latest/ray-contribute/profiling.html) - [Configuring Ray](https://docs.ray.io/en/latest/ray-core/configure.html) - [Architecture Whitepapers](https://docs.ray.io/en/latest/ray-contribute/whitepaper.html) - [Glossary](https://docs.ray.io/en/latest/ray-references/glossary.html) - [Security](https://docs.ray.io/en/latest/ray-security/index.html) - [Ray token authentication](https://docs.ray.io/en/latest/ray-security/token-auth.html) - [Project 
# Get Started with Distributed Training using PyTorch

This tutorial walks through the process of converting an existing PyTorch script to use Ray Train. Learn how to:

1. Configure a model to run distributed and on the correct CPU/GPU device.
2. Configure a dataloader to shard data across the [workers](https://docs.ray.io/en/latest/train/overview.html#train-overview-worker) and place data on the correct CPU or GPU device.
3. Configure a [training function](https://docs.ray.io/en/latest/train/overview.html#train-overview-training-function) to report metrics and save checkpoints.
4. Configure [scaling](https://docs.ray.io/en/latest/train/overview.html#train-overview-scaling-config) and CPU or GPU resource requirements for a training job.
5. Launch a distributed training job with the [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer) class.

## Quickstart

For reference, the final code looks something like the following:

```python
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func():
    # Your PyTorch training code here.
    ...

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
```

1. `train_func` is the Python code that executes on each distributed training worker.
2. [`ScalingConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig) defines the number of distributed training workers and whether to use GPUs.
3. [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer) launches the distributed training job.

Compare a PyTorch training script with and without Ray Train.

**PyTorch + Ray Train**

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

import ray.train.torch

def train_func():
    # Model, Loss, Optimizer
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    # [1] Prepare model.
    model = ray.train.torch.prepare_model(model)
    # model.to("cuda")  # This is done by `prepare_model`
    criterion = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.001)

    # Data
    transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
    data_dir = os.path.join(tempfile.gettempdir(), "data")
    train_data = FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
    # [2] Prepare dataloader.
    train_loader = ray.train.torch.prepare_data_loader(train_loader)

    # Training
    for epoch in range(10):
        if ray.train.get_context().get_world_size() > 1:
            train_loader.sampler.set_epoch(epoch)

        for images, labels in train_loader:
            # This is done by `prepare_data_loader`!
            # images, labels = images.to("cuda"), labels.to("cuda")
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # [3] Report metrics and checkpoint.
        metrics = {"loss": loss.item(), "epoch": epoch}
        with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
            torch.save(
                model.module.state_dict(),
                os.path.join(temp_checkpoint_dir, "model.pt")
            )
            ray.train.report(
                metrics,
                checkpoint=ray.train.Checkpoint.from_directory(temp_checkpoint_dir),
            )
        if ray.train.get_context().get_world_rank() == 0:
            print(metrics)

# [4] Configure scaling and resource requirements.
scaling_config = ray.train.ScalingConfig(num_workers=2, use_gpu=True)

# [5] Launch distributed training job.
trainer = ray.train.torch.TorchTrainer(
    train_func,
    scaling_config=scaling_config,
    # [5a] If running in a multi-node cluster, this is where you
    # should configure the run's persistent storage that is accessible
    # across all worker nodes.
    # run_config=ray.train.RunConfig(storage_path="s3://..."),
)
result = trainer.fit()

# [6] Load the trained model.
with result.checkpoint.as_directory() as checkpoint_dir:
    model_state_dict = torch.load(os.path.join(checkpoint_dir, "model.pt"))
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    model.load_state_dict(model_state_dict)
```

**PyTorch**

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

# Model, Loss, Optimizer
model = resnet18(num_classes=10)
model.conv1 = torch.nn.Conv2d(
    1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
)
model.to("cuda")
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)

# Data
transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
train_data = FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=128, shuffle=True)

# Training
for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to("cuda"), labels.to("cuda")
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    metrics = {"loss": loss.item(), "epoch": epoch}
    checkpoint_dir = tempfile.mkdtemp()
    checkpoint_path = os.path.join(checkpoint_dir, "model.pt")
    torch.save(model.state_dict(), checkpoint_path)
    print(metrics)
```

## Set up a training function

First, update your training code to support distributed training. Begin by wrapping your code in a [training function](https://docs.ray.io/en/latest/train/overview.html#train-overview-training-function):

```python
def train_func():
    # Your model training code here.
    ...
```

Each distributed training worker executes this function.

You can also specify the input argument for `train_func` as a dictionary via the Trainer's `train_loop_config`. For example:

```python
def train_func(config):
    lr = config["lr"]
    num_epochs = config["num_epochs"]

config = {"lr": 1e-4, "num_epochs": 10}
trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

**Warning:** Avoid passing large data objects through `train_loop_config` to reduce serialization and deserialization overhead. Instead, prefer to initialize large objects (e.g. datasets, models) directly in `train_func`.

```diff
 def load_dataset():
     # Return a large in-memory dataset
     ...

 def load_model():
     # Return a large in-memory model instance
     ...

-config = {"data": load_dataset(), "model": load_model()}

 def train_func(config):
-    data = config["data"]
-    model = config["model"]
+    data = load_dataset()
+    model = load_model()
     ...

 trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

### Set up a model

Use the [`ray.train.torch.prepare_model()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_model.html#ray.train.torch.prepare_model) utility function to:

1. Move your model to the correct device.
2. Wrap it in `DistributedDataParallel`.

```diff
-from torch.nn.parallel import DistributedDataParallel
+import ray.train.torch

 def train_func():
     ...
     # Create model.
     model = ...

     # Set up distributed training and device placement.
-    device_id = ...  # Your logic to get the right device.
-    model = model.to(device_id or "cpu")
-    model = DistributedDataParallel(model, device_ids=[device_id])
+    model = ray.train.torch.prepare_model(model)
     ...
```

### Set up a dataset

Use the [`ray.train.torch.prepare_data_loader()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_data_loader.html#ray.train.torch.prepare_data_loader) utility function, which:

1. Adds a [`DistributedSampler`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler) to your [`DataLoader`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader).
2. Moves the batches to the right device.

Note that this step isn't necessary if you're passing in Ray Data to your Trainer. See [Data Loading and Preprocessing](https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html#data-ingest-torch).

```diff
 from torch.utils.data import DataLoader
+import ray.train.torch

 def train_func():
     ...
     dataset = ...

     data_loader = DataLoader(dataset, batch_size=worker_batch_size, shuffle=True)
+    data_loader = ray.train.torch.prepare_data_loader(data_loader)

     for epoch in range(10):
+        if ray.train.get_context().get_world_size() > 1:
+            data_loader.sampler.set_epoch(epoch)

         for X, y in data_loader:
-            X = X.to_device(device)
-            y = y.to_device(device)
             ...
```

**Tip:** Keep in mind that `DataLoader` takes in a `batch_size`, which is the batch size for each worker. The global batch size can be calculated from the worker batch size (and vice versa) with the following equation:

```python
global_batch_size = worker_batch_size * ray.train.get_context().get_world_size()
```

**Note:** If you already manually set up your `DataLoader` with a `DistributedSampler`, [`prepare_data_loader()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_data_loader.html#ray.train.torch.prepare_data_loader) will not add another one, and will respect the configuration of the existing sampler.

**Note:** [`DistributedSampler`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler) does not work with a `DataLoader` that wraps [`IterableDataset`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset). If you want to work with a dataset iterator, consider using [Ray Data](https://docs.ray.io/en/latest/data/data.html#data) instead of a PyTorch `DataLoader`, since it provides performant streaming data ingestion for large-scale datasets. See [Data Loading and Preprocessing](https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html#data-ingest-torch) for more details.
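The batch-size relationship in the tip above can be sanity-checked with a plain Python sketch. The `world_size` argument here is a stand-in for `ray.train.get_context().get_world_size()`, so this runs without a Ray cluster:

```python
def global_batch_size(worker_batch_size: int, world_size: int) -> int:
    """Total samples consumed per optimizer step across all workers.

    `world_size` stands in for `ray.train.get_context().get_world_size()`.
    """
    return worker_batch_size * world_size


def worker_batch_size(global_batch_size: int, world_size: int) -> int:
    """Per-worker batch size needed to hit a target global batch size.

    Assumes the target divides evenly across workers.
    """
    assert global_batch_size % world_size == 0, "pick a divisible global batch size"
    return global_batch_size // world_size


# With the quickstart's 2 workers and batch_size=128:
print(global_batch_size(128, 2))  # 256
print(worker_batch_size(256, 2))  # 128
```

In other words, keeping `batch_size=128` in the `DataLoader` while scaling from 1 to 2 workers doubles the effective global batch size, which may warrant adjusting the learning rate.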
### Report checkpoints and metrics

To monitor progress, you can report intermediate metrics and checkpoints using the [`ray.train.report()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.report.html#ray.train.report) utility function.

```diff
+import os
+import tempfile

+import ray.train

 def train_func():
     ...
     with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
         torch.save(
             model.state_dict(),
             os.path.join(temp_checkpoint_dir, "model.pt")
         )

+        metrics = {"loss": loss.item()}  # Training/validation metrics.

         # Build a Ray Train checkpoint from a directory
+        checkpoint = ray.train.Checkpoint.from_directory(temp_checkpoint_dir)

         # Ray Train will automatically save the checkpoint to persistent storage,
         # so the local `temp_checkpoint_dir` can be safely cleaned up after.
+        ray.train.report(metrics=metrics, checkpoint=checkpoint)

     ...
```

For more details, see [Monitoring and Logging Metrics](https://docs.ray.io/en/latest/train/user-guides/monitoring-logging.html#train-monitoring-and-logging) and [Saving and Loading Checkpoints](https://docs.ray.io/en/latest/train/user-guides/checkpoints.html#train-checkpointing).

## Configure scale and GPUs

Outside of your training function, create a [`ScalingConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig) object to configure:

1. [`num_workers`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig) - The number of distributed training worker processes.
2. [`use_gpu`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig) - Whether each worker should use a GPU (or CPU).

```python
from ray.train import ScalingConfig

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
```

For more details, see [Configuring Scale and GPUs](https://docs.ray.io/en/latest/train/user-guides/using-gpus.html#train-scaling-config).

## Configure persistent storage

Create a [`RunConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.RunConfig.html#ray.train.RunConfig) object to specify the path where results (including checkpoints and artifacts) are saved.

```python
from ray.train import RunConfig

# Local path (/some/local/path/unique_run_name)
run_config = RunConfig(storage_path="/some/local/path", name="unique_run_name")

# Shared cloud storage URI (s3://bucket/unique_run_name)
run_config = RunConfig(storage_path="s3://bucket", name="unique_run_name")

# Shared NFS path (/mnt/nfs/unique_run_name)
run_config = RunConfig(storage_path="/mnt/nfs", name="unique_run_name")
```

**Warning:** Specifying a *shared storage location* (such as cloud storage or NFS) is *optional* for single-node clusters, but it is **required for multi-node clusters.** Using a local path [raises an error](https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html#multinode-local-storage-warning) during checkpointing for multi-node clusters.

For more details, see [Configuring Persistent Storage](https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html#persistent-storage-guide).
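As the comments in the `RunConfig` examples above indicate, run results land under `storage_path` joined with `name`. A minimal, illustrative sketch of that composition (the actual directory layout is managed by Ray Train internally; this helper is only for reasoning about where results go):

```python
def run_directory(storage_path: str, name: str) -> str:
    """Illustrative only: a run's results land under storage_path/name.

    A plain string join is used (not os.path.join) so URI-style paths
    such as s3://bucket are handled the same way as filesystem paths.
    """
    return storage_path.rstrip("/") + "/" + name


print(run_directory("/mnt/nfs", "unique_run_name"))     # /mnt/nfs/unique_run_name
print(run_directory("s3://bucket", "unique_run_name"))  # s3://bucket/unique_run_name
```

Choosing a unique `name` per run keeps checkpoints from different runs from colliding under the same `storage_path`.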
## Launch a training job[\#](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#launch-a-training-job "Link to this heading") Tying this all together, you can now launch a distributed training job with a [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer "ray.train.torch.TorchTrainer"). ``` from ray.train.torch import TorchTrainer trainer = TorchTrainer( train_func, scaling_config=scaling_config, run_config=run_config ) result = trainer.fit() ``` ## Access training results[\#](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#access-training-results "Link to this heading") After training completes, a [`Result`](https://docs.ray.io/en/latest/train/api/doc/ray.train.Result.html#ray.train.Result "ray.train.Result") object is returned which contains information about the training run, including the metrics and checkpoints reported during training. ``` result.metrics # The metrics reported during training. result.checkpoint # The latest checkpoint reported during training. result.path # The path where logs are stored. result.error # The exception that was raised, if training failed. ``` For more usage examples, see [Inspecting Training Results](https://docs.ray.io/en/latest/train/user-guides/results.html#train-inspect-results). ## Next steps[\#](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#next-steps "Link to this heading") After you have converted your PyTorch training script to use Ray Train: - See [User Guides](https://docs.ray.io/en/latest/train/user-guides.html#train-user-guides) to learn more about how to perform specific tasks. - Browse the [Examples](https://docs.ray.io/en/latest/train/examples.html) for end-to-end examples of how to use Ray Train. - Dive into the [API Reference](https://docs.ray.io/en/latest/train/api/api.html#train-api) for more details on the classes and methods used in this tutorial. 
[previous Ray Train Overview](https://docs.ray.io/en/latest/train/overview.html "previous page") [next Get Started with Distributed Training using PyTorch Lightning](https://docs.ray.io/en/latest/train/getting-started-pytorch-lightning.html "next page") On this page - [Quickstart](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#quickstart) - [Set up a training function](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#set-up-a-training-function) - [Set up a model](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#set-up-a-model) - [Set up a dataset](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#set-up-a-dataset) - [Report checkpoints and metrics](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#report-checkpoints-and-metrics) - [Configure scale and GPUs](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#configure-scale-and-gpus) - [Configure persistent storage](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#configure-persistent-storage) - [Launch a training job](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#launch-a-training-job) - [Access training results](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#access-training-results) - [Next steps](https://docs.ray.io/en/latest/train/getting-started-pytorch.html#next-steps) [Edit on GitHub](https://github.com/ray-project/ray/edit/master/doc/source/train/getting-started-pytorch.rst) Thanks for the feedback\! Was this helpful? Yes No Feedback Submit © Copyright 2026, The Ray Team. Created using [Sphinx](https://www.sphinx-doc.org/) 7.3.7. Built with the [PyData Sphinx Theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/index.html) 0.14.1.
Readable Markdown
This tutorial walks through the process of converting an existing PyTorch script to use Ray Train. Learn how to:

1. Configure a model to run distributed and on the correct CPU/GPU device.
2. Configure a dataloader to shard data across the [workers](https://docs.ray.io/en/latest/train/overview.html#train-overview-worker) and place data on the correct CPU or GPU device.
3. Configure a [training function](https://docs.ray.io/en/latest/train/overview.html#train-overview-training-function) to report metrics and save checkpoints.
4. Configure [scaling](https://docs.ray.io/en/latest/train/overview.html#train-overview-scaling-config) and CPU or GPU resource requirements for a training job.
5. Launch a distributed training job with the [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer "ray.train.torch.TorchTrainer") class.

## Quickstart

For reference, the final code will look something like the following:

```python
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func():
    # Your PyTorch training code here.
    ...

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
```

1. `train_func` is the Python code that executes on each distributed training worker.
2. [`ScalingConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig "ray.train.ScalingConfig") defines the number of distributed training workers and whether to use GPUs.
3. [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer "ray.train.torch.TorchTrainer") launches the distributed training job.

Compare a PyTorch training script with and without Ray Train.
**PyTorch + Ray Train**

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

import ray.train.torch


def train_func():
    # Model, Loss, Optimizer
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    # [1] Prepare model.
    model = ray.train.torch.prepare_model(model)
    # model.to("cuda")  # This is done by `prepare_model`
    criterion = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.001)

    # Data
    transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
    data_dir = os.path.join(tempfile.gettempdir(), "data")
    train_data = FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
    # [2] Prepare dataloader.
    train_loader = ray.train.torch.prepare_data_loader(train_loader)

    # Training
    for epoch in range(10):
        if ray.train.get_context().get_world_size() > 1:
            train_loader.sampler.set_epoch(epoch)

        for images, labels in train_loader:
            # This is done by `prepare_data_loader`!
            # images, labels = images.to("cuda"), labels.to("cuda")
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # [3] Report metrics and checkpoint.
        metrics = {"loss": loss.item(), "epoch": epoch}
        with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
            torch.save(
                model.module.state_dict(),
                os.path.join(temp_checkpoint_dir, "model.pt"),
            )
            ray.train.report(
                metrics,
                checkpoint=ray.train.Checkpoint.from_directory(temp_checkpoint_dir),
            )
        if ray.train.get_context().get_world_rank() == 0:
            print(metrics)


# [4] Configure scaling and resource requirements.
scaling_config = ray.train.ScalingConfig(num_workers=2, use_gpu=True)

# [5] Launch distributed training job.
trainer = ray.train.torch.TorchTrainer(
    train_func,
    scaling_config=scaling_config,
    # [5a] If running in a multi-node cluster, this is where you
    # should configure the run's persistent storage that is accessible
    # across all worker nodes.
    # run_config=ray.train.RunConfig(storage_path="s3://..."),
)
result = trainer.fit()

# [6] Load the trained model.
with result.checkpoint.as_directory() as checkpoint_dir:
    model_state_dict = torch.load(os.path.join(checkpoint_dir, "model.pt"))
    model = resnet18(num_classes=10)
    model.conv1 = torch.nn.Conv2d(
        1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
    )
    model.load_state_dict(model_state_dict)
```

**PyTorch**

```python
import os
import tempfile

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision.models import resnet18
from torchvision.datasets import FashionMNIST
from torchvision.transforms import ToTensor, Normalize, Compose

# Model, Loss, Optimizer
model = resnet18(num_classes=10)
model.conv1 = torch.nn.Conv2d(
    1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
)
model.to("cuda")
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)

# Data
transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
train_data = FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=128, shuffle=True)

# Training
for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to("cuda"), labels.to("cuda")
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    metrics = {"loss": loss.item(), "epoch": epoch}
    checkpoint_dir = tempfile.mkdtemp()
    checkpoint_path = os.path.join(checkpoint_dir, "model.pt")
    torch.save(model.state_dict(), checkpoint_path)
    print(metrics)
```

## Set up a training function

First, update your training code to support distributed training. Begin by wrapping your code in a [training function](https://docs.ray.io/en/latest/train/overview.html#train-overview-training-function):

```python
def train_func():
    # Your model training code here.
    ...
```

Each distributed training worker executes this function.

You can also specify the input argument for `train_func` as a dictionary via the Trainer's `train_loop_config`. For example:

```python
def train_func(config):
    lr = config["lr"]
    num_epochs = config["num_epochs"]

config = {"lr": 1e-4, "num_epochs": 10}
trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

**Warning:** Avoid passing large data objects through `train_loop_config` to reduce the serialization and deserialization overhead. Instead, initialize large objects (e.g. datasets, models) directly in `train_func`.

```diff
 def load_dataset():
     # Return a large in-memory dataset
     ...

 def load_model():
     # Return a large in-memory model instance
     ...

-config = {"data": load_dataset(), "model": load_model()}

 def train_func(config):
-    data = config["data"]
-    model = config["model"]
+    data = load_dataset()
+    model = load_model()
     ...

 trainer = ray.train.torch.TorchTrainer(train_func, train_loop_config=config, ...)
```

### Set up a model

Use the [`ray.train.torch.prepare_model()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_model.html#ray.train.torch.prepare_model "ray.train.torch.prepare_model") utility function to:

1. Move your model to the correct device.
2. Wrap it in `DistributedDataParallel`.

```diff
-from torch.nn.parallel import DistributedDataParallel
+import ray.train.torch

 def train_func():
     ...
     # Create model.
     model = ...

     # Set up distributed training and device placement.
-    device_id = ...  # Your logic to get the right device.
-    model = model.to(device_id or "cpu")
-    model = DistributedDataParallel(model, device_ids=[device_id])
+    model = ray.train.torch.prepare_model(model)
     ...
```

### Set up a dataset

Use the [`ray.train.torch.prepare_data_loader()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_data_loader.html#ray.train.torch.prepare_data_loader "ray.train.torch.prepare_data_loader") utility function, which:

1. Adds a [`DistributedSampler`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler "(in PyTorch v2.10)") to your [`DataLoader`](https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader "(in PyTorch v2.10)").
2. Moves the batches to the right device.

Note that this step isn't necessary if you're passing in Ray Data to your Trainer. See [Data Loading and Preprocessing](https://docs.ray.io/en/latest/train/user-guides/data-loading-preprocessing.html#data-ingest-torch).

```diff
 from torch.utils.data import DataLoader
+import ray.train.torch

 def train_func():
     ...
     dataset = ...

     data_loader = DataLoader(dataset, batch_size=worker_batch_size, shuffle=True)
+    data_loader = ray.train.torch.prepare_data_loader(data_loader)

     for epoch in range(10):
+        if ray.train.get_context().get_world_size() > 1:
+            data_loader.sampler.set_epoch(epoch)

         for X, y in data_loader:
-            X = X.to(device)
-            y = y.to(device)
             ...
```

**Tip:** Keep in mind that `DataLoader` takes in a `batch_size` which is the batch size for each worker.
The global batch size can be calculated from the worker batch size (and vice versa) with the following equation:

```python
global_batch_size = worker_batch_size * ray.train.get_context().get_world_size()
```

**Note:** If you already manually set up your `DataLoader` with a `DistributedSampler`, [`prepare_data_loader()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.prepare_data_loader.html#ray.train.torch.prepare_data_loader "ray.train.torch.prepare_data_loader") will not add another one, and will respect the configuration of the existing sampler.

### Report checkpoints and metrics

To monitor progress, you can report intermediate metrics and checkpoints using the [`ray.train.report()`](https://docs.ray.io/en/latest/train/api/doc/ray.train.report.html#ray.train.report "ray.train.report") utility function.

```diff
+import os
+import tempfile
+import ray.train

 def train_func():
     ...
     with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
         torch.save(
             model.state_dict(), os.path.join(temp_checkpoint_dir, "model.pt")
         )

+        metrics = {"loss": loss.item()}  # Training/validation metrics.

         # Build a Ray Train checkpoint from a directory
+        checkpoint = ray.train.Checkpoint.from_directory(temp_checkpoint_dir)

         # Ray Train will automatically save the checkpoint to persistent storage,
         # so the local `temp_checkpoint_dir` can be safely cleaned up after.
+        ray.train.report(metrics=metrics, checkpoint=checkpoint)

     ...
```

For more details, see [Monitoring and Logging Metrics](https://docs.ray.io/en/latest/train/user-guides/monitoring-logging.html#train-monitoring-and-logging) and [Saving and Loading Checkpoints](https://docs.ray.io/en/latest/train/user-guides/checkpoints.html#train-checkpointing).
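The worker/global batch size relationship described in the tip above is plain arithmetic, so you can sanity-check your settings without Ray installed at all. A quick sketch with hypothetical values, where `num_workers` stands in for `ray.train.get_context().get_world_size()`:

```python
# Hypothetical values for illustration.
num_workers = 4          # stand-in for ray.train.get_context().get_world_size()
worker_batch_size = 128  # the batch_size passed to each worker's DataLoader

# Global batch size processed per optimizer step across all workers.
global_batch_size = worker_batch_size * num_workers
print(global_batch_size)  # 512

# Going the other way: derive the per-worker batch size from a target
# global batch size (assumes it divides evenly across workers).
target_global_batch_size = 1024
per_worker_batch_size = target_global_batch_size // num_workers
print(per_worker_batch_size)  # 256
```

This matters when scaling out: if you keep `batch_size=128` in the `DataLoader` and grow from 2 to 8 workers, the effective global batch size quadruples, which may call for adjusting the learning rate.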
## Configure scale and GPUs

Outside of your training function, create a [`ScalingConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig "ray.train.ScalingConfig") object to configure:

1. [`num_workers`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig "ray.train.ScalingConfig") - The number of distributed training worker processes.
2. [`use_gpu`](https://docs.ray.io/en/latest/train/api/doc/ray.train.ScalingConfig.html#ray.train.ScalingConfig "ray.train.ScalingConfig") - Whether each worker should use a GPU (or CPU).

```python
from ray.train import ScalingConfig

scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
```

For more details, see [Configuring Scale and GPUs](https://docs.ray.io/en/latest/train/user-guides/using-gpus.html#train-scaling-config).

## Configure persistent storage

Create a [`RunConfig`](https://docs.ray.io/en/latest/train/api/doc/ray.train.RunConfig.html#ray.train.RunConfig "ray.train.RunConfig") object to specify the path where results (including checkpoints and artifacts) will be saved.
```python
from ray.train import RunConfig

# Local path (/some/local/path/unique_run_name)
run_config = RunConfig(storage_path="/some/local/path", name="unique_run_name")

# Shared cloud storage URI (s3://bucket/unique_run_name)
run_config = RunConfig(storage_path="s3://bucket", name="unique_run_name")

# Shared NFS path (/mnt/nfs/unique_run_name)
run_config = RunConfig(storage_path="/mnt/nfs", name="unique_run_name")
```

**Warning:** Specifying a *shared storage location* (such as cloud storage or NFS) is *optional* for single-node clusters, but it is **required for multi-node clusters.** Using a local path will [raise an error](https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html#multinode-local-storage-warning) during checkpointing for multi-node clusters.

For more details, see [Configuring Persistent Storage](https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html#persistent-storage-guide).

## Launch a training job

Tying this all together, you can now launch a distributed training job with a [`TorchTrainer`](https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer "ray.train.torch.TorchTrainer").

```python
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_func, scaling_config=scaling_config, run_config=run_config
)
result = trainer.fit()
```

## Access training results

After training completes, a [`Result`](https://docs.ray.io/en/latest/train/api/doc/ray.train.Result.html#ray.train.Result "ray.train.Result") object is returned which contains information about the training run, including the metrics and checkpoints reported during training.

```python
result.metrics     # The metrics reported during training.
result.checkpoint  # The latest checkpoint reported during training.
result.path        # The path where logs are stored.
result.error       # The exception that was raised, if training failed.
```

For more usage examples, see [Inspecting Training Results](https://docs.ray.io/en/latest/train/user-guides/results.html#train-inspect-results).

## Next steps

After you have converted your PyTorch training script to use Ray Train:

- See [User Guides](https://docs.ray.io/en/latest/train/user-guides.html#train-user-guides) to learn more about how to perform specific tasks.
- Browse the [Examples](https://docs.ray.io/en/latest/train/examples.html) for end-to-end examples of how to use Ray Train.
- Dive into the [API Reference](https://docs.ray.io/en/latest/train/api/api.html#train-api) for more details on the classes and methods used in this tutorial.
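Before moving on, a minimal sketch of a CPU-only smoke test can help verify your setup end to end on a single machine. This assumes `ray[train]` is installed locally; `use_gpu=False` means no GPUs are required, and the training function reports a single dummy metric instead of running real training:

```python
import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func():
    # Substitute real PyTorch training here; this only reports one metric
    # so you can confirm workers launch and results flow back.
    ray.train.report(metrics={"loss": 0.0})


trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
print(result.metrics)
```

If this runs cleanly, the same skeleton scales to multi-GPU or multi-node training by changing only the `ScalingConfig` (and adding a shared `storage_path` via `RunConfig` for multi-node clusters, as described above).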