HuggingFace Datasets Import Functionality for the Civitai On-Site LoRA Trainer

Currently, importing a 700 MB zip file of textures for a diffuse-map LoRA into the Civitai On-Site LoRA Trainer takes roughly 10 to 20 failed upload attempts. This is needlessly cumbersome, since the same data is already hosted in a standardized format on HuggingFace's servers, along with many other datasets.

To streamline imports, the trainer could provide a simple field for entering a dataset directly: by name (such as “alastandy/Diffuse_Map_Surfaces”), by its URL, or by its DOI (10.57967/hf/3756). Users could then select the dataset they want without uploading files manually.
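As a rough sketch of how such a field might interpret its input, the parser below accepts the three forms mentioned above and normalizes them. The function name `parse_dataset_reference` and the URL pattern it expects (`https://huggingface.co/datasets/<owner>/<name>`) are assumptions for illustration, not part of any existing Civitai or HuggingFace API. A DOI is only recognized here, not resolved; turning it into a repository would presumably require following the https://doi.org/ redirect, which is not shown.

```python
import re
from typing import Tuple

# Hypothetical parser for the proposed "import from HuggingFace" field.
# Bare repo ID, e.g. "owner/name"
_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")
# DOI, optionally prefixed with "doi:" or a doi.org URL
_DOI = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
# Dataset page URL, optionally followed by a sub-path such as /tree/main
_HF_URL = re.compile(r"^https?://(?:www\.)?huggingface\.co/datasets/([\w.-]+/[\w.-]+)(?:/.*)?$")


def parse_dataset_reference(text: str) -> Tuple[str, str]:
    """Return ("repo", "owner/name") or ("doi", "10.xxxx/..."); raise ValueError otherwise."""
    text = text.strip()
    doi = _DOI.match(text)
    if doi:
        return ("doi", doi.group(1))  # Still needs resolving to a repo (not done here)
    url = _HF_URL.match(text)
    if url:
        return ("repo", url.group(1))
    if _REPO_ID.match(text):
        return ("repo", text)
    raise ValueError(f"Unrecognized dataset reference: {text!r}")
```

For example, both “alastandy/Diffuse_Map_Surfaces” and its huggingface.co/datasets URL normalize to the same repo ID, so the rest of the import logic only has to handle one form.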

Implementing this should be relatively straightforward with the existing Python datasets package (https://github.com/huggingface/datasets). Alternatively, the data could be copied directly from the DuckDB or Parquet files that HuggingFace generates automatically, which would be even more efficient.
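A minimal sketch of the download step is shown here. Instead of `datasets.load_dataset`, it uses `huggingface_hub.snapshot_download` from the Hub client library that the datasets package builds on, because that call fetches the repository's original files (the images plus metadata.jsonl) rather than a processed table. The wrapper name `fetch_dataset_files` and the default `dataset` target folder are hypothetical, and a private or gated dataset would additionally need an access token.

```python
from pathlib import Path


def fetch_dataset_files(repo_id: str, local_dir: str = "dataset") -> Path:
    """Download the raw files of a HuggingFace dataset repo into local_dir.

    Hypothetical helper; requires network access and the huggingface_hub package.
    """
    # Imported lazily so this module can be loaded where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # The default repo_type is "model", so this is required
        local_dir=local_dir,  # Mirror the repo's file layout into this folder
    )
    return Path(path)
```

Calling `fetch_dataset_files("alastandy/Diffuse_Map_Surfaces")` would place the dataset's files, including metadata.jsonl, under ./dataset, ready for the caption conversion described next.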

By default, the dataset's captions are stored in the metadata.jsonl file within the dataset directory, whereas the On-Site LoRA Trainer uses a separate caption file for each image. This is the Python script I use to convert the captions to the trainer's per-file format:

```python
import os
import json

# Specify the path to the metadata file
metadata_file = 'metadata.jsonl'

# Create a directory for the text files if it doesn't exist
output_dir = 'captions'
os.makedirs(output_dir, exist_ok=True)

# Read the metadata.jsonl file line by line (one JSON object per line)
with open(metadata_file, 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # Skip blank lines, e.g. a trailing newline at the end of the file

        # Parse the JSON line into a dictionary
        data = json.loads(line)

        # Extract file_name and prompt
        file_name = data.get("file_name", "")
        prompt = data.get("prompt", "")
        if not file_name:
            continue  # No image to pair this caption with

        # Build the .txt name from the image name: drop any directory part
        # and replace the extension (.png, .jpg, ...) with .txt
        base_name = os.path.splitext(os.path.basename(file_name))[0]
        txt_file_name = f"{base_name}.txt"

        # Define the full path for the new .txt file
        txt_file_path = os.path.join(output_dir, txt_file_name)

        # Write the prompt into the .txt file
        with open(txt_file_path, 'w', encoding='utf-8') as txt_file:
            txt_file.write(prompt)

        print(f"Created: {txt_file_path}")
```
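Going one step further, the same conversion could produce the upload archive directly. This is a sketch under the assumption that the trainer accepts a zip with each image at the archive root next to a .txt caption sharing its base name (the layout the script above writes); the function `build_training_zip` and its `caption_key` parameter are hypothetical names.

```python
import json
import os
import zipfile


def build_training_zip(dataset_dir: str, zip_path: str, caption_key: str = "prompt") -> int:
    """Pack images and per-image .txt captions listed in metadata.jsonl into one zip.

    Returns the number of image/caption pairs written.
    """
    metadata_path = os.path.join(dataset_dir, "metadata.jsonl")
    count = 0
    with open(metadata_path, "r", encoding="utf-8") as meta:
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for line in meta:
                line = line.strip()
                if not line:
                    continue  # Skip blank lines
                record = json.loads(line)
                file_name = record.get("file_name")
                if not file_name:
                    continue  # Caption without an image
                image_path = os.path.join(dataset_dir, file_name)
                if not os.path.isfile(image_path):
                    continue  # Listed in metadata but missing on disk
                # Store both files flat at the zip root under one shared base name
                arc_image = os.path.basename(file_name)
                arc_caption = os.path.splitext(arc_image)[0] + ".txt"
                zf.write(image_path, arcname=arc_image)
                zf.writestr(arc_caption, record.get(caption_key, ""))  # str is stored as UTF-8
                count += 1
    return count
```

Entries whose image is missing are skipped rather than aborting the whole archive, so one bad row in metadata.jsonl does not cost another upload attempt.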

Status: Awaiting Dev Review
Board: 💡 Feature Request
Date: About 1 year ago
Author: alastandy