ML & DA Guided Tour
AI Data Analytics Assistant
Transform your data into insights with AI-powered analysis.
Simply describe your goals in plain English, and watch our intelligent assistant
generate code, create stunning visualizations, and uncover hidden patterns.
Upload any dataset and get instant results—no coding expertise needed.
✨ Plus: Advanced AutoML for predictive modeling and machine learning automation!
🗣️Natural Language Commands
🛠️Auto Task Pipeline
🐍Python Code Generator
📊Interactive Visualizations
🤖AutoML Machine Learning Integration
Natural Language Commands
Just describe your request in plain English—like:
"Analyze the data, create visualizations, and calculate the means of the Sepal Length and Sepal Width columns."
Our assistant instantly understands your intent, breaks it into an analysis plan, and generates the necessary code, charts, and insights—no technical skills needed.
✅ From simple summaries to custom stats and visualizations
✅ Works with any tabular data
✅ One click to run your full analysis pipeline
How It Works
💬
1. Describe the Task
Type your request in plain English. No code, no jargon—just your question or goal.
2. Agent Works
The AI agent instantly plans the analysis, writes code, visualizes the data, and trains models behind the scenes.
🎉
3. Get Data Insights
See insights, charts, model predictions, and ready-to-use code with no manual work.
Real-World Examples
Marketing
"Analyze email campaign open rates by region and suggest the best time to send newsletters."
Finance
"Generate a monthly expense report and flag transactions above $5,000 for review."
Operations
"Monitor daily production output and alert if downtime exceeds 10 minutes."
Model Training
"Train a model to predict customer churn and display the top 3 factors influencing predictions."
Sales
"Summarize quarterly sales by product and highlight the best-performing region."
HR Analytics
"Analyze employee turnover trends and visualize average tenure by department."

AutoML: Quick Start for Developers

Step 1: 🔑 Get API Key

1. Sign up for a free VecML account.

2. After logging in, generate an API key. Save the key somewhere safe; it is shown only once, when created.

You can manage your API keys in the account center, accessible from the top-right panel.

Step 2: 🏗️ Create Project & Collection

Create a project called "AutoML-Demo" and initialize the training data collection "training_data" within the project, with "dense" vector type and dimension 64. We will use Python as the example language.

import requests
import json
import numpy as np
import time

API_KEY = "replace_this_with_your_api_key"
BASE_URL = "https://db.vecml.com/api"

def make_request(endpoint, data):
    """Helper function to make API calls"""
    url = f"{BASE_URL}/{endpoint}"
    response = requests.post(url, json=data)
    print(f"Request to {endpoint}: HTTP {response.status_code}")

    if response.text:
        try:
            json_response = response.json()
            print(f"Response: {json_response}")
            return response.status_code, json_response
        except requests.exceptions.JSONDecodeError:
            print(f"Response: {response.text}")
            return response.status_code, {"error": "Not JSON", "message": response.text}
    else:
        print("Response: Empty")
        return response.status_code, None

# 1. Create a project
project_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "application": "Machine Learning"}
status, response = make_request("create_project", project_data)

# 2. Initialize training dataset
init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
             "vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", init_data)
Step 3: 📤 Insert Data

Upload your vector embeddings efficiently in batches with the "/add_data_batch" endpoint. The endpoint is asynchronous: the server responds with a job ID. Call the "/get_upload_data_status" endpoint to track the job's status.

Here we simulate a binary classification dataset with random vectors and a categorical feature "category". When uploading the data, include all categorical features and the prediction target "label" as vector attributes.

VecML also supports other data insertion methods, such as uploading files. See the full documentation for AutoML API usage instructions.

def wait_for_job_completion(job_id, status_endpoint, max_wait_time=60):
    """Wait for an async job to complete"""
    start_time = time.time()

    while True:
        status_data = {"user_api_key": API_KEY, "job_id": job_id}
        status, status_response = make_request(status_endpoint, status_data)

        if status_response and status_response.get("status") == "finished":
            return True
        elif status_response and status_response.get("status") == "failed":
            return False

        if time.time() - start_time > max_wait_time:
            return False

        time.sleep(2)

def generate_dataset(num_samples, vector_dim, id_prefix, seed=2025):
    """Generate dataset with linear decision boundary"""
    np.random.seed(seed)
    vectors = np.random.randn(num_samples, vector_dim).tolist()
    categories = [np.random.choice(['A', 'B', 'C']) for _ in range(num_samples)]

    labels = []
    for vec, category in zip(vectors, categories):
        # Linear combination of first few components plus category weight
        score = sum(vec[:20]) + {'A': 1.0, 'B': -0.5, 'C': 0.0}[category]
        label = '1' if score > 0 else '0'
        labels.append(label)

    # Generate IDs and attributes
    ids = [f"{id_prefix}_{i:03d}" for i in range(num_samples)]
    attributes = [{"label": str(label), "category": category} for label, category in zip(labels, categories)]

    return vectors, ids, attributes
          
# 3. Generate and add training data using add_data_batch
vectors, ids, attributes = generate_dataset(num_samples=1000, vector_dim=64, id_prefix="train", seed=2025)

# Add training data in batch
batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "training_data",
              "string_ids": ids, "data": vectors, "attributes": attributes}
status, response = make_request("add_data_batch", batch_data)
train_upload_job_id = response["job_id"]

# Wait for training data upload to complete
if not wait_for_job_completion(train_upload_job_id, "get_upload_data_status", max_wait_time=30):
    exit(1)
Step 4: 🧮 Train AutoML Model

Train an AutoML model on a data collection, specifying the categorical features and the target label attribute.

Model training is asynchronous: call "/get_automl_training_status" to check the job status.

# 4. Train AutoML model
train_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "dataset_name": "training_data",
              "model_name": "model1", "training_mode": "high_speed", "task_type": "classification",
              "label_attribute": "label", "categorical_features": ["category"]}
status, response = make_request("train_automl_model", train_data)
train_job_id = response["job_id"]

# Wait for training to complete
if not wait_for_job_completion(train_job_id, "get_automl_training_status", max_wait_time=60):
    exit(1)
Step 5: 🔮 Model Prediction

After the model is trained, generate predictions for a test dataset. You can either predict on an existing data collection in the project or upload a data file ad hoc for prediction. In this example, we create a new data collection to serve as the prediction dataset.

# 5. Initialize prediction dataset
pred_init_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
                  "vector_type": "dense", "vector_dim": 64}
status, response = make_request("init", pred_init_data)

# 6. Generate and add prediction data
prediction_vectors, prediction_ids, prediction_attributes = generate_dataset(num_samples=100, vector_dim=64, id_prefix="pred", seed=2026)

# Add prediction data in batch
pred_batch_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "collection_name": "prediction_data",
                   "string_ids": prediction_ids, "data": prediction_vectors, "attributes": prediction_attributes}
status, response = make_request("add_data_batch", pred_batch_data)
pred_upload_job_id = response["job_id"]

# Wait for prediction data upload to complete
if not wait_for_job_completion(pred_upload_job_id, "get_upload_data_status"):
    exit(1)

# 7. Make predictions using the existing dataset
predict_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo", "dataset_name": "training_data",
                "model_name": "model1", "prediction_dataset": "prediction_data"}
status, prediction_results = make_request("automl_predict", predict_data)

🎉 Congratulations! You're all set to use VecML!

Pro Tips for Getting Started

Here are some tips to help you make the most of VecML Database Cloud:

  • Choose the right distance/similarity type. For normalized vectors (norm equal to 1), Euclidean distance, cosine similarity, and inner product give the same nearest-neighbor ranking; inner product is slightly more efficient for index build and query. A quick check follows this list.
  • Use meaningful metadata: Adding metadata to your vectors enables powerful filtering capabilities.
  • Start with sample datasets: Explore our sample datasets to understand how VecML works before uploading your own data.
  • Monitor performance: Keep an eye on search latency and resource usage to optimize your configuration.
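
A quick numpy check of that equivalence: for unit-norm vectors, cosine similarity equals the inner product exactly, and squared Euclidean distance is 2 - 2 * (inner product), a monotone transform, so all three rank neighbors identically. This is plain numpy, with nothing VecML-specific assumed.

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
x /= np.linalg.norm(x)  # normalize to unit norm
y /= np.linalg.norm(y)

ip = x @ y                                          # inner product
cos = ip / (np.linalg.norm(x) * np.linalg.norm(y))  # cosine similarity
sq_euclid = np.sum((x - y) ** 2)                    # squared Euclidean distance

assert np.isclose(cos, ip)                # cosine == inner product for unit vectors
assert np.isclose(sq_euclid, 2 - 2 * ip)  # Euclidean is a monotone function of ip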

Common Workflows

Here are some typical workflows for different use cases with VecML:

Text Semantic Search

Build a powerful semantic search engine for documents, articles, or product descriptions; a minimal code sketch follows the steps below.

  1. Generate text embeddings using models like OpenAI's text-embedding-ada-002
  2. Upload embeddings to VecML with relevant metadata
  3. Create a search index with cosine similarity
  4. Implement search by embedding query text and searching the vector database
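
A minimal sketch of steps 1 and 4, reusing the make_request helper and API_KEY from the quick start above. The OpenAI embedding call is real; the "search" endpoint name, the "documents" collection, and the request fields are illustrative assumptions rather than documented VecML API, so check the API reference for the actual search call.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def embed(text):
    """Embed query text with OpenAI's text-embedding-ada-002."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding

# Hypothetical search request -- endpoint name and fields are illustrative only
query_vector = embed("articles about vector databases")
search_data = {"user_api_key": API_KEY, "project_name": "AutoML-Demo",
               "collection_name": "documents", "query_vector": query_vector,
               "top_k": 5}
status, results = make_request("search", search_data)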

Image Similarity Search

Find similar images based on visual features and content; a short embedding-and-upload sketch follows the steps below.

  1. Generate image embeddings using a vision model
  2. Upload embeddings to VecML with image metadata
  3. Create an index optimized for image similarity
  4. Search using query image embeddings to find visually similar content
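
A sketch of steps 1 and 2 using CLIP image embeddings from sentence-transformers (one plausible model choice, not a VecML requirement). The upload reuses the documented /add_data_batch endpoint and the make_request helper; the "images" collection is assumed to have been initialized with vector_dim 512, as in quick-start step 2.

from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # produces 512-dim image embeddings
paths = ["cat.jpg", "dog.jpg"]                # your image files
embeddings = model.encode([Image.open(p) for p in paths]).tolist()

# Upload with the documented batch endpoint; file paths double as IDs and metadata
batch = {"user_api_key": API_KEY, "project_name": "AutoML-Demo",
         "collection_name": "images", "string_ids": paths,
         "data": embeddings, "attributes": [{"path": p} for p in paths]}
status, response = make_request("add_data_batch", batch)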

Recommendation Systems

Build personalized recommendation engines based on user behavior and item similarity; a small preference-vector sketch follows the steps below.

  1. Create embeddings for users and items
  2. Store both in VecML with appropriate metadata
  3. Query similar items based on user preferences
  4. Implement hybrid filtering using metadata and vector similarity
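
One way to sketch steps 1 and 3: represent a user as the normalized mean of the item embeddings they interacted with, then query for nearby items. The "search" endpoint is the same illustrative assumption as in the semantic-search sketch; the preference-vector computation is the point here.

import numpy as np

rng = np.random.default_rng(1)
interacted = rng.standard_normal((5, 64))  # placeholder for real item embeddings

# Simple preference vector: mean of the user's items, re-normalized to unit length
user_vector = interacted.mean(axis=0)
user_vector /= np.linalg.norm(user_vector)

query = {"user_api_key": API_KEY, "project_name": "AutoML-Demo",
         "collection_name": "items", "query_vector": user_vector.tolist(),
         "top_k": 10}  # hypothetical search call, as above
status, recommendations = make_request("search", query)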

Anomaly Detection

Identify unusual patterns or outliers in your data using vector embeddings; a thresholding sketch follows the steps below.

  1. Create embeddings for normal behavior patterns
  2. Upload to VecML and build appropriate indexes
  3. Check new data points against known patterns
  4. Flag data points with low similarity scores as potential anomalies
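
A sketch of steps 3 and 4 under the same assumption of an illustrative "search" endpoint, here imagined to return similarity scores for the nearest known-normal patterns. The response shape is made up for the example; the thresholding logic is the technique itself.

THRESHOLD = 0.7  # tune on held-out normal data

def is_anomalous(embedding):
    """Flag a point whose best match among known-normal patterns is weak."""
    query = {"user_api_key": API_KEY, "project_name": "AutoML-Demo",
             "collection_name": "normal_patterns", "query_vector": embedding,
             "top_k": 1}  # hypothetical search call, as above
    status, results = make_request("search", query)
    top_score = results["matches"][0]["score"]  # assumed response shape
    return top_score < THRESHOLD  # weak similarity to all normal patterns => anomaly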

Next Steps

Once you've set up your first vector database, explore these additional features:

  • AutoML: Train machine learning models directly on your vector data
  • API Integration: Connect VecML to your applications using our REST API
  • Advanced Filtering: Combine vector similarity with metadata filtering for precise results
  • Batch Operations: Optimize performance with batch vector operations
  • VecML SDK: Use our client libraries for seamless integration with your development environment

Check out our comprehensive documentation for detailed guides on these topics and more.

Data Format and Sample Datasets

VecML Database supports a variety of data formats for vector storage and retrieval. Below you'll find sample datasets to help you get started with testing and exploring the capabilities of our vector database.

Quick Start Tip

Download one of our sample datasets and upload it directly to your VecML project to start experimenting with VecML's efficient and accurate AutoML platform within minutes.

CoverType (CSV Data Matrix Format)

A benchmark machine learning dataset from the UCI repository; dim = 54, # of classes: 7. Label column: "Cover_Type"

covertype_train.csv (67.6MB, 531,012 samples)   |   covertype_test.csv (6.4MB, 50,000 samples)

svmguide3 (LIBSVM Sparse Format)

A small LIBSVM sparse-format dataset from the LIBSVM website; dim = 21, # of classes: 2. Each line starts with the label (an integer).

svmguide3_train.svm (300KB, 1,243 samples)   |   svmguide3_test.svm (11KB, 41 samples)

Sample Dataset Use Cases for Machine Learning

CoverType

Ideal for: Classification, AutoML model training

The CoverType dataset contains cartographic variables used to predict forest cover type. With 54 features and 7 classes, it's an excellent dataset for testing VecML's AutoML capabilities.

Quick start:

  1. Download the covertype_train.csv sample
  2. Create a new dataset in AutoML mode
  3. Specify "Cover_Type" as the label column
  4. Train a classification model using VecML AutoML
  5. Evaluate the model using covertype_test.csv

Dataset Import Guidelines

Follow these guidelines to ensure your dataset can be successfully imported and parsed by VecML Database:

Format | Required Item                   | Optional Parameters        | Best For
JSON   | Whether containing field names  | ID/vector/attribute fields | Datasets with rich metadata
Binary | Dimensionality, data type       | None                       | Large datasets, performance-critical applications
CSV    | Whether containing column names | ID/attribute columns       | Tabular data, analytics workflows
LIBSVM | Proper key:value format         | None                       | Sparse datasets, classification tasks

Supported Data Formats

VecML Database supports multiple data formats to accommodate different use cases and data sources. Below are detailed descriptions of each format and how to use them effectively.

JSON Format

JSON (JavaScript Object Notation) format is ideal for storing vector data along with metadata. Each vector is represented as a JSON object with fields for the vector values and additional metadata.

Key features:

  • Human-readable format
  • Supports metadata for vectors. Currently, only flat metadata structures are supported; nested structures are not. See the example below.
  • Easy to process with standard libraries

Example JSON with field names:

[ { "_id": "1", // unique string_id "openai": [0.123, -0.456, 0.789, ...], // Vector data "date": "2023-05-15", // attributes "text": "This is example text1." }, { "_id": "2", "openai": [0.321, -0.123, 0.978, ...], "date": "2023-04-20", "source": "wikipedia" // different vectors may have different metadata fields } ]

Example JSON with NO field names:

[
  [0.123, -0.456, 0.789, ...],
  [0.321, -0.654, 0.987, ...],
  [0.213, -0.546, 0.879, ...]
]

Import tips:

When importing JSON data that contains field names, specify the "vector data field" (e.g., "openai") and optionally the "ID field" (e.g., "_id"). You can select any additional fields, which will be stored as metadata usable for filtering and retrieval.

Binary Format

Binary format is efficient for storing large vector datasets and is available in different numerical types. Currently we support Float32 and UInt8 data types. For Float32, every consecutive 4 bytes is read as one Float32 value; for UInt8, every byte is read as one unsigned 8-bit integer. This format stores raw vector values without metadata, making it compact, fast to process, and suitable for large-scale applications.

Key features:

  • Space-efficient storage
  • Fast loading and processing
  • Ideal for large datasets

Import tips:

When importing binary data, you need to specify the following (a small numpy read/write sketch follows this list):

  • Vector dimensionality (e.g., 1024 for personahub, 784 for MNIST)
  • Numerical type (Float32, UInt8, etc.)
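
Assuming Float32 and a known dimension, here is a minimal numpy round trip matching the layout described above: raw values, no header, no metadata, with the dimensionality supplied at read time. The file name and sizes are arbitrary for the example.

import numpy as np

dim = 64
vectors = np.random.randn(1000, dim).astype(np.float32)

# Write raw Float32: each consecutive 4 bytes is one value, rows stored back to back
vectors.tofile("vectors.f32.bin")

# Read it back: the flat byte stream plus the known dimensionality recovers the matrix
restored = np.fromfile("vectors.f32.bin", dtype=np.float32).reshape(-1, dim)
assert np.array_equal(vectors, restored)
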
CSV Data Matrix Format

CSV (Comma-Separated Values) format is widely used for tabular data and is supported by many tools and platforms. CSV files can be used to store vector data where each row represents a vector and its associated attributes and/or metadata.

Key features:

  • Compatible with spreadsheet software and data analysis tools
  • Easy to generate and edit
  • Good for datasets with consistent schema
id,feature1,feature2,feature3,...,Cover_Type
1,2730,58,2,0,0,0,...,5
2,2790,55,3,0,0,0,...,2
...

Import tips:

When importing CSV data:

  • The CSV file may or may not contain headers (column names). Based on the data preview, specify whether the data has headers or not.
  • Specify all the attribute/metadata columns. For training machine learning models, include the class label (for classification) or the response (for regression) among the attributes; a short pandas sketch follows.
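
For illustration, a pandas snippet that writes a CSV in this layout, with an optional ID column and the label stored as an ordinary column to be marked as an attribute on import (pandas and the column names are arbitrary choices for the example):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 54),
                  columns=[f"feature{i}" for i in range(54)])
df.insert(0, "id", range(1, len(df) + 1))             # optional ID column
df["Cover_Type"] = np.random.randint(1, 8, len(df))   # label column
df.to_csv("my_dataset.csv", index=False)              # header row included
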
LIBSVM Sparse Format

LIBSVM format is designed for sparse datasets where most feature values are zero. It uses an efficient key:value representation that only stores non-zero values.

Key features:

  • Extremely efficient for sparse and high-dimensional data
  • Commonly used in machine learning applications
  • Reduces storage and processing requirements
# Format: label index1:value1 index2:value2 ...
1 1:0 3:0.1 5:0.2 9:0.5
-1 2:0.3 4:0.4 7:0.8

Import tips:

When importing LIBSVM data, if the first value on a line is a single integer, it is treated as the label. If the line starts directly with a key:value pair, no label is read for that line.
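
If your features live in a scipy sparse matrix, scikit-learn can write this format directly. A quick sketch (dump_svmlight_file is a real scikit-learn function; the file name and shapes are arbitrary):

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.datasets import dump_svmlight_file

X = sparse_random(100, 21, density=0.3, format="csr", random_state=0)  # sparse features
y = np.random.choice([-1, 1], size=100)                                # integer labels

# Writes "label index:value ..." lines; only non-zero entries are stored
dump_svmlight_file(X, y, "my_dataset.svm", zero_based=False)  # 1-based indices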

Important Notice

This is a test environment of the VecML Cloud Service. All data, including user accounts, chat histories, and uploaded files, may be deleted frequently and without prior notice.