Install from Source

  1. Install dependencies

    pip install -r requirements.txt
  2. Configure the environment

    • Create a .env file in the project root directory by copying the example file:

      cp .env.example .env
    • Set the following environment variables:

      # Synthesizer is the model used to construct the knowledge graph (KG) and generate data
      SYNTHESIZER_MODEL=your_synthesizer_model_name
      SYNTHESIZER_BASE_URL=your_base_url_for_synthesizer_model
      SYNTHESIZER_API_KEY=your_api_key_for_synthesizer_model
      # Trainee is the model to be trained on the generated data
      TRAINEE_MODEL=your_trainee_model_name
      TRAINEE_BASE_URL=your_base_url_for_trainee_model
      TRAINEE_API_KEY=your_api_key_for_trainee_model
  3. (Optional) To change the default generation settings, edit configs/graphgen_config.yaml.

    # configs/graphgen_config.yaml
    # Example configuration
    data_type: "raw"
    input_file: "resources/examples/raw_demo.jsonl"
    # more configurations...
  4. Run the generation script

    bash scripts/generate.sh
  5. Get the generated data

    ls cache/data/graphgen
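
Before running the generation script, it can help to confirm that the .env file from step 2 is fully filled in. The following is a minimal sketch, not part of the project itself: the helper names parse_env and unset_vars are hypothetical, the parser handles only simple KEY=VALUE lines, and it assumes that any value still starting with "your_" is an unedited placeholder from the example above.

```python
from pathlib import Path

# The six variable names listed in the configuration step above.
REQUIRED_VARS = [
    "SYNTHESIZER_MODEL",
    "SYNTHESIZER_BASE_URL",
    "SYNTHESIZER_API_KEY",
    "TRAINEE_MODEL",
    "TRAINEE_BASE_URL",
    "TRAINEE_API_KEY",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unset_vars(text):
    """Return required variables that are missing, empty, or still placeholders."""
    values = parse_env(text)
    # Assumption: values beginning with "your_" are unedited placeholders.
    return [
        name for name in REQUIRED_VARS
        if not values.get(name) or values[name].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    # A missing .env file means every required variable is still unset.
    problems = unset_vars(env_path.read_text()) if env_path.exists() else REQUIRED_VARS
    if problems:
        print("Still to configure:", ", ".join(problems))
    else:
        print("All required variables are set.")
```

Run from the project root, the script lists any variables that still need a value. It only checks that values are present; it does not verify that the model names, base URLs, or API keys are valid.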
