Using I2MC for robust fixation extraction

Eye-tracking

Python

Author

Tommaso Ghilardi

Published

July 19, 2024

Keywords

eye-tracking, I2MC, fixation detection, data analysis, tutorial, python, DevStart, developmental science, eye fixations

What we are going to do

When it comes to eye-tracking data, fixations play a fundamental role. A fixation indicates that a person’s eyes are looking at a particular point of interest for a given amount of time. More specifically, a fixation is a cluster of consecutive data points in an eye-tracking dataset for which a person’s gaze remains relatively still and focused on a particular area or object.

Typically, eye-tracking programs come with their own fixation detection algorithms that give us a rough idea of what the person was looking at. But here’s the problem: these tools aren’t always very good when it comes to data from infants and children. Why? Because infants and children can be all over the place! They move their heads, put their hands (or even feet) in front of their faces, close their eyes, or just look away. All of this turns the data into a big mess, and with so much noise and missing data, standard fixation detection algorithms struggle to tell which data points belong to the same fixation and which belong to different ones.

But don’t worry! We’ve got a solution: I2MC.

I2MC stands for “Identification by Two-Means Clustering”, and it was designed specifically for this kind of problem: it is robust to many kinds of noise, and even to periods of data loss. In this tutorial, we’ll show you how to use I2MC to find fixations. We won’t get into the nerdy stuff about how it works; this is all about the practical side. If you’re curious about the science, you can read the original article (https://link.springer.com/article/10.3758/s13428-016-0822-1).

Now that we’ve introduced I2MC, let’s get our hands dirty and see how to use it!

Install I2MC

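Installing I2MC in Python is extremely easy. As explained in our tutorial on installing Python, just open the Miniconda terminal, activate the environment you want to install I2MC in, and type pip install I2MC. After a few seconds, you should be ready to go!! In this tutorial we also read HDF5 (.h5) files with pandas, which requires the PyTables package; if it is not already in your environment, install it with pip install tables.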
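To double-check that everything is in place, you can run a small check in the same environment. This is a minimal sketch that only uses Python’s standard library (importlib.util.find_spec) to test whether each module can be found; the list of modules is our own choice, based on the imports used in this tutorial plus tables (PyTables), which pd.read_hdf relies on.

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be imported in the active environment."""
    return importlib.util.find_spec(module_name) is not None

# Modules used in this tutorial (import names, which here match the pip names).
# 'tables' is PyTables, which pandas needs to read .h5 files with pd.read_hdf.
required = ["I2MC", "pandas", "numpy", "matplotlib", "tables"]

for module_name in required:
    status = "OK" if is_installed(module_name) else "MISSING (pip install it)"
    print(f"{module_name:<12}{status}")
```

If anything is reported as missing, install it in the same environment before moving on.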

You may need

Note

I2MC was originally written for Matlab. So, for those of you crazy people who would prefer to use Matlab, you can find instructions to download and use I2MC here: I2MC Matlab (https://github.com/royhessels/I2MC)!

Use I2MC

Let’s start by importing the libraries we will need:

import I2MC                         # fixation detection
import pandas as pd                 # pandas helps us read and organize the data
import numpy as np                  # to handle arrays
import matplotlib.pyplot as plt     # to make plots

That was easy! Now let’s really get into it.

Import data

Now we will write a simple function to import our data. Because different eye trackers produce different file structures, this step acts as a “translator” to get everything into a standard format.

For this tutorial, you can either:

Use our sample data: Download the dataset we collected with DeToX here (/resources/I2mc/DATA.zip).
Use your own data: You will just need to adapt the column names in this function to match your specific file format.

Let’s build the import function step by step.

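# Path to ONE DeToX .h5 file (placeholder: replace it with one of your files)
fname = r'<<< PATH TO ONE .h5 FILE >>>'
raw_df = pd.read_hdf(fname, key='gaze')  # load the samples stored under 'gaze'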

Once the data is loaded, we need to create a clean DataFrame containing only the information essential for analysis.

This is the most important part: we need to map your specific column names to a standard set of 5 columns (Time, Left X, Left Y, Right X, Right Y). If you are using a different eye tracker (e.g., EyeLink), this is the part of the code you will need to change!

# Create empty dataframe
df = pd.DataFrame()
    
# Extract required data
df['time'] = raw_df['TimeStamp']
df['L_X'] = raw_df['Left_X']
df['L_Y'] = raw_df['Left_Y']
df['R_X'] = raw_df['Right_X']
df['R_Y'] = raw_df['Right_Y']
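If you expect to work with more than one eye tracker, the same step can also be written as a renaming dictionary, so that switching devices only means editing the dictionary. This is a minimal sketch: select_i2mc_columns and COLUMN_MAP are our own hypothetical names, the keys are the DeToX column names used in this tutorial, and the toy DataFrame (including its extra Pupil column) is made up.

```python
import pandas as pd

# Raw column name in your file -> column name I2MC expects.
# Replace the keys with the names your own eye tracker writes.
COLUMN_MAP = {
    'TimeStamp': 'time',
    'Left_X': 'L_X',
    'Left_Y': 'L_Y',
    'Right_X': 'R_X',
    'Right_Y': 'R_Y',
}

def select_i2mc_columns(raw_df, column_map=COLUMN_MAP):
    """Keep only the mapped columns and rename them to the names I2MC expects."""
    missing = [col for col in column_map if col not in raw_df.columns]
    if missing:
        # Fail early with a clear message instead of a cryptic error later on
        raise KeyError(f"Columns not found in the file: {missing}")
    return raw_df[list(column_map)].rename(columns=column_map)

# Toy example: the extra 'Pupil' column is simply dropped
raw = pd.DataFrame({'TimeStamp': [0.0, 3.3], 'Left_X': [960, 961], 'Left_Y': [540, 541],
                    'Right_X': [955, 956], 'Right_Y': [538, 539], 'Pupil': [3.1, 3.2]})
print(select_i2mc_columns(raw).columns.tolist())
# ['time', 'L_X', 'L_Y', 'R_X', 'R_Y']
```

The explicit check for missing columns is a design choice: a typo in a column name then produces a readable error message right away.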

After extracting the raw coordinates, we perform some minimal pre-processing to remove artifacts.

Eye trackers can occasionally produce “spurious” data points where the gaze coordinates jump far outside the screen boundaries. We define a valid range (the monitor plus a margin of one full screen width or height on each side) and mark any samples outside this range as missing (NaN).

Additionally, we use the Validity codes provided by the eye tracker. DeToX standardizes these so that 1 indicates a valid sample and 0 indicates an invalid one. We simply reject any sample marked as invalid.

# Sometimes we have weird peaks where one sample is (very) far outside the
# monitor. Here, count as missing any data that is more than one monitor
# distance outside the monitor.

# Screen resolution
res = [1920, 1080]

# --- Left Eye Processing ---
# 1. Check for coordinates far outside the monitor
lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0])
lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1])

# 2. Check validity (0 = invalid in DeToX)
# Combine criteria: Miss if out of bounds OR validity is 0
lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0)

# 3. Replace invalid samples with NaN
df.loc[lMiss, 'L_X'] = np.nan
df.loc[lMiss, 'L_Y'] = np.nan

# --- Right Eye Processing ---
rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0])
rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1])
rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0)

df.loc[rMiss, 'R_X'] = np.nan
df.loc[rMiss, 'R_Y'] = np.nan

Perfect!!!
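If you want to convince yourself that the masking behaves as intended before running it on real recordings, you can try the same rule on a few made-up samples. The helper mask_eye is our own hypothetical name, not part of I2MC or DeToX; it applies the same out-of-bounds rule and the same validity convention (0 = invalid) as the code above, and also lets you compute the proportion of samples lost, a handy number to keep an eye on with infant data.

```python
import numpy as np
import pandas as pd

def mask_eye(x, y, validity, res=(1920, 1080)):
    """Return copies of x and y with implausible or invalid samples set to NaN.

    A sample is dropped if it lies more than one screen width/height outside
    the screen, or if its validity flag is 0.
    """
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    validity = pd.Series(validity)
    out_x = (x < -res[0]) | (x > 2 * res[0])
    out_y = (y < -res[1]) | (y > 2 * res[1])
    miss = out_x | out_y | (validity == 0)
    return x.mask(miss), y.mask(miss)  # .mask puts NaN where miss is True

# Made-up samples: sample 2 is far off-screen, sample 3 is flagged invalid
x, y = mask_eye([960, 970, 9000, 965], [540, 545, 540, 538], [1, 1, 1, 0])
print(x.tolist())                            # [960.0, 970.0, nan, nan]
print(f"Data loss: {x.isna().mean():.0%}")   # Data loss: 50%
```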

Everything into a function

We have successfully loaded the data, extracted the relevant columns, and cleaned up the artifacts. To make this easy to use with I2MC (and to keep our main script clean), we will wrap all these steps into a single, reusable function.

def tobii_TX300(fname, res=[1920,1080]):
    """
    Import and preprocess DeToX HDF5 data for I2MC.
    """
    # 1. Load the raw data
    raw_df = pd.read_hdf(fname, key='gaze')
    
    # 2. Create the output DataFrame
    df = pd.DataFrame()
    df['time'] = raw_df['TimeStamp']
    
    # Map DeToX columns to I2MC expected names
    df['L_X'] = raw_df['Left_X']
    df['L_Y'] = raw_df['Left_Y']
    df['R_X'] = raw_df['Right_X']
    df['R_Y'] = raw_df['Right_Y']
    
    # 3. Clean Artifacts (Out of bounds)
    # Left Eye
    lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0])
    lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1])
    lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0)
    
    df.loc[lMiss, 'L_X'] = np.nan
    df.loc[lMiss, 'L_Y'] = np.nan

    # Right Eye
    rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0])
    rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1])
    rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0)
    
    df.loc[rMiss, 'R_X'] = np.nan
    df.loc[rMiss, 'R_Y'] = np.nan

    return df

Find our data

Nice!! We have our import function that we will use to read our data. Now, let’s find our data! To do this, we will use the glob method from Python’s built-in pathlib module, a handy tool for finding files. Before that, let’s set our working directory. The working directory is the folder where we have all our scripts and data. We can set it using the os library:

import os
os.chdir(r'<<< YOUR PATH >>>>')

Replace <<< YOUR PATH >>>> with the path to the folder on your computer that contains your DATA folder. Once you are done with that, we can use glob to find our data files. In the code below, we are telling Python to look for files with a .h5 extension in a specific folder:

from pathlib import Path
data_files = list(Path().glob('DATA/RAW/**/*.h5'))

DATA/RAW/: This is the path where we want to start our search.
**: This special symbol tells Python to search in this folder and in all of its subfolders (folders within folders).
*.h5: We’re asking Python to look for files with names ending in “.h5”.

So, when we run this code, Python will give us a list of all the “.h5” files located directly in DATA/RAW or in any of its subfolders. This makes it really convenient to find and work with lots of files at once.
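You can see what ** matches by building a throw-away folder structure and running the same pattern on it. This sketch uses only the standard library, and the folder and file names are made up. It sorts the result because the order in which glob returns files depends on the file system; wrapping the tutorial’s search in sorted() in the same way would give a reproducible processing order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Made-up layout: one file directly in RAW, two in nested subfolders,
    # plus a non-.h5 file that should be ignored
    for rel_path in ["DATA/RAW/p01.h5",
                     "DATA/RAW/session1/p02.h5",
                     "DATA/RAW/session1/extra/p03.h5",
                     "DATA/RAW/notes.txt"]:
        file = root / rel_path
        file.parent.mkdir(parents=True, exist_ok=True)
        file.touch()

    found = sorted(p.name for p in root.glob("DATA/RAW/**/*.h5"))

print(found)  # ['p01.h5', 'p02.h5', 'p03.h5']
```

Note that p01.h5, which sits directly in DATA/RAW, is found too: ** also matches “zero subfolders”.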

Define the output folder

Before processing the files, we need a dedicated place to save the results. We will create a folder called i2mc_output inside our existing DATA directory.

Using Python’s pathlib, we can define and create this directory safely in just two lines of code:

from pathlib import Path

# Define the output folder path
output_folder = Path('DATA') / 'i2mc_output'  # define folder path\name

# Create the folder (will do nothing if it already exists)
output_folder.mkdir(parents=True, exist_ok=True)

The .mkdir() method is incredibly convenient here:

parents=True ensures that if the parent folder (DATA) is missing, Python creates it for you automatically.
exist_ok=True prevents the script from crashing if the folder already exists—perfect for when you need to run your analysis script multiple times.
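A quick way to see what these two arguments change is to call mkdir() several times inside a temporary folder. This is a small sketch using only the standard library; the folder names mirror the tutorial’s, but everything lives in a throw-away directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_folder = Path(tmp) / "DATA" / "i2mc_output"

    output_folder.mkdir(parents=True, exist_ok=True)  # creates DATA and i2mc_output
    output_folder.mkdir(parents=True, exist_ok=True)  # second run: silently does nothing
    created = output_folder.is_dir()

    try:
        output_folder.mkdir()  # defaults: parents=False, exist_ok=False
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

print(created, second_call_failed)  # True True
```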

I2MC settings

Now that we have our data and our import function, we are almost ready to run I2MC. But first, we need to configure the algorithm.

These settings act as instructions for I2MC. The defaults provided below usually work well for most eye-tracking setups, so you can often leave them as they are. However, it is critical that you verify the Necessary Variables so that they match your setup: the screen resolution (xres, yres) must match the monitor on which the stimuli were shown, and the sampling frequency (freq) must match your eye tracker.
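Because a wrong freq does not trigger any error, it can be worth checking it against the recording itself. Below is a minimal sketch that estimates the sampling rate from the timestamps. It assumes the time column is in milliseconds, the unit of I2MC’s duration options such as minFixDur and maxMergeTime; estimate_sampling_rate is our own hypothetical helper, not an I2MC function, and the timestamps are synthetic.

```python
import numpy as np

def estimate_sampling_rate(time_ms):
    """Estimate the sampling frequency (Hz) from timestamps in milliseconds.

    The median inter-sample interval is used so that occasional gaps
    (dropped samples) barely affect the estimate.
    """
    intervals = np.diff(np.asarray(time_ms, dtype=float))
    return 1000.0 / np.median(intervals)

# Synthetic 300 Hz recording: one sample every 1000/300 ms, one second long
t = np.arange(300) * (1000.0 / 300)
t = np.delete(t, np.arange(100, 110))  # simulate 10 dropped samples

print(round(estimate_sampling_rate(t)))  # 300
```

With your own recordings, you could compare estimate_sampling_rate(data['time']) with opt['freq'] after importing a file. An estimate that is off by a factor of about 1000 would suggest that your timestamps are in seconds or microseconds rather than milliseconds.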

If you are curious about the math behind these options, we recommend reading the original I2MC article (https://link.springer.com/article/10.3758/s13428-016-0822-1).

Let’s define these settings:

# =============================================================================
# NECESSARY VARIABLES
# =============================================================================

opt = {}

# --- General variables ---
opt['xres']         = 1920.0        # Max horizontal resolution in pixels
opt['yres']         = 1080.0        # Max vertical resolution in pixels
opt['missingx']     = np.nan        # Missing value code (we used np.nan in our import function)
opt['missingy']     = np.nan        # Missing value code
opt['freq']         = 300.0         # Sampling frequency (Hz). CHECK YOUR DEVICE! (e.g., 60, 120, 300)

# --- Visual Angle Calculation ---
# Used for calculating noise measures (RMS and BCEA). 
# If left empty, noise measures will be reported in pixels instead of degrees.
opt['scrSz']        = [50.9, 28.6]  # Screen size in cm (Width, Height)
opt['disttoscreen'] = 65.0          # Viewing distance in cm

# --- Output Options ---
do_plot_data = True  # Save a plot of fixation detection for each trial?
# Note: Figures work best for short trials (up to ~20 seconds)

# =============================================================================
# OPTIONAL VARIABLES (Algorithm Fine-Tuning)
# =============================================================================
# Only change these if you have a specific reason to deviate from defaults.

# --- Interpolation Settings (Steffen) ---
opt['windowtimeInterp'] = 0.1       # Max duration (s) of missing data to interpolate
opt['edgeSampInterp']   = 2         # Valid samples needed on each side of a gap to interpolate it
opt['maxdisp']          = opt['xres'] * 0.2 * np.sqrt(2) # Max gaze displacement (px) across a gap for it to be interpolated

# --- K-Means Clustering Settings ---
opt['windowtime']       = 0.2       # Time window (s) for clustering (short enough that at most one saccade falls inside it)
opt['steptime']         = 0.02      # Window shift (s) per iteration
opt['maxerrors']        = 100       # Max clustering errors allowed before skipping the file
opt['downsamples']      = [2, 5, 10] # Downsampling factors at which the clustering is repeated
opt['downsampFilter']   = False     # Chebyshev filter when downsampling (False avoids ringing artifacts)

# --- Fixation Determination Settings ---
opt['cutoffstd']        = 2.0       # Std devs above mean weight for fixation cutoff
opt['onoffsetThresh']   = 3.0       # MADs for refining fixation start/end points
opt['maxMergeDist']     = 30.0      # Max pixels between fixations to merge them
opt['maxMergeTime']     = 30.0      # Max ms between fixations to merge them
opt['minFixDur']        = 40.0      # Min duration (ms) for a valid fixation
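Several of these options are given in seconds, while the eye tracker delivers samples, so the same settings cover fewer samples on a slower device. This back-of-the-envelope sketch assumes a simple round(seconds × freq) conversion, which may differ in detail from what I2MC does internally; it shows roughly what the defaults amount to at 300 Hz and at 60 Hz, and what the maxdisp formula evaluates to for a 1920-pixel-wide screen. The helper seconds_to_samples is hypothetical.

```python
import numpy as np

def seconds_to_samples(seconds, freq):
    """Approximate number of samples covered by `seconds` at `freq` Hz."""
    return int(round(seconds * freq))

freq = 300.0  # Hz, as in opt['freq']
for option, seconds in [("windowtime", 0.2), ("steptime", 0.02), ("windowtimeInterp", 0.1)]:
    print(f"{option:<17}{seconds:>5} s -> {seconds_to_samples(seconds, freq)} samples")

# The same 0.2 s clustering window on a 60 Hz tracker
print(seconds_to_samples(0.2, 60.0))  # 12

# maxdisp: 20% of the screen width times sqrt(2), in pixels
print(round(1920.0 * 0.2 * np.sqrt(2), 1))  # 543.1
```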

Run I2MC

Now we can finally run the algorithm on all our files!

We will use a loop to iterate through every file we found. For each file, we will:

Create a specific folder for that participant.
Import the data using our tobii_TX300 function.
Run I2MC to detect fixations.
Save the results (CSV) and the visualization (PNG).

#%% Run I2MC
import matplotlib.pyplot as plt
import I2MC

for file_idx, file in enumerate(data_files):
    print(f'Processing file {file_idx + 1} of {len(data_files)}: {file.name}')
    
    # 1. Setup Folders
    name = file.stem
    subj_folder = output_folder / name
    subj_folder.mkdir(exist_ok=True)
       
    # 2. Import Data (using the function we defined earlier)
    # Note: We pass the resolution from our options to ensure consistency
    data = tobii_TX300(file, res=[opt['xres'], opt['yres']])
    
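    # 3. Run I2MC
    # Returns: fix (dict of results), data (interpolated data), par (final parameters)
    fix, _, _ = I2MC.I2MC(data, opt)

    # Skip files for which I2MC did not return a results dict; otherwise
    # the saving step below would crash on them
    if not fix:
        print(f'  I2MC could not process {file.name}, skipping it')
        continue
    
    # 4. Save Plot (Optional)
    if do_plot_data: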
        save_plot = subj_folder / f"{name}.png"
        
        # Generate plot
        f = I2MC.plot.data_and_fixations(
            data, 
            fix, 
            fix_as_line=True, 
            res=[opt['xres'], opt['yres']]
        )
        
        # Save and close to free memory
        f.savefig(save_plot)
        plt.close(f)
        
    # 5. Save Data to CSV
    fix['participant'] = name
    fix_df = pd.DataFrame(fix)
    
    save_file = subj_folder / f"{name}.csv"
    fix_df.to_csv(save_file, index=False)

WE ARE DONE!!!!!

Congratulations! You have successfully processed your eye-tracking data. You should now see a new folder named i2mc_output containing a CSV file and a visualization plot for each of your participants.
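Because the loop stores the participant name in every CSV, the per-participant files can later be stacked into one table for group analyses. Here is a minimal sketch, assuming the folder layout created by the loop (i2mc_output/<name>/<name>.csv); collect_fixations is a hypothetical helper, and the demo writes two fake participants to a temporary folder so that it runs without real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

def collect_fixations(output_folder):
    """Stack every <name>/<name>.csv found in output_folder into one DataFrame."""
    csv_files = sorted(Path(output_folder).glob("*/*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {output_folder}")
    return pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)

# Demo: two fake participants with 2 and 3 fixations
with tempfile.TemporaryDirectory() as tmp:
    for name, n_fix in [("p01", 2), ("p02", 3)]:
        subj_folder = Path(tmp) / name
        subj_folder.mkdir()
        fake = pd.DataFrame({"dur": [150.0] * n_fix, "participant": name})
        fake.to_csv(subj_folder / f"{name}.csv", index=False)
    all_fix = collect_fixations(tmp)

print(all_fix["participant"].value_counts().sort_index())
```

With the real output, you would call collect_fixations(output_folder) after the loop has finished.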

But what exactly did we just generate?

I2MC analyzes your raw gaze data and identifies fixations: periods where the eye is relatively still and processing information. The resulting table (saved as a CSV file for each participant) contains several key pieces of information:

What I2MC Returns:

cutoff: The threshold on the clustering weights that was used for fixation detection (a single number, repeated on every row of the CSV).
start: An array holding the indices (sample numbers) where fixations start.
end: An array holding the indices where fixations end.
startT: An array containing the times when fixations start.
endT: An array containing the times when fixations end.
dur: An array storing the durations of fixations (in the units of your time column, normally milliseconds).
xpos: An array representing the median horizontal position (in pixels) of each fixation in the trial.
ypos: An array representing the median vertical position (in pixels) of each fixation in the trial.
flankdataloss: An array of boolean values (1 or 0) indicating, for each fixation, whether it is flanked by data loss (1) or not (0).
fracinterped: The fraction of each fixation’s samples that were interpolated (i.e., originally missing).

In simple terms, I2MC helps us understand where and for how long a person’s gaze remains fixed during an eye-tracking experiment.
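Columns such as flankdataloss and fracinterped are useful for quality control. Whether and how to exclude fixations is a decision for your study; the sketch below simply shows one possible rule, applied to a made-up fixation table with the same column names. It drops fixations flanked by data loss or with more than half of their samples interpolated, and the 0.5 threshold is an arbitrary example, not an I2MC recommendation.

```python
import pandas as pd

# Made-up fixation table using the column names I2MC writes to the CSV
fix_df = pd.DataFrame({
    'startT': [0.0, 250.0, 600.0],
    'endT': [200.0, 520.0, 700.0],
    'dur': [200.0, 270.0, 100.0],
    'xpos': [960.0, 400.0, 1500.0],
    'ypos': [540.0, 300.0, 800.0],
    'flankdataloss': [0, 1, 0],
    'fracinterped': [0.0, 0.05, 0.6],
})

MAX_FRAC_INTERPED = 0.5  # hypothetical threshold: choose one that suits your study

keep = (fix_df['flankdataloss'] == 0) & (fix_df['fracinterped'] <= MAX_FRAC_INTERPED)
clean = fix_df[keep]

print(len(clean), clean['dur'].mean())  # 1 200.0
```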

This is just the first step!!

Now that we have our fixations, we’ll need to use them to extract the information we’re interested in. Typically, this involves using the raw data to understand what was happening at each specific time point and using the data from I2MC to determine where the participant was looking at that time. This will be covered in a new tutorial. For now, you’ve successfully completed the pre-processing of your eye-tracking data, extracting a robust estimate of participants’ fixations!!

Warning

Please keep in mind that this tutorial uses a simplified approach for demonstration purposes. It assumes:

Each participant has a single data file (one trial/block).
The data is relatively clean and continuous.
Files are not missing critical columns or metadata.

If your real-world data is more complex (e.g., multiple sessions per participant, corrupted files), you may need to add extra checks to your script.
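One simple extra check is to make sure that a single problematic file does not stop the whole batch. Below is a minimal, generic sketch: process_all is a hypothetical helper that calls a function of your choice on each file, catches any exception, and returns the failures so that you can inspect them afterwards. To use it, you would move the body of the I2MC loop into a function that takes one file (for example, a hypothetical run_i2mc_on_file(file)); here a stand-in function simulates one corrupted file.

```python
from pathlib import Path

def process_all(files, process_one):
    """Call process_one(file) for every file; collect failures instead of stopping.

    Returns a dict mapping file name -> short error description.
    """
    failures = {}
    for file in files:
        try:
            process_one(file)
        except Exception as err:  # broad on purpose: record the error, move on
            failures[Path(file).name] = f"{type(err).__name__}: {err}"
    return failures

# Stand-in for the real per-file processing (hypothetical)
def fake_run(file):
    if "corrupt" in str(file):
        raise ValueError("could not read gaze data")

failures = process_all(["p01.h5", "corrupt_p02.h5", "p03.h5"], fake_run)
print(failures)  # {'corrupt_p02.h5': 'ValueError: could not read gaze data'}
```

Catching every exception is a deliberate trade-off: the batch always finishes, but you should check the returned dictionary rather than assume that all files were processed.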

For comprehensive documentation and advanced examples—including how to handle missing data and batch processing errors—we highly recommend checking out the official I2MC repository.

If you encounter issues running this script on your own data, don’t hesitate to reach out to us. Happy coding!

Entire script

To simplify things, here is the complete script we have developed together. This version incorporates all the changes we discussed, including the use of DeToX’s HDF5 data format and the updated import function.

import os
from pathlib import Path

import I2MC
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# =============================================================================
# 1. Function to Import Data (DeToX format)
# =============================================================================

def tobii_TX300(fname, res=[1920, 1080]):
    """
    Import and preprocess DeToX HDF5 data for I2MC.
    """
    # 1. Load the raw data from HDF5
    # We read the 'gaze' key where DeToX stores samples
    raw_df = pd.read_hdf(fname, key='gaze')
    
    # 2. Create the output DataFrame expected by I2MC
    df = pd.DataFrame()
    df['time'] = raw_df['TimeStamp']
    
    # Map DeToX columns to I2MC expected names
    df['L_X'] = raw_df['Left_X']
    df['L_Y'] = raw_df['Left_Y']
    df['R_X'] = raw_df['Right_X']
    df['R_Y'] = raw_df['Right_Y']
    
    # 3. Clean Artifacts (Out of bounds)
    # Define valid screen area (monitor + margin)
    
    # --- Left Eye ---
    lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0])
    lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1])
    # Combine with DeToX validity flag (0 = invalid)
    lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0)
    
    df.loc[lMiss, 'L_X'] = np.nan
    df.loc[lMiss, 'L_Y'] = np.nan

    # --- Right Eye ---
    rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0])
    rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1])
    rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0)
    
    df.loc[rMiss, 'R_X'] = np.nan
    df.loc[rMiss, 'R_Y'] = np.nan

    return df


# =============================================================================
# 2. Preparation and Setup
# =============================================================================

# Setting the working directory
os.chdir(r'<<< YOUR PATH >>>>')

# Find the files (DeToX uses .h5 files)
data_files = list(Path().glob('DATA/RAW/**/*.h5'))

# define the output folder
output_folder = Path('DATA') / 'i2mc_output'

# Create the folder (will do nothing if it already exists)
output_folder.mkdir(parents=True, exist_ok=True)


# =============================================================================
# 3. I2MC Settings (NECESSARY VARIABLES)
# =============================================================================

opt = {}
# General variables for eye-tracking data
opt['xres']         = 1920.0                # Max horizontal resolution in pixels
opt['yres']         = 1080.0                # Max vertical resolution in pixels
opt['missingx']     = np.nan                # Missing value code
opt['missingy']     = np.nan                # Missing value code
opt['freq']         = 300.0                 # Sampling frequency (Hz) - CHECK YOUR DEVICE!

# Visual Angle Calculation
opt['scrSz']        = [50.9, 28.6]          # Screen size in cm
opt['disttoscreen'] = 65.0                  # Distance to screen in cm

# Plotting
do_plot_data = True # Save visualization plots?


# =============================================================================
# 4. Optional Variables (Fine-Tuning)
# =============================================================================

# Interpolation
opt['windowtimeInterp']     = 0.1
opt['edgeSampInterp']       = 2
opt['maxdisp']              = opt['xres'] * 0.2 * np.sqrt(2)

# Clustering
opt['windowtime']           = 0.2
opt['steptime']             = 0.02
opt['maxerrors']            = 100
opt['downsamples']          = [2, 5, 10]
opt['downsampFilter']       = False

# Fixation Determination
opt['cutoffstd']            = 2.0
opt['onoffsetThresh']       = 3.0
opt['maxMergeDist']         = 30.0
opt['maxMergeTime']         = 30.0
opt['minFixDur']            = 40.0


# =============================================================================
# 5. Run I2MC Loop
# =============================================================================

for file_idx, file in enumerate(data_files):
    print(f'Processing file {file_idx + 1} of {len(data_files)}: {file.name}')

    # Extract name
    name = file.stem    
    
    # Create subject folder
    subj_folder = output_folder / name
    subj_folder.mkdir(exist_ok=True)
       
    # Import data using our new function
    data = tobii_TX300(file, res=[opt['xres'], opt['yres']])

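    # Run I2MC
    fix, _, _ = I2MC.I2MC(data, opt)

    # Skip files for which I2MC did not return a results dict
    if not fix:
        print(f'  I2MC could not process {file.name}, skipping it')
        continue

    # Save Plot
    if do_plot_data: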
        save_plot = subj_folder / f"{name}.png"
        f = I2MC.plot.data_and_fixations(
            data, fix, 
            fix_as_line=True, 
            res=[opt['xres'], opt['yres']]
        )
        f.savefig(save_plot)
        plt.close(f)

    # Save Data
    fix['participant'] = name
    fix_df = pd.DataFrame(fix)
    save_file = subj_folder / f"{name}.csv"
    fix_df.to_csv(save_file, index=False)

--- title: "Using I2MC for robust fixation extraction" date: "07/19/2024" execute: eval: false jupyter: kernel: "python3" pagetitle: "Using I2MC for robust fixation extraction" author-meta: "Tommaso Ghilardi" description-meta: "Learn how to use I2MC for robust fixation extraction in eye-tracking data analysis." keywords: "eye-tracking, I2MC, fixation detection, data analysis, tutorial, python, DevStart, developmental science, tutorial, eye fixations" categories: - Eye-tracking - Python --- # What we are going to do When it comes to eye-tracking data, a fundamental role is played by fixations. A fixation indicates that a person's eyes are looking at a particular point of interest for a given amount of time. More specifically, a fixation is a cluster of consecutive data points in an eye-tracking dataset for which a person's gaze remains relatively still and focused on a particular area or object. Typically, eye-tracking programs come with their own fixation detection algorithms that give us a rough idea of what the person was looking at. But here's the problem: these tools aren't always very good when it comes to data from infants and children. Why? Because infants and children can be all over the place! They move their heads, put their hands (or even feet) in front of their faces, close their eyes, or just look away. All of this makes the data a big mess that's hard to make sense of with regular fixation detection algorithms. Because the data is so messy, it is difficult to tell which data points are part of the same fixation or different fixations. **But don't worry! We've got a solution: I2MC.** I2MC stands for *"Identification by Two-Means Clustering"*, and it was designed specifically for this kind of problem. It's designed to deal with all kinds of noise, and even periods of data loss. In this tutorial, we'll show you how to use I2MC to find fixations. We won't get into the nerdy stuff about how it works - this is all about the practical side. 
If you're curious about the science, you can read the original [article](https://link.springer.com/article/10.3758/s13428-016-0822-1). Now that we've introduced I2MC, let's get our hands dirty and see how to use it! # Install I2MC Installing I2MC in Python is extremely easy. As explained in the tutorial to install Python, just open the miniconda terminal, activate the environment you want to install I2MC in, and type `pip install I2MC`. After a few seconds, you should be ready to go!! You may need ::: callout-note I2MC has been originally written for Matlab. So for you crazy people who would prefer to use Matlab you can find instructions to download and use I2MC here: [I2MC Matlab!](https://github.com/royhessels/I2MC) ::: # Use I2MC Let's start with importing the libraries that we will need ```{python} import I2MC # I2MC import pandas as pd # panda help us read csv import numpy as np # to handle arrays import matplotlib.pyplot as plt # to make plots ``` This was too easy now, let's start to really get into it. ## Import data Now we will write a simple function to import our data. Because different eye trackers produce different file structures, this step acts as a "translator" to get everything into a standard format. For this tutorial, you can either: 1. **Use our sample data:** Download the dataset we collected with DeToX [here](/resources/I2mc/DATA.zip). 2. **Use your own data:** You will just need to adapt the column names in this function to match your specific file format. Let's build the import function step by step. ```{python} #| label: read data # Load data raw_df = pd.read_hdf(fname, key='gaze') ``` Once the data is loaded, we need to create a clean DataFrame containing only the information essential for analysis. This is the most important part: we need to map your specific column names to a standard set of 5 columns (Time, Left X, Left Y, Right X, Right Y). 
If you are using a different eye tracker (e.g., Eyelink), this is a part of the code you will need to change! ```{python} #| label: select columns # Create empty dataframe df = pd.DataFrame() # Extract required data df['time'] = raw_df['TimeStamp'] df['L_X'] = raw_df['Left_X'] df['L_Y'] = raw_df['Left_Y'] df['R_X'] = raw_df['Right_X'] df['R_Y'] = raw_df['Right_Y'] ``` After extracting the raw coordinates, we perform some minimal pre-processing to remove artifacts. Eye trackers can occasionally produce "spurious" data points where the gaze coordinates jump far outside the screen boundaries. We define a valid range (the monitor size plus a margin) and mark any samples outside this range as missing (`NaN`). Additionally, we use the **Validity** codes provided by the eye tracker. DeToX standardizes these so that `1` indicates a valid sample and `0` indicates an invalid one. We simply reject any sample marked as invalid. ```{python} #| label: fix data # Sometimes we have weird peaks where one sample is (very) far outside the # monitor. Here, count as missing any data that is more than one monitor # distance outside the monitor. # Screen resolution res = [1920, 1080] # --- Left Eye Processing --- # 1. Check for coordinates far outside the monitor lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0]) lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1]) # 2. Check validity (0 = invalid in DeToX) # Combine criteria: Miss if out of bounds OR validity is 0 lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0) # 3. 
Replace invalid samples with NaN df.loc[lMiss, 'L_X'] = np.nan df.loc[lMiss, 'L_Y'] = np.nan # --- Right Eye Processing --- rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0]) rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1]) rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0) df.loc[rMiss, 'R_X'] = np.nan df.loc[rMiss, 'R_Y'] = np.nan ``` **Perfect!!!** ### Everything into a function We have successfully loaded the data, extracted the relevant columns, and cleaned up the artifacts. To make this easy to use with I2MC (and to keep our main script clean), we will wrap all these steps into a single, reusable function. ```{python} #| label: make a function def tobii_TX300(fname, res=[1920,1080]): """ Import and preprocess DeToX HDF5 data for I2MC. """ # 1. Load the raw data raw_df = pd.read_hdf(fname, key='gaze') # 2. Create the output DataFrame df = pd.DataFrame() df['time'] = raw_df['TimeStamp'] # Map DeToX columns to I2MC expected names df['L_X'] = raw_df['Left_X'] df['L_Y'] = raw_df['Left_Y'] df['R_X'] = raw_df['Right_X'] df['R_Y'] = raw_df['Right_Y'] # 3. Clean Artifacts (Out of bounds) # Left Eye lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0]) lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1]) lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0) df.loc[lMiss, 'L_X'] = np.nan df.loc[lMiss, 'L_Y'] = np.nan # Right Eye rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0]) rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1]) rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0) df.loc[rMiss, 'R_X'] = np.nan df.loc[rMiss, 'R_Y'] = np.nan return df ``` ### Find our data Nice!! we have our import function that we will use to read our data. Now, let's find our data! To do this, we will use the glob library, which is a handy tool for finding files in Python. Before that let's set our working directory. The working directory is the folder where we have all our scripts and data. 
We can set it using the `os` library: ```{python} import os os.chdir(r'<<< YOUR PATH >>>>') ``` This is my directory, you will have something different, you need to change it to where your data are. Once you are done with that we can use glob to find our data files. In the code below, we are telling Python to look for files with a *.h5* extension in a specific folder on our computer: ```{python} from pathlib import Path data_files = list(Path().glob('DATA/RAW/**/*.h5')) ``` - `DATA\\RAW\\`: This is the path where we want to start our search. - `**`: This special symbol tells Python to search in all the subfolders (folders within folders) under our starting path. - `*.h5`: We're asking Python to look for files with names ending in ".h5". So, when we run this code, Python will find and give us a list of all the ".h5" files located in any subfolder within our specified path. This makes it really convenient to find and work with lots of files at once. ### Define the output folder Before processing the files, we need a dedicated place to save the results. We will create a folder called `i2mc_output` inside our existing `DATA` directory. Using Python's `pathlib`, we can define and create this directory safely in just two lines of code: ```{python} from pathlib import Path # Define the output folder path output_folder = Path('DATA') / 'i2mc_output' # define folder path\name # Create the folder (will do nothing if it already exists) output_folder.mkdir(parents=True, exist_ok=True) ``` The `.mkdir()` method is incredibly convenient here: - `parents=True` ensures that if the parent folder (`DATA`) is missing, Python creates it for you automatically. - `exist_ok=True` prevents the script from crashing if the folder already exists—perfect for when you need to run your analysis script multiple times. ### I2MC settings Now that we have our data and our import function, we are almost ready to run I2MC. But first, we need to configure the algorithm. 
These settings act as instructions for I2MC. The defaults provided below usually work well for most eye-tracking setups, so you can often leave them as they are. However, **it is critical** that you verify the `Necessary Variables`—specifically the screen resolution (`xres`, `yres`) and sampling frequency (`freq`)—to match your specific eye tracker model. If you are curious about the math behind these options, we recommend reading the original [I2MC article](https://link.springer.com/article/10.3758/s13428-016-0822-1). Let's define these settings: ```{python} #| label: I2MC options # ============================================================================= # NECESSARY VARIABLES # ============================================================================= opt = {} # --- General variables --- opt['xres'] = 1920.0 # Max horizontal resolution in pixels opt['yres'] = 1080.0 # Max vertical resolution in pixels opt['missingx'] = np.nan # Missing value code (we used np.nan in our import function) opt['missingy'] = np.nan # Missing value code opt['freq'] = 300.0 # Sampling frequency (Hz). CHECK YOUR DEVICE! (e.g., 60, 120, 300) # --- Visual Angle Calculation --- # Used for calculating noise measures (RMS and BCEA). # If left empty, noise measures will be reported in pixels instead of degrees. opt['scrSz'] = [50.9, 28.6] # Screen size in cm (Width, Height) opt['disttoscreen'] = 65.0 # Viewing distance in cm # --- Output Options --- do_plot_data = True # Save a plot of fixation detection for each trial? # Note: Figures work best for short trials (up to ~20 seconds) # ============================================================================= # OPTIONAL VARIABLES (Algorithm Fine-Tuning) # ============================================================================= # Only change these if you have a specific reason to deviate from defaults. 
# --- Interpolation Settings (Steffen) ---
opt['windowtimeInterp'] = 0.1    # Max duration (s) of missing data to interpolate
opt['edgeSampInterp'] = 2        # Samples required at each edge of a gap for interpolation
opt['maxdisp'] = opt['xres'] * 0.2 * np.sqrt(2)  # Max displacement (px) across a gap for interpolation

# --- K-Means Clustering Settings ---
opt['windowtime'] = 0.2          # Time window (s) for clustering (approx. 1 saccade duration)
opt['steptime'] = 0.02           # Window shift (s) per iteration
opt['maxerrors'] = 100           # Max errors allowed before skipping file
opt['downsamples'] = [2, 5, 10]  # Downsampling factors used during clustering
opt['downsampFilter'] = False    # Chebyshev filter when downsampling (False avoids ringing artifacts)

# --- Fixation Determination Settings ---
opt['cutoffstd'] = 2.0           # Std devs above mean weight for fixation cutoff
opt['onoffsetThresh'] = 3.0      # MADs for refining fixation start/end points
opt['maxMergeDist'] = 30.0       # Max pixels between fixations to merge them
opt['maxMergeTime'] = 30.0       # Max ms between fixations to merge them
opt['minFixDur'] = 40.0          # Min duration (ms) for a valid fixation
```

### Run I2MC

Now we can finally run the algorithm on all our files! We will use a loop to iterate through every file we found. For each file, we will:

1. **Create** a specific folder for that participant.
2. **Import** the data using our `tobii_TX300` function.
3. **Run** I2MC to detect fixations.
4. **Save** the results (CSV) and the visualization (PNG).

```{python}
#%% Run I2MC
import matplotlib.pyplot as plt
import pandas as pd
import I2MC

for file_idx, file in enumerate(data_files):
    print(f'Processing file {file_idx + 1} of {len(data_files)}: {file.name}')

    # 1. Setup Folders
    name = file.stem
    subj_folder = output_folder / name
    subj_folder.mkdir(exist_ok=True)

    # 2. Import Data (using the function we defined earlier)
    # Note: We pass the resolution from our options to ensure consistency
    data = tobii_TX300(file, res=[opt['xres'], opt['yres']])

    # 3. Run I2MC
    # Returns: fix (dict of results), data (interpolated data), par (final parameters)
    fix, _, _ = I2MC.I2MC(data, opt)

    # Skip this file if fixation classification did not succeed
    if not fix:
        print(f'  Fixation classification failed for {file.name}, skipping.')
        continue

    # 4. Save Plot (Optional)
    if do_plot_data:
        save_plot = subj_folder / f"{name}.png"
        # Generate plot
        f = I2MC.plot.data_and_fixations(
            data, fix, fix_as_line=True, res=[opt['xres'], opt['yres']]
        )
        # Save and close to free memory
        f.savefig(save_plot)
        plt.close(f)

    # 5. Save Data to CSV
    fix['participant'] = name
    fix_df = pd.DataFrame(fix)
    save_file = subj_folder / f"{name}.csv"
    fix_df.to_csv(save_file, index=False)
```

## WE ARE DONE!!!!!

**Congratulations!** You have successfully processed your eye-tracking data. You should now see a new folder named `i2mc_output` containing a CSV file and a visualization plot for each of your participants.

But what exactly did we just generate? I2MC analyzes your raw gaze data and identifies **fixations**: periods where the eye is relatively still and processing information. The resulting data frame contains several key pieces of information:

**What I2MC Returns:**

- `cutoff`: A number representing the cutoff used for fixation detection.
- `start`: An array holding the indices where fixations start.
- `end`: An array holding the indices where fixations end.
- `startT`: An array containing the times when fixations start.
- `endT`: An array containing the times when fixations end.
- `dur`: An array storing the durations of fixations.
- `xpos`: An array representing the median horizontal position for each fixation in the trial.
- `ypos`: An array representing the median vertical position for each fixation in the trial.
- `flankdataloss`: An array of boolean values (1 or 0), one per fixation, indicating whether that fixation is flanked by data loss (1) or not (0).
- `fracinterped`: An array giving, for each fixation, the fraction of data loss or interpolated data within that fixation.

In addition, our script adds a `participant` column containing the file name, so that results from different participants can be told apart.

In simple terms, I2MC helps us understand where and for how long a person's gaze remains fixed during an eye-tracking experiment.

This is just the first step!! Now that we have our fixations, we'll need to use them to extract the information we're interested in.
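As a quick first look at that output, the sketch below summarises one participant's fixation table using columns from the list above (`dur`, `flankdataloss`, `fracinterped`). The helper `summarise_fixations` is our own illustration and not part of I2MC, and the three-row table is made-up example data; in practice you would load one of the CSV files saved in `i2mc_output`.

```python
import pandas as pd

def summarise_fixations(fix_df):
    """Summarise an I2MC fixation table (one row per fixation).

    Hypothetical helper: expects the 'dur', 'flankdataloss' and
    'fracinterped' columns that our script saves to CSV.
    """
    return {
        'n_fixations': len(fix_df),
        'mean_dur': fix_df['dur'].mean(),
        'median_dur': fix_df['dur'].median(),
        'total_fix_time': fix_df['dur'].sum(),
        # Share of fixations that border a stretch of data loss
        'prop_flanked_by_loss': fix_df['flankdataloss'].astype(bool).mean(),
        # Average fraction of lost/interpolated data within a fixation
        'mean_frac_interped': fix_df['fracinterped'].mean(),
    }

# Made-up example rows (NOT real I2MC output), just to show the call
example = pd.DataFrame({
    'dur': [120.0, 250.0, 310.0],
    'flankdataloss': [0, 1, 0],
    'fracinterped': [0.0, 0.2, 0.1],
})
print(summarise_fixations(example))

# With real output (path follows the folder layout created by the loop):
# fix_df = pd.read_csv(output_folder / name / f"{name}.csv")
# print(summarise_fixations(fix_df))
```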
Typically, the next step combines the raw data, which tells us what was happening at each time point, with the I2MC output, which tells us where the participant was looking at that time. This will be covered in a new tutorial. For now, you've successfully completed the pre-processing of your eye-tracking data, extracting a robust estimation of participants' fixations!!

::: callout-warning
Please keep in mind that this tutorial uses a simplified approach for demonstration purposes. It assumes:

- Each participant has a single data file (one trial/block).
- The data is relatively clean and continuous.
- Files are not missing critical columns or metadata.

If your real-world data is more complex (e.g., multiple sessions per participant, corrupted files), you may need to add extra checks to your script. For comprehensive documentation and advanced examples, including how to handle missing data and batch processing errors, we highly recommend checking out the official [I2MC repository](https://github.com/dcnieho/I2MC_Python/tree/master/example).

If you encounter issues running this script on your own data, don't hesitate to reach out to us. **Happy coding!**
:::

## Entire script

To simplify things, here is the complete script we have developed together. This version incorporates all the changes we discussed, including the use of DeToX's HDF5 data format and the updated import function.

```{python}
#| label: total script
import os
from pathlib import Path

import I2MC
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# =============================================================================
# 1. Function to Import Data (DeToX format)
# =============================================================================
def tobii_TX300(fname, res=[1920, 1080]):
    """
    Import and preprocess DeToX HDF5 data for I2MC.
    """
    # 1. Load the raw data from HDF5
    # We read the 'gaze' key where DeToX stores samples
    raw_df = pd.read_hdf(fname, key='gaze')

    # 2. Create the output DataFrame expected by I2MC
    df = pd.DataFrame()
    df['time'] = raw_df['TimeStamp']

    # Map DeToX columns to I2MC expected names
    df['L_X'] = raw_df['Left_X']
    df['L_Y'] = raw_df['Left_Y']
    df['R_X'] = raw_df['Right_X']
    df['R_Y'] = raw_df['Right_Y']

    # 3. Clean Artifacts (Out of bounds)
    # Valid area: the monitor plus a margin of one screen width/height on each side

    # --- Left Eye ---
    lMiss1 = (df['L_X'] < -res[0]) | (df['L_X'] > 2 * res[0])
    lMiss2 = (df['L_Y'] < -res[1]) | (df['L_Y'] > 2 * res[1])
    # Combine with DeToX validity flag (0 = invalid)
    lMiss = lMiss1 | lMiss2 | (raw_df['Left_Validity'] == 0)
    df.loc[lMiss, 'L_X'] = np.nan
    df.loc[lMiss, 'L_Y'] = np.nan

    # --- Right Eye ---
    rMiss1 = (df['R_X'] < -res[0]) | (df['R_X'] > 2 * res[0])
    rMiss2 = (df['R_Y'] < -res[1]) | (df['R_Y'] > 2 * res[1])
    rMiss = rMiss1 | rMiss2 | (raw_df['Right_Validity'] == 0)
    df.loc[rMiss, 'R_X'] = np.nan
    df.loc[rMiss, 'R_Y'] = np.nan

    return df

# =============================================================================
# 2. Preparation and Setup
# =============================================================================
# Setting the working directory
os.chdir(r'<<< YOUR PATH >>>>')

# Find the files (DeToX uses .h5 files)
data_files = list(Path().glob('DATA/RAW/**/*.h5'))

# Define the output folder
output_folder = Path('DATA') / 'i2mc_output'
# Create the folder (will do nothing if it already exists)
output_folder.mkdir(parents=True, exist_ok=True)

# =============================================================================
# 3. I2MC Settings (NECESSARY VARIABLES)
# =============================================================================
opt = {}

# General variables for eye-tracking data
opt['xres'] = 1920.0         # Max horizontal resolution in pixels
opt['yres'] = 1080.0         # Max vertical resolution in pixels
opt['missingx'] = np.nan     # Missing value code
opt['missingy'] = np.nan     # Missing value code
opt['freq'] = 300.0          # Sampling frequency (Hz) - CHECK YOUR DEVICE!

# Visual Angle Calculation
opt['scrSz'] = [50.9, 28.6]  # Screen size in cm
opt['disttoscreen'] = 65.0   # Distance to screen in cm

# Plotting
do_plot_data = True          # Save visualization plots?

# =============================================================================
# 4. Optional Variables (Fine-Tuning)
# =============================================================================
# Interpolation
opt['windowtimeInterp'] = 0.1
opt['edgeSampInterp'] = 2
opt['maxdisp'] = opt['xres'] * 0.2 * np.sqrt(2)

# Clustering
opt['windowtime'] = 0.2
opt['steptime'] = 0.02
opt['maxerrors'] = 100
opt['downsamples'] = [2, 5, 10]
opt['downsampFilter'] = False

# Fixation Determination
opt['cutoffstd'] = 2.0
opt['onoffsetThresh'] = 3.0
opt['maxMergeDist'] = 30.0
opt['maxMergeTime'] = 30.0
opt['minFixDur'] = 40.0

# =============================================================================
# 5. Run I2MC Loop
# =============================================================================
for file_idx, file in enumerate(data_files):
    print(f'Processing file {file_idx + 1} of {len(data_files)}: {file.name}')

    # Extract name
    name = file.stem

    # Create subject folder
    subj_folder = output_folder / name
    subj_folder.mkdir(exist_ok=True)

    # Import data using our new function
    data = tobii_TX300(file, res=[opt['xres'], opt['yres']])

    # Run I2MC
    fix, _, _ = I2MC.I2MC(data, opt)

    # Skip this file if fixation classification did not succeed
    if not fix:
        print(f'  Fixation classification failed for {file.name}, skipping.')
        continue

    # Save Plot
    if do_plot_data:
        save_plot = subj_folder / f"{name}.png"
        f = I2MC.plot.data_and_fixations(
            data, fix, fix_as_line=True, res=[opt['xres'], opt['yres']]
        )
        f.savefig(save_plot)
        plt.close(f)

    # Save Data
    fix['participant'] = name
    fix_df = pd.DataFrame(fix)
    save_file = subj_folder / f"{name}.csv"
    fix_df.to_csv(save_file, index=False)
```
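
Once the loop has run, you will often want all participants in a single table for analysis. Here is a minimal sketch, assuming the folder layout the script creates (`i2mc_output/<participant>/<participant>.csv`, each file already containing a `participant` column). The function name `load_all_fixations` is our own and is not part of I2MC.

```python
from pathlib import Path

import pandas as pd

def load_all_fixations(output_folder):
    """Concatenate every per-participant fixation CSV into one DataFrame.

    Hypothetical helper: assumes the layout created by the script above,
    i.e. <output_folder>/<participant>/<participant>.csv.
    """
    # One level of subfolders, one CSV per participant folder
    csv_files = sorted(Path(output_folder).glob('*/*.csv'))
    if not csv_files:
        raise FileNotFoundError(f'No fixation CSVs found in {output_folder}')
    # ignore_index gives the combined table a fresh 0..n-1 index
    return pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

# Usage after running the script:
# all_fix = load_all_fixations(Path('DATA') / 'i2mc_output')
# print(all_fix.groupby('participant')['dur'].mean())
```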