The example continues to use a Script CHOP, Python and TouchDesigner for a face detection function. Instead of the MediaPipe library, it uses the Dlib Python binding. It refers to the face detector example program from the Dlib distribution. Dlib is a popular C++-based programming toolkit for various applications. Its image processing library contains a number of face detection functions, and a Python binding is also available.
The main face detection capability is defined in the following statements.
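The original statements are not reproduced here; below is a minimal sketch of how Dlib's frontal face detector is typically created and called (the variable img is assumed to be the image obtained from the Script CHOP input):

import dlib

# Create Dlib's HOG-based frontal face detector.
detector = dlib.get_frontal_face_detector()

# img: an 8-bit RGB (or grayscale) image; the second argument is the number
# of times to upsample the image before detection.
faces = detector(img, 1)

for face in faces:
    # Each detection is a dlib.rectangle with pixel coordinates.
    print(face.left(), face.top(), face.right(), face.bottom())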
This is the last part of the series on using MediaPipe in TouchDesigner. The following example is a continuation of the last post on pose tracking. This version will use a Script CHOP to output the position information of the torso tracked in the film sequence. The output window will display four numbers (11, 12, 23, 24) on the four corners of the torso. These numbers are the indices of the pose landmarks corresponding to the torso of the body.
The Script CHOP will output 3 channels:
pose:x
pose:y
pose:visibility
Each channel has 33 samples, corresponding to the 33 pose landmarks. The visibility channel indicates how likely it is that the landmark is visible in the image. The following code segment describes how it is done.
xpos = []
ypos = []
visb = []
if results.pose_landmarks:
    # Collect the x, y and visibility values of all 33 pose landmarks.
    for p in results.pose_landmarks.landmark:
        xpos.append(p.x)
        ypos.append(p.y)
        visb.append(p.visibility)
    # One channel per attribute, each with 33 samples.
    tx = scriptOp.appendChan('pose:x')
    ty = scriptOp.appendChan('pose:y')
    tv = scriptOp.appendChan('pose:visibility')
    tx.vals = xpos
    ty.vals = ypos
    tv.vals = visb
    scriptOp.rate = me.time.rate
    scriptOp.numSamples = len(xpos)
The final TouchDesigner project folder MediaPipePoseCHOP is now available in the GitHub repository.
The project does not resize the original film clip with a Resolution TOP. Instead, it performs the resize within the Python code of the Script TOP, using the OpenCV function cv2.resize() (a minimal sketch follows the diagram below). Each pose detected will generate 33 pose landmarks. The details can be found in the following diagram.
Image from the Google MediaPipe
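As a minimal sketch of the in-code resize mentioned above (the target resolution here is illustrative):

import cv2

# frame: the image obtained from the Script TOP input, as in the earlier examples.
frame = cv2.resize(frame, (640, 360))  # illustrative target resolution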
Together with the original video image, the MediaPipe drawing utility generates the pose skeleton, as in the following code segment.
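The original code segment is not reproduced here; a minimal sketch of the drawing step, assuming the same onCook frame handling as the other Script TOP examples in this series, might look like this:

import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

# frame: the RGB uint8 image converted from the Script TOP input.
results = pose.process(frame)
if results.pose_landmarks:
    # Draw the 33 landmarks and the skeleton connections on top of the video frame.
    mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)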
The final TouchDesigner project is available in the MediaPipePoseTOP folder on GitHub. Owing to file size and copyright concerns, the two film clips are not included in GitHub.
The following example presents a more general approach to obtaining the hand tracking details in a Script CHOP. We can then use other TouchDesigner CHOPs to extract the data for visualisation.
For simplicity, it also detects a single hand. For each hand tracked, it will generate 21 landmarks, as shown in the diagram from the last post. The Script CHOP will produce 2 channels, hand:x and hand:y. Each channel will have 21 samples, corresponding to the 21 hand landmarks from MediaPipe. The following code segment describes how it is done.
detail_x = []
detail_y = []
if results.multi_hand_landmarks:
    # Only one hand is tracked, so the outer loop runs at most once.
    for hand in results.multi_hand_landmarks:
        for pt in hand.landmark:
            detail_x.append(pt.x)
            detail_y.append(pt.y)
    # One channel per axis, each with 21 samples.
    tx = scriptOp.appendChan('hand:x')
    ty = scriptOp.appendChan('hand:y')
    tx.vals = detail_x
    ty.vals = detail_y
    scriptOp.numSamples = len(detail_x)
    scriptOp.rate = me.time.rate
The TouchDesigner project also uses a Shuffle CHOP to swap the 21 samples into 21 channels. We can then select the 5 channels corresponding to the 5 finger tips (4, 8, 12, 16, 20) for visualisation. The final project is available for download in the MediaPipeHandCHOP2 folder of the GitHub repository.
This example is a continuation of the last post on hand tracking in MediaPipe with TouchDesigner. This version will use a Script CHOP instead of a Script TOP. The CHOP will produce channels for the x and y positions of the Wrist and the Index Finger Tip. We can make use of these numbers to create interactive animation.
The MediaPipe hand tracking solution will generate 21 landmarks including all positions of the 5 fingers and the wrist. Details of the 21 landmarks are in the following diagram.
Image from the Google MediaPipe
For simplicity, the example only detects one hand. The indices 0 and 8 correspond to the WRIST and the INDEX_FINGER_TIP respectively. The following code segment illustrates how it generates the channels for the Script CHOP.
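A minimal sketch of the channel generation inside onCook, assuming the results object from the MediaPipe Hands solution; the channel names below are illustrative and may differ from those in the project:

# Indices into the 21 hand landmarks (see the diagram above).
WRIST = 0
INDEX_FINGER_TIP = 8

if results.multi_hand_landmarks:
    # Only a single hand is detected, so use the first entry.
    hand = results.multi_hand_landmarks[0]
    wrist = hand.landmark[WRIST]
    tip = hand.landmark[INDEX_FINGER_TIP]
    # Append one channel per value; the names are illustrative.
    for name, value in [('wrist:x', wrist.x), ('wrist:y', wrist.y),
                        ('finger:x', tip.x), ('finger:y', tip.y)]:
        chan = scriptOp.appendChan(name)
        chan.vals = [value]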
This tutorial introduces the use of hand tracking in the Google MediaPipe with TouchDesigner. Similar to the previous posts, part 1 of hand tracking will just be a visualisation of the hand details from a Script TOP. It will use the MediaPipe drawing utility to display the hand details directly onto the Video Device In image for output.
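A minimal sketch of the hand drawing step, structured like the face mesh Script TOP code shown later on this page (only the hand-specific parts are shown; the frame handling is assumed to be the same):

import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

# frame: the RGB uint8 image converted from the Video Device In / Script TOP input.
results = hands.process(frame)
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        # Draw the 21 hand landmarks and their connections onto the video frame.
        mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)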
The TouchDesigner project can now be downloaded from the MediaPipeHandTOP GitHub directory.
This is the continuation of the last post, with slight modifications. Instead of just displaying the face mesh details in a Script TOP, it visualises all the face mesh points in 3D space. As the facial landmarks returned from MediaPipe contain three-dimensional information, it is possible to enumerate all the points and display them in a Script SOP. We are going to use the appendPoint() function to generate the point cloud and the appendPoly() function to create the face mesh.
The data returned from MediaPipe contains the 468 facial landmarks, based on the Canonical Face Model. The face mesh information (triangles), however, is not available from the results obtained from the MediaPipe solutions. Nevertheless, we can obtain such information from the metadata of the facial landmarks in its GitHub repository. To simplify the process, I have edited the data into this CSV mesh file. The mesh.csv file is expected to be located in the TouchDesigner project folder, together with the TOE project file. Here are the first few lines of the mesh.csv file:
Each line is the data for one triangle of the face mesh. The 3 numbers are the indices of the vertices within the 468 facial landmarks. A visualisation of the landmarks is also available in the MediaPipe GitHub repository.
Image from the Google MediaPipe GitHub
The TouchDesigner project will render the Script SOP with the standard Geometry, Camera, Light and the Render TOP.
I’ll not go through all the code here. The following paragraphs cover some of the essential elements in the Python code. The first one is the initialisation of the face mesh information from the mesh.csv file.
triangles = []
mesh_file = project.folder + "/mesh.csv"
mf = open(mesh_file, "r")
mesh_list = mf.read().split('\n')
mf.close()
for m in mesh_list:
    # Skip any empty trailing line in the CSV file.
    if not m:
        continue
    temp = m.split(',')
    # Each line holds the 3 landmark indices of one triangle.
    x = int(temp[0])
    y = int(temp[1])
    z = int(temp[2])
    triangles.append([x, y, z])
The variable triangles is the list of all triangles from the canonical face model. Each entry is a list of 3 indices pointing to the corresponding points among the 468 facial landmarks. The second one is the code to generate the face point cloud and the mesh.
for pt in landmarks:
    # Create one SOP point per facial landmark.
    p = scriptOp.appendPoint()
    p.x = pt.x
    p.y = pt.y
    p.z = pt.z
for poly in triangles:
    # Create a closed triangle that reuses the existing points.
    pp = scriptOp.appendPoly(3, closed=True, addPoints=False)
    pp[0].point = scriptOp.points[poly[0]]
    pp[1].point = scriptOp.points[poly[1]]
    pp[2].point = scriptOp.points[poly[2]]
The first for loop creates all the points from the facial landmarks using the appendPoint() function. The second for loop creates all the triangular meshes from information stored in the variable triangles using the appendPoly() function.
After we draw the 3D face model, we also compute the normals of the model by using another Attribute Create SOP.
The final TouchDesigner project is available in the MediaPipeFaceMeshSOP folder of the GitHub repository.
The following example is a simple demonstration of the Face Mesh function from MediaPipe in TouchDesigner. It is very similar to the previous face detection example. Again, we are going to use the Script TOP to integrate with MediaPipe and display the face mesh information together with the live webcam image.
Instead of flipping the image vertically in the Python code, this version will perform the flipping in the TouchDesigner Flip TOP, both vertically and horizontally (mirror image). We also reduce the resolution from the original 1280 x 720 to 640 x 360 for better performance. The Face Mesh information is drawn directly to the output image in the Script TOP.
Here is the Python code in the Script TOP:
# me - this DAT
# scriptOp - the OP which is cooking

import numpy
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh

# Drawing style for the landmark points.
point_spec = mp_drawing.DrawingSpec(
    color=(0, 100, 255),
    thickness=1,
    circle_radius=1
)

# Drawing style for the mesh connections.
line_spec = mp_drawing.DrawingSpec(
    color=(255, 200, 0),
    thickness=2,
    circle_radius=1
)

face_mesh = mp_face_mesh.FaceMesh(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

# press 'Setup Parameters' in the OP to call this function to re-create the parameters.
def onSetupParameters(scriptOp):
    page = scriptOp.appendCustomPage('Custom')
    p = page.appendFloat('Valuea', label='Value A')
    p = page.appendFloat('Valueb', label='Value B')
    return

# called whenever custom pulse parameter is pushed
def onPulse(par):
    return

def onCook(scriptOp):
    input = scriptOp.inputs[0].numpyArray(delayed=True)
    if input is not None:
        # Convert the 0-1 float RGBA texture into an 8-bit RGB image for MediaPipe.
        frame = input * 255
        frame = frame.astype('uint8')
        frame = cv2.cvtColor(frame, cv2.COLOR_RGBA2RGB)
        results = face_mesh.process(frame)
        if results.multi_face_landmarks:
            for face_landmarks in results.multi_face_landmarks:
                # Draw the mesh points and connections onto the video frame.
                # (FACE_CONNECTIONS was renamed in later MediaPipe releases.)
                mp_drawing.draw_landmarks(
                    image=frame,
                    landmark_list=face_landmarks,
                    connections=mp_face_mesh.FACE_CONNECTIONS,
                    landmark_drawing_spec=point_spec,
                    connection_drawing_spec=line_spec)
        # Convert back to RGBA and copy into the Script TOP output.
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2RGBA)
        scriptOp.copyNumpyArray(frame)
    return
Similar to the previous examples, the important code is in the onCook function. The face_mesh instance processes each frame and the results are drawn onto the frame for final display.
The TouchDesigner project is now available in the MediaPipeFaceMeshTOP folder of the GitHub repository.
The last post demonstrated the use of the face detection function in MediaPipe with TouchDesigner. Nevertheless, it only produced an image with the detected results, which is not very useful if we want to manipulate the graphics according to the detected faces. In this example, we switch to a Script CHOP to output the detected face data in numeric form.
As mentioned in the last post, MediaPipe face detection expects a vertically flipped image compared with the TouchDesigner texture, so this example flips the image with a TouchDesigner TOP to keep the Python code simpler. Instead of showing all the detected faces, the code picks only the largest face and outputs its bounding box and the positions of the left and right eyes.
Since we are working with a Script CHOP, it is not possible to connect the flipped TOP to it directly. In this case, we use the onSetupParameters function to define the Face TOP input in the Custom tab.
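A minimal sketch of that parameter setup, assuming only the parameter name Face from the statement quoted below; the rest of the page layout is illustrative:

def onSetupParameters(scriptOp):
    page = scriptOp.appendCustomPage('Custom')
    # A TOP-reference parameter; drag the flipped face image TOP onto it.
    p = page.appendTOP('Face', label='Face')
    return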
And in the onCook function, we use the following statement to retrieve the image from the TOP that we dragged into the Face parameter.
topRef = scriptOp.par.Face.eval()
After we find the largest face in the image, we append a number of channels to the Script CHOP so that the TouchDesigner project can use them for custom visualisation, as sketched after the list below. The new channels are:
face (number of faces detected)
width, height (size of the bounding box)
tx, ty (centre of the bounding box)
left_eye_x, left_eye_y (position of the left eye)
right_eye_x, right_eye_y (position of the right eye)
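A minimal sketch of appending these channels inside onCook once the largest face has been found; the variable names holding the detection values (num_faces, bbox_w, bbox_h, bbox_cx, bbox_cy, left_eye, right_eye) are illustrative:

# Illustrative values computed earlier from the MediaPipe detections.
channels = [
    ('face', num_faces),
    ('width', bbox_w), ('height', bbox_h),
    ('tx', bbox_cx), ('ty', bbox_cy),
    ('left_eye_x', left_eye.x), ('left_eye_y', left_eye.y),
    ('right_eye_x', right_eye.x), ('right_eye_y', right_eye.y),
]
for name, value in channels:
    # Each channel carries a single sample for the current frame.
    chan = scriptOp.appendChan(name)
    chan.vals = [value]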
The complete project file can be downloaded from this GitHub repository.