How to extract and map function names and their comments from C header files?


I have many different .h files in different formats. They include function definitions, variable definitions and more. My goal is to extract all function names and their respective comments and map them. I am looking for a working approach to accomplish this using Python. I tried the following approaches:

pycparser

https://github.com/eliben/pycparser

I read the following blog article regarding my problem, but I couldn't manage to get it working: https://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers

Using the code below I get the following error:

pycparser.plyparser.ParseError: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/i386/_types.h:100:27: before: __darwin_va_list

import sys
from pycparser import parse_file

sys.path.extend(['.', '..'])

if __name__ == "__main__":
    filename = "<pathToHeader>/file.h"

    ast = parse_file(filename, use_cpp=True,
            cpp_path='gcc',
            cpp_args=['-E', r'<pathToPycparser>/utils/fake_libc_include'])
    ast.show()

pyclibrary

https://pyclibrary.readthedocs.io/en/latest/

This runs without an error, but it does not find any functions in the example file.

from pyclibrary import CParser
parser = CParser(["<pathToHeader>/file.h"])
print(parser)

pygccxml

https://pygccxml.readthedocs.io/en/master/index.html

Running the code below, I get numerous errors, e.g. unknown type name 'NSString', and the script stops after 20 errors because too many errors were emitted.

from pygccxml import utils
from pygccxml import declarations
from pygccxml import parser

generator_path, generator_name = utils.find_xml_generator()

xml_generator_config = parser.xml_generator_configuration_t(
    xml_generator_path=generator_path,
    xml_generator=generator_name)

filename = "<pathToHeader>/file.h"

decls = parser.parse([filename], xml_generator_config)

global_namespace = declarations.get_global_namespace(decls)

ns = global_namespace.namespace("ns")

Doxygen

https://www.doxygen.nl/index.html

Doxygen is a code-documentation tool rather than a parser, but it gave me the best results. It produces XML output containing all function names (correctly parsed) as well as the comments. The comments are tagged as comments and annotated with their line numbers. The functions, however, are not annotated with line numbers, so matching each comment to its function is basically impossible.
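One thing that may be worth checking on the Doxygen route: Doxygen's XML schema defines a location element (with file and line attributes) for memberdef entries. Whether it shows up in a given output, and how Objective-C selectors are named there, depends on the Doxygen version and configuration. The sketch below therefore runs on a hand-written, hypothetical XML fragment, not on real output for the example header; the helper name member_locations is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a Doxygen XML file (hypothetical content).
SAMPLE_XML = """\
<doxygen>
  <compounddef kind="protocol">
    <sectiondef kind="public-func">
      <memberdef kind="function" id="m1">
        <name>locationManager:didFailWithError:</name>
        <location file="CLLocationManagerDelegate.h" line="274" column="1"/>
      </memberdef>
      <memberdef kind="function" id="m2">
        <name>methodWithoutLocation</name>
      </memberdef>
      <memberdef kind="variable" id="v1">
        <name>someVariable</name>
        <location file="CLLocationManagerDelegate.h" line="10" column="1"/>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def member_locations(xml_text):
    """Map each function member name to (file, line) where a location exists."""
    root = ET.fromstring(xml_text)
    found = {}
    for member in root.iter("memberdef"):
        if member.get("kind") != "function":
            continue  # skip variables, typedefs, and so on
        name = member.findtext("name")
        location = member.find("location")
        if name is None or location is None or location.get("line") is None:
            continue  # nothing to match against without a line number
        found[name] = (location.get("file"), int(location.get("line")))
    return found


locations = member_locations(SAMPLE_XML)
print(locations)
```

If the line numbers are present, a comment whose last line is N could be paired with the function whose location line is N + 1.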

Custom Parser

I wrote a simple parser myself, which works to some extent. However, I cannot be sure that it covers every C syntax variant used in these files: the syntax differs from file to file, and there are far too many files to check manually.

functionComments = {}
inComment = False
comment = []
function = None

for i in range(0, len(lines)):

    line = lines[i].strip()

    if line.startswith("/*"):
        inComment = True

    elif line.endswith("*/"):
        inComment = False
        comment.append(line[:-2].strip())

        if len(lines) > i+1 and lines[i+1].startswith("- ("):
            functionName = lines[i+1]
            counter = 2

            while len(lines) > i+counter and lines[i+counter] != "\n":
                functionName += " " + lines[i+counter].lstrip().split(" API_")[0]
                counter += 1

            if ":" in functionName:
                functionNameParts = functionName.split(":")
                functionNamePartsSplitBySpace = []
                function = ""

                for j in range(0, len(functionNameParts)-1):
                    functionNamePartsSplitBySpace.append(functionNameParts[j].split(" "))

                for k in range(0, len(functionNamePartsSplitBySpace)):
                    function += functionNamePartsSplitBySpace[k][-1].split(")")[-1] + ":"

            else:
                function = lines[i+1].split(" NS_AVAILABLE")[0].split(")")[-1]

            functionComments[function] = "\n".join(comment)
            comment = []
            function = None

        else:
            function = None
            comment = []

    elif inComment:

        if line.startswith("* "):
            comment.append(line[2:].strip())

        else:
            comment.append(line)
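For comparison, the same idea can be sketched with regular expressions instead of a line-by-line state machine. This is not the parser above, just a compact variant under several assumptions: comments use the /* ... */ form, a declaration ends at the first semicolon, return types contain no nested parentheses, and trailing availability macros start with API_ or NS_. The function names are hypothetical.

```python
import re

# A block comment (whose body may not contain "*/"), optional whitespace,
# then an Objective-C method declaration up to the first ";".
COMMENT_THEN_METHOD = re.compile(
    r"/\*(?P<comment>(?:(?!\*/).)*)\*/\s*(?P<decl>[-+]\s*\([^;]*;)",
    re.DOTALL,
)


def clean_comment(raw):
    """Strip the leading '*' decoration and drop empty lines."""
    lines = [line.strip().lstrip("*").strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def selector_of(decl):
    """Build the selector, e.g. 'locationManager:didFailWithError:'."""
    # Cut off trailing availability macros (assumed to start with API_ or NS_).
    head = re.split(r"\s+(?:API|NS)_[A-Z_]+", decl, maxsplit=1)[0]
    # Drop the '-' or '+' and the return type in parentheses.
    body = re.sub(r"^[-+]\s*\([^)]*\)\s*", "", head)
    parts = re.findall(r"(\w+)\s*:", body)
    if parts:
        return ":".join(parts) + ":"
    match = re.match(r"\w+", body)  # method without arguments
    return match.group(0) if match else body.strip()


def extract_method_comments(source):
    """Map each commented method's selector to its cleaned comment text."""
    return {
        selector_of(m.group("decl")): clean_comment(m.group("comment"))
        for m in COMMENT_THEN_METHOD.finditer(source)
    }


SAMPLE = """\
/*
 *  File header, not attached to any method.
 */

#import <Foundation/Foundation.h>

/*
 *  locationManager:didUpdateLocations:
 *
 *  Discussion:
 *    Invoked when new locations are available.
 */
- (void)locationManager:(CLLocationManager *)manager
     didUpdateLocations:(NSArray<CLLocation *> *)locations API_AVAILABLE(ios(6.0), macos(10.9));

- (void)undocumented:(id)sender;

/*
 *  Discussion:
 *    Invoked when an error has occurred.
 */
- (void)locationManager:(CLLocationManager *)manager
    didFailWithError:(NSError *)error;
"""

result = extract_method_comments(SAMPLE)
for selector, comment in result.items():
    print(selector, "->", repr(comment))
```

Unlike the loop above, this variant tolerates blank lines between a comment and its method, and it ignores methods that have no comment directly in front of them. It shares the underlying weakness, though: any syntax outside the stated assumptions will be silently missed.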

Example Header File

/*
 *  CLLocationManagerDelegate.h
 *  CoreLocation
 *
 *  Copyright (c) 2008-2010 Apple Inc. All rights reserved.
 *
 */

#import <Availability.h>
#import <Foundation/Foundation.h>
#import <CoreLocation/CLLocationManager.h>
#import <CoreLocation/CLRegion.h>
#import <CoreLocation/CLVisit.h>

NS_ASSUME_NONNULL_BEGIN

@class CLLocation;
@class CLHeading;
@class CLBeacon;
@class CLVisit;

/*
 *  CLLocationManagerDelegate
 *  
 *  Discussion:
 *    Delegate for CLLocationManager.
 */
@protocol CLLocationManagerDelegate<NSObject>

@optional

/*
 *  locationManager:didUpdateToLocation:fromLocation:
 *  
 *  Discussion:
 *    Invoked when a new location is available. oldLocation may be nil if there is no previous location
 *    available.
 *
 *    This method is deprecated. If locationManager:didUpdateLocations: is
 *    implemented, this method will not be called.
 */
- (void)locationManager:(CLLocationManager *)manager
    didUpdateToLocation:(CLLocation *)newLocation
           fromLocation:(CLLocation *)oldLocation API_AVAILABLE(macos(10.6)) API_DEPRECATED("Implement -locationManager:didUpdateLocations: instead", ios(2.0, 6.0)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didUpdateLocations:
 *
 *  Discussion:
 *    Invoked when new locations are available.  Required for delivery of
 *    deferred locations.  If implemented, updates will
 *    not be delivered to locationManager:didUpdateToLocation:fromLocation:
 *
 *    locations is an array of CLLocation objects in chronological order.
 */
- (void)locationManager:(CLLocationManager *)manager
     didUpdateLocations:(NSArray<CLLocation *> *)locations API_AVAILABLE(ios(6.0), macos(10.9));

/*
 *  locationManager:didUpdateHeading:
 *  
 *  Discussion:
 *    Invoked when a new heading is available.
 */
- (void)locationManager:(CLLocationManager *)manager
       didUpdateHeading:(CLHeading *)newHeading API_AVAILABLE(ios(3.0), watchos(2.0)) API_UNAVAILABLE(tvos, macos);

/*
 *  locationManagerShouldDisplayHeadingCalibration:
 *
 *  Discussion:
 *    Invoked when a new heading is available. Return YES to display heading calibration info. The display 
 *    will remain until heading is calibrated, unless dismissed early via dismissHeadingCalibrationDisplay.
 */
- (BOOL)locationManagerShouldDisplayHeadingCalibration:(CLLocationManager *)manager  API_AVAILABLE(ios(3.0), watchos(2.0)) API_UNAVAILABLE(tvos, macos);

/*
 *  locationManager:didDetermineState:forRegion:
 *
 *  Discussion:
 *    Invoked when there's a state transition for a monitored region or in response to a request for state via a
 *    a call to requestStateForRegion:.
 */
- (void)locationManager:(CLLocationManager *)manager
    didDetermineState:(CLRegionState)state forRegion:(CLRegion *)region API_AVAILABLE(ios(7.0), macos(10.10)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didRangeBeacons:inRegion:
 *
 *  Discussion:
 *    Invoked when a new set of beacons are available in the specified region.
 *    beacons is an array of CLBeacon objects.
 *    If beacons is empty, it may be assumed no beacons that match the specified region are nearby.
 *    Similarly if a specific beacon no longer appears in beacons, it may be assumed the beacon is no longer received
 *    by the device.
 */
- (void)locationManager:(CLLocationManager *)manager
        didRangeBeacons:(NSArray<CLBeacon *> *)beacons
               inRegion:(CLBeaconRegion *)region API_DEPRECATED_WITH_REPLACEMENT("Use locationManager:didRangeBeacons:satisfyingConstraint:", ios(7.0, 13.0)) API_UNAVAILABLE(macos, macCatalyst) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:rangingBeaconsDidFailForRegion:withError:
 *
 *  Discussion:
 *    Invoked when an error has occurred ranging beacons in a region. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
rangingBeaconsDidFailForRegion:(CLBeaconRegion *)region
              withError:(NSError *)error API_DEPRECATED_WITH_REPLACEMENT("Use locationManager:didFailRangingBeaconsForConstraint:error:", ios(7.0, 13.0)) API_UNAVAILABLE(macos, macCatalyst) API_UNAVAILABLE(watchos, tvos);

- (void)locationManager:(CLLocationManager *)manager
        didRangeBeacons:(NSArray<CLBeacon *> *)beacons
   satisfyingConstraint:(CLBeaconIdentityConstraint *)beaconConstraint API_AVAILABLE(ios(13.0)) API_UNAVAILABLE(watchos, tvos, macos);

- (void)locationManager:(CLLocationManager *)manager
didFailRangingBeaconsForConstraint:(CLBeaconIdentityConstraint *)beaconConstraint
                  error:(NSError *)error API_AVAILABLE(ios(13.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  locationManager:didEnterRegion:
 *
 *  Discussion:
 *    Invoked when the user enters a monitored region.  This callback will be invoked for every allocated
 *    CLLocationManager instance with a non-nil delegate that implements this method.
 */
- (void)locationManager:(CLLocationManager *)manager
    didEnterRegion:(CLRegion *)region API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didExitRegion:
 *
 *  Discussion:
 *    Invoked when the user exits a monitored region.  This callback will be invoked for every allocated
 *    CLLocationManager instance with a non-nil delegate that implements this method.
 */
- (void)locationManager:(CLLocationManager *)manager
    didExitRegion:(CLRegion *)region API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didFailWithError:
 *  
 *  Discussion:
 *    Invoked when an error has occurred. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
    didFailWithError:(NSError *)error;

/*
 *  locationManager:monitoringDidFailForRegion:withError:
 *  
 *  Discussion:
 *    Invoked when a region monitoring error has occurred. Error types are defined in "CLError.h".
 */
- (void)locationManager:(CLLocationManager *)manager
    monitoringDidFailForRegion:(nullable CLRegion *)region
    withError:(NSError *)error API_AVAILABLE(ios(4.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didChangeAuthorizationStatus:
 *  
 *  Discussion:
 *    Invoked when the authorization status changes for this application.
 */
- (void)locationManager:(CLLocationManager *)manager didChangeAuthorizationStatus:(CLAuthorizationStatus)status API_AVAILABLE(ios(4.2), macos(10.7));

/*
 *  locationManager:didStartMonitoringForRegion:
 *  
 *  Discussion:
 *    Invoked when a monitoring for a region started successfully.
 */
- (void)locationManager:(CLLocationManager *)manager
    didStartMonitoringForRegion:(CLRegion *)region API_AVAILABLE(ios(5.0), macos(10.8)) API_UNAVAILABLE(watchos, tvos);

/*
 *  Discussion:
 *    Invoked when location updates are automatically paused.
 */
- (void)locationManagerDidPauseLocationUpdates:(CLLocationManager *)manager API_AVAILABLE(ios(6.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  Discussion:
 *    Invoked when location updates are automatically resumed.
 *
 *    In the event that your application is terminated while suspended, you will
 *    not receive this notification.
 */
- (void)locationManagerDidResumeLocationUpdates:(CLLocationManager *)manager API_AVAILABLE(ios(6.0)) API_UNAVAILABLE(watchos, tvos, macos);

/*
 *  locationManager:didFinishDeferredUpdatesWithError:
 *
 *  Discussion:
 *    Invoked when deferred updates will no longer be delivered. Stopping
 *    location, disallowing deferred updates, and meeting a specified criterion
 *    are all possible reasons for finishing deferred updates.
 *
 *    An error will be returned if deferred updates end before the specified
 *    criteria are met (see CLError), otherwise error will be nil.
 */
- (void)locationManager:(CLLocationManager *)manager
    didFinishDeferredUpdatesWithError:(nullable NSError *)error API_AVAILABLE(ios(6.0), macos(10.9)) API_UNAVAILABLE(watchos, tvos);

/*
 *  locationManager:didVisit:
 *
 *  Discussion:
 *    Invoked when the CLLocationManager determines that the device has visited
 *    a location, if visit monitoring is currently started (possibly from a
 *    prior launch).
 */
- (void)locationManager:(CLLocationManager *)manager didVisit:(CLVisit *)visit API_AVAILABLE(ios(8.0)) API_UNAVAILABLE(watchos, tvos, macos);

@end

NS_ASSUME_NONNULL_END

There is 1 solution below

Kühlhausvogel

Following Craig Estey's proposal to use castxml, which uses the clang libraries, I was able to extract the function names together with their line numbers by using the following command.

Command Line

clang -cc1 -ast-dump -fblocks -x objective-c <pathToHeader>/file.h

Applying this command to the example header produces the error "fatal error: 'Availability.h' file not found". Nevertheless, the AST is created successfully (as far as I can tell).

Python

findComment() is a custom function (not shown here) that parses the Doxygen .xml and extracts the comments.

import clang.cindex

def findFunction(node):
    # Recursively collect Objective-C instance methods into the global dict.
    global functions

    if node.kind == clang.cindex.CursorKind.OBJC_INSTANCE_METHOD_DECL:
        try:
            # findComment() is the custom Doxygen lookup (not shown).
            comment = findComment(node.location.file.name, node.location.line)
            functions[node.displayname] = {"file": node.location.file.name, "line": node.location.line, "comment": comment}

        except Exception as exception:
            print("Error for node\n{}\n{}".format(node.location, exception))

    # Descend exactly once, whether or not the lookup above failed.
    for child in node.get_children():
        findFunction(child)

if __name__ == "__main__":

    functions = {}
    index = clang.cindex.Index.create()
    filePath = 'pathToFile/file.h'
    tu = index.parse(filePath, ["-cc1", "-ast-dump", "-fblocks", "-x", "objective-c"])
    findFunction(tu.cursor)
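Since findComment() is not shown, here is a minimal stand-in for experimenting. Note that it swaps in a different technique: instead of parsing the Doxygen .xml, it looks at the source lines directly and returns the block comment that ends on the line immediately above the declaration. It assumes the reported line is the first line of the declaration and that comments use the /* ... */ form; the name find_comment_above and its list-of-lines parameter are hypothetical.

```python
def find_comment_above(source_lines, decl_line):
    """Return the block comment ending directly above decl_line (1-based).

    Returns None when the line above does not close a /* ... */ comment.
    """
    end = decl_line - 2  # 0-based index of the line above the declaration
    if end < 0 or not source_lines[end].rstrip().endswith("*/"):
        return None

    # Walk upwards to the line that opens the comment.
    start = end
    while start >= 0 and "/*" not in source_lines[start]:
        start -= 1
    if start < 0:
        return None

    cleaned = []
    for raw in source_lines[start:end + 1]:
        text = raw.strip()
        if text.startswith("/*"):
            text = text[2:]
        if text.endswith("*/"):
            text = text[:-2]
        text = text.strip().lstrip("*").strip()  # drop the '*' decoration
        if text:
            cleaned.append(text)
    return "\n".join(cleaned)


SAMPLE_LINES = [
    "/*",
    " *  locationManager:didFailWithError:",
    " *",
    " *  Discussion:",
    " *    Invoked when an error has occurred.",
    " */",
    "- (void)locationManager:(CLLocationManager *)manager",
    "    didFailWithError:(NSError *)error;",
    "",
    "- (void)undocumented:(id)sender;",
]

print(find_comment_above(SAMPLE_LINES, 7))
print(find_comment_above(SAMPLE_LINES, 10))
```

To use it with the traversal above, the file at node.location.file.name would be read once (for example with open(path).read().splitlines()) and the resulting list passed in together with node.location.line.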